  1. Aug 2025
    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript from Jones and colleagues investigates a previously described phenomenon in which P. falciparum malaria parasites display increased trafficking of proteins displayed on the surface of infected RBCs, as well as increased cytoadherence in response to febrile temperatures. While this parasite response was previously described, it was not uniformly accepted, and conflicting reports can be found in the literature. This variability likely arises due to differences in the methods employed and the degree of temperature increase to which the parasites were exposed. Here, the authors are very careful to employ a temperature shift that likely reflects what is happening in infected humans and that they demonstrate is not detrimental to parasite viability or replication. In addition, they go on to investigate what steps in protein trafficking are affected by exposure to increased temperature and show that the effect is not specific to PfEMP1 but rather likely affects all transmembrane domain-containing proteins that are trafficked to the RBC. They also detect increased rates of phosphorylation of trafficked proteins, consistent with overall increased protein export.

      Strengths:

      The authors used a relatively mild increase in temperature (39 degrees), which they demonstrate is not detrimental to parasite viability or replication. This enabled them to avoid potential complications of a more severe heat shock that might have affected previously published studies. They employed a clever method of fractionation of RBCs infected with a var2csa-nanoluc fusion protein expressing parasite line to determine which step in the export pathway was likely accelerating in response to increased temperature. This enabled them to determine that export across the PVM is being affected. They also explored changes in phosphorylation of exported proteins and demonstrated that the effect is not limited to PfEMP1 but appears to affect numerous (or potentially all) exported transmembrane domain-containing proteins.

      Weaknesses:

      All the experiments investigating changes resulting from increased temperature were conducted after an increase in temperature from 16 to 24 hours, with sampling or assays conducted at the 24 hr mark. While this provided consistency throughout the study, this is a time point relatively early in the export of proteins to the RBC surface, as shown in Figure 1E. At 24 hrs, only approximately 50% of wildtype parasites are positive for PfEMP1, while at 32 hrs this approaches 80%. Since the authors only checked the effect of heat stress at 24 hrs, it is not possible to determine if the changes they observe reflect an overall increase in protein trafficking or instead a shift to earlier (or an accelerated) trafficking. In other words, if a second time point had been considered (for example, 32 hrs or later), would the parasites grown in the absence of heat stress catch up?

      We did not assess cytoadhesion at later stages, but the supplementary figures show that at 40 hours post infection the heat-stress and control conditions have comparable proportions of VAR2CSA-positive iRBCs, whereas they differ at 24 h. This holds for the DMSO-treated (control, wild-type-like) HA-tagged lines of HSP70x and PF3D7_072500 (Supplementary Figures 9 and 12, respectively). Given that protein levels appear unchanged, we conclude that trafficking is accelerated at these earlier time points but is comparable at later stages. This would still increase the overall bound parasite mass, as parasites start to adhere earlier during or after heat stress.

      Reviewer #2 (Public review):

      This manuscript describes experiments characterising how malaria parasites respond to physiologically relevant heat-shock conditions. The authors show, quite convincingly, that moderate heat-shock appears to increase cytoadherence, likely by increasing trafficking of surface proteins involved in this process.

      While generally of a high quality and including a lot of data, I have a few small questions and comments, mainly regarding data interpretation.

      (1) The authors use sorbitol lysis as a proxy for trafficking of PSAC components. This is a very roundabout way of doing things and does not, I think, really show what they claim. There could be a myriad of other reasons for this increased activity (indeed, the authors note potential PSAC activation under these conditions). One further reason could be a difference in the membrane stability following heat shock, which may affect sorbitol uptake, or the fragility of the erythrocytes to hypotonic shock. I really suggest that the authors stick to what they show (increased PSAC) without trying to use this as evidence for increased trafficking of a number of non-specified proteins that they cannot follow directly.

      This is a valid point. However, uninfected RBCs do not lyse following heat stress, nor do much younger iRBCs, indicating that the observed effect is specific to infected RBCs at a defined stage. The sorbitol sensitivity assay is performed at 37°C under normal conditions after cells are returned to non–heat stress temperatures, so the effect is not due to transient changes in membrane permeability at elevated temperature.

      Planned experiment: To strengthen our conclusions and further test our hypothesis, we will perform sorbitol sensitivity assays on iRBCs at >20 hours post infection following heat stress, in the presence and absence of furosemide, a PSAC inhibitor. If iRBC lysis is abolished when furosemide is present, this would confirm that the effect is PSAC-dependent. However, the effect could also be due to altered PSAC activity during heat stress that is maintained at lower temperatures, as outlined in the discussion.

      (2) Supplementary Figure 6C/D: The KAHRP signal does not look like it should. In fact, it doesn't look like anything specific. The HSP70-X signal is also blurry and overexposed. These pictures cannot be used to justify the authors' statements about a lack of colocalisation in any way.

      Planned experiment: We agree that the IFAs are not the best as presented and will include better quality supplementary images in a revised version.

      (3) Figure 6: This experiment confuses me. The authors purport to fractionate proteins using differential lysis, but the proteins they detect are supposed to be transmembrane proteins and thus should always be found associated with the pellet, whether lysis is done using equinatoxin or saponin. Have they discovered a currently unknown trafficking pathway to tell us about? Whilst there is a lot of discussion about the trafficking pathways for TM proteins through the host cell, a number of studies have shown that these proteins are generally found in a membrane-bound state. The authors should elaborate, or choose an experiment that is capable of showing compartment-specific localisation of membrane-bound proteins (protease protection, for example).

      We do not believe we have identified a novel trafficking pathway. Rather, we believe we capture trafficking intermediates of PfEMP1 between the PVM and the RBC periphery, either in small vesicles and/or possibly in Maurer’s clefts. These would still be membrane-embedded but, because of their small size, would not be pelleted at the centrifugation speeds used in our study (we did not use ultracentrifugation). We believe this explanation is in line with the current hypothesis of how PfEMP1 and other exported TMD proteins are trafficked to the periphery or the Maurer’s clefts.

      (4) The red blood cell contains, in addition to HSP70-X, a number of human HSPs (HSP70 and HSP90 are significant in this current case). As the name suggests, these proteins non-specifically shield exposed hydrophobic domains revealed upon partial protein unfolding following thermal insult. I would thus have expected to find significantly more enrichment following heat shock, but this is not the case. Is it possible that the physiological heat shock conditions used in this current study are not high enough to cause a real heat shock?

      As noted by the reviewer, we do not see enrichment of red blood cell heat shock proteins following heat stress, either with FIKK10.2-TurboID or in the phosphoproteome. We used a physiologically relevant heat stress that significantly modifies the iRBC, as shown by our functional assays. While a higher temperature might induce an association of red blood cell heat shock proteins, such conditions may not accurately reflect the most commonly found context of malaria infection.

      Reviewer #3 (Public review):

      Summary:

      In this paper, it is established that high fever-like 39 C temperatures cause parasite-infected red blood cells to become stickier. It is thought that high temperatures might help the spleen to destroy parasite-infected cells, and they become stickier in order to remain trapped in blood vessels, so they stop passing through the spleen.

      Strengths:

      The strength of this research is that it shows that fever-like temperatures can cause parasite-infected red blood cells to stick to surfaces designed to mimic the walls of small blood vessels. In a natural infection, this would cause parasite-infected red blood cells to stop circulating through the spleen, where the parasites would be destroyed by the immune system. It is thought that fevers could lead to infected red blood cells becoming stiffer and therefore more easily destroyed in the spleen. Parasites respond to fevers by making their red blood cells stickier, so they stop flowing around the body and into the spleen. The experiments here prove that fever temperatures increase the export of Velcro-like sticky proteins onto the surface of the infected red blood cells and are very thorough and convincing.

      Weaknesses:

      A minor weakness of the paper is that the effects of fever on the stiffness of infected red blood cells were not measured. This can be easily done in the laboratory by measuring how the passage of infected red blood cells through a bed of tiny metal balls is delayed under fever-like temperatures.

      Previous work by Marinkovic et al. (cited in this manuscript) reported that all RBCs, both infected and uninfected, increase in stiffness at 41 °C compared with 37 °C, with trophozoites and schizonts exhibiting a particularly pronounced increase. We agree that it would be interesting to determine whether similar changes occur at physiological fever-like temperatures, and whether this increase in stiffness coincides with the period of elevated protein trafficking. However, since we have already demonstrated enhanced protein export using multiple complementary approaches, we have chosen to address these questions in a follow-up study.

    1. Author response:

      The following is the authors’ response to the original reviews

      We would like to express our sincere gratitude to the reviewers for their thorough analysis of the manuscript and their extremely helpful comments. We have taken all the suggestions into consideration and conducted a range of additional experiments to address the points raised. We have also extensively revised the manuscript to clarify descriptions, correct inaccuracies and remove inconsistencies. We have modified the figures for clarity and content.

      Overall, we expanded the description of the EBH structure to emphasise its dimeric nature and the impact of the two binding sites on interpreting the binding data, including cooperativity. Using ITC, we tested the effect of the pre-SxIP residues on the binding affinity with additional peptides. We found that these residues had a significant effect, albeit much smaller than that of the post-SxIP residues. We analysed the binding of the 11MACF-VLL mutant with EBH-ΔC and evaluated the exchange rates. In agreement with our model, we found that the EBH affinity for the SxIP peptide from CK5P2 (KKSRLPRILIKRSR), which has a C-terminal sequence similar to that of the 11MACF-VLLRK mutant, is 21 nM, which is similar to the affinity of the mutant itself. This demonstrates the significant variation in affinity observed among natural SxIP ligands, as predicted by our study. Our responses to the specific points raised by the reviewers are provided below.

      Reviewer #1 (Public Review):

      There is no direct experimental evidence for independent dock and lock steps. The model is certainly plausible given their structural data, but all titration and CEST measurements are fully consistent with a simple one-step binding mechanism. Indeed, it is acknowledged that the results for the VLL peptide are not consistent with the predictions of this model, as affinity and dissociation rates do not co-vary. The model may still be a helpful way to interpret and discuss their results, and may indeed be the correct mechanism, but this has not yet been proven.

      Unfortunately, it is not possible to obtain direct experimental evidence because the folding of the C-terminus is too fast to influence the NMR parameters. However, as the reviewer pointed out, our structural data support the two-step model, since folding of the C-terminus is only possible once the ligand containing the post-SxIP residues has bound. By adopting a mechanistically supported model, we can analyse the contributions to binding and relate them to the structural characteristics of the complex. This provides a clearer insight into the roles of the various regions in the interaction and allows them to be modified rationally to enhance ligand affinity.

      In the revised version, we restate the equations in terms of comparing the on-rates. This provides a clearer view of the effect of the additional stage, which cannot increase the overall on-rate since the two stages are sequential. If the forward rate of the second stage is comparable to or slower than the off-rate of the first stage, the overall on-rate decreases. Conversely, if the forward rate is much faster, the overall on-rate remains unchanged. For the wild-type 11MACF peptide, we observed that the presence of the EBH C-terminus does not affect the on-rate of binding, which is in perfect agreement with the two-step model and indicates that the C-terminus folds very quickly.
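
      The argument above can be illustrated with a minimal sketch. It assumes a steady-state treatment of the docked intermediate, in which the apparent on-rate of a sequential dock-then-lock scheme is k1·k2/(k-1 + k2). The function name and all rate constants below are hypothetical and chosen only to show the two regimes; they are not the equations or values from the manuscript.

```python
def overall_on_rate(k1: float, k_minus1: float, k2: float) -> float:
    """Apparent on-rate for docking (k1, k_minus1) followed by locking (k2).

    Uses the steady-state approximation for the docked intermediate.
    This is an illustrative assumption, not the manuscript's derivation.
    """
    return k1 * k2 / (k_minus1 + k2)


# Hypothetical rate constants, chosen only to show the limiting cases.
K1 = 1.0e7        # docking on-rate, M^-1 s^-1
K_MINUS1 = 100.0  # undocking off-rate, s^-1

fast_lock = overall_on_rate(K1, K_MINUS1, k2=1.0e5)   # k2 much faster than k_minus1
equal_lock = overall_on_rate(K1, K_MINUS1, k2=100.0)  # k2 equal to k_minus1
slow_lock = overall_on_rate(K1, K_MINUS1, k2=1.0)     # k2 much slower than k_minus1

for label, value in [("fast", fast_lock), ("equal", equal_lock), ("slow", slow_lock)]:
    print(f"{label:>5} lock: apparent on-rate = {value:.3g} M^-1 s^-1")
```

      In this sketch the apparent on-rate can never exceed k1, approaches k1 when the second step is fast, and drops once the second step becomes comparable to or slower than undocking.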

      Additionally, we evaluated the binding of the 11MACF-VLL mutant to EBH-ΔC and observed a twofold decrease in Kd compared to WT 11MACF, primarily due to an increase in the on-rate. Interestingly, this rate is approximately half the overall on-rate for EBH/11MACF-VLL binding, which contradicts the sequential two-step model. This could suggest a more complex binding process in which binding is accelerated by additional hydrophobic interactions with the unfolded C-terminus. However, given the difficulty of quantifying very slow exchange rates, it is more likely that the discrepancy reflects the limited accuracy of the rate measurements. Therefore, the model still allows the rational analysis of changes in binding parameters due to mutations.

      There is little discussion of the fact that binding occurs to EBH dimers - either in terms of the functional significance of this or in the acquisition and analysis of their data. There is no discussion of cooperation in binding (or its absence), either in the analysis of NMR titrations or in ITC measurements. Complete ITC fit results have not been reported so it is not possible to evaluate this for oneself.

      We added information about the dimer to the introduction, emphasising its role in enhancing interaction with microtubules (MTs) and its structural role in SxIP binding. The ITC data do not exhibit any biphasic behaviour and can be fitted to a single-site model with 1:1 stoichiometry relative to the EB1c monomer. This corresponds to two independent binding sites in the dimer. We have added the stoichiometry to Table 1 and the description. The NMR titration data for the 11MACF and 11MACF-VLL interactions were fitted to the TITAN dimer model, which includes cooperativity parameters. For WT 11MACF, both cooperativity parameters were zero, corresponding to independent binding sites in the ITC model. For 11MACF-VLL, the fitting suggests weak negative cooperativity, with a ~3-fold increase in Kd for binding to the second site and no change in the off-rate. This difference in Kd is likely to be too small to induce a biphasic shape to the ITC curve. As the cooperativity effect on the NMR spectra is small and absent in the ITC, we used the independent sites model for data analysis, as there is insufficient justification for introducing extra parameters into the model. Crucially, fitting to this model did not alter the off-rate value obtained by NMR or affect the conclusions. We added a description of cooperativity to the results and discussion.
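
      To make the cooperativity statement concrete, the following is a generic two-site binding sketch, not the TITAN dimer model or the authors' ITC fitting routine. It assumes two identical sites with microscopic dissociation constant Kd, where binding to the second site has Kd multiplied by a hypothetical factor alpha (alpha = 1 means independent sites in this parameterisation; alpha = 3 mimics the ~3-fold weakening mentioned above). The function and parameter names are illustrative.

```python
def site_occupancy(ligand_free: float, kd: float, alpha: float = 1.0) -> float:
    """Mean fractional occupancy per site of a two-site dimer.

    ligand_free and kd must share the same concentration units.
    alpha scales the Kd of the second binding event (assumed parameterisation).
    Binding polynomial: Z = 1 + 2x + x^2/alpha, with x = [L]/Kd.
    """
    x = ligand_free / kd
    partition = 1.0 + 2.0 * x + x * x / alpha
    return (x + x * x / alpha) / partition


# Compare independent sites with weak negative cooperativity (hypothetical values).
for x in (0.1, 1.0, 10.0):
    independent = site_occupancy(x, kd=1.0, alpha=1.0)
    weakened = site_occupancy(x, kd=1.0, alpha=3.0)
    print(f"[L]/Kd = {x:5.1f}: independent = {independent:.3f}, alpha=3 = {weakened:.3f}")
```

      With alpha = 1 the expression reduces to the familiar single-site isotherm x/(1 + x), which is why independent sites in a dimer can be fitted with a single-site model at 1:1 stoichiometry per monomer.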

      Three peptides are used to examine the role of C-terminal residues in SxIP motifs: 4-MACF (SKIP), 6-MACF (SKIPTP), and 11-MACF (KPSKIPTPQRK). The 11-mer demonstrates the strongest binding, but this has added residues to the N-terminal as well. It has also introduced charges at both termini, further complicating the interpretation of changes in binding affinities. Given this, I do not believe the authors can reasonably attribute increased affinities solely to post-SxIP residues.

      We tested the 9MACF peptide SKIPTPQRK, which has the same N-terminus as the 4- and 6-MACF peptides, and found that its binding affinity is ~10-fold weaker than that of 11MACF. This demonstrates the contribution of both the pre- and post-SxIP residues. This is likely due to electrostatic interactions between the positively charged N-terminus and the negatively charged EBH surface, similar to those involving the positive charges at the peptide C-terminus. Although significant, the contribution of the N-terminal peptide region is approximately one order of magnitude lower than that of the post-SxIP residues, meaning the post-SxIP region is the main affinity modulator. We have added the binding data on 9MACF and a discussion of the contributions to the manuscript.
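
      As a rough guide to what a ~10-fold change in affinity means energetically, the sketch below converts a Kd ratio into a free-energy difference using ΔΔG = RT ln(Kd,weak / Kd,strong). The temperature of 298.15 K is an assumption, since the measurement temperature is not stated here, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_kd_ratio(kd_ratio: float, temperature_k: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) for a given ratio of dissociation constants.

    kd_ratio = Kd(weaker binder) / Kd(stronger binder).
    The default temperature is an assumption, not a value from the manuscript.
    """
    return R_KCAL * temperature_k * math.log(kd_ratio)


# A 10-fold weaker Kd, as reported for 9MACF relative to 11MACF.
print(f"ddG for a 10-fold Kd change: {ddg_from_kd_ratio(10.0):.2f} kcal/mol")
```

      Under these assumptions, each order of magnitude in Kd corresponds to roughly 1.4 kcal/mol of binding free energy.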

      Experimental uncertainties are, with exceptions, not reported.

      Uncertainties have been added to the numbers in Table 1 and in the text. Information on how the uncertainties were calculated has been added to Table 1.

      Reviewer #1 (Recommendations For The Authors):

      (1) Have you tested the binding of the WT dimer in your cell model?

      We haven’t tested the WT dimer because it has already been reported in the 2009 Cell paper by Honnappa et al. In the cell experiments, our main focus was on recruiting the high-affinity mutant to MTs. The low level of recruitment, despite the mutant's high affinity, highlights the importance of dimerisation or additional contributions to binding.

      (2) Please deposit all NMR dynamics measurements (relaxation rates and derived model-free parameters) alongside structural data in the BMRB.

      The relaxation data have been submitted to the BMRB (IDs 53187 and 53188).

      (3) Please report complete fitting results, e.g. for ITC, including stoichiometries. Clarify what this means for binding to a dimer, and if there is any evidence of cooperativity. Figure 3C, right hand panel, shows an unusual stoichiometry, can the authors comment on this?

      We have added more information on stoichiometry and cooperativity; please refer to our response to the above comment for details. We repeated the titration for the VLLRK mutant using fresh peptide stock. As expected, the stoichiometry was close to 1:1 relative to the EB1c monomer. The new data are now included in the table and figure.

      (4) Please report uncertainties for all measurements of Kd, koff, kon, ∆G, ∆H, ∆S, and explain whether these are determined from statistical analysis, technical or biological repeats (and where reported, clarify between standard deviation/standard error). Please also be aware of standard guidelines for reporting significant figures for data with uncertainties, as these have not been followed in Table 1.

      Uncertainties have been added to the numbers in Table 1 and in the text. Information on how the uncertainties were calculated has been added to Table 1.

      (5) The construct design for the cell model is unclear - given the importance of flanking residues, please report and discuss how the sequences are attached to venus: which termini is attached, and what is the linker composition?

      We cloned the peptides at the C-terminus of mTFP, after the GS linker of the vector. The peptide itself contains a GS sequence at the N-terminus, creating a highly flexible GSGS linker that separates the SxIP region from mTFP and minimises the potential effect of mTFP on binding. We followed the design of Honnappa et al. to enable direct comparison with the published results. We have added this information to the 'Methods' section.

      (6) Which HSQC pulse sequence was used for 2D lineshape analysis? The authors mention non-linear chemical shift changes, presumably associated with the dimer interface - this would be useful to expand upon and clarify.

      For the lineshape analysis, we used the standard Bruker sequence hsqcfpf3gpphwg with soft-pulse watergate water suppression and flip-back. This sequence is included in the TITAN model. We added the description of the non-linear chemical shift changes and connection of these changes to the allosteric effect of the binding to the supplementary information describing details of the lineshape analysis.

      (7) Figure 1A could usefully highlight the dimer interface in the surface representation also.

      We believe that including the interface would make the figure too complicated. The dimer configuration is shown in different colours for the two subunits, clearly demonstrating their involvement in forming the binding site.

      (8) Figures 1C and 1D could usefully show a secondary structure schematic to assist the reader. The x-axis in these figures is not linear and this should be corrected. The calculation of combined chemical shift perturbations should be described.

      Thank you for the helpful suggestion. We changed the scale of the figures and added the diagram of the secondary structure.

      (9) Units are missing from many figure axes.

      We added missing units to the axes. Thank you for highlighting this.

      (10) What peptide concentrations are used in Figure 1C? Presumably, these should be reported at saturation for this to be a fair comparison, this should be clarified.

      The protein concentration was 50 µM. Peptides 4MACF and 6MACF were added at a 100-fold molar excess and peptide 11MACF was added at a 4-fold excess. Saturation was achieved for 11MACF. This was impossible for the short peptides due to their mM affinity. This information has been added to the figure legend. The figure's main aim is to illustrate the differences in the chemical shift perturbation profiles, which can be achieved even if full saturation is not attained. Although the absolute value of the chemical shifts is proportional to the degree of saturation, the distribution of the largest chemical shift changes is independent of this degree. Therefore, we can draw conclusions about the distribution of changes by comparing under non-saturation conditions.
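
      The degree of saturation under these conditions can be estimated with the exact 1:1 binding equation. The sketch below uses the stated concentrations (50 µM protein, 100-fold and 4-fold peptide excess), but the Kd values are hypothetical placeholders: 1 mM stands in for the "mM affinity" of the short peptides and 1 µM is an arbitrary tight-binding value for 11MACF, since neither number is given in this response.

```python
import math


def fraction_bound(protein_total: float, ligand_total: float, kd: float) -> float:
    """Fraction of protein in complex for 1:1 binding (all values in the same units).

    Solves the quadratic for [PL] from mass balance without assuming
    that the free ligand concentration equals the total.
    """
    total = protein_total + ligand_total + kd
    complex_conc = (total - math.sqrt(total * total - 4.0 * protein_total * ligand_total)) / 2.0
    return complex_conc / protein_total


PROTEIN_UM = 50.0  # protein concentration stated in the response, in µM

# Kd values below are hypothetical placeholders, not measured values.
short_peptide = fraction_bound(PROTEIN_UM, 100 * PROTEIN_UM, kd=1000.0)  # 100-fold excess
long_peptide = fraction_bound(PROTEIN_UM, 4 * PROTEIN_UM, kd=1.0)        # 4-fold excess

print(f"short peptide (assumed Kd 1 mM): {short_peptide:.2f} bound")
print(f"11MACF (assumed Kd 1 µM):        {long_peptide:.2f} bound")
```

      With these placeholder values the short peptide stays below full saturation even at 100-fold excess, whereas a tight binder is essentially saturated at 4-fold excess, which is the qualitative situation described above.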

      (11) The presentation of raw peak intensities in Figure 1D shows primarily the flexibility of the C-terminal region associated with high intensities. Beyond this, when comparing the binding of peptides it would be much more informative to show relative peak intensities. Residues around 210-225 appear to show strong broadening in the presence of peptide, but this is masked by the low initial intensity. Can the authors clarify and discuss this? Also, what peptide concentrations were used for this comparison? For a fair comparison, it should be close to saturation - particularly to exclude exchange broadening contributions.

      The protein concentration was 50 µM. The 4MACF and 6MACF peptides were added at a 100-fold excess and 11MACF at a 4-fold excess. Saturation was achieved for 11MACF. This was impossible to achieve for the short peptides due to their mM affinity. This information has been added to the figure legend. Upon checking the data, we found a small systematic offset in the coiled-coil region of some of the complexes, as the integral intensity had been used in the initial plot. While this does not change the conclusion regarding the high dynamics of the C-terminus, it does create an inaccurate perception of the relative intensities of the folded regions in the different complexes, as noted by the reviewer. We have now plotted the amplitudes at the maximum of the peaks, which do not exhibit any systematic offset as they are much less susceptible to baseline distortions. We are grateful to the reviewer for highlighting this apparent discrepancy.

      (12) Figure 2 - the scale for S2 order parameters appears to be backwards, given the caption, but its range should be indicated. Similarly, the range of values for Rex should also be indicated. These data should also be tabulated/plotted in supporting information.

      We have corrected the figure legend and added S2 and Rex plots to the supplementary material. The figure aims to highlight regions of increased mobility, while the plots provide full quantitative information on the values. We thank the reviewer for pointing out the error in the figure legend and for the suggestions regarding the plots.

      (13) The scale in Figure 3B is illegible. Indeed, the whole structure is quite small and could usefully be expanded.

      We increased the size of the structure panels and added a scale.

      (14) Figure 4 does not show a decrease in exchange rates, as per the caption - no comparison of exchange rates is shown, only thermodynamic information in panel E. Panel C shows CEST measurements, but it is not clear what system this is for - please clarify, and consider showing the comparable data for the ∆C construct for comparison.

      We have amended the figure legend to clarify that the figure shows binding parameters. We added information about the CEST profiles for the EBH/11MACF interaction to the figure legend (Figure 4C). Exchange with the ∆C construct is too fast for CEST measurements. We used lineshape analysis to evaluate the exchange rates for this construct.

      (15) The schematics shown in Figure 4D, and elsewhere, are really quite difficult to understand. They may pose additional challenges to colourblind readers. Please consider ways that this could be clarified.

      We simplified the colour scheme in the model to make the colours easier to see and to highlight SxIP and non-SxIP regions. We believe that this improved the clarity of the figure.

      (16) Figures S1D/E - the x-axes are unclear and units are missing from the y-axes.

      We re-labelled the axes to clarify the scale and units. Thank you for pointing this out.

      Reviewer #2 (Public Review):

      The C-terminal tail of EB1, which is adjacent to EBH and is not analyzed in this study, is highly acidic and plays an important role in protein interactions. If the authors discuss the C-terminus of EB1, they should analyze the whole C-terminus of EB1, which would strengthen the conclusion they have made.

      Honnappa et al., Cell, 2009, reported chemical shift perturbations (CSPs) on peptide binding for the full EB1c fragment, which includes the negatively charged C-terminus. Similar to our study, they observed significant CSPs in the FVIP region but negligible CSPs at the negatively charged EEY end. They concluded that the final eight EB1c residues did not contribute to binding and used a truncated EB1c construct for their structural analysis. Building on that study, we used the same EEY-truncated construct to analyse the contribution of the C-terminus in more detail. We believe that conducting additional experiments on SxIP binding with the full C-terminus would be superfluous, as it would merely replicate the findings of Honnappa et al. We have added the rationale for selecting the truncated EB1c construct to the text, referencing Honnappa et al.

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 2C: The authors can analyze the 11MACF peptide as well, to provide more assurance to their argument. It would be easier to distinguish the sequences of "SKIP" and "FVIP" by changing their colors.

      Our relaxation analysis (Fig. 2C) focuses on the dynamics of the unstructured C-terminal region in both the free and complex forms. Further relaxation analysis of the peptide would not provide additional information on this, and would be complicated by the presence of free peptide in solution.

      (2) Figure 3B: Acidic residues in EBH should be labeled.
      Page 6, line 11: If the authors insist that the acidic patch will influence the interactions between EB1 and the peptide, the data of the analysis using the entire EB1 C-terminus should be included, given that the C-terminal tail of EB1 is highly acidic.

      To test the contribution of charge to binding, we conducted an ITC experiment at increasing salt concentrations. We observed a significant increase in Kd values when the concentration of NaCl increased from 50 to 150 mM, which supports our conclusion regarding the significant electrostatic contribution. This conclusion is independent of the presence or absence of the C-terminus.

      As we explained earlier, Honnappa et al., Cell 2009, conducted an NMR experiment on the full EB1c and observed no CSPs in the EEY region, indicating a negligible contribution from the EEY region to SxIP binding. Therefore, we think that additional experiments involving the entire C-terminus are unnecessary, as they would simply replicate the results of Honnappa et al. We have added the rationale for selecting the truncated EB1c to the text, referencing Honnappa et al.

      It would be very difficult to label the acidic residues without enlarging 3B considerably. However, we do not think this is necessary as we are not discussing any specific residues. The current figure shows the distribution of the surface charge, which is sufficient for our purposes.

      (3) Figure 2B (Page 4, line 27): The side chain of S5477 should be drawn. The authors should include a figure of the crystal structure of EBH and SxIP as a comparison (Honnappa et al., Cell, 2009). In their paper, Honnappa et al. performed chemical shift perturbation titrations by NMR. From their analysis, I imagine that the EB1 tail may not be critical for the EB1 C-terminus:SxIP interactions, since the signals in the tail are not significantly perturbed. The authors should cite this paper.

      We are grateful to the reviewer for highlighting this. The CSP analysis of Honnappa et al. revealed significant changes in the FVIP region, which we also observed. They also reported negligible CSPs at the EEY end, demonstrating that this part of the tail is non-critical and can be removed. We have added text to the manuscript to highlight the similarity between our CSPs and those observed by Honnappa et al. Figure 2B shows the side chains for the residues with the strongest detected contacts. These do not include S5477.

      (4) Figure 3C (ITC data): The stoichiometric ratios in the ITC data look strange. EBH vs KPSKIPVLLRKRK, is it 1:1?

      We repeated the ITC experiments using a new stock of the peptide and a new batch of the protein, checking the concentrations using UV spectroscopy. The new experiments produced a stoichiometry close to 1, as shown in the table.

      (5) Page 10, line 27: "The TPQ sequence of 11MACF is not optimal...": What is the meaning of "optimal"? The transient interaction between EB1 and its binding partner is responsible for the dynamics of the microtubule cytoskeleton. In a sense, the relatively weak interaction is "optimal" for the system. The authors should rephrase the word.

      We agree that weak interactions are optimal from a functional perspective, as they have been selected through evolution. In our case, 'optimal' refers to the hydrophobic interaction with the C-terminus. We replaced 'optimal' with 'ideal' to draw more attention to the second part of the sentence, which clarifies the context.

      (6) Page 11, line 2: "small number of comets enriched in the peptide that were too faint for the quantitative analysis, comparable to the reported previously (Honnappa, Gouveia et al. 2009)." Honnappa et al. used EGFP-fusion constructs in their study: EGFP forms a weak dimer, which presumably gave different results from the authors' mTFP-constructs. The authors can note this point in the text.

      We are grateful to the reviewer for highlighting this. This aligns well with our conclusion that dimerisation is important for localisation to comets. We have added this point to the text.

      (7) Page 10, line 21: The authors calculate the free energy of complex formation between EBH and MACF peptide and explain in the text, but it is hard to follow.

      We simplified and clarified the description of the energy contributions by focusing on the SxIP and non-SxIP regions of the peptide, as well as the EBH C-terminus.

      Minor points:

      Page 2, line 9: IP motifs are not usually located in the C-terminus. For example, SxIP in Tastin is located in the N-terminal region, and SxIPs in CLASP are in the middle.

      We corrected this statement, removing C-terminal.

      Page 3, line 4: The authors should note the residue numbers of SKIP.

      We think that in this context the residue numbers of the SxIP region are not important and would be distracting.

      Figure 3D and Figure S3F: Make the colors and the order the same between the two figures.

      We changed the colour scheme and the order of ITC parameters in S3F to match the main figure.

      Figure 1A, 2B, Figure S5: Change the color of SKIP from other residues in the same chain, otherwise the readers cannot distinguish. Likewise, change the color of FVIP in Figure 2B.

      We think that changing the colours would complicate the figures unnecessarily. The corresponding residues are clearly labelled in the figures.

      Figure 3, Figure S5, S6, S7: Box the letters of SKIP for clarity.

      We boxed the SxIP region in S5 (new S6) and underlined it in S6 (new S7). In S7 (new S8) the location of SxIP is very clear from the homology.

      Figure 3B; Figure S2: Hard to recognize the peptide (MACF in green).

      We increased the size of 3D and S2, making it easier to see the peptide.

      Figure 1C and D: Make the residual numbers of the x-axes the same between the two graphs.

      We made new plots with a linear scale for the residue numbers.

      Figure 2A: The structures shown are not EB1. It should be described as EBH or EB1(191-260 a.a.).

      Corrected.

      Page 5, line 17: "the S2 values of the C-terminus" should be "the S2 values of the C-terminal loop in EBH", otherwise it is confusing.

      Corrected.

      Page 6, line 27; Figure S3C and S6: Please indicate the assignments of the resonances from "253FVI255" in the Figures.

      We labelled the peaks corresponding to the 253FVI255 region in figure S6 (new S7). Figure S3 shows EBH-ΔC that does not include this region.

      Page 7, line 25: Figure S7 should be S8.

      Corrected

      Page 12, line 6: "sulfatrahsferases" must be a typo.

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Jiang et al. present a measure of phenological lag by quantifying the effects of abiotic constraints on the differences between observed and expected phenological changes, using a combination of previously published phenology change data for 980 species, and associated climate data for study sites. They found that, across all samples, observed phenological responses to climate warming were smaller than expected responses for both leafing and flowering spring events. They also show that data from experimental studies included in their analysis exhibited increased phenological lag compared to observational studies, possibly as a result of reduced sensitivity to climatic changes. Furthermore, the authors present compelling evidence that spatial trends in phenological responses to warming may differ from what would be expected from phenological sensitivity, due to the seasonal timing of when warming occurs. Thus, climate change may not result in geographic convergences of phenological responses. This study presents an interesting way to separate the individual effects of climate change and other abiotic changes on the phenological responses across sites and species.

      Greater phenological lag in experimental studies results in reduced sensitivity to climatic changes, not the other way around.

      Strengths:

      A clearly defined and straightforward mathematical definition of phenological lag allows for this method to be applied in different scientific contexts. Where data exists, other researchers can partition the effects of various abiotic forcings on phenological responses that differ from those expected from warming sensitivity alone.

      Sensitivity does not tell the magnitude of phenological changes, nor does it provide indications of mechanisms responsible for changes in spring phenology. Because of uneven warming, the same average temperature change (annual or spring temperatures) can have greater (greater warming prior to budburst) or smaller (smaller warming prior to budburst) phenological change than that with even warming. When average temperature change is close to zero, uneven warming can lead to infinite sensitivity values, either advanced (warmer temperatures prior to budburst) or delayed (cooler temperatures prior to budburst) spring phenology.
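
      To make the point about uneven warming and near-zero temperature change concrete, the following minimal Python sketch computes sensitivity as the observed response divided by the average temperature change. The function name and all numbers are hypothetical illustrations, not values from the synthesis.

```python
import math


def temperature_sensitivity(response_days, mean_temp_change_c):
    """Phenological change in days per degree C of average temperature change."""
    if mean_temp_change_c == 0:
        # Division by zero: sensitivity is unbounded (sign follows the response).
        return math.copysign(math.inf, response_days) if response_days else math.nan
    return response_days / mean_temp_change_c


# Hypothetical cases with the same 1.0 C average warming but different timing.
even_warming = temperature_sensitivity(4.0, 1.0)   # warming spread evenly
early_warming = temperature_sensitivity(7.0, 1.0)  # more warming before budburst

# Hypothetical case: average change close to zero, but warmer before budburst.
near_zero = temperature_sensitivity(3.0, 0.01)
zero_change = temperature_sensitivity(3.0, 0.0)

print(even_warming, early_warming, near_zero, zero_change)
```

      In this sketch the same average warming yields different sensitivities, and the value grows without bound as the average temperature change approaches zero, even though the observed response itself stays small.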

      It is not clear why sensitivity is so popularly used in phenological research.

      Identifying phenological lag and associated contributing factors provides a method by which more nuanced predictions of phenological responses to climate change can be made. Thus, this study could improve ecological forecasting models.

      Weaknesses:

      The authors include very few data visualizations, and instead report results and model statistics in tables. This is difficult to interpret and may obscure underlying patterns in the data. Including visual representations of variable distributions and between-variable relationships, in addition to model statistics, provides stronger evidence than model statistics alone.

      The use of stepwise, automated regression may be less suitable than a hypothesis-driven approach to model selection, combined with expanded data visualization. The use of stepwise regression may produce inappropriate models based on factors of the sample data that may preclude or require different variable selection.

      We used two statistical methods: variance analysis to examine differential phenological responses (Figure 2), and regression analysis to determine the relative importance of forcing change, budburst temperature, and physiological lag, the drivers of changes in spring phenology (Table 2). Our objective was to understand why plants show differential responses by research approach, species origin, climatic region, and growth form identified in previous research. Variable selection may affect minor (altitude, latitude, MAT, and average spring temperature change) or insignificant (photoperiod and long-term precipitation) variables, but not those related to drivers of spring phenology. We are not sure how a hypothesis-driven approach can help with our objective.

      Reviewer #2 (Public review):

      Summary:

      This is a meta-analysis of the relative contributions of spring forcing temperature, winter chilling, photoperiod and environmental variables in explaining plant flowering and leafing phenology. The authors develop a new summary variable called phenology lag to describe why species might have different responses than predicted by spring temperature.

      Strengths:

      The summary statistic is used to make a variety of comparisons, such as between observational studies and experimental studies.

      Weaknesses:

      By combining winter chilling effects, photoperiod effects, and environmental stresses that might affect phenology, the authors create a new variable that is hard to interpret. The authors do not provide information in the abstract about new insights that this variable provides.

      Phenological lag contains effects of all constraints that may include chilling effects, photoperiod effects, and environmental stresses and is, indeed, hard to interpret without investigation of individual constraints. In our synthesis, spring phenology (or photoperiod effect) is not significant across all studies compiled. It is also unlikely that lack of winter chilling causes the systemic differences in phenological lag between observational and experimental studies or between native and exotic species (see discussion at lines 335-339). At the individual study level, the contribution of different constraints to the overall lag effect can be specifically determined if moisture stresses, species chilling and photoperiod effects, or cold hardiness are known from on-site monitoring or previous research.

      The meaning of phenological lag is described at lines 34-38 in the abstract.

      Comments:

      It would be useful to have a map showing the sites of the studies.

      A map showing the sites of the studies was added as supplementary Figure S1.

      The authors should provide a section in which the strengths and weaknesses of the approach are discussed. Is it possible that mixing different types of data, studies, sample sizes, number of years, experimental set-ups, and growth habits results in artifacts that influence the results?

      Both strengths and weaknesses are discussed at various places throughout the paper. The weakness of our method, as indicated by the reviewer, is the inclusion of different constraints in the phenological lag and has been described at lines 34-38 in the abstract and lines 80-86 in the introduction of the concept. We have also expanded the Conclusion section to discuss possible caveats at lines 369-393.

      As in all data analyses, the results can change with addition of more/different data, especially when sample size is relatively small. Ideally, comparisons are made among levels of fixed effects while controlling variations of other conditions. In phenological studies, however, climatic, phenological, and biological conditions all vary. For example, observational and experimental studies differ not only in the nature of warming (natural climate change vs artificial warming), but also in levels of warming (greater warming with experimental studies) and climatic, phenological, and biological conditions (Table 1). All phenological syntheses (or meta-analyses) have to make do with this uncontrolled nature of phenological data.

      Now that the authors have created this new variable, phenological lag, which of the components that contribute to it has the most influence on it? Or which components are most influential in which circumstances? For example, what are some examples where photoperiod causes a phenological lag?

      Any of the phenological constraints identified can contribute alone or in combination with others to the overall effect of phenological lag. Across all studies with this synthesis, the lack of significance with spring phenology rules out photoperiod effect, while the association of longer phenological lags with longer accumulation of winter chilling does not suggest general chilling shortage with the current extent of climate change.

      Although spring phenology is not significant across all studies, photoperiod effect can be influential in individual studies where changes in spring phenology are large. However, reported photoperiod effects in the literature are mostly confounding effects with temperatures, i.e., longer photoperiods are associated with longer hours of high daytime temperatures (see Chu et al., 2021). Other than European beech under an unlikely scenario of climate change (growth resumes at beginning of winter), there has been no clear evidence showing the effect of photoperiod in constraining spring phenology.

      Another confounding effect with photoperiod is extra heating effect with artificial light sources in warming experiments. Some early studies have shown that leaf temperature can be several degrees above the ambient air, due to long-wave radiation with artificial light sources. It is hard to believe the constraining effect of photoperiod on spring phenology if phenological changes are within inter-annual variations (can be a few weeks), although photoperiod effect has been increasingly discussed recently.

      Recommendations for the authors:

      Reviewing Editor:

      A key methodological concern is the inconsistent definition of growth temperature across observations. It is calculated over the interval between the baseline phenological date and the expected date under warming - a window that varies by species, site, and treatment. This variability limits comparability across observations and may introduce circularity, as growth temperature is derived from the same modelled expectation (i.e., the expected phenological advance) that it is later used to explain.

      The term “growth temperature” has been replaced with “budburst temperature” to indicate temperatures at species events. Budburst temperature is the average temperature within the window of expected response with the warmer climate and, as indicated by the editor, varies by species, sites, and treatments. This species-specific temperature provides an opportunity to compare among species, sites, and treatments and helps explain differences in observed responses, as demonstrated in the discussion of results in this synthesis.

      Forcing change, budburst temperature, and expected response are related. High budburst temperatures are associated with smaller expected responses, which helps explain smaller observed responses with late season species and areas of warm climates that have been often attributed to chilling or photoperiod effect.
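
      The relationship among forcing change, expected response, budburst temperature, and phenological lag can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes that the expected date is the first day on which degree days above 0 °C in the warmer climate reach the forcing accumulated by the baseline date, that the budburst temperature window includes both end days, and that phenological lag is the expected response minus the observed response. All function names and temperature series are hypothetical.

```python
def cumulative_forcing(daily_temps_c, base_c=0.0):
    """Running sum of degree days above the base temperature."""
    total, running = 0.0, []
    for t in daily_temps_c:
        total += max(t - base_c, 0.0)
        running.append(total)
    return running


def expected_day(baseline_temps, warmed_temps, baseline_day):
    """First day index at which warmed forcing reaches the baseline requirement."""
    requirement = cumulative_forcing(baseline_temps)[baseline_day]
    for day, forcing in enumerate(cumulative_forcing(warmed_temps)):
        if forcing >= requirement:
            return day
    return None  # requirement never reached in the series


def budburst_temperature(warmed_temps, expected, baseline_day):
    """Mean warmed temperature in the window of expected response (inclusive)."""
    window = warmed_temps[expected:baseline_day + 1]
    return sum(window) / len(window)


# Hypothetical 20-day series: baseline climate and a uniform +3 C warming.
baseline = [float(d) for d in range(1, 21)]
warmed = [t + 3.0 for t in baseline]
baseline_day = 19            # budburst on the last day under baseline climate
observed_response = 1        # hypothetical observed advance, in days

exp_day = expected_day(baseline, warmed, baseline_day)
expected_response = baseline_day - exp_day
bb_temp = budburst_temperature(warmed, exp_day, baseline_day)
phenological_lag = expected_response - observed_response

print(exp_day, expected_response, bb_temp, phenological_lag)
```

      Because the window is short when daily temperatures are high, a high budburst temperature goes together with a small expected response in this sketch, which mirrors the association described above.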

      Additionally, the use of degree days above 0 °C as a universal metric for spring forcing oversimplifies species' temperature responses. This approach assumes not only a fixed base temperature but also a linear response to temperature accumulation, which overlooks well-established nonlinear or species-specific thermal response curves. To improve the robustness and interpretability of the phenological lag framework, we encourage the authors to consider these limitations and explore ways to test or justify these modelling assumptions more explicitly.

      The use of a 0 °C base temperature may not be the best choice for some species. Except for some early work, there has been little experimental research on physiological aspects of chilling and forcing processes. A popular alternative is modelling using assumed temperature response models. As variables influencing chilling and forcing processes are not controlled, the determined base temperatures and temperature response models may be acceptable for the species studied under particular conditions but would be inappropriate for applications beyond them. It is hard to believe that species, in a study, all have different base temperatures for accumulation of spring forcing and optimum temperatures for winter chilling. Apparently, this is the result of model fitting, not actual dynamics of chilling and forcing processes.

      Two base temperatures are commonly used, 0 and 5 °C, although the choice is not generally justified. It has long been known that temperatures above 0 °C contribute to spring forcing. My personal experience at a tree nursery suggests that seedlings will flush after winter cold storage, even at forcing temperatures ≤ 5 °C in the dark. The use of 5 °C is a choice of tradition (5 °C is commonly used to define the growing season) rather than of scientific justification. The use of high base temperatures may not make much difference at high temperatures due to short forcing duration but will underestimate forcing at low temperatures due to long forcing duration and large proportions of forcing between 0 °C and the base temperature. We are not aware of any experimental studies that demonstrate non-zero base temperatures.
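
      The effect of the base temperature on accumulated forcing can be illustrated with a short Python sketch. The function name and both temperature series are hypothetical and chosen only to show the arithmetic; they are not data from the synthesis.

```python
def degree_days(daily_mean_temps_c, base_c=0.0):
    """Sum of daily mean temperature above the base temperature."""
    return sum(max(t - base_c, 0.0) for t in daily_mean_temps_c)


# Hypothetical daily mean temperatures (degrees C).
cool_period = [2.0, 4.0, 6.0, 8.0]
warm_period = [15.0, 18.0, 20.0, 22.0]

cool_base0 = degree_days(cool_period, base_c=0.0)
cool_base5 = degree_days(cool_period, base_c=5.0)
warm_base0 = degree_days(warm_period, base_c=0.0)
warm_base5 = degree_days(warm_period, base_c=5.0)

# Fraction of the 0 C forcing that is still counted with a 5 C base.
cool_fraction = cool_base5 / cool_base0
warm_fraction = warm_base5 / warm_base0

print(cool_base0, cool_base5, cool_fraction)
print(warm_base0, warm_base5, warm_fraction)
```

      In this example the 5 °C base retains only a fifth of the forcing counted with a 0 °C base in the cool period, but about three quarters in the warm period, which is the asymmetry described above.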

      Within the dominant range of spring temperatures (e.g., between 5 and 25 °C), the forcing responses to temperatures can be approximated with linear models. Again, we are not aware of any non-linear forcing models that can be safely applied beyond the species studied under particular conditions.

      Regardless, the use of different base temperatures or forcing models would not affect the partitioning of phenological changes, simply because temperature response models reflect physiological aspects of chilling and forcing processes and would not change with climate warming.

      The authors introduce a new metric, phenological lag, to assess how phenological constraints influence spring phenology, offering new insights into phenological research. However, there are several concerns. First, the research question and the study's aim are not clearly presented. The authors primarily analyzed phenological lag and simply compared it across different groups, but additional analyses are needed to adequately address the research question. In addition, the broader importance of this study is not clearly explained - why this research is necessary and what it contributes to the field should be explicitly stated.

      The research question is outlined at lines 92-108. We added “Our objective was to determine how phenological responses differ among different groups and how differential responses are related to drivers of spring phenology, i.e., forcing change, budburst temperature, and phenological lag” at lines 106-108.

      (1) Abstract: The methodological improvements and more key results should be included.

      Growth temperature has been replaced with “budburst temperature” to indicate temperatures at time of budburst. More results are added at lines 40-48.

      (2) Line 32: Terms such as "sensitivity analysis" and "phenological lag" need clearer definitions.

      We added at lines 32-33 to define sensitivity analysis “that is based on rates of phenological changes, not on drivers of spring phenology”. Phenological lag is defined at lines 34-38.

      (3) Lines 38-47: Further results and the urgency or importance of the study should be conveyed.

      More results are added at lines 40-48. The importance of this study is described at lines 48-50.

      (4) Line 57-58: This sentence is unclear - please clarify.

      The sentence is modified to “difficult using sensitivity analysis that is based on rates of phenological changes, not on drivers of spring phenology".

      (5) Line 60: break "endodormancy".

      Breaking dormancy would mean endodormancy.

      (6) Line 67: What does "growth temperature" refer to?

      Growth temperature has been replaced with “budburst temperature” to indicate temperatures at time of budburst. It is calculated as the average temperature within the window of expected response with the warmer climate.

      (7) Lines 87-94: The specific purpose of the study is vague. Why is this method needed, and how will it serve future research?

      We have modified the paragraph at lines 92-108 to provide justification and objective of the study.

      (8) Lines 163-164: The rationale for exploring differences in observed responses and phenological lag needs to be better justified.

      We added explanations at lines 179-182 why observed responses and phenological lag were chosen in the analysis.

      (9) Lines 178-183: Tables and figures should be properly cited within the text.

      Table S3 was added at line 197.

      (10) Lines 195-198: Clarify whether variables were scaled before model analysis.

      We clarified at line 192 “variables were not standardized prior to regression analysis”.

      (11) Line 206-207: The observed response is presented as the number of advanced days, while temperature sensitivity refers to the response of spring phenology to temperature - these are different variables and should not be conflated.

      The two variables are related but show different aspects of phenological changes. Observed response divided by average temperature change gives temperature sensitivity. Observed response is the total change in number of days observed, while temperature sensitivity is the change in number of days per unit change in average temperature (°C). Sensitivity may reflect rates of phenological change with temperature (see responses to reviewer 1).

      (12) In the discussion section, the authors compared phenological responses among different groups separately. This section requires substantial improvement to more clearly answer the research question.

      These discussions are related to our objective “how phenological responses differ among different groups identified in previous research (i.e., research approach, species origin, climatic region, and growth form) and how these differential responses are related to drivers of spring phenology, i.e., forcing change, budburst temperature, and phenological lag”.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study examined the effect of blood pressure variability on brain microvascular function and cognitive performance. By implementing a model of blood pressure variability using an intermittent infusion of AngII for 25 days, the authors examined different cardiovascular variables, cerebral blood flow, and cognitive function during midlife (12-15-month-old mice). Key findings from this study demonstrate that blood pressure variability impairs baroreceptor reflex and impairs myogenic tone in brain arterioles, particularly at higher blood pressure. They also provide evidence that blood pressure variability blunts functional hyperemia and impairs cognitive function and activity. Simultaneous monitoring of cardiovascular parameters, in vivo imaging recordings, and the combination of physiological and behavioral studies reflect rigor in addressing the hypothesis. The experiments are well-designed, and the data generated are clear. I list below a number of suggestions to enhance this important work:

      (1) Figure 1B: It is surprising that the BP circadian rhythm is not distinguishable in either group. Figure 2, however, shows differences in circadian rhythm at different timepoints during infusion. Could the authors explain the lack of circadian effect in the 24-h traces?

      The circadian rhythm pattern is apparent in Figure 2 (Active BP higher than Inactive BP), where BP is presented as 12-hour averages. When the BP data are expressed as one-hour averages (rather than minute-to-minute) over 24 hours, now included in the revised manuscript as Supplemental Figure 3C-D, the circadian rhythm becomes noticeable. In addition, we have included one-hour average BP data for all mice in the control and BPV groups, Supplemental Figure 3A-B.

      Notably, the Ang II-induced pulsatile BP pattern remains evident in the one-hour averages for the BPV group, Supplemental Figure 3B. To minimize bias and validate variability, pump administration start times were randomized for both control and BPV groups, Supplemental Figure 3A-B. Despite these adjustments, the circadian rhythm profile of BP is consistently maintained across individual mice and in the collective dataset, Supplemental Figure 3C-D.
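
      Why the circadian pattern is hidden in minute-to-minute traces but visible in one-hour averages can be shown with a small Python sketch. The synthetic trace below is entirely hypothetical (a 10 mmHg active/inactive difference buried under ±20 mmHg minute-level fluctuations) and is not taken from the telemetry data.

```python
from statistics import mean


def block_averages(samples, block_size):
    """Average consecutive, non-overlapping blocks of samples."""
    return [
        mean(samples[i:i + block_size])
        for i in range(0, len(samples), block_size)
    ]


# Hypothetical 24-hour trace sampled once per minute (1440 samples):
# inactive phase baseline 100 mmHg, active phase baseline 110 mmHg,
# with alternating +/-20 mmHg activity-like fluctuations on every sample.
minute_bp = []
for minute in range(24 * 60):
    baseline = 100.0 if minute < 12 * 60 else 110.0
    fluctuation = 20.0 if minute % 2 else -20.0
    minute_bp.append(baseline + fluctuation)

hourly_bp = block_averages(minute_bp, 60)        # one-hour averages
half_day_bp = block_averages(minute_bp, 12 * 60)  # 12-hour averages

minute_range = max(minute_bp) - min(minute_bp)
hourly_range = max(hourly_bp) - min(hourly_bp)

print(minute_range, hourly_range, half_day_bp)
```

      In the raw synthetic trace the 50 mmHg spread dominates the 10 mmHg phase difference, whereas the hourly and 12-hour averages recover the two phase levels.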

      (2) While saline infusion does not result in elevation of BP when compared to Ang II, there is an evident "and huge" BP variability in the saline group, at least 40mmHg within 1 hour. This is a significant physiological effect to take into consideration, and therefore it warrants discussion.

      Thank you for this comment. The large variations in BP in the raw traces during saline infusion reflect transient BP changes induced by movement/activity, which is now included in Figure 1B (maroon trace). The revised manuscript now includes Line 222 “Note that dynamic activity-driven BP changes were apparent during both saline- and Ang II infusions, Figure 1B”.

      (3) The decrease in DBP in the BPV group is very interesting. It is known that chronic Ang II increases cardiac hypertrophy, are there any changes to heart morphology, mass, and/or function during BPV? Can the decrease in DBP in BPV be attributed to preload dysfunction? This observation should be discussed.

      The lower DBP in the BPV group was already present at baseline, while both groups were still infused with saline, and was a difference beyond our control. However, this is an important and valid consideration, particularly considering the minimal yet significant increase in SBP within the BPV group (Figure 1D). Our goal was to induce significant transient blood pressure responses (BPV) and investigate the impact on cardiovascular and neurovascular outcomes in the absence of hypertension. We did not anticipate any major cardiac remodeling at this early time point (considering the absence of overt hypertension) and thus cardiac remodeling was not assessed and this is now discussed in the revised manuscript (Line 443-453).

      (4) Examining the baroreceptor reflex during the early and late phases of BPV is quite compelling. Figures 3D and 3E clearly delineate the differences between the two phases. For clarity, I would recommend plotting the data as is shown in panels D and E, rather than showing the mathematical ratio. Alternatively, plotting the correlation of ∆HR to ∆SBP and analyzing the slopes might be more digestible to the reader. The impairment in baroreceptor reflex in the BPV during high BP is clear, is there any indication whether this response might be due to loss of sympathetic or gain of parasympathetic response based on the model used?

      We appreciate the reviewer’s suggestion and have accordingly generated new figures displaying scatter plots of SBP vs HR with linear regression analysis (Figure 3D-G). Our goal is to further investigate which branch of the autonomic nervous system is affected in this model. The loss of a bradycardic response suggests either an enhancement of sympathetic activity, a reduction in parasympathetic activity, or a combination of both. This is briefly discussed in the revised manuscript (Line 486-496).
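
      The slope comparison suggested by the reviewer can be sketched in Python with an ordinary least-squares fit of HR against SBP. The function name and the paired values below are hypothetical placeholders, not measurements from this study, and the sketch does not reproduce the statistical analysis behind Figure 3D-G.

```python
def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical paired readings during a pressure rise.
sbp_mmhg = [120.0, 130.0, 140.0, 150.0]
hr_control_bpm = [600.0, 580.0, 560.0, 540.0]  # clear bradycardic response
hr_bpv_bpm = [600.0, 595.0, 590.0, 585.0]      # blunted bradycardic response

control_slope, control_intercept = ols_slope_intercept(sbp_mmhg, hr_control_bpm)
bpv_slope, bpv_intercept = ols_slope_intercept(sbp_mmhg, hr_bpv_bpm)

print(control_slope, bpv_slope)
```

      A slope closer to zero, as in the hypothetical BPV series, would correspond to a weaker HR decrease per mmHg rise in SBP.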

      Heart rate variability (HRV) serves as an index of neurocardiac function and dynamic, non-linear autonomic nervous system processes, as described in Shaffer and Ginsberg[1]. However, given that our data were limited to BP and HR readings collected at one-minute intervals, our primary assessment of autonomic function is limited to the bradycardic response. Further studies will be necessary to fully characterize the autonomic parameters influenced by chronic BPV.

      (5) Figure 3B shows a drop in HR when the pump is ON irrespective of treatment (i.e., independent of BP changes). What is the underlying mechanism?

      We apologize for any lack of clarity. These observed heart rate (HR) changes occurred during Ang II infusion, when blood pressure (BP) was actively increasing. In the control group, the pump solution was switched to Ang II during specific periods (days 3-5 and 21-25 of the treatment protocol) to induce BP elevations and a baroreceptor response, allowing direct comparisons between the control and BPV group.

      To clarify this point, we have revised Line 260-263 of the manuscript: “To compare pressure-induced bradycardic responses between BPV and control mice at both early and later treatment stages, a cohort of control mice received Ang II infusion on days 3-5 (early phase) (Supplemental Figure 4) and days 21-25 (late phase) thereby transiently increasing BP”.

      Additionally, a detailed description has been added to the Methods section (Line 96-101): “Controls receiving Ang II: To facilitate between-group comparisons (control vs BPV), a separate cohort of control mice were subjected to the same pump infusion parameters as BPV mice but for a brief period receiving Ang II infusions on days 3-5 and 21-25 for experiments assessing pressure-evoked responses, including bradycardic reflex, myogenic response, and functional hyperemia at high BP.”

      (6) The correlation of ∆diameter vs MAP during low and high BP is compelling, and the shift in the cerebral autoregulation curve is also a good observation. I would strongly recommend that the authors include a schematic showing the working hypothesis that depicts the shift of the curve during BPV.

      Thank you for this insightful comment. The increase in vessel reactivity to BP elevations in parenchymal arterioles of BPV mice suggests that chronic BPV induces a leftward shift and a potential narrowing of the cerebral autoregulation range (lower BP thresholds for both the upper and lower limits of autoregulation). This has been incorporated (and discussed) into the revised manuscript (see Figure 5N).

      One potential explanation for these changes is that the absence of sustained hypertension, a prominent feature in most rodent models of hypertension, limits adaptive processes that protect the cerebral microcirculation from large BP fluctuations (e.g., vascular remodeling). While this study does not specifically address arteriole remodeling, the lack of such adaptation may reduce pressure buffering by upstream arterioles, thereby rendering the microcirculation more vulnerable to significant BP fluctuations.

      The unique model allows for measurements of parenchymal arteriole reactivity to acute dynamic changes in BP (both an increase and decrease in MAP). Our findings indicate that chronic BPV enhances the reactivity of parenchymal arterioles to BP changes—both during an increase in BP and upon its return to baseline, Supplemental Figure 5C, F. The data suggest an increased myogenic response to pressure elevation, indicative of heightened contractility, a common adaptive process observed in rodent models of hypertension[2-4]. However, our model also reveals a notable tendency for greater dilation when the BP drops, Supplemental Figure 5F. This intriguing observation may suggest ischemia during the vasoconstriction phase (at higher BP), leading to enhanced release of dilatory signals, which subsequently manifest as a greater dilation upon BP reduction. This phenomenon bears similarities to chronic hypoperfusion models[5,6], where vasodilatory mechanisms become more pronounced in response to sustained ischemic conditions. Future studies investigating the effects of BPV on myogenic responses and brain perfusion will be a priority for our ongoing research.

      (7) Functional hyperemia impairment in the BPV group is clear and well-described. Pairing this response with the kinetics of the recovery phase is an interesting observation. I suggest elaborating on why BPV group exerts lower responses and how this links to the rapid decline during recovery.

      Based on the heightened reactivity of BPV parenchymal arterioles to intravascular pressure (Figure 5), we anticipate that the reduction of sensory-evoked dilations results from an increased vasoconstrictive activity and/or a decreased availability of vasodilatory signaling pathways (NO, EETs, COX-derived prostaglandins)[7,8]. Consequently, the magnitude of the FH response is blunted during periods of elevated BP in BPV mice.

      Additionally, upon termination of the stimulus-induced response−when vasodilatory signals would typically dominate−vasoconstrictive mechanisms are rapidly engaged (or unmasked), leading to quicker return to baseline. This shift in the balance between vasodilatory and vasoconstrictive forces favors vasoconstriction, contributing to the altered recovery kinetics observed in BPV mice. This has been included in the Discussion section of the revised manuscript.

      (8) The experimental design for the cognitive/behavioral assessment is clear and it is a reasonable experiment based on previous results. However, the discussion associated with these results falls short. I recommend that the authors describe the rationale to assess recognition memory, short-term spatial memory, and mice activity, and explain why these outcomes are relevant in the BPV context. Are there other studies that support these findings? The authors discussed that no changes in alternation might be due to the age of the mice, which could already exhibit cognitive deficits. In this line of thought, what is the primary contributor to behavioral impairment? I think that this sentence weakens the conclusion on BPV impairing cognitive function and might even imply that age per se might be the factor that modulates the various physiological outcomes observed here. I recommend clarifying this section in the discussion.

      We thank the reviewer for this comment. Clinical studies have demonstrated that patients with elevated BPV exhibit impairments across multiple cognitive domains, including declines in processing speed[9] and episodic memory[10]. To evaluate memory function, we utilized behavioral tests: the novel object recognition (NOR) task to assess episodic memory[11] and the spontaneous Y-maze to evaluate short-term spatial memory[12].

      Previous research indicates that older C57Bl6 mice (14-month-old) exhibit cognitive deficits compared to younger counterparts (4- and 9-month-old)[13]. To ensure rigorous selection for behavioral testing, we conducted a preliminary NOR assessment, evaluating recognition memory at the one-hour delay but observing failures at the four- and 24-hour delays, indicating age-related deficits. Based on these results, animals failing recognition criteria were excluded from subsequent behavioral assessment. However, because no baseline cognitive testing was conducted for the spontaneous Y-maze, it is possible that some mice with age-related deficits were included in this test, which may have influenced data interpretation.

      Additionally, the absence of differences in the Y-maze performance may suggest that short-term spatial memory remains intact following 25 days of BPV, a point that is now discussed in the revised manuscript.

      (9) Why were only male mice used?

      We appreciate this comment and acknowledge the importance of conducting experiments in both male and female mice. Studies involving female mice are currently ongoing, with telemetry data collection approximately halfway completed and two-photon imaging studies on functional hyperemia also partially completed. However, using middle-aged mice for these experiments has proven challenging due to high mortality rates following telemetry surgeries. As a result, we limited our first cohort to male mice.

      (10) In the results for Figure 3: "Ang II evoked significant increases in SBP in both control and BPV groups;...". Also, in the figure legend: "B. Five-minute average HR when the pump is OFF or ON (infusing Ang II) for control and BPV groups...." The authors should clarify this as the methods do not state a control group that receives Ang II.

      Please refer to response to comment 5.

      Reviewer #2 (Public review):

      Summary:

      Blood pressure variability has been identified as an important risk factor for dementia. However, there are no established animal models to study the molecular mechanisms of increased blood pressure variability. In this manuscript, the authors present a novel mouse model of elevated BPV produced by pulsatile infusions of high-dose angiotensin II (3.1ug/hour) in middle-aged male mice. Using elegant methodology, including direct blood pressure measurement by telemetry, programmable infusion pumps, in vivo two-photon microscopy, and neurobehavioral tests, the authors show that this BPV model resulted in a blunted bradycardic response and cognitive deficits, enhanced myogenic response in parenchymal arterioles, and a loss of the pressure-evoked increase in functional hyperemia to whisker stimulation.

      Strengths:

      As the presentation of the first model of increased blood pressure variability, this manuscript establishes a method for assessing molecular mechanisms. The state-of-the-art methodology and robust data analysis provide convincing evidence that increased blood pressure variability impacts brain health.

      Weaknesses:

      One major drawback is that there is no comparison with another pressor agent (such as phenylephrine); therefore, it is not possible to conclude whether the observed effects are a result of increased blood pressure variability or caused by direct actions of Ang II.

      We acknowledge this limitation and have attempted to address the concern by introducing an alternative vasopressor, norepinephrine (NE), Figure 4. A subcutaneous dose of 45 µg/kg/min was titrated to match the Ang II-induced transient BP pulse (systolic BP ~150-180 mmHg), Figure 4A. Similar to Ang II-treated mice, NE-treated mice exhibited no significant changes in average mean arterial pressure (MAP) throughout the 20-day treatment period (Figure 4B). Although there was a trend (P=0.08) towards increased average real variability (ARV) (Figure 4C left), it did not reach statistical significance. The coefficient of variation (CV) (Figure 4C right) was significantly increased by day 3-4 of treatment (P=0.02).
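
      For readers less familiar with the two variability indices named above, the following is a minimal sketch of their conventional definitions: ARV as the mean absolute difference between successive readings, and CV as the standard deviation divided by the mean, expressed as a percentage. The function names, the use of the sample standard deviation, and the example readings are assumptions for illustration only; the sampling interval and averaging window used in the manuscript are not restated here.

```python
from statistics import mean, stdev


def average_real_variability(values):
    """Mean absolute difference between successive readings (same units as input)."""
    if len(values) < 2:
        raise ValueError("ARV needs at least two readings")
    return mean([abs(b - a) for a, b in zip(values, values[1:])])


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical readings (mmHg): same values, different temporal order.
alternating = [100, 110, 100, 110]
stepwise = [100, 100, 110, 110]

arv_alternating = average_real_variability(alternating)
arv_stepwise = average_real_variability(stepwise)
cv_alternating = coefficient_of_variation(alternating)
cv_stepwise = coefficient_of_variation(stepwise)
```

      Because ARV depends on the order of the readings while CV does not, the two indices need not reach significance together, as in the NE cohort described above.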

      Notably, unlike the bradycardic response observed during Ang II-induced BP elevations, NE infusions elicited a tachycardic response (Figure 4A), likely due to β-1 adrenergic receptor activation. However, significant mortality was observed within the NE cohort: three of six mice died prematurely during the second week of treatment, and two additional mice required euthanasia on days 18 and 20 due to lethargy, impaired mobility, and tachypnea.

      While we recognize the importance of comparing results across vasopressors, further investigation using additional vasopressors would require a dedicated study, as each agent may induce distinct off-target effects, potentially generating unique animal models. Alternatively, a mechanical approach−such as implanting a tethered intra-aortic balloon[14] connected to a syringe pump−could be explored to modulate blood pressure variability without pharmacological intervention. However, such an approach falls beyond the scope of the present study.

      Ang II is known to have direct actions on cerebrovascular reactivity, neuronal function, and learning and memory. Given that Ang II is increased in only 15% of human hypertensive patients (and an even lower percentage of non-hypertensive), the clinical relevance is diminished. Nonetheless, this is an important study establishing the first mouse model of increased BPV.

      We agree that high Ang II levels are not a predominant cause of hypertension in humans, which is why it is critical that our pulsatile Ang II dosing did not cause overt hypertension (no increase in 24-hour MAP). Ang II was solely a tool to produce controlled, transient increases in BP to yield a significant increase in BPV.

      Regarding BPV specifically, prior studies indicate that primary hypertensive patients with elevated urinary angiotensinogen-to-creatinine ratio exhibit significantly higher mean 24-hour systolic ARV compared to those with lower ratios[15]. However, the fundamental mechanisms driving these harmful increases in BPV remain poorly defined. A central theme across clinical BPV studies is impaired arterial stiffness, which has been proposed to contribute to BPV through reduced arterial compliance and diminished baroreflex sensitivity. Moreover, increased BPV can exert mechanical stress on arterial walls, leading to arterial remodeling and stiffness−ultimately perpetuating a detrimental feed-forward cycle[16].

      In our model, male BPV mice exhibited a minimal yet significant elevation in SBP without corresponding increases in DBP, potentially reflecting isolated systolic hypertension, which is strongly associated with arterial stiffness[17,18]. Our initial goal was to establish controlled rapid fluctuations in BP, and Ang II was selected as the pressor due to its potent vasoconstrictive properties and short half-life[19].

      We appreciate the reviewer’s insightful comment and acknowledge the necessity of exploring alternative mechanisms underlying BPV that are independent of Ang II. It is our long-term goal to investigate these factors in further studies.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) How was the dose of Ang II determined? It seems that this dose (3.1ug/hr) is quite high.

      The Ang II dose was titrated in a preliminary study to one that induced a significant and transient BP response without increasing 24-hour blood pressure (i.e. no hypertension).

      Ang II was delivered subcutaneously at 3.1 μg/hr, a concentration comparable to high-dose Ang II administration via mini-osmotic pumps (~1700 ng/kg/min)[20], with one-hour pulses occurring every 3-4 hours. With 6 pulses per day, the total daily dose equates to 18.6 µg/day in a ~30 gram mouse.

      For comparison, if the same 18.6 µg/day dose were administered continuously via a mini-osmotic pump (18.6 µg/0.03kg/1440min), the resulting dosage would be approximately 431 ng/kg/min[21,22], aligning with subpressor dose levels. Thus, while the total dose may appear high, it is not delivered in a constant manner but rather intermittently, allowing for controlled, rapid variations in blood pressure.
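
      The unit conversions in the two paragraphs above can be checked with a short calculation. This is a minimal sketch that uses only the figures quoted in this response (3.1 µg/hr, six one-hour pulses per day, a ~30 g mouse); the function and variable names are hypothetical.

```python
def rate_ng_per_kg_min(dose_ug, body_mass_kg, minutes):
    """Convert a dose in micrograms delivered over `minutes` into ng/kg/min."""
    return dose_ug * 1000.0 / body_mass_kg / minutes


BODY_MASS_KG = 0.03          # ~30 g mouse, as stated in the response
PULSE_RATE_UG_PER_HR = 3.1   # infusion rate while the pump is ON
PULSES_PER_DAY = 6           # one-hour pulses per day

# Total Ang II delivered per day (µg).
daily_dose_ug = PULSE_RATE_UG_PER_HR * PULSES_PER_DAY

# Rate during a one-hour pulse, comparable to high-dose minipump protocols.
rate_during_pulse = rate_ng_per_kg_min(PULSE_RATE_UG_PER_HR, BODY_MASS_KG, 60)

# Equivalent rate if the same daily dose were spread over 24 hours (1440 min).
rate_if_continuous = rate_ng_per_kg_min(daily_dose_ug, BODY_MASS_KG, 1440)

print(f"daily dose: {daily_dose_ug:.1f} ug/day")
print(f"during pulse: {rate_during_pulse:.0f} ng/kg/min")
print(f"24-h equivalent: {rate_if_continuous:.0f} ng/kg/min")
```

      The pulse rate is roughly four times the 24-hour equivalent because the pump is ON for only 6 of every 24 hours.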

      (2) Were behavioral studies performed on the same mice that were individually housed? Individual housing causes significant stress in mice that can affect learning and memory tasks (PMC6709207). It's not a huge issue since the control mice would have been housed the same way, but it is something that could be mentioned in the discussion section.

      Behavioral studies were performed on mice that were individually housed following the telemetry surgery. The study was started once BP levels stabilized, as mice required several days to achieve hemodynamic stability post-surgery. Consequently, all mice were individually housed for several days before undergoing behavioral assessment.

      To account for potential cognitive variability, earlier novel object recognition (NOR) tests were conducted to establish cognitive capacity, and mice that did not meet criteria were excluded from further behavioral testing. However, we acknowledge that individual housing induces stress, which can influence learning and memory, and this is a factor we were unable to fully control. Given that both experimental and control groups experienced the same housing conditions, this stress effect should be comparable across cohorts. A discussion on this limitation is now included in the text.

      (3) It looks like one control mouse that was included in both Figures 1 and 2 (control n=12) but was excluded in Table 1 (control n=11), this isn't mentioned in the text - please include the exclusion criteria in the manuscript.

      We apologize for the typo−12 control animals were consistently utilized across Figures 1-2, Table 1, Supplemental Table 1, Figure 6C, and Supplemental Figure 2B. Since the initial submission, one additional control mouse was completed and included in the telemetry control cohort. Thus, in the updated manuscript, we have corrected the control sample size to 13 mice across these figures, ensuring consistency.

      Additionally, exclusion criteria have now been explicitly included in the manuscript (Line 173-175). Mice were excluded from the study if they died prematurely (prior to treatment onset) or if they exhibited abnormally elevated pressure while receiving saline, likely due to complications from telemetry surgery.

      (4) Please include a statement on why female mice were not included in this study.

      As discussed in our response to Reviewer #1, our initial intention was to include both male and female mice in this study. However, high mortality rates following telemetry surgeries significantly constrained our ability to advance all aspects of the study. As a result, we limited our first cohort to males to establish the basics of the model. A statement is now included in the manuscript, Line 50-53: “Female mice were not included in the present study due to high post-surgery mortality observed in 12-14-month-old mice following complex procedures. To minimized confounding effects of differential survival and to establish foundational data for this model, we restricted the investigation to male mice.”

      Potential sex differences may be complex and warrant separate studies to comprehensively assess sex as a biological variable; these studies are currently ongoing.

      (5) On page 14, "experiments from control vs experimental mice were not equally conducted in the same season raising the possibility for a seasonal effect" - does this mean that control experiments were not conducted at the same time as the Ang II infusions in BPV mice? This has huge implications on whether the effects observed are induced by treatment or just batch seasonal effects.

      We fully acknowledge the reviewer’s concern, and our statement aims to provide transparency regarding the study’s limitations. Several challenges contributed to this outcome, including high mortality rates following surgeries (primarily telemetry implantation) and technical issues related to instrumentation, particularly telemetry functionality.

      Differences between BPV and saline mice emerged primarily due to mortality or telemetry failures−some mice did not survive post-surgery, while others remained healthy but had non-functional telemeters. This issue was particularly pronounced in 14-month-old mice, as their fragile vasculature occasionally prevented proper BP readings.

      Each experiment required a minimum of two and a half months per mouse to complete, with a cost (also per mouse) exceeding $1500 USD ($300 pump, $175 mouse, $900 telemeters, per diem, drugs, reagents etc.). Despite our best effort to ensure comparable seasonal/batch data, these logistical and technical constraints prevented perfect synchronization.

      To evaluate whether seasonal differences influenced our results, we incorporated additional telemetry data into the control cohort. Of the seven included control mice, six underwent the same treatment but were allocated to a separate branch of the study, whose endpoints did not require a chronic cranial window. We found no significant differences in 24-hour average MAP during the baseline period between control mice with or without a cranial window, Supplemental Figure 2A. Additionally, we grouped mice into seasonal categories based on Georgia’s climate, “Spring-Summer” (May-September) and “Fall-Winter” (October-April), but observed no BP differences between these periods, Supplemental Figure 2B.
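
      The seasonal grouping described above amounts to a simple month lookup. A minimal sketch, assuming each mouse is assigned by the calendar month of its recording (this response does not state which date was used for assignment); the function name is hypothetical.

```python
def season_category(month):
    """Map a calendar month (1-12) to the two seasonal groups used in the response."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    # May through September -> "Spring-Summer"; October through April -> "Fall-Winter".
    return "Spring-Summer" if 5 <= month <= 9 else "Fall-Winter"


# Group label for every month of the year.
labels = {m: season_category(m) for m in range(1, 13)}
```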

      Given the absence of seasonal effects on BP and the fact that mice were sourced from two independent suppliers (Jackson Laboratory and NIA), we anticipate that the observed results are driven by treatment rather than seasonal or batch effects.

      (6) Methods, two-photon imaging: did the authors mean "retro-orbital" instead of "intra-orbital" injection of the Texas red dye? Also, is this a Texas red-dextran? If so, what molecular weight?

      Thank you for this comment. The correct terminology is “retro-orbital” rather than “intra-orbital” injection. Additionally, we utilized Texas Red-dextran (70 kDa, 5% [wt/vol] in saline) for the imaging experiments. These details have now been incorporated into the Methods section.

      (1) Shaffer F, Ginsberg JP. An Overview of Heart Rate Variability Metrics and Norms. Front Public Health. 2017;5:258. doi: 10.3389/fpubh.2017.00258

      (2) Pires PW, Jackson WF, Dorrance AM. Regulation of myogenic tone and structure of parenchymal arterioles by hypertension and the mineralocorticoid receptor. Am J Physiol Heart Circ Physiol. 2015;309:H127-136. doi: 10.1152/ajpheart.00168.2015

      (3) Iddings JA, Kim KJ, Zhou Y, Higashimori H, Filosa JA. Enhanced parenchymal arteriole tone and astrocyte signaling protect neurovascular coupling mediated parenchymal arteriole vasodilation in the spontaneously hypertensive rat. J Cereb Blood Flow Metab. 2015;35:1127-1136. doi: 10.1038/jcbfm.2015.31

      (4) Diaz JR, Kim KJ, Brands MW, Filosa JA. Augmented astrocyte microdomain Ca(2+) dynamics and parenchymal arteriole tone in angiotensin II-infused hypertensive mice. Glia. 2019;67:551-565. doi: 10.1002/glia.23564

      (5) Kim KJ, Diaz JR, Presa JL, Muller PR, Brands MW, Khan MB, Hess DC, Althammer F, Stern JE, Filosa JA. Decreased parenchymal arteriolar tone uncouples vessel-to-neuronal communication in a mouse model of vascular cognitive impairment. GeroScience. 2021. doi: 10.1007/s11357-020-00305-x

      (6) Chan SL, Nelson MT, Cipolla MJ. Transient receptor potential vanilloid-4 channels are involved in diminished myogenic tone in brain parenchymal arterioles in response to chronic hypoperfusion in mice. Acta Physiol (Oxf). 2019;225:e13181. doi: 10.1111/apha.13181

      (7) Tarantini S, Hertelendy P, Tucsek Z, Valcarcel-Ares MN, Smith N, Menyhart A, Farkas E, Hodges EL, Towner R, Deak F, et al. Pharmacologically-induced neurovascular uncoupling is associated with cognitive impairment in mice. J Cereb Blood Flow Metab. 2015;35:1871-1881. doi: 10.1038/jcbfm.2015.162

      (8) Ma J, Ayata C, Huang PL, Fishman MC, Moskowitz MA. Regional cerebral blood flow response to vibrissal stimulation in mice lacking type I NOS gene expression. Am J Physiol. 1996;270:H1085-1090. doi: 10.1152/ajpheart.1996.270.3.H1085

      (9) Sible IJ, Nation DA. Blood Pressure Variability and Cognitive Decline: A Post Hoc Analysis of the SPRINT MIND Trial. Am J Hypertens. 2023;36:168-175. doi: 10.1093/ajh/hpac128

      (10) Epstein NU, Lane KA, Farlow MR, Risacher SL, Saykin AJ, Gao S. Cognitive dysfunction and greater visit-to-visit systolic blood pressure variability. Journal of the American Geriatrics Society. 2013;61:2168-2173. doi: 10.1111/jgs.12542

      (11) Antunes M, Biala G. The novel object recognition memory: neurobiology, test procedure, and its modifications. Cognitive processing. 2012;13:93-110. doi: 10.1007/s10339-011-0430-z

      (12) Kraeuter AK, Guest PC, Sarnyai Z. The Y-Maze for Assessment of Spatial Working and Reference Memory in Mice. Methods Mol Biol. 2019;1916:105-111. doi: 10.1007/978-1-4939-8994-2_10

      (13) Singhal G, Morgan J, Jawahar MC, Corrigan F, Jaehne EJ, Toben C, Breen J, Pederson SM, Manavis J, Hannan AJ, et al. Effects of aging on the motor, cognitive and affective behaviors, neuroimmune responses and hippocampal gene expression. Behav Brain Res. 2020;383:112501. doi: 10.1016/j.bbr.2020.112501

      (14) Tediashvili G, Wang D, Reichenspurner H, Deuse T, Schrepfer S. Balloon-based Injury to Induce Myointimal Hyperplasia in the Mouse Abdominal Aorta. J Vis Exp. 2018. doi: 10.3791/56477

      (15) Ozkayar N, Dede F, Akyel F, Yildirim T, Ates I, Turhan T, Altun B. Relationship between blood pressure variability and renal activity of the renin-angiotensin system. J Hum Hypertens. 2016;30:297-302. doi: 10.1038/jhh.2015.71

      (16) Kajikawa M, Higashi Y. Blood pressure variability and arterial stiffness: the chicken or the egg? Hypertens Res. 2024;47:1223-1224. doi: 10.1038/s41440-024-01589-8

      (17) Laurent S, Boutouyrie P. Arterial Stiffness and Hypertension in the Elderly. Front Cardiovasc Med. 2020;7:544302. doi: 10.3389/fcvm.2020.544302

      (18) Wallace SM, Yasmin, McEniery CM, Maki-Petaja KM, Booth AD, Cockcroft JR, Wilkinson IB. Isolated systolic hypertension is characterized by increased aortic stiffness and endothelial dysfunction. Hypertension. 2007;50:228-233. doi: 10.1161/HYPERTENSIONAHA.107.089391

      (19) Al-Merani SA, Brooks DP, Chapman BJ, Munday KA. The half-lives of angiotensin II, angiotensin II-amide, angiotensin III, Sar1-Ala8-angiotensin II and renin in the circulatory system of the rat. J Physiol. 1978;278:471-490. doi: 10.1113/jphysiol.1978.sp012318

      (20) Zimmerman MC, Lazartigues E, Sharma RV, Davisson RL. Hypertension caused by angiotensin II infusion involves increased superoxide production in the central nervous system. Circ Res. 2004;95:210-216. doi: 10.1161/01.RES.0000135483.12297.e4

      (21) Gonzalez-Villalobos RA, Seth DM, Satou R, Horton H, Ohashi N, Miyata K, Katsurada A, Tran DV, Kobori H, Navar LG. Intrarenal angiotensin II and angiotensinogen augmentation in chronic angiotensin II-infused mice. Am J Physiol Renal Physiol. 2008;295:F772-779. doi: 10.1152/ajprenal.00019.2008

      (22) Nakagawa P, Nair AR, Agbor LN, Gomez J, Wu J, Zhang SY, Lu KT, Morgan DA, Rahmouni K, Grobe JL, et al. Increased Susceptibility of Mice Lacking Renin-b to Angiotensin II-Induced Organ Damage. Hypertension. 2020;76:468-477. doi: 10.1161/HYPERTENSIONAHA.120.14972

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) All outcomes are attributed specifically to L6b neurons, but the genetic manipulation is not specific to L6b neurons. The authors acknowledge this as a limitation, but in my view, this global manipulation is more than a limitation - it affects the overall interpretations of the data. The Hoerder-Suabedissen et al., 2018 paper shows sparse, but also dense, expression of Drd1a+ neurons in brain regions outside of the L6b. Given this issue, the results are largely overstated throughout the paper.

      We appreciate the reviewer’s careful reading and concern that some of our statements may have overstated the implications of our data. The Drd1a Cre mouse model used (FK164) has a relatively selective expression of Drd1a Cre in cortex, especially in layer 6b, but indeed some expression is seen in layer 6a and subcortically. We will nuance our claims throughout the paper to ensure that the conclusions are supported by our findings, and further discuss the impact of this limitation on the overall interpretation of our results. Specifically, we will discuss the potential contribution of relevant subcortical areas and layer 6a in the effects we observed.

      (2) It is not clear to me that the "silencing" of Drd1a+ neurons was verified.

      In our previous publications, we showed confirmation of the loss of regulated synaptic vesicle release from the Cre positive neuronal population (Marques-Smith et al., 2016; Hoerder-Suabedissen et al., 2018; Messore et al., 2024), which validates our approach to “silence” cortical neurons. We will discuss this further in the revised manuscript.

      (3) There were various discrepancies (and potentially misattributions) between the stated significant differences in Supplementary Table T1 data and Figure 3a & S2 spectral plots. This issue makes it difficult to effectively evaluate the main text and stated outcomes.

      We thank the reviewer for spotting the inconsistencies in how the statistical comparisons were presented: indeed, in the text we described two-way ANOVAs with posthoc tests but in the figures significance markers were positioned based on multiple t-tests. We have revised Supplementary Table T1, Figure 3a and S2 to ensure that all statistics are presented consistently throughout the manuscript, i.e. with two-way ANOVAs and accompanying posthoc tests.

      Related, the authors stated that post hoc comparisons of EEG spectral frequency bins were not corrected for multiple testing. Instead, significance was only denoted if changes in at least two consecutive frequency bins were significant. However, there are multiple plots in which a single significance marker is placed over an isolated bin (i.e., 4c, 6, S5, S6). Unless each marker is equivalent to 2 consecutive frequency bins, these markers should be removed from the plots. Otherwise, please define the frequency and size of these markers in the main text.

      In line with the previous comment, we have adjusted markers to reflect the results from posthoc tests after two-way ANOVAs in Figures 6 and supplementary figures S5 and S6. 

      We thank the reviewer for pointing out that, in our comparisons of EEG spectra, single isolated frequency bins in which the p-value fell below 0.05 were in some cases shown as significantly different. This could indeed have occurred by chance given that, in line with previous literature, we did not correct for multiple comparisons. In the revised manuscript we will use an unbiased approach by plotting actual p-values for all bins and will moderate our conclusions accordingly, giving readers the opportunity to evaluate the magnitude and extent of the differences directly rather than relying on an arbitrary threshold for significance.
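
      For reference, the consecutive-bin criterion raised in the reviewer's comment can be expressed as a small filter over per-bin p-values. This is a minimal sketch, assuming a strict p < 0.05 threshold and a minimum run of two adjacent bins; the function name, its parameters, and the example p-values are hypothetical and are not taken from the analysis code.

```python
def consecutive_significant(p_values, alpha=0.05, min_run=2):
    """Flag bins that belong to a run of at least `min_run` adjacent bins with p < alpha."""
    below = [p < alpha for p in p_values]
    keep = [False] * len(below)
    run_start = None
    # A trailing False sentinel closes any run that reaches the last bin.
    for i, flag in enumerate(below + [False]):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if i - run_start >= min_run:
                for j in range(run_start, i):
                    keep[j] = True
            run_start = None
    return keep


# Hypothetical per-bin p-values: one isolated bin and one pair of adjacent bins.
example_p = [0.20, 0.03, 0.50, 0.01, 0.04, 0.30]
marked = consecutive_significant(example_p)
```

      Under this rule the isolated bin at index 1 is not marked, whereas the adjacent pair at indices 3 and 4 is.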

      (4) A rainbow color scale, as in Figure 3, we've now learned, can be misleading and difficult to interpret. The viridis color scale or a different diverging color scale are good alternatives.

      Thank you for pointing this out, we have adjusted the colour scale.

      (5) How much time elapsed between vehicle/orexin A & B infusions?

      There were 2-4 non-infusion days between infusions. We will add this information to the Methods when revising the manuscript.

      (6) For Figure 6, there are statistical discrepancies between the main text and the plots (pg. 10):

      a) The text claims post hoc differences for relative ORXA frontal EEG, but there are no significance markers on the plot.

      b) The text states that there were no post hoc differences for the relative ORXA occipital EEG, but significance markers are on the plot.

      c) The main test for the relative ORXB frontal EEG was not significant, but there are post hoc significance markers on the plot.

      d) For relative ORXB occipital EEG, there are significant markers on the plot outside of the stated range in the text.

      Thank you for your careful observations. These issues reflect the same inconsistency as raised above, where the text describes two-way ANOVAs and the figures refer to results obtained with multiple t-tests. We shall adjust the figures so that markers are shown only when the ANOVA is significant and reflect the results of posthoc tests after ANOVAs rather than the results of multiple t-tests.

      (7) Some important details are only available in figure captions, making it difficult to understand the main text. For example, when describing Figure 3c in the main text on page 7, it is not clear what type of transitions are being discussed without reading the figure caption. Likewise, a "decrease," "shift," and "change" are mentioned, but relative to what? Similar comment for the EEG theta activity description on pages 7 - 8. Please add relevant details to the main text.

      We will adjust the wording in the main text to reflect more precisely which comparisons are shown in the figures.

      (8) Statistical comparisons for data in Figure 3e, post hoc analyses for data in Figure S7a-b REM data, and post hoc analyses for Figure S7c (not b) occipital EEG should be included to support differences claims. Please denote these differences on the respective plots.

      We have added the statistical comparisons for Figure 3e to the results section.

      We have added the statistical comparisons for Figure S7a to the results section.

      We have added the statistical comparison for Figure S7b to the results section.

      In Figure S7c, there was an overall genotype difference, but there was not a time x genotype interaction, so we have not performed posthoc tests and did not plot posthoc significance markers for this figure. We have adjusted the wording in the results section to make this clearer.

      We have adjusted the reference to the figure S7c which was incorrect, thank you for your careful attention.

      (9) In the subsection titled "Layer 6b mediates effects of orexin on vigilance states (pg. 8)," there does not seem to be any stated differences between control and L6b silenced mice. A more accurate subtitle is needed.

      We shall change the subtitle to: “The effects of orexin on vigilance states in L6b silenced mice”. The main finding described in this section is that the increase in EEG theta frequency after ORXB infusion is attenuated in L6b silenced mice, so a statement summarizing this finding could be an alternative title. However, then it would not accurately reflect other, less conspicuous, yet potentially important findings described in this section (during NREM sleep, only in L6b silenced animals there is an increase in power in the lower frequency bins in the frontal derivation; in the occipital derivation, levels of relative SWA during NREM sleep after ORXA infusion were lower in L6b silenced than in control animals).

      Reviewer #2 (Public review):

      Weaknesses:

      (1) Although the authors used a highly selective approach to silence layer 6b neurons, the observed changes in EEG oscillations cannot be solely attributed to layer 6b neurons because of the ICV route for orexin administration.

      We completely agree, and did not want to imply that orexin administered through the ICV route reaches cortical Drd1a Cre expressing neurons only. We will re-word the corresponding sentences accordingly throughout the manuscript.

      (2) The rationale for using only male rats is not provided.

      We agree that this is an important limitation and will acknowledge and discuss it further in the revised manuscript. Unfortunately, our experimental protocol precluded accurate monitoring of the oestrous cycle, which, as is well known, influences sleep-wake architecture, brain oscillations, and orexin signalling and receptor abundance. We therefore decided to use male mice only for the current study, but plan to use both sexes in our follow-up work.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Joint Public Review:

      Summary:

      For many years, there has been extensive electrophysiological research investigating the relationship between local field potential patterns and individual cell spike patterns in the hippocampus. In this study, using innovative imaging techniques, they examined spike synchrony of hippocampal cells during locomotion and immobility states. The authors demonstrated that hippocampal place cells exhibit prominent synchronous spikes locked to theta oscillations.

      Strengths:

      The single cell voltage imaging used in this study is a highly novel method that may allow recordings that were not previously possible using existing methods.

      We thank the reviewer for recognizing the strengths of our study.

      Weaknesses:

      The strength of evidence remains incomplete because of the main claim that synchronous events are not associated with ripples. As was mentioned in previous rounds of review, ripples emerge locally and independently in the two hemispheres. Thus, obtaining ripple recordings from the contralateral hemisphere does not provide solid evidence for this claim. The papers the authors are citing to make the claim that "Additionally, we implanted electrodes in the contralateral CA1 region to monitor theta and ripple oscillations, which are known to co-occur across hemispheres (29-31)" do not support this claim. For example, reference 29 contains the following statement: "These findings suggest that ripples emerge locally and independently in the two hemispheres".

      In our previous revisions, we took care to limit our claim to what our data directly supported: that synchronous ensembles of CA1 neurons were not associated with ripple oscillations recorded in the contralateral hippocampus. To address reviewer concerns, we changed the Title, modified the Abstract, adjusted relevant text in the Results, and explicitly acknowledged the methodological limitations in the Discussion. 

      In this round, we further revised the manuscript to directly address the editor’s and reviewer’s remaining concerns: 

      (1) We replaced the word “surprisingly” with a more neutral “Moreover” to avoid implying that the observed dissociation was unexpected given the use of contralateral recordings.

      Introduction (line 67-69):

      “Moreover, these synchronous ensembles occurred outside of contralateral ripples (c-ripples) …”

      (2) We removed the clause stating that ripples “co-occur across hemispheres”, along with the associated citation to Buzsaki et al. (2003), to avoid potential misinterpretation. The sentence now simply states that we recorded ripple and theta oscillations in the contralateral CA1.

      Introduction (line 63-64):

      “Additionally, we implanted electrodes in the contralateral CA1 region to monitor theta and ripple oscillations.” (co-occurrence claim removed)

      (3) We carefully replaced all mentions of “ripples” in the manuscript with “c-ripples” (i.e., contralateral ripples) to ensure that the scope of our findings is clearly defined and cannot be misinterpreted.

      (4) We strengthened the acknowledgment of the methodological limitations in the Discussion. 

      Discussion (line 528-533): 

      “While contralateral LFP recordings can capture large-scale hippocampal theta and ripple oscillations, they do not fully reflect ipsilateral-specific dynamics, such as variation in theta phase alignment or locally generated ripple events (Buzsaki et al., 2003; Szabo et al., 2022; Huang et al., 2024). Given that ripple oscillations can emerge locally and independently in each hemisphere, interpretations based on contralateral recordings must be made with caution. Further studies incorporating simultaneous ipsilateral field potential recordings will be essential to more precisely understand local-global network interactions.”

      These revisions ensure that our manuscript now presents a consistent and appropriately limited interpretation across all sections. We hope these clarifications address all remaining concerns and accurately reflect the scope of our findings.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The work from this paper successfully mapped transcriptional landscape and identified EA-responsive cell types (endothelial, microglia). Data suggest EA modulates BBB via immune pathways and cell communication. However, claims of "BBB opening" are not directly proven (no permeability data).

      (1) No in vivo/in vitro assays confirm BBB permeability changes (e.g., Evans blue leakage, TEER).  

      (2) Only male rats were used, ignoring sex-specific BBB differences.

      (3) Pericytes and neurons, critical for the BBB, were not captured, likely due to dissociation artifacts.

      (4) Protein-level validation (Western blot, IHC) absent for key genes (e.g., LY6E, HSP90).

      (5) Fixed stimulation protocol (2/100 Hz, 40 min); no dose-response or temporal analysis.

      (1) We sincerely apologize for the oversight regarding the description of changes in blood-brain barrier permeability. Our team previously conducted a series of preliminary studies that verified this aspect, but we did not describe them in sufficient detail in the Introduction. We will address this in the revised manuscript.

      (2) We are very grateful to the reviewer for pointing out the important and meaningful issue of "sex-specific BBB differences." We will make this a focal point in our future research.

      (3) We acknowledge the importance of pericytes and neurons in the function of the blood-brain barrier. Neurons are absent from our data because our sample processing method involves dissociation: during this procedure, the relatively long neuronal axons are filtered out in the repeated cell suspension steps and cannot enter the downstream microfluidic system for analysis. Since this experiment is primarily focused on non-neuronal cells, we did not choose to use nucleus extraction for sample processing. As for pericytes, we believe they were not captured because their proportion in our samples is extremely low. Further research may require single-nucleus transcriptomics or the separate isolation of these two cell types for study. In our current mechanistic studies, we are also fully considering the important roles these two cell types play in BBB function.

      (4) For verification at the protein level, we have recently conducted some experiments and will include these results in the revised version.

      (5) Lastly, regarding our electroacupuncture intervention model, we actually conducted a series of parameter optimization experiments during the preliminary exploration phase. This part is indeed lacking in our current introduction, and we will add it to the research background and introduction.

      Reviewer #2 (Public review):

      Summary:

      This study uses single-cell RNA sequencing to explore how electroacupuncture (EA) stimulation alters the brain's cellular and molecular landscape after blood-brain barrier (BBB) opening. The authors aim to identify changes in gene expression and signaling pathways across brain cell types in response to EA stimulation using single-cell RNA sequencing. This direction holds promise for understanding the consequences of noninvasive methods of BBB opening for therapeutic drug delivery across the BBB.

      (1) The work falls short in its current form. The experimental design lacks a clear justification, and readers are not provided with sufficient background information on the extent, timing, or regional specificity of BBB opening in this EA model. These details, established in prior work, are critical to understanding the rationale behind the current transcriptomic analyses.

      (2) Further, the results are often presented with minimal context or interpretation. There is no model of intercellular or molecular coordination to explain the BBB-opening process, despite the stated goal of identifying such mechanisms. The statement that EA induces a "unique frontal cortex-specific transcriptome signature" is not supported, as no data from other brain regions are presented. Biological interpretation is at times unclear or inaccurate - for instance, attributing astrocyte migration effects to endothelial cell clusters or suggesting microglial tight junction changes without connecting them meaningfully to endothelial function.

      (3) The study does include analyses of receptor-ligand signaling and cell-cell communication, which could be among its most biologically rich outputs. However, these are relegated to supplementary material and not shown in the leading figures. This choice limits the utility of the manuscript as a hypothesis-generating resource.

      (4) Overall, while the dataset may be of interest to BBB researchers and those developing technologies for drug delivery across the BBB, the manuscript in its current form does not yet fulfill its interpretive goals. A more integrated and biologically grounded analysis would be beneficial.

      (1) It was indeed our mistake that we did not pay attention to the importance of research background factors such as the degree, timing, or regional specificity of BBB opening for the rationale and purpose of this experimental design. In our revision, we will thoroughly elaborate on the relevant previous studies.

      (2) Our current study is actually based on previous findings that electroacupuncture can open the BBB, with a more pronounced effect observed in the frontal lobe (this aspect should be further described in the research background). Building on this foundation, our aim is to delineate the potential biological mechanisms involved. Therefore, we selected frontal lobe tissue as our primary choice for sequencing and have not yet investigated differences across other brain regions, although this may become a focus of future research. Additionally, we recognize that the mechanism underlying BBB opening is complex, and at present, we cannot determine whether it is driven by a single direct factor or by coordinated actions between cells or molecules. As such, our results are presented only briefly for now, and we will carefully consider whether to supplement our findings by incorporating insights from other studies.

      (3) Thank you very much for bringing this to our attention. We will include the key results of the receptor-ligand signaling and cell-cell communication analysis in the main manuscript.

      (4) Indeed, our current dataset and analysis tend to present objective data results. We are also conducting a series of validations that may be related to the biology of the blood-brain barrier, and we look forward to sharing and discussing any future research findings with you and everyone.

    1. Author response:

      We thank the reviewers for their thoughtful comments, and we plan to implement many of their suggestions to improve the paper. We agree that the paper can benefit from clearer links between the two neural signatures (memory traces and uniform shifts) themselves, and between the neural signatures and behavioral phenomena. We will address these limitations in multiple ways. First, as the reviewers noted, RNN models have the potential to probe these relationships, so we plan to perform further analyses and modeling experiments to uncover any causal relationships. Second, we will establish clearer definitions of the neural signatures and explore how these signatures can be unified using our models. Finally, we will compare the experimental paradigms of Losey et al. and Sun, O’Shea et al., and discuss how differences between the paradigms may have impacted our observations, particularly in the context of other experimental and modeling papers.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Bacterial species that frequently undergo horizontal gene transfer events tend to have genomes that approach linkage equilibrium, making it challenging to analyze population structure and establish the relationships between isolates. To overcome this problem, researchers have established several effective schemes for analyzing N. gonorrhoeae isolates, including MLST and NG-STAR. This report shows that Life Identification Number (LIN) Codes provide for a robust and improved discrimination between different N. gonorrhoeae isolates.

      Strengths:

      The description of the system is clear, the analysis is convincing, and the comparisons to other methods show the improvements offered by LIN Codes.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

      We thank the reviewer for their assessment of our paper.

      Reviewer #2 (Public review):

      Summary:

      This paper describes a new approach for analyzing genome sequences.

      Strengths:

      The work was performed with great rigor and provides much greater insights than earlier classification systems.

      Weaknesses:

      A minor weakness is that the clinical application of LIN coding could be articulated in a more in-depth way. The LIN coding system is very impressive and is certainly superior to other protocols. My recommendation, although not necessary for this paper, is that the authors expand their analysis to noncoding sequences, especially those upstream of open reading frames. In this respect, important cis-acting regulatory mutations that might help to further distinguish strains could be identified.

      We thank the reviewer for their comments. LIN code could be applied clinically, for example in the analysis of antibiotic resistant isolates, or to investigate outbreaks associated with a particular lineage. We will update the text to describe this more thoroughly.

      With regard to non-coding sequences: unfortunately, intergenic regions are generally unsuitable for use in typing systems because (i) they are subject to phase variation, which can occlude relationships based on descent; and (ii) they are inherently difficult to assemble and can therefore introduce variation due to the sequencing procedure rather than biology. For the type of variant typing that LIN code represents, which aims to replicate phylogenetic clustering, protein-encoding sequences are the best choice for convenience, stability, and accuracy. This is not to say that basing a nomenclature on intergenic regions is not a valid objective; such a nomenclature might be especially suitable for predicting some phenotypic characters, but it would still be subject to problem (ii), depending on the sequencing technology used. Such a nomenclature system should stand beside, rather than be combined with or used in place of, phylogenetic typing. However, we could certainly investigate the relationship between an isolate's LIN code and regulatory mutations in the future.

      Reviewer #3 (Public review):

      Summary:

      In this well-written manuscript, Unitt and colleagues propose a new, hierarchical nomenclature system for the pathogen Neisseria gonorrhoeae. The proposed nomenclature addresses a longstanding problem in N. gonorrhoeae genomics, namely that the highly recombinant population complicates typing schemes based on only a few loci and that previous typing systems, even those based on the core genome, group strains at only one level of genomic divergence without a system for clustering sequence types together. In this work, the authors have revised the core genome MLST scheme for N. gonorrhoeae and devised life identification numbers (LIN) codes to describe the N. gonorrhoeae population structure.

      Strengths:

      The LIN codes proposed in this manuscript are congruent with previous typing methods for Neisseria gonorrhoeae, like cgMLST groups, NG-STAR, and NG-MAST. Importantly, they improve upon many of these methods as the LIN codes are also congruent with the phylogeny and represent monophyletic lineages/sublineages.

      The LIN code assignment has been implemented in PubMLST, allowing other researchers to assign LIN codes to new assemblies and put genomes of interest in context with global datasets.

      Weaknesses:

      The authors correctly highlight that cgMLST-based clusters can be fused due to "intermediate isolates" generated through processes like horizontal gene transfer. However, the LIN codes proposed here are also based on single linkage clustering of cgMLST at multiple levels. It is unclear whether future recombination or sequencing of previously unsampled diversity within N. gonorrhoeae will merge higher-level clusters, and if so, how this will impact the stability of the nomenclature.

      The authors have defined higher resolution thresholds for the LIN code scheme. However, they do not investigate how these levels correspond to previously identified transmission clusters from genomic epidemiology studies. It would be useful for future users of the scheme to know the relevant LIN code thresholds for these investigations.

      We thank the reviewer for their insightful comments. LIN codes do use multi-level single linkage clustering to define the cluster number of isolates. However, unlike previous applications of simple single linkage clustering such as N. gonorrhoeae core genome groups (Harrison et al., 2020), once assigned in LIN code, these cluster numbers are fixed within an unchanging barcode assigned to each isolate. Therefore, the nomenclature is stable, as the addition of new isolates cannot change previously established LIN codes.
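      The stability argument above can be illustrated with a simplified sketch. This is not the PubMLST implementation: the toy thresholds, the six-locus profiles, and the names `mismatches` and `add_isolate` are all hypothetical and chosen only for illustration. The sketch assumes that a new isolate inherits the code prefix of its nearest existing neighbour down to the finest threshold it satisfies, then receives the next unused number at the following position. Existing codes are never rewritten.

```python
from typing import Dict, List, Tuple

# Toy allelic-mismatch thresholds, coarse to fine (hypothetical values,
# not the thresholds used in the gonococcal LIN code scheme).
THRESHOLDS: Tuple[int, ...] = (4, 2, 0)


def mismatches(a: List[int], b: List[int]) -> int:
    """Count loci at which two allelic profiles differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def add_isolate(
    name: str,
    profile: List[int],
    profiles: Dict[str, List[int]],
    codes: Dict[str, Tuple[int, ...]],
) -> Tuple[int, ...]:
    """Assign a fixed code to a new isolate without touching existing codes."""
    if not profiles:
        # The first isolate seeds the scheme with an all-zero code.
        code = (0,) * len(THRESHOLDS)
    else:
        # Single linkage: only the nearest existing isolate matters.
        nearest = min(profiles, key=lambda n: mismatches(profile, profiles[n]))
        d = mismatches(profile, profiles[nearest])

        # Number of leading positions shared with the nearest isolate.
        shared = 0
        for t in THRESHOLDS:
            if d <= t:
                shared += 1
            else:
                break

        if shared == len(THRESHOLDS):
            # Indistinguishable at the finest level: same code.
            code = codes[nearest]
        else:
            prefix = codes[nearest][:shared]
            # Next unused number among codes that share this prefix.
            used = [c[shared] for c in codes.values() if c[:shared] == prefix]
            padding = (0,) * (len(THRESHOLDS) - shared - 1)
            code = prefix + (max(used) + 1,) + padding

    profiles[name] = profile
    codes[name] = code
    return code
```

      Because `add_isolate` only ever appends to `codes`, an isolate that later bridges two clusters receives its own code but cannot cause the codes already issued to be merged or renumbered.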

      Cluster stability was considered during the selection of allelic mismatch thresholds. By choosing thresholds based on natural breaks in population structure (Figure 3), applying clustering statistics such as the silhouette score, and by assessing where cluster stability has been maintained within the previous core genome groups nomenclature, we can have confidence that the thresholds which we have selected will form stable clusters. For example, with core genome groups there has been significant group fusion with clusters formed at a threshold of 400 allelic differences, while clustering at a threshold of 300 allelic differences has remained cohesive over time (supported by a high silhouette score) and so was selected as an important threshold in the gonococcal LIN code. LIN codes have now been applied to >27000 isolates in PubMLST, and the nomenclature has remained effective despite the continual addition of new isolates to this collection. The manuscript will be revised to emphasise these points.
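      For readers unfamiliar with the silhouette score mentioned above, the following minimal sketch computes the mean silhouette from a precomputed distance matrix (for example, pairwise allelic mismatches). The function name `mean_silhouette` and the convention of scoring singleton clusters as 0 are assumptions made for illustration; the sketch is not the analysis code used for the manuscript and assumes at least two clusters are present.

```python
from typing import List, Sequence


def mean_silhouette(dist: Sequence[Sequence[float]], labels: List[int]) -> float:
    """Mean silhouette score for a clustering, given pairwise distances.

    For each item i: a = mean distance to other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s(i) = (b - a) / max(a, b). Requires at least two distinct labels.
    """
    n = len(labels)
    clusters = set(labels)
    scores = []
    for i in range(n):
        same = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            # Singleton cluster: scored as 0 by convention (assumption).
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n
```

      Scores near 1 indicate compact, well-separated clusters, whereas scores near or below 0 indicate that items sit between clusters, which is the situation in which group fusion would be expected.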

      Work is in progress to explore what LIN code thresholds are generally associated with transmission chains. These will likely be the last 7 thresholds (25, 10, 7, 5, 3, 1, 0) as previous work has suggested that isolates linked by transmission within one year are associated with <14 single nucleotide polymorphism differences (De Silva et al., 2016). The results of this analysis will be described in a future article, currently in preparation.

      Harrison, O.B., et al. Neisseria gonorrhoeae Population Genomics: Use of the Gonococcal Core Genome to Improve Surveillance of Antimicrobial Resistance. The Journal of Infectious Diseases 2020.

      De Silva, D., et al. Whole-genome sequencing to determine transmission of Neisseria gonorrhoeae: an observational study. The Lancet Infectious Diseases 2016;16(11):1295-1303.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The authors investigate how the population of microtubules (LSPMB) that originates from sporozoite subpellicular microtubules (SSPM) is remodelled during liver-stage development of malaria parasites. These bundles shrink over time and help form structures needed for cell division. The authors have used expansion microscopy, live-cell imaging, genetically engineered mutants, and pharmacological perturbation to study parasite development within liver cells.

      A major strength of the manuscript is the live cell imaging and expansion microscopy to study this challenging liver stage of parasite development. It gives important knowledge that PTMs of α-tubulin, such as polyglutamylation and tyrosination/detyrosination, are crucial for microtubule stability. Mutations in α-tubulin reduce the parasite's ability to move and proliferate in the liver cells. The drug oryzalin, which targets microtubules, also blocks parasite development, showing how important dynamic microtubules are at this stage.

      The major problem in the manuscript was the way it flows, as the authors keep shifting from the liver stage to the sporogony stages and then back to the liver stages. It was very confusing at times to know what the real focus of the study is, whether sporozoite development or liver stage development. The flow of the manuscript could be improved. Some of the findings reported here substantiate the previous electron microscopy.

      Overall, the study represents an important contribution towards understanding cytoskeletal remodelling during liver stage infection. The study suggests that tubulin modifications are key for the parasite's survival in the liver and could be targets for new malaria treatments. This is also the stage that has been used for vaccine development, so any knowledge of how parasites proliferate in the liver cells will be beneficial towards intervention approaches.

      We would like to express our sincere gratitude to Reviewer #1 for the positive and encouraging feedback on our manuscript. We are delighted that the reviewer found our experimental design and methodologies appropriate and that our study represents an important contribution to understanding cytoskeletal remodelling during liver stage infection, a critical phase for vaccine development. We are also grateful to the reviewer for highlighting the issue with the manuscript's flow. We acknowledge this limitation and will significantly improve the narrative structure and logical progression in the revised manuscript to ensure clarity and avoid any potential confusion. Thank you again for your thoughtful and constructive comments.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated microtubule distribution and their possible post-translational modifications (PTM) in Plasmodium berghei during development of the liver stage, using either hepatocytes or HeLa cells as models. They used conventional immunofluorescence assays and expansion microscopy with various antibodies recognising tubulin and, in the second part of the work, its candidate PTMs, as well as markers of Plasmodium, in addition to live imaging with a fluorescent marker for tubulin. In the third part of the study, they generated 3 mutants deprived of either the last four residues or the last 11 residues, or where a candidate polyglutamylation site was substituted by an alanine residue.

      Strengths:

      In the first part, microtubules are monitored by a combination of two approaches (IFA and live), revealing nicely the evolution of the sporozoite subpellicular microtubules (SSPM, the sporozoite is the developmental stage present in salivary glands of the mosquitoes and that infects hepatocytes) into a different structure termed liver-stage parasite microtubule bundle (LSPMB). The LSPMB shrinks during the course of parasite development and finally disappears while hemi-spindles emerge over time. Contact points between these two structures are observed frequently in live cells and occasionally in fixed cells, suggesting the intriguing possibility that tubulin might be recycled from the LSPMB to contribute to hemi-spindle formation.

      In the second part, antibodies recognising (1) the final tyrosine found at the C-terminal tail and (2) a stretch of 3 glutamate residues in a side chain are used to monitor these candidate PTMs. Signals are positive at the SSPM; at the LSPMB, the signal remains positive for polyglutamylation but becomes negative for the final tyrosine, while a positive signal emerges at hemi-spindles at later stages of development.

      In the last part, the three mutants are fed to mosquitoes, where they show reduced development, the one lacking the alpha-tubulin tail even failing to reach the salivary glands. However, the two other mutants infect HeLa cells normally, whereas sporozoites with the C-terminal tail deletion recovered from the haemolymph did not develop in these cells.

      The first part provides convincing evidence that microtubules are extensively remodelled during the infection of hepatocytes and HeLa cells, in agreement with the spectacular Plasmodium morphogenetic changes accompanying massive and rapid proliferation. The third part brings further confirmation that the C-terminal tail of alpha-tubulin is essential for multiple stages of parasite development, in agreement with previous work (50). Since it is the region where several post-translational modifications take place in other organisms (detyrosination, polyglutamylation, glycylation), it makes sense to propose that the essential function is related to these PTMs also in Plasmodium.

      Weaknesses:

      The significance of tubulin PTM relies on two antibodies whose reactivity to Plasmodium tubulins is unclear (see below). The interpretation of the literature on detyrosination and polyglutamylation is confusing in several places, meaning that the statements about the possible role of these PTMs need to be carefully revisited.

      The authors use the term "tyrosination" but the alpha1-tubulin studied here possesses the final tyrosine when it is synthesised, so it is "tyrosinated" by default. It could potentially be removed by a tyrosine carboxypeptidase of the vasoinhibin family (VASH) as reported in other species. After removal, this tyrosine can be added again by a tubulin-tyrosine ligase (TTL) enzyme. It is therefore more appropriate to talk about detyrosination-retyrosination rather than tyrosination (this confusion is unfortunately common in the literature, see Janke & Magiera, 2020).

      The difficulty here is that there is so far no evidence that detyrosination takes place in Plasmodium. Neither VASH nor TTL could be identified in the Plasmodium genome (ref 31, something we can confirm with our unsuccessful BLAST analyses), and mass spectrometry studies of purified tubulin, albeit from blood stages, did not find evidence for detyrosination (reference 43). Western blots using an antibody against detyrosinated tubulin did not produce a positive signal, neither on purified tubulin, nor on whole parasites (43). Of course, the situation could be different in liver stages, but the question of the detyrosinating enzyme is still there. The existence of a unique Plasmodium system for detyrosination cannot be formally ruled out but given the high degree of conservation of these PTMs and their associated enzymes, it sounds difficult to imagine.

      The fact that the anti-tyrosinated antibody still produced a signal in the cell line where the final tyrosine is deleted raises issues about its specificity. A cross-reactivity with beta-tubulin is proposed, but the Plasmodium beta-tubulin does not carry a final tyrosine, further raising concerns about antibody specificity.

      The interpretation of these results should therefore be considered carefully. There also seems to be some confusion in the function of detyrosination cited from the literature. It is said in line 229 that "tyrosination has been associated with stable microtubules" (33, 34, 50, 55). References 33 and 34 actually show that tyrosinated microtubules turn over faster in neurons or in epithelial cells, respectively, while references 50 and 55 do not study de/retyrosination. The general consensus is that tyrosinated microtubules are more dynamic (see reference 24).

      The situation is a bit different for polyglutamylation since several candidate poly- or mono-glutamylases have been identified in the Plasmodium genome, and at least mono-glutamylation of beta-tubulin has been formally proven, still in bloodstream stages (ref 43). The authors propose that the residue E445 is the polyglutamylation site. To our knowledge, this has not been demonstrated for Plasmodium. This residue is indeed the favourite one in several organisms such as humans and trypanosomes (Eddé et al., Science 1990; Schneider et al., JCS, 1997), and it is tempting to propose it would be the same here. However, TTLLs bind the tubulin tails from their C-terminal end like a glove on a finger (Garnham et al., Cell, 2015), and the presence of two extra residues in Plasmodium tubulins would mean that the reactive glutamate might be in position E447 rather than E445. This is worth discussing.

      On the positive side, it is encouraging to see that signals for both anti-tyrosinated tail and poly-glutamylated side chain are going down in the various mutants, but this would need validation with a comparison for alpha-tubulin signal.

      Line 316: polyglutamylation "is commonly associated with dynamic microtubule behavior (78-80)". Actually, references 78 and 79 show the impact of this PTM on interaction with spastin, and reference 80 discusses polyglutamylation as a marker of stable microtubules in the context of cilia and flagella. The consensus is that polyglutamylated microtubules tend to be more stable (ref 24).

      Conclusion:

      The first and the third parts of this manuscript - evolution of microtubules and importance of the C-terminal tails for Plasmodium development - are convincing and well supported by data. However, the presence and role of tubulin PTM should be carefully reconsidered.

      Plasmodium tubulins are more closely related to plant tubulins and are sensitive to inhibitors that do not affect mammalian microtubules. They therefore represent promising drug targets as several well-characterised compounds used as herbicides are available. The work produced here further defines the evolution of the microtubule network in sporozoites and liver stages, which are the initial and essential first steps of the infection. Moreover, Plasmodium has multiple specificities that make it a fascinating organism to study both for cell biology and evolution. The data reported here are elegant and will attract the attention of the community working on parasites but also on the cytoskeleton at large. It will be interesting to have the feedback of other people working on tubulin PTMs to figure out the significance of this part of the work.

      We thank Reviewer #2 for the thoughtful and detailed evaluation of our manuscript. We are pleased that the reviewer found our study elegant and believe it will attract the attention of the broader scientific community, both those working on parasites and those focused on cytoskeleton biology. We also acknowledge the concerns raised regarding the specificity of the antibodies used to detect tubulin post-translational modifications (PTMs), as well as the interpretation of their signals and the current lack of identified detyrosination enzymes in the Plasmodium genome. We agree that these are important limitations, and we will address them thoroughly in the revised manuscript. This includes clarifying our interpretation of tyrosination versus detyrosination, adjusting our claims regarding polyglutamylation sites, and carefully revisiting the literature cited to ensure accurate contextualization of PTM function in microtubule stability.

      We are grateful for the reviewer’s close reading and critical feedback, which will help us substantially improve the clarity, precision, and strength of our manuscript.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Atchou et al. investigates the role of the microtubule cytoskeleton in sporozoites of Plasmodium berghei, including possible functions of microtubule post-translational modifications (tyrosination and polyglutamylation) in the development of sporozoites in the liver. They also assessed the development of sporozoites in the mosquito. Using cell culture models and in vivo infections with parasites that contain tubulin mutants deficient in certain PTMs, they show that many aspects of the life cycle progression are impaired. The main conclusion is that microtubule PTMs play a major role in the differentiation processes of the parasites.

      However, there are a number of major and minor points of criticism that relate to the interpretation of some of the data.

      We thank Reviewer #3 for the overall positive assessment of our study and for recognizing its contribution to advancing our understanding of Plasmodium biology and malaria pathogenesis. We appreciate the reviewer’s constructive feedback, particularly regarding the interpretation of some of our data. These comments have been very helpful in guiding our revisions, and we have worked to improve both the clarity of our presentation and the precision of our interpretations in the revised manuscript.

      Below, we respond in detail to each of the reviewer’s points.

      Comments:

      (1) The first paragraph of "Results" almost suggests that the presence of a subpellicular MT-array in sporozoites is a new discovery. This is not the case, see e.g. the recent publication by Ferreira et al. (Nature Communications, 2023).

      We thank the reviewer for pointing this out and fully agree that the subpellicular microtubule (SPM) array in sporozoites is well established, as documented in earlier work (e.g., Cyrklaff et al., 2007) and more recently by Ferreira et al. (Nat. Commun., 2023). Our intention was not to suggest that the existence of the SPM is a novel finding. Rather, our study builds on this existing knowledge by demonstrating that these sporozoite-derived microtubules are not disassembled upon hepatocyte entry but are repurposed into a newly described structure, the liver stage parasite microtubule bundle (LSPMB). This reorganization, its persistence into liver stage development, and its dynamic role in microtubule remodeling and nuclear division are, to our knowledge, novel observations. We will revise the manuscript to make this distinction clearer in the introduction and the results section.

      (2) Why were HeLa cells and not hepatocytes (as in Figure 3) used for measuring infection rates of the mutants in Figure 5H and 5L? As I understand, HeLa cells are not natural host cells for invading sporozoites. HeLa cells are epithelial cells derived from a cervical tumour. I am not an expert in Plasmodium biology, but is a HeLa infection an accepted surrogate model for liver stage development?

      We appreciate the opportunity to clarify our experimental model. While HeLa cells are not the natural host cells, they are a well-established and validated in vitro model for studying Plasmodium berghei liver stage development in our lab and others. In this system, the parasite completes its full development and generates infectious merozoites. Numerous studies have successfully used HeLa cells as a liver stage infection model, with key findings subsequently validated in primary hepatocytes or in vivo, confirming its utility as a representative model. We employed this cell line primarily to reduce animal usage in accordance with the 3Rs principles (Replacement, Reduction, Refinement). Importantly, to ensure the biological relevance of our discoveries in HeLa cells, we validated our key findings in primary mouse hepatocytes, as shown in Figure 3. Furthermore, we confirmed the in vivo infectivity of mutant parasite lines that produced typical salivary gland sporozoites through an in vivo infection assay, presented in Figure S4C.

      (3) The tubulin staining in Figures 1A and 1B is confusing and doesn't seem to make sense. Whereas in 1A the antibody nicely stains host and parasite tubulin, in 1B, only parasite tubulin is visible. If the same antibody and the same host cells have been used, HeLa cytoplasmic microtubules should be visible in 1B. In fact, they should be the predominant antigen. The same applies to Figure 2, where host microtubules are also not visible.

      We thank the reviewer for this careful observation regarding the α-tubulin staining in Figures 1A and 1B. The same host cell type (HeLa) and α-tubulin antibody were indeed used in both experiments. Figure 1A shows results from conventional immunofluorescence assays, where both host and parasite microtubules are clearly stained. In contrast, Figure 1B shows the outcome of ultrastructure expansion microscopy (U-ExM), where parasite microtubules appear prominently, while host microtubules are less visible.

      This effect appears to be a technical outcome of the U-ExM protocol, which can differentially preserve or reveal microtubule epitopes. We consistently observed stronger parasite signal across various cell types, including primary hepatocytes (Figure 3A,B). The lack of visible host microtubules in some U-ExM images does not reflect their absence, but rather reduced signal intensity relative to the parasite structures. This is not observed with all antibodies; for example, host microtubules stain strongly with anti-tyrosinated α-tubulin (Figure 3B), likely reflecting their high tyrosination state.

      To overcome this limitation, we employed PS-ExM and combined PS-ExM/U-ExM approaches (as described in reference 56), which allowed simultaneous high-resolution visualization of both host and parasite microtubule networks. These combined methods are now being used in follow-up studies to investigate host–parasite microtubule interactions in more detail.

      We will clarify this point in the revised manuscript to avoid confusion.

      (4) In Figures 2A and B, the host nuclei appear to have very different sizes in the DMSO controls and in the drug-treated cells. For example, in the 20 µM (-) image (bottom right), the nuclei are much larger than in the DMSO (-) control (top left). If this is the case, expansion microscopy hasn't worked reproducibly, and therefore, quantification of fluorescence is problematic. The scalebar is the same for all panels.

      The expansion microscopy methods used in this study have been rigorously validated for both reproducibility and isotropy. However, as the reviewer rightly notes, host cell nuclei can vary in size due to several factors, including cell cycle stage, infection status, and the extent of parasite development, all of which can influence host nuclei morphology and size.

      Importantly, the quantifications relevant to our conclusions were focused specifically on parasite structures. We did not rely on host nuclear size or host fluorescence intensity as a quantitative readout in this context. While we acknowledge the observed variability in host nuclear dimensions, it does not compromise the accuracy or reproducibility of the parasite-specific measurements central to our study.

      We will clarify this point in the revised figure legend and manuscript.

      (5) I don't quite follow the argument that spindles and the LSPMB are dynamic structures (e.g., lines 145, 174). That is a trivial statement for the spindle, as it is always dynamic, but beyond that, it has only been shown that the structure is sensitive to oryzalin. That says little about any "natural" dynamic behaviour. Any microtubule structure can be destroyed by a particular physical or chemical treatment, but that doesn't mean all structures are dynamic. It also depends on the definition of "dynamic" in a particular context, for example, the time scale of dynamic behaviour (changes within seconds, minutes, or hours).

      We agree that sensitivity to chemical depolymerization alone does not necessarily indicate dynamic behavior, particularly in the absence of data on turnover kinetics or temporal changes.

      Our interpretation was based on two observations: first, that the LSPMB, which derives from the highly stable sporozoite subpellicular microtubules (known to be drug-resistant), becomes susceptible to depolymerization during the liver stage; and second, that the LSPMB gradually shrinks over time during parasite development. These features suggested a transition toward a more dynamic state compared to its origin. However, we fully agree that “dynamic” is a context-dependent term and that direct evidence, such as turnover rates or structural changes on short time scales, is required to rigorously define microtubule dynamics.

      We will revise the manuscript to clarify our use of this term and explicitly acknowledge the need for further studies to characterize the timescale and mechanisms underlying LSPMB remodeling.

      (6) I am not sure what part in the story EB1 plays. The data are only shown in the Supplements and don't seem to be of particular relevance. EB1 is a ubiquitous protein associated with microtubule plus ends. The statement (line 192) that it "may play a broader role..." is unsubstantiated and cannot be based merely on the observation that it is expressed in a particular life cycle stage.

      We agree that EB1 is a ubiquitous microtubule plus-end binding protein and that its presence alone does not imply a novel function. Previous studies (e.g., Maurer et al., 2023; Yang et al., 2023; Zeeshan et al., 2023) have focused on its role during Plasmodium sexual stages, while its expression during liver and mosquito stages has not been previously documented.

      Our data extend this knowledge by showing that EB1 is also expressed during liver stage development, particularly during the highly mitotic schizont phase. While we agree that this observation alone does not prove functional involvement, it raises the possibility of a broader role for EB1 in regulating microtubule dynamics beyond sexual stages. To avoid overinterpretation, we have presented these findings in the supplementary material and will revise the manuscript to tone down speculative statements and clearly frame this as a preliminary observation that warrants further investigation.

      (7) Line 196 onwards: The antibody IN105 is better known in the field as polyE. Maybe that should be added in Materials and Methods. Also, the antibody T9028 against tyrosinated tubulin is poorly validated in the literature and rarely used. Usually, researchers in this field use the monoclonal antibody YL1/2. I am not sure why this unusual antibody was chosen in this study. In fact, has its specificity against tyrosinated α-tubulin from Plasmodium berghei ever been shown? The original antigen was human and had the sequence EGEEY. The Plasmodium sequence is YEADY and hence very different. It is stated that the LSPMB is both polyglutamylated and tyrosinated. This is unusual because polyglutamylated microtubules are usually indicative of stable microtubules, whereas tyrosinated microtubules are found on freshly polymerised and dynamic microtubules. However, a co-localisation within the same cell has not been attempted. This is, however, possible since polyE is a rabbit antibody and T9028 is a mouse antibody. I suspect that differences or gradients along the LSPMB would have been noticed. Also, in lines 207/208, it is said that tyrosination disappears after hepatocyte invasion, which is shown in Figure 3. However, in Figure 3A, quite a lot of positive signals for tyrosination are visible in the 54 and 56 hpi panels.

      First, we acknowledge that the IN105 antibody is more widely known as "polyE" in the field. We will update the Materials and Methods section accordingly to reflect this nomenclature.

      Regarding the use of the T9028 antibody against tyrosinated α-tubulin: we agree that this monoclonal antibody is less commonly used than YL1/2, and we appreciate the reviewer drawing attention to this. The original antigen for T9028 is based on the mammalian C-terminal sequence EGEEY, which differs from the Plasmodium α1-tubulin sequence (YEADY). Like many in the field, we face the challenge that most available antibodies are raised against mammalian epitopes, and specificity in Plasmodium can vary. Nonetheless, the literature (e.g., Hirst et al., 2022; Fennell et al., 2008) has demonstrated that tyrosination occurs in Plasmodium α1-tubulin, using anti-tyrosination antibodies including YL1/2.

      Following the reviewer’s excellent suggestion, we are currently repeating the key experiments using the YL1/2 antibody to compare staining patterns directly with those obtained using T9028. We will include these results in the revised manuscript.

      Concerning the potential co-localization of polyglutamylation and tyrosination on the LSPMB: we agree that this is an interesting and testable hypothesis. In the current manuscript, Figures 3A and 3B were generated from independent experiments, and thus co-localization was not assessed. However, as the reviewer correctly notes, polyE and T9028 antibodies are raised in rabbit and mouse, respectively, making co-staining feasible. We will follow up on this experimentally and, if feasible within our revision timeline, include data in the revised version or highlight this as a future direction.

      Finally, with regard to Figure 3 and the observation that tyrosination appears to persist at 54 and 56 hpi (Figure 3B): the reviewer is correct that tyrosination signal is still detectable at these time points. Our statement that tyrosination “disappears after hepatocyte invasion” was intended to refer to an overall decrease in signal intensity during early liver stage development, with a reappearance at later stages (e.g., cytomere formation). We will rephrase this section for greater clarity and ensure that figure annotations and legends unambiguously reflect the dynamics observed.

      (8) In line 229, it is stated that tyrosination "has previously been associated with stable microtubule in motility". This statement is not correct. In fact, none of the cited references that apparently support this statement show that this is the case. On the contrary, stable microtubules, such as flagellar axonemes, are almost completely detyrosinated. Therefore, tyrosination is a marker for dynamic microtubules, whereas detyrosinated microtubules are indicative of stable microtubules. This is an established fact, and it is odd that the authors claim the opposite.

      We fully agree that in canonical eukaryotic systems, tyrosinated microtubules are generally markers of dynamic microtubule populations, whereas detyrosinated microtubules are typically associated with stability, particularly in structures such as flagellar axonemes.

      Our original statement will be corrected. In our study, we observed that tyrosinated microtubules are prevalent in invasive stages (sporozoites and merozoites), while detyrosinated forms become more prominent during intracellular liver stage development. This pattern is consistent with the established link between tyrosination and dynamic microtubules.

      What is particularly intriguing in Plasmodium is the apparent cycling of tyrosination despite the absence of known tubulin tyrosine ligase (TTL) homologs in the genome. This suggests either a highly divergent enzyme or the involvement of host cell factors, a hypothesis supported by the reappearance of tyrosinated microtubules during liver stage schizogony (Figure 3B).

      We will revise the relevant text and the Discussion section to reflect these mechanistic considerations more accurately and to avoid misrepresenting established principles of microtubule biology.

      (9) Line 236 onwards: Concerning the generation of tubulin mutants, I think it is necessary to demonstrate successful replacement of the wild-type allele by the mutant allele. I am sure the authors have done this by amplification and subsequent sequencing of the genomic locus using PCR primers outside the plasmid sequences. I suggest including this information, e.g., by displaying the chromatograph trace in a supplementary figure. Or are the sequences displayed in Figure S3B already derived from sequenced genomic DNA? This is not described in the Legend or in Materials and Methods. The left PCR products obtained for Figure S3 B would be a suitable template for sequencing.

      Indeed, these data are presented in Figure 4B and the corresponding sequence data are shown in Figure S3B. We appreciate the reviewer’s suggestion, which will help improve the transparency and reproducibility of our methodology.

      (10) It is also important to be aware of the fact that glutamylation also occurs on β-tubulin. This signal will also be detected by polyE (IN105). Therefore, it is surprising that IN105 immunofluorescence is negative on the C-term Δ cells (Figure S3 D). Is there anything known about confirmed polyglutamylation sites on both α- and β-tubulins in Plasmodium, e.g., by MS? In Toxoplasma, both α- and β-tubulin have been shown to be polyglutamylated.

      Indeed, polyglutamylation is known to occur not only on α-tubulin but also on β-tubulin in many organisms, including Toxoplasma gondii, and the polyE (IN105) antibody is expected to detect polyglutamylation on both tubulin isoforms.

      The parasites shown in Figure S3D correspond to mutant lines originally generated by Spreng et al. (2019): the IntronΔ mutant (with deletion of introns in the Plasmodium α1-tubulin gene) and the C-termΔ mutant (with deletion of the final three C-terminal residues: ADY). As the reviewer correctly notes, this particular C-terminal deletion does not include the predicted polyglutamylation site (E445 or E447, depending on alignment), and thus should not abolish all polyglutamylation. However, in our experiments, the IN105 signal is substantially reduced in this mutant. This may suggest that structural alterations in the tubulin tail affect accessibility of the polyglutamylation epitope or influence the modification itself, though we cannot exclude other possibilities, including changes in antibody recognition.

      To date, polyglutamylation sites in Plasmodium tubulins have not been definitively confirmed by mass spectrometry. However, a recent MS-based study (reference 43) detected monoglutamylation on β-tubulin in blood stage parasites. Direct MS evidence for polyglutamylation of either α- or β-tubulin in Plasmodium liver stages is still lacking. We will clarify these points in the revised manuscript to avoid potential confusion and to highlight the need for future biochemical validation of PTM sites.

      (11) Figure S3 is very confusing. In the legend, certain intron deletions are mentioned. How does this relate to posttranslational tubulin modifications? The corresponding section in Results (lines 288-292) is also not very helpful in understanding this.

      The parasite lines shown in Figure S3D were originally generated by Spreng et al. (2019) and are not directly part of the main set of PTM-targeted mutants described in our study. Specifically, the IntronΔ line carries deletions in introns of the Plasmodium α1-tubulin gene, while the C-termΔ line lacks the final three C-terminal residues (ADY). These lines were included for comparative purposes to explore whether structural changes in α-tubulin could impact polyglutamylation signal, as detected by the polyE (IN105) antibody.

      We acknowledge that the figure legend and corresponding text (lines 288–292) did not adequately explain the rationale for including these control lines. We will revise both the legend and Results section to more clearly describe the origin, purpose, and relevance of these mutants to the overall study.

      (12) Figure 4E doesn't look like brightfield microscopy but like some sort of fluorescent imaging. In Figure 4C, were the control (NoΔ) cells with an integrated cassette, but no mutations, or non-transgenic cells?

      The reviewer is absolutely correct: Figure 4E shows a fluorescent image acquired using widefield microscopy and not a brightfield image. We will revise the figure legend accordingly to avoid confusion. The “BF” (brightfield) label applies only to the left panel in Figure 4C, which depicts oocysts imaged using transmitted light.

      Regarding the controls labeled "NoΔ" in Figure 4C, we confirm that these parasites contain the integrated selection cassette but do not harbor any mutations in the target gene. They serve as proper integration controls, allowing us to distinguish the effects of the point mutations or deletions introduced in the experimental lines.

      (13) It is difficult to understand why the TyΔ and the CtΔ mutants still show quite a strong signal using the anti-tyrosination antibody. If the mutants have replaced all wild-type alleles, the signal should be completely absent, unless the antibody (see my comment above concerning T9028) cross-reacts with detyrosinated microtubules. Therefore, the quantitation in Figures 5F and 5G is actually indicative of something that shouldn't be like that. The quantitation of 5F is at odds with the microscopy image in 5D. If this image is representative, the anti-Ty staining in TyΔ is as strong as in the control NoΔ.

      We agree that the persistence of anti-tyrosination signal in the TyΔ and CtΔ mutant lines is unexpected, given that all wild-type alleles were replaced. This discrepancy has led us to further investigate the specificity of the T9028 antibody, as raised in the reviewer’s earlier comment. To address this concern, we are currently repeating the key experiments using the well-established YL1/2 monoclonal antibody, which is widely accepted for detecting tyrosinated α-tubulin in other systems.

      We also acknowledge that Figure 5F shows residual tyrosination signal, and the reviewer is correct that this should not occur if the modified residues are the exclusive PTM sites. One possible explanation is that adjacent residues or even alternative tubulin isoforms may serve as substrates. While α1-tubulin is the dominant isoform in Plasmodium, low-level expression of α2-tubulin has been detected in liver stages based on transcriptomic data, and it may contribute to the observed signal.

      Regarding the apparent discrepancy between the quantification in Figure 5F and the representative image in Figure 5D, we will revise the figure legend to clarify that image selection aimed to show detectable signal, not necessarily the average phenotype. We will also reassess and, if needed, repeat the quantification with improved image sets to ensure accuracy and consistency.

      We will revise the manuscript to reflect these points and include a more nuanced interpretation of the residual staining in the mutant lines.

      (14) The statement that the failure of CtΔ mutants to generate viable sporozoites is due to the lack of microtubule PTMs (lines 295-296) is speculative. The lack of the entire C-terminal tail could have a number of consequences, such as impaired microtubule assembly or failure to recruit and bind associated proteins. This is not necessarily linked to PTMs. Also, it has been shown in yeast that for microtubules to form properly, an exquisite regulation (proteostasis) of the ratio between α- and β-tubulin is essential (Wethekam and Moore, 2023). I am not sure, but according to Materials and Methods (line 423), the gene cassettes for replacing the wild-type tubulin gene with the mutant versions contain a selectable marker gene for pyrimethamine selection. Are there qPCR data that show that expression levels of mutant α-tubulin are more or less the same as the wild-type levels?

      We agree that attributing the developmental failure of the CtΔ mutants solely to the absence of microtubule post-translational modifications (PTMs) is speculative. As the reviewer rightly points out, deletion of the entire C-terminal tail may have multiple effects, including impaired microtubule assembly, altered α/β-tubulin stoichiometry, or disruption of interactions with essential microtubule-associated proteins (MAPs). These consequences may arise independently of PTMs.

      That said, we note that PTMs, particularly polyglutamylation, can modulate MAP binding by altering the surface charge of microtubules (Genova et al., 2023; Mitchell et al., 2010). Therefore, while PTM loss may be a contributing factor, we acknowledge that the phenotype likely results from a combination of mechanisms. We will revise the relevant section of the manuscript to present a more cautious and balanced interpretation.

      Regarding the reviewer’s question on expression levels: although the replacement constructs include a pyrimethamine resistance cassette, we have not yet quantified α-tubulin transcript levels by qPCR. In the interim, the study by Spreng et al. (2019) (reference 50) on related α1-tubulin mutations provides valuable insight. They observed no difference in mRNA levels in day 12 oocysts, yet reported fainter microtubule staining and shorter sporozoites, suggesting a post-transcriptional mechanism affecting protein expression or function in later stages. Furthermore, the phenotypic spectrum across their mutant panel (Suppl. Fig. 3 D and E) implies that robust α-tubulin regulation is highly sensitive to specific sequences.

      We acknowledge this as a current limitation in our study and will address it in the revised manuscript, noting that direct measurement of transcript levels is a key area for future investigation.

      (15) In the Discussion, my impression is that two recent studies, the superb Expansion Microscopy study by Bertiaux et al. (2021) and the cryo-EM study by Ferreira et al. (2023), are not sufficiently recognised (although they are cited elsewhere in the manuscript). The latter study includes a detailed description of the microtubule cytoskeleton in sporozoites. However, the present study clearly expands the knowledge about the structure of the cytoskeleton in liver stage parasites and is one of the few studies addressing the distribution and function of microtubule post-translational modifications in Plasmodium.

      Indeed, our work builds upon the established knowledge from Bertiaux et al. (2021) and the cryo-EM study by Ferreira et al. (2023), as rightly mentioned by the reviewer. We agree that these foundational studies, combined with our findings, will significantly expand the understanding of Plasmodium biology and cytoskeleton dynamics across its life cycle and will open the door for further investigations. We are grateful for this suggestion and will ensure these key studies are appropriately acknowledged in the revised manuscript.

      (16) I somewhat disagree with the statement of a co-occurrence of polyglutamylated and tyrosinated microtubules. I think the resolution is too low to reach that conclusion. As this is a bold claim, and would be contrary to what is known from other organisms, it would require a more rigorous validation. Given the apparent problems with the anti-Ty antibody (signal in the TyΔ mutant), one should be very cautious with this claim.

      This is a very important point to clarify. As mentioned previously, the initial experiments for these modifications were performed independently. It is established that sporozoite subpellicular microtubules exhibit both tyrosination and polyglutamylation. We will revise the manuscript to temper this statement and clearly indicate that the co-occurrence of these PTMs remains a hypothesis that requires more rigorous validation. As suggested, we are now conducting additional co-staining experiments using the better-validated YL1/2 antibody to re-examine and directly compare the distribution of both PTMs within the same cell. These follow-up experiments will help clarify whether both modifications occur simultaneously on the same microtubule structures in Plasmodium liver stages.

      (17) In the Discussion (lines 311 and 377), it is again claimed that tyrosinated microtubules are "a well-known marker of stable microtubules". This statement is completely incorrect, and I am surprised by this serious mistake. A few lines later, the authors say that polyglutamylated is "commonly associated with dynamic microtubule behaviour". Again, this is completely incorrect and is the opposite of what is firmly established in the literature. Polyglutamylation and detyrosination are markers of stable microtubules.

      Indeed, in canonical eukaryotic systems, tyrosinated microtubules are generally considered markers of dynamic microtubule populations, whereas detyrosinated and polyglutamylated microtubules are more commonly associated with stability.

      We acknowledge this mistake and will revise the Discussion to correct these statements accordingly. In the context of Plasmodium, our observations suggest an unusual regulation of microtubule dynamics, which may reflect parasite-specific adaptations. For example, we observed tyrosinated α-tubulin in the stable subpellicular microtubules of sporozoites, structures typically known for their exceptional stability. This atypical association implies either non-canonical roles for tyrosination or parasite-specific mechanisms for modulating microtubule properties. Additionally, the presence of both PTMs at different stages of development and on different microtubule populations suggests tightly regulated spatial and temporal modulation of microtubule function.

      We will carefully revise the relevant sections of the manuscript to remove incorrect generalizations and ensure accurate representation of the current consensus in the field, while emphasizing the possibility of Plasmodium-specific adaptations that merit further study.

      (18) In line 339, the authors interpret the residual antibody staining after the introduction of the mutant tubulin as a compensatory mechanism. There is no evidence for this. More likely explanations are firstly the quality of the anti-Ty-antibody used (see comment above), and the fact that also β-tubulin carries C-terminal polyglutamylation sites, which haven't been investigated in this study. PTMs on β-tubulin are not compensatory, but normal PTMs, at least in all other organisms where microtubule PTMs have been investigated.

      As mentioned above, we are currently repeating the key experiments with the YL1/2 antibody, as suggested. Furthermore, we fully agree with the reviewer's point regarding polyglutamylation on β-tubulin. The C-terminal tail of β-tubulin does indeed contain polyglutamylation sites. As we noted in the manuscript (Lines 340-352), this aspect has not been investigated in the present study, and we acknowledge it as a valuable direction for future research. We will revise the text accordingly to avoid overinterpretation and to more accurately reflect the limitations of our current data.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The authors define the principles that, based on first principles, should be guiding the optimisation of transcription factors with intrinsically disordered regions (IDR). The first part of the study defines the following principles to optimize the binding affinities to the genome in the receiving region that is called the “antenna”: (i) reduce the target to IDR-binding distance on the genome; (ii) optimise the distance between the DNA binding domain and the binding sites on the IDR to be as close as possible to the distance between their binding sites on the genome; (iii) keep the same number of binding sites and their targets and modulate this number with binding strength, reducing them with increased strength; (iv) modulate the binding strength to be above a threshold that depends on the proportion of IDR binding sites in the antenna. The second part defines the scaling of the search time as a function of key parameters such as the volume of the nucleus and the size of the antenna, derived as a combination of 3D search of the antenna and 1D “octopusing” on the antenna. The third part focuses on validation, where the current results are compared to binding probability data from a single experiment, and new experiments are proposed to further validate the model as well as to test designed transcription factors.

      Strengths:

      The strength of this work is that it provides simple, interpretable and testable theoretical conclusions. This will allow the derived design principles to be understood, evaluated and improved in the future. The theoretical derivations are rigorous. The authors provide a comparison to experiments and also propose new experiments to be performed in the future; this is of great value to the paper, since it will set the stage for and inspire new experimental techniques. Further, the field needs inspiration and motivation to develop these techniques, since they are required to benchmark the transcription factors designed with the methods presented in this paper, as well as to develop novel data-based or in vivo methods that would greatly benefit the field. As such, this paper is a fundamental contribution to the field.

      Weaknesses:

      The model assumption that the interaction between the transcription factor and the DNA outside of the antenna region is negligible is probably too strong for many/most transcription factors, particularly in organisms with a longer genome than yeasts. The model presents many first principles to drive the design of transcription factors, but arguably, other principles and mechanisms might also play a role by being beneficial to the search and binding process. Specifically: (i) a role of the IDR in complex formation and cooperativity between multiple transcription factors, (ii) ability of the IDR to do parallel searching based on multiple DNA binding sites spaced by disordered regions, (iii) affinity of the IDR to specific compartmentalisations in the nucleus reducing the search time, etc. The paper would be improved by a discussion of alternative mechanisms.

      We thank the reviewer for highlighting that our work delivers simple, interpretable and rigorously derived conclusions, backed by experimental comparison and concrete proposals for future studies.

      Regarding interactions outside the antenna region, Supplementary S10 shows that the non-specific IDR–DNA interactions (on the order of 1 kBT) only slightly alter the 3D diffusion coefficient and thus do not affect our conclusions regarding the optimal search process.

      We have also added sentences to the Discussion section regarding the alternative mechanisms.

      Reviewer #2 (Public review):

      Summary:

      This is an interesting theoretical exploration of how a flexible protein domain, which has multiple DNA-binding sites along it, affects the stability of the protein-DNA complex. It proposes a mechanism (“octopusing”) by which a protein performs a random walk while bound to DNA, simultaneously enabling exploration of the DNA strand and stability of the bound state.

      Strengths:

      Stability of the protein-DNA bound state and the ability of the protein to perform 1D diffusion along the DNA are two properties of a transcription factor that are usually seen as being in opposition to each other. The octopusing mechanism is an elegant resolution of the puzzle of how both could be accommodated. This mechanism has interesting biological implications for the functional role of intrinsically disordered domains in transcription factor (TF) proteins. The authors show theoretically how these domains, if flexible and able to make multiple weak contacts with the DNA, can enhance the ability of TFs to efficiently find their binding sites on the DNA, from which they exert control over the transcription of their target genes. The paper concludes with a comparison of model predictions with experimental data, which gives further support to the proposed model. Overall, this is an interesting and well-executed theoretical paper that proposes an interesting idea about the functional role of IDR domains in TFs.

      Weaknesses:

      IDR domains are assumed flexible which I believe is not always the case. Also, I’m not sure how ubiquitous are the assumed binding sites on the DNA for multiple subdomains along the IDR. These assumptions though seem like interesting points of departure for further experiments.

      We thank the reviewer for their careful and insightful evaluation of our work. In particular, we appreciate their emphasis on the inherent trade-off between binding stability and one-dimensional diffusion, and their recognition of how the octopusing mechanism elegantly reconciles these conflicting requirements.

      To address the flexibility of TFs with IDRs, we incorporated the spring’s rest length—effectively introducing tunable rigidity—in Supplementary Section S1, and we show that our design principles for binding probability remain robust. Indeed, this is a highly interesting point; a comprehensive study will require more detailed modeling alongside experimental validation.

      We acknowledge that the current evidence for IDR-directed DNA binding is primarily derived from a limited number of well-studied cases, particularly Msn2 in yeast, and the ubiquity of this mechanism across diverse transcription factors remains to be established.

      Reviewer #1 (Recommendations for the authors):

      The paper jumps too fast to the results; a longer introduction might improve it. Further, at line 50, I don’t think that the figure is properly referenced. Formula 2 is confusing, since the target volume V1 is not explained in the context of the formula; please expand the explanations.

      We appreciate the reviewer’s valuable recommendations. We have expanded the Introduction, clarified V<sub>1</sub>, and updated line 50.

      Reviewer #2 (Recommendations for the authors):

      I have some mostly minor suggestions to the authors for improving the manuscript:

      In the abstract and introduction on at least two occasions the authors talk about IDRs as though they’re necessarily flexible. My understanding is that, while this is a very reasonable assumption, I don’t think this is something we know with any certainty for most IDRs. If the authors agree with my assessment I think they should reflect this uncertainty in the writing.

      Thank you for the recommendations. We revised the wording to reflect the uncertainty, changing it to: “... commonly assumed to behave as a long, flexible...” and “...can be assumed as flexible....”.

      It took me a bit of time to figure out what’s going on in Figure 1b. To help the reader I would suggest labeling the DBD targets (yellow square) and the IDR targets (gray squares) as such. The figure also left me guessing whether the DBD domain can bind to the IDR targets non-specifically? (I presume not.) This also brought a slightly bigger question into focus for me: wouldn’t the presence of the IDR binding “sites” (since these “sites” are on the protein, I think the term “domains” would be better than “sites”) increase the time the protein is bound non-specifically somewhere far from the target, thereby increasing the search time? Or is the ability of the protein to bind specifically to DNA away from the DBD target ignored?

      We have labeled the DBD targets and IDR targets in the figure. ‘Domains’ usually refers to structured parts; we keep using ‘sites’ and clarify that they correspond to short linear motifs.

      The reviewer is correct. Our model omits any non-specific binding between the DBD and IDR-binding targets, as well as between the TF and other DNA regions. If such interactions were to substantially lengthen the search time, they would effectively revert our mechanism to the classical bacterial facilitated-diffusion model, which is generally considered inappropriate for IDR-mediated TF search in eukaryotic cells. However, Supplementary Figure S10 demonstrates that non-specific IDR–DNA interactions induce only marginal changes in the effective three-dimensional diffusion coefficient within complex chromatin environments, and therefore do not alter our conclusions regarding the optimal search process.

      In Equation 2 and the text that follows I was left wondering what the target volume V1 is. Also, I think it would be helpful to give the reader a sense of scale for the dimensionful quantities appearing in Equation 2. This is done later when comparing the theory to experimental data, but I think it would be helpful to give a sense of size earlier in the manuscript.

      V<sub>1</sub> denotes the volume of the IDR–binding target region, which is on the order of bp<sup>3</sup>. f(d,l<sub>0</sub>) has units of inverse volume. We have included the units and specified the order of magnitude of V<sub>1</sub> after Equation 2.

      The binding energy EB is discussed a number of times but it wasn’t clear to me that this quantity referred to the energy per IDR site on the DNA or the total energy when the IDR is bound to DNA. In Figure 1 it would seem that the model allows only one IDR domain bound at a given time but I think the model allows for multiple IDR domains to be bound to the IDR target sites simultaneously. Right? Maybe make this clear in the Figure and the text.

      E<sub>B</sub> denotes the binding energy per binding site, where each site corresponds to a short linear motif. Yes, we allow for multiple IDR domains to be bound to the IDR target sites simultaneously. We have clarified the definition of E<sub>B</sub> and adjusted the figure slightly to avoid any misunderstanding.

      After Eq 4 the discussion suggests that for ϕ << 1 the threshold energy is much greater than kBT, but that’s hard to imagine given the logarithmic dependence of the former on the latter. Also, in Figure 2d it seems that the threshold energy is about 8 kBT. Clearly this is not a big deal; I just thought the authors might want to revise the language.

      Thank you. We now clarify the sentence using the representative values of ϕ and E<sub>th</sub> after Equation 4.

      Right after Figure 2 there is a discussion of the different parameters that the authors vary. I suggest having a figure that illustrates these parameters (possibly in Figure 1b) to make it easier to follow the discussion.

      We have added explanations of the relevant parameters in Figure 1 for clarity.

      When discussing the dynamics of search, the result stated is that the search time is minimum for a specific value of R. I think it would be useful to translate this into a TF concentration. Also, if R represents the radius of the cell’s nucleus, 1/6 µm is almost an order of magnitude smaller than the size of a typical nucleus. Is this a worry? Either way, some clarification of this number would be helpful.

      Thank you for the suggestion. As noted later in this section, we have translated R into an equivalent TF concentration, and we clarify that we assume the scaling of the minimum search time remains unchanged when extrapolated to the size of a typical nucleus.
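
      To give a sense of the scale involved, the sketch below converts a radius R into a molar concentration. It assumes, purely for illustration, that R is the radius of a sphere containing a single TF molecule; this convention and the function name `radius_to_concentration_nM` are hypothetical and are not taken from the manuscript.

```python
from math import pi

AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def radius_to_concentration_nM(radius_um, n_molecules=1):
    """Concentration (nM) of n_molecules confined to a sphere of radius radius_um (in µm).

    Assumption for illustration: the TF copy number per sphere is n_molecules.
    """
    radius_dm = radius_um * 1e-5                    # 1 µm = 1e-6 m = 1e-5 dm
    volume_l = (4.0 / 3.0) * pi * radius_dm ** 3    # 1 dm^3 = 1 L
    molar = n_molecules / (AVOGADRO * volume_l)     # mol / L
    return molar * 1e9                              # convert M to nM


# R = 1/6 µm, the value quoted in the reviewer's comment
c_nM = radius_to_concentration_nM(1.0 / 6.0)
print(f"{c_nM:.1f} nM")
```

      Under the one-molecule assumption, R = 1/6 µm corresponds to a concentration of roughly 86 nM; a different convention for R or for the copy number would change this figure proportionally.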

      There is a comment regarding the role of the DNA persistence length and how it was not accounted for. It would be helpful if the authors could add a sentence or two explaining how a folded DNA conformation, as is the case in the nucleus, would affect their calculation (so that the reader gets an idea without having to get into the details described in the Supplement).

      Thank you. We have revised the sentence to: “We have verified that reducing the DNA persistence length, which promotes increased DNA coiling, results in only a modest increase in mean search time. Even under extreme coiling conditions, the increase remains below 30% of the baseline value, as detailed in Supplementary S9.”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this work, the authors apply tDCS to awake and anesthetized macaques to determine the effect of this modality on dynamic connectivity measured by fMRI. The question is to understand the extent to which tDCS can influence conscious or unconscious states. Their target was the PFC. During the conscious states, the animals were executing a fixation task. Unconsciousness was achieved by administering a constant infusion of propofol and a continuous infusion of the muscle relaxant cisatracurium. They observed the animals while awake receiving anodal or cathodal HD-tDCS applied to the PFC. During the cathodal stimulation, they found disruption of functional connectivity patterns, enhanced structure-function correlations, a decrease in Shannon entropy, and a transition towards patterns that were more commonly anatomically based. In contrast, under propofol anesthesia, anodal HD-tDCS stimulation appreciably altered the brain connectivity patterns and decreased the correlation between structure and function. The PFC stimulations altered patterns associated with consciousness as well as those associated with unconsciousness.

      Strengths: 

      The authors carefully executed a set of very challenging experiments that involved applying tDCS in awake and anesthetized non-human primates while conducting functional imaging.

      We thank the Reviewer for summarising our study and for his appreciation of the highly challenging experiments we performed.

      Weaknesses:

      The authors show that tDCS can alter functional connectivity measured by fMRI but they do not make clear what their studies teach the reader about the effects of tDCS on the brain during different states of consciousness. No important finding is stated contrary to what is stated in the abstract. It is also not clear what the work teaches us about how tDCS works nor is it clear what are the "clinical implications for disorders of consciousness." The deep anesthesia is akin to being in a state of coma. This was not discussed.  

      While the authors have executed a set of technically challenging experiments, it is not clear what they teach us about how tDCS works, normal brain neurophysiology, or brain pathological states such as disorders of consciousness.

      We thank the reviewer for his comments. We agree that we could better highlight the value and implications of our work, and we take this opportunity to improve our manuscript according to the suggestions.

      Actions in the text: We have added several new paragraphs in the Discussion section, considering these comments and other related remarks from the Reviewing Editor (see below our answer to the first comment of the Reviewing Editor: REC#1).

      Reviewer #2 (Public review): 

      General comments: 

      The authors investigated the effects of tDCS on brain dynamics in awake and anesthetized monkeys using functional MRI. They claim that cathodal tDCS disrupts the functional connectivity pattern in awake monkeys while anodal tDCS alters brain patterns in anesthetized monkeys. This study offers valuable insight into how brain states can influence the outcomes of noninvasive brain stimulation. However, there are several aspects of the methods and results sections that should be improved to clarify the findings.

      We thank the Reviewer for the summary and appreciation of our study.  

      Major comments 

      For the anesthetized monkeys, the anode location differs between subjects, with the electrode positioned to stimulate the left DLPFC in monkey R and the right DLPFC in monkey N. The authors mention that this discrepancy does not result in significant differences in the electric field due to the monkeys' small head size. However, this is incorrect, as placing the anode on the left hemisphere would result in a much lower EF in the right DLPFC than placing the anode on the right side. Running an electric field simulation would confirm this. Additionally, the small electrode size suggested by the Easy cap configuration for NHP appears sufficient to stimulate the targeted regions focally. If this interpretation is correct, the authors should provide additional evidence to support their claim, such as a computational simulation of the EF distribution.

      We thank the Reviewer for the comments. First, regarding the reviewer’s statement that placing the anode on the left hemisphere would result in a much lower EF in the right DLPFC than placing the anode on the right side, we would like to clarify that we did not use a typical 4 x 1 concentric ring high-definition setup (which consists of a small centre electrode surrounded by four return electrodes), but a two-electrode montage, with one electrode over the left or right PFC and the other one over the contralateral occipital cortex. According to EF modelling papers, a 4 x 1 high-definition setup would produce an EF that is focused and limited to the cortical area circumscribed by the ring of the return electrodes (Datta et al. 2009; Alam et al. 2016). Therefore, targeting the left or right DLPFC with a 4 x 1 setup would produce an EF confined to the targeted hemisphere of the PFC. In contrast, we expect the brain current flow generated with our 2-electrode setup to be broader, despite the small size of the electrodes,  because there is no constraint from return electrodes. Thus, with our setup, the current is expected to flow between the PFC and the occipital cortex (see also our responses to comments R3.3., R.E.C.#2.1. and R.E.C.#2.2.). 

      Second, we would like to point out that in awake experiments, in which we stimulated the right PFC of both monkeys, there was no gross evidence of left or right asymmetry in the computed functional connectivity patterns (Figure 3A, Figure 3 - figure supplement 2A; Figure 5A). These results, showing that our stimulation montages did not induce asymmetric dynamic FC changes in NHPs, support the idea that our setups did not generate EFs that were spatially focused enough to alter brain activity in one hemisphere substantially more than the other.

      Third, it is also worth noting that current evidence suggests that human brains are significantly more lateralized than those of macaques. Macaque monkeys have been found to have some degree of lateralized networks, but these are of lower complexity, and the lateralization is less pronounced and functionally organized than in humans (Whey et al., 2014; Mantini et al., 2013). This suggests that, even if the stimulation were focal enough to stimulate the left or the right part of the PFC only, the behavioural effects would likely be similar.

      We strongly agree with the reviewer that conducting an EF simulation would be valuable to confirm our expectations and to gain a comprehensive view of the characteristics of the EFs generated with our different setups in NHPs. However, the challenge is that EF computational models have been developed for humans, and their use in NHPs is not straightforward due to significant anatomical differences. For example, macaque monkeys are distinct from humans in terms of brain size, shape and cortical organisation, skull thickness, and the presence of muscles, as well as different tissue conductivities (Lee et al. 2015; Datta et al. 2016; Mantell et al. 2023). We plan to address this in future work.

      Actions in the text: In the Materials and Methods section, we have modified the sentence: “Because of the small size of the monkey's head and because we did not use return electrodes to restrict the current flow (as is achieved with typical high-definition montages (Datta et al. 2009; Alam et al. 2016)), we expected that tDCS stimulation with the two symmetrical montages would result in nearly equivalent electric fields across the monkey’s head and produce roughly similar effects on brain activity.” 

      We also added a new sentence about EF simulation: 

      “This would need to be confirmed by running an electric field simulation. However, computational electric field models have been developed for humans, and their use in NHPs is not straightforward due to anatomical specificities. Indeed, monkeys differ from humans in terms of brain size, shape and cortical organization, skull thickness, tissue conductivities and the presence of muscles (Lee et al. 2015; Datta et al. 2016; Mantell et al. 2023). Modelling of EFs generated with the specific tDCS montages employed in this study will be performed in future work.”

      For the anesthetized monkeys, the authors applied 1 mA tDCS first, followed by 2 mA tDCS. A 20-minute stimulation duration of 1 mA tDCS is strong enough to produce after-effects that could influence the brain state during the 2 mA tDCS. This raises some concerns. Previous studies have shown that 1 mA tDCS can generate EF of over 1 V/m in the brain, and the effects of stimulation are sensitive to brain state (e.g., eye closed vs. eye open). How do the authors ensure that there are no after-effects from the 1 mA tDCS? This issue makes it challenging to directly compare the effects of 1 mA and 2 mA stimulation.

      We agree with the reviewer's comment that 1 mA tDCS may induce aftereffects, as has been observed in several human studies (e.g., Jamil et al. 2017, 2020). Although the differences between the 1 mA post-stimulation and baseline conditions were not significant in our analyses, it is still possible that the stimulation produced some effects below the threshold of significance that may contribute, albeit weakly, to the changes observed during and after 2 mA stimulation. We have, therefore, amended the paper in line with the reviewer's comments.

      Actions in the text: We have added the following text in the Result section: 

      “While several human studies have reported that 1 mA transcranial stimulation induces aftereffects (e.g., Jamil et al. 2017, 2020; Monte-Silva et al. 2010), the differences between the 1 mA post-stimulation and baseline conditions were not significant in our analyses. However, it is still possible that the 1 mA stimulation produced some effects below the threshold of significance that may contribute to the changes observed during and after the 2 mA stimulation.”

      The occurrence rate of a specific structural-functional coupling pattern among random brain regions shows significant effects of tDCS. However, these results seem counterintuitive. It is generally understood that noninvasive brain stimulation tends to modulate functional connectivity rather than structural or structural-functional connectivity. How does the occurrence rate of structural-functional coupling patterns provide a more suitable measure of the effectiveness of tDCS than functional connectivity alone? I would recommend that the authors present the results based on functional connectivity itself. If there is no change in functional connectivity, the relevance of changes in structural-functional coupling might not translate into a meaningful alteration in brain function, making it unclear how significant this finding is without corresponding functional evidence.

      First of all, we would like to make it clear that the occurrence rate of patterns as a function of their SFC is not intended to be used or seen as a ‘better’ measure of the efficacy of tDCS. Instead, it is one aspect of the effects of tDCS on whole-brain functional cortical dynamics, obtained from refined measures (phase coherences), that specifically addresses the coupling between structure and function. This type of analysis is further motivated by its increasing use in the literature due to its suspected relationship to wakefulness (e.g., Barttfeld et al. 2015; Demertzi et al. 2019; Castro et al. 2023). Also, in our analysis, the structure is kept constant: the connectivity matrix used to correlate the functional brain states is always the same (CoCoMac82). Thus, the influence of tDCS on the structure-function coupling can only be explained by a modulation of the functional aspects, as suggested by intuition and previous results.
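
      The idea of correlating a functional pattern against a fixed structural matrix can be sketched as follows. This is a minimal illustration with toy 4 x 4 matrices and a Pearson correlation over the upper-triangular entries, which is an assumption made here; it is not the analysis code of the study, the matrices are not CoCoMac82, and all function names are hypothetical.

```python
from math import sqrt


def upper_triangle(matrix):
    """Flatten the strictly upper-triangular entries of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def structure_function_correlation(pattern, structure):
    """Correlate one functional pattern with the (fixed) structural matrix."""
    return pearson(upper_triangle(pattern), upper_triangle(structure))


# Toy structural matrix: a chain of four regions (hypothetical, held constant)
structure = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0]]

# A functional pattern that mirrors the anatomy...
pattern_anatomical = [[1.0, 0.8, 0.1, 0.0],
                      [0.8, 1.0, 0.7, 0.1],
                      [0.1, 0.7, 1.0, 0.9],
                      [0.0, 0.1, 0.9, 1.0]]

# ...and one that departs from it
pattern_flexible = [[1.0, -0.2, 0.6, 0.5],
                    [-0.2, 1.0, 0.1, 0.7],
                    [0.6, 0.1, 1.0, -0.3],
                    [0.5, 0.7, -0.3, 1.0]]

sfc_anatomical = structure_function_correlation(pattern_anatomical, structure)
sfc_flexible = structure_function_correlation(pattern_flexible, structure)
print(sfc_anatomical, sfc_flexible)
```

      Because `structure` never changes between conditions in this sketch, any shift in the distribution of such correlations across conditions can only come from the functional patterns themselves.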

      Second, we agree with the reviewer that studying the functional changes induced by tDCS alone could be valuable. However, the metrics usually used in FC analysis are static. FC states are either computed by averaging spatial correlations over time and then analyzed through, for instance, graph-theoretical properties (or by directly computing element-wise differences), or obtained by computing spatial correlations over a sliding time window, after which the properties of the visited FC states are analyzed in a similar way. Because these metrics are static, they would show no difference if the states visited are essentially the same (which is expected from non-invasive neuromodulation that has not yet demonstrated a strong or characteristic impact) while the dynamical process of visiting those states changes. In resting-state fMRI, differences in FC are also hard to interpret because between-session, within-condition differences are usually found, with some degree of variance for each condition. Interpreting between-condition differences is therefore tricky when the modulations of the system’s activity are subtle. On the other hand, more subtle differences can be captured by more detailed analyses: using phase-based methods, as we did; incorporating a statistical learning component that accounts for the dynamics of the system (for instance, supervised learning followed by temporal and transition-based analyses, as we did); and adding dimensions along which the analysis can be interpreted. In our case, we were interested in characterizing resting-state differences between stimulation conditions, which have nuanced and subtle interactions with the biological system.

      As such, classical measures of differences between FC states are unlikely to be refined and precise enough. In fact, we provide additional files investigating those classically used measures, such as differences in average FC matrices and changes in the functional graph properties (modularity, efficiency and density) of the visited FC states. For the former, comparing region-to-region FCs yields very few statistically significant results. For the latter, virtually no differences are observed in the properties of the functional states visited.

      These results suggest, as expected, that the brain states visited across the different stimulation conditions are topologically quite similar, and that only very few region-specific pairwise functional connectivities are modulated by specific tDCS montages. By contrast, the dynamical process dictating how brain activity passes from one state to another is influenced, as shown in a more apparent and meaningful way by the dynamical analysis presented in the main figures: the effect depends on the montage, is somewhat consistent across the post-stimulation conditions, and can be made sense of by considering the theoretical neuromodulatory effects of near-anodal versus near-cathodal stimulation.
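
      The distinction between the set of states visited and the dynamics of visiting them can be illustrated with a toy example. The sketch below assumes that each time point has already been assigned a discrete brain-state label; the two label sequences are invented for illustration and the function names are hypothetical, not taken from the study.

```python
from collections import Counter
from math import log2


def occupancy(labels):
    """Fraction of time points spent in each state."""
    counts = Counter(labels)
    return {state: c / len(labels) for state, c in counts.items()}


def shannon_entropy(probabilities):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


def transition_probabilities(labels):
    """Empirical probability of moving from state a to state b in one step."""
    pair_counts = Counter(zip(labels, labels[1:]))
    origin_counts = Counter(labels[:-1])
    return {(a, b): c / origin_counts[a] for (a, b), c in pair_counts.items()}


# Two invented sequences visiting the same three states with different dynamics
seq_a = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]   # cycles evenly through all states
seq_b = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0]   # dwells mostly in state 0

entropy_a = shannon_entropy(occupancy(seq_a).values())
entropy_b = shannon_entropy(occupancy(seq_b).values())
trans_a = transition_probabilities(seq_a)
trans_b = transition_probabilities(seq_b)

print(set(seq_a) == set(seq_b))   # same repertoire of states
print(entropy_a, entropy_b)       # different occupancy entropy
```

      In this toy case a purely static comparison of which states occur finds no difference between the two sequences, whereas the occupancy entropy and the transition probabilities clearly separate them.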

      Actions in the text: We have added new supplementary files showing the effects of the stimulations on FC matrices and on classical functional graph properties in awake and anesthesia datasets (Supplementary Files 3 & 4).

      We have added new sentences about these new analyses on the effects of the stimulations on FC matrices and on classical functional graph properties in the Results section:

      “In addition, we performed the main analyses separately for the two monkeys, explored the inter-condition variability (Supplementary File 2), and computed classical measures of functional connectivity such as average FC matrices and functional graph properties (modularity, efficiency and density) of the visited FC states (Supplementary File 3).... In contrast, classical FC metrics did not show significant differences across stimulation conditions, highlighting the value of dynamic FC metrics to capture the neuromodulatory effects of tDCS.”

      “Analyses of the two monkeys separately showed that the changes in slope and Shannon entropy were bigger in one of the two monkeys but went in the same direction (Supplementary File 2), while classical FC metrics did not capture any statistical differences between the different stimulation conditions (Supplementary File 3).”

      The authors recorded data from only two monkeys, which may limit the investigation of the group effects of tDCS. As the number of scans for the second monkey in each consciousness condition is lower than that in the first monkey, there is a concern that the main effects might primarily reflect the data from a single monkey. I suggest that the authors should analyze the data for each monkey individually to determine if similar trends are observed in both subjects.

      We agree that the small number of subjects is a limitation of our study. However, we have already addressed these aspects by reporting statistical analyses that consider them, using linear models of such variables, and running them through ANOVA tests. In addition, we experimentally ensured that we recorded a relatively high number of sessions over a period of several years. Regardless, we agree that our study would benefit from further investigation into this matter. We have therefore prepared complementary figures showing the main analysis performed separately for the two monkeys as proposed, as well as further investigations into how the inter-condition variability outmatches the inter-individual variability, which is itself outmatched by intra-individual changes.

      Actions in the text: We have added a supplementary file showing the main analyses performed separately for the two monkeys (Supplementary File 2) and further investigations into the inter-condition variability (Supplementary Files 3 & 4).

      We have added new sentences about these analyses performed separately for the two monkeys in the Results section:

      “In addition, we performed the main analyses separately for the two monkeys, explored the inter-condition variability (Supplementary File 2), and computed classical measures of functional connectivity such as average FC matrices and functional graph properties (modularity, efficiency and density) of the visited FC states (Supplementary File 3). The separate analyses showed that the changes in slope and Shannon entropy were substantially more pronounced in one of the two monkeys, corroborating some of the effects captured in the ANOVA tests.”

      “Analyses of the two monkeys separately showed that the changes in slope and Shannon entropy were bigger in one of the two monkeys but went in the same direction (Supplementary File 2)”.

      Anodal tDCS was only applied to anesthetized monkeys, which limits the conclusion that the authors are aiming for. It raises questions about the conclusion regarding brain state dependency. To address this, it would be better to include the cathodal tDCS session for anesthetized monkeys. If cathodal tDCS changes the connectivity during anesthesia, it becomes difficult to argue that the effects of cathodal tDCS vary depending on the state of consciousness as discussed in this paper. On the other hand, if cathodal tDCS would not produce any changes, the conclusion would then focus on the relationship between the polarity of tDCS and consciousness. In that case, the authors could maintain their conclusion but might need to refine it to reflect this specific relationship more accurately. 

      We agree with the reviewer that it would have been interesting to investigate the effects of cathodal tDCS in anesthetized monkeys. However, due to the challenging nature of the experimental procedures under anesthesia, we had to limit the investigations to only one stimulation modality. We chose to deliver anodal stimulation because, from a translational point of view, we aimed to provide new information on the effects of tDCS under anesthesia as a model for disorders of consciousness. It also made much more sense to increase the cortical excitability of the prefrontal cortex in an attempt to wake up the sedated monkeys rather than doing the opposite.

      Actions in the text: We have added a new sentence in the Results section:

      “Due to the challenging nature of the experimental procedures under anesthesia, we limited the investigations to only one stimulation modality. We chose to deliver anodal stimulation to provide new information on the effects of tDCS under anesthesia as a model for disorders of consciousness and to increase the cortical excitability of the PFC in an attempt to wake up the sedated monkeys.”

      Reviewer #3 (Public review): 

      Summary: 

      This study used transcranial direct current stimulation administered using small 'high-definition' electrodes to modulate neural activity within the non-human primate prefrontal cortex during both wakefulness and anaesthesia. Functional magnetic resonance imaging (fMRI) was used to assess the neuromodulatory effects of stimulation. The authors report on the modification of brain dynamics during and following anodal and cathodal stimulation during wakefulness and following anodal stimulation at two intensities (1 mA, 2 mA) during anaesthesia. This study provides some possible support that prefrontal direct current stimulation can alter neural activity patterns across wakefulness and sedation in monkeys. However, the reported findings need to be considered carefully against several important methodological limitations. 

      Strengths: 

      A key strength of this work is the use of fMRI-based methods to track changes in brain activity with good spatial precision. Another strength is the exploration of stimulation effects across wakefulness and sedation, which has the potential to provide novel information on the impact of electrical stimulation across states of consciousness.

      We thank the Reviewer for the summary and for highlighting the strengths of our study. 

      Weaknesses: 

      The lack of a sham stimulation condition is a significant limitation, for instance, how can the authors be sure that results were not affected by drowsiness or fatigue as a result of the experimental procedure?

      We agree with the reviewer that adding control conditions could have strengthened our study. Control conditions usually consist of a sham condition or active control conditions. However, as mentioned in response to one of Reviewer 2's comments (R.2.5), we had to make choices, as we could not perform as many experiments as we would have liked due to their demanding nature, especially under anesthesia.

      In the awake state, we acquired data with two experimental conditions; the monkeys were exposed to either anodal (F4/O1) or cathodal (O1/F4) PFC tDCS. As anodal tDCS of the PFC induced only minor changes in brain dynamics, it could be considered as an active control condition for the cathodal condition, which had striking effects on the cortical dynamics. It is also worth noting that doubts have been raised about the neurobiological inertness of certain sham protocols. Indeed, different sham protocols have been employed in the literature, some of which may produce unintended effects (Fonteneau et al. 2019). Therefore, active control conditions, such as reversing the polarity of the stimulation or targeting a different brain region, have been proposed to provide better control (Fonteneau et al. 2019). Furthermore, in the context of experiments performed under anesthesia, the relevance of a sham control condition typically used to achieve adequate blinding is questionable.

      With regard to drowsiness and fatigue as a result of the experimental procedure, we agree with the reviewer that this is a common problem in functional imaging due to the length of the recording sessions. We assumed, as was done in previous work (Uhrig, Dehaene, and Jarraya 2014; Wang et al. 2015), that the monkeys' performance on the fixation task during acquisition would capture these periods of fatigue. Therefore, only sessions with fixation rates above 85% were included in our analysis. 

      Actions in the text: We have now specified, in the Materials and Methods section, the fact that only runs with a high fixation rate (> 85%) were included in the study: 

      “To ensure that the results were not biased by fatigue or drowsiness due to the lengthy

      In the anaesthesia condition, the authors investigated the effects of two intensities of stimulation (1 mA and 2 mA). However, a potential confound here relates to the possibility that the initial 1 mA stimulation block might have caused plasticity-related changes in neural activity that could have interfered with the following 2 mA block due to the lack of a sufficient wash-out period. Hence, I am not sure any findings from the 2 mA block can really be interpreted as completely separate from the initial 1 mA stimulation period, given that they were administered consecutively. Several previous studies have shown that same-day repeated tDCS stimulation blocks can influence the effects of neuromodulation (e.g., Bastani and Jaberzadeh, 2014, Clin Neurophysiol; Monte-Silva et al., J. Neurophysiology). 

      We agree with the reviewer’s comment that the initial 1 mA stimulation block might have induced changes in neural activity and that the 20-minute post 1 mA block would not be long enough to wash out these changes. This comment is very similar to the second comment made by Reviewer 2 (R.2.2). Although our experimental data do not support this possibility (as the differences between the 1 mA post-stimulation and baseline conditions were not significant), it is still conceivable that the stimulation produced some effects below the threshold of significance and that these might weakly contribute to the changes observed during and after the 2 mA stimulation. 

      Actions in the text: We have modified the paper according to the reviewers' comments (please see our answer and actions in the text to R.2.2.).

      The different electrode placement for the two anaesthetised monkeys (i.e., Monkey R: F3/O2 montage, Monkey N: F4/O1 montage) is problematic, as it is likely to have resulted in stimulation over different brain regions. The authors state that "Because of the small size of the monkey's head, we expected that tDCS stimulation with these two symmetrical montages would result in nearly equivalent electric fields across the monkey's head and produce roughly similar effects on brain activity"; however, I am not totally convinced of this, and it really would need E-field models to confirm. It is also more likely that there would in fact be notable differences in the brain regions stimulated as the authors used HD-tDCS electrodes, which are generally more focal.

      We thank the Reviewer for the remark, which is very similar to the first comment from Reviewer 2 (R.2.1). Please see our answer to that comment.

      Actions in the text: We have modified the paper according to the reviewers' comments (please see the actions taken in response to R.2.1.).

      Given the very small sample size, I think it is also important to consider the possibility that some results might also be impacted by individual differences in response to stimulation. For instance, in the discussion (page 9, paragraph 2) the authors contrast findings observed in awake animals versus anaesthetised animals. However, different monkeys were examined for these two conditions, and there were only two monkeys in each group (monkeys J and Y for awake experiments [both male], and monkeys R and N [male and female] for the anaesthesia condition). From the human literature, it is well known that there is a considerable amount of inter-individual variability in response to stimulation (e.g., Lopez-Alonso et al., 2014, Brain Stimulation; Chew et al., 2015, Brain Stimulation), therefore I wonder if some of these differences could also possibly result from differences in responsiveness to stimulation between the different monkeys? At the end of the paragraph, the authors also state "Our findings also support the use of tDCS to promote rapid recovery from general anesthesia in humans...and suggest that a single anodal prefrontal stimulation at the end of the anesthesia protocol may be effective." However, I'm not sure if this statement is really backed-up by the results, which failed to report "any behavioural signs of awakening in the animals" (page 7)?

      We thank the Reviewer for this comment. Because working with non-human primates is expensive and labor intensive, the sample sizes in classical macaque experiments are generally small (typically 2-4 subjects per experiment). Our sample size (i.e. 2 rhesus macaques in awake experiments and 2 macaques under sedation, 11 +/- 9 scan sessions per animal, 288 and 136 runs in the awake and anesthesia state, respectively) is comparable to other previous work in non-human primates using fMRI (Milham et al. 2018; Yacoub et al. 2020; Uchimura, Kumano, and Kitazawa 2024). In addition, we would like to point out that the baseline cortical dynamics we found before stimulation, whether in the awake or sedated state, are comparable to previous studies (Barttfeld et al. 2015; Uhrig et al. 2018; Tasserie et al. 2022). This suggests our results are reproducible across datasets, despite the small sample size.

      That being said, we agree with the reviewer that inter-individual variability in response to stimulation can be considerable, as shown by a large body of literature in the field. It seems possible that the two monkeys studied in each condition responded differently to the stimulation. But even if that’s the case, our results suggest that at least in one of the two monkeys, cathodal PFC stimulation in the awake state and anodal PFC stimulation under propofol anesthesia induced striking changes in brain dynamics, which we believe is a significant contribution to the field. 

      In fact, the supplementary analysis proposed by Reviewer 2 (cf. R2.4), which examined how the different measures we used were affected by tDCS in each animal, shows that the effects are indeed more pronounced and more significant in monkey Y than in monkey J. Nevertheless, the effects observed in monkey J are congruent with those observed in monkey Y and at the population level, although less marked. We also show that this inter-individual variability is smaller than the inter-condition variability (as indicated by our initially strong statistical results at the population level). Thus, even though responses differ between subjects, the effects observed at the population level cannot be accounted for solely by subject-specific differences.

      Lastly, the Reviewer questioned whether our results support that a single anodal prefrontal stimulation at the end of the anesthesia protocol could effectively promote rapid recovery from general anesthesia, because the stimulation did not wake the animals in our experiments. It should be emphasized that in our case, the monkeys were stimulated while they were still receiving continuous propofol perfusion. In contrast, during the recovery process from anesthesia, the delivery of the anesthetic drug is stopped. It is therefore conceivable that anodal PFC tDCS, which successfully enriched brain dynamics in sedated monkeys in our experiments, may accelerate the recovery from anesthesia when the drug is no longer administered. 

      Actions in the text: We have added a line in the Materials and Methods to compare to other studies:

      “Our sample size is comparable to previous work in NHP using fMRI (Milham et al. 2018; Yacoub et al. 2020; Uchimura, Kumano, and Kitazawa 2024).”

      Reviewing Editor Comments: 

      In some cases, authors opt to submit a revised manuscript. Should you choose to do so, please be aware that the reviewers have indicated that their appraisal is unlikely to change unless some of the suggested field modelling is incorporated into the work. This may change the evaluation of the strength of evidence, but the final wording will be subject to reviewer discretion. Details for responding to the reviews are provided at the bottom of this email.

      Reviewer #1 (Recommendations for the authors): 

      The work should discuss the implications of their experiments for using tDCS to arouse a patient from a coma. The anesthetized animal is effectively in a drug-induced coma. While they observed connectivity changes, these changes did not map nicely onto behavioral changes. 

      I would suggest that the authors spell out more clearly what they view as the clinical implications of their work in terms of new insights into how tDCS may be used to either understand and or treat disorders of consciousness.

      We thank the Reviewer for his thoughtful comments. We appreciate the opportunity to clarify and expand on the key findings and implications of our work, particularly regarding the new insights into how tDCS can be used to understand and treat disorders of consciousness. We therefore provide a broader perspective on the clinical implications of our experiments regarding coma and disorders of consciousness. We also agree with the Reviewer that the absence of behavioral changes but the presence of functional differences should be more clearly addressed. 

      Actions in the text: We have added a few lines about the relevance of anesthesia as a model for disorders of consciousness in the Introduction part:

      “Anesthesia provides a unique model for studying consciousness, which, similarly to DOC, is characterized by the disruption or even the loss of consciousness (Luppi 2024). Additionally, anesthesia mechanisms involve several subcortical nuclei that are key components of the brain's sleep and arousal circuits (Kelz and Mashour 2019).”

      In the Discussion section, we have modified and expanded a paragraph about the effects of tDCS in DOC patients and how this technique could be further used to study consciousness: “From another clinical perspective, our results demonstrating that 2 mA anodal PFC tDCS decreased the structure-function correlation and modified the dynamic repertoire of brain patterns during anesthesia (Figures 6 and 7) are consistent with the beneficial effects of such stimulation in DOC patients (Thibaut et al., 2014; Angelakis et al., 2014; Thibaut et al., 2017; Zhang et al., 2017; Martens et al., 2018; Cavinato et al., 2019; Wu et al., 2019; Hermann et al., 2020; Peng et al., 2022; Thibaut et al., 2023). Although some clinical trials investigated the effects of stimulating other brain regions, such as the motor cortex (Martens et al., 2019; Straudi et al., 2019) or the parietal cortex (Huang et al., 2017; Guo et al., 2019; Zhang et al., 2022; Wan et al., 2023; Wang et al., 2020), the DLPFC appears to be the most effective target for patients with a minimally conscious state (Liu et al., 2023). In terms of neuromodulatory effects in DOC patients, DLPFC tDCS has been reported to increase global excitability (Bai et al., 2017), increase the P300 amplitude (Zhang et al., 2017; Hermann et al., 2020), improve the fronto-parietal coherence in the theta band (Bai et al., 2018), enhance the putative EEG markers of consciousness (Bai et al., 2018; Hermann et al., 2020) and reduce the incidence of slow-waves in the resting state (Mensen et al., 2020). Our findings further support the PFC as a relevant target for modulating consciousness level and align with growing evidence showing that the PFC plays a key role in conscious access networks (Mashour, Pal, and Brown 2022; Panagiotaropoulos 2024). Nevertheless, we hypothesize that other brain targets for tDCS may be of interest for consciousness restoration, potentially using multi-channel tDCS (Havlík et al., 2023).
Among transcranial electrical stimulation techniques, tDCS has the great advantage of facilitating either excitation or inhibition of brain regions, depending on the polarity of the stimulation. Sdoia et al. (2019) exploited this advantage to investigate the causal involvement of the DLPFC in conscious access to a visual stimulus during an attentional blink paradigm. While conscious access was enhanced by anodal stimulation of the left DLPFC compared to sham stimulation, opposite effects were found with cathodal stimulation compared to sham over the same locus. Finally, this literature and our findings suggest that tDCS constitutes a non-invasive, reversible, and powerful tool for studying consciousness.”

      We have added a new paragraph about patients with cognitive-motor dissociation and dissociation between consciousness and behavioral responsiveness:

      “Changes in the state of consciousness are generally closely associated with changes in behavioural responsiveness, although some rare cases of dissociation have been described. Cognitive-motor dissociation (CMD) is a condition observed in patients with severe brain injury, characterized by behavior consistent with unresponsive wakefulness syndrome or a minimally conscious state minus (Thibaut et al., 2019). However, in these patients, specific cortical brain areas activate in response to mental imagery tasks (e.g., imagining playing tennis or returning home) in a manner indistinguishable from that of healthy controls, as shown through fMRI or EEG (Thibaut et al., 2019; Owen et al., 2006; Monti et al., 2010; Bodien et al., 2024). Thus, although CMD patients are behaviorally unresponsive, they demonstrate cognitive awareness that is not outwardly apparent. It is worth noting that both the structure-function correlation and the rate of the pattern closest to the anatomy were shown to be significantly reduced in unresponsive patients showing command following during mental imagery tasks compared to those who do not show command following (Demertzi et al., 2019). These observations would be compatible with our findings in anesthetized macaques exposed to 2 mA anodal PFC tDCS. The richness of the brain dynamics would be recovered (at least partially, in our experiments), but not the behaviour. This hypothesis also fits with a recent longitudinal fMRI study on patients recovering from coma (Crone et al., 2020). The researchers examined two groups of patients: one group consisted of individuals who were unconscious at the acute scanning session but regained consciousness and improved behavioral responsiveness a few months later, and the second group consisted of patients who were already conscious from the start and only improved behavioral responsiveness at follow-up. 
By comparing these two groups, the authors could distinguish between the recovery of consciousness and the recovery of behavioral responsiveness. They demonstrated that only initially conscious patients exhibited rich brain dynamics at baseline. In contrast, patients who were unconscious in the acute phase and later regained consciousness had poor baseline dynamics, which became more complex at follow-up. Complete recovery of both consciousness and responsiveness under general anesthesia is possible through electrical stimulation of the central thalamus (Redinbaugh et al., 2020; Tasserie et al., 2022).”

      Reviewer #2 (Recommendations for the authors): 

      Method 

      (1) The authors mentioned that they used HD-tDCS in their experiments; however, they used 1 x 1 tDCS, which is not HD-tDCS but rather single-channel tDCS.

      We thank the Reviewing Editor for pointing out this ambiguous wording. We understand that "HD-tDCS", which we used in our paper to refer to high-density 1x1 tDCS (because we used small carbon electrodes instead of the large sponge electrodes employed in conventional tDCS), may cause some confusion with high-definition tDCS, which uses compact ring electrodes and most commonly refers to a 4x1 montage (1 active central electrode over the target area and 4 return electrodes placed around the central electrode).

      Therefore, to avoid any confusion, we will use the term "tDCS" rather than “HD-tDCS” to describe the technique used in this paper and remove mentions of high-density or high-definition tDCS.

      Actions in the text: We have replaced the abbreviation “HD-tDCS” with “tDCS” throughout the paper. We have also removed the sentence about high-definition tDCS from the Introduction (“While conventional tDCS relies on the use of relatively large rectangular pad electrodes, high-density tDCS (HD-tDCS) utilizes more compact ring electrodes, allowing for increased focality, stronger electric fields, and presumably, greater neurophysiological changes (Datta et al. 2009; Dmochowski et al. 2011)”) and the two related citations from the References section.

      (2) Please provide the characteristics of electrodes, including their size, shape, and thickness.

      We thank the Reviewing Editor for this recommendation. We now provide the complete characteristics of the tDCS electrodes used in the paper.

      Actions in the text: We have added a sentence describing the characteristics of the tDCS electrodes in the Materials and Methods section:

      “We used a 1x1 electrode montage with two carbon rubber electrodes (dimensions: 1.4 cm x 1.85 cm, 0.93 cm thick) inserted into Soterix HD-tES MRI electrode holders (base diameter: 25 mm; height: 10.5 mm), which are in contact with the scalp. These electrodes (2.59 cm<sup>2</sup>) are smaller than conventional tDCS sponge electrodes (typically 25 to 35 cm<sup>2</sup>).”
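As a quick arithmetic check of the dimensions quoted above, the following minimal sketch recomputes the electrode area and compares it with the conventional sponge sizes. It is not part of the manuscript. The function names are hypothetical, and the nominal current density assumes that the whole rectangular face conducts uniformly, which the holder and gel interface may not guarantee. The 1 mA and 2 mA values are the intensities mentioned for the anesthesia experiments.

```python
# Illustrative check only; names are hypothetical and not taken from the manuscript.
ELECTRODE_WIDTH_CM = 1.4
ELECTRODE_LENGTH_CM = 1.85


def electrode_area_cm2(width_cm, length_cm):
    """Area of a rectangular electrode face, in cm^2."""
    return width_cm * length_cm


def current_density_ma_per_cm2(current_ma, area_cm2):
    """Nominal current density, assuming uniform conduction over the whole face."""
    return current_ma / area_cm2


area = electrode_area_cm2(ELECTRODE_WIDTH_CM, ELECTRODE_LENGTH_CM)

# Nominal densities for the two intensities used under anesthesia (1 mA and 2 mA).
densities = {current: current_density_ma_per_cm2(current, area) for current in (1.0, 2.0)}

# How many times larger conventional sponge electrodes (25 to 35 cm^2) are.
sponge_ratio = (25.0 / area, 35.0 / area)

print(f"area = {area:.2f} cm^2")
for current, density in densities.items():
    print(f"{current:.0f} mA -> {density:.2f} mA/cm^2 (nominal)")
print(f"sponge electrodes are {sponge_ratio[0]:.1f}x to {sponge_ratio[1]:.1f}x larger")
```

Under these assumptions the area rounds to the 2.59 cm<sup>2</sup> stated in the text, roughly a tenth of the area of a conventional sponge electrode.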

      (3) Could the authors clarify why they chose to stimulate the right DLPFC? Is there a specific rationale for this choice? Additionally, could the authors explain how they ensured that the stimulation targeted the DLPFC, given that the monkey cap might differ from human configurations? In many NHP studies, structural MRI is used to accurately determine electrode placement. Considering that a single channel F4 - O2 montage was used, even a small displacement of the frontal electrode laterally could result in the electric field not adequately covering the DLPFC. Could the authors provide structural MRI images and details of electrode positioning to help readers better understand targeting accuracy?

      We thank the Reviewing Editor for the thoughtful comments and recommendations. We appreciate the opportunity to further clarify our rationale for stimulating the right DLPFC and also the suggestion to provide structural MRI images and details of electrode positioning, which we think will improve the quality of the paper by showing targeting accuracy.

      First, we would like to clarify that our initial decision to stimulate the right PFC in most animals was driven by experimental constraints. Indeed, we had limited access to the left PFC in three of the four macaques, either due to the presence of cement (spreading asymmetrically from the centre of the head) used to fix the head post in awake animals or due to a scar in one of the two animals studied under anesthesia. 

      Second, we agree with the Reviewing Editor on the importance of showing details of electrode positioning and evidence of targeting accuracy across MRI sessions. Therefore, we now provide structural images showing the positions of anodal and cathodal electrodes in almost all acquired sessions: 10 sessions (out of 10) under anesthesia and 30 sessions in the awake state (out of 34 sessions, because we could not acquire structural images in four sessions). These images show that, in anesthesia experiments, the anodal electrode was positioned over the dorsal prefrontal cortex and the cathodal electrode was placed over the contralateral occipital cortex (at the level of the parieto–occipital junction) in both monkeys. In the awake state, the montage still targeted the prefrontal cortex and the occipital cortex, but with a slightly different placement. One of the electrodes was placed over the prefrontal cortex, closer to the premotor cortex than in anesthesia experiments, while the other one was placed over the occipital cortex (V1), slightly more posterior than in anesthesia experiments. These images therefore show that the placement was relatively accurate across sessions and reproducible between monkeys in each of the two arousal conditions.

      Actions in the text: We have added a supplementary file showing electrode positioning in 40 of the 44 acquired MRI sessions (Supplementary File 1). We have also added a new supplement figure (Figure 1 - figure supplement 1) showing electrode positioning in representative MRI sessions of the awake and anesthetized experiments in the main manuscript. 

      We added a few sentences referring to these figures in the Results section: 

      “Representative structural images showing electrode placements on the head of the two awake monkeys are shown in Figure 1 - figure supplement 1A. Supplementary File 1 displays the complete set of structural images, showing that the two electrodes were accurately placed over the prefrontal cortex and the occipital cortex in a reproducible manner across awake sessions.”

      Figure 1 - figure supplement 1. Structural images displaying electrode placements on the head of monkeys. A) Awake experiments. Representative sagittal, coronal and transverse MRI sections, and the corresponding skin reconstruction images showing the position of the prefrontal and the occipital electrodes on the head of monkeys J. and Y. B) Anesthesia experiments. Representative sagittal, coronal and transverse MRI sections, and the corresponding skin reconstruction images showing the position of the prefrontal and occipital electrodes on the head of monkeys R. and N.

      Supplementary File 1 (see attached file). Structural images showing the position of the tDCS electrodes on the monkey's head across sessions. Sagittal, coronal and transverse MRI sections, and corresponding skin reconstruction images showing the position of the prefrontal and occipital electrodes on the monkey's head for each MRI session (except for 4 sessions in which no anatomical scan was acquired). The two electrodes were accurately placed over the prefrontal cortex and the occipital cortex in a reproducible manner across sessions and between the two monkeys studied in each arousal state. In anesthesia experiments, the anodal electrode was placed over the dorsal prefrontal cortex, while the cathodal electrode was positioned over the parieto-occipital junction. In awake experiments, the prefrontal electrode was positioned over the dorsal prefrontal cortex/pre-motor cortex, while the occipital electrode was placed over visual area 1. The position of the two electrodes differed slightly between the anesthetized and awake experiments due to different body positions (the prone position of the sedated monkeys prevented a more posterior position of the occipital electrode) and also due to the presence of a headpost on the head of the two monkeys in awake experiments (the monkeys we worked with in anesthesia experiments did not have a headpost).

      (4) If the authors did not analyze the data for the passive event-related auditory response, it may be helpful to remove the related sentence to avoid potential confusion for readers.

      We thank the Reviewing Editor for the comment. Although we understand the reviewer’s point of view, we decided to keep this information in the paper to inform the reader that the macaques were passively engaged in an auditory task, as this could have some influence on the brain state. In the Materials and Methods section, we already mentioned that the analysis of the cerebral responses to the auditory paradigm is not part of the paper. We have modified the sentence to make it clearer and to avoid potential confusion for readers.

      Actions in the text: We have modified the sentence referring to the passive event-related auditory response in the Materials and Methods section:

      “All fMRI data were acquired while the monkeys were engaged in a passive event-related auditory task, the local-global paradigm, which is based on local and global deviations from temporal regularities (Bekinschtein et al. 2009; Uhrig, Dehaene, and Jarraya 2014). The present paper does not address how tDCS perturbs cerebral responses to local and global deviants, which will be the subject of future work.”

      (5) Could the authors clarify what x(t) represents in the equation? Additionally, it would be better to number the equations.

      We apologize for the confusion: x(t) represents the evolution of the BOLD signals over time. We have numbered the equations as suggested. 

      Actions in the text: We have added explanations about the notation and numbered the equations.

      (6) It would be much better to provide schematic illustrations to explain what the authors did for analyzing fMRI data.

      We thank the Reviewing Editor for the suggestion and now provide a new figure as suggested.  

      Actions in the text: We have added a new figure (Figure 2) graphically showing the overall analysis performed. We have added a sentence about the new Figure 2 in the Results section:  “A graphical overview of the overall analysis is shown in Figure 2.” We have renumbered Figure 2 - supplement figures accordingly.

      Figure 2. fMRI Phase Coherence analysis. A) Left) Animals were scanned before, during and after PFC tDCS stimulation in the awake state (two macaques) or under deep propofol anesthesia (two macaques). Right) Example of z-scored filtered BOLD time series for one macaque, 111 time points with a TR of 2.4 s. B) Hilbert transform of the z-scored BOLD signal of one ROI into its time-varying amplitude A(t) (red) and the real part of the phase φ (green). In blue, we recover the original z-scored BOLD signal as A(t)cos(φ). C) Example of the phase of the Hilbert transform for each brain region at one TR. D) Symmetric matrix of cosines of the phase differences between all pairs of brain regions. E) We concatenated the vectorized upper triangles of the phase-difference matrices for all TRs, all participants and all conditions, separately for each of the two datasets, and obtained the brain patterns using the K-means algorithm. The statistics of these patterns were then analyzed across the different conditions.
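The steps in panels B to E of this caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the analytic signal is computed with a plain (slow) discrete Fourier transform from the standard library rather than an optimized Hilbert routine, the three synthetic "regions" are pure sinusoids standing in for z-scored, filtered BOLD signals, and the final K-means step is not shown.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a plain DFT (O(N^2), fine for ~100 points)."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue  # keep the DC term (and the Nyquist term when n is even)
        if k < (n + 1) // 2:
            spectrum[k] *= 2  # double the positive frequencies
        else:
            spectrum[k] = 0  # discard the negative frequencies
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def phase_coherence(phases):
    """Symmetric matrix of cos(phi_i - phi_j) for one time point (panel D)."""
    return [[math.cos(a - b) for b in phases] for a in phases]


def upper_triangle(matrix):
    """Vectorize the strict upper triangle of a square matrix (panel E)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


# Synthetic stand-in for three regions over 111 TRs (assumption: 5 full cycles).
N_TR = 111
theta = [2 * math.pi * 5 * t / N_TR for t in range(N_TR)]
bold = [
    [math.cos(th) for th in theta],   # region 0
    [math.sin(th) for th in theta],   # region 1: quarter-cycle lag behind region 0
    [-math.cos(th) for th in theta],  # region 2: in anti-phase with region 0
]

analytic = [analytic_signal(series) for series in bold]
amplitude = [[abs(z) for z in region] for region in analytic]        # A(t)
phase = [[cmath.phase(z) for z in region] for region in analytic]    # phi(t)

# One vectorized phase-coherence pattern per TR; these rows would feed K-means.
patterns = [
    upper_triangle(phase_coherence([phase[r][t] for r in range(len(bold))]))
    for t in range(N_TR)
]
```

In this toy case the coherence between regions 0 and 2 stays close to -1 at every TR, while the pairs involving region 1 stay close to 0, which is what the cosine of a constant phase lag predicts. Stacking such rows across runs, animals and conditions gives the matrix that a clustering routine would then partition into brain patterns.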

      Results 

      (1) In Figures 3A, 5A, and 6A showing brain connectivity, it is difficult to relate the connectivity variability among the brain regions. Instead of displaying connection lines for nodes, it would be more effective if the authors highlighted significant, strong connectivity within specific brain regions using additional methods, such as bootstrapping.

      We thank the Reviewing Editor for the comment and suggestion. The connection lines represent all synchronizations above 0.5 and all anti-synchronizations below -0.5 between all pairs of brain regions. As the comment suggests, one element we had not addressed is the heterogeneity in coherence between individual brain regions. We therefore propose additional supplementary figures showing, for all centroids mentioned in the main figures, the variance of each brain region's distribution of phase coherence with the rest of the brain. A high value indicates that a region's coherence with the rest of the brain spans a wide range of values, while a low value indicates that its coherence values with the other regions are similar. Thus, a brain with a uniform color would indicate high homogeneity in coherence among brain regions, while sharp changes in color would reveal that certain regions have a higher variance in their coherence distributions. We expect these new figures to expose more clearly the connectivity variability among brain regions.
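The two quantities described in this answer, the thresholded connection lines and the per-region variance of coherence, can be illustrated with a small sketch. It is only an assumption-laden example: the function names and the 4-region toy centroid are hypothetical, the diagonal is excluded from each region's distribution, and the population variance is used, neither of which is specified in the response.

```python
from statistics import pvariance


def strong_edges(coherence, threshold=0.5):
    """Pairs with coherence above +threshold (synchronized) or below -threshold (anti-synchronized)."""
    n = len(coherence)
    synchronized, anti_synchronized = [], []
    for i in range(n):
        for j in range(i + 1, n):
            if coherence[i][j] > threshold:
                synchronized.append((i, j))
            elif coherence[i][j] < -threshold:
                anti_synchronized.append((i, j))
    return synchronized, anti_synchronized


def regional_coherence_variance(coherence):
    """Variance of each region's coherence with all other regions (diagonal excluded)."""
    n = len(coherence)
    return [
        pvariance([coherence[i][j] for j in range(n) if j != i])
        for i in range(n)
    ]


# Hypothetical 4-region centroid, for illustration only.
centroid = [
    [1.00, 0.90, 0.80, 0.85],
    [0.90, 1.00, -0.70, 0.10],
    [0.80, -0.70, 1.00, 0.30],
    [0.85, 0.10, 0.30, 1.00],
]

synchronized, anti_synchronized = strong_edges(centroid)
variances = regional_coherence_variance(centroid)
```

In this toy centroid, region 0 is uniformly synchronized with every other region and therefore has the lowest variance, whereas region 1 mixes strong synchronization, strong anti-synchronization and near-zero coherence and has the highest.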

      Actions in the text: We have added new figures showing, for all centroids mentioned in the main figures, the variances in phase-based connectivity of the distributions of coherence (Figure 3 - figure supplement 3; Figure 5 - figure supplement 2; Figure 6 - figure supplement 3; Figure 7 - figure supplement 2). One of them is shown below for the awake-only analysis (Figure 3 - figure supplement 3).

      Figure 3 - figure supplement 3. Variance in inter-region phase coherences of brain patterns. Low values (red and light red) indicate that the distribution of synchronizations between a brain region and the rest of the brain has relatively low variance, while high values (blue and light blue) indicate relatively high variance. Both supra (top) and subdorsal (bottom) views are displayed for each brain pattern from the main figure, ordered as before from left (1) to right (6) by increasing SFC.

      We added a few sentences about variances in phase-based connectivity of the distributions of coherence in the Results section: 

      “Further investigation of the variances in inter-region phase coherences of brain patterns, presented in Figure 3 - figure supplement 3, revealed two main findings. First, all the patterns exhibited some degree of lateral symmetry. Second, except for the pattern with the highest SFC, most patterns displayed high heterogeneity in their coherence variances and striking inter-pattern differences. These observations reflect both the segmentation of distinct functional networks across patterns and a topological organization within the patterns themselves: some regions showed a broader spectrum of synchrony with the rest of the brain, while others exhibited narrower distributions of coherence variances. For instance, unlike other brain patterns, pattern 5 was characterized by a high coherence variance in the frontal premotor areas and low variance in the occipital cortex, whereas pattern 3 had a high variance in the frontal and orbitofrontal regions. In addition, we performed the main analyses separately for the two monkeys, explored the inter-condition variability (Supplementary File 2), and computed classical measures of functional connectivity such as average FC matrices and functional graph properties (modularity, efficiency and density) of the visited FC states (Supplementary File 3).”

      “The variance in inter-regional phase coherence across brain patterns showed notably that pattern 4, in contrast to most other patterns, was characterized by a high variance in frontal premotor areas and a low variance in the occipital cortex (Figure 5 - figure supplement 2).”

      “The variance in inter-region phase coherences of the brain patterns is displayed in Figure 6 - figure supplement 3 and showed a striking heterogeneity between the patterns. For example, pattern 5 had a low overall variance (except in the frontal cortex), while pattern 1 was the only pattern with a high variance in the occipital cortex.”

      “The variance in inter-region phase coherences of brain patterns is displayed in Figure 6 - figure supplement 2.”
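
      As a rough illustration of the quantity mapped in these supplementary figures, the per-region variance can be sketched as follows. This is a minimal sketch, assuming each brain pattern (centroid) is stored as a symmetric region-by-region matrix of phase coherences and that the variance for a region is taken over its coherences with all other regions. The function name, the use of the population variance, and the toy three-region matrix are hypothetical and are not taken from the analysis code.

```python
from statistics import pvariance


def regional_coherence_variance(pattern):
    """Return, for each region, the variance of its coherence with all other regions.

    `pattern` is assumed to be a square matrix (list of lists) in which
    pattern[i][j] is the phase coherence between regions i and j.
    The diagonal (self-coherence) is excluded. Population variance is an
    assumption; the sample variance would work equally well for ranking regions.
    """
    n = len(pattern)
    if any(len(row) != n for row in pattern):
        raise ValueError("pattern must be a square matrix")
    return [
        pvariance([pattern[i][j] for j in range(n) if j != i])
        for i in range(n)
    ]


# Toy example (hypothetical values): region 0 is equally coherent with the
# other two regions, whereas regions 1 and 2 are each coherent with one
# region and anti-coherent with the other.
toy_pattern = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, -0.5],
    [0.5, -0.5, 1.0],
]
variances = regional_coherence_variance(toy_pattern)
```

      In this toy case, region 0 has the narrowest distribution of coherences and regions 1 and 2 the broadest, which corresponds to the low-variance versus high-variance distinction in the figure legend.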

      (2) For both conditions, only 2 to 3 out of 6 patterns showed significant effects of tDCS on the occurrence rate. Is it sufficient to claim the authors' conclusion?

      We thank the Reviewing Editor for the comment. We would like to point out that similar differences in the occurrence rates of specific brain patterns (particularly patterns at the extremities of the SFC scale) have been reported previously. Prior works in patients suffering from disorders of consciousness, in healthy humans, or in non-human primates have shown, using a similar method of analysis, that not all brain states are equally disturbed by loss of consciousness, even across different modalities of unconscious transitioning (Luppi et al. 2021; Z. Huang et al. 2020; Demertzi et al. 2019; Castro et al. 2023; Golkowski et al. 2019; Barttfeld et al. 2015). Therefore, we believe that our conclusions remain supported by the results.

      (3) If the authors want to assert that the brain state significantly influences the effects of tDCS as discussed in the manuscript, further analysis is necessary. First, it would be great to show the difference in connectivity between two consciousness conditions during the baseline (resting state) to see how resting state connectivity or structural connectivity varies. Second, demonstrating the difference in connectivity between the awake and anesthetized conditions (e.g., awake during cathodal vs. anesthetized cathodal) to show how the connectivity among the brain regions was changed by the brain state during tDCS. This would strengthen the authors' conclusion.

      We thank the reviewer for this comment. Firstly, we’d like to clarify that the structural connectivity doesn’t change from one session to another in the same animal and changes only minimally between subjects. Secondly, we agree with the Reviewing Editor that it is informative to show the differences between the baselines and this is what we have done. The results are shown in Figures 5 and 7. Regarding the comparison of the stimulating conditions across arousal levels, the only contrast that we could make is to compare 2 mA anodal awake with 2 mA anodal anesthetized (during and post-stimulation). However, as 2 mA anodal stimulation in the awake state did not affect the connectivity much (compared to the awake baseline), the results would be nearly identical to the comparison of the awake baseline with 2 mA anodal anesthetized, which is shown in Figure 7. Therefore, we believe that this would yield little additional information and more redundancy.

      Reviewer #3 (Recommendations for the authors): 

      Introduction, par 2: HD-tDCS does not necessarily produce stronger electric fields (E-fields) in the brain. The E-field is largely montage-dependent, and some configurations such as the 4x1 configuration can actually have weaker E-fields compared to conventional tDCS designs (i.e., with two sponge electrodes) as electrodes are often closer together resulting in more current being shunted by skull, scalp, and CSF. I would consider re-phrasing this section.

      We agree with the Reviewer that high-definition tDCS does not necessarily produce stronger electric fields in the brain and apologize for the confusion caused by our use of HD-tDCS to refer to high-density tDCS. To avoid any confusion, we have removed the sentence mentioning that HD-tDCS produces stronger electric fields.

      Actions in the text: We have removed the sentence about high-definition tDCS in the Introduction (“While conventional tDCS relies on the use of relatively large rectangular pad electrodes, high-density tDCS (HD-tDCS) utilizes more compact ring electrodes, allowing for increased focality, stronger electric fields, and presumably, greater neurophysiological changes (Datta et al. 2009; Dmochowski et al. 2011)”) and the two related citations in the References section.

    1. Author response:

      General Statements:

      The formation of three-dimensional tubes is a fundamental process in the development of organs and aberrant tube size leads to common diseases and congenital disorders, such as polycystic kidney disease, asthma, and lung hypoplasia. The apical (luminal) extracellular matrix (ECM) plays a critical role in epithelial tube morphogenesis during organ formation, but its composition and organization remain poorly understood. Using the Drosophila embryonic salivary gland as a model, we show that PAPS Synthetase (Papss), an enzyme that synthesizes the universal sulfate donor PAPS, is a critical regulator of tube lumen expansion. Additionally, we identify two zona pellucida (ZP) domain proteins, Piopio (Pio) and Dumpy (Dpy), as key apical ECM components that provide mechanical support to maintain a uniform tube diameter.

      The apical ECM has a distinct composition compared to the basal ECM, featuring a diverse array of components. Many studies of the apical ECM have focused on the role of chitin and its modification, but the composition of the non-chitinous apical ECM and its role, and how modification of the apical ECM affects organogenesis remain elusive. The main findings of this manuscript are listed below.

      (1) Through a deficiency screen targeting ECM-modifying enzymes, we identify Papss as a key enzyme regulating luminal expansion during salivary gland morphogenesis. 

      (2) Our confocal and transmission electron microscopy analyses reveal that Papss mutants exhibit a disorganized apical membrane and condensed aECM, which are at least partially linked to disruptions in Golgi structures and intracellular trafficking. Papss is also essential for cell survival and basal ECM integrity, highlighting the role of sulfation in regulating both apical and basal ECM.

      (3) Salivary gland-specific overexpression of wild-type Papss rescues all defects in Papss mutants, but the catalytically inactive mutant form does not, suggesting that defects in sulfation are the underlying cause of the phenotypes.

      (4) We identify two ZP domain proteins, Piopio (Pio) and Dumpy (Dpy), as key components of the salivary gland aECM. In the absence of Papss, Pio is progressively lost from the aECM, while the Dpy-positive aECM structure is condensed and detaches from the apical membrane, resulting in a narrowed lumen. 

      (5) Mutations in pio or dpy, or in Notopleural (Np), which encodes a matriptase that cleaves Pio, cause the salivary gland lumen to develop alternating bulges and constrictions. Additionally, loss of pio results in loss of Dpy in the salivary gland lumen, suggesting that the Dpy-containing filamentous structure of the aECM is critical for maintaining luminal diameter, with Pio playing an essential role in organizing this structure.

      (6) We further reveal that the cleavage of the ZP domain of Pio by Np is critical for the role of Pio in organizing the aECM structure.

      Overall, our findings underscore the essential role of sulfation in organizing the aECM during tubular organ formation and highlight the mechanical support provided by ZP domain proteins in maintaining tube diameter. Mammals have two isoforms of Papss, Papss1 and Papss2. Papss1 shows ubiquitous expression, with higher levels in glandular cells and salivary duct cells, suggesting a high requirement for sulfation in these cell types. Papss2 shows a more restricted expression, such as in cartilage, and mutations in Papss2 have been associated with skeletal dysplasia in humans. Our analysis of the Drosophila Papss gene, a single ortholog of human Papss1 and Papss2, reveals its multiple roles during salivary gland development. We expect that these findings will provide valuable insights into the function of these enzymes in normal development and disease in humans. Our findings on the key role of two ZP proteins, Pio and Dpy, as major components of the salivary gland aECM also provide valuable information on the organization of the non-chitinous aECM during organ formation.

      We believe that our results will be of broad interest to many cell and developmental biologists studying organogenesis and the ECM, as well as those investigating the mechanisms underlying human diseases associated with conserved mutations.

      Point-by-point description of the revisions:

      We are delighted that all three reviewers were enthusiastic about the work. Their comments and suggestions have improved the paper. The details of the changes we have made in response to each reviewer’s comments are included in italicized text below.

      Reviewer #1 (Evidence, reproducibility and clarity):

      PAPS is required for all sulfotransferase reactions in which a sulfate group is covalently attached to amino acid residues of proteins or to side chains of proteoglycans. This sulfation is crucial for properly organizing the apical extracellular matrix (aECM) and expanding the lumen in the Drosophila salivary gland. Loss of Papss potentially leads to decreased sulfation, disorganizing the aECM, and defects in lumen formation. In addition, Papss loss destabilizes the Golgi structures.

      In Papss mutants, several changes occur in the salivary gland lumen of Drosophila. The tube lumen is very thin and shows irregular apical protrusions. There is a disorganization of the apical membrane and a compaction of the apical extracellular matrix (aECM). The Golgi structures and intracellular transport are disturbed. In addition, the ZP domain proteins Piopio (Pio) and Dumpy (Dpy) lose their normal distribution in the lumen, which leads to condensation and dissociation of the Dpy-positive aECM structure from the apical membrane. This results in a thin and irregularly dilated lumen.

      (1) The authors describe various changes in the lumen in mutants, from thin lumen to irregular expansion. I would like to know the correct lumen diameter, and length, besides the total area, by which one can recognize thin and irregular.

      We have included quantification of the length and diameter of the salivary gland lumen in the stage 16 salivary glands of control, Papss mutant, and salivary gland-specific rescue embryos (Figure 1J, K). As described, Papss mutant embryos have two distinct phenotypes, one group with a thin lumen along the entire lumen and the other group with irregular lumen shapes. Therefore, we separated the two groups for quantification of lumen diameter. Additionally, we have analyzed the degree of variability for the lumen diameter to better capture the range of phenotypes observed (Figure 1K’). These quantifications enable a more precise assessment of lumen morphology, allowing readers to distinguish between thin and irregular lumen phenotypes.
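
      For readers who wish to reproduce a comparable variability measure, one common choice is the coefficient of variation of diameters measured at several positions along a single lumen. The following is a minimal sketch under that assumption; the response does not specify which variability metric was used for Figure 1K’, and the function name and the example measurements are hypothetical.

```python
from statistics import mean, pstdev


def diameter_variability(diameters):
    """Coefficient of variation (population SD / mean) of lumen diameters.

    `diameters` is assumed to be a list of diameter measurements taken at
    several positions along one salivary gland lumen, all in the same unit.
    A uniformly thin lumen gives a value near 0; a lumen with alternating
    narrow and expanded regions gives a larger value.
    """
    if len(diameters) < 2:
        raise ValueError("need at least two measurements")
    avg = mean(diameters)
    if avg <= 0:
        raise ValueError("mean diameter must be positive")
    return pstdev(diameters) / avg


# Hypothetical measurements (arbitrary units), not data from the study.
uniform_lumen = [4, 4, 4, 4]      # same diameter everywhere
irregular_lumen = [2, 6, 2, 6]    # alternating narrow and expanded regions

cv_uniform = diameter_variability(uniform_lumen)
cv_irregular = diameter_variability(irregular_lumen)
```

      Because the measure is normalized by the mean, it separates a uniformly thin lumen (small mean, low variability) from an irregular one (high variability), which the mean diameter alone cannot do.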

      (2) The rescue is about 30%, which is not as good as expected. Maybe the wrong isoform was taken. Is it possible to find out which isoform is expressed in the salivary glands, e.g., by RNA in situ Hyb? This could then be used to analyze a more focused rescue beyond the paper.

      Thank you for this point, but we do not agree that the rescue is about 30%. In Papss mutants, about 50% of the embryos show the thin lumen phenotype whereas the other 50% show irregular lumen shapes. In the rescue embryos with a WT Papss, few embryos showed thin lumen phenotypes. About 40% of the rescue embryos showed “normal, fully expanded” lumen shapes, and the remaining 60% showed either irregular (thin+expanded) or slightly overexpanded lumen. It is not uncommon that rescue with the Gal4/UAS system results in a partial rescue because it is often not easy to achieve the balance of the proper amount of the protein with the overexpression system. 

      To address the possibility that the wrong isoform was used, we performed in situ hybridization to examine the expression of different Papss splice forms in the salivary gland. We used probes that detect subsets of splice forms: A/B/C/F/G, D/H, and E/F/H, and found that all probes showed expression in the salivary gland, with varying intensities. The original probe, which detects all splice forms, showed the strongest signals in the salivary gland compared to the new probes which detect only a subset. However, the difference in the signal intensity may be due to the longer length of the original probe (>800 bp) compared to other probes that were made with much smaller regions (~200 bp). Digoxigenin in the DIG labeling kit for mRNA detection labels the uridine nucleotide in the transcript, and the probes with weaker signals contain fewer uridines (all: 147; ABCFG, 29; D, 36; EFH, 66). We also used the Papss-PD isoform for a salivary gland-specific rescue experiment and obtained similar results to those with Papss-PE (Figure 1I-L, Figure 4D and E).

      Furthermore, we performed additional experiments to validate our findings. We performed a rescue experiment with a mutant form of Papss that has mutations in the critical residues of the catalytic domains of the enzyme, which failed to rescue any phenotypes, including the thin lumen phenotype (Figure 1H, J-L), the number and intensity of WGA puncta (Figure 3I, I’), and cell death (Figure 4D, E). These results provide strong evidence that the defects observed in Papss mutants are due to the lack of sulfation.

      (3) Crb is a transmembrane protein on the apicolateral side of the membrane. Accordingly, the apicolateral distribution can be seen in the control and the mutant. I believe there are no apparent differences here, not even in the amount of expression. However, the view of the cells (frame) shows possible differences. To be sure, a more in-depth analysis of the images is required. Confocal Z-stack images, with 3D visualization and orthogonal projections to analyze the membranes showing Crb staining together with a suitable membrane marker (e.g. SAS or Uif). This is the only way to show whether Crb is incorrectly distributed. Statistics of several papss mutants would also be desirable and not just a single representative image. When do the observed changes in Crb distribution occur in the development of the tubes, only during stage 16? Is papss only involved in the maintenance of the apical membrane? This is particularly important when considering the SJ and AJ, because the latter show no change in the mutants.

      We appreciate your suggestion to more thoroughly analyze Crb distribution. We adapted a method from a previous study (Olivares-Castiñeira and Llimargas, 2017) to quantify Crb signals in the subapical region and apical free region of salivary gland cells. Using E-Cad signals as a reference, we marked the apical cell boundaries of individual cells and calculated the intensity of Crb signals in the subapical region (along the cell membrane) and in the apical free region. We focused on the expanded region of the SG lumen in Papss mutants for quantification, as the thin lumen region was challenging to analyze. This quantification is included in Figure 2D. Statistical analysis shows that Crb signals were more dispersed in SG cells in Papss mutants compared to WT.

      (4) A change in the ECM is only inferred based on the WGA localization. This is too few to make a clear statement. WGA is only an indirect marker of the cell surface and glycosylated proteins, but it does not indicate whether the ECM is altered in its composition and expression. Other important factors are missing here. In addition, only a single observation is shown, and statistics are missing.

      We understand your concern that WGA localization alone may not be sufficient to conclude changes in the ECM. However, we observed that luminal WGA signals colocalize with Dpy-YFP in the WT SG (Figure 5-figure supplement 2C), suggesting that WGA detects the aECM structure containing Dpy. The similar behavior of WGA and Dpy-YFP signals in multiple genotypes further supports this idea. In Papss mutants with a thin lumen phenotype, both WGA and Dpy-YFP signals are condensed (Figure 5E-H), and in pio mutants, both are absent from the lumen (Figure 6B, D). We analyzed WGA signals in over 25 samples of WT and Papss mutants, observing consistent phenotypes. We have included the number of samples in the text. While we acknowledge that WGA is an indirect marker, our data suggest that it is a reliable indicator of the aECM structure containing Dpy. 

      (5) Reduced WGA staining is seen in papss mutants, but this could be due to other circumstances. To be sure, a statistic with the number of dots must be shown, as well as an intensity blot on several independent samples. The images are from single confocal sections. It could be that the dots appear in a different Z-plane. Therefore, a 3D visualization of the voxels must be shown to identify and, at best, quantify the dots in the organ.

      We have quantified cytoplasmic punctate WGA signals. Using spinning disk microscopy with super-resolution technology (Olympus SpinSR10 Sora), we obtained high-resolution images of cytoplasmic punctate signals of WGA in WT, Papss mutant, and rescue SGs with the WT and mutant forms of Papss-PD. We then generated 3D reconstructed images of these signals using Imaris software (Figure 3E-H) and quantified the number and intensity of puncta. Statistical analysis of these data confirms the reduction of the number and intensity of WGA puncta in Papss mutants (Figure 3I, I’). The number of WGA puncta was restored by expressing WT Papss but not the mutant form. By using 3D visualization and quantification, we have ensured that our results are not limited to a single confocal section and account for potential variations in Z-plane localization of the dots.

      (6) A colocalization analysis (statistics) should be shown for the overlap of WGA with ManII-GFP.

      Since WGA labels multiple structures, including the nuclear envelope and ECM structures, we focused on assessing the colocalization of the cytoplasmic WGA punctate signals and ManII-GFP signals. Standard colocalization analysis methods, such as Pearson’s correlation coefficient or Manders’ overlap coefficient, would be confounded by WGA signals in other tissues. Therefore, we used a fluorescent intensity line profile to examine the spatial relationship between WGA and ManII-GFP signals in WT and Papss mutants (Figure 3L, L’).
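
      The idea behind a line-profile comparison can be sketched as follows. This is only an illustration, assuming that the two channels have been sampled at the same positions along one line drawn through a punctum; the function names and intensity values are hypothetical, and the actual profiles in Figure 3L, L’ may have been generated differently (for example, directly in image analysis software).

```python
def normalize_profile(values):
    """Scale an intensity profile so that its maximum equals 1.

    Normalizing each channel separately lets profiles with very different
    absolute brightness be overlaid on the same axis.
    """
    peak = max(values)
    if peak <= 0:
        raise ValueError("profile must contain a positive intensity")
    return [v / peak for v in values]


def peak_offset(profile_a, profile_b):
    """Distance (in sample positions) between the maxima of two profiles.

    An offset of 0 means both channels peak at the same position along the
    line, as expected when the two signals overlap.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must be sampled at the same positions")
    return profile_a.index(max(profile_a)) - profile_b.index(max(profile_b))


# Hypothetical intensities along one line through a punctum.
wga_profile = [0, 1, 5, 1, 0]
manii_profile = [0, 2, 10, 2, 0]

wga_norm = normalize_profile(wga_profile)
manii_norm = normalize_profile(manii_profile)
offset = peak_offset(wga_profile, manii_profile)
```

      Restricting the comparison to a line through individual cytoplasmic puncta avoids the contribution of WGA signals elsewhere in the image, which is the limitation of whole-image coefficients noted above.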

      (7) I do not understand how the authors describe "statistics of secretory vesicles" as an axis in Figure 3p. The TEM images do not show labeled secretory vesicles but empty structures that could be vesicles.

      Previous studies have analyzed “filled” electron-dense secretory vesicles in TEM images of SG cells (Myat and Andrew, 2002, Cell; Fox et al., 2010, J Cell Biol; Chung and Andrew, 2014, Development). Consistent with these studies, our WT TEM images show these vesicles. In contrast, Papss mutants show a mix of filled and empty structures. For quantification, we specifically counted the filled electron-dense vesicles (now Figure 3W). A clear description of our analysis is provided in the figure legend.

      (8) The quality of the presented TEM images is too low to judge any difference between control and mutants. Therefore, the supplement must present them in better detail (higher pixel number?).

      We disagree that the quality of the presented TEM images is too low. Our TEM images have sufficient resolution to reveal details of many subcellular structures, such as mitochondrial cristae. The PDF file of the original submission may not have been high resolution. To address this concern, we have provided several original high-quality TEM images of both WT and Papss mutants at various magnifications in Figure 2-figure supplement 2. Additionally, we have included low-magnification TEM images of WT and Papss mutants in Figure 2H and I to provide a clearer view of the overall SG lumen morphology.

      (9) Line 266: the conclusion that apical trafficking is "significantly impaired" does not hold. This implies that Papss is essential for apical trafficking, but the analyzed ECM proteins (Pio, Dumpy) are found apically enriched in the mutants, and Dumpy is even secreted. Moreover, they analyze only one marker, Sec15, and don't provide data about the quantification of the secretion of proteins.

      We agree and have revised our statement to “defective sulfation affects Golgi structures and multiple routes of intracellular trafficking”. 

      (10) DCP-1 was used to detect apoptosis in the glands to analyze acellular regions. However, the authors compare ST16 control with ST15 mutant salivary glands, which is problematic. Further, it is not commented on how many embryos were analyzed and how often they detect the dying cells in control and mutant embryos. This part must be improved.

      Thank you for the comment. We agree and have included quantification. We used stage 16 samples from WT and Papss mutants to quantify acellular regions. Since DCP-1 signals are only present at a specific stage of apoptosis, some acellular regions do not show DCP-1 signals. Therefore, we counted acellular regions regardless of DCP-1 signals. We also quantified this in rescue embryos with WT and mutant forms of Papss, which show complete rescue with WT and no rescue with the mutant form, respectively. The graph with a statistical analysis is included (Figure 4D, E).

      (11) WGA and Dumpy show similar condensed patterns within the tube lumen. The authors show that dumpy is enriched from stage 14 onwards. How is it with WGA? Does it show the same pattern from stage 14 to 16? Papss mutants can suffer from a developmental delay in organizing the ECM or lack of internalization of luminal proteins during/after tube expansion, which is the case in the trachea.

      Dpy-YFP and WGA show overlapping signals in the SG lumen throughout morphogenesis. Dpy-YFP is enriched in the SG lumen from stage 11, not stage 14 (Figure 5-figure supplement 2). WGA is also detected in the lumen throughout SG morphogenesis, similar to Dpy. In the original supplemental figure, only a stage 16 SG image was shown for co-localization of Dpy-YFP and WGA signals in the SG lumen. We have now included images from stages 14 and 15 in Figure 5-figure supplement 2C.

      Given that luminal Pio signals are lost at stage 16 only and that Dpy signals appear as condensed structures in the lumen of Papss mutants, it suggests that the internalization of luminal proteins is not impaired in Papss mutants. Rather, these proteins are secreted but fail to organize properly. 

      (12) Line 366. Luminal morphology is characterized by bulging and constrictions. In the trachea, bulges indicate the deformation of the apical membrane and the detachment from the aECM. I can see constrictions and the collapsed tube lumen in Fig. 6C, but I don't find the bulges of the apical membrane in pio and Np mutants. Maybe showing it more clearly and with better quality will be helpful.

      Since the bulging phenotype appears to vary from sample to sample, we have revised the description of the phenotype to “constrictions” to more accurately reflect the consistent observations. We quantified the number of constrictions along the entire lumen in pio and Np mutants and included the graph in Figure 6F.

      (13) The authors state that Papss controls luminal secretion of Pio and Dumpy, as they observe reduced luminal staining of both in papss mutants. However, the mCh-Pio and Dumpy-YFP are secreted towards the lumen. Does papss overexpression change Pio and Dumpy secretion towards the lumen, and could this be another explanation for the multiple phenotypes? 

      Thank you for the comment. To clarify, we did not observe reduced luminal staining of Pio and Dpy in Papss mutants, nor did we state that Papss controls luminal secretion of Pio and Dpy. In Papss mutants, Pio luminal signals are absent specifically at stage 16 (Figure 5H), whereas strong luminal Pio signals are present until stage 15 (Figure 5G). For Dpy-YFP, the signals are not reduced but condensed in Papss mutants from stages 14-16 (Figure 5D, H). 

      It remains unclear whether the apparent loss of Pio signals is due to a loss of Pio protein in the lumen or due to epitope masking resulting from protein aggregation or condensation. As noted in our response to Comment 11, internalization of luminal proteins seems unaffected in Papss mutants; proteins like Pio and Dpy are secreted into the lumen but fail to properly organize. Therefore, we have not tested whether Papss overexpression alters the secretion of Pio or Dpy.

      In our original submission, we incorrectly stated that uniform luminal mCh-Pio signals were unchanged in Papss mutants. Upon closer examination, we found these signals are absent in the expanded luminal region in stage 16 SG (where Dpy-YFP is also absent), and weak mCh-Pio signals colocalize with the condensed Dpy-YFP signals (Figure 5C, D). We have revised the text accordingly. 

      Regulation of luminal ZP protein level is essential to modulate the tube expansion; therefore, Np releases Pio and Dumpy in a controlled manner during st15/16. Thus, the analysis of Pio and Dumpy in Np overexpression embryos will be critical to this manuscript to understand more about the control of luminal ZP matrix proteins.

      Thanks for the insightful suggestion. We overexpressed both the WT and mutant form of Np using UAS-Np.WT and UAS-Np.S990A lines (Drees et al., 2019) and analyzed mCh-Pio, Pio antibody, and Dpy-YFP signals. It is important to note that these overexpression experiments were done in the presence of the endogenous WT Np. 

      Overexpression of Np.WT led to increased levels of mCh-Pio, Pio, and Dpy-YFP signals in the lumen and at the apical membrane. In contrast, overexpression of Np.S990A resulted in a near complete loss of luminal mCh-Pio signals. Pio antibody signals remained strong at the apical membrane but were weaker in the luminal filamentous structures compared to WT.

      Due to the GFP tag present in the UAS-Np.S990A line, we could not reliably analyze Dpy-YFP signals because of overlapping fluorescent signals in the same channel. However, the filamentous Pio signals in the lumen co-localized with GFP signals, suggesting that these structures might also include Dpy-YFP, although this cannot be confirmed definitively. 

      These results suggest that overexpressed Np.S990A may act in a dominant-negative manner, competing with endogenous Np and impairing proper cleavage of Pio (and mCh-Pio). Nevertheless, some level of cleavage by endogenous Np still appears to occur, as indicated by the residual luminal filamentous Pio signals. These new findings have been incorporated into the revised manuscript and are shown in Figure 6H and 6I.

      (14) Minor:

      Fig. 5 C': mChe-Pio and Dumpy-YFP are mixed up at the top of the images.

      Thanks for catching this error.  It has been corrected.

      Sup. Fig7. A shows Pio in purple but B in green. Please indicate it correctly.

      It has been corrected.

      Reviewer #1 (Significance):

      In 2023, the functions of Pio, Dumpy, and Np in the tracheal tubes of Drosophila were published. The study here shows similar results, with the difference that the salivary glands do not possess chitin, but the two ZP proteins Pio and Dumpy take over its function. It is, therefore, a significant and exciting extension of the known function of the three proteins to another tube system. In addition, the authors identify papss as a new protein and show its essential function in forming the luminal matrix in the salivary glands. Considering the high degree of conservation of these proteins in other species, the results presented are crucial for future analyses and will have further implications for tubular development, including humans.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary:

      There is growing appreciation for the importance of luminal (apical) ECM in tube development, but such matrices are much less well understood than basal ECMs. Here the authors provide insights into the aECM that shapes the Drosophila salivary gland (SG) tube and the importance of PAPSS-dependent sulfation in its organization and function.

      The first part of the paper focuses on careful phenotypic characterization of papss mutants, using multiple markers and TEM. This revealed reduced markers of sulfation (Alcian Blue staining) and defects in both apical and basal ECM organization, Golgi (but not ER) morphology, number and localization of other endosomal compartments, plus increased cell death. The authors focus on the fact that papss mutants have an irregular SG lumen diameter, with both narrowed regions and bulged regions. They address the pleiotropy, showing that preventing the cell death and resultant gaps in the tube did not rescue the SG luminal shape defects and discussing similarities and differences between the papss mutant phenotype and those caused by more general trafficking defects. The analysis uses a papss nonsense mutant from an EMS screen - I appreciate the rigorous approach the authors took to analyze transheterozygotes (as well as homozygotes) plus rescued animals in order to rule out effects of linked mutations.

      The 2nd part of the paper focuses on the SG aECM, showing that Dpy and Pio ZP protein fusions localize abnormally in papss mutants and that these ZP mutants (and Np protease mutants) have similar SG lumen shaping defects to the papss mutants. A key conclusion is that SG lumen defects correlate with loss of a Pio+Dpy-dependent filamentous structure in the lumen. These data suggest that ZP protein misregulation could explain this part of the papss phenotype.

      Overall, the text is very well written and clear. Figures are clearly labeled. The methods involve rigorous genetic approaches, microscopy, and quantifications/statistics and are documented appropriately. The findings are convincing, with just a few things about the fusions needing clarification.

      Minor comments

      (1) Although the Dpy and Qsm fusions are published reagents, it would still be helpful to mention whether the tags are C-terminal as suggested by the nomenclature, and whether Westerns have been performed, since (as discussed for Pio) cleavage could also affect the appearance of these fusions.

      Thanks for the comment. Dpy-YFP is a knock-in line in which YFP is inserted into the middle of the dpy locus (Lye et al., 2014; the insertion site is available on Flybase). mCh-Qsm is also a knock-in line, with mCh inserted near the N-terminus of the qsm gene by phi-mediated recombination in the qsm<sup>MI07716</sup> line (Chu and Hayashi, 2021; insertion site available on Flybase). Based on this, we have updated the nomenclature from Qsm-mCh to mCh-Qsm throughout the manuscript to accurately reflect the tag position. To our knowledge, no western blot has been performed on Dpy-YFP or mCh-Qsm lines. We have mentioned this explicitly in the Discussion.

      (2) The Dpy-YFP reagent is a non-functional fusion and therefore may not be a wholly reliable reporter of Dpy localization. There is no antibody confirmation. As other reagents are not available to my knowledge, this issue can be addressed with text acknowledgement of possible caveats.

      Thanks for raising this important point. We have added a caveat in the Discussion noting this limitation and the need for additional tools, such as an antibody or a functional fusion protein, to confirm the localization of Dpy.

      (3) TEM was done by standard chemical fixation, which is fine for viewing intracellular organelles, but high pressure freezing probably would do a better job of preserving aECM structure, which looks fairly bad in Fig. 2G WT, without evidence of the filamentous structures seen by light microscopy. Nevertheless, the images are sufficient for showing the extreme disorganization of aECM in papss mutants.

      We agree that HPF is a better method and intend to use the HPF system in future studies. We acknowledge that chemical fixation contributes to the appearance of a gap between the apical membrane and the aECM, which we did not observe with the HPF/FS method (Chung and Andrew, 2014). Despite this, the TEM images still clearly reveal that Papss mutants show a much thinner and more electron-dense aECM compared to WT (Figure 2H, I), consistent with the condensed WGA, Dpy, and Pio signals in our confocal analyses. As the reviewer mentioned, we believe that the current TEM data are sufficient to support the conclusion of severe aECM disorganization and Golgi defects in Papss mutants.

      (4) The authors may consider citing some of the work that has been done on sulfation in nematodes, e.g. as reviewed here: https://pubmed.ncbi.nlm.nih.gov/35223994/ Sulfation has been tied to multiple aspects of nematode aECM organization, though not specifically to ZP proteins.

      Thank you for the suggestion. Pioneering studies in C. elegans have highlighted the key role of sulfation in diverse developmental processes, including neuronal organization, reproductive tissue development, and phenotypic plasticity. We have now cited several works.  

      Reviewer #2 (Significance):

      This study will be of interest to researchers studying developmental morphogenesis in general and specifically tube biology or the aECM. It should be particularly of interest to those studying sulfation or ZP proteins (which are broadly present in aECMs across organisms, including humans).

      This study adds to the literature demonstrating the importance of luminal matrix in shaping tubular organs and greatly advances understanding of the luminal matrix in the Drosophila salivary gland, an important model of tubular organ development and one that has key matrix differences (such as no chitin) compared to other highly studied Drosophila tubes like the trachea.

      The detailed description of the defects resulting from papss loss suggests that there are multiple different sulfated targets, with a subset specifically relevant to aECM biology. A limitation is that specific sulfated substrates are not identified here (e.g. are these the ZP proteins themselves or other matrix glycoproteins or lipids?); therefore it's not clear how direct or indirect the effects of papss are on ZP proteins. However, this is clearly a direction for future work and does not detract from the excellent beginning made here.

      My expertise: I am a developmental geneticist with interests in apical ECM

      Reviewer #3 (Evidence, reproducibility and clarity):

      In this work Woodward et al. focus on the apical extracellular matrix (aECM) in the tubular salivary gland (SG) of Drosophila. They provide new insights into the composition of this aECM, formed by ZP proteins, in particular Pio and Dumpy. They also describe the functional requirements of PAPSS, a critical enzyme involved in sulfation, in regulating the expansion of the lumen of the SG. A detailed cellular analysis of Papss mutants indicates defects in the apical membrane, the aECM and in Golgi organization. They also find that Papss controls the proper organization of the Pio-Dpy matrix in the lumen. The work is well presented and the results are consistent.

      Main comments

      - This work provides a detailed description of the defects produced by the absence of Papss. In addition, it provides many interesting observations at the cellular and tissular level. However, this work lacks a clear connection between these observations and the role of sulfation. Thus, the mechanisms underlying the phenotypes observed are elusive. Efforts directed to strengthen this connection (ideally experimentally) would greatly increase the interest and relevance of this work.

      Thank you for this thoughtful comment. To directly test whether the phenotypes observed in Papss mutants are due to the loss of sulfation activity, we generated transgenic lines expressing catalytically inactive forms of Papss, UAS-PapssK193A, F593P, in which key residues in the APS kinase and ATP sulfurylase domains are mutated. Unlike WT UAS-Papss (both the Papss-PD or Papss-PE isoforms), the catalytically inactive UAS-Papssmut failed to rescue any of the phenotypes, including the thin lumen phenotype (Figure 1I-L), altered WGA signals (Figure I, I’) and the cell death phenotype (Figure 4D, E). These findings strongly support the conclusion that the enzymatic sulfation activity of Papss is essential for the developmental processes described in this study.  

      - A main issue that arises from this work is the role of Papss at the cellular level. The results presented convincingly indicate defects in Golgi organization in Papss mutants. Therefore, the defects observed could stem from general defects in the secretion pathway rather than from specific defects in sulfation. This could even underlie general/catastrophic cellular defects and lead to cell death (as observed).

      This observation has different implications. Is this effect observed in SGs also observed in other cells in the embryo? If Papss has a general role in Golgi organization this would be expected, as Papss encodes the only PAPS synthetase in Drosophila.

      Can the authors test any other mutant that specifically affects Golgi organization and investigate whether this produces a similar phenotype to that of Papss?

      Thank you for the comment. To address whether the defects observed in Papss mutants stem from general disruption of the secretory pathway due to Golgi disorganization, we examined mutants of two key Golgi components: Grasp65 and GM130. 

      In Grasp65 mutants, we observed significant defects in SG lumen morphology, including highly irregular SG lumen shape and multiple constrictions (100%; n=10/10). However, the lumen was not uniformly thin as in Papss mutants. In contrast, GM130 mutants–although this line was very sick and difficult to grow–showed relatively normal salivary gland morphology in the few embryos that survived to stage 16 (n=5/5). It is possible that only embryos with mild phenotypes progressed to this stage, limiting interpretation. These data have now been included in Figure 3-figure supplement 2. Overall, while Golgi disruption can affect SG morphology, the specific phenotypes seen in Papss mutants are not fully recapitulated by Grasp65 or GM130 loss. 

      - A model that conveys the different observations and that proposes a function for Papss in sulfation and Golgi organization (independent or interdependent?) would help to better present the proposed conclusions. In particular, the paper would be more informative if it proposed a mechanism or hypothesis of how sulfation affects SG lumen expansion. Is sulfation regulating a factor that in turn regulates Pio-Dpy matrix? Is it regulating Pio-Dpy directly? Is it regulating a product recognized by WGA?

      For instance, investigating Alcian blue or sulfotyrosine staining in pio, dpy mutants could help to understand whether Pio, Dpy are targets of sulfation.

      Thank you for the comment. We’re also very interested in learning whether the regulation of the Pio-Dpy matrix is a direct or indirect consequence of the loss of sulfation on these proteins. One possible scenario is that sulfation directly regulates the Pio-Dpy matrix by regulating protein stability through the formation of disulfide bonds between the conserved Cys residues responsible for ZP module polymerization. Additionally, the Dpy protein contains hundreds of EGF modules that are highly susceptible to O-glycosylation. Sulfation of the glycan groups attached to Dpy may be critical for its ability to form a filamentous structure. Without sulfation, the glycan groups on Dpy may not interact properly with the surrounding materials in the lumen, resulting in an aggregated and condensed structure. These possibilities are discussed in the Discussion.

      We have not analyzed sulfation levels in pio or dpy mutants because sulfation levels in mutants of single ZP domain proteins may not provide much information. A substantial number of proteoglycans, glycoproteins, and proteins (with up to 1% of all tyrosine residues in an organism’s proteins estimated to be sulfated) are modified by sulfation, so changes in sulfation levels in a single mutant may be subtle. In particular, the existing dpy mutant line carries a transposable element insertion; therefore, the sulfation sites would still remain in this mutant. 

      - Interpretation of Papss effects on Pio and Dpy would be desired. The results presented indicate loss of Pio antibody staining but normal presence of cherry-Pio. This is difficult to interpret. How are these results of Pio antibody and cherry-Pio correlating with the results in the trachea described recently (Drees et al. 2023)?

      In our original submission, we stated that the uniform luminal mCh-Pio signals were not changed in Papss mutants, but after re-analysis, we found that these signals were actually absent from the expanded luminal region in stage 16 SG (where Dpy-YFP is also absent), and weak mCh-Pio signals colocalize with the condensed Dpy-YFP signals (Figure 5C, D). We have revised the text accordingly. 

      After cleavages by Np and furin, the Pio protein should have three fragments. The N-terminal region contains the N-terminal half of the ZP domain, and mCh-Pio signals show this fragment. The very C-terminal region should localize to the membrane as it contains the transmembrane domain. We think the middle piece, the C-terminal ZP domain, is recognized by the Pio antibody. The mCh-Pio and Pio antibody signals in the WT trachea (Drees et al., 2023) are similar to those in the SG. mCh-Pio signals are detected in the tracheal lumen as uniform signals, at the apical membrane, and in cytoplasmic puncta. Pio antibody signals are exclusively in the tracheal lumen and show more heterogeneous filamentous signals. 

      In Papss mutants, the middle fragment (the C-terminal ZP domain) seems to be most affected because the Pio antibody signals are absent from the lumen. The loss of Pio antibody signals could be due to protein degradation or epitope masking caused by aECM condensation and protein misfolding. This fragment seems to be key for interacting with Dpy, since Pio antibody signals always colocalize with Dpy-YFP. The N-terminal mCh-Pio fragment does not appear to play a significant role in forming a complex with Dpy in WT (but still aggregated together in Papss mutants), and this can be tested in future studies.

      In response to Reviewer 1’s comment, we performed an additional experiment to test the role of Np in cleaving Pio to help organize the SG aECM. In this experiment, we overexpressed the WT and mutant form of Np using UAS-Np.WT and UAS-Np.S990A lines (Drees et al., 2019) and analyzed mCh-Pio, Pio antibody, and Dpy-YFP signals. Np.WT overexpression resulted in increased levels of mCh-Pio, Pio, and Dpy-YFP signals in the lumen and at the apical membrane. However, overexpression of Np.S990A resulted in the absence of luminal mCh-Pio signals. Pio antibody signals were strong at the apical membrane but rather weak in the luminal filamentous structures. Since the UAS-Np.S990A line has the GFP tag, we could not reliably analyze Dpy-YFP signals due to overlapping Np.S990A.GFP signals in the same channel. However, the luminal filamentous Pio signals co-localized with GFP signals, and we assume that these overlapping signals could be Dpy-YFP signals. 

      These results suggest that overexpressed Np.S990A may act in a dominant-negative manner, competing with endogenous Np and impairing proper cleavage of Pio (and mCh-Pio). Nevertheless, some level of cleavage by endogenous Np still appears to occur, as indicated by the residual luminal filamentous Pio signals. These new findings have been incorporated into the revised manuscript and are shown in Figure 6H and 6I. 

      A proposed model of the Pio-Dpy aECM in WT, Papss, pio, and Np mutants has now been included in Figure 7.

      -  What does the WGA staining in the lumen reveal? This staining seems to be affected differently in pio and dpy mutants: in pio mutants it disappears from the lumen (as dpy-YFP does), but in dpy mutants it seems to be maintained. How do the authors interpret these findings? How does the WGA matrix relate to sulfated products (using Alcian blue or sulfotyrosine)?

      WGA binds to sialic acid and N-acetylglucosamine (GlcNAc) residues on glycoproteins and glycolipids. GlcNAc is a key component of the glycosaminoglycan (GAG) chains that are covalently attached to the core protein of a proteoglycan, which is abundant in the ECM. We think WGA detects GlcNAc residues in the components of the aECM, including Dpy as a core component, based on the following data. 1) WGA and Dpy colocalize in the lumen, both in WT (as thin filamentous structures) and Papss mutant background (as condensed rod-like structures), and 2) are absent in pio mutants. WGA signals are still present in a highly condensed form in dpy mutants. That’s probably because the dpy mutant allele (dpyov1) has an insertion of a transposable element (blood element) into intron 11 and this insertion may have caused the Dpy protein to misfold and condense. We added the information about the dpy allele to the Results section and discussed it in the Discussion.

      Minor points:

      - The morphological phenotypic analysis of Papss mutants (homozygous and transheterozygous) is a bit confusing. The general defects are higher in Papss homozygous than in transheterozygotes over a deficiency. Maybe quantifying the defects in the heterozygote embryos in the Papss mutant collection could help to figure out whether these defects relate to Papss mutation.

      We analyzed the morphology of heterozygous Papss mutant embryos. They were all normal. The data and quantifications have now been added to Figure 1-figure supplement 3. 

      - The conclusion that the apical membrane is affected in Papss mutants is not strongly supported by the results presented with the pattern of Crb (Fig 2). Further evidence should be provided. Maybe the TEM analysis could help to support this conclusion.

      We quantified Crb levels in the sub-apical and medial regions of the cell and included this new quantification in Figure 2D. TEM images showed variation in the irregularity of the apical membrane, even in WT, and we could not draw a solid conclusion from these images.

      - It is difficult to understand why in Papss mutants the levels of WGA increase. Can the authors elaborate on this?

      We think that when Dpy (and many other aECM components) are condensed and aggregated into the thin, rod-like structure in Papss mutants, the sugar residues attached to them must also be concentrated and shown as increased WGA signals.   

      - The explanation about why Pio antibody and mcherry-Pio show different patterns is not clear. If the antibody recognizes the C-t region, shouldn't it be clearly found at the membrane rather than the lumen?

      The Pio protein is also cleaved by furin protease (Figure 5B). We think the Pio fragment recognized by the antibody should be a “C-terminal ZP domain”, which is a middle piece after furin + Np cleavages. 

      - The qsm information does not seem to provide any relevant information to the aECM, or sulfation.

      Since Qsm has been shown to bind to Dpy and remodel Dpy filaments in the muscle tendon (Chu and Hayashi, 2021), we believe that the different behavior of Qsm in the SG is still informative. As mentioned briefly in the Discussion, the cleaved Qsm fragment may localize differently, like Pio, and future work will need to test this. We have shortened the description of the Qsm localization in the manuscript and moved the details to the figure legend of Figure 5-figure supplement 3.

      Reviewer #3 (Significance):

      Previous reports already indicated a role for Papss in sulfation in SG (Zhu et al 2005). Now this work provides a more detailed description of the defects produced by the absence of Papss. In addition, it provides relevant data related to the nature and requirements of the aECM in the SG. Understanding the composition and requirements of aECM during organ formation is an important question. Therefore, this work may be relevant in the fields of cell biology and morphogenesis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this study, the authors identify an insect salivary protein that participates in the initial viral infection of the plant host. They found that a salivary protein, LssaCA, promotes RSV infection by interacting with OsTLP, which can degrade callose in plants. Furthermore, RSV NP binds to LssaCA in salivary glands to form a complex, which then binds to OsTLP to promote degradation of callose.

      The story focuses on tripartite virus-insect vector-plant interaction and is interesting. However, the study is too simple and poorly conducted. The conclusion is also overstated because the findings are not solid.

      We thank the reviewer for their constructive feedback. We have conducted additional experiments to strengthen our results and conclusions as detailed below:

      (1) The comparison between vector inoculation and microinjection involves multiple confounding factors that could affect the experimental results, including salivary components, RSV inoculation titers, and the precision of viral deposition. The differential outcomes could be attributed to these various factors rather than definitively demonstrating the necessity of salivary factors. Therefore, we have removed this comparison from the revised manuscript and instead focused on elucidating the specific mechanisms by which LssaCA facilitates viral infection.

      (2) We conducted new experiments to assess the function of LssaCA enzymatic activity in mediating RSV infection. Additional experiments revealed that OsTLP enzymatic activity is highly pH-dependent, with increased activity as pH decreases from 7.5 to 5.0 (Fig. 3H). However, the LssaCA-OsTLP interaction at pH 7.4 significantly enhanced OsTLP enzymatic activity without requiring pH changes. These results demonstrate that LssaCA-OsTLP protein interactions are crucial for mediating RSV infection. In contrast to pH-dependent mechanisms, our study demonstrated that LssaCA's biological function in mediating RSV infection is at least partially, if not completely, independent of its enzymatic activity. We have added these new results to the revised manuscript (Lines 220-227). We have also added a comprehensive discussion comparing the aphid CA mechanism described by Guo et al. (2023 doi.org/10.1073/pnas.2222040120) with our findings in the revised manuscript (Lines 350-371).

      (3) We have repeated the majority of the callose deposition experiments, providing clearer images (Figures 5-6). In addition to aniline blue staining, we quantified callose concentrations using a plant callose ELISA kit to provide more precise measurements (Figure 5A, I, 6A, C and S8A). We utilized RT-qPCR to measure callose synthase expression in both feeding and non-feeding areas, confirming that callose synthesis was induced specifically in feeding regions, leading to localized callose deposition (Figures 5D-G and S8B-E). For sieve plate visualization, we examined longitudinal sections, which revealed callose deposition in sieve plates during SBPH feeding and RSV infection (Figure S7).

      (4) We generated OsTLP mutant rice seedlings (ostlp) and used this mutant to directly demonstrate that LssaCA mediates callose degradation in planta through enhancement of OsTLP enzymatic activity (Lines 288-302 and Figure 6).

      (5) We produced LssaCA recombinant proteins in sf9 cells to ensure full enzymatic activity and constructed a comprehensive CA mutant protein, in which all seven residues constituting the enzymatic active center were mutated (LssaCA<sup>H111D</sup>, LssaCA<sup>N139H</sup>, LssaCA<sup>H141D</sup>, LssaCA<sup>H143D</sup>, LssaCA<sup>E153H</sup>, LssaCA<sup>H166D</sup>, LssaCA<sup>T253E</sup>) (Fig. S1B). This LssaCA mutant protein demonstrated complete loss of enzymatic activity (Fig. 1C).

      Major comments:

      (1) The key problem is how long LssCA functions in the rice plant. The authors declared that LssCA had no effect on initial viral infection, but on infection after viral inoculation. It is unreasonable to conclude that LssCA promoted viral infection based on data in which insects inoculated plants for just 2 days but the viral titer was increased at 14 days post-feeding. How could saliva proteins, which reached the phloem 12-14 days earlier, induce enough TLP to degrade callose and promote virus infection? This is unbelievable.

      We appreciate your insightful comment and acknowledge that our initial description may have been unclear. We agree that salivary proteins would not be present in plant tissues for two weeks post-feeding or post-injection. Our intention was to clarify that when salivary proteins enhance RSV infection, this initial enhancement leads to sustained high viral loads. We measured viral burden at 14 days post-feeding or post-injection because this is the common measurement time point when viral titers are sufficiently high for reliable detection by qRT-PCR or western blotting. We have clarified this rationale in the revised manuscript (Lines 155-157).

      To determine the actual persistence of LssaCA in plant tissues, we conducted additional experiments where insects were allowed to feed on a defined area of rice seedlings for two days. We then monitored LssaCA protein levels at 1 and 3 days after removing the insects. Western blotting analysis revealed that LssaCA protein levels decreased post-feeding but remained detectable at 3 days post-feeding. These results are presented in Figure 2H and described in detail in Lines 184-193.

      (2) Lines 110-116 and Fig. 1, the results of viruliferous insect feeding and microinjection with purified virus could not conclude the saliva factor necessary of RSV infection, because these two tests are not in parallel and comparable. Microinjection with salivary proteins combined with purified virus is comparable with microinjection with purified virus.

      We thank the reviewer for the insightful comment. We agree that “the results of viruliferous insect feeding and microinjection with the purified virus could not conclude the saliva factor necessary of RSV infection”. However, due to the technical difficulty in collecting sufficient quantities of salivary proteins to conduct the microinjection experiment, we have removed these results from the revised manuscript.

      (3) The second problem is how many days after viruliferous insect feeding and microinjection with purified virus the authors measured viral titers. In the Methods section, the authors state that viral titers were measured at 7-14 days post microinjection. Please state the days exactly.

      We thank the reviewer for the insightful comment. We typically measured RSV infection levels at both 7 and 14 days post-microinjection. However, since the midrib microinjection experiments have been removed from the revised manuscript, this methodology has also been removed accordingly.

      (4) The last problem is how the authors made sure that the viral titers in the salivary glands of insects were equal between the two experiments that produced different phenotypes in rice plants. If they were not, different viral titers in the salivary glands would of course cause different phenotypes in rice plants.

      We thank the reviewer for the comment. When we assessed the effects of LssaCA deficiency on RSV infection of rice plants, we compared the viral titers in the insect saliva and salivary glands. The results indicated that the virus titers in both tissues were not changed by LssaCA deficiency, suggesting that the viruses inoculated into rice phloem by insects of different treatments were comparable. Please refer to the revised manuscript Figures 2D-G and Lines 161-173.

      (5) Callose deposition in the phloem can be induced by insect feeding. In Fig. 5H, why was callose deposition increased in the whole vascular bundle, but not the phloem? Could the transgenic rice plant express the protein specifically in the phloem? In Fig. 5, why was callose deposition detected at 24 h after insect feeding? In Fig. 6A, why was callose deposition decreased in the phloem, but not in all the cells of the TLP OE plant? Also, in Fig. 6A and B, expression of callose synthase genes is required.

      We thank the reviewer for these insightful comments.

      (1) Figure 5. Callose deposition increased in multiple cells within the vascular bundle, including sieve tubes, parenchymatic cells, and companion cells. While callose deposition was detected in other parts of the vascular bundle, no significant differences were observed between treatments in these regions, indicating that in response to RSV infection and other treatments, altered callose deposition mainly occurred in phloem cells. Please refer to the revised Figures 5B, 5J, 6B, and 6D.

      (2) Transgenic plant expression. The OsTLP-overexpressing transgenic rice plants express TLP proteins in various cells under the control of the CaMV 35S promoter, rather than expressing them specifically in the phloem. However, since TLP proteins are secreted, they are potentially transported and concentrated in the phloem where they can degrade callose.

      (3) Figure 5. The 24-hour time point for callose deposition detection was selected based on established protocols from previous studies. According to Hao et al. (Plant Physiology 2008), callose deposition increased during the first 3 days of planthopper infestation and decreased after 4 days. Additionally, Ellinger and Voigt (Ann Bot 2014) demonstrated that callose visualization typically begins 18-24 hours after treatment, making 24 hours an optimal detection time point.

      (4) Figure 6, Phloem-specific changes. Similar to Figure 5, while callose deposition was detected in other parts of vascular bundle, significant differences between treatments were mainly observed in phloem cells, indicating that RSV infection specifically affects callose deposition in phloem tissue.

      (5) Callose synthase gene expression. We performed RT-qPCR analysis to measure the expression levels of callose synthase genes. The results indicated that OsTLP overexpression did not significantly alter the mRNA levels of these genes, regardless of RSV infection status in SBPH.

      Reviewer #2 (Public Review):

      There is increasing evidence that viruses manipulate vectors and hosts to facilitate transmission. For arthropods, saliva plays an essential role for successful feeding on a host and consequently for arthropod-borne viruses that are transmitted during arthropod feeding on new hosts. This is so because saliva constitutes the interaction interface between arthropod and host and contains many enzymes and effectors that allow feeding on a compatible host by neutralizing host defenses. Therefore, it is not surprising that viruses change saliva composition or use saliva proteins to provoke altered vector-host interactions that are favorable for virus transmission. However, detailed mechanistic analyses are scarce. Here, Zhao and coworkers study transmission of rice stripe virus (RSV) by the planthopper Laodelphax striatellus. RSV infects plants as well as the vector, accumulates in salivary glands and is injected together with saliva into a new host during vector feeding.

      The authors present evidence that a saliva-contained enzyme - carbonic anhydrase (CA) - might facilitate virus infection of rice by interfering with callose deposition, a plant defense response. In vitro pull-down experiments, yeast two hybrid assay and binding affinity assays show convincingly interaction between CA and a plant thaumatin-like protein (TLP) that degrades callose. Similar experiments show that CA and TLP interact with the RSV nuclear capsid protein NT to form a complex. Formation of the CA-TLP complex increases TLP activity by roughly 30% and integration of NT increases TLP activity further. This correlates with lower callose content in RSV-infected plants and higher virus titer. Further, silencing CA in vectors decreases virus titers in infected plants.

      (1) Interestingly, aphid CA was found to play a role in plant infection with two non-persistent non-circulative viruses, turnip mosaic virus and cucumber mosaic virus (Guo et al. 2023 doi.org/10.1073/pnas.2222040120), but the proposed mode of action is entirely different.

      We appreciate the reviewer’s insightful comment and have carefully examined the cited publication. The study by Guo et al. (2023) elucidates a distinct mechanism for aphid-mediated transmission of non-persistent, non-circulative viruses (turnip mosaic virus and cucumber mosaic virus). In their model, aphid-secreted CA-II in the plant cell apoplast leads to H<sup>+</sup> accumulation and localized acidification. This triggers enhanced vesicle trafficking as a plant defense response, inadvertently facilitating virus translocation from the endomembrane system to the apoplast.

      In contrast to these pH-dependent mechanisms, our study demonstrated that LssaCA’s biological function in mediating RSV infection is, if not completely, at least partially independent of its enzymatic activity. We performed additional experiments to reveal that OsTLP enzymatic activity is highly pH-dependent and exhibits increased enzymatic activity as pH decreases from 7.5 to 5.0 (Fig. 3H); however, the LssaCA-OsTLP interaction occurring at pH 7.4 significantly enhanced OsTLP enzymatic activity without any change in buffer pH (Fig. 3G). These results demonstrate the crucial importance of LssaCA-OsTLP protein interactions, rather than enzymatic activity alone, in mediating RSV infection.

      We have incorporated these new experimental results and added a comprehensive discussion comparing the aphid CA mechanism described by Guo et al. (2023) with our findings in the revised manuscript. Please refer to Figures 3G-H, Lines 220-227 and 350-371 for detailed information.

      (2) While this is an interesting work, there are, in my opinion, some weak points. The microinjection experiments result in much lower virus accumulation in rice than infection by vector inoculation, so their interpretation is difficult.

      We acknowledge the reviewer's concern regarding the lower virus accumulation observed in microinjection experiments compared to vector-mediated inoculation. We have removed these experiments from the revised manuscript. To address the core question raised by these experiments, we have conducted new experiments that directly demonstrate the importance of LssaCA-OsTLP protein-protein interactions in mediating RSV infection. These results demonstrate the crucial importance of LssaCA-OsTLP protein interactions, rather than enzymatic activity alone, in mediating RSV infection. Additionally, we have incorporated a comprehensive discussion examining carbonic anhydrase activity, pH homeostasis, and viral infection. Please refer to the detailed experimental results and discussion in the sections mentioned in our previous response (Figures 3G-H, Lines 220-227 and 350-371).

      (3) Also, the effect of injected recombinant CA protein might fade over time because of degradation or dilution.

      We appreciate the reviewer’s insightful comment. This is indeed a valid concern that could affect the interpretation of microinjection results. To address the temporal dynamics of CA protein presence in planta, we conducted time-course experiments to monitor the retention of naturally SBPH-secreted CA proteins in rice plants. Our analysis at 1 and 3 days post-feeding (dpf) revealed that CA protein levels decreased progressively following SBPH feeding but remained detectable at 3 dpf (Fig. 2H). Please refer to Figure 2H and Lines 184-193 for detailed information.

      (4) The authors claim that enzymatic activity of CA is not required for its proviral activity. However, this is difficult to assess because all CA mutants used for the corresponding experiments possess residual activity.

      We appreciate the reviewer’s insightful comment. We constructed a comprehensive CA mutant protein in which all seven residues constituting the enzymatic active center were mutated (LssaCA<sup>H111D</sup>, LssaCA<sup>N139H</sup>, LssaCA<sup>H141D</sup>, LssaCA<sup>H143D</sup>, LssaCA<sup>E153H</sup>, LssaCA<sup>H166D</sup>, LssaCA<sup>T253E</sup>) (Fig. S1B). This LssaCA mutant protein demonstrated complete loss of enzymatic activity (Fig. 1C). However, since we have removed the recombinant CA protein microinjection experiments from the revised manuscript, we lack sufficient direct evidence to definitively demonstrate that CA enzymatic activity is dispensable for its proviral function. To address the core question raised by these experiments, we have conducted new experiments that provide direct evidence for the importance of LssaCA-OsTLP protein-protein interactions in mediating RSV infection. Additionally, we have incorporated a comprehensive discussion examining carbonic anhydrase activity, pH homeostasis, and viral infection. Please refer to the detailed experimental results and discussion in the sections mentioned in our previous response (Figures 3G-H, Lines 220-227 and 350-371).

      (5) It remains also unclear whether viral infection deregulates CA expression in planthoppers and TLP expression in plants. However, increased CA and TLP levels could alone contribute to reduced callose deposition.

      We compared LssaCA mRNA levels in RSV-free and RSV-infected L. striatellus salivary glands and found that RSV infection does not significantly affect LssaCA expression (Figure 1J). By allowing RSV-free and RSV-infected L. striatellus to feed on rice seedlings, we showed that RSV infection does not affect TLP expression in plants (Figure 5H).

      Reviewer #1: (Recommendations For The Authors):

      Other comments:

      (1) Most data proving viral infection and LssaCA expression were derived from qPCR assays. Western blot data are strongly required to prove the change at the protein level.

      We agree that western blot data are required to prove the change at the protein level. In the revised manuscript, we have added western-blotting results (Figures 1F, 1I, 2C, 2J, and S6).

      (2) Line 145, data that LssaCA was significantly downregulated should be shown.

      Thank you. The data have been added to the revised manuscript. Please refer to Line 165 and Figure 2D.

      (3) Lines 159-161, how did authors assure that the dose of recombinant LssCA was closed to the release level of insect feeding, but not was excessive? How did author exclude the possibility of upregulated RSV titer caused by excessive recombinant LssCA?

      We appreciate this important concern regarding dosage controls. Because microinjection of recombinant proteins typically yields viral infection levels significantly lower than those achieved through natural insect feeding, higher protein concentrations are often required to achieve high viral infection levels. In this experiment, we compared RSV infection levels following microinjection of BSA+RSV versus LssaCA+RSV, with the expectation that any observed upregulation in RSV titer would be specifically attributable to recombinant LssaCA rather than excessive protein dosing. However, given the low RSV infection levels observed with viral microinjection, we have removed the corresponding results from the revised manuscript.

      (4) Lines 124-125, recombinantly expressed LssaCA protein should be underlined, but not the LssaCA protein itself.

      We have clearly distinguished recombinantly expressed LssaCA from endogenous LssaCA protein throughout the manuscript, ensuring that all references to recombinant proteins are properly labeled as such.

      (5) LssaCA expression in salivary glands of viruliferous and nonviruliferous insects is required. LssaCA accumulation in rice plant exposed to viruliferous and nonviruliferous insects is also required.

      We have measured LssaCA mRNA levels in the salivary glands of viruliferous and nonviruliferous insects (Figure 1J), and LssaCA protein levels in rice plants exposed to viruliferous and nonviruliferous insects (Figure 1I).

      (6) Fig. 4G, the enzymatic activities of OsTLP were too low compared with that in Fig. 4E and Fig. 7E. Why did the enzymatic activities of the same protein show so obvious difference?

      We apologize for the error in Fig. 4G. The original data presented relative fold changes between OsTLP+BSA and OsTLP+LssaCA treatment, with OsTLP+BSA normalized to 1.0 and OsTLP+LssaCA values expressed as fold changes relative to this baseline. However, the Y-axis was incorrectly labeled as “β-1,3-glucanase (units mg<sup>-1</sup>)”, which suggested absolute enzymatic activity values. We have now corrected the figure (revised Figure 3G) to display the actual absolute enzymatic activity values with the appropriate Y-axis label “β-1,3-glucanase (units mg<sup>-1</sup>)”.
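
      The difference between the two presentations described above can be illustrated with a minimal sketch. The activity values and the helper name `fold_change` are hypothetical placeholders chosen for illustration; they are not data or code from the manuscript. The sketch only shows how absolute activities (units mg<sup>-1</sup>) become dimensionless fold changes once the control group is normalized to 1.0.

```python
from statistics import mean


def fold_change(values, control):
    """Scale values so that the mean of the control group equals 1.0."""
    baseline = mean(control)
    return [v / baseline for v in values]


# Hypothetical absolute activities in units mg^-1 (illustrative only).
ostlp_bsa = [2.0, 2.5, 1.5]
ostlp_lssaca = [4.0, 5.0, 3.0]

relative_bsa = fold_change(ostlp_bsa, ostlp_bsa)        # control mean becomes 1.0
relative_lssaca = fold_change(ostlp_lssaca, ostlp_bsa)  # dimensionless fold change

print("absolute (units mg^-1):", mean(ostlp_bsa), mean(ostlp_lssaca))
print("relative (fold change):", mean(relative_bsa), mean(relative_lssaca))
```

      Plotting the second pair of values on an axis labeled in units mg<sup>-1</sup> would make the activity appear far lower or higher than the absolute measurement, which is the kind of mismatch the reviewer noticed.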

      (7) Fig. 7E, was the LssaCA + NP and LssaCA + GST quantified?

      Yes, all proteins were quantified, and enzymatic activity values were calculated and expressed as units per milligram of proteins (units mg<sup>-1</sup>).

      Minor comments:

      (1) The keywords: In fact, the LssaCA functioned during initial viral infection in plant, but not viral horizontal transmission.

      We appreciate the reviewer’s insightful comment. We have revised the manuscript title to “Rice stripe virus utilizes an Laodelphax striatellus salivary carbonic anhydrase to facilitate plant infection by direct molecular interaction” and changed the keyword from “viral horizontal transmission” to “viral infection of plant”.

      (2) Fig. 2A, how about testes? Was this data derived from female insects? Fig. 2C, is the saliva collected from nonviruliferous insects? Fig. 2E, what is the control?

      We appreciate the reviewer’s insightful comments.

      (1) Fig. 2A: The data present mean and SD calculated from three independent experiments, with 5 tissue samples per experiment. Since 3<sup>rd</sup> instar nymphs were used for feeding experiments in this study, we also used 3<sup>rd</sup> instar RSV-free nymphs to measure gene expression in guts, salivary glands and fat bodies. R-body represents the remaining body after removing these tissues. Female insects were used to measure gene expression in ovaries, and gene expression in testes was also added. We have added this necessary information to the revised manuscript (please refer to new Figure 1F and Lines 402-403).

      (2) Fig. 2C: Yes, saliva was collected from nonviruliferous insects.

      (3) Fig. 2E: The control consisted of 100 mM PBS, as described in the experimental section (Lines 643-644): “A blank control consisted of 2 mL of 100 mM PBS (pH 7.0) mixed with 1 mL of 3 mM p-NPA.” In the revised manuscript, we recombinantly expressed LssaCA and its mutant proteins in both Sf9 cells and E. coli. Therefore, we have used the mutant proteins as controls to demonstrate specific enzymatic activity. Please refer to Figure 1C, Lines 115-122 and 621-635 for detailed information.

      (3) Some figure labeling appeared unprofessional. For example, "a-RSV", "loading" in Fig. 1, "W-saliva", "G-saliva" in Fig. 2, and so on, the related explanations were absent.

      We appreciate the reviewer’s insightful comments. We have thoroughly reviewed all figures to ensure professional labels. Specifically, we have:

      (1) Used proper protein names to label western blots and clearly explained the antibodies used for protein detection.

      (2) Provided comprehensive explanations for all abbreviations used in figures within the corresponding figure legends.

      (3) Ensured consistent and clear labeling throughout all figures.

      Please refer to the revised Figures 1-3 for these corrections.

      (4) Lines 83-84, please cite references on callose preventing viral movement. I do not think the present references were relevant.

      We have added a more relevant reference (Yue et al., 2022, Line 82), which revealed that palmitoylated γb promotes virus cell-to-cell movement by interacting with NbREM1 to inhibit callose deposition at plasmodesmata.

      (5) The background of transgenic plants of OsTLP OE should be characterized. And the overexpression of OsTLP should be shown. Which generation of OsTLP OE did authors use?

      The genetic background of the OsTLP OE transgenic plants and the generation used are described in the “Materials and methods” section (Lines 782-786) and mentioned in the main text (Line 214). T<sup>2</sup> lines were selected for further analysis (Line 789).

      (6) Fig. 5A, the blank, which derived from plants without exposure to insect, was absent.

      We appreciate the reviewer’s insightful comments. We have added the non-fed control in the revised Figure 5A-C.

      (7) Fig. 7A, the nonviruruliferous insects were required to serve as a control.

      Immunofluorescence localization of RSV and LssaCA in uninfected L. striatellus salivary glands has been added to the revised manuscript (Figure S2).

      (8) The manuscript needs English language edit.

      The manuscript has undergone comprehensive English language editing to improve clarity, grammar, and overall readability.

      Reviewer #2 (Recommendations For The Authors):

      (1) The first experiment compares vector inoculation vs microinjection of RSV in tissue. I am not sure that your claim (saliva factors are necessary for inoculation) holds, because the vector injects RSV directly into the phloem, whereas microinjection is less precise and you cannot control where exactly the virus is deposed. However, virus deposited in other tissues than the phloem might not replicate, and indeed you observe, compared to natural vector inoculation, highly reduced virus titers.

      We appreciate the reviewer’s insightful comments. We agree that the comparison between vector inoculation and microinjection involves multiple confounding factors that could affect the experimental results, including salivary components, RSV inoculation titers, and the precision of viral deposition. As the reviewer correctly points out, the differential outcomes could be attributed to these various factors rather than definitively demonstrating the necessity of salivary factors. Therefore, we have removed this comparison from the revised manuscript and instead focused on elucidating the specific mechanisms by which LssaCA facilitates viral infection.

      (2) Next the authors show that a carbonic anhydrase (CA) that they previously detected in saliva is functional and secreted into rice. I assume this is done with non-infected insects, but I did not find the information. Silencing the CA reduces virus titers in inoculated plants at 14 dpi, but not in infected planthoppers. At 1 dpi, there is no difference in RSV titer in plants inoculated with CA silenced planthoppers or control hoppers. To see a direct effect of CA in virus infection, purified virus is injected together with a control protein or recombinant CA into plants. At 14 dpi, there is about double as much virus in the CA-injected plants, but compared to authentic SBPH inoculation, titers are 20,000 times lower. Actually, I believe it is not very likely that the recombinant CA is active or present so long after initial injection.

      We appreciate the reviewer’s insightful comments.

      (1) Our previous study identified the CA proteins from RSV-free insects. We have added this information to the revised manuscript (Line 110).

      (2) We acknowledge the reviewer's concern regarding the lower virus accumulation observed in microinjection experiments compared to vector-mediated inoculation. We have removed these experiments from the revised manuscript and instead focused on elucidating the specific mechanisms by which LssaCA facilitates viral infection.

      (3) We did not intend to suggest that LssaCA protein persists for 14 days post-injection. We measured viral titers at 14 days post-feeding or post-injection because this is the common time point at which viral titers are sufficiently high for reliable detection by RT-qPCR or western blotting. We have clarified this rationale in the revised manuscript (Lines 155-157). To determine the actual persistence of LssaCA in plant tissues, we monitored LssaCA protein levels at 1 and 3 dpf. Western blotting analysis revealed that LssaCA protein levels decreased post-feeding but remained detectable at 3 dpf. These results are presented in Figure 2H and described in detail in Lines 184-193.

      (3) Then the authors want to know whether CA activity is required for its proviral action and single amino acid mutants covering the putative active CA site are created. The recombinant mutant proteins have 30-70 % reduced activity, but none of them has zero activity. When microinjected together with RSV into plants, RSV replication is similar as injection with wild type CA. Since no knock-out mutant with zero activity is used, it is difficult to judge whether CA activity is unimportant for viral replication, as claim the authors.

      We appreciate the reviewer’s insightful comment. We constructed a comprehensive CA mutant protein in which all seven residues constituting the enzymatic active center were mutated (LssaCA<sup>H111D</sup>, LssaCA<sup>N139H</sup>, LssaCA<sup>H141D</sup>, LssaCA<sup>H143D</sup>, LssaCA<sup>E153H</sup>, LssaCA<sup>H166D</sup>, LssaCA<sup>T253E</sup>) (Fig. S1B). This LssaCA mutant protein demonstrated complete loss of enzymatic activity (Fig. 1C). However, since we have removed the recombinant CA protein microinjection experiments from the revised manuscript, we lack sufficient direct evidence to definitively demonstrate that CA enzymatic activity is dispensable for its proviral function. To address the core question raised by these experiments, we have conducted new experiments that provide direct evidence for the importance of LssaCA-OsTLP protein-protein interactions in mediating RSV infection. Additionally, we have incorporated a comprehensive discussion examining carbonic anhydrase activity, pH homeostasis, and viral infection. Please refer to the detailed experimental results and discussion in the sections mentioned in our previous response (Figures 3G-H, Lines 220-227 and 350-371).

      (4) Next a yeast two hybrid assay reveals interaction with a thaumatin-like rice protein (TLP). It would be nice to know whether you detected other interacting proteins as well. The interaction is confirmed by pulldown and binding affinity assay using recombinant proteins. The kD is in favor of a rather weak interaction between the two proteins.

      We have added a list of rice proteins that potentially interact with LssaCA (Table S1) and have measured interactions with additional proteins (unpublished data). Despite the relatively weak binding affinity, the functional significance of the LssaCA-OsTLP interaction in enhancing TLP enzymatic activity is substantial.

      (5) Then the glucanase activity of TLP is measured using recombinant TLP-MBP or in vivo expressed TLP. It is not clear to me which TLP is used in Fig. 4G (plant-expressed or bacteria-expressed). If it is plant-expressed TLP, why is its basic activity 10 times lower than in Fig. 4F?

      Fig. 4G is now Fig. 3G in the revised manuscript, and an E. coli-expressed TLP protein was used. We apologize for the error in our original Fig. 4G. The original data presented relative fold changes between OsTLP+BSA and OsTLP+LssaCA treatment, with OsTLP+BSA normalized to 1.0 and OsTLP+LssaCA values expressed as fold changes relative to this baseline. However, the Y-axis was incorrectly labeled as “β-1,3-glucanase (units mg<sup>-1</sup>)”, which suggested absolute enzymatic activity values. We have now corrected the figure to display the actual absolute enzymatic activity values with the appropriate Y-axis label “β-1,3-glucanase (units mg<sup>-1</sup>)”.

      (6) There is also a discrepancy in the construction of the transgenic rice plants: did you use TLP without signal peptide or full length TLP? If you used TLP without signal peptide, you should explain why, because the wild type TLP contains a signal peptide.

      We cloned the full-length OsTLP gene including the signal peptide sequence (Line 782 in the revised manuscript).

      (7) The authors find that CA increases glucanase activity of TLP. Next the authors test callose deposition by aniline blue staining. Feeding activity of RSV-infected planthoppers induces more callose deposition than does feeding by uninfected insects. In the image (Fig. 5A) I see blue stain all over the cell walls of xylem and phloem cells. Is this what the authors expect? I would have expected rather a patchy pattern of callose deposition on cell walls. Concerning sieve plates, I cannot discern any in the image; they are easier to visualize in longitudinal sections than in transversal section as presented here.

      We appreciate the reviewer’s insightful comment.

      (1) Callose deposition pattern: While callose deposition was detected in other parts of the vascular bundle, significant differences between treatments were mainly observed in phloem cells, indicating that phloem-specific callose deposition is the primary response to RSV infection and SBPH feeding (Figures 5B and 5J).

      (2) Sieve plate visualization: We have examined longitudinal sections to visualize sieve plates, which revealed callose deposition in sieve plates during SBPH feeding and RSV infection (Figure S7).

      (3) Quantitative analysis: In addition to aniline blue staining, we quantified callose concentrations using a plant callose ELISA kit to provide more precise measurements (Figure 5A, 5I and S8A).

      (4) Gene expression analysis: We utilized RT-qPCR to measure callose synthase expression in both feeding and non-feeding areas, confirming that callose synthesis was induced specifically in feeding regions, leading to localized callose deposition (Figures 5D-H).

      These experimental results collectively demonstrate that RSV infection induces enhanced callose synthesis and deposition, with this response occurring primarily in phloem cells, including sieve plates, within feeding sites and their immediate vicinity.

      (8) I do not quite understand how you quantified callose deposition (arbitrary areas?) with ImageJ. Please indicate in detail the analysis method.

      We have added more detailed information for the methods to quantify callose deposition (Lines 673-678).

      (9) More callose content is also observed by a callose ELISA assay of tissue extracts and supported by increased expression of glucanase synthase genes. Did you look whether expression of TLP is changed by feeding activity and RSV infection? Silencing CA in planthoppers increases callose deposition, which is inline with the observation that CA increases TLP activity.

      We measured OsTLP expression following feeding by RSV-free or RSV-infected SBPH and found that gene expression was not significantly affected by either insect feeding or RSV infection. These results have been added to the revised manuscript (Lines 275-277 and Figure 5H).

      (10) Next, callose is measured after feeding of RSV-infected insects on wild type or TLP-overexpressing rice. Less callose deposition (after 2 days) and more virus (after 14 days) is observed in TLP overexpressors. I am missing a control in this experiment, that is feeding of uninfected insects on wild type or TLP overexpressing rice, where I would expect intermediate callose levels.

      We appreciate the reviewer’s insightful comment and fully agree with the prediction. In the revised manuscript, we have constructed ostlp mutant plants and conducted additional experiments to further clarify how callose deposition is regulated by insect feeding, RSV infection, LssaCA levels, and OsTLP expression. Specifically: 

      (1) Both SBPH feeding and RSV infection induce callose deposition, with RSV-infected insect feeding resulting in significantly higher callose levels compared to RSV-free insect feeding (Fig. 5A-C).

      (2) LssaCA enhances OsTLP enzymatic activity, thereby promoting callose degradation (Fig. 5I-K).

      (3) OsTLP-overexpressing (OE) plants exhibit lower callose levels than wild-type (WT) plants, while ostlp mutant plants show higher callose levels than WT (Fig. 6A-B).

      (4) In ostlp knockout plants, LssaCA no longer affects callose levels, indicating that OsTLP is required for LssaCA-mediated regulation of callose (Fig. 6C-D).

      These additional data address the reviewer’s concern and support the conclusion that OsTLP plays a central role in modulating callose levels in response to RSV infection and insect feeding.

      (11) Next the authors test for interaction between virions and CA. Immunofluorescence shows that RSV and CA colocalize in salivary glands; in my opinion, there is partial and not complete colocalization (Fig. 7A).

      We agree with the reviewer’s observation. CA is primarily produced in the small lobules of the principal salivary glands, while RSV infects nearly all parts of the salivary glands. In regions where RSV and CA colocalize within the principal glands, the CA signal appears sharper than that of RSV, likely due to the relatively higher abundance of CA compared to RSV in these areas. This may explain the partial, rather than complete, colocalization observed in our original Figure 7A. In the revised manuscript, please refer to Figure 1A.

      (12) Pulldown experiments with recombinant RSV NP capsid protein and CA confirm interaction, binding affinity assays indicate rather weak interaction between CA and NP. Likewise in pull-down experiments, interaction between NP, CA and TLP is shown. Finally, in vitro activity assays show that activity of preformed TLP-CA complexes can be increased by adding NP; activity of TLP alone is not shown.

      We performed two independent experiments to confirm the influence on TLP enzymatic activity by LssaCA or by the LssaCA-RSV NP complex. In the first experiment, we compared the enhancement of TLP activity by LssaCA using TLP alone as a control (Figure 3G). In the second experiment examining the LssaCA-RSV NP complex effect on TLP activity, we used the LssaCA-TLP combination as the baseline control rather than TLP alone (Figure 4B), since we had already established the LssaCA enhancement effect in the previous experiment.

      (13) For all microscopic acquisitions, you should indicate the exact acquisition conditions, especially excitation and emission filter settings, kind of camera used and objectives. Use of inadequate filters or of a black & white camera could for example be the reason why you observe a homogeneous cell wall label in the aniline blue staining assays. Counterstaining cell walls with propidium iodide might help distinguish between cell wall and callose label.

      Thank you for your insightful suggestions. We have added the detailed information to the revised manuscript (Lines 656-659 and 673-678).

      (14) You should provide information whether CA is deregulated in infected planthoppers, as this could also modify its mode of action.

      We have compared LssaCA mRNA levels in RSV-free and RSV-infected L. striatellus salivary glands. The results indicated that RSV infection does not significantly affect LssaCA expression (Figure 1J).

      (15) You should show purity of the proteins used for affinity binding measurements.

      We have included SDS-PAGE results of purified proteins in the revised manuscript (Figure S3).

      (16) L 39: Not all arboviruses are inoculated into the phloem.

      Thank you. We have revised this description (Lines 40, 73, 95 and 97).

      (17) L 76: Watery saliva is also injected in epidermis and mesophyll cells.

      Thank you. We have revised this description (Line 73).

      (18) L 79: What do you mean by "avirulent gene"?

      Thank you for your valuable comments. We have revised this description to “certain salivary effectors may be recognized by plant resistance proteins to induce effector-triggered immunity”. Please refer to Lines 76-77 for details.

      (19) L 128: Please add delivery method.

      Thank you. We have added the delivery methods (Line 134).

      (20) L 195: Please explain "MST".

      Explained (Line 124). Thank you.

      (21) L 203: Please add the plant species overexpressing TLP.

      Added (Line 214). Thank you.

      (22) L 213: Callose deposition has also a role against phloem-feeding insects.

      We appreciate the reviewer’s insightful comment. We have added this information to the revised manuscript (Line 252).

      (23) L 626: What is a "mutein"?

      “Mutein” is an abbreviation for “mutant protein”. Since the recombinant protein microinjection experiments have been removed from the revised manuscript, the term “mutein” has also been removed. For all other instances, we now use the full term “mutant proteins”.

      (24) Fig. 1E: what is "loading"? You should rather show here and elsewhere (or add to supplement) complete protein gels and Western blot membranes and not only bands of interest.

      Thank you for your valuable suggestion. Although Figure 1E has been removed from the revised manuscript, we have carefully reviewed all figures to ensure that the term “loading” has been replaced with the specific protein names where appropriate.

      (25) Fig. 2C: Please indicate which is the blot and which is the silver stained gel and add mass markers in kDa to the silver stained gel.

      Thank you for your suggestion. We have revised the figure to include labeled silver-stained gels with molecular weight markers indicated (Figure 1H in the revised manuscript).

    1. Author response:

      We sincerely thank the editors and the reviewers for their feedback in helping us improve this manuscript. During the time this work has been under review, 10x Genomics has updated the probe sequences of their gene panels. We therefore plan to update these findings as well as further expand to incorporate reviewer recommendations.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public review):

      Summary:

      The paper by Kim et al. investigates the potential of stimulating the dopaminergic A13 region to promote locomotor restoration in a Parkinson's mouse model. Using wild-type mice, 6-OHDA injection depletes dopaminergic neurons in the substantia nigra pars compacta, without impairing those of the A13 region and the ventral tegmentum area, as previously reported in humans. Moreover, photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region improves bradykinesia and akinetic symptoms after 6-OHDA injection. Whole-brain imaging with retrograde and anterograde tracers reveals that the A13 region undergoes substantial changes in the distribution of its afferents and projections after 6-OHDA injection, thus suggesting a remodeling of the A13 connectome. Whether this remodelling contributes to pro-locomotor effects of the photostimulation of the A13 region remains unknown as causality was not addressed.

      Strengths:

      Photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region promotes locomotion and locomotor recovery of wild-type mice 1 month after 6-OHDA injection in the medial forebrain bundle, thus identifying a new potential target for restoring motor functions in Parkinson's disease patients. The study also provides a description of the A13 region connectome pertaining to motor behaviors and how it changes after a dopaminergic lesion. Although there is no causal link between anatomical and behavioral data, it raises interesting questions for further studies.

      Thank you for the comments.

      Weaknesses:

      Although CAMKIIa is a marker of presumably excitatory neurons and can be used as an alternative marker of dopaminergic neurons, some uncertainty remains regarding the phenotype of neurons underlying recovery of akinesia and improvement of bradykinesia.

      The primary objective was to focus on a population of neurons that could contribute to functional recovery, with a long-term translational focus in mind. We have followed up on this by creating a rat-based DBS model of stimulating the A13 region (Bisht et al 2025). We agree that the next steps are to genetically dissect the circuits, and we have made a start on this with our recent publication (Sharma et al 2024).

      Figure 4 is improved, but the results from the correlation analyses remain difficult to interpret, as they may reflect changes in various impaired brain regions independently of the A13 region. While the analysis offers a snapshot of correlated changes within the connectome, it does not identify which specific cell or axonal populations are actually increasing or decreasing. Although functional MRI connectome analyses are well-established, anatomical data seem less suitable for this purpose. How can one interpret correlated changes in anatomical inputs or outputs between two distinct regions?

      We appreciate the reviewer's thoughtful comment regarding the interpretability of the correlation analyses in Figure 4. We fully acknowledge that our anatomical data cannot establish causality or identify specific cell types or axonal populations undergoing changes following unilateral nigrostriatal degeneration. However, our intent with this analysis was not to infer mechanistic pathways but rather to provide a systems-level overview of how the global organization of A13 efferents and afferents is altered following 6-OHDA lesioning. By calculating proportions of total inputs and outputs and comparing them across brain regions, we aimed to control for variability in labeling and highlight relative shifts in network organization. The correlation matrices are intended to capture coordinated changes in input/output distribution patterns, effectively reflecting how groups of regions co-vary in their input to or output from the A13 region. In our case, we used correlation analysis to identify how input and output distributions across brain regions reorganize as a network following 6-OHDA lesioning. For example, a positive correlation between inputs from Region A and Region B to the A13 suggests that across animals, when input from Region A is relatively high, input from Region B tends to be high as well, indicating that connectivity from these regions to the A13 may be co-regulated or affected similarly by the lesion. Conversely, a shift from positive to negative correlation may signal a divergence in how regions contribute to the A13 connectome after nigrostriatal degeneration (e.g., increased connectivity to Region A compared to reduced connectivity to Region B). Thus, these patterns offer new insight into the broader reorganization of the A13 connectome and may serve as systems-level signatures of altered anatomical organization, providing a foundation for future mechanistic investigations using circuit-specific tools. 
We have revised the text to better emphasize the correlative and descriptive nature of these analyses and to clarify that they serve as a hypothesis-generating exploration. Future studies using cell type- and/or projection-specific functional manipulations will be essential to determine the causal roles of these reorganized circuits. We believe our use of this method is justified in the context of exploring broad, lesion-induced network reorganization, and we hope this additional context helps clarify the purpose and limitations of our approach.
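
      As a rough illustration of the proportion-then-correlate step described above, the following minimal sketch normalizes per-animal input counts to proportions and computes region-by-region Pearson correlations across animals. It is not the authors' actual pipeline: the counts, the number of animals, and the three regions are hypothetical placeholders.

```python
import numpy as np

def input_proportions(counts):
    """Convert raw labeled-cell counts (rows = animals, columns = regions)
    into the proportion of each animal's total input."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def region_correlations(counts):
    """Pearson correlation between regions, computed across animals."""
    # rowvar=False treats each column (region) as one variable.
    return np.corrcoef(input_proportions(counts), rowvar=False)

# Hypothetical counts: 4 animals x 3 regions (Region A, Region B, Region C).
example_counts = [
    [120, 80, 50],
    [200, 150, 40],
    [90, 70, 60],
    [160, 100, 30],
]

example_corr = region_correlations(example_counts)
print(np.round(example_corr, 2))
```

      A positive off-diagonal entry means that, across animals, the two regions' shares of total input rise and fall together. Comparing such matrices between sham and lesioned groups is one way to visualize the shifts described above.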

      Figure 5 is also improved, but there is room for further enhancement. As currently presented, it is difficult to distinguish the differences between the sham and 6-OHDA groups. The first column could compare afferents, while the second column could compare efferents. Given the small sample size, it would be more appropriate to present individual data rather than the mean and standard deviation.

      We have reorganized Figure 5 as suggested.

      Appraisal and impact

      Although the behavioral experiments are convincing, the low number of animals in the anatomical studies is insufficient to make any relevant statistical conclusions due to extremely low statistical power.

      See previous comments on this.

      Reviewer #2 (Recommendations for the authors):

      Points that need to be addressed:

      Figure S1 is supposed to illustrate the percentage of expression in all mice, but the number of mice does not match (n=3 and 3 in Figure S1 versus n=5 and 6 in Figure 1). Revise the legend or add the missing data.

      We have added the additional data to this graph (Figure 2 – figure supplement 1) and have separated out 6-OHDA and sham mice for clarity.

      Page 4: "There was also an increase in the number of ChR2 cells with c-fos labeling in 6-OHDA ChR2 mice compared to the 6-OHDA eYFP mice. However, there was no net increase in TH+ cells labelled with ChR2 and c-Fos suggesting a heterogeneous population of activated cells." A quantification will be necessary to advance this conclusion.

      We were able to determine that there was a trend of increased c-Fos intensity within the A13 region following photostimulation. However, the variability in the data makes it premature to comment on the TH co-localization and we have deleted this statement.

      Figure 3: The choice of red and green could be a problem for color-blind people.

      Thank you - switched to orange and cyan instead.

      Page 7, 4th paragraph: "6-OHDA mice demonstrated significantly greater descent times than sham mice (Figure 3L, p<0.01)." This is not what is shown in the Figure 3L.

      We made changes in the legend and text to clarify.

      Page 7, last line: PT abbreviation should be introduced in parentheses at the beginning of this section.

      Removed the abbreviation.

      Figure S4A: The authors should show data for the VTA or refer to the quantification of Figure S4G in the text.

      Now referenced correctly in the text.

      Figure S7 and S8 are not referenced in the results or methods.

      References added to text.

      Double-check the formatting of some references: L.-X. Li et al, 2021, L. Kim et al., 2021.

      References checked and corrected.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      Summary: 

      In this study, Bonnifet et al. profile the presence of L1 ORF1p in the mouse and human brain and report that ORF1p is expressed in the human and mouse brain specifically in neurons at steady state and that there is an age-dependent increase in expression. This is a timely report as two recent papers have extensively documented the presence of full-length L1 transcripts in the mouse and human brain (PMID: 38773348 & PMID: 37910626). Thus, the finding that L1 ORF1p is consistently expressed in the brain is important to document and will be of value to the field. 

      Strengths: 

      Several parts of this manuscript appear to be well done and include the necessary controls. In particular, the documentation of neuron-specific expression of ORF1p in the mouse brain is an interesting finding with nice documentation. This will be very useful information for the field. 

      We thank the reviewer for this positive comment. 

      Weaknesses: 

      Several parts of the manuscript appear to be more preliminary and need further experiments to validate their claims. In particular, the data suggesting expression of L1 ORF1p in the human brain and the data suggesting increased expression in the aged brain need further validation. Detailed comments: 

      (1) The expression of ORF1p in the human brain shown in Fig. 1j is puzzling. Why are there two strong bands in the WB? How can the authors be sure that this signal represents ORF1p expression and not non-specific labelling? While the authors discuss that others have found double bands when examining human ORF1p, there are also several labs that report only one band. This discrepancy in the field should at least be discussed and the uncertainties with their findings should be acknowledged. 

      Please see also our extensive response to this comment we made in round #1 of the revisions.

      As a summary, in response to the initial review, we included several lines of additional evidence in the revised manuscript:

      siRNA-mediated knockdown of ORF1p in human neurons, resulting in ≈50% signal reduction using the antibody in question (Suppl. Fig. 2C); immunoprecipitation using the human ORF1p antibody in question, confirming signal specificity (Suppl. Fig. 2B); use of a second antibody in immunostainings, including a new control (Suppl. Fig. 2E); and a revised discussion acknowledging the uncertainty surrounding the lower band:

      “The double band pattern in Western blots has been observed in other studies for human ORF1p outside of the brain as well as for mouse ORF1p. […] The nature of the lower band is unknown, but might be due to truncation, specific proteolysis or degradation.”

      We have also now added more content to the paragraph starting from line 183: "While there is some discrepancy in the field, the double band pattern in Western blots..."

      To our understanding, this combination of independent methods using two antibodies and complementary validation strategies supports the presence of ORF1p in human brain tissue.

      (2) The data showing a reduction in ORF1p expression in the aged mouse brain is an interesting observation, but the magnitude of the effect is very limited and somewhat difficult to interpret. This finding should be supported by orthogonal methods to strengthen this conclusion. For example, by WB and by RNA-seq (to verify that the increase in protein is due to an increase in transcription). 

      This would indeed be valuable, but we will not be able to perform these experiments at this point (please also see revision #1 for a more detailed answer).

      (3) The transcriptomic data using human postmortem tissue presented in Figure 4 and Figure 5 are not convincing. Quantification of transposon expression on short read sequencing has important limitations. Longer reads and complementary approaches are needed to study the expression of evolutionarily young L1s (see PMID: 38773348 & PMID: 37910626 for examples of the current state of the art). As presented, the human RNA data is inconclusive due to the short read length and small sample size. The value of including an inconclusive analysis in the manuscript is difficult to understand. With this data set, the authors cannot investigate age-related changes in L1 expression in human neurons. 

      Please see also our extensive response to this comment we made in round #1 of the revisions.

      In the revised version, we have added further statistical analyses, incorporated locus-specific mappability scores and provided an even more nuanced interpretation of our findings, as illustrated in lines 390 and 427.

      We have acknowledged the limitations of short-read sequencing in this context, while referencing established methodologies (e.g., Teissandier et al., 2019) and recent benchmarking studies (e.g., Schwarz et al., 2022) that validate the use of such data under specific precautions—many of which we have implemented.

      Given these considerations, and with the guidance of a co-author with specific expertise in TE bioinformatics, we believe our approach is justified and robust.

      (4) In line with these comments, the title should be changed to better reflect the findings in the manuscript. A title that does not mention "L1 increase with aging" would be better. 

      In line with our response to Point (3), we prefer to retain the current analyses and discussion, which we believe strike an appropriate balance between caution and added scientific value.

      Reviewer #2 (Public review): 

      Summary: 

      Bonnifet et al. sought to characterize the expression pattern of L1 ORF1p expression across the entire mouse brain, in young and aged animals and to corroborate their characterization with Western blotting for L1 ORF1p and L1 RNA expression data from human samples. They also queried L1 ORF1p interacting partners in the mouse brain by IP-MS. 

      Strengths: 

      A major strength of the study is the use of two approaches: a deep-learning detection method to distinguish neuronal vs. non-neuronal cells and ORF1p+ cells vs. ORF1p- cells across large-scale images encompassing multiple brain regions mapped by comparison to the Allen Brain Atlas, and confocal imaging to give higher resolution on specific brain regions. These results are also corroborated by Western blotting on six mouse brain regions. Extension of their analysis to post-mortem human samples, to the extent possible, is another strength of the paper. The identification of novel ORF1p interactors in brain is also a strength in that it provides a novel dataset for future studies. 

      We thank the reviewer for these positive comments.

      Weaknesses: 

      The main weakness of the IP-MS portion of the study is that none of the interactors were individually validated or subjected to follow-up analyses. The list of interactors was compared to previously published datasets, but not to ORF1p interactors in any other mouse tissue.

      As we stated in the first round of revision, the previously published datasets do include a mouse dataset of ORF1p-interacting proteins in mouse spermatocytes (please see lines 478-479: “ORF1p interactors found in mouse spermatocytes were also present in our analysis including CNOT10, CNOT11, PRKRA and FXR2 among others (Suppl_Table4).”; De Luca, C., Gupta, A. & Bortvin, A. Retrotransposon LINE-1 bodies in the cytoplasm of piRNA-deficient mouse spermatocytes: Ribonucleoproteins overcoming the integrated stress response. PLoS Genet 19, e1010797 (2023)). We agree that a validation of protein interactors of ORF1p in the mouse brain would have been valuable. However, the significant overlap with previously published interactors highlights the validity of our data. As reviewer #2 notes in the comments on revisions, we hope that follow-up studies will address these points, and we anticipate that this list of ORF1p protein interactors in the mouse brain will be of further use for the community.

      Comments on revisions: 

      The co-staining of Orf1p with Parvalbumin (PV) presented in Supplemental Figure S5 is a welcome addition exploring the cell type-specificity of Orf1p staining, and broadly corroborates the work of Bodea et al. while revealing that Orf1p also is expressed in non-PV+ cells, consistent with L1 activity across a range of neuronal subtypes. The authors also have strengthened their findings regarding the increased intensity of ORF1p staining in aged compared to young animals, and the newly presented results are indeed more convincing. The prospect of increased neuronal L1 activity with age is exciting, and the results in this paper have provided the groundwork for ongoing discoveries in this area. While it is disappointing that no Orf1p interactors were followed up, this is understandable and the data are nonetheless valuable and will likely prove useful to future studies. 

      Thank you for your time and constructive comments.

      Reviewer #1 (Recommendations for the authors): 

      We would recommend that the human RNA-seq analysis is removed from the manuscript. The human RNA data is inconclusive due to the short read length and small sample size. The value of including an inconclusive analysis in the manuscript is difficult to understand. With this data set, the authors cannot investigate age-related changes in L1 expression in human neurons. 

      Reviewer #2 (Recommendations for the authors): 

      Thank you for addressing my suggestions. I have no further recommendations at this time.

    1. Author response:

      Reviewer #1 (Recommendations for the authors):

      “The gar-3 promoter expression pattern was not discussed in the context of rescue experiments.”

      We agree that the expression pattern of the gar-3 promoter used in our rescue experiments should be clarified. We will include a description of the tissues where the 7.5 kb gar-3 promoter fragment is expressed, based on both prior studies and our own expression data. We will also discuss how the gar-3 cell and tissue expression pattern relates to both our analysis of gar-3 expression in the genome edited strain we generated as well as the observed rescue effects.

      Reviewer #2 (Recommendations for the authors):

      (1) The site of action of cholinergic signaling was not adequately explored.

      We plan to perform additional rescue experiments using heterologous promoters to drive gar-3 expression in specific tissues (e.g. cholinergic neurons, muscle). These experiments will help clarify the sufficiency of unc-17 expression in specific cell types for rescue. However, we point out that cell-specific unc-17 knockdown by RNAi using the unc-17b promoter (expression largely restricted to ventral cord ACh motor neurons) increases sensitivity to PQ in our long-term survival assays. Combined with our analysis of unc-17(e113) mutants, we believe our data offer robust support of a requirement for unc-17 expression in cholinergic motor neurons.

      (2) Pan-neuronal silencing experiments were not connected to ACh/GAR-3 signaling.

      We will expand our discussion to relate the pan-neuronal silencing results to our analysis of ACh signaling. We used the pan-neuronal silencing to motivate further analysis of various neurotransmitter systems. We note that our studies implicate both glutamatergic and cholinergic systems in protective responses to oxidative stress. The effects of silencing on survival during long-term PQ exposure may therefore be derived solely from cholinergic neurons, glutamatergic neurons, or a combination of both neuronal populations. We hope the reviewer will agree that distinguishing between these possibilities may be quite complicated and is not central to the main message of our paper. We therefore suggest this additional analysis lies outside the scope of this revision.

      (3) Inter-tissue signaling and transcriptional regulation by ACh were assumed but not directly shown.

      We will generate GFP reporters for a subset of genes (including proteasomal genes) identified in our RNA-seq analysis or assess their expression by quantitative RT-PCR to validate cholinergic regulation. These experiments will help to identify target tissues and confirm transcriptional regulation by cholinergic signaling.

      We appreciate the opportunity to revise our manuscript and believe that these additions will significantly strengthen the mechanistic insights and overall impact of our study. Please let us know if further clarification is needed.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      Summary and Strengths:

      The very well-written manuscript by Lövestam et al. from the Scheres/Goedert groups entitled "Twelve phosphomimetic mutations induce the assembly of recombinant full-length human tau into paired helical filaments" demonstrates the in vitro production of the so-called paired helical filament Alzheimer's disease (AD) polymorph fold of tau amyloids through the introduction of 12 point mutations that attempt to mimic the disease-associated hyper-phosphorylation of tau. The presented work is very important because it enables disease-related scientific work, including seeded amyloid replication in cells, to be performed in vitro using recombinant-expressed tau protein. 

      Weaknesses: 

      The following points are asked to be addressed by the authors:

      (i) In the discussion it would be helpful to note the findings that in AD the chemical structure of tau (including phosphorylation) is what defines the polymorph fold and not the buffer/cellular environment. It would be further interesting to discuss these findings with respect to the relationship between disease and structure. The presented findings suggest that due to a cellular/organismal alteration, such as aging or Abeta aggregation, tau is specifically hyper-phosphorylated, which then leads to its aggregation into the paired helical filaments that are associated with AD. 

      We have added an extra sentence to the Introduction to emphasise this possibility: “Besides the cellular environment in which they assemble, different tau folds may also be determined by chemical modifications of tau itself.”

      In addition, the last paragraph of the Discussion now reads: “It could be that, besides different cellular environments in which the filaments assemble, different posttranslational modification patterns are also important for the assembly of tau into protofilament folds that are specific for the other tauopathies.”

      (ii) The conditions used for each assembly reaction are a bit hard to keep track of and somewhat ambiguous. In order to help the reader, I would suggest making a table to show conditions used for each type of assembly (including the diameter / throw of the orbital shaker) and the results (structural/biological) of those conditions. For example, presumably the authors did not have ThT in the samples used for cryo-EM but the methods section does not specify this. Also, the presence of trace NaCl is proposed as a possible cause for the CTE fold to appear in the 0N4R sample (page 4) but no explanation of why this particular sample would have more NaCl than the others. Furthermore, it appears that NaCl was actually used in the seeded assembly reactions that produced the PHF and not the CTE fold. This would seem to indicate the CTE structure of 0N4RPAD12 is not actually induced by NaCl (like it was for tau297-391). In order for the reader to better understand the reproducibility of the polymorphs, it would be helpful to indicate in how many different conditions and how many replicates with new protein preparations each polymorph was observed (could be included in the same table)  

      We have added a new table (Table 1) with the buffer conditions, protein concentration and shaking speed and time, for all structures described in this paper. We never added ThT to assembly reactions that were used for cryo-EM.

      We did not use NaCl in the seeded assembly reactions (we used sodium citrate). We don’t really know why 0N4R PAD12 tau more readily forms the CTE fold. The observation that it does so prompted us to use 0N3R for all ensuing experiments. 

      (iii) It is not clear how the authors calculate the percentage of each filament type. In Figure 1 it is stated "discarded solved particles (coloured) and discarded filaments in grey" which leaves the reviewer wondering what a "discarded solved particle" is and which filaments were discarded. From the main text one guesses that the latter is probably false positives from automated picking but if so, these should not be referred to as filaments. Also, are the percentages calculated for filaments or segments? In any case, it would be more helpful in such a report to know the best estimate of the ratio of identified filament types without confusing the reader with a measure of the quality of the picking algorithm. Please clarify. Also, a clarification is asked for the significance of the varying degrees of PHF and AD monomer filaments in the various assembly conditions. It could be expected that there is significant variability from sample to sample but it would be interesting to know if there has been any attempt to reproduce the samples to measure this variability. If not, it might be worth mentioning so that the % values are taken with an appropriately sized grain of salt. Finally, the representation of the data in Figure 1 would seem to imply that the 0N3R forms less or no monofilament AD fold because no cross-section is shown for this structure, however it is very similar to (or statistically the same as) the 1:1 mix of 0N3R:0N4R.

      In the revised manuscript, we have used bi-hierarchical clustering of filaments, where each segment (or particle) is classified based on both its 2D class assignment and the filament to which it belongs (this method is based on [Pothula et al (2019), Ultramicroscopy 203, 132-138] and was further developed in [Lövestam et al (2024) Nature 7993, 119-125]). Based on the assumption that filament type does not change within a single filament, we have observed that this gives excellent classification results, and that this approach allows classification of many filament types, even small minorities. Using this approach, we now quantify the different filament types based on the number of segments extracted from filaments classified in this way. 
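
      To make the segment-based quantification concrete, here is a minimal sketch. It is not the bi-hierarchical clustering method itself: it substitutes a simple per-filament majority vote for the clustering step, under the same assumption that a filament has a single type along its length. The filament identifiers and labels are hypothetical.

```python
from collections import Counter, defaultdict

def quantify_filament_types(segments):
    """segments: iterable of (filament_id, class_label) pairs, one per segment.

    Each filament is assigned the most common label among its segments
    (a simplified stand-in for clustering), then all of its segments are
    counted towards that type. Returns percentages of segments per type.
    """
    labels_by_filament = defaultdict(list)
    for filament_id, label in segments:
        labels_by_filament[filament_id].append(label)

    segment_counts = Counter()
    for labels in labels_by_filament.values():
        # Ties are resolved by first occurrence; a real analysis
        # would need an explicit rule for ambiguous filaments.
        filament_type, _ = Counter(labels).most_common(1)[0]
        segment_counts[filament_type] += len(labels)

    total = sum(segment_counts.values())
    return {t: 100.0 * n / total for t, n in segment_counts.items()}

# Hypothetical data: three filaments with 3, 2 and 5 segments.
example_segments = (
    [("f1", "PHF"), ("f1", "PHF"), ("f1", "singlet")]
    + [("f2", "singlet")] * 2
    + [("f3", "PHF")] * 5
)
example_percentages = quantify_filament_types(example_segments)
print(example_percentages)
```

      In this toy input, the one segment of filament f1 labelled "singlet" is counted as PHF, because the filament as a whole is assigned to the PHF type.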

      Moreover, we have also addressed the problem of having singlets among the PHF preparation: it turns out that with longer incubation, simply transferring samples out of the shaker after one week and incubating them quiescently at 37 ºC for two more weeks, the singlets disappear and only PHFs remain. Filaments made for the fluorophore labelling in the revised Figure 3 were also prepared using the new protocol. In total, we have N=7 replicates with a mean of 95.3% PHFs and a standard deviation of 9.4%. The revised text in the Results section reads:

      “To further increase the proportions of PHFs-to-singlet ratio, we removed the plate from the shaker after one week and incubated it quiescently at 37 ºC for two more weeks. This resulted in 100% PHFs formed (Figure 1 – figure supplement 4). When repeated seven times, on average 95.3% PHFs formed, with 25% of singlets formed in a single outlier (Figure 1 – figure supplement 5)” 

      (iv) The interpretation of the NMR data on soluble tau that the mutations on the second site are suppressing in part long range dynamic interaction around the aggregation-initiation site (FIA) is sound. It is in particular interesting to find that the mutations have a similar effect as the truncation at residue 391. An additional experiment using solvent PREs to elaborate on the solvent exposed sequence-resolved electrostatic potential and the intra-molecular long range interactions would likely strengthen the interpretation significantly (Iwahara, for example, Yu et al, in JACS 2024). Figure 6D Figure supplement shows the NMR cross peak intensities between tau 151-391 and PAD12tau151-391. Overall the intensities of the PAD12 tau construct are more intense which could be interpreted with less conformational exchange between long range dynamic interactions. There are however several regions which do not show any intensity anymore when compared with the corresponding wildtype construct such as 259-262, 292-294 which should be discussed/explained. 

      While long-range intramolecular interactions of tau have previously been reported through the use of spin labels (Mukrasch et al 2009 PLoS Biol 7(2): e1000034), we have been hesitant to introduce paramagnetic agents into our samples for two reasons. First, the bulky size of the spin label may affect filament formation or influence the dynamic properties of the protein. Second, covalent addition of the spin label requires mutation of the primary sequence to both remove native cysteine residues and add cysteines at the desired label location. We have previously shown that mutation of cysteine 322 to alanine leads to the formation of tau filaments with a structure that is different from the PHF (Santambrogio et al (2025) bioRxiv 2025.03.29.646137). 

      Instead, we have included in the revised manuscript new NMR and cryo-EM data that provide further support for the model that a FIA-like interaction between residues <sub>392</sub>IVYK<sub>395</sub> and residues <sub>306</sub>VQIVYK<sub>311</sub> has an inhibiting effect on filament nucleation in unmodified full-length tau. A mutant of tau297-441 where residues <sub>392</sub>IVYK<sub>395</sub> have been deleted and that does not contain the four PAD12 mutations in the carboxy-terminal domain behaves similarly in the NMR experiment as the tau297-441 construct with those four PAD12 mutations. Moreover, full-length 0N3R tau with the eight PAD12 mutations in the amino-terminal fuzzy coat and with the deletion of <sub>392</sub>IVYK<sub>395</sub>, but without the four PAD12 mutations in the carboxy-terminal domain, assembles readily into amyloid filaments (of which we also solved a cryo-EM structure, see the revised Figure 6B). These observations provide mechanistic insights into the previously proposed paper-clip model [Jeganathan (2008), J Biol Chem 283, 32066-32076], where interactions between the fuzzy coat inhibit filament formation of unmodified full-length tau, and phosphorylation in the fuzzy coat interferes with these interactions, thus leading to filament nucleation. Of course, the identification of residues <sub>392</sub>IVYK<sub>395</sub> for this interaction also explains why truncation of tau at residue 391 leads to spontaneous assembly. We have introduced a new Figure 7 to the revised manuscript to explain this model in more detail. The corresponding new section in the Results reads:

      “To investigate this further, we also tested a tau construct comprising residues tau297-441 without the phosphomimetic mutations, but with a deletion of residues (Δ392-395). Filaments formed rapidly and the cryo-EM structure showed that the ordered core consisted of the amino-terminal part of the construct spanning residues 297-318 (Figure 6B). NMR analysis (Figure 6 – figure supplement 5B) showed that the tau297-441 Δ392-395 construct exhibited similar backbone rigidity properties to the tau297-441 PAD12 construct, despite peak locations and local secondary structural propensities being more similar to the wildtype tau297-441 (Figure 6 – figure supplement 5A; Figure 6 – figure supplement 6). HSQC peak intensities in the 297-319 and 392-404 regions of tau297-441 Δ392-395 (Figure 6A, expanded from Figure 6 - figure supplement 5C) were like those in the tau297-441 PAD12. These data suggest that the IVYK deletion has a similar effect as the phosphomimetics on residues 396, 400, 403 and 404 on disrupting an intra-molecular interaction between the FIA core region and the carboxy-terminal domain, which may therefore be mediated by interactions between the two IVYK motifs that are similar to those observed in the FIA (Lövestam et al, 2024).”

      A new section in the Discussion now reads:

      “Our NMR data provide insights into the mechanism by which phosphorylation in the fuzzy coat of tau, or truncations of tau, lead to the formation of filaments with ordered cores of residues that are themselves not phosphorylated. HSQC peak intensity differences between unmodified tau 297-441, PAD12 tau 297-441 and tau297-391 suggest that phosphorylation of the fuzzy coat, particularly near the <sub>392</sub>IVYK<sub>395</sub> motif in the carboxy-terminal domain, affects the conformation of the residues of tau that become ordered in the FIA (Lövestam et al., 2024). Removal of residues <sub>392</sub>IVYK<sub>395</sub> in the carboxy-terminal domain of tau 297-441 led to rapid filament formation in the absence of phosphomimetics, while HSQC peak intensity differences for this construct indicate similar backbone rigidity compared to tau 297-441 without the deletion, but with the four PAD12 mutations in the carboxy-terminal domain. Combined, these observations support a model where the <sub>392</sub>IVYK<sub>395</sub> motif in unmodified full-length tau monomers interacts with the <sub>308</sub>IVYK<sub>311</sub> motif, thus inhibiting filament formation by preventing the formation of the nucleating species, the FIA. Phosphorylation of nearby residues 396, 400, 403 and 404, or truncation at residue 391, disrupt this interaction and lead to filament formation. This model agrees with the previously proposed hairpin-like model of tau (Jeganathan et al., 2008), although the corresponding interaction between the amino-terminal domain of tau and the core-forming region remains unknown (Figure 7).”

      Due to the challenging nature of the assignment, it was not possible to assign all residues in the HSQC of the tau151-391 and the PAD12 tau151-391 samples, including residues 259-262 and 292-294 for PAD12 tau151-391. To make this clearer, we have marked residues that are not assigned with an asterisk in the revised version of Figure 6 – figure supplement 1.  

      (v) Concerning the Cryo-EM data from the different hyper-phosphorylation mimics, it would seem that the authors could at least comment on the proportion of monofilament and paired-filaments even if they could not solve the structures. Nonetheless, based on their previous publications, one would also expect that they could show whether the non-twisted filaments are likely to have the same structure (by comparing the 2D classes to projections of non-twisted models). Also, it is very interesting to note that the twist could be so strongly controlled by the charge distribution on the non-structured regions (and may be also related to the work by Mezzenga on twist rate and buffer conditions). Is the result reported in Figure 2 a one-off case or was it also reproducible?

      As also indicated in the main text, the assembly conditions for the PAD12+4, PAD12-4 and PAD12+/-4 constructs were kept the same as those for the PAD12 construct. It is possible that further optimisation of the conditions could again lead to twisting filaments, but we chose not to pursue this route. With unlimited resources and time, one could assess in detail which of the PAD12 mutations are required and which ones could be omitted to form PHFs. However, this would require a lot of work and cryo-EM time. For now, we chose to prioritise reporting conditions that do work to reproducibly make PHFs in the laboratory (using the PAD12 construct) and leave the more detailed analysis of other constructs for future studies. 

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript addresses an important impediment in the field of Alzheimer's disease (AD) and tauopathy research by showing that 12 specific phosphomimetic mutations in full-length tau allow the protein to aggregate into fibrils with the AD fold and the fold of chronic traumatic encephalopathy fibrils in vitro. The paper presents comprehensive structural and cell-based seeding data indicating the improvement of their approach over previous in vitro attempts on non-full-length tau constructs. The main weakness of this work results from the fact that only up to 70% of the tau fibrils form the desired fibril polymorphs. In addition, some of the figures are of low quality and confusing. 

      As also explained in our response to reviewer #1, we have performed better quantification of filament types in the revised manuscript, and we have investigated how to get rid of the singlets. In the revised manuscript, we report that singlets disappear as time passes and that one can obtain 100% pure PHFs by quiescently incubating samples for another two weeks, after shaking for a week.

      Strengths: 

      This study provides significant progress towards a very important and timely topic in the amyloid community, namely the in vitro production of tau fibrils found in patients.

      The 12 specific phosphomimetic mutations presented in this work will have an immediate impact in the field since they can be easily reproduced.

      Multiple high-resolution structures support the success of the phosphomimetic mutation approach. Additional data show the seeding efficiency of the resulting fibrils, their reduced tendency to bundle, and their ability to be labeled without affecting core structure or seeding capability.

      Weaknesses: 

      Despite the success of making full-length AD tau fibrils, still ~30% of the fibrils are either not PHF, or not accounted for. A small fraction of the fibrils are single filaments and another ~20% are not accounted for. The authors mention that ~20% of these fibrils were not picked by the automated algorithm. However, it would be important to get additional clarity about these fibrils. Therefore, it would improve the impact of the paper if the authors could manually analyze passed-over particles to see if they are compatible with PHF or fall into a different class of fibrils. In addition, it would be helpful if the authors could comment on what can be done/tried to get the PHF yield closer to 90-100%

      As mentioned above, in the revised manuscript we show that the singlets disappear over time and we now include a description of a method that leads to 100% PHF formation.

      Reviewer #1 (Recommendations for the authors):

      Minor points: 

      (a) In Figure 6 the dashed purple vertical lines overlap with the black bars, rendering a grey color which is confusing because grey bars are used for the shorter construct. It is suggested to improve the colors (remove transparency on the purple?).

      We thank the reviewers for their suggestions for improving the visualisation of our data. We have recoloured the tau297-391 data from grey to gold and moved the dashed lines to the back of the image to remove the apparent colour changes.

      (b) Is there any support for the suggestion that "part of the second microtubule-binding repeat is ordered" being "related to this construct forming filaments with only a single protofilament"? It seemed to have come out of nowhere.

      There is no further support for this statement, but we thought it would be worth hypothesizing about this observation. 

      (c) Figures 1 and 4 E is better described as a "main chain trace" or "backbone trace" although the latter usually refers to only CA positions. Ribbon usually refers to something else in representations of protein structures. 

      This has been changed into “main chain trace” in Figures 1 and 4. 

      (d) Figure 1 Supplement 3: Panel letters in the legend do not match. 

      This has been fixed.

      Reviewer #2 (Recommendations for the authors): 

      The introduction is a bit lengthy (e.g. 3rd paragraph of introduction) and could benefit from focusing on the specific question the manuscript addresses.

      We have shortened the Introduction. It now contains ~1150 words, which we hope provides a better compromise between length and sufficient background information.

      Figure captions are generally not helpful in conveying a message to the reader.

      Figure 1 - figure supplement 3 is quite confusing. The 4 structures in A) do not correspond to the grids in B-E. What is this figure supposed to show?

      This confusion was probably the result of incorrect labelling of panels in the legend, which was also pointed out by reviewer #1. This has been fixed in the revised manuscript.

      Page 11: Although I know what you mean, 'linear increase of ThT fluorescence' is not the correct term. 

      We have replaced “linear” with “rapid”.

      Page 15: Although line shape and peak intensity can be related you are not reporting on line shape or width but simply on peak intensity. Therefore, I wouldn't talk about the result of a 'line shape analysis'.

      We have changed the wording accordingly. 

      Figure 6 (and supplement 1) are confusing and too small to be readable in print. It might be sufficient to show the CSP and upload the remaining data to the BMRB. 

      We have made a clearer version of the main NMR Figure 6 in the revised manuscript showing the most pertinent NMR data and have moved the previous version into the figure supplements. We designed these figures to be viewed as full page A4 panels, ideally seen in one image as they show multiple comparisons of different experiments and constructs.

      As such we feel these will be best viewed on screen as part of the eLife web document. We have uploaded HSQC spectra and assignments to the BMRB (see below).

      Figure 6 supplement 3 might benefit from pointing out key residues in the overlay.

      We have added the labels (this is now Figure 6 supplement 4).

      Data availability: Please upload the assignments to the BMRB together with key spectra (e.g. HSQCs). 

      We have uploaded HSQC data along with our assignments to the BMRB, the accession codes are 52694 – tau297-441 wt; 52695 – tau297-441 PAD-12; 52696 – tau151-391 wt; 52697 – tau151-391 PAD-12; and 53230 – tau297-441 delta392-395.  These accession codes have been added to the manuscript. 

      The quality of some of the figures (specifically Figure 1 - supplement 3 and Figure 6) is not suitable for publication. 

      For the original submission to bioRxiv, we produced a single PDF with a manageable file size. We will liaise with the eLife staff to ensure the images used in the version of record will be suitable for publication.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The topic of tumor-immune co-evolution is an important, understudied topic with, as the authors  noted, a general dearth of good models in this space. The authors have made important progress on the topic by introducing a stochastic branching process model of antigenicity/immunogenicity and measuring the proportion of simulated tumors that go extinct. The model is extensively explored, and the authors provide some nice theoretical results in addition to simulated results. 

      We thank the reviewer for the positive comments on our work.

      Major comments 

      The text in lines 183-191 is intuitively and nicely explained. However, I am not sure all of it follows from the figure panels in Figure 2. For example, the authors refer to a mutation that has a large immunogenicity, but it's not shown how many mutations, or the relative size of the mutations in Figure 2. The same comment holds true for the claim that spikes also arise for mutations with low antigenicity. 

      We thank the reviewer for helping us to further specify this statement in our original submission. We have now added Muller plots in a new Appendix Figure (Figure A3) presenting the relative abundances of different types of effector cells in the population over time. Each effector type is colour-coded by its antigenicity and immunogenicity. To align with this Appendix Figure (Figure A3), we also updated Figure 2, which is now generated from the same realisation as Figure A3. We can now see clearly that the spikes in the mean values of the antigenicity and immunogenicity over the whole effector population in the new Figure 2B and 2D indeed correspond to the expansion of one or several antigenic mutations recruiting the specific effector cell types. For example, in Figure 2B, we can see that the spikes of low average antigenicity and high immunogenicity (around time 11) happen at the same time as an effector type in Figure A3 with such a trait (coloured in green) arises and takes over the population. We have rewritten the related part of our Results section (Lines 192-222 in the main text and Appendix A6).

      Reviewer #2 (Public review): 

      Summary: 

      In this work, the authors developed a model of tumour-immune dynamics, incorporating stochastic antigenic mutation accumulation and escape within the cancer cell population. They then used this  model to investigate how tumour-immune interactions influence tumour outcome and summary  statistics of sequencing data. 

      Strengths: 

      This novel modeling framework addresses an important and timely topic. The authors consider the useful question of how bulk and single-cell sequencing may provide insights into the tumour-immune interactions and selection processes.

      We thank the reviewer for the positive comments.

      Weaknesses: 

      One set of conclusions presented in the paper is the presence of cyclic dynamics between effector/cancer cells, antigenicity, and immunogenicity. However, these conclusions are supported in the manuscript by two sample trajectories of stochastic simulations, and these provide mixed support for the conclusions (i.e. the phasing asynchrony described in the text does not seem to apply to Figure 2C). 

      We have now developed a method to quantify the cyclic dynamics in our system (Appendix A7), in which we track the directional changes in the phase portrait of the abundances of the cancer and effector cells. We first tested this method in a non-evolving stochastic predator-prey system, where it correctly captures the number of cycles (Figure A7). We then used this method to quantify the number of cycles we observed between cancer and effector cells under different mutation rates (Figure A5), as well as whether they are counter-clockwise or clockwise cycles (Figure A6). Our results show that cyclic dynamics are observed more often when mutation rates are higher, and that the majority of those cycles are counter-clockwise. When the mutation rate is high, we observe an increase in clockwise cycles, which have been observed in predator-prey systems and explained through coevolution. However, even under high mutation rates, counter-clockwise cycles are still the more frequent type.
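
      To illustrate what counting cycles and their direction in a phase portrait can look like, the sketch below accumulates the signed angle swept by a two-dimensional trajectory around its centroid. This is only one possible approach under our own assumptions and is not the procedure of Appendix A7: the function name `count_cycles`, the use of the centroid as the reference point, and the axis assignment (cancer abundance on x, effector abundance on y) are all illustrative choices.

```python
import math


def count_cycles(x, y):
    """Count (counter-clockwise, clockwise) loops of a 2D trajectory.

    Assumption: the trajectory's centroid is used as the reference point,
    and one loop is counted each time the accumulated signed angle around
    that point reaches a full turn in either direction.
    """
    cx = sum(x) / len(x)
    cy = sum(y) / len(y)
    angles = [math.atan2(yi - cy, xi - cx) for xi, yi in zip(x, y)]

    ccw = cw = 0
    winding = 0.0
    for prev, curr in zip(angles, angles[1:]):
        # Wrap the increment into [-pi, pi) so that crossing the branch cut
        # of atan2 is not mistaken for an almost complete turn.
        delta = (curr - prev + math.pi) % (2 * math.pi) - math.pi
        winding += delta
        if winding >= 2 * math.pi:
            ccw += 1
            winding -= 2 * math.pi
        elif winding <= -2 * math.pi:
            cw += 1
            winding += 2 * math.pi
    return ccw, cw


# Synthetic check: 3.25 counter-clockwise turns around the unit circle.
ts = [0.02 * math.pi * k for k in range(326)]
xs = [math.cos(t) for t in ts]
ys = [math.sin(t) for t in ts]
print(count_cycles(xs, ys))
```

      For noisy stochastic trajectories, the time series would likely need smoothing first, and a different reference point (for example a deterministic fixed point) may be more appropriate than the centroid.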

      In our simulations, we rarely observed out-of-phase cycles; the one in our original Figure 2 was present by chance. We have now removed the statement about out-of-phase cycles and replaced it with a more systematic analysis of the cyclic dynamics, as described above (Lines 192 to 207 in the revised version). We thank the reviewer for the constructive comment, which motivated us to improve our analysis significantly.

      Similarly, the authors also find immune selection effects on the shape of the mutational burden in Figure 5 D/H using a qualitative comparison between the distributions and theoretical predictions in  the absence of immune response. However the discrepancy appears quite small in panel D, and  there are no quantitative comparisons provided to evaluate the significance. An analysis of the robustness of all the conclusions to parameter variation is missing. 

      We have now added a statistical analysis using the Wasserstein distance between the simulated mutation burden distribution and the theoretical (neutral) expectation in Figure 5 C, D, G, H, as well as in Figure A11 C&D when there is no cancer-immune interaction. We can see that the measured Wasserstein distances agree with our statement that higher immune effectiveness leads to a larger deviation from the neutral expectation.
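
      For readers unfamiliar with the metric, the first Wasserstein distance between two one-dimensional empirical distributions equals the area between their cumulative distribution functions. The sketch below computes it in plain Python; the function name `wasserstein_1d` and the example numbers are made up for illustration and are not data or code from the manuscript. In practice, `scipy.stats.wasserstein_distance` provides an equivalent computation for two samples.

```python
from collections import Counter


def wasserstein_1d(sample_a, sample_b):
    """First Wasserstein (earth mover's) distance between two empirical
    1D distributions, computed as the area between their step-function CDFs."""
    count_a, count_b = Counter(sample_a), Counter(sample_b)
    support = sorted(set(count_a) | set(count_b))
    n_a, n_b = len(sample_a), len(sample_b)

    cdf_a = cdf_b = 0.0
    distance = 0.0
    for left, right in zip(support, support[1:]):
        # Both CDFs are constant on [left, right), so the area over this
        # interval is |CDF difference| times the interval width.
        cdf_a += count_a[left] / n_a
        cdf_b += count_b[left] / n_b
        distance += abs(cdf_a - cdf_b) * (right - left)
    return distance


# Hypothetical mutation burdens per cell (illustrative values only).
simulated = [12, 15, 15, 18, 22, 30]
neutral = [14, 16, 17, 19, 20, 21]
print(wasserstein_1d(simulated, neutral))
```

      A distance of zero means the two empirical distributions coincide; larger values indicate that more probability mass has to be moved further to turn one distribution into the other.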

      Lastly, the role of the Appendix results in the main messages of the paper is unclear. 

      We agree with the reviewer and have now removed the Appendix section “Deterministic Analysis”.

      Reviewing Editor Comments: 

      I find the abstract too long. For example, "Knowledge of this coevolutionary system and the selection taking place within it can help us understand tumour-immune dynamics both during tumorigenesis but also when treatments such as immunotherapies are applied." can be shortened to: "Knowledge of this coevolutionary system can help us understand tumour-immune dynamics both during tumorigenesis and during immunotherapy treatments." 

      We agree and have taken the suggestion of the reviewer to shorten our abstract.

      Reviewer #1 (Recommendations for the authors): 

      The discussion at lines 134-140, centered around Figure A1, is an important and nicely constructed feature of the model. 

      Reviewer #2 (Recommendations for the authors): 

      I suggest that the authors conduct a more in-depth analysis of their conclusions on cyclic dynamics over a large set of sample paths.

      Done and please see our detailed response to the reviewer 2 above.

      In addition, statistical comparisons between the observed mutational burden distribution and  theoretical predictions in the absence of immune selection should be carried out to support their conclusions. In all cases, conclusions should be tested extensively for robustness/sensitivity to parameters. 

      Done and please see our detailed response to the reviewer 2 above.

      Here are some specific suggestions/comments: 

      (1) Please provide a precise mathematical description of the model to complement Figure 1. 

      We have significantly revised our “Model” section to provide a precise mathematical description of our model (Line 138 - 148). Please also see our document showing the difference between the revised version and original submission.

      (2) Section on "Interactions dictate outcome of tumour progress" and Figure 3: please define 'tumour outcome' - are the heatmaps produced in Figure 3 tumor size reflecting whether or not the population has reached level K before a particular time? Also, I do not see a definition for the 'slow-growing' tumour proportion plotted in Figure 3CF or in the accompanying text.

      We have now added the definition of “tumour outcome” in our “Model” section (line 171 to 176), where we explain our model parameters and quantities measured in the following “Results” section.

      (3) Figure 5C/G: the green dotted vertical line is difficult to see. 

      We have now changed the mean of the simulations to solid red lines instead of using the green dotted vertical lines previously.

      (4) Appendix A1 text under (A2) should U/N be U/C? N does not appear to be defined. 

      We have now removed the previous A1 section. Please see our response to reviewer 2 as well.

      (5) Text under (A5): it is unclear what is meant by "SFS must be heavy tailed (that is, more heterogeneous)" -- a more precise statement regarding tail decay rate and associated consequences would be more helpful. 

      We have now removed the previous A section, which contained the original text "...SFS must be heavy-tailed".

      (6) Section A4 and Figure A1: can these calculations be compared to simulations? 

      We have now removed the previous A section on the deterministic analysis, as it is indeed not very relevant to our stochastic simulations. Please see our response to reviewer 2 as well.

      (7) Also, in general, please clarify how the results in the Appendix are used in the main text conclusions or provide insights relevant to these conclusions. If they are not, one can consider removing them.  

      We have now removed the previous A section on the deterministic analysis. The remaining sections are about stochastic simulations and extended figures which support our main figures.

      (8) Figure A2: the two lines are difficult to tell apart on each panel. Please consider different styles.

      We have changed one of the dotted lines to be solid. This figure is now Figure A1 in our revision.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      I am not currently convinced by the principal interpretations and think that other explanations based on known phenomena could account for key results. Specifically the authors have not resolved whether oxidative modification to 5mC and 3mC, or chemical attack to ssDNA that is transiently exposed in the repair processing of 5mC and 3mC is the principal source of the observed genotoxicity.

      (1) Original query which still stands: As noted in the manuscript, AlkB repairs alkylation damage by direct reversal (DNA strands are not cut). In the absence of AlkB, repair of alkylation damage/modification is likely through BER or other processes involving strand excision and resulting in single stranded DNA. It has previously been shown that 3mC modification from MMS exposure is highly specific to single stranded DNA (PMID:20663718) occurring at ~20,000 times the rate as double stranded DNA. Consequently the introduction of DNMTs is expected to introduce many methylation adducts genome-wide that will generate single stranded DNA tracts when repaired in an AlkB deficient background (but not in an AlkB WT background), which are then hyper-susceptible to attack by MMS. Such ssDNA tracts are also vulnerable to generating double strand breaks, especially when they contain DNA polymerase stalling adducts such as 3mC. The generation of ssDNA during repair is similarly expected to follow the H2O2 or TET based conversion of 5mC to 5hmC or 5fC neither of which can be directly repaired and depend on single strand excision for their removal. The potential importance of ssDNA generation in the experiments has not been [adequately] considered.

      We thank the reviewer for expanding on their previous comment.  We completely agree with the possibility that they raise and have added an extra paragraph in the discussion to expand on our consideration of the role of ssDNA in DNMT-induced DNA damage, which we reproduce here:

      "The observation that TET overexpression sensitizes cells expressing DNMTs to oxidative stress strongly suggests that the site of DNA damage is the modified cytosine itself.  However, we do not currently have definitive evidence supporting this.  As mentioned in the results section, the presence of unrepaired 3mC may lead to increased levels of ssDNA; it is also possible that 5mC itself may increase ssDNA levels.  Loss of alkB would be expected to increase the amount of ssDNA.  Thus DNA damage surrounding modification sites, but not specifically localised to it, might be the cause of the increased sensitivity.  These two different models make different predictions.  If modified cytosines are the source of the damage, mutations arising would be predominantly located at CG dinucleotides.  Alternatively, ssDNA exposure would result in distributed mutations that would not necessarily be located at CG sites.  The highly biased spectrum of mutations that can be screened through the Rif resistance assay does not allow us to address this currently.  However, future experiments to create mutation accumulation lines could allow us to address the question systematically on a genome-wide level. "

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The authors demonstrate that female Spodoptera littoralis moths prefer to oviposit on well-watered tomato plants and avoid drought-stressed plants. The study then recorded the sounds produced by drought-stressed plants and found that they produce 30 ultrasonic clicks per minute. Thereafter, the authors tested the response of female S. littoralis moths to clicks with a frequency of 60 clicks per minute in an arena with and without plants and in an arena setting with two healthy plants of which one was associated with 60 clicks per minute. These experiments revealed that in the absence of a plant, the moths preferred to lay eggs on the side of the arena in which the clicks could be heard, while in the presence of a plant the S. littoralis females preferred to oviposit on the plant where the clicks were not audible. In addition, the authors also tested the response of S. littoralis females in which the tympanic membrane had been pierced making the moths unable to detect the click sounds. As hypothesised, these females placed their eggs equally on both sides of the arena.

      Finally, the authors explored whether the female oviposition choice might be influenced by the courtship calls of S. littoralis males which emit clicks in a range similar to a drought-stressed tomato plant. However, no effect was found of the clicks from ten males on the oviposition behaviour of the female moths, indicating that the females can distinguish between the two types of clicks. Besides these different experiments, the authors also investigated the distribution of egg clusters within a longer arena without a plant, but with a sugar-water feeder. Here it was found that the egg clusters were mostly aggregated around the feeder and the speaker producing 60 clicks per minute. Lastly, video tracking was used to observe the behaviour of the area without a plant, which demonstrated that the moths gradually spent more time at the arena side with the click sounds.

      We thank the reviewers for their helpful comments. We agree with the summary, but would like to note that in the control experiment (Figure 2) we used a click rate of 30 clicks per minute—a design choice driven by the editor’s feedback. We have clarified this and, to further probe the system’s dynamics, added a second experiment employing the same click rate (30 clicks per minute) with a dehydrated plant (see details below). In both experiments, females again showed a clear tendency to oviposit nearer the speaker; these findings are described in the updated manuscript.

      (2) The study addresses a very interesting question by asking whether female moths incorporate plant acoustic signals into their oviposition choice. Unfortunately, I find it very difficult to judge how big the influence of the sound on the female choice really is as the manuscript does not provide any graphs showing the real numbers of eggs laid on the different plants, but instead only provides graphs with the Bayesian model fittings for each of the experiments. In addition, the numbers given in the text seem to be relatively similar with large variations e.g. Figure 1B3: 1.8 ± 1.6 vs. 1.1 ± 1.0. Furthermore, the authors do not provide access to any of the raw data or scripts of this study, which also makes it difficult to assess the potential impact of this study. Hence, I would very much like to encourage the authors to provide figures showing the measured values as boxplots including the individual data points, especially in Figure 1, and to provide access to all the raw data underlying the figures.

      We acknowledge that there are researchers who favor Bayesian graphical representation versus raw data visualization. Therefore, we have added chartplots of the raw data from Figure 1 in the supplementary section. We are aware of the duplication in presentation and apologize for this redundancy.  

      Regarding the variance and means we obtained in our experiment, we have analyzed all raw data using the statistical model presented, and if statistical significance was found despite a particular mean difference or variance, this is meaningful from a biological perspective. One can certainly discuss whether this difference has biological importance, but it should be remembered that in this experimental system, we are trying to isolate the acoustic signal from a complex system that includes multiple signals. Therefore, at no point have we suggested that this is a standalone factor, but rather proposed it as an informative and significant component.

      In addition to the experiments described above, we conducted an experiment in which we counted both eggs and clusters. The results indicate that cluster counts are a reliable proxy for reproductive investment at a given location. In this experiment, we present cluster numbers alongside egg counts (Figure 2).

      Furthermore, we apologize for the technical error that prevented our uploaded data files from reaching the reviewers. We have also uploaded updated data and code.

      (3) Regarding the analysis of the results, I am also not entirely convinced that each night can be taken as an independent egg-laying event, as the amount of eggs and the place where the eggs are laid by a female moth surely depends on the previous oviposition events. While I must admit that I am not a statistician, I would suggest, from a biological point of view, that each group of moths should be treated as a replicate and not each night. I would therefore also suggest to rather analyse the sum of eggs laid over the different consecutive nights than taking the eggs laid in each night as an independent data point.

      We thank the reviewer for this question. This is a valid point that we will address in three parts:

      First, regarding our statistical approach, we used a model that takes into account the sequence of nights and examines whether there is an effect of the order of nights, i.e., we used GLMMs, with the night nested within the repetition. This is equivalent to treating the data as repeated measures and is, to the best of our knowledge, the common way to treat such data.

      Second, following the reviewer's comment, we also reran the statistics of the third experiment (i.e., “sound gradient experiments”, Figure 2 and Supplementary figure 4) when only taking the first night when the female/s laid eggs to avoid the concern of dependency. This analysis revealed the same result – i.e., a significant preference for the sound stimulus. We have now updated our methods and results section to clarify this point.  
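
      The first-night restriction described above can be sketched as a simple filtering step. The record layout below (keys `replicate`, `night`, `clusters_sound`, `clusters_silent`), the function name `first_laying_night`, and the example counts are hypothetical and only illustrate the selection logic; they are not taken from the uploaded data or code.

```python
def first_laying_night(records):
    """For each replicate, keep only the earliest night on which at least
    one egg cluster was laid on either side of the arena.

    `records` is an iterable of dicts with hypothetical keys:
    'replicate', 'night', 'clusters_sound', 'clusters_silent'.
    """
    first = {}
    for rec in sorted(records, key=lambda r: (r["replicate"], r["night"])):
        laid = rec["clusters_sound"] + rec["clusters_silent"]
        if laid > 0 and rec["replicate"] not in first:
            first[rec["replicate"]] = rec
    return list(first.values())


# Illustrative records only (made-up counts).
records = [
    {"replicate": 1, "night": 1, "clusters_sound": 0, "clusters_silent": 0},
    {"replicate": 1, "night": 2, "clusters_sound": 3, "clusters_silent": 1},
    {"replicate": 1, "night": 3, "clusters_sound": 2, "clusters_silent": 2},
    {"replicate": 2, "night": 1, "clusters_sound": 1, "clusters_silent": 0},
]
for rec in first_laying_night(records):
    print(rec["replicate"], rec["night"])
```

      Restricting the analysis to one night per replicate removes the dependence between consecutive nights, at the cost of discarding the later observations.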

      Third, an important detail that may not have been clearly specified in the methods: at the end of each night, we cleaned the arena of counted egg clusters using a cloth with ethanol, so that on the subsequent night we would not expect any physical evidence of previous oviposition to remain, although this does not exclude some form of physiological or cognitive memory. We have now updated our methods section to clarify this important procedural point.

      (4) Furthermore, it did not become entirely clear to me why a click frequency of 60 clicks per minute was used for most experiments, while the plants only produce clicks at a range of 30 clicks per minute. Independent of the ecological relevance of these sound signals, it would be nice if the authors could provide a reason for using this frequency range. Besides this, I was also wondering about the argument that groups of plants might still produce clicks in the range of 60 clicks per minute and that the authors' tests might therefore still be reasonable. I would agree with this, but only in the case that a group of plants with these sounds would be tested. Offering the choice between two single plants while providing the sound from a group of plants is in my view not the most ecologically reasonable choice. It would be great if the authors could modify the argument in the discussion section accordingly and further explore the relevance of different frequencies and dB levels.

      This is an excellent point. We originally increased the click rate to generate a strong signal. However, it was important for us to verify that there was ecological relevance in the stimulus we implemented in the system. For this purpose, we recorded a group of dehydrated plants at a distance of ~20 cm and we measured a click rate of 20 clicks per minute (i.e., 0.33 Hz) (see Methods section). Therefore, as mentioned at the beginning of this letter, in the additional experiment described in Figure 2, we reduced the click frequency to 30 clicks per minute, and at this lower rate, the effect was maintained. Increasing plant density would probably lead to a higher rate of 30 clicks per minute.

      (5) Finally, I was wondering how transferable the findings are towards insects and Lepidopterans in general. Not all insects possess a tympanic organ and might therefore not be able to detect the plant clicks that were recorded. Moreover, I would imagine that generalist herbivores like Spodoptera might be more inclined to use these clicks than specialists, which very much rely on certain chemical cues to find their host plants. It would be great if the authors would point more to the fact that your study only investigated a single moth species and that the results might therefore only hold true for S. littoralis and closely related species, but not necessarily for other moth species such as Sphingidae or even butterflies.

      Good point. Our research uses a specific model system of one moth species and one plant species in a particular plant-insect interaction where females select host plants for their offspring. As with any model-based research that attempts to draw broader conclusions, we've taken care to distinguish between our direct findings and potential wider implications. We believe our system may represent mechanisms relevant to a wider group of herbivorous insects with hearing capabilities, particularly considering that several moth families and other insect orders can detect ultrasound. However, additional research examining more moth and plant species is necessary to determine how broadly applicable these findings are. We have made these clarifications in the text.

      Reviewer #2 (Public review):

      (6) The results are intriguing, and I think the experiments are very well designed. However, if female moths use the sounds emitted by dehydrated plants as cues to decide where to oviposit, the hypothesis would predict that they would avoid such sounds. The discussion mentions the possibility of a multi-modal moth decision-making process to explain these contradictory results, and I also believe this is a strong possibility. However, since this remains speculative, careful consideration is needed regarding how to interpret the findings based solely on the direct results presented in the results section.  

      Thank you for this insightful observation. We agree that the apparent attraction of females to dehydrated-plant sounds contradicts our initial prediction. Having observed this pattern consistently across multiple setups, we have now added a targeted choice experiment to the revised manuscript: here female moths were offered a choice between dehydrated plants broadcasting their natural ultrasonic emissions and a control. These results, detailed in the Discussion and presented in full in the Supplementary Materials (Supplementary Figure 4), show that when only a dehydrated plant is available, moths would prefer it for oviposition, supporting our hypothesis that in the absence of a real plant, the plant’s sounds might represent a plant.

      (7) Additionally, the final results describing differences in olfactory responses to drying and hydrated plants are included, but the corresponding figures are placed in the supplementary materials. Given this, I would suggest reconsidering how to best present the hypotheses and clarify the overarching message of the results. This might involve reordering the results or re-evaluating which data should appear in the main text versus the supplementary materials

      Thank you for this suggestion. We have reorganized the manuscript and removed the olfactory response data from the current version to maintain a focused narrative on acoustic cues. We agree that a detailed investigation of multimodal interactions deserves a separate study, which we plan to pursue in future work. 

      (8) There were also areas where more detailed explanations of the experimental methods would be beneficial.

      Thank you for highlighting this point. We have expanded and clarified the Methods section to provide comprehensive detail on our experimental procedures.

      Reviewer #1 (Recommendations for the authors):

      (9) Line 1: Please include the name of the species you tested also in the title as your results might not hold true for all moth species.

      We do not fully agree with this comment. Please see comment 5.

      (10) Line 19-20: Please rephrase the sentence so that it becomes clear that the "dehydration stress" refers to the plant and not to the moths.

      Thank you for the suggestion; we have clarified the text accordingly.

      (11) Line 31: Male moths might provide many different signals to the females, maybe better "male sound signals" or similar.

      Thank you for the suggestion; we have clarified the text accordingly.

      (12) Line 52-53: Maybe mention here that not all moth species have evolved these abilities.

      Thank you for the suggestion; we have clarified the text accordingly.

      (13) Line 77: add a space after 38.

      Thank you for the suggestion; we have clarified the text accordingly.

      (14) Line 88: Maybe change "secondary predators" to "natural enemies".

      Thank you for the suggestion; we have clarified the text accordingly.

      (15) Line 134: Why is "notably" in italics? I would suggest using normal spelling/formatting rules here.

      Thank you for the suggestion; we have clarified the text accordingly.

      (16) Line 140-144: If you did perform the experiment also with the more ecologically relevant playback rate, why not present these findings as your main results and use the data with the higher playback frequency as additional support?

      Thank you for this suggestion. We agree that the ecologically relevant playback data are important, as described in detail at the beginning of this letter and in comment 4. However, to preserve a clear and cohesive narrative, we have maintained the original ordering of this section. The experiments in Figure 1 differ in several components from those in Figure 2 and from the work that examined sounds in plant groups in the appendices. Therefore, we find it more appropriate to use them as supporting evidence for the main findings rather than creating a comparison between different experimental systems. For this reason, we chose to keep them as a separate description. The ecological playback findings (Lines 140–144) remain fully described in the Results and serve to reinforce the main observations without interrupting the manuscript's flow.

      (17) Line 146: Please explain already here how you deafened the moths.

      Thank you for the suggestion; we have clarified the text accordingly.

      (18) Line 181: should it be "male moths' " ?

      Thank you for the suggestion; we have clarified the text accordingly.

      (19) Line 215: Why is "without a plant" in italics? I would suggest using normal spelling/formatting rules here.

      Thank you for the suggestion; we have clarified the text accordingly.

      (20) Line 234: I do not understand why this type of statistic was used to analyse the electroantennogram (EAG) results. Would a rather simple Student's t-test or a Wilcoxon rank sum test not have been sufficient? I would also like to caution you not to overinterpret the data derived from the EAG: because you combined the entire headspace into one mixture, it is no longer possible to derive information on the different volatiles in the blends. The differences you observe might therefore mostly be due to the amount of emitted volatiles.

      We have reorganized the manuscript and removed the olfactory response data from the current version to maintain a focused narrative on acoustic cues (See comment 7). 

      (21) Line 268: It might be nice to add an additional reference here referring to the multimodal oviposition behaviour of the moths.

      Thank you for the suggestion; we have clarified the text accordingly.

      (22) Line 284: If possible, please add another reference here referring to the different cues used by moths during oviposition.

      Thank you for the suggestion; we have clarified the text accordingly.

      (23) Line 336: What do you mean by "closed together"?

      Thank you for the suggestion; we have clarified the text accordingly.

      (24) Line 434-436: Please see my overall comments. I do not think that you can call it ecologically relevant if the signal emitted by multiple plants is played in the context of just a single plant.

      Please see comments 1 and 4.

      (25) Line 496: Please change "stats" to statistics.

      Thank you for the suggestion; we have clarified the text accordingly.

      (26) Line 522-524: I am not sure whether simply listing their names does give full credit to the work these people did for your study. Maybe also explain how they contributed to your work.

      Thank you for the suggestion; we have clarified the text accordingly.

      Reviewer #2 (Recommendations for the authors):

      (27) L54 20-60kHz --> 20Hz-60kHz or 20kHz - 60kHz?

      OK. We have replaced it.

      (28) L124 Are the results for the condition where nothing was placed and the condition where a decoy silent resistor was placed combined in the analysis? If so, were there no significant differences between the two conditions? Comparing these with a condition presenting band-limited noise in the same frequency range as the drought-stressed sounds might also have been an effective approach to further isolate the specific role of the ultrasonic emissions.

      We used both conditions due to technical constraints and pooled them together for analysis. Statistical tests confirmed no significant differences between them, and this clarification, including the results of the statistical test, has now been added to the Methods section.

      (29) L125 (Fig. 1A), see Exp. 1 in the Methods). -> (Fig.1B. See Exp.1 in the Methods).

      Thank you for the suggestion; we have clarified the text accordingly.

      (30) L132 "The opposite choice to what was seen in the initial experiment (Fig.1B)"

      Thank you for the suggestion; we have clarified the text accordingly.

      (31) L137-143 If you are writing about results, why not describe them with figures and statistics? The current description reads like a discussion.

      These findings were not among our primary research questions; however, we believe that including them in the Results section underscores the experimental differences. In our opinion, introducing an additional figure or expanding the statistical analysis at this point would disrupt the narrative flow and risk confusing the reader.

      (32) L141 "This is higher than the rate reported for a single young plant" Are you referring to the tomato plants used in the experiments? It might be helpful to include in the main text the natural click rate emitted by tomato plants, as this information is currently only mentioned in the Methods section.

      See comment 4.  

      (33) L191 Is the main point here to convey that the plant playback effect remained significant even when the sound presentation frequency was reduced to 30 clicks per minute? The inclusion of the feeder element, however, seems to complicate the message. To simplify the results, moving the content from lines 185-202 to the supplementary materials might be a better approach. Additionally, what is the rationale for placing the sugar solution in the arena? Is it to maintain the moths' vitality during the experiment? Clarifying this in the methods section would help provide context for this experimental detail.

      In this series of experiments, we manipulated four variables—single moths, ultrasonic click rate, arena configuration (from a two-choice design to an elongated enclosure), and the response metric (total egg counts rather than cluster counts)—to evaluate moth oviposition under more ecologically realistic conditions. We demonstrate the system’s robustness and validity in a more realistic setting (by tracking individual moths, counting single eggs, etc.).  

      As noted in the text, feeders were included to preserve the moths’ natural behavior and vitality. We have further clarified this in the revised manuscript.

      (34) L215 Is the click presentation frequency 30 or 60 per minute? Since Figure 3 illustrates examples of moth movement from the experiment described in Figure 1, it might be more effective to present Figure 3 when discussing the results of Figure 1 or to include it in the supplementary materials for better clarity and organization.

      See comments 1 and 4. As mentioned in the above 

      (35) L291 Please provide a detailed explanation of the experiments and measurements for the results shown in Figure S3 (and Figure S2). If the multi-modal hypothesis discussed in the study is a key focus, it might be better to include these results in the main results section rather than in the supplementary materials.

      Thank you for this suggestion. Figure S2 was removed (see comments above). We have now added context for Figure S3.

      (36) L303 It might be helpful to include information about the relationship between the moth species used in this study and tomato plants somewhere in the text. This would provide an important context for understanding the ecological relevance of the experiments.

      Thank you for the suggestion; we have clarified the text accordingly.

      (37) Table 1 The significant figures in the numbers presented in the tables should be consistent.

      Thank you for the suggestion; we have clarified the text accordingly.

      (38) L341 The text mentions that experiments were conducted in a greenhouse, but does this mean the arena was placed inside the greenhouse? Also, the term "arena" is used - does this refer to a sealed rectangular case or something similar? For the sound presentation experiments, it seems that the arena cage was placed inside a soundproof room. If the arena is indeed a case-like structure, were there any specific measures taken to prevent sound scattering within the case, such as the choice of materials or structural modifications?

      Here, “arena” refers to the plastic boxes used throughout this study. In this particular experiment, we presented plants alone—reflecting ongoing debate in the literature—and used these trials as a baseline for our subsequent sound-presentation experiments, during which we measured sound intensity as described in the Methods section. All sound-playback experiments were conducted in sound-proof rooms, and acoustic levels were measured beforehand—sound on the control side fell below our system’s detection threshold. 

      (39) L373 "resister similar to the speaker" Could you explain it in more detail? I think this would depend on the type of speaker used-particularly whether it includes magnets. From an experimental perspective, presenting different sounds such as white noise from the speaker might have been a better control. Was there a specific reason for not doing so? Additionally, the study does not clearly demonstrate whether the electric and magnetic field environments on both sides of the arena were appropriately controlled. Without this information, it is difficult to evaluate whether using a resistor as a substitute was adequate.

      Thank you for this comment. We have now addressed this point in the Discussion. We acknowledge that we did not account for the magnetic field, which might have differed between the speaker and the resistor. We agree that using an alternative control, such as white noise, could have been informative, and we now mention this as a limitation in the revised Methods.

      (40) L435 60Hz? The representation of frequencies in the text is inconsistent, with some values expressed in Hz and others as "clicks per second." It would be better to standardize these units for clarity, such as using Hz throughout the manuscript.

      We agree that this is confusing. We reviewed the text and ensured that "clicks per second" refers only to the rate at which clicks were produced, whereas Hz is used only for sound frequencies.

      (41) L484 "we quantified how many times each individual crossed the center of the arena" Is this data being used in the results?

      Yes. These data are mentioned in the text just before Figure 3 (L220).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this study, Meunier et al. investigated the functional role of IL-10 in avian mucosal immunity. While the anti-inflammatory role of IL-10 is well established in mammals, and several confirmatory knockout models are available in mice, IL-10's role in avian mucosal immunity is so far correlative. In this study, the authors generated two different models of IL-10 ablation in chickens: a whole-body knockout model and an enhancer KO model leading to reduced IL10 expression. The authors first performed in vitro LPS stimulation-based experiments, and then used two different in vivo infection models employing C. jejuni and E. tenella, to demonstrate that complete ablation of IL10 leads to enhanced inflammation-related pathology and gene expression, and enhanced pathogen clearance. At a steady-state level, however, IL-10 ablation did not lead to spontaneous colitis. 

      Strengths: 

      Overall, the study is well executed and establishes an anti-inflammatory role of IL-10 in birds. While the results are expected and not surprising, this appears to be the first report to conclusively demonstrate IL-10's anti-inflammatory role upon its genetic ablation in the avian model. Provided this information is applicable in combating pathogen infection in livestock species in sustainable industries like poultry, the study will be of interest to the field. 

      Weaknesses: 

      The study is primarily a confirmation of the already established anti-inflammatory role of IL-10. 

      We do not agree that this work is primarily confirmatory. The anti-inflammatory role of IL10 was indeed known previously from studies in mammals. The much more general insight from the current study is our demonstration of the intrinsic trade-off between inflammation and tolerance in the response to both the microbiome (which was significantly altered in the IL10 knockout birds) and mucosal pathogens. The study of Eimeria challenge in particular highlights the fact that it may be better for the host to tolerate a potential pathogen than to take on the cost of elimination.

      Reviewer #2 (Public review): 

      Summary: 

      The authors set out to investigate the functional role of IL10 in mucosal immunity in chickens. CRISPR technology was employed to generate IL10 knock-out chickens in both exon and putative enhancer regions. IL10 expression was either abolished (knockout in exon) or reduced (enhancer knock-out). IL-10 plays an important role in the composition of the caecal microbiome. Through various enteric pathogen challenges, deficient IL10 expression was associated with enhanced pathogen clearance, but with more severe lesion scores and body weight loss. 

      Strengths: 

      Both in vitro and in vivo knock-out abolished and reduced IL10 expression, and broad enteric pathogens were challenged in vivo, and various parameters were examined to evaluate the functional role of IL10 on mucosal immunity. 

      Weaknesses: 

      Overexpression of IL-10 either in vitro or in vivo may further support the findings from this study. 

      An overexpression experiment, regardless of outcome, would not necessarily support or invalidate the findings of the current study. It would address the question of whether the absolute concentration of IL10 produced alters the outcome of an infection.

      Reviewer #1 (Recommendations for the authors): 

      The following are the recommendations that, in my opinion, will be helpful to enhance the quality of the study. 

      Major point: 

      The authors at a steady state did not observe any sign of spontaneous colitis. Since IL-10 KO in mice leads to enhanced pathological score upon DSS-mediated induction of colitis, and several colitis models are well established in birds, it will be worthwhile to test the consequence of experimentally inducing colitis in this context. 

      One of the novel features of this study is the observation that the microbiome is modified in the IL10KO HOM chicks, which may serve to mitigate potential spontaneous pathology; we now mention this in the discussion. We agree that it could be worthwhile in the future to look at additional challenge models. However, we would argue that the Eimeria challenge is a sufficiently adequate experimentally-induced model of colitis to demonstrate the increased inflammation that occurs in an IL10-deficient bird. This is further supported by evidence of enhanced inflammatory responses in the caeca of IL10KO HOM birds challenged with Campylobacter or Salmonella relative to WT controls. See in the revised manuscript (pages 12-13).

      Minor points: 

      (1) In Figure 2B, the authors should confirm whether the ROS-AV163 groups also have LPS treatment. 

      The legend for Figure 2B already states that neutralizing anti-IL10 antibody was added to LPS-stimulated BMDMs: “Nitric oxide production was assessed by measuring nitrite levels using Griess assay for LPS-stimulated BMDMs […] in the absence or presence of neutralizing anti-IL10 antibody ROS-AV163”. However, for added clarity we have now modified the x-axis label for Figure 2B (“+ROS-AV163” replaced by “+LPS +anti-IL10”) and we have also made minor changes to the figure legend. See in the revised manuscript (page 33).

      (2) In Figure 3F, the authors should discuss why the duodenum of KO birds has enhanced infiltration compared to WT? 

      We are not sure what the reviewer is referring to here. Although not specifically mentioned in Figure 3F, there is no statistically significant difference in cellular infiltration in the duodenum of IL10KO WT and HOM birds raised in our specified pathogen-free (SPF) facility, nor in the duodenum of IL10KO WT and HOM birds raised in our conventional facility (Mann-Whitney U tests, p>0.1 in both cases); this can be seen in the sums of histopathological scores shown in Figures 3C (SPF facility) and 3E (conventional facility). Figure 3F shows that there is a statistically significant difference in cellular infiltration scores in the duodenum and proximal colon of both IL10KO WT and HOM birds based on the environment they are raised in (SPF vs conventional). We have made minor changes to the text to clarify this. See in the revised manuscript (page 7).

      (3) The authors should discuss the observed differences in the C. jejuni colonization results among the two cohorts at week 1 and week 2 post-infection. 

      Numbers of C. jejuni in the caeca of IL10KO HOM birds were markedly lower than for WT controls at 1-week post-infection in cohort 1, and at both time intervals post-infection in cohort 2 (Figure 4A). This reached statistical significance at 1-week post-infection in cohort 1 and at 2-weeks post-infection in cohort 2. It is evident from Figure 4A that considerable inter-animal variance existed in each group, and in the IL10KO HOM birds in particular. This is typical of C. jejuni colonisation in chickens, where bacterial population structures have been reported to be variable and unpredictable (Coward et al., Appl Environ Microbiol 2008, PMID: 18424530). Similar variation between time intervals, birds and repeated experiments has been reported when evaluating vaccines against C. jejuni colonisation (e.g. Buckley et al., Vaccine 2010, PMID: 19853682; Nothaft et al., Front Microbiol 2021, PMID: 34867850). We performed two independent studies for this reason. Taken together, we consider that our data provide convincing evidence of elevated pro-inflammatory responses upon C. jejuni infection in IL10KO HOM birds relative to WT controls that associates with reduced bacterial burden. Our data is also consistent with a published observation that a commercial broiler line with low IL10 expression had correspondingly elevated expression of CXCLi-1, CXCLi-2 and IL-1b (Humphrey et al., mBio 2014, reference 33 in our original submission). We have added text to the discussion to capture the points above.  See in the revised manuscript (page 13).

      Reviewer #2 (Recommendations for the authors): 

      For the animal challenging experiments, both IL10KO HOM and IL10EnKO HOM chickens were used for Eimeria challenge, but not for Salmonella and Campylobacter. Could the authors justify why? 

      The Eimeria challenge produced a much higher and more reproducible level of inflammation than either of the bacterial challenge models. Within the parasite challenge cohorts, IL10KO HET and IL10EnKO HOM birds were only marginally different from WT controls (e.g. parasite replication: Figures 5A and B; lesion scores: Figures 5E and F; body weight gain: Figures 5G and H). Given the more limited response and the inter-individual variation in the bacterial challenge models, we felt that analysis of a sufficiently large cohort of the IL10KO HOM was appropriate, while additional cohorts of IL10KO HET and IL10EnKO HOM birds large enough to detect statistically significant differences could not be justified.

      In the M&M, there was no mention of # of birds generated for IL10EnKO HOM, HET, etc. 

      Full details of bird numbers can be found in SI Appendix Table S1 “Number of IL10KO and IL10EnKO WT, HET and HOM chicks hatched in the NARF SPF chicken facility in the first (G1) and second (G2) generations”. Table S1 is already referred to in the Results section “Generation of IL10-deficient chickens”; we have now also clearly referred to it in the “Animals” and “Generation of surrogate host chickens and establishment of the IL10KO and IL10EnKO lines under SPF conditions” sections of the Materials and Methods. In all three sections we have also added some text to clarify that the table details G1 and G2 bird numbers. See in the revised manuscript (pages 5, 15, 17).

      From the results of Campylobacter challenge, the results from the cohort 1 and cohort 2 were not consistent at both 1 and 2 weeks of post-infection. There is not much discussion on this inconsistency. What is the final conclusion: significant difference in week 1 or week 2, OR none of them, OR both of them. What would happen if an additional cohort were conducted for Salmonella and Eimeria? 

      As noted in response to Reviewer 1 (minor point 3), we have now added text to the discussion on the partial inconsistency between independent C. jejuni challenge studies. We do not feel that additional experiments to address this comment are required. Highly significant increases in the infiltration of lymphoplasmacytic cells and heterophils were detected in IL10KO HOM chickens relative to WT controls in the caeca, a key site of Campylobacter colonisation. This was consistently observed in two independent cohorts at both 1- and 2-weeks post-infection (SI Appendix Figures S7 and S8) and was reflected in similar patterns of expression of pro-inflammatory genes at these intervals in both cohorts (Figure 4B). As our laboratory has observed substantially less variation between repeated Salmonella challenges, a single study was performed, but with adequate power to detect statistical differences.  The effects of E. tenella infection in IL10KO WT and HOM birds were replicated (compare Figure 4 with data from day 6 in Figure 5).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Axon growth is of course essential to the formation of neural connections. Adhesion is generally needed to anchor and rectify such motion, but whether the tenacity or forces of adhesion must be optimal for maximal axon extension is unknown. Measurements and contributing factors are generally lacking and are pursued here with a laser-induced shock wave approach near the axon growth cone. The authors claim to make measurements of the pressure required to detach axons from low to high matrix density. The results seem to support the authors' conclusions, and the work - with further support - is likely to impact the field of cell adhesion. In particular, there could be some utility of the methods for the adhesion and those interested in aspects of axon growth.

      Strengths:

      A potential ability to control the pressure simply via proximity of the laser spot is convenient and perhaps reasonable. The 0 to 1 scale for matrix density is a good and appropriate measure for comparing adhesion and other results. The attention to detachment speed, time, F-actin, and adhesion protein mutant provides key supporting evidence. Lastly, the final figure of traction force microscopy with matrix varied on a gel is reasonable and more physiological because neural tissue is soft (cite PMID: 16923388); an optimum in Fig.6 also perhaps aligns with axon length results in Fig.5.

      We thank you for your many suggestions for improving the presentation of our experimental results. We have carefully reconsidered the problems you pointed out and revised the manuscript as follows.

      Weaknesses:

      The results seem incomplete and less than convincing. This is because the force calibration curve seems to be from a >10 yr old paper without any more recent checks or validating measurements.

      Although the force calibration data were obtained with this experimental system more than 10 years ago, we have continued to use the same system under appropriate maintenance, and its performance has been checked regularly. The calibration data shown therefore remain valid, and we see no problem with them.

      Secondly, the claimed effect of pressure on the detachment of the growth cone does not consider other effects such as cavitation or temperature, and certainly needs validation with additional methods that overcome such uncertainties.

      The authors need to check whether the laser perturbs the matrix, particularly local density. A relation between traction stresses of ~20-50 pN/µm² in Fig.6 and the adhesion pressure of 3-5 kPa of Fig.3 needs to be carefully explained; the former units equate to 0.02-0.05 kPa, and would perhaps suggest cells cannot detach themselves and move forward.

      We have previously reported that a single pulse from a Ti:sapphire femtosecond laser amplifier can effectively generate shock and stress waves with minimal thermal effects. Notably, during this process, the temperature elevation at the laser focal point is sufficiently suppressed, allowing efficient force generation without causing significant heating in the surrounding area. By applying this method, we have confirmed that cells show no damage after the force loading. Therefore, this approach enables cell detachment while minimizing thermal and cavitation-induced damage to the cell. This clarification has been incorporated into the revised Results section (lines 119-120). We agree with the reviewer that the presented data were insufficient to support the proposed model. To this end, we have performed additional experiments and analyses, which are included in the revised version of the manuscript. To examine the impact of femtosecond laser irradiation on laminin, fluorescently labeled laminin was coated onto glass-bottom dishes, and the fluorescence intensity was analyzed before and after the impulsive force loading. The result indicates that the fluorescence intensity at the laser focal point remained unaffected by laser irradiation. This finding suggests that axon detachment results from the dissociation between L1 and laminin rather than the detachment of laminin from the substrate. These data have been incorporated into Supplementary Fig. 1 and page 5 (lines 113-120). In addition, an explanation of the relationship between the adhesion pressure and the traction stress has been added on page 8 (lines 253-258).

      The authors need to measure axon length on gels (Fig.6) as more physiological because neural tissue is soft. The studies are also limited to a rudimentary in vitro model without clear relevance to in vivo.

      In response to the reviewer’s request, we measured the axon length on a polyacrylamide gel with stiffness comparable to brain tissue (0.3 kPa). The axon length was consistently shorter on the gel than on the glass under our experimental conditions, in agreement with previous findings (Abe et al., 2021). Furthermore, a biphasic relationship between axon outgrowth and laminin concentration was observed. These results suggest that the biphasic behavior of axon outgrowth identified in this study is likely to occur in vivo. We have updated Fig. 6 and described the result (lines 224-225) in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      The force calibration curve seems to be from a >10 yr old paper without any more recent checks or validating measurements - which are essential. Effects of cavitation and temperature must be checked, and validated with additional methods that overcome such uncertainties. The authors need to check whether the laser perturbs the matrix, particularly local density. A relation between traction stresses of ~20-50 pN/µm² in Fig.6 and the adhesion pressure of 3-5 kPa of Fig.3 needs to be carefully explained; the former units equate to 0.02-0.05 kPa, and would perhaps suggest cells cannot detach themselves and move forward. The authors need to measure axon length on gels (Fig.6) as more physiological because neural tissue is soft. The studies are also limited to a rudimentary in vitro model without clear relevance to in vivo.

      We thank the reviewer for these recommendations. We have addressed these points in our responses to the public review above.

      Reviewer #2 (Public Review):

      Summary:

      The authors measure axon outgrowth rate, laminin adhesion strength, and actin rearward flow rate. They find that the axon outgrowth rate has a biphasic dependence on adhesion strength. In interpreting the results, they suggest that the results "imply that adhesion modulation is key to the regulation of axon guidance"; however, they measure elongation rate, not guidance.

      Strengths:

      The measurements of adhesion strength by laser-induced shock waves are reasonable as is the measurement of actin flow rates by speckle microscopy.

      Weaknesses:

      They only measure the length of the axons after 3 days and have no measurements of the actual rate of growth cone movements when they are moving. They do not measure the rate of actin growth at the leading edge to know its contribution to the extension rate. This is inadequate.

      These studies are unlikely to have an impact on the field because the measurement of axon growth rate at short times is missing.

      We thank the reviewer for recognizing the novelty of our study, and we agree with the reviewer’s comment. Following the comment, we performed time-lapse imaging of growth cone movements and quantified the migration rate. Consistent with the length of axons, the migration rate did not exhibit a monotonic increase with increased L1CAM-laminin binding but rather displayed biphasic behavior, where excessive L1CAM-laminin binding led to a reduction in the migration rate. Notably, the biphasic migration behavior was abolished in the L1CAM knockdown neurons. We believe these results provide further support for our proposed model. This has been incorporated into the new Fig. 5 and page 7 (lines 209-218) of the revised manuscript. In addition, the experimental method has been added on page 13 (lines 385-391).

      Reviewer #2 (Recommendations For The Authors):

      This is a very weak paper because of the lack of relevant measurements to enable correlations between actual extension rate, traction force, and rates of speckle movement.

      We thank the reviewer for the critical comment on our model. We performed time-lapse imaging of growth cone movements and quantified the migration rate. From the comments of this reviewer and reviewer #3, we recognized the importance of prior studies on the measurement of adhesion strength in the growth cone, traction force, the correlation between retrograde flow and outgrowth, and the biphasic dependence of neurite outgrowth on substrate concentration (please also see our responses to the recommendations from reviewer #3).

      Reviewer #3 (Public Review):

      Summary:

      Yamada et al. build on classic and more recent studies (Chen et al., 2023; Lemmon et al., 1992; Nichol et al., 2016; Zheng et al., 1994; Schense and Hubbell, 2000) to better understand the relationship between substrate adhesion and neurite outgrowth.

      Strengths:

      The primary strength of the manuscript lies in developing a method for investigating the role of adhesion in axon outgrowth and traction force generation using a femtosecond laser technique. The most exciting finding is that both outgrowth and traction force generation have a biphasic relationship with laminin concentration.

      Weaknesses:

      The primary weaknesses are a lack of discussion of prior studies that have directly measured the strength of growth cone adhesions to the substrate (Zheng et al., 1994) and traction forces (Koch et al., 2012), the inverse correlation between retrograde flow rate and outgrowth (Nichol et al., 2016), and prior studies noting a biphasic effect of substrate concentration on neurite outgrowth (Schense and Hubbell, 2000).

      Overall, the claims and conclusions are well justified by the data. The main exception is that the data is more relevant to how the rate of neurite outgrowth is controlled rather than axonal guidance.

      This manuscript will help foster interest in the interrelationship between neurite outgrowth, traction forces, and substrate adhesion, and the use of a novel method to study this problem.

      We thank the reviewer for the appropriate comments and for recognizing the strengths of our manuscript. In light of these comments, we recognized the importance of prior studies on the measurement of adhesion strength in the growth cone, traction force, the correlation between retrograde flow and outgrowth, and the biphasic dependence of neurite outgrowth on substrate concentration. To acknowledge these prior studies, we revised the introduction (lines 38-44, 61-65) and discussion (lines 272-281) of the manuscript. The references suggested by the reviewer have been added (Ref. 17, 26, 27, 31, and 35) (see also the responses below).

      Reviewer #3 (Recommendations For The Authors):

      Overall, I found the experiments discussed in the manuscript to be excellent. My primary suggestion is to slightly expand the introduction and discussion to put this work in context better. Additionally, the writing is unclear in places and would be helped by a careful edit.

      We appreciate the reviewer’s constructive critiques and would like to thank him/her for the experimental suggestions, which we have taken into account in the revised version of the manuscript. We trust that the additional modification of the text will satisfactorily address the reviewer’s concerns.

      In more detail:

      The introduction is well-written but could be improved by discussing how these studies build on earlier work. Through the 1980s and 90s, an important question was whether growth cone guidance occurred as the result of chemical cues that altered the activity of signaling pathways or differences in the adhesion between growth cones and substrates. While there was some clear evidence that growth cones were steered to more adhesive substrates (Hammarback and Letourneau, 1986), there were also important exceptions. For example, (Calof and Lander, 1991) examined the biophysical relationship between neuronal migration and substrate adhesion and found that laminin, which tends to support rapid migration and neurite outgrowth, tended to decrease adhesion.

      Thank you for the critical comments on our manuscript. We have modified the introduction (lines 38-40, 42-44) of the revised manuscript to discuss our understanding of growth cone guidance, particularly the role of neurite migration and substrate adhesion.

      To better understand the relationship between substrate adhesion and outgrowth, Heidemann's group (Zheng et al., 1994) was, to the best of my knowledge, the first to directly measure the force required to detach growth cones from substrates, including laminin and L1. For DRG neurons, this was ~ 1000 - 3000 µdynes (i.e., 10 to 30 nN) and they noted that traction force generation is 3 to 15 times less than the force needed to dislodge growth cones. Additionally, that manuscript goes on to suggest, "These data argue against the differential adhesion mechanism for growth cone guidance preferences in culture." With the rising development of powerful molecular genetic tools and a growing appreciation of the importance of signaling pathways in neurite outgrowth (Huber et al., 2003), the field as a whole has focused on the molecular aspects of growth cone guidance, leaving many aspects of the physical process of neurite outgrowth unanswered. The strength of this manuscript is that it develops a new method for measuring growth cone adhesion forces, which reassuringly generates similar results to classic studies. In turn, it combines this with molecular genetic analysis to determine the contribution L1-LN interaction makes to the overall adhesion strength.

      We will ensure that the manuscript explicitly acknowledges the significance of Zheng et al. (1994) in shaping the field and clarifies how our study expands upon these foundational findings. Following the reviewer’s suggestion, we have added Zheng et al. (1994) to the references and modified the discussion (lines 272-281, Ref. 17) in the revised manuscript.

      There are also a couple of other papers directly relevant to this work. In particular, (Koch et al., 2012) measured the traction forces generated by hippocampal neurons on polyacrylamide gels. They estimated it to be ~ 5 to 10 Pa. While the overall results are similar, in this manuscript, it is reported that the forces generated by hippocampal neurons are significantly higher, in the range of 25-75 Pa. I don't have an issue with this difference, but please look at the Koch paper and see if there is some technical reason for the different estimates of traction forces. Along these lines, please note the Young's modulus of the gels used in the experiments.

      As you mentioned, the traction force measured in our experiments is more than 5 times stronger than that reported by Koch et al. While the exact reason remains unclear, a difference in gel coating may have influenced the result. In the study by Koch et al., pre-coating was performed using Cell-Tak before laminin coating. In contrast, our study used poly-lysine for pre-coating. This methodological difference may have affected the measurement of traction force. Nevertheless, our experiments have consistently yielded reproducible results.

      (Nichol et al., 2016) nicely shows an inverse relationship between RF rate and LN density at low concentrations. While the results reported here are similar, a strength of this paper is that it extends the work to higher LN concentrations.

      Thank you for pointing out the relevance of Nichol et al., 2016 to our study. We agree that their study provides important insights into the relationship between RF rate and LN density at low concentrations. The novelty of our study lies not only in extending the analysis to higher LN concentrations, but also in performing analyses that include adhesion strength, traction force, and migration rate in the growth cone. We have included this discussion (lines 259-261, Ref. 26) in the revised manuscript.

      My understanding is that the biphasic effect of LN in neurite outgrowth was previously established. For example, Buettner and Pittman, 1991 note a biphasic effect of LN concentration on some parameters of neurite outgrowth, such as RMS, a measure of growth cone velocity, but not others, such as total neurite length. Likewise, (Schense and Hubbell, 2000) noted a biphasic effect of RGD peptides on outgrowth. In light of this, it would seem the main contribution of this paper is the finding that traction force generation has a biphasic relationship with LN concentration.

      Thank you for your thoughtful comment. We agree that the main contribution of this study is demonstrating that the biphasic behavior of axon migration arises from the biphasic dependence of the traction force on laminin concentration. We have included this discussion (lines 272-281, Ref. 31) in the revised manuscript.

      Please appreciate that I'm not asking the authors to copy-paste the text above into the manuscript. Instead, the references provide a starting point for better explaining the novel contributions here. The interaction of adhesions, traction force generation, the rate of neurite outgrowth, and biophysics of growth cone guidance is a classic problem in neuronal mechanics but is far from solved. My hope is that this manuscript might inspire more interest in this problem.

      Thank you for your thoughtful feedback and for highlighting the importance of better contextualizing our novel contributions within the broader field of neuronal mechanics. We appreciate your emphasis on the classic yet unresolved nature of the interactions between adhesions, traction force generation, axon outgrowth rate, and the biophysics of growth cone guidance.

      We hope these revisions help strengthen the manuscript’s impact and inspire further investigation into this important problem. We appreciate your insightful comments and the opportunity to improve our work.

      The text would be improved with a careful copy edit, for example:

      The last sentence of the introduction currently reads, "We suggested mechanism of the axon outgrowth which depends on the density of laminin on the substrate, revealing L1CAM-laminin binding as a mechanism for the regulation of axon outgrowth." which is challenging to understand.

      We appreciate the reviewer’s comment pointing out the lack of clarity in the final sentence of the introduction. To improve readability and clarity, we have revised the sentence as follows:

      “In this study, we suggested mechanism of the axon outgrowth that depends on the density of laminin on the substrate, i.e. the L1CAM-laminin binding is key to the regulation of axon outgrowth.” We believe this revised version better conveys our main finding in a more concise and comprehensible manner.

      Line 224 needs to be F-actin and the next sentence is difficult to understand.

      Thank you for pointing this out. We have corrected "F-action" to "F-actin" to ensure accuracy (line 256). Additionally, we have revised the following sentence to improve clarity (lines 256-258).

      Line 232 instead of "traction force slows", did you mean the rate of retrograde flow slows?

      Thank you for pointing this out. We meant to refer to the rate of retrograde flow, not the traction force itself. We have revised the wording accordingly to avoid confusion (line 266).

      Line 242, shear-stress instead of share-stress.

      We have corrected the typo to "shear-stress" (line 282).

      Lines 255, 267, and the abstract. The paper doesn't directly address axonal guidance. It would be more accurate to replace axonal guidance with neurite outgrowth.

      Thank you for your insightful comment. We agree that the term "neurite outgrowth" more accurately reflects the scope of our study, as we do not directly examine the mechanisms of axonal guidance. Accordingly, we have revised the text in Lines 273, 275, and the abstract to replace "axonal guidance" with "neurite outgrowth" to better align with the presented data and experimental focus.

      Line 362, perhaps reference (Minegishi et al., 2021) here as it provides a nice explanation of the technique.

      Thank you for the helpful suggestion. We have now added a reference to Minegishi et al., 2021 (line 416, Ref. 35) in the revised manuscript, as it indeed provides a clear explanation of the method.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, Behruznia and colleagues use long-read sequencing data for 339 strains of the Mycobacterium tuberculosis complex to study genome evolution in this clonal bacterial pathogen. They use both a "classical" pangenome approach that looks at the presence and absence of genes, and a pangenome graph based on whole genomes in order to investigate structural variants in non-coding regions. The comparison of the two approaches is informative and shows that much is missed when focussing only on genes. The two main biological results of the study are that 1) the MTBC has a small pangenome with few accessory genes, and that 2) pangenome evolution is driven by genome reduction. In the revised article, the description of the data set and the methods is much improved, and the comparison of the two pangenome approaches is more consistent. I still think, however, that the discussion of genome reduction suffers from a basic flaw, namely the failure to distinguish clearly between orthologs and homologs/paralogs.

      Strengths:

      The authors put together the so-far largest data set of long-read assemblies representing most lineages of the Mycobacterium tuberculosis complex, and covering a large geographic area. They sequenced and assembled genomes for strains of M. pinnipedii, L9, and La2, for which no high-quality assemblies were available previously. State-of-the-art methods are used to analyze gene presence-absence polymorphisms (Panaroo) and to construct a pangenome graph (PanGraph). Additional analysis steps are performed to address known problems with misannotated or misassembled genes.

      Weaknesses:

      The revised manuscript has gained much clarity and consistency. One previous criticism, however, has in my opinion not been properly addressed. I think the problem boils down to not clearly distinguishing between orthologs and paralogs/homologs. As this problem affects a main conclusion - the prevalence of deletions over insertions in the MTBC - it should be addressed, if not through additional analyses, then at least in the discussion.

      Insertions and deletions are now distinguished in the following way: "Accessory regions were further classified as a deletion if present in over 50% of the 192 sub-lineages or an insertion/duplication if present in less than 50% of sub-lineages." The outcome of this classification is suspicious: not a single accessory region was classified as an insertion/duplication. As a sanity check, I'd expect at least some insertions of IS6110 to show up, which has produced lineage- or sublineage-specific insertions (Roychowdhury et al. 2015, Shitikov et al. 2019). Why, for example, wouldn't IS6110 insertions in the single L8 strain show up here?
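
      To make the quoted classification rule concrete, it can be sketched in a few lines of Python. This is only an illustration of the rule as quoted, not the authors' code: the function name, the error handling, and the treatment of a region present in exactly 50% of sub-lineages (which the quoted rule leaves unspecified) are assumptions.

```python
def classify_accessory_region(n_present: int, n_total: int = 192) -> str:
    """Classify an accessory region by the share of sub-lineages that carry it.

    Hypothetical helper: follows the rule quoted from the manuscript,
    with 192 sub-lineages as the default denominator.
    """
    # An accessory region is, by definition, present in some but not all sub-lineages.
    if not 0 < n_present < n_total:
        raise ValueError("accessory regions must be present in some but not all sub-lineages")
    frequency = n_present / n_total
    if frequency > 0.5:
        return "deletion"               # common region, lost in a minority
    if frequency < 0.5:
        return "insertion/duplication"  # rare region, gained in a minority
    return "unclassified"               # exactly 50%: assumption, not covered by the quoted rule
```

      Under this rule, a region carried by a single sub-lineage (for example, a sub-lineage-specific IS6110 copy, if it were counted as its own region) would be labelled an insertion/duplication, which is why the complete absence of that class is surprising.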

      In a fully clonal organism, any insertion/duplication will be an insertion/duplication of an existing sequence, and thus produce a paralog. If I'm correctly understanding your methods section, paralogs are systematically excluded in the pangraph analysis. Genomic blocks are summarized at the sublineage levels as follows (l. 184): "The DNA sequences from genomic blocks present in at least one sub-lineage but completely absent in others were extracted to look for long-term evolution patterns in the pangenome." I presume this is done using blastn, as in other steps of the analysis.

      So a sublineage-specific copy of IS6110 would be excluded here, because IS6110 is present somewhere in the genome in all sublineages. However, the appropriate category of comparison, at least for the discussion of genome reduction, is orthology rather than homology: is the same, orthologous copy of IS6110, at the same position in the genome, present or absent in other sublineages? The same considerations apply to potential sublineage-specific duplicates of PE, PPE, and Esx genes. These gene families play important roles in host-pathogen interactions, so I'd argue that the neglect of paralogs is not a finicky detail, but could be of broader biological relevance.
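
      The distinction between the two notions of "presence" can be illustrated with a toy Python sketch. Everything here is hypothetical: the sub-lineage names, the locus labels, and the two helper functions are invented for illustration and do not reflect how Pangraph or blastn represent their results.

```python
from typing import Dict, Set, Tuple

# Each sub-lineage is modelled as a set of (element, locus) pairs (toy data).
Copies = Set[Tuple[str, str]]


def present_by_homology(genome: Copies, element: str) -> bool:
    """True if any copy of the element exists anywhere in the genome."""
    return any(name == element for name, _ in genome)


def present_by_orthology(genome: Copies, element: str, locus: str) -> bool:
    """True only if the element is present at this specific locus."""
    return (element, locus) in genome


# Hypothetical example: sub-lineage B carries an extra IS6110 copy at locus_2.
sublineages: Dict[str, Copies] = {
    "A": {("IS6110", "locus_1")},
    "B": {("IS6110", "locus_1"), ("IS6110", "locus_2")},
}

# Homology-based presence is identical in both, so no accessory region is reported.
homology_calls = {name: present_by_homology(g, "IS6110") for name, g in sublineages.items()}

# Orthology-based presence at locus_2 differs, revealing a B-specific insertion.
orthology_calls = {
    name: present_by_orthology(g, "IS6110", "locus_2") for name, g in sublineages.items()
}
```

      In this toy case the homology-based comparison reports IS6110 as present in both sub-lineages, while the position-aware comparison flags the copy at the second locus as specific to one of them.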

      Reviewer #2 (Public review):

      Summary:

      The authors attempted to investigate the pangenome of MTBC by using a selection of state-of-the-art bioinformatic tools to analyse 324 complete and 11 new genomes representing all known lineages and sublineages. The aim of their work was to describe the total diversity of the MTBC and to investigate the driving evolutionary force. By using long read and hybrid approaches for genome assembly, an important attempt was made to understand why the reported MTBC pangenome size varied between previous studies. This study provides strong evidence that the MTBC pangenome is closed and that genome reduction is the main driver of this species' evolution.

      Strengths:

      A stand-out feature of this work is the inclusion of non-coding regions as opposed to only coding regions, which was a focus of previous papers and analyses that investigated the MTBC pangenome. A unique feature of this work is that it highlights sublineage-specific regions of difference (RDs) that were previously unknown. Another major strength is the utilisation of long-read whole-genome sequences, in combination with short-read sequences when available. It is known that using only short reads for genome assembly has several pitfalls. The parallel approach of utilizing both Panaroo and Pangraph for pangenomic reconstruction illuminated limitations of both tools while highlighting genomic features identified by both. This is important for any future work and perhaps alludes to the need for more MTBC-specific tools to be developed. Lastly, there is ample statistical support, in the form of Heaps' law and genome fluidity calculations for each pangenome, demonstrating that they are indeed closed.

      Weaknesses:

      There are no major weaknesses in the revised version of this manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      l. 27: "lineage-specific and -independent deletions": it is still not clear to me what a lineage-independent, or convergent, deletion is supposed to be. TBD1, for instance, is not lineage-specific, but it is also not convergent: it occurred once in the common ancestor of lineages 1, 2, and 3, while convergence implies multiple parallel occurrences.

      We have changed this, here and in other places, to more precise evolutionary terms, such as divergent (single event) and convergent (multiple events), or explained exactly what is meant where needed.

      l. 118: "where relevant", what does that mean?

      This was superfluous to the description and so is now removed.

      l. 178ff.: It is not clear to me what issue is addressed by this correction of the pangenome graph. Also here there seems to be some confusion regarding orthologs and paralogs. A gene or IS copy can be present at one locus but absent at another, which is not a mistake of Pangraph that would require correction. It's rather the notion of "truly absent region" which is ambiguous.

      We have changed the text to be more specific about the utility of this step. Since it is known that Panaroo mislabels some genes as being absent due to over-splitting (see Ceres et al. 2022 and our reclassification earlier in the paper), we wanted to see if the same occurred in Pangraph. We have modified the methods text to be more specific (line 181) and, in the results, included the percentage of total genes/regions affected by this correction.

      In relation to copy number, Pangraph is not syntenic in its approach; if a region is present anywhere it is labelled as present in the genome. Pangraph will look for multiple copies of that region (e.g. an IS element) but indeed we did not look for specific syntenic changes across the genomes. This would be a great analysis and something we will consider in the future; we have indicated such in the discussion (line 454).

      l. 305: "mislabelled as absent": see above, is this really 'mislabelled'?

      See the answer to the question above.

      l. 372: "using the approach": something missing here.

      This was superfluous to the description and so is now removed.

      l. 381: the "additional analysis of paralogous blocks" (l. 381) seems to suffer from the same confusion of ortho- and paralogy described above: no new sub-lineage-specific accessory regions are found presumably because the analysis did consider any copy rather than orthologous copies.

      Paralogous copies were looked for by Pangraph, and we did not find any sub-lineage where all members had additional copies compared to other sub-lineages. Indeed, single genomes could have these, and shorter timescales could see a lot of such insertions, but we looked at longer-scale (all genomes within a sub-lineage) patterns and did not find these. These limitations are already outlined in the discussion.

      l. 415: see above. There is no diagnosis of a problem that would motivate a "correction". That's different from the correction of the Panaroo results, where fragmented annotations have been shown to be a problem.

      Of interest, the refinement of regions re-labelled as core multiple regions that Pangraph had labelled as absent from some genomes, at about the same rate as the correction to Pangraph (2% of genes/regions). This indicates a stringency issue in Pangraph, where blocks are mislabelled as absent. The underlying reason for this is not clear, but the correction is evidently required in this version of Pangraph.

      l. 430ff.: The issue of paralogy and that the "same" gene or region is defined in terms of homology rather than orthology should be addressed here. For me the given evidence does not support the claim that deletion is driving molecular evolution in the MTBC.

      As outlined above, paralogy may indeed be driving some elements of the overall evolutionary patterns; our analysis just did not find this. Panaroo without merged paralogs did not find paralogous genes as a main differentiating factor for any sub-lineage. Pangraph also did not find multiple copies of blocks present in all genomes in a sub-lineage. As outlined above, single genomes do show such patterns, but we did not include single-genome analyses here and outline that as a next step in the discussion. We have also linked to a recent pangenome paper that showed duplication is present in the pangenome of MTBC, although not related to any specific lineage (Discussion line 485).

      l. 443 ff: "lineage-independent deletions (convergent evolution)": see above, I still think this terminology is unclear

      This has now been made clearer to be specifically about convergent and divergent evolutionary patterns.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors investigate mechanisms of acquired resistance (AR) to KRAS-G12C inhibitors (sotorasib) in NSCLC, proposing that resistance arises from signaling rewiring rather than additional mutations.

      Strengths:

      Using a panel of AR models - including cell lines, PDXs, CDXs, and PDXOs - they report activation of KRAS and PI3K/AKT/mTOR pathways, with elevated PI3K levels. Pharmacologic inhibition or CRISPR-Cas9 knockout of PI3K partially restores sotorasib sensitivity, and p-4EBP1 upregulation is implicated as an additional contributor, with dual mTORC1/2 inhibition more effective than mTORC1 inhibition alone.

      Weaknesses:

      While the study addresses an important clinical question, it is limited by several weaknesses in experimental rigor, data interpretation, and presentation. The mechanistic findings are not entirely novel, since the role of PI3K-AKT-mTOR signaling in therapeutic resistance is already well-established in the literature. Rather than uncovering new resistance mechanisms, the study largely confirms known pathways. Several key conclusions are not supported by the data, and critical alternative explanations - such as additional mutations or increased KRAS expression - are not thoroughly investigated or ruled out. Furthermore, while the authors use CRISPR-Cas9 to knock out PI3K and 4E-BP1 in H23-AR and H358-AR cells to restore sotorasib sensitivity, they do not perform reconstitution experiments to confirm that re-expressing PI3K or 4E-BP1 reverses the sensitization. This prevents full characterization of PI3K and p-4EBP1 upregulation as contributors to resistance. The manuscript also has several errors, poor figure quality, and a lack of proper quantification. Additional experimental validation, data improvement, and text revisions are required.

      Acquired resistance to KRAS<sup>G12C</sup> inhibitors such as sotorasib or adagrasib remains a significant clinical challenge. Therefore, the identification of mechanisms of acquired resistance, along with the development of alternative therapeutic strategies, including combination therapies with KRAS inhibitors, represents an urgent unmet clinical need. The emergence of secondary KRAS mutations or new mutations in other oncogenic drivers has been observed as a primary cause of acquired resistance in a fraction of patients. No identifiable mutations were detected in more than half of the tumors from patients who developed acquired resistance after treatment with sotorasib or adagrasib.

      Using a discovery-based approach that integrated global proteomic and phosphoproteomic analyses in the TC303AR and TC314AR PDX models, we identified distinct protein signatures associated with KRAS reactivation, upregulation of mTORC1 signaling, and activation of the PI3K/AKT/mTOR pathway. These findings prompted further investigation into these mechanisms of resistance and evaluation of novel therapeutic combinations to overcome resistance. Notably, the combination of sotorasib with copanlisib (a PI3K inhibitor), or the combination of sotorasib with AZD8055 or sapanisertib (mTORC1/2 dual inhibitors) demonstrated strong potential for future clinical use. These regimens effectively restored sotorasib sensitivity in both in vitro and in vivo models and produced robust, synergistic antitumor effects across various acquired resistance models.

      CRISPR-Cas9-mediated PI3K and 4E-BP1 knockout clones were generated in more than one resistant cell line that expressed a robust level of the knockout target, and multiple independent clones in each cell line were evaluated with and without gene disruption. Given the thorough nature of this analysis, additional reconstitution experiments were deemed unnecessary, as they would not yield further insight.

      Whole exome sequencing was performed on resistant cells or PDX models to confirm retention of the KRAS<sup>G12C</sup> mutation and to identify secondary KRAS mutations, none of which were found. We acknowledge that additional resistance mechanisms may be involved. These will be the focus of future investigations.

      The revised manuscript will feature improved figure quality, complete and clarified figure legends, and corrected textual errors to enhance overall clarity and presentation.  

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors focus on the identification of the mechanisms involved in the acquired resistance to Sotorasib in KRASG12C-mutant non-small cell lung cancer cells. To perform this study, the authors generate different clones of cell lines, cell-derived xenografts, patient-derived xenograft organoids, and patient-derived xenografts. In all these models, the authors generate resistant forms (i.e., resistant cell lines, PDXs, and organoids) and the genetic and molecular changes were characterised using whole-exome sequencing, proteomics, and phospho-proteomics. This analysis led to the identification of an important role of the PI3K/AKT/mTORC1/2 signalling network in the acquisition of resistance in several of the models tested. Molecular characterisation identified changes in the expression of some of the proteins in this network as key changes for the acquisition of resistance, and in particular, the authors show that changes in 4E-BP1 are common to some of the cells downstream of PI3K. Using pharmacological testing, they show that different drugs targeting PI3K, AKT, and MTORC1/2 sensitise some of the resistant models to Sotorasib. The analyses showed that the PI3K inhibitor copanlisib has an effect in NSCLC cells that, in some cases, seems to be synergistic with Sotorasib. Based on the work performed, the authors conclude that the PI3K/mTORC1/2 mediated 4E-BP1 phosphorylation is one of the mechanisms associated with the acquisition of resistance to Sotorasib and that targeting this signalling module could result in effective treatments for NSCLC patients.

      The work as presented in the current manuscript is very interesting, provides cell models that benefit the community, and can be used to expand our knowledge of the mechanism of resistance to KRAS targeting therapies. Overall, the techniques and methodology seem to be performed in agreement with standard practice, and the results support most of the conclusions made by the authors. However, there are some points that, if addressed, would increase the value and relevance of the findings and further extend the impact of this work. Some of the recommendations for changes relate to the way things are explained and presented, which need some work. Other changes might require the performance of additional experiments or reanalysis of the existing data.

      Strengths:

      (1) One of the stronger contributions of this article is the different models used to study the acquisition of resistance to Sotorasib. The resistant cell lines, PDXs and PDXOs, and the fact that the authors have different clones for each, made this collection especially relevant, as they seem to show different mechanisms that the cells used to become resistant to Sotorasib. Although logically, the authors focus on one of these mechanisms, the differential responses of the different clones and models to the treatments used in this work show that some of the clones used additional mechanisms of resistance that can be explored in other studies. Importantly, as they use in vitro and in vivo models, the results also consider the tumour microenvironment and other factors in the response to the treatments.

      (2) Another strength is the molecular characterisation of the different Sotorasib-resistant tumour cells by WES, which shows that these cells do not seem to acquire secondary mutations.

      (3) The use of MS-based proteomics also identifies proteome signatures that are associated with the acquisition of resistance, including PI3K/mTORC1/2. The combination of proteomics and phospho-proteomics results should allow the identification of several mechanisms that are deregulated in Sotorasib-resistant cells.

      (4) The results show a strong response of the NSCLC cells and PDXs to copanlisib, a drug for which there is limited information in this cancer type.

      (5) The way they develop the PDX-resistant and the PDXO seems to be appropriate.

      Weaknesses:

      In general, the data is of good quality, but due to the sheer amount of data included and the way it is presented and discussed, several of the claims or conclusions are not clear.

      (1) The abstract is rather long and gives details that are not usually included in one. This makes it very complicated to identify the most relevant findings of the work. The use of acronyms PDX, PDXO, and CDX without defining them makes it complicated for the non-specialist to know what the models are. Rewriting and reorganisation of the abstract would benefit the manuscript.

      We will revise the abstract to ensure that the key findings and overall message are clearly communicated and easily understood by readers.

      (2) Expression, presentation, and grammar should be reviewed in all sections of the manuscript.

      This will be done in the revised version.

      (3) In the different parts of the result section where the models shown in Figure 2 are described, the authors indicate "Whole-exome sequencing (WES) confirmed that XXX model retained the KRASG12C mutation with no additional KRAS mutations detected"; however, it is not indicated where these data are shown, and not all cases include an explanation of other possible modifications that might relate to mechanisms of resistance. This information should be included in the manuscript, and the WES made publicly available.

      WES was performed to identify secondary mutations in KRAS and to verify retention of the KRAS<sup>G12C</sup> mutation in these AR models. The WES data will be provided as supplements.

      (4) The way the proteomics analysis of the TC303 and TC314 parental and resistant PDX is described in the text is confusing. The addition of an experimental layout figure would facilitate the understanding. As it is written, it is not obvious that the parental PDX were also analysed. For instance, the authors say, "The global and phosphoproteomic analyses identified over 8,000 and 4,000 gene protein products (GPPs), respectively". Is this comparing only resistant cells, or from the comparison of the parental and resistant pairs? And where are these numbers presented in the figures? Also, there is information that seems more adequate for the materials and methods sections, i.e., "Samples were analyzed using label-free nanoscale liquid chromatography coupled with tandem mass spectrometry (nanoLC-MS/MS) on a Thermo Fusion Mass Spectrometer. The resulting data were processed and quantified using the Proteome Discoverer 2.5 interface with the Mascot search engine, referencing the NCBI RefSeq protein database (Saltzman, Ruprecht). Two-component analysis is better named principal component analysis."

      The text will be revised accordingly.

      (5) While the presentation of the proteomics data could be done in different ways, the way the data is presented in Figure 3 does not allow the reader to get an idea of many of the findings from this experiment. Although it is indicated that a table with the data will be made available, this should be central to the way the data is presented and explained. A table (ie, Excel doc) where the raw data and all the analysis are presented should be included and referenced. Additionally, heat maps for the whole proteomes identified should be included. In the text, it is said, "Global proteomic heatmap analysis revealed unique protein profiles in TC303AR and TC314AR PDXs compared to their sensitive counterparts (Figure 3C)." However, this figure only shows the histogram of the differentially regulated cells. Inclusion of the histogram showing all the cells is necessary, and it might be informative to include the histogram comparing the two isogenic pairs, which could identify common mechanisms and differences between both sets. In Figure 3C, the protein names should be readable, or a reference to tables where the proteins are listed should be included.

      The raw data associated with the proteomics and global proteomics will be added as supplements.

      (6) In Figure 3, the pathway enrichment tool and GO used should be mentioned in the text. The tables with all significant tables should also be provided. The proteomics data seems to convincingly identify mTOR as one of the pathways deregulated in resistant cells, but there is little explanation of what is considered a significant FDR value and if there are other pathways or networks that are also modified, which might not be common to both isogenic models. The MS-based phosphoproteome could help with the identification of differentially regulated pathways, but it is not really presented in the current manuscript. Most of the analysis of phospho-proteomics comes from the RPPA analysis, which is targeted proteomics. With the way the data is presented, the authors show evidence for a role of mTOR in the acquisition of resistance, but unfortunately, they do not discuss or allow the reader to explore if other pathways might also contribute to this change.

      The authors agree that other pathways may be involved, and this will be the subject of future studies. The raw data will be added as supplements.

      (7) Where is the proteomics data going to be deposited, and will it be made public to comply with FAIR principles?

      The data will be uploaded according to the journal guidelines.

      (8) The authors claim that the resistance shown for H23AR and H358AR cells is due to reactivation of KRAS signalling. This is done by looking at phosphorylation of ERK as a surrogate, as they claim, "KRAS inhibition is commonly assessed by evaluating the inhibition of ERK phosphorylation (p-ERK)". While this might be true in many cases, the data presented does not demonstrate that the increase in p-ERK is due to reactivation of KRAS. To make this claim, the authors should measure activation of KRAS (and possibly H- and NRAS) using GST-pull down or an image-based method.

      We agree that KRAS activation can be assessed through various methods. In this manuscript, which primarily focuses on mechanisms of resistance, pathway analysis revealed upregulation of KRAS signaling. This finding correlated with the incomplete inhibition of p-ERK by sotorasib in resistant cells. Notably, p-ERK status is widely recognized and routinely used as a surrogate marker for KRAS pathway activation.

      (9) The experiments in Figure 4 are very confusing, and some controls are missing. There is no blot where they show the effect of Sotorasib treatment in H23 and H358 parental cells. Is the increase shown in resistant cells shown in parental or is it exclusive for resistant cells only (and therefore acquired)? Experiment 4B should include this control. What is clear is that there is an increase in the expression of AKT and PI3K.

      H23 and H358 cells are highly sensitive to sotorasib, as demonstrated by the cell viability assays presented in Figure 2. As shown in Figure 3—figure supplement 3, sotorasib treatment led to complete inhibition of p-ERK in these parental cell lines. In contrast, p-ERK inhibition was incomplete in the resistant H23AR and H358AR cells. Moreover, these AR cells were continuously cultured under sotorasib pressure to maintain resistance.

      (10) The main point here is whether this is acquired resistance or the sensitivity to the drug is already there, and there was no need to do an omics experiment to find this. In some cases, it seems that the single treatment with PI3K inhibitors is as effective as Sotorasib treatment, promoting the death of the parental cells. This is in line with previous data in H23 and H358 that show sensitivity to PI3K inhibition (i.e., H358: 10.1016/j.jtcvs.2005.06.051; H23: 10.20892/j.issn.2095-3941.2018.0361). The data is clear, especially for copanlisib, but would it be the case that this treatment could be used for the treatment of NSCLC alone or directly in combination with Sotorasib and prevent resistance? The results shown in Figure 4C strongly support that a single treatment might be effective in cases that do not respond to Sotorasib. The data in Figure 4D-F (please correct typo "inhibition" in labels) seem to support that PI3K treatment of parental cells is as effective as in the resistant cells.

      We agree. Based on our in vitro (Figure 4) and in vivo (Figure 7) data, copanlisib was able to overcome sotorasib resistance, demonstrating either synergistic or additive effects depending on the specific model. These findings support the potential of combining PI3K inhibition with KRAS<sup>G12C</sup> inhibition as a promising strategy to address acquired resistance.

      (11) The experiments presented in Figure 7 show synergy between Sotorasib and copanlisib treatment in some of the resistant cells. But in Figure 7G, the single treatment of H23AR is as effective as the combination. Did the authors check the effect of this drug on the parental cells? As they do not include this control, it is not possible to know if this is acquired sensitivity to PI3K inhibition or if the parental cells were already sensitive (as indicated by the Figure 4 results).

      Both H23 and H23AR cells showed high sensitivity to copanlisib, as shown in Figure 4. Combination index analysis for the copanlisib + sotorasib treatment (Figure 7A) revealed synergistic effects on cell viability at specific concentrations. However, in the in vivo experiment (Figure 7G), we did not observe a clear synergistic effect of the combination treatment against H23AR xenografts. This may be attributed to the dose of copanlisib used, which was potentially sufficient on its own to produce a strong antitumor response, thereby masking any additional benefit from the combination.

    1. Author response:

      Reviewer #1 (Recommendations for the authors):

      We appreciate the reviewer recognising that our study has been carefully performed and provides a valuable resource for the community. The characterization of Repo-man proline hydroxylation is also recognised as a novel finding.

      With respect to Concerns raised by reviewer 1:

      (1) The study applied HILIC-based chromatographic separation with a goal of enriching and separating hydroxyproline-containing peptides. However, as the authors mentioned, such an approach is not specific to proline hydroxylation. In addition, many other chromatography techniques can achieve deep proteome fractionation such as high pH reverse phase fractionation, strong-cation exchange etc. There was no data in this study to demonstrate that the strategy offered improved coverage of proline hydroxylation proteins, as the identifications of the HyPro sites could be achieved through deep fractionation and a highly sensitive LCMS setup. The data of Figure 2A and S1A were somewhat confusing without a clear explanation of the heat map representations.

      We do not agree that the apparent concern raised here, i.e., that the method we present is not 100% specific for enriching only hydroxylated peptides, is a serious issue. We show specifically that our method indeed enriches samples for hydroxylated peptides, thereby increasing the chances of identifying proline hydroxylated peptides in a cell extract. We never claimed that it was mono-specific for enrichment of hydroxylated peptides. Further, we note that almost no chromatographic method we know of, including those commonly used to enrich for different types of post-translationally modified peptides (including phospho-peptides), is completely mono-specific for a single type of modified peptide. The reviewer comments that it could have been possible to use alternative methods to identify proline-hydroxylated peptides. This may be true, but we know of no published examples, or previous studies, where this has been demonstrated experimentally on a scale comparable to that we show here. Of course, there is always more than one way to approach technical challenges, and it may be that future methods will be demonstrated that achieve equivalent, or even superior, results with respect to the detection of proline hydroxylated peptides. To the best of our knowledge, however, our current study provides a robust methodology that goes well beyond any previously published analysis of proline hydroxylation.

      (2) The study reported that the HyPro immonium ion is a diagnostic ion for HyPro identification. However, the data showed that only around 5% of the identifications had such a diagnostic ion. In comparison, acetyllysine immonium ion was previously reported to be a useful marker for acetyllysine peptides (PMID: 18338905), and the strategy offered a sensitivity of 70% with a specificity of 98%. In this study, the sensitivity of HyPro immonium ion was quite low. The authors also clearly demonstrated that the presence of immonium ion varied significantly due to MS settings, peptide sequence, and abundance. With further complications from L/I immonium ions, it became very challenging to implement this strategy in a global LC-MS analysis to either validate or invalidate HyPro identifications.

      We feel that the reviewer’s initial comment is potentially misleading - it implies that we were proposing here that the ‘HyPro immonium ion is a diagnostic ion for HyPro identification’. In contrast, this concept was already widely held in the field before we started this project. Indeed, the fact that the diagnostic HyPro immonium ion is often difficult to detect has been used as one of the arguments by other researchers to support the view that HIF-α is the only physiologically relevant target for PHD enzymes, a controversy referenced explicitly by Reviewer 2 below. What we actually show here are novel data that help to explain why the diagnostic HyPro immonium ion is often difficult to detect when standard approaches and technical parameters for MS analysis are used. We believe that this observation, along with other data we present, is a useful contribution to the field that can help to resolve the previous controversies concerning the true prevalence and biological roles of PHD-catalysed proline hydroxylation on protein targets.

      (3) The study aimed to apply the HILIC-based proteomics workflow to identify HyPro proteins regulated by the PHD enzyme. However, the quantification strategy was not rigorous. The study just considered the HyPro proteins not identified by FG-4592 treatment as potential PHD targeted proteins. There are a few issues. First, such an analysis was not quantitative without reproducibility or statistical analysis. Second, it did not take into consideration that data-dependent LC-MS analysis was not comprehensive and some peptide ions may not be identified due to background interferences. Lastly, FG-4592 treatment for 24 hrs could lead to wide changes in gene expressions and protein abundances. Therefore, it is not informative to draw conclusions based on the data for bioinformatic analysis.

      We agree that this study is not quantifying or addressing the stoichiometry of proline hydroxylation across the very large number of new PHD target sites we identify. That was not claimed and was not the objective of our study. Nonetheless, we feel the comments of the referee do not adequately take into account the SILAC data we included (cf. Figure 8) or the full range of experimental data presented in this study. We would also refer the reviewer to the data presented in the companion paper by Druker et al., which we cross-referenced extensively in our study and have also made available previously on bioRxiv.

      (4) The authors performed an in vitro PHD1 enzyme assay to validate that Repo-man can be hydroxylated by PHD1. However, Figure 9 did not show quantitatively PHD1-induced increase in Repo-man HyPro abundance and it is difficult to assess its reaction efficiency to compare with HIF1a HyPro.

      Here again we refer to the recent controversy referenced explicitly by Reviewer 2 below, concerning the view expressed by some researchers that only HIF-α is a physiological substrate for PHD enzymes in cells. We were challenged to show that any of the novel protein targets of PHDs we identified were indeed hydroxylated by PHD enzymes in vitro, and that is what we demonstrated in Figure 9. This was not an experiment performed to quantify stoichiometry and indeed, it is not possible to draw any firm conclusions about efficiency or stoichiometry in vitro when using catalytic PHD subunits alone, given that we do not yet know whether PHDs may show different properties in cells, dependent on interactions with other factors and/or modifications.

      Reviewer #2 (Recommendations for the authors):

      We appreciate the reviewer’s comments that our manuscript presents an advanced, standardized protocol for identifying proline hydroxylation, with well designed experiments, which may help resolve confusion in the field.

      With respect to Concerns raised by reviewer 2:

      (1) The authors should provide a summary of the standard protocol for identifying proline hydroxylation sites in proteins that can easily be followed by others.

      We agree and plan to provide a clearly described, step by step guide to assist other researchers who wish to employ our methods for proline hydroxylation analysis in their own studies.

      (2) Cockman et al. proposed that HIF-α is the only physiologically relevant target for PHDs. Their approach is considered the gold standard for identifying PHD targets. Therefore, the authors should discuss the major progress they made in this manuscript that challenges Cockman's conclusion.

      We agree that our study provides valuable information germane to the recent controversy in the field and the views published by Cockman et al., to the effect that HIF-α is the only physiologically relevant target for PHDs. We will carefully review our statements when preparing a suitably revised version of record with the aim of providing a balanced and objective discussion of this issue.

      Reviewer #3 (Recommendations for the authors):

      We appreciate the reviewer’s comments that our study employs state-of-the-art mass spectrometric techniques with optimized collision parameters to ensure proper detection of the immonium ions, along with their recognition that our study is ‘an advance compared to other similar approaches before.’ We also appreciate their reference to our companion study by Druker et al., in which we characterise the mechanism and biological role in regulation of mitotic progression of the hydroxylation of P604 in the target protein RepoMan (CDCA2), which is identified in this study.

      With respect to the Concern raised by reviewer 3:

      Despite the authors' claim about the specificity of this method in picking up the intended peptides, there is a good amount of potential false positives that also happen to get picked (owing to the limitations of MS-based readout), and the authors' criteria for downstream filtering of such peptides require further clarification. In the same vein, a greater and more diverse cell-based validation approach will be helpful to substantiate the claims regarding enrichment of peptides in the described pathway analyses.

      We agree that this study, which has a focus on methodology and technical approaches for detecting sites of PHD-catalysed proline hydroxylation, cannot exhaustively validate the biological significance of all of the putative sites and targets identified. As the reviewer notes, we have performed a detailed functional characterisation of one such novel PHD-catalysed proline hydroxylation site, i.e. P604 in the protein RepoMan (CDCA2). This functional analysis is presented in the companion paper by Druker et al., which has also been reviewed by eLife and placed on bioRxiv (doi: https://doi.org/10.1101/2025.05.06.652400). We hope that publication of our identification of many new putative PHD target sites will encourage other researchers to pursue characterisation of their functional roles in different biological mechanisms, and we have tried here to provide some degree of guidance to focus attention on the identification of those sites for which we currently have highest confidence.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The investigators undertook detailed characterization of a previously proposed membrane targeting sequence (MTS), a short N-terminal peptide, of the bactofilin BacA in Caulobacter crescentus. Using light microscopy, single molecule tracking, liposome binding assays, and molecular dynamics simulations, they provide data to suggest that this sequence indeed does function in membrane targeting and further conclude that membrane targeting is required for polymerization. While the membrane association data are reasonably convincing, there are no direct assays to assess polymerization and some assays used lack proper controls as detailed below. Since the MTS isn't required for bactofilin polymerization in other bacterial homologues, showing that membrane binding facilitates polymerization would be a significant advance for the field.

      We agree that additional experiments were required to consolidate our results and conclusions. Please see below for a description of the new data included in the revised version of the manuscript.

      Major concerns

      (1) This work claims that the N-terminal MTS domain of BacA is required for polymerization, but they do not provide sufficient evidence that the ∆2-8 mutant or any of the other MTS variants actually do not polymerize (or form higher order structures). Bactofilins are known to form filaments, bundles of filaments, and lattice sheets in vitro and bundles of filaments have been observed in cells. Whether puncta or diffuse labeling represents different polymerized states or filaments vs. monomers has not been established. Microscopy shows mis-localization away from the stalk, but resolution is limited. Further experiments using higher resolution microscopy and TEM of purified protein would prove that the MTS is required for polymerization.

      We do not propose that the MTS is directly involved in the polymerization process and state this more clearly now in the Results and Discussion sections of the revised manuscript. To address this point, we performed transmission electron microscopy studies comparing the polymerization behavior of wild-type and mutant BacA variants. The results clearly show that the MTS-free BacA variant (∆2-8) forms polymers that are indistinguishable from those formed by the wild-type protein, when purified from an E. coli overproduction strain (new Figure 1–figure supplement 1). This finding is consistent with structural work showing that bactofilin polymerization is exclusively mediated by the conserved bactofilin domain (Deng et al, Nat Microbiol, 2019). However, at native expression levels, BacA only accumulates to ~200 molecules per cell (Kühn et al, EMBO J, 2006). Under these conditions, the MTS-mediated increase in the local concentration of BacA at the membrane surface and, potentially, steric constraints imposed by membrane curvature, may facilitate the polymerization process. This hypothesis has now been stated more clearly in the Results and Discussion sections.

      For polymer-forming proteins, defined localized signals are typically interpreted as slow-moving or stationary polymeric complexes. A diffuse localization, by contrast, suggests that a protein exists in a monomeric or, at most, (small) oligomeric state in which it diffuses rapidly within the cell and is thus no longer detected as distinct foci by widefield microscopy. Our single-molecule data show that BacA variants that are no longer able to interact with the membrane (as verified by cell fractionation studies and in vitro liposome binding assays) have a high diffusion rate, similar to that measured for the non-polymerizing and non-membrane-bound F130R variant. These results demonstrate that a defect in membrane binding strongly reduces the ability of BacA to form polymeric assemblies. To support this hypothesis, we have now repeated all single-particle tracking experiments and included mVenus as a freely diffusible reference protein. Our data confirm that the mobilities of the ∆2-8 and F130R variants are similar and approach those of free mVenus, supporting the idea that the deficiency to interact with the membrane prevents the formation of extended polymeric structures (which should show much lower mobilities). To underscore the relevance of membrane binding for BacA assembly, we have now included a new experiment, in which we used the PbpC membrane anchor (PbpC<sub>1-132</sub>-mcherry) to restore the recruitment of the ∆2-8 variant to the membrane (Figure 9 and Figure 9–figure supplement 1). The results obtained show that the ∆2-8 variant transitions from a diffuse localization to polar foci upon overproduction of PbpC<sub>1-132</sub>-mcherry. The polymerization-impaired F130R variant, by contrast, remains evenly distributed throughout the cytoplasm under all conditions. These findings further support the idea that polymerization and membrane-association are mutually interdependent processes.

      (2) Liposome binding data would be strengthened with TEM images to show BacA binding to liposomes. From this experiment, gross polymerization structures of MTS variants could also be characterized.

      We do not have the possibility to perform cryo-electron microscopy studies of liposomes bound to BacA. However, the results of the cell fractionation and liposome sedimentation assays clearly support a critical role of the MTS in membrane binding.

      (3) The use of the BacA F130R mutant throughout the study to probe the effect of polymerization on membrane binding is concerning as there is no evidence showing that this variant cannot polymerize. Looking through the papers the authors referenced, there was no evidence of an identical mutation in BacA that was shown to be depolymerized or any discussion in this study of how the F130R mutation might be analogous to polymerization-deficient variants in other bactofilins mentioned in these references.

      Residue F130 in the C-terminal polymerization interface of BacA is conserved among bactofilin homologs, although its absolute position in the protein sequence may vary, depending on the length of the N-terminal unstructured tail. The papers cited in our manuscript show that an exchange of this conserved phenylalanine residue abolishes polymer formation. Nevertheless, we agree that it is important to verify the polymerization defect of the F130R variant in the system under study. We have now included size-exclusion chromatography data showing that BacA-F130R forms a low-molecular-weight complex, whereas the wild-type protein largely elutes in the exclusion volume, indicating the formation of large, polymeric species (new Figure 1–figure supplement 1). In addition, we performed transmission electron microscopy analyses of BacA-F130R, which verified the absence of larger oligomers (new Figure 1–figure supplement 2).

      (4) Microscopy shows that a BacA variant lacking the native MTS regains the ability to form puncta, albeit mis-localized, in the cell when fused to a heterologous MTS from MreB. While this swap suggests a link between puncta formation and membrane binding, the relationship between puncta and polymerization has not been established (see comment 1).

      We show that a BacA variant lacking the MTS (∆2-8) regains the ability to form membrane-associated foci when fused to the MTS of MreB. By contrast, a similar variant that additionally carries the F130R exchange (preventing its polymerization) shows a diffuse cytoplasmic localization. In addition, we show that the F130R exchange leads to a loss of membrane binding and to a considerable increase in the mobility of the variants carrying the MTS of E. coli MreB. As described above, we now provide additional data demonstrating that elevated levels of the PbpC membrane anchor can reinstate polar localization for the ∆2-8 variant, whereas it fails to do so for the polymerization-deficient F130R variant (Figure 9 and Figure 9–figure supplement 1). Together, these results support the hypothesis that membrane association and polymerization act synergistically to establish localized bactofilin assemblies at the stalked cell pole.

      (5) The authors provide no primary data for single molecule tracking. There is no tracking mapped onto microscopy images to show membrane localization or lack of localization in MTS deletion/ variants. A known soluble protein (e.g. unfused mVenus) and a known membrane bound protein would serve as valuable controls to interpret the data presented. It also is unclear why the authors chose to report molecular dynamics as mean squared displacement rather than mean squared displacement per unit time, and the number of localizations is not indicated. Extrapolating from the graph in figure 4 D for example, it looks like WT BacA-mVenus would have a mobility of 0.5 (0.02/0.04) micrometers squared per second which is approaching diffusive behavior. Further justification/details of their analysis method is needed. It's also not clear how one should interpret the finding that several of the double point mutants show higher displacement than deleting the entire MTS. These experiments as they stand don't account for any other cause of molecular behavior change and assume that a decrease in movement is synonymous with membrane binding.

      We now provide additional information on the single-particle analysis. A new supplemental figure now shows a mapping of single-particle tracks onto the cells in which they were recorded for all proteins analyzed (Figure 2–figure supplement 1). Due to the small size of C. crescentus, it is difficult to clearly differentiate between membrane-associated and cytoplasmic protein species. However, overall, slow-diffusing particles tend to be localized to the cell periphery, supporting the idea that membrane-associated particles form larger assemblies (apart from diffusing more slowly due to their membrane association). In addition, we have included a movie that shows the single-particle diffusion dynamics of all proteins in representative cells (Figure 2-video 1). Finally, we have included a table that gives an overview of the number of cells and tracks analyzed for all proteins investigated (Supplementary file 1). Figures 2A and 4D show the mean squared displacement as a function of time, which makes it possible to assess whether the particles observed move by normal, Brownian diffusion (which is the case here). We repeated the entire single-particle tracking analysis to verify the data obtained previously and obtained very similar results. Among the different mutant proteins, only the K4E/K7E variant consistently shows a higher mobility than the MTS-free ∆2-8 variant, with MSD values similar to those of free mVenus. The underlying reason remains unclear. However, we believe that an in-depth analysis of this phenomenon is beyond the scope of this paper. We re-confirmed the integrity of the construct encoding the K4E/K7E variant by DNA sequencing and once again verified the size and stability of the fusion protein by Western blot analysis, excluding artifacts due to errors during cloning and strain construction.

      We agree that the single-molecule tracking data alone are certainly not sufficient to draw firm conclusions on the relationship between membrane binding and protein mobility. However, they are consistent with the results of our other in vivo and in vitro analyses, which together indicate a clear correlation between the mobility of BacA and its ability to interact with the membrane and polymerize (processes that promote each other synergistically).
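      As a reading aid for the exchange above on mean squared displacement (MSD), the following minimal sketch illustrates how an MSD-versus-time curve can indicate normal Brownian diffusion. It is not the authors' analysis pipeline: the function names, the simulated track, the frame interval, and the diffusion coefficient are all hypothetical, and the fit assumes free two-dimensional diffusion, for which MSD = 4·D·t and the log-log slope of MSD against time is close to 1.

```python
import numpy as np

def mean_squared_displacement(track, max_lag):
    """Time-averaged MSD of one 2D track (N x 2 array of positions, in µm)."""
    track = np.asarray(track, dtype=float)
    msd = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        # Displacements between all pairs of frames separated by `lag`
        steps = track[lag:] - track[:-lag]
        msd[lag - 1] = np.mean(np.sum(steps ** 2, axis=1))
    return msd

def fit_diffusion(msd, dt):
    """Return (D, alpha) assuming MSD = 4*D*t in 2D.

    D comes from a least-squares line through the origin; alpha is the
    log-log slope (about 1 for normal Brownian diffusion).
    """
    lags = dt * np.arange(1, len(msd) + 1)
    d_coeff = np.sum(lags * msd) / (4.0 * np.sum(lags ** 2))
    alpha = np.polyfit(np.log(lags), np.log(msd), 1)[0]
    return d_coeff, alpha

# Hypothetical example: simulate a freely diffusing particle.
rng = np.random.default_rng(0)
D_TRUE = 0.1   # µm^2/s, assumed value for illustration only
DT = 0.02      # s per frame, assumed value for illustration only
steps = rng.normal(0.0, np.sqrt(2 * D_TRUE * DT), size=(20000, 2))
track = np.cumsum(steps, axis=0)

msd = mean_squared_displacement(track, max_lag=5)
d_est, alpha = fit_diffusion(msd, DT)
print(f"estimated D = {d_est:.3f} µm^2/s, log-log slope = {alpha:.2f}")
```

      Under these assumptions, dividing a single MSD value by its time lag (as in the reviewer's 0.02/0.04 estimate) gives 4·D rather than D, which is one reason a curve over several lags is more informative than a single ratio.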

      (6) The experiments that map the interaction surface between the N-terminal unstructured region of PbpC and a specific part of the BacA bactofilin domain seem distinct from the main focus of the paper and the data somewhat preliminary. While the PbpC side has been probed by orthogonal approaches (mutation with localization in cells and affinity in vitro), the BacA region side has only been suggested by the deuterium exchange experiment and needs some kind of validation.

      The results of the HDX analysis per se are not preliminary and clearly show a change in the solvent accessibility of backbone amides in the C-terminal region of the bactofilin domain in the presence of the PbpC<sub>1-13</sub> peptide. However, we agree that further experiments would be required to precisely map and verify the PbpC binding site suggested by these data. As this is not the main focus of the paper, we would like to proceed without conducting further experiments in this area.

      We now provide additional data showing that elevated levels of the PbpC membrane anchor are able to recruit the MTS-free BacA variant (∆2-8) to the cytoplasmic membrane and stimulate its assembly at the stalked pole (Figure 9). These results now integrate Figure 8 more effectively into the overall theme of the paper.

      Reviewer #2 (Public review):

      Summary:

      The authors of this study investigated the membrane-binding properties of bactofilin A from Caulobacter crescentus, a classic model organism for bacterial cell biology. BacA was the progenitor of a family of cytoskeletal proteins that have been identified as ubiquitous structural components in bacteria, performing a range of cell biological functions. Association with the cell membrane is a common property of the bactofilins studied and is thought to be important for functionality. However, almost all bactofilins lack a transmembrane domain. While membrane association has been attributed to the unstructured N-terminus, experimental evidence had yet to be provided. As a result, the mode of membrane association and the underlying molecular mechanics remained elusive.

      Liu et al. analyze the membrane binding properties of BacA in detail and scrutinize molecular interactions using in-vivo, in-vitro and in-silico techniques. They show that few N-terminal amino acids are important for membrane association or proper localization and suggest that membrane association promotes polymerization. Bioinformatic analyses revealed conserved lineage-specific N-terminal motifs indicating a conserved role in protein localization. Using HDX analysis they also identify a potential interaction site with PbpC, a morphogenic cell wall synthase implicated in Caulobacter stalk synthesis. Complementary, they pinpoint the bactofilin-interacting region within the PbpC C-terminus, known to interact with bactofilin. They further show that BacA localization is independent of PbpC.

      Strengths:

      These data significantly advance the understanding of the membrane binding determinants of bactofilins and thus their function at the molecular level. The major strength of the comprehensive study is the combination of complementary in vivo, in vitro and bioinformatic/simulation approaches, the results of which are consistent.

      Thank you for this positive feedback.

      Weaknesses:

      The results are limited to protein localization and interaction, as there is no data on phenotypic effects. Therefore, the cell biological significance remains somewhat underrepresented.

      We agree that it is interesting to investigate the phenotypic effects caused by the reduced membrane binding activity of BacA variants with defects in the MTS. We have now included phenotypic analyses that shed light on the role of region C1 in the localization of PbpC and its function in stalk elongation under phosphate-limiting conditions (see below).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      To address the missing estimation of biological relevance, some additional experiments may be carried out.

      For example, given that BacA localizes PbpC by direct interaction, one might expect an effect on stalk formation if BacA is unable to bind the membrane or to polymerize. The same applies to PbpC variants lacking the C1 region. As the mutant strains are available, these data are not difficult to obtain but would help to compare the effect of the deletions with previous data (e.g. Kühn et al.) even if the differences are small.

      We have now analyzed the effect of the removal of region C1 on the ability of mVenus-PbpC to promote stalk elongation in C. crescentus under phosphate starvation. Interestingly, our results show that the lack of the BacA-interaction motif impairs the recruitment of the fusion protein to the stalked pole, but it does not interfere with its stimulatory effect on stalk biogenesis. Thus, the polar localization of PbpC does not appear to be critical for its function in localized peptidoglycan synthesis at the stalk base. These results are now shown in Figure 8–Figure supplement 4. The results obtained may be explained by residual transient interactions of mVenus-PbpC with proteins other than BacA at the stalked pole. Notably, PbpC has also been implicated in the attachment of the stalk-specific protein StpX to components of the outer membrane at the stalk base. The polar localization of PbpC may therefore be primarily required to ensure proper StpX localization, consistent with previous work by Hughes et al. (Mol Microbiol, 2013) showing that StpX is partially mislocalized in a strain producing an N-terminally truncated PbpC variant that no longer localizes to the stalk base.

      We have also attempted to investigate the ability of the Δ2-8 and F130R variants of BacA-mVenus to promote stalk elongation under phosphate starvation. However, the levels of the WT, Δ2-8 and F130R proteins and their stabilities were dramatically different after prolonged incubation of the cells in phosphate-limited medium, so that it was not possible to draw any firm conclusions from the results obtained (not shown).

      In addition, the M23-like endopeptidase LdpA is proposed to be a client protein of BacA (in C. crescentus, Billini et al. 2018, and H. neptunium or R. rubrum, Pöhl et al. 2024). In H. neptunium, it is suggested that the interaction is mediated by a cytoplasmic peptide of LmdC reminiscent of PbpC. This should at least be commented on. It would be interesting to see if LdpA in C. crescentus is also delocalized and, if so, this could identify another client protein of BacA.

      We agree that it would be interesting to study the role of BacA in LdpA function. However, we have not yet succeeded in generating a stable fluorescent protein fusion to LdpA, which currently makes it impossible to study the interplay between these two proteins in vivo. The focus of the present paper is on the mode of interaction between bactofilins and the cytoplasmic membrane and on the mutual interdependence of membrane binding and bactofilin polymerization. Given that PbpC is so far the only verified interaction partner of BacA in C. crescentus, we would like to limit our analysis to this client protein.

      Further comments:

      L105: analyze --> analyzed

      Done.

      L169: Is there any reason why the MTS of E. coli MreB was doubled?

      Previous work has shown that two tandem copies of the N-terminal amphiphilic helix of E. coli MreB were required to partially target a heterologous fusion partner protein (GFP) to the cytoplasmic membrane of E. coli cells (Salje et al, 2011).

      Fig. S3:

      a) Please decide which tag was used (mNG or mVenus) and adapt the figure or legend accordingly.
      b) In the legend for panel (C), please describe how the relative amounts were calculated, as the fractions arithmetically cannot add to > 100%. I guess each band was densitometrically rated and independently normalized to the whole-cell signal?

      The fluorescent tag used was mNeonGreen, as indicated in the figure. We have now corrected the legend accordingly. Thank you for making us aware of the wrong labeling of the y-axis. We have now corrected the figure and describe the method used to calculate the plotted values in the legend.

      Legend of Fig 1b: It is not clear to me, to which part of panel B the somewhat cryptic LY... strain names belong. I suggest putting them either next to the images, to delete them, or at least to unify the layout (compare, e.g. to Fig S7). (I would delete the LY numbers and stay with the genes/mutations throughout. This is just a suggestion).

      These names indicate the strains analyzed in panel B, and we have now clarified this in the legend. It is more straightforward to label the images according to the mutations carried by the different strains. Nevertheless, we would like to keep the strain names in the legend, so that the material used for the analysis can be clearly identified.

      Fig. 2a: As some of the colors are difficult to distinguish, I suggest sorting the names in the legend within the graph according to the slope of the curves (e.g. K4E K7E (?) on top and WT being at the bottom).

      Thank you for this suggestion. We have now rearranged the labels as proposed.

      In the legend (L924), correct typo "panel C" to "panel B".

      Done.

      Fig. 3: In the legend, I suggest deleting the abbreviations "S" and "P" as they do not show up in the image. In line 929, I suggest adding: average "relative" amount... or even more precisely: "average relative signal intensities obtained..."

      We have removed the abbreviations and now state that the bars indicate the “average relative signal intensities” obtained for the different fractions.

      Fig 4d: same suggestion as for Fig. 2a.

      Done.

      Fig 8: In the legend (L978), delete 1x "the"

      Done.

      L258 and Fig. S5: The expression "To account for biases in the coverage of bacterial species" seems somewhat unclear. I suggest rephrasing and adding information from the M+M section here (e.g. from L593, if this is meant).

      We now state that this step in the analysis pipeline was performed “To avoid biases arising from the over-representation of certain bacterial species in UniProt”.

      I appreciate the outline of the workflow in panel (a) of Fig. S5. It would be even more useful when some more details about the applied criteria for filtering would be provided (e.g. concerning what is meant with "detailed taxonomic information" or "filter out closely related sequences". Does the latter mean that only one bactofilin sequence per species was used? (As quite many bacteria have more than one but similar bactofilins.)

      We removed sequences from species with unclear phylogeny (e.g. candidate species whose precise taxonomic position has not yet been determined). For many pathogenic species, numerous strains have been sequenced. To account for this bias, only one sequence from clusters of highly similar bactofilin sequences (>90% identity) was retained per species. This information has now been included in the diagram. It is true that many bacteria have more than one bactofilin homolog. However, the sequences of these proteins are typically quite different. For instance, the BacA and BacB from C. crescentus only share 52% identity. Therefore, our analysis does not systematically eliminate bactofilin paralogs that coexist in the same species.
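
      The redundancy filter described above (retaining only one sequence per species from each cluster of sequences with more than 90% identity) can be sketched as a greedy procedure. The sketch below is a simplified illustration, not the pipeline used in the study: the function names, species labels and toy sequences are hypothetical, and the similarity score from Python's `difflib` is only a crude stand-in for an alignment-based percent identity as computed by dedicated clustering tools.

```python
from difflib import SequenceMatcher


def similarity(seq_a, seq_b):
    """Crude similarity proxy in [0, 1]; NOT an alignment-based percent identity."""
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def filter_redundant(records, threshold=0.90):
    """Greedily keep one representative per cluster of similar sequences per species.

    records: list of (species, sequence_id, sequence) tuples.
    A sequence is dropped if an already retained sequence from the SAME
    species is more similar to it than the threshold.
    """
    kept = []
    for species, seq_id, seq in records:
        redundant = any(
            kept_species == species and similarity(seq, kept_seq) > threshold
            for kept_species, _, kept_seq in kept
        )
        if not redundant:
            kept.append((species, seq_id, seq))
    return kept


# Toy data (hypothetical): two near-identical strains and one divergent
# paralog from species A, plus one sequence from species B.
records = [
    ("species_A", "A_strain1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),
    ("species_A", "A_strain2", "MKTAYLAKQRQISFVKSHFSRQLEERLGLIEVQ"),
    ("species_A", "A_paralog", "MDNPWCDNPWCDNPWCDNPWCDNPWCDNPWCDN"),
    ("species_B", "B_strain1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),
]

kept_ids = [seq_id for _, seq_id, _ in filter_redundant(records)]
print(kept_ids)
```

      In this toy example, the second strain of species A is removed as redundant, whereas the divergent paralog from the same species and the sequence from species B are retained, mirroring the point that paralogs with clearly different sequences are not systematically eliminated.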

      L281: Although likely, I am not sure if membrane binding has ever been shown for a bactofilin from these phyla. (See also L 380.) Is there an example? Otherwise, membrane binding may not be a property of these bactofilins.

      To our knowledge, the ability of bactofilins from these clades to interact with membranes has not been investigated to date. We agree that the absence of an MTS-like motif may indicate that they lack membrane binding activity, and we have now stated this possibility in the Results and Discussion.

      L285: See comment above concerning the M23-like peptidase LdpA. Although not yet directly shown for C. crescentus, it seems likely that BacACc does also localize this peptidase in addition to PbpC. I suggest rephrasing, e.g. "known" --> "shown"

      We now use the word “reported”.

      L295 and Fig S8: PbpC is ubiquitous. Which criteria/filters have been applied to select the shown sequences?

      C. crescentus PbpC is different from E. coli PBP1C. It is characterized by distinctive, conserved N- and C-terminal tails and is only found in C. crescentus and close relatives. The C. crescentus homolog of E. coli PBP1C is called PbpZ (Yakhnina et al, J Bacteriol, 2013; Strobel et al, J Bacteriol, 2014), whereas C. crescentus PbpC is related to E. coli PBP1A. We have now added this information to the text to avoid confusion.

      L311: may replace "assembly" by "polymerization"

      Done.

      L320: bactofilin --> bactofilin domain?

      Yes, this was supposed to read “bactofilin domain”. Thank you for spotting this issue.

      L324: The HDX analysis of BacA suggests that the exchange is slowed down in the presence of the PbpC peptide, which is indicative of a physical interaction between these two molecules. To corroborate the claim that BacA polymerization is critical for interaction with the peptide (resp. PbpC), this experiment should be carried out with the polymerization defective BacA version F130R.

      (Or tone this statement down, e.g. show --> suggest.)

      “suggest”

      L386: undergoes --> undergo

      Done.

      L391-400: This idea is tempting but the suggested mechanism then would be restricted to bactofilins of C. crescentus and close relatives. The bactofilin of Rhodomicrobium, for example, was shown to localize dynamically and not to stick to a positively curved membrane.

      In the vast majority of species investigated so far, bactofilins were found to associate with specifically curved membrane regions and to contribute to the establishment of membrane curvature. Unfortunately, the sequences of the three co-polymerizing bactofilin paralogs of R. vannielii DSM 166 studied by Richter et al (2023) have not been reported and the genome sequence of this strain is not publicly available. However, in related species with three bactofilin paralogs, only one paralog shows an MTS-like N-terminal peptide and another paralog typically contains an unusual cadherin-like domain of unknown function, as also reported for R. vannielii DSM 166. Therefore, the mechanism controlling the localization dynamics of bactofilins may be complex in the Rhodomicrobium lineage. Nevertheless, at native expression levels, the major bactofilin (BacA) of R. vannielii DSM 166 was shown to localize predominantly to the hyphal tips and the (incipient) bud necks, suggesting that regions of distinct membrane curvature could also play a role in its recruitment. We do not claim that all bactofilins recognize positive membrane curvature, which is clearly not the case. It rather appears as though the curvature preference of bactofilins varies depending on their specific function.

      L405-406: I agree that localization of BacA has been shown to be independent of PbpC. However, this does not generally preclude an effect on BacA localization by other "client" or interacting proteins. (See also comment above about the putative BacA interactor LdpA). I suggest either to corroborate or to change this statement from "client binding" to "PbpC binding".

      Thank you for pointing out the imprecision of this statement. We now conclude that “PbpC binding” is not critical for BacA assembly and positioning.

      Suppl. Fig. S11: In the legend, please correct the copy-paste mismatch (...VirB...).

      Done.

      L482: delete 1x "at"

      Done.

      L484: may be better "soluble and insoluble fractions"?

      We now describe the two fractions as “soluble and membrane-containing insoluble fractions” to make clear to all readers that membrane vesicles are found in the pellet after ultracentrifugation.

      L489-490: check spelling immunoglobulin – immuneglobulin

      Done.

      L500 and 504: º_C --> ºC

      Done.

      Suppl. file X (HDX data): please check the table headline, table should be included in Suppl. file 1

      We have now included a headline in this file (now Supplementary file 3).

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors have studied how a virus (EMCV) uses its RNA (Type 2 IRES) to hijack the host's protein-making machinery. They use cryo-EM to extract structural information about the recruitment of viral Type 2 IRES to ribosomal pre-IC. The authors propose a novel interaction mechanism in which the EMCV Type 2 IRES mimics 28S rRNA and interacts with ribosomal proteins and initiator tRNA (tRNAi).

      Strengths:

      (1) Getting structural insights about the Type 2 IRES-based initiation is novel.

      (2) The study allows a good comparison of other IRES-based initiation systems.

      (3) The manuscript is well-written and clearly explains the background, methods, and results.

      We thank Reviewer 1 for appreciating our efforts and for finding the structural insights into type 2 IRES-based initiation presented in this study novel.

      Weaknesses:

      (1) The main weakness of the work is the low resolution of the structure. This limits the possibility of data interpretation at the molecular level.

      However, despite the moderate resolution of the cryo-EM reconstructions, the model fits well into the density. The analysis of the EMCV IRES-48S PIC structure is thorough and includes meaningful comparisons to previously published structures (e.g., PDB IDs 7QP6 and 7QP7). These comparisons showed that Map B1 represents a closed conformation, in contrast to Map A in the open state (Figure 2). Additionally, the proposed 28S rRNA mimicry strategy, supported by structural superposition with the 80S ribosome and sequence similarity between the I domain of the IRES and the h38 region of 28S rRNA (Fig. 4), is well-justified.

      We agree that the low resolution of the map has compromised the data interpretation at the molecular level, and we thank the reviewer for appreciating our findings at this resolution. Due to the compromise in resolution, we have reported findings related to stretches or regions such as loops and stems, rather than individual nucleotides and interactions.  

      (2) The lack of experimental validation of the functional importance of regions like the GNRA and RAAA loops is another limitation of this study.

      We agree that this study lacks experiments other than cryo-EM that probe the importance of regions such as the GNRA and RAAA loops. However, we have cited earlier reports that demonstrate the importance of these regions for overall IRES activity. The essentiality of the RAAA loop for type 2 IRESs was demonstrated in an earlier report (López de Quinto and Martínez-Salas, 1997; cited in the manuscript). Further, the conservation of this loop across the type 2 IRES family adds to its importance (Manuscript Figure 6B). This loop and its flanking G-C stem are similar to h38 of 28S rRNA, and it appears that the RAAA loop adopts a mimicry mechanism to interact with the 40S ribosomal protein uS19, thus highlighting its importance for interaction with the 40S subunit. Experiments destabilising the G-C stem also compromise IRES activity, as shown in the case of the FMDV IRES (Fernández et al 2011). Previous studies involving mutation of the GNRA or GCGA loop in the EMCV IRES have shown a deficiency in IRES activity (Roberts and Belsham, 1997; Robertson et al 1999), suggesting the importance of these regions in viral IRES biology, and these reports are cited in the manuscript. Beyond the EMCV IRES, mutation of the GUAA loop (representative of GNRA) of the FMDV IRES also caused a significant reduction in IRES activity (López de Quinto and Martínez-Salas, 1997). In our study, we observe that the GCGA loop interacts with tRNA<sub>i</sub> in the EMCV IRES-48S PIC, thus implicating this loop in the process. Moreover, incubation of the FMDV IRES with 40S ribosomes has been shown to decrease SHAPE reactivity in the domain 3 apex (nucleotides 170-200) (Lozano et al 2018), which corresponds to the domain I apex of the EMCV IRES. Further, we will attempt to address the lack of experimental validation of the GNRA and RAAA loops by performing biochemical assays.

      (3) Minor modifications related to data processing and biochemical studies will further validate and strengthen the findings.

      a) In the cryo-EM data section, the authors should include an image showing rejected particles during 2D classification. This would help readers understand why, despite having over 22k micrographs with sufficient particle distribution and good contrast, only a smaller number of particles were used in the final reconstruction. Additionally, employing map-sharpening tools such as Ewald sphere correction, Bayesian polishing, or reference-based motion correction might further improve the quality of the maps. Targeting high-resolution structures would be particularly informative.

      We thank the reviewer for the suggestions; we will employ the suggested procedures, which may help to further improve the quality of the maps. We will include an image of the rejected 2D classes in the revised manuscript. We acknowledge the Reviewer’s query regarding the large number of micrographs relative to the small number of particles used for the final reconstruction. The total number of micrographs is the sum of multiple datasets, prepared and collected at various times. Among these, around 8000 micrographs have extremely poor particle number and distribution. As a result, the number of particles per micrograph is heterogeneous in the compiled dataset. We obtained only 237054 ‘good particles’ after multiple rounds of 2D and 3D classification, and the final reconstruction contains 28439 particles (~12%). This class was obtained after masked classification for IRES and ternary complex density. Hence, only the particles that show the best density for both the IRES and the ternary complex were used for reconstructing this map. Another set of particles, which shows density for only a portion of the IRES and for tRNA but NO density for eIF2, forms another map (26792 particles, 11.3%). Thus, we obtained a total of 55231 particles (23.3%) with IRES density.
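
      The particle fractions quoted above can be re-derived from the reported counts with a few lines of arithmetic. The short sketch below uses only the numbers stated in this response; the variable names and class labels are illustrative.

```python
# Particle counts as reported in the response above.
GOOD_PARTICLES = 237_054

class_counts = {
    "IRES + ternary complex (final map)": 28_439,
    "partial IRES + tRNA, no eIF2": 26_792,
}


def percent_of(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Fraction of 'good particles' contributing to each class.
class_percent = {
    name: percent_of(count, GOOD_PARTICLES)
    for name, count in class_counts.items()
}

# All particles that show IRES density (both classes combined).
total_ires = sum(class_counts.values())
total_ires_percent = percent_of(total_ires, GOOD_PARTICLES)

for name, value in class_percent.items():
    print(f"{name}: {class_counts[name]} particles ({value:.1f}%)")
print(f"total with IRES density: {total_ires} particles ({total_ires_percent:.1f}%)")
```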

      b) The strategic modelling of different IRES domains into the density, particularly the domain into the region above the 40S head, is appreciable. However, providing the full RNA tertiary structure (RNAfold) of the EMCV IRES (nucleotides 280-905) would better explain the logic behind the model building and its molecular interpretation.

      We thank the reviewer for appreciating the modelling of the domain I apex in the cryo-EM density. We tried to predict the full tertiary structure of the IRES; however, inclusion of the full-length sequence (nucleotides 280-905) gave models of extremely low confidence, and a few domains did not conform to the secondary structure of the EMCV IRES reported in Duke et al 1992. Hence, we used individual domains of the EMCV IRES and predicted the tertiary structure of each independently of the other IRES domains. Furthermore, 3D models of FMDV IRES domains 2, 3, and 4 (corresponding to EMCV IRES domains H, I, and J-K) were predicted from SHAPE reactivity values using the RNAComposer server (Figure 3 in Lozano et al 2018). The predicted architecture of the domain 3 apex (FMDV IRES) coincides with our I domain apex model (EMCV IRES).

      c)  Although the authors compare their findings with other types of IRESs (Types 1, 3, and 4), there is no experimental validation of the functional importance of regions like the GNRA and RAAA loops. Including luciferase-based assays or mutational studies of these regions for validation of structural interpretations is strongly recommended.

      We have discussed how other IRESs, such as type 1 and type 5 (Aichi virus), might use strategies similar to that of the EMCV IRES to assemble the 48S PIC, given the similarity in motif sequence and position across the viral IRESs. Like the EMCV IRES, the type 1 IRESs (e.g. poliovirus, Coxsackie virus) also harbour a GNRA loop, preceded by a C-rich loop, in their longest domain, which is known for long-range RNA-RNA interactions. The segment harbouring the GNRA loop is highly conserved across the type 1 family of IRESs (Kim et al 2015). The Aichi virus IRES (type 5) harbours a GNRA loop in its longest domain, which is domain J. Deletion of the GNRA loop compromised IRES activity; however, substitution mutations in this region either elevated IRES activity or left it unaltered (Yu et al 2011). We have hypothesized that these IRESs (type 1 and type 5) might use the GNRA motifs in their longest domain (domain IV in type 1, and domain J in type 5) in a manner similar to the EMCV IRES, where GNRA is present in the longest domain (I) and preceded by a C-rich loop. Thus, GNRA can potentially mediate long-range interactions with tRNA<sub>i</sub>, as all these IRESs require the eIF2-ternary complex for the formation of the 48S PIC. In parallel, like the EMCV IRES, type 1 and type 5 IRESs also place the GNRA motif-containing domain before the eIF4G-binding domain (domain J-K in EMCV IRES, domain V in poliovirus, domain K in Aichi virus). Hence, we suggest that these IRESs may use a similar strategy to interact with tRNA<sub>i</sub> during the formation of the 48S PIC.

      Reviewer #2 (Public review):

      Summary:

      The field of protein translation has long sought the structure of a Type 2 Internal Ribosome Entry Site (IRES). In this work, Das and Hussain pair cryo-EM with algorithmic RNA structure prediction to present a structure of the Type 2 IRES found in Encephalomyocarditis virus (EMCV). Using medium to low resolution cryo-EM maps, they resolve the overall shape of a critical domain of this Type 2 IRES. They use algorithmic RNA prediction to model this domain onto their maps and attempt to explain previous results using this model.

      Strengths:

      (1) This study reveals a previously unknown/unseen binding modality used by IRESes: a direct interaction of the IRES with the initiator tRNA.

      (2) Use of an IRES-associated factor to assemble and pull down an IRES bound to the small subunit of the ribosome from cellular extracts is innovative.

      (3) Algorithmic modeling of RNA structure to complement medium to low resolution cryo-EM maps, as employed here, can be implemented for other RNA structures.

      We thank Reviewer 2 for the positive and encouraging comments on our work and for appreciating our ‘innovative’ approach of using an IRES-associated factor to assemble and pull down the IRES-bound ribosomal complex.

      Weaknesses:

      (1) Maps at the resolution presented prevent unambiguous modelling of the EMCV-IRES. This, combined with the lack of any biochemical data, calls into question any inferences made at the level of individual nucleotides, such as the GNRA loop and CAAA loop (Figure 4).

      We understand the concerns raised by the reviewer related to the resolution of the EMCV IRES-48S PIC map. However, we would like to mention that we refrained from commenting on individual nucleotides or molecular interactions in the manuscript. Instead, we discuss loops, RNA stretches or motifs that could be inferred with more confidence, as shown in Manuscript Figure 4. The EMCV IRES can directly interact with the 40S ribosome using its domains H and I (Chamond et al 2014); however, the details of this interaction were unknown. Based on the placement of a portion of domain I in the cryo-EM map, we observe that the CAAA loop of the domain I apex interacts with the 40S ribosome. This is also reflected in the earlier reported SHAPE data (Supplementary figures 2 and 8 in Chamond et al 2014), where a decrease in reactivity is evident in the presence of the 40S ribosome. In addition, incubation of the EMCV IRES with rabbit reticulocyte lysate (RRL) offered protection to domain I apex regions, which included the CAAA loop (Figure 4b in Maloney and Joseph, 2024).

      Furthermore, this decrease in SHAPE reactivity pattern is also evident for FMDV IRES domain 3 apex (like domain I in EMCV IRES) in the presence of 40S ribosome (Lozano et al 2018).

      Thus, these studies are consistent with the placement of IRES model in the cryo-EM map.

      We aim to improve the resolution of the maps for better clarity and add biochemical experiments to justify the possible interactions.

      (2) The EMCV IRES contains an upstream AUG at position 826, where the PIC can assemble (Pestova et al 1996; PMID 8943341). It is unclear if this start codon was mutated in this study. If it were not mutated, placement of AUG-834 over AUG-826 in the P-site is unexplained.

      We thank the reviewer for bringing up this point, as we missed mentioning this in the manuscript. The EMCV IRES does not require scanning and directly positions AUG-834 at the P site (Pestova et al 1996). In Pestova et al 1996, the toeprint at AUG-834 is much more intense than that at AUG-826. Further, AUG-834 lies in a good Kozak context, whereas AUG-826 has a poor Kozak context. Furthermore, the synthesis of the polypeptide requires placement of AUG-834 at the P site. In our cryo-EM map, we observed that the tRNA<sub>i</sub> is in a P<sub>IN</sub> state, which indicates recognition of the start codon, and we reasoned that AUG-834 is more likely to be placed at the P site than AUG-826. We will mention this in the revised manuscript, as we had NOT mutated AUG-826.

      (3) The claims the authors make about (i) the general overall shape and binding site of the IRES, (ii) its gross interaction with the two ribosomal proteins, (iii) the P-in state of the 48S, (iv) the rearrangement of the ternary complex are all warranted. Their claims about individual nucleotides or smaller stretches of the IRES, made without any supporting biochemical data, are not warranted by the data.

      We thank the reviewer for considering our major claims warranted, and we intend to make further improvements to support our assessment of small stretches and individual nucleotides.

      Reviewer #3 (Public review):

      Summary:

      Type II IRES, such as those from encephalomyocarditis virus (EMCV) and foot-and-mouth disease virus (FMDV), mediate cap-independent translation initiation by using the full complement of eukaryotic initiation factors (eIFs), except the cap-binding protein eIF4E. The molecular details of how IRES type II interacts with the ribosome and initiation factors to promote recruitment have remained unclear. Das and Hussain used cryo-electron microscopy to determine the structure of a translation initiation complex assembled on the EMCV IRES. The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Strengths:

      The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Weaknesses:

      While this reviewer acknowledges the technical challenges inherent in determining the structure of such a highly flexible complex, the overall resolution remains insufficient to fully support the authors' conclusions, particularly given that cryo-EM is the sole experimental approach presented in the manuscript.

      The study is biologically significant; however, the authors should improve the resolution or include complementary biochemical validation.

      We thank Reviewer 3 for acknowledging the technical challenges in this study and finding our study biologically significant. We understand the concerns related to low resolution and the requirement of complementary biochemical validation for our reported observations and interpretations in the manuscript. We are attempting to improve the resolution and complement the interpretations with biochemical experiments.

    1. Author response:

      The following is the authors’ response to the original reviews

      Joint Public Review:

      Summary:

      In this study, Daniel et al. used three cognitive tasks to investigate behavioral signatures of cerebellar degeneration. In the first two tasks, the authors found that if an equation was incorrect, reaction times slowed significantly more for cerebellar patients than for healthy controls. In comparison, the slowing in the reaction times when the task required more operations was comparable to normal controls. In the third task, the authors show increased errors in cerebellar patients when they had to judge whether a letter string corresponded to an artificial grammar.

      Strengths:

      Overall, the work is methodologically sound and the manuscript well written. The data do show some evidence for specific cognitive deficits in cerebellar degeneration patients.

      Thank you for the thoughtful summary and constructive feedback. We are pleased that the methodological rigor and clarity of the manuscript were appreciated, and that the data were recognized as providing meaningful evidence regarding cognitive deficits in cerebellar degeneration.

      Weaknesses:

      The current version has some weaknesses in the visual presentation of results. Overall, the study lacks a more precise discussion on how the patterns of deficits relate to the hypothesized cerebellar function. The reviewers and the editor agreed that the data are interesting and point to a specific cognitive deficit in cerebellar patients. However, in the discussion, we were somewhat confused about the interpretation of the result: If the cerebellum (as proposed in the introduction) is involved in forming expectations in a cognitive task, should they not show problems both in the expected (1+3 =4) and unexpected (1+3=2) conditions? Without having formed the correct expectation, how can you correctly say "yes" in the expected condition? No increase in error rate is observed - just slowing in the unexpected condition. But this increase in error rate was not observed. If the patients make up for the lack of prediction by using some other strategy, why are they only slowing in the unexpected case? If the cerebellum is NOT involved in making the prediction, but only involved in detecting the mismatch between predicted and real outcome, why would the patients not show specifically more errors in the unexpected condition?

      Thank you for asking these important questions and initiating an interesting discussion. While decision errors and processing efficiency are not fully orthogonal and are likely related, they are not necessarily the same internal construct. The data from Experiments 1 and 2 suggest impaired processing efficiency rather than increased decision error. Reaction time slowing without increased error rates suggests that the CA group can form expectations but respond more slowly, possibly due to reduced processing efficiency. Thus, this analysis of our data suggests that the cerebellum is not essential for forming expectations, but it plays a critical role in processing their violations.

      Relatedly, a few important questions remain open in the literature concerning the cerebellum’s role in expectation-related processes. The first is whether the cerebellum contributes to the formation of expectations or the processing of their violations. In Experiments 1 and 2, the CA group did not show impairments in the complexity manipulation. Solving these problems requires the formation of expectations during the reasoning process. Given the intact performance of the CA group, these results suggest that they are not impaired in forming expectations. However, in both Experiments 1 and 2, patients exhibited selective impairments in solving incorrect problems compared to correct problems. Since expectation formation is required in both conditions, but only incorrect problems involve a VE, we hypothesize that the cerebellum is involved in VE processes. We suggest that the CA group can form expectations in familiar tasks, but are impaired in processing unexpected compared to expected outcomes. This supports the notion that the cerebellum contributes to VE, rather than to forming expectations.

      In Experiment 3, during training, the participant is learning a novel rule (grammar), forming new expectations on how strings of letters should be. Afterwards, during testing, the participant is requested to identify if a novel string is following the rule or not. We examined sensitivity to distinguish between grammatical and non-grammatical strings of letters, thus taking into account a baseline ability to identify expected strings. Additionally, both in the low-similarity and high-similarity conditions, there are expectations regarding whether the strings are following the rule or not. However, in the high-similarity condition, there is more uncertainty regarding which strings are following the grammatical rule, as demonstrated in a lower sensitivity (d prime). Given the group differences only in the low-similarity condition, these results suggest the CA group is impaired only when the rules are more certain. Given these results, we suggest that forming cognitive expectations is not necessarily dependent on the cerebellum. Rather, we propose that the cerebellum is critical for processing rule-based VE (detection or processing of detected errors) under conditions of more certainty. One remaining question for future studies is whether the cerebellum contributes to detection of a mismatch between the expectation and sensory evidence, or the processing of a detected VE.
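
      For readers unfamiliar with the sensitivity measure mentioned above, the following is a minimal sketch of the standard signal-detection index d prime, computed as z(hit rate) minus z(false-alarm rate). It is an illustration only: the function name, the treatment of a "grammatical" response to a grammatical string as a hit, and the log-linear correction are assumptions, not a description of the analysis pipeline used in the manuscript.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Assumed mapping (hypothetical, for illustration):
      hit               = "grammatical" response to a grammatical string
      false alarm       = "grammatical" response to a non-grammatical string
    """
    # Log-linear correction: adding 0.5 to each count keeps both rates
    # strictly between 0 and 1, where the inverse normal CDF is defined.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

      A lower d prime in one condition than in another indicates that grammatical and non-grammatical strings were harder to tell apart in that condition, independent of any overall bias toward answering "grammatical".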

      We suggest that these key questions are relevant to both motor and non-motor domains and were not fully addressed even in the previous, well-studied motor domain. Importantly, while previous experimental manipulations[17,19,40,94–96] have provided important insights regarding the cerebellar role in these processes, some may have confounded these internal constructs due to task design limitations (e.g., lack of baseline conditions). Notably, some of these previous studies did not include control conditions, such as correct trials, where there was no VE. In addition, other studies did not include a control measure (e.g., complexity effect), which limits their ability to infer the specific cerebellar role in expectation manipulation.

      Thus, the current experimental design used in three different experiments provides a valuable novel experimental perspective, allowing us to distinguish between some, but not all, of the processes involved in the formation of expectations and their violations. For instance, to our knowledge, this is the first study to demonstrate a selective impairment in rule-based VE processing in cerebellar patients across both numerical reasoning and artificial grammar tasks. If feasible, we propose that future studies should disentangle different forms of VE by operationalizing them in experimental tasks in an orthogonal manner. This will allow us to achieve a more detailed and well-defined cerebellar motor and non-motor mechanistic account.

      Recommendations for the authors:

      Editors comments:

      The Figures are somewhat sub-standard and should be improved before the paper is made the VOR. Ensure consistent ordering of the group factor (CA, NT) and experimental factor across Figure 3,4, and 6 (panels A). Having the patient group as columns in Figure 4a and in rows in Figure 6a is very confusing.

      We have standardized the layout across Figures 2, 4, and 6 so that the group factor (CA, NT) and experimental conditions are consistently ordered. In all panels, the group factor now appears as a column.

      Subpanels should be numbered A,B,C... not A, B1, B2.

      Subpanel labels have been updated to follow the standard A, B, C format across all figures.

      Fonts should have a 100% aspect ratio - they should not be stretched (Figure 6B).

      We have corrected the font aspect ratios in all figures (e.g., Figure 6B) to ensure proper proportions and readability. 

      Colors should be more suitable to print - use a CMYK color scheme (i.e. avoid neon colors such as the neon green for the CA).

      The color scheme across all figures has been revised to be print-friendly using CMYK-compatible, colorblind-accessible palettes. Neon green for the CA group was replaced with a more muted, distinguishable color.

      Abstract: "The CA group exhibited a disproportionate cost when comparing expected problems compared to unexpected problems" - I recommend switching unexpected and expected, as the disproportional cost is on the former.

      We have changed the wording of the sentence accordingly. 

      Upon re-reading, the details for the AGL task were not clear to us. Please do not rely on the reference (78) for the details - your paper should contain enough information to have the reader understand the experimental details. For you to appreciate the depth of our not-understanding, here is a simple question: The test strings either followed the grammar in Fig 5 or they did not. If they did not, how exactly was similarity to the grammar measured? If they did, what was the difference between the “Grammatical-high” and “Grammatical-low” trials? If the string was grammatical, there should not be a notion of similarity, no? Or were these trials arbitrarily split in half?

      We have clarified that 50% of the test strings followed the grammar of the training strings. We also elaborated on the calculation of chunk strength as a measure of similarity between the training and testing strings, similar to the previous papers. The differences between low and high similarity are explained in the paper. Specifically, for each test string, we calculated chunk strength by summing the frequencies of all relevant substrings (e.g., bigrams and trigrams) that appeared in the training set. The test strings whose chunk‐strength values fell above the median for grammatical items were classified as “high similarity,” while those falling below the median were classified as “low similarity.” Also, grammatical strings can be of both low and high similarity; this is precisely the beautiful aspect of this experimental manipulation, showing the importance of uncertainty. We have utilized a 2 × 2 fully orthogonal design (grammaticality × similarity).
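
      The chunk-strength calculation and median split described above can be sketched as follows. This is a hedged illustration rather than the code used in the study: the function names are hypothetical, chunks are assumed to be contiguous bigrams and trigrams, the median is taken over whatever set of test strings is passed in, and strings exactly at the median are assigned to "low" because the text does not specify how ties are handled.

```python
from collections import Counter
from statistics import median


def chunks(string, sizes=(2, 3)):
    """All contiguous substrings of the given lengths (bigrams and trigrams)."""
    return [string[i:i + n] for n in sizes for i in range(len(string) - n + 1)]


def chunk_frequencies(training_strings, sizes=(2, 3)):
    """Count how often each chunk occurs across the training set."""
    counts = Counter()
    for s in training_strings:
        counts.update(chunks(s, sizes))
    return counts


def chunk_strength(test_string, freqs, sizes=(2, 3)):
    """Sum of training-set frequencies over the chunks of a test string.

    Chunks never seen in training contribute 0 (Counter returns 0 for
    missing keys).
    """
    return sum(freqs[c] for c in chunks(test_string, sizes))


def median_split(test_strings, freqs):
    """Label each test string 'high' or 'low' similarity by a median split.

    Assumption: ties at the median are labelled 'low'.
    """
    strengths = {s: chunk_strength(s, freqs) for s in test_strings}
    cutoff = median(strengths.values())
    return {s: ("high" if v > cutoff else "low") for s, v in strengths.items()}
```

      Because chunk strength depends only on surface overlap with the training strings, it can be computed for grammatical and non-grammatical test strings alike, which is what allows similarity to be crossed with grammaticality.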

      Experimental details of the task should be added to the Method section. In the results you should only mention the experimental details that are necessary for understanding the experiments, but details such as the number of trials, etc, can be moved to the methods. 

      We have now moved the experimental task details to the Method sections.

      Reviewer #1 (Recommendations for the author):

      Studies have been done online and not in the lab. Could that have affected the results?

      We addressed this in the Methods section, referring to established protocols for online neuropsychological testing[9–12]. Our results align with similar in-lab findings in both the subtraction and AGL tasks, supporting the online approach's robustness. 

      Figure 2, B1; Figure 4, B1; Figure 6B: How many patients performed worse than the (worst-performing) controls? There appears to be quite some overlap between patients and controls. In the patients who performed worse, was there any difference from the other patients (e.g. disease severity as assessed by SARA score, repeat length, data of attention probes)?

      We appreciate the reviewer’s thoughtful comment. We considered conducting individual-level comparisons to identify patients who performed worse than the lowest-performing controls. However, defining "worse" based on the performance of the lowest control is only one possible criterion. Other definitions—such as a specific number (1/2/3?) of standard deviations below the control mean—are also commonly used in literature, and each may yield different conclusions. This variability highlights the lack of a standardized threshold for what constitutes “worse” or "impaired" performance at the individual level. Given this ambiguity, and in line with prior studies that focus on average group differences rather than “impairment” prevalence, we chose not to include these individual-level comparisons. We believe this approach better aligns with the goals and design of the current study. That said, we agree that examining individual variability is important and may be more appropriate in future studies with larger samples so that percentage is a more robust measure. However, given the rarity of the disease, this would also be a challenge for future studies.  
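
      To make concrete why the choice of criterion matters, the sketch below counts how many patients would be labelled "impaired" under two of the definitions mentioned above. It is illustrative only: the function name and arguments are hypothetical, higher scores are assumed to mean better performance, and no data from the study are used.

```python
from statistics import mean, stdev


def count_impaired(patient_scores, control_scores, criterion="worst_control", k=2.0):
    """Number of patients scoring below a cutoff derived from the controls.

    criterion="worst_control": cutoff is the lowest control score.
    criterion="sd":            cutoff is k sample standard deviations below
                               the control mean.
    Assumes higher scores indicate better performance.
    """
    if criterion == "worst_control":
        cutoff = min(control_scores)
    elif criterion == "sd":
        cutoff = mean(control_scores) - k * stdev(control_scores)
    else:
        raise ValueError(f"unknown criterion: {criterion!r}")
    return sum(score < cutoff for score in patient_scores)
```

      Applying the function to the same scores with different criteria, or different values of k, can return different counts, which is the ambiguity described in the response.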

      SARA ataxia scale does not include oculomotor function. In SCA6 oculomotor deficits are frequent, eg, downbeat nystagmus. Please include information on oculomotor dysfunction.

      We thank the reviewer for this important observation. While it is true that the SARA scale does not explicitly assess oculomotor function, our experimental design – in all three experiments – has control conditions that help account for general processing differences, including those that could arise from oculomotor deficits. These conditions, such as the correct trials and the complexity effects, allow us to isolate effects specifically related to the violation of expectation while minimizing the influence of broader performance factors, such as eye movement abnormalities. We also note that, while some patients can experience oculomotor symptoms such as downbeat nystagmus, none of our tasks required precise visual tracking or gaze shifts. In our experimental tasks, stimuli were centrally presented, and no visual tracking or saccadic responses were required. Moreover, the response time windows and stimulus durations (>2–5 s) were sufficient to mitigate the effects of delayed visual processing due to oculomotor impairment.

      Why was MoCA used and not the CCAS-Schmahmann scale to assess cognitive function?

      We selected the MoCA due to its broad clinical utility, time efficiency, and ability to detect mild cognitive impairment specifically in CA[101,102].  

      Were there any signs of depression in the patient group that could have affected the results?

      None of the patients had a clinical diagnosis of depression or were undergoing psychiatric treatment.  

      Additionally, the interaction between group and expectancy was insignificant when RT was the depended vaibale .." = variable

      This has been corrected to "variable" in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      The terms 'unexpected' and 'expected' conditions are confusing. [...] Terming this 'violation of expectation' seems unnecessarily complicated to me. 

      We thank the reviewer for raising this important concern. We recognize that the terms "expected" and "unexpected" can be ambiguous without clarification, and that "violation of expectation" (VE) may initially appear unnecessary. Our choice to use VE terminology is grounded in an established theoretical framework that distinguishes between mere stimulus correctness and prediction mechanisms. Specifically, VE captures the internal processing of mismatches between anticipated and observed outcomes, which we believe is central to the cerebellar function under investigation. While simpler, technical alternatives (e.g., "correct" vs. "incorrect") could describe the stimuli, we find that VE more accurately reflects the mental constructs under study and is consistent with previous literature in both motor and cognitive domains. 

      Both tasks provide an error (or violation of expectation) that is non-informative and therefore unlikely to be used to update a forward model. The authors draw on motor literature to formulate a cognitive task where the presence of an error would engage the cerebellum and lead to longer reaction times in cerebellar patients. But in the motor domain, mismatch of sensory feedback and expectations would lead to an updating of the internal forward model. It seems unlikely to me in the arithmetic and alphabetic addition tasks that patients would update their internal model of addition according to an error presented at the end of each trial. If the error processed in these tasks will not lead to the updating of the internal forward model, can the authors discuss to what extent the cerebellum will be engaged similarly in these tasks, and what exactly connects cerebellar processing in these motor and cognitive tasks.

      We thank the reviewer for this thoughtful and important comment. We fully agree that the current tasks do not directly probe learning-related updating of internal models. As stated in the paper, the goal of the present study was not to support or refute a specific claim regarding the cerebellum’s role in learning processes. Rather, our focus was on examining cerebellar involvement in the processing of VE. While we were inspired by models from the motor domain, our design was not intended to induce learning or adaptation per se, but to isolate the processing of unexpected outcomes. We agree that the tasks in their current form are unlikely to engage forward model updating in the same way as in sensorimotor adaptation paradigms. That said, we believe the current findings can serve as a basis for future research exploring the relationship between cerebellar prediction error processing and learning over time. As we also noted in the paper, this is a direction we propose and are actively pursuing in ongoing research.

      The colour scheme is difficult for anyone with colour blindness or red-green visual impairment. Please adjust.

      All figures have been revised to use CMYK-compatible, colorblind-safe palettes, and neon colors have been removed.

      The introduction is a bit difficult to understand, because the authors draw on a number of different theories about cerebellar functioning, without clearly delineating how these relate to each other. For example: a) In the paragraph beginning with 'notably': If the cerebellum is required for sequential operations, why does it show the impairment with the rotation of the letters?

      We understand the concern that if the cerebellum is involved in sequential operations, its involvement in mental letter rotation, which can be assumed as “continuous transformation,” may appear contradictory. We note that the boundary between continuous and stepwise, procedural operations is not always clear-cut and may vary depending on the participant's strategy or previous knowledge, which is not fully known to the researchers. Furthermore, to our knowledge, prior work on mental rotation has not directly investigated the impact of VE during this task. However, these are two debatable considerations. 

      More importantly, a careful reading of our paper suggests that our experiments were designed to examine VE within tasks that involve sequential processing. Notably, we are not claiming that the cerebellum is involved in sequential or procedural processing per se. Rather, our findings point to a more specific role for the cerebellum in processing VE that arises during the construction of multistep procedural tasks. In fact, the results indicate that while the cerebellum may not be directly involved in the procedural process itself, it is critical when expectations are violated within such a context. This distinction is made possible in our study by the inclusion of a control condition (the complexity effect), which allows for a unique dissociation in our experimental design—one that, to our knowledge, has not been sufficiently addressed in previous studies.

      Additionally, in the case of arithmetic problem solving, such as the tasks used in prior studies cited in our manuscript[21], there is substantial evidence that these problems are typically solved through stepwise, procedural operations. Arithmetic reasoning, used in Experiments 1 and 2, has been robustly associated with procedural, multi-step strategies, which may be more clearly aligned with traditional views of cerebellar involvement in sequential operations. Thus, we propose that the role of the cerebellum in continuous transformations should be further examined.

      We suggest a more parsimonious theory that the cerebellum contributes to VE, a topic that has been extensively examined before. Yet, to reconcile our findings with previous ones, we propose that the cerebellum’s contribution may not be limited to either continuous or stepwise operations per se, but rather to a domain-general process: the processing of VE. This theoretical framework can explain performance patterns across both mental rotation tasks and stepwise, procedural arithmetic.

      The authors mention generation prediction as a function of the cerebellum, processing of prediction errors (or violations of expectations), sequentially, and continuous transformations - but it is unclear whether the authors are trying to dissociate these from each other or whether ALL of these functions have informed task design.

      We propose that the cerebellum’s contribution may not be limited to either continuous transformations or stepwise, procedural operations per se, but rather to a domain-general process: the processing of VE. We would like to clarify that we do not claim the cerebellum contributes to continuous transformations only, as suggested in some earlier work[21]. Rather, it could be that the cerebellum may contribute to continuous transformations, but we propose that it also supports multi-step, procedural processes. Given that framework, in the current study, across three separate experiments, we demonstrated that the cerebellum can also contribute to procedural, multi-step reasoning tasks.  

      Minor Comments

      Typo under paragraph beginning with 'notably' - cerebellum role should be cerebellar role.

      Corrected as suggested.

      When mentioning sequences as a recruiting feature for the cerebellum in the introduction, Van Overwalle's extensive work in the social domain should be referenced for completeness.

      Thank you for the suggestion. We have now cited Van Overwalle’s work on cerebellar involvement in sequence processing within the social domain in the revised Introduction.

    1. Author response:

      Reviewer #1 (Public review): 

      The manuscript by Bru et al. focuses on the role of vacuoles as a phosphate buffering system for yeast cells. The authors describe here the crosstalk between the vacuole and the cytosol using a combination of in vitro analyses of vacuoles and in vivo assays. They show that the luminal polyphosphatases of the vacuole can hydrolyse polyphosphates to generate inorganic phosphate, yet they are inhibited by high concentrations. This balances the synthesis of polyphosphates against the inorganic phosphate pool. Their data further show that the Pho91 transporter provides a valve for the cytosol as it gets activated by a decline in inositol pyrophosphate levels. The authors thus demonstrate how the vacuole functions as a phosphate buffering system to maintain a constant cytosolic inorganic phosphate pool. 

      This is a very consistent and well-written manuscript with a number of convincing experiments, where the authors use isolated vacuoles and cellular read-out systems to demonstrate the interplay of polyphosphate synthesis, hydrolysis, and release. The beauty of this system the authors present is the clear correlation between product inhibition and the role of Pho91 as a valve to release Pi to the cytosol to replenish the cytosolic pool. I find the paper overall an excellent fit and only have a few issues, including: 

      (1) Figure 3: The authors use in their assays 1 mM ZnCl2 or 1mM MgCl2. Is this concentration in the range of the vacuolar luminal ion concentration? Did they also test the effect of Ca2+, as this ion is also highly concentrated in the lumen? 

      The concentrations inside vacuoles can reach those values. However, given that polyP is a potent chelator of divalent metal ions, what would matter are the concentrations of free Zn<sup>2+</sup> or Mg<sup>2+</sup> inside the organelle. These are not known. This is not critical since we use those two conditions only as a convenient tool to differentiate Ppn1 and Ppn2 activity in vitro. In our initial characterisation of Ppn2 (10.1242/jcs.201061), we had also tested Mn, Co, Ca, Ni, Cu. Only Zn and Co supported activity. Ca did not. Andreeva et al. (10.1016/j.biochi.2019.06.001) reached similar conclusions and extended our results.

      (2) Regarding the concentration of 30 mM K-PI, did the authors also use higher and lower concentrations? I agree that there is inhibition by 30 mM, but they cannot derive conclusions on the luminal concentration if they use just one in their assay. A titration is necessary here. 

      The concentration of 30 mM was not arbitrarily chosen. It is the luminal P<sub>i</sub> concentration that the vacuoles reached when they entered a plateau of luminal P<sub>i</sub>. We consider this an upper limit because polyP kept increasing while luminal P<sub>i</sub> did not. Thus, there is in principle no physiological motivation for trying higher values. But we will probably add a titration to the revised version.

      (3) What are the consequences on vacuole morphology if the cells lack Pho91? 

      We had not observed significant abnormalities during a screen of the genome-wide deletion collection of yeast (10.1371/journal.pone.0054160).

      (4) Discussion: The authors do not refer to the effect of calcium, even though I would expect that the levels of the counterion should affect the phosphate metabolism. I would appreciate it if they would extend their discussion accordingly. 

      We will pick this up in the discussion. However, the situation is much more complex because major pools of counterions (up to hundreds of mM) are constituted by vacuolar lysine, arginine, polyamines, Mg, Zn etc. Their interplay with polyP is probably complex and worth treating in a dedicated project.

      (5) I would appreciate a brief discussion on how phosphate sensing and control are done in human cells. Do they use a similar lysosomal buffer system? 

      Mammalian cells have their Pi exporter XPR1 mainly on a lysosome-like compartment (10.1016/j.celrep.2024.114316). Whether and how it functions there for Pi export from the cytosol is not entirely clear. We will address this situation in the revision.

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript presents a well-conceived and concise study that significantly advances our understanding of polyphosphate (polyP) metabolism and its role in cytosolic phosphate (Pi) homeostasis in a model unicellular eukaryote. The authors provide evidence that yeast vacuoles function as dynamic regulatory buffers for Pi homeostasis, integrating polyP synthesis, storage, and hydrolysis in response to cellular metabolic demands. The work is methodologically sound and offers valuable insights into the conserved mechanisms of phosphate regulation across eukaryotes. 

      Strengths: 

      The results demonstrate that the vacuolar transporter chaperone (VTC) complex, in conjunction with luminal polyphosphatases (Ppn1/Ppn2) and the Pi exporter Pho91, establishes a finely tuned feedback system that balances cytosolic Pi levels. Under Pi-replete conditions, inositol pyrophosphates (InsPPs) promote polyP synthesis and storage while inhibiting polyP hydrolysis, leading to vacuolar Pi accumulation. 

      Conversely, Pi scarcity triggers InsPP depletion, activating Pho91-mediated Pi export and polyP mobilization to sustain cytosolic phosphate levels. This regulatory circuit ensures metabolic flexibility, particularly during critical processes such as glycolysis, nucleotide synthesis, and cell cycle progression, where phosphate demand fluctuates dramatically. 

      From my viewpoint, one of the most important findings is the demonstration that vacuoles act as a rapidly accessible Pi reservoir, capable of switching between storage (as polyP) and release (as free Pi) in response to metabolic cues. The energetic cost of polyP synthesis, driven by ATP and the vacuolar proton gradient, highlights the evolutionary importance of this buffering system. The study also draws parallels between yeast vacuoles and acidocalcisomes in other eukaryotes, such as Trypanosoma and Chlamydomonas, suggesting a conserved role for these organelles in phosphate homeostasis.

      Weaknesses: 

      While the manuscript is highly insightful, referring to yeast vacuoles as "acidocalcisome-like" may warrant further discussion. Canonical acidocalcisomes are structurally and chemically distinct (e.g., electron-dense, in most cases spherical, and not routinely subjected to morphological changes, and enriched with specific ions), whereas yeast vacuoles have well-established roles beyond phosphate storage. A comment on this terminology could strengthen the comparative analysis and avoid potential confusion in the field. 

      Yeast vacuoles show all major chemical features of acidocalcisomes. They are acidified, contain high concentrations of Ca, polyP (which make them electron-dense, too), other divalent ions, such as Mg, Zn, Mn etc, and high concentrations of basic amino acids. Thus, they clearly have an acidocalcisome-like character. In addition, they have hydrolytic, lysosome-like functions and, depending on the strain background, they can be larger than acidocalcisomes described e.g. in protists. We will elaborate this point, which is obvious to us but probably not to most readers, in the revised version.

      Reviewer #3 (Public review): 

      Bru et al. investigated how inorganic phosphate (Pi) is buffered in cells using S. cerevisiae as a model. Pi is stored in cells in the form of polyphosphates in acidocalcisomes. In S. cerevisiae, the vacuole, which is the yeast lysosome, also fulfills the function of Pi storage organelle. Therefore, yeast is an ideal system to study Pi storage and mobilization. 

      They can recapitulate in their previously established system, using isolated yeast vacuoles, findings from their own and other groups. They integrate the available data and propose a working model of feedback loops to control the level of Pi on the cellular level. 

      This is a solid study, in which the biological significance of their findings is not entirely clear. The data analysis and statistical significance need to be improved and included, respectively. The manuscript would have benefited from rigorously testing the model, which would also have increased the impact of the study.

      It is not clear to us what the reviewer would see as a more rigorous test of the model.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      Praegel et al. explore the differences in learning an auditory discrimination task between adolescent and adult mice. Using freely-moving (Educage) and head-fixed paradigms, they compare behavioral performance and neuronal responses over the course of learning. The mice were initially trained for seven days on an easy pure frequency tone Go/No-go task (frequency difference of one octave), followed by seven days of a harder version (frequency difference of 0.25 octave). While adolescents and adults showed similar performance on the easy task, adults performed significantly better on the harder task. Quantifying the lick bias of both groups, the authors then argue that the difference in performance is not due to a difference in perception, but rather to a difference in cognitive control. The authors then used neuropixel recordings across 4 auditory cortical regions to quantify the neuronal activity related to the behavior. At the single cell level, the data show earlier stimulus-related discrimination for adults compared to adolescents in both the easy and hard tasks. At the neuronal population level, adults displayed a higher decoding accuracy and lower onset latency in the hard task as compared to adolescents. Such differences were not only due to learning, but also to age as concluded from recordings in novice mice. After learning, neuronal tuning properties had changed in adults but not in adolescents. Overall, the differences between adolescent and adult neuronal data correlate with the behavior results in showing that learning a difficult task is more challenging for younger mice.

      Strengths:

      The behavioral task is well designed, with the comparison of easy and difficult tasks allowing for a refined conclusion regarding learning across age. The experiments with optogenetics and novice mice are completing the research question in a convincing way.

      The analysis, including the systematic comparison of task performance across the two age groups, is most interesting, and reveals differences in learning (or learning strategies?) that are compelling.

      Neuronal recording during both behavioral training and passive sound exposure is particularly powerful, and allows interesting conclusions.

      Weaknesses:

      The presentation of the paper must be strengthened. Inconsistencies, missing information or confusing descriptions should be fixed.

      We have carefully re-read the manuscript and reviewed it for inconsistencies. We made several corrections in the figures. For example, we removed redundant lines from violin plots and statistics, applied consistent labels, matched y- and x-limits of graphics, and adjusted labels. We also clarified descriptions of some experiments by adding explanations to the text.

      The recording electrodes cover regions in the primary and secondary cortices. It is well known that these two regions process sounds quite differently (for example, one has tonotopy, the other not), and separating recordings from both regions is important to conclude anything about sound representations. The authors show that the conclusions are the same across regions for Figure 4, but is it also the case for the subsequent analysis? Compared to the original manuscript, the authors have now done the analysis for AUDp and AUDv separately, and say that the differences are similar in both regions. The data however show that this is not the case (Fig S7). And even if it were the case, how would it be compatible with the published literature?

      To address this and previous concerns about regional differences, the manuscript now includes 4 figures (4-1, 4-3, 6-2, 7-1) and 5 supplemental tables (3,4, 5, 6, 8) that explicitly compare results across brain regions.

      Following the reviewer’s request for subsequent analysis, we now added a new supplemental figure (Fig. S6-2) and two new supplementary tables (Tables S5, S6). We show that similar to expert mice (supplementary Table 3, and supplementary Table 4), the firing properties of adolescent and adult novice mice differ across auditory subregions (supplementary Table 5). We also show that the different auditory subregions have different firing properties (supplementary Table 6). With respect to task engagement, we show that (similar to Fig. S4-2) the neuronal discriminability in different auditory subregions is similar in both novice and expert mice (Fig. S6-2).

      Following the comment on Fig. S7-1, we made three changes to the revised manuscript. First, we now highlight that the differences in firing properties between adolescent and adult neurons in AUDp and AUDv were distinct, but not significantly different within age-group comparisons. Second, we clearly state that the learning related changes in the measured parameters are different between AUDp and AUDv. Note, however, that the greater changes in adult neurons after learning remain consistent between AUDp and AUDv. Third, we softened our original claim but still highlighted the stronger learning-induced plasticity in adults.

      Regarding the concern that different regions should show different patterns due to their known differences (e.g. tonotopy), we of course agree that different areas differ functionally (as shown in our own previous work and here as well). However, it is still plausible, and biologically reasonable, that developmental changes may proceed in a similar direction across different areas, even if their baseline coding properties differ.

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to find out how and how well adult and adolescent mice discriminate tones of different frequencies and whether there are differences in processing at the level of the auditory cortex that might explain differences in behavior between the two groups. Adolescent mice were found to be worse at sound frequency discrimination than adult mice. The performance difference between the groups was most pronounced when the sounds were close in frequency and thus difficult to distinguish and could, at least in part, be attributed to the younger mice's inability to withhold licking in no-go trials. By recording the activity of individual neurons in the auditory cortex when mice performed the task or were passively listening, as well as in untrained mice, the authors identified differences in the way that the adult and adolescent brains encode sounds and the animals' choice that could potentially contribute to the differences in behavior.

      Strengths:

      The study combines behavioural testing in freely-moving and head-fixed mice, optogenetic manipulation and high density electrophysiological recordings in behaving mice to address important open questions about age differences in sound-guided behavior and sound representation in the auditory cortex.

      Weaknesses:

      For some of the analyses that the authors conducted it is unclear what the rationale behind them is and, consequently, what conclusion we can draw from them.

      We have carefully re-read the manuscript and reviewed it for analyses that lacked a clear rationale or conclusion. To address this, we have made several changes to clarify the reasoning and strengthen the interpretation of the results.

      Reviewer #1 (Recommendations for the authors):

      It would have helped if the authors had highlighted the changes they made to the manuscript compared to the original version - especially since many replies to the reviewers' comments were as vague as "...we fixed some of the wording so it adheres to the data shown", or "we refined our interpretation", without further details.

      The revised version has improved substantially, and the main claims have been discussed in a more objective way. Important new analyses have been added to allow for a refined interpretation of the results. However, the presentation of the data could still be strengthened significantly (in response to comment A from last review).

      We apologize for the lack of detail in some of our previous responses. Our intention was to keep the replies concise, assuming that the side-by-side version with tracked changes would make the edits sufficiently clear. However, we understand the need for greater transparency. Thus, below we provide the following five lists describing the major changes: (1) List of specific reviewer recommendations, (2) list of corrections in figures, (3) list of clarity issues, (4) list of fixed mistakes, (5) list of new figures. We hope this breakdown makes the revisions clearer and more accessible.

      List of specific reviewer recommendations:

      l.108 mentions a significant change in the vertical line of Fig 1F - Could this significance be indicated and quantified in the figure?

      We quantified and indicated the significance of the vertical line in Fig. 1f and Fig. 1i.

      Fig.1G - the thick and thin lines should be defined, as well as the grey and white dots (same values for adolescents, not for adults).

      (a) We removed the thin inner lines from the violin plot. We define the bar (thick line) of the violin plot in an additional sentence in the methods section under data analysis (LL820-823). (b) We adjusted the marker outlines in the adult data (Fig. 1G).

      the figure axis legends should be consistent (trials in Fig 1D vs # trials in Fig 1F)

      We adjusted the axis legend to # trials in Fig. 1D.

      l.110: is d' always calculated based on the 100 last trials of a session, or is it just for Figure 1F? -etc...

      d’ is always calculated based on the last 100 trials. To clarify this, we added a description in the methods section (L830).

      List of corrections in the figures:

      (1) We removed the internal lines from violin plots throughout Fig. 1-7.

      (2) We removed the underline of the statistics throughout Fig. 1-7.

      (3) We consistently applied ‘adolescent’ and ‘adult’ figure labels and titles with lowercase letters throughout Fig. 1-7.

      (4) We applied consistent labelling of ‘time (ms)’ throughout Fig. 1-7.

      (5) We matched the size of dashed lines throughout Fig. 1-6.

      (6) We adjusted the x-label of Fig. 1d, Fig. S1-1a, Fig. 3c, Fig. 3h-i, Fig. 4d to ‘# trials’.

      (7) We removed the x-label of ‘Experimental Group’ from Fig. 1 to enhance consistency with other figures.

      (8) We removed misaligned dots from the violin plots in Fig. 1g, Fig. 2f, Fig. 3f,g.

      (9) We corrected the plot in Fig. S1-1b.

      (10) We adjusted the y-limits of Fig. S1-1c to be consistent with Fig. S1-1d,e.

      (11) We adjusted the x-labels and y-labels of Fig. 2, Fig. S3-1, Fig. S3-2 and Fig. 3b to ‘freq. (kHz)’.

      (12) We added the age of adolescent and adult mice to the schematic timeline in Fig. 2a.

      (13) We added a label of the reinforcement delay to the schematic trial structure in Fig. 3b.

      (14) We added within-group statistics to Fig. 3e and the figure legend.

      (15) We adjusted the x-label of Fig. 3d to ‘# sessions’.

      (16) We adjusted the x-label of Fig. 3d and Fig. S3-1b to ‘# licks’.

      (17) We changed the y-label in Fig. S3-1a, and Fig. S3-2d, e to ‘lick ratio’ to avoid confusion with the lick rate (Hz) that was calculated in Fig. 4 and Fig. 6.

      (18) We replaced the titles ‘CAMKII’ with ‘dTomato’ in Fig. S3-2 to correctly highlight that both the experimental and control injection were CAMKII injections.

      (19) We adjusted the x-labels and y-labels of Fig. 2, Fig. S3-1, Fig. S3-2 and Fig. 3b to ‘freq. (kHz)’.

      (20) We adjusted the y-label of Fig. S4-1c to ‘# neurons’.

      (21) We matched the x-ticks in Fig. 4e,f.

      (22) We matched the x-ticks in Fig. 6d-g.

      (23) We changed the x-label in Fig. 4g, S4-2 and S6-2 to ‘duration (ms)’ to match the figure label with the manuscript.

      (24) We consistently label ‘Hit’, ‘Miss’, ‘FA’ and ‘CR’ with capital letters in Fig. 4d-e.

      (25) We replaced the double figure label ‘C.’ in Fig. S4-2 with ‘D.’.

      (26) We adjusted the dot-size in Fig. 5 to be equal for all graphs.

      (27) We added ticks to the experimental timeline in Fig. 6a.

      (28) We corrected the y-label in Fig.7c. Now it correctly reflects 5 attenuations from 72-32 dB SPL.

      (29) We matched the y-label of Fig. 7e-h and Fig. S7-1.

      List of clarity issues:

      (1) We replaced the term ‘lower response bias’ with ‘higher lick bias’ (L24) to accurately describe the more negative (lower) criterion-bias, which highlights a higher tendency to lick.

      (2) We replaced the term ‘response bias’ with ‘lick bias’ to consistently describe the calculated criterion-bias (L24, L149, L164, L455, L456, L468).

      (3) We clarify that the age-related differences were ‘more pronounced’ instead of simply ‘higher’ to accurately reflect not simply the increase in adolescent lick-bias, but also the decrease in adult lick-bias (L31).

      (4) We clarified that adolescent sound representations are not merely ’distinct’, but ‘not fully mature’ in L83.

      (5) We clarified in L180 that the impulsive responses we observed in adolescent mice could be related to being ‘less impacted by punishments’.

      (6) We clarified the differences in firing properties of auditory sub-regions analyzed in Supplementary Table 3 (L287-295).

      (7) We explained and clarified the reference to Fig. 3j (LL252-253).

      (8) We added statistics to Fig. S4-2 to support our claim that there are no differences in the onset-latency, duration of discriminability and maximal discriminability between different sub-regions within age-groups (LL314-315).

      (9) We expanded our explanation of the results in Table 3 (LL370-379).

      (10) We separated the reference to Fig. 6b and Fig. 6c to clarify their meaning (LL358-361).

      (11) We clarified the differences in basic firing properties during the FRA protocol in Fig. 7 (LL409-418).

      (12) We expanded our explanation of the differences of the learning related firing properties in AUDp and AUDv of Fig. S7-1 (LL426-433).

      (13) We changed the term ‘plasticity profiles’ to ‘learning related plasticity’ to further clarify our limitation that L5/6 and L2/3 may exhibit distinct learning related changes (L496).

      (14) We changed the term ‘sluggish’ (L481) to ‘delayed’ to more precisely explain differences between adolescent and adult tuning properties.

      (15) We clarified that the running d’ was calculated in bins of 25 trials, instead of ‘the last 25 trials’ (LL845-846).
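      The quantities named in this list (d’, the criterion-based lick bias, and a running d’ in bins of 25 trials) can be illustrated with a short sketch. It uses the standard signal detection definitions, d' = z(hit rate) - z(FA rate) and c = -0.5 * (z(hit rate) + z(FA rate)), which the response does not spell out. The log-linear correction, the use of non-overlapping bins, and all function names are assumptions made for illustration and are not taken from the manuscript.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def dprime_and_criterion(hits, misses, false_alarms, correct_rejections):
    """Return (d', criterion c) from trial counts.

    Assumption: a log-linear correction (add 0.5 to each count) keeps the
    rates away from 0 and 1; the manuscript may use a different correction.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    d_prime = _z(hit_rate) - _z(fa_rate)
    criterion = -0.5 * (_z(hit_rate) + _z(fa_rate))  # more negative = more licking
    return d_prime, criterion


def count_outcomes(outcomes):
    """Count 'Hit', 'Miss', 'FA' and 'CR' labels in a list of trial outcomes."""
    return tuple(outcomes.count(label) for label in ("Hit", "Miss", "FA", "CR"))


def session_dprime(outcomes, n_last=100):
    """d' and criterion computed over the last n_last trials of a session."""
    return dprime_and_criterion(*count_outcomes(outcomes[-n_last:]))


def binned_dprime(outcomes, bin_size=25):
    """Running d' in consecutive bins (assumed non-overlapping) of bin_size trials."""
    values = []
    for start in range(0, len(outcomes) - bin_size + 1, bin_size):
        counts = count_outcomes(outcomes[start:start + bin_size])
        values.append(dprime_and_criterion(*counts)[0])
    return values
```

      Under these definitions, a mouse that licks on most No-go trials produces a negative criterion, which is the sense in which a lower criterion corresponds to a higher lick bias.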

      List of fixed mistakes:

      (1) We corrected and matched the age to more accurately reflect the age mice were recorded (P37-42 and P77-82).

      (2) We corrected the attenuation range from 72-42 to 72-32 dB SPL to correctly reflect the 5 attenuations used in the protocol.

      (3) We corrected the number of channels shown in the voltage trace from 10 to 11 (Fig. S4-1a).

      (4) We corrected the number of neurons recorded in novice adolescent mice in the legend of Fig. 6 from 140 to 130 (Fig. 6b).

      (5) We removed redundant, or double brackets, commas, dots, and semi-colons in the figure legends.

      (6) We corrected the LME statistics in Table 2.

      List of new figures and tables:

      (1) We added a new supplementary figure to accompany Figure 6. Specifically, Fig. S6-2 shows the interaction of the three measured discriminability properties (onset delay, duration of discriminability, and maximal discriminability) in novice compared to expert mice in the easy and hard task (Go compared to No Go). The figure compares the different auditory sub-regions (similar to Fig. S4-2). We show that the discriminability properties within different groups are not significantly different among the four different sub-regions.

      (2) Supplementary Table 5: We compared the firing properties in different auditory subregions in novice mice, and found (similar to expert mice) that the firing properties differ between adult and adolescent mice across the four different sub-regions.

      (3) Supplementary Table 6: We compared the firing properties between different subregions, separately for adolescent and adult novice mice. Similar to expert mice, we found that different auditory subregions differ in their auditory firing properties.

      Reviewer #2 (Recommendations for the authors):

      The authors largely addressed my suggestions.

      Comparing hit vs correct rejection trials in the population decoding analysis (L313-314): The authors acknowledge that comparing these two trial types conflates choice and stimulus decoding but I am not convinced that the changes to the manuscript text make this clear enough to the reader.

      Thank you for pointing this out. We have made additional revisions to clarify this, and other issues more explicitly, as follows:

      (1) We have expanded the explanation of how our population decoding analysis conflates stimulus and choice, and we acknowledge the limitations of this approach in the Abstract (L28), the Results section (L324-326, LL367-370) and the Discussion (LL516-519).

      (2) We replaced the analysis of impulsivity on the head-fixed task. Instead of analyzing all ITIs, we focus only on ITIs following FA trials (Fig. S3-1c,d). This is more consistent with the analysis in the Educage (Fig. S2-1), where we show that adolescents exhibit increased impulsivity after FA trials. We found a similar result for ITIs following FA trials in the head-fixed task.

      (3) To provide complementary insight, we now further justify our use of the Fisher separation metric alongside decoding accuracy in Figure 5, with a clearer rationale provided in LL343-345.

      (4) We also clarified our reasoning for focusing on 62 dB SPL in the FRA-based analysis in LL400-403.

    1. Author response:

      We thank the reviewers and editors for their careful and constructive assessment of our manuscript. We have provided a provisional response to the eLife assessment and the reviewers’ public comments below, addressing their main concerns and outlining our planned revisions that we believe will substantially strengthen our paper.

      eLife Assessment

      This study presents a valuable finding on the representational structure of task encoding in the prefrontal cortex. The evidence supporting the claims of the authors is solid, representing an impressive data collection effort and best-practice fMRI analyses. However, at least including visual regions as a control and controlling for behavioral differences in the task in representation analyses would have strengthened the study. The work will be of interest to cognitive neuroscientists interested in the neural basis of cognitive control.

      We plan to address both specific methodological weaknesses mentioned in the assessment in our forthcoming revision. First, the revision will include analyses of an early visual cortex ROI as an additional control region, allowing us to test whether the primary auditory cortex findings generalize to the sensory cortex across input modalities. Preliminary results indicate that the early visual cortex ROI exhibits a similar pattern of results, with evidence for coding both task-relevant and task-irrelevant visual dimensions across both tasks, as well as the context dimension specifically in the hierarchy task. Second, we will include behavioral performance as a covariate for the relevant statistical comparison across tasks to mitigate concerns over performance-related confounds. In addition, we will include a set of control analyses that demonstrate that equating the amount of data for pattern analyses across the two tasks by subsampling from the hierarchy task, while reducing our overall power, does not appreciably alter our results. We note that our analyses of representational geometries relied only on neural data from correct trials and, in the first-level modelling of the fMRI data, already controlled for differences in trial-by-trial response times. Therefore, our analyses of decoding and representation similarity are not directly affected by differences in performance across the two tasks. Finally, we have provided clarifications regarding Reviewer 2’s questions about the size and construction of the regions of interest employed in the study, as well as about the language employed to discuss null results.  

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Bhandari and colleagues present tour-de-force analyses that compare the representational geometry in the lateral prefrontal cortex and primary auditory cortex between two complex cognitive control tasks, with one having a "flat" structure where subjects are asked to form rote memory of all the stimulus-action mappings in the task and one having a "hierarchical" task structure that allows clustering of task conditions and that renders certain stimulus dimensions irrelevant for choices. They discovered that the lPFC geometry is high-dimensional in nature in that it allows above-chance separation between different dichotomies of task conditions. The separability is significantly higher for task-relevant features than task-irrelevant ones. They also found task features that are represented in an "abstract" format (e.g., audio features), i.e., the neural representation generalizes across specific task conditions that share this variable. The neural patterns in lPFC are highly relevant for behaviors as they are correlated with subjects' reaction times and choices.

      Strengths:

      Typically, geometry in coding patterns is reflected in single-unit firings; this manuscript demonstrates that such geometry can be recovered using fMRI BOLD signals, which is both surprising and important. The tasks are well designed and powerful in revealing the differences in neural geometry, and analyses are all done in a rigorous way. I am thus very enthusiastic about this paper and identify no major issues.

      I am curious about the consequence of dimensionality collapse in lPFC. The authors propose a very interesting idea that separability is critical for cognitive control; indeed, separability is high for task-relevant information. What happens when task-relevant separation is low or task-irrelevant separation is high, and will this lead to behavioral errors? Maybe a difference score between the separability of task-relevant and task-irrelevant features is a signature of the strength of cognitive control?

      We appreciate the reviewer’s positive evaluation of our paper.

      Weaknesses:

      The authors show a difference between flat and hierarchical tasks, but the two tasks are different in accuracy, with the flat task having more errors. Will this difference in task difficulty/errors contribute to the task differences in results reported?

      To address the Reviewer’s concern about the difference in behavioural performance between the two tasks influencing our results, we will take several approaches. First, we will include behavioral performance as a covariate for the relevant statistical comparison across tasks. This should ensure that any differences we observe across tasks are over and above those that can be explained by the difference in behavioral performance. Second, we will include a set of decoding analyses that control for differences in performance across the tasks. We note that all our analyses of representational geometries relied on neural data from correct trials only. In addition, the first-level modelling of the fMRI data already controlled for trial-by-trial variability in response times. Therefore, our decoding and representation similarity analyses should not directly be affected by differences in performance across the two tasks. However, one possible issue with this approach is that the larger number of errors in the flat task means that less data was available for estimating multivoxel patterns in the flat task compared to the hierarchy task, resulting in differential power to detect decoding effects across the two tasks. We note that, on average, this difference was not substantial: 21.7 runs were available per participant for the flat task, while 23.8 runs per participant were available for the hierarchy task. Moreover, rerunning our analyses with the number of runs equated for each participant does not meaningfully alter the pattern of results. These additional analyses will be included in the supplement in the forthcoming revised manuscript.
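      The run-equating control described above can be sketched as follows. This is only an illustration of the general idea (randomly subsampling runs so that both tasks contribute the same number per participant), not the authors' pipeline; the function name, the use of run indices, and the fixed random seed are assumptions.

```python
import random


def equate_runs(flat_runs, hierarchy_runs, seed=0):
    """Subsample run indices so both tasks contribute the same number of runs.

    flat_runs, hierarchy_runs: lists of usable run indices for one participant.
    The task with more runs is randomly subsampled down to the size of the
    task with fewer runs; the smaller set is returned unchanged (sorted).
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    n_keep = min(len(flat_runs), len(hierarchy_runs))
    flat_selected = sorted(rng.sample(flat_runs, n_keep))
    hierarchy_selected = sorted(rng.sample(hierarchy_runs, n_keep))
    return flat_selected, hierarchy_selected
```

      For a hypothetical participant with 22 usable flat-task runs and 24 hierarchy-task runs, the function keeps all 22 flat-task runs and a random 22 of the hierarchy-task runs. Repeating the selection over several seeds and averaging the resulting decoding estimates would reduce dependence on any single draw.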

      Reviewer #2 (Public review):

      Summary:

      The authors study the influence of tasks on the representational geometry of the lPFC and auditory cortex (AC). In particular, they use two context-dependent tasks: a task with a hierarchical structure and a task with a flat structure, in which each context/stimulus maps to a specific response. Their primary finding is that the representational geometry in the lPFC, in contrast to AC, aligns with the optimal organization of the task. They conclude that the geometry of representations adapts, or is tailored, to the task in the lPFC, therefore supporting control processes.

      Strengths:

      (1) Dataset:

      The dataset is impressive and well-sampled. Having data from both tasks collected in the same subjects is a great property. If it is publicly available, it will be a significant contribution to the community.

      (2) Choice of methods:

      The choice of analyses is largely well-suited to the questions at hand - cross-condition generalization, RSA + regression, in combination with ANOVAs, are well-suited to characterizing task representations.

      (3) I found some of their results, in particular, those presented in Figures 4 and 5, to be particularly compelling.

      (4) The correlation analysis with behavior is also a nice result.

      We thank the reviewer for noting the strengths of the paper. We respond to the weaknesses noted below. 

      Weaknesses:

      (1) Choice of ROIs:

      A strength of fMRI is its spatial coverage of the whole brain. In this study, however, the authors focus on only two ROIs: the lPFC and auditory cortex. Though I understand the justification for choosing lPFC from decades of research, the choice of AC as a control feels somewhat arbitrary - AC is known to have worse SNR in fMRI data, and limiting a 'control' to a single region seems arbitrary. For example, why not also include visual regions, given that the task also involves two visual features?

      We agree with the reviewer that the whole-brain fMRI data certainly provide ample opportunities to explore the nature of these representations across the brain. Our focus in this paper is squarely on the principles of coding and flexibility in the lPFC. We believe that a whole-brain exploration addresses a separate question that would be out of the scope of this study. To clarify, we are not arguing that the lPFC is the only region in the brain that employs the coding principles that our study brings to light. Our contention is only that lPFC employs these principles, and it differs at least from the primary sensory cortex. The questions of whether these principles generalize beyond lPFC (quite likely) and, if so, how broadly, are distinct from the ones addressed in the manuscript. We intend to follow up with another manuscript that addresses these questions.

      Nevertheless, given the focus of this paper, we agree that a second control region, which allows one to test if the primary auditory cortex findings generalize to the sensory cortex more broadly, would strengthen our claims. We will include an early visual cortex ROI in our forthcoming revision. Preliminary results indicate that the early visual cortex ROI shows a similar set of findings – with evidence for coding of task-relevant and task-irrelevant visual dimensions across both tasks, but also specifically the context dimension in the hierarchy task. These results will be detailed in the forthcoming revision.

      (2) Construction of ROIs:

      The choice and construction of the ROIs feel a bit arbitrary, as the lPFC region was constructed out of 10 parcels from Schaefer, while the AC was constructed from a different methodology (neurosynth). Did both parcels have the same number of voxels/vertices? It would be helpful to include a visualization of these masks as a figure.

      We defined the lPFC ROIs by selecting Schaefer parcels in the frontal lobe that were previously mapped onto the Control A resting state network identified by Yeo et al. (2011). This network aligns with the multiple-demand network, which has also been identified in the macaque, where it includes the lPFC regions that abut the principal sulcus. Prior results from these regions in the monkey brain provide the scientific premise for our hypotheses. The two lPFC ROIs in each hemisphere were constructed out of 5 Schaefer parcels in each hemisphere. These parcels cluster into the same functional network and tend to behave similarly in univariate analyses. Given that our hypotheses do not distinguish between the different parcels, we elected to improve power by merging them into left and right dlPFC ROIs. 

      On the other hand, the same approach could not be used to identify the primary auditory cortex. As Yeo et al. noted in their paper, the 17 resting state networks they identify did not adequately parcellate somatomotor and auditory cortices into distinct networks, likely due to their proximity (see Fig 14 and related text in Yeo et al. (2011)). We therefore relied on a different approach to define the primary auditory cortex, using an association test in Neurosynth to obtain a map of regions associated with the term “primary auditory”. In the revised manuscript, we will also include a primary auditory cortex ROI, defined again using a term-based association test in Neurosynth.

      Our lPFC ROIs and pAC ROIs are of similar size. In the left hemisphere, the lPFC ROI (constructed from merging Schaefer parcels 128-thru-132) has, on average, 624.55 voxels. The left pAC ROI (defined with Neurosynth) has, on average, 628 voxels. In the right hemisphere, the lPFC ROI (constructed from merging Schaefer parcels 330-thru-334) has, on average, 470.8 voxels. The right pAC ROI has, on average, 568 voxels. A table reporting the size of our parcels and ROIs was included in the supplement. In our forthcoming revision, we will additionally include a supplementary figure visualizing the ROI masks.

      (3) Task dimensionality:

      In some ways, the main findings - that representation dimensionality is tailored to the task - seem to obviously follow from the choice of two tasks, particularly from a normative modeling perspective. For example, the flat task is effectively a memorization task, and is incompressible in the sense that there are no heuristics to solve it. In contrast, the hierarchical task can have several strategies, an uncompressed (memorized) strategy, and a compressed strategy. This is analogous to other studies evaluating representations during 'rich' vs. 'lazy'/kernel learning in ANNs. However, it seems unlikely (if not impossible) to form a 'rich' representation in the flat task. Posed another way, the flat task will always necessarily have a higher dimensionality than the hierarchical task. Thus, is their hypothesis - that representational geometry is tailored to the task - actually falsifiable? I understand the authors posit alternative hypotheses, e.g., "a fully compressed global axis with no separation among individual stimulus inputs could support responding [in the flat task]" (p. 36). But is this a realistic outcome, for example, in the space of all possible computational models performing this task? I understand that directly addressing this comment is challenging (without additional data collection or modeling work), but perhaps some additional discussion around this would be helpful.

      We thank the reviewer for this comment, which gives us a chance to clarify our argument.

      As noted by the reviewer, whether a network takes advantage of the compressibility of a task depends on its learning regime (i.e. rich vs lazy). One way to frame our question regarding the lPFC’s coding strategy, then, is to ask whether it operates in a rich or a lazy learning regime (which would predict, respectively, task-tailored vs task-agnostic representations). The reviewer’s concern is that the two task structures we employed are differentially compressible, so it is inevitable that we observe tailored representations, and our hypotheses are therefore not falsifiable.

      First, it is important to clarify the theoretical premise behind our design and how it relates logically to our hypotheses. Under a lazy learning regime, a network would encode high-dimensional representations of both tasks, regardless of their compressibility. On the other hand, under a rich learning regime, representational dimensionality will likely be shaped by the tasks’ structure. If the two tasks differ in their compressibility, only in the rich learning regime would the network learn representations of different dimensionality. Therefore, observing representations with dimensionality tailored to the task structure rules out the possibility that the lPFC is operating in a lazy regime. The hypotheses are thus certainly testable.

      The second point of clarification is that, contrary to the reviewer’s assertion, the flat task is, in fact, compressible – the task can be solved with a categorical representation of the response categories, with no sensitivity to the different specific stimuli within each category. Indeed, it is possible to train a simple, three-layer feedforward artificial neural network to perform the flat task perfectly with only 2 units in the hidden layer, demonstrating this compressibility. While we agree with the reviewer that, in the space of all possible architectures one might consider, the two tasks may differ in compressibility, particularly at the local levels, this does not, as we noted above, imply that our hypotheses are not testable.
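      The compressibility argument can be made concrete with a toy construction. The sketch below is not the network the authors trained: the weights are set by hand rather than learned, and the numbers of stimuli (8) and response categories (4) are hypothetical, since the response does not state them. It only shows that a two-unit hidden layer suffices for an arbitrary stimulus-to-category mapping when each category is assigned one sign pattern of the two hidden units.

```python
import math

# One sign pattern of the two hidden units per response category
# (assumption: at most four response categories).
CATEGORY_CODES = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]


def build_network(stimulus_to_category, gain=3.0):
    """Hand-set weights for a one-hot input -> 2 hidden units -> category readout."""
    # Input-to-hidden weights: one row per stimulus; a one-hot input selects its row.
    w_in = [[gain * v for v in CATEGORY_CODES[cat]] for cat in stimulus_to_category]
    # Hidden-to-output weights: one row per category, aligned with its code.
    w_out = [list(code) for code in CATEGORY_CODES]
    return w_in, w_out


def forward(w_in, w_out, stimulus):
    """Return (predicted category, hidden activity) for a stimulus index."""
    hidden = [math.tanh(v) for v in w_in[stimulus]]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return logits.index(max(logits)), hidden


# Hypothetical flat task: 8 stimuli arbitrarily assigned to 4 response categories.
mapping = [0, 1, 2, 3, 3, 2, 1, 0]
w_in, w_out = build_network(mapping)
predictions = [forward(w_in, w_out, s)[0] for s in range(len(mapping))]
```

      In this construction, stimuli that share a response category produce identical hidden activity, so the two-dimensional hidden layer carries only category information and none about the individual stimuli.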

      Finally, as a third point of clarification, our focus in this paper is on understanding the nature of coding in the lPFC in particular. Arguments based on a normative modelling perspective properly apply to the representations learned by an agent (such as an ANN or a human) as a whole. In a minimal feedforward ANN with a single hidden layer trained in a regime which encourages compression (i.e. a rich learning regime), it would indeed be the case that the representational dimensionality in that hidden layer would be higher for less compressible tasks. However, when applied to humans, such an argument applies to the brain as a whole rather than to an individual region of the brain like the lPFC. As such, it is less straightforward to predict how a single region might represent a task without additional information about the region’s inputs, outputs and broader position in a network. Even for a highly compressible task, a particular brain region may nevertheless be sensitive to all task dimensions. Conversely, even when a task is not compressible, a particular population within the brain may be invariant to some task features. For example, the primary auditory cortex is expected to be invariant to visual task dimensions.

      Therefore, how a task is represented in the lPFC in particular (as opposed to the whole brain) depends on its computational function and coding principles, which remain debated. For instance, if, as some accounts (such as the guided activation theory) posit, the primary function of the lPFC is to encode ‘context’ and shape downstream processing based on context, we might only expect to see the abstract coding of the auditory context in the hierarchy task (and, perhaps, the response categories across both tasks as they encode the ‘context’ for the lower-level response decision), with invariance to lower-level features of the input. In our paper, we specifically contrast two accounts of lPFC coding that have emerged in the literature – one positing that the lPFC learns a representation tailored to the structure of the task, and another that the lPFC encodes a high-dimensional representation that privileges sensitivity to many task features and their non-linear mixture at the cost of generalization. Regardless of the compressibility of the tasks in question, how the lPFC encodes the two tasks is an empirical question.

      In our forthcoming revision, we will clarify these points in the discussion. We will also include the results of neural network simulations alluded to above.

      (4) Related to the above:

      The authors have a section on p. 27: "Local structure of lPFC representational geometry of the flat task shows high separability with no evidence for abstraction" - I understand a generalization analysis can be done in the feature space, but in practice, the fact that the flat task doubles as a memorization task implies that there are no useful abstractions, so it seems to trivially follow that there would be no abstract representations. In fact, the use of task abstractions in the stimulus space would be detrimental to task performance here. I could understand the use of this analysis as a control, but the phrasing of this section seems to indicate that this is a surprising result.

      As explained above, there is no need for high local separability in the flat task. The lPFC could have completely abstracted over the individual trial-types that contributed to each response category, encoding only the response categories. Indeed, as also noted above, it is possible to train a simple, three-layer feedforward artificial neural network to perform the flat task perfectly with only 2 units in the hidden layer. The two hidden layer units code for each of the two response categories. 
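
      The compressibility argument above can be illustrated with a minimal sketch. It does not reproduce our simulations: the weights are set by hand rather than trained, the flat task is assumed to have 8 one-hot-coded trial types, and the stimulus-to-category assignment in `CATEGORY` is purely hypothetical. It shows only that a network with 2 hidden units can, in principle, solve such a task while abstracting over the individual trial types.

```python
# Hypothetical flat task: 8 trial types, each arbitrarily assigned to one of
# two response categories. The assignment below is illustrative only.
N_STIMULI = 8
CATEGORY = [0, 1, 1, 0, 1, 0, 0, 1]  # assumed stimulus -> response category


def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def relu(values):
    return [max(0.0, v) for v in values]


def matvec(x, weights):
    """Multiply a row vector x (length n) by an n x m weight matrix."""
    n_out = len(weights[0])
    return [sum(x[i] * weights[i][j] for i in range(len(x))) for j in range(n_out)]


# Input -> hidden weights (8 x 2): each trial type drives only the hidden
# unit that stands for its response category. Set by hand, not learned.
W_IN = [one_hot(CATEGORY[s], 2) for s in range(N_STIMULI)]
# Hidden -> output weights (2 x 2): identity read-out of the two categories.
W_OUT = [[1.0, 0.0], [0.0, 1.0]]


def forward(stimulus):
    x = one_hot(stimulus, N_STIMULI)
    hidden = relu(matvec(x, W_IN))
    output = matvec(hidden, W_OUT)
    return hidden, output


def predict(stimulus):
    _, output = forward(stimulus)
    return output.index(max(output))


accuracy = sum(predict(s) == CATEGORY[s] for s in range(N_STIMULI)) / N_STIMULI
# Distinct hidden-layer patterns across all trial types.
hidden_patterns = {tuple(forward(s)[0]) for s in range(N_STIMULI)}
print(accuracy, len(hidden_patterns))
```

      In this sketch the hidden layer takes only two distinct activity patterns, one per response category, so trial types within a category are indistinguishable at that layer. Whether a trained network, or the lPFC, arrives at such a solution is a separate, empirical matter.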

      (5) Statistical inferences:

      Throughout the manuscript, the authors appear to conflate failure to reject the null with acceptance of the null. For example, p. 24: "However, unlike left lPFC, paired t-tests showed no reliable difference in the separability of the task-relevant features vs the orthogonal, task-irrelevant features... Therefore, the overall separability of pAC representations is not shaped by either task-relevance of task structure."

      We thank the reviewer for pointing these out. These sentences will be corrected in the revision. For instance, the sentence above will be modified to “Therefore, we find no evidence that the overall separability of pAC representations is shaped by either task-relevance or task structure.”

      Reviewer #3 (Public review):

      Summary:

      In this paper, Bhandari, Keglovits, et al. explore the representational structure of task encoding in the lateral prefrontal cortex. Through an impressive fMRI data-collection effort, they compare and contrast neural representations across tasks with different high-level stimulus-response structures. They find that the lateral prefrontal cortex shows enhanced encoding of task-relevant information, but that most of these representations do not generalize across conditions (i.e., have low abstraction). This appears to be driven in part by the representation of task conditions being clustered by the higher-order task properties ('global' representations), with poor generalization across these clusters ('local' representations). Overall, this paper provides an interesting account of how task representations are encoded in the PFC.

      Strengths:

      (1) Impressive dataset, which may provide further opportunities for investigating prefrontal representations.

      (2) Clever task design, allowing the authors to confound several features within a complex paradigm.

      (3) Best-practice analysis for decoding, similarity analyses, and assessments of representational geometry.

      (4) Extensive analyses to quantify the structure of PFC task representations.

      Weaknesses:

      (1) The paper would benefit from improved presentational clarity: more scaffolding of design and analysis decisions, clearer grounding to understand the high-level interpretations of the analyses (e.g., context, cluster, abstraction), and better visualizations of the key findings.

      (2) The paper would benefit from stronger theoretical motivation for the experimental design, as well as a refined discussion on the implications of these findings for theories of cognitive control.

      We thank the reviewer for highlighting the strengths of our paper and their feedback on the writing. We have reviewed these helpful suggestions with an eye to which we may implement in our revision to improve clarity. Our forthcoming revision will 1) provide clearer scaffolding to aid the reader in understanding our design, analyses and our interpretation of the results 2) incorporate the MDS-based visualization of the representational geometries, which is currently presented in the Supplement, as a figure panel in the main text, 3) provide a justification for the particular task structures we picked in the introduction and 4) incorporate a new paragraph in the Discussion section to highlight the implications of our findings for cognitive control.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      This paper describes technically-impressive measurements of calcium signals near synaptic ribbons in goldfish bipolar cells. The data presented provides high spatial and temporal resolution information about calcium concentrations along the ribbon at various distances from the site of entry at the plasma membrane. This is important information. Important gaps in the data presented mean that the evidence for the main conclusions is currently inadequate. 

      Strengths 

      The technical aspects of the measurements are impressive. The authors use calcium indicators bound to the ribbon and high speed line scans to resolve changes with a spatial resolution of ~250 nm and temporal resolution of less than 10 ms. These spatial and temporal scales are much closer to those relevant for vesicle release than previous measurements. 

      The use of calcium indicators with very different affinities and of different intracellular calcium buffers helps provide confirmation of key results. 

      Thank you very much for this positive evaluation of our work.

      Weaknesses 

      Multiple key points of the paper lack a statistical test or summary data from populations of cells. For example, the text states that the proximal and distal calcium kinetics in Figure 2A differ. This is not clear from the inset to Figure 2A - where the traces look like scaled versions of each other. Values for time to half-maximal peak fluorescence are given for one example cell but no statistics or summary are provided. Figure 8 shows examples from one cell with no summary data. This issue comes up in other places as well. 

      Thank you for this fair and valuable feedback. Following also the suggestion by the Editor, we have now removed the rise-time kinetic fitting results from the manuscript and only retain the bi-exponential decay time constant values. Further, we explicitly detail the issues with kinetic fitting, and state that precise quantitative conclusions should not be drawn from the differences in kinetic parameters (pages 7 and 27-28). 

      We have included the results of paired t-tests to compare the amplitudes of proximal vs. distal calcium signals shown in Fig. 2A & B, Fig. 3C & D, Fig. 4C & D, Fig. 5A-D, and Fig. 8E & F. Because proximal and distal calcium signals were obtained from the same ribbons within 500-nm distances, as the Reviewer pointed out, “the traces look like scaled versions of each other”. For experiments where we make comparisons across cells or different calcium indicators, as shown in Fig. 3E & F, Fig. 5E, and Fig. 8B & C, we have included the results of an unpaired t-test. We have also included the t-test statistics information in the respective figure legends in the revised version.
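
      The distinction between the two tests can be sketched as follows. The amplitudes are made-up illustrative numbers in arbitrary units, not measurements from the manuscript, and the function names are hypothetical. The sketch shows why pairing matters when proximal and distal signals come from the same ribbon: differences between ribbons cancel in the paired statistic but inflate the variance of the unpaired one.

```python
import math
import statistics


def paired_t(a, b):
    """Paired t statistic: mean of within-pair differences over its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def unpaired_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb)
    )


# Illustrative values only (arbitrary units): five ribbons that differ widely
# from one another, with the distal signal slightly smaller at each ribbon.
proximal = [2.0, 3.0, 4.0, 5.0, 6.0]
distal = [1.5, 2.4, 3.5, 4.4, 5.5]

t_paired = paired_t(proximal, distal)
t_unpaired = unpaired_t(proximal, distal)
print(t_paired, t_unpaired)
```

      With these invented numbers the paired statistic is far larger than the unpaired one, because the small but consistent proximal-distal difference is measured against within-ribbon variability instead of ribbon-to-ribbon variability.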

      In Figure 8, we have shown example fluorescence traces from two different cells at the bottom of panel A, and example traces from different ribbons of RBC-a in panel D. The summary data are presented in panels B-C and E-F, with statistics provided in the figure legends.

      The rise time measurements in Figure 2 are very different for low and high affinity indicators, but no explanation is given for this difference. Similarly, the measurements of peak calcium concentration in Figure 4 are very different with the two indicators. That might suggest that the high affinity indicator is strongly saturated, which raises concerns about whether that is impacting the kinetic measurements. 

      Yes, we do believe that the high-affinity indicator is partially saturated, and therefore, the measurement with the low-affinity indicator dye is a more accurate reflection of the measured Ca2+ signal. We now state this more explicitly in the text. Further, we note that the rise time values are no longer listed due to lack of statistical significance for such comparisons, as noted above.
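
      The saturation argument can be made concrete with a minimal sketch. It assumes simple 1:1 equilibrium binding, so that the fraction of indicator bound is [Ca2+] / ([Ca2+] + Kd), and it uses hypothetical Kd values (0.2 and 20 µM) chosen only for illustration, not the values of the indicators used in the study.

```python
def fraction_bound(ca_um, kd_um):
    """Fraction of indicator bound to Ca2+ at equilibrium, assuming 1:1 binding."""
    if ca_um < 0 or kd_um <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ca_um / (ca_um + kd_um)


# Hypothetical dissociation constants (µM), for illustration only.
KD_HIGH_AFFINITY = 0.2
KD_LOW_AFFINITY = 20.0

for ca in (1.0, 10.0):  # assumed local Ca2+ concentrations in µM
    ha = fraction_bound(ca, KD_HIGH_AFFINITY)
    la = fraction_bound(ca, KD_LOW_AFFINITY)
    print(f"[Ca2+] = {ca:5.1f} µM  high-affinity: {ha:.3f}  low-affinity: {la:.3f}")

# Change in bound fraction for a tenfold rise in Ca2+ (1 -> 10 µM).
delta_ha = fraction_bound(10.0, KD_HIGH_AFFINITY) - fraction_bound(1.0, KD_HIGH_AFFINITY)
delta_la = fraction_bound(10.0, KD_LOW_AFFINITY) - fraction_bound(1.0, KD_LOW_AFFINITY)
```

      Under these assumptions the high-affinity indicator is already mostly bound at the lower concentration, so a tenfold rise in Ca2+ changes its signal less than that of the low-affinity indicator, which compresses apparent amplitudes and can distort apparent kinetics.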

      Reviewer #2 (Public review): 

      Summary: 

      The study introduces new tools for measuring intracellular Ca2+ concentration gradients around retinal rod bipolar cell (rbc) synaptic ribbons. This is done by comparing the Ca2+ profiles measured with mobile Ca2+ indicator dyes versus ribbon-tethered (immobile) Ca2+ indicator dyes. The Ca2+ imaging results provide a straightforward demonstration of Ca2+ gradients around the ribbon and validate their experimental strategy. This experimental work is complemented by a coherent, open-source, computational model that successfully describes changes in Ca2+ domains as a function of Ca2+ buffering. In addition, the authors try to demonstrate that there is heterogeneity among synaptic ribbons within an individual rbc terminal. 

      Strengths: 

      The study introduces a new set of tools for estimating Ca2+ concentration gradients at ribbon AZs, and the experimental results are accompanied by an open-source, computational model that nicely describes Ca2+ buffering at the rbc synaptic ribbon. In addition, the dissociated retinal preparation remains a valuable approach for studying ribbon synapses. Lastly, excellent EM. 

      Thank you very much for this positive evaluation of our work.

      Comments on revisions: 

      Specific minor comments: 

      (1) Rewrite the final sentence of the Abstract. It is difficult to understand. 

      Thank you for pointing that out. We have updated the final sentence of the Abstract.

      (2) Add a definition in the Introduction (and revisit in the Discussion) that delineates between micro- and nano-domain. A practical approach would be to round up and round down. If you round up from 0.6 um, then it is microdomain which means ~ 1 um or higher. Likewise, round down from 0.3 um to nanodomain? If you are using confocal, or even STED, the resolution for Ca imaging will be in the 100 to 300 nm range. The point of your study is that your new immobile Ca2-ribbon indicator may actually be operating on a tens of nm scale: nanophysiology. The Results are clearly written in a way that acknowledges this point but maybe make such a "definition" comment in the intro/discussion in order to: 1) demonstrate the power of the new Ca2+ indicator to resolve signals at the base of the ribbon (effectively nano), and 2) (Discussion) to acknowledge that some are achieving nanoscopic resolution (50 to 100nm?) with light microscopy (as you ref'd Neef et al., 2018 Nat Comm).  

      Thank you for the valuable comments. We have now provided this information in the introduction and discussion.  

      (3) Suggested reference: Grabner et al. 2022 (Sci Adv, Supp video 13, and Fig S5). Here rod Cav channels are shown to be expressed on both sides the ribbon, at its base, and they are within nanometers from other AZ proteins. This agrees with the conclusions from your imaging work.  

      Thank you for the valuable suggestion. We have now provided this information in the introduction and discussion.

      (4) In the Discussion, add a little more context to what is known about synaptic transmission in the outer and inner retina. First, state that the postsynaptic receptors (for example: mGluR6-OnBCs vs KARs-OffBCs, vs. AMPAR-HCs), and possibly the synaptic cleft (ground squirrel), are known to have a significant impact on signaling in the outer retina. In the inner retina, there are many more unknowns. For example, when I think of the pioneering Palmer JPhysio study, which you cite, I think of NMDAR vs AMPAR, and uncertainty in what type postsynaptic cell was patched (GC or AC....). Once you have informed the reader that the postsynapse is known to have a significant impact on signaling, then promote your experimental work that addresses presynaptic processes: "...the new tool and results allow us to explore release heterogeneity, ribbon by ribbon in dissociated preps, which we eventually plan to use at ribbon synapses within slices......to better understand how the presynapse shapes signaling......". 

      Thank you for the valuable comments. We have now provided this information in the introduction and discussion.

      Reviewer #3 (Public review): 

      Summary: 

      In this study, the authors have developed a new Ca indicator conjugated to the peptide, which likely recognizes synaptic ribbons and have measured microdomain Ca near synaptic ribbons at retinal bipolar cells. This interesting approach allows one to measure Ca close to transmitter release sites, which may be relevant for synaptic vesicle fusion and replenishment. Though microdomain Ca at the active zone of ribbon synapses has been measured by Hudspeth and Moser, the new study uses the peptide recognizing synaptic ribbons, potentially measuring the Ca concentration relatively proximal to the release sites. 

      Strengths: 

      The study is, in principle, technically well done, and the peptide approach is technically interesting, which allows one to image Ca near the particular protein complexes. The approach is potentially applicable to other types of imaging. 

      Thank you very much for this appreciation.

      Weaknesses: 

      Peptides may not be entirely specific, and genetic approach tagging particular active zone proteins with fluorescent Ca indicator proteins may well be more specific. Although the authors are aware of this and the peptide approach is generally used for ribbon synapses, the authors should be aware of this, when interpreting the results. 

      We acknowledge the reviewer’s point and believe the peptides and genetic approaches to measure local calcium signals have their merits, each with separate advantages and disadvantages.  

      Reviewer #1 (Recommendations for the authors): 

      The revisions helped with some concerns about the original paper, but some issues were not adequately addressed. I have left two primary concerns in my public review. To summarize those: 

      The difference in kinetics of proximal and distal locations is emphasized and quantified in the paper, but the quantification consists of a fit to the average responses. This does not give an idea of whether the difference observed is significant or not. Without an estimate of the error across measurements the difference in kinetics quoted is not interpretable. 

      Thank you for this feedback. Since the kinetics information is a minor part of the manuscript, we have followed the Editor’s advice to significantly tone down the comparison of kinetic fit parameters (completely removing the rise-time comparisons), in order to put more focus on the better-documented conclusions. We also note that we did establish statistical significance of the differences in fluorescence signal amplitudes. 

      Somewhat relatedly, the difference in amplitude and kinetics of the calcium signals measured with low and high affinity indicators is quite concerning. The authors added one sentence stating that the high affinity indicator might be saturated. This is not adequate. Should we distrust the measurements using the high affinity indicator? The differences between the results using the low and high affinity indicators is in some cases large - e.g. larger than the differences cited as a key result between distal and proximal locations. This issue needs to be dealt with directly in the paper. 

      Thank you for this feedback. Yes, the measurements from high-affinity indicators cannot report the Ca2+ signal as accurately as low-affinity indicators. However, the value of HA indicators is in their ability to detect low-amplitude signals that lower-affinity indicators may miss due to lower signal-to-noise resolution. We added a sentence on page 12 to further stress this point.

      Related to the point about statistics, it is not clear how to relate the horizontal lines in Figure 8 to the actual measurements. It is critical for the evaluation of the conclusions from that figure to understand what is plotted and what the error bars are on the plotted data. 

      We apologize for the earlier ambiguity in Fig. 8. In this figure, we first compare proximal (panel B) and distal (panel C) calcium signals across several RBCs, labeled RBC-a through RBC-d. Each RBC contains multiple ribbons, and for each cell, we present the average calcium signals from multiple ribbons using box plots in panels B and C. In these box plots, the horizontal lines represent the average calcium signal for each cell, while the size of the error bars reflects the variability in proximal and distal calcium signals among the ribbons within that RBC.

      For example, RBC-a had five identifiable ribbons. In panels D–F, we use RBC-a to illustrate the variability in calcium signals across individual ribbons. Specifically, we distinguished proximal and distal calcium signals from five ribbons (ribbons 1–5) within RBC-a. When feasible, we acquired multiple x–t line scans at a single ribbon, shown now as individual data points, to assess variability in calcium signals recorded from the same ribbon.

      The box plots in panels E and F display the average calcium signal (horizontal lines) for each ribbon, based on multiple recordings. These plots demonstrate considerable variability between ribbons of RBC-a. Importantly, the lack of or minimal error bars for repeated measurements at the same ribbon indicates that the proximal and distal calcium signals are consistent within a ribbon. These findings emphasize that the observed variability among ribbons and among cells reflects true biological heterogeneity in local calcium domains, rather than experimental noise.

    1. Author response:

      We thank all three anonymous reviewers for their thoughtful evaluations of our manuscript and for recognizing the conceptual advance in combining agent-based behavioral simulations with systems neuroscience models. We are especially encouraged by the acknowledgement of the framework’s potential to support simulation of neural control of individual animal behavior in realistic sensory environments.

      Below, we respond to each reviewer’s public comments in turn. Throughout, we have aimed to clarify our rationale for modeling choices, acknowledge limitations, and outline concrete steps for improvement in the revised manuscript.

      Furthermore, the call for a better description of the model implementation, as voiced by all three reviewers, and additional requests from community members have prompted us to prepare a separate, technically detailed description of the publicly available larvaworld software package and of the readily implemented models in the form of a preprint (Sakagiannis et al., 2025, bioRxiv, DOI: https://doi.org/10.1101/2025.06.15.659765).

      Reviewer #1:

      We are happy to read that this reviewer considers the proposed behavioral architecture ‘a significant step forward in the field’, and that she/he recognizes the strengths of our work in the modular and hierarchical approach that provides connections to influential theories of motor control in the brain, in the experimental evidence it is based on, and in the valuable abstractions that we have chosen for the larval behavioral modeling.

      The reviewer raises important points about the simplifications we have made, both conceptually and in the specific implementation of larval behaviors. Our main goal in this study is to introduce a conceptual framework that integrates agent-based modeling with systems neuroscience models in a modular fashion. To serve this purpose, we aimed for a minimal yet representative implementation at the motor layer of the architecture, calibrated to larval locomotion kinematics. This choice enables efficient simulation while allowing us to test top-down modulation and adaptive mechanisms in higher layers without the computational overhead of a full neuromechanical model. In addition to chemotaxis, we have recently used this simplified approach to model thermotaxis in larvae (Kafle et al., 2025, iScience, DOI: https://doi.org/10.1016/j.isci.2025.112809).

      The reviewer notes the absence of explicit segmental neuromuscular control or central pattern generators (CPGs). We deliberately abstracted from these mechanisms, representing the larval body as two segments with basic kinematic control, to focus on reproducing overall locomotor patterns. This bisegmental simplification, which we illustrate in Supplemental Video “Bisegmental larva-body simplification”, retains the behavioral features relevant to our current aims. However, the modular structure of the framework means that more detailed neuromechanical models—incorporating CPG dynamics or connectome-derived circuit models—can be integrated in future work without altering the architecture as a whole.

      We fully agree that real neural circuits are more complex than a strict subsumption architecture implies. In the Drosophila larva, there is clear evidence for ascending sensory feedback from the motor periphery to premotor and higher brain circuits, as well as neuromodulatory influences. These add layers of complexity beyond the predominantly descending control in our present model. At the same time, both larval and adult connectome data show that across-level descending and ascending connections are sparse compared to the dense within-layer connectivity. We see value in casting our model as a hierarchical control system precisely to make the strengths and limitations of such an abstraction explicit. The revised manuscript will include further discussion of these points.

      In summary, our design choices reflect a trade-off: by limiting the biological detail in the lower layers, we gain computational efficiency and maintain a clear modular structure that can host models at different levels of abstraction. This ensures that the architecture remains both a tool for immediate behavioral simulation and a scaffold for integrating richer neural and biomechanical models as they become available.

      Reviewer #2:

      We thank the reviewer for recognizing the novelty of our locomotory model, particularly the implementation of peristaltic strides based on our new analyses of empirical larval tracks, and for providing constructive feedback that will help us improve the manuscript.

      The reviewer highlights the need for clearer explanations of the chemotaxis and odor preference modules. We expand these sections in the revised manuscript with more explicit descriptions of model structure, parameterization, and calibration. As mentioned above, we have also prepared a separate preprint dedicated to the larvaworld Python package, which contains detailed implementation notes and hands-on tutorials that allow users to adapt or extend individual modules.

      Regarding the comparison to empirical behavior in chemotaxis, our present analysis is indeed primarily qualitative. However, we would like to emphasize that the temporal profile of odor concentration at the larval head in our simulations matches that measured in Gomez-Marin et al. (Nature Comm., 2011, DOI: https://doi.org/10.1038/ncomms1455) using only one additional free parameter, while all parameters of the basic locomotory model had been fitted to a separate exploration dataset before and were kept fixed in the chemotaxis experiments. In addition to the simulation of chemotaxis in the present paper, we recently used larvaworld in a practical model application to estimate a species-specific parameter of thermotaxis from experiments across different drosophilids (Kafle et al., 2025, iScience, DOI: https://doi.org/10.1016/j.isci.2025.112809).

      The preference index in our simulations was computed using the same definition as in the established experimental group assay for larval memory retention, enabling a direct quantitative comparison between simulated and empirical results. Variability in the simulated outcomes arose naturally from inter-individual differences in body length and locomotory parameters, derived from real larval measurements, as well as from the random initial orientation of each individual in the arena. These factors contributed to variation in individual tracks and ultimately produced preference index values that closely matched those observed experimentally. In the revised manuscript, we also discuss handedness, as highlighted by the reviewer, as another meaningful expression of inter-individual variability in Drosophila larvae and insects more generally.
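
      For readers unfamiliar with the metric, the following minimal sketch shows one commonly used form of a two-choice preference index, (animals on the odor side minus animals on the other side) divided by the total count. The exact definition used in the manuscript and in the referenced assay may differ in detail (for example, in how a neutral zone is handled), and the function name and counts below are hypothetical.

```python
def preference_index(n_odor_side, n_other_side, n_neutral=0):
    """Two-choice preference index in [-1, 1].

    Assumed form: (odor side - other side) / total animals counted.
    Animals in an optional neutral zone count toward the total only.
    """
    counts = (n_odor_side, n_other_side, n_neutral)
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one animal must be counted")
    return (n_odor_side - n_other_side) / total


# Illustrative counts for one simulated group of 20 agents (not real data).
pi_value = preference_index(15, 5)
print(pi_value)
```

      Because the index is computed per group from individual positions, inter-individual differences in locomotion and starting orientation translate directly into variability of the index across repeated groups.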

      Finally, we acknowledge the reviewer’s concern about the scalability and broader applicability of the model. While the present paper focuses on three specific behavioral paradigms (exploration, chemotaxis, odor preference), the modular structure of the architecture is designed for flexibility: modules at any layer can be exchanged for more detailed or alternative implementations, and new sensory modalities or behaviors can be integrated without redesigning the system. The larvaworld package, associated codebase, and documentation are openly available to encourage adoption and adaptation by the larval research community.

      Reviewer #3:

      This public review provides an excellent account of our central aim to build an easily configurable, well-documented platform for organism-scale behavioral simulation and we are happy to read that the reviewer considers this an excellent goal.

      We thank the reviewer for her/his account of our well-organized code using contemporary Python tooling. We are currently further improving code readability and code documentation, and we will release a new version of the larvaworld Python package. We further agree with the reviewer’s assessment that understanding the model calibration currently requires reading of the appendix. For the revised manuscript we thus aim at improving our description of all calibration and modeling steps along the way. We will also make sure to improve the description of the experimental datasets used for calibration.

      We recognize that our description of the paper’s scientific contribution could be clearer. In revision, we will sharpen the Introduction and Discussion to highlight our main contributions:

      (1) Promoting a shift from isolated neural circuit modeling to integrated agent-based simulations in realistic environments.

      (2) Proposing the layered behavioral architecture, adopting the subsumption paradigm for modular integration.

      (3) Providing the larvaworld software as a ready-to-use, extensible modeling platform.

      (4) Implementing an empirically calibrated locomotory model and demonstrating its integration with navigation and learning modules in replicated behavioral paradigms.

      We agree with the reviewer that the next challenge is to integrate the empirically based behavioral simulations presented here with functional brain models capable of reproducing or predicting experimental findings at the level of cellular neurophysiology, including the effects of cell-type-specific manipulations such as gene knock-down or optogenetic activation/inhibition. However, based on our experience with systems-level modeling, we deliberately invested in behavioral simulation because functional models of the nervous system—including our own—often lack translation into simulated agent behavior. In many cases, model output is limited to one or more variables that can at best be interpreted as a behavioral bias, and most often represents an “average animal” that fails to capture inter-individual differences. By linking our spiking mushroom body model to behavioral simulations in a group of individual agents during memory retention tests (Figure 6C,D), we were able to achieve a first successful direct comparison between simulated and experimental behavior metrics—in this case, the behavioral preference index reported in Jürgensen et al. (iScience, 2024, DOI: https://doi.org/10.1016/j.isci.2023.108640).

      Finally, we reiterate that the layered behavioral architecture is designed to promote a modular modeling paradigm. Our adoption of a subsumption architecture does not conflict with the concept of behavioral primitives; on the contrary, the notion that such primitives follow (semi-)autonomous motor programs and can be combined into more complex behaviors was the starting point for our implementation of the architecture in the fly larva. In our view, a genuinely contradictory paradigm for neural control of behavior would require a non-modular, strictly non-hierarchical organization of the nervous system and, by extension, of behavioral control.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review)

      Comment 

      Koonce et al. have generated a web-based visualization tool for exploring C. elegans neuronal morphology, contact area between neurons, and synaptic connectivity data. Here, the authors integrate volumetric segmentation of neurons and visualization of contact area patterns of individual neurons generated from Diffusion Condensation and C-PHATE embedding based on previous work from adult volumetric electron microscopy (vEM) data, extended to available vEM data for earlier developmental stages, which effectively summarizes modularity within the collated C. elegans contactomes to date. Overall, NeuroSC's relative ease of use for generating visualizations, its ability to quickly toggle between developmental stages, and its integration of a concise visualization of individual neurons' contact patterns strengthen its utility.

      We thank the reviewer for this positive assessment of our work.

      Comment

      NeuroSC provides an accessible and convenient platform. However, many of the characteristics of NeuroSC overlap with that of an existing tool for visualizing connectomics data, Neuroglancer, which is a widely-used and shared platform with data from other organisms. The authors do not make clear their motivation for generating this new tool rather than building on a system that has already collated previous connectomics data. Although the field will benefit from any tool that collates connectomics data and makes it more accessible and user-friendly, such a tool is only useful if it is kept up-to-date, and if data formatting for submitting electron microscopy data to be added to the tool is made clear. It is unclear from this manuscript whether NeuroSC will be updated with recently published and future C. elegans connectomes, or how additional datasets can be submitted to be added in the future.

      We have added new language to state the motivations for developing NeuroSC more explicitly (Introduction, lines 98-111, and Discussion, lines 375-384). In a new discussion section, we also compare the features of NeuroSC with those of other existing tools, such as Neuroglancer and Webknossos (lines 393-417).

      Briefly, the functional features of NeuroSC are substantially different from (and do not exist in) other web-based tools for navigating EM datasets, including Neuroglancer. This is because the intended use of NeuroSC is substantially different from (and purposefully synergistic with) the intended use of, and the tools available in, Neuroglancer.

      Neuroglancer is a versatile tool designed primarily for web-based visualization and sharing of large EM datasets. NeuroSC was not designed to enable this type of access to the primary EM data, a deliberate choice because these features were already available through tools like Neuroglancer.

      Instead, the explicit goal of NeuroSC is to provide a platform specifically optimized for examining neuronal relationships across connectomic datasets. NeuroSC builds on the segmentations emerging from programs like Neuroglancer, but the tools are tailored to explore relationships such as contact profiles in the context of neuronal morphologies and synaptic positions, and across datasets that represent different animals or different developmental stages.

      To achieve this, all datasets in NeuroSC were optimized to facilitate comparisons across different connectomes of segmented neuronal features, including: 1) alignment of the neurons that are compared upon the display of the segmentations; 2) synchronization of the 3D windows; 3) implementation of a ‘universal color code’ across datasets for each neuron and relationship for easy visual comparisons; 4) use of the specific neuronal names to label instances of the same cells across all available datasets. The use of precise neuronal names among separate data sets allows integration of these objects with other catalogued datasets, including genomic and neuronal activity profiles.

      The formatting and display of the datasets used in NeuroSC was accompanied by the development of new tools including: 1) Rendering of the contact profiles of all neurons in the context of the morphology of the cell and the synapses and 2) C-PHATE diagrams to inspect multidimensional relationship hierarchies based on these contact profiles. In NeuroSC, C-PHATEs can be navigated and compared across multiple stages of development while visualizing neuronal reconstructions, allowing users to compare neuronal relationships across individual datasets.

      We agree with the reviewer that these tools are most useful when integrated. With that intention in mind, we designed NeuroSC as a series of modular, open-source tools that could be integrated into other programs, including Neuroglancer. In that sense, our intent was not to produce another free-standing tool, but a set of tools that, if useful, could be integrated into other existing web-based connectomic resources to enhance the user experience of navigating complex EM datasets and to draw biological meaning from the relationships between the neurons. Additionally, we intentionally designed NeuroSC so that new methods of understanding neuron relationships can be integrated as they arise. We have dedicated a more detailed section of the Discussion (lines 369-417) to better convey this intention and directly address the unique abilities of NeuroSC as a complementary tool to the powerful existing tools, including Neuroglancer.

      Comment

      The interface for visualizing contacts and synapses would be improved with better user access to the quantitative underlying data. When contact areas or synapses are added to the viewer, adding statistics on the magnitude of the contact area, the number of synapses, and the rank of these values among the neuron's top connections, would make the viewer more useful for hypothesis generation. Furthermore, synapses are currently listed individually, with names that are not very legible to the web user. Grouping them by pre- and postsynaptic neurons and linking these groups across developmental stages would also be an improvement.

      We thank the reviewer for this insightful comment and have implemented several improvements to address these suggestions. Specifically, we have added new features to enhance user access to quantitative data within the NeuroDevSCAN viewer:

      Cell, Patch, and Synapse Statistics: Users can now see a statistics panel when clicking on a rendered neuron, contact patch, or synapse. These panels provide the following information, respectively (highlighted in lines 303-315):

      Cell Stats: Click on a cell rendering to show cell stats which displays the total volume and surface area of the selected neuron within the defined neuropil area of our datasets (see Methods). 

      Contact Stats: Click on a patch rendering to show ‘contact stats’. This pop-up displays quantifications of the selected contact relationship. Rank compares the summed surface area of contacts ("patches") between these two neurons with all other contact relationships of the primary neuron, both for the cell and for the whole nerve ring. A rank of 1, for example, means this neuron pair shares the largest contact surface area among the examined relationships. “Total surface area” is displayed in nanometers, and is the summed surface area of all patches of this identity. Contact percentages are presented in two ways: (1) as the proportion of the primary cell's total surface area occupied by the contact in question, and (2) as the proportion of the total surface area of the nerve ring occupied by that same contact (showcased in Figure S5).

      Synapse Stats: A click on a synapse rendering now shows ‘synapse stats’, which displays the number of synapses of the selected identity within the primary neuron, including any polyadic synapse combinations involving the primary neurons. (Showcased in figure S7).

      (1) Grouping and Readability Improvements: While individual synapses are still visualized, their display has been improved for legibility. We have condensed the lengthy naming scheme to improve clarity and codified the synapse type by using superscript letters C, E, U to represent chemical, electrical and undefined synapses, respectively. This is explained and shown in Figure S7. We also added arrows to indicate the directionality of presumed information flow at each synapse.

      (2) Developmental Linkage: We can link objects across datasets via cellular identity, but each synapse in the dataset does not yet have an identity attributed to its spatial coordinates. This prevents us from linking specific synapses across development beyond their connectivity (i.e., that a given synapse connects cell X to cell Y); this point is also addressed in R1.11.

      Together, these improvements substantially enhance the utility of the viewer for hypothesis generation by making key quantitative data readily accessible.
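
      The contact statistics described above can be illustrated with a short sketch. This is not the NeuroSC implementation: the function name `contact_stats`, the input layout, and all numbers are hypothetical, and only the rank within the primary neuron is computed (the whole-nerve-ring rank would follow the same pattern over all neuron pairs).

```python
def contact_stats(patches, primary, partner, cell_area, nerve_ring_area):
    """Summarize one contact relationship of `primary` (illustrative only).

    patches         : iterable of ((neuron_a, neuron_b), patch_area) records
    cell_area       : total surface area of the primary neuron
    nerve_ring_area : total surface area of the nerve ring
    """
    # Sum all patches per partner of the primary neuron.
    totals = {}
    for (a, b), area in patches:
        if primary not in (a, b):
            continue
        other = b if a == primary else a
        totals[other] = totals.get(other, 0.0) + area

    if partner not in totals:
        raise KeyError(f"{primary} has no recorded contact with {partner}")

    total = totals[partner]
    # Rank 1 = largest summed contact area among the primary neuron's partners.
    rank = 1 + sum(1 for value in totals.values() if value > total)
    return {
        "rank": rank,
        "total_surface_area": total,
        "percent_of_cell": 100.0 * total / cell_area,
        "percent_of_nerve_ring": 100.0 * total / nerve_ring_area,
    }


# Hypothetical patch areas; these are not measurements from any dataset.
example_patches = [
    (("AIML", "PVQL"), 40.0),
    (("PVQL", "AIML"), 10.0),
    (("AIML", "AVF"), 30.0),
    (("AVF", "PVQL"), 99.0),
]
print(contact_stats(example_patches, "AIML", "PVQL", 200.0, 1000.0))
```

      Summing patches per partner before ranking mirrors the description of "total surface area" as the sum over all patches of the same identity.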

      Comment

      While the DC/C-PHATE visualizations are a useful tool for the user, it is difficult to understand when grouping or splitting of cell contact patterns is biologically significant. DC is a deterministic algorithm applied to a contactome from a single organism, and the authors do not provide quantitative metrics of distances between individual neurons or a number of DC iterations on the C-PHATE plot, nor is the selection process for the threshold for DC described in this manuscript. In the application of DC/C-PHATE to larval stage nerve ring strata organization shown by the authors, qualitative observations of C-PHATE plots colored based on adult data seem to be the only evidence shown for persistent strata during development (Figure 3) or changing architectural motifs across stages (Figure 4). Quantitation of differences in neuron position within the DC hierarchy, or differences in modularity across stages, is needed to support these conclusions. Furthermore, illustrating the quantitative differences in C-PHATE plots used to make these conclusions will provide a more instructive guide for users of NeuroSC in generating future hypotheses.

      There are several ways to visualize DC outputs, and one way to quantitatively compare DC clustering events of neurons is via Sankey diagrams. To make the inclusion of these resources more clear, we have highlighted them in lines 175-178 (Supplemental Tables 3-6). ‘DC outputs for each strata across animals can also be inspected using Sankey diagrams (Supplemental Tables 3-6). These spreadsheets detail the neuron members at each iteration of DC, allowing the user to derive quantitative comparisons of clustering events.’

      As the reviewer points out, DC is a deterministic algorithm that will iteratively cluster neurons based on the similarity of their contact profiles. To better explain the selection process for the threshold, the number of DC iterations and the quantitative metrics between the neurons, we have added new text in the Diffusion Condensation methods section.  Briefly:

      Number of DC iterations: During Diffusion Condensation (DC), we track the modularity of the resulting clusters at each iteration and select the iteration with the highest modularity to define the clusters that represent the strata (Moyle et al., 2021; Brugnone et al., 2019). Mathematically, modularity is calculated by comparing the actual number of edges within clusters to the expected number of such edges in a randomized network with the same degree distribution (Newman et al., 2006). A higher modularity value implies that nodes within the same cluster are more densely connected to each other than to nodes in other clusters. We now better explain this in lines 562-567.
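
      As an illustration of this selection criterion, the following minimal sketch computes Newman modularity for a weighted undirected graph and picks the clustering with the highest score. It is not the authors' code; the names `modularity` and `best_iteration` and the toy graph (two triangles joined by one edge) are assumptions made for the example.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman modularity Q of a clustering on an undirected weighted graph.

    edges  : dict mapping (u, v) -> positive weight, each edge listed once
    labels : dict mapping node -> cluster id
    """
    total = sum(edges.values())  # m, the total edge weight
    if total == 0:
        return 0.0
    strength = defaultdict(float)  # weighted degree of each node
    internal = defaultdict(float)  # edge weight falling inside each cluster
    for (u, v), w in edges.items():
        strength[u] += w
        strength[v] += w
        if labels[u] == labels[v]:
            internal[labels[u]] += w
    cluster_strength = defaultdict(float)
    for node, k in strength.items():
        cluster_strength[labels[node]] += k
    # Q = sum over clusters of (observed internal fraction - expected fraction).
    return sum(
        internal[c] / total - (cluster_strength[c] / (2 * total)) ** 2
        for c in cluster_strength
    )


def best_iteration(edges, labelings):
    """Return (index of the labeling with the highest modularity, all scores)."""
    scores = [modularity(edges, labels) for labels in labelings]
    return max(range(len(scores)), key=scores.__getitem__), scores


# Toy graph: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
toy_edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
toy_labelings = [
    {n: n for n in "abcdef"},                               # every node alone
    {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1},       # two triangles
    {n: 0 for n in "abcdef"},                               # fully condensed
]
best, scores = best_iteration(toy_edges, toy_labelings)
print(best, [round(s, 3) for s in scores])
```

      In this toy case the intermediate clustering scores highest, which is the behavior the selection rule relies on: modularity is low both before any merging and after everything has condensed into one cluster.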

      Threshold for merging points: The threshold (epsilon) used to merge data points in each iteration is set as a small fraction of the spatial extent of the data: for each coordinate dimension (x, y, z), we compute the range (maximum minus minimum), take the maximum of these three values, and divide it by 10,000. This process is performed iteratively for each round of clustering until all data points cluster into a single point. We have updated the manuscript to clarify this threshold selection and included this information in the revised algorithm description and pseudocode. We now better explain this in lines 556-559.
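
      The threshold rule stated above is simple enough to write out directly. The sketch below assumes the data points are available as (x, y, z) tuples; the function name `merge_threshold` and the sample coordinates are hypothetical.

```python
def merge_threshold(points, divisor=10_000):
    """Epsilon = (largest per-axis range of the data) / divisor.

    points : iterable of (x, y, z) coordinates
    """
    pts = list(points)
    if not pts:
        raise ValueError("at least one point is required")
    # Range (maximum minus minimum) along each of the three coordinate axes.
    ranges = [
        max(p[axis] for p in pts) - min(p[axis] for p in pts)
        for axis in range(3)
    ]
    return max(ranges) / divisor


# Hypothetical coordinates: the y axis has the largest extent (200).
sample_points = [(0.0, 0.0, 0.0), (10.0, 200.0, 5.0), (3.0, 50.0, 40.0)]
epsilon = merge_threshold(sample_points)
print(epsilon)
```

      Scaling epsilon to the extent of the data keeps the merge criterion comparable across datasets whose coordinates span different ranges.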

      Distances between neurons in DC C-PHATE: In our previous description in Box 1 algorithm 1, we had provided a general algorithm for DC for any high dimensional dataset. We have now revised the algorithm to indicate how we used DC for these EM datasets. 

      Distances between neurons are determined by the pixel overlap between their segmented shapes in the EM dataset. We use these distances to build a graph with weighted edges, in which the weight of the edge represents the pixel overlap (the adjacency in the actual EM segmentation). Affinities between neurons, which are a proxy for their distance in the graph, are then computed as now revised in Box 1, Algorithm 1. This process is done iteratively as neurons cluster. To better communicate this, we have changed the text in lines 533-538.  
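
      A minimal sketch of the graph construction described above follows. The first function builds a symmetric weighted graph from pairwise pixel-overlap counts, as stated in the text. The second function, which row-normalizes the weights into per-neuron transition probabilities, is an assumption borrowed from common practice in diffusion-based methods; the affinity computation actually used is the one given in Box 1, Algorithm 1 of the manuscript, which is not reproduced here. All names and numbers are hypothetical.

```python
from collections import defaultdict


def build_contact_graph(overlaps):
    """Build a symmetric weighted graph from pixel-overlap records.

    overlaps : iterable of (neuron_a, neuron_b, pixel_overlap)
    Returns a dict of dicts: graph[a][b] is the summed overlap between a and b.
    """
    graph = defaultdict(dict)
    for a, b, weight in overlaps:
        if a == b or weight <= 0:
            continue  # ignore self-contacts and empty records
        graph[a][b] = graph[a].get(b, 0) + weight
        graph[b][a] = graph[b].get(a, 0) + weight
    return {node: dict(neighbors) for node, neighbors in graph.items()}


def row_normalize(graph):
    """ASSUMED normalization: edge weights become per-neuron probabilities."""
    normalized = {}
    for node, neighbors in graph.items():
        total = sum(neighbors.values())
        normalized[node] = {m: w / total for m, w in neighbors.items()}
    return normalized


# Hypothetical overlap counts, not taken from any dataset.
toy_overlaps = [("AIML", "PVQL", 30), ("AIML", "AVF", 10), ("PVQL", "AVF", 0)]
toy_graph = build_contact_graph(toy_overlaps)
print(row_normalize(toy_graph)["AIML"])
```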

      Comment

      R1.5. While the case studies presented by the authors help to highlight the utility of the different visualizations offered by the NeuroSC platform, the authors need to be more careful with the claims they make from these correlative observations. For example, in Figure 4, the authors use C-PHATE clustering patterns to make conclusions about changes in clustering patterns of individual neurons across development based on single animal datasets. In this and many other cases presented in this study with the limited existing datasets, it is difficult to differentiate between developmental changes and individual variability between the neurite positions, contacts, and synapse differences within these data. This caveat needs to be clearly addressed.

      We now better explain in the manuscript that the selected case study, of the AVF neuron outgrowth, is not one of just correlation based solely on an EM dataset. Instead, the case study represents the NeuroSC-driven exploration of a biologically significant event supported by several independent datasets, as now explained in lines 257-276.

      Briefly, we agree with the reviewer that examining differences across individual EM datasets is insufficient evidence to make conclusions about developmental changes. But the strength of NeuroSC is in its ability to combine and compare multiple datasets, bolstering observations that are not possible by looking at just one dataset, and providing new insights on the way to new hypotheses. We now better explain that we are not looking at single connectomes in isolation and then deriving conclusions, but instead using NeuroSC to compare across 9 EM datasets. We better explain how the tools in NeuroSC, including C-PHATE, enabled comparisons across these multiple connectomes to identify apparent differences in neuronal relationships. We then explain that by using NeuroSC, we could examine these variations in neuronal relationships at the level of individual cell-biological differences in neuronal morphology between the developmental datasets. As the reviewer points out, such variations could reflect development or simply differences between individual animals. In the case of AVF, the features are absent in all early specimens, then arise and persist in all specimens after a certain time point, which led us to hypothesize that they result from a developmental event. Because the segmented objects in NeuroSC are linked to neuronal identities, we are also able to cross-reference our observations from the EM datasets with information in other datasets and the literature. In the specific case of the postembryonic outgrowth of AVF, we can now tie the knowledge, from developmental lineage information and molecular profiles, that AVF is a postembryonically born neuron (Sulston et al. 1977, Sun et al 2022, Poole et al 2024, wormatlas.org) to the outgrowth dynamics of its neurites using the postembryonic EM datasets. Our findings using NeuroSC provide a proof of concept of the utility of the resource and extend our understanding of how the outgrowth of this neuron affects the relationships between the neural circuits in the nerve ring.

      Comment

      R1.6. Given that recent studies have also quantified contact area between neurons across multiple connectomes (Cook et al., Current Biology, 2023; Yim et al., Nature Communications, 2024), and that the authors use a slightly different approach to quantify contact area, a direct comparison between contact area values obtained in this study with prior studies seems appropriate.

      We acknowledge that there are multiple different approaches to calculate adjacencies. In the papers cited above, there are 3 different algorithms used:

      (1) Brittin 2019 (Python parsing of TrakEM2, boundary thresholds), used in Cook et al 2023, Moyle 2021, and this study.

      (2) Witvliet 2021 (Matlab 2D masks), used in Cook et al 2023.

      (3) Yim 2024 (3D masks), used in Yim et al 2024.

      To briefly describe the different approaches, and the methods we chose for this paper:

      Algorithm 1 (used in this study) defines adjacency based on distances between boundary points in TrakEM2 segmentations, allowing threshold tuning to accommodate differences in resolution and image quality across datasets—an important feature for consistent cross-dataset comparisons.

      Algorithm 2 infers contact via morphological dilation of VAST segmentations, identifying adjacency through overlapping expanded boundaries. 

      Algorithm 3 uses voxelwise contact detection with directional surface area measurements and normalization to account for dataset size differences. 

      In NeuroSC, we use algorithm 1, mostly because we had tested the rigor of this method in (Moyle et al. 2021), where we have shown that results were robust across a range of thresholds. This flexibility enables tailored application across datasets of varying quality and scale, critical for NeuroSC’s mission of curating data sets across differing methodologies to allow for direct relationship comparisons. We detail the methodology for defining thresholds for each dataset in methods section lines 492-521, defined in Supplementary table 1. Another difference between our analysis and the previously cited work is that for our analysis we also chose to include all individually resolved neurons, including post-embryonic cells, without collapsing them into left/right or dorsal/ventral symmetry classes. In this way our approach retains the full cellular resolution of the nervous system. 

      Comment

      Neuroglancer is not mentioned at all in the manuscript, despite it being a very similar and widely accepted platform for vEM data visualization across model organisms. An explicit comparison of NeuroSC and Neuroglancer would be appropriate, given the similarity of the tools. Currently, published C. elegans data (Witvliet et al., 2021; Yim et al., 2024) use Neuroglancer-based viewers, and directly comparing NeuroSC and highlighting its strengths relative to Neuroglancer would strengthen the paper.

      In the original manuscript we had not mentioned tools like Neuroglancer because we envisioned them as distinct, in intended use and output, from NeuroSC. But, as explained in our response to comment R1.2, in the revised version we have included a section in the Introduction (lines 98-108) and in the Discussion (lines 369-417) that compares these types of web-based tools and highlights synergies.

      Comment

      Assigning shorthand names to strata, such as "shallow reflex circuit" (page 4, line 172), may oversimplify this group of neurons. Either more detailed support for shorthand names of C-PHATE modules should be included, or less speculative names for strata should be used.

      We appreciate this comment and understand that the original language used in the manuscript to describe strata categorizations may run the risk of oversimplification. We have now clarified the text to communicate that: 1) Strata are labeled by numbers (Strata 1, Strata 2, Strata 3 and Strata 4), rather than functional features of the neurons forming part of the strata, and that 2) the assignment of ‘strata’ is just one level of classification available via DC/CPHATE (as explained below). 

      To be sure, we have observed and published (Moyle et al., Nature 2021) that within a given stratum, many neurons share the functional identities that we have used as summary descriptors for the strata (e.g., shallow reflex circuits for Stratum 1; sensory and integrative circuits in Strata 3 and 4; command interneurons in Stratum 2, etc.). However, those cell types are not the only members of the strata. We have adjusted the language in lines 197-204 to reflect this more clearly: “Stratum 1, which contains most neurons contributing to shallow reflex circuits that control aversive head movements in response to noxious stimuli, displayed the fewest changes among the developmental connectomes (Figure 3B–F; Supplementary Table 3). In contrast, C. elegans exhibit tractable behaviors that adapt to changing environmental conditions (Flavell et al., 2020). Strata 3 and 4 contain most neurons involved in circuits associated with such learned behaviors, including mechano- and thermo-sensation. This is reflected in Strata 3 and 4 showing the most change in neuronal relationships across postembryonic development.”

      Comment

      The authors state that NeuroSC can be applied to other model organisms. Since model organisms with greater neuron numbers include more individual neurons per cell class, the authors should support this by quantitatively demonstrating how DC/C-PHATE relationships correlate with shared functional roles among C. elegans neurons.

      We now clarify in the manuscript that, as in other organisms, C. elegans neurons are also grouped into functional classes with shared characteristics. In the context of the cylindrical nerve ring of the animal, these neuronal classes are sometimes bilaterally symmetric (forming left-right pairs), four-fold symmetric or six-fold symmetric. We now explain in the discussion that the DC/CPHATE analyses group these neuron classes and their relationships (lines 442-451). In the specific section mentioned by the reviewer, we now also add new text to contextualize this concept and how it might relate to the possible use of these tools in organisms with larger nervous systems: ‘However, our previous work has demonstrated that DC/CPHATE clustering of C. elegans neurons consistently pulls out clusters of shared neuron classes and shared functional roles (Moyle et al., 2021). Building on this foundation, we envision applying similar clustering approaches to larger connectomes, aiming to identify classes and functionally related neuronal groups in more complex nervous systems. We suggest that contact profiles, along with neuron morphologies and synaptic partners, can act as ‘fingerprints’ for individual neurons and neuron classes. These ‘fingerprints’ can be aligned across animals of the same species to create identities for neurons. Frameworks for systematic connectomics analysis in tractable model systems such as C. elegans are critical in laying a foundation for future analyses in other organisms with up to a billion-fold increase in neurons (Toga et al., 2012).’

      Comment

      Lack of surface smoothing in NeuroSC leads to processes sometimes appearing to have gaps, which could be remedied by smoothing with a surface mesh. 

      We thank the reviewer for the suggestion, and understand the visibility of gaps in certain neuron processes can be distracting. But this was an intentional choice, with our main goal being to show the most accurate representation of the available data segmentation and avoid any rendering interpretations. In this way, we render the data with the highest fidelity we can and as close as possible to the ground truth of the EM segmentation. We have added language to describe this in the methods, lines 490-491, and in Figure legend 5b.

      Comment

      Toggling between time points while maintaining the same neurons and contact area in NeuroSC is a really valuable feature. The tool would be improved even more by extending this feature to synapses, specifically by allowing the user to add an entire group of synapses to the viewer at once (e.g. "all synapses between AIM and PVQ"), and to keep this synapse group invariant when toggling between developmental stages.

      We thank the reviewer for this suggestion. In response we have now implemented a new feature to ‘clone’ a rendered scene across time while preserving the original elements to ease comparisons. Once the user has rendered a scene, they can use the in-viewer developmental slider to clone the renderings and assigned colors, but display the renderings of the newly selected timepoint. These renderings populate a new window tab which can be dragged to align developmental stage windows side by side. We have added a sentence to account for this in lines 315-317 and to the legend of supplemental Figure S11. 

      Reviewer #2 (Public review)

      Comment

      The ability to visualize the data from both a connectomics and contactomics perspective across developmental time has significant power. The original C. elegans connectome (White et al., 1986) presented their circuits as line drawings with chemical and electrical synapses indicated through arrows and bars. While these line drawings remain incredibly useful, they were also necessary simplifications for a 2D publication and they lack details of the complex architecture seen within each EM image. Koonce et al take advantage of segmented image data of each neuronal process within the nerve ring to create a web interface where users can visualize 3D models for their neuron of choice. The C-PHATE visualization allows users to explore similarities among different neurons in terms of adjacency and then go directly to the 3D model for these neurons. The 3D models it generates are beautiful and will likely be showing up in many future presentations and publications. The tool doesn't require any additional downloading and is open source.

      We thank the reviewer for this positive assessment of our work.

      Comment

      While it's impossible to create one tool that will satisfy all potential users, I found myself wanting to have numbers associated with the data. For example, knowing the number of connections or the total surface area of contacts between individual neurons wasn't possible through the viewer, which limits the utility of taking deep analytical dives. While connectivity data is readily accessible through other interfaces such as Nemanode and WormWiring, a more thorough integration may be helpful to some users.

      We thank the reviewer for this feedback and in response have now implemented displays with quantitative information in NeuroSC. Now, upon hovering over a contact patch or synapse, the user will see the quantitative data of the relationship. For contact patches, the user will see the total area shared between two neurons in that dataset. On hovering over a synapse, the user will see how many synapses with the same members there are in total throughout the dataset. We agree that this improves user analyses (see also our response to R1.3).

      Comment

      There were several issues with the user interface that made it a bit clunky to use. For example, as I added additional neurons to the filter search box, the loading time got longer and longer. I ran an experiment uploading all of the amphid neurons, one pair at a time. Each additional neuron pair added an additional 5-10 seconds to the loading. By the time I got to the last pair, it took over a minute to load. Issues like these, some of which may be unavoidable given the size of the data, could be conveyed through better documentation. I did not find the tutorial very helpful and the supplementary movies lacked any voiceover, so it wasn't always clear what they were trying to show.

      We appreciate that some of the more complex models can take a while to load. One of our core goals is to keep the high resolution of our models to most accurately represent the EM data, so we had to compromise between resolution and loading times. But to address this concern we have now added a ‘loading’ prompt that reassures the user when there is a wait. We also added, as suggested, text guidance throughout all of the supplemental videos (Supplemental Videos 1-4).

      Reviewer #3 (Public review)

      Comment

      A web-based app, NeuroSC, that individual researchers can use to interrogate the structure and organization of the C. elegans nerve ring across development. In the opinion of this reviewer, only minor revisions are required.

      We thank the reviewer for this positive assessment of our work.

      Comment

      Contact is defined by length, why not contact area? How are these normalized for changes in the overall dimensions of neurons during development?

      To clarify our methodology: the adjacency algorithm that we use generates a 2D adjacency profile by summing the number of adjacent boundary points per EM section, which are then summed across all EM z slices.

      Contact area can be derived by multiplying the adjacency length in each slice by pixel resolution and z-thickness. Prompted by the reviewer we have now also calculated and display contact surface areas, along with their ranks among all contact relationships for a given neuron. These can be inspected directly via the interface by clicking on a rendered cell or contact patch (Figure S5 and lines 308-312). We believe these additional surface area metrics enhance the interpretability and utility of the viewer.
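
      The conversion from per-slice adjacency to contact area can be sketched as below. The sketch assumes that the adjacency length in each slice is the count of adjacent boundary points, that each point spans one pixel, and that pixel size and section thickness are constant within a dataset. The function name `contact_area` and the sample values are hypothetical, not parameters of any dataset in the study.

```python
def contact_area(adjacent_points_per_slice, pixel_size_nm, slice_thickness_nm):
    """Estimate contact surface area from per-slice adjacency counts.

    adjacent_points_per_slice : adjacent boundary points counted in each EM section
    pixel_size_nm             : in-plane size of one pixel, in nanometers
    slice_thickness_nm        : z-thickness of one section, in nanometers
    Returns the area in square nanometers.
    """
    # Adjacency length per slice = number of adjacent points x pixel size;
    # each slice contributes that length times the section thickness.
    return sum(
        count * pixel_size_nm * slice_thickness_nm
        for count in adjacent_points_per_slice
    )


# Hypothetical values: three sections with 12, 0 and 8 adjacent boundary points.
area_nm2 = contact_area([12, 0, 8], pixel_size_nm=2.0, slice_thickness_nm=50.0)
print(area_nm2)
```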

      We apply normalization at the level of the adjacency threshold to account for dataset-specific differences such as contrast, boundary definition, and age-related changes in neuropil packing density. This normalization is applied before running the adjacency algorithm. We do not normalize by individual neuron size, as the contact data are intended to reflect relational differences between neurons, rather than absolute morphological scaling. In fact, our addition of a scale-spheroid within each rendered model emphasizes the large increase in spatial scale that the nerve ring experiences during larval growth.  

      Comment

      Figure 1, C&D, explanation unclear for how the adjacency matrix is correlated with C-Phate schematic in D.

      We thank the reviewer for the comment and have clarified this section by adding greater detail to the explanation of how an adjacency matrix is computed (lines 149-155), as well as a description now in the figure legend 1C. Additionally, we revised Figure 1C and D to simplify neuron representations/colors and to simplify the adjacency heat map gradient. We also extended the area of contact between neurons on Figure 1C to better reflect what would be considered a “contact”. Lastly, in the figure, we changed the color and placement for the z plane arrow and label from black to white, to make it more visible, to highlight the method of computing adjacency for each z slice. 

      Comment

      Figure 4, panels F & G, unclear why AVF is shown in panel G (L3) but not panel F (L1). Explanation (see below) should be provided earlier, i.e., AVF is not generated until the end of the L1.

      We have now clarified this important point by adding labels to Figure 4 panels F and G, ‘Pre-AVF outgrowth’ and ‘Post-AVF outgrowth’ respectively. Briefly, the point is that AVF grows into the nerve ring after the L2 stage, and that is why it is absent in panel F (L1 stage, now with the label ‘Pre-AVF outgrowth’).  

      Comment

      Line 146 What is the justification for the statement: "By end of Larval Stage 1 (L1), neuronal differentiation has concluded...."? This statement is confusing since this sentence also states that "90% of neurons in the neuropil...have entered the nerve ring..." which would suggest that at least 10% additional NR neurons have NOT fully differentiated.

      We have fixed this sentence in the text. Now the sentence reads: ‘By Larval stage 1 (L1) 90% of the neurons in the neuropil (161 neurons out of the 181 neurons) have grown into the nerve ring and adopted characteristic morphologies and positions.’

      Lines 171-175 What is meant by the statement that "degree of these changes mapped onto...plasticity"? What are examples of "behavioral plasticity"?

      We have added the following new lines of text (lines 200-204) and now additionally cite a review discussing C. elegans behaviors to clarify and give context to behavioral plasticity. ‘C. elegans exhibit tractable behaviors which can adapt due to changing environmental conditions (Flavell et al., Genetics 2020). Strata 3 and 4 contain most neurons belonging to circuits associated with such learned behaviors, including chemo-, mechano- and thermo-sensation. This is seemingly reflected by strata 3 and 4 harboring the most readily recognized set of changes in neuronal relationships across postembryonic development.’

      Comment

      Lines 189-190 The meaning of this sentence is unclear, "The logic in....merge events."

      This sentence has been deleted and we have instead refocused our descriptions of C-PHATES comparisons by neuronal clustering trajectories and cluster members (rather than iterations).

      Comment

      Lines 193-208 This section reports varying levels of convergence across larval development in C-Phate maps for the interneurons AIML and PVQL. Iterations leading to convergence varied: 16 (L1), 14 (L2), 22 (L3), 20 (L4), 14 (adult). The authors suggest that these differences are biologically significant and reflect the reorganization of AIML and PVQL contact relationships, especially between the L4 and adult stages. Are these differences in iterations significant?

      We agree this could be confusing and instead of focusing on comparing the iteration at which each merging event occurs, we now focus on examining the differences in members of clusters, before and after the merge event. Cluster membership is easier to interpret than the differences in the number of DC iterations (lines 224-229).

      Lines 240-241 States that AVF neurons "terminally differentiate in the embryo" which is not correct. AVF neurons are generated from neuronal precursors (P0 and P1) at the end of the L1 stage which accounts for their outgrowth into the NR during the L2 stage. 

      We thank the reviewer for the correction and have edited the text to read: ‘AVF neurons are generated from neuronal precursors (P0 and P1) at the end of the L1 stage (Sulston et al. (1983); Sun and Hobert (2023); Poole et al. (2024); Hall and Altun (2008); Sulston and Horvitz (1977)). AVF neurons do not grow into the nerve ring until the L2 stage, and continue to grow until the Adult stage (lines 261-266).’

      Comment

      Lines 289-315. A detailed and highly technical description of website architecture would seem more appropriate for the Methods section.

      We agree and have moved this section to the methods as suggested (lines 663-690).

      Comment

      Line 307 "source data is" should be "source data are"

      Thank you; we have fixed this grammatical error.

      Comment

      Line 324 "circuits identities" should be "circuit identity".

      Thank you; we have fixed this grammatical error.

      Comment

      Trademark/copyright conflict with these sites? https://compumedicsneuroscan.com/about/ https://www.neuroscanai.com/

      We thank the reviewer for drawing our attention to this. To avoid potential conflicts, we have proactively altered the name to NeuroSC throughout the paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) Changes in blood volume due to brain activity are indirectly related to neuronal responses. The exact relationship is not clear; however, we do know two things for certain: (a) each measurable unit of blood volume change depends on the response of hundreds or thousands of neurons, and (b) the time course of the volume changes is slow compared to the potential time course of the underlying neuronal responses. Both of these mean that important variability in neuronal responses will be averaged out when measuring blood changes. For example, if two neighbouring neurons have opposite responses to a given stimulus, this will produce opposite changes in blood volume, which will cancel each other out in the blood volume measurement due to (a). This is important in the present study because blood volume changes are implicitly being used as a measure of coding in the underlying neuronal population. The authors need to acknowledge that this is a coarse measure of neuronal responses and that important aspects of neuronal responses may be missing from the blood volume measure.

      The reviewer is correct: we do not measure neuronal firing but use blood volume as a proxy for bulk local neuronal activity, which does not capture the richness of single neuron responses. This is why the paper focuses on large-scale spatial representations as well as cross-species comparison. For this latter purpose, fMRI responses are on par with our fUSI data, with both neuroimaging techniques showing the same weakness. We have now added this point to the discussion: 

      “Second, we used blood volume as a proxy for local neuronal activity. Thus, our signal ignores any heterogeneity that might exist at the level of local neuronal populations. However, our main findings are related to the large-scale organization of cortical responses and how they relate to those of humans. For this purpose, the functional spatial resolution of our signal, driven by the spatial resolution of neurovascular coupling, should be adapted. In addition, using hemodynamic signals provides a much better comparison with human fMRI data, where the same limitations are present.”

      (2) More importantly for the present study, however, the effect of (b) is that any rapid changes in the response of a single neuron will be cancelled out by temporal averaging. Imagine a neuron whose response is transient, consisting of rapid excitation followed by rapid inhibition. Temporal averaging of these two responses will tend to cancel out both of them. As a result, blood volume measurements will tend to smooth out any fast, dynamic responses in the underlying neuronal population. In the present study, this temporal averaging is likely to be particularly important because the authors are comparing responses to dynamic (nonstationary) stimuli with responses to more constant stimuli. To a first approximation, neuronal responses to dynamic stimuli are themselves dynamic, and responses to constant stimuli are themselves constant. Therefore, the averaging will mean that the responses to dynamic stimuli are suppressed relative to the real responses in the underlying neurons, whereas the responses to constant stimuli are more veridical. On top of this, temporal following rates tend to decrease as one ascends the auditory hierarchy, meaning that the comparison between dynamic and stationary responses will be differently affected in different brain areas. As a result, the dynamic/stationary balance is expected to change as you ascend the hierarchy, and I would expect this to directly affect the results observed in this study.

      It is not trivial to extrapolate from what we know about temporal following in the cortex to know exactly what the expected effect would be on the authors' results. As a first-pass control, I would strongly suggest incorporating into the authors' filterbank model a range of realistic temporal following rates (decreasing at higher levels), and spatially and temporally average these responses to get modelled cerebral blood flow measurements. I would want to know whether this model showed similar effects as in Figure 2. From my guess about what this model would show, I think it would not predict the effects shown by the authors in Figure 2. Nevertheless, this is an important issue to address and to provide control for.

      We understand the reviewer’s concern about potential differences in response dynamics in stationary vs non-stationary sounds. It seems that the reviewer is concerned that responses to foregrounds may be suppressed in non-primary fields because foregrounds are not stationary, and non-primary regions could struggle to track and respond to these sounds. Nevertheless, we observed the contrary, with non-primary regions overrepresenting non-stationary (dynamic) sounds, over stationary ones. For this reason, we are inclined to think that this explanation cannot falsify our findings. 

      We understand the comment that temporal following rates might differ across regions in the auditory hierarchy and agree. In fact, we do show that tuning to temporal rates differs across regions and partly explains the differences in background invariance we observe. In this regard, we think the reviewer’s suggestion is already implemented by our spectrotemporal model, which incorporates the full range of realistic temporal following rates (up to 128 Hz). The temporal averaging is done as we take the output of the model (which varies continuously through time) and average it in the same window as we used for fUSI data. When we fit this model to the ferret data, we find that voxels in non-primary regions, especially VP (tertiary auditory cortex), tend to be more tuned to low temporal rates (Figure 2F, G), and that background invariance is stronger in voxels tuned to low rates. This is, however, not true in humans, suggesting that background invariance in humans relies on different computational mechanisms. We have added a sentence to clarify this: “The model included a range of realistic temporal rates and this axis was the most informative to discriminate foregrounds from backgrounds.”

      (3) I do not agree with the equivalence that the authors draw between the statistical stationarity of sounds and their classification as foreground or background sounds. It is true that, in a common foreground/background situation - speech against a background of white noise - the foreground is non-stationary and the background is stationary. However, it is easy to come up with examples where this relationship is reversed. For example, a continuous pure tone is perfectly stationary, but will be perceived as a foreground sound if played loudly. Background music may be very non-stationary but still easily ignored as a background sound when listening to overlaid speech. Ultimately, the foreground/background distinction is a perceptual one that is not exclusively determined by physical characteristics of the sounds, and certainly not by a simple measure of stationarity. I understand that the use of foreground/background in the present study increases the likely reach of the paper, but I don't think it is appropriate to use this subjective/imprecise terminology in the results section of the paper.

      We appreciate the reviewer’s comment that the classification of our sounds into foregrounds and backgrounds is not verified by any perceptual experiments. We use those terms to be consistent with the literature (McWalter and McDermott, 2018; McWalter and McDermott, 2019), including the paper we derived this definition from (Kell et al., 2019). These terms are widely used in studies where no perceptual or behavioral experiments are included, and even when animals are anesthetized. We have clarified and justified this choice in the beginning of the Results section:

      “We used three types of stimuli: foregrounds, backgrounds, and combinations of those. We use those terms to refer to sounds differing in their stationarity, under the assumption that stationary sounds carry less information than non-stationary sounds, and are thus typically ignored.”

      We have also added a paragraph in the discussion to emphasize the limits of this definition:

      “First, this study defined foregrounds and backgrounds solely based on their acoustic stationarity, rather than perceptual judgments. This choice allowed us to isolate the contribution of acoustic factors in a simplified setting. Within this controlled framework, we show that acoustic features of foreground and background sounds drive their separation in the brain and the hierarchical extraction of foreground sound features.”

      (4) Related to the above, I think further caveats need to be acknowledged in the study. We do not know what sounds are perceived as foreground or background sounds by ferrets, or indeed whether they make this distinction reliably to the degree that humans do. Furthermore, the individual sounds used here have not been tested for their foreground/background-ness. Thus, the analysis relies on two logical jumps - first, that the stationarity of these sounds predicts their foreground/background perception in humans, and second, that this perceptual distinction is similar in ferrets and humans. I don't think it is known to what degree these jumps are justified. These issues do not directly affect the results, but I think it is essential to address these issues in the Discussion, because they are potentially major caveats to our understanding of the work.

      We agree with the reviewer that the foreground-background distinction might be different in ferrets. In anticipation of that issue, we had enriched the sound set with more ecologically relevant sounds, such as ferret and other animal vocalizations. Nevertheless, we have emphasized this limitation in addition to the limitation of our definition of foregrounds and backgrounds in the discussion: 

      “In addition, most of the sounds included in our study likely have more relevance for humans compared to ferrets (see Table 1). Despite including ferret vocalizations and environmental sounds that are more ecologically relevant for ferrets, it is not clear whether ferrets would behaviorally categorize foregrounds and backgrounds as humans do. Examining how ferrets naturally orient or respond to foreground and background sounds under more ecologically valid conditions, potentially with free exploration or spontaneous listening paradigms, could help address this issue.”

      Reviewer #2 (Public review):

      (1) Interpretation of the cerebral blood volume signal: While the results are compelling, more caution should be exercised by the authors in framing their results, given that they are using an indirect measure of neural activity. This is the difference between stating "CBV in area MEG was less background invariant than in higher areas" vs. saying "MEG was less background invariant than other areas". Beyond framing, the basic properties of the CBV signal should be better explored:

      a) Cortical vasculature is highly structured (e.g., Kirst et al. (2020) Cell). One potential explanation for the results is simply differences in vasculature and blood flow between primary and secondary areas of auditory cortex, even if fUS is sensitive to changes in blood flow, changes in capillary beds, etc. (Mace et al., 2011, Nat. Methods). This concern could be addressed by either analyzing spontaneous fluctuations in the CBV signal during silent periods or computing a signal-to-noise ratio of voxels across areas across all sound types. This is especially important given the complex 3D geometry of gyri and sulci in the ferret brain.

      We agree with the reviewers that there could be differences in vasculature across subregions of the auditory cortex and note that this point would also be valid for the published human fMRI data. Nevertheless, even if small differences in vasculature were present, it is unlikely that they would affect our analyses and results, which are designed to be independent of local vascular density. First, we normalize the signal in each voxel using the silent periods, so that the absolute strength of the raw signal, or baseline blood volume in each voxel, is factored in our analysis. Second, we only focus on reliably responsive voxels in each region and do see comparable sound-evoked responses in all regions (Figure S2). Third, our analysis mostly relies on voxel-based correlation across sounds, which is independent of the mean and variance of the voxel responses. Differences in noise, measured through test-retest reliability, can affect values of correlation, which is why we used a noise-correction procedure. After this procedure, invariance does not depend on test-retest, and differences across regions are still seen when matching for test-retest (new  Figure S7). Thus, we believe that differences in vascular architecture across regions are unlikely to affect our results. We added this point in the Methods section when discussing the noise-correction:

      “After this correction, the differences we observed between brain regions were present regardless of voxels' test-retest reliability, or noise level (Figure S7). Thus, potential differences in vasculature across regions are unlikely to affect our results.”
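
      The claim that a correlation across sounds does not depend on a voxel's mean or variance can be checked with a short numerical sketch. The variable names and the simulated responses below are hypothetical and serve only to illustrate the property; they do not reproduce the authors' data or their noise-correction procedure.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two response vectors (one value per sound)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

rng = np.random.default_rng(0)
# Hypothetical responses of one voxel to 30 sounds, in two conditions.
resp_a = rng.normal(size=30)
resp_b = 0.6 * resp_a + rng.normal(size=30)

r = pearson(resp_a, resp_b)
# Same voxel with a larger gain and a higher baseline (illustrative stand-in
# for a voxel sitting on denser vasculature): the correlation is unchanged.
r_scaled = pearson(5.0 * resp_a + 20.0, resp_b)
```

      Because the gain and offset cancel in the Pearson formula, regional differences in overall response amplitude alone would not change this kind of metric. Differences in noise level can still lower the correlation, which is what the noise-correction step addresses.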

      b) Figure 1 leaves the reader uncertain what exactly is being encoded by the CBV signal, as temporal responses to different stimuli look very similar in the examples shown. One possibility is that the CBV is an acoustic change signal. In that case, sounds that are farther apart in acoustic space from previous sounds would elicit larger responses, which is straightforward to test. Another possibility is that the fUS signal reflects time-varying features in the acoustic signal (e.g. the low-frequency envelope). This could be addressed by cross-correlating the stimulus envelope with fUS waveform. The third possibility, which the authors argue, is that the magnitude of the fUS signal encodes the stimulus ID. A better understanding of the justification for only looking at the fUS magnitude in a short time window (2-4.8 s re: stimulus onset) would increase my confidence in the results.

      We thank the reviewer for raising that point as it highlights that the layout of Figure 1 is misleading. While Figure 1B shows an example snippet of our sound streams, Figure 1D shows the average timecourse of CBV time-locked to a change in sound (foreground or background, isolated or in a mixture). This is the average across all voxels and sounds, aiming at illustrating the dynamics for the three broad categories. In Figure 1E however, we show the cross-validated cross-correlation of CBV across sounds (and different time lags). To obtain this, we compute for each voxel the response to each sound at each time lag, thus obtaining two vectors (size: number of sounds) per lag, one per repeat. Then, we correlate all these vectors across the two repeats, obtaining one cross-correlation matrix per voxel. We finally average these matrices across all voxels. The presence of red squares with high correlations demonstrates that the signal encodes sound identity, since CBV is more similar across two repeats of the same sound (e.g., in the foreground only matrix, 0-5 s vs 0-5 s), than two different sounds (0-5 s vs. 7-12 s). We modified the figure layout as well as the legend to improve clarity.
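
      The cross-validated cross-correlation described above can be sketched as follows. This is a minimal illustration under stated assumptions: responses are taken to be stored as two arrays of shape (voxels, sounds, time lags), one per repeat, and the function name `crossval_lag_correlation` and the simulated data are hypothetical rather than taken from the authors' code.

```python
import numpy as np

def crossval_lag_correlation(rep1, rep2):
    """Correlate response vectors across two repeats for every pair of lags.

    rep1, rep2: arrays of shape (n_voxels, n_sounds, n_lags), one per repeat.
    For each voxel, the vector of responses across sounds at lag i in repeat 1
    is correlated with the vector at lag j in repeat 2. The resulting
    (n_lags, n_lags) matrices are then averaged across voxels.
    Voxels with zero variance across sounds at some lag would yield NaN.
    """
    n_sounds = rep1.shape[1]

    def zscore(x):
        # Standardise across sounds, separately for each voxel and lag.
        x = x - x.mean(axis=1, keepdims=True)
        return x / x.std(axis=1, keepdims=True)

    z1, z2 = zscore(rep1), zscore(rep2)
    # Mean of products of z-scores equals the Pearson correlation.
    per_voxel = np.einsum("vsi,vsj->vij", z1, z2) / n_sounds
    return per_voxel.mean(axis=0)

# Simulated example: a sound-specific signal shared by both repeats,
# plus independent noise in each repeat.
rng = np.random.default_rng(0)
signal = rng.normal(size=(50, 40, 1))
rep1 = signal + 0.1 * rng.normal(size=(50, 40, 6))
rep2 = signal + 0.1 * rng.normal(size=(50, 40, 6))
lag_matrix = crossval_lag_correlation(rep1, rep2)
```

      High values in such a matrix indicate that the pattern of responses across sounds is reproducible between repeats, which is the sense in which the signal encodes sound identity.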

      (2) Interpretation of the human data: The authors acknowledge in the discussion that there are several differences between fMRI and fUS. The results would be more compelling if they performed a control analysis where they downsampled the ferret fUS data spatially and temporally to match the resolution of fMRI and demonstrated that their ferret results hold with lower spatiotemporal resolution.

      We agree with the reviewer that the use of different techniques might get in the way of cross-species comparison. We already control for the temporal aspect by using the average of stimulus-evoked activity across time (note that due to scanner noise, sounds are presented cut into small pieces in the fMRI experiments). Regarding the spatial aspect, there are several things to consider. First, both species have brains of very different sizes, a factor that is conveniently compensated for by the higher spatial resolution of fUSI compared to fMRI (0.1 vs 2 mm). Downsampling to fMRI resolution would lead to having one voxel per region per slice, which is not feasible. We also summarize results with one value per region, which is a form of downsampling that is fairer across species. Furthermore, we believe that we already established in a previous study (Landemard et al., 2021, eLife) that fUSI and fMRI data are comparable signals. Indeed, we could predict human fMRI responses to most sounds from ferret fUSI responses to the same sounds. We clarified these points in the discussion:

      “In addition, fMRI has a worse spatial resolution than fUSI (here, 2 vs. 0.1 mm voxels). However, this difference in resolution compensates for the difference in brain size between humans and ferrets. In our previous work, we showed that a large fraction of cortical responses to natural sounds could be predicted from one species to the other using these methods (Landemard et al., 2021).”

      Reviewer #3 (Public review):

      As mentioned above, interpretation of the invariance analyses using predictions from the spectrotemporal modulation encoding model hinges on the model's ability to accurately predict neural responses. Although Figure S5 suggests the encoding model was generally able to predict voxel responses accurately, the authors note in the introduction that, in human auditory cortex, this kind of tuning can explain responses in primary areas but not in non-primary areas (Norman-Haignere & McDermott, PLOS Biol. 2018). Indeed, the prediction accuracy histograms in Figure  S5C suggest a slight difference in the model's ability to predict responses in primary versus non-primary voxels. Additional analyses should be done to a) determine whether the prediction accuracies are meaningfully different across regions and b) examine whether controlling for prediction accuracy across regions (i.e., subselecting voxels across regions with matched prediction accuracy) affects the outcomes of the invariance analyses.

      The reviewer is correct: the spectrotemporal model tends to perform less well in human non-primary cortex. We believe this does not contradict our results but goes in the same direction: while there is a gradient in invariance in both ferrets and humans, this gradient is predicted by the spectrotemporal model in ferrets, but not in humans (possibly indeed because predictions are less good in human non-primary auditory cortex). Regardless of the mechanism, this result points to a difference across species. In ferrets, we found a significantly better prediction accuracy in VP (p=0.001, permutation test) and no differences between MEG and dPEG (p=0.89). In humans, prediction accuracy was slightly higher in primary compared to non-primary auditory cortex, but this effect was not significant (p=0.076). In both species, when matching prediction accuracy between regions, the gradients in invariance were preserved. We have added these analyses to the manuscript (Figure S5).
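
      The response reports p-values from a permutation test without detailing the procedure. As a rough sketch of what such a test can look like, the code below shuffles region labels and compares the difference in mean prediction accuracy between two groups of voxels. The function name `permutation_test`, the choice of the mean as test statistic, the two-sided comparison, and the simulated accuracies are all assumptions; the authors' test may differ.

```python
import numpy as np

def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided label-shuffling test on the difference in means.

    a, b: 1D arrays of per-voxel prediction accuracies for two regions.
    Returns the observed difference (mean(a) - mean(b)) and a p-value.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = shuffled[: a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction so that the p-value is never exactly zero.
    return observed, (count + 1) / (n_perm + 1)

# Simulated accuracies for two hypothetical regions.
rng = np.random.default_rng(1)
region_a = rng.normal(0.5, 0.1, size=200)
region_b = rng.normal(0.3, 0.1, size=200)
diff_ab, p_ab = permutation_test(region_a, region_b)
```

      The smallest p-value this sketch can return is 1 / (n_perm + 1), so the number of permutations sets the resolution of the reported significance.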

      A related concern is the procedure used to train the encoding model. From the methods, it appears that the model may have been fit using responses to both isolated and mixture sounds. If so, this raises questions about the interpretability of the invariance analyses. In particular, fitting the model to all stimuli, including mixtures, may inflate the apparent ability of the model to "explain" invariance, since it is effectively trained on the phenomenon it is later evaluated on. Put another way, if a voxel exhibits invariance, and the model is trained to predict the voxel's responses to all types of stimuli (both isolated sounds and mixtures), then the model must also show invariance to the extent it can accurately predict voxel responses, making the result somewhat circular. A more informative approach would be to train the encoding model only on responses to isolated sounds (or even better, a completely independent set of sounds), as this would help clarify whether any observed invariance is emergent from the model (i.e., truly a result of low-level tuning to spectrotemporal features) or simply reflects what it was trained to reproduce.

      We thank the reviewer for this suggestion. We have run an additional prediction using only the sounds presented in isolation, which replicates our main results (new Figure S6). We have added this control to the manuscript:

      “Results were similar if the model was fit solely on isolated sounds, excluding mixtures from the training set (Figure S6).”

      Finally, the interpretation of the foreground invariance results remains somewhat unclear. In ferrets (Figure 2I), the authors report relatively little foreground invariance, whereas in humans (Figure 5G), most participants appear to show relatively high levels of foreground invariance in primary auditory cortex (around 0.6 or greater). However, the paper does not explicitly address these apparent cross-species differences. Moreover, the findings in ferrets seem at odds with other recent work in ferrets (Hamersky et al. 2025 J. Neurosci.), which shows that background sounds tend to dominate responses to mixtures, suggesting a prevalence of foreground invariance at the neuronal level. Although this comparison comes with the caveat that the methods differ substantially from those used in the current study, given the contrast with the findings of this paper, further discussion would nonetheless be valuable to help contextualize the current findings and clarify how they relate to prior work.

      We thank the reviewer for this point. While we found a trend for higher background invariance than foreground invariance in ferret primary auditory cortex, this difference was not significant and many voxels exhibit similar levels of background and foreground invariance (for example in Figure 2D, G). Thus, we do not think our results are inconsistent with Hamersky et al., 2025, though we agree the bias towards background sounds is not as strong in our data. This might indeed reflect differences in methodology, both in the signal that is measured (blood volume vs spikes), and the sound presentation paradigm. Our timescales are much slower and likely reflect responses post-adaptation, which might not be as true for Hamersky et al. We have added this point to the discussion, as well as a comment on the difference between ferrets and humans in foreground invariance in primary auditory cortex:

      “In ferrets, primary auditory cortex has been found to over-represent backgrounds in mixtures compared to foregrounds (Hamersky et al., 2025). In contrast, we found a slight, non-significant bias towards foregrounds in primary regions. This difference could be driven by a difference in timescales, as we looked at slower timescales in which adaptation might be more present, reducing the strength of background encoding. In humans, we found a much smaller gap between background and foreground invariance in primary auditory cortex, which was not predicted by the spectrotemporal model. Additional, more closely controlled experiments would be needed to confirm and understand this species difference.”

      Reviewer #1 (Recommendations for the authors):

      (1) In the introduction, explain the relationship between background/foreground and stationarity/non-stationarity, and thus why stationary/nonstationary stimuli could be used to probe differences in background/foreground processing.

      We have added a sentence at the beginning of the results section to justify our choice (see public review).  

      (2) Avoid use of the background/foreground terminology in Results (and probably Methods).

      For consistency with previous literature, we decided to keep this terminology, though imperfect. We further justified our choice in the beginning of the Results section (see previous point).

      (3) In the Discussion, explain what the implications of the results are for background/foreground processing, and, importantly, highlight any caveats that result from stationarity not being a direct measure of background/foreground.

      We added a paragraph in the Discussion to highlight this point (see public review).

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1: Showing a silent period in the examples would help in understanding the fUS signal.

      In Figure 1D, we show the average timecourse of CBV time-locked to a change in sound (foreground or background, isolated or in a mixture). This is the average across all voxels and sounds. Thus, it would not be very informative to show an equivalent plot for a silent period, as it would look flat by definition. However, we updated the layout and legend of Figure 1 to make it clearer and avoid confusion.

      (2) "Responses were not homogenous" - would make more sense to say something like "responses were not spatially distributed".

      We removed these words, which were indeed not necessary: “We found that reliable sound-evoked responses were confined to the central part of ventral gyrus of the auditory cortex.”

      (3) Figure 2D: The maps shown in Figure 2D are difficult to understand for the noninitiated in fUS. At a minimum, labels should be added to indicate A-P, M-L, D-V. I cannot see the white square in the primary figure. An additional graphic would be helpful here to understand the geometry of the measurement.

      We thank the reviewer for pointing out that reading these images is indeed an acquired skill. We added an annotated image of anatomy with indications of main features to guide the reader in Figure 1. We also added missing white squares. 

      (4) Figure 2F: Can the authors better justify why the summary statistic is shown for all three areas, but the individual data only compares primary vs. higher order?

      We now show individual data for all three areas.

      (5) More methods information is needed to understand how recordings were stitched across days. Was any statistical modeling used to factor out the influence of day on overall response levels?

      We simply concatenated voxels recorded across different sessions and days. The slices were sampled randomly to avoid any systematic effect. Because different slices were sampled in different sessions, any spatial structure spanning several slices is unlikely to be artefactual. For instance, the map of average responses in Figure 2A shows a high level of continuity of spatial patterns across slices. This indicates that this pattern reflects a true underlying organization rather than session-specific noise. It also shows that the overall response levels are not affected by the day or recording session. We added a section in the Methods (“Combining different recordings”) to clarify this point:

      “The whole dataset consisted of multiple slices, each recorded in a different recording session. Slices to image on a given day were chosen at random to avoid any systematic bias. Responses were consistent across neighboring slices recorded on different sessions, as shown by the maps of average responses (Figure 2A, Figure S2) where any spatial continuity across different slices must reflect a true underlying signal in the absence of common noise.”

      Reviewer #3 (Recommendations for the authors):

      (1) Figures:

      The figures are generally very well done and visually appealing. However, I have a few suggestions and questions.

      a)  In Figure 1G, the delta CBV ranges from 0.5 to 1.5, although in subsequent figures (e.g., Figure 2D), the range is much larger (-15 to 45). Is it possible that the first figure is a proportion rather than a percentage, or is there some other explanation for the massive difference in scale? Not being very familiar with this measure, it was confusing.

      The same scale is used in both figures. The major difference is that in Figure 1D, we take the average over all voxels and sounds (for each category), which includes many non-responsive voxels and, for responsive voxels, sounds to which they respond only weakly. On the other hand, Figure 2D shows the response of a single, responsive voxel. Thus, the values it reaches for its preferred sounds (45%) are an extreme, which carries little weight in Figure 1D. We have changed the legend of Figure 1D to make this more explicit.

      b)  Similar to the first point, the strength of the correlations in the matrices of Figure 1E is very small (~ 0.05) compared to the test-retest reliabilities plotted in Figure 2B (~0.5). Again, I was confused by this large difference in scale.

      Two main factors explain the difference in values between Figure 1E and Figure 2B. First, in Figure 1E, each correlation is computed on the average activity in a window of 0.3 s, as opposed to 2.4 s in Figure 2B. More averaging leads to better SNR, which inevitably leads to higher test-retest correlations. Second, in Figure 1E, the cross-correlation matrices are averaged across all responsive voxels without any criterion for reliability. On the other hand, Figure 2B shows example voxels with good test-retest reliability.

      c)  In Figure 2D, the example voxels are supposed to be shown in white. It appears that this example voxel is only shown for the non-primary voxel. Please be sure to add these voxels throughout the other panels and figures as well. 

      We fixed this mistake and added the example voxel in all panels.

      d)  Why do the invariance results (e.g., Figure 2F) for individual animals combine across dPEG and VP, while the overall results (across all animals) split things across all three regions? The results in Table 2 do, in fact, provide this data. Upon further examination of the data in Table 2, it seems like there is only a significant difference between background invariance between dPEG and VP for one of the two animals, and that this might be what drives the effect when pooling across all animals. This seems important to both show visually in the figure and to potentially discuss. There is still very clearly a difference between primary and non-primary, but whether there is a real difference between dPEG and VP seems more unclear.

      We added the values for single animals in the plot and highlighted this limitation in the text:

      “While background invariance was overall highest in VP, the differences within non-primary areas were more variable across animals (see table 2).”

      e)  Again, as in Figure 2F, the cross symbols seem like a bad choice as markers since the vertical components of the cross are suggestive of the error of the measurement. However, no error is actually plotted in these figures. I recommend using a different marker and including some measure of error in the invariance plots.

      We replaced the crosses with circles to avoid confusion. The measure of error is provided by the representation of values for single animals.

      f) The caption for Figure 4C states that each line corresponds to one animal, but does not precisely state what this line represents. Is this the median or something?

      Each line indeed represents the median across voxels for one animal. We added this information to the legend.

      g)  In Figure 5, the captions for panels D and E are swapped.

      This has now been corrected.

      (2) Discussion:

      (a) In the paragraph on methodological differences, it mentions that the fMRI voxel size is around 2 mm. This may be true in general, but given the comparison to Kell & McDermott 2019, the voxel size should reflect that used in their study (1 mm).

      The reviewer may be referring to this sentence from the Methods of Kell et al., 2019: “T1-weighted anatomical images were collected in each participant (1-mm isotropic voxels) for alignment and cortical surface reconstruction.” However, this does not correspond to the resolution of the functional data, which is 2 mm, as mentioned a bit further in the Methods: “In-plane resolution was 2 × 2 mm (96 × 96 matrix), and slice thickness was 2.8 mm with a 10% gap, yielding an effective voxel size of 2 × 2 × 3.08 mm.”

      (b) In the next paragraph on the control of attention, it mentions that attentional differences could play a role. However, in Kell & McDermott 2019, they manipulated attention (attend visual versus attend auditory) and found that it did not substantially affect the observed pattern invariance. I suppose it could potentially affect the degree to which an encoding model could explain the invariance. This seems important, and given that the data was already collected, it could be worth it to analyze that data.

      As the reviewer points out, Kell et al. 2019 ran an additional experiment in which they manipulated auditory vs. visual attention. However, the auditory task was just based on loudness and ensured that the participants were awake and paying attention to the stimuli, but not specifically to the foreground or background. This type of attention did not lead to changes in the observed patterns of invariance, which might have been the case for selective attention to backgrounds or foregrounds in the mixture. Given that these manipulations were not done in the ferret experiments, we chose to not include the analysis of this dataset in the scope of this paper. However, future work investigating that topic further would indeed be of interest.

      (c) The mention of "a convolutional neural network trained to recognize digits in noise" should make more obvious that this is visual recognition rather than auditory recognition.

      We clarified this sentence to make clear that the recognition is visual and not auditory: “For instance, in a convolutional neural network trained to visually recognize digits in different types of noise, when local feedback is implemented, early layers encode noise properties, while later layers represent clean signal.”

      (d) Finally, one explanation of the results in the discussion is that "primary auditory areas could be recruited to maintain background representations, enabling downstream cortical regions to use these representations to specifically suppress background information and enhance foreground representations." This "background-related information" being used to "facilitate further extraction of foregrounds" is similar to what is argued in Hicks & McDermott PNAS 2024.

      We thank the reviewer for suggesting this relevant reference and added it in this paragraph of the discussion.

      (3) Methods:

      In the "Cross-correlation matrices" section, it mentions that time-averaged responses from 2.4 to 4.8 s were used. It would be helpful to provide an explanation of why this particular time window was used. Additionally, I wondered whether one could look at adaptation type effects (e.g., that of Khalighinejad et al., 2019) or whether fUSI does not offer this kind of temporal precision?

      The effects shown in Khalighinejad et al., 2019, are indeed likely too fast to be observed with our methods. However, there are still dynamics in the fUSI signal and in its invariance (Figure S1). Each individual combination of foreground and background is presented for 4.8 s (Figure 1B). Therefore, we chose the range 2.4-4.8 s as the largest window we could use (to improve SNR) while minimizing contamination from the previous or next sound (indeed, blood volume typically lags neuronal activity by 1.5-2 s). We added this clarification to the Methods.

      In the "Human analyses" section, it is very unclear which set of data was used from Kell & McDermott 2019. For example, that paper contains 4 different experiments, none of which has 7 subjects. Upon closer reading, it seems that only 7 of the 11 participants from Experiment 1 also heard the background sounds in isolation (thus enabling the foreground invariance analyses). However, they stated that there were only 3 female participants in that experiment, while you state that you used data from 7 females. It would be helpful to double-check this and to more clearly state exactly which participants (i.e., from which experiment) were used and why (e.g., why not use data from Experiment 4 in the visual task/attention condition?).

      We added a sentence to clarify which datasets were used: “Specifically, we used data from Experiment 1 which provided the closest match to our experimental conditions, and only considered the last 7 subjects that heard both the foregrounds and the backgrounds in isolation, in addition to the mixtures.” 

      It was a mistake to mention that it was all female, as the original dataset has 3 females and 8 males, of which we used 7 without any indication of their sex. Thus, we removed this mention from the text.

      In the "Statistical testing" section, why were some tests done with 1000 permutations/shuffles while others were done with 2000?

      We homogenized and used 1000 permutations/shuffles for all statistical tests.

      (4) Miscellany:

      (a) The Hamersky et al. 2023 preprint has recently been published (referenced in the public review), and so you could consider updating the reference.

      This reference has now been updated.

      (b) There are a few borderline statistical tests that could use a bit more nuance. For example (on page 4), "In primary auditory cortex (MEG), there was no significant difference between values of foreground invariance and background invariance (p = 0.063, obtained by randomly permuting the sounds' background and foreground labels, 1000 times)." This test is quite close to being significant, and this might be acknowledged.

      We emphasized the trend to nuance the interpretation of these results: “In primary auditory cortex (MEG), foreground invariance was slightly lower than background invariance, although this difference was not significant (p=0.063, obtained by randomly permuting the sounds' background and foreground labels, 1000 times).”
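
      For readers less familiar with this kind of test, a label-permutation p-value can be sketched as follows. This is a generic illustration, not the analysis code used in the paper: the function name, the two-sided difference-of-means statistic, and the input values are all assumptions made for the example.

```python
import random


def permutation_p_value(a, b, n_perm=1000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Group labels are shuffled `n_perm` times; the p-value is the share of
    shuffles whose absolute mean difference is at least the observed one.
    The +1 terms count the observed labelling itself, so p is never 0.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical invariance values for two conditions (not real data).
foreground = [0.0] * 10
background = [1.0] * 10
p_separated = permutation_p_value(foreground, background)
p_identical = permutation_p_value([1, 2, 3], [1, 2, 3])
```

      With 1000 permutations the smallest attainable p-value is 1/1001, so a reported value such as p = 0.063 is resolved at a granularity of roughly 0.001.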

      (5) Potential typos:

      (a)   Should the title be "natural sound mixtures" instead of "natural sounds mixtures"?

      (b) The caption for Figure 1 says "We imaged the whole auditory through successive slices across several days." I believe this should be "the whole auditory [cortex]."

      (c) In the first paragraph of the discussion, there is a sentence ending in "...are segregated in hemody-namic signal." I believe this should be "hemodynamic signal."

      These errors are now all corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The paper is well written and investigates the cross-species insemination of fish eggs with mouse sperm. I have a few major and minor comments.

      Strengths:

      The experiments are well executed and could provide valuable insights into the complex mechanisms of fertilization in both species. I found the information presented to be very interesting.

      Thank you.

      Weaknesses:

      The rationale of some of the experiments is not well defined.

      Thank you. In the revised manuscript, we have clarified and expanded the rationale behind each experiment to better highlight the specific questions being addressed and how each approach contributes to our overall investigation. These clarifications have been integrated throughout the Results and Discussion sections. We provide detailed rationale in our point-by-point responses to both reviewers, outlining how each experimental design was motivated by prior findings, hypotheses, or specific gaps in knowledge. We hope these revisions make the experimental logic and progression better defined and more compelling.

      Major Comments:

      (1) Figure 5

      I do not understand the rationale for performing experiments using CatSper-null sperm and CD9-null oocytes. It is well established that CatSper-null sperm are unable to penetrate the zona pellucida (ZP), so the relevance of this approach is unclear.

      We thank the reviewer for this comment. This experiment served as the basis for evaluating the contributions of progressive and hyperactivated motility to the ability of mouse sperm to locate and traverse the zebrafish micropyle. In earlier experiments (Figures 1 and 3), we assessed whether sperm-micropyle interaction was robust by comparing it to binding to the mouse zona pellucida and testing whether both interactions persisted after washing, which is a standard approach to distinguish specific binding from non-specific adherence (Avella et al., 2014; Baibakov et al., 2012). We then extended this analysis to CatSper1<sup>Null</sup> sperm: CatSper1<sup>Null</sup> sperm were still capable of binding the zona pellucida comparably to heterozygous controls, though they were unable to cross the zona of Cd9<sup>Null</sup> eggs. These observations served as a validation step for the use of CatSper1<sup>Null</sup> sperm in downstream micropyle interaction assays. We therefore proceeded to test whether hyperactivated motility, absent in CatSper1<sup>Null</sup> sperm, is required for locating and crossing the micropyle.

      It is indeed well established that CatSper1<sup>Null</sup> sperm are unable to penetrate the zona pellucida, and previous studies have typically used the absence of fertilized eggs as a readout. However, failed fertilization may result from multiple factors, including impaired sperm motility, reduced capacity to bind the zona pellucida, or an inability to penetrate it. To our knowledge, no study has quantitatively assessed the number of CatSper-deficient sperm that successfully bind, cross the zona and reach the perivitelline space. To address this, we first used normal oocytes for sperm binding and Cd9<sup>Null</sup> oocytes (Le Naour et al., 2000), which allow direct quantification of sperm accumulation in the perivitelline space. We have included a detailed explanation in the Results to clarify this point, lines 352-365 and 376-379.

      (2) Micropyle penetration and sperm motility

      CatSper-null sperm are reportedly unable to cross the micropyle, but this could be due to their reduced motility rather than a lack of hyperactivation per se. Were these experiments conducted using capacitated or non-capacitated spermatozoa? What was the observed motility of CatSper-null sperm during these assays? Clarifying these conditions is essential to avoid drawing incorrect conclusions from the results.

      Thank you for raising these points. Under our IVF conditions, qualitative observations confirmed that CatSper1<sup>Null</sup> sperm maintained sufficient progressive motility during the first hour post-insemination and exhibited zona binding efficiency comparable to that of CatSper1<sup>Het</sup> controls (Figure 5A and B). This is consistent with previous reports showing that within the first 90 minutes of sperm incubation in media, approximately 20% of CatSper1<sup>Null</sup> sperm preserve motility (Qi et al., 2007). Given previous studies indicating that 15–35% of sperm undergo hyperactivation within 90 minutes (Goodson et al., 2011), and considering that 100,000 progressively motile sperm were used for insemination, we estimate that approximately 3,000 hyperactivated CatSper1<sup>Null</sup> sperm were present in the cross-species insemination dish (mouse sperm x zebrafish eggs). Based on these numbers, we would have expected at least some sperm to locate the micropyle if hyperactivation were not required for its detection and entry. Nevertheless, no CatSper1<sup>Null</sup> sperm were detected in proximity to the micropyle canal, its opening, or within the inter-chorion space (ICS). These observations support the conclusion that the inability of CatSper1<sup>Null</sup> sperm to locate and enter the micropyle is attributable to their failure to hyperactivate. Also, all sperm used in these assays were exposed to identical capacitating conditions (HTF/HSA, 37 °C, 5% CO2). We now clarify this in the Methods, line 624, and we added more rationale under the Results, lines 361-365, and in the Discussion, lines 470-483.
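
      The figure of approximately 3,000 sperm appears to follow from multiplying the three numbers quoted above. The short calculation below makes that reading explicit; it is a sketch that assumes the percentages simply multiply and that the lower bound of the 15–35% range gives the quoted estimate, and the function name is hypothetical.

```python
def expected_hyperactivated(n_inseminated, pct_motile, pct_hyperactivated):
    """Integer estimate: n_inseminated x pct_motile% x pct_hyperactivated%."""
    still_motile = n_inseminated * pct_motile // 100
    return still_motile * pct_hyperactivated // 100


# 100,000 sperm inseminated, ~20% still motile at 90 minutes (Qi et al., 2007),
# 15-35% of sperm hyperactivated within 90 minutes (Goodson et al., 2011).
low = expected_hyperactivated(100_000, 20, 15)
high = expected_hyperactivated(100_000, 20, 35)
print(low, high)
```

      Under the same assumptions, the upper end of the hyperactivation range would give a correspondingly larger count, so the quoted estimate is the conservative one.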

      (3) Rheotaxis and micropyle navigation

      Previous studies have shown that CatSper-null sperm fail to undergo rheotaxis. Could this defect be related to their inability to locate and penetrate the micropyle? Exploring a potential shared mechanism could be informative.

      Thank you for raising this interesting point. Indeed, homozygous mutant mice lacking expression of a different component of the CatSper channel, CatSperz, show reduced rheotactic efficiency and severe subfertility (Chung et al., 2017). We cannot exclude that complete lack of CatSper as shown in CatSper1<sup>Null</sup> mice could lead to reduced rheotactic efficiency, hence we include this interpretation in the Discussion (lines 484-486).

      (4) Lines 61-74

      This paragraph omits important information regarding acrosomal exocytosis, which occurs prior to sperm-egg fusion. Including this detail would strengthen the discussion.

      Thank you. We have revised the text in the discussion to describe the process of acrosome exocytosis, and its relevance for fertilization (lines 504-518).

      Reviewer #2 (Public review):

      Summary:

      Garibova et al. investigated the conservation of sperm recognition and interaction with the egg envelope in two groups of distantly related animals: mammals (mouse) and fish (zebrafish). Previous work and key physiological differences between these two animal groups strongly suggest that mouse sperm would be incapable of interaction with the zebrafish egg envelope (chorion) and its constituent proteins, though homologous to the mammalian zona pellucida (ZP). Indeed, the authors showed that mouse sperm do not bind recombinant zebrafish ZP proteins nor the intact chorion. Surprisingly, however, mouse sperm are able to locate and bind to the zebrafish micropyle, a specialized canal within the chorion that serves as the egg's entry point for sperm. This study suggests that sperm attraction to the egg might be highly conserved from fish to mammals and depends on the presence of a still unknown glycosylated protein within the micropyle. The authors further demonstrate that mouse sperm are able to enter the micropyle and accumulate within the intrachorionic space, potentially through a CatSper-dependent mechanism.

      Strengths:

      The authors convincingly demonstrate that mouse sperm do not bind zebrafish ZP proteins or the chorion. Furthermore, they make the interesting observation that mouse sperm are able to locate and enter the zebrafish micropyle in an MP-dependent manner, which is quite unexpected given the large evolutionary distance between these species, the many physiological differences between mouse and zebrafish gametes, and the largely different modes of both fertilization and reproduction in these species. This may indicate that the sperm chemoattractant in the egg is conserved between mammals and fish; however, whether zebrafish sperm are attracted to mouse eggs was not tested.

      Thank you. We performed an additional experiment with fish sperm used to inseminate ovulated mouse eggs, and results are reported in lines 183-187 and in Supplementary Figure 2.

      Weaknesses:

      The key weakness of this study lies in the rationale behind the overall investigation. In mammals, the zona pellucida (ZP) has been implicated in binding sperm in a taxon-specific manner, such that human sperm are incapable of binding the mouse ZP. Indeed, work by the corresponding author showed that this specificity is mediated by the N-terminal region of the ZP protein ZP2 (Avella et al., 2014). The N-termini of human and mouse ZP2 share 48% identity, which is higher than the overall identity between mouse and zebrafish ZP2, with the latter ortholog entirely lacking the N-terminal domain that is essential for sperm binding to the ZP. Given this known specificity for mouse vs. human sperm-ZP binding, it does not follow that mouse sperm would bind ZP proteins from not only a species that is much more distantly related, but also one that is not even a mammal, the zebrafish. Furthermore, the fish chorion does not play a role in sperm binding at all, while the mammalian ZP can bind sperm at any location. On the contrary, the zebrafish chorion prevents polyspermy by limiting sperm entry to the single micropyle.

      We thank the reviewer for this detailed comment. In this study, our goal was precisely to validate the hypothesis that mouse sperm would not bind either recombinant fish ZP proteins or the chorion; in addition, we found it important to examine the observation that mouse sperm could detect the micropyle. We further elaborated this rationale in the Introduction (lines 93-100).

      In addition, though able to provide some information regarding the broad conservation of sperm-egg interaction mechanisms, the biological relevance of these findings is difficult to describe. Fish and mammals are not only two very distinct and distantly related animal groups but also employ opposite modes of fertilization and reproduction (external vs. internal, oviparous vs viviparous). Fish gametes interact in a very different environment compared to mammals and lack many typically mammalian features of fertilization (e.g., sperm capacitation, presence of an acrosome, interaction with the female reproductive tract), making it difficult to make any physiologically relevant claims from this study. While this study may indicate conserved mechanisms of sperm attraction to the egg, the identity of the molecular players involved is not investigated. With this knowledge, the reader is forced to question the motivation behind much of the study.

      We thank the reviewer for their perspective, and we appreciate the opportunity to further elaborate on our rationale. As outlined in our Results and Discussion sections, a growing body of evidence supports the presence of conserved molecular players and signaling pathways involved in gamete interaction across species with diverse reproductive strategies. While zebrafish and mice do differ in their fertilization environments and modes of reproduction, these differences may not necessarily exclude the possibility of conserved molecular mechanisms underlying gamete interaction. For example, the CatSper calcium channel, which plays a key role in regulating sperm motility and hyperactivation, is conserved across a broad range of taxa, from echinoderms such as sea urchins (external fertilizers) (Seifert et al., 2015) to mammals, including mice and humans (internal fertilizers) (Lishko and Mannowetz, 2018). Moreover, sperm from some fish species possess acrosomes that undergo exocytosis prior to fertilization while sperm cross the micropyle (Psenicka et al., 2010). Also, in ovoviviparous species with internal fertilization, such as the black rockfish, sperm do undergo molecular changes while in the female reproductive tract, including immunomodulatory adaptations, glycocalyx remodeling, and interactions with ovarian cells, which give the sperm longer-term survival and a selective persistence that ensures only the fittest sperm can successfully fertilize eggs (Li et al., 2024). As for mammalian capacitation, it is broadly defined as the process during which sperm undergo hyperactivation (Yanagimachi, 1970) and acquire the ability to undergo acrosome exocytosis, making the sperm competent for gamete fusion and fertilization (Bhakta et al., 2019; Puga Molina et al., 2018; Yanagimachi, 1957; Yanagimachi et al., 2017). Of note, acrosome exocytosis and changes in sperm motility are not exclusive to internal fertilizers. For example, as we cite in our manuscript (and as just stated above), acrosome exocytosis has been described to occur as sturgeon sperm cross the micropyle (Psenicka et al., 2010). As for changes in flagellar motility, investigations in the Pacific herring (Clupea sp.) demonstrated that sperm remain nearly immotile upon release into seawater and only initiate motility when approaching the micropyle region of the egg (Yanagimachi, 1957; Yanagimachi et al., 2017). In other fish, including bitterling and zebrafish, further enhancement in sperm motility is observed as sperm approach the micropyle area (Suzuki, 1958; Yanagimachi et al., 2017). These studies suggest that functional equivalents of capacitation may exist across taxa.

      We interpret the observation that mouse sperm can locate and enter the micropyle as suggesting that underlying guidance mechanisms may be more broadly conserved across distant species than previously recognized. We have now elaborated on these points in the revised Discussion (lines 531-552), and we hope the motivation behind our study is now more clearly articulated.

      During fertilization in fish, the sperm enters the micropyle and subsequently, the egg, as it is simultaneously activated by exposure to water. During egg activation, the chorion lifts as it separates from the egg and fills with water. This mechanism prevents supernumerary sperm from entering the egg after the successfully fertilizing sperm has bound and fused. In this study, the authors show that mouse sperm enter the micropyle and accumulate in the intrachorionic space. Whether any sperm successfully entered the egg is not addressed, and the status of egg activation is not reported.

      We appreciate the reviewer’s detailed comments and the opportunity to elaborate on this important aspect for our cross-insemination assay. We interpret the reviewer’s reference to “sperm entering the egg” as pertaining to sperm adhesion to the oocyte plasma membrane followed by fusion with the egg cell, two separate steps regulated by different molecular players for sperm-egg plasma membrane adhesion (Bianchi et al., 2014; Fujihara et al., 2021; Herberg et al., 2018; Inoue et al., 2005) and for fusion. It is important to note that proteins mediating gamete fusion are still unidentified in fish and mammals (Bianchi and Wright, 2020; Deneke and Pauli, 2021).

      In our cross-species insemination experiments, zebrafish oocytes were maintained in Hank’s solution to limit spontaneous activation; however, as the reviewer correctly notes, activation likely occurred upon exposure to HTF. While this model does not recapitulate full fertilization events, it serves as a platform to explore whether mammalian sperm can detect (within the scope of our study) and respond (future studies) to putative evolutionarily conserved signals, such as those guiding fish sperm toward the micropyle.

      While investigating cross-species sperm–oocyte fusion was not within the scope of this study and would require a distinct set of experimental approaches, we believe this question is an important one. However, we do not expect our platform to be informative for evaluating sperm adhesion to the fish oolemma or for enabling cross-species gamete fusion. In our assays focused on sperm-micropyle interaction, Hoechst staining of the nuclei of sperm with transgenically tagged acrosomes revealed no evidence of sperm adhesion to or fusion with the fish egg membrane (Figure 4D). Also, molecular incompatibilities may further prevent this interaction: in zebrafish, the Ly6/uPAR family protein Bouncer is expressed exclusively in the egg and is necessary for sperm–egg membrane adhesion (Herberg et al., 2018). Recent studies in zebrafish and mice have shown that a conserved trimeric complex composed of Izumo1, Spaca6, and Tmem81 on the sperm surface is required for mediating adhesion to the oocyte membrane by interacting with the mammalian oocyte receptor Izumo1R (also known as JUNO) or the zebrafish oocyte receptor Bouncer (Deneke et al., 2024). One would hypothesize that for mouse sperm to adhere to the zebrafish egg membrane, the mouse Izumo1-Spaca6-Tmem81 complex would need to establish binding with Bouncer. To explore this possibility, we performed AlphaFold2-Multimer structural predictions and docking analyses to mimic an interaction between mouse Izumo1-Spaca6-Tmem81 and zebrafish Bouncer, using mouse Izumo1-Spaca6-Tmem81 and Juno or zebrafish Izumo1-Spaca6-Tmem81 and Bouncer as positive controls. We observed low binding affinity between zebrafish Bouncer and the mouse trimeric complex (Izumo1, Spaca6, and Tmem81), as indicated by low ipTM scores and high predicted aligned error (PAE) values. These findings suggest that the mouse complex is unlikely to form an interaction with Bouncer (now shown in Suppl. Figure 7). These predictions were consistent with our observations that no sperm were found adhering or fusing to the egg cell. We describe methods and results in the supplementary files (Supporting Info, lines 53-66) and in the Results section (lines 335-339).

      In Supplementary Videos 3-4, the egg shown has been activated for some time, as evident by the separation of yolk and cytoplasm, yet the chorion is only partially expanded (likely due to mouse IVF conditions). How multiple sperm were able to enter the micropyle but presumably not the egg is not addressed, yet this suggests that the zebrafish mechanism of blocking polyspermy (fertilization by multiple sperm) is not effective for mouse sperm or is rendered ineffective due to mouse IVF conditions. The authors do not discuss these observations in the context of either species' physiological process of fertilization, highlighting the lack of biological context in interpreting the results.

      Thank you for raising this important point. One model for mammalian gamete recognition at the zona supports the notion that mouse sperm can penetrate extracellular matrices as long as sperm can bind to them, and binding is dependent on the cleavage status of ZP2. Zonae surrounding unfertilized mouse eggs present uncleaved ZP2 and these zonae support sperm binding. After gamete fusion, the cortical granules release ovastacin, which cleaves ZP2 at the N-terminus; consequently, zonae presenting cleaved ZP2 no longer support sperm binding. This mechanism acts as a block to zona binding and prevents further crossing (Bhakta et al., 2019). Indeed, fertilized mouse eggs or 2-cell embryos surrounded by a zona containing uncleaved ZP2 support de novo sperm binding, and supernumerary sperm cross the zona and accumulate in the perivitelline space, unable to fuse with the fertilized oocyte plasma membrane or blastomere cells (Baibakov et al., 2012, 2007; Burkart et al., 2012; Gahlay et al., 2010). Thus, because mouse sperm could interact with the micropyle opening under our experimental conditions, we interpret these findings to suggest that once interaction occurs at the micropyle opening, mouse sperm are capable of crossing it, even under conditions where the micropyle may be detached from the oocyte due to oocyte activation. Therefore, our data indicate that mouse sperm may be able to bypass the mechanism by which zebrafish oocytes block the passage of multiple sperm through the micropyle, even after oocyte activation. This point has now been incorporated into the revised Discussion (lines 425-441).

      The authors further show that the zebrafish micropyle does not trigger the acrosome reaction in mouse sperm. Whether the acrosome reacts is not correlated with a sperm's ability to cross the micropyle opening, as both acrosome-intact and acrosome-reacted sperm were observed within the intrachorionic space. While the acrosome reaction is a key event during mammalian fertilization and is required for sperm to fertilize the egg, zebrafish sperm do not contain an acrosome. Thus, these results are particularly difficult to interpret biologically, bringing into question whether this observation has biological relevance or is a byproduct of egg activation/chorion lifting that indirectly draws sperm into the chorion.

      We thank the reviewer for raising this point and we appreciate the opportunity to elaborate on the biological relevance of this experiment. Our motivation to assess acrosome status in mouse sperm following entry into the zebrafish micropyle stemmed from the following biological considerations. In fish species such as the sturgeon, sperm present an acrosome and undergo acrosome exocytosis while passing through the micropyle, before gamete fusion (Alavi et al., 2012; Psenicka et al., 2010). By contrast, zebrafish sperm lack an acrosome, raising the hypothesis that the zebrafish micropyle may not be able to trigger acrosome exocytosis. However, this possibility has not been experimentally tested. We therefore considered it important to investigate whether passage through the zebrafish micropyle induces acrosome exocytosis in mouse sperm. We have revised the Discussion to better clarify the rationale behind the experiment as well as the interpretation of the findings (lines 504-518). As for the possibility that chorion lifting indirectly draws sperm into the chorion, we have not observed this phenomenon.

      The final experiments regarding CatSper1's role in mediating mouse sperm entry into the micropyle/chorion are not convincing. As no molecular interactions are described or perturbed, the reader cannot be sure whether the sperm's failure to enter is due to signaling via CatSper1 or whether the overall failure to undergo hyperactivation limits sperm motility such that the mutant sperm can no longer find and enter the zebrafish micropyle. Indeed, in Figure 5E, no CatSper1 mutant sperm are visible near any part of the egg, suggesting that overall motility is impaired, and this is not a phenotype specific to interactions with the micropyle.

      We appreciate the comment and the opportunity to further elaborate on the rationale of this experiment. While our data demonstrate a lack of CatSper1<sup>Null</sup> sperm accumulation within the micropyle and ICS, we appreciate that this may be interpreted as the result of general motility defects, rather than a specific failure in undergoing hyperactivation and micropyle recognition. CatSper1<sup>Null</sup> sperm are known to lack hyperactivated motility and exhibit a progressive loss of forward motility over time. After 90 minutes, only ~20% of CatSper1<sup>Null</sup> sperm remain motile, compared to over 70% in fertile sperm (Qi et al., 2007). Of note, under our IVF conditions, CatSper1<sup>Null</sup> sperm retained sufficient progressive motility during the first hour post-insemination to bind the zona pellucida with comparable efficiency to CatSper1<sup>Het</sup> controls. Based on prior reports indicating that 15–35% of sperm exhibit hyperactivation by 90 minutes (Goodson et al., 2011), and considering that we inseminated with 100,000 progressively motile sperm, we estimate that approximately 3,000 hyperactivated CatSper1<sup>Null</sup> sperm were present in the dish. Yet, none were observed near the micropyle canal, its opening, or within the ICS. This led us to conclude that failure to hyperactivate underlies the inability of CatSper1<sup>Null</sup> sperm to reach and traverse the micropyle. Also, we appreciate that identifying the molecular components of the micropyle would allow direct testing of whether the CatSper channel is activated in response to micropyle-associated signals. Indeed, no targeted perturbation of the molecular interactions regulating micropyle recognition was performed in this study, as the molecular identity of the zebrafish micropyle guidance cue remains unknown. Efforts to identify and characterize this factor are ongoing in our lab and lie outside the scope of the current work.
      Therefore, throughout the manuscript, we have clarified that it is the failure to undergo hyperactivation, rather than the absence of CatSper per se, that limits the ability of sperm to locate and traverse the micropyle. The rationale for the experiment, the interpretation of our findings, and relevant future directions have been further elaborated in the revised Abstract, Impact Statement and Discussion (lines 40-41; 46-47; 343-365; 376-379; 389-399; 470-486).
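
      As a rough check on the figure of approximately 3,000 sperm quoted above, the arithmetic can be sketched as follows. How the percentages are combined is an assumption made for illustration (the response does not spell out the calculation): the ~20% motility at 90 minutes is applied to the 100,000 inseminated sperm, and the 15–35% hyperactivation range is then applied to that motile fraction. The function name is hypothetical.

```python
# Illustrative arithmetic only. The way the percentages are combined is an
# assumption; the response does not state the exact calculation.

def estimate_hyperactivated(inseminated: int, motile_pct: int, hyperactivated_pct: int) -> int:
    """Sperm expected to be both motile and hyperactivated (integer percent arithmetic)."""
    motile = inseminated * motile_pct // 100
    return motile * hyperactivated_pct // 100

INSEMINATED = 100_000                 # progressively motile sperm added to the dish
MOTILE_AT_90_MIN_PCT = 20             # ~20% remain motile at 90 min (Qi et al., 2007)
HYPERACTIVATED_PCT_RANGE = (15, 35)   # 15-35% hyperactivated (Goodson et al., 2011)

low = estimate_hyperactivated(INSEMINATED, MOTILE_AT_90_MIN_PCT, HYPERACTIVATED_PCT_RANGE[0])
high = estimate_hyperactivated(INSEMINATED, MOTILE_AT_90_MIN_PCT, HYPERACTIVATED_PCT_RANGE[1])
print(low, high)
```

      Under these assumptions, the lower end of the published range reproduces the quoted figure, so the estimate appears to be a conservative one.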

      Reviewer #1 (Recommendations for the authors):

      Minor Comments

      (1) Figure numbering

      There appear to be inconsistencies in the figure references. For example, what is referred to as Figure 3F in the text is actually Figure 4F. Please review and correct all figure labels for accuracy.

      We thank the reviewer for pointing this out. We have carefully reviewed the manuscript and corrected all figure references throughout the text. Also, for better flow and coherence, we have moved the paragraph describing the videos to the end of the Results section titled "Mouse sperm recognize the micropylar region of fish oocytes." Previously, the callout of panels in Figure 3 was out of order (3A, 3B, 3E, 3C, 3D), and this reorganization also helps maintain logical progression through the figure panels.

      (2) Figure 5 terminology:

      The term "normal" sperm should be replaced with "CatSper heterozygous (Het)" sperm to avoid confusion and improve precision.

      We thank the reviewer for this helpful suggestion. We have revised the terminology in Figure 5 and throughout the manuscript, replacing “normal” sperm with “CatSper1 heterozygous (Het)” sperm.

      Reviewer #2 (Recommendations for the authors):

      In addition to my comments in the public review, I would encourage the authors to consider the following suggestions:

      The authors show that mouse sperm can find and enter the fish micropyle, and that this depends on the presence of MP. To better assess sperm binding to the micropyle region, the number of sperm binding to the micropyle vs. non-micropyle chorion should be clearly quantified, as well as the percentage of sperm that enter the micropyle compared to the total used for insemination. The authors state several times throughout the text that a "subpopulation" of mouse sperm finds and enters the micropyle, but it would be more precise and informative to give a percentage.

      We thank the reviewer for this suggestion. We now also report the number of sperm bound to other regions of the chorion (away; lines 231-233), as well as the percentage of sperm that entered the micropyle relative to the total number used for insemination (lines 276-279).

      To ensure that all sperm are inside the chorion, the egg should be removed from the insemination dish, washed thoroughly, and then the chorion should be torn open to definitively show that the sperm were indeed inside.

      We thank the reviewer for these excellent suggestions. To ensure that the sperm are inside the ICS (as now shown in Figures 4A, F, G, Supplementary Figure 6, and Supplementary Movies 3–5), the inseminated oocytes were thoroughly washed prior to imaging so that only sperm located inside the chorion were visualized (as described in the Methods, lines 646-648). In addition, to confirm the spatial localization of sperm within the ICS, we now include additional TEM images showing sperm in the ICS (Figure 4G, right panel). Also, we generated orthogonal views using ZEN Lite software (Zeiss, Germany) from a z-stack encompassing the full volume of the chorion, ICS, and oocyte (added to the supplementary materials as Supplementary Figure 6). These views display three focal planes: the surface of the WGA-stained chorion, the middle of the ICS, and the oocyte plasma membrane. Sperm nuclei stained with Hoechst are clearly visible below the chorion surface and above the oocyte plasma membrane, confirming their localization within the ICS. Additionally, in a separate set of experiments, as recommended by this reviewer, we mechanically disrupted the chorion and consistently detected sperm within the ICS. This procedure, however, was technically challenging: upon disruption, the chorion often collapsed onto the oocyte, and during the extraction process, sperm were sometimes displaced. As a result, it was not always possible to determine with complete confidence whether the sperm had originally been located inside or outside the chorion. However, we hope that the additional TEM and confocal images (Figure 4G and Supplementary Figure 6) offer further support for the localization of sperm within the ICS.

      I would further suggest that they examine the micropyle opening after the entry of multiple sperm, as well as the dynamics of egg activation during insemination with mouse sperm.

      Thank you. We now include one additional TEM image capturing the full structure of a micropyle that was traversed by multiple mouse sperm (shown in Figure 4G, left panel).

      At what point does the micropyle detach from the egg surface? Live imaging of this process with a confocal microscope would be very informative.

      During live imaging, the interval between placing the oocyte in the imaging dish, replacement of Hank’s solution with HTF and the addition of sperm, followed by the initiation of video acquisition, is approximately 2 to 3 min. By this time, the ICS is already apparent (Supplementary Video 2), although the micropyle appears to remain adherent to the egg cell. Partial detachment of the micropyle from the egg cell begins around 6–7 minutes after imaging starts and continues progressively over time. We provide time-lapse imaging frames to show the micropyle detachment under mouse IVF conditions (Supplementary Figure 5).

      Along the same lines, sperm should be doubly labeled with an acrosome-independent marker, i.e., a live DNA stain or MitoTracker. Then the authors could track if any sperm are actually able to enter the egg itself, which would be highly unlikely but an important detail to confirm.

      Thank you for pointing this out. In our assays designed to study sperm–micropyle interactions, Hoechst staining of nuclei in sperm with transgenically labeled acrosomes showed no indication of sperm adhesion to, or fusion with, the zebrafish egg cell (Figure 4D).

      Line 242, 282: The text should refer to Figure 4, not 3. Please make sure all figure references correspond to the correct figure and panel.

      Thank you for bringing this to our attention. We have carefully reviewed the manuscript and corrected the reference to Figure 4, along with all other figure and panel citations to ensure they accurately correspond to the correct content. Also, to improve the overall flow, we relocated the paragraph describing the videos to the end of the Results section titled "Mouse sperm recognize the micropylar region of fish oocytes". This change also helped correct the sequence of figure panel references, which were previously cited out of order (i.e., 3A, 3B, 3E, 3C, 3D).

      Line 244: The authors quantify sperm that are "away" from the micropyle, but this is not clearly defined. This should be given as a set radius or distance from the center (e.g., in microns). If the sperm are still motile, can this be accurately measured?

      We thank the reviewer for this valuable suggestion. We have now defined “away from the micropyle” as a distance greater than 160 µm from the center of the micropyle. This measurement was determined using confocal z-stack projections of fixed samples. These details have been added to the revised Methods section (lines 670-674).

      To strengthen the conclusion that the sperm chemoattractant is indeed conserved from fish to mammals, the authors could show that zebrafish sperm are also able to find/approach mouse eggs. Even more compelling would be to show the same is true for other species combinations. As it stands, the choice of comparing mouse and zebrafish does not seem scientifically motivated but rather due to their availability.

      We thank the reviewer for this important suggestion. To test whether zebrafish sperm are capable of binding to the mammalian zona pellucida, we conducted the suggested experiment: ovulated, cumulus-free mouse oocytes were placed in water and incubated with zebrafish sperm. We did not observe any zebrafish sperm bound to the mouse zona pellucida, consistent with the hypothesis that zebrafish sperm do not recognize or interact with mammalian zonae or ZP proteins. This has now been added to the Results (lines 183-187) and is shown in Supplementary Figure 2. We interpret these findings in light of the fact that, in cross-species insemination assays, reciprocity in sperm-egg interaction is not always observed. For example, while human sperm bind only to human zonae and not to mouse zonae, mouse sperm are able to bind both mouse and human zonae (Avella et al., 2014; Baibakov et al., 2012; Bedford, 1977). This asymmetry may reflect species-specific adaptations in sperm-egg recognition. We have now added this point to the revised Discussion to clarify the rationale and context of our approach (lines 416-423).

      As for the choice of experimental models, while we agree that testing additional species combinations would broaden the scope of the findings, the choice to compare mouse and zebrafish was not solely based on availability. Rather, it was motivated by the opportunity to examine sperm guidance across two evolutionarily distant vertebrates. This contrast allows us to look for potential conservation of structural or molecular cues involved in gamete interaction. Additionally, both zebrafish and mouse offer extensive gene editing, blotting and imaging reagents, which are particularly valuable should future studies aim to identify and functionally disrupt genes encoding micropyle-associated proteins and their putative orthologs in mammals.

      For the CatSper experiment, I would suggest that the authors repeat this experiment with another mouse sperm mutant that is known to have reduced/altered motility. With the current data, I do not believe the failure to find/enter the micropyle is necessarily CatSper-specific. Because we do not know what the sperm interacts with in the micropyle or what the MP interacts with on the sperm, the signaling pathway cannot be tested, making other controls necessary for these results to be meaningful.

      Thank you for highlighting this important point. A wide range of mouse models with sperm motility defects exhibit subfertility or infertility due to structural abnormalities in the axoneme or midpiece rigidity (Miyata et al., 2024). These defects often result in impaired progressive motility, failure to reach the zona pellucida, or inability to bind or penetrate it. In contrast, we could test and validate that CatSper1<sup>Null</sup> sperm display preserved early progressive motility but fail to transition into hyperactivated motility, making them particularly well suited for specifically assessing the role of hyperactivation in sperm navigation toward and entry into the micropyle. Taken together, these points, along with those discussed in our response to the public review, led us to conclude that the CatSper1<sup>Null</sup> model provides the most biologically relevant context currently available to assess the role of hyperactivation in guiding sperm to the micropyle.

      The authors could greatly strengthen the discussion by addressing the key points I raised in the public review, particularly in terms of interpreting these results in the context of each species' physiological mode of fertilization.

      We thank the reviewer for this important recommendation. We have carefully revised the Discussion to address the key points raised in the public review, particularly by framing our findings within the context of the distinct physiological modes of fertilization in each species, as indicated in our answers to the public review. We hope these additions have strengthened the manuscript as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study evaluates whether species can shift geographically, temporally, or both ways in response to climate change. It also teases out the relative importance of geographic context, temperature variability, and functional traits in predicting the shifts. The study system is large occurrence datasets for dragonflies and damselflies split between two time periods and two continents. Results indicate that more species exhibited both shifts than one or the other or neither, and that geographic context and temp variability were more influential than traits. The results have implications for future analyses (e.g. incorporating habitat availability) and for choosing winner and loser species under climate change. The methodology would be useful for other taxa and study regions with strong community/citizen science and extensive occurrence data.

      We thank Reviewer 1 for their time and expertise in reviewing our study. The suggestions are very helpful and will improve the quality of our manuscript.

      Strengths:

      This is an organized and well-written paper that builds on a popular topic and moves it forward. It has the right idea and approach, and the results are useful answers to the predictions and for conservation planning (i.e. identifying climate winners and losers). There is technical proficiency and analytical rigor driven by an understanding of the data and its limitations.

      We thank Reviewer 1 for this assessment.

      Weaknesses:

      (1) The habitat classifications (Table S3) are often wrong. "Both" is overused. In North America, for example, Anax junius, Cordulia shurtleffii, Epitheca cynosura, Erythemis simplicicollis, Libellula pulchella, Pachydiplax longipennis, Pantala flavescens, Perithemis tenera, Ischnura posita, the Lestes species, and several Enallagma species are not lotic breeding. These species rarely occur let alone successfully reproduce at lotic sites. Other species are arguably "both", like Rhionaeschna multicolor which is mostly lentic. Not saying this would have altered the conclusions, but it may have exacerbated the weak trait effects.

      We thank the reviewer for their expertise on this topic. We obtained these habitat classifications from field guides and trait databases, and reviewed our primary sources to clarify the trait classifications. We reclassified the species according to the expertise of this reviewer and performed our analysis again; please see details below.

      (2) The conservative spatial resolution (100 x 100 km) limits the analysis to wide-ranging and generalist species. There's no rationale given, so not sure if this was by design or necessity, but it limits the number of analyzable species and potentially changes the inference.

      It is really helpful to have the opportunity to contextualize study design decisions like this one, and we thank the reviewer for the query. Sampling intensity is always a meaningful issue in research conducted at this scale, and we addressed it head-on in this work.

      Very small quadrats covering massive geographical areas will be critically and increasingly afflicted by sampling weaknesses, as well as creating a potentially large problem with pseudoreplication. There is no simple solution to this problem. It would be possible to create interpolated predictions of species’ distributions using Species Distribution Models, Joint Species Distribution Models, or various kinds of Occupancy Models. None of these approaches then leads to analyses that rely on directly observed patterns. Instead, they are extrapolations, and those extrapolations typically fail when tested, although they have still been tested (for example, papers by Lee-Yaw demonstrate that it is rare for SDMs to predict things well; occupancy models often perform less well than SDMs and do not capture how things change over time - Briscoe et al. 2021, Global Change Biology). The result of employing such techniques would certainly be to make all conclusions speculative, rather than directly observable. 

      Rather than employing extrapolative models, we relied on transparent techniques that are used successfully in the core macroecology literature and that address spatial variation in sampling explicitly and simply. Moreover, we constructed extensive null models that show that range and phenology changes, respectively, are contrary to expectations that arise from sampling differences. 100 km quadrats make for a reasonable “middle-ground” in terms of the effects of sampling, and we added a reference to the methods section to clarify this (see details below).

      (3) The objective includes a prediction about generalists vs specialists (L99-103) yet there is no further mention of this dichotomy in the abstract, methods, results, or discussion.

      Thank you for pointing this out - it is an editing error that should have been resolved prior to submission. We replaced the terms specialist and generalist with specific predictions based on traits (see details below).

      (4) Key references were overlooked or dismissed, like in the new edition of Dragonflies & Damselflies model organisms book, especially chapters 24 and 27.

      We thank Reviewer 1 for making us aware of this excellent reference. We have reviewed the text and include it as a reference, in addition to other references recommended by Reviewer 1 and other reviewers (see details below).

      Reviewer #2 (Public review):

      Summary:

      This paper explores a highly interesting question regarding how species migration success relates to phenology shifts, and it finds a positive relationship. The findings are significant, and the strength of the evidence is solid. However, there are substantial issues with the writing, presentation, and analyses that need to be addressed. First, I disagree with the conclusion that species that don't migrate are "losers" - some species might not migrate simply because they have broad climatic niches and are less sensitive to climate change. Second, the results concerning species' southern range limits could provide valuable insights. These could be used to assess whether sampling bias has influenced the results. If species are truly migrating, we should observe northward shifts in their southern range limits. However, if this is an artifact of increased sampling over time, we would expect broader distributions both north and south. Finally, Figure 1 is missing panel B, which needs to be addressed.

      We thank Reviewer 2 for their time and expertise in reviewing our study.

      It is possible that some species with broad niches may not need to migrate, although in general failing to move with climate change is considered an indicator of “climate debt”, signaling that a species may be of concern for conservation (ex. Duchenne et al. 2021, Ecology Letters). We revised the discussion to acknowledge potential differences in outcomes (please see details below).

      We used null models to test whether our results regarding range shifts were robust, and if they varied due to increased sampling over time. We found that observed northern range limit shifts are not consistent with expectations derived from changes in sampling intensity (Figure S1, S2). 

      We thank Reviewer 2 for pointing out this error in Figure 1. This conceptual figure was a challenge to construct, as it must illustrate how phenology and range shifts can occur simultaneously or uniquely to enable a hypothetical odonate to track its thermal niche over time. In a previous version of the figure, we had a second panel and we failed to remove the reference to that panel when we simplified the figure. We have updated the figure and figure caption (please see details below).

      Reviewer #3 (Public review):

      Summary:

      In their article "Range geographies, not functional traits, explain convergent range and phenology shifts under climate change," the authors rigorously investigate the temporal shifts in odonate species and their potential predictors. Specifically, they examine whether species shift their geographic ranges poleward or alter their phenology to avoid extreme conditions. Leveraging opportunistic observations of European and North American odonates, they find that species showing significant range shifts also exhibited earlier phenological shifts. Considering a broad range of potential predictors, their results reveal that geographical factors, but not functional traits, are associated with these shifts.

      We thank Reviewer 3 for their expertise and the time they spent reviewing our study. Their suggestions are very helpful and will improve the quality of our manuscript.

      Strengths:

      The article addresses an important topic in ecology and conservation that is particularly timely in the face of reports of substantial insect declines in North America and Europe over the past decades. Through data integration the authors leverage the rich natural history record for odonates, broadening the taxonomic scope of analyses of temporal trends in phenology and distribution to this taxon. The combination of phenological and range shifts in one framework presents an elegant way to reconcile previous findings improving our understanding of the drivers of biodiversity loss.

      We thank Reviewer 3 for this assessment.

      Weaknesses:

      The introduction and discussion of the article would benefit from a stronger contextualization of recent studies on biological responses to climate change and the underpinning mechanism.

      The presentation of the results (particularly in figures) should be improved to address the integrative character of the work and help readers extract the main results. While the writing of the article is generally good, particularly the captions and results contain many inconsistencies and lack important detail. With the multitude of the relationships that were tested (the influence of traits) the article needs more coherence.

      We thank Reviewer 3 for these suggestions. We revised the introduction and discussion to better contextualize species’ responses to climate change and the mechanisms behind them (see details below). We carefully reviewed all figures and captions, and made changes to improve the clarity of the text and the presentation of results (see details below).

      Reviewer #1 (Recommendations for the authors):

      Comment:

      (1) Following weakness #1 in the public review, the authors should review the habitat classifications, consult with an odonatologist, and reclassify many species from Both to Lentic and redo the analysis.

      Thank you for pointing out this disagreement among expert habitat classifications that we cited and other literature. We reclassified species’ habitat preferences based on classifications by Hof et al., a source that was consistent with your suggestions, and identified additional species as Lentic that our other references had identified as Both. We performed our analysis with this new dataset and, as you suspected, our results did not change qualitatively: species habitat preferences did not predict their range shifts.

      Hof, Christian, Martin Brändle, and Roland Brandl. "Lentic odonates have larger and more northern ranges than lotic species." Journal of Biogeography 33.1 (2006): 63-70.

      Comment:

      (2) Following weakness #2, would it be worthwhile or interesting to analyze a smaller ranging group (e.g. cut the quad size in half, 50 x 50 km) to bring in more species and potentially change the inference? Or is the paper too tightly constructed to allow this, even as a secondary piece?

      Thank you for this comment, as it highlights an important consideration for macroecological analyses, and the importance of balancing multiple factors for determining quadrat size. Issues exist with identifying drivers of range boundaries among species with narrow ranges when they are analyzed separately from wide-ranging species, and examining larger quadrats can actually help clarify drivers (Szabo, Algar, and Kerr 2009). The smaller quadrats are, the higher the likelihood that the species is actually there but was never observed, or that the quadrat only covers unsuitable habitat and the species is absent from the entire (or almost entire) quadrat. Too many absences violate model assumptions and create noise that makes it difficult to identify drivers of species’ range and phenology shifts.

      Moreover, we constructed extensive null models that show that range and phenology changes, respectively, are contrary to expectations that arise from sampling difference. 100km quadrats make for a reasonable “middle-ground”, and we have included a brief explanation of this in the text: “We assigned species presences to 100×100 km quadrats, a scale that is large enough to maintain adequate sampling intensity but still relevant to conservation and policy (Soroye et al., 2020), to identify the best sampled species.”  (Lines 170-172).

      Szabo, Nora D., Adam C. Algar, and Jeremy T. Kerr. "Reconciling topographic and climatic effects on widespread and range‐restricted species richness." Global Ecology and Biogeography 18.6 (2009): 735-744.
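
      The assignment of species presences to 100 × 100 km quadrats described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes occurrence records have already been projected to an equal-area coordinate system in kilometres, and the function names, record layout, and coordinates are all hypothetical.

```python
import math
from collections import defaultdict

QUADRAT_SIZE_KM = 100  # quadrat edge length stated in the response

def quadrat_id(x_km: float, y_km: float, size_km: float = QUADRAT_SIZE_KM) -> tuple[int, int]:
    """Return the (column, row) index of the quadrat containing a projected point."""
    return (math.floor(x_km / size_km), math.floor(y_km / size_km))

def richness_per_quadrat(records):
    """records: iterable of (species, x_km, y_km); returns {quadrat: species count}."""
    species_by_quadrat = defaultdict(set)
    for species, x_km, y_km in records:
        species_by_quadrat[quadrat_id(x_km, y_km)].add(species)
    return {quadrat: len(names) for quadrat, names in species_by_quadrat.items()}

# Hypothetical records; coordinates are assumed to be in a projected system (km).
records = [
    ("Anax junius", 120.0, 340.0),
    ("Anax junius", 180.0, 399.9),        # same quadrat, same species: counted once
    ("Pantala flavescens", 150.0, 310.0),
    ("Ischnura posita", -20.0, 50.0),     # negative coordinates floor to index -1
]
richness = richness_per_quadrat(records)
print(richness)
```

      Halving `QUADRAT_SIZE_KM` to 50 would quadruple the number of cells covering the same area, which illustrates the trade-off discussed above: finer cells carry fewer records each, so unobserved presences become more likely.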

      Comment:

      (3) Following weakness #3, are specialists the ones that "failed to shift" (L18)? If so please specify. The prediction about generalists vs specialists needs to be removed or incorporated in other parts of the paper.

      Thank you for pointing this out. We intended to suggest that species with more generalist habitat requirements might be better able to shift, but ultimately found that traits did not predict species’ shifts. We corrected our prediction regarding habitat generalists as follows: “We predicted that species able to use both lentic and lotic habitats would shift their phenologies and geographies more than those able to use just one habitat type, as generalists outperform specialists as climate and land uses change (Ball-Damerow et al., 2015, 2014; Hassall and Thompson, 2008; Powney et al., 2015; Rapacciuolo et al., 2017).” (Lines 128-132).

      Comment:

      (4) Following weakness #4, cite Pinkert et al at lines 70-73 and Rocha-Ortega et al at lines 73-77 along with https://doi.org/10.1098/rspb.2019.2645. Add Sandall et al https://doi.org/10.1111/jbi.14457 to L69 references.

      Thank you for the excellent reference suggestions, we have added them as suggested (Lines 80, 86, 77).

      Comment:

      Other comments/suggestions:

      (1) Title: consider adding temp variability 'Range geography and temperature variability, not functional traits,...'.

      Thank you for this suggestion, we have added temperature variability to the title: “Range geography and temperature variability explain cross-continental convergence in range and phenology shifts in a model insect taxon”.

      Comment:

      (2) L125: is (northern) Mexico included in North America?

      Yes, we did include observations from Northern Mexico, and have specified this in the text: “We retained ~1,100,000 records from Canada, the United States, and Northern Mexico, comprising 76 species (Figure 2).” (Lines 174-176).

      Comment:

      (3) L128: I'd label this section 'Temperature variability' rather than 'Climate data'.

      Thank you, we agree that this is a more appropriate title for this section, and have replaced ‘Climate data’ with ‘Temperature variability’ (Line 185).

      Comment:

      (4) Table 2: why are there no estimates for the traits?

      We apologise; this information should have been included in the main body of the manuscript, but was only explained in the Table 2 caption. We have added the following explanation: “Non-significant variables, specifically all functional traits, were excluded from the final models.” (Lines 312-323).

      Comment:

      (5) Figure 2: need to identify the A-D panels.

      We apologise for this error and have clarified the differences between panels in the figure caption:

      “Figure 2: Richness of 76 odonate species sampled in North America and Europe in the historic period (1980-2002; panels A and C) and the recent period (2008-2018; panels B and D). Species richness per 100 × 100 km quadrat is shown in panels A and B, while panels C and D show species richness per 200 × 200 km quadrat. Dark red indicates high species richness, while light pink indicates low species richness.” (Lines 1002-1006).

      Comment:

      (6) L163-173: I am not familiar with this analysis but it sounds interesting and promising, I am not sure if this can be clarified further. Why the -25 to 25, and -30 to 30, doesn't the -35 to 35 cover these? And what is meant by "include only phenology shifts that could be biologically meaningful", that larger shifts would not be meaningful or tied to climate change?

      We used different cutoffs for phenology shifts to inspect for outliers that were likely to be errors, potentially due to insufficient sampling to calculate phenology. We clarified in the text as follows:

      “We retained emergence estimates between March 1st and September 1st, as well as species and quadrats that showed a difference in emergence phenology of -25 to 25 days, -30 to 30 days, or -35 to 35 days between both time periods, to include only phenology shifts that could be biologically meaningful to environmental climate change (i.e. exclude errors).” (Lines 169-173).
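
      The filtering rule quoted above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: emergence is represented as day of year, the window bounds and cutoffs are treated as inclusive, a non-leap reference year is used to convert dates, and the function names and example values are hypothetical.

```python
from datetime import date

CUTOFFS_DAYS = (25, 30, 35)  # maximum absolute shift retained under each cutoff

def day_of_year(month: int, day: int, year: int = 2001) -> int:
    """Day of year for a calendar date (2001 is an arbitrary non-leap year)."""
    return date(year, month, day).timetuple().tm_yday

EARLIEST = day_of_year(3, 1)  # March 1st
LATEST = day_of_year(9, 1)    # September 1st

def retain(historic_doy: int, recent_doy: int, cutoff: int) -> bool:
    """Keep a species-quadrat pair if both estimates fall in the seasonal
    window and the shift between periods is within +/- cutoff days."""
    in_window = all(EARLIEST <= d <= LATEST for d in (historic_doy, recent_doy))
    return in_window and abs(recent_doy - historic_doy) <= cutoff

# Hypothetical (historic, recent) emergence estimates as day of year.
pairs = [(150, 140), (150, 122), (150, 118), (150, 110), (50, 45)]
counts = {c: sum(retain(h, r, c) for h, r in pairs) for c in CUTOFFS_DAYS}
print(counts)
```

      Running the same data through all three cutoffs shows how many estimates each threshold excludes, which is one way to check whether conclusions are sensitive to the choice of cutoff.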

      Comment:

      (7) L193-200: I agree but would make a distinction between ecological vs functional traits, as other studies view geographic traits as ecological manifestations of functional biology, e.g. https://doi.org/10.1016/j.biocon.2019.07.001 and https://doi.org/10.1016/j.biocon.2023.110098.

      Thank you for this suggestion, and for making us aware of the thinking around range geographies as ecological traits. We have specified throughout the manuscript that the ‘traits’ we are considering are ‘functional traits’, changed the methods subsection title to “Range geographies and functional traits” (Line 252), and added a brief discussion of ecological traits: “Geographic range and associated climatic characteristics are often considered ecological traits, as they are consequences of functional traits and their interactions with geographic features (Bried and Rocha-Ortega, 2023; Chichorro et al., 2019).” (Lines 256-259).

      Comment:

      (8) L203: What's the rationale for egg-laying habitat as "biologically relevant to spatial and temporal responses to climate change"? That one's not as obvious as the others and needs a sentence more. Also, I am wondering why other traits were not considered here, like color lightness and voltinism. And why not wing size instead of body size, or better yet the two combined (wing loading) as a proxy for dispersal ability?

      We agree that our rationale for using this trait should be better explained, and we have included the following explanation: “Egg laying habitat was assigned according to whether species use exophytic egg-laying habitat (i.e. eggs laid in water or on land, relatively larger in number), or endophytic egg-laying habitat (i.e. eggs laid inside plants, usually fewer in number); species using exophytic habitats are associated with greater northward range limit shifts (Angert et al., 2011).” (Lines 271-275).

      We considered traits that have been found to be important for range and phenology shifts among odonates, as well as being key traits for expectations for species responses to climate change. Flight duration and body size are correlated with dispersal ability (Powney et al. 2015). Body size is also correlated with competitive ability (Powney et al. 2015), potentially making it an important predictor of a species’ ability to establish and maintain populations in expanding range areas. Traits correlated with range shifts also include breeding habitat type (Powney et al. 2015; Bowler et al. 2021) and egg laying habitat (Angert et al. 2011). Ideally, we would have used dispersal data from mark/release/recapture studies, but it was not available for many of the species included in this study. After finding that none of the functional traits we included were related to range shifts, there was no reason to believe that a further investigation of traits would be meaningful.

      Angert AL, Crozier LG, Rissler LJ, Gilman SE, Tewksbury JJ, Chunco AJ. 2011. Do species’ traits predict recent shifts at expanding range edges? Ecology Letters 14:677–689. doi:10.1111/j.1461-0248.2011.01620.x

      Bowler DE, Eichenberg D, Conze K-J, Suhling F, Baumann K, Benken T, Bönsel A, Bittner T, Drews A, Günther A, Isaac NJB, Petzold F, Seyring M, Spengler T, Trockur B, Willigalla C, Bruelheide H, Jansen F, Bonn A. 2021. Winners and losers over 35 years of dragonfly and damselfly distributional change in Germany. Diversity and Distributions 27:1353–1366. doi:10.1111/ddi.13274

      Powney GD, Cham SSA, Smallshire D, Isaac NJB. 2015. Trait correlates of distribution trends in the Odonata of Britain and Ireland. PeerJ 3:e1410. doi:10.7717/peerj.1410

      Comment:

      (9) L210: I count at least 5 migratory species in table S3, so although maybe not enough to analyze it's misleading to say "nearly all" were non-migratory, revise to "most" or "vast majority".

      Thank you for pointing this out, we have made the suggested correction (Line 277).

      Comment:

      (10) L252-254: save this for the Discussion and write a more generalized statement for results to avoid citations in the results.

      Thank you for this suggestion, we have moved this to the discussion (Lines 517-527).

      Comment:

      (11) Figures S5 & S6: these are pretty important, I'd consider elevating them to the main document as one figure with two panels.

      Thank you for this suggestion, we agree these figures should be elevated to the main text, and have made them into a panel figure (Figure 4).

      Comment:

      (12) L305-307: great point and recommendation!

      Thank you very much for this positive feedback!

      Comment:

      (13) L335-336: another place to cite https://doi.org/10.1098/rspb.2019.2645 which includes a thermal sensitivity index and would add an odonate citation behind the statement.

      Thank you for this excellent suggestion, we have added this citation (Rocha-Ortega et al., 2020; Line 480).

      Comment:

      (14) L352-353: again see also https://doi.org/10.1098/rspb.2019.2645.

      Thank you for highlighting this reference, we have added it to Line 505 as suggested.

      Comment:

      (15) L355: revise "populations that coexist" to "species that co-occur" (big difference between population and species levels and between coexistence and co-occurrence).

      Thank you very much for pointing this out, we have made the suggested change (Line 507).

      Comment:

      (16) L359-365: are the winners and losers depicted in Figures S5 & S6? If so reference the figure (which I suggest combining and promoting to the main text), if not create a table listing the analyzed species and their winner/loser status.

      We agree that this is an excellent place to bring up Figures S5 and S6 from the supplemental. We have moved them to the main document as one figure and referenced it at line 510.

      Reviewer #2 (Recommendations for the authors):

      Comment:

      (1) Line 53-55: The claim that "These relationships generalize poorly taxonomically and geographically" is valid, but the study only tests Odonata on two continents.

      Thank you for this comment – the word ‘generalize’ may imply that our study tries to find a general pattern across many groups. We have changed the language to: “However, these relationships are inconsistent across taxa and regions, and cross-continental tests have not been attempted (Angert et al., 2011; Buckley and Kingsolver, 2012; Estrada et al., 2016; MacLean and Beissinger, 2017).” (Lines 57-59).

      Comment:

      (2) Line 58-59: Is this statement only true for Odonata? It does not seem to hold for plants, for example.

      Thank you for this comment – this statement references a meta-analysis of multiple animal and plant taxa, but the evidence for the importance of range location comes from animal taxa. We have specified that we are referring to animal species to clarify (Line 60).

      Comment:

      (3) Line 87-91: This section is difficult to understand and needs clarification.

      We have clarified this section as follows: “While warm-adapted species with more equatorial distributions could expand their ranges poleward following warming (Devictor et al., 2008), they could also increase in abundance in this new range area relative to species that historically occupied those areas and are less heat-tolerant (Powney et al., 2015).” (Lines 95-121).

      Comment:

      (4) Line 99-100: Please define "generalist" and "specialist" more clearly here (e.g., based on climate niche?).

      Thank you for pointing this out, we intended to suggest that species with more generalist habitat requirements might be better able to shift, but ultimately found that traits did not predict species’ shifts. We corrected our prediction regarding habitat generalists as follows: “We predicted that species able to use both lentic and lotic habitats would shift their phenologies and geographies more than those able to use just one habitat type, as generalists outperform specialists as climate and land uses change (Ball-Damerow et al., 2015, 2014; Hassall and Thompson, 2008; Powney et al., 2015; Rapacciuolo et al., 2017).” (Lines 128-132).

      Comment:

      (5) Line 122: Replace the English letter "X" in "100x100 km" with the correct mathematical symbol.

      We have made the suggested replacement throughout the manuscript.

      Comment:

      (6) Line 148: To address sampling effects, you could check the paper: https://onlinelibrary.wiley.com/doi/full/10.1111/gcb.15524. Additionally, maximum and minimum values are sensitive to extreme data points, so using 95% percentiles might be more robust.

      Thank you for sharing this paper, as it offers a valuable perspective on the study of species’ ranges. While our dataset is substantially composed of observations from adult sampling protocols, unlike the suggested paper which compares adults and juveniles, this is an interesting alternative approach.

      For our purposes it is meaningful to include outliers, as otherwise we may have missed individuals at the leading edge of range expansions. Our intent here was to detect range limits, as opposed to finding the central tendency of species distributions. This approach is widely accepted in the macroecology literature (e.g. Devictor et al., 2012, 2008; Kerr et al., 2015).

      We have included the following discussion of our approach in the methods section:

      “We followed widely accepted methods to determine species range boundaries (Devictor et al., 2012, 2008; Kerr et al., 2015), although other methods exist that are appropriate for different data types and research questions i.e. (Ni and Vellend, 2021). We assigned species presences to 100×100 km quadrats, a scale that is large enough to maintain adequate sampling intensity but still relevant to conservation and policy (Soroye et al., 2020), to identify the best sampled species.” (Lines 168-173).
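
      A minimal sketch of the quadrat assignment described above, written in Python for illustration rather than taken from the authors' R workflow. It assumes occurrence coordinates have already been projected to an equal-area system with units of kilometres; the function name and the example coordinates are hypothetical.

```python
import math

QUADRAT_SIZE_KM = 100.0  # side length of the square quadrats used in the main analysis


def quadrat_id(x_km, y_km, size_km=QUADRAT_SIZE_KM):
    """Return the (column, row) index of the square quadrat containing a point.

    Assumes x_km and y_km are projected coordinates in kilometres.
    math.floor keeps points with negative coordinates in the correct cell.
    """
    return (math.floor(x_km / size_km), math.floor(y_km / size_km))


# Hypothetical projected occurrence record.
print(quadrat_id(250.0, 99.9))                 # 100 x 100 km grid
print(quadrat_id(250.0, 99.9, size_km=200.0))  # coarser 200 x 200 km grid
```

      The same function with `size_km=200.0` gives the coarser grid shown in panes C and D of Figure 2.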

      Kerr JT, Pindar A, Galpern P, Packer L, Potts SG, Roberts SM, Rasmont P, Schweiger O, Colla SR, Richardson LL, Wagner DL, Gall LF, Sikes DS, Pantoja A. 2015. Climate change impacts on bumblebees converge across continents. Science 349:177–180. doi:10.1126/science.aaa7031

      Soroye P, Newbold T, Kerr J. 2020. Climate change contributes to widespread declines among bumble bees across continents. Science 367:685–688. doi:10.1126/science.aax8591

      Devictor V, Julliard R, Couvet D, Jiguet F. 2008. Birds are tracking climate warming, but not fast enough. Proceedings of the Royal Society B: Biological Sciences 275:2743–2748. doi:10.1098/rspb.2008.0878

      Devictor V, van Swaay C, Brereton T, Brotons L, Chamberlain D, Heliölä J, Herrando S, Julliard R, Kuussaari M, Lindström Å, Reif J, Roy DB, Schweiger O, Settele J, Stefanescu C, Van Strien A, Van Turnhout C, Vermouzek Z, WallisDeVries M, Wynhoff I, Jiguet F. 2012. Differences in the climatic debts of birds and butterflies at a continental scale. Nature Clim Change 2:121–124. doi:10.1038/nclimate1347

      Comment:

      (7) Line 195: The species' climate niche should also be considered a product of evolution.

      Thank you for this suggestion. To address this comment and a comment from another reviewer, we changed the text to the following: “Geographic range and associated climatic characteristics are often considered ecological traits, as they are consequences of functional traits and their interactions with geographic features (Bried and Rocha-Ortega, 2023; Chichorro et al., 2019).” (Lines 256-259).

      Comment:

      (8) Line 244: This speculative statement belongs in the Discussion section.

      Thank you for this suggestion, we have moved this statement to the discussion (Lines 451-453).

      Comment:

      (9) Line 252-254: The projection of Coenagrion mercuriale's range contraction is not part of your results and should be clarified or removed.

      Following this suggestion and a similar suggestion from another reviewer, we moved this text to the discussion (Line 517-527).

      Comment:

      (10) Line 314-316: If the species can tolerate warmer temperatures better, why would they migrate?

      We apologize for the confusion, and we have reworded the section as follows: “Emerging mean conditions in areas adjacent to the ranges of southern species may offer opportunities for range expansions of these relative climate specialists, which can then tolerate climate warming in areas of range expansion better than more cool-adapted historical occupants (Day et al., 2018).” (Lines 445-448).

      Comment:

      (11) Line 334-335: Species' tolerance to temperature likely depends on their traits, which were not tested in this study. This should be noted.

      We agree, and we have removed the wording “rather than traits” from this sentence (Line 479).

      Reviewer #3 (Recommendations for the authors):

      Comment:

      (1) Title: The title is too general not specifying that your results are on odonates only, but also stressing the implicit role of climate change to a degree the tests do not support.

      Following this comment and a suggestion from another reviewer we changed the title to the following: “Range geography and temperature variability explain cross-continental convergence in range and phenology shifts in a model insect taxon”. We wanted to emphasize our use of Odonates as a model taxon that we used to ask broad questions, while being more specific about the climatic variable that we examined (temperature variability).

      Comment:

      (2) L32: consider including Novella-Fernandez et al. 2023 (NatCommun) which addresses this topic in Odonates.

      Thank you for suggesting this very interesting paper, we have added it as a citation (Line 31-32).

      Comment:

      (3) L35: consider including Grewe et al. 2013 (GEB) and Engelhardt et al. 2022(GCB).

      Thank you for these excellent suggestions, we have added the citations (Line 35).

      Comment:

      (4) L47: rather write 'result from' instead of 'driven by'.

      We agree this is a better characterization and have corrected the wording (Line 48-49).

      Comment:

      (5) L49-52: There has been a recent study on this topic for birds (Neate-Clegg et al., 2024 NEE). However, specifying this to insects would not make it less relevant. This review for odonates might be helpful in this regard (Pinkert et al. 2022, Chapter: "Odonata as focal taxa for biological responses to climate change" IN Dragonflies & Damselflies: Córdoba-Aguilar et al. (2022) Model Organisms for Ecological and Evolutionary Research).

      Thank you for again suggesting excellent references, we have added them to line 52-53, as well as adding the Pinkert citation to lines 61 and 82.

      Comment:

      (6) L53-66: Combine into one paragraph about drivers. With traits first and the environment second. The natural land cover perspective may be too complicated in this context. Consider focusing on generalities of the impact of changes within species' ranges.

      As suggested we have combined these into one paragraph about drivers (Line 59).

      Comment:

      (7) L67-69: The book from before would be a much stronger reference for this claim. Kalkman et al. (2018) do not address the emphasis of global change research in insects on bees and butterflies. Also, I would highlight that most of the current work is at a national scale, rather than cross-continental.

      Thank you for this suggestion, we have added the suggested reference and included that “…recently assembled databases of odonate observations provide a rare opportunity to investigate species’ spatiotemporal responses at larger taxonomic and spatial scales, particularly as most work has been done at national scales.” (Lines 75-77).

      Comment:

      (8) L68: consider rephrasing this part to '..provide a rare opportunity to investigate spatiotemporal biotic responses at larger taxonomic and spatial scales'

      We appreciate this suggestion and really like the wording. We have changed the phrase to read as follows: “While global change research on insects often emphasizes butterfly and bee taxa, recently assembled databases of odonate observations provide a rare opportunity to investigate species’ spatiotemporal responses at larger taxonomic and spatial scales, particularly as most work has been done at national scales.” (Lines 74-77).

      Comment:

      (9) L69: This characteristic is not unique to odonates and would hamper drawing general conclusions. Honestly, I think the detailed and comprehensive data on them is the selling point.

      Thank you for this suggestion, we have edited the sentence to emphasize their use as an indicator species: “Due to their use of aquatic and terrestrial habitat across different life stages, dragonflies and damselflies are also considered indicator species for both terrestrial and aquatic insect responses to changing climates (Hassall, 2015; Pinkert et al., 2022; Šigutová et al., 2025), giving the study of these species broad relevance for conservation.” (Lines 78-81)

      Comment:

      (10) L73: Indicator for what? The first part of the sentence would suggest lesser surrogacy for responses of other taxa. Reconsider this statement. They are well-established indicators for habitat intactness and freshwater biodiversity. Darwell et al. suggested their diversity can serve as a surrogate for the diversity of both terrestrial and aquatic taxa.

      Thank you for this suggestion, we have edited the sentence to emphasize their use as an indicator species: “Due to their use of aquatic and terrestrial habitat across different life stages, dragonflies and damselflies are also considered indicator species for both terrestrial and aquatic insect responses to changing climates (Hassall, 2015; Pinkert et al., 2022; Šigutová et al., 2025), giving the study of these species broad relevance for conservation.” (Lines 78-81)

      Comment:

      (11) L76: Fritz et al., is a study on mammals, not odonates.

      Thank you for pointing out this error, the reference has been removed (Line 84-85).

      Comment:

      (12) L84: Lotic habitats are generally better connected than lentic ones. Lentic species are considered to have a greater propensity for dispersal DUE to the lower inherent spatiotemporal stability (implying lower connectivity) compared to lotic habitats.

      Thank you for your comment, we have rewritten this section as follows: “For example, differences in habitat connectivity and dispersal ability may constrain range shifts for lentic species (those species that breed in slow-moving water like lakes or ponds) and lotic species (those living in fast-moving water) in different ways (Kalkman et al., 2018). More southerly lentic species may expand their range boundaries more than lotic species, as species accustomed to ephemeral lentic habitats are better dispersers (Grewe et al., 2013), yet lotic species have also been found to expand their ranges more often than lentic species, potentially due to the loss of lentic habitat in some areas (Bowler et al., 2021).” (Lines 88-95).

      Comment:

      (13) L90: I would be cautious with this interpretation. If only part of the range is considered (here a country in the northern Hemisphere) southern species are moving more of their range into and northern species more of their range out of the study area in response to warming (implying northward shifts).

      We have clarified this section as follows: “While warm-adapted species with more equatorial distributions could expand their ranges poleward following warming (Devictor et al., 2008), they could also increase in abundance in this new range area relative to species that historically occupied those areas and are less heat-tolerant (Powney et al., 2015).” (Lines 95-121)

      Comment:

      (14) L117: Odonata Central contains many county centroids as occurrence records. These could be an issue for your use case. I may have overlooked the steps you took to address this, but I think this requires at least more detail and possibly further removal/checks using for instance CoordinateCleaner. The functions implemented in this package allow you to filter records based on political units to avoid exactly this source of error.

      Thank you for this suggestion, we weren’t aware of this issue with Odonata Central. We used the CoordinateCleaner tool in R to filter all odonate records that we used in our analyses. Less than 1% of observations in our dataset were identified as having potential problems by the tool, so we would not expect this to affect our inferences. However, in future we will employ this tool when using similar datasets.

      Comment:

      (15) L119: Please add a brief explanation of why this was necessary. I am ok with something along the lines in the supplement.

      We moved this information from the supplemental to the main text as follows: “If a species was found on both continents, we only retained observations from the continent that was the most densely sampled. If we merged data for one species found on both continents, we could not perform a cross-continental comparison. However, if the same species on different continents was treated as different species, this would lead to uninterpretable outcomes (and the creation of pseudo-replication) in the context of phylogenetic analyses. In addition, species found on both continents did not have sufficient data to meet criteria for the phenology analysis.” (Lines 161-167).
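
      The retention rule for species found on both continents can be sketched as follows. This is an illustrative Python sketch, not the authors' R code, and it assumes that "most densely sampled" means the continent with the most records for that species; the record layout, the function name, and the tie-breaking rule (first continent encountered wins) are hypothetical.

```python
from collections import Counter

# Hypothetical (species, continent) observation records.
observations = [
    ("species_a", "Europe"),
    ("species_a", "Europe"),
    ("species_a", "North America"),
    ("species_b", "North America"),
]


def keep_densest_continent(observations):
    """For each species, keep only records from its best-sampled continent."""
    counts = Counter(observations)  # number of records per (species, continent)
    best = {}
    for (species, continent), n in counts.items():
        # Replace the current choice only if this continent has strictly more records.
        if species not in best or n > counts[(species, best[species])]:
            best[species] = continent
    return [obs for obs in observations if best[obs[0]] == obs[1]]


filtered = keep_densest_continent(observations)
print(filtered)
```

      After this step every species contributes records from a single continent, which is what allows the cross-continental comparison without treating one species as two.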

      Comment:

      (16) L132: The letters 'X' and 'x' are not multiplication symbols! Please change to the math symbol (×) everywhere.

      Thank you for pointing out this error, we have made the correction throughout the manuscript.

      Comment:

      (17) L133: add 'main' before 'flight period'

      Thank you for this suggestion, we have made the change. (Line 190)

      Comment:

      (18) L135: I suggest using the coefficient of variation, as it is controlled for the mean. Otherwise, what you see is partly the signature of temperature and not of its variation. For me, it's very difficult to understand what this variation of the variation means and at least needs more explanation.

      Thank you very much for this suggestion, we agree that using the coefficient of variation is a better fit for the question that we’re asking. We re-ran our analyses with the coefficient of variation as the measure of climate variability: all the results reported in the manuscript are now updated for that analysis (Line 377, Table 2), and we have also updated the methods section (Line 191). The results are qualitatively the same as in our previous analysis, but we agree that they are now easier to interpret.
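
      To illustrate why the coefficient of variation separates variability from the mean, the short Python sketch below compares two hypothetical quadrats whose temperatures have the same standard deviation but different means. The values and names are made up, and the sketch does not reproduce the authors' R calculation; in particular, which temperature scale they used is not stated here, and the coefficient of variation depends on that choice.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / mean


# Hypothetical temperatures for two quadrats with equal spread but different means.
cool_quadrat = [8.0, 10.0, 12.0]
warm_quadrat = [18.0, 20.0, 22.0]

for name, temps in (("cool", cool_quadrat), ("warm", warm_quadrat)):
    print(name, statistics.stdev(temps), coefficient_of_variation(temps))
```

      Both quadrats have a standard deviation of 2, but the coefficient of variation differs between them because it scales the spread by the mean.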

      Comment:

      (19) L155: Please adequately reference all R packages (state the name, and a reference for them including the authors' names, title, and version).

      Thank you for pointing out this omission, we have added reference information for the glm function in base R (Line 298) and ensured all other packages are properly referenced.

      Comment:

      (20) L207: Mention the literature sources here (again).

      We agree that they should be referenced here again, and we have done so (Lines 267-268).

      Comment:

      (21) L209: You could use the number of grid cells as a proxy for range size.

      Following this excellent suggestion, we re-analysed our data using range size, calculated as the number of quadrats occupied by a species in the historical time period, as a predictor. Range size was not significant in our models, but we believe this is the best way to analyze our data, and so have updated our methods (Lines 261-263) and results (Lines 375-378).
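
      Range size as defined above is a count of distinct occupied quadrats per species within the historic period (1980-2002, per the Figure 2 caption). A minimal Python sketch under those assumptions follows; the record layout and identifiers are hypothetical and the authors' own implementation in R is not shown.

```python
from collections import defaultdict

# Hypothetical (species, quadrat_id, year) occurrence records.
records = [
    ("species_a", "q1", 1985),
    ("species_a", "q1", 1990),  # repeat observation in the same quadrat
    ("species_a", "q2", 1999),
    ("species_a", "q3", 2012),  # recent period, so excluded
    ("species_b", "q7", 2001),
]

HISTORIC_PERIOD = (1980, 2002)  # inclusive bounds


def historic_range_size(records, period=HISTORIC_PERIOD):
    """Count distinct quadrats occupied by each species within the period."""
    start, end = period
    occupied = defaultdict(set)
    for species, quadrat, year in records:
        if start <= year <= end:
            occupied[species].add(quadrat)
    return {species: len(quadrats) for species, quadrats in occupied.items()}


range_size = historic_range_size(records)
print(range_size)
```

      Using a set per species means repeat observations in one quadrat do not inflate the range size.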

      Comment:

      (22) L218: It would be preferable to say 'species-level' instead of 'by-species'.

      Thank you for this suggestion, we agree that this is clearer and made the change (Line 298).

      Comment:

      (23) L219-220: this is unclear. Please rephrase.

      We have clarified as follows: “We used both species-level frequentist (GLM; glm function in R) and Bayesian (Markov Chain Monte Carlo generalized linear mixed model, MCMCglmm; Hadfield, 2010) models to improve the robustness of the results.” (Lines 298-300).

      Comment:

      (24) L224: At least for Europe there is a molecular phylogeny available, which you should preferably use (Pinkert et al. 2018, Ecography). Otherwise, I am ok with using what is available

      We apologize that the nature of the phylogeny that we used was not clear; the phylogeny that we used was built similarly to that in Pinkert et al. 2018, Ecography. It created a molecular phylogeny with a morphological/taxonomic tree as the backbone tree, so that species could only move within their named genera or families. We clarified this in the manuscript as follows:

      “We used the molecular phylogenetic tree published by the Odonate Phenotypic Database (Waller et al., 2019), which used a morphological and taxonomic phylogeny as the backbone tree, allowing species to move within their named genera or families according to molecular evidence (Waller and Svensson, 2017).” (Lines 302-305).

      Comment:

      (25) L233: You said so earlier (1st sentence of this paragraph).

      Thank you for pointing this out, we removed the repetitive sentence (Line 323).

      Comment:

      (26) L236-238: To me, it makes more sense to test this prior to fitting the phylogenetic models.

      MCMC-GLMM is considerably less familiar to most researchers than general linear models or their derivatives/descendants, such as PGLS. We report models both with and without phylogenetic relationships included for the sake of transparency, and we are happy to acknowledge that no interpretation here changes substantially relative to these decisions. However, failing to report models that included possible (if small) effects of phylogenetic relatedness might cause some readers to question what those models might have implied. For the moment, we are opting for the most transparent reporting approach here.

      Comment:

      (27) L241: Rather say directly XX of XX species in our data....

      (28) L245: Same here. Provide the actual numbers, please.

      Thank you for this suggestion, we made this change on Line 332 and Line 334.

      Comment:

      (29) L247-249: Then not necessary.

      This issue highlights a challenge in the global biology literature and around the issue of biodiversity monitoring for understanding global change impacts on species. Almost no studies have been able to report simultaneous range and phenology shifts, and the literature addresses these biotic responses to global change predominantly as distinct phenomena. Differences in numbers of species for which these observations exist, even among the extremely widely-observed odonates, seem to us to be a meaningful issue to report on. If the reviewer prefers that we abbreviate or remove this sentence, we are happy to do so.

      Comment:

      (30) L251-261: That is discussion as you interpret your results.

      Following your suggestion and the suggestion of another reviewer, we moved the following lines to the discussion section: “Species that did not shift their ranges northwards or advance their phenology included Coenagrion mercuriale, a European species that is listed as near threatened by the IUCN Red List (IUCN, 2021), and is projected to lose 68% of its range by 2035 (Jaeschke et al., 2013).” (Lines 517-527).

      Comment:

      (31) L252: Good to mention, but why is the discussion limited to C. mercuriale?

      We feel that it is important to link the broad-scale results to the specific biological characteristics of individual species, and C. mercuriale is listed as near threatened on the IUCN Red List. We are happy to expand links to natural history of this group and have added the following: “This group also includes Coenagrion resolutum, a common North American damselfly (Swaegers et al., 2014), for which we could not find evidence of decline. This may be due in part to the greater area of intact habitat available in North America compared to Europe, enabling C. resolutum to maintain larger populations that are less vulnerable to stochastic climate events. Still, this and other species failing to shift in range or phenology should be assessed for population health, as this species could be carrying an unobserved extinction debt.” (Lines 527-533).

      Comment:

      (32) L264: Insert 'being' before 'consistently'.

      Thank you for the suggestion, we made this change (Line 373).

      Comment:

      (33) L271: .'. However,'.

      Thank you for pointing out this grammatical error, we have corrected it (Line 382).

      Comment:

      (34) L273: 'affected' instead of 'predicted'

      Thank you for the suggestion, we made this change (Line 383).

      Comment:

      (35) L279: 'despite pronounced recent warming' sounds not relevant in this context.

      Thank you for this suggestion, we removed this portion of the sentence (Line 408).

      Comment:

      (36) L281: Rather 'the model performance did not improve....'

      Thank you for the suggestion, we made this change (Line 409).

      Comment:

      (37) L288: Add 'but' before 'not'.

      Thank you for the suggestion, we made this change (Line 416).

      Comment:

      (38) L311-316: Reconsider the causality here. maybe rather rephrase to are associated instead. Greater dispersal ability and developmental plasticity might well lead to higher growth rates, rather than the other way around.

      We agree that plasticity/evolution at range edges is important to consider and have included it as an alternative explanation: “Adaptive evolution and plasticity may enable higher population growth rates in newly-colonized areas (Angert et al., 2020; Usui et al., 2023), but this possibility can only be directly tested with long term population trend data.” (Line 449-451).  

      Comment:

      (39) L313-316: Maybe delete the second 'should be able to'.

      This phrase has been changed in response to other reviewer comments and now reads as follows:

      “Emerging mean conditions in areas adjacent to the ranges of southern species may offer opportunities for range expansions of these relative climate specialists, which can then tolerate climate warming in areas of range expansion better than more cool-adapted historical occupants (Day et al., 2018).” (Lines 445-448).

      Comment:

      (40) L331: Limit this statement ending with 'in North American and European Odonata'.

      Thank you for this suggestion, we made this addition (Lines 475-476).

      Comment:

      (41) L346-347: There are too many of these more-research-is-needed statements in the discussion (at least three in the last paragraphs). Please consider finishing the paragraphs rather with a significance statement.

      Thank you for this suggestion, we have changed the final sentence here to the following: “The extent to which species’ traits actually determine rates of range and phenological shifts, rather than occasionally correlated with them, is worth considering further, but functional traits do not systematically drive patterns in these shifts among Odonates in North America and Europe.” (Lines 480-483).

      We also made additional changes, removing a ‘more-research-is-needed’ statement from the following paragraph (Line 443), as well as from Line 499.

      Comment:

      (42) L349: See also Franke et al. (2022, Ecology and Evolution).

      Thank you for highlighting this excellent reference! We have added it to Line 501.

      Comment:

      (43) L363: Maybe a bit late in the text, but it is important to note that there is the third dimension 'abundance trends' or rather a common factor related to range and phenology shifts. I feel this fits better with the discussion of population growth.

      Thank you for this suggestion, we have addressed the importance of abundance trends in the following sentences: “Further mechanistic understanding of these processes requires abundance data.” (Lines 442-443); “It remains unclear if range and phenology shifts relate to trends in abundance, but our results suggest that there are clear ‘winners’ and ‘losers’ under climate change.” (Lines 509-510).

      Comment:

      (44) L375-377: This last sentence is very similar to L371-373. Please reduce the redundancy. Focus more on specifically stating the process instead of vaguely saying 'new insights into patterns' and 'suggesting processes'. Rather, deliver a strong concluding message here.

      Thank you for this suggestion, we feel that we now have a much stronger concluding message: “By considering both the seasonal and range dynamics of species, emergent and convergent climate change responses across continents become clear for this well-studied group of predatory insects.” (Lines 545-547).

      Comment:

      (45) Table 1: To me, the few estimates presented here do not justify a table. rather include them in the text. OR combine them with Table 2. Also, why not include the traits as predictors (from the range shift models) in these models as well?

      We have clarified in the text that the results displayed in Table 1 are from the analysis of the relationship between range and phenology shifts: “The effect of species’ range shifts on phenology range shifts was significant in our model investigating the relationship between these responses, indicating that species shifting their northern range limits to higher latitudes also showed stronger advances in their emergence phenology (Figure 3).” (Lines 341-344).

      As there were no significant effects in the model of phenology change drivers, we have not shown results of this model: “Emergence phenology shifts were not affected by species’ traits, range geography, nor climate variability; due to this, model results are not displayed here.” (Lines 383-384).

      Comment:

      (46) Table 2: L712-713: What does this mean? Are phenology shifts not used as a predictor of range shifts? (why then this comment?). Or do you want to say phenological shifts are not related to Southern range etc? Why do you present a phylosig here but not in Table 1? Why not include the traits as predictors (from the range shift models) in these models as well? Consider using the range size as a continuous predictor instead of 'Widespread'.

      We are glad the reviewer pointed this out to us. We did not emphasize this issue sufficiently. We DID evaluate traits as predictors both of geographical range and phenological shifts, and species-specific biological traits did not significantly affect models predicting either of those sets of responses. We state this on Lines 312-323, but we have also noted in the discussion (Lines 473-476) that the most commonly assessed traits, like body size, do not alter observed trends here. Instead, where species are found, rather than the characteristics of species, is the key determinant of their overall responses.

      Following this excellent suggestion, we re-analysed our data using range size, calculated as the number of quadrats occupied by a species in the historical time period, as a predictor. Range size was not significant in our models, but we believe this is the best way to analyze our data, and so have updated our methods (Lines 261-263) and results (Lines 375-378).

      Comment:

      (47) Figure 1: I don't see any grey points in the figure. Also, there is no A or B. If you are referring to the symbols then write cross and triangle instead and not use capital letters which usually refer to component plots of composite figures. Also, I highly recommend providing a similar figure based on your data (maybe each species as a dot for T1 and another symbol for T2). Given the small number of species, you could try to connect these points with arrows. For the set with only range shifts maybe play the T2-dots at the center of the 'Emergence' axis.

      Thank you for pointing out this error: a previous version of Figure 1 included grey points and multiple panels. We have removed this text from the figure caption to be consistent with the final version of the figure (Line 989).

      The graphical depictions of the conceptual and empirical discoveries in this paper were challenging to create. The reviewer might be suggesting effectively decomposing Figure 3 (change in range on the y axis vs change in phenology among all species) into two sets of points on the same graph, where each pair of points is a before-and-after value for each species. This would make for a very busy figure indeed. We have modified the conceptual Figure 1 to illustrate more clearly, we believe, that species can (in principle) remain within tolerable niche spaces by shifting their activity periods in time (phenology) or in space (geographical range) or both.

      Comment:

      (48) Figure 2: Please add a legend. Also black is a poor background color. The maps appear to be stretched. Please check aspect ratios. Now here are capital letters without an explanation in the caption. From the context I assume the upper panel maps are for the data used to calculate range shifts at the bottom panel maps are for data used to calculate the phenological shifts.

      We apologise for the error in the figure caption and have clarified the differences between panels in the text, as well as changing the map background colour and fixing the aspect ratio:

      “Figure 2: Richness of 76 odonate species sampled in North America and Europe in the historic period (1980-2002; panes A and C) and the recent period (2008-2018; panes B and D). Species richness per 100 × 100 km quadrat is shown in panes A and B, while panes C and D show species richness per 200 × 200 km quadrat. Dark red indicates high species richness, while light pink indicates low species richness.” (Lines 1002-1006).

      Comment:

      (49) Figure 3: Why this citation? Of terrestrial taxa? Please explain. Consider adding some stats here, such as the r-squared value for each of the relationships.

      We have better explained the citation in the figure caption, as well as adding r-squared values:

      “Figure 3: Relationship between range shifts and emergence phenology shifts among North American and European odonate species (N = 66; model R2 = 17.08 for glm, 14.9% for MCMCglmm). For reference, the shaded area shows mean latitudinal range shifts of terrestrial taxa as reported by Lenoir et al. (2020; calculated as the yearly mean dispersal rate of 1.11 +/- 0.96 km per year over 38 years).” (Lines 679-682)

      Comment:

      (50) L801: What are these underscored references?

      This was an issue with the reference software and has been resolved.

      Comment:

      (51) Table S1: L848: Consider starting with 'Samples of 76 North American and European odonate species from between ...'. Please use a horizontal line to separate the content from the table header. Add a horizontal line below the last row. Same for all tables.

      Thank you for this suggestion, we have edited the caption for Table S1 as suggested (Line 1124). We have also made the suggested line additions to Tables S1, S2, and S3.

      Comment:

      (52) Table S3: This is confusing. In Table 1 (main text) both 'southern range' and 'widespread' are used as predictors. Please explain.

      We originally included information on species range geography, including southern versus northern range, and widespread versus not, into one categorical variable. Following additional comments we re-analysed our data using range size, calculated as the number of quadrats occupied by a species in the historical time period, as a predictor. Now the methods section text (Lines 261-263) and Table 1 report results of that variable with distribution options northern, southern, or both. 

      Comment:

      (53) Figure S5 and S6: It would be more coherent if the colors refer to the continents and the suborders are indicated by shading. I would love to see a combination of the two figures with species ordered by the phylogenetic relationship and a dot matrix indicating the traits in the main text! This could really be a good starting point for a synthesis figure.

      The reviewer presents an interesting challenge for us. We have a choice, as we understand things, to present a figure showing phylogeny and traits (as requested here), or an ordered list of species relative to effect sizes in the two main responses to global change. The latter choice centers on the discoveries of the paper, while the former would be valuable for dragonfly biology but would depict information that proved to be biologically uninformative relative to our discovery. That is to say, there is no phylogenetic trend and biological traits among species did not affect results. We have gone some way toward illustrating that issue by retaining phylogeny in the MCMC-GLMM models, but we feel that a figure illustrating phylogeny and traits would (for most readers, at least) illustrate noise, rather than signal. For this reason, we have opted to take on the previous reviewer’s suggestion for a modified, main-text Figure 4, which we include below.

      Figure 4: Distribution of Northern range limit shifts (Panel A, kilometers) and emergence phenology shift (Panel B, Julian day) of 76 European and North American odonate species between a recent time period (2008 - 2018) and a historical time period (1980 - 2002). Anisoptera (dragonflies) are shown in pink, Zygoptera (damselflies) are shown in blue.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      (1) The bad equilibria of the model still remain a concern, as well as other features like the transient overshoots that do not match with the data. I think they could achieve more accuracy here by assigning more weight to such specific features, through adding these as separate objectives for the generator explicitly. The traces contain a five-second current step, and one second before and one second after the training step. This means that in the RMSE, the current step amplitude will dominate as a feature, as this is simply the state for which the data trace contains most time-points. Note that this is further exacerbated by using the IV curve as an auxiliary objective. I believe a better exploration of specific response features, incorporated as independently weighted loss terms for the generator, could improve the fit. E.g. an auxiliary term could be the equilibrium before and after the current step, another term could penalise response traces that do not converge back to their initial equilibrium, etc.

      We thank the reviewer for the suggestion. We supplemented the membrane potential regression loss with errors computed for 3 intervals (the pre-, mid-, and post-stimulation time intervals), improving the accuracy of EP-GAN for baseline membrane potential responses (Figure 2, 3, Table S2, S3). We also changed the simulation protocol for generated parameters to allow a longer simulation time of 15 seconds, where stimulation is applied during t = [5, 10] seconds and no stimulation is applied during t = [0, 5) (pre-stimulation) and t = (10, 15] (post-stimulation). These time intervals are chosen to ensure sufficient stabilization periods before and after stimulation.
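
      The interval-wise error described above can be illustrated with a minimal sketch. It assumes a single trace sampled on the 15-second protocol, with stimulation during [5, 10] seconds; the function names, the equal default weights, and the toy traces are illustrative assumptions and are not taken from the EP-GAN code.

```python
import numpy as np

def interval_rmse(v_pred, v_true, t, stim_start=5.0, stim_end=10.0):
    """RMSE computed separately for pre-, mid-, and post-stimulation samples."""
    masks = {
        "pre": t < stim_start,                       # t in [0, 5)
        "mid": (t >= stim_start) & (t <= stim_end),  # t in [5, 10]
        "post": t > stim_end,                        # t in (10, 15]
    }
    return {
        name: float(np.sqrt(np.mean((v_pred[m] - v_true[m]) ** 2)))
        for name, m in masks.items()
    }

def interval_loss(v_pred, v_true, t, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three interval errors (weights are hypothetical)."""
    errs = interval_rmse(v_pred, v_true, t)
    return sum(w * errs[k] for w, k in zip(weights, ("pre", "mid", "post")))

# Toy example: the prediction matches the target except that it fails to
# return to baseline after stimulation (a 5 mV post-stimulation offset).
t = np.linspace(0.0, 15.0, 1501)
v_true = np.where((t >= 5.0) & (t <= 10.0), -20.0, -60.0)
v_pred = v_true.copy()
v_pred[t > 10.0] += 5.0

errs = interval_rmse(v_pred, v_true, t)
total = interval_loss(v_pred, v_true, t)
```

      Because each interval contributes its own term, a failure to return to baseline is penalised regardless of how many time points the stimulation period contains.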

      (2) The explanation of what the authors mean with 'inverse gradient operation' is clear now. However, this term is mathematically imprecise, as the inverse gradient does not exist because the gradient operator is not injective. The method is simply forward integration under the assumption that the derivative of the voltage is known at the grid time-points, and should be described as such.

      We thank the reviewer for the clarification on the inverse gradient operation terminology. In the Methods section, we changed the term describing the inverse gradient operation to ‘forward integration’, which more accurately describes the process.
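
      As an illustration of the terminology, forward integration of a known voltage derivative on a uniform time grid can be sketched as follows. This is a generic explicit Euler scheme, not the authors' implementation; the function name, step size, and test trace are assumptions. The initial value `v0` has to be supplied separately, which reflects the reviewer's point that the derivative alone does not determine the trace.

```python
import numpy as np

def forward_integrate(dv_dt, dt, v0):
    """Reconstruct V on a uniform grid from its derivative by Euler steps.

    dv_dt[k] is the derivative at grid point k. The result has the same
    length, with v[0] = v0 and v[k + 1] = v[k] + dt * dv_dt[k].
    """
    dv_dt = np.asarray(dv_dt, dtype=float)
    increments = dt * dv_dt[:-1]
    return v0 + np.concatenate(([0.0], np.cumsum(increments)))

# Toy check against an exponential relaxation from -60 mV towards -40 mV.
dt = 0.001
t = np.arange(0.0, 1.0, dt)
v_true = -60.0 + 20.0 * (1.0 - np.exp(-t / 0.1))
dv_dt = 200.0 * np.exp(-t / 0.1)  # analytical derivative of v_true

v_rec = forward_integrate(dv_dt, dt, v0=-60.0)
max_err = float(np.max(np.abs(v_rec - v_true)))
```

      The reconstruction error shrinks with the step size, as expected for a first-order scheme.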

      (3) I appreciate that the authors' method provides parameters of models at a minimal computational cost compared to running an evolutionary optimization for every new recording. I also believe that with some tweaking of the objective, the method could improve in accuracy. However, I share reviewer 2's concerns that the evolutionary baseline methods are not sufficiently explored, as these methods have been used to successfully fit considerably more complex response patterns. One way out of the dilemma is to show that the EP-GAN estimated parameters provide an initial guess that considerably narrows the search space for the evolutionary algorithm. In this context, the authors should also discuss the recent gradient based methods such as Deistler et al. (https://doi.org/10.1101/2024.08.21.608979) or Jones et al (https://doi.org/10.48550/arXiv.2407.04025).

      We supplemented the optimization setup for existing methods (GDE3, NSDE, DEMO, and NSGA2) by incorporating steady-state response constraints as the initial selection process. The process is similar to that of EP-GAN training data generation and the DEMO parameter selection process [16] (see Results section, page 6 for detail). We also expanded the testing scenarios by evaluating all methods on both small and large HH-model estimation. The small HH-model scenario estimates 47 parameters consisting of channel conductances, reversal potentials, and initial conditions, with the channel parameters (n = 129) frozen to the default values in [41]. The large HH-model scenario estimates the 129 channel parameters in addition to the 47 parameters, allowing ±50% variations from their default values. For both scenarios, we test total sample sizes of 32k and 64k for all methods to evaluate how they scale with the number of simulated samples given during optimization. The results show that existing methods perform well in small HH-model scenarios and scale with sample size, consistent with the literature. EP-GAN, on the other hand, shows better overall performance in predicting membrane potential responses in both small and large HH-model scenarios.

      Reviewer #2 (Public review):

      Major 1: Models do not faithfully capture empirical responses. While the models generated with EP-GAN reproduce the average voltage during current injections reasonably well, the dynamics of the response are generally not well captured. For example, for the neuron labeled RIM (Figure 2), the most depolarized voltage traces show an initial 'overshoot' of depolarization, i.e. they depolarize strongly within the first few hundred milliseconds but then fall back to a less depolarized membrane potential. In contrast, the empirical recording shows no such overshoot. Similarly, for the neuron labeled AFD, all empirically recorded traces slowly ramp up over time. In contrast, the simulated traces are mostly flat. Furthermore, all empirical traces return to the pre-stimulus membrane potential, but many of the simulated voltage traces remain significantly depolarized, far outside of the ranges of empirically observed membrane potentials. The authors trained an additional GAN (EP-GAN Extended) to improve the fit to the resting membrane potential. Interestingly, for one neuron (AWB), this improved the response during stimulation, which now reproduced the slowly rising membrane potentials observed empirically; however, the neuron still does not reliably return to its resting membrane potential. For the other two neurons, the authors report a decrease in accuracy in comparison to EP-GAN. While such deviations may appear small in the root mean square error (RMSE), they likely indicate a large mismatch between the model and the electrophysiological properties of the biological neuron. The authors added a second metric during the revision - percentages of predicted membrane potential trajectories within empirical range. I appreciate this additional analysis. As the empirical ranges across neurons are far larger than the magnitude of dynamical properties of the response ('slow ramps', etc.), this metric doesn't seem to be well suited to quantify to which degree these dynamical properties are captured by the models.

      We made improvements to the training data generation and architecture of EP-GAN to improve the overall accuracy of its predicted membrane potential responses. In particular, we divided training data generation into the three neuron types found among C. elegans non-spiking neurons: 1) Transient outward rectifier, 2) Outward rectifier and 3) Bistable [8, 16]. Each randomly generated training sample is categorized into one of the 3 types by evaluating its steady-state currents with respect to experimental dI/dV bound constraints (see the generating training data section under Methods for more detail). The process is then followed by imposing minimum-maximum constraints on simulated membrane potential responses. This setup generates training samples whose distribution is closer to that of experimentally recorded neurons. This is further described in the Methods section, page 15, of the revised manuscript.

      We also improved the EP-GAN training process by incorporating random masking of input membrane potential responses. The masking forces EP-GAN to make predictions even with missing voltage traces, improving overall accuracy and allowing EP-GAN to use membrane potential inputs with arbitrary clamping protocols (see Methods page 13 for more detail). For the training loss functions, we further supplemented the membrane potential regression loss with errors computed for 2 intervals (the pre- and post-stimulation time intervals) to improve EP-GAN prediction capabilities for baseline membrane potentials.
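
      A minimal sketch of the random input masking idea is given below. It assumes that the input is a stack of voltage traces, one row per current step, and that masked traces are replaced by zeros; the function name, the zero fill value, and the array shapes are illustrative assumptions and not details of the EP-GAN implementation.

```python
import numpy as np

def mask_traces(traces, keep_fraction, rng):
    """Zero out a random subset of voltage traces; return (masked, mask).

    traces: array of shape (n_traces, n_timepoints), one row per current step.
    keep_fraction: fraction of rows left intact (at least one row is kept).
    mask: boolean array, True for the rows that were kept.
    """
    n_traces = traces.shape[0]
    n_keep = max(1, int(round(keep_fraction * n_traces)))
    keep_idx = rng.choice(n_traces, size=n_keep, replace=False)
    mask = np.zeros(n_traces, dtype=bool)
    mask[keep_idx] = True
    masked = np.where(mask[:, None], traces, 0.0)
    return masked, mask

# Toy example: 8 synthetic traces, half of them hidden from the model.
rng = np.random.default_rng(0)
traces = rng.normal(-50.0, 10.0, size=(8, 500))
masked, mask = mask_traces(traces, keep_fraction=0.5, rng=rng)
```

      Drawing a fresh mask for every training sample would expose the network to many different subsets of the clamping protocol, which is the property the ablation results rely on.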

      Taken together, these modifications improved EP-GAN’s ability to capture empirical membrane potential responses; we show the results in Figures 2–5 and Tables S2 and S3.

      Major 2: Comparison with other approaches is potentially misleading. Throughout the manuscript, the authors claim that their approach outperforms the other approaches tested. But compare the responses of the models in the present manuscript (neurons RIM, AFD, AIY) to the ones provided for the same neurons in Naudin et al. 2022 (https://doi.org/10.1371/journal.pone.0268380). Naudin et al. present models that seem to match empirical data far more accurately than any model presented in the current study. Naudin et al. achieved this using DEMO, an algorithm that in the present manuscript is consistently shown to be among the worst of all algorithms tested. I therefore strongly disagree with the authors' claim that a "Comparison of EP-GAN with existing estimation methods shows EP-GAN advantage in the accuracy of estimated parameters". This may be true in the context of the benchmark performed in the study (i.e., a condition of very limited compute resources - 18 generations with a population size of 600, compare that to 2000 generations recommended in Naudin et al.), but while EP-GAN wins under these specific conditions (and yes, here the authors convincingly show that their EP-GAN produces by far the best results!), other approaches seem to win with respect to the quality of the models they can ultimately generate.

      We thank the reviewer for the feedback regarding the comparison with existing methods. We have revised the optimization setup for existing methods (GDE3, NSDE, DEMO, and NSGA2) by incorporating steady-state response constraints as the initial selection process. The process is similar to that of EP-GAN training data generation and DEMO parameter selection process [16] (see Results section, page 6 for detail). Incorporating this process has improved the accuracy of existing methods especially for small HH-model scenarios where DEMO stood out with the best performance alongside NSGA2 (Figure 5, Table 1, 2).

      We also expanded the testing scenarios by evaluating all methods on both small and large HH-model estimation. The small HH-model scenario estimates 47 parameters consisting of channel conductances, reversal potentials, and initial conditions, with the channel parameters (n = 129) frozen to the default values in [41]. The large HH-model scenario estimates the 129 channel parameters in addition to the 47 parameters, allowing ±50% variations from their default values. For both scenarios, we test total sample sizes of 32k and 64k for all methods to evaluate how they scale with the number of simulated samples given during optimization. The results show that existing methods perform well in small HH-model scenarios and scale with sample size. EP-GAN, on the other hand, shows better overall performance in predicting membrane potential responses in both small and large HH-model scenarios.

      In particular, with the extended membrane potential error covering the pre-, mid-, and post-activation periods, EP-GAN (trained with 32k samples, large HH-model, 9 neurons) achieved a mean membrane potential response error of 2.82mV, lower than that of DEMO trained on the identical setup (12.2mV, 64k samples; Table 2) and of DEMO applied to the simpler HH-model in [16] (7.78mV, using 36,000k samples, 3 neurons). With respect to DEMO performance in [16], under an identical simulation protocol (i.e., no stimulation during (0, 5s) and (10, 15s), and stimulation during (5, 10s)), the EP-GAN-predicted RIM (large HH-model) showed membrane potential accuracy on par with that of DEMO (simpler HH-model), and the EP-GAN-predicted AFD showed better accuracy for the post-activation membrane potential response, where DEMO-predicted membrane potentials overshoot above the baseline (not shown in the paper).

      Major 3: As long as the quality of the models generated by the EP-GAN cannot be significantly improved, I am doubtful that it indeed can contribute to the 'ElectroPhysiome', as it seems likely that dynamics that are currently poorly captured, like slow ramps, or the ability of the neuron to return to its resting membrane potential, will critically affect network computations. If the authors want to motivate their study based on this very ambitious goal, they should illustrate that single neuron model generation with their approach is robust enough to warrant well-constrained network dynamics. Based on the currently presented results, I find the framing of the manuscript far too bold.

      We thank the reviewer for the feedback regarding the paper's scope. With the revised methods, the overall quality of EP-GAN models is improved, with the most significant improvements in baseline membrane potential accuracy. While high-quality neuron models can be attained with existing methods given a sufficient sample size, our results suggest that EP-GAN can predict models of enhanced quality from a significantly smaller sample size and without retraining, thus addressing the main drawback of evolutionary methods. While EP-GAN still has limitations (e.g., difficulty in predicting slow ramps) that need to be addressed in the future, we believe its overall performance, combined with fast inference speed and flexibility in its input data format (e.g., missing membrane potential traces), is a step forward in large-scale neuron modeling tasks that can contribute to network models.

      Major 4: The conclusion of the ablation study 'In addition the architecture of EP-GAN permits inference of parameters even when partial membrane potential and steady-state currents profile are given as inputs' does not seem to be justified given the voltage traces shown in Figure 3. For example, for RIM, the resting membrane potential stays around 0 mV, but all empirical traces are around -40mV. For AFD, all simulated traces have a negative slope during the depolarizing stimuli, but a positive slope in all empirically observed traces. For AIY, the shape of hyperpolarized traces is off. While it may be that by their metric neurons in the 25% category are classified as 'preserving baseline accuracy', this doesn't seem justified given the voltage traces presented in the manuscript. It appears the metric is not strict enough.

      We improved EP-GAN’s training process by incorporating random masking of input membrane potential responses. The masking forces EP-GAN to make predictions even with missing voltage traces, improving overall accuracy and allowing EP-GAN to use membrane potential inputs with arbitrary clamping protocol.

      Input masking during training improved the ablation results: EP-GAN now retains its baseline membrane potential error (3.3mV, averaged across pre-, mid-, and post-activation periods) with as little as 50% of membrane potential inputs remaining (3.5mV) and as little as 25% of steady-state currents remaining (3.5mV).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript "Drosophila Visuomotor Integration: An Integrative Model and Behavioral Evidence of Visual Efference Copy" provides an integrative model of the visuomotor control in Drosophila melanogaster. This model presents an experimentally derived model based on visually evoked wingbeat pattern recordings of three strategically selected visual stimulus types with well-established behavioral response characteristics. By testing variations of these models, the authors demonstrate that the virtual model behavior can recapitulate the recorded wing beat behavioral results and those recorded by others for these specific stimuli when presented individually. Yet, the novelty of this study and their model is that it allows predictions for natural visual scenes in which multiple visual stimuli occur simultaneously and may have opposite or enhancing effects on behavior. Testing three models that would allow interactions of these visual modalities, the authors show that using a visual efference copy signal allows visual streams to interact, replicating behavior recorded when multiple stimuli are presented simultaneously. Importantly, they validated the prediction of this model in real flies using magnetically tethered flies, e.g., presenting moving bars with varying backgrounds. In conclusion, the presented manuscript presents a commendable effort in developing and demonstrating the validity of a mixture model that allows predictions of the behavior of Drosophila in natural visual environments.

      Strengths:

      Overall, the manuscript is well-structured and clear in its presentation, and the modeling and experimental research are methodically conducted and illustrated in visually appealing and easy-to-understand figures and their captions.

      The manuscript employs a thorough, logical approach, combining computational modeling with experimental behavioral validation using magnetically tethered flies. This iterative integration of simulation and empirical behavioral evidence enhances the credibility of the findings.

      The associated code base is well documented and readily produces all figures in the document.

      Suggestions:

      However, while the experiments provide evidence for the use of a visual efference copy, the manuscript would be even more impressive if it presented specific predictions for the neural implementation or even neurophysiological data to support this model. Or, at the very least, a thorough discussion. Nonetheless, these models and validating behavioral experiments make this a valuable contribution to the field; it is well executed and addresses a significant gap in the modeling of fly behavior and holistic understanding of visuomotor behaviors.

      We appreciate the reviewer’s thoughtful comments on the strengths and weaknesses of our manuscript. We agree that a biophysically realistic model reflecting the structure of neural circuits, as well as physiological data from them, would be invaluable. However, we are currently unable to provide physiological evidence for EC-based suppression, or a circuit architecture for efference copy-based suppression of the stability circuit, because the neural pathway underlying this behavior remains unidentified. Extensive recordings from the HS/VS system have revealed cell-type-specific motor-related inputs during both spontaneous and loom-evoked flight turns (Fenk et al., 2021; Kim et al., 2017, 2015). These studies predicted suppression of the optomotor stability response during such turns, and our new experiments confirmed this suppression specifically during loom-evoked turns (Figures 5, 6). However, these neurons are primarily involved in the head optomotor response, not the body optomotor response. We hope to extend our current model in future studies to incorporate more cellular-level detail, as the feedforward circuits underlying stability behavior become more clearly defined.

      Here are a few points that should be addressed:

      (1) The biomechanics block (Figure 2) should be elaborated on, to explain its relevance to behavior and relation to the underlying neural mechanisms.

      We appreciate this suggestion. The mathematical representation of the biomechanics block has been developed by other groups in previous studies (Fry et al., 2003; Ristroph et al., 2010). We used exactly the same model, and its parameters were identical to those used in one of those studies (Fry et al., 2003; Ristroph et al., 2010), in which the parameters were estimated from the stabilizing response to magnetic “stumbling” pulses. In the previous version of the manuscript, we had a description of the biomechanics block in the Methods section (see Equation 4). In response to the reviewer’s comment, we have made a few changes in Figure 2A and expanded the associated description in the main text, as follows.

      (Line 160) “To test the orientation behavior of the model, we developed an expanded model, termed “virtual fly model” hereafter. In this model, we added a biomechanics block that transforms the torque response of the fly to the actual heading change according to kinematic parameters estimated previously (Michael H Dickinson, 2005; Ristroph et al., 2010) (Figure 2A, see Equation 4 in Methods and Movie S1). The virtual fly model, featuring position and velocity blocks that are conditioned on the type of the visual pattern, can now change its body orientation, simulating the visual orientation behavior of flies in the free flight condition.”
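
      To make the role of a biomechanics block concrete, the sketch below integrates a generic rotational model in which torque is balanced by inertia and viscous damping, I dω/dt = τ − Cω. This form is an assumption chosen for illustration and is not claimed to be the manuscript's Equation 4; the function name and the dimensionless parameter values are placeholders rather than the published estimates.

```python
import numpy as np

def simulate_heading(torque, dt, inertia, damping, heading0=0.0, omega0=0.0):
    """Integrate I * d(omega)/dt = torque - C * omega with explicit Euler steps.

    torque: sequence of torque commands, one per time step.
    Returns heading and angular velocity arrays of length len(torque) + 1.
    """
    n = len(torque)
    heading = np.empty(n + 1)
    omega = np.empty(n + 1)
    heading[0], omega[0] = heading0, omega0
    for k, tau in enumerate(torque):
        omega[k + 1] = omega[k] + dt * (tau - damping * omega[k]) / inertia
        heading[k + 1] = heading[k] + dt * omega[k]
    return heading, omega

# Toy example with placeholder, dimensionless parameters: a constant torque
# drives the angular velocity towards the steady state torque / damping.
dt = 0.001
torque = np.ones(5000)
heading, omega = simulate_heading(torque, dt, inertia=1.0, damping=2.0)
```

      In a closed-loop simulation, the heading returned by such a block would determine the retinal position and velocity of the visual pattern at the next time step.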

      (2) It is unclear how the three integrative models with different strategies were chosen or what relevance they have to neural implementation. This should be explained and/or addressed.

      Thank you for this valuable comment. We selected the three models based on previous studies investigating visuomotor integration across multiple species, under conditions where multiple sensory cues are presented simultaneously.

      The addition-only model represents the simplest hypothesis, analogous to the “additive model” proposed by Tom Collett in his 1980 study (Collett, 1980). We used this model as a baseline to illustrate behavior in the absence of any efference copy mechanism. Notably, some modeling studies have proposed linear (additive) integration for multimodal sensory cues at the behavioral level (Liu et al., 2023; Van der Stoep et al., 2021). However, experimental evidence demonstrating strictly linear integration—either behaviorally or physiologically—remains limited. In our study, new data (Figure 5) show that bar-evoked and background movement-evoked locomotor responses are combined linearly, supporting the addition-only model.

      The graded efference copy model has been most clearly demonstrated in the cerebellum-like circuit of Mormyrid fish during electrosensation (Bell, 1981; Kennedy et al., 2014). In this system, the efference copy signal forms a negative image of the predicted reafferent input and undergoes plastic changes as the environment changes—an idea that inspired our modifiable efference copy model (Figure 4–figure supplement 1). The all-or-none efference copy model is exemplified in the sensory systems of smaller organisms, such as the auditory neurons of crickets during stridulation (Poulet and Hedwig, 2006). Notably, in crickets, the motor-related input is referred to as corollary discharge rather than efference copy. Typically, “efference copy” refers to a graded, subtractive motor-related signal, while “corollary discharge” denotes an all-or-none signal, both counteracting the sensory consequences of self-generated actions. In this manuscript, we use the term efference copy more broadly, encompassing both types of motor-related feedback signals (Sommer and Wurtz, 2008).

      In response to this comment, we have made the following changes in the main text to enhance its accessibility to general readers.

      (Line#268) “This integration problem has been studied across animal sensory systems, typically by analyzing motor-related signals observed in sensory neurons (Bell, 1981; Collett, 1980; Kim et al., 2017; Poulet and Hedwig, 2006). Building on the results of these studies, we developed three integrative models. The first model, termed the “addition-only model”, assumes that the outputs of the object (bar) and the background (grating) response circuits are summed to control the flight orientation (Figure 4B, see Equation 14 in Methods).”

      (Line#272) “In the second and third models, an EC is used to set priorities between different visuomotor circuits (Figure 4C,D). In particular, the EC is derived from the object-induced motor command and sent to the object response system to nullify visual input associated with the object-evoked turn (Bell, 1981; Collett, 1980; Poulet and Hedwig, 2006). These motor-related inputs fully suppress sensory processing in some systems (Poulet and Hedwig, 2006), whereas in others they selectively counteract only the undesirable components of the sensory feedback (Bell, 1981; Kennedy et al., 2014).”
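
      As a rough illustration of how the three models differ, the sketch below combines a bar-evoked command and a background-evoked command at a single time step. The function names, the boolean `turning` gate, and the scalar `predicted_reafference` are assumptions introduced for illustration; they are not Equation 14 or any other equation from the Methods, and the sketch assumes that the EC acts on the background-response pathway.

```python
def addition_only(bar_cmd: float, bg_cmd: float) -> float:
    """Addition-only: sum the object (bar) and background commands."""
    return bar_cmd + bg_cmd


def all_or_none_ec(bar_cmd: float, bg_cmd: float, turning: bool) -> float:
    """All-or-none EC: drop the whole background command during an object-evoked turn."""
    return bar_cmd + (0.0 if turning else bg_cmd)


def graded_ec(bar_cmd: float, bg_cmd: float, predicted_reafference: float) -> float:
    """Graded EC: subtract only the predicted self-generated part of the background command."""
    return bar_cmd + (bg_cmd - predicted_reafference)
```

      With a hypothetical bar command of 1.0, a self-generated background command of -0.4, and an external drift of 0.1, the addition-only model is slowed by the full background command, the all-or-none model ignores it entirely, and the graded model keeps only the external drift.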

      (3) There should be a discussion of how the visual efference could be represented in the biological model and an evaluation of the plausibility and alternatives.

      Thank you for this helpful comment. We have now added the following discussion to share our perspective on the circuit-level implementation of the visual efference copy in Drosophila.

      (Line#481) “Efference copy in Drosophila vision

      Under natural conditions, various visual features in the environment may concurrently activate multiple motor programs. Because these may interfere with one another, it is crucial for the central brain to coordinate between the motor signals originating from different sensory circuits. Among such coordination mechanisms, the EC mechanisms were hypothesized to counteract so-called reafferent visual inputs, those caused specifically by self-movement (Collett, 1980; von Holst and Mittelstaedt, 1950). Recent studies reported such EC-like signals in Drosophila visual neurons during spontaneous as well as loom-evoked flight turns (Fenk et al., 2021; Kim et al., 2017, 2015). One type of EC-like signal was identified in a group of wide-field visual motion-sensing neurons that were shown to control neck movement for gaze stability (Kim et al., 2017). The EC-like signals in these cells were bidirectional depending on the direction of flight turns, and their amplitudes were quantitatively tuned to those of the expected visual input across cell types. Although amplitude varies among cell types, it remains inconclusive whether it also varies within a given cell type to match the amplitude of expected visual feedback, thereby implementing the graded EC signal. A more recent study examined EC-like signal amplitude in the same visual neurons for loom-evoked turns, across events (Fenk et al., 2021). Although the result showed a strong correlation between wing response and the EC-like inputs, the authors pointed out that this apparent correlation could stem from noisy measurement of all-or-none motor-related inputs.

      Thus, these studies did not completely disambiguate between graded vs. all-or-none EC signaling. Another type of EC-like signal, observed in the visual circuit tuned to a moving spot, exhibited characteristics consistent with all-or-none EC. That is, it entirely suppressed visual signaling, irrespective of the direction of the self-generated turn (Kim et al., 2015; Turner et al., 2022).

      Efference-copy (EC)–like signals have been reported in several Drosophila visual circuits, yet their behavioral role remains unclear. Indirect evidence comes from a behavioral study showing that the dynamics of spontaneously generated flight turns were unaffected by unexpected background motion (Bender and Dickinson, 2006a). Likewise, our behavioral experiments showed that, during loom-evoked turns, responses to background motion are suppressed in an all-or-none manner (Figures 6 and 7). Consistent with this, motor-related inputs recorded in visual neurons exhibit nearly identical dynamics during spontaneous and loom-evoked turns (Fenk et al., 2021). Together, these behavioral and physiological parallels support the idea that a common efference-copy mechanism operates during both spontaneous and loom-evoked flight turns.

      Unlike loom-evoked turns, bar-evoked turn dynamics changed in the presence of moving backgrounds (Figure 5), a result compatible with both the addition-only and graded EC models. However, when the static background was updated just before a bar-evoked turn—thereby altering the amplitude of optic flow—the turn dynamics remained unaffected (Figures 5 and 7), clearly contradicting the addition-only model. Thus, the graded EC model is the only one consistent with both findings. If a graded EC mechanism were truly at work, however, an unexpected background change should have modified turn dynamics because of the mismatch between expected and actual visual feedback (Figure 4–figure supplement 1)—yet we detected no such effect at any time scale examined (Figure 7–figure supplement 1). This mismatch would be ignored only if the amplitude of the graded EC adapted to environmental changes almost instantaneously—a mechanism that seems improbable given the limited computational capacity of the Drosophila brain. In electric fish, for example, comparable adjustments take more than 10 minutes (Bell, 1981; Muller et al., 2019). Further investigation is needed to clarify how reorienting flies ignore optic flow generated by static backgrounds, potentially by engaging EC mechanisms not captured by the models tested in this study.

      Why would Drosophila rely on the all-or-none EC mechanism instead of the graded one for loom-evoked turns? A graded EC must be adjusted adaptively depending on the environment, as the amplitude of visual feedback varies with both the dynamics of self-generated movement and environmental conditions (e.g., empty vs. cluttered visual backgrounds) (Figure 4—figure supplement 1). Recent studies on electric fish have suggested that a large array of neurons in a multi-layer network is crucial for generating a modifiable efference copy signal matched to the current environment (Muller et al., 2019). Given their small-sized brain, flies might opt for a more economical design for suppressing unwanted visual inputs regardless of the visual environment. Circuits mediating such a type of EC were identified in the cricket auditory system during stridulation (Poulet and Hedwig, 2006), for example. Our study strongly suggests the existence of a similar circuit in the Drosophila visual system. 

      We tested the hypothesis that efference-copy (EC) signals guide action selection by suppressing specific visuomotor reflexes when multiple visual features compete. An alternative motif with a similar function is mutual inhibition between motor pathways (Edwards, 1991; Mysore and Kothari, 2020). In Drosophila, descending neurons form dense lateral connections (Braun et al., 2024), offering a substrate for such competitive interactions. Determining whether—and how—EC and mutual inhibition operate will require recordings from the neurons that ensure visual stability, which remain unidentified. Mapping these pathways and assessing how they are modulated by visual and behavioral context are important goals for future work.”

      Reviewer #2 (Public Review):

      It has been widely proposed that the neural circuit uses a copy of motor command, an efference copy, to cancel out self-generated sensory stimuli so that intended movement is not disturbed by the reafferent sensory inputs. However, how quantitatively such an efference copy suppresses sensory inputs is unknown. Here, Canelo et al. tried to demonstrate that an efference copy operates in an all-or-none manner and that its amplitude is independent of the amplitude of the sensory signal to be suppressed. Understanding the nature of such an efference copy is important because animals generally move during sensory processing, and the movement would devastatingly distort that without a proper correction. The manuscript is concise and written very clearly. However, experiments do not directly demonstrate if the animal indeed uses an efference copy in the presented visual paradigms and if such a signal is indeed non-scaled. As it is, it is not clear if the suppression of behavioral response to the visual background is due to the act of an efference copy (a copy of motor command) or due to an alternative, more global inhibitory mechanism, such as feedforward inhibition at the sensory level or attentional modulation. To directly uncover the nature of an efference copy, physiological experiments are necessary. If that is technically challenging, it requires finding a behavioral signature that unambiguously reports a (copy of) motor command and quantifying the nature of that behavior.

      We thank the reviewer for this insightful and constructive comment. We agree that our current behavioral evidence does not directly identify the underlying circuit mechanism, and that direct recordings from visual neurons modulated by an efference copy would be critical for distinguishing between potential mechanisms.

      A prerequisite for such physiological investigations would be the identification of both (1) the feedforward neurons directly involved in the optomotor response, and (2) the neurons conveying motor-related signals to the optomotor circuit. Despite efforts by several research groups, the location of the feedforward circuit mediating the optomotor response remains elusive. This limitation has prevented us from obtaining direct cellular evidence of flight turn-associated suppression of optomotor signaling.

      In light of the reviewer’s suggestion, we expanded our investigation to strengthen the behavioral evidence for efference copy (EC) mechanisms. In addition to our earlier experiments involving unexpected changes in the static background, we examined how object-evoked flight turns influence the optomotor stability reflex and vice versa (Figures 5 and 6). To quantify the interaction between different visuomotor behaviors, we systematically varied the temporal relationship between two types of visual motion—loom versus moving background, or moving bar versus moving background—and measured the resulting behavioral responses.

      Our findings support pattern- and time-specific suppressive mechanisms acting between flight turns associated with the different visual patterns. Specifically:

      The responses to a moving bar and a moving background add linearly, even when presented in close temporal proximity.

      Loom-evoked turns and the optomotor stability reflex mutually suppress each other in a time-specific manner.

      For both loom- and moving bar-evoked flight turns, changes in the static background had no measurable effect on the dynamics of the object-evoked responses.

      These results provide a detailed behavioral characterization of a suppressive interaction between distinct visuomotor responses. This, in turn, offers correlative evidence supporting the involvement of an efference copy-like mechanism acting on the visual system. While similar efference copy mechanisms have been documented in other parts of the visual system, we acknowledge that our findings do not exclude alternative explanations. In particular, it is still possible that lateral inhibition within the central brain or ventral nerve cord contributes to the suppression we observed.

      Ultimately, definitive proof will require identifying the specific neurons that convey efference copy signals and demonstrating that silencing these neurons abolishes the behavioral suppression. Until such experiments are feasible, our behavioral approach provides an important contribution toward understanding the nature of sensorimotor integration in this system.

      Reviewer #3 (Public Review):

      Summary:

      Canelo et al. used a combination of mathematical modeling and behavioral experiments to ask whether flies use an all-or-none EC model or a graded EC model (in which the turn amplitude is modulated by wide-field optic flow). Particularly, the authors focus on the bar-ground discrimination problem, which has received significant attention in flies over the last 50-60 years. First, they use a model by Poggio and Reichardt to model flight response to moving small-field bars and spots and wide-field gratings. They then simulate this model and compare simulation results to flight responses in a yaw-free tether and find generally good agreement. They then ask how flies may do bar-background discrimination (i.e. complex visual environment) and invoke different EC models and an additive model (balancing torque production due to background and bar movement). Using behavioral experiments and simulation supports the notion that flies use an all-or-none EC since flight turns are not influenced by the background optic flow. While the study is interesting, there are major issues with the conceptual framework.

      Strengths:

      They ask a significant question related to efference copies during volitional movement.

      The methods are well detailed and the data (and statistics) are presented clearly.

      The integration of behavioral experiments and mathematical modeling of flight behavior.

      The figures are overall very clear and salient.

      Weaknesses:

      Omission of saccades: While the authors ask a significant question related to the mechanism of bar-ground discrimination, they fail to integrate an essential component of the Drosophila visuomotor responses: saccades. Indeed, the Poggio and Reichardt model, which was developed almost 50 years ago, while appropriate to study body-fixed flight, has a severe limitation: it does not consider saccades. The authors identify this major issue in the Discussion by citing a recent switched, integrate-and-fire model (Mongeau & Frye, 2017). The authors admit that they "approximated" this model as a smooth pursuit movement. However, I disagree that it is an approximation; rather it is an omission of a motor program that is critical for volitional visuomotor behavior. Indeed, saccades are the main strategy by which Drosophila turn in free flight and prior to landing on an object (i.e. akin to a bar), as reported by the Dickinson group (Censi et al., van Breugel & Dickinson [not cited]). Flies appear to solve the bar-ground discrimination problem by switching between smooth movement and saccades (Mongeau & Frye, 2017; Mongeau et al., 2019 [not cited]). Thus, ignoring saccades is a major issue with the current study as it makes their model disconnected from flight behavior, which has been studied in a more natural context since the work of Poggio.

      Thank you for this helpful comment. We agree that including saccadic turns is essential and qualitatively improves the model. In the revised manuscript, we therefore expanded our bar-tracking model to incorporate an integrate-and-saccade strategy, now presented in Figure 2—figure supplement 2.

      The manuscript now introduces this result as follows:

      (Line#190) “Finally, one important feature of the locomotion dynamics that a flying Drosophila exhibits while tracking an object is a rapid orientation change, called a “saccade” (Breugel and Dickinson, 2012; Censi et al., 2013; Heisenberg and Wolf, 1979). For example, while tracking a slowly moving bar, flies perform relatively straight flights interspersed with saccadic flight turns (Collett and Land, 1975; Mongeau and Frye, 2017). It has been proposed that, during this behavior, visual circuits compute an integrated error of the bar position with respect to the frontal midline and trigger a saccadic turn toward the bar when the integrated value reaches a threshold (Frighetto and Frye, 2023; Mongeau et al., 2019; Mongeau and Frye, 2017). We expanded our bar fixation model to incorporate this behavioral strategy (Figure 2--figure supplement 2). The overall structure of the modified model is akin to the one proposed in a previous study (Mongeau and Frye, 2017), and the amplitude of a saccadic turn was determined by the sum of the position and velocity functions (Figure 2--figure supplement 2A; see Equation 13 in Methods). When simulated, our model successfully reproduced experimental observations of saccade dynamics across different object velocities (Figure 2--figure supplement 2B-D) (Mongeau and Frye, 2017). Together, our models faithfully recapitulated the results of previous behavioral observations in response to singly presented visual patterns (Collett, 1980; Götz, 1987; H. Kim et al., 2023; Maimon et al., 2008; Mongeau and Frye, 2017).”
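
      The integrate-and-saccade strategy described in this passage can be sketched as follows. This is a minimal sketch and not the model of Figure 2--figure supplement 2: the linear position and velocity terms stand in for the fitted functions of Equation 13, and the threshold, gains, and time step are hypothetical values chosen only to make the example run.

```python
def simulate_integrate_and_saccade(bar_velocity, duration=5.0, dt=0.01,
                                   threshold=10.0, k_pos=0.8, k_vel=0.05):
    """Return (time, amplitude) pairs for saccades toward a bar moving at a
    constant angular velocity (deg/s). All parameter values are illustrative."""
    heading, bar_pos, integral = 0.0, 0.0, 0.0
    saccades = []
    for step in range(int(round(duration / dt))):
        bar_pos += bar_velocity * dt
        error = bar_pos - heading        # bar position relative to the frontal midline (deg)
        integral += error * dt           # integrated position error (deg*s)
        if abs(integral) >= threshold:
            # Placeholder amplitude rule: position term plus velocity term.
            amplitude = k_pos * error + k_vel * bar_velocity
            heading += amplitude         # instantaneous turn toward the bar
            saccades.append(((step + 1) * dt, amplitude))
            integral = 0.0               # reset the integrator after each saccade
    return saccades
```

      In this sketch the fly holds its heading between saccades, so a faster bar accumulates error more quickly and triggers saccades more often.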

      Apart from Figures 1 and 2, most of our data—whether from simulations or behavioral experiments—use brief visual patterns lasting 200 ms or less. These stimuli trigger a single, rapid orientation change reminiscent of a saccadic flight turn. In this part of the paper, we essentially have examined how multiple visuomotor pathways interact to determine the direction of object-evoked turns when several visual patterns occur simultaneously.

      Critically, recent work showed that a group of columnar neurons (T3) appear specialized for saccadic bar tracking through integrate-and-fire computations, supporting the notion of parallel visual circuits for saccades and smooth movement (Frighetto & Frye, 2023 [not cited]).

      Thanks for bringing up this critical issue. We have now added this paper in the following part of the manuscript.

      (Line#193) “During this behavior, it has been proposed that visual circuits compute an integrated error of the horizontal bar position with respect to the frontal midline and trigger a saccadic turn toward the bar when the integrated value reaches a threshold (Frighetto and Frye, 2023; Mongeau and Frye, 2017).”

      (Line#462) “Visual systems extract features from the environment by calculating spatiotemporal relationships of neural activities within an array of photoreceptors. In Drosophila, these calculations occur initially on a local scale in the peripheral layers of the optic lobe (Frighetto and Frye, 2023; Gruntman et al., 2018; Ketkar et al., 2020).”

      A major theme of this work is bar fixation, yet recent work showed that in the presence of proprioceptive feedback, flies do not actually center a bar (Rimniceanu & Frye, 2023). Furthermore, the same study found that yaw-free flies do not smoothly track bars but instead generate saccades. Thus prior work is in direct conflict with the work here. This is a major issue that requires more engagement by the authors.

      Thank you for your thoughtful comments and for drawing our attention to this important paper. In our experiments, bar fixation on oscillating vertical objects emerges during the “alignment” phase of the magneto-tether protocol. The pattern movement dynamics were similar to those used by Rimniceanu & Frye (2023), yet the two studies differ in a key respect: Rimniceanu & Frye employed a motion-defined bar, whereas we presented a dark vertical bar against a uniform or random-dot background. The alignment success rate, defined as the proportion of trials in which the fly’s body angle is within ±25° of the target, was about 50% (data not shown). Our alignment pattern consisted of three vertical stripes spanning ~40° horizontally; when we replaced it with a single, narrower stripe, the success rate decreased (data not shown). These observations suggest that bar fixation in the magnetically tethered assay is less robust than in the rigid-tethered assay, although flies still orient toward highly salient vertical objects.
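
      For concreteness, the alignment success rate defined above could be computed as in the following sketch. The function name and the angle wrapping are assumptions for illustration and do not reproduce the analysis code used in the study.

```python
def alignment_success_rate(body_angles_deg, target_deg=0.0, tolerance_deg=25.0):
    """Fraction of trials whose body angle lies within +/- tolerance of the target."""
    if not body_angles_deg:
        raise ValueError("need at least one trial")

    def wrapped_error(angle):
        # Map the difference into [-180, 180) so that 350 deg counts as -10 deg.
        return (angle - target_deg + 180.0) % 360.0 - 180.0

    hits = sum(abs(wrapped_error(a)) <= tolerance_deg for a in body_angles_deg)
    return hits / len(body_angles_deg)
```

      Wrapping the angular difference matters for a yaw-free tether, because a fly at 350° is only 10° away from a target at 0°.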

      We also observed that bar-evoked turns were elicited more reliably when the bar moved rapidly (45° in 200 ms) in the magneto-tether assay, although the turn magnitude was significantly smaller than the actual bar displacement (Figure 3).

      In response to the reviewer’s comment, we have now added the following description of the bar fixation behavior to the paper, citing Rimniceanu & Frye (2023).

      (Line#239) “Another potential explanation arises from recent studies demonstrating that proprioceptive feedback provided during flight turns in a magnetically tethered assay strongly dampens the amplitude of wing and head responses (Cellini and Mongeau, 2022; Rimniceanu et al., 2023).”

      Relevance of the EC model: EC-related studies by the authors linked cancellation signals to saccades (Kim et al., 2014 & 2017). Puzzlingly, the authors applied an EC model to smooth movement, when the authors' own work showed that smooth course stabilizing flight turns do not receive cancellation signals (Fenk et al., 2021). Thus, in Fig. 4C, based on the state of the field, the efference copy signal should originate from the torque commands to initiate saccades, and not from torque to generate smooth movement. As this group previously showed, cancellation signals are quantitatively tuned to that of the expected visual input during saccades. Importantly, this tuning would be to the anticipated saccadic turn optic flow. Thus the authors' results supporting an all-or-none model appear in direct conflict with the authors' previous work. Further, the addition-only model is not particularly helpful as it has been already refuted by behavioral experiments (Rimniceanu & Frye, Mongeau & Frye).

      Thank you for this constructive comment. Efference copy is best established for brief, discrete actions like flight saccades. While motor-related modulation of visual processing has been reported across short- and long-duration behaviours (Chiappe et al., 2010; Fujiwara et al., 2017; Kim et al., 2015, 2017; Maimon et al., 2010; Turner et al., 2022), only flight saccade-associated signals exhibit the temporal profile appropriate to cancel reafferent input. However, von Holst & Mittelstaedt (1950) originally formulated efference copy to explain the smooth optomotor response of hoverflies. In HS/VS recordings from previous studies, we could not detect membrane-potential changes tied to baseline wing-beat amplitude (data not shown), although further work is needed.

      Note that visually evoked flight turns analyzed in this paper have relatively fast dynamics. Fenk et al. (2021) showed that HS cells carry EC-like motor signals during both loom-evoked turns and spontaneous saccades. Building on this, we tested whether object-evoked rapid turns modulate other visuomotor pathways. Although Fenk et al. also found that optomotor turns lack motor input to HS cells, the authors did not test whether the optomotor pathway suppresses other reflexes, such as loom-evoked turns. Our new behavioral data (Figure 6) show that optomotor turns indeed suppress loom-evoked turns, suggesting a potential EC signal arising from the optomotor pathway that inhibits loom-responsive visual neurons.

      In Kim et al. (2017), the authors argued that HS/VS neurons receive a “quantitatively tuned” efference copy that varies across cell types: yaw-sensitive LPTCs are strongly suppressed, roll-sensitive cells receive intermediate input, and pitch-sensitive cells receive little or none. We also showed that when the amplitude of ongoing visual drive changes, the amplitude of saccade-related potentials (SRPs) scales linearly. This proportionality does not imply a genuinely graded EC, however, because SRP amplitude could vary solely through changes in driving force (Vm – Vrest) with a fixed EC conductance. Crucially, SRPs do not fully suppress feed-forward visual signalling, arguing against an all-or-none EC mechanism.
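
      The driving-force argument can be illustrated with a steady-state, single-compartment sketch. The assumptions here are ours and are chosen for simplicity: a leak conductance pulling toward Vrest, a constant visual input current, and a fixed EC conductance whose reversal potential equals Vrest. All values are in arbitrary units and are not fitted to any recording.

```python
def steady_state_vm(i_visual, g_leak, g_ec=0.0, v_rest=-60.0):
    """Steady-state potential of one compartment.
    Solves g_leak*(V - v_rest) + g_ec*(V - v_rest) = i_visual for V."""
    return v_rest + i_visual / (g_leak + g_ec)


def srp_amplitude(i_visual, g_leak, g_ec, v_rest=-60.0):
    """Change in Vm when the fixed EC conductance opens (a stand-in for the SRP)."""
    before = steady_state_vm(i_visual, g_leak, 0.0, v_rest)
    after = steady_state_vm(i_visual, g_leak, g_ec, v_rest)
    return after - before
```

      Under these assumptions the SRP equals -(Vm – Vrest) · g_ec / (g_leak + g_ec), so it scales linearly with the ongoing visual depolarization even though the EC conductance never changes, and the visual response is reduced rather than abolished.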

      How, then, can the cellular and behavioural data be reconciled? Silencing HS/VS neurons—or their primary inputs, the T4/T5 neurons—does not markedly diminish the optomotor response in flight (Fenk et al., 2014; Kim et al., 2017), indicating the presence of additional, as-yet-unidentified pathways.

      Physiological recordings from other visual neurons that drive the optomotor response in flying Drosophila are therefore needed to determine how strongly they are suppressed during loom-evoked turns.

      Behavioral evidence for all-or-none EC model: The authors state "unless the stability reflex is suppressed during the flies' object evoked turns, the turns should slow down more strongly with the dense background than the sparse one". This hypothesis is based on the fact that the optomotor response magnitude is larger with a denser background, as would be predicted by an EMD model (because there are more pixels projected onto the eye). However, based on the authors' previous work, the EC should be tuned to optic flow and thus the turning velocity (or amplitude). Thus the EC need not be directly tied to the background statistics, as they claim. For instance, I think it would be important to distinguish whether a mismatch in reafferent velocity (optic flow) links to distinct turn velocities (and thus position). This would require moving the background at different velocities (co- and anti-directionally) at the onset of bar motion. Overall, there are alternative hypotheses here that need to be discussed and more fully explored (as presented by Bender & Dickinson and in work by the Maimon group).

      We appreciate the reviewer’s important suggestion. In response, we performed the recommended experiment. In Figures 5 and 6 of the revised manuscript, we now present how bar- or loom-evoked flight turns affect the response to a moving background pattern. These experiments revealed that bar-evoked turns do not suppress the optic flow response, whereas loom-evoked turns strongly suppress it. Specifically, when background motion began 100 ms after the onset of loom expansion, the response to the background was significantly suppressed. Although weak residual responses to the background motion were observed in this case, this could be due to background motion occurring outside of the suppression interval, whose duration may correspond to that of the flight turns (Figure 6C,D).

      The lack of suppression of the optic flow response during and after bar-evoked turns appears to suggest that the responses are added linearly (Figure 5), seemingly contradicting the lack of dynamic change when the background dot density was altered (Figure 7, Figure 7–figure supplement 1). That is, the experimental result in Figure 5 supports either an addition-only or a graded efference copy (EC) model. However, the result in Figure 7 supports an all-or-none EC model. If a graded EC were used, the amplitude of the EC should be updated almost instantaneously when the static background changes.

      Another possibility is that the optic flow during self-generated turns in a static background is extremely weak compared to the optic flow input generated by physically moving the pattern, perhaps due to the rapid nature of head movements. Indeed, detailed kinematic analysis of head movement during spontaneous saccades in blow flies revealed that the head reaches the target angle before the body completes the orientation change, making the effective speed of reafferent optic flow higher than the speed of body rotation (Hateren and Schilstra, 1999). To test these hypotheses, further experiments will be needed for bar-evoked flight turns.

      Publishing the reviewed preprint:

      (1) The Reviewed Preprint (including the full text of the preprint we reviewed, the eLife assessment, and public reviews) will typically be published in two weeks' time.

      Please let us know if you would like to provide provisional author responses to be posted at the same time (if so, please send these by email). Please do not resubmit within the next two/three weeks, as we will need to publish the first version of the Reviewed Preprint first.

      If there are any factual errors in the eLife assessment or public reviews, or other issues we should be aware of, please let us know as soon as possible.

      (2) After publication of the Reviewed Preprint, you can use the link below to submit a revised version. There is no deadline to resubmit. Before resubmitting, please ensure that you update the preprint at the preprint server to correspond with the revised version. Upon submitting a revised version, we will ask the editors and reviewers if it's appropriate to update their assessment and public reviews, which will be included alongside the revised Reviewed Preprint. At that time we will also post the recommendations to the authors and the author responses you provide with the revised version. In the author response, please respond to the public reviews (where relevant) and the recommendations to the authors.

      (3) Alternatively, you can proceed with the current version of the Reviewed Preprint (once published), without revisions, and request an eLife Version of Record. See the Author Guide for further information: https://elife-rp.msubmit.net/html/elife-rp_author_instructions.html#vor. However, most authors decide to request a Version of Record after a round of revision.

      (4) After publication of eLife's Reviewed Preprint, you also have the option to submit/publish in another journal instead: if you choose to do this, please let us know so we can update our records.

      The reviewers identified two key revisions that could improve the assessment of the paper:

      (1) Consideration of saccades within the model framework (outlined by reviewer 3).

      (2) Addition of physiology data to support the conclusions of the paper (outlined by reviewer 2). If this is not feasible within the timescale of revisions, the paper would need to be revised to clarify that the model leads to a hypothesis that would need to be tested with future physiology experiments.

      Thank you for these comments.

      Regarding revision point #1, we have added Figure 2–figure supplement 2, where we incorporated our position-velocity model (estimated in Figure 1) into the framework of the integrate-and-saccade model. A detailed description of this model is now provided in the main text (Lines 190–203).

      For revision point #2, obtaining electrophysiological evidence for efference copy remains challenging, as neither the visual neurons nor the efference-copy neuron has been identified for the wing optomotor response. As suggested by the reviewers, we have revised the title of the paper to reduce emphasis on efference copy and have noted electrophysiological recordings as a direction for future work.

      old title: A visual efference copy-based navigation algorithm in Drosophila for complex visual environments

      new title: Integrative models of visually guided steering in Drosophila

      Specific recommendations are detailed below.

      Reviewer #2 (Recommendations For The Authors):

      To directly demonstrate if an efference copy is non-scaled, the following experiments can be helpful: record from HS/VS cells and examine the relation between the amplitude of the saccade-suppression signal vs. saccade amplitude.

      Thanks for raising this important point. We previously carried out the suggested analysis for loom-evoked saccades in Fenk et al. (2021). There, significant correlations emerged between wing-response amplitude and saccade-related potentials (Figures 2F and 3C). However, we did not interpret the strong correlation (r ≈ 0.8) as evidence for a graded efference copy, because the amplitude of saccade-related potentials appeared to be bimodal. Upon presentation of the looming stimulus, flies either executed large evasive turns or showed minimal changes in wing-stroke amplitude. Large wing responses were accompanied by strong, saturated suppression of HS-cell membrane potential, whereas trials without wing responses produced only weak modulations—reflected in the bimodal distribution of saccade-related potential amplitudes (Figure 3C). 
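
      The reasoning above, that a bimodal all-or-none signal can by itself produce a strong correlation, can be checked with a small simulation. This is a hypothetical sketch with made-up noise levels and trial counts; it does not use or reproduce the data of Fenk et al. (2021).

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_all_or_none(n_trials=2000, p_turn=0.5, noise_sd=0.25, seed=0):
    """Each trial is either a full turn (level 1) or no turn (level 0).
    Wing response and motor-related potential share the level but have
    independent measurement noise, so nothing is graded within a mode."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        turned = rng.random() < p_turn
        level = 1.0 if turned else 0.0
        wing = level + rng.gauss(0.0, noise_sd)
        potential = level + rng.gauss(0.0, noise_sd)
        trials.append((turned, wing, potential))
    return trials


trials = simulate_all_or_none()
r_all = pearson_r([w for _, w, _ in trials], [p for _, _, p in trials])
turn_only = [(w, p) for turned, w, p in trials if turned]
r_turn_only = pearson_r([w for w, _ in turn_only], [p for _, p in turn_only])
```

      Pooling both modes yields a high correlation, whereas restricting the analysis to trials with a turn yields a correlation near zero, which is the signature one would expect from an all-or-none signal measured with noise.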

      Importantly, in rigidly tethered preparations—where these potentials are typically measured—the absence of proprioceptive feedback can itself drive wingbeat amplitudes to saturation during saccades. We therefore reasoned that the lack of intermediate-sized flight saccades would naturally yield correspondingly saturated saccade-related potentials, even if a graded EC system is in play. 

      In Kim et al. (2017), we also performed a comprehensive analysis of spontaneous saccade-related potentials across all HS/VS cell types. When we later examined the relationship between saccade amplitude and the corresponding saccade-related potentials in each cell type, we could not find any statistically significant correlation (unpublished data).

      measure how much a weak visual stimulus and a strong visual stimulus are suppressed by the suppression signal. If the signal is non-scaled, visual stimuli should always be suppressed independently of their intensities.

      Thank you for this important suggestion. As mentioned in our response to the previous comment, we believe it is not feasible to record from neurons responsible for the body optomotor response at this point, as their identity remains unknown. Regarding the HS/VS cells, our previous study showed that HS cells are not always fully suppressed. The changes in saccade-related potential amplitude can be described as a linear function of the pre-saccadic visually-evoked membrane potential (Figure 7 in Kim et al., 2017). 

      As suggested by Fenk et al. 2014 (doi: 10.1016/j.cub.2014.10.042), HS cells might also be responsive to a moving bar. If that is the case, and if you present a bar and background (either sparse or dense) in a closed-loop manner to a head-fixed fly, HS cells might be sensitive only to the bar but not to the background (independently of the density).

      Thanks for pointing out this important issue. HS cells indeed respond strongly to the horizontal movement of a vertical bar, as expected given that their receptive fields are formed by the integration of local optic flow vectors. In one of our previous studies (Supplemental Figure 1 in Kim et al., 2015), we showed that the response amplitude to a single vertical bar is roughly equivalent to that elicited by a vertical grating composed of 12 bars of the same size. Therefore, we believe that HS cells are likely to contribute to the head response to a moving vertical bar. In a body-fixed flight simulator, HS cells would respond only to the bar if the bar runs in a closed loop with a static background. In this scenario, HS cells are likely to play a role in the head optomotor response.

      Note also that the role of HS cells in the wing optomotor response remains unresolved. Unilateral activation of HS cells has been shown to elicit locomotor turns in walking Drosophila (Fujiwara et al., 2017), as well as in flying individuals (unpublished data from our lab). However, a previous study also showed that strong silencing of HS/VS cells significantly reduced the head optomotor response, but not the wing optomotor response (Kim et al., 2017).

      If neurophysiology is technically challenging, an alternative way might pay attention to a head movement that exclusively follows the background (Fox et al., 2014 (doi: 10.1242/jeb.080192)). Because HS cells are thought to promote head rotation to background motion, a non-scaled suppression signal on HS cells would always suppress the head rotation independently of the background density.

      Thanks for this helpful comment. We have analyzed head movements during bar-evoked flight turns (Figure 7–figure supplement 1B) and found no significant changes across different background dot densities. We think that this might suggest that HS cells are unlikely to receive suppressive inputs during bar-evoked turns, akin to the lack of modulation during optomotor turns.

      Another way to separate a potential efference copy from other mechanisms (more global inhibition) is the directionality. A global inhibition would suppress the response to the background even if the background moves in the same direction as self-motion, but the efference copy would not.

      Thanks for this important point. In Heisenberg and Wolf, 1979, it was proposed that modulation might be bidirectional, with behavioral effects observed only for perturbations in the “unexpected” direction. In our new data on loom-evoked turns (Figure 6), the suppression appears equally strong for background motion in either direction, supporting an all-or-none suppression mechanism.

      Besides, in general, it is unclear if you think an efference copy operates both in smooth pursuits and saccades or if such a signal is only present during saccades. Your previous neurophysiological work supports the latter. Are your behavioral results consistent with the previous saccade suppression idea, or do you propose a new type of efference copy that also operates in smooth pursuits?

      Thanks for raising this important point. von Holst and Mittelstaedt (1950) originally introduced the concept of efference copy to explain the smooth optomotor response. We previously analyzed electrophysiological recordings from HS cells for membrane-potential changes associated with slow deviations in wing-steering angle but found none. However, this negative result does not entirely rule out modulation of visual processing during smooth flight turns, given the slow drift in membrane potential observed in most whole-cell recordings.

      In this study, we examined only the interactions among visuomotor pathways during these rapid flight turns, as the dynamics of visually evoked turns are almost as rapid as those of spontaneous saccades. Our data reveal that interactions between distinct visuomotor reflexes are more diverse than previously appreciated.

      Minor comments:

      Line 108, 109: match the description between here and the labels in Fig. 1F.

      Thank you for indicating this issue. We have defined the general equation to obtain the position and velocity components in the main text (lines 108-109), but because of a slight asymmetry in the data (Fig. 1E) we used the approach indicated in Fig. 1F and explained in lines 113-117.
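
      For readers unfamiliar with this kind of decomposition, the sketch below shows the classic symmetric form. It assumes the response to a stimulus at a given position is the sum of a position-dependent term and a velocity-dependent term whose sign flips with the direction of motion. This is an illustrative assumption, not necessarily the exact equation in the manuscript, and the asymmetry-corrected approach of Fig. 1F may differ. The function name `decompose_responses` is hypothetical.

```python
def decompose_responses(cw, ccw):
    """Split responses to clockwise (cw) and counterclockwise (ccw) motion.

    Assumes cw[i] and ccw[i] were measured at the same stimulus position
    and that response = position_term + direction * velocity_term.
    Returns (position_component, velocity_component) as lists.
    """
    if len(cw) != len(ccw):
        raise ValueError("cw and ccw must be sampled at the same positions")
    # The position term is common to both directions, so averaging keeps it.
    position = [(a + b) / 2 for a, b in zip(cw, ccw)]
    # The velocity term flips sign with direction, so the half-difference keeps it.
    velocity = [(a - b) / 2 for a, b in zip(cw, ccw)]
    return position, velocity


# Toy example with made-up components at three stimulus positions.
true_position = [1.0, 0.0, -1.0]
true_velocity = [0.5, 0.5, 0.5]
cw = [p + v for p, v in zip(true_position, true_velocity)]
ccw = [p - v for p, v in zip(true_position, true_velocity)]

position, velocity = decompose_responses(cw, ccw)
print(position, velocity)
```

      The decomposition recovers the two components exactly only when the data are symmetric in this sense, which is presumably why an asymmetry in the measured responses calls for a modified procedure.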

      Fig.1 F: If the position-dependent component is due to fatigue, the tuning curve's shape is likely changed (shrunk or extended) depending on the stimulus speed. How can you generalize the tuning curve shown here? Does the result hold even if the stimulus speed/contrast/spatial frequency is changed?

      We appreciate this point. We believe that fatigue may explain why the wing response to the grating stimulus showed such a significant decay (Fig. 1E). As you mention, increasing the stimulus speed would increase the amplitude of the fly’s response up to a saturation point. We addressed this in our model by multiplying the derived value by the angular velocity of the grating.

      Regarding contrast and spatial frequency, we did not test them experimentally; instead, we simulated our model with varying visual feedback (Fig. 4A, B), which can be seen as increasing or decreasing the contrast of a grating. An increase in contrast would increase the fly’s response to the grating and thereby contribute to dampening the response to the foreground object (Fig. 4C).

      Line 233-255: Here, the description sounds like you will consider several parallel objects (e.g., two stripes) in the visual field instead of the combination of the figure and background (which is referred to in the following paragraph).

      Thank you for pointing it out. Indeed it was slightly ambiguous. We have addressed this by explaining the specific situation of a combination of an object and the background in lines 231-233.

      Figure 6C: you kept the foreground visual field between sparse and dense random dot backgrounds to keep the bar's saliency. Is it sure that this does not influence the difference in the fly's response to these two backgrounds (in Figure 6B)?

      This is a good point that we have also discussed internally. We also carried out similar experiments with a fully covered background and found no significant differences (Figure 7–figure supplement 1).

      Reviewer #3 (Recommendations For The Authors):

      Identify and analyze flight saccade dynamics in the raw trajectories (e.g., Fig. 3B). There should be some since the bar is near the 'sweet spot' for triggering saccades (see Mongeau & Frye, 2017).

      Thank you for bringing up this interesting point. Previous work reported that flies fixate on a vertical bar through saccadic turns rather than smooth tracking (Mongeau & Frye, 2017). When the bar was narrow (<15 deg), there was barely one saccade per second (Mongeau & Frye, 2017, Fig. 4). In our magnetic tether assay (Fig. 3A, B), the object width was 11.25 degrees and the object moved for only a short time window, so the fly generated only the saccade related to the onset of the object. Small turns of a few degrees, which are likely related to small perturbations, cannot be considered saccades when compared with those previously reported (Mongeau & Frye, 2017). Additionally, in our protocol (Fig. 3A), from the onset time (‘go’ mark) only a single object moved against an empty background, so in principle there is no trigger for a switch to smooth movement. We addressed this in lines x-x.

      Consider updating the Poggio model with flight saccades (switched, integrate-and-fire).

      We appreciate this suggestion. Following previous work (Mongeau et al., 2017), we expanded our model to include a saccade mechanism: the torque produced by the summed position- and velocity-dependent components is now replaced by an integrate-and-fire saccade (Figure 2–figure supplement 2). We optimized the saccade interval and amplitude so that both vary linearly with stimulus amplitude and faithfully reproduce the kinematic properties reported previously (Mongeau et al., 2017).
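
      The general idea of an integrate-and-fire saccade rule can be sketched as follows. This is a deliberately simplified illustration, not the model in Figure 2–figure supplement 2: the threshold, the gain, the time step, the instantaneous saccade, and the function name `simulate_saccades` are all assumptions, and the sketch does not implement the optimized linear scaling of saccade interval and amplitude described above.

```python
def simulate_saccades(bar_positions, dt=0.01, threshold=0.5, gain=0.5):
    """Toy integrate-and-fire saccade generator (hypothetical parameters).

    bar_positions: bar angle in world coordinates (deg), one value per step.
    The error between bar and heading is integrated over time; when the
    magnitude of the integral reaches `threshold` (deg*s), an instantaneous
    saccade of size gain * error is made toward the bar and the integrator
    is reset. Returns (heading trace, list of step indices with a saccade).
    """
    heading = 0.0
    integrator = 0.0
    headings = []
    saccade_steps = []
    for step, bar in enumerate(bar_positions):
        error = bar - heading          # bar position relative to the midline
        integrator += error * dt       # accumulate the position error
        if abs(integrator) >= threshold:
            heading += gain * error    # fire: turn part of the way to the bar
            integrator = 0.0           # reset after the saccade
            saccade_steps.append(step)
        headings.append(heading)
    return headings, saccade_steps


# A static bar 30 deg to one side for 2 s of simulated time.
headings, saccade_steps = simulate_saccades([30.0] * 200)
print(saccade_steps, headings[-1])
```

      With a static bar, this rule produces a series of saccades whose intervals lengthen as the remaining error shrinks, because a smaller error takes longer to reach the threshold. A fast-moving object would instead drive the integrator to threshold almost immediately.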

      Please engage more with the literature, especially work that directly conflicts with your conclusions (see above). Also, highly relevant work by Bender & Dickinson was not sufficiently discussed. Spot results presented in Fig. 3 should be contextualized in light of the work of Mongeau et al., 2019, who performed similar experiments and identified a switch in saccade valence.

      We appreciate your pointing out the relevant previous work. We have added references to the following papers and described how our data relate to the earlier findings.

      Bender & Dickinson 2006

      (Line#162) “This simulation experiment is reminiscent of the magnetically tethered flight assay, where a flying fly remains fixed at a position but is free to rotate around its yaw axis (Bender and Dickinson, 2006b; Cellini et al., 2022; G. Kim et al., 2023; Mongeau and Frye, 2017).”

      (Line#218) “We tested the predictions of our models with flies flying in an environment similar to that used in the simulation (Figure 3A). A fly was tethered to a short steel pin positioned vertically at the center of a vertically oriented magnetic field, allowing it to rotate around its yaw axis with minimal friction (Bender and Dickinson, 2006b; Cellini et al., 2022; G. Kim et al., 2023).”

      (Line#238) “To determine if our assay imposes additional friction compared to other assays used in previous studies, we analyzed the dynamics of spontaneous saccades during the “freeze” phase (Figure 3–figure supplement 1A). We found their duration and amplitude to be within the range reported previously (Bender and Dickinson, 2006b; Mongeau and Frye, 2017) (Figure 3–figure supplement 1B-D).”

      Mongeau et al., 2019

      (Line#196) “During this behavior, it has been proposed that visual circuits compute an integrated error of the bar position with respect to the frontal midline and trigger a saccadic turn toward the bar when the integrated value reaches a threshold (Frighetto and Frye, 2023; Mongeau et al., 2019; Mongeau and Frye, 2017). We expanded our bar fixation model to incorporate this behavioral strategy (Figure 2–figure supplement 2).”

      This paper shows that the dynamics of saccadic flight turns elicited by a rotating bar or spot determine whether flies display attraction or aversion. In that study, the visual stimulus—a bar or spot—rotated slowly at a constant 75 deg s⁻¹. By contrast, in our Figure 3 the object moves much faster, driving the neural “integrator” to saturation and triggering an almost immediate flight turn. In Mongeau et al. (2019), saccades occur at variable times and their amplitudes and directions are more stochastic, again reflecting the slower stimulus speed. Because these differences all arise from the disparity in object speed, we did not cite Mongeau et al. (2019) in Figure 3 or the associated text.

      In addition to the two papers cited above, we have incorporated several relevant studies on the Drosophila visuomotor control identified through the reviewers’ insightful comments. Examples include:

      Frighetto G, Frye MA. 2023 (Line#195, 464)

      Rimniceanu et al., 2023 (Line#241)

      Cellini & Mongeau 2020 (Line#91)

      Cellini & Mongeau 2022 (Line#241)

      Cellini et al., 2022 (Line#91, 162, 218)

      Many citations are not in the proper format (e.g. using numbers rather than authors' last name).

      Thank you for letting us know. We have changed the remaining citations to the proper format.

    1. Author response:

      We would like to thank the reviewers for their helpful comments and critique of our manuscript. We plan to make the following revisions, which will improve the clarity of our manuscript and the robustness of our findings.

      We will revise methodological details and interpretation throughout the manuscript. In particular, we will consider alternative methods for calculating surrogates. We intend to investigate the relationship between apnoea rate and phase-amplitude coupling at other electrodes as suggested by Reviewer 1, and we will revise the details of the linear-mixed effects models.

      In relation to the comments raised by both Reviewers 2 and 3, we will carefully address the wording throughout the manuscript, including addressing the order of hypotheses, our interpretation of the directionality of the relationship between cortical and respiratory activity, and the connection between cortical-respiratory coupling and apnoea. We will further clarify the limitations of our recording setup and approach, in particular the limited EEG montage, and add further details with regards to sleep state and caffeine.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents an interesting investigation into the role of trained immunity in inflammatory bowel disease, demonstrating that β-glucan-induced reprogramming of innate immune cells can ameliorate experimental colitis. The findings are novel and clinically relevant, with potential implications for therapeutic strategies in IBD. The combination of functional assays, adoptive transfer experiments, and single-cell RNA sequencing provides comprehensive mechanistic insights. However, some aspects of the study could benefit from further clarification to strengthen the conclusions.

      We are grateful for the reviewer’s positive assessment of our study and constructive suggestions to improve the manuscript.

      Strengths:

      (1) This study elegantly connects trained immunity with IBD, demonstrating how β-glucan-induced innate immune reprogramming can mitigate chronic inflammation.

      (2) Adoptive transfer experiments robustly confirm the protective role of monocytes/macrophages in colitis resolution.

      (3) Single-cell RNA sequencing provides mechanistic depth, revealing the expansion of reparative Cx3cr1⁺ macrophages and their contribution to epithelial repair.

      (4) The work highlights the therapeutic potential of trained immunity in restoring gut homeostasis, offering new directions for IBD treatment.

      Weaknesses:

      While β-glucan may exert its training effect on hematopoietic stem cells, performing ATAC-seq on HSCs or monocytes to profile chromatin accessibility at antibacterial defense and mucosal repair-related genes would further validate the trained immunity mechanism. Alternatively, the authors could acknowledge this as a study limitation and future research direction.

      We agree that further epigenetic profiling—such as ATAC-seq analysis on HSCs or monocytes—would provide additional mechanistic depth to our current findings. We will acknowledge this as a limitation of the present study and highlight it as an important direction for future research.

      Comment (1): It’s better to include a schematic summarizing the proposed mechanism for reader clarity.

      We agree that a visual summary will enhance the clarity and accessibility of our findings. We will add a new schematic diagram (Figure 6) illustrating the proposed mechanism of β-glucan–induced myeloid reprogramming and its protective effects in the experimental colitis model.

      Comment (2): Discuss potential off-target effects of β-glucan-induced trained immunity (e.g., risk of exacerbated inflammation in other contexts).

      We appreciate this important comment regarding the potential off-target effects of β-glucan pretreatment. As trained immunity is known to amplify inflammatory responses upon heterologous stimulation and has been implicated in chronic inflammation–prone conditions such as atherosclerosis, this is an important consideration. Previous in vivo studies have shown that β-glucan pretreatment can enhance antibacterial or antitumor responses without inducing basal inflammation after one week of administration (PMID: 22901542, PMID: 30380404, PMID: 36604547, PMID: 33125892). Nevertheless, it remains possible that β-glucan–induced trained immunity could have unintended effects in certain contexts, which warrants further investigation and caution. We will expand the Discussion section to include a dedicated paragraph addressing these potential off-target effects.

      Reviewer #2 (Public review):

      Summary:

      The study investigates whether β-glucan (BG) can reprogram the innate immune system to protect against intestinal inflammation. The authors show that mice pretreated with BG prior to DSS-induced colitis experience reduced colitis severity, including less weight loss, colon damage, improved gut repair, and lowered inflammation. These effects were independent of adaptive immunity and were linked to changes in monocyte function.

      The authors show that the BG-trained monocytes not only help control inflammation but confer non-specific protection against experimental infections (Salmonella), suggesting the involvement of trained immunity (TI) mechanisms. Using single-cell RNA sequencing, they map the transcriptional changes in these cells and show enhanced differentiation of monocytes into reparative CX3CR1⁺ macrophages. Importantly, these protective effects were transferable to other mice via adoptive cell transfer and bone marrow transplantation, suggesting that the innate immune system had been reprogrammed at the level of stem/progenitor cells.

      Overall, this study provides evidence that TI, often associated with heightened inflammatory programs, can also promote tissue repair and resolution of inflammation. Moreover, this BG-induced functional reprogramming can be further harnessed to treat chronic inflammatory disorders like IBD.

      Strengths:

      (1) The authors use advanced experimental approaches to explore the potential therapeutic use of myeloid reprogramming by β-glucan in IBD.

      (2) The authors follow a data-to-function approach, integrating bulk and single-cell RNA sequencing with in vivo functional validation to support their conclusions.

      (3) The study adds to the growing evidence that TI is not a singular pro-inflammatory program, but can adopt distinct functional states, including anti-inflammatory and reparative phenotypes, depending on the context.

      We are grateful for the reviewer’s positive assessment of our study and recognition of its translational implications. We particularly appreciate the acknowledgment that our work expands the therapeutic potential of β-glucan–mediated trained immunity in ameliorating colitis.

      Weaknesses:

      (1) The epigenetic and metabolic basis of TI is not explored, which weakens the mechanistic claim of TI. This is especially relevant given that a novel reparative, anti-inflammatory TI program is proposed.

      We appreciate the reviewer’s valuable comment highlighting the importance of the epigenetic and metabolic basis of TI in providing mechanistic insight. While previous studies, including work from our group (S.-C. Cheng), have extensively characterized the epigenetic and metabolic signatures of monocytes from BG-trained mice—primarily in the context of inflammatory genes—we acknowledge that these aspects are not directly addressed in our current manuscript.

      To strengthen the mechanistic component, we plan to: 1. Reanalyze relevant public datasets, focusing on pathways related to reparative and antibacterial function. 2. Perform monocyte ATAC-seq in our current model to validate the epigenetic changes in these pathways.

      (2) The absence of a BG-only group limits interpretation of the results. Since the authors report tissue-level effects such as enhanced mucosal repair and transcriptional shifts in intestinal macrophages (colonic RNA-Seq), it is important to rule out whether BG alone could influence the gut independently of DSS-induced inflammation.

      Without a BG-only control, it is hard to distinguish a true trained response from a potential modulation caused directly by BG.

      We thank the reviewer for this important suggestion. Although we did not perform qPCR for mucosal repair genes in Figure S1C and Figure S1D, our colon RNA-seq analysis in Figure 5G included a BG-only control group (Colitis_d0). The results from this group indicate that BG preconditioning alone does not alter baseline expression of colon mucosal repair genes, supporting the conclusion that the observed effects occur in the context of DSS-induced inflammation.

      (3) Although monocyte transfer experiments show protection in colitis, the fate of the transferred cells is not described (e.g., homing or differentiation into Cx3cr1⁺ macrophage subsets). This weakens the link between specific monocyte subsets and the observed phenotype.

      (4) While scRNA-seq reveals distinct monocyte/macrophage subclusters (Mono1-3), their specific functional roles remain speculative. The authors assign reparative or antimicrobial functions based on transcriptional signatures, but do not perform causal experiments (depletion or in vitro assays). The biological roles of these cells remain correlative.

      We agree that the functional role of CX3CR1⁺ macrophages is not comprehensively validated and is currently inferred from scRNA-seq clustering. While our flow cytometry data show increased CX3CR1⁺ macrophages in the BG-TI group, and our CCR2 KO and monocyte adoptive transfer experiments indicate these macrophages are monocyte-derived, we lack direct depletion experiments due to the unavailability of effective depletion antibodies for this subset.

      We acknowledge this as a limitation and will clarify in the Discussion that our conclusions regarding CX3CR1⁺ macrophage function are based on transcriptional profiling and association with protective phenotypes, rather than direct causal evidence.

      (5) While Rag1⁻/⁻ mice were used to rule out adaptive immunity, the potential role of innate lymphoid cells (ILCs), particularly ILC2s and ILC3s, which are known to promote mucosal repair (PMID: 27484190), was not explored. Given the reparative phenotype observed, the contribution of ILCs remains a confounding factor.

      We appreciate the reviewer’s valuable comment regarding the potential role of ILCs in the observed mucosal repair. Indeed, in examining the BG-trained immunity effect, the contribution of ILCs was not evaluated. We will explicitly acknowledge in the Discussion that Rag1⁻/⁻ mice retain ILCs (including ILC3s) and that BG-induced activation of these cells remains possible.

      The literature (PMID: 21502992; PMID: 32187516) supports a role for ILC3-mediated IL-22 production in tissue repair, which could overlap with our observed effects. However, our monocyte adoptive transfer experiments show that monocytes alone can alleviate DSS-induced colitis, suggesting a dominant role for monocytes in this context. Nonetheless, we will make it clear that ILC contributions cannot be excluded.

      Reviewer #3 (Public review):

      Summary:

      In the present work, Yinyin Lv et al. offer evidence for the therapeutic potential of trained immunity in the context of inflammatory bowel disease (IBD). Prior research has demonstrated that innate cells pre-treated (trained) with β-glucan show an enhanced pro-inflammatory response upon a second challenge.

      While an increased immune response can be beneficial and protect against bacterial infections, there is also the risk that it will worsen symptoms in various inflammatory disorders. In the present study, the authors show that mice preconditioned with β-glucan have enhanced resistance to Staphylococcus aureus infection, indicating heightened immune responses.

      The authors demonstrate that β-glucan training of bone marrow hematopoietic progenitors and peripheral monocytes mitigates the pro-inflammatory effects of colitis, with protection extending to naïve recipients of the trained cells.

      In a dextran sulfate sodium (DSS)-induced model of colitis, β-glucan pre-treatment significantly dampens disease severity. Importantly, the use of Rag1⁻/⁻ mice, which lack adaptive immune cells, confirms that the protective effects of β-glucan are mediated by innate immune mechanisms. Further, experiments using Ccr2⁻/⁻ mice underline the necessity of monocyte recruitment in mediating this protection, highlighting CCR2 as a key factor in the mobilization of β-glucan-trained monocytes to inflamed tissues. Transcriptomic profiling reveals that β-glucan training upregulates genes associated with pattern recognition, antimicrobial defense, immunomodulation, and interferon signaling pathways, suggesting broad functional reprogramming of the innate immune compartment. In addition, β-glucan training induces a distinct monocyte subpopulation with enhanced activation and phagocytic capacity. These monocytes exhibit an increased ability to infiltrate inflamed colonic tissue and differentiate into macrophages, marked by increased expression of Cx3cr1. Moreover, among these trained monocyte and macrophage subsets, other gene expression signatures are associated with tissue and mucosal repair, suggesting a role in promoting resolution and regeneration following inflammatory insult.

      Strengths:

      (1) Overall, the authors present a mechanistically insightful investigation that advances our understanding of trained immunity in IBD.

      (2) By employing a range of well-characterized murine models, the authors investigate specific mechanisms involved in the effects of β-glucan training.

      (3) Furthermore, the study provides functional evidence that the protection conferred by the trained cells persists within the hematopoietic progenitors and can be transferred to naïve recipients. The integration of transcriptomic profiling allows the identification of changes in key genes and molecular pathways underlying the trained immune phenotype.

      (4) This is an important study that demonstrates that β-glucan-trained innate cells confer protection against colitis and promote mucosal repair, and these findings underscore the potential of harnessing innate immune memory as a therapeutic approach for chronic inflammatory diseases.

      We thank the reviewer for their positive evaluation and constructive feedback on our manuscript.

      Weaknesses:

      However, FPKM is not ideal for between-sample comparisons due to its within-sample normalization approach. Best practices recommend using raw counts (with DESeq2) for more robust statistical inference.

      We appreciate the reminder about best practices for RNA-seq analysis. We apologize for the inaccurate description in the Materials and Methods section. For all differential expression analyses, we have in fact used raw count data as input for DESeq2. FPKM values were only used for visualization purposes, such as in heatmaps and clustering analyses. We will correct this description in the revised manuscript to accurately reflect our analysis workflow.
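
      The reviewer's concern about FPKM can be made concrete with a short calculation. The sketch below uses the standard FPKM formula (count × 10⁹ divided by gene length in bp times total mapped fragments). The gene names, counts, and lengths are made up for illustration, and the function name `fpkm` is hypothetical. It does not reproduce any analysis from the manuscript.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase per million mapped fragments, per gene.

    counts: dict of gene -> raw fragment count for one sample.
    lengths_bp: dict of gene -> gene length in base pairs.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped fragments")
    return {gene: c * 1e9 / (lengths_bp[gene] * total) for gene, c in counts.items()}


# Two hypothetical samples: gene_a has the same raw count in both,
# but gene_b is much more abundant in sample B.
lengths = {"gene_a": 1000, "gene_b": 1000}
sample_a = {"gene_a": 100, "gene_b": 900}
sample_b = {"gene_a": 100, "gene_b": 9900}

fpkm_a = fpkm(sample_a, lengths)
fpkm_b = fpkm(sample_b, lengths)
print(fpkm_a["gene_a"], fpkm_b["gene_a"])
```

      Because each value is scaled by the total of its own sample, the FPKM of one gene depends on every other gene in that sample. This is why raw counts are passed to DESeq2, which estimates its own size factors, while FPKM values are kept for visualization such as heatmaps and clustering.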

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary

      In this study, Takagi and colleagues demonstrate that changes in axonal arborization of the segmental wave motor command neurons are sufficient to change behavioral motor output.

      The authors identify the Wnt receptors DFz2 and DFz4 and the ligand Wnt4 as modulators of stereotypic segmental arborization patterns of segmental wave neurons along the anterior-posterior body axis. Based on both embryonic expression pattern analysis and genetic manipulation of the signaling components in wave neurons (receptors) and the neuropil (Wnt4), the authors convincingly demonstrate that Wnt4 acts as a repulsive ligand for DFz2 that restricts posterior axon guidance of both anterior and posterior wave neurons. They also provide the first evidence that Wnt4 potentially acts as an attractive ligand for DFz4 to promote the posterior extension of p-wave neurons. Interestingly, artificial optogenetic activation of all wave neurons that normally induces backward locomotion due to the activity of anterior wave neurons, fails to induce backward locomotion in a DFz2 knockdown condition with altered axonal extensions of all wave neurons towards posterior segments. In addition, the authors now observe enhanced fast-forward locomotion, a feature normally induced by posterior wave neurons. Consistent with these findings, they observe that the natural response to an anterior tactile stimulus is similarly altered in DFz2 knockdown animals. The animals respond with less backward movement and increased fast forward motion. These results suggest that alterations in the innervation pattern of wave motor command neurons are sufficient to switch behavioral response programs.

      Strengths

      The authors convincingly demonstrate the importance of Wnt signaling for anterior-posterior axon guidance of a single class of motor command neurons in the larval CNS. The demonstration that alteration of the expression level of a single axon guidance receptor is sufficient to not only alter the innervation pattern but to significantly modify the behavioral response program of the animal provides a potential entry point to understanding behavioral adaptations during evolution.

      Weaknesses

      While the authors demonstrate an alteration of the behavioral response to a natural tactile stimulus, the observed effects, a reduction of backward motion and increased fast-forward locomotion, currently cannot be directly correlated to the morphological alterations observed in the single-neuron analyses. The authors do not report any loss of innervation in the "normal" target region but only a small additional innervation of more posterior regions. An analysis of synaptic connectivity and/or a more detailed morphological analysis that is supported by a larger number of analyzed neurons both in control and experimental animals would further strengthen the confidence of the study. As the authors suggest an alteration of the command circuitry, a direct observation of the downstream activation pattern in response to selective optogenetic stimulation of anterior wave neurons would further strengthen their claims (analogous to Takagi et al., 2017, Figure 4).

      We sincerely thank the reviewer for their insightful comments, which were instrumental in improving our manuscript. In response to the reviewer’s suggestion, we have now studied Brp expression and demonstrate that the ectopically extending Wave axons in the posterior region do contain synapses (new Figure 2). This finding supports the idea that these axons are functionally connected to ectopic downstream circuits.

      Additionally, we have increased the number of analyzed Wave clones in Figure 1F-J (WT and DFz2 KD) and new Figure 3C-G (WT; formerly Figure 2C-G) to strengthen the morphological analyses. We fully agree with the reviewer that “direct observation of the downstream activation pattern in response to selective optogenetic stimulation” would further reinforce our conclusions. However, this was not feasible in the current study since we found that the Wave-Gal4 driver used in this study, which drives expression during embryonic stages, does not drive sufficiently strong expression in the larvae to enable selective optogenetic stimulation (please see below for details). 

      Reviewer #2 (Public Review):

      Summary:

      The authors previously demonstrated that anterior-located a-Wave neurons (neuromeres A1-A3) extend axons anteriorly to connect to circuits inducing backward locomotion, while p-Wave axons (neuromeres A4-A7) project posteriorly to promote forward locomotion in Drosophila larvae. In the manuscript, the authors aim to determine the molecular mechanisms that wire the segmentally homologous Wave neurons distinctively, so that they are functionally different in modulating forward or backward locomotion. The genetic screen focused on Wnt/Fz-signaling due to its known anterior-to-posterior guidance roles in mammals and nematodes.

      Strengths:

      Knock-down (KD) of DFz2 with two independent RNAi lines caused ectopic posterior axon and dendrite extension for all a- and p-Wave neurons, with a-Wave axons extending into regions where p-Wave axons normally project. Both behavioral assays (optogenetic stimulation of all Wave neurons or tactile stimuli on heads using a von Frey filament) show that backward movement is reduced or absent and that the speed of evoked fast-forward locomotion is increased. This demonstrates that altered projections of Wave neurons do alter behavior and that the DFz2 KD phenotype is consistent with the potential aberrant wiring of a-Wave neurons to forward locomotion-promoting circuits instead of to backward locomotion-promoting circuits.

      The main conclusion, that Wnt/Fz-signaling is essential for the guidance of Wave neurons and in diversifying their projection pattern in a segment-specific manner, is further supported by the results showing that DFz2 gain of function causes shortening of a-Wave but not p-Wave axon extensions towards the posterior end and that KD of DFz4 causes axonal shortening only in A6 p-Wave neurons but does not affect dendrites or processes of other Wave neurons. A role for the ligand Wnt4 is demonstrated by results indicating that, in Wnt4 mutants, the posterior extension of a-Wave axons was elongated, similar to DFz2 KD animals, and p-Wave axon extension towards the posterior end was shortened, similar to DFz4 KD animals. Finally, a DWnt4 gradient decreasing from the posterior (A8) to the anterior end (A2), similar to that described in other species, is supported by analyses of DWnt4 gene expression (using Wnt4 Trojan-Gal4) and protein expression (using antibodies). In contrast, DFz2 receptor levels seemed to decrease from the anterior (A2) to the posterior end (A5/6). Together the results support the conclusion that opposing Wnt/Fz ligand-receptor gradients contribute to the diversification of Wave neurons in a location-dependent manner and that DFz2 and DFz4 have opposing effects on axon extension.

      Weaknesses:

      Wave axon and dendrite projections are not exclusively determined by Wnt4, DFz2, and DFz4, and are likely to involve other Fz receptors, Wnt ligands, and other types of receptor-ligand signaling pathways. This is in part supported by the fact that Wnt4 loss of function also resulted in phenotypes that do not mimic DFz2 KD or DFz4 KD (Figures 3D, E, and F) and that other Fz/Wnt mutants caused Wave neuron phenotypes (Figure 1-supplement 2, D+E). This is not a weakness per se, since it doesn't affect the main conclusion of the manuscript. However, the description and analyses of the data, in particular for Figure 1-supplement 2 D, should be clarified in the legend. The numbers within the bars and the asterisks are not defined. It's presumed they refer to numbers of animals assessed and that the asterisks next to DFz2 and DFz4 indicate statistically significant differences. However, only one p-value is provided in the legend. It is also unclear if p-values for the other mutants have not been determined or are non-significant. At least for mutants like Corin, which also exhibit altered axon projections, the p-values should be provided.

      We appreciate this reviewer’s careful attention to detail and intellectual curiosity. We apologize for the confusion caused by the statistical reporting in Figure 1 – figure supplement 2D. The numbers shown in the bars represent the number of neurons (i.e. Wave neurons from the left or right hemisphere). As mentioned in the Materials and Methods section, we applied a Chi-square test followed by Haberman's adjusted residual analysis to determine the statistical significance of each RNAi group. The p-value provided in the figure legend corresponds to the Chi-square test. P-values for Haberman's adjusted residual analysis were calculated for all RNAi groups, and groups without an asterisk are not statistically significant. We have clarified these points in the corresponding figure legend.
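
      The two-step procedure named above (an overall Chi-square test, then Haberman's adjusted residuals per cell) can be sketched as follows. This is only a minimal illustration, not the authors' analysis code: the function names and the counts are hypothetical, and the sketch assumes every row and column total is non-zero and smaller than the grand total.

```python
import math

def adjusted_residuals(table):
    """Return (chi2, residuals) for a contingency table given as a list of rows.

    residuals[i][j] is Haberman's adjusted residual for cell (i, j):
    (observed - expected) / sqrt(expected * (1 - row_i/N) * (1 - col_j/N)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            res_row.append((observed - expected) / math.sqrt(variance))
        residuals.append(res_row)
    return chi2, residuals

def two_sided_p(z):
    """Two-sided p-value of a residual under the standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts (NOT data from the study): rows are groups
# (control, one RNAi line), columns are neurons scored as normal / altered.
counts = [[18, 2], [8, 12]]
chi2, residuals = adjusted_residuals(counts)
p_values = [[two_sided_p(z) for z in row] for row in residuals]
```

      In a table with several RNAi groups, each cell gets its own residual and p-value, which matches the idea of marking individual groups with an asterisk after a single overall Chi-square test. Any correction for multiple comparisons would be an additional choice not shown here.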

      Figure 4 D, F. The gradient for Wnt4 was determined by comparison of expression levels of other segments to A8 but the gradient for DFz2 was by comparison to A2 and the data supports opposing gradients. However, for DFz2 (Figure 4, F) it seems that the gradient is bi-directional with the lowest being in A5 and increasing towards A2 as well as A8. Analysis should be performed in reference to A8 as well to determine if it is indeed bi-directional. While such a finding would not affect the interpretation of a-Wave neurons, it may impact conclusions about p-Wave neuron projections.

      We thank the reviewer for highlighting this interesting possibility. In response, we performed an additional analysis of the DFz2 gradient by comparing the signal from each neuromere to that from A8 (new Figure 5—figure supplement 3). This analysis confirmed that the gradient is indeed bidirectional. We revised the description of DFz2 expression accordingly in the revision. We believe this finding does not affect our main conclusions since only the anterior gradient is relevant for a-Wave axon guidance. 
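
      As a rough illustration of the re-analysis described above, the sketch below expresses the signal of each neuromere relative to a chosen reference neuromere and checks whether the minimum lies in the interior of the series, which is what a bidirectional gradient would look like. The function names and intensity values are hypothetical and are not taken from the study.

```python
def relative_to_reference(signal, reference):
    """Divide each neuromere's signal by the signal of the reference neuromere."""
    ref_value = signal[reference]
    return {segment: value / ref_value for segment, value in signal.items()}

def minimum_is_interior(signal, order):
    """True if the lowest signal is at neither end of the ordered neuromeres."""
    lowest = min(order, key=lambda segment: signal[segment])
    return lowest not in (order[0], order[-1])

# Hypothetical mean intensities per neuromere (arbitrary units, NOT measured data).
order = ["A2", "A3", "A4", "A5", "A6", "A7", "A8"]
intensity = {"A2": 1.4, "A3": 1.2, "A4": 1.0, "A5": 0.8, "A6": 0.9, "A7": 1.1, "A8": 1.6}

relative_to_a8 = relative_to_reference(intensity, "A8")
bidirectional = minimum_is_interior(intensity, order)
```

      Changing the reference only rescales the values, so the shape of the profile is the same whether A2 or A8 is used; what changes is which pairwise comparisons against the reference are tested statistically.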

      As discussed above, the DFz2 KD phenotypes are consistent with the potential aberrant wiring of a-Wave neurons to forward locomotion-promoting circuits instead of to backward locomotion-promoting circuits. However, since the axons and dendrites of both a-Wave and p-Wave neurons are affected, the actual dendritic and axonal contributions to the altered behavior remain elusive. The authors certainly considered a potential contribution of altered dendrite projection of a-Wave neurons to the phenotype, and their conclusion that altered axonal projections are involved is supported by the optogenetic experiment "bypassing" sensory input (albeit it seems unlikely that all Wave neurons are activated simultaneously when perceiving natural stimuli). However, the authors should also consider that altered perception and projection of p-Wave neurons may directly (e.g. extended p-Wave axon projections increase forward locomotion input, thereby overriding backward locomotion) or indirectly (e.g. feedback loops between forward and backward circuits) contribute to the altered behavioral phenotypes in both assays. It is probably noteworthy that the more complex behavioral alterations observed with mechanical stimulation are likely to also be caused by altered dendritic projections.

      We fully agree with the reviewer’s thoughtful interpretation. We have now included these important possibilities in the revised Discussion section. Specifically, we acknowledge that while the DFz2 knockdown phenotypes are consistent with aberrant wiring of a-Wave neurons to forward locomotion-promoting circuits, the contributions of both axonal and dendritic alterations remain unclear. We also recognize that altered perception and projection of p-Wave neurons may directly or indirectly contribute to the observed behavioral phenotypes, particularly in response to mechanical stimulation.

      Presynaptic varicosities of a-Wave neurons in DFz2 KD animals are indicated by orange arrows in Figure 1. However, no presynaptic markers have been used to confirm actual ectopic synaptic connections. At least the authors should more clearly define what parameters they used to "visually" define potential presynaptic varicosities. Some arrows seem to point to more "globular structures" but for several others, it's unclear what they are pointing at.

      As mentioned in our response to Reviewer #1, we have now performed Brp immunostaining to confirm the presence of ectopic synaptic connections (new Figure 2). This analysis supports the interpretation that the presynaptic varicosities observed in DFz2 knockdown animals represent actual synaptic sites. We also clarified in the figure legend the visual criteria used to identify potential presynaptic varicosities.

      Reviewing Editor (Recommendations For The Authors):

      There are a few major concerns that we recommend the authors address:

      (1) Neuroanatomy: The point about aberrant synaptic connectivity of a-Wave neurons following Dfz2 knockdown could be better substantiated. This could be done by using a presynaptic marker and showing ectopic posterior presynaptic sites (and/or reduced anterior presynaptic sites) in a-wave neurons.

      As mentioned in our response to the public review, we now have used Brp as a presynaptic marker to quantify the number and distribution of presynaptic sites along the normal and ectopic a-Wave axons (new Figure 2). We show that ectopic posterior Wave axons do contain presynaptic sites.  

      (2) Gradient calculations: As detailed in the reviews below, the Dfz2 gradient looks like it may be bidirectional. Changing the way the gradient is calculated might help address this point.

      As mentioned in our response above, we now have recalculated the gradient by comparing the DFz2 signal to A8 and show that it indeed is bidirectional (new Figure 5—figure supplement 2; formerly Figure 4—figure supplement 2).

      (3)  Statistics and sample sizes: As detailed in the reviews, some of the statistical reporting could be improved. Further, increasing sample sizes could help bolster confidence in the data as well.

      As mentioned above, we have added a description on the sample size, asterisks, and p-values in Figure 1 – figure supplement 2 legend. We also increased sample sizes of single Wave neurons in control and DFz2 knock-down animals (Figure 1F-J (WT and DFz2 KD) and new Figure 3C-G (WT; formerly Figure 2C-G)).

      (4) It would help to include some discussion of the potential contributions of altered p-wave neurons to the observed phenotypes.

      As described above, we have added in the Discussion potential contributions of altered p-wave neurons to the observed phenotypes. 

      Reviewer #1 (Recommendations For The Authors):

      (1) In the current model the authors assume that posterior elongation of a-wave neuron connectivity (axonal projections) induces a loss of connectivity to their natural targets, as backward motion is no longer induced, and a gain of connectivity to posterior wave neuron targets. Is this at the cost of innervation of p-wave neurons, e.g. did these neurons now lose connectivity to their natural targets as well? Therefore, it would be very interesting if the authors would test the behavioral responses to tactile stimuli in the posterior parts of the animal - does the response pattern change?

      This is indeed an interesting possibility that p-Wave function is altered upon DFz2 knock-down and hence behavioral response to posterior touch is changed. However, it is technically challenging to test this with tactile stimuli, due to the difficulty of (1) distinguishing between normal and fast-forward locomotion and (2) delivering a posterior touch stimulus while the larva is moving forward, which is the default behavior of the larvae on an agar plate.

      As highlighted above, the authors should provide additional evidence that the circuit response to a-wave neurons is changed after a DFz2 knockdown. The authors should monitor the activation wave in response to optogenetic activation of anterior wave neurons - analogous to the data provided in Figure 4 of their 2017 paper. If this response is now switched for a-wave activation but not p-wave activation it would greatly support their claims and this data would be less ambiguous compared to the behavioral locomotion data.

      As described in our response to the public review, we attempted this approach but found that the in vitro optogenetics experiment is unfortunately not feasible due to relatively weak expression of R60G09-GAL4 in the larvae. Local activation of control a-Wave induced fictive backward locomotion only at low frequencies, making comparison with the experimental a-Wave very difficult. The MB120B-spGAL4 used in our 2017 study could not be employed in this study as it does not drive expression during the embryonic stages and thus cannot be used to knock down DFz2 during development.

      (2) Related to this point. Why would the normal "backward" circuitry of a-wave neurons be functionally suppressed in Dfz2 knockdowns? Do the authors observe reduced synaptic connectivity in these segments? Vesicle clustering of synaptotagmin or other presynaptic markers could be used as a first. As the innervation pattern is only extended by approximately one segment, it is surprising that the changes are so significant.

      We agree that these are important and interesting points, which remain to be explored in future studies. As described above, we have performed Brp immunostaining and showed that the posterior ectopic axons of a-Wave do contain synapses (new Figure 2). We also found a slight decrease in the number of synapses in the anterior region, which could partially contribute to the weaker activation of downstream neurons responsible for eliciting backward locomotion. Another possibility is that backward suppression occurs through lateral interaction among downstream circuits. Since forward and backward locomotion do not occur simultaneously, it is likely that the circuits driving these two behaviors are mutually inhibitory. Upon DFz2 knock-down in a-Wave, downstream neurons inducing fast-forward locomotion may become more strongly activated than those inducing backward locomotion, resulting in inhibition of the latter via a “winner-take-all” mechanism. Since these discussions are highly speculative, we chose not to include them in the revised manuscript.

      (3) The low number of neurons analyzed per segment is of slight concern. This is particularly the case for the control data set used in Figure 1 and Figure 2. As stated, the same datasets are used for both figures. However, at most 6 neurons were analyzed (and for two segments only 3). The control morphology may be more variable than indicated by this data.

      As mentioned above, we now have dissected 50 larvae each for the control and experimental groups, obtained seven and six clones respectively, and included these data in the revised manuscript. We apologize that the sample sizes are still relatively small but hope the reviewer understands the inherently low “hit rate” of the stochastic labelling method.

      It is somewhat curious that in Figure 1- Supplement 3 the authors report the same number of control clones per segment as in Figure 1/2 - is this simply a coincidence? And if this is an independent dataset why did the author use new controls here but not for Figure 2? It is clear that it is very difficult to generate this data but increasing the n-number beyond 3-6 per segment would significantly increase the confidence in the presented data.

      We apologize for the confusion. The data in Figure 1 – figure supplement 3 represent the innervation pattern of dendrites, not axons. We have corrected the figure caption accordingly. These data were obtained from the same samples used to analyze axonal innervation, as shown in the original version of Figure 1F-J.

      (2) The name of the RNAi lines should be indicated in Figure 1 and Figure Supplement 3 to facilitate reading - at least the precise names should be given in both figure legends.

      We have added these labels in the revised figure legends as requested.

      (3) In Figure 4E again the control numbers of Figure 1 for the A2-wave axon are reused. This does not seem appropriate as now a different Gal4 driver is used and a different method to induce individual neuronal clones. Both components may induce significant variability in expression or arborization. As only 3 clones for the wnt4 mutant condition are analyzed (and compared to 5 control clones), this data does not allow for strong conclusions. The authors clearly state the reuse and different methods in the legend of Figure 4 F/G but should also highlight it for the E panel.

      Here, we assume that the reviewer is referring to the former Figure 3 (now Figure 4). We have added a note in the legend that the control data, obtained using a different method, were reused in this panel.

      (4) The expression levels of DWnt4 and DFz2 were analyzed at the end of embryogenesis. At what developmental stage does the axonal extension of wave neurons take place? Is the gradient maintained throughout the first larval stages?

      Based upon the lateral view of Wave neurons in Figure 1—figure supplement 1D, we think that the axonal extension is already established by approximately 20 hr after egg laying. Previously, we performed Wnt4<sup>MI03717-Trojan-GAL4</sup> > GFP.nls immunostaining in the third instar larva and observed a similar gradient of GFP signals towards the posterior end of the ventral nerve cord (VNC). We have included this data in the revised manuscript (new Figure 5—figure supplement 1).

      (5) The authors state that either 2nd or 3rd instar larvae were used for the optogenetic experiments. This may induce unnecessary variation in their assay and should be avoided. As natural variance exists in larvae regarding forward stride duration, the comparison of "on" state forward stride duration between control and experimental genotype is potentially not the best measurement of effect size. What is the difference between OFF and ON stage within the control and experimental genotype? In both cases stride duration decreases but there may not be a significant difference between the delta of the two genotypes. Thus, the observed effect may in part be due to "slower" animals in the control pool. The authors should discuss this more carefully.

      We thank the reviewer for bringing up this critical issue. Indeed, the stride durations of larvae in the control and DFz2 knock-down groups are slightly different in the OFF condition, although this difference is not statistically significant. In addition, the effect size of Wave activation on mean stride duration is -0.14 s in control and -0.21 s in DFz2 knock-down, which we interpret as DFz2 knock-down resulting in stronger fast-forward locomotion upon Wave activation. We have incorporated this note in the corresponding figure legends (new Figure 6; formerly Figure 5).
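
      The within-genotype comparison requested by the reviewer can be sketched as a difference of means (ON minus OFF) computed separately for each genotype, so that a baseline difference between genotypes does not inflate the apparent effect. The stride durations below are hypothetical placeholders chosen for readability, not values from the study, and the function name is an assumption.

```python
from statistics import mean

def activation_effect(off_durations, on_durations):
    """Change in mean stride duration (seconds) from light OFF to light ON."""
    return mean(on_durations) - mean(off_durations)

# Hypothetical stride durations in seconds (NOT measured data).
control = {"off": [1.0, 1.2], "on": [0.9, 1.0]}
knockdown = {"off": [0.9, 1.1], "on": [0.7, 0.8]}

effect_control = activation_effect(control["off"], control["on"])
effect_knockdown = activation_effect(knockdown["off"], knockdown["on"])

# A more negative value means a larger shortening of stride duration,
# i.e. faster forward locomotion upon activation.
difference_of_effects = effect_knockdown - effect_control
```

      Whether the difference between the two effects is itself significant would need a separate test (for example on the genotype-by-light interaction), which this sketch does not perform.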

      (6) While the study clearly provides convincing evidence for their model, the authors should tune down their conclusions in the discussion a little bit and highlight that parts of their discussion are speculative.

      We have revised the discussion as suggested.

      Reviewer #2 (Recommendations For The Authors):

      Although the optogenetic behavioral experiments strongly support that the altered axonal projections affect normal locomotion, simultaneous labeling of Wave neurons in DFz2 KD animals with presynaptic markers would strengthen the conclusion of ectopic connection of the extended axons with other circuits.

      Please see our response to your public review.

      Figure 1 K+L, Figure 2H, I, Figure 3 F+G: many of the individual data points are not visible in the Whisker plot- changing their color would be useful to visualize them better.

      We have changed the outline width of the box plots to make the individual data points visible.

      Figure 1-Supplement 2: In addition to the comments in the public review- a) the asterisk font size changes in the different panels, e.g. it is much smaller in G', b) font size in some graphs/legends should be increased - in particular in E the hyphenated letters in the genotypes are so small rendering them almost illegible.

      We have unified the font size to make them readable in the figure. We thank the reviewer for the suggestions.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The crystal structure of the Sld3CBD-Cdc45 complex presented by Li et al. is a significant contribution that enhances our understanding of CMG formation during the rate-limiting step of DNA replication initiation. This structure provides crucial insights into the intermediate steps of CMG formation, and the particle analysis and model predictions compellingly describe the mechanism of Cdc45 loading. Building upon previously known Sld3 and Cdc45 structures, this study offers new perspectives on how Cdc45 is recruited to MCM DH through the Sld3-Sld7 complex. The most notable finding is the structural rearrangement of Sld3CBD upon Cdc45 binding, particularly the α8-helix conformation, which is essential for Cdc45 interaction and may also be relevant to its metazoan counterpart, Treslin. Additionally, the conformational shift in the DHHA1 domain of Cdc45 suggests a potential mechanism for its binding to Mcm2NTD. Furthermore, Sld3's ssDNA-binding experiments provide evidence of its novel functions in the DNA replication process in yeast, expanding our understanding of its role beyond Cdc45 recruitment.

      Strengths:

      The manuscript is generally well-written, with a precise structural analysis and a solid methodological section that will significantly advance future studies in the field. The predictions based on structural alignments are intriguing and provide a new direction for exploring CMG formation, potentially shaping the future of DNA replication research. This research also opens up several new opportunities to utilize structural biology to unravel the molecular details of the model presented in the paper.

      Weaknesses:

      The main weakness of the manuscript lies in the lack of detailed structural validation for the proposed Sld3-Sld7-Cdc45 model, and its CMG bound models, which could be done in the future using advanced structural biology techniques such as single particle cryo-electron microscopy. It would also be interesting to explore how Sld7 interacts with the MCM helicase, and this would help to build a detailed long-flexible model of Sld3-Sld7-Cdc45 binding to MCM DH and to show where Sld7 will lie on the structure. This will help us to understand how Sld7 functions in the complex. Also, future experiments would be needed to understand the molecular details of how Sld3 and Sld7 release from CMG is associated with ssARS1 binding.

      The proposals based on this study provide new knowledge of the CMG formation process. We agree that our Sld3-Sld7-Cdc45 model will be further confirmed by cryo-EM. We improved our ssARS1-binding assay and quantified data (See the response to Recommendations for the authors of #3 review).

      Reviewer #2 (Public review):

      Summary

      The manuscript presents valuable findings, particularly in the crystal structure of the Sld3CBD-Cdc45 interaction and the identification of additional sequences involved in their binding. The modeling of the Sld7-Sld3CBD-CDC45 subcomplex is novel, and the results provide insights into potential conformational changes that occur upon interaction. Although the single-stranded DNA binding data from Sld3 of different species is a minor weakness, the experiments support a model in which the release of Sld3 from the complex may be promoted by its binding to origin single-stranded DNA exposed by the helicase.

      Strengths

      The Sld3CBD-Cdc45 structure is a novel contribution, revealing critical residues involved in the interaction.

      The model structures generated from the crystal data are well presented and provide valuable insights into the interaction sequences between Sld3 and Cdc45.

      The experiments testing the requirements for interaction sequences are thorough and conducted well, with clear figures supporting the conclusions.

      The conformational changes observed in Sld3 and Cdc45 upon binding are interesting and enhance our understanding of the interaction.

      The modeling of the Sld7-Sld3CBD-CDC45 subcomplex is a new and valuable addition to the field.

      The proposed model of Sld3 release from the complex through binding to single stranded DNA at the origin is intriguing.

      Weaknesses

      The section on the binding of Sld3 complexes to origin single-stranded DNA is somewhat weakened by the use of Sld3 proteins from different species. The comparisons between Sld3-CBD, Sld3CBD-Cdc45, and Sld7-Sld3CBD-Cdc45 involve complexes from different species, limiting the comparisons' value.

      Although the study reveals that Sld3 binds to different residues of Cdc45 than those previously shown to bind Mcm or GINS, the data in the paper do not shed any additional light on how GINS and Sld3 binding to Cdc45 or MCM would affect each other. Other previous research has suggested that the binding of GINS and Sld3 to Mcm or Cdc45 may be mutually exclusive. The authors acknowledge that a structural investigation of Sld3, Sld7, Cdc45, and MCM during the stage of GINS recruitment will be a significant goal for future research.

      We agree that it is better to use all samples from a single source; however, due to limitations in protein expression, we used Sld7-Sld3CBD-Cdc45 from a different source. The two sources used in this study belong to the same family, and the proteins Sld7, Sld3 and Cdc45 share sequence conservation, with similar structures predicted by AlphaFold3 (RMSD = 0.356, 1.392, and 0.891 for Cα atoms of Sld7CTD, Sld7NTD-Sld3NTD, and Sld3CBD-Cdc45). Such similarity in source and proteins allows us to make the comparison. We also mentioned in our manuscript that a cryo-EM study of Sld3-Sld7-Cdc45-MCM and Sld3-Sld7-CMG structures will be a significant goal for future research.

      Reviewer #3 (Public review):

      Summary:

      The paper by Li et al. describes the crystal structure of a complex of Sld3-Cdc45-binding domain (CBD) with Cdc45 and a model of the dimer of an Sld3-binding protein, Sld7, with two Sld3-CBD-Cdc45 for the tethering. In addition, the authors showed the genetic analysis of the amino acid substitution of residues of Sld3 in the interface with Cdc45 and biochemical analysis of the protein interaction between Sld3 and Cdc45 as well as DNA binding activity of Sld3 to the single-strand DNAs of the ARS sequence.

      Strengths:

      The authors provided a nice model of an intermediate step in the assembly of an active Cdc45-MCM-GINS (CMG) double hexamers at the replication origin, which is mediated by the Sld3-Sld7 complex. The dimer of the Sld3-Sld7 complexes tethers two MCM hexamers together for the recruitment of GINS-Pol epsilon on the replication origin.

      Weaknesses:

      The biochemical analysis should be carefully evaluated with more quantitative ways to strengthen the authors' conclusion even in the revised version.

      In this revision, we improved our ssARS1-binding assay in more quantitative ways (See the response to Recommendations for the authors).

      Reviewer #1 (Recommendations for the authors):

      I thank the authors for all their replies to my previous questions and for doing all the necessary corrections. I am satisfied with most of their replies, however, upon second reading I have a few more suggestions which could help to improve the manuscript further and make an impact in the field. My comments are listed below.

      (1) In general, the manuscript is well structured, but I feel that it requires professional English correction. In many places it was difficult to understand the sentences and I had to read it several times to understand it. Also, very long sentences should be avoided. The flow should be easy to read and understand, and that is why I feel it requires professional English correction.

      Following the comment, we checked English carefully and shortened the very long sentences.

      (2) Page 5, line 103, please include molecule after the word complex to make it like- "Only one complex molecule exists within an asymmetric unit."

      We revised this sentence (P5/L103).

      (3) Line 113- more than the N-terminal half of the protruding long helix α7 was disordered in the Sld3CBD-Cdc45 complex. This sentence is not clear. What does it mean more than the N-terminal half? Please rewrite it.

      We revised this sentence to give the corresponding residue number “(D219–H231)” (P5/L114).

      (4) Page 5, result 2- Conformation changes in Sld3CBD and Cdc45 for binding each other, this section may require a little restructuring. Line 130-131- "Therefore, the helix α8CTP seems to be an intrinsically disordered segment when Sld3 alone but folds into a helix coupled to the binding partner Cdc45 in the Sld3CBD-Cdc45 complex." This statement is the crux of the structural finding and therefore, I feel it should move after the first sentence.

      Thank you for your comments. We rewrote this part (P5/L128-131).

      (5) Line 121-122: Compared to the isolated form (PDBIDs: 5DGO for huCdc45 [31] and 6CC2 for EhCdc45 [33]) and the CMG form (PDBID: 3JC6. Write it in the same format. Make 6CC2 in bracket like other PDB IDs. Restructure this sentence.

      We revised this sentence (P5/122-123).

      (6) Line 127-129: This sentence is also not very clear.

      We revised this sentence together with above No (4). (P5/L128-131)

      (7) In my question 4- "Can authors add a supplementary figure showing the probability of disordernes..."., I meant to use a disorder prediction tool like IUPred for the protein sequences and show that α8 is predicted to be disordered upon sequence analysis. This will help to show the inherent property of the α8 helix, and it could add to the understanding that a disordered region becomes structured in the complex structure.

      The structures showed that α8CTP is stabilized by binding with Cdc45, but disordered in Sld3CBD alone, indicating that this part is flexible, like an intrinsically disordered segment. We have deposited the structure to PDB, so predictions like IUPred cannot show meaningful information.

      (8) Question 9 regarding Supplementary Figure 8- Please include your statement in the figure legend - "WT Sld3CBD was prepared in a complex with Cdc45, while the mutants of Sld3CBD existed alone, we calculated the elements of secondary structure from the crystal structure of Sld3CBD-Cdc45. The concentration of samples was controlled to the same level for CD measurement."

      Following the comment, we optimized the figure legend of Supplementary Figure 8.

      (9) Question 13- I understand that negative staining and SEC-SAXS experiments could be very tricky for such protein complexes, which have very long loops and are flexible. Did authors try a GraFix cross-linking before doing the negative staining TEM? If it is not being tried, then it might be a good idea to try it and it may help to get much cleaner particles and easier class averaging. Although I completely understand the technical challenges the authors describe and I agree with them, I still feel that one good experiment that shows this dimer model would be very helpful to strengthen the claim. I am concerned because if people start using a similar DLS experiment to calculate intermolecular distances, citing your paper, in many cases it might be a wrong interpretation. In case the negative staining still does not work, at least discuss your technical challenges in the discussion section and mention that SEC-SAXS showed a similar length of the complex and show the Guinier plot and Porod plots in the supplementary data.

      We believe that DLS is one of the methods for analyzing single-particle size. Of course, confirmation by multiple methods will give more compelling evidence. Following the comment, we added SEC-SAXS data in the [Results] (P7/L194-196) (Cdc45 recruitment to MCM DH by Sld3 with partner Sld7) and Supplementary Figure 11. The Sld7-Sld3-Cdc45 complex forms a flexible, long shape. Each binding domain is rigid, but the domains are linked by long loops. The flexibility problems are caused by the long loop linkers, not by binding. Therefore, we did not try the cross-linking method.

      (10) Page 8, line 221- litter sequence specificity: Correct the word "litter" with little. Also, the word shaped is written as sharped at a few places in the manuscript. Please correct it.

      We apologize for making such mistakes. We have modified these words.

      (11) Page 9, line 237-238: Would it be possible to add a lane showing Sld7 binding to the ssDNA in figure 4. I recommend showing this to understand the ssDNA binding affinity of Sld7 by itself and it will also help us to compare when it is in complex with Sld3.

      Considering that Sld7 on CMG is always in a complex with Sld3, the ssDNA binding affinity should be assessed using the Sld3-Sld7 complex. Additionally, we attempted to overexpress Sld7 but could not obtain the target protein.

      Reviewer #2 (Recommendations for the authors):

      Thank you for the improved manuscript. The following sentence is unclear: "Cdc45 binds tighter to long ssDNA (>60 bases) with a litter sequence specificity".

      We apologize for making such a mistake. We modified “litter” to “little”.

      I found it challenging to understand which species were used while reading the results section and figure legends. I recommend that the authors revise the text in both the results and figure legends to clearly indicate when proteins from different species are being compared. Additionally, it would be valuable to explicitly acknowledge this limitation in the text.

      Following the comment, we added a description for using different species in results (P8/L224-225) and figure legends (Supplementary Figure 14). We added more information in the Methods to explain why we used two species for preparing proteins.

      Reviewer #3 (Recommendations for the authors):

      Major points:

      (1) The current title is not appropriate for the general readers. At least, DNA replication or DNA replication initiation should be added and abbreviations such as CBD should be avoided.

      Following the comment, we added “DNA replication” into the title. Regarding “CBD”, since the full name of “Cdc45 binding domain” is too long, we continue to use Sld3CBD.

      (2) As in my previous review, I asked for quantification of the EMSA assay shown in Figure 4 and Supplemental Figures 13 and 14. Since some signals of the bands are very weak, it is hard to conclude something. Given different protein concentrations used in the experiment, the authors should provide any kinds of value. For example, Sld3CBD-CDC45 shows weaker DNA binding than Sld3CBD alone (line 231). Is this true (or reproducible)? It is hard to conclude without any quantification.

      We have repeated the EMSA assay four or more times with different batches of overexpression, purification and DNA synthesis, indicating that the EMSA assay is reproducible. In this revision, we changed the DNA stain and adjusted the ratio between the protein and ssDNA with increasing concentrations. The smeared bands of ssDNA with Sld7–Sld3ΔC–Cdc45 or Sld7–Sld3ΔC exhibit enhanced discernibility, and the ssDNA bands are intense enough for grayscale calculations (Figure 4 in the second revised version). We used a series of t-tests to confirm that the residual ssDNA level differed significantly between Sld3CBD–Cdc45 and each of Sld3CBD, Sld7–Sld3ΔC–Cdc45, and Sld7–Sld3ΔC (t-test, ****: P<0.0001). We also carefully controlled the sample amount in the EMSA assay and described it in the [Methods].

      Moreover, in this EMSA assay (in Figure 4), the authors suggest that the disappearance of ssDNA bands corresponds with the binding of the protein to the DNA. However, it is also possible that the DNA is degraded. It is very important to show the band of protein-DNA complexes on the gel (a whole gel, not the parts of the gel shown in Figure). Why did the authors use this "insensitive" assay using SyberGreen, not radio-labelled ssDNA?

      In this revision, we added a negative control with no ssDNA binding by using ssARS1-3_3 for all protein samples (Sld3CBD, Sld3CBD–Cdc45, Sld7–Sld3ΔC–Cdc45 and Sld7–Sld3ΔC), which came from the same batch of expression and purification as those used for binding to the ssARS1 fragments (ssARS1-2 and ssARS1-5) (Figure 4), showing that the disappearance of ssDNA bands is caused by binding to proteins, not degradation. Moreover, this time, by changing the DNA stain and increasing the concentration of the samples, the smeared ssDNA bands exhibit enhanced discernibility in the high molecular weight regions when mixed with Sld7–Sld3ΔC–Cdc45 or Sld7–Sld3ΔC, whereas no bands appeared in the NC (ssARS1-3_1). The positions of the smeared ssDNA bands correspond to those of the protein in the protein-stained gels, indicating that ssARS1 was complexed with proteins. Following the comment, we show all bands on the gel in Figure 4 and Supplementary Figure 14. In contrast to Sld7–Sld3ΔC–Cdc45 or Sld7–Sld3ΔC, bands of Sld3CBD with ssDNA could not be observed because the pI value of Sld3CBD affects the entry of the samples into the gel.

      We agree that using radio-labelled ssDNA would provide a more sensitive binding assay. However, current laboratory constraints did not allow us to use radio-labelled ssDNA. Furthermore, considering the characteristics of our target proteins, Sld3CBD, Sld3CBD–Cdc45, Sld7–Sld3ΔC–Cdc45, and Sld7–Sld3ΔC, we planned to perform the binding assay in a more natural state without any modifications, labelling or linkers. Additionally, we attempted ITC experiments, but the measurements failed. Presumably, the conformational flexibility of Sld7-Sld3-Cdc45 and Sld7-Sld3 caused a thermodynamic anomaly.

      Minor points:

      (1) Line 215, 80b: This should be "80 nucleotides(nt)". Throughout the text, nucleotides is better than base to show the length of ssDNAs.

      Thank you for your comments. We modified these words throughout the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      “This is an exploratory study that doesn't explore quite enough. Critically, the authors make a point of mentioning that neuronal firing properties vary across cell types, but only use baseline firing rate as a proxy metric for cell type. This leaves several important explorations on the table, not limited to the following:”

      1a: “Do waveform shape features, which can also be informative of cell type, predict the effect of stimulation?”

      To address this question, we modeled our approach to cell type classification after Peyrache et al. 2012. More specifically, we extracted two features from the mean unit waveforms—the valley-to-peak time (VP) and the peak half-width (PHW). These features were then used to classify units into two distinct clusters (k-means, clusters = 2, based on a strong prior from existing literature), representing putative excitatory and inhibitory neurons. Our approach recapitulated many of the same observations in Peyrache et al. 2012, namely (1) identification of two clusters (low PHW/VP: inhibitory, high PHW/VP: excitatory), (2) an ~80/20 ratio of excitatory/inhibitory neurons, and (3) greater baseline firing rates in the inhibitory vs. excitatory neurons. However, we did not observe a preferential modulation of one cell type compared to another (see newly created Figure 4). A description of this analysis and its takeaways has been incorporated into the manuscript.
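
The clustering step described above can be illustrated with a minimal sketch. It assumes each unit has already been reduced to a (VP, PHW) pair in milliseconds, uses a plain Lloyd's-algorithm k-means with k = 2 in place of whatever implementation was actually used, and follows the convention stated above that the narrower-waveform cluster is the putative inhibitory one. The function names and the toy values are hypothetical.

```python
import math
import random


def kmeans_2d(points, k=2, n_iter=100, seed=0):
    """Plain Lloyd's algorithm on (VP, PHW) pairs; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct units
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each unit goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return centroids, labels


def label_cell_types(points):
    """Label the narrower-waveform cluster 'I' (putative inhibitory), the other 'E'."""
    centroids, labels = kmeans_2d(points, k=2)
    narrow = min(range(2), key=lambda c: sum(centroids[c]))
    return ["I" if lab == narrow else "E" for lab in labels]


# Toy (VP, PHW) values in ms, invented for illustration only.
units = [(0.30, 0.30), (0.32, 0.31), (0.28, 0.33),
         (0.50, 0.51), (0.52, 0.49), (0.48, 0.53), (0.55, 0.50)]
cell_types = label_cell_types(units)
```

With only two well-separated features, the result is insensitive to initialisation; on real data, several random restarts would be a sensible precaution.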

      Change to Text:

      Created Figure 4 (Separation of presumed excitatory and inhibitory neurons by waveform morphology).

      Caption: (A) Two metrics were calculated using the averaged waveforms for each detected unit: the valley-to-peak width (VP) and peak half-width (PHW). (B) Scatterplot of the relationship between VP and PHW; note that units with identical metrics are overlaid. Using k-means clustering, we identified two distinct response clusters, representing presumed excitatory (E, blue) and inhibitory (I, red) neurons. The units from which the example waveforms were taken are outlined in black. Probability distributions for each metric are shown along the axes. (C) Total number of units within each cluster, separated by region. (D) Comparison of baseline firing rates, separated by cluster. (E) Percent of modulated units in each cluster. * p < 0.05, NS = not significant.

      Added a description of clustering methodology to lines 132-137: “We calculated two metrics from the averaged waveform from each detected unit: the valley-to-peak-width (VP) and the peak half-width (PHW) (Figure 4A); previously, these two properties of waveform morphology have been used to discriminate pyramidal cells (excitatory) from interneurons (inhibitory) in human intracranial recordings (Peyrache et al., 2012). Next, we performed k-means clustering (n = 2 clusters) on the waveform metrics, in line with previous approaches to cell type classification.”

      Added a section in the Results titled “Theta Burst Stimulation Modulates Excitatory and Inhibitory Neurons Equally”. Lines 370-378: “Using k-means clustering, we grouped neurons into two distinct clusters based on waveform morphology, representing neurons that were presumed to be excitatory (E) and inhibitory (I) (Figure 4B). Inhibitory (fast-spiking) neurons exhibited shorter waveform VP and PHW, compared with excitatory (regular-spiking) neurons (I cluster centroid: VP = 0.50ms, PHW = 0.51ms; E cluster centroid: VP = 0.32ms, PHW = 0.31ms), and greater baseline firing rates (U(N<sub>I</sub> = 23, N<sub>E</sub> = 133) = 1074.50, p = 0.023) (Figure 4D). Although we observed a much greater proportion of excitatory vs. inhibitory neurons (E: 85.3%, I: 14.7%), stimulation appeared to affect excitatory and inhibitory neurons equally, suggesting that one cell type is not preferentially activated over another (Figure 4E).”

      Modified discussion of the effects of stimulation on different cell types. Lines 475-483: “…To test these hypotheses directly, we clustered neurons into presumed excitatory and inhibitory neurons based on waveform morphology. In doing so, we observed ~85% excitatory and ~15% inhibitory neurons, which is very similar to what has been reported previously in human intracranial recordings (Cowan et al. 2024, Peyrache et al., 2012). Interestingly, stimulation appeared to modulate approximately the same proportion of neurons for each cell type (~30%), despite the differently-sized groups. Recent reports, however, have suggested that the extent to which electrical fields entrain neuronal spiking, particularly with respect to phase-locking, may be specific to distinct classes of cells (Lee et al., 2024).”

      1b: “Is the autocorrelation of spike timing, which can be informative about temporal dynamics, altered by stimulation? This is especially interesting if theta-burst stimulation either entrains theta-rhythmic spiking or is more modulatory of endogenously theta-modulated units.”

      The reviewer is correct in suggesting that rate-modulation represents only one of many possible ways by which exogenous theta burst stimulation may influence neuronal activity. Indeed, intracranial theta burst stimulation has previously been shown to evoke theta-frequency oscillatory responses in local field potentials (Solomon et al. 2021), and other forms of stimulation (i.e., transcranial alternating current stimulation) may modulate the rhythm, rather than the rate, of neuronal spiking (Krause et al. 2019).

      To investigate whether stimulation altered rhythmicity in neuronal firing, we contrasted the spike timing autocorrelograms, as suggested. More specifically, we computed the pairwise differences in spike timing for each trial, separating spikes into the same pre-, during-, and post-stimulation epochs described in the manuscript (bin size = 5 ms, max lag = 250 ms), grouped neurons by whether they were modulated, and then contrasted the differences in the latencies of the peak normalized autocorrelation value between epochs. Only neurons with a firing rate of ≥ 1 Hz (n = 70/203, 34.5%) were included in this analysis since sparse firing resulted in noisy autocorrelation estimates. Subsequent statistical testing of the peak latency differences between pre-/during- and pre-/post-stimulation did not reveal any group-level differences (Mann-Whitney U tests, p > 0.05). Thus, we were not able to identify neuronal responses suggestive of altered rhythmicity (see Figure S5). A description of this analysis and its takeaways has been incorporated into the manuscript.
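
As a rough illustration of the autocorrelogram computation described above, the sketch below bins the positive pairwise spike-time differences within one epoch (bin size = 5 ms, max lag = 250 ms) and reports the lag of the peak. It is a simplified stand-in, not the analysis code: it handles a single spike train, normalises by the total pair count, and omits the Gaussian smoothing and across-trial averaging. The function names are hypothetical.

```python
def autocorrelogram(spike_times_ms, bin_ms=5.0, max_lag_ms=250.0):
    """Normalised histogram of positive pairwise spike-time lags below max_lag_ms."""
    n_bins = int(max_lag_ms / bin_ms)
    counts = [0] * n_bins
    spikes = sorted(spike_times_ms)
    for i, t_i in enumerate(spikes):
        for t_j in spikes[i + 1:]:
            lag = t_j - t_i
            if lag >= max_lag_ms:
                break  # spikes are sorted, so all later lags are larger still
            counts[int(lag // bin_ms)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def peak_lag_ms(acg, bin_ms=5.0):
    """Centre of the bin holding the largest autocorrelation value."""
    peak_bin = max(range(len(acg)), key=acg.__getitem__)
    return (peak_bin + 0.5) * bin_ms


# A perfectly 8 Hz-rhythmic toy train: every neighbouring pair is 125 ms apart.
toy_spikes = [0.0, 125.0, 250.0, 375.0, 500.0]
toy_acg = autocorrelogram(toy_spikes)
toy_peak = peak_lag_ms(toy_acg)
```

Comparing `peak_lag_ms` across the pre-, during-, and post-stimulation epochs gives the kind of latency contrast described above; with sparse spiking, most bins are empty, which is why a minimum firing rate is needed for a stable estimate.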

      Of note, there are two elements of the data that constrain our ability to detect modulation in the rhythm of firing. First, the baseline activity recorded across neurons modulated by stimulation was relatively low (i.e., median firing rate = 1.77 Hz). Second, stimulation often resulted in a suppression, rather than an enhancement, of firing rate. Taken together, the sparse firing afforded limited opportunity to characterize changes to subtle patterns of spiking. 

      Change to Text:

      Created Figure S5 (Analysis of modulation in spiking rhythmicity)

      Caption: (A) Representative autocorrelograms (ACG) for a single neuron. The pairwise differences in spike timing were computed for each trial and epoch (bin size = 5 ms, max lag = 250 ms), then smoothed with a Gaussian kernel. The peak in the normalized ACG across trials was computed for each epoch. (B) Kernel density estimate of the peak ACG lag, separated by epoch. (C) The peak ACG lags were split by whether the neuron was modulated (Mod) or unaffected by stimulation (NS = not significant) for each of the two contrasts: pre- vs. during-stim (left) and pre- vs. post-stim (right).

      Details about the autocorrelation methodology have been incorporated. Lines 166-172: “To investigate whether stimulation altered rhythmicity in neuronal firing, we analyzed the spike timing autocorrelograms. More specifically, we computed the pairwise differences in spike timing for each trial (bin size = 5 ms, max lag = 250 ms) and then contrasted the differences in the latencies of the peak normalized autocorrelation value between epochs (pre-, during-, post-stimulation). Only neurons with a firing rate of ≥ 1 Hz (n = 70/203, 34.5%) were included in this analysis since sparse firing resulted in noisy autocorrelation estimates.”

      The results from contrasting the autocorrelograms are now mentioned briefly. Lines 297-298: “Stimulation, however, did not appear to alter the rhythmicity in neuronal firing, as measured by spiking autocorrelograms (Figure S5).”

      1c: “The authors reference the relevance of spike-field synchrony (30-55 Hz) in animal work, but ignore it here. Does spike-field synchrony (comparing the image presentation to post-stimulation) change in this frequency range? This does not seem beyond the scope of investigation here.”

      We agree that a further characterization of spike-field and spike-phase relationships may provide rich insights into more complex regional and interregional dynamics that may be altered by stimulation. Given that many metrics are biased by sample size (e.g., number of spikes), which can vary considerably, computing the pairwise phase consistency (PPC) between spikes and LFP is a preferred metric (Vinck et al. 2010). Although PPC is unbiased, its variance nonetheless increases considerably with low spike counts; pooling spike counts across trials, however, decouples the temporal relationship between spiking and the LFP phase for each trial, confounding results and yielding an unstable estimate.

      To determine whether such an analysis is indeed possible, we calculated the percentage of stimulation trials with ≥ 10 spikes in both the 1s pre- and post-stimulation epochs (a relatively low threshold for inclusion). Only a very small proportion of the total number of trials across all neurons met this criterion (2.5%). Thus, because of the sparse spiking in our data, we are unable to reliably characterize spike-field or spike-phase modulation in detected neurons.
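
The inclusion check described above amounts to a simple count. A minimal sketch, assuming per-trial spike counts for the 1 s pre- and post-stimulation epochs are already available as two equal-length lists (the function name and toy counts are hypothetical):

```python
def fraction_eligible_trials(pre_counts, post_counts, min_spikes=10):
    """Fraction of trials with at least min_spikes in BOTH the pre and post epochs."""
    if len(pre_counts) != len(post_counts):
        raise ValueError("pre and post counts must cover the same trials")
    if not pre_counts:
        return 0.0
    eligible = sum(
        1 for pre, post in zip(pre_counts, post_counts)
        if pre >= min_spikes and post >= min_spikes
    )
    return eligible / len(pre_counts)


# Toy counts: only the first trial reaches 10 spikes in both epochs.
toy_fraction = fraction_eligible_trials([12, 3, 10, 0], [11, 15, 9, 0])
```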

      Change to Text:

      In the manuscript, we have added a description of why our data is not well-suited to investigate these relationships.

      Lines 532-538: “The present study did not investigate interactions between spiking activity and local field potentials because neuronal spiking was sparse at baseline and often further suppressed by stimulation; only a very small proportion of the total number of trials across all neurons exhibited ≥ 10 spikes in both the 1s pre- and post-stimulation epochs (~2.5%). Although certain metrics are not biased by sample size (e.g., pairwise phase consistency), low spike counts can dramatically affect variance and, therefore, result in unstable estimates (Vinck et al., 2011).”

      1d: “How does multi-unit activity respond to stimulation? At this somewhat low count of neurons (total n=156 included) it would be valuable to provide input on multi-unit responses to stimulation as well.”

      We thank the reviewer for this suggestion. We have incorporated an analysis of multiunit activity (MUA), which similarly identifies robust modulation via permutation-based statistical testing and characterizes the different profiles of responses (i.e., increased vs. decreased MUA threshold crossings pre- vs. post-stimulation).

      Change to Text:

      Created Figure S8 (Analysis of multiunit activity response to stimulation)

      Caption: (A) Example trace of multiunit activity (MUA) in one channel during a single stimulation trial. Threshold crossings are highlighted with a pink dot overlaid on the MUA signal with a corresponding hash below. (B) The percentage of channels with significantly modulated MUA, separated by the direction of effect. (C) The percentage of channels with significantly modulated MUA, separated by direction of effect and region. Inc (red; post > pre) vs. Dec (blue; post < pre). HIP = hippocampus, OFC = orbitofrontal cortex, AMY = amygdala, ACC = anterior cingulate cortex. *** p < 0.001, NS = not significant.

      Details about the MUA methodology have been incorporated. Lines 174-180: “Finally, we measured modulation in multiunit activity (MUA) by filtering the microelectrode signals in a 300-3,000 Hz window and counting the number of threshold crossings. Thresholds were determined on a per-channel basis and defined as -3.5 times the root mean square of the signal during the baseline period; activity during stimulation was excluded since stimulation artifact is difficult to separate from MUA in the absence of spike sorting.”
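
The thresholding rule quoted above can be sketched as follows. The sketch assumes the signal has already been band-pass filtered (300-3,000 Hz) upstream and counts one crossing each time the trace dips below the negative threshold; how consecutive sub-threshold samples were actually merged is not stated, so that detail, like the function names, is an assumption.

```python
import math


def rms(samples):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def count_threshold_crossings(signal, baseline, k=-3.5):
    """Count downward crossings of k * RMS(baseline), with k negative."""
    threshold = k * rms(baseline)
    crossings = 0
    below = False
    for x in signal:
        if x < threshold and not below:
            crossings += 1  # count only the first sample of each excursion
        below = x < threshold
    return crossings


# Toy data: baseline RMS is 1, so the threshold sits at -3.5.
toy_baseline = [1.0, -1.0, 1.0, -1.0]
toy_signal = [0.0, -4.0, -5.0, 0.0, -3.0, -3.6, 0.0, -10.0]
toy_crossings = count_threshold_crossings(toy_signal, toy_baseline)
```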

      MUA results are now incorporated. Lines 365-367: “Additional characterization of MUA revealed a dominant signature of increased activity post- vs. pre-stimulation, in line with these trends observed at the single-neuron level (Figure S8).”

      1e: “Several intracranial studies have implicated proximity to white matter in determining the effects of stimulation on LFPs; do the authors see an effect of white matter proximity here?”

      We thank the reviewer for the interesting question. Subsequent characterization revealed only small differences in the proximity of stimulation contacts to white matter (range 1.5-8.0 mm), likely because the chosen target (i.e., basolateral amygdala) has several nearby white matter structures (e.g., stria terminalis). Nonetheless, we performed a linear regression between the proximity to white matter and the stimulation-induced effect on behavior (stimulation vs. no-stimulation d’ difference), the results of which indicate no clear association (p > 0.05; see Figure S9). Critically, this is not to suggest that white matter proximity has no interaction with the reported behavioral effects, but rather, that we could not identify such an association within our data.
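
For readers who want to see the regression step spelled out, a minimal ordinary least-squares sketch is given below. It assumes one distance (mm) and one Δd’ value per session, returns the slope, intercept, and Pearson r, and leaves out the significance test that produced the reported p-value. The function name and toy numbers are hypothetical.

```python
import math


def linear_fit(x, y):
    """Ordinary least squares of y on x; returns (slope, intercept, pearson_r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Toy data lying exactly on y = 2x + 1, used only to check the arithmetic.
toy_distance_mm = [1.5, 3.0, 4.5, 8.0]
toy_delta_d = [2.0 * d + 1.0 for d in toy_distance_mm]
toy_slope, toy_intercept, toy_r = linear_fit(toy_distance_mm, toy_delta_d)
```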

      Change to Text:

      Created Figure S9 (The effect of stimulation proximity to white matter and distance to recorded neurons).

      Caption: (A) Kernel density estimate of the Euclidean distance from stimulation contacts to nearest WM structure (in mm); hash marks represent individual observations. (B) The change in memory performance (Δd’) was linearly regressed onto the distance from the stimulated contacts to white matter.

      The following has been added to lines 405-426: “Proximity to white matter has been shown to influence the effects of stimulation on behavior and the strength of evoked responses (Mankin et al., 2021; Mohan et al., 2020; Paulk et al., 2022). Across all stimulated contacts, we observed only small differences in the proximity of stimulation contacts to white matter (median = 4.5 mm, range = 1.5-8.0 mm), likely because the chosen target (i.e., basolateral amygdala) has several nearby white matter structures (e.g., stria terminalis). Nonetheless, we performed a linear regression between the proximity to white matter and the stimulation-induced effect on behavior (stimulation vs. no-stimulation d’ difference), the results of which indicate no clear association (p > 0.05; see Figure S9).”

      Comment 2: “It is a little confusing to interpret stimulation-induced modulation of neuronal spiking in the absence of stimulation-induced change in behavior. How do the authors' findings tell us anything about the neural mechanisms of stimulation-modulated memory if memory isn't altered? In line with point #1, I would suggest a deeper dive into behavior (e.g. reaction time? Or focus on individual sessions that do change in Figure 4A?) to make a stronger statement connecting the neural results to behavioral relevance.”

      We agree that the connection between the observed stimulation-induced neuronal modulation and effects on behavior is unclear and has proven challenging to elucidate. Per the reviewer’s suggestion, we further focused our analyses on the neuronal modulation effects in the individual sessions that resulted in a robust change in memory performance (stimulation vs. no-stimulation d’ difference threshold of ± 0.5, based on a moderate effect size for Cohen’s d); both a positive and negative threshold were used to capture robust changes in memory performance associated with firing rate modulation, whether enhancement or suppression. To this end, we contrasted the proportion of modulated neurons in the sessions where stimulation resulted in a robust behavioral change (Δd’) with those that did not (~d’). We did not observe a difference in the proportions between groups when collapsed across all sampled regions, or when separately evaluated (Fisher’s exact tests, p > 0.05; see Figure 5C).
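
The session classification described above can be sketched as follows. The ±0.5 threshold comes from the text; the d’ formula is the standard signal-detection definition, z(hit rate) − z(false-alarm rate), which is assumed here rather than taken from the manuscript, and no correction for hit or false-alarm rates of exactly 0 or 1 is applied. The function names are hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Standard sensitivity index: z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf  # raises if a rate is exactly 0 or 1
    return z(hit_rate) - z(false_alarm_rate)


def classify_session(d_stim, d_no_stim, threshold=0.5):
    """Label a session by the stim vs. no-stim difference in d'."""
    delta = d_stim - d_no_stim
    if delta > threshold:
        return "enhanced"
    if delta < -threshold:
        return "impaired"
    return "negligible"


def is_responder(d_stim, d_no_stim, threshold=0.5):
    """Responder = robust change in either direction."""
    return classify_session(d_stim, d_no_stim, threshold) != "negligible"
```

Pooling "enhanced" and "impaired" sessions into a single responder group mirrors the contrast described above, in which the proportion of modulated units is compared between sessions with and without a robust behavioral change.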

      Given that this approach did not further clarify the connection between our neural and behavioral results, we believe it is most appropriate to deemphasize claims in the manuscript regarding the potential insights for behavioral modulation (e.g., memory enhancement), and have done so.

      Change to Text:

      Toned down reference to the memory-related effects of stimulation in the abstract by removing the following lines from the abstract: “Previously, we demonstrated that intracranial theta burst stimulation (TBS) of the basolateral amygdala (BLA) can enhance declarative memory, likely by modulating hippocampal-dependent memory consolidation…” and “…and motivate future neuromodulatory therapies that aim to recapitulate specific patterns of activity implicated in cognition and memory.”

      Changed Figure 4 to Figure 5

      Created Figure 5C (Interaction between behavioral effects and neuronal modulation)

      Caption: (C) Change in recognition memory performance was split into two categories using a d’ difference threshold of ± 0.5: responder (positive or negative; Δd’, pink) and non-responder (~d’, grey). Individual d’ scores are shown (left) with points colored by outcome category; dotted lines demarcate category boundaries, and the grey-shaded region represents negligible change. The number of sessions within each outcome category (middle) and the proportion of modulated units as a function of outcome category, separated by region (right). NS = not significant.

      The description of the behavioral results has been updated. Lines 394-403: “At the level of individual sessions, we observed enhanced memory (Δd’ > +0.5) in 36.7%, impaired memory (Δd’ < -0.5) in 20.0%, and negligible change (-0.5 ≤ Δd’ ≤ 0.5) in 43.3% when comparing performance between the stim and no-stim conditions; a threshold of Δd’ ± 0.5 was chosen for this classification based on the defined range of a “medium effect” for Cohen’s d. To test our hypothesis that neuronal modulation would be associated with changes in memory performance, we combined the sessions that resulted in either memory enhancement or impairment and contrasted the proportion of modulated units across regions sampled. We did not, however, observe a meaningful difference in the proportion of modulated units when grouped by behavioral outcome (all contrasts p > 0.05) (Figure 5C).”

      Lines 213-214 and 394-397 have been edited to reflect a change in the d’ threshold used for categorizing behavioral results (from Δd’ ± 0.2 to Δd’ ± 0.5).

      Comment 3: “It is not clear to me why the assessment of firing rates after image onset and after stim offset is limited to one second - this choice should be more theoretically justified, particularly for regions that spike as sparsely as these.”

      We thank the reviewer for this question and acknowledge that no clear justification was provided for this decision in the manuscript. We limited each of the analysis epochs to 1 s for two reasons. First, the maximum possible length of the during-stimulation epoch was 1 s (stim on for 1 s). Although the pre- and post-stimulation epochs could be extended without issue, we were concerned that variable time windows could introduce a bias, for instance, resulting in different variances between epochs. Second, we anticipated, both from empirical observations and prior literature, that the neural response following stimulation or task features (e.g., image onset/offset) was likely to be transient, rather than sustained for a period of many seconds. By keeping the windows short, we ensured that our approach to detecting modulation (i.e., contrasting trial-wise spike counts between each pair of epochs) captured the intended effect rather than random noise. We have incorporated a discussion of this rationale in the Peri-Stimulation Modulation Analyses section.

      Change to Text:

      Lines 156-158 have been added: “Each epoch was constrained to 1 s to ensure that subsequent firing rate contrasts were unbiased and to capture potential transient effects (e.g., image onset/offset).”

      Comment 4: “This work coincides with another example of human intracranial stimulation investigating the effect on firing rates (doi: https://doi.org/10.1101/2024.11.28.625915). Given how incredibly rare this type of work is, I think the authors should discuss how their work converges with this work (or doesn't).”

      Thank you for bringing this highly relevant work to our attention. We were unaware of this recent preprint and have incorporated a discussion of its main findings into the manuscript.

      Change to Text:

      New citations: van der Plas et al. 2024 (bioRxiv), Cowan et al. 2024 (bioRxiv)

      The discussion of related studies has been updated. Lines 447-457: “Few studies, however, have characterized the impact of electrical stimulation via macroelectrodes on the spiking activity of human cortical neurons, none of which involve intracranial theta burst stimulation. One study reported a long-lasting reduction in neural excitability among parietal neurons, with variable onset time and recovery following continuous transcranial TBS in non-human primates (Romero et al., 2022). In a similar vein, it was recently shown that human neurons are largely suppressed by single-pulse electrical stimulation (Cowan et al., 2024; van der Plas et al., 2024). Other emerging evidence suggests that transcranial alternating current stimulation may entrain the rhythm rather than rate of neuronal spiking (Krause et al., 2019) and that stimulation-evoked modulation of spiking may meaningfully impact behavioral performance on cognitive tasks (Fehring et al., 2024).”

      Comment 5: “What information does the pseudo-population analysis add? It's not totally clear to me.”

      We recognize the need to further contextualize the motivation for the exploratory pseudo-population analysis and appreciate the reviewer for bringing the lack of detail to our attention. In brief, the analysis allowed us to observe trends in activity across populations of neurons, which, in principle, are not visible by characterizing modulation solely in discrete neurons. Additional details have been incorporated into the manuscript, as suggested.

      Change to Text:

      Additional justification has been incorporated in the description of the methodology. Lines 185-187: “…This approach enables the identification of dominant patterns of coordinated neural activity that may not be apparent when examining individual neurons in isolation.”, lines 192-194: “…By collapsing across subjects into a common pseudo-population, this analysis provides a mesoscale view of how stimulation modulates shared activity patterns across anatomically distributed neural populations.”

      A summary interpretation has been added to the paragraph describing the results. Lines 326-328: “Taken together, these analyses reveal global structure in the state space of responses to BLA stimulation within hippocampal circuits.”

      Reviewer #2 (Public review):

      Comment 1 “Authors suggest that the units modulated by stimulation are largely distinct from those responsive to image offset during trials without stimulation. The subpopulation that responds strongly also tends to have a higher baseline of firing rate. It's important to add that the chosen modulation index is more likely to be significant in neurons with higher firing rates.”

      This is an important point that was not previously addressed in our manuscript. We suspect there are likely two factors at play worth considering with respect to our chosen nonparametric modulation index: neurons with lower activity require smaller changes in spike counts to be significantly modulated (easier to flip ranks), and neurons with higher activity empirically exhibit greater absolute shifts in the number of spikes. Our further use of permutation testing, while mitigating false positives, may also somewhat constrain the ability to detect modulation in sparsely active neurons. Nonetheless, given that many trials entailed few or no spikes, we believe this approach is preferable to alternatives that may be more susceptible to noise (e.g., percent change in trial-averaged firing rate from baseline).

      To better understand the tradeoffs with detection probability, we performed a sensitivity analysis. We generated synthetic data with different baseline firing rates (0.1-5.0 Hz) and effect sizes (± 0.1-0.7 Hz) and simulated the likelihood of detection with our given modulation index across neurons. The results of the simulation support the notion that the probability of detecting modulation is lower for sparsely active neurons (Figure S7C). Further discussion of this consideration for the chosen modulation index, as well as details regarding the sensitivity analysis, has been incorporated into the manuscript.
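      The kind of detection probability simulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: spike counts are drawn as Poisson samples, the statistic is the mean paired difference between post- and pre-stimulation counts, and significance is assessed with a sign-flip permutation test as a stand-in for the manuscript's nonparametric modulation index. The function names, trial count, window length, and number of permutations are all hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson sample with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sign_flip_p_value(diffs, n_perm, rng):
    """Two-sided permutation p-value for the sum of paired differences."""
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(n_perm):
        # Randomly flip the sign of each paired difference (null: no change).
        permuted = abs(sum(d if rng.random() < 0.5 else -d for d in diffs))
        if permuted >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def detection_probability(baseline_hz, effect_hz, n_trials=40, window_s=1.0,
                          n_neurons=100, n_perm=200, alpha=0.05, seed=0):
    """Fraction of synthetic neurons flagged as modulated (assumed setup)."""
    rng = random.Random(seed)
    # A firing rate cannot be negative, so suppression is clipped at 0 Hz.
    post_hz = max(baseline_hz + effect_hz, 0.0)
    detected = 0
    for _ in range(n_neurons):
        diffs = [
            poisson(post_hz * window_s, rng) - poisson(baseline_hz * window_s, rng)
            for _ in range(n_trials)
        ]
        if sign_flip_p_value(diffs, n_perm, rng) < alpha:
            detected += 1
    return detected / n_neurons


if __name__ == "__main__":
    # Sweep a small grid spanning the ranges quoted in the response.
    for baseline in (0.1, 1.0, 5.0):
        for effect in (-0.7, -0.1, 0.1, 0.7):
            prob = detection_probability(baseline, effect)
            print(f"baseline={baseline} Hz, effect={effect:+} Hz, P(detect)={prob:.2f}")
```

      Sweeping the grid in this way gives one detection probability per combination of baseline rate and effect size, which is the quantity a figure such as S7C would visualize.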

      Change to Text:

      Created Figure S7C (Detection probability analysis)

      Caption: The same permutation-based analyses reported in the manuscript were repeated under different control conditions… (C) Visualization of the predicted probability of detecting modulation across synthetic neurons with variable firing rates and modulation effect sizes; FR = firing rate.

      Lines 223-224 have been added to the Methods section titled “Firing Rate Control Analyses”: “We performed a series of control analyses to test whether our approach to firing rate detection was robust…”

      A description of the simulation has been incorporated into the same section as above. Lines 234-237: “Finally, to better understand the tradeoffs with our statistical approach, we generated synthetic data with different baseline firing rates (0.1-5.0 Hz) and effect sizes (± 0.1-0.7 Hz), then simulated the likelihood of detecting modulation across variable conditions (Figure S7C).”

      The description of the results from the control analyses has been updated. Lines 330-339: “Finally, we performed three supplementary analyses to evaluate the robustness of our approach to detecting firing rate modulation: a sensitivity analysis assessing the proportion of modulated units at different firing rate thresholds for inclusion/exclusion, a data dropout analysis designed to control for the possibility that non-physiological stimulation artifacts may preclude the detection of temporally adjacent spiking, and a synthetic detection probability analysis. These results recapitulate our observation that units with higher baseline firing are most likely to exhibit modulation (though the probability of detecting modulation is lower for sparsely active neurons) and suggest that suppression in firing rate is not solely attributable to amplifier saturation following stimulation (Figure S7).”

      Comment 2: “Readers can benefit from understanding with more details the locations chosen for stimulation - in light of previous studies that found differences between effects based on proximity to white matter (For example - PMID 32446925, Mohan et al, Brain Stimul. 2020 and PMID 33279717 Mankin et al Brain Stimul. 2021).”

      This has been addressed in the above response to Reviewer 1’s comment 1.1e.

      Change to Text:

      See changes related to Reviewer 1 comment 1.1e.

      Comment 3: “Missing information in the manuscript…”

      3a: “Images of stimulation anatomical locations for all subjects included in this study. Ideally information about the impedance of the contacts to be able to calculate the actual current used.”

      As requested, we have provided an image from the coronal T1 MRI sequence, which highlights the position of the stimulated contacts for each of the 16 patients. Though we did not measure the impedances directly, the stimulation was current-controlled, which ensured that the desired current and charge density were consistent regardless of the tissue or electrode impedance.

      Change to Text:

      Created Figure S1 (Anatomical location of stimulated electrodes).

      Caption: A coronal slice from the T1-weighted MRI scan is shown for each patient who participated in the study (n = 16). Electrode contacts within the same plane of the image are shown with blue circles, and the bipolar pair of stimulated contacts within the basolateral amygdala is highlighted in red.

      Lines 144-145 have been edited to reflect that the delivered stimulation was current-controlled: “Specifically, we administered current-controlled, charge-balanced, …”

      3b: “The studied population is epilepsy patients, and the manuscript lacks description of their condition, proximity to electrodes included in the study to pathological areas, and the number of units from each patient/hemisphere.”

      We agree that additional information regarding patient demographics, experimental details, and clinical characteristics would further contextualize this unique patient population. A new table has been included, which contains the following information: patient ID, sex, age, # experimental session, # SEEG leads (and # microelectrodes), # detected units (L vs. R hemisphere), and suspected seizure onset zone.

      Change to Text:

      Created Table S1 (Patient demographics and clinical characteristics).

      Lines 258-259 have been added: “…(see Table S1 for patient demographics).”

      3c: “I haven't seen any comments on code availability (calculating modulation indices and statistics) and data sharing.”

      For clarification, a section titled Resource Availability is already appended to the end of the manuscript following the Conclusion, which describes the data and code availability.

      Change to Text:

      None

      3d: “Small comment - Figure legend 3E - Define gray markers (non-modulated units?)”

      Thank you for highlighting this omission. We have updated the relevant figure caption.

      Change to Text:

      The following has been added to the Figure 3 caption: “…whereas units without a significant change in activity are shown in grey.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a comprehensive structure-guided secretome analysis of gall-forming microbes, providing valuable insights into effector diversity and evolution. The authors have employed AlphaFold2 to predict the 3D structures of the secretome from selected pathogens and conducted a thorough comparative analysis to elucidate commonalities and unique features of effectors among these phytopathogens.

      Strengths:

      The discovery of conserved motifs such as 'CCG' and 'RAYH' and their central role in maintaining the overall fold is an insightful finding. Additionally, the discovery of a nucleoside hydrolase-like fold conserved among various gall-forming microbes is interesting.

      Weaknesses:

      Important conclusions are not verified by experiments.

      Thank you very much. There are many aspects of this study that could be further validated, each potentially requiring years of work. Therefore, we chose to focus on two specific hypotheses: are AlphaFold-Multimer predictions accurate? Can ANK target more than one host protein? In particular, we focused on the identification of putative targets for one of the ankyrin repeat proteins, PBTT_00818 (Fig. 6). Using one-by-one yeast two-hybrid (Y2H) assays, we tested the AlphaFold-Multimer prediction of an interaction between PBTT_00818 and MPK3. The interaction did not occur in yeast, suggesting it might not take place under those conditions.

      This negative result led us to perform a Y2H screen using an Arabidopsis cDNA library, which identified a GroES-like protein, highly expressed in roots, as a potential target of the ANK effector. Surprisingly, both the PBTT_00818–MPK3 and PBTT_00818–GroES-like protein interactions were later confirmed in planta using BiFC assays. These findings suggest two key points: (1) AlphaFold predictions can be accurate for ANK proteins, and (2) ANK domains, known for mediating protein-protein interactions, may enable these effectors to target multiple host proteins.

      Although the precise biological implications remain unclear, it is possible that ANK proteins act as scaffolds or adaptors for other effectors during infection. The validations presented here open exciting avenues for further research into the role of ANK proteins in Plasmodiophorid pathogenesis and gall formation. This is presented in the corrected preprint and Fig. 7, Table S12, Fig. S7-S8.

      Reviewer #2 (Public review):

      Summary:

      Soham Mukhopadhyay et al. investigated the protein folding of the secretome from gall-forming microbes using the AI-based structure modeling tool AlphaFold2. Their study analyzed six gall-forming species, including two Plasmodiophorid species and four others spanning different kingdoms, along with one non-gall-forming Plasmodiophorid species, Polymyxa betae. The authors found no effector fold specifically conserved among gall-forming pathogens, leading to the conclusion that their virulence strategies are likely achieved through diverse mechanisms. However, they identified an expansion of the Ankyrin repeat family in two gall-forming Plasmodiophorid species, with a less pronounced presence in the non-gall-forming Polymyxa betae. Additionally, the study revealed that known effectors such as CCG and AvrSen1 belong to sequence-unrelated but structurally similar (SUSS) effector clusters.

      Strengths:

      (1) The bioinformatics analyses presented in this study are robust, and the AlphaFold2-derived resources deposited in Zenodo provide valuable resources for researchers studying plant-microbe interactions. The manuscript is also logically organized and easy to follow.

      (2) The inclusion of the non-gall-forming Polymyxa betae strengthens the conclusion that no effector fold is specifically conserved in gall-forming pathogens and highlights the specific expansion of the Ankyrin repeat family in gall-forming Plasmodiophorids.

      (3) Figure 4a and 4b effectively illustrate the SUSS effector clusters, providing a clear visual representation of this finding.

      (4) Figure 1 is a well-designed, comprehensive summary of the number and functional annotations of putative secretomes in gall-forming pathogens. Notably, it reveals that more than half of the analyzed effectors lack known protein domains in some pathogens, yet some were annotated based on their predicted structures, despite the absence of domain annotations.

      Weaknesses:

      (1) The effector families discussed in this paper remain hypothetical in terms of their functional roles, which is understandable given the challenges of demonstrating their functions experimentally. However, this highlights the need for experimental validation as a next step.

      Thank you. Yes, there is a lot of work to do in the coming years.

      (2) Some analyses, such as those in Figure 4e, emphasize motifs derived from sequence alignments of SUSS effector clusters. Since these effectors are sequence-unrelated, sequence alignments might be unreliable. It would be more rigorous to perform structure-based alignments in addition to sequence-based ones for motif confirmation. For instance, methods described in Figure 3E of de Guillen et al. (2015, https://doi.org/10.1371/journal.ppat.1005228) or tools like Foldseek could be useful for aligning structures of multiple sequences.

      In Fig. 4e, we highlight the conserved cysteine residues. While there is no clearly conserved overall motif, the figure illustrates that despite the high sequence divergence, the key cysteines involved in disulfide bridge formation are consistently conserved across the sequences.

      (3) When presenting AlphaFold-generated structures, it is essential to include confidence scores such as pLDDT and PAE. For example, in Figure 1D of Derbyshire and Raffaele (2023, https://doi.org/10.1038/s41467-023-40949-9), the structural representations were colored red due to their high pLDDT scores, emphasizing their reliability.

      Thank you for the observation. Due to the restrictive parameters used in our analysis, over 90% of the structure would appear red. For this reason, we chose not to include the color scale, as it would not provide additional informative value in this context.
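      For readers who want to check such a statement on their own models, a per-structure confidence summary can be computed directly from the model file. The sketch below relies on the fact that AlphaFold-generated PDB files store the per-residue pLDDT in the B-factor column; the function name and the default threshold of 70 are illustrative assumptions and are not taken from the manuscript.

```python
def plddt_summary(pdb_text, threshold=70.0):
    """Return (mean pLDDT, fraction of residues at or above threshold).

    Only C-alpha ATOM records are read, so each residue is counted once.
    The pLDDT value is taken from the B-factor field (columns 61-66).
    """
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no C-alpha ATOM records found")
    mean_score = sum(scores) / len(scores)
    fraction_above = sum(s >= threshold for s in scores) / len(scores)
    return mean_score, fraction_above
```

      Calling `plddt_summary(open("model.pdb").read())` on a hypothetical file named `model.pdb` returns the two numbers, which could be reported in a supplementary table in place of a color scale.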

      Reviewer #1 (Recommendations for the authors):

      Experimental validation of the significance of 'CCG' and 'RAYH' motifs would further strengthen this study.

      Regarding the Mig1-like protein in Ustilago maydis, the presence of four conserved cysteine residues that are pivotal for maintaining the stability of its folded structure raises an intriguing question. Specifically, while many Mig cluster effectors contain four cysteine residues that form two conserved disulfide bridges, this structure is notably absent in the Mig protein itself. The author has speculated that these four cysteine residues form two conserved disulfide bonds, which are crucial for the stability of Mig protein folding. However, this hypothesis remains unvalidated. To test this prediction, it would be prudent to simulate mutations in the cysteine residues corresponding to the disulfide bonds in Mig and employ molecular dynamics simulations to assess the stability of folding before and after the mutation.

      Mig-1 does contain the four conserved cysteine residues responsible for forming disulfide bridges. However, due to the high divergence among Mig-1-like sequences, the alignment software was unable to properly align all the cysteine residues. As a result, Mig-1 may appear to lack these conserved cysteines in the alignment, although they are indeed present upon individual inspection. This is an area that research groups working with U. maydis as a model could explore further to expand our understanding of this effector family.
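      The individual inspection mentioned above does not depend on any alignment and can be sketched in a few lines. The sequences below are made-up toy strings, not real Mig1-like proteins, and the function names are hypothetical.

```python
def cysteine_positions(sequence):
    """Return 1-based positions of cysteine (C) residues in a protein sequence."""
    return [i for i, residue in enumerate(sequence.upper(), start=1) if residue == "C"]


def cysteine_report(sequences):
    """Map each sequence name to its cysteine positions, without any alignment."""
    return {name: cysteine_positions(seq) for name, seq in sequences.items()}


# Toy, invented sequences used only to illustrate the check.
toy_sequences = {
    "toy_effector_1": "MKCLAACGGTCPLLC",
    "toy_effector_2": "MCAAKKLLCGSCTTTPC",
}

if __name__ == "__main__":
    for name, positions in cysteine_report(toy_sequences).items():
        print(name, len(positions), positions)
```

      A sequence that appears to lack the conserved cysteines in a multiple alignment but reports four positions here would be a candidate for the misalignment described in the response.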

      Could you please clarify why Ankyrins and LRR are discussed in the context of Arabidopsis thaliana (line 252)? Additionally, what are the structural and functional differences between the LRR sequences of P. brassicae and those of the host plants?

      This sentence refers to the identification of the ANK motif in P. brassicae and S. subterranea, not in Arabidopsis thaliana. While the hydrophobic core of the ANK domains appears conserved between the host and the pathogen, the surface residues are highly polymorphic.

      The evidence supporting the interaction between the ANK effector and Arabidopsis immunity-related proteins, as validated using AlphaFold-Multimer, is currently limited. To enhance the reliability of these data, it is advisable for the author to select several pairs of proteins predicted to interact for further experimental verification.

      We conducted a large-scale yeast two-hybrid (Y2H) screen using the ANK domain effector PBTT_00818, which was selected due to its high ipTM+pTM score. The Y2H interactions were subsequently validated through BiFC assays. Our results show that PBTT_00818 interacts with Arabidopsis MPK3 in the nucleus, consistent with predictions from the AlphaFold-Multimer model. In addition, PBTT_00818 was also found to target AT3G56460, a GroES-like zinc-binding alcohol dehydrogenase, also localized in the nucleus.

      While the manuscript is well-composed, certain sections could be enhanced for clarity and readability. For example, the discussion section could be expanded to include a more in-depth analysis of the implications of the findings for understanding the virulence mechanisms of gall-forming microbes. Additionally, a comparison of the findings with previous studies on related pathogens would provide a more comprehensive perspective.

      Certain sections of the discussion have been expanded. However, we chose to focus on the novel aspects of the study and to avoid comparisons with other plant pathogens, as those mechanisms are already well known and extensively studied. Studies using AlphaFold in plant pathology are also limited.

      Reviewer #2 (Recommendations for the authors):

      The results of clustering analyses are highly dependent on the chosen thresholds. Given that the authors provide clear and well-designed visualizations of SUSS effectors in Figures 4a and 4b, applying the same presentation methods to Figures 5a and 5b could make these analyses more convincing.

      We were able to generate the all-vs-all matrix for Figures 4a and 4b because it involved only 13 proteins. However, Figure 5b includes over 40 effectors, making it impractical to visualize the data in the same way. Instead, we presented the sequence-based clusters as nodes and connected them based on structural similarity.
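      The node-and-edge representation described above can be sketched as a small graph construction. This is an illustration under assumptions rather than the authors' pipeline: the input names, the use of a generic pairwise structural similarity score, and the cutoff of 0.5 are all hypothetical.

```python
def cluster_graph(cluster_of, similarity, threshold=0.5):
    """Build an adjacency map between sequence-based clusters.

    cluster_of: dict mapping protein name -> sequence cluster id
    similarity: dict mapping (protein_a, protein_b) -> structural similarity
    Two clusters are linked if any cross-cluster pair reaches the threshold.
    """
    edges = {cluster: set() for cluster in set(cluster_of.values())}
    for (a, b), score in similarity.items():
        ca, cb = cluster_of[a], cluster_of[b]
        if ca != cb and score >= threshold:
            edges[ca].add(cb)
            edges[cb].add(ca)
    return edges


def connected_components(edges):
    """Group clusters that are linked, directly or indirectly, by structure."""
    seen, components = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(edges[node] - component)
        seen |= component
        components.append(component)
    return components
```

      Each connected component then corresponds to a group of sequence-unrelated clusters that share a similar predicted fold, which scales to dozens of effectors more readably than an all-vs-all matrix.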

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Comments on revisions:

      I appreciate the authors responding to my comments. I think Fig. S10 helps put the structural data into more context. It would be helpful to make clearer in the legend what proteins are being compared, especially in 10C.

      Although I can see why the authors focus on the NifK extension and its potential connection to oxygen protection, I would point out that Vnf and Anf do not have this extension in their K subunit, and you find both Vnf and Anf in aerobic and facultative anaerobic diazotrophs. This is a minor point, but I think it is important to mention in the discussion.

      We thank the reviewer for their thoughtful comments. Following their recommendation, we have added a line to the Discussion and moved Figure S10 to the main text.

      Reviewer #2 (Public review):

      Summary: 

      This work aims to study the evolution of nitrogenases, understanding how their structure and function adapted to changes in environment, including oxygen levels and changes in metal availability.

      The study predicts > 3000 structures of nitrogenases, corresponding to extant, ancestral and alternative ancestral sequences. It is observed that structural variations in the nitrogenases correlate with phylogenetic relationships. The amount of data generated in this study represents a massive and admirable undertaking. The study also provides strong insight into how structural evolution correlates with environmental and biological phenotypes. 

      We thank the reviewer for their summary and positive appraisal.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      We appreciate the reviewer’s agreement that our data “support most of the conclusions made”.

      With respect to Concerns raised by reviewer 1:

      (1) Although ectopically expressed PHD1 interacts with ectopically expressed RepoMan, there is no evidence that endogenous PHD1 binds to endogenous RepoMan or that PHD1 directly binds to RepoMan.

      We do not fully agree that this comment is accurate - the implication is that we only show interaction between two exogenously expressed proteins, i.e. both exogenous PHD1 and RepoMan, when in fact we show that tagged PHD1 interacts with endogenous RepoMan. The major technical challenge here is the well-known difficulty of detecting endogenous PHD1 in such cell lines. We agree that co-IP studies do not prove that this interaction is direct and never claim to have shown this, though we do feel that a direct interaction is most likely, albeit not proven.

      (2) There is no genetic evidence indicating that PHD1 controls progression through mitosis by catalyzing the hydroxylation of RepoMan.

      We agree that our current study is primarily a biochemical and cell biological study, rather than a genetic study. Nonetheless, similar biochemical and cellular approaches have been widely used and validated in previous studies in mechanisms regulating cell cycle progression and we are confident in the conclusions drawn based on the data obtained so far.

      (3) Data demonstrating the correlation between dynamic changes in RepoMan hydroxylation and H3T3 phosphorylation throughout the cell cycle are needed.

      We agree that it will be very interesting to analyse in more detail the cell cycle dynamics of RepoMan hydroxylation and H3T3 phosphorylation - along with other cell cycle parameters. We view this as outside the scope of our present study and are actively engaged in raising the additional funding needed to pursue such future experiments.

      (4) The authors should provide biochemical evidence of the difference in binding ability between RepoMan WT/PP2A and RepoMan P604A/PP2A.

      Here again we agree that it will be very interesting to analyse in future the detailed binding interactions between wt and mutant RepoMan and other interacting proteins, including PP2A. We view this as outside the scope of our present study and are actively engaged in raising the additional funding needed to pursue such future experiments.

      (5) PHD2 is the primary proline hydroxylase in cells. Why does PHD1, but not PHD2, affect RepoMan hydroxylation and subsequent control of mitotic progression? The authors should discuss this issue further.

      We agree with the main point underlying this comment, i.e., that there are still many things to be learned concerning the specific roles and mechanisms of the different PHD enzymes in vivo. We look forward to addressing these questions in future studies.

      Reviewer #2 (Public review):

      We appreciate the reviewer’s comments that our manuscript uses biochemical and imaging tools to delineate a key mechanism in the regulation of the progression of the cell cycle and their appreciation that our experiments performed are, 'conclusive with well-designed controls.'

      With respect to the specific Concern raised by reviewer 2:

      Lack of in vitro reconstitution and binding data.

      We agree that it will be very interesting to pursue in vitro reconstitution studies and detailed binding data. We view this as outside the scope of our present study and are actively engaged in raising the additional funding needed to pursue such future experiments.

      Reviewer #3 (Public review):

      We appreciate the reviewer’s comments that our study, “is a comprehensive molecular and cell biological characterisation of the effects of P604 hydroxylation by PHD1 on RepoMan, a regulatory subunit of the PP1gamma complex” and their conclusion that, “we should have no question about the validity of the PHD1-mediated hydroxylation”.

      With respect to the specific Concern raised by reviewer 3:

      Reliance on a Proline-Alanine mutation in RepoMan to mimic an unhydroxylatable protein. The mutation will introduce structural alterations, and inhibition or knockdown of PHD1 would be necessary to strengthen the data on how hydroxylation regulates chromatin loading and interactions with B56/PP2A.

      We do not agree that we rely solely on analysis of the single site pro-ala mutation in RepoMan for our conclusions, since we also present a raft of additional experimental evidence, including knock-down data and experiments using both fumarate and FG. We would also reference the data we present on RepoMan in the parallel study by Jiang et al., which has also been reviewed by eLife and is currently available on bioRxiv (doi: https://doi.org/10.1101/2025.05.06.652400). Of course, we agree with the reviewer that even though the mutant RepoMan features only a single amino acid change, this could still result in undetermined structural effects on the RepoMan protein that could conceivably contribute, at least in part, to some of the phenotypic effects observed. Hopefully future studies will help to clarify this.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Thank you for the extensive response to my comments and questions.

      Reviewer #2 (Recommendations for the authors):

      (1) The Fmr1/Fxr2 double KO mice are not well described in the Introduction.

      We have changed the sentence in the introduction to clarify that Zhang et al., 2008 used a mouse lacking both the Fmr1 gene and its paralog Fxr2.

      (3) The Authors decided not to discuss the potential translation of the present study to human patients, despite their final conclusion statement.

      The paragraph below has been added to the end of the discussion:

      “Translational Implications”

      The present findings support the view that circadian disruption is not merely a downstream consequence of disease processes but actively contributes to symptom expression. Hence, interventions designed to reinforce circadian rhythms may hold therapeutic value for individuals with FXS and related neurodevelopmental conditions. Given that sleep and circadian dysfunction are detectable early in development and are predictive of more severe clinical phenotypes, circadian-based interventions may be particularly beneficial if applied during periods of heightened neural plasticity. Importantly, time-restricted feeding represents a relatively low-cost, non-invasive strategy that could be feasibly implemented in real-world settings. Further translational work is needed to evaluate whether the mechanistic links identified here (between circadian misalignment, immune dysregulation, and behavioral impairments) are conserved in humans, and whether similar approaches can be implemented for clinical use.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Li and colleagues describes the impact of deficiency in DGKα and ζ on Treg cells and follicular responses. The experimental approach is based on the characterization of double KO mice that show the emergence of autoimmune manifestations that include the production of autoantibodies. Additionally, there is an increase in Tfh cells, but also Tfr cells, in these mice deficient in both DGKα and ζ. Although the observations are interesting, the interpretation of the observations is difficult in the absence of data related to single mutations. While a supplementary figure shows that the autoimmune manifestations are more severe in the DGKα and ζ deficient mice, prior observations show that a single DGKα deficiency has an impact on Treg homeostasis. As such, the contribution of the two chains to the overall phenotype is hard to establish.

      Strengths:

      Well-conducted experiments with informative mouse models with defined genetic defects.

      Weaknesses:

      The major weakness is the lack of clarity concerning what can be attributed to simultaneous DGKα and ζ deficiency versus deficiency in DGKα or ζ alone.

      Some interpretations are also not conclusively supported by data.

      We appreciate reviewer 1’s positive comments about our manuscript and the suggestion to include DGKα‑ or DGKζ‑single‑knockout (SKO) Tregs in the mechanistic studies. Unfortunately, performing this seemingly simple but truly extensive experiment would exceed our current budget and personnel capacity. Importantly, it is well known that DGKα and DGKζ act redundantly or synergistically in T cells, with single loss producing minimal or partial phenotypes compared with the double knockout. The comprehensive mechanistic data already presented for DGKαζ‑DKO Tregs therefore capture the combined functional and mechanistic deficit that is most relevant to DGK functions in Treg biology, and they support the conclusions drawn in this manuscript. The reviewer also pointed out some interpretation issues, such as CD25 downregulation in Tfr cells, and some minor issues. We appreciate the reviewer’s expertise and have revised the text and discussion accordingly.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Li et al. investigate the combined role of diacylglycerol (DAG) kinases (DGK) α and ζ in the Foxp3+ Treg cell function that prevents autoimmunity. The authors generated DGK α and ζ Treg-specific double knockout mice (DKO) by crossing Dgkalpha-/- mice to DgKzf and Foxp3YFPCre/+ mice. The resulting "DKO" mice thus lack DGK α in all cells and DGK ζ in Foxp3+Treg cells. The authors show that the DKO mice spontaneously develop autoimmunity, characterized by multiorgan inflammatory infiltration and elevated anti-double-strand DNA (dsDNA), -single-strand DNA (ssDNA), and -nuclear autoantibodies. The authors attribute the DKO mice phenotype to Foxp3+Treg dysfunction, including accelerated conversion into "exTreg" cells with pathogenic activity. Interestingly, the combined deficiency of DGK α and ζ seems to release Treg cell dependence on CD28-mediated costimulatory signals, which the authors show by crossing their DKO mice to CD28-/- mice (TKO mice), which also develop autoimmunity.

      Strengths:

      The phenotypes of the mutant mice described in the manuscript are striking, and the authors provide a comprehensive analysis of the functional processes altered by the lack of DGKs.

      Weaknesses:

      One aspect that could be better explored is the direct role of "ex-Tregs" in causing pathogenesis in the models utilized.

      However, overall, this is an important report that makes a significant addition to the understanding of DAG kinases in Treg cell biology.

      We greatly appreciate reviewer 2’s positive comments about the manuscript. The data we presented in the manuscript, showing that DGKαζDKO Tregs but not WT Tregs are able to trigger autoimmunity in T cell-deficient mice in the presence of WT CD4 T cells, support the conclusion that DGKαζDKO Tregs are pathogenic. Reviewer 2 suggested testing the direct role of DGKαζDKO Treg/ex-Tregs in the pathogenesis of autoimmune diseases in the absence of conventional T cells. This is a really interesting idea that we will test in the future should resources for executing the experiment become available.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      When you search for something, you need to maintain some representation (a "template") of that target in your mind/brain. Otherwise, how would you know what you were looking for? If your phone is in a shocking pink case, you can guide your attention to pink things based on a target template that includes the attribute 'pink'. That guidance should get you to the phone pretty effectively if it is in view. Most real-world searches are more complicated. If you are looking for the toaster, you will make use of your knowledge of where toasters can be. Thus, if you are asked to find a toaster, you might first activate a template of a kitchen or a kitchen counter. You might worry about pulling up the toaster template only after you are reasonably sure you have restricted your attention to a sensible part of the scene.

      Zhou and Geng are looking for evidence of this early stage of guidance by information about the surrounding scene in a search task. They train Os to associate four faces with four places. Then, with Os in the scanner, they show one face - the target for a subsequent search. After an 8 sec delay, they show a search display where the face is placed on the associated scene 75% of the time. Thus, attending to the associated scene is a good idea. The questions of interest are "When can the experimenters decode which face Os saw from fMRI recording?" "When can the experimenters decode the associated scene?" and "Where in the brain can the experimenters see evidence of this decoding?" The answer is that the face but not the scene can be read out during the face's initial presentation. The key finding is that the scene can be read out (imperfectly but above chance) during the subsequent delay when Os are looking at just a fixation point. Apparently, seeing the face conjures up the scene in the mind's eye.

      This is a solid and believable result. The only issue, for me, is whether it is telling us anything specifically about search. Suppose you trained Os on the face-scene pairing but never did anything connected to the search. If you presented the face, would you not see evidence of recall of the associated scene? Maybe you would see the activation of the scene in different areas and you could identify some areas as search specific. I don't think anything like that was discussed here.

      You might also expect this result to be asymmetric. The idea is that the big scene gives the search information about the little face. The face should activate the larger useful scene more than the scene should activate the more incidental face, if the task was reversed. That might be true if the finding is related to a search where the scene context is presumed to be the useful attention guiding stimulus. You might not expect an asymmetry if Os were just learning an association.

      It is clear in this study that the face and the scene have been associated and that this can be seen in the fMRI data. It is also clear that a valid scene background speeds the behavioral response in the search task. The linkage between these two results is not entirely clear but perhaps future research will shed more light.

      It is also possible that I missed the clear evidence of the search-specific nature of the activation by the scene during the delay period. If so, I apologize and suggest that the point be underlined for readers like me.

      We have added text related to this issue, particularly in the discussion (page 19, line 6), and have also added citations of studies in humans and non-human primates showing a causal relationship between preparatory activity in prefrontal and visual cortex and visual search performance (page 6, line 16).

      Reviewer #2 (Public review):

      Summary:

      This work is one of the best instances of a well-controlled experiment and theoretically impactful findings within the literature on templates guiding attentional selection. I am a fan of the work that comes out of this lab and this particular manuscript is an excellent example as to why that is the case. Here, the authors use fMRI (employing MVPA) to test whether during the preparatory search period, a search template is invoked within the corresponding sensory regions, in the absence of physical stimulation. By associating faces with scenes, a strong association was created between two types of stimuli that recruit very specific neural processing regions - FFA for faces and PPA for scenes. The critical results showed that scene information that was associated with a particular cue could be decoded from PPA during the delay period. This result strongly supports the invoking of a very specific attentional template.

      Strengths:

      There is so much to be impressed with in this report. The writing of the manuscript is incredibly clear. The experimental design is clever and innovative. The analysis is sophisticated and also innovative. The results are solid and convincing.

      Weaknesses:

      I only have a few weaknesses to point out.

      This point is not so much of a weakness, but a further test of the hypothesis put forward by the authors. The delay period was long - 8 seconds. It would be interesting to split the delay period into the first 4 seconds and the last 4 seconds and run the same decoding analyses. The hypothesis here is that semantic associations take time to evolve, and it would be great to show that decoding gets stronger in the second delay period as opposed to the period right after the cue. I don't think this is necessary for publication, but I think it would be a stronger test of the template hypothesis.

      We conducted the suggested analysis, and we did not find clear evidence of differences in decoding scene information between the earlier and later portions of the delay period. This may be due to insufficient power when the data are divided, individual differences in when preparatory activation is the strongest, or truly no difference in activation over the delay period. More details of this analysis can be found in the supplementary materials (page 12, line 16; Figure S1).
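      The split-delay analysis described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the nearest-centroid classifier, the leave-one-run-out scheme, the voxel count, the number of runs, and the four volumes per delay period are all assumptions chosen for illustration, and the data are simulated with the same signal in both halves, so no early/late difference is built in.

```python
import random
from statistics import mean


def split_delay(volumes):
    """Average the first and second half of one trial's delay-period volumes."""
    half = len(volumes) // 2

    def average(vols):
        return [mean(voxel) for voxel in zip(*vols)]

    return average(volumes[:half]), average(volumes[half:])


def decode_accuracy(patterns, labels, runs):
    """Leave-one-run-out accuracy of a nearest-centroid classifier.

    Assumes every class appears in every run, as in the simulation below.
    """
    correct = 0
    for held_out in sorted(set(runs)):
        # One centroid per class, built from the training runs only.
        centroids = {}
        for label in set(labels):
            train = [p for p, l, r in zip(patterns, labels, runs)
                     if l == label and r != held_out]
            centroids[label] = [mean(voxel) for voxel in zip(*train)]
        # Assign each held-out pattern to the closest centroid.
        for p, l, r in zip(patterns, labels, runs):
            if r != held_out:
                continue
            guess = min(
                centroids,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            correct += guess == l
    return correct / len(patterns)


# Simulated data (hypothetical sizes): 4 scenes x 6 runs, 4 delay volumes per trial.
rng = random.Random(0)
N_VOXELS, N_RUNS, N_VOLUMES = 20, 6, 4
prototypes = {scene: [rng.gauss(0, 1) for _ in range(N_VOXELS)] for scene in range(4)}

early, late, labels, runs = [], [], [], []
for run in range(N_RUNS):
    for scene, proto in prototypes.items():
        volumes = [[v + rng.gauss(0, 1) for v in proto] for _ in range(N_VOLUMES)]
        first_half, second_half = split_delay(volumes)
        early.append(first_half)
        late.append(second_half)
        labels.append(scene)
        runs.append(run)

print("early-delay accuracy:", decode_accuracy(early, labels, runs))
print("late-delay accuracy:", decode_accuracy(late, labels, runs))
```

      Note that each half-delay pattern averages only half as many volumes as a full-delay pattern, which is one concrete way in which dividing the data reduces power.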

      Typo in the abstract: "curing" vs "during."

      Fixed.

      It is hard to know what to do with significant results in ROIs that are not motivated by specific hypotheses. However, for Figure 3, what are the explanations for ROIs that show significant differences above and beyond the direct hypotheses set out by the authors?

      We added reasoning for the other a priori ROIs in the introduction (page 4, line 26). There is substantial evidence suggesting that frontoparietal areas are involved in cognitive control, attentional control, and working memory. The ROIs we selected from frontal and parietal cortex are based on parcels within resting state networks defined by the 17-network atlases (Schaefer et al., 2018). The IFJ was defined by the HCP-MMP1 atlas (Glasser et al., 2016). These regions are commonly used in studies of attention and cognitive control, and the exact ROIs selected are described in the section on “Regions of interest (ROI) definition”. While we have the strongest hypothesis for IFJ based on relatively recent work from the Desimone lab, the other ROIs in lateral frontal cortex and parietal cortex are also well documented in similar studies, although the exact computation being done by these regions during tasks can be hard to differentiate with fMRI.

      Reviewer #3 (Public review):

      The manuscript contains a carefully designed fMRI study, using MVPA to investigate which high-level association cortices contain target-related information to guide visual search. A special focus is hereby on so-called 'target-associated' information, which has previously been shown to help in guiding attention during visual search. For this purpose the authors trained their participants and made them learn specific target-associations, in order to then test which brain regions may contain neural representations of those learnt associations. They found that at least some of the associations tested were encoded in prefrontal cortex during the cue and delay period.

      The manuscript is very carefully prepared. As far as I can see, the statistical analyses are all sound and the results integrate well with previous findings.

      I have no strong objections against the presented results and their interpretation.

      Reviewer #1 (Recommendations for the authors):

      One bit of trivia. In the abstract, you should define IFJ on its first appearance in the text. You get to that a bit later.

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      I really don't have much to suggest, as I thought that this was a clearly written report that offered a clever paradigm and data that supported the conclusions. My only suggestion would be to split the delay period activity and test whether the strength of the template evolves over time. Even though fMRI is not the best tool for this, still you would predict stronger decoding in the second half of the delay period.

      Please see above for our response to the same comment.

      Reviewer #3 (Recommendations for the authors):

      I would just like to point out some minor aspects that might be worth improving before publishing this work.

      Abstract: While in general, the writing is clear and concise, I felt that the abstract of the manuscript was particularly hard to follow, probably because the authors at some point re-arranged individual sentences. For example, they write in line 12 about 'the preparatory period', but explain only in the following sentence that the preparatory period ensues 'before search begins'. This made it a bit hard to follow the overall logic and I think could easily be fixed. 

      We have addressed this comment and updated the abstract.

      Also in the abstract: 'The CONTENTS of the template typically CONTAIN...' sounds weird, no? Also, 'information is used to modulate sensory processing in preparation for guiding attention during search' sounds like a very over-complicated description of attentional facilitation. I'm not convinced either whether the sequence is correct here. Is the information really used to (first) modulate sensory processing (which is a sort of definition of attention in itself) to (then) prepare the guidance of attention in visual search?

      We have addressed this comment and updated the abstract.

      The sentence in line 7, 'However, many behavioral studies have shown that target-associated information is used to guide attention,...' (and the following sentence) assumes that the reader is somewhat familiar with the term 'target-associations'. I'm afraid that, for a naive reader, this term may only become fully understandable once the idea is introduced a bit later when mentioning that participants of the study were trained on face-scene pairings. I think it could help to give some very short explanation of 'target-associations' already when it is first mentioned. The term 'statistically co-occurring object pairs', for example, could be of great help here.

      Thank you for the suggestion. We have added it to the abstract.

      page 2, line 22: 'prefrotnal'

      Fixed.

      page 2, line 24/25: 'information ... can SUPPLANT (?) ... information'. (That's also a somewhat unfortunate repetition of 'information')

      Fixed.

      page 4, line 23-25: 'Working memory representations in lateral prefrontal and parietal regions are engaged in cognitive control computations that ARE (?) task non-specific but essential to their functioning'

      Fixed.

      page 7, line 1: maybe a comma before 'suggesting'?

      Fixed.

      page 7, line 14-16: Something seems wrong with this sentence: 'The distractor face was a race-gender match, which we previously FOUND MADE (?) target discrimination difficult enough to make the scene useful for guiding attention'

      We have addressed this comment and rewritten this part (now on page 7, line 18).

      Results / Discussion sections:

      In several figures, like in Fig3A, the three different IFJ regions, are grouped separately from the other frontal areas, which makes sense given the special role IFJ plays for representing task-related templates. However, IFJ is still part of PFC. I think it would be more correct to group the other frontal areas (like FEF vLPFC etc.) as 'Other Frontal' or even 'Other PFC'.

      We have made the changes based on the reviewer’s suggestion.

      In some of the Figures, e.g. Fig 3 and 5, I had the impression that the activation patterns of some conditions in vLPFC were rather close to the location of IFJ, which is just a bit posterior. I think I remember that functional localisers of IFJ can actually vary quite a bit in localisation (see e.g. in the Baldauf/Desimone paper). Also, I think it has been shown in the context of other regions, like the human FEF that its position when defined by localisation tasks is not always nicely and fully congruent with the respective labels in an atlas like the Glasser atlas. It might help to take this in consideration when discussing the results, particularly since the term vLPFC is a rather vague collection of several brain parcels and not a parcel name in the Glasser atlas. Some people might even argue that vLPFC in the broad sense contains IFJ, similar to how 'Frontal' contains IFJ (see above). How strong of a point do the authors want to make about activation in IFJ versus in vlPFC?

      We have now added text discussing the inability to truly differentiate between subregions of IFJ and other parts of vLPFC in the methods section on ROIs (page 25, line 13) and in the discussion (page 18, line 25). However, given the likely imprecision of ROI boundaries, one might think it even more surprising that we see distinct patterns between the subregions of IFJ defined by Glasser HCP-MMP1 and the other vLPFC regions defined by the 17-network atlases. We do not wish to overstate the precision of IFJ regions, but note the ROI results within the context of the larger literature. We are sure that our findings will have to be reinterpreted when newer methods allow for better localization of functional subregions of the vLPFC in individuals.

      Given that the authors nicely explain in the introduction how important templates are in visual search, and given that FEF has such an important role in serially guiding saccades through visual search templates, I think it would be worth discussing the finding that FEF did not hold representation of these targets. Of course, this could be in part due to the specific task at hand, but it may still be interesting to note in the Discussion section that here FEF, although important for some top-down attention signals, did not keep representations of the 'search' templates. Is it because there is no spatial component to the task at hand (like proposed in Bedini 2021)?

      We have now added text directly addressing this point and citing the Bedini et al. paper in the discussion (page 18, line 18). Besides our current findings, the relationship between IFJ and FEF is really interesting and will hopefully be investigated more in the future.

      Page 18, line 5: 'we the(N) associated...'

      Fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this study by Li et al., the authors re-investigated the role of cDC1 for atherosclerosis progression using the ApoE model. First, the authors confirmed the accumulation of cDC1 in atherosclerotic lesions in mice and humans. Then, in order to examine the functional relevance of this cell type, the authors developed a new mouse model to selectively target cDC1. Specifically, they inserted the Cre recombinase directly after the start codon of the endogenous XCR1 gene, thereby avoiding off-target activity. Following validation of this model, the authors crossed it with ApoE-deficient mice and found a striking reduction of aortic lesions (numbers and size) following a high-fat diet. The authors further characterized the impact of cDC1 depletion on lesional T cells and their activation state. Also, they provide in-depth transcriptomic analyses of lesional in comparison to splenic and nodal cDC1. These results imply cellular interactions between lesion T cells and cDC1. Finally, the authors show that the chemokine XCL1, which is produced by activated CD8 T cells (and NK cells), plays a key role in the interaction with XCR1-expressing cDC1 and particularly in the atherosclerotic disease progression.

      Strengths:

      The surprising results on XCL1 represent a very important gain in knowledge. The role of cDC1 is clarified with a new genetic mouse model.

      Thank you.

      Weaknesses:

      My criticism is limited to the analysis of the scRNAseq data of the cDC1. I think it would be important to match these data with published data sets on cDC1. In particular, the data set by Sophie Janssen's group on splenic cDC1 might be helpful here (PMID: 37172103; https://www.single-cell.be/spleen_cDC_homeostatic_maturation/datasets/cdc1). It would be good to assign a cluster based on the categories used there (early/late, immature/mature, at least for splenic DC).

      Thank you very much for your help. Using the scRNA seq data of Xcr1<sup>+</sup> cDC1 sorted from ApoE<sup>–/–</sup> mice, we re-annotated the populations, following the methodology proposed by Sophie Janssen's group. These results are presented in Figure S9 and Figure S10 and described in detail in the Results and Discussion section.

      Please refer to the Results section from line 264 to 284: “Using the scRNA seq data of Xcr1<sup>+</sup> cDC1 sorted from hyperlipidemic mice, we annotated the 10 populations as shown in Figure S9A, following the methodology from a previous study [41]. Ccr7<sup>+</sup> mature cDC1s (Cluster 3, 7 and 9) and Ccr7<sup>–</sup> immature cDC1s (remaining clusters) were identified across cDC1 cells sorted from aorta, spleen and lymph nodes (Figure S9B). Further stratification based on marker genes reveals that Cluster 10 is the pre-cDC1, with high expression level of CD62L (Sell) and low expression level of CD8a (Figure S9C). Cluster 6 and 8 are the proliferating cDC1s, which express high level of cell cycling genes Stmn1 and Top2a (Figure S9D). Cluster 1 and 4 are early immature cDC1s, and cluster 2 and 5 are late immature cDC1s, according to the expression pattern of Itgae and Nr4a2 (Figure S9E). Cluster 9 cells are early mature cDC1s, with elevated expression of Cxcl9 and Cxcl10 (Figure S9F). Cluster 3 and 7 are late mature cDC1s, characterized by the expression of Cd63 and Fscn1 (Figure S9G). As shown in Figure 5C and Figure S9, the 10 populations displayed a major difference in aortic cDC1 cells, which lack pre-cDC1s (cluster 10) and mature cells (cluster 3, 7 and 9). Interestingly, in hyperlipidemic mice splenic cDC1 possess only Cluster 3 as the late mature cells while the lymph node cDC1 cells have two late mature populations namely Cluster 3 and Cluster 7. In further analysis, we also compared splenic cDC1 cells from HFD mice to those from ND mice. As shown in Figure S10, HFD appears to impact early immature cDC1-1 cells (Cluster 1) and increases the abundance of late immature cDC1 cells (Cluster 2 and 5), regardless of the fact that all 10 populations are present in two origins of samples. We also found that Tnfaip3 and Serinc3 are among the most upregulated genes, while Apol7c and Tifab are downregulated in splenic cDC1 cells sorted from HFD mice”.
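      For readers who want the annotation in one place, the cluster assignments in the quoted passage can be transcribed into a small lookup table. This is only a restatement of the text above, not the authors' analysis code; the dictionary layout and the helper names `state_of` and `is_mature` are hypothetical.

```python
# Cluster-to-state mapping transcribed from the quoted Results text (Figure S9).
# The layout and helper names are illustrative assumptions.
CDC1_CLUSTER_STATES = {
    "pre-cDC1": {"clusters": [10], "markers": ["Sell"]},
    "proliferating": {"clusters": [6, 8], "markers": ["Stmn1", "Top2a"]},
    "early immature": {"clusters": [1, 4], "markers": ["Itgae", "Nr4a2"]},
    "late immature": {"clusters": [2, 5], "markers": ["Itgae", "Nr4a2"]},
    "early mature": {"clusters": [9], "markers": ["Cxcl9", "Cxcl10"]},
    "late mature": {"clusters": [3, 7], "markers": ["Cd63", "Fscn1"]},
}

# States the text describes as Ccr7+ (mature).
MATURE_STATES = {"early mature", "late mature"}


def state_of(cluster):
    """Return the maturation state assigned to a cluster number."""
    for state, info in CDC1_CLUSTER_STATES.items():
        if cluster in info["clusters"]:
            return state
    raise KeyError(f"cluster {cluster} is not annotated")


def is_mature(cluster):
    """True for the clusters reported as Ccr7+ mature cDC1s."""
    return state_of(cluster) in MATURE_STATES


# Per the text, aortic cDC1 lack the pre-cDC1 cluster and all mature clusters.
AORTA_CLUSTERS = [c for c in range(1, 11)
                  if state_of(c) != "pre-cDC1" and not is_mature(c)]

print("aortic clusters:", AORTA_CLUSTERS)
```

      Writing the mapping out this way makes it easy to check that the ten clusters are each assigned exactly once and that the Ccr7<sup>+</sup> set (3, 7 and 9) coincides with the early and late mature states.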

      Please refer to the Discussion section from line 380 to 385: “Based on the maturation analysis of the cDC1 scRNA seq data [41], our findings suggest that the aortic cDC1 cells display a major difference from those of spleen and lymph nodes by lacking the mature clusters, whereas lymph node cDC1 cells contain an additional Fabp5<sup>+</sup> S100a4<sup>+</sup> late mature Cluster. Our results also suggest that hyperlipidemia contributes to alteration in early immature cDC1 and in the abundance of late immature cDC1 cells, which was associated with dramatic change in gene expression of Tnfaip3, Serinc3, Apol7c and Tifab”.

      Reviewer #2 (Public review):

      This study investigates the role of cDC1 in atherosclerosis progression using Xcr1Cre-Gfp Rosa26LSL-DTA ApoE-/- mice. The authors demonstrate that selective depletion of cDC1 reduces atherosclerotic lesions in hyperlipidemic mice. While cDC1 depletion did not alter macrophage populations, it suppressed T cell activation (both CD4+ and CD8+ subsets) within aortic plaques. Further, targeting the chemokine Xcl1 (ligand of Xcr1) effectively inhibits atherosclerosis. The manuscript is well-written, and the data are clearly presented. However, several points require clarification:

      (1) In Figure 1C (upper plot), it is not clear what the Xcr1 single-positive region in the aortic root represents, or whether this is caused by unspecific staining. So I wonder whether Xcr1 single-positive staining can reliably represent cDC1. For accurate cDC1 gating in Figure 1E, Xcr1+CD11c+ co-staining should be used instead.

      The observed false-positive signal in the wavy structures within immunofluorescence Figure 1C (upper panel) results from the strong autofluorescence of elastic fibers, a major vascular wall component (alongside collagen). This intrinsic property of elastic fibers is a well-documented confounder in immunofluorescence studies [A, B].

      In contrast, immunohistochemistry (IHC) employs an enzymatic chromogenic reaction (HRP with DAB substrate) that generates a brown precipitate exclusively at antigen-antibody binding sites. Importantly, vascular elastic fibers lack endogenous enzymatic activity capable of catalyzing the DAB reaction, thereby preventing this source of false positivity in IHC.

      Given that Xcr1 is exclusively expressed on conventional type 1 dendritic cells [C], and considering that IHC lacks the multiplexing capability inherent to immunofluorescence for antigen co-localization, single-positive Xcr1 staining reliably identifies cDC1s in IHC results.

      [A] König, K et al. “Multiphoton autofluorescence imaging of intratissue elastic fibers.” Biomaterials vol. 26,5 (2005): 495-500. doi:10.1016/j.biomaterials.2004.02.059

      [B] Andreasson, Anne-Christine et al. “Confocal scanning laser microscopy measurements of atherosclerotic lesions in mice aorta. A fast evaluation method for volume determinations.” Atherosclerosis vol. 179,1 (2005): 35-42. doi:10.1016/j.atherosclerosis.2004.10.040

      [C] Dorner, Brigitte G et al. “Selective expression of the chemokine receptor XCR1 on cross-presenting dendritic cells determines cooperation with CD8+ T cells.” Immunity vol. 31,5 (2009): 823-33. doi:10.1016/j.immuni.2009.08.027

      (2) Figure 4D suggests that cDC1 depletion does not affect CD4+/CD8+ T cells. However, only the proportion of these subsets within total T cells is shown. To fully interpret effects, the authors should provide:

      (a) Absolute numbers of total T cells in aortas.

      (b) Absolute counts of CD4+ and CD8+ T cells.

      Thanks for your suggestions. We agree that assessing both proportions and absolute numbers in Figure 4 provides a more complete picture of the effects of cDC1 depletion on T cell populations. Furthermore, we have also added the absolute counts of cDC1 cells and total T cells, and the CD44 MFI (mean fluorescence intensity) in CD4<sup>+</sup> and CD8<sup>+</sup> T cells, to Figure 4, and supplemented the corresponding textual descriptions in the revised manuscript.

      Please refer to the Results section from line 183 to 187: “Subsequently, we assessed T cell phenotype in the two groups of mice. While neither the frequencies nor absolute counts of aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells differed significantly between two groups of mice (Figure 4D-F), CD69 frequency and CD44 MFI (Mean Fluorescence Intensity), the T cell activation markers, were significantly reduced in both CD4<sup>+</sup> and CD8<sup>+</sup> T cells from Xcr1<sup>+</sup> cDC1 depleted mice compared to controls (Figure 4G and H)”.

      (3) How does T cell activation mechanistically influence atherosclerosis progression? Why was CD69 selected as the sole activation marker? Were other markers (e.g., KLRG1, ICOS, CD44) examined to confirm activation status?

      We sincerely appreciate these insightful comments. As extensively documented in the literature, activated effector T cells (both CD4+ and CD8+) critically promote plaque inflammation and instability through their production of pro-inflammatory cytokines (particularly IFN-γ and TNF-α), which drive endothelial activation, exacerbate macrophage inflammatory responses, and impair smooth muscle cell function [A].

      In our study, we specifically investigated the role of cDC1 cells in atherosclerosis progression. Our key findings demonstrate that cDC1 depletion attenuates T cell activation (as shown by reduced CD69/CD44 expression) and that this reduction in activation is functionally linked to the observed decrease in atherosclerosis burden in our model. 

      Regarding CD44 as an activation marker, we performed quantitative analyses of CD44 mean fluorescence intensity (MFI) in aortic T cells (Figure 4). Importantly, the MFI of CD44 was significantly lower on both CD4+ and CD8+ T cells from Xcr1<sup>Cre-Gfp</sup> Rosa26<sup>LSL-DTA</sup> ApoE<sup>–/–</sup> mice compared to the control ApoE<sup>–/–</sup> mice (data shown below), which is consistent with the result of CD69 in Figure 4. We added the related description in the Results section.

      Please refer to the Results section from line 185 to 187 “CD69 frequency and CD44 MFI (Mean Fluorescence Intensity), the T cell activation markers, were significantly reduced in both CD4+ and CD8+ T cells from Xcr1+ cDC1 depleted mice compared to controls (Figure 4G and H)”.

      Similarly, the MFI of CD44 was significantly lower on both CD4<sup>+</sup> and CD8<sup>+</sup> T cells from Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> mice compared to the control ApoE<sup>–/–</sup> mice (data shown below), which is consistent with the result of CD69 in Figure 7. We also added the related description in the Results section.

      Please refer to the Results section from line 308 to 309 “Crucially, CD69<sup>+</sup> frequency and CD44 MFI remained comparable in both aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells between two groups (Figure 7D-F).”

      [A] Hansson, Göran K, and Andreas Hermansson. “The immune system in atherosclerosis.” Nature immunology vol. 12,3 (2011): 204-12. doi:10.1038/ni.2001

      (4) Figure 7B: Beyond cDC1/2 proportions within cDCs, please report absolute counts of: Total cDCs, cDC1, and cDC2 subsets. Figure 7D: In addition to CD4+/CD8+ T cell proportions, the following should be included:

      (a) Total T cell numbers in aortas

      (b) Absolute counts of CD4+ and CD8+ T cells.

      Thanks for your suggestions. We have now included in Figure 7 the absolute counts of cDC, cDC1, and cDC2 cells, along with CD4<sup>+</sup> and CD8<sup>+</sup> T cells in aortic tissues. Additionally, we provide the corresponding CD44 mean fluorescence intensity (MFI) measurements for both CD4<sup>+</sup> and CD8<sup>+</sup> T cell populations. We added the related description in the Results section.

      Please refer to the Results section from line 303 to 311: “The flow cytometric results illustrated that both frequencies and absolute counts of Xcr1<sup>+</sup> cDC1 cells in the aorta were significantly reduced, but cDCs and cDC2 cells from Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> were comparable with that from ApoE<sup>–/–</sup> (Figure 7A-C). Moreover, in both lymph node and spleen, the absolute numbers of pDC, cDC1 and cDC2 from Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> were comparable with that from ApoE<sup>–/–</sup> (Figure S11). Crucially, CD69<sup>+</sup> frequency and CD44 MFI remained comparable in both aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells between two groups (Figure 7D-F). However, aortic CD8<sup>+</sup> T cells exhibited reduced frequency and absolute count, while CD4<sup>+</sup> T cells showed increased frequency but unchanged counts in Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> mouse versus controls (Figure 7G and H).”
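      The pattern reported for T cells (CD4<sup>+</sup> frequency up, CD4<sup>+</sup> count unchanged, CD8<sup>+</sup> count down) follows from how a frequency is computed, and it shows why the reviewer asked for absolute counts. The short sketch below uses made-up cell counts, not data from the study, and assumes frequencies are taken within the combined CD4<sup>+</sup> and CD8<sup>+</sup> T cell pool.

```python
def frequencies(counts):
    """Convert absolute counts per subset into fractions of the total."""
    total = sum(counts.values())
    return {subset: n / total for subset, n in counts.items()}


# Hypothetical aortic T cell counts (illustrative numbers only).
control = {"CD4": 1000, "CD8": 1000}
knockout = {"CD4": 1000, "CD8": 500}  # CD4 count unchanged, CD8 count halved

for name, counts in (("control", control), ("knockout", knockout)):
    freq = frequencies(counts)
    print(name, {subset: round(f, 3) for subset, f in freq.items()})
```

      With these numbers the CD4<sup>+</sup> fraction rises from one half to two thirds even though not a single CD4<sup>+</sup> cell was gained, which is why a shift in frequency alone cannot distinguish expansion of one subset from loss of another.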

      (5) cDC1 depletion reduced CD69+CD4+ and CD69+CD8+ T cells, whereas Xcl1 depletion decreased Xcr1+ cDC1 cells without altering activated T cells. How do the authors explain these different results? This discrepancy needs explanation.

      We sincerely appreciate your professional and insightful comments regarding the mechanistic relationship between cDC1 depletion and T cell activation. Direct cDC1 depletion in the Xcr1<sup>Cre-Gfp</sup> Rosa26<sup>LSL-DTA</sup> ApoE<sup>–/–</sup> mouse model removes both recruited and tissue-resident cDC1s, eliminating their multifunctional roles in antigen presentation, co-stimulation and cytokine secretion essential for T cell activation. In contrast, Xcl1 depletion reduces, but does not eliminate, cDC1 migration into plaques. Furthermore, alternative chemokine axes (e.g., CCL5/CCR5, CXCL9/CXCR3, BCL9/BCL9L) may partially rescue cDC1 recruitment [13, 68, 69], and non-cDC1 APCs (e.g., monocytes, cDC2s) may compensate for T cell activation [55, 70]. We emphasize that Xcl1 depletion specifically failed to alter T cell activation in hyperlipidemic ApoE<sup>–/–</sup> mice. However, its impact may differ in other pathophysiological contexts due to compensatory mechanisms. We thank you again for highlighting this nuance, which strengthens our mechanistic interpretation. We have added these points to the discussion section and included new references.

      Please refer to the Discussion section from line 407 to 413: “Notably, while complete ablation of Xcr1<sup>+</sup> cDC1s impaired T cell activation, reduction of Xcr1<sup>+</sup> cDC1 recruitment via Xcl1 deletion did not significantly compromise this process. This discrepancy may arise through compensatory mechanisms: alternative chemokine axes (e.g., CCL5/CCR5, CXCL9/CXCR3, BCL9/BCL9L) may partially rescue Xcr1<sup>+</sup> cDC1 homing [13, 68, 69], while non-cDC1 antigen-presenting cells (e.g., monocytes, cDC2s) may sustain T cell activation [55, 70]. Furthermore, tissue-specific microenvironment factors could potentially modulate its role in other diseases.”.

      [13] Eisenbarth, S C. “Dendritic cell subsets in T cell programming: location dictates function.” Nature reviews. Immunology vol. 19,2 (2019): 89-103. doi:10.1038/s41577-018-0088-1

      [55] Brewitz, Anna et al. “CD8+ T Cells Orchestrate pDC-XCR1+ Dendritic Cell Spatial and Functional Cooperativity to Optimize Priming.” Immunity vol. 46,2 (2017): 205-219. doi:10.1016/j.immuni.2017.01.003

      [68] de Oliveira, Carine Ervolino et al. “CCR5-Dependent Homing of T Regulatory Cells to the Tumor Microenvironment Contributes to Skin Squamous Cell Carcinoma Development.” Molecular cancer therapeutics vol. 16,12 (2017): 2871-2880. doi:10.1158/1535-7163.MCT-17-0341.

      [69] He F, Wu Z, Liu C, Zhu Y, Zhou Y, Tian E, et al. Targeting BCL9/BCL9L enhances antigen presentation by promoting conventional type 1 dendritic cell (cDC1) activation and tumor infiltration. Signal Transduct Target Ther. 2024;9(1):139. Epub 2024/05/30. doi: 10.1038/s41392-024-01838-9. PubMed PMID: 38811552; PubMed Central PMCID: PMCPMC11137111.

      [70] Böttcher, Jan P et al. “Functional classification of memory CD8(+) T cells by CX3CR1 expression.” Nature communications vol. 6 8306. 25 Sep. 2015, doi:10.1038/ncomms9306.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 32 - The authors might want to add that the mouse model leads to a "constitutive" depletion of cDC1.

      Thanks for your advice. We have revised the sentence as follows.

      Please refer to the Results section from line 31 to 33: “we established Xcr1<sup>Cre-Gfp</sup> Rosa26<sup>LSL-DTA</sup> ApoE<sup>–/–</sup> mice, a novel and complex genetic model, in which cDC1 was constitutively depleted in vivo during atherosclerosis development”.

      (2) Line 187-188: The authors claim that T cell activation was "inhibited" if cDC1 was depleted. The data shows that the T cells were less activated, but there is no indication of any kind of inhibition; this should be corrected.

      Thanks for your advice. We have revised the sentence as follows.

      Please refer to the Results section from line 183 to 187: “Subsequently, we assessed T cell phenotype in the two groups of mice. While neither the frequencies nor absolute counts of aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells differed significantly between two groups of mice (Figure 4D-F), CD69 frequency and CD44 MFI (Mean Fluorescence Intensity), the T cell activation markers, were significantly reduced in both CD4<sup>+</sup> and CD8<sup>+</sup> T cells from Xcr1<sup>+</sup> cDC1 depleted mice compared to controls (Figure 4G and H)”.

      (3) Why are some splenic DC clusters absent in LNs and vice versa? This is not obvious to this reviewer and should at least be discussed.

      We appreciate the insightful question regarding the absence of certain splenic DC clusters in LNs. This phenomenon in Figure 5 aligns with the 'division of labor' paradigm in dendritic cell biology: tissue microenvironments evolve specialized DC subsets to address local immunological challenges. The absence of universal clusters reflects functional adaptation, not technical artifacts. We acknowledge that this tissue-specific heterogeneity warrants further discussion and have expanded our analysis to address this point in the discussion part of our manuscript.

      Please refer to the Discussion section from line 375 to 385: “This pronounced tissue-specific compartmentalization of Xcr1<sup>+</sup> cDC1 subsets may be related to multiple mechanisms, including developmental imprinting that instructs precursor differentiation into transcriptionally distinct subpopulations [62], and microenvironmental filtering through organ-specific chemokine axes (e.g., CCL2/CCR2 in spleen) that selectively recruit receptor-matched subsets [63, 64]. This spatial specialization optimizes pathogen surveillance for local immunological challenges. Based on the maturation analysis of the cDC1 scRNA seq data [41], our findings suggest that the aortic cDC1 cells display a major difference from those of spleen and lymph nodes by lacking the mature clusters, whereas lymph node cDC1 cells contain an additional Fabp5<sup>+</sup> S100a4<sup>+</sup> late mature Cluster. Our results also suggest that hyperlipidemia contributes to alteration in early immature cDC1 and in the abundance of late immature cDC1 cells, which was associated with dramatic change in gene expression of Tnfaip3, Serinc3, Apol7c and Tifab”.

      [62]. Liu Z, Gu Y, Chakarov S, Bleriot C, Kwok I, Chen X, et al. Fate Mapping via Ms4a3-Expression History Traces Monocyte-Derived Cells. Cell. 2019;178(6):1509-25 e19. Epub 2019/09/07. doi: 10.1016/j.cell.2019.08.009. PubMed PMID: 31491389.

      [63]. Bosmans LA, van Tiel CM, Aarts S, Willemsen L, Baardman J, van Os BW, et al. Myeloid CD40 deficiency reduces atherosclerosis by impairing macrophages' transition into a pro-inflammatory state. Cardiovasc Res. 2023;119(5):1146-60. Epub 2022/05/20. doi: 10.1093/cvr/cvac084. PubMed PMID: 35587037; PubMed Central PMCID: PMCPMC10202633.

      [64]. Mildner A, Schonheit J, Giladi A, David E, Lara-Astiaso D, Lorenzo-Vivas E, et al. Genomic Characterization of Murine Monocytes Reveals C/EBPbeta Transcription Factor Dependence of Ly6C(-) Cells. Immunity. 2017;46(5):849-62 e7. Epub 2017/05/18. doi: 10.1016/j.immuni.2017.04.018. PubMed PMID: 28514690.

      [41]. Bosteels V, Marechal S, De Nolf C, Rennen S, Maelfait J, Tavernier SJ, et al. LXR signaling controls homeostatic dendritic cell maturation. Sci Immunol. 2023;8(83):eadd3955. Epub 2023/05/12. doi: 10.1126/sciimmunol.add3955. PubMed PMID: 37172103.

      (4) The authors should discuss how XCL1 could impact lesional cDC1 and T cell abundance. Notably, preDCs do not express XCR1, and T cells express XCL1 following TCR activation. Is there a recruitment or local proliferation defect of cDC1 in the absence of XCL1? Could there also be a role for NK cells as a potential source of XCL1?

      We appreciate your insightful questions regarding the differential effects of Xcl1 on cDC1s and T cells. Xcl1 primarily mediates the recruitment of mature cDC1s. Our data demonstrate that Xcl1 deletion significantly reduces aortic cDC1 abundance, which correlates with a concomitant decrease in CD8<sup>+</sup> T cell numbers within the aorta. These findings strongly suggest that the Xcl1-Xcr1 axis plays a regulatory role in T cell accumulation in aortic plaques.

      Consistent with prior studies [A, B], cDC1 recruitment can occur in the absence of Xcl1, which echoes our finding that cDC1 cells were still present in Xcl1-knockout aortic plaques, but in lower abundance. We agree that further studies are required to address how Xcl1-dependent and Xcl1-independent cDC1 cells activate T cells and whether they differ in their capacity to proliferate in tissue. We have added these points to the Discussion section.

      Please refer to the Discussion section from line 407 to 415: “Notably, while complete ablation of Xcr1<sup>+</sup> cDC1s impaired T cell activation, reduction of Xcr1<sup>+</sup> cDC1 recruitment via Xcl1 deletion did not significantly compromise this process. This discrepancy may arise through compensatory mechanisms: alternative chemokine axes (e.g., CCL5/CCR5, CXCL9/CXCR3, BCL9/BCL9L) may partially rescue Xcr1<sup>+</sup> cDC1 homing [13, 68, 69], while non-cDC1 antigen-presenting cells (e.g., monocytes, cDC2s) may sustain T cell activation [55, 70]. Furthermore, tissue-specific microenvironment factors could potentially modulate its role in other diseases. In summary, our findings identify Xcl1 as a potential therapeutic target for atherosclerosis therapy, though its cellular origins and regulation of lesional Xcr1<sup>+</sup> cDC1 and T cells dynamics require further studies”.

      According to the literature, Xcl1 is expressed in NK cells and subsets of T cells. NK cells may therefore be a potential source of Xcl1 during atherosclerosis, which deserves further investigation [A, C, D].

      [A] Böttcher, Jan P et al. “NK Cells Stimulate Recruitment of cDC1 into the Tumor Microenvironment Promoting Cancer Immune Control.” Cell vol. 172,5 (2018): 1022-1037.e14. doi:10.1016/j.cell.2018.01.004

      [B] He, Fenglian et al. “Targeting BCL9/BCL9L enhances antigen presentation by promoting conventional type 1 dendritic cell (cDC1) activation and tumor infiltration.” Signal transduction and targeted therapy vol. 9,1 139. 29 May. 2024, doi:10.1038/s41392-024-01838-9

      [C] Woo, Yeon Duk et al. “The invariant natural killer T cell-mediated chemokine X-C motif chemokine ligand 1-X-C motif chemokine receptor 1 axis promotes allergic airway hyperresponsiveness by recruiting CD103+ dendritic cells.” The Journal of allergy and clinical immunology vol. 142,6 (2018): 1781-1792.e12. doi:10.1016/j.jaci.2017.12.1005

      [D] Winkels, Holger et al. “Atlas of the Immune Cell Repertoire in Mouse Atherosclerosis Defined by Single-Cell RNA-Sequencing and Mass Cytometry.” Circulation research vol. 122,12 (2018): 1675-1688. doi:10.1161/CIRCRESAHA.117.312513

      Reviewer #2 (Recommendations for the authors):

      There is a logical error in line 298. I suggest revising to: "Collectively, these data suggest that Xcl1 promotes atherosclerosis by recruiting Xcr1+ cDC1 cells, which subsequently drive T cell activation in lesions."

      Thanks for your advice. Since Xcl1 deficiency reduced both the frequencies and absolute counts of Xcr1+ cDC1 and CD8+ T cells in lesions without affecting T cell activation, we revised the sentence as you suggested.

      Please refer to the Results section from line 314 to 315: “Collectively, these data suggest that Xcl1 promotes atherosclerosis by recruiting Xcr1<sup>+</sup> cDC1 cells, and facilitating CD8<sup>+</sup> T cell accumulation in lesions”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Major points

      (1) The authors discovered a novel regulation of the Hippo-YAP pathway by SARS-CoV-2 infection but did not address the pathological significance of this finding. It remains unclear why YAP downstream gene transcription needs to be inhibited in response to SARS-CoV-2 infection. Is this inhibition crucial for the innate immune response to SARS-CoV-2? The authors should re-analyze their snRNA-seq and bulk RNA-seq data described in Figure 1 to determine whether any of the affected YAP downstream genes are involved in this process.

      We appreciate the reviewer’s suggestion to clarify the pathological significance of YAP pathway inhibition in SARS-CoV-2 infection. To address this, we re-analyzed our snRNA-seq and bulk RNA-seq datasets to determine whether YAP target genes overlap with known mediators of the innate immune response. As described in Fig. 1C, bulk RNA-seq revealed decreased expression of multiple YAP downstream targets linked to innate immune regulation (e.g., Thbs1, Ccl2, Axl, and Csf1) in SARS-CoV-2–infected cells in vitro.

      snRNA-seq of alveolar type I (AT1) cells from COVID-19 patients revealed a more complex landscape: while we observed reduced YAP activity overall (Fig. 1G), multiple YAP target genes involved in innate immunity and cytokine signaling were paradoxically elevated (Supplemental Fig. 1E). Several factors likely explain these conflicting observations: 1. In the lung, AT1 cells (which are critical for gas exchange) may respond to viral infection in a cell-type-specific manner by upregulating immune-response genes through other signaling pathway(s); 2. In vivo, SARS-CoV-2 infection triggers a surge in cytokines, chemokines, and other local factors that can differentially modulate YAP binding sites and thus affect its downstream targets, a complexity not fully captured in vitro; 3. YAP is highly sensitive to mechanical signals and tissue architecture. The 3D structure of altered cell–cell junctions in infected lung tissue and fluid shear stress in the alveolar space could shape YAP target gene transcription differently from simplified monolayer cell cultures.

      We have expanded the results section of the new version to include the above points. We also acknowledge that ongoing and future work is needed to delineate the exact molecular and tissue-specific pathways through which YAP inhibition confers a potential advantage in combating SARS-CoV-2.

      (2) The authors concluded that helicase activity is required for NSP13-induced inhibition of YAP transcriptional activity based on mutation studies (Figure 3B). This finding is somewhat confusing, as K131, K345/K347, and R567 are all essential residues for NSP13 helicase activity while mutating K131 did not affect NSP13's ability to inhibit YAP (Figure 3B). Additionally, there are no data showing exactly how NSP13 inhibits the YAP/TEAD complex through its helicase function. This point was also not reflected in their proposed working model (Figure 4H).

      We appreciate the reviewer’s concerns regarding the helicase‐dependent inhibition of YAP by NSP13, particularly the roles of K131, K345/K347, and R567. Based on published structural and biochemical studies, each of these residues uniquely supports helicase function (1): K131 is crucial for stabilizing the NSP13 stalk region by interacting with S424. Substituting K131 with alanine (K131A) reduces helicase efficiency but does not completely abolish it; K345/K347 are key DNA‐binding residues, and mutating both (K345A/K347A) largely prevents NSP13 from binding DNA, thus eliminating unwinding. R567 is critical for ATP hydrolysis, and the R567A mutant retains DNA binding capacity but fails to unwind it. In Fig. 3B, K131A suppresses YAP transactivation to nearly the same extent as wild‐type NSP13, suggesting that partial helicase activity is sufficient for complete YAP/TEAD inhibition. Conversely, the K345A/K347A and R567A mutants show markedly diminished repression, underscoring the importance of DNA binding and ATP hydrolysis.

      As the new Fig. 4J illustrates, NSP13 must bind DNA and hydrolyze ATP to unwind nucleic acids. This helicase‐dependent process likely enables NSP13 to remodel chromatin structure by binding TEAD and to properly organize YAP repressors at the YAP/TEAD complex, thereby preventing YAP/TEAD transactivation. In support of this mechanism, the K345A/K347A mutant, unable to anchor to DNA, fails to repress YAP and slightly increases YAP‐driven transcription (Fig. 3B), presumably by mislocalizing YAP repressors. Likewise, the ATPase‐dead R567A can bind DNA but does not unwind and remodel chromatin to recruit YAP repressors, resulting in a loss of YAP suppression (Fig. 3B and 3F). Our revised model demonstrates that both DNA binding and ATP‐dependent unwinding are essential for NSP13 to suppress YAP transcriptional activity. We have updated the results, discussion, and model accordingly.

      (3) The proposed model that NSP13 binds TEAD4 to recruit repressor proteins and inhibits YAP/TEAD downstream gene transcription (Figure 4H) needs further characterization. Second, NSP13 is a DNA-binding protein, and its nucleic acid-binding mutant K345A/K347A failed to inhibit YAP transcriptional activity (Figure 3B). The authors should investigate whether NSP13 could bind to the TEAD binding sequence or the nearby sequence on the genome to modulate TEAD's DNA binding ability. Third, regarding the identified nuclear repressors, the authors should validate the interaction of NSP13 with the ones whose loss activates YAP transcriptional activity (Figure 4G). Lastly, why can't NSP13 bind TEAD4 in the cytoplasmic fractionation if both NSP13 and TEAD4 are detected there (Figure 3B)? This finding indicates their interaction is not a direct protein-protein interaction but is mediated by something in the nucleus, such as genomic DNA.

      (3a) We acknowledge that our NSP13 immunoprecipitation–mass spectrometry (IP-MS) did not identify any TEAD proteins (Fig. 4G and IP-MS tables). Several factors likely contributed to this outcome:

      (1) Low TEAD expression in HEK293T cells: Our IP-MS experiments were performed in HEK293T cells, which, according to the Human Protein Atlas, express TEAD1–4 at comparatively low levels (TEAD1: 16.5, TEAD2: 16.4, TEAD3: 4.9, TEAD4: 38.7 nTPM). In contrast, HeLa cells, where we successfully validated NSP13-mediated YAP suppression (Fig. 4H, Supplementary Fig. 5B-D), show higher expression of these TEAD isoforms (TEAD1: 97.1, TEAD2: 27.3, TEAD3: 12.2, TEAD4: 48.1 nTPM). Therefore, insufficient TEAD abundance in HEK293T cells may limit the sensitivity needed to detect TEAD–NSP13 interactions in our proteomic screens.

      (2) Transience and potential DNA dependence: Our co-immunoprecipitation (co-IP) experiments (Fig. 4B, Supplementary Fig. 4C-E) indicated that NSP13–TEAD4 binding is low-affinity. Under standard IP-MS conditions (which typically do not include chemical cross-linkers or nucleic acids to stabilize transient complexes), weak or short-lived interactions can be lost during washes or sample processing.

      (3) Additional supporting evidence: We carefully checked our IP-MS data and found that the well-known TEAD binding proteins, including CTBP1/2 and GATA4, were pulled down, suggesting TEAD’s absence does not rule out an NSP13–TEAD association.

      (3b) We sincerely appreciate the reviewer’s insightful suggestion. While we agree that mapping NSP13 occupancy at individual TEAD-binding motifs is valuable, we respectfully consider this to be beyond the scope of the current study. Biochemical and structural work on coronavirus NSP13 shows that it recognizes nucleic‑acid substrates primarily through their 5′ single‑stranded overhang and duplex architecture, not through a defined base sequence(2, 3). Accordingly, our data (Fig. 3B and 3F) indicate that DNA binding ability, rather than recognition of a specific motif, enables NSP13 to perform its helicase activity in proximity to TEAD and recruit repressors. Moreover, the DNA‑binding mutant K345A/K347A and the ATPase‑dead mutant R567A both fail to suppress YAP/TEAD transcription despite retaining the ability to interact with TEAD (Fig. 3B). These loss‑of‑function phenotypes demonstrate that NSP13’s chromatin engagement and unwinding activity, rather than sequence‑restricted targeting, are essential for repression. For these reasons, motif‑specific binding assays were not pursued in this revision, but we clarified in the discussion that NSP13’s DNA engagement is likely structural or TEAD-dependent, rather than sequence‑directed. We also highlighted this as an important avenue for future investigation.

      (3c) To validate the NSP13 interacting proteins from our IP-MS data, we generated plasmids expressing several candidates (CCT3, SMARCD1, EIF4A1, LMNA, TTF2, and YY2) and performed co-IP assays. As predicted, we confirmed the robust interaction between NSP13 and TEAD (Supplemental Fig. 5E). However, these putative nuclear repressors exhibited weak binding to NSP13 compared with TEAD4, suggesting that NSP13 associates with them indirectly, possibly as part of a larger multiprotein complex or depending on the chromatin structure, rather than via direct protein–protein interaction (Fig. 4J).

      (3d) We appreciate the reviewer’s question. To investigate whether their association might be DNA‐dependent, we performed co‐IP experiments using nuclear lysates in the presence or absence of various nucleases: Universal Nuclease (which degrades all forms of DNA and RNA), DNase I (which cleaves both single‐ and double‐stranded DNA), and RNase H (which selectively cleaves the RNA strand in RNA/DNA hybrids). Our findings revealed that nucleic acid removal did not disrupt the NSP13/TEAD4 interaction (Supplemental Fig.4E), indicating that their binding is not solely mediated by DNA or RNA.

      Reviewer #2 (Public Review):

      Specific comments and suggestions for improvement of the manuscript:

      (1) NSP13 has been reported to block, in a helicase-dependent manner, episomal DNA transcription (PMID: 37347173), raising questions about the effects observed on the data shown from the HOP-Flash and 8xGTIIC assays. It would be valuable to demonstrate the specificity of the proposed effect of NSP13 on TEAD activation by YAP (versus broad effects on reporter assays) and also to show that NSP13 reduces the function of endogenous YAP-TEAD transcriptional activity (i.e., does ectopic NSP13 expression reduce the expression of YAP induced TEAD target genes in cells).

      We appreciate the reviewer’s comments and have carefully revisited the conclusions from the published paper(4) (PMID: 37347173), which reported that NSP13 suppresses episomal DNA transcription, as evidenced by reduced Renilla luciferase (driven by the herpes simplex virus thymidine kinase promoter) and GFP expression upon co‐expression with NSP13. For our experiments, we used a dual‐luciferase assay with Renilla luciferase (under the same promoter) as an internal control. After re-examining our raw Renilla luciferase data (now provided in the supplemental Excel file “Supporting data value”), we found that while 100 ng of NSP13 did not affect Renilla luciferase levels, 400 ng of NSP13 reduced them by approximately 50% relative to the YAP5SA‐only group (Supplemental Fig.2B, Fig.3C-D). We observed a similar reduction with NSP13 truncation mutants—an outcome not fully consistent with the published study (Supplemental Fig.3D, PMID: 37347173). However, unlike their finding of robust episomal DNA suppression, our data indicate that the K345A/K347A mutant of NSP13, which lacks DNA‐binding ability, completely lost its suppressive effect (Fig.3B).

      We performed additional Notch reporter assays to address the concern that NSP13 might nonspecifically inhibit episomal DNA transcription (including the HOP‑Flash and 8×GTIIC reporters). These experiments revealed that co‑expression of NSP13 with NICD (Notch intracellular domain) does not suppress Notch signaling (Supplemental Fig. 2C), indicating that NSP13 does not globally block all reporter systems. To evaluate whether NSP13 reduces endogenous YAP‑TEAD activity, we transiently overexpressed NSP13 WT and its R567A mutant in HeLa cells. However, bulk RNA‑seq and qPCR analyses did not reveal a clear decrease in YAP target genes, possibly due to the low transfection efficiency (< 50%, Supplemental Fig.4D). Interestingly, we observed that YAP5SA was predominantly retained in the nucleus upon NSP13 or R567A co‑expression, suggesting that NSP13 (or together with its interacting partners) restricts YAP5SA cytoplasmic shuttling. Future studies will involve stable cell lines expressing NSP13 WT or R567A to better characterize the mechanisms driving YAP5SA nuclear retention and clarify how NSP13 specifically suppresses YAP activity.

      (2) While the IP-MS experiment may have revealed new regulators of TEAD activity, the data presented are preliminary and inconclusive. No interactions are validated and beyond slight changes in TEAD reporter activity following knockdown, no direct links to YAP-TEAD are demonstrated, and no link to NSP13 was shown. Also, no details are provided about the methods used for the IP-MS experiment, raising some concerns about potential false positive associations within the data.

      We appreciate the reviewer’s feedback regarding our IP-MS findings and acknowledge that additional validation is required to establish definitive links between the identified putative regulators, YAP-TEAD, and NSP13. We have taken the following steps (and plan further experiments) to address these concerns:

      (2a) Co-IP validation: As described in our response to Reviewer #1 (3c), we generated plasmids expressing several top candidate interactors from the IP-MS data (CCT3, SMARCD1, EIF4A1, LMNA, TTF2, and YY2) and performed direct co-IP assays in a more controlled setting. The results indicated that these putative NSP13 interactors bound more weakly than TEAD4, implying that NSP13 may associate with them as part of a larger complex or in a chromatin-structure-dependent manner rather than through a direct protein–protein interaction (Fig. 4J).

      (2b) qPCR validation: Beyond reporter assays for evaluating YAP transactivation after knockdown of the candidate YAP suppressors (Fig. 4H and Supplemental Fig. 5C), we performed qPCR to measure expression of endogenous YAP-TEAD target genes (e.g., CTGF, CYR61, and AMOTL2) after CCT3 knockdown. Expression of CTGF and CYR61 was higher than in the control (Supplemental Fig. 5D), strengthening the case for an interaction relevant to YAP-TEAD signaling.

      (2c) To investigate how NSP13‐interacting proteins link to the YAP/TEAD complex, we examined the IP‑MS dataset and identified several well‐known YAP and TEAD binding partners, including CTBP1/2 (TEAD‐binding), GATA4 (TEAD‐binding), and multiple 14‐3‐3 isoforms (YWHAZ/YWHAB/YWHAH/YWHAQ, YAP binding). These findings suggest that NSP13 may form a larger nuclear complex with YAP/TEAD and associated cofactors. In the future, we will determine whether these putative TEAD regulators also interact with NSP13 under various conditions (e.g., in the presence or absence of DNA) and whether co‐expression of NSP13 influences their association with YAP or TEAD. This approach will clarify how NSP13 might leverage these factors to regulate YAP‐TEAD function.

      (2e) For the mass spectrometry experiments, HEK293T cells were transfected with Flag‐YAP1, HA‐NSP13, or Flag‐YAP1 + HA‐NSP13 according to the manufacturer’s standard protocols. After nuclear extraction and lysis, the supernatant was incubated with HA magnetic beads to immunoprecipitate (IP) NSP13. The IP samples were subsequently analyzed by mass spectrometry to identify NSP13‐associated proteins (Fig. 4F). Each experimental condition was performed in duplicate to ensure reproducibility. We included an appropriate negative control (Flag‐YAP1) and stringent data‐filtering criteria to minimize false positives. We apologize for not including these details in our original Methods section; in this revised manuscript, we have fully described the number of replicates, the controls used, and our data analysis pipelines.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors developed a sequence-based method to predict drug-interacting residues in IDPs. It builds on their recent work predicting the transverse relaxation rates (R2) of IDPs, which was trained on 45 IDP sequences and their corresponding R2 values. The main finding is that IDPs interact with drugs mostly through aromatic residues, which is easy to understand because most drugs contain aromatic rings. They validated the method using several case studies, and the predictions are in accordance with chemical shift perturbations and MD simulations. The location of the predicted residues serves as a starting point for ligand optimization.

      Strengths:

      This work provides the first sequence-based prediction method to identify potential drug-interacting residues in IDPs. The validity of the method is supported by case studies. It is easy to use, and no time-consuming MD simulations or NMR studies are needed.

      Weaknesses:

      The method does not depend on the information of binding compounds, which may give general features of IDP-drug binding. However, due to the size and chemical structures of the compounds (for example, how many aromatic rings), the number of interacting residues varies, which is not considered in this work. Lacking specific information may restrict its application in compound optimization, aiming to derive specific and potent binding compounds.

      We fully recognize that different compounds may have different interaction propensity profiles along the IDP sequence. In future studies, we will investigate compound-specific parameter values. The limiting factor is training data, but such data are beginning to become available.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors introduce DIRseq, a fast, sequence-based method that predicts drug-interacting residues (DIRs) in IDPs without requiring structural or drug information. DIRseq builds on the authors' prior work looking at NMR relaxation rates, and presumes that those residues that show enhanced R2 values are the residues that will interact with drugs, allowing these residues to be nominated from the sequence directly. By making small modifications to their prior tool, DIRseq enables the prediction of residues seen to interact with small molecules in vivo.

      Strengths:

      The preprint is well written and easy to follow

      Weaknesses:

      (1) The DIRseq method is based on SeqDYN, which itself is a simple (which I do not mean as a negative - simple is good!) statistical predictor for R2 relaxation rates. The challenge here is that R2 rates cover a range of timescales, so the physical intuition as to what exactly elevated R2 values mean is not necessarily consistent with "drug interacting". Presumably, the authors are not using the helix boost component of SeqDYN here (it would be good to explicitly state this). This is not necessarily a weakness, but I think it would behove the authors to compare a few alternative models before settling on the DIRseq method, given the somewhat ad hoc modifications to SeqDYN to get DIRseq.

      Actually, the factors that elevate R2 are well-established. These are local interactions and residual secondary structures (if any). The basic assumption of our method is that intra-IDP interactions that elevate R2 convert to IDP-drug interactions. This assumption was supported by our initial observation that the drug interaction propensity profiles predicted using the original SeqDYN parameters already showed good agreement with CSP profiles. We only made relatively small adjustments to the parameters to improve the agreement. Indeed, we did not apply the helix boost portion of SeqDYN to DIRseq, and we will state this explicitly. We will also compare DIRseq with several alternative models.

      Specifically, the authors previously showed good correlation between the stickiness parameter of Tesei et al and the inferred "q" parameter for SeqDYN; as such, I am left wondering if comparable accuracy would be obtained simply by taking the stickiness parameters directly and using these to predict "drug interacting residues", at which point I'd argue we're not really predicting "drug interacting residues" as much as we're predicting "sticky" residues, using the stickiness parameters. It would, I think, be worth the authors comparing the predictive power obtained from DIRseq with the predictive power obtained by using the lambda coefficients from Tesei et al in the model, local density of aromatic residues, local hydrophobicity (note that Tesei et al have tabulated a large set of hydrophobicity scores!) and the raw SeqDYN predictions. In the absence of lots of data to compare against, this is another way to convince readers that DIRseq offers reasonable predictive power.

      We will compare predictions of these various parameter sets, and summarize the results in a table.
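      As an illustration of one baseline the reviewer mentions (local density of aromatic residues), a minimal sketch might look like the following. It is not DIRseq or SeqDYN. The aromatic set (F, W, Y), the window size of 7, the example sequence, and the function names are all assumptions chosen for illustration only.

```python
# Hypothetical baseline: per-residue local aromatic density.
# Assumptions (not from the manuscript): aromatic set = F, W, Y; window = 7.
AROMATIC = set("FWY")


def local_aromatic_density(sequence, window=7):
    """Return, for each position, the fraction of aromatic residues
    within a centered window (truncated at the sequence termini)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    seq = sequence.upper()
    half = window // 2
    densities = []
    for i in range(len(seq)):
        start = max(0, i - half)
        end = min(len(seq), i + half + 1)
        segment = seq[start:end]
        n_aromatic = sum(1 for residue in segment if residue in AROMATIC)
        densities.append(n_aromatic / len(segment))
    return densities


def top_residues(sequence, window=7, n=5):
    """Return the n highest-scoring positions as (1-based index, residue, score)."""
    scores = local_aromatic_density(sequence, window)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i + 1, sequence[i], scores[i]) for i in ranked[:n]]


if __name__ == "__main__":
    # Made-up sequence, used only to show the call pattern.
    example = "MDGSAKEEFYWGSPAKTDE"
    for position, residue, score in top_residues(example, window=7, n=3):
        print(position, residue, round(score, 2))
```

      A profile of this kind could be scored against CSP data in the same way as the DIRseq predictions, which is what would make it usable as a point of comparison.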

      (2) Second, the DIRseq is essentially SeqDYN with some changes to it, but those changes appear somewhat ad hoc. I recognize that there is very limited data, but the tweaking of parameters based on physical intuition feels a bit stochastic in developing a method; presumably (while not explicitly spelt out) those tweaks were chosen to give better agreement with the very limited experimental data (otherwise why make the changes?), which does raise the question of if the DIRseq implementation of SeqDYN is rather over-parameterized to the (very limited) data available now? I want to be clear, the authors should not be critiqued for attempting to develop a model despite a paucity of data, and I'm not necessarily saying this is a problem, but I think it would be really important for the authors to acknowledge to the reader the fact that with such limited data it's possible the model is over-fit to specific sequences studied previously, and generalization will be seen as more data are collected.

      We have explained the rationale for the parameter tweaks, which were limited to q values for four amino-acid types, i.e., to deemphasize hydrophobic interactions and slightly enhance electrostatic interactions (p. 4-5). We will add that these tweaks were motivated by observations from MD simulations of drug interactions with a-syn (ref 20). As already noted in the response to the preceding comment, we will also present results for the original parameter values as well as for when the four q values are changed one at a time.

      (3) Third, perhaps my biggest concern here is that - implicit in the author's assumptions - is that all "drugs" interact with IDPs in the same way and all drugs are "small" (motivating the change in correlation length). Prescribing a specific lengthscale and chemistry to all drugs seems broadly inconsistent with a world in which we presume drugs offer some degree of specificity. While it is perhaps not unexpected that aromatic-rich small molecules tend to interact with aromatic residues, the logical conclusion from this work, if one assumes DIRseq has utility, is that all IDRs bind drugs with similar chemical biases. This, at the very least, deserves some discussion.

      The reviewer raises a very important point. In Discussion, we will add that it is important to further develop DIRseq to include drug-specific parameters when data for training become available.

      (4) Fourth, the authors make some general claims in the introduction regarding the state of the art, which appear to lack sufficient data to be made. I don't necessarily disagree with the author's points, but I'm not sure the claims (as stated) can be made absent strong data to support them. For example, the authors state: "Although an IDP can be locked into a specific conformation by a drug molecule in rare cases, the prevailing scenario is that the protein remains disordered upon drug binding." But is this true? The authors should provide evidence to support this assertion, both examples in which this happens, and evidence to support the idea that it's the "prevailing view" and specific examples where these types of interactions have been biophysically characterized.

      We will cite several studies showing that IDPs remain disordered upon drug binding.

      Similarly, they go on to say:

      "Consequently, the IDP-drug complex typically samples a vast conformational space, and the drug molecule only exhibits preferences, rather than exclusiveness, for interacting with subsets of residues." But again, where is the data to support this assertion? I don't necessarily disagree, but we need specific empirical studies to justify declarative claims like this; otherwise, we propagate lore into the scientific literature. The use of "typically" here is a strong claim, implying most IDP complexes behave in a certain way, yet how can the authors make such a claim? 

      Here again we will add citations to support the statement.

      Finally, they continue to claim:

      "Such drug interacting residues (DIRs), akin to binding pockets in structured proteins, are key to optimizing compounds and elucidating the mechanism of action." But again, is this a fact or a hypothesis? If the latter, it must be stated as such; if the former, we need data and evidence to support the claim. 

      We will add citations to both compound optimization and mechanism of action.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We have made the following small adjustments and resubmit the manuscript to be published as a Version of Record with eLife.

      Changes in main text of the manuscript:

      We have moved the “Proposed additional tests” subsection to the Discussion section as suggested by the referee. 

      We have added a link to a GitHub repository and a link to a Zenodo data repository at the beginning of the Materials and Methods section in the “Data and materials availability” subsection. The GitHub repository contains simulation code and data, and single-cell data analysis code. The Zenodo link contains our experimental data (we await your confirmation before we publish it officially on Zenodo).

      Changes in the supplemental information files

      We have fixed the typo on page 29 of the SI in which Eq. (8) was referred to in a derivation. It should be Eq. (5) instead. We thank the referee for catching this mistake which has now been corrected.

      We have fixed a typo on page 29 of SI, in which the word “evoke” is now “invoke”.  

      We have clarified the derivation on page 29 of the SI. The referee is correct that the limit condition was used to set the right-hand side of Eq. (5.11) to zero.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This study addresses a critical gap in veterinary diagnostics by developing a CRISPR-based diagnostic toolbox (SHERLOCK4AAT) for detecting animal African trypanosomosis. It describes the development and field deployment of SHERLOCK4AAT, a CRISPR-Cas13-based diagnostic toolbox for the eco-epidemiological surveillance of animal African trypanosomosis (AAT) in West Africa. The authors successfully created and validated species-specific assays for multiple trypanosomes, including T. congolense, T. vivax, T. theileri, T. simiae, and T. suis, alongside pan-trypanosomatid and pan-Trypanozoon assays. The field validation in pigs from Guinea and Côte d'Ivoire revealed high trypanosome prevalence (62.7%), frequent co-infections, and importantly identified T. b. gambiense in one animal at each site, suggesting pigs may serve as potential reservoirs for this human-infective parasite.

      A major strength of the study lies in its methodological innovation. By adapting SHERLOCK to target both conserved and species-discriminating sequences, the authors achieved high sensitivity and specificity in detecting Trypanosoma species. Their use of dried blood spots, validated thresholds through ROC analyses, and statistical robustness (e.g., Bayesian latent class modeling) provides a strong foundation for their conclusions.

      The results are significant: over 60% of pigs tested positive for at least one trypanosome species, with co-infections observed frequently and T. b. gambiense detected in pigs at both sites. These findings have direct implications for the role of animal reservoirs in human disease transmission and underscore the value of pigs as sentinel hosts in gHAT elimination efforts.

      The limitations are well acknowledged, particularly the suboptimal sensitivity of the T. vivax assay and the reliance on synthetic controls for T. suis and T. simiae. However, these limitations do not undermine the overall conclusions, and the paper provides a clear roadmap for further assay refinement and implementation.

      This study offers a timely, impactful, and well-substantiated contribution to the field. The SHERLOCK4AAT toolbox holds promise for improving AAT diagnostics in resource-limited settings and advancing One Health surveillance frameworks.

      Thank you

      Strengths: 

      (1) The adaptation of SHERLOCK technology for AAT represents a significant technical advancement, offering higher sensitivity than traditional parasitological methods and the ability to detect multiple species simultaneously.

      (2) Rigorously performed with validation using appropriate controls, ROC curve analyses, and Bayesian latent class modelling, establishing clear analytical sensitivity and specificity for most assays.

      (3) Testing 424 pig samples across two countries provides robust evidence of the tool's utility and reveals important epidemiological insights about trypanosome diversity and prevalence.

      (4) The identification of T. b. gambiense in pigs at both sites has significant implications for HAT elimination strategies and highlights the need for integrated One Health approaches.

      (5) The use of dried blood spots and RNA detection for active infections makes the approach practical for field surveillance in resource-limited settings.

      Thank you

      Weaknesses: 

      (1) The manuscript would benefit from more detailed discussion of practical considerations such as cost, equipment requirements, and training needs for implementing SHERLOCK in endemic areas and rural settings which would improve applicability.

      This is now addressed in the revised discussion (end of the first section).

      (2) Limited discussion of pig selection criteria: More justification for choosing pigs as sentinel animals and discussion of potential limitations of this approach would strengthen the manuscript.

      Yes, this is now more clearly explained in the revised discussion (beginning of the first section).

      (3) More details on why certain genes were targeted would strengthen the methods.

      The first result section ‘Selection of targets for broad and species-specific SHERLOCK assays targeting AAT species (SHERLOCK4AAT)’ is already dedicated to extensively explaining target selection, hence we’re afraid we don’t know what could be added.  

      (4) Table formatting could be improved for readability. 

      (5) Some figures are complex and would benefit from additional explanations in the legends.

      We have tried to improve these two aspects as much as possible in the revised manuscript.

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript is important due to the significance of the findings. The strength of evidence is convincing.

      Thank you

      Strengths: 

      (1) Using a Novel SHERLOCK4AAT toolkit for diagnosis. 

      (2) Identification of various sub-species of Trypanosomes. 

      (3) Differentiating the animal subspecies from the human one. 

      Thank you

      Weaknesses: 

      (1) The title is too long, and the use of definite articles should be reduced in the title.

      The title has been improved in the revised version.

      (2) The route of blood sample collection in the animals should be well defined and explained.

      This has been more clearly explained in the revised method section.

      Reviewer #3 (Public review):

      Summary: 

      The study adapts CRISPR-based detection toolkit (SHERLOCK assay) using conserved and species-specific targets for the detection of some members of the Trypanosomatidae family of veterinary importance and species-specific assays to differentiate between the six most common animal trypanosome species responsible for AAT (SHERLOCK4AAT). The assays were able to discriminate between Trypanozoon (T. b. brucei, T. evansi, and T. equiperdum), T. congolense (Savanah, Forest Kilifi, and Dzanga sangha), T. vivax, T. theileri, T. simiae, and T. suis. The design of both broad and species-specific assays was based primarily on sequences of the 18S rRNA, GAPDH (Glyceraldehyde-3-phosphate dehydrogenase), and invariant flagellum antigen (IFX) genes for species identification. Most importantly, the authors showed varying limits of detection for the different SHERLOCK assays, which is somewhat comparable to PCR-derived molecular techniques currently used for detecting animal trypanosomes, even though some of these methodologies have used other primers that target genes such as ITS1 and 7SL sRNA. The data presented in the study are particularly useful and of significant interest for the diagnosis of AAT in affected areas.

      Thank you

      Strengths: 

      The assays convincingly allow for the analysis and detection of most trypanosomes in AAT.

      Thank you

      Weaknesses: 

      Inability for the assay to distinguish T. b. brucei, T. evansi, and T. equiperdum using the 18S rRNA gene, as well as the IFX gene, not achieving the sensitivity requirements for detection of T. vivax.  Both T. brucei brucei and T. vivax are the most predominant infective species in animals (in addition to T. congolense), therefore, a reliable assay should be able to convincingly detect these to allow for proper use of the diagnostic assay.

      We agree with this point and aim to improve the toolbox for future studies.

      Reviewer #1 (Recommendations for the authors):

      (1) Provide additional details on the practicality of SHERLOCK deployment in the field, including training, costs, and infrastructure (potential challenges for field deployment, including suggestions for how to overcome these barriers).

      This is now addressed in the revised discussion (end of the first section).

      (2) Provide more detailed justification for choosing pigs as the main study species and discuss potential benefits and limitations of extending the approach to other livestock species.

      Yes, this is now more clearly explained in the revised discussion (beginning of the first section).

      (3) Add a comparison table comparing SHERLOCK4AAT performance metrics (sensitivity, specificity, LoD) with existing molecular diagnostic methods for AAT for ease of reference.

      There are dozens of different serological, immunological and molecular approaches with highly variable levels of sensitivity and specificity, already reviewed and compared in detail in two references from 2022 (Desquesnes et al. a and b), which we have cited, as well as in a newly added reference (Ebhodaghe F, Acta Trop 2018). Hence, we decided to refer only to the most comparable studies in the present article.

      (4) Review complex figures and improve legends for better readability and interpretation.

      We have tried to improve this as much as possible in the revised manuscript.

      Reviewer #2 (Recommendations for the authors): 

      (1) Reduce the number of words in the title from 28 to not more than 20.

      The title has been improved in the revised version.

      (2) Specify the particular route of collection of blood samples in the various animals.

      Yes, this is now more clearly explained in the revised method section.

      (3) Correct all typographical errors. 

      We have tried to improve this as much as possible in the revised manuscript.

      Thanks. I wish you the best in your publication process. 

      Thank you

      Reviewer #3 (Recommendations for the authors): 

      Minor comments 

      (1) The authors can expand the discussion to include other recent diagnostic assays for Animal trypanosomiasis, such as those that target other genes like tubulin.

      Please see our response to Reviewer #1, point (3) above.

      (2) The cost-effectiveness of the use of the assay can be discussed since the assay is expected to be used for work in some resource-deprived areas. For example, will it cost a researcher less to do a diagnosis with this assay relative to what is already available?

      This is now addressed in the revised discussion (end of the first section).

      (3) Is Cote d'Ivoire more endemic for AAT than Guinea? Will this account for the apparently consistent differences in the percentage of positive samples, or just because of the type of samples used from the two locations?

      As the sampling method, sample preservation and sample analysis were the same for both groups, yes: it appears that pigs, at least domesticated ones, in the study region of Cote d'Ivoire were more frequently infected than those in the study region of Guinea. It is, however, risky to extrapolate these observations to AAT prevalence in the entire countries and/or to other mammals.

      (4) Can the authors comment on how long one can store the samples for an effective and reliable assay?

      The samples can be stored for several months at ambient temperature in a sealed bag with silica gel packages to reduce humidity. We have added this detail in the revised methods section.

      (5) It is not clear whether the authors used conventional molecular diagnostics to compare the data obtained from this particular cohort of animals as reference is made to published data. It is not surprising that the SHERLOCK performed better than using parasitology-based methodology.

      This is now addressed in the revised discussion.

      (6) (Figure 4D-5D) should be 4D and 5D.

      Thank you, this has been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors used a coarse-grained DNA model (cgNA+) to explore how DNA sequences and CpG methylation/hydroxymethylation influence nucleosome wrapping energy and the probability density of optimal nucleosomal configuration. Their findings indicate that both methylated and hydroxymethylated cytosines lead to increased nucleosome wrapping energy. Additionally, the study demonstrates that methylation of CpG islands increases the probability of nucleosome formation.

      Strengths:

      The major strength of this method is that the model explicitly includes elastic constraints on the positions of phosphate groups facing a histone octamer, as DNA-histone binding site constraints. The authors claim that their model enhances the accuracy and computational efficiency and allows comprehensive calculations of DNA mechanical properties and deformation energies.

      Weaknesses:

      A significant limitation of this study is that the parameter sets for the methylated and hydroxymethylated CpG steps in the cgNA+ model are derived from all-atom molecular dynamics (MD) simulations that suggest that both methylated and hydroxymethylated cytosines increase DNA stiffness and nucleosome wrapping energy (Pérez A, et al. Biophys J. 2012; Battistini, et al. PLOS Comput Biol. 2021). This could predispose the coarse-grained model to replicate these findings. Notably, conflicting results from other all-atom MD simulations, such as those by Ngo T, et al. in Nat. Commun. 2016, show that hydroxymethylated cytosines increase DNA flexibility, contrary to methylated cytosines. If the cgNA+ model was trained on these latter parameters or other all-atom force fields, different conclusions might be obtained regarding the effects of methylation and hydroxymethylation on nucleosome formation.

      Despite the training parameters of the cgNA+ model, the results presented in the manuscript indicate that methylated cytosines increase both DNA stiffness and nucleosome wrapping energy. However, when comparing nucleosome occupancy scores with predicted nucleosome wrapping energies and optimal configurations, the authors find that methylated CGIs exhibit higher nucleosome occupancies than unmethylated ones, which seems to contradict their findings from the same paper which showed that increased stiffness should reduce nucleosome formation affinity. In the manuscript, the authors also admit that these conclusions “apparently runs counter to the (perhaps naive) intuition that high nucleosome forming affinity should arise for fragments with low wrapping energy”. Previous all-atom MD simulations (Pérez A, et al. Biophys J. 2012; Battistini, et al. PLOS Comput Biol. 2021; Ngo T, et al. Nat. Commun. 2016) show that the stiffer DNA upon CpG methylation reduces the affinity of DNA to assemble into nucleosomes or destabilizes nucleosomes. Given these findings, the authors need to address and reconcile these seemingly contradictory results, as the influence of epigenetic modifications on DNA mechanical properties and nucleosome formation are critical aspects of their study. Understanding the influence of sequence-dependent and epigenetic modifications of DNA on mechanical properties and nucleosome formation is crucial for comprehending various cellular processes. The authors’ study, focusing on these aspects, will definitely garner interest from the DNA methylation research community.

      Training the cgNA+ model on alternative MD simulation datasets is certainly of interest to us. However, due to the significant computational cost, this remains a goal for future work. The relationship between nucleosome occupancy scores and nucleosome wrapping energy is still debated, with conflicting findings reported in the literature, as noted in our Discussion section. Interestingly, we find that our predicted log probability density of DNA spontaneously acquiring a nucleosomal configuration is a better indicator of nucleosome occupancy than our predicted DNA nucleosome wrapping energy.

      Reviewer #2 (Public Review):

      Summary:

      This study uses a coarse-grained model for double-stranded DNA, cgNA+, to assess nucleosome sequence affinity. cgNA+ coarse-grains DNA on the level of bases and accounts also explicitly for the positions of the backbone phosphates. It has been proven to reproduce all-atom MD data very accurately. It is also ideally suited to be incorporated into a nucleosome model because it is known that DNA is bound to the protein core of the nucleosome via the phosphates.

      It is still unclear whether this harmonic model parametrized for unbound DNA is accurate in describing DNA inside the nucleosome. Previous models by other authors, using more coarse-grained models of DNA, have been rather successful in predicting base pair sequence-dependent nucleosome behavior. This is at least the case as far as DNA shape is concerned whereas assessing the role of DNA bendability (something this paper focuses on) has been consistently challenging in all nucleosome models, to my knowledge.

      It is thus of major interest whether this more sophisticated model is also more successful in handling this issue. As far as I can tell the work is technically sound and properly accounts for not only the energy required in wrapping DNA but also entropic effects, namely the change in entropy that DNA experiences when going from the free state to the bound state. The authors make an approximation here which seems to me to be a reasonable first step.

      Of interest is also that the authors have the parameters at hand to study the effect of methylation of CpG-steps. This is especially interesting as it allows us to study a scenario where changes in the physical properties of base pair steps via methylation might influence nucleosome positioning and stability in a cell-type-specific way.

      Overall, this is an important contribution to the question of how the sequence affects nucleosome positioning and affinity. The findings suggest that cgNA+ has something new to offer. But the problem is complex, also on the experimental side, so many questions remain open.

      Strengths:

      The authors use their state-of-the-art coarse-grained DNA model which seems ideally suited to be applied to nucleosomes as it accounts explicitly for the backbone phosphates.

      Weaknesses:

      (1) According to the abstract the authors consider two “scalar measures of the sequence-dependent propensity of DNA to wrap into nucleosomes”. One is the bending energy and the other, is the free energy. Specifically in the latter, the authors take the difference between the free energies of the wrapped and the free DNA. Whereas the entropy of the latter can be calculated exactly, they assume that the bound DNA always has the same entropy (independent of sequence) in its more confined state. The problem is the way in which this is written (e.g. below Eq. 6) which is hard to understand. The authors should mention that the negative of Eq. 6 is what physicists call free energy, namely especially the free energy difference between bound and free DNA.

      We have included the necessary clarifications in the revised manuscript, below Eq. 6.

      (2) In Eq. 5 the authors introduce penalty coefficients c<sub>i</sub>. They write that values are “set by numerical experiment to keep distances ... within the ranges observed in the PDB structure, while avoiding sterical clashes in DNA.” This is rather vague, especially since it is unclear to me what type of sterical clashes might occur. Figure 1 shows then a comparison between crystal structures and simulated structures. They are reasonably similar but standard deviations in the fluctuations of the simulation are smaller than in the experiments. Why did the authors not choose smaller c<sub>i</sub>-values to have a better fit? Do smaller values lead to unwanted large fluctuations that would lead to steric clashes between the two DNA turns? I also wonder what side views of the nucleosomes look like (experiments and simulations) and whether in this side view larger fluctuations of the phosphates can be observed in the simulation that would eventually lead to turn-turn clashes for smaller c<sub>i</sub>-values.

      The side view plots of the experimental and predicted nucleosome structures are now added to Supplementary material (Figure S8). Indeed, smaller c<sub>i</sub> values lead to steric clashes between the two turns of DNA – this is now specified in the Methods section. A possible improvement of our optimisation method and a direction of future work would be adding a penalty which prevents steric clashes to the objective function. Then the c<sub>i</sub> values could be reduced to have bigger fluctuations that are even closer to the experimental structures. We added this explanation to the Results section.
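
The trade-off described in this response can be illustrated with a one-dimensional toy. This is only a sketch under stated assumptions, not the authors' Eq. (5): a single coordinate `x` with free ground state `mu` and stiffness `k` is pulled toward a hypothetical target position `target` by a quadratic penalty with coefficient `c`. All names are introduced here for illustration.

```python
def penalised_minimum(mu, k, target, c):
    """Minimiser of 0.5*k*(x - mu)**2 + c*(x - target)**2 over x.

    Setting the derivative k*(x - mu) + 2*c*(x - target) to zero gives
    x = (k*mu + 2*c*target) / (k + 2*c).
    """
    return (k * mu + 2.0 * c * target) / (k + 2.0 * c)


def deviation_from_target(mu, k, target, c):
    """Distance between the penalised minimiser and the target position."""
    return abs(penalised_minimum(mu, k, target, c) - target)


if __name__ == "__main__":
    # Hypothetical numbers: ground state at 0, target at 1, unit stiffness.
    for c in (0.0, 0.5, 5.0, 50.0):
        print(c, deviation_from_target(mu=0.0, k=1.0, target=1.0, c=c))
```

In this toy, a smaller `c` leaves the coordinate closer to its free ground state and further from the target, which mirrors the statement that reducing the penalty coefficients allows larger deviations, at the risk of configurations that a separate clash penalty would have to exclude.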

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors utilize biophysical modeling to investigate differences in free energies and nucleosomal configuration probability density of CpG islands and nonmethylated regions in the genome. Toward this goal, they develop and apply the cgNA+ coarse-grained model, an extension of their prior molecular modeling framework.

      Strengths:

      The study utilizes biophysical modeling to gain mechanistic insight into nucleosomal occupancy differences in CpG and nonmethylated regions in the genome.

      Weaknesses:

      Although the overall study is interesting, the manuscripts need more clarity in places. Moreover, the rationale and conclusion for some of the analyses are not well described.

      We edited the manuscript according to the reviewer’s suggestions and hopefully improved its readability.

      Reviewer #1 (Recommendations For The Authors):

      (1) The cgNA+ model parameters are derived from all-atom molecular dynamics (MD) simulations, yet there is no consensus within all-atom MD simulations regarding the impact of CpG methylation on DNA mechanical properties. The authors could consider fitting the coarse-grained model with a different all-atom force field to verify whether the conclusions regarding the effects of methylation and hydroxymethylation on DNA nucleosome wrapping energies still hold. For further details on MD simulations related to CpG methylation effects, the authors are advised to consult the review paper by Li et al. (2022) titled “DNA methylation: Precise modulation of chromatin structure and dynamics” published in Current Opinion in Structural Biology.

      Parametrizing the cgNA+ model using MD simulations with various force fields is certainly of interest to us. However, due to the computational cost involved, it remains a goal for future work.

      (2) Beyond DNA mechanical properties, which are directly linked to nucleosome wrapping energies in this study, the authors might also consider other factors such as geometric properties that could influence nucleosome formation. This approach might help the authors to reconcile the observed higher nucleosome occupancy scores for methylated CpGs. The authors are encouraged to review the aforementioned paper for additional experimental and MD simulation studies that could support this perspective.

      Geometric properties of DNA are directly incorporated into our method through the cgNA+ model equilibrium shape prediction µ. We compute the mechanical energy needed to deform µ to a nucleosomal configuration. Notably, the equilibrium shape µ is sensitive to methylation, as demonstrated in Figure 3.

      (3) There are some issues with citation accuracy in the manuscript. For instance, in the Discussion section, the authors attribute a statement to Collings et al. and Anderson (2017), claiming that “methylated regions, known to have high wrapping energy, are among the highest nucleosome occupied elements in the genome.” However, upon reviewing this paper, it appears that it does not make any claims about the high wrapping energy of methylated regions.

      The paragraph is now edited and a separate citation, Pérez et al. (2012), is given for the statement that methylated regions have high wrapping energy.

      Reviewer #2 (Recommendations For The Authors):

      Please improve the readability by:

      (1) making clear that -ln ρ in Eq. 6 on page 4 is actually the free energy. Also, the word entropy comes too late (on page 7) where the best explanation of Eq. 6 is presented.

      We added a comment about -ln ρ being the free energy after Eq. 6 and also included an equation, relating ln ρ and entropy.
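
For readers who want the relation spelled out, the following is a sketch under the assumption that the free-DNA model is a Gaussian density with ground state µ and stiffness matrix K, with energies in units of k_B T. The symbols U, Z, F_free, S_free and n (the number of coordinates) are introduced here for illustration and need not match the notation or the equation added to the manuscript.

```latex
\begin{align}
\rho(w) &= \frac{e^{-U(w)}}{Z}, \qquad
U(w) = \tfrac12 (w-\mu)^{T} K (w-\mu), \qquad
Z = \sqrt{\det\!\left(2\pi K^{-1}\right)}, \\
-\ln\rho(w) &= U(w) + \ln Z = U(w) - F_{\mathrm{free}}, \qquad
F_{\mathrm{free}} = -\ln Z, \\
S_{\mathrm{free}} &= \tfrac12 \ln\det\!\left(2\pi e K^{-1}\right) = \ln Z + \tfrac{n}{2}
\;\Longrightarrow\;
-\ln\rho(w) = U(w) + S_{\mathrm{free}} - \tfrac{n}{2}.
\end{align}
```

If one further assumes, as the reviewer describes, that the bound DNA has a sequence-independent entropy, then -ln ρ evaluated at the nucleosomal configuration equals the free energy difference between bound and free DNA up to a sequence-independent constant.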

      (2) page 12 and 13 show two sets of experimental data. They are quite different from each other. When reading this, I wondered why there is this difference. But only on page 16, you explain that these are different cell types. The difference should be explained already when the papers are introduced on page 12.

      A corresponding sentence already appeared on page 12: “The observations about nucleosome occupancy should be regarded as preliminary, and be treated with caution, as they are based on experimental data obtained for the cancerous HeLa cells Schwartz et al. (2019) and human genome embryonic stem cells Yazdi et al. (2015)”. We have now also added this information to the first paragraph of the subsection for clarity.

      Finally, I add here some general thoughts that came up when reading the paper, comparing your findings with earlier findings in the field. This is not a strict one-to-one comparison and thus does not have to find its way into this manuscript but might give ideas for future studies. Experiments suggest that nucleosomes prefer DNA with a high content of C’s and G’s. Figure 2 does not look at the GC content but at the number of CpG’s. But in any case, let’s use this as a proxy for GC content. Figure 2a suggests that there is not a strong dependence of the bending energy on the number CpG steps. This is consistent with earlier work with the rigid basepair model which shows the same behavior for GC content (for both MD and crystal parametrizations). Figure 2c (related to the negative free energy) shows that with an increasing number of CpG steps the propensity to bind goes down. This suggests that the entropic cost to confine CpG-rich DNA increases, which in turn reflects that these DNA stretches are softer. This is rather interesting since in the case of the rigid basepair model this effect is observed only when stiffnesses are extracted from crystal data not MD data (however, this refers again to CG content). This might indicate a difference between the rigid bp model and cgNA+ which will be interesting to study in the future. Interesting is also the effect of CpG methylation. The stiffer methylated steps lead to an increase in the energy with the number of such steps (Figure 2a). The entropic cost for binding is thus expected to be smaller and this is indeed observed in Figure 2c when compared to the non-methylated steps.

      We thank the reviewer for this comment. As for the GC content, the energy and ln ρ plots are indeed very similar to those in Figure 2.

      Reviewer #3 (Recommendations For The Authors):

      (1) The formulation of the cgNA+ model in the method section was not easy to follow and can be described better to improve clarity.

      We have revised the model description and hope that its clarity has been improved.

      (2) The authors mention utilizing 100 human genome sequences with 100 configurations from DB. It would be helpful to clarify the source of these 100 human genome sequences. Are these 100 distinct regions on the human reference genome, or are they from a specific dataset or database?

      We now include an explanation about the origin of sequences: “The human genome sequences are a random subset of our sequence sample for the CGI and NMI intersection in the Chromosome 1, but the following observations remain unchanged for sequence samples from different genomic regions.”

      (3) The authors mention the lack of tail unwrapping in their model. It would be beneficial to understand the magnitude of this issue and its potential impact on the overall results. How significant is the lack of unwrapping events in their current model?

      We observed the unwrapping of approximately five base-pairs at each end of our predicted nucleosome configurations, in comparison to the experimental configurations (Figure 1). This issue could be solved by adding additional constraints at the ends of the 147 bp sequence. The wrapping energy would increase marginally, as only about 10 of 147 bp would be affected. We added this remark to the main text.

      (4) Observations from Figure 3 are not described properly. Are these differences statistically significant? Why is twist higher for CpG sites but lower for a roll?

      We added an explanation of how the statistics were computed to the caption of Figure 3. In fact, we didn’t use statistical estimates here, but generated all the possible cases and computed the exact statistics (for the given set of our model parameters). Regarding the changes in twist and roll, we have added the following comment on page 7: “The ground state changes resulting from cytosine modifications – primarily characterized by an average increase in roll and a decrease in twist – may be linked to steric hindrance caused by the cytosine 5-substituent (Battistini et al. (2021)). Notably, the negative coupling between twist and roll has already been observed in X-ray crystallography data (Olson et al. (1998)).”

      (5) Figure 4 does not clarify the authors’ conclusion of higher stiffness for ApT and TpA dinucleotides. The authors should provide further explanation for this observation.

      We revised the text to clarify that the statement regarding ApT and TpA being the most stiff and the most flexible dinucleotides is not a conclusion derived from Figure 4, but rather from earlier work that we cite.

      (6) In Figure 7, the authors note that methylated CGIs have higher nucleosome occupancy on average than unmethylated sequences. Is this observation statistically significant?

      We observe that methylated sequences have a higher average occupancy than unmethylated sequences in the Yazdi et al. data when the CpG count falls into the intervals from 5 to 14 and from 15 to 24. For each of the two intervals this difference is statistically significant: the permutation test, used due to the lack of normality, yields a p-value of 0.0001 in both cases. The differences in mean scores shown in Figure 8 are also statistically significant. Such test results are expected, given the large sample sizes and the observed differences in means; we therefore prefer not to include this discussion in the main text.
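
A permutation test of the kind mentioned in this response can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name, the two-sided mean-difference statistic, the number of permutations and the add-one p-value estimator are all assumptions made here.

```python
import random


def permutation_test_mean_diff(group_a, group_b, n_perm=9999, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns an estimated p-value using the add-one correction
    (hits + 1) / (n_perm + 1), so the result is never exactly zero.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    pooled = list(group_a) + list(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    hits = 0
    for _ in range(n_perm):
        # Randomly reassign the group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical occupancy scores, for illustration only.
    methylated = [0.8 + 0.01 * i for i in range(30)]
    unmethylated = [0.2 + 0.01 * i for i in range(30)]
    print(permutation_test_mean_diff(methylated, unmethylated))
```

The test makes no normality assumption, which is the stated reason for using it. With this estimator the smallest attainable p-value is 1/(n_perm + 1), so the reported resolution depends on how many permutations are drawn.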

      (7) The authors note that their analyses to correlate nucleosome occupancy profile with the methylation state of underlying sequences are preliminary, as different cell lines were used to perform these analyses. Given this inconsistency, it needs to be clarified why this analysis was performed and what the takeaway is.

      We added the following comment at the end of the Results section: “Although comparing data from different cell lines is not optimal, to the best of our knowledge, no publicly available methylation and nucleosome occupancy data exist for the entire human genome within the same cell type. Nevertheless, since the lowest log probability densities in the human genome are predicted for CpG-rich sequences regardless of their methylation state (Figure 2d), and the same holds for both sets of the nucleosome occupancy scores (Figure 7), we conclude that the lowest occupancies occur for sequences with the lowest log probability densities.”

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study revisits the effects of substitution model selection on phylogenetics by comparing reversible and non-reversible DNA substitution models. The authors provide evidence that 1) non time-reversible models sometimes perform better than general time-reversible models when inferring phylogenetic trees out of simulated viral genome sequence data sets, and that 2) non time-reversible models can fit the real data better than the reversible substitution models commonly used in phylogenetics, a finding consistent with previous work. However, the methods are incomplete in supporting the main conclusion of the manuscript, that is that non time-reversible models should be incorporated in the model selection process for these data sets.

      The non-reversible models should be incorporated in the model selection process not because they perform significantly better, but because they do not perform worse than the reversible models and because the true biochemical processes of nucleotide substitution do support non-reversibility.

      Reviewer #1 (Public Review):

      The study by Sianga-Mete et al revisits the effects of substitution model selection on phylogenetics by comparing reversible and non-reversible DNA substitution models. This topic is not new, previous works already showed that non-reversible, and also covarion, substitution models can fit the real data better than the reversible substitution models commonly used in phylogenetics. In this regard, the results of the present study are not surprising. Specific comments are shown below.

      True

      It is well known that non-reversible models can fit the real data better than the commonly used reversible substitution models, see for example,

      https://academic.oup.com/sysbio/article/71/5/1110/6525257

      https://onlinelibrary.wiley.com/doi/10.1111/jeb.14147?af=R

      The manuscript indicates that the results (better fitting of non-reversible models compared to reversible models) are surprising but I do not think so, I think the results would be surprising if the reversible models provide a better fitting.

      I think the introduction of the manuscript should be increased with more information about non-reversible models and the diverse previous studies that already evaluated them. Also I think the manuscript should indicate that the results are not surprising, or more clearly justify why they are surprising.

      The surprise in the findings is in NREV12 performing better than NREV6 for double stranded DNA viruses as it was expected that NREV6 would perform better given the biochemical processes discussed in the introduction.

      In the introduction and/or discussion I missed a discussion about the recent works on the influence of substitution model selection on phylogenetic tree reconstruction. Some works indicated that substitution model selection is not necessary for phylogenetic tree reconstruction,

      https://academic.oup.com/mbe/article/37/7/2110/5810088

      https://www.nature.com/articles/s41467-019-08822-w

      https://academic.oup.com/mbe/article/35/9/2307/5040133

      While others indicated that substitution model selection is recommended for phylogenetic tree reconstruction,

      https://www.sciencedirect.com/science/article/pii/S0378111923001774

      https://academic.oup.com/sysbio/article/53/2/278/1690801

      https://academic.oup.com/mbe/article/33/1/255/2579471

      The results of the present study seem to support this second view. I think this study could be improved by providing a discussion about this aspect, including the specific contribution of this study to that.

      In our conclusion we have stated that:

      The lack of available data regarding the proportions of viral life cycles during which genomes exist in single and double stranded states makes it difficult to rationally predict the situations where the use of models such as GTR, NREV6 and NREV12 might be most justified: particularly in light of the poor overall performance of NREV6 and GTR relative to NREV12 with respect to describing mutational processes in viral genome sequence datasets. We therefore recommend case-by-case assessments of NREV12 vs NREV6 vs GTR model fit when deciding whether it is appropriate to consider the application of non-reversible models for phylogenetic inference and/or phylogenetic model-based analyses such as those intended to test for evidence of natural selection or the existence of molecular clocks.

      The real data was downloaded from the Los Alamos HIV database. I am wondering if there was any criterion for selecting the sequences or if all the sequences of the database for every studied virus category were analysed. Also, was any quality filter applied? How were gaps and ambiguous nucleotides handled? Notice that these aspects could affect the fit of the models to the data.

      We selected varying numbers of sequences from the database for each virus type studied. Using the software AliView, we performed quality filtering by re-aligning the sequences for each virus type.

      How the non-reversible model and the data are compared considering the non-reversible substitution process? In particular, given an input MSA, how to know if the nucleotide substitution goes from state x to state y or from state y to state x in the real data if there is not a reference (i.e., wild type) sequence? All the sequences are mutants and one may not have a reference to identify the direction of the mutation, which is required for the non-reversible model. Maybe one could consider that the most abundant state is the wild type state but that may not be the case in reality. I think this is a main problem for the practical application of non-reversible substitution models in phylogenetics.

      True

      Reviewer #1 (Recommendations for the authors):

      The reversible and non-reversible models used in this study assume that all the sites evolve under the same substitution matrix, which can be unrealistic. This aspect could be mentioned.

      Done

      The manuscript indicates that "a phylogenetic tree was inferred from an alignment of real sequences (Avian Leukosis virus) with an average sequence identity (API) of ~90%.". I was wondering under which substitution model that phylogenetic tree reconstruction was performed? could the use of that model bias posterior results in terms of favoring results based on such a model?

      We have stated that the GTR+G model was used to reconstruct the tree. The use of the GTR+G model could indeed bias the posterior results, as we have also stated in the paper.

      I was wondering which specific R function was used to calculate the weighted Robinson-Foulds metric. I think this should be included in the manuscript.

      We stated that we used the weighted Robinson-Foulds metric (wRF; implemented in the R phangorn package (Schliep, 2011)).
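
      For readers unfamiliar with the metric, the following is a minimal sketch of the textbook definition of the weighted Robinson-Foulds distance: the sum, over all bipartitions present in either tree, of the absolute difference in branch length, with a missing bipartition counted as length zero. The tree representation (a dictionary from one side of each bipartition to its branch length) and the function names are assumptions made for illustration; this is not the phangorn implementation.

```python
def canonical_split(side, taxa, reference):
    """Return the side of a bipartition that excludes the reference taxon."""
    side = frozenset(side)
    return side if reference not in side else frozenset(taxa) - side


def weighted_rf(splits_a, splits_b):
    """Weighted Robinson-Foulds distance between two trees.

    Each tree is a dict mapping a canonical bipartition (frozenset of
    taxa) to the length of the branch inducing it. A bipartition absent
    from one tree contributes its full branch length from the other.
    """
    all_splits = set(splits_a) | set(splits_b)
    return sum(
        abs(splits_a.get(s, 0.0) - splits_b.get(s, 0.0)) for s in all_splits
    )


taxa = {"A", "B", "C", "D"}

# Terminal branches shared by both toy trees (assumed lengths).
terminal = {
    canonical_split({t}, taxa, "D"): length
    for t, length in [("A", 0.1), ("B", 0.1), ("C", 0.2), ("D", 0.2)]
}

# Tree 1: ((A,B),(C,D)) with internal branch 0.3.
tree1 = dict(terminal)
tree1[canonical_split({"A", "B"}, taxa, "D")] = 0.3

# Tree 2: ((A,C),(B,D)) with internal branch 0.4.
tree2 = dict(terminal)
tree2[canonical_split({"A", "C"}, taxa, "D")] = 0.4

print(weighted_rf(tree1, tree2))
```

      In this toy case the terminal branches are identical, so only the two conflicting internal branches contribute to the distance.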

      Although they are a minority, several datasets fitted better with a reversible model than with a non-reversible model. I think that should be clearly indicated. In addition, in my opinion the AIC does not penalize the number of model parameters enough and favors the non-reversible models over the reversible models, but this is only my opinion based on the definition of AIC and it is not supported. Thus, I think the comparison between phylogenetic trees reconstructed under different substitution models was a good idea (but see also my second major comment).

      Noted

      When comparing phylogenetic trees I was wondering if one should consider the effect of the estimation method and quality of the studied data? For example, should bootstrap values be estimated for all the ancestral nodes and only ancestral nodes with high support be evaluated in the comparison among trees?

      Yes, the estimation method and the quality of the studied data should be considered. This does not matter when using RF, but it does for weighted RF. When building the trees using RAxML, only high-support nodes are added to the tree.

      In Figure 3, I do not see (by eye) significant differences among the models. I see in the legend that the statistical evaluation was based on a t test but I am not much convinced. Maybe it is only my view. Exactly, which pairs of datasets are evaluated with the t test? Next, I would expect that the influence of the substitution model on the phylogenetic tree reconstruction is higher at large levels of nucleotide diversity because with more substitution events there is more information to see the effects of the model. However, the t test seems to show that differences are only at low levels of nucleotide diversity (and large DNR), what could be the cause of this?

      The paired t-tests compare the wRF distances between the true tree and the trees obtained using the GTR model versus the wRF distances between the true tree and the trees obtained using the NREV12 model.

      The reason why the influence of the NREV12 model on the tree reconstructed is not significantly higher at large levels of nucleotide diversity could be because at a certain level the DNR are simply unrealistic.

      Can the user perform substitution model selection (i.e., AIC) among reversible and non-reversible substitution models with IQTREE? If yes, then doing that should be the recommendation from this study, correct?

      But, can DNR be estimated from a real dataset? DNR seems to be the key factor (Figure 3) for the phylogenetic analysis under a proper model.

      Substitution model selection can be performed among reversible and non-reversible models using both HyPhy and IQTREE, and we have recommended that model tests be done as a first step before tree building. Estimating DNR from real datasets requires the substitution rate matrix of a non-reversible model.

      The manuscript has many text errors (including typos and incorrect citations). For example, many citations in page 20 show "Error! Reference source not found.". I think authors should double check the manuscript before submitting. Also, some text is not formally written. For example, "G represents gamma-distributed rates", rates of what? The text should be clear for readers that are not familiar with the topic (i.e., G represents gamma-distributed substitution rates among sites). In general, I recommend a detailed revision of the whole text of the manuscript.

      Done

      Reviewer #2 (Public Review):

      The authors evaluate whether non time reversible models fit better data presenting strand-specific substitution biases than time reversible models. Specifically, the authors consider what they call NREV6 and NREV12 as candidate non time-reversible models. On the one hand, they show that AIC tends to select NREV12 more often than GTR on real virus data sets. On the other hand, they show using simulated data that NREV12 leads to inferred trees that are closer to the true generating tree when the data incorporates a certain degree of non time-reversibility.

      Based on these two experimental results, the authors conclude that "We show that non-reversible models such as NREV12 should be evaluated during the model selection phase of phylogenetic analyses involving viral genomic sequences". This is a valuable finding, and I agree that this is potentially good practice.

      However, I miss an experiment that links the two findings to support the conclusion: in particular, an experiment that solves the following question: does the best-fit model also lead to better tree topologies?

      Because NREV12 leads to inferred trees that are closer to the true generating tree than those inferred under GTR, this shows that the best-fit model, in this case NREV12, leads to better tree topologies.

      On simulated data, the significance of the difference between GTR and NREV12 inferences is evaluated using a paired t test. I miss a rationale or a reference to support that a paired t test is suitable to measure the significance of the differences of the wRF distance. Also, the results show that on average NREV12 performs better than GTR, but a pairwise comparison would be more informative: for how many sequence alignments does NREV12 perform better than GTR?

      We used the paired t-test because it is the most widely used test for comparing mean values between two matched samples where the paired differences are normally distributed. The wRF distances meet these requirements.

      The paired t-test contains the pairwise comparison, and the side-by-side boxplots show the pairwise wRF comparisons.
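
      A minimal sketch of how the paired t statistic and the per-alignment comparison requested by the reviewer could be computed together. The wRF values below are made up for illustration and the function names are hypothetical; the sketch assumes the paired differences are not all identical, and obtaining a p-value would additionally require the t distribution with the returned degrees of freedom.

```python
import math


def paired_t_statistic(x, y):
    """Return the paired t statistic and degrees of freedom for x versus y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the paired differences (n - 1 in the denominator).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    t_stat = mean_diff / math.sqrt(var_diff / n)
    return t_stat, n - 1


def count_second_lower(x, y):
    """Count the pairs in which the second value is strictly lower."""
    return sum(1 for a, b in zip(x, y) if b < a)


# Hypothetical wRF distances to the true tree, one pair per alignment.
wrf_gtr = [0.52, 0.48, 0.61, 0.55, 0.50]
wrf_nrev12 = [0.47, 0.49, 0.55, 0.50, 0.46]

t_stat, df = paired_t_statistic(wrf_gtr, wrf_nrev12)
nrev12_closer = count_second_lower(wrf_gtr, wrf_nrev12)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
print(f"NREV12 closer to the true tree in {nrev12_closer} of {len(wrf_gtr)} pairs")
```

      Reporting the count alongside the t statistic answers the question of how many alignments favor one model, which the mean difference alone does not show.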

      Reviewer #2 (Recommendations for the authors):

      The authors reference Baele et al., 2010 for describing NREV6 and NREV12. I suggest using the same name used in the referenced paper: GNR-SYM and GNR respectively. Although I do not think there is a standard name for these models, I would use a previously used one.

      We have built studies based on the names NREV6 and NREV12. We would like to keep the naming as standard for our studies.

      GTR and NREV12 models are already described in many other papers. I do not see the need to include such an extensive description. Also, a reference should be included to the discrete Gamma rate categories [1]

      We included the extensive description to help readers who are not very familiar with these models understand them better, since our names for the models differ from those used in other papers.

      We have added referencing for the discrete gamma rate as recommended. (Yang, 1994)

      To evaluate the exhaustiveness and correctness of the results, I would recommend publishing as supplementary material the simulated data sets or the scripts for generating the data set, the scripts or command lines for the analysis, and the versions of the software used (e.g., IQTREE). Also, to strongly support the main conclusion of the manuscript, I suggest adding to the simulations section results the RF-distances of the best-fit selected model under AIC, AICc, and BIC as well.

      We can go ahead and submit all the needed datasets. The simulated data RF-Distances results are available and will be submitted. We cannot however add them to the main document as this will create very long data tables.

      In some instances, it is mentioned that the selection criterion used is AIC, while in others, AIC-c is referenced. Even in the table captions, both terms are mixed. It should be made clearer which criterion is being employed, as AIC is not suitable for addressing the overparameterization of evolutionary models, given that it does not account for the sample size. A previous pre-print of this article [2] does not mention AIC-c, but also explicitly includes the formulas for AIC that do not take the sample size into account, and reports the same results as this manuscript, which indicates that AIC and not AIC-c was used here. This should be clarified. It is recommended to use AIC-c instead of AIC, especially if the ratio of sample size to model parameters is low [3]. Two things may be pointed out here: some authors consider tree branch lengths as free model parameters and others do not. In this paper it is not specified how the model parameters are counted. AIC tends to select more parameterized models than AIC-c, and overparameterization can lead to different tree inferences, as evidenced in Hoff et al., 2016. Therefore, it is expected that NREV12 is more frequently selected than NREV6 and GTR.
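
      The difference between the two criteria can be illustrated with a minimal sketch using the standard definitions, AIC = 2k - 2 ln L and AICc = AIC + 2k(k + 1) / (n - k - 1), where k is the number of free parameters and n is the sample size. The log-likelihoods, parameter counts and sample size below are hypothetical values chosen to show a case where the two criteria disagree; they are not taken from the manuscript, and how k should be counted (for example, whether branch lengths are included) is left open here.

```python
def aic(log_likelihood, k):
    """Akaike information criterion for a model with k free parameters."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC for sample size n."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical fits: a simpler model and a richer model on a small sample.
n_sites = 30
simple = {"log_likelihood": -1000.0, "k": 8}
rich = {"log_likelihood": -996.5, "k": 11}

aic_simple = aic(simple["log_likelihood"], simple["k"])
aic_rich = aic(rich["log_likelihood"], rich["k"])
aicc_simple = aicc(simple["log_likelihood"], simple["k"], n_sites)
aicc_rich = aicc(rich["log_likelihood"], rich["k"], n_sites)

print("AIC  prefers:", "rich" if aic_rich < aic_simple else "simple")
print("AICc prefers:", "rich" if aicc_rich < aicc_simple else "simple")
```

      As n grows relative to k the correction term shrinks toward zero, so the two criteria agree on large alignments and can diverge only when the sample size is small relative to the number of parameters.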

      In my opinion, a pairwise comparison between GTR and NREV12 performance is of great interest here, and the whiskers plots are not useful. Scatterplots would display the results better.

      Boxplots are meant to offer a simplified view of the results, as the paired t-tests do all of the comparisons. We shall provide the scatter plots as supplementary information so that readers can see fully detailed plots, as recommended.

      Some references are missing.

      Missing references added

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      In this manuscript, Gruber et al perform serial EM sections of the antennal lobe and reconstruct the neurites innervating two types of glomeruli: one that is narrowly tuned to geosmin and one that is broadly tuned to other odours. They quantify and describe various aspects of the innervations of olfactory sensory neurons (OSNs), uniglomerular projection neurons (uPNs), and the multiglomerular local interneurons (LNs) and PNs (mPNs). They find that narrowly tuned glomeruli had stronger connectivity from OSNs to PNs and LNs, and considerably more connections between sister OSNs and sister PNs than the broadly tuned glomeruli. They also had less connectivity with the contralateral glomeruli. These observations are suggestive of strong feed-forward information flow with minimal presynaptic inhibition in narrowly tuned glomeruli, which might be ecologically relevant, for example, while making quick decisions such as avoiding a geosmin-laden landing site. In contrast, information flow in more broadly tuned glomeruli shows much more lateralisation of connectivity to the contralateral glomerulus, as well as to other ipsilateral glomeruli.

      The data are well presented, the manuscript clearly written, and the results will be useful to the olfaction community. I wonder, given the hemibrain and FAFB datasets exist, whether the authors have considered verifying whether the trends they observe in connectivity hold across three brains? Is it stereotypic? 

      We appreciate the reviewer’s positive view of our study and their thoughtful and relevant comment on the issue of individual variation. We agree that this is a very important question and note that it was also raised by the second Reviewer. It reflects both our limited understanding of the range of individual variation in synaptic connectivity (whether in flies, humans, or other species) and the challenge of determining which of the differences observed in our study are stereotypical features of each glomerulus type. Undoubtedly this criticism addresses a crucial problem of practically all connectome studies so far, for which there is no immediate solution. Studies of this type require so much time, effort and money that increasing the number of samples is seldom feasible. The Reviewer wonders if we could compare our data with that made available by two of the largest connectome studies of Drosophila. This appeared to us to be a very good idea and we tried to follow the advice but, unfortunately, it was impracticable for the reasons we explain below. The hemibrain data cannot be used for this purpose because it does not contain the full glomerulus DA2 (Schlegel et al., 2021). A different problem hindered us from using the FAFB dataset, the other dataset mentioned by the Reviewer. In this case the three glomeruli were sectioned and reconstructed, but the dataset lacks an annotated list of all synaptic connections corresponding to each glomerulus. Such annotation (a compendium of all synaptic connections inside each glomerulus, stating for each connection which type of neuron provides the presynaptic site and which the postsynaptic site) is essential for direct comparison with our data. It is important to keep in mind that the current analytical tools available for the use of these datasets (e.g., NeuPrint, FlyWire and CATMAID) do not offer the ability to extract data on synapses exclusively from the glomerular volume of DA2 or DL5.
In this case, it is certainly theoretically possible to obtain the data by doing the annotation ourselves. However, such a study would demand an amount of time, effort and financial resources that we believe would not be justified solely to increase the number of individuals from one to two. Instead, our manuscript includes a comparison of the OSN connectivity in VA1v and DL5 using the hemibrain dataset published by Schlegel et al. (2021) (see revised manuscript: lines 311–315; 431–434; 558–562; 602–606).

      Beyond the opinion, which we fully share with the Reviewer, that a comparison including three flies would be better than a comparison made with one glomerulus of each type, we are still challenged by the question of which, if any, of the differences are stereotypic. Clarifying which differences between particular glomeruli, in features such as those discussed in our study, are stereotypical and which simply fall within the normal range of individual variation is basically a statistical problem. A first attempt at a comprehensive comparison focusing on intra- and inter-individual variability was recently made by comparing two connectome datasets from two different Drosophila individuals (Dorkenwald et al., 2024; Schlegel et al., 2024). At present, it is still unclear how many samples are needed to make a statistically robust comparison of olfactory synaptic circuits in adult flies: perhaps 3, 6, or even 18 individuals?

      Reviewer #2 (Public Review):

      The chemoreceptor proteins expressed by olfactory sensory neurons differ in their selectivity such that glomeruli vary in the breadth of volatile chemicals to which they respond. Prior work assessing the relationship between tuning breadth and the demographics of principal neuron types that innervate a glomerulus demonstrated that narrowly tuned glomeruli are innervated by more projection neurons (output neurons) and fewer local interneurons relative to more broadly tuned glomeruli. The present study used high-resolution electron microscopy to determine which synaptic relationships between principal cell types also vary with glomerulus tuning breadth using a narrowly tuned glomerulus (DA2) and a broadly tuned glomerulus (DL5). The strength of this study lies in the comprehensive, synapse-level resolution of the approach. Furthermore, the authors implement a very elegant approach of using a 2-photon microscope to score the upper and lower bounds of each glomerulus, thus defining the bounds of their restricted regions of interest. There were several interesting differences including greater axo-axonic afferent synapses and dendrodendritic output neuron synapses in the narrowly tuned glomerulus, and greater synapses upon sensory afferents from multiglomerular neurons and output neuron autapses in the broadly tuned glomerulus.

      The study is limited by a few factors. There was a technical need to group all local interneurons, centrifugal neurons, and multiglomerular projection neurons into one category ("multiglomerular neurons") which complicates any interpretations as even multiglomerular projection neurons are very diverse. Additionally, there were as many differences between the two narrowly tuned glomeruli as there were comparing the narrowly and broadly tuned glomeruli. Architecture differences may therefore not reflect differences in tuning breadth, but rather the ecological significance of the odors detected by cognate sensory afferents.
Finally, some synaptic relationships are described as differing and others as being the same between glomeruli, but with only one sample from each glomerulus, it is difficult to determine when measures differ when there is no measure of inter-animal variability. If these caveats are kept in mind, this work reveals some very interesting potential differences in circuit architecture associated with glomerular tuning breadth.

      This work establishes specific hypotheses about network function within the olfactory system that can be pursued using targeted physiological approaches. It also identifies key traits that can be explored using other high-resolution EM datasets and other glomeruli that vary in their tuning selectivity. Finally, the laser "branding" technique used in this study establishes a reduced-cost procedure for obtaining smaller EM datasets from targeted volumes of interest by leveraging the ability to transgenically label brain regions in Drosophila.

      CLASSIFICATION OF NEURONAL TYPES

      We agree that grouping diverse types of interneurons into a single category (referred to as MGNs) limits the ability to make interpretations about synaptic similarities and differences between specific neuronal types. This was, however, an unavoidable compromise resulting from our decision to generate a comprehensive, synapse-level reconstruction of the restricted regions encompassing the DA2 and DL5 glomeruli. As both reviewers have noted, this approach offers significant value and we hope the Editor will also recognize that this limitation does not prevent readers from gaining important and novel insights into the synaptic circuitry of these two glomeruli.  

      Similar to the approach taken by Tobin et al. (2017), we prioritized producing a densely reconstructed neuropile, in which no synapses were omitted (Tobin et al., 2017). The downside of this method is that not all synaptic connections could be reliably assigned to specific neuronal types, with about 12% remaining unassigned. We anticipate that future research, supported by advances in semi-automated tracing methods, improved imaging technologies, and increased personnel resources, will allow not only for the generation of more complete connectomes of the entire brain (Scheffer et al., 2020; Zheng et al., 2018), but also for the accurate reconstruction and classification of individual synapses, even in highly complex regions such as the olfactory glomeruli. We also expect that a second complete connectome of a male Drosophila will soon become available, which will provide valuable opportunities for comparisons across individuals and between male and female brains in future studies.

      INTERGLOMERULAR DIFFERENCES

      Thank you for this insightful comment. It is indeed true that despite both DA2 and VA1v being narrowly tuned glomeruli, they exhibit considerable differences in specific connectivity features (e.g., relative synaptic strengths above certain thresholds) and that those differences can be as pronounced as those observed between DA2 and the broadly tuned DL5. For this reason, comparing each individual glomerulus to every other is not a practical or informative approach. To derive robust interpretations, we focused instead on whether two glomeruli that share a particular functional characteristic—namely, being narrowly tuned for single odorants—also share connectivity patterns that distinguish them from a broadly tuned reference glomerulus.

      Our results support this. Furthermore, additional connectomics data reinforce our conclusions.

      For example, OSN-OSN connectivity is stronger in the two narrowly tuned glomeruli (DA2 and VA1v) relative to the broadly tuned glomerulus (DL5). While these pairwise differences alone are not conclusive, the finding that the two narrowly tuned glomeruli studied here share features that distinguish them from the broadly tuned glomerulus supports our interpretation. We found further support for this idea in the data reported by Schlegel et al. (2021). In that dataset, other narrowly tuned glomeruli (DA1, DL3, and DL4) also exhibit stronger OSN-OSN connectivity than other broadly tuned glomeruli (DM1 or DM4).

      We do not deny that there are many differences between any given pair of glomeruli, regardless of whether they are narrowly or broadly tuned. Instead, we propose that our findings on circuit features indicate that most of the observed differences actually grouped the two narrowly tuned glomeruli together relative to the broadly tuned glomerulus. A more concise summary is now provided in the newly added Figure 8. We also added explanatory lines of text at the beginning of the chapter ‘specific features of narrowly tuned glomerular circuits’.

      ECOLOGICAL SIGNIFICANCE

      This is an interesting point. However, it is difficult to disentangle the "ecological significance" of processed odorants from the "tuning breadth" of a glomerulus. In the Drosophila olfactory system, glomerular circuits that respond to ecologically important odorants—such as those involved in reproduction or danger—tend to be more narrowly tuned. Moreover, while we refer to odorants with specific ecological significance as those linked to survival or reproductive behaviors, defining the significance of an odorant with precision is inherently challenging, as it can vary depending on context and environmental conditions.

      What both circuits share is their narrow tuning breadth. We therefore propose that the common circuit features of VA1v and DA2, highlighted in this study, are functionally related to the fact that each circuit processes single odorants. Consequently, their specificity is most likely determined at the level of the receptor. 

      INDIVIDUAL VARIABILITY

      We agree that accounting for inter-animal variability would strengthen the study. However, we are confident that even a modest statistically sound assessment of this variability would require a larger sample size, certainly more than just two or three flies, which is presently not feasible.

      We refer the reviewer to our response to Reviewer #1 regarding this important issue.

      Initial insights into variability between flies have been provided through comparative analyses of the two most comprehensive female Drosophila melanogaster connectomes, the FAFB and hemibrain datasets (Schlegel et al., 2024). For more detailed quantitative comparisons regarding inter-animal variability, please refer to our response to the second major point raised by Reviewer #2. As highlighted by Schlegel et al. (2024), making definitive statements about the stereotypy of neuron numbers, unitary cell-cell connections (edges), or synaptic strengths (weights) remains a complex challenge.

      While appreciating the rigour of this work we were surprised to notice the omission of a comparison of their observations with the two other existing datasets. This would not only have addressed the technical limitation of this particular study - the inability to identify specific neuron types due to imaging a small part of the brain - but would also have shed light on inter-animal variability 

      We strongly recommend that the authors do make this comparison - the datasets are currently extremely user friendly and so we don't estimate the replication of their key findings will be too onerous. This will be particularly important to resolve the issue of having to classify all multiglomerular local interneurons and multiglomerular projection neurons broadly into "MGN". Such a comparison will dramatically strengthen this study that poses very interesting questions, but in its current form, has this striking shortcoming.

      INDIVIDUAL VARIABILITY AS EXPRESSED HERE:

      Earlier on we were of the same opinion that the Reviewer expresses here but, unfortunately, it was not possible to follow this advice. As far as possible, we have compared some of our results to the values of the two datasets that the Reviewer refers to, but the absence of glomerulus DA2 in one of the datasets and the absence of synapse annotation for all the relevant glomeruli in the other dataset prevented us from making a full comparison. Moreover, we believe that the problem of individual variation most probably cannot be solved by extending the comparison to one or two more flies.

      Reviewer #1 (Recommendations for The Authors): 

      The lines 270 - 282 confused me in the backdrop of Figure 3B. 

      The concern may stem from our inclusion of a comparison between the uPNs of glomerulus DA2 and the single uPN of glomerulus DL5 in the statistical analysis presented in Figure 3. This comparison was included to ensure a comprehensive representation of the data, highlighting the variability across all major cell groups. We have clarified this rationale in the revised manuscript (see lines 274-282).

      Reviewer #2 (Recommendations for The Authors): 

      I commend the authors for taking such a thorough approach to advance an interesting topic in olfaction. The following suggestions are intended to strengthen this study: 

      Major points: 

      A color-blind-friendly palette should be used for all figures. Currently, five of seven figures use red and green, and in particular, Figure 5 will be uninterpretable for red/green color-blind readers. 

      We are thankful for this important comment. We changed the color palette as suggested by the reviewer, and replaced Red with Magenta and changed the figure legend accordingly.

      This level of analysis is extremely resource and time-consuming, so even obtaining this information at this resolution is an impressive achievement. However, this study would be well served by strategically supplementing the analysis of this dataset with information from other publicly available connectomics datasets. For instance, some interpretations are limited because there is information from only a single DL5 and DA2 glomerulus. Any claims in which one glomerulus has more, less, or the same of a metric must be tempered because without replicates, there are no measures of inter-animal variability. As an example, on lines 386-387 the authors state "The relative synaptic strength between MGN>uPN was stronger in DA2 (12%) than DL5 (10%)". It is difficult to assess whether this represents a difference that is outside of the range of inter-animal variability inherent to the olfactory system. Taking select measures from the Hemibrain and FAFB (via FlyWire) datasets could help strengthen these claims. 

      We fully agree with the Reviewer’s opinion that, since our data is from one glomerulus of each type, “It is difficult to assess whether this represents a difference that is outside of the range of inter-animal variability inherent to the olfactory system.” This is a weakness of practically all connectome studies based on electron microscopy in both Drosophila and other animals. We cannot be sure that measurements from the Hemibrain and FAFB datasets could help strengthen our claims, because the magnitude of the range of individual variation is presently not known, and solving this problem will most probably require more than one or two additional flies. In any case, it is not possible to follow this advice and compare our data with that of the hemibrain because DA2 was not included in that study. We ask the Reviewer to read our more detailed explanation in our response to Reviewer 1.

      In the particular case commented on by the Reviewer above, the relative difference in synaptic strength exceeds 20%. Whether such a difference has functional relevance remains an open question, but Schlegel et al. (2024) support our interpretation. They showed that synaptic weights with differences larger than 20% tend to be consistent across individuals, with strong correlations within and between animals (Pearson’s R = 0.97 and R = 0.8; Fig. 4).
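
      As a rough check of the arithmetic behind this comparison, the following Python sketch computes a relative difference from the rounded percentages quoted by the Reviewer (12% for DA2, 10% for DL5). The function name and the choice of DL5 as the reference value are assumptions made for illustration only; the unrounded values from the manuscript are not reproduced here, so the result is approximate.

```python
def relative_difference(value, reference):
    """Return (value - reference) / reference as a fraction of the reference."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return (value - reference) / reference


# Rounded relative synaptic strengths for MGN>uPN, as quoted in the review.
da2_strength = 12.0  # percent, glomerulus DA2
dl5_strength = 10.0  # percent, glomerulus DL5 (assumed reference)

rel_diff = relative_difference(da2_strength, dl5_strength)
print(f"Relative difference (DA2 vs DL5): {rel_diff:.0%}")
```

      With these rounded inputs the relative difference comes out at about 20% when DL5 is taken as the reference; choosing DA2 as the reference instead would give a smaller figure, so the reference should be stated whenever such a value is reported.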

      Grouping all local interneurons, centrifugal neurons, and multiglomerular PNs into one category limits the ability to make interpretations about similarities or differences in the synaptic relationships involving MGNs. The authors could get an estimate of the number of multiglomerular PNs in DL5, VA1v, and DA2 from Hemibrain and FlyWire platforms to get a better sense of differences between glomeruli in the MGN category.

      We agree that grouping a variety of interneurons into a single category (called MGNs) limits the ability to make interpretations about similarities or differences in the synaptic relationships involving different neurons. This was the unavoidable price to be paid once we decided to register a “comprehensive, synapse-level resolution” map of these two glomeruli. It appears to us that both reviewers have clearly recognized the intrinsic value of this approach and we hope that the Editor will share this opinion.

      Consistent with the assumptions of Tobin et al. (2017), our hypothesis on LN connectivity differences is based on the fact that they are the most numerous and broadly arborizing neurons of the class that we call multiglomerular neurons in the AL (Chou et al., 2010; Lin et al., 2012; Tanaka et al., 2012). Recent connectome studies confirm this feature across all glomeruli (Bates et al., 2020; Horne et al., 2018; Scheffer et al., 2020; Schlegel et al., 2021; Zheng et al., 2018).

      In response to the reviewer’s question, we conducted a case-specific reanalysis of the data from Horne (2018), which provides comprehensive connectivity information for the VA1v glomerulus. This allowed us to quantify the proportional contributions of LNs (n = 56) and mPNs (n = 13) to all MGN connections (MGN-MGN, MGN>OSN, MGN>uPN, uPN>MGN, OSN>MGN).

      Our analysis showed that 84% of MGN output originates from LNs. 57% of the input to MGN comes from LNs and 43% from mPNs, largely due to strong OSN>mPN input. Thus, for the filtered MGN connections relevant to distinguishing narrowly from broadly tuned circuits (e.g., MGN>OSN, uPN>MGN; see Fig. 8), LNs are the dominant contributors in VA1v. (These data are not included in the resubmitted manuscript.) This supports our interpretation that LNs are responsible for the majority of MGN connections underlying the observed differences between glomeruli.

      For instance, prior work has reported fewer local interneurons innervating DA2, but in this study there was an unexpected result that there was greater MGN innervation density and synapse # for DA2 relative to DL5. This discrepancy could be due to differences in the number of multiglomerular PNs innervating each glomerulus, which would be obscured when these PNs are combined with local interneurons in the MGN category.

      We agree that the greater MGN innervation density in DA2 in our study could reflect a stronger contribution from mPNs. However, innervation density alone does not indicate how many mPNs actually innervate DA2 or DL5. Alternatively, increased innervation and/or synaptic frequency of local interneurons (LNs) could also account for this observation. In our view, neuron number does not necessarily correlate with branching complexity or synaptic density.

      For example, the dendritic length of the single uPN in glomerulus DL5 is approximately equal to the combined dendritic length of the multiple uPNs of DA2. Similarly, Tobin et al. (2017) reported that when comparing uPNs in glomerulus DM6 between the left and right brain hemispheres, they found variability in cell number but not in dendritic length. More recently, the FAFB and hemibrain datasets showed a similar pattern in another neuronal type. A substantial variation in cell number was observed for Kenyon cells between the two Drosophila individuals, but in both individuals this cell type consistently makes and receives similar presynapses and postsynapses (Schlegel et al., 2024).

      On line 33 the authors cannot claim that DA2-OSNs experience less presynaptic inhibition based on the data in this study. Even without the limitations of the MGN category (described above), presynaptic inhibition depends on more than just the number of synapses; rather, it is affected by GABA B receptor expression levels and the second messenger components downstream of this receptor. Physiological experiments are needed to justify this claim, so I recommend adjusting accordingly.

      We agree with the Reviewer and have adjusted the text on line 33 and in the main body of the text by referring to this finding as “presynaptic input”, which is what we have quantified, instead of “less presynaptic inhibition”.

      Figures 5 and 6 seek to distill the wealth of information from this study into broad take-home points for the reader, while still providing a good amount of detail. I think a final more concise graphic summary (similar to the graphical abstract or Figure 6 of Grabe et al 2016) depicting the most critical differences between glomeruli would further clarify the broad findings of this study.

      We appreciate this comment and we have added a “graphic summary” as the Reviewer proposed. We made a new figure that becomes Figure 8 and summarizes our results and highlights differences between narrowly and broadly tuned glomeruli in a more concise graphical abstract format.

      Minor points: 

      Much of the manuscript provides details about synapse fractions or % synapses for a given synaptic relationship. Please ensure that it is clear which principal cell types are being described, as it can be easy to get lost.

      Should line 284 say "...than DL5 as it has been reported that DA2 is innervated by fewer LNs..."?

      We appreciate the reviewer’s comment and we have corrected this sentence (see the text beginning at line 290).

      Taisz et al.  has been published, so the citation should be updated. 

      We have updated the corresponding citation.  

      On line 233, the authors ascribe the small electron-dense vesicles as likely housing sNPF released by MGNs. However, Carlsson et al. (2010) demonstrated that sNPF is released by OSNs, which was further functionally characterized by Root et al. (2011) and Ko et al. (2014). In terms of MGNs that release neuropeptides, Carlsson et al. 2010 demonstrated that local interneurons immunolabel for tachykinin, myoinhibitory peptide, and allatostatin-A, while two extrinsic neurons release SIFamide. In theory, aminergic neurons could also have small electron-dense vesicles, but this can be variable. 

      The Reviewer is completely right in this criticism. The MGNs certainly include neurons that have been reported to contain neuropeptides other than sNPF. We have corrected this sentence and it now reads as follows (page 7, line 236): “Interestingly, besides the abundant clear small vesicles..

      On line 636, the Berck and Schlegel studies demonstrated that panglomerular local interneurons synapse upon OSN, but not that they induce presynaptic inhibition (which was demonstrated in the studies cited in the next sentence). I recommend adjusting this sentence.

      We agree and we have corrected the text following the Reviewer’s advice. It now reads as follows (page 19, line 663): “We also observed that OSNs received less MGN feedback.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      This is a revision of a manuscript previously submitted to Review Commons. The authors have partially addressed my comments, mainly by expanding the introduction and discussion sections. Sandy Schmid, a leading expert on the AP2 adaptor and CME, has been added as a co-corresponding author. The main message of the manuscript remains unchanged. Through overexpression of fluorescently tagged CCDC32, the authors propose that, in addition to its established role in AP2 assembly, CCDC32 also follows AP2 to the plasma membrane and regulates CCP maturation. The manuscript presents some interesting ideas, but there are still concerns regarding data inconsistencies and gaps in the evidence.

      With due respect, we would argue that a role for CCDC32 in AP2 assembly is hardly ‘established’. Rather, a single publication reporting its role as a co-chaperone for AAGAB appeared while our manuscript was under review. We find some similar and some conflicting results, which are described in our revised manuscript. However, in combination our two papers clearly show that CCDC32, a previously unrecognized endocytic accessory protein, deserves further study.

      (1) eGFP-CCDC32 was expressed at 5-10 times higher levels than endogenous CCDC32. This high expression can artificially drive CCDC32 to the cell surface via binding to the alpha appendage domain (AD), an interaction that may not occur under physiological conditions.

      While we acknowledge that overexpression of eGFP-CCDC32 could result in artificially driving it to CCPs, we do not believe this is the case for the following reasons:

      i. The bulk of our studies (Figures 2-4) demonstrate the effects of siRNA knockdown of CCDC32 on CCPs at early stages of CME, and so it is likely that these functions require the presence of endogenous CCDC32 at nascent CCPs, as detected with overexpressed eGFP-CCDC32 by TIRF imaging.

      ii. At these levels of overexpression, eGFP-CCDC32 fully rescues the effects of siRNA KD of endogenous CCDC32 on Tfn uptake and CCP dynamics (Figure 6F,G). If the protein were artificially recruited to the AP2 appendage domain, one would expect it to compete with the recruitment of other EAPs to CCPs and hence exhibit defects in CCP dynamics. Indeed, we see the opposite: CCPs that are positive for eGFP-CCDC32 show normal dynamics and maturation rates, while CCPs lacking eGFP-CCDC32 are short-lived and more likely to be aborted (Figure 1C).

      iii. We have identified two modes of binding of CCDC32 to AP2 adaptors: one is through canonical AP2-AD binding motifs, the second is through an α-helix in CCDC32 that, by modeling, docks only to the open conformation of AP2. Overexpressed CCDC32 lacking this α-helix is not recruited to CCPs (Fig. 6 D,E), indicating that the canonical AP2 binding motifs are not sufficient to recruit CCDC32 to CCPs, even when overexpressed.

      (2) Which region of CCDC32 mediates alpha AD binding? Strangely, the only mutant tested in this work, Δ78-98, still binds AP2, but shifts to binding only mu and beta. If the authors claim that CCDC32 is recruited to mature AP2 via the alpha AD, then a mutant deficient in alpha AD binding should not bind AP2 at all. Such a mutant is critical for establishing the model proposed in this work.

      We understand the reviewer’s confusion and thus devoted a paragraph in the discussion to this issue. As revealed by AlphaFold 3.0 modeling (Figure S6), binding of CCDC32 to the alpha AD likely occurs via the 2 canonical AP2-AD binding motifs encoded in CCDC32. Given the highly divergent nature of AP2-AD binding motifs, we did not identify these motifs without the AlphaFold 3.0 modeling. While these interactions could be detected by GST-pull downs, they are apparently not of sufficient affinity to recruit CCDC32 to CCPs in cells. In the text, we now describe the α-helix we identified as being essential for CCP recruitment as ‘a’ AP2 binding site on CCDC32 rather than ‘the’ AP2 binding site. Interestingly, and also discussed, AlphaFold 3.0 identifies a highly predicted docking site on α-adaptin that is only accessible in the open, cargo-bound conformation of intact AP2. This is also consistent with the inability of CCDC32(Δ78-99) to bind the α:µ2 hemi-complex in cell lysates.

      We agree that further structural studies on CCDC32’s interactions with AP2 and its targeting to CCPs will be of interest for future work.

      (3) The concept of hemicomplexes is introduced abruptly. What is the evidence that such hemicomplexes exist? If CCDC32 binds to hemicomplexes, this must occur in the cytosol, as only mature AP2 tetramers are recruited to the plasma membrane. The authors state that CCDC32 binds the AD of alpha but not beta, so how can the Δ78-98 mutant bind mu and beta?

      We introduced the concept of hemicomplexes based on our unexpected (and now explicitly stated as such) finding that the CCDC32(Δ78-99) mutant efficiently co-IPs with a β2:µ2 hemicomplex. As stated, the efficiency of this pulldown suggests that the presumed stable AP2 heterotetramer must indeed exist in equilibrium between the two α:σ2 and β2:µ2 hemicomplexes, such that CCDC32(Δ78-99) can sequester and efficiently co-IP with the β2:µ2 hemicomplex. A previous study, now cited, had shown that the β2:µ2 hemicomplex could partially rescue null mutations of α in C. elegans (PMID: 23482940). We do not know how CCDC32 binds to the β2:µ2 hemicomplex and we did not detect these interactions using AlphaFold 3.0. However, these interactions could be indirect and involve the AAGAB chaperone. It is also likely, based on the results of Wan et al. (PMID: 39145939), that the binding is through the µ2 subunit rather than β2. As mentioned above, and in our Discussion, further studies are needed to define the complex and multi-faceted nature of CCDC32-AP2 interactions.

      (4) The reported ability of CCDC32 to pull down AP2 beta is puzzling. Beta is not found in the CCDC32 interactome in two independent studies using 293 and HCT116 cells (BioPlex). In addition, clathrin is also absent in the interactome of CCDC32, which is difficult to reconcile with a proposed role in CCPs. Can the authors detect CCDC32 binding to clathrin?

      Based on the studies of Wan et al. (PMID: 39145939), it is likely that CCDC32 binds to µ2, rather than to the β2 in the β2:µ2 hemicomplex. As to clathrin being absent from the CCDC32 pull down, this is as expected since the interactions of clathrin even with AP2 are weak in solution (as shown in Figure 5C, clathrin is not detected in our AP2 pull down) so as not to have spontaneous assembly of clathrin coats in the cytosol. Rather, these interactions are strengthened by both the reduction in dimensionality that occurs on the membrane and by avidity of multivalent interactions. For example, Kirchhausen reported that 2 AP2 complexes are required to recruit one clathrin triskelion to the PM.

      (5) Figure 5B appears unusual; is this a chimera?

      Figure 5B shows an internal insertion of the eGFP tag into an unstructured region in the AP2 hinge. As we have previously shown (PMID: 32657003), this construct, unique among other commonly used AP2 tags, is fully functional.  We have rearranged the text in the Figure legend to make this clearer.

      Figure 5C likely reflects a mixture of immature and mature AP2 adaptor complexes.

      This is possible, but mature heterotetramers are by far the dominant species, otherwise the 4 subunits would not be immuno-precipitated at near stoichiometric levels with the α subunit. Near stoichiometric IP with antibodies to the α-AD has been shown by many others in many cell types.

      (6) CCDC32 is reduced by about half in siRNA knockdown. Why not use CRISPR to completely eliminate CCDC32 expression?

      Fortuitously, partial knockdown was essential to reveal this second function of CCDC32, as we have emphasized in our Discussion. Wan et al. used CRISPR to knock out CCDC32 and revealed its essential role as an AAGAB co-chaperone. In the complete absence of CCDC32, mature AP2 complexes fail to form. However, under our conditions of partial CCDC32 depletion, the expression of AP2 heterotetramers is unaffected, revealing a second function of CCDC32 at early stages of CME. We expect that the co-chaperone function of CCDC32 is catalytic, while its role in CME is more structural; hence the different concentration dependencies, the former being less sensitive to KD than the latter. This is one reason that many researchers are turning to CRISPRi for whole genome perturbation studies, as many proteins play multiple roles that can be masked in KO studies.

      Reviewer #2 (Public review):

      Yang et al. describe CCDC32 as a new clathrin mediated endocytosis (CME) accessory protein. The authors show that CCDC32 binds directly to AP2 via a small alpha helical region and cells depleted for this protein show defective CME. Finally, the authors show that the CCDC32 nonsense mutations found in patients with cardio-facial-neuro-developmental syndrome (CFNDS) disrupt the interaction of this protein to the AP2 complex. The results presented suggest that CCDC32 may act as both a chaperone (as recently published) and a structural component of the AP2 complex.

      Strengths:

      The conclusions presented are generally well supported by experimental data and the authors carefully point out the differences between their results and the results by Wan et al. (PNAS 2024).

      Weaknesses:

      The experiments regarding the role of CCDC32 in CFNDS still require some clarifications to make them clearer to scientists working on this disease. The authors fail to describe that the CCDC32 isoform they use in their studies is different from the one used when CFNDS patient mutations were described. This may create some confusion. Also, the authors did not discuss that the frame-shift mutations in patients may be leading to nonsense mediated decay.

      As requested, we have more clearly described our construct with regard to the human mutations and added the possibility of NMD in the context of the human mutations.

      Reviewer #3 (Public review):

      In this manuscript, Yang et al. characterize the endocytic accessory protein CCDC32, which has implications in cardio-facio-neuro-developmental syndrome (CFNDS). The authors clearly demonstrate that the protein CCDC32 has a role in the early stages of endocytosis, mainly through the interaction with the major endocytic adaptor protein AP2, and they identify regions taking part in this recognition. Through live cell fluorescence imaging and electron microscopy of endocytic pits, the authors characterize the lifetimes of endocytic sites, the formation rate of endocytic sites and pits and the invagination depth, in addition to transferrin receptor (TfnR) uptake experiments. Binding between CCDC32 and CCDC32 mutants to the AP2 alpha appendage domain is assessed by pull down experiments. While interaction between CCDC32 and the alpha appendage domain of AP2 is clearly described, a discussion of potential association with other AP2 domains would be beneficial to understand the impact of CCDC32 in endocytosis.

      The reviewer is correct. That CCDC32 also interacts with other subunits of AP2 is evident from the findings of Wan et al. and from the fact that the CCDC32(Δ78-99) mutant efficiently co-IPs with the β2:µ2 hemicomplex. We expanded our discussion around this point. CCDC32 remains an as yet poorly characterized, but we now believe very interesting, EAP worth further study.

      Together, these experiments allow deriving a phenotype of CCDC32 knock-down and CCDC32 mutants within endocytosis, which is a very robust system, in which defects are not so easily detected. A mutation of CCDC32, mimicking CFNDS mutations, is also addressed in this study and shown to have endocytic defects.

      In summary, the authors present a strong combination of techniques, assessing the impact of CCDC32 in clathrin mediated endocytosis and its binding to AP2.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) The authors must be clear about the differences between the CCDC32 isoform they used in their manuscript and the one used to describe the patient mutations. This could be done, for example, in the methods. This is essential for the capacity of other labs to reproduce, follow up and correctly cite these results.

      We have added this information to the Methods. 

      (2) I believe the authors have misunderstood what nonsense mediated decay is. NMD occurs at the mRNA level and requires a full genome context to occur (introns and exons). The fact that a mutant protein is expressed normally from a construct by no means proves that it does not happen. I believe that adding the possibility of NMD occurring would enrich the discussion.

      Thank you, we have now done more homework and have added this possibility into our discussion of the mutant phenotype. However, if a robust NMD mechanism resulted in a complete loss of CCDC32 protein, then the essential co-chaperone function reported by Wan et al. would result in complete loss of AP2. A more detailed characterization of the cellular phenotype of these mutations, including assessing the expression levels of AP2, would be informative.

      Reviewer #3 (Recommendations for the authors):

      - It is not clear what the authors mean by '~30s lifetime cohort' (line 159). They refer to Figure 2H, which shows the % of CCPs. Can the authors explain exactly what kind of tracks they used for this analysis, for example which lifetime variations were accepted? Do they refer to the cohorts in Figure S4? In Figure S4, the most frequent tracks have lifetimes < 20 s (in contrast to what is stated in the main text). Why was this cohort not used?

      The ‘30s cohort’ refers to CCPs with lifetimes between 25 and 35 s, which encompasses the most abundant species in control cells and CCDC32 KD cells, as shown by the probability curves in Figure 2H. Given the large number of CCPs analyzed, we still have large numbers for our analyses (n = 5998 and 4418 for control and siRNA-treated conditions, respectively). Figure 2H shows that the lifetimes of CCPs in cells treated with CCDC32 siRNA are shifted to shorter values. We have clarified this in the text.

      - Figure S1: It is now clear why the mutant versions of CCDC32 are not detected in this western blot. However, data that show the resistance of these proteins to siCCDC32 are still missing (S1 A is in the absence of siCCDC32 I assume, as the legend suggests). A western blot using an anti-GFP antibody, as the one used in Figure S1, after siRNA knockdown would provide clarity.
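
      To make the cohort definition concrete, the following Python sketch selects tracks whose lifetimes fall in a window centred on 30 s. It is an illustration only: the function name, the inclusive bounds, and the example lifetimes are assumptions and are not taken from the analysis pipeline or data used in the manuscript.

```python
def select_lifetime_cohort(lifetimes_s, center_s=30.0, half_width_s=5.0):
    """Return the lifetimes (in seconds) within center_s +/- half_width_s.

    Bounds are treated as inclusive here, which is an assumption.
    """
    low = center_s - half_width_s
    high = center_s + half_width_s
    return [t for t in lifetimes_s if low <= t <= high]


# Made-up CCP lifetimes in seconds, for illustration only.
example_lifetimes = [12.0, 18.5, 26.0, 30.0, 34.5, 41.0, 58.0]

cohort = select_lifetime_cohort(example_lifetimes)
print(f"{len(cohort)} of {len(example_lifetimes)} tracks fall in the 25-35 s cohort")
```

      Restricting the comparison to a single lifetime window in this way lets intensity profiles from control and knockdown cells be averaged over tracks of comparable duration.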

      That these constructs all contain the same mutation in the siRNA target sequence gives us confidence that they are indeed resistant to siRNA.

      - 'Note that the anti-CCDC32 antibody does not detect the eGFP-CCDC32(∆78-98) as well as full-length and is unable to detect eGFP-CCDC32(1-54)'. This phrase should belong to Figure S1 (B), not (A).

      Corrected.

      - The immunoprecipitations of CCDC32 and its mutants with AP2 and its subunits are partially confusing. In Figure 5, the authors show that CCDC32 interacts specifically with the alpha-AD, but not with the beta-AD of AP2. In Figure 6B and C, on the other hand, Co-IPs are shown also with the beta and the mu domain of AP2. This is understandable in the context of the full AP2. However, when interaction with the alpha domain (and sigma) is abolished through mutation of helix 78-98, why would beta and mu still interact, when the beta-AD cannot interact with CCDC32 on its own. Are there interaction sites expected outside the ADs in the beta or mu domains?

      See responses to reviewer 1 above. This result likely reflects the co-chaperone activity of CCDC32 as reported by Wan et al. and is likely due to the interaction they reported between CCDC32 and the µ2 subunit of β2:µ2 hemicomplexes.

      - Figure S6 D, E and F: How much confidence do the authors have on the AlphaFold predictions? Have the same binding poses been obtained repeatedly by independent predictions?

      We provide, with a color scale, the confidence score for each interaction, which is very high (>90%). Of course, this is still a prediction that will need to be verified by further structural studies as we have stated.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Cook et al. have presented an important study on the transcriptomic and epigenomic signature underlying craniofacial development in marsupials. Given the lack of a dunnart genome, the authors also prepared long and short-read sequence datasets to assemble and annotate a novel genome to allow for the mapping of RNAseq and ChIPseq data against H3K4me3 and H3K27ac, which allowed for the identification of putative promoter and enhancer sites in dunnart. They found that genes proximal to these regulatory loci were enriched for functions related to bone, skin, muscle and embryonic development, highlighting the precocious state of newborn dunnart facial tissue. When compared with mouse, the authors found a much higher proportion of promoter regions aligned between species than for enhancer regions, and subsequent profiling identified regulatory elements conserved across species and are important for mammalian craniofacial development. In contrast, the identification of dunnart-specific enhancers and patterns of RNA expression further confirm the precocious state of muscle development, as well as for sensory system development, in dunnart suggesting that early formation of these features are critical for neonate marsupials likely to assist with detecting and responding to cues that direct the joeys to the mother's teat after birth. This is one of the few epigenomic studies performed in marsupials (of any organ) and the first performed in fat-tailed dunnart (also of any organ). Marsupials are emerging as an important model for studying mammalian development and evolution and the authors have performed a novel and thorough analysis, impressively including the assembly of a new marsupial reference genome that will benefit many future studies.

      Strengths:

      The study provides multiple pieces of evidence supporting the important role enhancer elements play in mammalian phenotypic evolution, namely the finding of a lower proportion of peaks present in both dunnart and mouse for enhancers than for promoters, and dunnart showing more genes uniquely associated with its active enhancers than any other combination of mouse and dunnart samples, whereas this pattern was less pronounced for promoter-associated genes. In addition, rigorous parameters were used for the cross-species analyses to identify the conserved regulatory elements and the dunnart-specific enhancers. For example, for the results presented in Figure 1, I agree that it is a little surprising that the average promoter-TSS distance is greater than that for enhancers, but that this could be related to the possible presence of unannotated transcripts between genes. The authors addressed this well by examining the distribution of promoter-TSS distances and using proximal promoters (cluster #1) as high confidence promoters for downstream analyses.

      The genome assembly method was thorough, using two different long read methods (Pacbio and ONT) to generate the long reads for contig and scaffold construction, increasing the quality of the final assembled genome.

      Weaknesses:

      Biological replicates of facial tissue were collected at a single developmental time point of the fat-tailed dunnart within the first postnatal day (P0), and analysed in the context of similar mouse facial samples from the ENCODE consortium at six developmental time points, where previous work from the authors has shown that the younger mouse samples (E11.5-12.5) approximately correspond to the dunnart developmental stage (Cook et al. 2021). However, it would be useful to have samples from at least one older dunnart time point, for example, at a developmental stage equivalent to mouse E15.5. This would provide additional insight into the extent of accelerated face development in dunnart relative to mouse, i.e. how long do the regulatory elements that are activated early in dunnart remain active, and does their function later influence other aspects of craniofacial development?

      We thank the reviewer for their feedback and agree that the inclusion of multiple postnatal stages in the dunnart would give further valuable insights to the comparative analyses. Unfortunately, we were limited by the pouch young available and prioritized ensuring robust data at a single stage for this study. We hope to expand this work to more stages in future studies.

      The authors refer to the development of the CNS being delayed in marsupials relative to placental mammals, however, evidence shows how development of the dunnart brain (whole brain or cortex) is protracted compared to mouse, by a factor of at least 2 times, rather than delayed per se (Workman et al. 2013; Paolino et al. 2023). In addition, there is evidence that cortical formation and cell birth may begin at approximately the same stage across species equivalent to the neonate period in dunnart (E10.5 in mouse), and that shortly after this at the stage equivalent to mouse E12.5, the dunnart cortex shows signs of advanced neurogenesis followed by a protracted phase of neuronal maturation (Paolino et al. 2023). Therefore, it is possible that marsupial CNS development appears delayed relative to mouse but instead begins at the same stage and then proceeds to develop on a different timing scale.

      The comparison here is not directly between CNS development in placental and marsupials but CNS development relative to development of a subset of structures of the cranial skeleton and musculature (as first proposed by Kathleen Smith 1997). For example, Smith 1997 found that in eutherians, evagination of the telencephalon and appearance of the pigment in the eye occur before the ossification of the premaxilla, maxilla, and dentary. However, in marsupials, evagination of the telencephalon and appearance of the pigment in the eye occur concurrently with condensation of cartilage in the basicranium and the ossification of the premaxilla, maxilla, and dentary. Smith 1997 reports both a delay in the initiation of CNS development in marsupials relative to craniofacial ossification and a protraction of CNS development compared to placental mammals.

      This also highlights the challenges of correlating different staging systems between placentals and marsupials as stages determined as equivalent can change depending on which developmental events are used. The protracted development of the CNS in marsupials (Smith 1997, Workman et al. 2013; Paolino et al. 2023) still supports the hypothesis that during the short gestation period in marsupials structures required for life outside the womb in an embryonic-like state, such as the orofacial region, are likely prioritized.

      We have clarified this based on the reviewer’s feedback and added text referring to the protraction of marsupial CNS development to the Discussion section.

      [New text]: Marsupials display advanced development of the orofacial region relative to development of the central nervous system when compared to placental mammals[3,6].

      [New text]: Although development of the central nervous system is protracted in marsupials compared to placentals, marsupials have well-developed peripheral motor nerves and sensory nerves (e.g. the trigeminal) at birth [5].

      Reviewer #2 (Public review):

      This study by Cook and colleagues utilizes genomic techniques to examine gene regulation in the craniofacial region of the fat-tailed dunnart at perinatal stages. Their goal is to understand how accelerated craniofacial development is achieved in marsupials compared to placental mammals.

      The authors employ state-of-the-art genomic techniques, including ChIP-seq, transcriptomics, and high-quality genome assembly, to explore how accelerated craniofacial development is achieved in marsupials compared to placental mammals. This work addresses an important biological question and contributes a valuable dataset to the field of comparative developmental biology. The study represents a commendable effort to expand our understanding of marsupial development, a group often underrepresented in genomic studies.

      The dunnart's unique biology, characterized by a short gestation and rapid craniofacial development, provides a powerful model for examining developmental timing and gene regulation. The authors successfully identified putative regulatory elements in dunnart facial tissue and linked them to genes involved in key developmental processes such as muscle, skin, bone, and blood formation. Comparative analyses between dunnart and mouse chromatin landscapes suggest intriguing differences in deployment of regulatory elements and gene expression patterns.

      Strengths

      (1) The authors employ a broad range of cutting-edge genomic tools to tackle a challenging model organism. The data generated - particularly ChIP-seq and RNA-seq from craniofacial tissue - are a valuable resource for the community, which can be employed for comparative studies. The use of multiple histone marks in the ChIP-seq experiments also adds to the utility of the datasets.

      (2) Marsupials occupy an important phylogenetic position, but they remain an understudied group. By focusing on the dunnart, this study addresses a significant gap in our understanding of mammalian development and evolution. Obtaining enough biological specimens for these experiments was likely a big challenge that the authors were able to overcome.

      (3) The comparison of enhancer landscapes and transcriptomes between dunnarts and mice can serve as the basis of subsequent studies that will examine the mechanisms of developmental timing shifts. The authors also carried out liftover analyses to identify orthologous enhancers and promoters in mice and dunnart.

      Weaknesses and Recommendations

      (1) The absence of genome browser tracks for ChIP-seq data makes it difficult to assess the quality of the datasets, including peak resolution and signal-to-noise ratios. Including browser tracks would significantly strengthen the paper by providing further support for adequate data quality.

      We have put together an IGV session with the dunnart genome, annotation and ChIP-seq tracks. This is now available in the FigShare data repository (10.7554/eLife.103592.1).

      (2) The first two figures of the paper rely heavily on gene orthology analysis, motif enrichment, etc., to describe the genomic data generated from the dunnart. The main point of these figures is to demonstrate that the authors are capturing the epigenetic signature of the craniofacial region, but this is not clearly supported in the results. The manuscript should directly state what these analyses aim to accomplish - and provide statistical tests that strengthen confidence in the quality of the datasets.

      As this is the first epigenomic profiling for this species we performed extensive data quality control (See Supplementary Tables 2-3, 18, 20-23 and Supplementary Figures 1-3, 6-11). These figures and corresponding Supplementary Tables show the robustness of the data, including well-described metrics for assessing promoters and enhancers, GO terms relevant to craniofacial development and binding motifs for key developmental TF families.

      We have emphasised this aspect of the work more strongly in the results section, particularly in [Defining craniofacial putative enhancer- and promoter regions in the dunnart].

      (3) The observation that "promoters are located on average 106 kb from the nearest TSS" raises significant concerns about the quality of the ChIP-seq data and/or genome annotation. The results and supplemental information suggest a combination of factors, including unannotated transcripts and enhancer-associated H3K4me3 peaks - but this issue is not fully resolved in the manuscript. The authors should confirm that this is not caused by spurious peaks in the ChIP-seq analysis - and possibly improve genome annotation with the transcriptomic datasets presented in the study.

      Spurious ChIP-seq peaks could be possible as there is no “blacklisted regions” database for the dunnart to filter on, however we used a no-IP control, a stringent FDR of 0.01 and peaks had to be reproducible in two biological replicates when calling peaks - all of which should reduce the likelihood of false positives.

      H3K4me3 activity at enhancers is well-established, in particular when enhancer sequences are also bound by RNA Pol II (Koch and Andrau, 2011; Pekowska et al., 2011). However, compared to H3K4me3 activity at promoters, H3K4me3 levels at enhancers are low (Calo and Wysocka, 2013). This is in line with our observations that H3K4me3 levels at enhancers are much lower than observed at promoter regions (see Supplementary Note 2). We found that H3K4me3 peaks located closer to the TSS had a stronger peak signal (mean = 46.10) than distal H3K4me3 peaks (mean = 6.95; Wilcoxon FDR-adjusted p < 2.2 x 10<sup>-16</sup>). This suggests that although some distal promoter peaks may be due to missingness in the annotation, the majority likely represent peaks associated with enhancer regions. We have emphasized this finding more strongly in the results section:

      [New text]: H3K4me3 activity at enhancers is well-established[25,26], however, compared to H3K4me3 activity at promoters, H3K4me3 levels at enhancers are low[27]. This is in line with our observations where H3K4me3 levels at distal enhancer peaks are nearly 7 times lower than those observed at promoter regions (see SupNote2).

      (4) The comparison of gene regulation between a single dunnart stage (P1) and multiple mouse stages lacks proper benchmarking. Morphological and gene expression comparisons should be integrated to identify equivalent developmental stages. This "alignment" is essential for interpreting observed differences as true heterochrony rather than intrinsic regulatory differences.

      Given the developmental differences between eutherian and marsupial mammals, it is challenging to assign the dunnart a precise “equivalent” developmental stage in the mouse. From our morphological and developmental characterisation (see Cook et al. 2020 Nat Comms Bio), based on ossification patterns, the dunnart orofacial region on the day of birth appears to be similar to that of an E12.5 mouse embryo (just prior to the observation of ossified craniofacial bones). However, when we compared both regulatory elements and expressed genes between the dunnart at this stage (P1) and 5 developmental stages in the mouse, there was no obvious equivalent stage. For example, when we simply compare genes linked to enhancer peaks, the group with the largest intersection between dunnart and any mouse stage comprises ~500 genes that are present in dunnart and in mouse stages E10.5 and E12.5 - E15.5 (Figure 5B). When we then compare genes expressed in the dunnart to temporal gene expression dynamics during mouse development, we find that the largest overlap is with genes highly expressed at E14.5 or E15.5 in the mouse (Figure 6, Supplementary Figure 5). We have strengthened the rationale for the selected mouse stages in the comparative analyses section of the results.

      (5) The low conservation of putative enhancers between mouse and dunnart (0.74-6.77%) is surprising given previous reports of higher tissue-specific enhancer conservation across mammals. The authors should address whether this low conservation reflects genuine biological divergence or methodological artifacts (e.g., peak-calling parameters or genome quality). Comparisons with published studies could contextualize these findings.

      The reported range (0.74 - 6.77%) refers to the number of regions called as an active enhancer peak in both species (conserved activity) divided by the total number of dunnart peaks alignable to the mouse genome, which we expect to be low given sequence turnover rates and the evolutionary distance separating dunnart and mice. The alignability (conserved sequence) for dunnart enhancers to the mouse genome was ~13% for 100bp regions and can be found in Supplementary Table 22. We have now clarified this in the main text.

      [New Text]: After building dunnart-mm10 liftover chains (see Methods and SupNote5) we compared mouse and dunnart regulatory elements. The alignability (conserved sequence) for dunnart enhancers to the mouse genome was ~13% for 100bp regions (Supplementary Table 22).

      The activity conservation range reported here is consistent with that previously reported for marsupial-placental enhancer comparisons (Villar et al. 2015), where ~1% of conserved liver-specific human enhancers had conserved activity in opossum. Follow-up studies in Berthelot et al. 2018 also found that approximately 1% of human liver enhancers were conserved across the placental mammals included in the study.
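
      The two percentages discussed in this response measure different things, which a small worked example may make clearer. The Python sketch below uses hypothetical peak counts (none of the numbers are taken from the study) and illustrative names to contrast sequence alignability with activity conservation:

```python
def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100 * part / whole


# Hypothetical counts, chosen only to illustrate the two definitions.
dunnart_enhancer_peaks = 60_000   # all dunnart enhancer peaks (assumed)
alignable_to_mouse = 7_800        # peaks that lift over to the mouse genome (assumed)
active_in_both_species = 300      # alignable peaks also called as active in mouse (assumed)

# Sequence conservation: alignable peaks / all dunnart peaks.
alignability = percentage(alignable_to_mouse, dunnart_enhancer_peaks)

# Activity conservation: peaks active in both species / alignable peaks.
activity_conservation = percentage(active_in_both_species, alignable_to_mouse)

print(f"alignability: {alignability:.1f}%")
print(f"activity conservation: {activity_conservation:.2f}%")
```

      Because the denominators differ, a low activity-conservation value is compatible with a noticeably higher alignability value.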

      (6) Focusing only on genes associated with shared enhancers excludes potentially relevant genes without clear regulatory conservation. A broader analysis incorporating all orthologous genes may reveal additional insights into craniofacial heterochrony.

      We appreciate the reviewer’s comment. We understand that a broader analysis may provide some additional insights into this question; however, in this study our focus was understanding the enhancers driving craniofacial development in these species. We linked enhancers with gene expression data as additional evidence of regulatory programs involved in craniofacial development. The majority (~70%) of genes reproducibly expressed were linked to an active enhancer and/or promoter. This has now been highlighted in the results section.

      [New Text]: There were 12,153 genes reproducibly expressed at a level > 1 TPM across three biological replicates, with the majority of expressed genes (67%; 8158/12153) associated with an active enhancer and/or promoter peak.

      In conclusion, this study provides an important dataset for understanding marsupial craniofacial development and highlights the potential of genomic approaches in non-traditional model organisms. However, methodological limitations, including incomplete genome annotation and lack of developmental benchmarking, weaken the robustness of the findings. Addressing these issues would significantly enhance the study's utility to the field and its ability to support the study's central conclusion that dunnart-specific enhancers drive accelerated craniofacial development.

      Reviewer #1 (Recommendations for the authors):

      Minor comments and corrections:

      (1) ChIP-seq FRiP fractions were much higher in dunnart samples than in mouse. Is this related to any differences in sample preparation they are aware of in the ENCODE datasets of mouse, such as different anti-histone antibodies used (and therefore different efficiency of binding to the same histone markers across species)? The authors appear to have addressed something similar with respect to the much lower enriched peak number observed in the mouse sample relative to dunnart in Supp note 4. I suspect the "technical cofounder" they refer to there is affecting both the FRiP scores and the higher correlation coefficients between IP and input in mouse.

      We chose the same antibodies used in the mouse craniofacial tissue ENCODE experiments; however, the procedure was slightly different. We used the MAGnify Chromatin Immunoprecipitation System, while the ENCODE assays, performed by Bing Ren’s group in 2012, used an in-house lab protocol for MicroChIP. Given that the samples for mouse and dunnart were not processed together, by the same researcher, with the same protocol, there could be any number of technical confounders impacting enrichment. A low FRiP score suggests low specificity as the majority of reads are in non-specific regions (low enrichment), consistent with the higher correlation between IP and input in mouse. The data quality also appears to vary between H3K27ac and H3K4me3 in the mouse (Supplementary Table 21), with H3K4me3 FRiP scores more similar to those observed in our dunnart experiments. This suggests a potential confounder specific to the mouse H3K27ac IP. QC metrics (FRiP, bam correlation) are consistent between H3K27ac and H3K4me3 IPs in our experiments (Supplementary Table 20).
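
      For readers less familiar with the metric, FRiP (fraction of reads in peaks) is the share of mapped reads that overlap called peaks. A minimal Python sketch follows; the function name and the read counts are hypothetical and are not values from either dataset:

```python
def frip(reads_in_peaks: int, total_mapped_reads: int) -> float:
    """Fraction of Reads in Peaks: mapped reads overlapping peaks / all mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    if not 0 <= reads_in_peaks <= total_mapped_reads:
        raise ValueError("reads_in_peaks must lie between 0 and total_mapped_reads")
    return reads_in_peaks / total_mapped_reads


# Hypothetical libraries of equal depth, for illustration only.
well_enriched = frip(reads_in_peaks=6_000_000, total_mapped_reads=20_000_000)
poorly_enriched = frip(reads_in_peaks=400_000, total_mapped_reads=20_000_000)

print(f"well enriched:   FRiP = {well_enriched:.2f}")
print(f"poorly enriched: FRiP = {poorly_enriched:.2f}")
```

      In the poorly enriched case most reads fall outside peaks, so the IP signal resembles the input, which is the pattern described for the mouse H3K27ac data.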

      (2) Some of the promoter peak numbers in Supp table 1 do not match the numbers in the main text.

      We have corrected the incorrect number reported in the text for promoter peaks with orthologous genes (8590 -> 8597).

      (3) In Supp tables 2 and 3, the number of GO terms similar across tables is 466, which is ~42% of total number of enriched GO terms. However the authors mention that only 23% of terms were the same between promoters and enhancers, and a value of 42% was applied to the proportion of terms uniquely enriched for terms associated with genes assigned to promoters only. Unless I'm reading these Supp tables incorrectly, is it possible the proportions were mixed up?

      Thanks for catching this. The lists provided in Supplementary Table 2 were incorrect. The Supplementary Tables and in-text description have been corrected to reflect this.

      (4) Would be helpful to add a legend for the mouse samples in Supp Figure 10.

      We have added the labels to the plot.

      (5) In Supp note 5, regarding the percentage of alignable peaks recovered, the percentages mentioned for the 50bp and 500bp peak summit lengths for enhancers and promoters do not seem to match the values in Supp tables 22 and 23.

      Thank you for catching this - we have corrected the Supplementary Tables and the text.

      (6) Please provide additional information to explain how dunnart RNA expression was associated with the five temporal expression clusters found in the mouse data shown in Figure 6 given there is only one dunnart time point and so the species' temporal patterns could not be compared, i.e. how was the odds ratio calculated and was this applied iteratively for dunnart against each mouse age and within each temporal cluster?

      The TCseq package takes the mouse expression data across all 6 stages and calls differentially expressed genes with an absolute log<sub>2</sub> fold-change > 2 compared to the starting time-point (E10.5). The mouse gene expression patterns were clustered into 5 clusters that each show distinct temporal expression patterns (see Supplementary Figure 5D). The output is 5 lists, each containing unique genes that share a temporal pattern. These lists of mouse genes were then each compared to the orthologous genes expressed in the dunnart using Fisher’s Exact Test, with corrections for multiple testing using the Holm method. We have added additional details in the methods:

      [New text]: Orthologous genes reproducibly expressed >1 TPM in the dunnart were compared to the list of genes for each cluster using Fisher’s Exact Test followed by p-value corrections for multiple testing with the Holm method.
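
      The per-cluster comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the analysis code used in the study: it assumes a one-sided (enrichment) Fisher’s exact test, and the function names, gene-universe size, and cluster counts are all hypothetical.

```python
from math import comb


def fisher_enrichment_p(overlap: int, cluster_size: int,
                        expressed_size: int, universe_size: int) -> float:
    """One-sided Fisher's exact test: P(overlap >= observed) under the
    hypergeometric null (assumed alternative: enrichment)."""
    max_overlap = min(cluster_size, expressed_size)
    total = comb(universe_size, expressed_size)
    tail = sum(
        comb(cluster_size, k) * comb(universe_size - cluster_size, expressed_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def holm_adjust(p_values: list[float]) -> list[float]:
    """Holm step-down adjustment, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


# Hypothetical inputs: 1,000 orthologous genes, 600 expressed in dunnart,
# and five mouse temporal clusters given as (cluster size, overlap).
universe, expressed = 1_000, 600
clusters = {"c1": (100, 75), "c2": (120, 70), "c3": (80, 50),
            "c4": (150, 110), "c5": (90, 52)}

raw = [fisher_enrichment_p(ov, size, expressed, universe)
       for size, ov in clusters.values()]
for name, p_raw, p_adj in zip(clusters, raw, holm_adjust(raw)):
    print(f"{name}: raw p = {p_raw:.3g}, Holm-adjusted p = {p_adj:.3g}")
```

      Each cluster is tested once against the single dunnart gene list, so the Holm correction here covers five tests.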

      (7) SupFile1 and SupFile2 - which supplementary note or figure are these referring to?

      Apologies for this error. These items were meant to link to the FigShare repository where the supplementary files can be found. We have corrected this using the DOI for the repository.

      Reviewer #2 (Recommendations for the authors):

      (1) Authors should clarify that the mouse ENCODE data used for the comparisons was obtained from craniofacial tissue.

      This has now been corrected to clarify that the mouse ENCODE data used was from craniofacial tissues. ENCODE mouse embryonic facial prominence ChIP-seq and gene expression quantification file accession numbers and details used in study can be found in Supplementary Table 17.

      (2) Given the large differences in TPM for highly expressed genes shown in Figure 5, a MA or volcano plot would provide a more comprehensive view of global transcriptome differences between species.

      We have added this plot as Supplementary Figure 13.

      (3) It is unclear whether the enrichment analysis was performed for mouse genes, dunnart genes, or both.

      In reference to Figure 5, Gene Ontology enrichment analysis was performed on the top 500 highly expressed genes in dunnart. Because there is not an ontology database for dunnart gene IDs, these top 500 dunnart gene IDs were converted to the orthologous gene ID in mouse before performing the enrichment analysis. We apologise for the lack of clarity and have added additional text in the results section to make this clearer. In addition, the relevant methods section now reads:

      [New text]: As there is no equivalent gene ontology database for dunnart, we converted the Tasmanian devil RefSeq IDs to Ensembl v103 using biomaRt v2.46.3 and then converted these to mouse Ensembl v103 IDs. In this way we were able to use the mouse Ensembl Gene Ontology annotations for the dunnart gene domains. All gene ontology analyses were performed using clusterProfiler v4.1.4[117], with Gene Ontology from the org.Mm.eg.db v3.12.0 database[118], setting an FDR-corrected p-value threshold of 0.01 for statistical significance.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Recommendation For the Authors):

      Thanks to the authors for addressing my suggestions. I think these modifications have improved the clarity of the data and the overall presentation of the manuscript. The methods are now more clearly explained, and the additional details help make the results easier to interpret. Where addressing the comment wasn't feasible, the authors gave reasonable explanations. Overall, the revisions strengthen the paper, and I have no further concerns.

      Thank you for your recommendations, which have significantly improved our paper.

      Reviewer #2 (Recommendation For the Authors):

      The additional work conducted by the authors is greatly appreciated. All concerns (and beyond) have been thoroughly addressed by the authors and I am thankful for their consideration and attention to detail. Only one possible issue with the revisions is described below for consideration:

      Regarding the CFU counts and/or axis labels in Figure S3B, some of the listed "CFU per 1 mL" values (in both the figure itself and File S2B) are extraordinarily high. For example, the greatest CFU for PA14 observed in Figure 4E is ~1x10^9. However, PA14 at 0 μg/mL Ceftazidime reaches nearly 1x10^16 in Figure S3B. From what I can tell, this should be beyond the capacity of bacteria in this space by several orders of magnitude. (E.g., a cubic centimeter [~1 mL] is ~1x10^12 cubic micrometers. At their smallest dimensions and volume, a maximum of ~1x10^13 cells could theoretically fit in this space assuming no liquid and perfect organization.) Similarly, both "AMM" and "AMM (+PA14)" consistently reach CFUs between 1x10^12 and 1x10^14 in this assay. Are the authors confident in the values and/or depiction of CFUs for this figure? It seems like this could be a labeling or dilution/counting issue.

      Thank you for your positive remarks on our revised manuscript and for your constructive comments that have strengthened our work.

      We agree with the concern regarding the CFU counts in Figure S3B. The very high values (>10<sup>12</sup> CFU) reflect a technical enumeration artifact that, due to the nature of the assay, cannot be fully avoided. The origin of these inflated counts is described in more detail below:

      Following competition assays between Pseudomonas aeruginosa and Stenotrophomonas maltophilia in liquid culture with antibiotics, we enumerate survivors for each species by colony forming unit (CFU) counts. Because two different bacterial species must be quantified from mixed cultures, we use a gentamicin resistance marker carried by one species at a time.

      Each condition is therefore enumerated twice, as we alternate which species harbors the gentamicin cassette.

      During coculture in antibiotics and minimal medium, clinical isolates of P. aeruginosa and S. maltophilia, like those used here, can transiently increase their tolerance to antibiotics, including aminoglycosides. This reduces the effectiveness of gentamicin selection at the plating step necessary for CFU enumeration. For the data presented in Figure S3B, in a subset of high-OD₆₀₀ conditions in the competition assay, this tolerance produces, during the CFU enumeration step, artificially inflated CFU values that exceed the biological carrying capacity.

      We evaluated alternative enumeration strategies (e.g., fluorescent protein markers with a nonselective medium), but these proved unsuitable for these strains due to differences in growth rates and media compatibility, introducing other large biases. Given these constraints, selective plating remains the only feasible approach for this work, and the associated artifact cannot be eliminated entirely.

      Importantly, transient resistance (tolerance), although common, is not a universal occurrence (e.g., we did not observe it when we performed the experiments shown in Figure 4E). When it does arise, it occurs reproducibly under the same experimental high-OD<sub>600</sub> conditions and does not obscure any of the relative comparisons that underpin our conclusions.

      For transparency, we have retained the measured values in Figure S3B and we note in the legend that counts above ~10<sup>12</sup> CFU represent a technical overestimation due to transient gentamicin tolerance. Counts below 10<sup>12</sup> CFU are accurately enumerated.
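
      The plausibility argument in this exchange is simple arithmetic, sketched below in Python. The cell volume of 0.1 μm³ is an assumption implied by the reviewer’s ~1x10^13 estimate, the two counts are the approximate values quoted in the reviewer’s comment, and all names are illustrative:

```python
UM3_PER_ML = 1e12               # 1 mL = 1 cm^3 = (1e4 µm)^3 = 1e12 µm^3
OVERESTIMATE_THRESHOLD = 1e12   # CFU per mL; counts above this are flagged in the legend


def max_packed_cells_per_ml(cell_volume_um3: float) -> float:
    """Upper bound on cells per mL assuming perfect packing and no liquid."""
    if cell_volume_um3 <= 0:
        raise ValueError("cell_volume_um3 must be positive")
    return UM3_PER_ML / cell_volume_um3


def is_overestimate(cfu_per_ml: float) -> bool:
    """True if a count exceeds the threshold noted in the figure legend."""
    return cfu_per_ml > OVERESTIMATE_THRESHOLD


# Assumed minimal cell volume (µm^3); not a measured value.
packing_limit = max_packed_cells_per_ml(0.1)

# Approximate counts quoted in the reviewer's comment.
approximate_counts = {"fig_4e_pa14_max": 1e9, "fig_s3b_pa14_no_drug": 1e16}
flags = {label: is_overestimate(value) for label, value in approximate_counts.items()}

print(f"packing limit: ~{packing_limit:.0e} cells per mL")
for label, flagged in flags.items():
    print(f"{label}: {'overestimate' if flagged else 'within range'}")
```

      Under these assumptions the Figure S3B value exceeds even the theoretical packing limit, which is why it is treated as an enumeration artifact rather than a biological measurement.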

      Reviewer #3 (Recommendation For the Authors):

      All concerns have been satisfied and the manuscript is ready for publishing.

      Thank you for your recommendations, which have significantly improved our paper.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The study would benefit from presenting raw data in some cases, such as MIC values and SDS-PAGE gels, by clarifying the number of independent experiments used, as well as further clarification on statistical significance for some of the data.

      All original data used to generate Fig. 1, Fig. 4E, Fig. S3 and Fig. S4A are presented in File S2. Tab (A) is dedicated to data used for Fig. 1 and Fig. S4A, while tabs (B) and (C) show the data used for Fig. 4E and S3, respectively. This information is indicated in the legends of the relevant figures.

      All experiments in this study were performed in three independent (biological) experiments (with the exception of the complementation data shown in Fig. S1 and Fig. S5, which were performed in two independent (biological) experiments). The number of biological and technical replicates for each experiment is stated in the figure legends, as well as in the “Statistical analysis of experimental data” part of the “Materials and Methods” section of the paper. Specifically, for antibiotic MIC assays we have not performed statistical analyses as per recommended practice. The reason for this is stated in the following section from the “Statistical analysis of experimental data” part of the “Materials and Methods” section of the paper (lines 699-711 of the revised manuscript):

      “Antibiotic MIC values were determined in biological triplicate, except for MIC values recorded for dsbA complementation experiments in our E. coli K-12 inducible system that were carried out in duplicate. All ETEST MICs were determined as a single technical replicate, and all BMD MICs were determined in technical triplicate. All recorded MIC values are displayed in the relevant graphs; for MIC assays where three or more biological experiments were performed, the bars indicate the median value, while for assays where two biological experiments were performed the bars indicate the most conservative of the two values (i.e., for increasing trends, the value representing the smallest increase and for decreasing trends, the value representing the smallest decrease). We note that in line with recommended practice, our MIC results were not averaged. This should be avoided because of the quantized nature of MIC assays, which only inform on bacterial survival for specific antibiotic concentrations and do not provide information for antibiotic concentrations that lie in-between the tested values.”
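
      The summary rule quoted above can be expressed compactly. The Python sketch below is one reading of that rule, not the authors’ code: the function name is hypothetical, the comparator MIC used to define the “trend” is an assumption, and distances are taken on a log2 scale because MICs are typically tested as two-fold dilutions.

```python
from math import log2
from statistics import median


def summarise_mic(values: list[float], comparator_mic: float) -> float:
    """Summarise replicate MICs without averaging.

    Three or more replicates: the median (for an odd number of replicates,
    always one of the measured values). Two replicates: the more conservative
    value, i.e. the one closest to the comparator MIC on a log2 scale.
    """
    if len(values) >= 3:
        return median(values)
    if len(values) == 2:
        return min(values, key=lambda v: abs(log2(v) - log2(comparator_mic)))
    raise ValueError("at least two biological replicates are required")


# Hypothetical MICs in µg/mL, for illustration only.
triplicate = summarise_mic([2, 4, 4], comparator_mic=1)       # median of three
duplicate_up = summarise_mic([8, 16], comparator_mic=2)       # smallest increase
duplicate_down = summarise_mic([0.5, 0.25], comparator_mic=2) # smallest decrease

print(triplicate, duplicate_up, duplicate_down)
```

      Returning a measured value rather than a mean respects the quantized nature of the assay described in the quoted passage.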

      Reviewer #2 (Public review):

      While Figure 5E demonstrates a protective effect of DsbA-dependent β-lactamase, the omission of CFU data for S. maltophilia makes it difficult to assess the applicability of the polymicrobial strategy. Since S. maltophilia is pre-cultured prior to the addition of P. aeruginosa and antibiotics, it is unclear whether the protective effect is dependent on high S. maltophilia CFU. It is also unclear what the fate of the S. maltophilia dsbA dsbL mutant is under these conditions. If DsbA-deficient S. maltophilia CFU is not impacted, then this treatment will result in the eradication of only one of the pathogens of interest. If the mutant is lost during treatment, then it is not clear whether the loss of protection is due specifically to the production of non-functional β-lactamase or simply the absence of S. maltophilia.

      We have simultaneously tracked the abundance of P. aeruginosa and S. maltophilia strains in our cross-protection experiment for select antibiotic concentrations. To be able to perform this experiment, we had to label two extremely-drug-resistant strains of S. maltophilia with an antibiotic resistance marker that allowed us to quantify them in mixtures with P. aeruginosa. Our results can be found in Fig. S3 of our revised manuscript and, in a nutshell, show that ceftazidime treatment leads to eradication of both P. aeruginosa and S. maltophilia when disulfide bond formation is impaired in S. maltophilia.

      The following text was added to address the questions of the reviewer:

      “Due to the naturally different growth rates of these two species (S. maltophilia grows much slower than P. aeruginosa) especially in laboratory conditions, the protocol we followed [1] requires S. maltophilia to be grown for 6 hours prior to co-culturing it with P. aeruginosa. To ensure that at this point in the experiment our two S. maltophilia strains, with and without dsbA, had grown comparatively to each other, we determined their cell densities (Fig. S3A). We found that S. maltophilia AMM dsbA dsbL had grown at a similar level as the wild-type strain, and both were at a higher cell density [~10<sup>7</sup> colony forming units (CFUs)] compared to the P. aeruginosa PA14 inoculum (5 x 10<sup>4</sup> CFUs)” (lines 353-361 of the revised manuscript).

      “To ensure that ceftazidime treatment leads to eradication of both P. aeruginosa and S. maltophilia when disulfide bond formation is impaired in S. maltophilia, we monitored the abundance of both strains in each synthetic community for select antibiotic concentrations (Fig. S3B). In this experiment we largely observed the same trends as in Fig. 4E. At low antibiotic concentrations, for example 4 μg/mL of ceftazidime, S. maltophilia AMM is fully resistant and thrives, thus outcompeting P. aeruginosa PA14 (dark pink and dark blue bars in Fig. S3B). The same can also be seen in Fig. 4E, whereby decreased P. aeruginosa PA14 CFUs are recorded. By contrast S. maltophilia AMM dsbA dsbL already displays decreased growth at 4 μg/mL of ceftazidime because of its non-functional L1-1 enzyme, allowing comparatively higher growth of P. aeruginosa (light pink and light blue bars in Fig. S3B). Despite the competition between the two strains, P. aeruginosa PA14 benefits from S. maltophilia AMM’s high hydrolytic activity against ceftazidime, which allows it to survive and grow in high antibiotic concentrations even though it is not resistant (see 128 μg/mL; dark pink and dark blue bars in Fig. S3B). In stark opposition, without its disulfide bond in S. maltophilia AMM dsbA dsbL, L1-1 cannot confer resistance to ceftazidime, resulting in killing of S. maltophilia AMM dsbA dsbL and, consequently, also of P. aeruginosa PA14 (see 128 μg/mL; light pink and light blue bars in Fig. S3B).

      The data presented here show that, at least under laboratory conditions, targeting protein homeostasis pathways in specific recalcitrant pathogens has the potential to not only alter their own antibiotic resistance profiles (Fig. 3 and 4A-D), but also to influence the antibiotic susceptibility profiles of other bacteria that co-occur in the same conditions (Fig. 5). Admittedly, the conditions in a living host are too complex to draw direct conclusions from this experiment. That said, our results show promise for infections, where pathogen interactions affect treatment outcomes, and whereby their inhibition might facilitate treatment” (lines 381-406 of the revised manuscript).

      The alleged clinical relevance and immediate, theoretical application of this approach should be properly contextualized. At multiple junctures, the authors state or suggest that interactions between S. maltophilia and P. aeruginosa are known to occur in disease or have known clinical relevance related to treatment failure and disease states. For instance, the citations provided for S. maltophilia protection of P. aeruginosa in the CF lung environment both describe simplified laboratory experiments rather than clinical or in vivo observations. Similarly, the citations provided for both the role of S. maltophilia in treatment failure and CF disease severity do not support either claim. The role of S. maltophilia in CF is currently unsettled, with more recent work reporting conflicting results that support S. maltophilia as a marker, rather than cause, of severe disease. These citations also do not support the suggestion that S. maltophilia specifically contributes to treatment failure. While it is reasonable to pursue these ideas as a hypothesis or potential concern, there is no evidence provided that these specific interactions occur in vivo or that they have clinical relevance.

      Thank you for your comment. You are entirely correct. We have amended the text throughout our revised manuscript to avoid overstating the role of S. maltophilia in CF infections and to reference additional relevant works in the literature. Please find below representative examples of such passages:

      “On the other hand, CF microbiomes are increasingly found to encompass S. maltophilia [2-4], a globally distributed opportunistic pathogen that causes serious nosocomial respiratory and bloodstream infections [5-7]. S. maltophilia is one of the most prevalent emerging pathogens [6] and it is intrinsically resistant to almost all antibiotics, including β-lactams like penicillins, cephalosporins and carbapenems, as well as macrolides, fluoroquinolones, aminoglycosides, chloramphenicol, tetracyclines and colistin. As a result, the standard treatment option for lung infections, i.e., broad-spectrum β-lactam antibiotic therapy, is rarely successful in countering S. maltophilia [7,8], creating a definitive need for approaches that will be effective in eliminating both pathogens” (lines 33-41 of the revised manuscript).

      “Of the organisms studied in this work, S. maltophilia deserves further discussion because of its unique intrinsic resistance profile. The prognosis of CF patients with S. maltophilia lung carriage is still debated [4,9-16], largely because studies with extensive and well-controlled patient cohorts are lacking. This notwithstanding, the therapeutic options against this pathogen are currently limited to one non-β-lactam antibiotic-adjuvant combination, which is not always effective, trimethoprim-sulfamethoxazole [17-20], and a few last-line β-lactam drugs, like the fifth-generation cephalosporin cefiderocol and the combination aztreonam-avibactam. Resistance to commonly used antibiotics causes many problems during treatment and, as a result, infections that harbor S. maltophilia have high case fatality rates [7]. This is not limited to CF patients, as S. maltophilia is a major cause of death in children with bacteremia [5]” (lines 440-450 of the revised manuscript).

      Reviewer #3 (Public review):

      The impact of the work can be strengthened by demonstrating increased efficacy of antibiotics in mice models or wound models for Pseudomonas infections. Worm models are relevant, but still distant from investigations in animal models.

      Thank you for this comment. We appreciate the sentiment, and we would have liked to be able to perform experiments in a murine model of infection. There are several reasons that made this not possible, and as a result we used G. mellonella as an informative preliminary in vivo infection model. The DSB proteins have been shown to play a central role in bacterial virulence. Because of this, our P. aeruginosa and S. maltophilia mutant strains are not efficient in establishing an infection, even in a wound model. This could be overcome had we been able to use the chemical inhibitor of the DSB system in vivo; however, this is also not possible. This is due to the fact that the chemical compound that we use to inhibit the function of DsbA acts on DsbB. Inhibition of DsbB blocks the re-oxidation of DsbA and leads to its accumulation in its inactive reduced form. However, the action of the inhibitor can be bypassed through reoxidation and re-activation of DsbA by small-molecule oxidants such as L-cystine, which are abundant in rich growth media or animal tissues. This makes the inhibitor only suitable for in vitro assays that can be performed in minimal media, where the presence of small-molecule oxidants can be strictly avoided, but entirely unsuitable for an insect or a vertebrate animal model.

      Reviewer #1 (Recommendation For the Authors):

      (1) The analysis of the role of DsbA in the assembly of cysteine-containing β-lactamases is a significant finding. However, in addition to showing the MIC fold difference, I think, it would be important to show the raw data for the actual MIC values obtained for each β-lactamase enzyme/antibiotic combination and in both strains (+ and - dsbA).

      Also, can the authors clarify whether these experiments were conducted on 3 independent samples (there seems to be some contradicting information in the paper and the supplementary figures). If possible, I would also recommend showing in the figure whether the MIC differences observed were statistically significant.

      All original data used to generate Fig. 1, Fig. 4E, Fig. S3 and Fig. S4A are presented in File S2. Tab (A) is dedicated to data used for Fig. 1 and Fig. S4A, while tabs (B) and (C) show the data used for Fig. 4E and S3, respectively. This information is indicated in the legends of the relevant figures.

      All experiments in this study were performed in three independent (biological) experiments (with the exception of the complementation data shown in Fig. S1 and Fig. S5, which were performed in two independent (biological) experiments). The number of biological and technical replicates for each experiment is stated in the figure legends, as well as in the “Statistical analysis of experimental data” part of the “Materials and Methods” section of the paper. Specifically, for antibiotic MIC assays we have not performed statistical analyses as per recommended practice. The reason for this is stated in the following section from the “Statistical analysis of experimental data” part of the “Materials and Methods” section of the paper (lines 699-711 of the revised manuscript):

      “Antibiotic MIC values were determined in biological triplicate, except for MIC values recorded for dsbA complementation experiments in our E. coli K-12 inducible system that were carried out in duplicate. All ETEST MICs were determined as a single technical replicate, and all BMD MICs were determined in technical triplicate. All recorded MIC values are displayed in the relevant graphs; for MIC assays where three or more biological experiments were performed, the bars indicate the median value, while for assays where two biological experiments were performed the bars indicate the most conservative of the two values (i.e., for increasing trends, the value representing the smallest increase and for decreasing trends, the value representing the smallest decrease). We note that in line with recommended practice, our MIC results were not averaged. This should be avoided because of the quantized nature of MIC assays, which only inform on bacterial survival for specific antibiotic concentrations and do not provide information for antibiotic concentrations that lie in-between the tested values.”
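      The summary rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis code: the function name `summarize_mic` and its `trend` parameter are hypothetical, the inputs are assumed to be MIC readings of the test strain (so the "smallest decrease" is the higher of two readings and the "smallest increase" the lower), and `median_low` is chosen here so that the reported value is always a concentration that was actually tested, which matters only when an even number of four or more replicates is supplied.

```python
from statistics import median_low


def summarize_mic(mics, trend):
    """Pick the single MIC value to display for a set of biological replicates.

    mics  -- MIC readings (e.g. in ug/mL), one per biological experiment
    trend -- "decrease" or "increase": direction of change relative to the
             comparator strain (hypothetical parameter, used for duplicates)
    """
    if trend not in ("decrease", "increase"):
        raise ValueError("trend must be 'decrease' or 'increase'")
    if len(mics) >= 3:
        # Median of the replicates; median_low never averages two readings,
        # so the result is always one of the measured values.
        return median_low(mics)
    if len(mics) == 2:
        # Most conservative of two values:
        # smallest decrease -> higher MIC, smallest increase -> lower MIC.
        return max(mics) if trend == "decrease" else min(mics)
    raise ValueError("at least two biological replicates are required")
```

      For example, three readings of 4, 16 and 8 would be displayed as 8, whereas a duplicate of 4 and 8 would be displayed as 8 for a decreasing trend and as 4 for an increasing one.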

      (2) For Figure 2A, can the authors provide the full Westerns and ideally the SDS-PAGE gel corresponding to the Westerns where the β-lactamases and the control DNA-K were detected?

      Thank you for this comment. Full immunoblots and SDS PAGE analysis of the immunoblot samples for total protein content are shown in File S3 of our revised manuscript.

      (3) For the enzymatic assays, was the concentration of enzyme used "normalised" based on the amount detected in the westerns where possible, or was only the total amount of protein considered? When similar amounts of enzyme were added, was the activity still compromised?

      The β-lactam hydrolysis assay was normalized based on the weight of the cell pellets (wet cell pellet mass) of the tested strains. This means that for each enzyme expressed in cells with and without DsbA, strains were normalized to the same weight to volume ratio, and thus strains expressing the same enzyme were only compared to each other.

      Because enzyme degradation in the absence of DsbA is a key factor underlying the effects we describe for most of the tested β-lactamases (see Fig. 2A and S4A; no protein band is detected for 5 of the 7 enzymes in the dsbA mutant), it was not possible to normalize our samples based on enzyme levels detected by immunoblot. Normalization based on enzyme amounts would be feasible had we purified each β-lactamase after expression in the two different strain backgrounds (+/- dsbA) assuming sufficient protein amounts could be isolated from the dsbA mutant strain. Nonetheless, we feel that such a comparison would be misleading, since enzyme degradation likely plays the biggest role in the lack of activity observed for most of these enzymes in the absence of DsbA.

      (4) Not sure whether Fig 3 is very informative. Perhaps it could be redesigned to better encapsulate the findings in this manuscript (combine figures 3 and 6 into one). I would also include the chemical structure of the inhibitors used and perhaps include how they block the system by binding to DsbB.

      Thank you for this comment. Fig. 3 was combined with Fig. 6 of the submitted manuscript. The new model figure is Fig. 5 in our revised manuscript.

      The inhibitor compound used in our study has been extensively characterized in a previous publication [21]. Considering that this inhibitor is not the main focus of our paper, we have avoided showing its chemical structure in any of the main display items. That said, its structure can be found in File S5 of our revised manuscript, which contains the quality control information on this compound. As suggested, we included the following sentence to describe the mode of action of this inhibitor: “Compound 36 was previously shown to inhibit disulfide bond formation in P. aeruginosa via covalently binding onto one of the four essential cysteine residues of DsbB in the DsbA-DsbB complex [21]” (lines 309-311 of the revised manuscript).

      (5) Figure 4: Similar to my comment above showing in the figure whether the differences observed in Figure 4, particularly A-C, are statistically significant (i.e. galleria survival difference in the presence and absence of dsbA) would be beneficial.

      As mentioned in our answer to comment 1 above, we have not performed statistical analyses for antibiotic MIC assays because, in line with recommended practice, our MIC results were not averaged (Fig. 3A,B,D,E of our revised manuscript). This should be avoided because of the quantized nature of MIC assays, which only inform on bacterial survival for specific antibiotic concentrations and do not provide information for antibiotic concentrations that lie in-between the tested values. Statistical analysis of G. mellonella survival data (Fig. 3C,F of our revised manuscript) was performed and is described fully in the legend of Fig. 3, as well as in the “Statistical analysis of experimental data” part of the “Materials and Methods” section of the paper (lines 729-738 of the revised manuscript). Finally, the statistical analyses for the most important comparisons in panels (C) and (F) of Fig. 3 are also marked directly on the figure.

      (6) Were the authors able to test the redox state of DsbA upon addition of the DsbB inhibitor to further demonstrate that the effects observed were indeed due to the obstruction of the Dsb machinery and not due to off-target effects?

      Thank you for the opportunity to clarify this. In previous work from our lab, we have used a DSB system inhibitor termed “compound 12” in [22] with activity against DsbB proteins from Enterobacteria. In our previous study [23] we, indeed, tested the redox state of DsbA in the presence of this inhibitor compound. We could not perform the same experiment here with “compound 36” from [21], because we do not have an antibody against the DsbA protein of S. maltophilia. That said, we have carried out experiments that confirm that our results are due to specific inhibition of the DSB system and not because of off-target effects. In particular, we show that the gentamicin MIC values of S. maltophilia AMM remain unchanged in the presence of the inhibitor and treatment of S. maltophilia AMM dsbA dsbL with the compound does not affect its colistin MIC value (Fig. S2E and lines 317-320 of the revised manuscript).

      (7) Given the remarkable effects shown by the DsbB inhibitor, did the authors use this compound to assess whether inhibition of the Dsb system with small molecules would block cross-resistance in S. maltophilia - P. aeruginosa mixed communities (Fig 5D)?

      Unfortunately, this was not possible. The decrease in the ceftazidime MIC value of S. maltophilia AMM in the presence of the DSB inhibitor compound is more modest than the effects we observed when the dsbA dsbL mutant is used (compare Fig. 4D (left) with Fig. 4A of the revised manuscript). This means that in the presence of the DSB inhibitor there are still sufficient amounts of functional β-lactamase present and we expect that they would contribute to cross-protection of P. aeruginosa. While the use of the DSB inhibitor does have a drastic impact on the colistin resistance profile of S. maltophilia AMM (Fig. 4D of the revised manuscript), unlike β-lactamases, which act as common goods, MCR enzymes act solely on the lipopolysaccharide of their producer and do not contribute to bacterial interactions, precluding the use of colistin for a cross-protection experiment.

      Reviewer #2 (Recommendation For the Authors):

      (1) The acronym used for synthetic cystic fibrosis sputum medium (lines 523, 531, 535, 601, and 603) is defined in the manuscript as 'SCF', but the common formulation is 'SCFM', including in the provided citation. Suggest changing to SCFM for consistency.

      Thank you for this comment. This has been amended throughout our revised manuscript.

      (2) In Figure 1, while the legend states that "No changes in MIC values are observed for strains harboring the empty vector control (pDM1)[...]" (lines 729-30), the median of ceftazidime in the pDM1 control appears to indicate a 2-fold decrease in MIC. This would not seem to significantly impact the other results since the MIC decreases observed for other conditions are all 3-fold or greater, but this should be addressed and/or explained in the text.

      You are correct. Thank you for the opportunity to clarify this. Since MIC assays have a degree of variability, we have only followed decreases in MIC values that are greater than 2-fold. For most of our controls, the recorded MIC fold changes are below 2-fold. The only exception to this is the ceftazidime MIC drop of the empty-vector control, showing a 2-fold change, which we do not consider significant.

      To ensure that this is clear in our text and figure legends the following changes were made:

      The clause “only differences larger than 2-fold were considered” was added to the text (lines 110-111 of the revised manuscript).

      We amended the legend of Fig. 1 accordingly: “No changes in MIC values are observed for the aminoglycoside antibiotic gentamicin (white bars) confirming that absence of DsbA does not compromise the general ability of this strain to resist antibiotic stress. Minor changes in MIC values (≤ 2-fold) are observed for strains harboring the empty vector control (pDM1) or those expressing the class A β-lactamases L2-1 and LUT-1, which contain two or more cysteines (Table S1), but no disulfide bonds (top row)”.
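      As an illustration of the 2-fold cutoff described above, the following Python sketch computes the fold decrease between two MIC readings and flags it only when it exceeds the cutoff. It is a hedged example rather than the authors' procedure: the function names and the `cutoff` parameter are hypothetical, and the fold decrease is assumed to be the MIC of the reference (parent) strain divided by the MIC of the test (dsbA mutant) strain.

```python
def mic_fold_decrease(reference_mic, test_mic):
    """Fold decrease in MIC of the test strain relative to the reference.

    Both values are MIC readings in the same units (e.g. ug/mL).
    """
    if reference_mic <= 0 or test_mic <= 0:
        raise ValueError("MIC values must be positive")
    return reference_mic / test_mic


def is_considered(reference_mic, test_mic, cutoff=2.0):
    """True only if the decrease is strictly larger than the cutoff."""
    return mic_fold_decrease(reference_mic, test_mic) > cutoff
```

      Under these assumptions, a drop from 16 to 8 μg/mL is exactly 2-fold and is not flagged, which matches the treatment of the empty-vector ceftazidime control, whereas a drop from 16 to 4 μg/mL (4-fold) is flagged.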

      (3) Similarly, in Fig S1E, there appears to be only partial complementation for BPS-1m. Do the authors hypothesize that this observation is related to a folding defect, rather than degradation of protein, as described for BPS-1m for Figure 2?

      Thank you for the opportunity to clarify this. You are correct that we only achieve partial complementation for the E. coli strain expressing the BPS-1m enzyme from the Burkholderia complex. Despite the fact that the gene for this enzyme was codon optimized, we observed that its expression in E. coli is sub-optimal and incurs fitness effects. In fact, to record the data presented in our manuscript the E. coli strains had to be transformed anew every time. Considering that the related enzyme BPS-6 does not present any of these challenges, we attribute the partial complementation to technical difficulties with the expression of the bps-1m gene in E. coli. 

      We clarified this by adding the following clause to our manuscript: “we only achieve partial complementation for the dsbA mutant expressing BPS-1m, which we attribute to the fact that expression of this enzyme in E. coli is sub-optimal” (lines 132-134 of the revised manuscript).

      (4) Lines 204-206: "[...]we deleted the principal dsbA gene, dsbA1 (pathogenic bacteria often encode multiple DsbA analogues [24,25]), in several multidrug-resistant (MDR) P. aeruginosa clinical strains (Table S2)". That multiple DsbA analogues are often encoded is good information to provide, but it was unclear from quickly looking at the citations whether Pa is counted among these. Is it expected that all oxidative protein folding in Pa functions through DsbA1? Conveying this information, if possible, may make the impact of the results in this model clearer.

      Thank you for this comment. To address it we added the following text to our manuscript:

      “To determine whether the effects on β-lactam MICs observed in our inducible system (Fig. 1 and [23]) can be reproduced in the presence of other resistance determinants in a natural context with endogenous enzyme expression levels, we deleted the principal dsbA gene, dsbA1, in several multidrug-resistant (MDR) P. aeruginosa clinical strains (Table S2). Pathogenic bacteria often encode multiple DsbA analogues [24,25] and P. aeruginosa is no exception. It encodes two DsbAs, but DsbA1 has been found to catalyze the vast majority of the oxidative protein folding reactions taking place in its cell envelope [26]” (lines 172-178 of the revised manuscript).

      (5) Regarding the clinical Pa isolates G4R7 and G6R7, have the authors performed any phenotypic testing on these strains to identify differences that might explain the substantial difference in piperacillin MIC? I.e., can these isolates be distinguished by growth rate, genetic markers or expression levels, early or late infection, mucoidy, etc. This is not essential for the current work, but could weigh on the efficacy of this treatment strategy for AIM1-expressing clinical isolates. (E.g., the G4R7 dsbA1 strain exhibits a piperacillin MIC still ~2-fold higher than WT G6R7).

      Thank you for the opportunity to clarify this. For clinical strains used in our study, we have evaluated their antibiotic resistance profiles, but we have not performed any additional phenotypic characterization. There are many reasons that contribute to differences in antibiotic resistance, starting simply from β-lactamase expression levels and extending to organismal effects, like the ones mentioned by the reviewer. Such characterization would fall outside the scope of our paper, especially since we sensitize our tested P. aeruginosa clinical isolates for the majority of the β-lactam antibiotics tested.

      We acknowledged this by adding the following sentence to our revised manuscript: 

      “Despite the fact that P. aeruginosa G4R7 dsbA1 was not sensitized for piperacillin-tazobactam, possibly due to the high level of piperacillin-tazobactam resistance of the parent clinical strain, our results across these two isolates show promise for DsbA as a target against β-lactam resistance in P. aeruginosa” (lines 191-194 of the revised manuscript).

      (6) Lines 180-2: "This shows that without their disulfide bonds, these proteins are unstable and are ultimately degraded by other cell envelope proteostasis components [33]". While it is clear that protein is significantly lost in all cases except for BPS-1m in 2A, the dsbA pDM1bla constructs in 2B appear to all retain non-trivial (>10-fold) nitrocefin hydrolysis activity compared to the dsbA pDM1 control. This does not impact the other results in 2B, but it would seem that a loss-of-function folding defect, as described subsequently for BPS-1m, is also part of the explanation for the observed MIC decreases, and this was not necessarily clear from the quoted passage. This could simply be clarified in the final sentence - that both mechanisms are potentially in play - if the authors agree with that interpretation.

      You are correct, thank you for your comment. We amended the text in our revised manuscript as follows: 

      “The data presented so far (Fig. 1 and 2) demonstrate that disulfide bond formation is essential for the biogenesis (stability and/or protein folding) and, in turn, activity of an expanded set of clinically important β-lactamases, including enzymes that currently lack inhibitor options” (lines 158-161 of the revised manuscript).

      (7) While it is clear from Figure S2 that the various dsb mutants do not have a general growth defect or collateral sensitivity to another antibiotic, it does not appear that there is an analogous control for the DSB inhibitor demonstrating no growth/toxic effects at the concentration used. This could be provided similarly to Figure S2, using gentamicin as a control antibiotic.

      We have carried out experiments that confirm that our results are due to specific inhibition of the DSB system and not because of off-target effects. In particular, we show that the gentamicin MIC values of S. maltophilia AMM remain unchanged in the presence of the inhibitor and treatment of S. maltophilia AMM dsbA dsbL with the compound does not affect its colistin MIC value (Fig. S2E and lines 317-320 of the revised manuscript).

      (8) Complementation is appropriately provided for experiments with E. coli, but is not provided for P. aeruginosa or S. maltophilia. It should be straightforward to complement in Pa, but is also probably less critical considering the evidence from E. coli. However, since the Sm mutant is a gene cluster with two genes, it would seem more imperative to complement this strain. This reviewer is not familiar enough with Sm to know if complementation is routine or feasible with this organism; if not, the controls for the DSB inhibitor should at least be provided.

      As mentioned in our response to comment 7 above, we have carried out experiments that confirm that our DSB inhibitor results are due to specific inhibition of the DSB system and not because of off-target effects.

      Moreover, in response to this comment, we have further demonstrated that our results are due to the specific interaction of DsbA with β-lactamase enzymes by complementing dsbA deletions in representative clinical strains of multidrug-resistant Pseudomonas aeruginosa and extremely-drug-resistant Stenotrophomonas maltophilia. We would like to note here that gene complementation in clinical isolates remains very rare in the literature due to their high levels of resistance and limited genetic tractability. Most of the few complementation examples reported for these two organisms are limited to strains that, although pathogenic, are commonly used in the lab, or to complementation efforts in non-clinical strain systems (for example use of P. aeruginosa PA14 for complementation, instead of the focal clinical isolate).

      We tested three different complementation strategies, two of which ended up being unsuccessful. After approximately 9 months of work, we succeeded in complementing a representative clinical strain for each organism (P. aeruginosa CDC #769 dsbA1 and S. maltophilia AMM dsbA dsbL) by inserting the dsbA1 gene from P. aeruginosa PAO1 into the Tn7 site on the chromosome. Both clinical strains show full complementation for every antibiotic tested; our complementation results can be found in Fig. S2B,D of the revised manuscript.

      The following text was added for P. aeruginosa clinical isolates:

      “We have demonstrated the specific interaction of DsbA with the tested β-lactamase enzymes in our E. coli K-12 inducible system using gentamicin controls (Fig. 1 and File S2A) and gene complementation (Fig. S1). To confirm the specificity of this interaction in P. aeruginosa, we performed representative control experiments in one of our clinical strains, P. aeruginosa CDC #769. We first tested the general ability of P. aeruginosa CDC #769 dsbA1 to resist antibiotic stress by recording MIC values against gentamicin, and found it unchanged compared to its parent (Fig. S2A). Gene complementation in clinical isolates is especially challenging and rarely attempted due to the high levels of resistance and lack of genetic tractability in these strains. Despite these challenges, to further ensure the specificity of the interaction of DsbA with tested β-lactamases in P. aeruginosa, we have complemented dsbA1 from P. aeruginosa PAO1 into P. aeruginosa CDC #769 dsbA1. We found that complementation of dsbA1 restores MICs to wild-type values for both tested β-lactam compounds (Fig. S2B) further demonstrating that our results in P. aeruginosa clinical strains are not confounded by off-target effects” (lines 226-239 of the revised manuscript).

      The following text was added for S. maltophilia clinical isolates: 

      “Since the dsbA and dsbL are organized in a gene cluster in S. maltophilia, we wanted to ensure that our results reported above were exclusively due to disruption of disulfide bond formation in this organism. First, we recorded gentamicin MIC values for S. maltophilia AMM dsbA dsbL and found them to be unchanged compared to the gentamicin MICs of the parent strain (Fig. S2C). This confirms that disruption of disulfide bond formation does not compromise the general ability of this organism to resist antibiotic stress. Next, we complemented S. maltophilia AMM dsbA dsbL. The specific oxidative roles and exact regulation of DsbA and DsbL in S. maltophilia remain unknown. For this reason and considering that genetic manipulation of extremely-drug-resistant organisms is challenging, we used our genetic construct optimized for complementing P. aeruginosa CDC #769 dsbA1 with dsbA1 from P. aeruginosa PAO1 (Fig. S2B) to also complement S. maltophilia AMM dsbA dsbL. We based this approach on the fact that DsbA proteins from one species have been commonly shown to be functional in other species [27-30]. Indeed, we found that complementation of S. maltophilia AMM dsbA dsbL with P. aeruginosa PAO1 dsbA1 restores MICs to wild-type values for both ceftazidime and colistin (Fig. S2D), conclusively demonstrating that our results in S. maltophilia are not confounded by off-target effects” (lines 282-297 of the revised manuscript).

      (9) In Figure 5E, the growth inhibition and loss of Pa CFU in 4 ug/mL ceftazidime for the Sm co-culture condition, which is subsequently lost in the Sm dsbA dsbL co-culture, does not appear to be discussed. As Pa is shown to grow fine in monoculture at this concentration, this result should be discussed in relation to the co-culture dynamics. Is it expected or observed that WT Sm is out-competing Pa under this condition and growing to a high CFU/mL? This would seem to have parallels to citation 49.

      As requested by this reviewer (see comment 10 below), we simultaneously tracked the abundance of P. aeruginosa and S. maltophilia strains in our cross-protection experiment. During this process we probed the abundances of the two organisms at 4 µg/mL of ceftazidime. Our results can be seen in Fig. S3B of the revised manuscript. The reviewer is correct and these effects are due to competition between P. aeruginosa and S. maltophilia with the latter being able to reach very high CFUs in this antibiotic concentration. 

      The following text on co-culture dynamics was added to our revised manuscript: 

      “At low antibiotic concentrations, for example 4 μg/mL of ceftazidime, S. maltophilia AMM is fully resistant and thrives, thus outcompeting P. aeruginosa PA14 (dark pink and dark blue bars in Fig. S3B). The same can also be seen in Fig. 4E, whereby decreased P. aeruginosa PA14 CFUs are recorded. By contrast S. maltophilia AMM dsbA dsbL already displays decreased growth at 4 μg/mL of ceftazidime because of its non-functional L1-1 enzyme, allowing comparatively higher growth of P. aeruginosa (light pink and light blue bars in Fig. S3B)” (lines 384-390 of the revised manuscript).

      (10) The data presented in Figure 5E would be augmented by the inclusion of, for at least a few representative cases, the Sm CFUs relative to the Pa CFUs. In describing the protective effects of Sm on Pa for imipenem treatment, the authors of citation 12 note that the effect was dependent on Sm cell density. This raises the immediate question of whether the protection observed in this work is similarly dependent on cell density of Sm. It is unclear if the authors expect Sm to persist under these conditions, and it seems Sm CFU should be expected to be relatively high considering it is pre-incubated for 6 hours prior to the assay. What is the physiological state of these cells, and how are they affected by ceftazidime? While many other variables are likely relevant to the translation of this protection, the relative abundance and localization of Sm and Pa commonly observed in CF patients, as well as the effective concentration of antibiotic observed in vivo, is likely worth consideration.

      As mentioned in our response to comment 9 above, we have simultaneously tracked the abundance of P. aeruginosa and S. maltophilia strains in our cross-protection experiment for select antibiotic concentrations. To be able to perform this experiment, we had to label two extremely-drug-resistant strains of S. maltophilia with an antibiotic resistance marker that allowed us to quantify them in mixtures with P. aeruginosa. Our results can be found in Fig. S3 of our revised manuscript and, in a nutshell, show that ceftazidime treatment leads to eradication of both P. aeruginosa and S. maltophilia when disulfide bond formation is impaired in S. maltophilia.

      The following text was added to address the questions of the reviewer:

      “Due to the naturally different growth rates of these two species (S. maltophilia grows much slower than P. aeruginosa) especially in laboratory conditions, the protocol we followed [1] requires S. maltophilia to be grown for 6 hours prior to co-culturing it with P. aeruginosa. To ensure that at this point in the experiment our two S. maltophilia strains, with and without dsbA, had grown comparatively to each other, we determined their cell densities (Fig. S3A). We found that S. maltophilia AMM dsbA dsbL had grown at a similar level as the wild-type strain, and both were at a higher cell density [~10⁷ colony forming units (CFUs)] compared to the P. aeruginosa PA14 inoculum (5 × 10⁴ CFUs)” (lines 353-361 of the revised manuscript).

      “To ensure that ceftazidime treatment leads to eradication of both P. aeruginosa and S. maltophilia when disulfide bond formation is impaired in S. maltophilia, we monitored the abundance of both strains in each synthetic community for select antibiotic concentrations (Fig. S3B). In this experiment we largely observed the same trends as in Fig. 4E. At low antibiotic concentrations, for example 4 μg/mL of ceftazidime, S. maltophilia AMM is fully resistant and thrives, thus outcompeting P. aeruginosa PA14 (dark pink and dark blue bars in Fig. S3B). The same can also be seen in Fig. 4E, whereby decreased P. aeruginosa PA14 CFUs are recorded. By contrast S. maltophilia AMM dsbA dsbL already displays decreased growth at 4 μg/mL of ceftazidime because of its non-functional L1-1 enzyme, allowing comparatively higher growth of P. aeruginosa (light pink and light blue bars in Fig. S3B). Despite the competition between the two strains, P. aeruginosa PA14 benefits from S. maltophilia AMM’s high hydrolytic activity against ceftazidime, which allows it to survive and grow in high antibiotic concentrations even though it is not resistant (see 128 μg/mL; dark pink and dark blue bars in Fig. S3B). In stark opposition, without its disulfide bond in S. maltophilia AMM dsbA dsbL, L1-1 cannot confer resistance to ceftazidime, resulting in killing of S. maltophilia AMM dsbA dsbL and, consequently, also of P. aeruginosa PA14 (see 128 μg/mL; light pink and light blue bars in Fig. S3B).

      The data presented here show that, at least under laboratory conditions, targeting protein homeostasis pathways in specific recalcitrant pathogens has the potential to not only alter their own antibiotic resistance profiles (Fig. 3 and 4A-D), but also to influence the antibiotic susceptibility profiles of other bacteria that co-occur in the same conditions (Fig. 5). Admittedly, the conditions in a living host are too complex to draw direct conclusions from this experiment. That said, our results show promise for infections, where pathogen interactions affect treatment outcomes, and whereby their inhibition might facilitate treatment” (lines 381-406 of the revised manuscript).

      (11) Regarding the role of microbial interactions in CF and other disease/infection contexts, the authors should temper their descriptions in accordance with citations provided. As an example, lines 96-99: "For example, in the CF lung, highly drug-resistant S. maltophilia strains actively protect susceptible P. aeruginosa from β-lactam antibiotics [12], and ultimately facilitate the evolution of β-lactam resistance in P. aeruginosa [14]."

      Neither citation provided here attests to Sm protection of Pa "in the CF lung". Both papers use a simplified in vitro co-culture model to assess Sm protection of Pa from antibiotics and the evolution of Pa antibiotic resistance in the presence or absence of Sm, respectively. In the latter case, it should also be noted that while the authors observed somewhat faster Pa resistance evolution in one co-culture condition, they did not observe it in the other, and that resistance evolution in general was observed regardless of co-culture condition. There are also statements in the ultimate and penultimate paragraphs of the Discussion section that repeat these points. The authors could re-frame this aspect of their investigation as part of a working hypothesis related to potential interactions of these pathogens, and should appropriately caveat what is and is not known from in vitro and in vivo/clinical work.

      Thank you for your comment. You are entirely correct. We have amended the text throughout our revised manuscript to avoid overstating these findings and to be clear about the fact that they originate from experimental studies. Please find below representative examples of such passages:

      “In particular, some antibiotic resistance proteins, like β-lactamases, which decrease the quantities of active drug present, function akin to common goods, since their benefits are not limited to the pathogen that produces them but can be shared with the rest of the bacterial community. This means that their activity enables pathogen cross-resistance when multiple species are present [1,31], something that was demonstrated in recent work investigating the interactions between pathogens that naturally co-exist in CF infections. More specifically, it was shown that in laboratory co-culture conditions, highly drug-resistant S. maltophilia strains actively protect susceptible P. aeruginosa from β-lactam antibiotics [1]. Moreover, this cross-protection was found to facilitate, at least under specific conditions, the evolution of β-lactam resistance in P. aeruginosa [32]” (lines 47-57 of the revised manuscript).

      “The antibiotic resistance mechanisms of S. maltophilia impact the antibiotic tolerance profiles of other organisms that are found in the same infection environment. S. maltophilia hydrolyses all β-lactam drugs through the action of its L1 and L2 β-lactamases [7,8]. In doing so, it has been experimentally shown to protect other pathogens that are, in principle, susceptible to treatment, such as P. aeruginosa [1]. This protection, in turn, allows active growth of otherwise treatable P. aeruginosa in the presence of complex β-lactams, like imipenem [1], and, at least in some conditions, increases the rate of resistance evolution of P. aeruginosa against these antibiotics [32]” (lines 332-340 of the revised manuscript).

      (12) Regarding the role of S. maltophilia in CF disease, the authors should either discuss clinical associations more completely or note the conflicting data on its role in disease. As an example, lines 84-87: "As a result, the standard treatment option, i.e., broad-spectrum β-lactam antibiotic therapy, constitutes a severe risk for CF patients carrying both P. aeruginosa and S. maltophilia [10,11], creating an urgent need for antimicrobial approaches that will be effective in eliminating both pathogens."

      It is unclear how this treatment results in a "severe risk" for CF patients colonized by both Sm and Pa. Citation 10 suggests an association between anti-pseudomonal antibiotic use and increased prevalence of Sm, but neither citation supports a worsening clinical outcome from this treatment. Citation 10 further notes that clinical scores between Sm-positive and control cohorts could not be distinguished statistically. Citation 11 is a review that makes note of this conflicting data regarding Sm, including reference to a more recent (at the time) result using multivariate analysis showing no independent effect of Sm on survival.

      The above point similarly applies to other statements in the manuscript, for example at lines 266-267: "Considering the contribution of S. maltophilia strains to treatment failure in CF lung infections [8,10,11][...]" As well as lines 79-80: "Pulmonary exacerbations and severe disease states are also associated with the presence of S. maltophilia [8]"

      Again, the provided citations do not support the implication that Sm specifically 'contributes to treatment failure in CF lung infections' or that Sm is specifically associated with severe disease states. In addition to the previously discussed citations, citation 8 describes broad "pulmotypes" composed of 10 species/genera that could be associated with particular clinical (e.g., exacerbation) or treatment (e.g., antibiotic therapy) characteristics, but these cannot, without further analysis, be associated with, or causally linked to, a specific pathogen. While pulmotype 2 in citation 8 was associated with a more severe clinical state and appeared to have the highest relative abundance of Sm compared to other pulmotypes, Sm was not identified (Figure 4A) as an independent factor that distinguishes between moderate and severe disease, unlike Pa and some anaerobes (4F-H). The authors also observed that decreasing relative abundance of Pa, in particular, is correlated with subsequent exacerbation, but did not correlate this with the presence of any other species or genera. Again, this should be re-framed with the appropriate caveat that this is a hypothesis with possible clinical significance.

      Several suggested papers are included below on Sm association with clinical characteristics to incorporate into the manuscript if the authors choose to do so:

      https://doi.org/10.1177/14782715221088909

      https://doi.org/10.1016/j.prrv.2010.07.003

      https://doi.org/10.1016/j.jcf.2013.05.009

      https://doi.org/10.1002/ppul.23943

      https://doi.org/10.1002/14651858.CD005405.pub2

      https://doi.org/10.1164/rccm.2109078

      http://dx.doi.org/10.1136/thx.2003.017707

      https://erj.ersjournals.com/content/23/1/98.short

      Thank you for your comment. You are entirely correct. We have amended the text throughout our revised manuscript to avoid overstating the role of S. maltophilia in CF infections and to reference additional relevant works in the literature. Please find below representative examples of such passages:

      “On the other hand, CF microbiomes are increasingly found to encompass S. maltophilia [2-4], a globally distributed opportunistic pathogen that causes serious nosocomial respiratory and bloodstream infections [5-7]. S. maltophilia is one of the most prevalent emerging pathogens [6] and it is intrinsically resistant to almost all antibiotics, including β-lactams like penicillins, cephalosporins and carbapenems, as well as macrolides, fluoroquinolones, aminoglycosides, chloramphenicol, tetracyclines and colistin. As a result, the standard treatment option for lung infections, i.e., broad-spectrum β-lactam antibiotic therapy, is rarely successful in countering S. maltophilia [7,8], creating a definitive need for approaches that will be effective in eliminating both pathogens” (lines 33-41 of the revised manuscript).

      “Of the organisms studied in this work, S. maltophilia deserves further discussion because of its unique intrinsic resistance profile. The prognosis of CF patients with S. maltophilia lung carriage is still debated [4,9-16], largely because studies with extensive and well-controlled patient cohorts are lacking. This notwithstanding, the therapeutic options against this pathogen are currently limited to one non-β-lactam antibiotic-adjuvant combination, trimethoprim-sulfamethoxazole [17-20], which is not always effective, and a few last-line β-lactam drugs, like the fifth-generation cephalosporin cefiderocol and the combination aztreonam-avibactam. Resistance to commonly used antibiotics causes many problems during treatment and, as a result, infections that harbor S. maltophilia have high case fatality rates [7]. This is not limited to CF patients, as S. maltophilia is a major cause of death in children with bacteremia [5]” (lines 440-450 of the revised manuscript).

      Reviewer #3 (Recommendation For the Authors):

      (1) The referencing of supplemental figures does not follow a sequential order. For example, Figure S2 appears in the text before S1. The sequential ordering of figure numbers improves the readability and can be considered while editing the manuscript for revision.

      Thank you for this comment. This is amended in our revised manuscript and supplemental figures and files are cited in order.

      (2) It will be useful to provide a brief description of Ambler classes since these are important to study design (for a broader audience).

      Thank you for this suggestion. This has been added and can be found in lines 91-101 of the revised manuscript.

      (3) The rationale for using K12 strain for E. coli should be provided. It appears that is a model system that is well established in their lab, but a scientific rationale can be listed. Maybe this strain does not have any lactamases in its genome other than the one being expressed as compared to pathogenic E. coli?

      Thank you for this suggestion. This has been added and can be found in lines 104-106 of the revised manuscript.

      (4) The reviewers used worm model to test their observations, which is relevant. Given the significant implications of their work in overcoming resistance to clinically used antibiotics and availability of already generated dsbA mutants in clinical strains, it will be useful to investigate survival in animal models or at least wound models of Pseudomonas infections. The reviewer does not deem this necessary, but it will significantly increase the impact of their seminal work.

      Thank you for this comment. We appreciate the sentiment, and we would have liked to be able to perform experiments in a murine model of infection. There are several reasons that made this not possible, and as a result we used G. mellonella as an informative preliminary in vivo infection model. The DSB proteins have been shown to play a central role in bacterial virulence. Because of this, our P. aeruginosa and S. maltophilia mutant strains are not efficient in establishing an infection, even in a wound model. This could be overcome had we been able to use the chemical inhibitor of the DSB system in vivo; however, this is also not possible. This is because the chemical compound that we use to inhibit the function of DsbA acts on DsbB. Inhibition of DsbB blocks the re-oxidation of DsbA and leads to its accumulation in its inactive reduced form. However, the action of the inhibitor can be bypassed through re-oxidation and re-activation of DsbA by small-molecule oxidants such as L-cystine, which are abundant in rich growth media or animal tissues. This makes the inhibitor only suitable for in vitro assays that can be performed in minimal media, where the presence of small-molecule oxidants can be strictly avoided, but entirely unsuitable for an insect or a vertebrate animal model.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Dixit, Noe, and Weikl apply coarse-grained and all-atom molecular dynamics to determine the response of the mechanosensitive proteins Piezo 1 and Piezo 2 to tension. Cryo-EM structures in micelles show a high curvature of the protein whereas structures in lipid bilayers show lower curvature. Is the zero-stress state of the protein closer to the micelle structure or the bilayer structure? Moreover, while the tension sensitivity of channel function can be inferred from the experiment, molecular details are not clearly available. How much does the protein's height and effective area change in response to tension? With these in hand, a quantitative model of its function follows that can be related to the properties of the membrane and the effect of external forces.

      Simulations indicate that in a bilayer the protein relaxes from the highly curved cryo-EM dome (Figure 1). 

      Under applied tension, the dome flattens (Figure 2) including the underlying lipid bilayer. The shape of the system is a combination of the membrane mechanical and protein conformational energies (Equation 1). The membrane's mechanical energy is well-characterized. It requires only the curvature and bending modulus as inputs. They determine membrane curvature and the local area metric (Equation 4) by averaging the height on a grid and computing second derivatives (Equations 7, 8) consistent with known differential geometric formulas. 

      The bending energy can be limited to the nano dome but this implies that the noise in the membrane energy is significant. Where there is noise outside the dome there is noise inside the dome. At the least, they could characterize the noisy energy due to inadequate averaging of membrane shape. 

      My concern for this paper is that they are significantly overestimating the membrane deformation energy based on their numerical scheme, which in turn leads to a much stiffer model of the protein itself.

      We agree that “thermal noise” is intrinsic to MD simulations, as in “real” systems, leading to thermally excited shape fluctuations of membranes and conformational fluctuations of proteins. However, for our coarse-grained simulations, the thermally excited membrane shape fluctuations can be averaged out quite well, and the resulting average shapes are smooth, see e.g. the shapes and lines of the contour plots in Fig. 1 and 2. For our atomistic simulations, the averaged shapes are not as smooth, see Fig. 3a and the lines of the contour plots in Fig. 3b. Therefore, we do not report bending energies for the nanodome shapes determined from atomistic simulations, because bending energy calculations are sensitive to remaining “noise” on small scales (due to the scale invariance of the bending energy), in contrast to calculations of excess areas, which we state now on lines 620ff.

      For our coarse-grained simulations, we now corroborate our bending energy calculations based on averaged 3d shapes by comparing to bending energy values obtained from highly smoothened 2d mean curvature profiles (see Fig. 1c for mean curvature profiles in tensionless membranes). We discuss this in detail from line 323 on, starting with:

      “To corroborate our bending energy calculations for these averaged three-dimensional nanodome shapes, we note that essentially identical bending energies can be obtained from the highly smoothened mean curvatures M of the two-dimensional membrane profiles. …”

      Two things would address this: 

      (1) Report the membrane energy under different graining schemes (e.g., report schemes up to double the discretization grain). 

      There are two graining schemes in the modeling, and we have followed the reviewer’s recommendation regarding the second scheme. In the first, more central graining scheme, we use quadratic membrane patches with a sidelength of about 2 nm to determine membrane midplane shapes and lipid densities of each simulation conformation. This graining scheme has also been previously employed in Hu, Lipowsky, Weikl, PNAS 38, 15283 (2013) to determine the shape and thermal roughness of coarse-grained membranes. A sidelength of 2 nm is necessary to have sufficiently many lipid headgroups in the upper and lower leaflet in the membrane patches for estimating the local height of these leaflets, and the local membrane midplane height as average of these leaflet heights (see subsection “Membrane shape of simulation conformation” in the Methods section for details). However, we strongly believe that doubling the sidelength of membrane patches in this discretization is not an option, because a discretization length of 4 nm is too coarse to resolve the membrane deformations in the nanodome, see e.g. the profiles in Fig. 1b. Moreover, any “noise” from this discretization is rather completely smoothened out in the averaging process used in the analysis of the membrane shapes, at least for the coarse-grained simulations. This averaging process requires rotations of membrane conformations to align the protein orientations of the conformations (see subsection “Average membrane shapes and lipid densities” for details). Because of these rotations, the original discretization is “lost” in the averaging, and a continuous membrane shape is generated. To calculate the excess areas and bending energies for this smooth, continuous membrane shape, we use a discretization of the Monge plane into a square lattice with lattice parameter 1 nm.

      As a response to the referee’s suggestion, we now report that the results for the excess area do not change significantly when doubling this lattice parameter to 2 nm. On line 597, we write:

      “For a lattice constant of a=2 nm, we obtain extrapolated values of the excess area Delta A from the coarse-grained simulations that are 2 to 3% lower than the values for a=1 nm, which is small compared to statistical uncertainties with relative errors of around 10%.”

      On lines 614ff, we now state that the bending energy results are about 10% to 13% lower for a=2 nm, likely because of the lower resolution of the curvature in the nanodome compared to a=1 nm, rather than incomplete averaging and remaining roughness of the coarse-grained nanodome shapes.

      (2) For a Gaussian bump with sigma=6 nm I obtained a bending energy of 0.6 kappa, so certainly in the ballpark with what they are reporting but significantly lower (compared to 2 kappa, Figure 5 lower left). It would be simpler to use the Gaussian approximation to their curves in Figure 3 - and I would argue more accurate, especially since they have not reported the variation of the membrane energy with respect to the discretization size and so I cannot judge the dependence of the energy on discretization. I view reporting the variation of the membrane energy with respect to discretization as being essential for the analysis if their goal is to provide a quantitative estimate for the force of Piezo. The Helfrich energy computed from an analytical model with a membrane shape closely resembling the simulated shapes would be very helpful. According to my intuition, finite-difference estimates of curvatures will tend to be overestimates of the true membrane deformation energy because white noise tends to lead to high curvature at short-length scales, which is strongly penalized by the bending energy. 
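      The kind of analytical reference value requested here can be sketched under stated assumptions. For a Gaussian bump h(r) = h0 exp(-r^2 / (2 sigma^2)) in the small-gradient approximation, the Helfrich bending energy E = (kappa/2) * integral of (laplacian h)^2 dA evaluates to E = pi kappa h0^2 / sigma^2, so it depends only on the ratio h0/sigma. The bump height is not given in the comment; the value h0 = 2.6 nm below is an assumption chosen so that sigma = 6 nm yields roughly 0.6 kappa. The finite-difference estimate uses a standard five-point Laplacian, which is an illustrative choice and not necessarily the scheme used in the manuscript.

```python
import numpy as np

def gaussian_bump(a, half_width=40.0, h0=2.6, sigma=6.0):
    """Gaussian bump sampled on a square lattice with spacing a (nm); h0 is assumed."""
    x = np.arange(-half_width, half_width + a / 2, a)
    X, Y = np.meshgrid(x, x, indexing="ij")
    return h0 * np.exp(-(X**2 + Y**2) / (2.0 * sigma**2))

def bending_energy_over_kappa(h, a):
    """Small-gradient Helfrich energy E/kappa = 0.5 * sum (laplacian h)^2 * a^2."""
    lap = (h[2:, 1:-1] + h[:-2, 1:-1] + h[1:-1, 2:] + h[1:-1, :-2]
           - 4.0 * h[1:-1, 1:-1]) / a**2  # five-point Laplacian on interior sites
    return 0.5 * np.sum(lap**2) * a**2

def analytic_energy_over_kappa(h0=2.6, sigma=6.0):
    """Closed-form small-gradient result pi * h0^2 / sigma^2."""
    return np.pi * h0**2 / sigma**2

print(f"analytic: {analytic_energy_over_kappa():.3f} kappa")
for a in (1.0, 2.0):
    e = bending_energy_over_kappa(gaussian_bump(a), a)
    print(f"a = {a:.0f} nm: {e:.3f} kappa")
```

      For a smooth, noise-free shape such as this one, the lattice estimate lies slightly below the analytical value and decreases as the lattice is coarsened. Residual short-wavelength roughness would act in the opposite direction, which is the overestimation concern raised in the comment.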

      Instead of Gaussian bumps, we now calculate the membrane bending energy also from the two-dimensional, continuous mean curvature profiles (see Fig. 1c). These mean curvature profiles are highly smoothened (see figure caption for details). Nonetheless, we obtain essentially the same bending energies as in our discrete calculations of averaged, smoothened three-dimensional membrane shapes, see new text on lines 326ff. We believe that this agreement corroborates our bending energy calculations. We still focus on values obtained for three-dimensional membrane shapes, because of incomplete rotational symmetry. The three-dimensional membrane shapes exhibit variations with the three-fold symmetry of the Piezo proteins, see Figure 2a and b.

      We agree that the bending energy of thermally rough membranes depends on the discretization scheme, because the discretization length of any discretization scheme leads to a cut-off length for fluctuation modes in a Fourier analysis. But again, we average out the thermal noise, for reasons given in the Results section, and analyse smooth membrane shapes.  

      The fitting of the system deformation to the inverse time appears to be incredibly ad hoc ... Nor is it clear that the quantified model will be substantially changed without extrapolation. The authors should either justify the extrapolation more clearly (sorry if I missed it!) or also report the unextrapolated numbers alongside the extrapolated ones. 

      We report the values of the excess area and bending energy in the different time intervals of our analysis as data points in Fig. 4 with supplement. We find it important to report the time dependence of these quantities, because the intended equilibration of the membrane shapes in our simulations is not “complete” within a certain time window of the simulations. So, just “cutting” the first 20 and 50% of the simulation trajectories, and analysing the remaining parts as “equilibrated” does not seem to be a reasonable choice here, at least for the membrane properties, i.e. for the excess area and bending energy. We agree that the linear extrapolation used in our analysis is a matter of choice. At least for the coarse-grained simulations, the extrapolated values of excess areas and bending energies are rather close to the values obtained in the last time windows (see Figure 4). 
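      The extrapolation discussed here can be sketched as follows, assuming that the interval values depend linearly on the inverse of the interval midpoint time, so that the intercept at 1/t = 0 is the long-time estimate. The interval times, the total simulation time, and the data are synthetic placeholders, and the fit may differ in detail from the procedure used for Fig. 4.

```python
import numpy as np

def extrapolate_to_long_times(t_mid, values):
    """Fit values = v_inf + c / t by least squares and return (v_inf, c)."""
    inv_t = 1.0 / np.asarray(t_mid, dtype=float)
    slope, intercept = np.polyfit(inv_t, np.asarray(values, dtype=float), 1)
    return intercept, slope

# Synthetic illustration only: five equal intervals of a hypothetical trajectory.
total_time = 10.0   # hypothetical trajectory length (arbitrary time units)
n_intervals = 5
t_mid = (np.arange(n_intervals) + 0.5) * total_time / n_intervals

# Synthetic interval averages that relax as 1/t towards a plateau value of 40.
synthetic_values = 40.0 + 12.0 / t_mid

v_inf, c = extrapolate_to_long_times(t_mid, synthetic_values)
print(f"last interval: {synthetic_values[-1]:.2f}, extrapolated: {v_inf:.2f}")
```

      Reporting the last-interval value next to the extrapolated one, as in the print statement, is one simple way to show how much the result depends on the extrapolation.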

      In summary, this paper uses molecular dynamics simulations to quantify the force of the Piezo 1 and Piezo 2 proteins on a lipid bilayer using simulations under controlled tension, observing the membrane deformation, and using that data to infer protein mechanics. While much of the physical mechanism was previously known, the study itself is a valuable quantification. I identified one issue in the membrane deformation energy analysis that has large quantitative repercussions for the extracted model. 

      Reviewer #2 (Public review): 

      Summary: 

      In this study, the authors suggest that the structure of Piezo2 in a tensionless simulation is flatter compared to the electron microscopy structure. This is an interesting observation and highlights the fact that the membrane environment is important for Piezo2 curvature. Additionally, the authors calculate the excess area of Piezo2 and Piezo1, suggesting that it is significantly smaller compared to the area calculated using the EM structure or simulations with restrained Piezo2. Finally, the authors propose an elastic model for Piezo proteins. Those are very important findings, which would be of interest to the mechanobiology field. 

      Whilst I like the suggestion that the membrane environment will change Piezo2 flatness, could this be happening because of the lower resolution of the MARTINI simulations? In other words, would it be possible that MARTINI is not able to model such curvature due to its lower resolution? 

      Related to my comment above, the authors say that they only restrained the secondary structure using an elastic network model. Whilst I understand why they did this, Piezo proteins are relatively large. How can the authors know that this type of elastic network model restrains, combined with the fact that MARTINI simulations are perhaps not very accurate in predicting protein conformations, can accurately represent the changes that happen within the Piezo channel during membrane tension? 

      These questions regarding the reliability of the Martini model are very reasonable and are the reason why we include also results from atomistic simulations, at least for Piezo 2, and compare the results. In the Martini model, secondary structure constraints are standard. In addition, constraints on the tertiary structure (e.g. via an elastic network model) are also typically used in simulations of soluble, globular proteins. However, such tertiary constraints would make it impossible to simulate the tension-induced flattening of the Piezo proteins. So instead, as we write on lines 427ff, “we relied on the capabilities of the Martini coarse-grained force field for modeling membrane systems with TM helix assemblies (Sharma and Juffer, 2013; Chavent et al., 2014; Majumder and Straub, 2021).” In these references, Martini simulations were used to study the assembly of transmembrane helices, leading to agreement with experimentally observed structures. As we state in our article, our atomistic simulations corroborate the Martini simulations, with the caveats that are now more extensively discussed in the new last paragraph of the Discussion section starting on line 362.

      Modelling of Piezo1 seems to be based on homology to Piezo2. However, the authors need to further evaluate their model, e.g. how it compares with an Alphafold model.

      We understand the question, but see it beyond the scope of our article, also because of the computational demand of the simulations. The question is: Do coarse-grained simulations of Piezo1 based on an Alphafold model as starting structure lead to different results? It is important to note that we only model the rather flexible 12 TM helices at the outer ends of the Piezo 1 monomers via homology modeling to the Piezo 2 structure, which includes these TM helices. For the inner 26 TM helices, including the channel, we use the high-quality cryo-EM structure of Piezo 1. Alphafold may be an alternative for modeling the outer 12 helices, but we don’t think this would lead to statistically significant differences in simulations – e.g. because of the observed overall agreement of membrane shapes in all our Piezo 1 and Piezo 2 simulation systems.

      To calculate the tension-induced flattening of the Piezo channel, the authors "divide all simulation trajectories into 5 equal intervals and determine the nanodome shape in each interval by averaging over the conformations of all independent simulation runs in this interval.". However, probably the change in the flattening of Piezo channel happens very quickly during the simulations, possibly within the same interval. Is this the case, and if so, does this affect their calculations?

      Unfortunately, the flattening is not sufficiently quick, so is not complete within the first time windows, see data points in Figure 4. We therefore report the time dependence with the plots in Figure 4 and extrapolate, see also our response above to reviewer 1.

      Finally, the authors use a specific lipid composition, which is asymmetric. Is it possible that the asymmetry of the membrane causes some of the changes in the curvature that they observe? Perhaps more controls, e.g. with a symmetric POPC bilayer are needed to identify whether membrane asymmetry plays a role in the membrane curvature they observe. 

      Because of the rather high computational demands, such controls are beyond our scope. We don’t expect statistically significant differences for symmetric POPC/cholesterol bilayers. On lines 229ff, we now state:

      “Our modelling assumes that any spontaneous curvature from asymmetries in the lipid composition is small compared to the curvature of the nanodome and, thus, negligible, which is plausible for the rather slight lipid asymmetry of our simulated membranes (see Methods).”

      Reviewer #3 (Public review): 

      Strengths: 

      This work focuses on a problem of deep significance: quantifying the structure-tension relationship and underlying mechanism for the mechanosensitive Piezo 1 and 2 channels. This objective presents a few technical challenges for molecular dynamics simulations, due to the relatively large size of each membrane-protein system. Nonetheless, the technical approach chosen is based on the methodology that is, in principle, established and widely accessible. Therefore, another group of practitioners would likely be able to reproduce these findings with reasonable effort. 

      Weaknesses: 

      The two main results of this paper are (1) that both channels exhibit a flatter structure compared to cryo-EM measurements, and (2) their estimated force vs. displacement relationship. Although the former correlates at least quantitatively with prior experimental work, the latter relies exclusively on simulation results and model parameters. 

      Below is a summary of the key points we recommend addressing in a revised version of the manuscript: 

      (1) The authors should report and discuss controls for the membrane energy calculations, specifically by increasing the density of the discretization graining. We also suggest validating the bending modulus used in the energy calculations for the specific lipid mixture employed in the study. 

      We have addressed both points, see our response to the reviewer’s comments for further details.

      (2) The authors should consider and discuss the potential limitations of the coarse-grained simulation force field and clarify how atomistic simulations validate the reported results, with a more detailed explanation of the potential interdependencies between the two. 

      We now discuss the caveats in the comparison of coarse-grained and atomistic simulations in more detail in a new paragraph starting on line 362.

      (3) The authors should provide further clarification on other points raised in the reviewers' comments, for instance, the potential role of membrane asymmetry. 

      We have done this – see above. We now further explain on lines 437ff why we use an asymmetric membrane. On lines 230ff, we discuss that any spontaneous membrane curvature due to lipid asymmetry is likely small compared to the nanodome curvature and, thus, negligible.

      Reviewer #1 (Recommendations for the authors): 

      (1) Report discretization dependence of the membrane energy (up to double the density of the current discretization graining). 

      We have added several text pieces in the paragraph “Excess area and bending energy” starting on line 583 in which we state how the results depend on the lattice constant a of the calculations.

      (2) Evaluate an analytical energy of a membrane bump with a shape similar to the simulation. This would be free of all sampling and discretization artifacts and would thus be an excellent lower bound of the energy. 

      We have done this for the curvature profile in Figure 1c and corresponding curvature profiles of the shape profiles in Figure 2d, see new text on lines 326ff.

      Minor: 

      (1)  The lipid density (Figure 1 right, 2c, 3c) is not interesting nor is it referred to. It can be dropped. 

      We think the lipid density maps are important for two reasons: First, they show the protein shape obtained after averaging conformations, as low-lipid-density regions. Second, the lipid densities are used in the calculation of the bending energies, to limit the bending energy calculations to the membrane in the nanodome, see Eq. 9. We therefore prefer to keep them.

      (2) Figure 7 is attractive but not used in a meaningful way. I suggest inserting the protein graphic from Figure 7 into Figure 1 with the 4-helix bundles numbered alongside the structure. Figure 7 could then be dropped. 

      Figure 7 is a figure of the Methods section. We need it to illustrate and explain aspects of the setup (numbering of helices, missing loops) and analysis (numbering scheme of 4-TM helix units).

      (3) Some editing of the use of the English language would be helpful. "Exemplary" is a bit of a funny word choice, it implies that the conformation is excellent, and not simply representative. I'd suggest "Representative conformation". 

      We agree and have replaced “exemplary” by “representative”.

      (4) Typos: 

      Equation 4 - Missing parentheses before squared operator inside the square root. 

      We have corrected this mistake.

      Reviewer #2 (Recommendations for the authors): 

      This study focuses mainly on Piezo2; the authors do not perform any atomistic simulations of Piezo1, and the coarse-grained simulations for Piezo1 are shorter. As a result, their analysis for Piezo2 seems more complete. It would be good if the authors did similar studies with Piezo1 as with Piezo2. 

      We agree that atomistic simulations of Piezo 1 would be interesting, too. However, because the atomistic simulations are particularly demanding, this is beyond our scope.

      Reviewer #3 (Recommendations for the authors): 

      (1) At line 63, a very large tension from the previous work by De Vecchis et al is reported (68 mN/m). The authors are sampling values up to about 21 mN/m, which is considerably smaller. However, these values greatly exceed what typical lipid membranes can sustain (about 10 mN/m) before rupturing. When mentioning these large tensions, the authors should emphasize that these values are not physiologically significant, because they would rupture most plasma membranes. That said, their use in simulation could be justified to magnify the structural changes compared to experiments. 

      We agree that our largest membrane tension values are unphysiological. However, we see a main novelty and relevance of our simulations in the fact that we obtain a response of the nanodome in the physiological range of membrane tensions, see e.g. the third sentence of the abstract. Yes, we include simulations at tensions of 21 mN/m, but most of our simulated tension values are in the range from 0 to 10 mN/m (see e.g. Fig. 3e), in contrast to previous simulation studies.

      (2) At line 78 and in the Methods, only the reference paper is for the CHARMM protein force field, but not for the lipid force field. 

      We have added the reference Klauda et al., 2010 for the CHARMM36 lipid force field in both spots.

      (3) (Line 83) Acknowledging that the authors needed to use the structure from micelles (because it has atomic resolution), how closely do their relaxed Piezo structures compare with the lower-resolution data from the MacKinnon and Patapoutian papers? 

      There are no structures reported in these papers to compare with, only a clear flattening as stated.  

      (4) (Line 99) The authors chose a slightly asymmetric lipid membrane composition to capture some specific plasma-membrane features. However, they do not discuss which features are described by this particular composition, which doesn't include different acyl-chain unsaturations between leaflets. Further, they do not seem to comment on whether there is enrichment of certain lipid species coupled to curvature, or whether there is any "scrambling" occurring when the dome section and the planar membrane are stitched together in the preparation phase (Figure 8). 

      Enrichment of lipids in contact with the protein is addressed in the reference Buyan et al., 2020, based on Martini simulations with Piezo 1. We have a different focus, but still wanted to keep an asymmetric membrane as in essentially all previous simulation studies as now stated also on lines 439ff, to mimic the native Piezo membrane environment. There is no apparent “scrambling” in the setup of our membrane systems. We also did not explore any coupling between curvature and lipid composition, but will publish the simulation trajectories to enable such studies.  

      (5) (Caption of Figure 2). Please comment briefly in the text why the tensionless simulation required a longer simulation run (e.g. larger fluctuations?) 

      We added the following explanation on line 500: “ … to explore the role of the long-range shape fluctuations in tensionless membranes for the relaxation into equilibrium”. The relaxation time of membrane shape fluctuations strongly increases with the wavelength, which is only limited by the simulation box size in the absence of tension. However, even for 8-microsecond trajectories, we do not observe complete equilibration and therefore decided to extrapolate the excess area and bending energy values obtained for different time intervals of the trajectories.

      (6) (Caption of Figure 3). Please clarify in the Methods how the atomistic simulations were initialized: were they taken from independent CG simulation snapshots? If not, the use of the adjective "independent" would be questionable given the very short atomistic simulation time length. 

      We now added that the production simulations started from the same structure. On lines 386, we now discuss the starting structure of the atomistic simulations in more detail.

      (7) (Line 202). The approach of discretizing the bilayer shape is reasonable, but no justification was provided for the 1-nm grid spacing. In my opinion, there should be a supporting figure showing how the bending energy varies with the grid spacing. 

      We now report also the effect of a 2-nm grid spacing on the results, see new text passages on page 18, and provide an explanation for the smaller 1-nm grid spacing on lines 587ff, where we write:

      “This lattice constant [a = 1 nm] is chosen to be smaller than the bin width of about 2nm used in determining the membrane shape of the simulation conformations, to take into account that the averaging of these membrane shapes can lead to a higher resolution compared to the 2 nm resolution of the individual membrane shapes.”

      (8) (Line 211). The choice by the authors to use a mixed lipid composition complicates the task of defining a reasonable bending modulus. Experimentally and in atomistic simulations, lipids with one saturated tail (like POPC or SOPC) are much stiffer when they are mixed with cholesterol (https://doi.org/10.1529/biophysj.105.067652, https://doi.org/10.1103/PhysRevE.80.021931, https://doi.org/10.1093/pnasnexus/pgad269). On the other hand, MARTINI seems to predict a slight *softening* for POPC mixed with cholesterol (https://doi.org/10.1038/s41467-023-43892-x). Further complicating this matter, mixtures of phospholipids with different preferred curvatures are predicted to be softer than pure bilayers (e.g. https://doi.org/10.1021/acs.jpcb.3c08117), but asymmetric bilayers are stiffer than symmetric ones in some circumstances (https://doi.org/10.1016/j.bpj.2019.11.3398). 

      This issue can be quite thorny: therefore, my recommendation would be to either: (a) directly compute k for their lipid composition, which is straightforward when using large CG bilayers (as was done in Fowler et al, 2016), but it would also require more advanced methods for the atomistic ones; (b) use a reasonable *experimental* value for k, based on a similar enough lipid composition. 

      We now justify in somewhat more detail why we use an asymmetric membrane, but agree that this complicates the bending energy estimates. We only aim to estimate the bending energy in the Martini 2.2 force field, because our elasticity model is based on and, thus, limited to results obtained with this force field. We have included the two further references using the Martini 2.2 force field suggested by the reviewer on line 213, and discuss now in more detail how the bending rigidity estimate enters and affects the modeling, see lines 226ff.

      (9) (Line 224). Does this closing statement imply that all experimental work from ex-vivo samples describe Piezo states under some small but measurable tension? 

      We compare here to the cryo-EM structure in detergent micelles. So, there is no membrane tension; there may be a surface tension of the micelle, but we assume here that Piezo proteins are essentially force free in detergent micelles. Membrane embedding, in contrast, leads to strong forces on Piezo proteins already in the absence of membrane tension, because of the membrane bending energy.

      (10) (Line 304). The Discussion concludes with a reasonable point, albeit on a down note: could the authors elaborate on what kind of experimental approach may be able to verify their modeling results? 

      Very good question, but this is somewhat beyond our expertise. We don’t have a clear recommendation – it is complicated. What can be verified is the flattening, i.e. the height and curvature of the nanodome in lower-resolution experiments. We see our results in line with these experiments, see Introduction. 

      (11) (Line 331). The very title of the Majumder and Straub paper addresses the problem of excessive binding strength between protein beads in the MARTINI force field, which should be mentioned. Figure 3(d) shows that the atomistic systems have larger excess areas than the CG ones. This could be related to MARTINI's "stickiness", or just statistical sampling. Characterizing the grid spacing (see point 7 above) might help illuminate this. 

      We discuss now the larger excess area values of the atomistic simulations on lines 381ff.  

      (12) (Lines 367, 375). Are the harmonic restraints absolute position restraints or additional bonds?

      Note also that the schedule at which the restraints are released (10-ns intervals) is relatively quick. Does the membrane have enough time to equilibrate the number of lipids in each leaflet? 

      These are standard, absolute position restraints. The 10-ns intervals may be too short to fully equilibrate the numbers of lipids; we have not explored this. The main point in the setup was to have a reasonable TM helix embedding with a smooth membrane, without any rupturing. This turned out to be tricky, with the procedures illustrated in Figure 8 as the solution. If the membrane is smooth, the lipid numbers quickly equilibrate either in the final relaxation or in the initial nanoseconds of the production runs.

      (13) (Line 387) The use of an isotropic barostat for equilibration further impedes the system's ability to relax its structure. I feel that the authors should validate more strongly their protocol to rule out the possibility that incomplete equilibration could bias dynamics towards flatter membranes, which is one of the main results of this paper. 

      We don’t see how choices in the initial relaxation steps could have affected our results, at least for the coarse-grained simulations. There is more and more flattening throughout all simulation trajectories, see e.g. the extrapolations in Figure 4. All initial simulation structures are significantly less flattened than the final structures in the production runs.

      (14) (Line 403). What is the protocol for reducing the membrane size for atomistic simulation? This is even more important to mention than for CG simulations. 

      We just cut lipids beyond the intended box size of the atomistic simulations. As a technical point, we now have also added on line 507 how PIP2 lipids were converted.

      (15) (Line 423). The CHARMM force field requires a cut-off distance of 12 Å for van der Waals forces, with a force-based continuous switching scheme. The authors should briefly comment on this deviation and its possible impact on membrane properties. Quick test simulations of very small atomistic bilayers with the chosen composition could be used as a comparison. 

      We don’t expect any relevant effect on membrane properties within the statistical accuracies of the quantities of interest here (i.e. excess areas).

      (16) (Equation 4). There are some mismatched parentheses: please check. 

      We have corrected this mistake.

      (17) (Equations 7-8). Why did the authors use finite-differences derivatives of z(x,y) instead of using cubic splines and the corresponding analytical derivatives? 

      In our experience, second derivatives of standard cubic splines can be problematic. The continuous membrane shapes we obtain in our analysis are averages of such splines. We find standard finite differences more reliable, and therefore discretize these shapes. Already for the 2d membrane profiles of Figure 1b and 2d, calculating curvatures from interpolations using splines is problematic.
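
      To make the finite-difference route concrete, here is a minimal sketch that discretizes a height field z(x, y) on a square lattice with lattice constant a and sums a bending energy over interior lattice points. It assumes the small-gradient (Monge) approximation, in which the sum of the principal curvatures is approximated by the Laplacian of z; this is a simplification chosen for the example and not necessarily the form of Eqs. 7-8. The function names are hypothetical.

```python
def laplacian_central(z, a):
    """Five-point central-difference Laplacian of z at interior lattice points.

    z is a list of rows (z[i][j] is the height at x = i * a, y = j * a);
    a is the lattice constant. Boundary points are skipped.
    """
    n_rows, n_cols = len(z), len(z[0])
    laplacian = []
    for i in range(1, n_rows - 1):
        row = []
        for j in range(1, n_cols - 1):
            z_xx = (z[i + 1][j] - 2.0 * z[i][j] + z[i - 1][j]) / a ** 2
            z_yy = (z[i][j + 1] - 2.0 * z[i][j] + z[i][j - 1]) / a ** 2
            row.append(z_xx + z_yy)
        laplacian.append(row)
    return laplacian


def bending_energy_small_gradient(z, a, kappa):
    """Sum (kappa / 2) * (z_xx + z_yy)^2 * a^2 over interior lattice points.

    Small-gradient approximation (an assumption of this sketch): valid only
    when the membrane slope is small compared to 1.
    """
    return sum(
        0.5 * kappa * value ** 2 * a ** 2
        for row in laplacian_central(z, a)
        for value in row
    )


# Example: a paraboloid z = (x^2 + y^2) / (2 R) has Laplacian 2 / R everywhere.
a, R = 1.0, 10.0
z_grid = [[((i * a) ** 2 + (j * a) ** 2) / (2.0 * R) for j in range(5)] for i in range(5)]
energy = bending_energy_small_gradient(z_grid, a, kappa=1.0)
```

      Repeating the calculation with a different value of a (for example 2 nm instead of 1 nm, with the height field resampled accordingly) is one simple way to probe how the estimate depends on the lattice constant.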

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Gerken et al examined how neurons in the human medial temporal lobe respond to and potentially code dynamic movie content. They had 29 patients watch a long-form movie while neurons within their MTL were monitored using depth electrodes. They found that neurons throughout the region were responsive to the content of the movie. In particular, neurons showed significant responses to people, places, and to a lesser extent, movie cuts. Modeling with a neural network suggests that neural activity within the recorded regions was better at predicting the content of the movies as a population, as opposed to individual neural representations. Surprisingly, a subpopulation of unresponsive neurons performed better than the responsive neurons at decoding the movie content, further suggesting that while classically nonresponsive, these neurons nonetheless provided critical information about the content of the visual world. The authors conclude from these results that low-level visual features, such as scene cuts, may be coded at the neuronal level, but that semantic features rely on distributed population-level codes.

      Strengths:

      Overall, the manuscript presents an interesting and reasonable argument for their findings and conclusions. Additionally, the large number of patients and neurons that were recorded and analyzed makes this data set unique and potentially very powerful. On the whole, the manuscript was very well written, and as it is, presents an interesting and useful set of data about the intricacies of how dynamic naturalistic semantic information may be processed within the medial temporal lobe.

      We thank the reviewer for their comments on our manuscript and for describing the strengths of our presented work.

      Weaknesses:

      There are a number of concerns I have based on some of the experimental and statistical methods employed that I feel would help to improve our understanding of the current data.

      In particular, the authors do not address the issue of superposed visual features very well throughout the manuscript. Previous research using naturalistic movies has shown that low-level visual features, particularly motion, are capable of driving much of the visual system (e.g, Bartels et al 2005; Bartels et al 2007; Huth et al 2012; Çukur et al 2013; Russ et al 2015; Nentwich et al 2023). In some of these papers, low-level features were regressed out to look at the influence of semantics, in others, the influence of low-level features was explicitly modeled. The current manuscript, for the most part, appears to ignore these features with the exception of scene cuts. Based on the previous evidence that low-level features continue to drive later cortical regions, it seems like including these as regressors of no interest or, more ideally, as additional variables, would help to determine how well MTL codes for semantic features over top of these lower-order variables.

      We thank the reviewer for this insightful comment and for the relevant literature regarding visual motion in not only the primary visual system but in cortical areas as well. While we agree that the inclusion of visual motion as a regressor of no interest or as an additional variable would be overall informative in determining if single neurons in the MTL are driven by this level of feature, we would argue that our analyses already provide some insight into its role and that only the parahippocampal cortical neurons would robustly track this feature.

      As noted by the reviewer, our model includes two features derived from visual motion: Camera Cuts (directly derived from frame-wise changes in pixel values) and Scene Cuts (a subset of Camera Cuts restricted to changes in scene). As shown in Fig. 5a, decoding performance for these features was strongest in the parahippocampal cortex (~20%), compared to other MTL areas (~10%). While the entorhinal cortex also showed some performance for Scene Cuts (15%), we interpret this as being driven by the changes in location that define a scene, rather than by motion itself.

      These findings suggest that while motion features are tracked in the MTL, the effect may be most robust in the parahippocampal cortex. We believe that quantifying more complex 3D motion in a naturalistic stimulus like a full-length movie is a significant challenge that would likely require a dedicated study. We agree this is an interesting future research direction and will update the manuscript to highlight this for the reader.

      A few more minor points that would help to clarify the current results involve the selection of data for particular analyses. For some analyses, the authors chose to appropriately downsample their data sets to compare across variables. However, there are a few places where similar downsampling would be informative, but was not completed. In particular, the analyses for patients and regions may have a more informative comparison if the full population were downsampled to match the size of the population for each patient or region of interest. This could be done with the Monte Carlo sampling that is used in other analyses, thus providing a control for population size while still sampling the full population.

      We thank the reviewer for raising this important methodological point. The decision not to downsample the patient- and region-specific analyses was deliberate, and we appreciate the opportunity to clarify our rationale.

      Generally, we would like to emphasize that due to technical and ethical limitations of human single-neuron recordings, it is currently not possible to record large populations of neurons simultaneously in individual patients. The limited and variable number of recorded neurons per subject (Fig. S1) generally requires pooling neurons into pseudo-populations for decoding, which is a well‐established standard in human single‐neuron studies (see e.g., (Jamali et al., 2021; Kamiński et al., 2017; Minxha et al., 2020; Rutishauser et al., 2015; Zheng et al., 2022)).

      For the patient-specific analysis, our primary goal was to show that no single patient's data could match the performance of the complete pseudo-population. Crucially, we found no direct relationship between the number of recorded neurons and decoding performance; patients with the most neurons (patients 4, 13) were not top performers, and those with the fewest (patients 11, 14) were not the worst (see Fig. 4). This indicates that neuron count was not the primary limiting factor and that downsampling would be unlikely to provide additional insight.

      Similarly, for the region-specific analysis, regions with larger neural populations did not systematically outperform those with fewer neurons (Fig. 5). Given the inherent sparseness of single-neuron data, we concluded that retaining the full dataset was more informative than excluding neurons simply to equalize population sizes.

      We agree that this methodological choice should be transparent and explicitly justified in the text. We will add an explanation to the revised manuscript to justify why this approach was taken and how it differs from the analysis in Fig. 6.
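
      For readers who wish to run the size-matched control discussed above, the following sketch shows one way Monte Carlo subsampling of a pseudo-population could be set up. It is not part of our analysis pipeline: the function name `size_matched_scores`, the number of draws, and the `score_fn` callable (which stands in for training and evaluating a decoder on the selected neurons) are hypothetical.

```python
import random


def size_matched_scores(neuron_ids, subset_size, score_fn, n_draws=100, seed=0):
    """Score random subsets of the full pseudo-population, matched in size.

    neuron_ids  : identifiers of all neurons in the pooled population
    subset_size : number of neurons in the patient or region to match
    score_fn    : hypothetical callable mapping a list of neuron ids to a
                  decoding score (e.g. by training a decoder on those units)
    Returns one score per random draw (sampling without replacement).
    """
    if subset_size > len(neuron_ids):
        raise ValueError("subset_size exceeds the population size")
    rng = random.Random(seed)
    return [score_fn(rng.sample(neuron_ids, subset_size)) for _ in range(n_draws)]


# Toy usage: the "score" is just the subset size, to show the call pattern.
toy_scores = size_matched_scores(list(range(50)), subset_size=12, score_fn=len, n_draws=5)
```

      The distribution of scores across draws would then serve as the reference against which a single patient's or region's decoding performance is compared.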

      Reviewer #2 (Public review):

      Summary:

      This study introduces an exciting dataset of single-unit responses in humans during a naturalistic and dynamic movie stimulus, with recordings from multiple regions within the medial temporal lobe. The authors use both a traditional firing-rate analysis as well as a sophisticated decoding analysis to connect these neural responses to the visual content of the movie, such as which character is currently on screen.

      Strengths:

      The results reveal some surprising similarities and differences between these two kinds of analyses. For visual transitions (such as camera angle cuts), the neurons identified in the traditional response analysis (looking for changes in firing rate of an individual neuron at a transition) were the most useful for doing population-level decoding of these cuts. Interestingly, this wasn't true for character decoding; excluding these "responsive" neurons largely did not impact population-level decoding, suggesting that the population representation is distributed and not well-captured by individual-neuron analyses.

      The methods and results are well-described both in the text and in the figures. This work could be an excellent starting point for further research on this topic to understand the complex representational dynamics of single neurons during naturalistic perception.

      We thank the reviewer for their feedback and for summarizing the results of our work.

      (1) I am unsure what the central scientific questions of this work are, and how the findings should impact our understanding of neural representations. Among the questions listed in the introduction is "Which brain regions are informative for specific stimulus categories?". This is a broad research area that has been addressed in many neuroimaging studies for decades, and it's not clear that the results tell us new information about region selectivity. "Is the relevant information distributed across the neuronal population?" is also a question with a long history of work in neuroscience about localist vs distributed representations, so I did not understand what specific claim was being made and tested here. Responses in individual neurons were found for all features across many regions (e.g., Table S1), but decodable information was also spread across the population.

      We thank the reviewer for this important point, which gets to the core of our study's contribution. While concepts like regional specificity are well-established from studies on the blood-flow level, their investigation at the single-neuron level in humans during naturalistic, dynamic stimulation remains a critical open question. The type of coding (sparse vs. distributed) on the other hand cannot be investigated with blood-flow studies as the technology lacks the spatial and temporal resolution.

      Our study addresses this gap directly. The exceptional temporal resolution of single-neuron recordings allows us to move beyond traditional paradigms and examine cellular-level dynamics as they unfold in neuronal response on a frame-by-frame basis to a more naturalistic and ecologically valid stimulus. It cannot be assumed that findings from other modalities or simplified stimuli will generalize to this context.

      To meet this challenge, we employed a dual analytical strategy: combining a classic single-unit approach with a machine learning-based population analysis. This allowed us to create a bridge between prior work and our more naturalistic data. A key result is that our findings are often consistent with the existing literature, which validates the generalizability of those principles. However, the differences we observe between these two analytical approaches are equally informative, providing new insights into how the brain processes continuous, real-world information.

      We will revise the introduction and discussion to more explicitly frame our work in this context, emphasizing the specific scientific question driving this study, while also highlighting the strengths of our experimental design and recording methods.

      (2) The character and indoor/outdoor labels seem fundamentally different from the scene/camera cut labels, and I was confused by the way that the cuts were put into the decoding framework. The decoding analyses took a 1600ms window around a frame of the video (despite labeling these as frame "onsets" like the feature onsets in the responsive-neuron analysis, I believe this is for any frame regardless of whether it is the onset of a feature), with the goal of predicting a binary label for that frame. Although this makes sense for the character and indoor/outdoor labels, which are a property of a specific frame, it is confusing for the cut labels since these are inherently about a change across frames. The way the authors handle this is by labeling frames as cuts if they are in the 520ms following a cut (there is no justification given for this specific value). Since the input to a decoder is 1600ms, this seems like a challenging decoding setup; the model must respond that an input is a "cut" if there is a cut-specific pattern present approximately in the middle of the window, but not if the pattern appears near the sides of the window. A more straightforward approach would be, for example, to try to discriminate between windows just after a cut versus windows during other parts of the video. It is also unclear how neurons "responsive" to cuts were defined, since the authors state that this was determined by looking for times when a feature was absent for 1000ms to continuously present for 1000ms, which would never happen for cuts (unless this definition was different for cuts?).

      We thank the reviewer for the valuable comment regarding specifically the cut labels. The choice to label frames that lie in a time window of 520ms following a cut as positive was based on prior research and is intended to include the response onsets across all regions within the MTL (Mormann et al., 2008). We agree that this explanation is currently missing from the manuscript, and we will add a brief clarification in the revised version.

      As correctly noted, the decoding analysis does not rely on feature onset but instead continuously decodes features throughout the entire movie. Thus, all frames are included, regardless of whether they correspond to a feature onset.

      Our treatment of cut labels as sustained events is a deliberate methodological choice. Neural responses to events like cuts often unfold over time, and by extending the label, we provide our LSTM network with the necessary temporal window to learn this evolving signature. This approach not only leverages the sequential processing strengths of the LSTM (Hochreiter & Schmidhuber, 1997) but also ensures a consistent analytical framework for both event-based (cuts) and state-based (character or location) features.
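
      The labeling scheme can be illustrated with a short sketch. The 520 ms post-cut window is taken from the text above; the frame duration of 40 ms, the convention that the cut frame itself counts as positive, and the function name `label_cut_frames` are assumptions made only for this example.

```python
def label_cut_frames(n_frames, cut_frames, frame_ms=40.0, post_cut_ms=520.0):
    """Return a 0/1 label per frame, marking frames in the window after a cut.

    n_frames    : total number of frames in the movie
    cut_frames  : indices of frames at which a cut occurs
    frame_ms    : frame duration in ms (assumed value, not from the manuscript)
    post_cut_ms : length of the positive window following each cut
    The cut frame itself is counted as the first positive frame (assumption).
    """
    labels = [0] * n_frames
    n_positive = int(post_cut_ms // frame_ms)
    for cut in cut_frames:
        for frame in range(cut, min(cut + n_positive, n_frames)):
            labels[frame] = 1
    return labels


# With 40 ms frames, a cut at frame 5 marks frames 5 through 17 as positive.
example_labels = label_cut_frames(n_frames=30, cut_frames=[5])
```

      Every frame then carries a label, so cut features can be decoded continuously across the movie in the same way as character or location features.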

      (3) The architecture of the decoding model is interesting but needs more explanation. The data is preprocessed with "a linear layer of same size as the input" (is this a layer added to the LSTM that is also trained for classification, or a separate step?), and the number of linear layers after the LSTM is "adapted" for each label type (how many were used for each label?). The LSTM also gets to see data from 800 ms before and after the labeled frame, but usually LSTMs have internal parameters that are the same for all timesteps; can the model know when the "critical" central frame is being input versus the context, i.e., are the inputs temporally tagged in some way? This may not be a big issue for the character or location labels, which appear to be contiguous over long durations and therefore the same label would usually be present for all 1600ms, but this seems like a major issue for the cut labels since the window will include a mix of frames with opposite labels.

      We thank the reviewer for their insightful comments regarding the decoding architecture. The model consists of an LSTM followed by 1–3 linear readout layers, where the exact number of layers is treated as a hyperparameter and selected based on validation performance for each label type. The initial linear layer applied to the input is part of the trainable model and serves as a projection layer to transform the binned neural activity into a suitable feature space before feeding it into the LSTM. The model is trained in an end-to-end fashion on the classification task.

      Regarding temporal context, the model receives a 1600 ms window (800 ms before and after the labeled frame), and as correctly pointed out by the reviewer, LSTM parameters are shared across time steps. We do not explicitly tag the temporal position of the central frame within the sequence. While this may have limited impact for labels that persist over time (e.g., characters or locations), we agree this could pose a challenge for cut labels, which are more temporally localized.

      We thank the reviewer for this important point. We will clarify this limitation in the revised manuscript and note that positional encoding is a valuable direction for future work to better guide the model’s focus within the temporal window. Hyperparameters were optimized for each feature and split individually; to improve methodological transparency, we will also add a supplementary table detailing the hyperparameter ranges used in our optimization process.

      (4) Because this is a naturalistic stimulus, some labels are very imbalanced ("Persons" appears in almost every frame), and the labels are correlated. The authors attempt to address the imbalance issue by oversampling the minority class during training, though it's not clear this is the right approach since the test data does not appear to be oversampled; for example, training the Persons decoder to label 50% of training frames as having people seems like it could lead to poor performance on a test set with nearly 100% Persons frames, versus a model trained to be biased toward the most common class. [...]

      We thank the reviewer for this critical and thoughtful comment. We agree that the imbalanced and correlated nature of labels in naturalistic stimuli is a key challenge.

      To address this, we follow a standard machine learning practice: oversampling is applied exclusively to the training data. This technique helps the model learn from underrepresented classes by creating more balanced training batches, thus preventing it from simply defaulting to the majority class. Crucially, the test set remains unaltered to ensure our evaluation reflects the model's true generalization performance on the natural data distribution.

      For the “Persons” feature, which appears in nearly all frames, defining a meaningful negative class is particularly challenging. The decoder must learn to identify subtle variations within a highly skewed distribution. Oversampling during training helps provide a more balanced learning signal, while keeping the test distribution intact ensures proper evaluation of generalization.

      The reviewer’s comment—that we are “training the Persons decoder to label 50% of training frames as having people”—may suggest that labels were modified. We want to emphasize this is not the case. Our oversampling strategy does not alter the labels; it simply increases the exposure of the rare, underrepresented class during training to ensure the model can learn its pattern despite its low frequency.

      We will revise the Methods section to describe this standard procedure more explicitly, clarifying that oversampling is a training-only strategy to mitigate class imbalance.
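
      The training-only oversampling described above can be illustrated with a minimal sketch. This is not the authors' code: the function name `oversample_minority`, the toy data, and the choice of duplicating samples at random are all assumptions made for illustration. The point it shows is that minority-class samples are repeated, labels are never modified, and the test split is left as-is.

```python
import random

def oversample_minority(features, labels, seed=0):
    """Return a training set in which both binary classes are equally frequent.

    Minority-class samples are duplicated at random; no label is changed.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for x, y in zip(features, labels):
        by_class[y].append((x, y))
    target = max(len(samples) for samples in by_class.values())
    balanced = []
    for samples in by_class.values():
        balanced.extend(samples)
        if samples:  # skip a class that has no examples at all
            extra = target - len(samples)
            balanced.extend(rng.choice(samples) for _ in range(extra))
    rng.shuffle(balanced)
    xs = [x for x, _ in balanced]
    ys = [y for _, y in balanced]
    return xs, ys

# Hypothetical toy data: label 1 (e.g. "Persons" present) dominates.
train_x = list(range(10))
train_y = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
test_x, test_y = [10, 11, 12], [1, 1, 1]  # never resampled

bal_x, bal_y = oversample_minority(train_x, train_y)
```

      Because each duplicated sample keeps its original label, the decoder sees the rare class more often without any frame being relabeled; evaluation on `test_x`, `test_y` still reflects the natural class distribution.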

      (5) Are "responsive" neurons defined as only those showing firing increases at a feature onset, or would decreased activity also count as responsive? If only positive changes are labeled responsive, this would help explain how non-responsive neurons could be useful in a decoding analysis.

      We define responsive neurons as those showing increased firing rates at feature onset; we did not test for decreases in activity. We thank the reviewer for this valuable comment and will address this point in the revised manuscript by assessing responsiveness without a restriction on the direction of the firing rate change.

      (6) Line 516 states that the scene cuts here are analogous to the hard boundaries in Zheng et al. (2022), but the hard boundaries are transitions between completely unrelated movies rather than scenes within the same movie. Previous work has found that within-movie and across-movie transitions may rely on different mechanisms, e.g., see Lee & Chen, 2022 (10.7554/eLife.73693).

      We thank the reviewer for pointing out this distinction and for including the relevant work from Lee & Chen (2022), which further contextualizes it. Indeed, the hard boundaries defined in the cited paper differ slightly from ours. The study distinguishes between (1) hard boundaries—transitions between unrelated movies—and (2) soft boundaries—transitions between related events within the same movie. While our camera cuts resemble their soft boundaries, our scene cuts do not fully align with either category. We defined scene cuts to be more similar to the study’s hard boundaries, but we recognize this correspondence is not exact. We will clarify the distinctions between our scene cuts and the hard boundaries described in Zheng et al. (2022) in the revised manuscript, and will update our text to include the finding from Lee & Chen (2022).

      Reviewer #3 (Public review):

      This is an excellent, very interesting paper. There is a groundbreaking analysis of the data, going from typical picture presentation paradigms to more realistic conditions. I would like to ask the authors to consider a few points in the comments below.

      (1) From Figure 2, I understand that there are 7 neurons responding to the character Summer, but then in line 157, we learn that there are 46. Are the other 39 from other areas (not parahippocampal)? If this is the case, it would be important to see examples of these responses, as one of the main claims is that it is possible to decode as good or better with non-responsive compared to single responsive neurons, which is, in principle, surprising.

      We thank the reviewer for pointing out this ambiguity in the text. Yes, the other 39 units are responsive neurons from other areas. We will clarify to which neuronal sets the number of responsive neurons corresponds. We will also include response plots depicting the unit activity for the mentioned units.

      (2) Also in Figure 2, there seem to be relatively very few neurons responding to Summer (1.88%) and to outdoor scenes (1.07%). Is this significant? Isn't it also a bit surprising, particularly for outdoor scenes, considering a previous paper of Mormann showing many outdoor scene responses in this area? It would be nice if the authors could comment on this.

      We thank the reviewer for this insightful point. While a low response to the general 'outdoor scene' label seems surprising at first, our findings align with the established role of the parahippocampal cortex (PHC) in processing scenes and spatial layouts. In previous work using static images, each image introduces a new spatial context. In our movie stimulus, new spatial contexts specifically emerge at scene cuts. Accordingly, our data show a strong PHC response precisely at these moments. We will revise the discussion to emphasize this interpretation, highlighting the consistency with prior work.

      Regarding the first comment, we did not originally test if the proportion of the units is significant using e.g. a binomial test. We will include the results of a binomial test for each region and feature pair in the revised manuscript.
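
      A one-sided binomial test of this kind could be sketched as follows. The counts and the chance level below are hypothetical placeholders, not values from the manuscript, and the helper name `binomial_sf` is an assumption; the sketch only shows how the probability of observing at least `k` responsive units among `n` recorded units would be computed under a null response rate `p`.

```python
from math import comb

def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    Used here as a one-sided test for an excess of responsive units
    relative to the chance rate p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical example: 12 responsive units out of 400 recorded units,
# tested against an assumed chance (false-positive) rate of 1%.
n_units, n_responsive, chance_rate = 400, 12, 0.01
p_value = binomial_sf(n_responsive, n_units, chance_rate)
```

      In practice the chance rate would follow from the significance threshold of the per-unit response criterion, and the resulting p-values would need correction for the number of region and feature pairs tested.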

      (3) I was also surprised to see that there are many fewer responses to scene cuts (6.7%) compared to camera cuts (51%) because every scene cut involves a camera cut. Could this have been a result of the much larger number of camera cuts? (A way to test this would be to subsample the camera cuts.)

      The decrease in responsive units for scene cuts relative to camera cuts could indeed be due to the overall decrease in “trials” from one label to the other. To test this, we will follow the reviewer’s suggestion and perform tests using sets of randomly subsampled camera cuts and will include the results in the revised manuscript.
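
      One way such a subsampling control might be set up is sketched below. The event times, the number of repeats, and the placeholder criterion are hypothetical; the actual responsiveness test is the one defined in the manuscript and is represented here only by a user-supplied function `is_responsive`.

```python
import random

def subsample_events(event_times, n_keep, n_repeats=100, seed=0):
    """Draw n_repeats random subsets of n_keep events, without replacement."""
    rng = random.Random(seed)
    return [sorted(rng.sample(event_times, n_keep)) for _ in range(n_repeats)]

def fraction_responsive(subsamples, is_responsive):
    """Fraction of subsamples for which a unit passes the response criterion.

    is_responsive is a user-supplied function: list of event times -> bool.
    """
    return sum(1 for s in subsamples if is_responsive(s)) / len(subsamples)

# Hypothetical toy data: 50 camera cuts, matched down to 5 "scene cuts".
camera_cuts = [float(t) for t in range(0, 500, 10)]
n_scene_cuts = 5
subsamples = subsample_events(camera_cuts, n_scene_cuts, n_repeats=20)

# Placeholder criterion for illustration only; replace with the real test.
frac = fraction_responsive(subsamples, lambda events: len(events) >= 5)
```

      If the proportion of camera-cut-responsive units drops toward the scene-cut proportion once the event counts are matched, the difference would be attributable to statistical power rather than to the nature of the transition.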

      (4) Line 201. The analysis of decoding on a per-patient basis is important, but it should be done on a per-session basis - i.e., considering only simultaneously recorded neurons, without any pooling. This is because pooling can overestimate decoding performances (see e.g. Quian Quiroga and Panzeri NRN 2009). If there was only one session per patient, then this should be called 'per-session' rather than 'per-patient' to make it clear that there was no pooling.

      The per-patient decoding was indeed also a per-session decoding, as each patient contributed only a single session to the dataset. We will make note of this explicitly in the text to resolve the ambiguity.

      (6) Lines 406-407. The claim that stimulus-selective responses to characters did not account for the decoding of the same character is very surprising. If I understood it correctly, the response criterion the authors used gives 'responsiveness' but not 'selectivity'. So, were people's responses selective (e.g., firing only to Summer) or non-selective (firing to a few characters)? This could explain why they didn't get good decoding results with responsive neurons. Again, it would be nice to see confusion matrices with the decoding of the characters. Another reason for this is that what are labelled as responsive neurons have relatively weak and variable responses.

      We thank the reviewer for pointing out the importance of selectivity in addition to responsiveness. Indeed, our response criterion does not take stimulus selectivity into account and exclusively measures increases in firing activity after feature onsets for a given feature irrespective of other features.

      We will adjust the text to reflect this shortcoming of the response-detection approach used here. To clarify the relationship between neural populations, we will add visualizations of the overlap of responsive neurons across labels for each subregion. These figures will be included in the revised manuscript.

      In our approach, we trained separate networks for each feature to effectively mitigate the issue of correlated feature labels within the dataset (see earlier discussion). While this strategy effectively deals with the correlated features, it precluded the generation of standard confusion matrices, as classification was performed independently for each feature.

      To directly assess the feature selectivity of responsive neurons, we will fit generalized linear models to predict their firing rates from the features. This approach will enable us to quantify their selectivity and compare it to that of the broader neuronal population.

      (7) Line 455. The claim that 500 neurons drive decoding performance is very subjective. 500 neurons gives a performance of 0.38, and 50 neurons gives 0.33.

      We agree with the reviewer that the phrasing is unclear. We will adjust our summary of this analysis as given in Line 455 to reflect that the logistic regression-derived neuronal rankings produce a subset that achieves comparable performance.

      (8) Lines 492-494. I disagree with the claim that "character decoding does not rely on individual cells, as removing neurons that responded strongly to character onset had little impact on performance". I have not seen strong responses to characters in the paper. In particular, the response to Summer in Figure 2 looks very variable and relatively weak. If there are stronger responses to characters, please show them to make a convincing argument. It is fine to argue that you can get information from the population, but in my view, there are no good single-cell responses (perhaps because the actors and the movie were unknown to the subjects) to make this claim. Also, an older paper (Quian Quiroga et al J. Neurophysiol. 2007) showed that the decoding of individual stimuli in a picture presentation paradigm was determined by the responsive neurons and that the non-responsive neurons did not add any information. The results here could be different due to the use of movies instead of picture presentations, but most likely due to the fact that, in the picture presentation paradigm, the pictures were of famous people for which there were strong single neuron responses, unlike with the relatively unknown persons in this paper.

      This is an important point and we thank the reviewer for highlighting a previous paradigm in which responsive neurons did drive decoding performance. Indeed, the fact that the movie, its characters and the corresponding actors were novel to patients could explain the disparity in decoding performance by way of weaker and more variable responses. We will include additional examples in the supplement of responses to features. Additionally, we will modify the text to emphasize the point that reliable decoding is possible even in the absence of a robust set of neuronal responses. It could indeed be the case that a decoder would place more weight on responsive units if they were present (as shown in the mentioned paper and in our decoding from visual transitions in the parahippocampal cortex).

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors explore a novel concept: GPCR-mediated regulation of miRNA release via extracellular vesicles (EVs). They perform an EV miRNA cargo profiling approach to investigate how specific GPCR activations influence the selective secretion of particular miRNAs. Given that GPCRs are highly diverse and orchestrate multiple cellular pathways - either independently or collectively - to regulate gene expression and cellular functions under various conditions, it is logical to expect alterations in gene and miRNA expression within target cells.

      Strengths:

      The novel idea of GPCRs-mediated control of EV loading of miRNAs.

      Weaknesses:

      Incomplete findings failed to connect and show evidence of any physiological parameters that are directly related to the observed changes. The mechanical detail is lacking.

      We appreciate the reviewer's acknowledgment of the novelty of this study. We agree with the reviewer that further mechanistic insights would strengthen the manuscript. The mechanisms by which miRNA is sorted into EVs remain poorly understood. Various factors, including RNA-binding proteins, sequence motifs, and cellular location, can influence this sorting process (Garcia-Martin et al., 2022; Liu & Halushka, 2025; Villarroya-Beltri et al., 2013; Yoon et al., 2015). Ago2, a key component of the RNA-induced silencing complexes, binds to miRNA and facilitates miRNA sorting. Ago2 has been found in EVs and can be regulated by cellular signaling pathways. For instance, McKenzie et al. demonstrated that KRAS-dependent activation of MEK-ERK can phosphorylate Ago2 protein, thereby regulating the sorting of specific miRNAs into EVs (McKenzie et al., 2016). In differentiated PC12 cells, Gαq activation leads to the formation of Ago2-associated granules, which selectively sequester unique transcripts (Jackson et al., 2022). Investigating the effects of GPCRs, G proteins, and GPCR signaling on Ago2 expression, location, and phosphorylation states could provide valuable insights into how GPCRs regulate specific miRNAs within EVs. We have expanded on these potential mechanisms and future research in the discussion section.

      The manuscript falls short of providing a comprehensive understanding. Identifying changes in cellular and EV-associated miRNAs without elucidating their physiological significance or underlying regulatory mechanisms limits the study's impact. Without demonstrating whether these miRNA alterations have functional consequences, the findings alone are insufficient. The findings may be suitable for more specialized journals.

      Thank you for the feedback. We acknowledge that validating the target genes of the top candidate miRNAs is an important next step. In response to the reviewer's concerns, we have expanded the discussion of future research in the manuscript. Although this initial study is primarily descriptive, it establishes a novel conceptual link between GPCR signaling and EV-mediated communication.

      Furthermore, a critical analysis of the relationship between cellular miRNA levels and EV miRNA cargo is essential. Specifically, comparing the intracellular and EV-associated miRNA pools could reveal whether specific miRNAs are preferentially exported, a behavior that should be inversely related to their cellular abundance if export serves a beneficial function by reducing intracellular levels. This comparison is vital to strengthen the biological relevance of the findings and support the proposed regulatory mechanisms by GPCRs.

      We appreciate the valuable suggestions from the reviewer. EV miRNAs and cellular miRNAs may exhibit distinct profiles, as miRNAs can be selectively sorted into or excluded from EVs (Pultar et al., 2024; Teng et al., 2017; Zubkova et al., 2021). Investigating the difference between cellular miRNA levels and EV miRNA cargo would provide insight into the mechanism of miRNA sorting and the functions of miRNAs in the recipient cells. The expression of cellular miRNAs is a highly dynamic process. To accurately compare miRNA expression levels, profiling of EV miRNA and cellular miRNA should be conducted simultaneously. However, because this was a pilot study, we were unable to measure the cellular miRNAs without conducting the entire experiment again.

      Reviewer #2 (Public review):

      Summary:

      This study examines how activating specific G protein-coupled receptors (GPCRs) affects the microRNA (miRNA) profiles within extracellular vesicles (EVs). The authors seek to identify whether different GPCRs produce unique EV miRNA signatures and what these signatures could indicate about downstream cellular processes and pathological processes.

      Methods:

      (1) Used U2OS human osteosarcoma cells, which naturally express multiple GPCR types.

      (2) Stimulated four distinct GPCRs (ADORA1, HRH1, FZD4, ACKR3) using selective agonists.

      (3) Isolated EVs from culture media and characterized them via size exclusion chromatography, immunoblotting, and microscopy.

      (4) Employed qPCR-based miRNA profiling and bioinformatics analyses (e.g., KEGG, PPI networks) to interpret expression changes.

      Key Findings:

      (1) No significant change in EV quantity or size following GPCR activation.

      (2) Each GPCR triggered a distinct EV miRNA expression profile.

      (3) miRNAs differentially expressed post-stimulation were linked to pathways involved in cancer, insulin resistance, neurodegenerative diseases, and other physiological/pathological processes.

      (4) miRNAs such as miR-550a-5p, miR-502-3p, miR-137, and miR-422a emerged as major regulators following specific receptor activation.

      Conclusions:

      The study offers evidence that GPCR activation can regulate intercellular communication through miRNAs encapsulated within extracellular vesicles (EVs). This finding paves the way for innovative drug-targeting strategies and enhances understanding of drug side effects that are mediated via GPCR-related EV signaling.

      Strengths:

      (1) Innovative concept: The idea of linking GPCR signaling to EV miRNA content is novel and mechanistically important.

      (2) Robust methodology: The use of multiple validation methods (biochemical, biophysical, and statistical) lends credibility to the findings.

      (3) Relevance: GPCRs are major drug targets, and understanding off-target or systemic effects via EVs is highly valuable for pharmacology and medicine.

      Weaknesses:

      (1) Sample Size & Scope: The analysis included only four GPCRs. Expanding to more receptor types or additional cell lines would enhance the study's applicability.

      We are encouraged that the reviewer recognized the novelty, methodological rigor, and significance of our work. We recognize the limitations of our current model system and emphasize the need to test additional GPCR families and cell lines in future studies, as detailed in the discussion section.

      (2) Exploratory Nature: This study is primarily descriptive and computational. It lacks functional validation, such as assessing phenotypic effects in recipient cells, which is acknowledged as a future step.

      We appreciate the feedback. We recognize the importance of validating the function of the top candidate miRNAs in the recipient cells, and this will be included in future studies. 

      (3) EV heterogeneity: The authors recognize that they did not distinguish EV subpopulations, potentially confounding the origin and function of miRNAs.

      Thank you for the comment. EV isolation and purification are major challenges in EV research. Current isolation techniques are often ineffective at separating vesicles produced by different biogenesis pathways. Furthermore, the lack of specific markers to differentiate EV subtypes adds to this complexity. We recognize that the presence of various subpopulations can complicate the interpretation of EV cargos. In our study, we used a combined approach of ultrafiltration followed by size-exclusion chromatography to achieve a balance between EV purity and yield. We adhere to the MISEV (Minimal Information for Studies of Extracellular Vesicles 2023) guidelines by reporting detailed isolation methods, assessing both positive and negative protein markers, and characterizing EVs by electron microscopy to confirm vesicle structure, as well as nanoparticle tracking analysis to verify particle size distribution (Welsh et al., 2024). Following these guidelines supports the quality of our study and makes our findings easier to compare with other studies.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Felipe and colleagues try to answer an important question in Sarbecovirus Orf9b-mediated interferon signaling suppression, given that this small viral protein adopts two distinct conformations, a dimeric β-sheet-rich fold and a helix-rich monomeric fold when bound by Tom70 protein. Two Orf9b structures determined by X-ray crystallography and Cryo-EM suggest an equilibrium between the two Orf9b conformations, and it is important to understand how this equilibrium relates to its functions. To answer these questions, the authors developed a series of ordinary differential equations (ODE) describing the Orf9b conformation equilibrium between homodimers and monomers binding to Tom70. They used SPR and a fluorescent polarization (FP) peptide displacement assay to identify parameters for the equilibrium and create a theoretical model. They then used the model to characterize the effect of lipid-binding and the effects of Orf9b mutations in homodimer stability, lipid binding, and dimer-monomer equilibrium. They used their model to further analyze dimerization, lipid binding, and Orf9b-Tom70 interactions for truncated Orf9b, Orf9b fusion mutant S53E (blocking Tom70 binding), and Orf9b from a set of Sars-CoV-2 VOCs. They evaluated the ability of different Orf9b variants for binding Tom70 using Co-IP experiments and assessed their activity in suppressing IFN signaling in cells.

      Overall, this work is well designed, the results are of high quality and well-presented; the results support their conclusions.

      We thank reviewer #1 for their thoughtful assessment of our work and their constructive feedback.

      Strengths:

      (1) They developed a working biophysical model for analyzing Orf9b monomer-dimer equilibrium and Tom70 binding based on SPR and FP experiments; this is an important tool for future investigation.

      (2) They prepared lipid-free Orf9b homodimer and determined its crystal structure.

      (3) They designed and purified obligate Orf9b monomer, fused-dimer, etc., a very important Orf9b variant for further investigations.

      (4) They identified the lipid bound by Orf9b homodimer using mass spectra data.

      (5) They proposed a working model of Orf9b-Tom70 equilibrium.

      Weaknesses:

      (1) It is difficult to understand why the obligate Orf9b dimer has similar IFN inhibition activity as the WT protein and obligate Orf9b monomer truncations.

      We thank the reviewer for this observation. We agree that the IFN results for the obligate homodimer were not what we expected given our FP kinetic results with the purified obligate homodimer, and we noted our surprise in the discussion. We propose two possible hypotheses for why this is the case.

      In our discussion, we noted the possible introduction of an increased avidity effect with fused homodimer and have improved it as follows with additions in red:

      “This result was unexpected as we had anticipated the obligate homodimer results to resemble the phosphomimetic. We hypothesize that this may be explained by two possible factors. First, we can’t exclude the introduction of an increased avidity between Orf9b and Tom70 when using the fused homodimer. Although our modeled decrease in the association rate of Orf9b:Tom70 (which increases the K<sub>D</sub> of the complex) suggests that fusing two copies of Orf9b decreases the affinity to Tom70, one copy of the fusion construct could also be capable of either binding to two copies of Tom70, or, one copy of the fusion could undergo rapid rebinding to Tom70. These effects would lead to a much tighter interaction in cellular assays than we modeled in vitro. A second possible explanation is that our assumptions about high lipid binding are not valid for cell based assays.”

      We also noted that a second possible explanation is due to our limitations in isolating the apo-fused homodimer to compare to the lipid-bound fused homodimer and possible differences this could have on our assays and briefly expanded upon this. Again, we improved this with additions in red:

      “As we have shown with both WT and fusion constructs, recombinantly expressed and purified Orf9b is lipid-bound and this can stabilize the homodimer to slow or inhibit the binding to Tom70. For the Orf9b fusion construct, we attempted to isolate the lipid-free species through protein refolding as previously described to compare the effect of lipid-binding on the homodimer fusion (similar to our WT experiments); however, we could not recover the stably folded homodimer. We hypothesize that the discrepancy between our kinetic results and Co-IP/IFN results could be due to subsaturation of the Orf9b fusion homodimers by lipids in cell based assays. While we have shown that lipid-binding occurs in recombinant expression systems, it is possible that in our cell based signaling assays that lipid-binding only affects a minor population of Orf9b. Given that we were unable to isolate the apo-fusion homodimer, we could not directly compare whether there are differences in fusion homodimer stability in the presence or absence of lipid-binding. Therefore, it is possible that the apo-fusion homodimer undergoes unfolding and refolding into alpha helices that lead to Tom70 binding similar to the WT construct.”

      (2) The role of Orf9b homodimer and the role of Orf9b-bound lipid in virus infection, remains unknown.

      We agree that we did not try to directly test for the role of the homodimer during infection and this remains an open area of exploration for future studies. We have included this caveat in our discussion but suggested possible experiments and future directions that could help shed light on this:

      “Although we have not directly tested for the role the homodimer conformation plays during infection, we have demonstrated that lipid-binding to the homodimer can bias the equilibrium away from Tom70. Lipids including palmitate have been shown to act as both a signaling molecule as well as a post-translational modification during antiviral innate immune signaling (S Mesquita et al. 2024; Wen et al. 2022; S. Yang et al. 2019). As a post-translational modification (referred to as S-acylation), MAVS, a mitochondrial type 1 IFN signaling protein that associates with Tom70 (X.-Y. Liu et al. 2010; McWhirter, Tenoever, and Maniatis 2005; Seth et al. 2005), has been shown to be post-translationally palmitoylated which affects its ability to localize to the mitochondrial outer membrane during viral infection and is a known target of Orf9b (Bu et al. 2024; Lee et al. 2024). When this is impaired (either by mutation or by depletion of the palmitoylation enzyme ZDHHC24), IFN activation is impaired (Bu et al. 2024). Therefore, future investigations should consider if the homodimer conformation of Orf9b is capable of antagonizing other IFN signaling factors such as MAVS by binding to palmitoyl groups. Indeed, Orf9b has already been shown to be capable of binding to MAVS by Co-IP (Han et al. 2021), however, whether or not this occurs through the palmitoyl modification remains unknown.”

      Reviewer #2 (Public review):

      Summary:

      This study focuses on Orf9b, a SARS-COV1/2 protein that regulates innate signaling through interaction with Tom70. San Felipe et al use a combination of biophysical methods to characterize the coupling between lipid-binding, dimerization, conformational change, and protein-protein-interaction equilibria for the Orf9b-Tom70 system. Their analysis provides a detailed explanation for previous observations of Orf9b function. In a cellular context, they find other factors may also be important for the biological functioning of Orf9b.

      Strengths:

      San Felipe et al elegantly combine structural biology, biophysics, kinetic modelling, and cellular assays, allowing detailed analysis of the Orf9b-Tom70 system. Such complex systems involving coupled equilibria are prevalent in various aspects of biology, and a quantitative description of them, while challenging, provides a detailed understanding and prediction of biological outcomes. Using SPR to guide initial estimates of the rate constants for solution measurements is an interesting approach.

      Weaknesses:

      This study would benefit from a more quantitative description of uncertainties in the numerous rate constants of the models, either through a detailed presentation of the sensitivity analysis or another approach such as MCMC. Quantitative uncertainty analysis, such as MCMC, is not trivial for ODEs, particularly when they involve many parameters and are to be fitted to numerous data points, as is the case for this study. The authors use sensitivity analysis as an alternative; however, the results of the sensitivity analysis are not presented in detail, and I believe the authors should consider whether there is a way to present this analysis more quantitatively. For example, could the residuals for each +/-10% parameter change for the peptide model be presented as a supplementary figure, and similarly for the more complex models? Further details of the range of rate constants tested would be useful, particularly for the ka and kB parameters.

      We thank the reviewer for their constructive feedback. We have generated supplemental figures providing a deeper analysis of the residuals for each model parameter adjusted +/-10% from the reported values, which we have added as Figure 1 - Supplemental 3 and Figure 4 - Supplemental 5.

      We note that there are modest improvements in the residual plots when model parameters are individually lowered by 10% from their reported values for this single dataset; however, we chose the reported values because they improved model behavior across multiple concentration series in different datasets. To illustrate this, we have also included the RMSD values for each model parameter subjected to a +/-10% change, calculated from a single concentration time course, along with the percent change in RMSD relative to the RMSD generated by our reported model parameters. We have also included text that notes the observed pattern in the residuals from Figure 4 - Supplemental 5 and provides some explanations for why this may occur.

      “Inspection of the residuals from the 5uM apo-Orf9b homodimer time course showed clear patterns when individual model parameters were subjected to a 10% increase or decrease from the reported values. While our proposed model qualitatively describes the concentration dependent change in kinetic behavior, the residual plots may suggest that additional binding reactions may also be occurring that are not captured by our model.”

      Figure 1 - Supplemental 3. Plots of residuals from Orf9b peptide model showing effect of an increase or decrease by 10% on each model parameter. All residuals and reporting are with respect to the 100uM unlabeled Orf9b peptide condition. Blue dots: reported value. Red dots: 10% increase in reported value. Green dots: 10% decrease in reported value. Table reporting of RMSD values for model fits after +/-10% change to model parameter (Left column) and percent change in RMSD relative to reported model RMSD (Right column).

      “As an alternative to attempting to place CIs on the parameters, we performed sensitivity analysis to determine which parameters the model was most sensitive to (see methods and Figure 1 - Supplemental 3). Additionally, we note that the model parameters were derived from the fit of only one concentration (100uM), but fit the other concentrations equally well. We observed that the model parameter that was most sensitive to change was the rate of Orf9b-FITC:Tom70 ([PT]) dissociation when subjected to a 10% increase or decrease whereas all other model parameters showed no sensitivity to change (Figure 1 - Supplemental 3).”

      Figure 4 - Supplemental 5: Plot of residuals showing the effect of increasing or decreasing individual model parameters 10% compared to the reported values. All residual plots are with respect to the 5uM apo-Orf9b homodimer condition. Blue dots: reported value. Red dot: 10% increase in reported value. Green dot: 10% decrease in reported value. (Left columns) Table of RMSD values calculated from model fits showing the effect of both +/-10% change to individual model parameters. (Right columns) Percent change in RMSD values subjected to +/-10% change for individual model parameters relative to the RMSD of the reported model.

      We have also included the following revised text to accompany this figure.

      “Further, we repeated the sensitivity analysis described previously for the peptide model and also considered the sensitivity of model parameters by inspecting each individually (Figure 4 - figure supplemental 5). We found that when examining the residuals of the lowest concentration of 5uM, the model was most sensitive to changes in three parameters: the rate of homodimer association and dissociation and the conversion from β to α-monomers.”

      “Therefore, under low concentrations of Orf9b homodimer, binding to Tom70 is limited by the rate of homodimer association and dissociation as well as the conversion of Orf9b monomers to the α-helical conformation.”

      We have also included a supplemental figure, Figure 4 - Supplemental 4, showing how changes in the model parameters ka and kB affect the model's behavior, to help illustrate the range of values tested.

      Figure 4 - Supplemental 4: Plots of model behavior showing the effect of changes to alpha-beta and beta-alpha monomer interconversion rates compared to experimental values. Data is modeled with respect to the apo-Orf9b homodimer 5uM condition. Black line represents reported model fit and values used.

      We have also incorporated the following revised text.

      “The model parameters k<sub>a</sub> and k<sub>B</sub> describe the rate of interchange between the β-sheet and α-helix monomer conformations. These parameters must be estimated by modeling because our assays do not allow us to directly measure the folding rates between these conformations. To identify these values, we performed a scan of k<sub>a</sub> and k<sub>B</sub> values that yielded the best agreement between the model and the experimental conditions (Figure 4 - figure supplemental 4).”

      The authors build a model that incorporates an α-helix-β-sheet conformational change, but the rate constant for the conversion to the α-helix conformation is required to be second order. Although the authors provide some rationale, I do not find this satisfactorily convincing given the large number of adjustable parameters in the model and the use of manual model fitting. The authors should discuss whether there is any precedent for second-order rate constants for conformational changes in the literature. On page 14, the authors state this rate constant "had to be non-linear in the monomer β-sheet concentration" - how many other models did the authors explore? For example, would αT↔α↔αα↔ββ (i.e., conformational change before dimer dissociation) or α↔βαT↔ββ (i.e., Tom70 binding driving dimer dissociation) be other plausible models for the conformational change that do not require assumptions of second-order rate constants for the conformational change?

      We thank the reviewer for their feedback. During our studies, we tested several models before arriving at the final one presented in Figure 4A. The first model that we tested, shown in Figure 4 - Supplemental 3, described ββ↔α↔αT with no conformational change. We tested several models that integrated the existing structural data for both Orf9b and Tom70 and found that, while these models could fit individual time series, they explained neither the concentration-dependent changes in subsequent time series nor the changes induced by lipid binding and mutations in VOC.

      With respect to the possibilities of αT↔α↔αα↔ββ and α↔βαT↔ββ models, we have revised our manuscript to mention that we did test additional models before we settled on the model that we presented.

      “We tested different reaction schemes that incorporated the interconversion between β-sheet to α-helix conformations by considering models that described a conformational change in the homodimer leading to Tom70 binding rather than monomers. None of these models adequately described our experimental results, therefore we continued developing our model as outlined in Figure 4D”

      With respect to the second-order rate describing the fold change from β to α, we have added the revised text to the manuscript:

      “We initially tested the impact of keeping the rate constant k<sub>a</sub> first order, just like k<sub>B</sub> which did yield the sigmoidal behavior we observed in the 5uM apo-homodimer condition. However, this assumption failed to describe the data at other concentrations resulting in a substantial overestimation compared to our experimental results when holding k<sub>B</sub> at a constant value throughout. We found that when the β-sheet to α-helix rate (k<sub>a</sub> ) was made a second order rate constant, we were able to hold the rate constant across all concentrations tested suggesting a non-linearity in the monomer β-sheet concentration.”

      While this was surprising to us, we reasoned that a biological explanation for the second-order conversion from β to α is that the β-monomers may transiently self-associate to cooperatively fold into the α-helical conformation. We did acknowledge this choice to make the β to α parameter non-linear (unlike the α to β conversion, which is first order).
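
      To make the distinction between the two rate laws concrete, the following minimal sketch compares a first-order and a second-order β to α conversion. It is a generic illustration of reaction order, not the fitted model: the function names, the rate constant of 0.1, and the concentrations are hypothetical, and the two rate constants necessarily carry different units.

```python
def conversion_rate(k, beta, order):
    """Rate of beta -> alpha conversion for a rate law k * [beta]**order.

    order=1: k has units of 1/time; order=2: k has units of 1/(conc * time).
    """
    return k * beta ** order

def per_monomer_rate(k, beta, order):
    """Conversion rate divided by [beta]: the effective first-order
    rate that each individual beta-monomer experiences."""
    return conversion_rate(k, beta, order) / beta

for beta in (5.0, 10.0, 20.0):  # hypothetical concentrations, arbitrary units
    first = per_monomer_rate(0.1, beta, order=1)
    second = per_monomer_rate(0.1, beta, order=2)
    print(f"[beta]={beta:5.1f}  first-order: {first:.2f}  second-order: {second:.2f}")
```

      Under a first-order law the per-monomer rate is independent of concentration, whereas under a second-order law it grows in proportion to [β], which is the behavior expected if two β-monomers must meet before folding.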

      We concede that we could not find specific examples in the literature describing non-linear kinetics comparable to the system we described. However, such systems have been reported for proteins that exhibit high structural plasticity, where transient interactions with another copy of the protein or with another protein altogether drive folding changes. We have revised the manuscript to include additional citations to papers that describe such systems (Zuber et al. 2022; Tuinstra et al. 2008).

      Overall, this study progresses the analysis of coupled equilibria and provides insights into Orf9b function.

      Reviewer #1 (Recommendations for the authors):

      (1) Was the unlabeled Orf9b peptide added to the pre-equilibrated Orf9b-FITC:Tom70 solution as a competitor? Figure 1D illustrates that the competitor was full-length Orf9b.

      We have revised the figure to illustrate that in this experiment the competitor is the unlabeled Orf9b peptide and not the full-length Orf9b sequence.

      (2) Figure 2B, what is the higher Mw peak from refolded Orf9b homodimer?

      We have added the following revised text (highlighted in red) to the manuscript to clarify Figure 2B.

      “The SEC elution profile and retention volume of refolded Orf9b directly overlapped with natively folded homodimeric Orf9b and suggested a high recovery of the refolded homodimer with the early eluting peaks corresponding to either a chaperone-bound species (natively folded) or misfolded protein (refolded) as judged by SDS-PAGE (Figure 2B). Together, the overlap in elution peaks corresponding to the folded homodimer suggested a high recovery of the homodimer from the refolding conditions.”

      (3) Figure 2C, in the main text, the authors state that "...observed that the refolded homodimer structure closely aligned with the lipid-bound reference structure, which shows that the homodimer fold can be recovered after denaturing". Please provide structural comparison details here: the software used, the RMSD, and the Dali Z-score.

      We have added the following revised text (highlighted in red) to the manuscript to clarify Figure 2C.

      “Aligning the structure of the Orf9b homodimer (PDB 6Z4U) with our structure of the refolded Orf9b homodimer (9N55) in Pymol resulted in an RMSD of 1.1Å. Further, we also searched our structures of the refolded Orf9b homodimer on the Dali server against the existing structures of the lipid-bound Orf9b homodimer which yielded a Z-score of 2.2 which shows good correspondence between the structures.”

      (4) To prove the refolded Orf9b homodimer did not contain lipid, could the authors provide mass spectra data for the refolded Orf9b sample and compare it with the results in Figure 2 - Supplemental 1.

      We do not have complete mass spectra data for the refolded homodimer samples; however, we feel that the native mass spectrometry data provide a good orthogonal comparison between natively folded and refolded samples for the presence or absence of lipids. We concede that we only used mass spectrometry to characterize the four peaks that were unique to the natively folded deconvoluted spectra, which confirmed that the shift in mass relative to the expected homodimer molecular weight corresponded to the two lipids we presented. However, we would expect that performing mass spectrometry on the refolded sample would only further confirm our observations from the crystal structures and the native mass spectrometry.

      (5) Have the authors tried to use analytical ultracentrifugation to analyze the Orf9b dimer-monomer equilibrium, given that AUC provides a much more accurate measurement of molecular mass?

      We thank the reviewer for this suggestion and agree that AUC could be an additional useful strategy for monitoring the dimer-monomer equilibrium and for providing additional validation of the molecular weights of both the monomer and homodimer.

      While we have not performed AUC, we have revised our manuscript to include more discussion about the determination of molecular weights by SEC.

      “For the Orf9b homodimer, the retention volume was consistent with molecular weight standards based on the expected molecular weight of the homodimer (~21kDa) and the standard (~29kDa). In the case of the Orf9b monomer, although we would expect the retention volume of the monomer (~10.6kDa) to be between the molecular weight standards of 13.4kDa and 6.5kDa, the greater retention volume could be explained by non-specific hydrophobic interactions between the monomeric Orf9b and the column.”

      (6) The authors used truncation of 7 C-terminal amino acids to generate an obligate Orf9b monomer for their assays. It would be interesting to mutate residues at the homodimer interface to generate Orf9b monomers rather than deleting residues. For example, mutate 91-96aa (FVVVTV) to negatively charged residues, which will not only disrupt the dimerization interface, but also impair lipid binding. The dimer interface mutant should then be tested in their SPR, FP assays, as well as IFN inhibition assays.

      We thank the reviewer for their suggestion and agree that mutation of the 7 C-terminal amino acids into negatively charged residues could be an interesting alternative strategy to generating an obligate Orf9b monomer without the need for truncating the residues. Our choice of using the truncated construct we proposed was driven by our analysis of the structure of the homodimer which reveals that a significant portion of the dimer interface is composed of backbone-backbone hydrogen bonding between the two chains of Orf9b. We reasoned that truncating these residues would be the most effective way to compromise the interface between the two chains and drive a predominantly monomeric behavior, however, compromising the interface with multiple mutations is an intriguing alternative.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors could comment on the slow monomer-dimer exchange observed by SEC and how it fits with their other analysis.

      We thank the reviewer for their comment and concede that the slow exchange may be a limitation of this experimental setup. Our SPR experiments and modeling indicated that the homodimer may dissociate quickly into monomers: the off rate suggests a homodimer half-life on the order of seconds. However, we still observe a noticeable dimer species on the chromatograms. We initially allowed the diluted samples to reach equilibrium prior to injection onto the analytical sizing column; however, it is possible that the system is still in a pre-equilibrium state prior to injection. This could be driven by interactions between the protein and the column that prevent full dissociation of the homodimer. While this is a limitation, we note that we did not use the Kd value determined by non-linear regression fitting to the equilibrium observed on the chromatograms in downstream experiments; instead, we used it as a ballpark estimate of the homodimer Kd, which is on the same order as the Kd determined by SPR.
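
      The link between an off rate and a half-life mentioned above follows from first-order decay, t1/2 = ln(2) / k_off, and can be sketched as below. This assumes simple dissociation with no rebinding, and the value of `k_off` is a hypothetical placeholder rather than the SPR-derived rate.

```python
import math

def half_life(k_off):
    """Half-life (s) of a complex dissociating with first-order rate k_off (1/s)."""
    if k_off <= 0:
        raise ValueError("k_off must be positive")
    return math.log(2) / k_off

def fraction_remaining(k_off, t):
    """Fraction of complex still intact after t seconds, ignoring rebinding."""
    return math.exp(-k_off * t)

k_off = 0.1  # hypothetical off rate in 1/s, not the measured value
t_half = half_life(k_off)
print(f"half-life: {t_half:.2f} s")
print(f"fraction intact after one half-life: {fraction_remaining(k_off, t_half):.2f}")
```

      With an off rate of this magnitude, essentially all dimers would have dissociated at least once within a minute, which is why a persistent dimer peak on the chromatogram points to rebinding or to column effects rather than to slow dissociation alone.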

      (2) It might be useful to include the rate constants on the reaction arrows of the schematic representation of the models.

      We have revised Figure 4D to include the rates for both Orf9b monomer binding to Tom70 and Orf9b binding to Orf9b as derived from the SPR experiments as well as the modeled values for the interconversion between α and β monomers. We also revised Figure 7 to include these values as well as the modeled dissociation rate for homodimer when lipid-bound.

      (3) I couldn't find how the sensitivity analysis was performed for the more complex models. Was this the same +/- 10% as per the peptide model?

      We applied the same +/- 10% sensitivity analysis used for the peptide model to the more complex equilibrium model and have revised our manuscript to clearly reflect that.

      (4) Further clarification of "inspection of residuals suggested that the fits were accurate". In Figure 1B, the residuals look to have systematic errors, perhaps indicating other processes occurring.

      We agree that in the SPR kinetic fitting results for the Orf9b peptide binding to Tom70 in Figure 1B there are some regions where the fit overestimates or underestimates the experimental results. This is partially the result of limitations in the number of different binding models that we can fit in the analysis software, which is why we reported using a 1:1 Langmuir binding model. It is certainly possible that some additional binding reactions occur; however, we limited our use of these specific kinetic results to the peptide model that we proposed in Figure 1D. We did note in the manuscript text that it was necessary for us to change the model parameter values to some extent in order to fit our experimental results, which may be partially explained by the SPR fitting errors.

      “With the parameter set obtained from the 100µM condition, we then held all parameters fixed and simply changed the peptide concentrations in the model to fit the remaining conditions by hand. We note that this process saw the model parameter values change between 3% at the lowest end up to 70% at the highest end from the experimentally derived values but remained within an order of magnitude of the experimental SPR values. We speculate that this arises due to the differences in experimental setup between SPR and FP-based methods of measuring kinetics.”

      (5) The manuscript builds logically, but given the sophisticated nature of the system and the modelling could benefit from more clarity/streamlining in the descriptions/illustrations.

      We have revised our manuscript in response to both reviewers' comments and hope that the clarity of the work is improved as a result.

      (6) Figure 4 Supplement 3 - where did the rate constants for Model 1 come from? Was there any attempt to alter them to fit the data better?

      We have clarified in the figure description that the rate constants used in Model 1 were the same values used in Figure 4B (but without the interconversion between beta and alpha rates).

      “Comparison of kinetic model 1 and 2 in describing experimental results from the kinetic binding assay. Experimental results using 10uM of refolded Orf9b homodimer are shown as rings with the predicted behavior of model 1 (equilibrium exchange) shown as a dark blue line. The predicted behavior of model 2 (equilibrium exchange with a conformational change between β-sheet and α-helical monomers) is shown as the light blue line. Model parameter values were the same as described in Figure 4D and kept constant in both model comparisons.”

      (7) What are and [PT] in the second set of equations (page 13)?

      [PT] refers to the concentration of the complex between the “fluorescent probe” (Orf9b-FITC) and Tom70.

      (8) "Additionally, the fused homodimer association rate (which can be viewed as a rate of tertiary complex formation)" - can the authors provide a mathematical proof for this?

      In the case of the fused homodimer kinetic data, we did not develop a separate model to explicitly take into account the differences between using a fused construct versus the WT construct that can dissociate into monomers. We have clarified our interpretation of this in the manuscript.

      “Although our model explicitly describes homodimer dissociation into monomers as a requisite step for Orf9b binding to Tom70, we adapted it for the fusion experimental data. In this case, all model parameters other than the association and dissociation kinetics of the fluorescent probe and Tom70 were adjusted to achieve the best agreement with the experimental data. When applied to the fusion homodimer, the parameters describing homodimer dissociation into separate monomers could instead describe the dissociation of the two β-sheet domains away from each other in the tertiary structure but remaining physically linked through the linker region.”

      (9) "For Lambda and Omicron, the P10S mutation results in the serine being positioned to form several hydrogen bonds between R13 and the backbone carbonyl of A11 and L48 within the same chain..." is this taken from AlphaFold predicted structures of the mutants? If so, it should be made clear that this is derived from predicted structures. And even so, AlphaFold can be poor at determining structures of mutants, and so there is greater uncertainty in the prediction of the bonds.

      For Lambda, Omicron, and Delta mutations, we used Pymol to examine how the placement of mutations could structurally explain the kinetic differences we observed in our model. We have gone back and clarified in the figure description that these predictions are not derived from AlphaFold.

      (10) "biological replicates" - is this different protein purifications?

      Yes, in this case biological replicates refer to different protein purifications for all variants described and tested.

      (11) Are any of the authors involved in the Berkeley Madonna commercial software used in the manuscript? If so, should this be in the conflict of interest statement?

      Yes, Michael Grabe is an owner of Berkeley Madonna, and we have updated our conflicts of interest statement to reflect this.

    1. Author response:

      Reviewer #1 (Public review):

      Thank you for your thoughtful and constructive feedback on our manuscript. We greatly appreciate your insights regarding our work, as they are invaluable in refining our research.

      We are very happy to hear that you recognize the strengths of our method, particularly the elimination of manual rosette picking, which significantly enhances throughput and reduces variability. We are also pleased that our validation efforts—through flow cytometry, immunocytochemistry, single-cell RNA-sequencing, and functional MEA recordings—effectively demonstrate both the identity and functionality of our derived dorsal forebrain neural rosette stem cells (NRSCs).

      Regarding the identified weaknesses, we agree that a direct comparison with conventional manual-selection protocols, specifically those utilizing dual-SMAD inhibition, would be a significant improvement. To address this, we have initiated additional experiments that will directly compare our single-SMAD inhibition approach (RepSox) with dual-SMAD inhibition (SB/LDN), aiming for a comprehensive evaluation of both protocols.

      In terms of statistical rigor, we appreciate your suggestion on improving our quantitative assays. All data were collected from at least three independent experiments and are presented as mean ± standard deviation unless otherwise specified. Because most of the data are qualitative, no formal statistical tests were performed for most experiments; for the quantitative measurements obtained, the mean and standard deviation were calculated to provide a descriptive summary. Where possible, we will incorporate appropriate statistical tests to present our data more robustly, rather than merely reporting mean ± SD.

      Finally, we recognize the importance of situating our work within the broader landscape of neural stem cell research. We aim to elucidate the potential downstream applications for our protocol, which we believe will significantly impact neurodevelopmental and neurodegenerative disorder studies.

      Thank you again for your valuable suggestions. We look forward to refining our manuscript and enhancing the contribution of our research to the field.

      Reviewer #2 (Public review):

      Thank you for your thoughtful and constructive feedback on our manuscript. We appreciate your recognition of the novelty and potential impact of our protocol for obtaining neural rosette stem cells (NRSCs). Your comments are invaluable in improving our work.

      We are pleased that you found our methodology to be a significant advancement in the field, particularly the elimination of the manual rosette selection step, which hopefully will enhance homogeneity and standardization. We agree that this development has implications for research, disease modelling, and compound testing.

      Regarding your specific points:

      Passage expansion: Thank you for your insightful suggestion regarding the analysis beyond passage 12. We have continued passaging our NRSC line for more than 12 passages while maintaining the rosette structure. Although we do not yet have comprehensive and detailed analyses at these later passages, we will include some data and relevant information on our findings in the revised manuscript.

      TJP1+ zones: We appreciate your observation regarding the decreased TJP1+ zones at passage 12. We have not consistently detected a reduction in the number of rosettes or TJP1+ lumens across our cultures between passages. While some variability has been noted, we occasionally observe minor reductions at specific time points, followed by a recovery of rosettes in subsequent passages. This suggests that monitoring the number of rosettes is indeed a useful indicator of cell culture health. Cultures should be discarded if rosettes are completely lost. We will take a closer look at this aspect and report the findings in the revised manuscript.

      PAX6 Gene expression verification: Thank you for highlighting the discrepancy between PAX6 gene expression levels and protein levels. Unfortunately, we have not yet validated these results using an alternative technique. One potential explanation for this discrepancy may be the phenomenon of negative autoregulation, where increased levels of PAX6 protein can inhibit its own mRNA expression (Manuel et al., 2007). Moreover, Hsieh and Yang (2009) observed that during neurogenesis, PAX6 protein levels may not correlate linearly with mRNA levels, particularly in variable cellular environments. Additionally, post-transcriptional regulatory mechanisms, such as translation initiation mediated by Internal Ribosome Entry Sites (IRES), have been documented in various contexts involving PAX6, suggesting that mRNA levels may not fully represent functional protein levels in developing tissues (Li et al., 2023). We will go deeper into this discussion in the revised manuscript.

      GFAP Labeling: We appreciate your comments regarding the nuclear labeling of GFAP. In our astrocyte cultures, we have indeed observed GFAP localization in both the nucleus and the cytoplasm (Figure 5B). We will investigate this phenomenon further and provide a clearer explanation, supported by relevant literature, in the revised version. Although GFAP is primarily categorized as an intermediate filament protein localized in the cytoplasm, evidence suggests its nuclear localization may indicate additional regulatory roles during astrocyte development, activation, and pathology. This finding highlights the potential complexity of GFAP's role during fetal development and cellular stress, suggesting a broader functional scope that may extend into the nuclear space.

      Once again, thank you for your insightful feedback and for recognizing the potential of our research. We are committed to addressing your comments and enhancing the quality of our manuscript.

      Manuel, M. et al. (2007) ‘Controlled overexpression of Pax6 in vivo negatively autoregulates the Pax6 locus, causing cell-autonomous defects of late cortical progenitor proliferation with little effect on cortical arealization’, Development, 134(3), pp. 545–555. Available at: https://doi.org/10.1242/dev.02764.

      Hsieh, Y.-W. and Yang, X.-J. (2009) ‘Dynamic Pax6 expression during the neurogenic cell cycle influences proliferation and cell fate choices of retinal progenitors’, Neural Development, 4(1), p. 32. Available at: https://doi.org/10.1186/1749-8104-4-32.

      Li, Q. et al. (2023) ‘Translation of paired box 6 (PAX6) mRNA is IRES-mediated and inhibited by cymarin in breast cancer cells’, Genes & Genetic Systems, 98(4), pp. 161–169. Available at: https://doi.org/10.1266/ggs.23-00039.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors performed genome assemblies for two Fagaceae species and collected transcriptome data from four natural tree species every month over two years. They identified seasonal gene expression patterns and further analyzed species-specific differences.

      Strengths:

      The study of gene expression patterns in natural environments, as opposed to controlled chambers, is gaining increasing attention. The authors collected RNA-seq data monthly for two years from four tree species and analyzed seasonal expression patterns. The data are novel. The authors could revise the manuscript to emphasize seasonal expression patterns in three species (with one additional species having more limited data). Furthermore, the chromosome-scale genome assemblies for the two Fagaceae species represent valuable resources, although the authors did not cite existing assemblies from closely related species.

      Thank you for your careful assessment of our manuscript.

      Weaknesses:

      Comment: The study design has a fundamental flaw regarding the evaluation of genetic or evolutionary effects. As a basic principle in biology, phenotypes, including gene expression levels, are influenced by genetics, environmental factors, and their interaction. This principle is well-established in quantitative genetics.

      In this study, the four species were sampled from three different sites (see Materials and Methods, lines 543-546), and additionally, two species were sampled from 2019-2021, while the other two were sampled from 2021-2023 (see Figure S2). This critical detail should be clearly described in the Results and Materials and Methods. Due to these variations in sampling sites and periods, environmental conditions are not uniform across species.

      Even in studies conducted in natural environments, there are ways to design experiments that allow genetic effects to be evaluated. For example, by studying co-occurring species, or through transplant experiments, or in common gardens. To illustrate the issue, imagine an experiment where clones of a single species were sampled from three sites and two time periods, similar to the current design. RNA-seq analysis would likely detect differences that could qualitatively resemble those reported in this manuscript.

      One example is in line 197, where genus-specific expression patterns are mentioned. While it may be true that the authors' conclusions (e.g., winter synchronization, phylogenetic constraints) reflect real biological trends, these conclusions are also predictable even without empirical data, and the current dataset does not provide quantitative support.

      If the authors can present a valid method to disentangle genetic and environmental effects from their dataset, that would significantly strengthen the manuscript. However, I do not believe the current study design is suitable for this purpose.

      Unless these issues are addressed, the use of the term "evolution" is inappropriate in this context. The title should be revised, and the result sections starting from "Peak months distribution..." should be either removed or fundamentally revised. The entire Discussion section, which is based on evolutionary interpretation, should be deleted in its current form.

      If the authors still wish to explore genetic or evolutionary analyses, the pair of L. edulis and L. glaber, which were sampled at the same site and over the same period, might be used to analyze "seasonal gene expression divergence in relation to sequence divergence." Nevertheless, the manuscript would benefit from focusing on seasonal expression patterns without framing the study in evolutionary terms.

      We sincerely thank the reviewer for the detailed and thoughtful comments. We fully recognize the importance of carefully distinguishing genetic and environmental contributions in transcriptomic studies, particularly when addressing evolutionary questions. The reviewer identified two major concerns regarding our study design: (1) the use of different monitoring periods across species, and (2) the use of samples collected from different study sites. We addressed both concerns with additional analyses using 112 new samples and now present new evidence that supports the robustness of our conclusions.

      (1) Monitoring period variation does not bias our conclusions

      To address concerns about the differing monitoring periods, we added new RNA-seq data (42 samples each for bud and leaf samples for L. glaber and 14 samples each for bud and leaf samples for L. edulis) collected from November 2021 to November 2022, enabling direct comparison across species within a consistent timeframe. Hierarchical clustering of this expanded dataset (Fig. S6) yielded results consistent with our original findings: winter-collected samples cluster together regardless of species identity. This strongly supports our conclusion that the seasonal synchrony observed in winter is not an artifact of the monitoring period and demonstrates the robustness of our conclusions across datasets.

      (2) Site variation is limited and does not confound our findings

      Although the study included three sites, two of them (Imajuku and Ito Campus) are only 7.3 km apart, share nearly identical temperature profiles (see Fig. S2), and are located at the edge of similar evergreen broadleaf forests. Only Q. acuta was sampled from a higher-altitude, cooler site. To assess whether the higher elevation site of Q. acuta introduced confounding environmental effects, we reanalyzed the data after excluding this species. Hierarchical clustering still revealed that winter bud samples formed a distinct cluster regardless of species identity (Fig. S7), consistent with our original finding.

      Furthermore, we recalculated the molecular phenology divergence index D (Fig. 4C) and the interspecific Pearson’s correlation coefficients (Fig. 5A) without including Q. acuta. These analyses produced results that were similar to those obtained from the full dataset (Fig. S12; Fig. S14), indicating that the observed patterns are not driven by environmental differences associated with elevation.

      (3) Justification for our approach in natural systems

      We agree with the reviewer that experimental approaches such as common gardens, reciprocal transplants, and the use of co-occurring species are valuable for disentangling genetic and environmental effects. In fact, we have previously implemented such designs in studies using the perennial herb Arabidopsis halleri (Komoto et al., 2022, https://doi.org/10.1111/pce.14716) and clonal Someiyoshino cherry trees (Miyawaki-Kuwakado et al., 2024, https://doi.org/10.1002/ppp3.10548) to examine environmental effects on gene expression. However, extending these approaches to long-lived tree species in diverse natural ecosystems poses significant logistical and biological challenges. In this study, we addressed this limitation by including three co-occurring species at the same site, which allowed us to evaluate interspecific differences under comparable environmental conditions. Importantly, even when we limited our analyses to these co-occurring species, the results remained consistent, indicating that the observed variation in transcriptomic profiles cannot be attributed to environmental factors alone and likely reflects underlying genetic influences.

      Accordingly, we added four new figures (Fig. S6, Fig. S7, Fig. S12 and Fig. S14) and revised the manuscript to clarify the limitations and strengths of our design, to tone down the evolutionary claims where appropriate, and to more explicitly define the scope of our conclusions in light of the data. We hope that these efforts sufficiently address the reviewer’s concerns and strengthen the manuscript.

      To better support the seasonal expression analysis, the early RNA-seq analysis sections should be strengthened. There is little discussion of biological replicate variation or variation among branches of the same individual. These could be important factors to analyze. In line 137, the mapping rate for two species is mentioned, but the rates for each species should be clearly reported. One RNA-seq dataset is based on a species different from the reference genome, so a lower mapping rate is expected. While this likely does not hinder downstream analysis, quantification is important.

      We thank Reviewer 1 for the helpful comment. To evaluate the variation among biological replicates, we compared the expression level of each gene across different individuals. We observed a high correlation between each pair of individuals (Q. glauca (n=3): average correlation coefficient r = 0.947; Q. acuta (n=3): r = 0.948; L. glaber (n=3): r = 0.948). This result suggests that the seasonal gene expression pattern is highly synchronized across individuals within the same species. We mentioned this point in the Results section of the revised manuscript. We also calculated the mean mapping rates for each species. As the reviewer expected, the mapping rate was slightly lower in Q. acuta (88.6 ± 2.3%) and L. glaber (84.3 ± 5.4%), whose RNA-Seq data were mapped to reference genomes of related but different species, than in Q. glauca (92.6 ± 2.2%) and L. edulis (89.3 ± 2.7%). However, we minimized the impact of these differences on downstream analysis. These details have been included in the revised main text.
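
      As a minimal sketch of this kind of replicate comparison (not the actual analysis pipeline; the function names and toy expression values are hypothetical), the mean pairwise Pearson correlation among individuals of one species could be computed as follows:

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_pairwise_r(profiles):
    """Average Pearson r over all pairs of individuals."""
    pairs = list(combinations(profiles.values(), 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical expression values (one entry per gene) for three individuals.
expression = {
    "individual_1": [1.0, 2.0, 3.0, 4.0],
    "individual_2": [1.1, 2.1, 2.9, 4.2],
    "individual_3": [0.9, 1.8, 3.2, 3.9],
}
mean_r = mean_pairwise_r(expression)
```

      With three individuals there are three pairs, so the reported value per species would correspond to an average over three coefficients under this assumption.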

      In Figures 2A and 2B, clustering is used to support several points discussed in the Results section (e.g., lines 175-177). However, clustering is primarily a visualization method or a hypothesis-generating tool; it cannot serve as a statistical test. Stronger conclusions would require further statistical testing.

      We thank the reviewer for the helpful comment. As noted, we acknowledge that hierarchical clustering (Fig. 2A) is primarily a visualization and hypothesis-generating method. To assess the biological relevance of the clusters identified, we conducted a Mann-Whitney U test or the Steel-Dwass test to evaluate whether the environmental temperatures at the time of sample collection differed significantly among the clusters. This analysis (Fig. 2B) revealed statistically significant differences in temperature in the cluster B3 (p < 0.01), indicating that the gene expression clusters are associated with seasonal thermal variation. These results support the interpretation that the clusters reflect coordinated transcriptional responses to environmental temperature. We revised the Results section to clarify this point.
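
      To illustrate the two-group case, the following sketch computes the Mann-Whitney U statistic and an exact two-sided permutation p-value for small samples. It is an assumption-laden illustration rather than the analysis used for Fig. 2B: the temperature values are hypothetical, the function names are invented for this example, and the Steel-Dwass procedure for multiple groups is not shown.

```python
from itertools import combinations


def u_statistic(x, y):
    """Mann-Whitney U for x versus y: pairs with x > y count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact two-sided p-value by enumerating all relabelings (small samples only)."""
    pooled = list(x) + list(y)
    n, m = len(x), len(y)
    center = n * m / 2  # expected U under the null hypothesis
    observed = abs(u_statistic(x, y) - center)
    extreme = total = 0
    for idx in combinations(range(n + m), n):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        if abs(u_statistic(group_x, group_y) - center) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical collection-day temperatures (degrees C) for two clusters.
cold_cluster = [5.1, 6.3, 4.8, 7.0]
other_samples = [15.2, 18.4, 21.0, 17.5]
u_value = u_statistic(cold_cluster, other_samples)
p_value = exact_two_sided_p(cold_cluster, other_samples)
```

      Full enumeration is only practical for small groups; for larger samples a statistics package with a normal approximation would be the usual choice.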

      The quality of the genome assemblies appears adequate, but related assemblies should be cited and discussed. Several assemblies of Fagaceae species already exist, including Quercus mongolica (Ai et al., Mol Ecol Res, 2022), Q. gilva (Front Plant Sci, 2022), and Fagus sylvatica (GigaScience, 2018), among others. Is there any novelty here? Can you compare your results with these existing assemblies?

      We agree that genome assemblies of Fagaceae species are becoming increasingly available. However, our study does not aim to emphasize the novelty of the genome assemblies per se. Rather, with the increasing availability of chromosome-level genomes, we regard genome assembly as a necessary foundation for more advanced analyses. The main objective of our study is to investigate how each gene is expressed in response to seasonal environmental changes, and to link genome information with seasonal transcriptomic dynamics. To address the reviewer’s comment in line with this objective, we added a discussion on the syntenic structure of eight genome assemblies spanning four genera within the Fagaceae, including a species from the genus Fagus (Ikezaki et al. 2025, https://doi.org/10.1101/2025.07.31.667835). This addition helps to position our work more clearly within the context of existing genomic resources.

      Most importantly, Figure 1B-D shows synteny between the two genera but also indicates homology between different chromosomes. Does this suggest paleopolyploidy or another novel feature? These chromosome connections should be interpreted in the main text-even if they could be methodological artifacts.

      A previous study on genome size variation in Fagaceae suggested that, given the consistent ploidy level across the family, genome expansion likely occurred through relatively small segmental duplications rather than whole-genome duplications. Because Figure 1B-D supports this view, we cited the following reference in the revised version of the manuscript.

      Chen et al. (2014) https://doi.org/10.1007/s11295-014-0736-y

      In both the Results and Materials and Methods sections, descriptions of genome and RNA-seq data are unclear. In line 128, a paragraph on genome assembly suddenly introduces expression levels. RNA-seq data should be described before this. Similarly, in line 238, the sentence "we assembled high-quality reference genomes" seems disconnected from the surrounding discussion of expression studies. In line 632, Illumina short-read DNA sequencing is mentioned, but it's unclear how these data were used.

      We relocated the explanation regarding the expression levels of single-copy and multi-copy genes to the section titled “Seasonal gene expression dynamics.” Additionally, we clarified in the Materials and Methods section that short-read sequencing data were used for both genome size estimation and phylogenetic reconstruction.

      Reviewer #2 (Public review):

      Summary:

      This study explores how gene expression evolves in response to seasonal environments, using four evergreen Fagaceae species growing in similar habitats in Japan. By combining chromosome-scale genome assemblies with a two-year RNA-seq time series in leaves and buds, the authors identify seasonal rhythms in gene expression and examine both conserved and divergent patterns. A central result is that winter bud expression is highly conserved across species, likely due to shared physiological demands under cold conditions. One of the intriguing implications of this study is that seasonal cycles might play a role similar to ontogenetic stages in animals. The authors touch on this by comparing their findings to the developmental hourglass model, and indeed, the recurrence of phenological states such as winter dormancy may act as a cyclic form of developmental canalization, shaping expression evolution in a way analogous to embryogenesis in animals.

      Strengths:

      (1) The evolutionary effects of seasonal environments on gene expression are rarely studied at this scale. This paper fills that gap.

      (2) The dataset is extensive, covering two years, two tissues, and four tree species, and is well suited to the questions being asked.

      (3) Transcriptome clustering across species (Figure 2) shows strong grouping by season and tissue rather than species, suggesting that the authors effectively controlled for technical confounders such as batch effects and mapping bias.

      (4) The idea that winter imposes a shared constraint on gene expression, especially in buds, is well argued and supported by the data.

      (5) The discussion links the findings to known concepts like phenological synchrony and the developmental hourglass model, which helps frame the results.

      We are grateful to the reviewer for the detailed and thoughtful review of our manuscript.

      Weaknesses:

      (1) While the hierarchical clustering shown in Figure 2A largely supports separation by tissue type and season, one issue worth noting is that some leaf samples appear to cluster closely with bud samples. The authors do not comment on this pattern, which raises questions about possible biological overlap between tissues during certain seasonal transitions or technical artifacts such as sample contamination. Clarifying this point would improve confidence in the interpretation of tissue-specific seasonal expression patterns.

      The leaf samples that clustered with the bud samples are newly flushed leaves collected in April for Q. glauca, May for Q. acuta, May and June for L. edulis, and August and September for L. glaber. To clarify this point, we marked these newly flushed leaf samples with asterisks in the revised figure (Fig. 2A).

      (2) While the study provides compelling evidence of conserved and divergent seasonal gene expression, it does not directly examine the role of cis-regulatory elements or chromatin-level regulatory architecture. Including regulatory genomic or epigenomic data would considerably strengthen the mechanistic understanding of expression divergence.

      We thank the reviewer for this insightful comment. As noted in the Discussion section, we hypothesize that such genome-wide seasonal expression patterns, and their divergence across species, are likely mediated by cis-regulatory elements and chromatin-level mechanisms. While a direct investigation of regulatory architecture was beyond the scope of the present study, we fully agree that incorporating regulatory genomic and epigenomic data would significantly deepen the mechanistic understanding of expression divergence. In this regard, we are currently working to identify putative cis-regulatory elements in non-coding regions and are collecting epigenetic data from the same tree species using ChIP-seq. We believe the current study provides a foundation for these future investigations into the regulatory basis of seasonal transcriptome variation. We made a minor revision to the Discussion to note that an important future direction is to investigate the evolution of non-coding sequences that regulate gene expression in response to seasonal environmental changes.

      (3) The manuscript includes a thoughtful analysis of flowering-related genes and seasonal GO enrichment (e.g., Figure 3C-D), providing an initial link between gene expression timing and phenological functions. However, the analysis remains largely gene-centric, and the study does not incorporate direct measurements of phenological traits (e.g., flowering or bud break dates). As a result, the connection between molecular divergence and phenotypic variation, while suggestive, remains indirect.

      We would like to note that phenological traits have been observed in the field on a monthly basis throughout the sampling period and the phenological data were plotted together with molecular phenology (e.g. Fig. 2A, C; Fig. 3C, D). Although the temporal resolution is limited, these observations captured species-specific differences in key phenological events such as leaf flushing and flowering times. We revised the manuscript to clarify this point.

      (4) Although species were sampled from similar habitats, one species (Q. acuta) was collected at a higher elevation, and factors such as microclimate or local photoperiod conditions could influence expression patterns. These potential confounding variables are not fully accounted for, and their effects should be more thoroughly discussed or controlled in future analyses.

      We fully agree with the reviewer that local environmental conditions, including microclimate and photoperiod differences, could potentially influence gene expression patterns. To assess whether the higher elevation site of Q. acuta introduced confounding environmental effects, we reanalyzed the data after excluding this species. Hierarchical clustering still revealed that winter bud samples formed a distinct cluster regardless of species identity (Fig. S7), consistent with our original finding.

      Furthermore, we recalculated the molecular phenology divergence index D (Fig. 4C) and the interspecific Pearson’s correlation coefficients (Fig. 5A) without including Q. acuta. These analyses produced results that were qualitatively similar to those obtained from the full dataset (Fig. S12; Fig. S14), indicating that the observed patterns are not driven by environmental differences associated with elevation.

      We believe these additional analyses help to decouple the effects of environment and genetics, and support our conclusion that both seasonal synchrony and phylogenetic constraints play key roles in shaping transcriptome dynamics. We added four new figures (Fig. S6, Fig. S7, Fig. S12 and Fig. S14) and revised the text accordingly to clarify this point and to acknowledge the potential impact of site-specific environmental variation.

      (5) Statistical and Interpretive Concerns Regarding Δφ and dN/dS Correlation (Figures 5E and 5F):

      (a) Statistical Inappropriateness: Δφ is a discrete ordinal variable (likely 1-11), making it unsuitable for Pearson correlation, which assumes continuous, normally distributed variables. This undermines the statistical validity of the analysis.

      We thank the reviewer for the insightful comment. We would like to clarify that the analysis presented in Figures 5E and 5F was based on linear regression, not Pearson’s correlation. Although Δφ is a discrete variable, it takes values from 0 to 6 in 0.5 increments, resulting in 13 levels. We treated it as a quasi-continuous variable for the purposes of linear regression analysis. This approach is commonly adopted in practice when a discrete variable has sufficient resolution and ordering to approximate continuity. To enhance clarity, we revised the manuscript to explicitly state that linear regression was used, and we now report the regression coefficient and associated p-value to support the interpretation of the observed trend.
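
      The following sketch shows how the 13 Δφ levels arise and how an ordinary least-squares line could be fitted with Δφ as a quasi-continuous predictor. It is illustrative only: the dN/dS values are synthetic and noise-free, the function name is hypothetical, and the p-value calculation is omitted.

```python
def ols_fit(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Delta-phi takes values from 0 to 6 in steps of 0.5, giving 13 levels.
delta_phi_levels = [i * 0.5 for i in range(13)]

# Synthetic dN/dS values constructed for illustration (not real data).
dn_ds = [0.20 + 0.01 * d for d in delta_phi_levels]

slope, intercept = ols_fit(delta_phi_levels, dn_ds)
```

      In a genome-wide setting each gene would contribute one (Δφ, dN/dS) pair, so many points share the same Δφ level; the fit itself is unchanged.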

      (b) Biological Interpretability: Even with the substantial statistical power afforded by genome-wide analysis, the observed correlations are extremely weak. This suggests that the relationship, if any, between temporal divergence in expression and protein-coding evolution is negligible.

      Taken together, these issues weaken the case for any biologically meaningful association between Δφ and dN/dS. I recommend either omitting these panels or clearly reframing them as exploratory and statistically limited observations.

      We agree with the reviewer’s comment. While we retained the original panels, we reframed our interpretation to emphasize that, despite statistical significance, the observed correlation is very weak—suggesting that coding region variation is unlikely to be the primary driver of seasonal gene expression patterns. Accordingly, we revised the “Relating seasonal gene expression divergence to sequence divergence” section in the Results, as well as the relevant part of the Discussion.

    1. Author response:

      We thank the editor and reviewers for their positive and detailed review of the preprint. We will use these comments to improve the manuscript's revised version, which we plan to submit in the coming weeks, including: a) tests of variants of ResNet, other network architectures and the use of pre-trained weights, b) clarification and justification of the accuracy metrics used in the benchmark, c) an expanded study about the fragment connectivity in Figure 3, and d) a study of the performance of idmatcher.ai with the new idtracker.ai.

    1. Author response:

      Reviewer #1 (Public review):

      The authors' goal was to arrest PsV capsids on the extracellular matrix using cytochalasin D. The cohort was then released, and interaction with the cell surface, specifically with CD151, was assessed.

      The model that fragmented HS associated with released virions mediates the dominant mechanism of infectious entry has only been suggested by research from a single laboratory and has not been verified in the 10+ years since publication. The authors are basing this study on the assumption that this model is correct, and these data are referred to repeatedly as the accepted model despite much evidence to the contrary.

      Please note that we state in the introduction (lines 65-66): ‘Two release mechanisms are discussed, that mutually are not exclusive’. This implies that we do not consider the shedding model to be the one accepted model. HS may associate with PsVs despite a decreased affinity, and may translocate to the cell body only after priming (see the ‘priming model’ below).

      Furthermore, we do not state in the discussion either that the shedding model is the preferred one; although it is correct that we refer to the shedding model more extensively, simply because we find HS associated with transferred PsVs, which is in line with this model and requires its citation.

      The discussion in lines 65-71 concerning virion and HSPG affinity changes is greatly simplified. The structural changes in the capsid induced by HS interaction and the role of this priming for KLK8 and furin cleavage have been well researched. Multiple laboratories have independently documented this. If this study aims to verify the shedding model, additional data need to be provided.

      As outlined above, our finding is compatible with both models, and we do not aim to verify the shedding model or disprove the priming model.

      It appears that the referee wishes more visibility for the priming model. Inhibition of KLK8 and furin should reduce the translocation to the cell body, no matter whether PsVs carry HS on their surface or not. For the revision, we plan an experiment as in Figure 3 (CytD), testing whether either KLK8 or furin inhibition blocks the transfer to the cell body. Our data can then also be discussed in the context of the priming model, which will increase its visibility.

      The model should be fitted into established entry events, or at minimum, these conflicting data, a subset of which is noted below, need to be acknowledged.

      (1) The Sapp lab (Richards et al., 2013) found that HSPG-mediated conformational changes in L1 and L2 allowed the release of the virus from primary binding and allowing secondary receptor engagements in the absence of HS shedding.

      (2) Becker et al. found that furin-precleaved capsids could infect cells independently of HSPG interaction, but this infection was still inhibited with cytochalasin D.

      (3) Other work from the Schelhaas lab showed that cytochalasin D inhibition of infection resulted in the accumulation of capsids in deep invaginations from the cell surface, not on the ECM.

      (4) Selinka et al., 2007, showed that preventing HSPG-induced conformational changes in the capsid surface resulted in noninfectious uptake that was not prevented with cytochalasin D.

      (5) The well-described capsid processing events by KLK8 and furin need to be mechanistically linked to the proposed model. Does inhibition of either of these cleavages prevent engagement with CD151?

      The authors need to consider an explanation for these discrepancies.

      That PsVs carry HS-cleavage products does not imply that HS cleavage is sufficient or required for infection. Therefore, we do not view our data as being in conflict with the priming model. In fact, our observations are compatible with aspects of both the shedding and the priming model.

      Yet, we acknowledge that the study would gain importance by directly testing the priming model within our experimental system. As requested by the referee, we will discuss the above papers, and further plan to test KLK8 and furin inhibitors.

      Other issues:

      (1) Line 110-111. The statement about PsVs in the ECM being too far away from the cell surface to make physical contact with the cell surface entry receptors is confusing. ECM binding has not been shown to be an obligatory step for in vitro infection.

      Not obligatory, but strongly supportive (Bienkowska-Haba et al., Plos Path., 2018; Surviladze et al., J. Gen. Virol., 2015). As recently published by the Sapp lab (Bienkowska-Haba et al., Plos Path., 2018), ‘Direct binding of HPV16 to primary keratinocytes yields very inefficient infection rates for unknown reasons.’ Moreover, the paper shows that HaCaT cell ECM binding of PsVs increases the infection of NHEK by 10-fold and of HFK by almost 50-fold.

      This idea is referred to again on lines 158-159 and 199. The claim (line 158) that PsV does not interact with the cell within an hour needs to be demonstrated experimentally and seems at odds with multiple laboratories' data. PsV has been shown to directly interact with HSPG on the cell surface in addition to the ECM. Why are these PsVs not detected?

      We do not question that in many cellular systems PsVs interact with heparan sulfate proteoglycans (HSPGs) present on the cell surface, or both on the cell surface and the ECM. We stated in the manuscript on line 59: ‘While in cell culture virions bind to HS of the cell surface and the ECM, it has been suggested that in vivo they bind predominantly to HS of the extracellular basement membrane (Day and Schelhaas, 2014; Kines et al., 2009; Schiller et al., 2010).’

      Moreover, we ourselves detect these PsVs. For example, in Figure 5A (CytD, 0 min time point), a handful of PsVs localize to the cell body area. However, the large majority overlaps with the strong HS staining at the cell periphery, likely the ECM. An accurate quantification of the fractions of PsVs bound to the ECM and to the cell body is very difficult for the following reasons. First, the ECM PsVs are very dense and therefore not microscopically resolved into single PsVs, at least not completely (see Figure 1C; the high-intensity spots are non-resolved PsVs; please see our discussion on lines 148-152). For this reason, by just counting spots we strongly underestimate the ECM PsVs relative to the cell body PsVs. Second, with the available immunostainings we cannot exactly delineate the ECM from the cell body. In particular, we often observe PsV accumulations at the cell border region (for example, see Figure 4B). If these ‘cell border region PsVs’ are assigned entirely to the cell body fraction, a preliminary analysis (correcting for the limitation of non-resolved ECM PsVs) suggests that about a quarter of the PsVs bind to the cell body. If they are instead assigned to the ECM, the cell body fraction would be well below 10%. Third, we observe that in regions devoid of ECM and cells, PsVs apparently adhere nonspecifically to the glass coverslip. This suggests that some of the cell body PsVs are merely nonspecific background. Subtracting a background PsV density from the ECM and cell body PsV densities reduces the cell body PsVs relatively more, and consequently decreases the fraction of cell body PsVs even further.
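
      The effect of background subtraction on the cell body fraction can be sketched with simple arithmetic. All numbers below are hypothetical and chosen only to show the direction of the effect; the sketch also assumes, for simplicity, that the ECM and cell body regions have equal area, so that densities can be compared directly.

```python
def cell_body_fraction(ecm_density, cell_density, background=0.0):
    """Fraction of PsVs assigned to the cell body after background subtraction."""
    ecm = max(ecm_density - background, 0.0)
    cell = max(cell_density - background, 0.0)
    return cell / (ecm + cell)


# Hypothetical densities (PsVs per unit area), not measured values.
raw_fraction = cell_body_fraction(ecm_density=90.0, cell_density=10.0)
corrected_fraction = cell_body_fraction(
    ecm_density=90.0, cell_density=10.0, background=5.0
)
```

      In this toy case the same absolute background removes half of the cell body signal but only a small share of the ECM signal, so the cell body fraction drops.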

      Moreover, in the course of the project we wondered whether at the basolateral membrane there are not many binding sites anyway. To address this question, in an unpublished experiment, we detached HaCaT cells with trypsin, incubated them with PsVs, and then allowed reattachment to assess the binding in suspension. We detected minimal to no binding, which, however, could also result from apical membrane adherence to the coverslip or trypsin-mediated cleavage of HSPGs. As suggested by the reviewing editor, we agree that repeating this experiment using EDTA for detachment—thus preserving HSPGs—would offer more definitive insight into binding efficiency in the absence of accessibility constraints. In summary, the reason why in our cellular system most PsVs do not bind to the cell surface could be a combination of several factors:

      (1) The primary binding partners are more abundant in the ECM and the polarized HaCaT cells secrete more ECM when compared to other cultured cells used to study HPV infection. This promotes ECM binding.

      (2) In the polarized HaCaT cells, the apical membrane is largely devoid of syndecan-1, CD151 and Itga6, so PsVs infect the cell via the basolateral membrane. However, access to the basolateral membrane is restricted: PsVs must diffuse through a narrow slit between the glass coverslip and the attached cell to reach HS on the cell surface. This limits cell surface binding.

      (3) If HaCaT cells secrete large amounts of ECM, they may become depleted of cell surface HS. As outlined above, we will try to find out how many PsVs bind to the basolateral membrane in the absence of restricted accessibility. If it turns out that HaCaT cells have few binding sites anyway, this would additionally promote binding to the ECM.

      The outcome of the above issues, and how we will address them in the revised version of the manuscript, is open. In any case, we would like to point out that PsVs bound to the cell body do not weaken our main conclusion. Still, we recognize that this point merits attention and plan several modifications of the manuscript. We already mention, but will now state more explicitly, that PsVs have been shown to directly interact with HSPG on the cell surface, in addition to the ECM, but that it has also been shown that the ECM strongly supports infection in NHEK and HFK (Bienkowska-Haba et al., Plos Path., 2018). The following is a draft version of a paragraph we plan to incorporate, explaining the above issue and why we used HaCaT cells in our experiments:

      ‘In vitro, PsVs bind to both the cell surface and the ECM, as has been widely documented. In vivo, however, it has been proposed that initial binding occurs predominantly to the basement membrane ECM, rather than directly to the cell surface (Day and Schelhaas, 2014; Kines et al., 2009; Schiller et al., 2010). This distinction reinforces the physiological relevance of ECM-bound particles in the early steps of HPV infection. Support for a functional role of ECM-mediated entry comes from a study showing that PsV binding to ECM derived from HaCaT cells significantly enhances infection of primary keratinocytes (Bienkowska-Haba et al., 2018). For these reasons, we specifically chose polarized HaCaT cells as a model system. These cells secrete abundant ECM, from which they readily collect bound PsVs. At the same time, polarization limits the access of PsVs to basolateral receptors such as CD151 and Itgα6, and also to cell body resident Syndecan-1, the most abundant HSPG in keratinocytes (Rapraeger et al., 1986; Hayashi et al., 1987; Kim et al., 1994). Because polarization limits direct cell surface accessibility, it biases binding toward the ECM, which is abundant in this culture system. Hence, in the HaCaT cell culture system, as probably in vivo, PsVs cannot circumvent binding to the ECM, which they can do in unpolarized cell cultures that may not even secrete significant amounts of ECM. Altogether, this experimental situation closely mimics the in vivo situation, where PsVs bind preferentially to the ECM (Day and Schelhaas, 2014; Kines et al., 2009; Schiller et al., 2010).’

      We appreciate the reviewer’s input and believe these additions will strengthen the manuscript with regard to the relevance of the used cellular model system.

      (2) The experiments shown in Figure 5 need to be better controlled. Why is there no HS staining of the cell surface at the early timepoints? This antibody has been shown to recognize N-sulfated glucosamine residues on HS and, therefore, detects HSPG on the ECM and cell surface.

      We have shown all images at the same adjustments of brightness and contrast. As the staining at the periphery is stronger, the impression is given that the cell surface is not stained, although there is some staining. Specific staining is documented in Figure 5D, showing the PCC between PsVs and HS only of the cell body. If there was no HS staining, the PCC would be zero, which is not the case. Yet, it is lower when compared to the PCC measured at the cell border region, with more strongly stained HS.
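
      For readers unfamiliar with region-restricted colocalization, the sketch below computes a Pearson correlation coefficient (PCC) between two image channels using only pixels inside a mask. It is a simplified illustration rather than the image analysis used for Figure 5D: the pixel intensities, the mask, and the function name are all hypothetical.

```python
from math import sqrt


def masked_pcc(channel_a, channel_b, mask):
    """Pearson correlation of two equally sized images, restricted to mask pixels.

    Raises ZeroDivisionError if either channel is constant inside the mask.
    """
    a, b = [], []
    for row_a, row_b, row_m in zip(channel_a, channel_b, mask):
        for value_a, value_b, keep in zip(row_a, row_b, row_m):
            if keep:
                a.append(value_a)
                b.append(value_b)
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical 3x3 intensity images for the PsV and HS channels.
psv = [[10, 200, 15], [12, 180, 14], [250, 240, 11]]
hs = [[20, 150, 22], [25, 140, 21], [255, 250, 19]]
# Mask marking cell body pixels (1) and excluded pixels (0).
cell_body = [[1, 1, 1], [1, 1, 1], [0, 0, 1]]

pcc_cell_body = masked_pcc(psv, hs, cell_body)
```

      Because the PCC is normalized by the intensity variance within the mask, it does not depend on display brightness or contrast settings.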

      We will provide images at different contrast and brightness adjustments enabling the reader to see the staining on the cell surface. We will provide also more overview images to illustrate the strong variability of the HS staining between cells.

      Therefore, the conclusion that this confirms HS coating of PsV during release from the ECM (line 430-431) is unfounded. How do the authors distinguish between "HS-coated virions" and HSPG-associated virions?

      The HS intensity transiently increases on the cell body (Fig. 5D) only after releasing a cohort of PsVs, which can only be explained by PsVs that carry HS from the ECM to the cell body. However, the effect is not significant. Using the antibody 3G10, which detects the HS neoepitope (see the referee’s suggestion below), we will reanalyze this point. This should help clarify the issue.

      It is difficult to comprehend how the addition of 50 vge/cell of PsV could cause such a global change in HS levels.

      The distribution of bound PsVs varies greatly between cells. Some areas are covered with essentially confluent cells, to which hardly any PsVs are bound, because accessing the basolateral membrane of confluent cells is nearly impossible, and PsVs do not bind to the exposed apical membrane. This is different in cultures of unpolarized cells, where we expect PsVs to distribute more evenly over cells.

      This means that in our experiments the vge/cell is not a suitable parameter for relating the magnitude of an effect to a defined number of PsVs. In the ECM, the PsV density is very high, enabling one cell to collect several hundred PsVs, much more than expected from the 50 vge/cell. We will point this out in the revised version.

      The claim that the HS levels are decreased in the non-cytochalasin-treated cells due to PsV-induced shedding needs to be demonstrated.

      We did not claim that PsVs induce shedding; rather, we believe they simply carry shed HS with them. Without PsVs, the shed HS likely remains in the ECM or is washed out very slowly.

      If HS is actually shed, staining of the cell periphery could increase with the antibody 3G10, which detects the HS neoepitope created following heparinase cleavage.

      As outlined above, we plan to test the suggested antibody 3G10. We also plan to repeat the 0 min time point (with and without PsVs, with and without CytD) to find out whether, in the absence of PsVs, the HS intensity (at 0 min) is unchanged between control and CytD.

      Reviewer #2 (Public review):

      Summary:

      Massenberg and colleagues aimed to understand how Human papillomavirus particles that bind to the extracellular matrix (ECM) transfer to the cell body for later uptake, entry, and infection. The binding to ECM is key for getting close to the virus's host cell (basal keratinocytes) after a wounding scenario for later infection in a mouse vaginal challenge model, indicating that this is an important question in the field.

      Strengths:

      The authors take on a conceptually interesting and potentially very important question to understand how initial infection occurs in vivo. The authors confirm previous work that actin-based processes contribute to virus transport to the cell body. The superresolution microscopy methods and data collection are state-of-the art and provide an interesting new way of analysing the interaction with host cell proteins on the cell surface in certain infection scenarios. The proposed hypothesis is interesting and, if substantiated, could significantly advance the field.

      Weaknesses:

      As a study design, the authors use infection of HaCaT keratinocytes, and follow virus localisation with and without inhibition of actin polymerisation by cytochalasin D (cytoD) to analyse transfer of virions from the ECM to the cell by filopodial structures using important cellular proteins for cell entry as markers.

      First, the data is mostly descriptive besides the use of cytoD, and does not test the main claim of their model, in which virions that are still bound to heparan sulfate proteoglycans are transferred by binding to tetraspanins along filopodia to the cell body.

      The study identifies a rapid translocation step from the ECM to the cell body. We have no data that demonstrates a physical interaction between PsVs and CD151. In the model figure, we draw CD151 as part of the secondary receptor complex. We are sorry for having raised the impression that PsVs would bind directly to CD151 and will rephrase the respective section.

      Second, using cytoD is a rather broad treatment that not only affects actin retrograde flow, but also virus endocytosis and further vesicular transport in cells, including exocytosis. Inhibition of myosin II, e.g., by blebbistatin, would have been a better choice as it, for instance, does not interfere with endocytosis of the virus.

      We agree, and plan to test whether blebbistatin is equally efficient in blocking the transfer.

      Third, the authors aim to study transfer from ECM to the cell body and the effects thereof. However, there are substantial, if not the majority of, viruses that bind to the cell body compared to ECM-bound viruses in close vicinity to the cells.

      We agree that in multiple cell culture systems viruses bind preferentially to the cell directly. But we respectfully disagree with the assertion that the majority of PsVs bind to the cell body of HaCaT keratinocytes. As noted above (e.g., Figure 5A, CytD, 0 min), only a small fraction of PsVs localize to the cell body, whereas the vast majority overlap with intense HS staining at the cell periphery, consistent with ECM association, as the accessibility of the basolaterally expressed HSPG is limited (see above). Based on quantitative estimation from multiple images, ECM-bound PsVs largely outnumber cell-bound particles (see above). These features make HaCaT cells a suitable in vitro model for mimicking in vivo conditions, where HPV has been proposed to bind predominantly to the basement membrane ECM rather than the cell surface (Day and Schelhaas, 2014; Kines et al., 2009; Schiller et al., 2010), which also strongly enhances infection of primary keratinocytes in vitro (Bienkowska-Haba et al., 2018).

      Thus, we believe our system appropriately models the physiologically relevant scenario of ECM-to-cell transfer, and the observed predominance of ECM binding supports the validity of our experimental focus.

      This is in part obscured by the small subcellular regions of interest that are imaged by STED microscopy, or by the use of plasma membrane sheets. As a consequence, the data obtained from time point experiments are skewed and remain for the most part unconvincing, because the origin of virions in time and space cannot be taken into account. This is particularly important when interpreting association with HS, the tetraspanin CD151, and integrin alpha 6, as the low degree of association could originate from cell-bound and ECM-transferred virions alike.

      As stated above, we observe massive binding of PsVs to the ECM, in contrast to very few PsVs that diffuse beneath the basolateral membrane of the polarized HaCaT cells and bind directly to the cell surface (or perhaps are simply trapped between the glass and the basolateral membrane). PsVs are not expected to bind to the apical membrane, which is depleted of CD151 and Itga6. In other cellular systems, cells may hardly secrete ECM, are not polarized, and do not adhere so tightly to the substrate. In such cultures, where virions can easily circumvent ECM binding, the large majority of PsVs will likely bind directly to the cell surface.

      As outlined above, in order to quantify PsVs that can bind without restricted accessibility, we plan to detach HaCaT cells by EDTA from the substrate, incubate them with PsVs, and let them adhere again (please see above).

      Whatever the outcome, the fraction of PsVs that binds directly to the cell surface does not weaken our conclusion that we have identified a very fast and efficient transfer step from the ECM to the cell body.

      Fourth, the use of fixed images in a time course series also does not allow for understanding the issue of a potential contribution of cell membrane retraction upon cytoD treatment due to destabilisation of cortical actin. Or, of cell spreading upon cytoD washout.

      If blebbistatin works as expected, we can safely conclude that we observe the very same process as described in Schelhaas et al., PLoS Pathogens, 2008, namely that the PsVs migrate by retrograde transport to the cell surface, and not that the cell spreads out and thereby reaches the PsVs.

      The microscopic analysis uses an extension of a plasma membrane stain as a marker for ECM-bound virions, which may introduce a bias and skew the analysis.

      Our plasma membrane stain does not stain the ECM. Please see Figure 1. The stain is actually used to distinguish the cell body from the ECM area.

      Fifth, while the use of randomisation during image analysis is highly recommended to establish significance (flipping), it should be done using only ROIs that have a similar density of objects for which correlations are being established.

      We agree that how the randomization is done is very important. For the association of PsVs with CD151 and HS, we used flipped images to generate a calibration curve for correcting the random background. For details, please see Supplementary Figures 3 and 5.

      For instance, if one flips an image with half of the image showing the cell body, and half of the image ECM, it is clear that association with cell membrane structures will only be significant in the original. I am rather convinced that using randomisation only on the plasma membrane ROIs will not establish any clear significance of the correlating signals.

      Figure 5D shows the PCC specifically of the cell body. In flipped images (not shown in the manuscript for clarity, but can be added) we obtain a PCC of around zero. For CytD, the flipped images always have a significantly lower PCC compared to the original images. In the control, the PCCs of the flipped images are significantly lower only for the 30 min and 60 min time points. The non-significance of the 0 min and 180 min time points is due to low PCCs also in the original images.
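
      To make the flipped-image control concrete, the following is a minimal sketch only, not the analysis code used for the manuscript. It assumes that both channels have already been cropped to the same cell-body ROI, so that the original and the flipped comparison sample a similar object density. The synthetic images and the names `pearson`, `flip_horizontal`, and `make_demo_channels` are hypothetical and serve purely as illustration.

```python
import math
import random


def pearson(img_a, img_b):
    """Pearson correlation coefficient (PCC) of two equal-sized 2D images."""
    xs = [v for row in img_a for v in row]
    ys = [v for row in img_b for v in row]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # PCC is undefined for a flat image; report 0 by convention
    return cov / math.sqrt(var_x * var_y)


def flip_horizontal(img):
    """Mirror the image left-to-right to break true spatial association."""
    return [row[::-1] for row in img]


def make_demo_channels(size=64, n_spots=40, seed=0):
    """Synthetic ROI (assumption): weak noise plus spots shared by both channels."""
    rng = random.Random(seed)
    ch_a = [[rng.random() * 0.1 for _ in range(size)] for _ in range(size)]
    ch_b = [[rng.random() * 0.1 for _ in range(size)] for _ in range(size)]
    for _ in range(n_spots):
        r, c = rng.randrange(size), rng.randrange(size)
        ch_a[r][c] += 1.0
        ch_b[r][c] += 1.0
    return ch_a, ch_b


ch_a, ch_b = make_demo_channels()
pcc_original = pearson(ch_a, ch_b)
# Random control: same ROI, same object density, spatial relation destroyed.
pcc_flipped = pearson(ch_a, flip_horizontal(ch_b))
print(f"original PCC = {pcc_original:.2f}, flipped PCC = {pcc_flipped:.2f}")
```

      In this toy case the flipped control drops to around zero while the original stays high. In real data the difference would be tested across biological replicates rather than read off a single image pair.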

      Also, there should be a higher n for the measurements.

      One n is the average of 15 cells. We realize that with n = 3 we find significant effects only if the effect is very strong or moderate with very low variance.

    1. Author response:

      Reviewer #1 (Public Review):

      In this study, Deng et al. investigate the antibody response against HA antigen following repeated vaccination with the H1N1 2009 pandemic influenza vaccine strain, using in silico modeling. The proposed model provides valuable mechanistic insights into how the broadening of the antibody response takes place upon repeated vaccination.

      Overall, the authors' model effectively explains the mechanistic principles underlying antibody responses against the viral antigens harboring epitope immunodominancy.

      We thank the Reviewer for their positive and thoughtful assessment of the work. We address issues raised in the revised manuscript and in the point-by-point responses below.

      Reviewer #2 (Public Review):

      The authors have been studying the mechanism of breadth expansion in antibody responses with repeated vaccinations using their own mathematical model. In this study, they applied this mathematical model to a cohort data analyzing anti-HA antibody responses after multiple influenza virus vaccination and investigated the mechanism of antibody breadth expansion to diversified target viral strains.

      The manuscript is well written, and the mathematical model is well built that incorporates various parameters related to B cell activation in GC and EGC based on experimental data.

      We thank the reviewer for their positive and thoughtful review and address issues raised in a revised version of the manuscript and in the point-by-point below.

      Strengths:

      By carefully reanalyzing the published cohort data (Nunez IA et al 2017 PLoS One), they have clearly demonstrated that the repeated influenza virus vaccinations result in an expansion of the breadth to unmatched viral strains.

      Using their mathematical model, they have determined the major factors for the breadth expansion following multiple immunizations.

      We thank the reviewer for pointing out the strengths of our study.

      Weaknesses

      The overall concept of their model has already been published (Yang L et al 2023 Cell Reports) with a SARS-CoV-2 vaccine model, and they have applied it to influenza virus vaccine in this study, with the conclusions being largely the same.

      It is unclear how the re-evaluation of public data in the first half part is related to the validation of their model in the later part.

      The reviewer is correct in that we build directly on our model published previously to study related phenomena for SARS-CoV-2. However, a critical advance of the work was to now ask whether antibody broadening following repeated homologous antigen exposure is a general feature of human humoral immunity. As we point out in the introduction of our manuscript, repeated exposure to the same antigen has long been assumed to predominantly boost strain-limited humoral immunity, necessitating rational design of vaccines that re-orient antibody responses to target otherwise immune-subdominant targets. Hence, antibody broadening in response to homologous SARS-CoV-2 antigen points to reconsideration of that basic premise in immunology; and if we are to now define this as a general feature of human antibody responses, then evaluation of the principle using a different vaccine protocol and antigen is necessitated. Accordingly, we took advantage of the influenza vaccine space where, within the immediate years following the 2009 H1N1 pandemic, the 2009 H1N1 strain was repeatedly applied as the seasonal vaccine strain. This HA was also novel (as it was from a pandemic virus pHA), meaning that traditional back-boosting to historical strains would be limited. We then re-evaluated the longitudinal HAI data of Nunez et al. to define whether a broadening to increasingly divergent vaccine-unmatched strains is observed upon repeated exposure to pHA. This was not done before and was enabled by incorporating our amino acid relatedness parameter and our structure-based definition of the RBS patch. To then query mechanistic origins of the broadening effect, we adapted and extended our previous computational model to (1) better reflect HA epitope diversity and overlap within the RBS patch and (2) better reflect the influenza immunization regimens that are used clinically. The differences between the modeling done in this paper and that in Yang et al. 2023 are described in the Methods section separately. Taken together, our analyses of data in Nunez et al. and our simulations strengthen the emerging view that repeated boosting with the same antigen enables the humoral immune system to diversify immune responses because of feedback regulation which leads to enhanced antigen on FDCs, persistent GCs, and epitope masking. This, in turn, enables the immune system to generalize to recognize and respond to unseen variant antigens that harbor mutations in the immunodominant epitopes. Our results point to a new and emerging paradigm regarding booster immunizations and fundamental features of the humoral immune system.

      Other points:

      In the original data by Nunez IA et al., HAI (the inhibitory effect of anti-HA antibodies on the binding of HA to sialic acid on erythrocytes) was used as the lead-out. The authors conclude that the breadth expansion with repeated vaccinations is primarily due to the activation of B cells with BCRs that recognize minor common epitopes, induced by covering up of strain-specific major epitopes by pre-existing antibodies. However, as they themselves show in Fig 1, once the sialic acid-binding region is covered, it seems difficult for another BCR to bind to this region. When the target epitope is limited like this, the effect of increasing antigen supply to DCs by pre-existing antibodies and the effect of increasing the presentation of minor epitopes appear to compete with each other. Could the authors please explain this point?

      We agree that accounting for epitope overlap is important when the target is limited, as the reviewer indicates. In Figure 6C vs 6D we assess steric effects of possible spatial overlap between dominant and subdominant epitopes. Under overlapping conditions, we find evidence for a steric constraint on broadening, as predicted by the reviewer. Depending upon the degree of overlap between the epitopes and differences in germline characteristics in the B cells targeting dominant and subdominant epitopes, this effect could be compensated during subsequent shots, as described by our results (see lines 392-406).

      We also now incorporate the following sentence into our discussion (lines 448-453):

      “Epitope masking will also be constrained by the dimensions of the RBS and our simulations do report attenuation of titers against historical influenza strains when we introduce epitope overlap. Depending upon the degree of overlap between the epitopes and differences in germline characteristics in the B cells targeting dominant and subdominant epitopes, this effect could be compensated during subsequent shots.”

      In relation to this point, please explain the meaning of analysis of the entire ectodomain when the original data's lead-out is HAI.

      We include side-by-side full length ectodomain versus RBS patch (sialic acid binding residues + antibody epitope ring) to demonstrate relatedness differences in the lead-out data. But it is precisely because of the point raised by the reviewer that we focus on using the RBS patch as the relatedness values to assess antibody broadening as defined by HAI activity (see Figure 3 and S2). 

      Minor point:

      The description "The purpose of this model is ...." starting at line 171 and the description of "we obtain results in harmony with the clinical findings ...." starting at line 478 sound to be contradictory. As the authors themselves state at line 171, if the purpose of this model is not to fit the data but to demonstrate the principle, then the prudent sampling and reanalyzing data itself seems to have less meaning.

      We respectfully disagree. Please see the point above: our treatment of the clinical data is more than just “reanalyzing”. We first used it to discover the previously unreported broadening effect across highly divergent strains following sequential immunization with homologous antigen in the influenza vaccine space; we then extended and adapted our computational model for the influenza vaccination paradigm to gain mechanistic insight into how such antibody broadening may occur. The word “harmony” was not meant to imply quantitative agreement, and we apologize if it caused confusion.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      The study by Wu et al presents interesting data on bacterial cell organization, a field that is progressing now, mainly due to the advances in microscopy. Based mainly on fluorescence microscopy images, the authors aim to demonstrate that the two structures that account for bacterial motility, the chemotaxis complex and the flagella, colocalize to the same pole in Pseudomonas aeruginosa cells and to expose the regulation underlying their spatial organization and functioning.

      Comments on revisions:

      The authors have addressed all major and minor points that I raised in a satisfying way during the revision process. The work can now be regarded as complete, the assumptions were clarified, the results are convincing, the conclusions are justified, and the novelty has been made clear.

      This manuscript will be of interest to cell biologists, mainly those studying bacteria, but not only.

      Reviewer #2 (Public review):

      Summary:

      Here, the authors studied the molecular mechanisms by which the chemoreceptor cluster and flagella motor of Pseudomonas aeruginosa (PA) are spatially organized in the cell. They argue that FlhF is involved in localizing the receptors-motor to the cell pole, and even without FlhF, the two are colocalized. Finally, the authors argue that the functional reason for this colocalization is to insulate chemotactic signaling from other signaling pathways, such as cyclic-di-GMP signaling.

      Strength:

      The experiments and data are high quality. It is clear that the motor and receptors co-localize, and that elevated CheY levels lead to elevated c-di-GMP.

      Weakness:

      The explanation for the functional importance of receptor-motor colocalization is plausible but is still not conclusively demonstrated. Colocalization might reduce CheY levels throughout the cell in order to reduce cross-talk with c-di-GMP. This would mean that if physiologically-relevant levels of CheYp near the pole were present throughout the cell, c-di-GMP levels would be elevated to a point that is problematic for the cell. Clearly demonstrating this seems challenging.

      We acknowledge that directly proving the necessity of colocalization to prevent problematic c-di-GMP elevation is experimentally challenging, as it would require creating a system where CheY-P is artificially distributed throughout the cell at physiologically relevant concentrations while maintaining normal chemotaxis function.

      However, our data provide several lines of evidence supporting this model. First, we show that CheY overexpression leads to substantial c-di-GMP elevation (71.8% increase) and cell aggregation, demonstrating that elevated CheY levels can indeed cause problematic cross-pathway interference. Second, previous work has shown that CheY-P levels near the pole are an order of magnitude higher than in the rest of the cell (ref. 46). If this elevated CheY-P concentration near the pole were present throughout the cell, our data suggest that c-di-GMP levels would be elevated sufficiently to cause cell aggregation (Fig. 4A), thereby disabling normal motility and chemotaxis. Third, the dose-dependent relationship between CheY concentration and aggregation phenotype supports the idea that precise spatial regulation of CheY levels is functionally important for avoiding cross-pathway interference.

      Reviewer #3 (Public review):

      Summary:

      The authors investigated the assembly and polar localization of the chemosensory cluster in P. aeruginosa. They discovered that a certain protein (FlhF) is required for the polar localization of the chemosensory cluster while a fully-assembled motor is necessary for the assembly of the cluster. They found that flagella and chemosensory clusters always co-localize in the cell; either at the cell pole in wild type cells or randomly-located in the cell in FlhF mutant cells. They hypothesize that this co-localization is required to keep the level of another protein (CheY-P), which controls motor switching, at low levels as the presence of high-levels of this protein (if the flagella and chemosensory clusters were not co-localized) is associated with high-levels of c-di-GMP and cell aggregations.

      Strengths:

      The manuscript is clearly written and straightforward. The authors applied multiple techniques to study the bacterial motility system including fluorescence light microscopy and gene editing. In general, the work enhances our understanding of the subtlety of interaction between the chemosensory cluster and the flagellar motor to regulate cell motility.

      Weaknesses:

      The major weakness for me in this paper is that the authors never discussed how flagellar gene expression is controlled in P. aeruginosa. For example, in E. coli there is a transcriptional hierarchy for the flagellar genes (early, middle, and late genes, see Chilcott and Hughes, 2000). Similarly, Campylobacter and Helicobacter have a different regulatory cascade for their flagellar genes (see Lertsethtakarn, Ottemann, and Hendrixson, 2011). How does the expression of flagellar genes in P. aeruginosa compare to other species? How many classes are there for these genes? Is there a hierarchy in their expression and how does this affect the results of the FliF and FliG mutants? In other words, if FliF and FliG are in class I (as in E. coli) then their absence might affect the expression of other later flagellar genes in subsequent classes (i.e., chemosensory genes). Also, in both FliF and FliG mutants no assembly intermediates of the flagellar motor are present in the cell as FliG is required for the assembly of FliF (see Hiroyuki Terashima et al. 2020, Kaplan et al. 2019, Kaplan et al. 2022). It could be argued that when the motor is not assembled then this will affect the expression of the other genes (e.g., those of the chemosensory cluster) which might play a role in the decreased level of chemosensory clusters the authors find in these mutants.

      We thank the reviewer for the valuable suggestions. In the revised manuscript, we have further elaborated on the regulatory control of flagellar gene expression in P. aeruginosa (see our response to comment #4).

      Comments on revisions:

      I believe the authors have performed additional experiments that improved their manuscript and they have answered many of my comments and those of the other reviewers. I am supportive of publishing this manuscript, but I still find the following points that are not clear to me (probably I am misunderstanding some points; the authors can clarify).

      (1) In response to reviewer 1, the authors say that they "analyzed and categorized the distribution of the chemotaxis complex in both wild-type and flhF mutant strains into three patterns: precise-polar, near-polar, and mid-cell localization." I can see what they mean by polar and mid-cell, but near-polar sounds a bit elusive? Can they provide examples of this stage and mention how accurately they can identify it? Also, do the pie charts they show in Figure S4 really show "significant alterations"? There is a difference between 98% and 85% as they mention in their response to reviewer 1, but I am not sure that this is significant? Probably they can explain/change the language in the text? Also, the number of cells they counted for FlhF mutant is more than the double of other strains (WT and FlhF FliF mutant)?

      We thank the reviewer for the valuable suggestions. To clarify, we divided the intracellular area along the cell's long axis into three domains: the two ends each representing 10% of the length as the precise-polar domain, the central 50% as the mid-cell domain, and the remaining regions between these as the near-polar domain. The localization pattern of the chemotaxis complex was assigned based on the position of the fluorescence intensity centroid within these domains.
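
      The partitioning rule described above can be sketched as follows. This is a hedged illustration rather than the script used for the analysis: the function name `classify_localization`, its arguments, and the treatment of centroids lying exactly on a domain boundary are assumptions made for the example.

```python
def classify_localization(centroid_pos, cell_length):
    """Assign a localization pattern from the centroid position on the long axis.

    centroid_pos: distance of the fluorescence intensity centroid from one cell
                  end, in the same units as cell_length (hypothetical inputs).
    Domains: the outer 10% at each end is precise-polar, the central 50% is
             mid-cell, and the remaining regions in between are near-polar.
    """
    if cell_length <= 0:
        raise ValueError("cell_length must be positive")
    if not 0 <= centroid_pos <= cell_length:
        raise ValueError("centroid must lie within the cell")

    rel = centroid_pos / cell_length
    # Fold the two halves together: 0.0 at either pole, 0.5 at mid-cell.
    dist_to_pole = min(rel, 1.0 - rel)

    # Boundary handling (<= and >=) is an assumption, not taken from the text.
    if dist_to_pole <= 0.10:
        return "precise-polar"
    if dist_to_pole >= 0.25:
        return "mid-cell"
    return "near-polar"


# Example with a hypothetical 4.0 um long cell.
for pos in (0.2, 0.7, 2.0, 3.9):
    print(pos, classify_localization(pos, 4.0))
```

      Folding the cell onto the distance from the nearest pole keeps the rule symmetric, so neither pole has to be labeled old or new.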

      Regarding the significance of the changes, you are correct to question our language. When flhF was knocked out, the proportion of chemotaxis complexes with precise-polar distribution decreased from 98% to 85%, a reduction of 13 percentage points. While this represents a measurable shift in localization pattern, describing it as "significant alterations" was probably imprecise. We have revised this language to more accurately reflect the magnitude of the change (lines 169-177).

      For the cell counting, we increased the sample size for the flhF mutant because this strain exhibited the appearance of mid-cell localization (approximately 5% of cells), which was not observed in wild-type or flhF fliF double mutant strains. To accurately quantify this rare phenotype and ensure statistical reliability, we analyzed more cells for this particular strain. This explains why the flhF mutant dataset contains approximately double the number of cells compared to the other strains.

      We have redrawn Figure S4 to include a clear schematic diagram of the cell partitioning method and provided representative examples of each localization pattern (precise-polar, near-polar, and mid-cell) to better illustrate how we distinguished between these categories.

      (2) One thing that also confused me is the following: One point that the authors stress is that FlhF localizes both the flagellum and the chemoreceptors to the pole. However, if I look at Figure 2B, the flagellum and the chemoreceptors still co-localize together (although not at the pole). If FlhF was responsible for co-localizing both of them to the pole, then wouldn't one expect them to be randomly localized in this mutant and by that I mean that they do not co-localize but that each of them (the flagellum and the chemoreceptors) are located in a different random location of the cell (not co-localized). The fact that they are still co-localized together in this mutant could also be interpreted by, for example, that FlhF localizes the flagellum to the pole and another mechanism localizes the chemoreceptors to the flagellum, hence, they still co-localize in this mutant because the chemoreceptors follow the flagellum by another mechanism to wherever it goes?

      Thank you for this insightful observation. You are correct that our current experimental results do not definitively establish that FlhF directly localizes both the flagellum and chemoreceptors to the pole independently. The persistent colocalization of flagella and chemoreceptors in the ΔflhF mutant, even when both are mislocalized away from the pole, actually suggests a more complex regulatory mechanism than we initially proposed.

      This observation highlights an important distinction between polar targeting and colocalization maintenance. Our data suggest that FlhF influences the polar targeting of the flagellum-chemoreceptor assembly, but the colocalization itself appears to be governed by a different mechanism that operates independently of FlhF. This could involve direct protein-protein interactions between flagellar and chemotaxis components, or shared assembly machinery that we have yet to identify.

      To better reflect this interpretation, we have revised the subsection title (line 150). We have also modified the relevant discussion (line 180) to more accurately describe FlhF’s role in polar targeting rather than claiming it directly controls chemoreceptor localization.

      (3) In the response to reviewers, the authors mention "suggesting that the assembly of the receptor complex is likely influenced mainly by the C-ring and MS-ring structures rather than by the P ring". However, in the article, they still write "The complete assembly of the motor serves as a partial prerequisite for the assembly of the chemotaxis complex, and its assembly site is also regulated by the polar anchor protein FlhF" despite their FlgI results, which are not in accordance with this statement. Also, as I mentioned in my previous report, in FliG and FliF mutants the motor does not assemble (see Hiroyuki Terashima et al. 2020, and Kaplan et al., 2022).

      We thank the reviewer for the suggestions and acknowledge the contradictions in our original text. You are correct that in ΔfliF and ΔfliG mutants, the flagellar motor does not assemble, while the P ring (FlgI) functions as a bushing for the peptidoglycan layer and its absence does not prevent motor assembly.

      Our ΔflgI results, which showed normal chemotaxis complex assembly similar to wild-type, clearly demonstrate that the P ring is not required for chemoreceptor complex formation. This contradicts our original statement that "complete assembly of the motor serves as a partial prerequisite for the assembly of the chemotaxis complex."

      We have corrected this inconsistency by: 1) Revising the subsection title (line 186) to more accurately reflect that core motor structures, rather than complete motor assembly, influence chemoreceptor complex formation. 2) Modifying sentences in the introduction (lines 97-98) to better align with our experimental findings.

      (4) The authors have said in their response to my point "and currently, there is no evidence that FliA activity is influenced by proteins like FliG". I just want to clarify what I meant in my previous report: In E. coli, FliA binds to FlgM, and when the hook is assembled FlgM is secreted outside the cell allowing FliA to trigger the transcription of class III genes, which include the chemosensory genes (see Figure 5 in Beeby et al, 2020 in FEMS Microbiology, and Chilcott and Hughes, 2000). This implies that if the hook is not built, then late genes (including the chemoreceptors) should not be present. However, in Kaplan et al., 2019, the authors imaged a FliF mutant in Shewanella oneidensis (Figure S3) and still saw that chemoreceptors are present (I believe the authors must highlight this). This suggests that species such as Shewanella and Pseudomonas have a different assembly process than that of E. coli, and although the authors say that in the text, I believe they still can refine this part more in the spirit of what I wrote here.

      We thank the reviewer for the important clarification regarding the differences in transcriptional regulation among bacterial species. We agree that the observation of chemoreceptors in Shewanella oneidensis ΔfliF mutants (Kaplan et al., 2019) represents a significant deviation from the well-characterized E. coli model and merits stronger emphasis. In response, we have expanded the discussion to more clearly highlight the critical distinctions in the transcriptional regulatory circuits governing flagellar and chemoreceptor biogenesis between E. coli and species such as Shewanella oneidensis and Pseudomonas aeruginosa (lines 351-363).

      I do not like to ask for additional experiments in the second round of review, so for me if the authors modify the text to tackle these points and allow for probable alternative explanations/ highlight gaps/ modify language used for some claims, then that is fine with me.

      Reviewer #2 (Recommendations for the authors):

      It is plausible that colocalization reduces CheY levels throughout the cell in order to reduce cross-talk with c-di-GMP. This would mean that if physiologically-relevant levels of CheYp near the pole were present throughout the cell, c-di-GMP levels would be elevated to a point that is problematic for the cell. Clearly demonstrating this seems challenging.

      We acknowledge that directly proving the necessity of colocalization to prevent problematic c-di-GMP elevation is experimentally challenging, as it would require creating a system where CheY-P is artificially distributed throughout the cell at physiologically relevant concentrations while maintaining normal chemotaxis function.

      However, our data provide several lines of evidence supporting this model. First, we show that CheY overexpression leads to substantial c-di-GMP elevation (71.8% increase) and cell aggregation, demonstrating that elevated CheY levels can indeed cause problematic cross-pathway interference. Second, previous work has shown that CheY-P levels near the pole are an order of magnitude higher than in the rest of the cell (ref. 46). If this elevated CheY-P concentration near the pole were present throughout the cell, our data suggest that c-di-GMP levels would be elevated sufficiently to cause cell aggregation (Fig. 4A), thereby disabling normal motility and chemotaxis. Third, the dose-dependent relationship between CheY concentration and aggregation phenotype supports the idea that precise spatial regulation of CheY levels is functionally important for avoiding cross-pathway interference.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Major comments:

      (1) The main issue that I have with this study is the lack of exploration of "why" the model produces the results it does. Considering this is a model, it should be possible to find out why the three timescales of half-act/inact parameter modifications lead to different sets of results. Without this, it is simply an exploratory exercise. (The model does this, but we do not know the mechanism.) Perhaps this is enough as an interesting finding, but it remains unconvincing and (clearly) does not have the impact of describing a potential mechanism that could be potentially explored experimentally.

      This is now addressed in a new section in Results (“Potential Mechanism”):

      “To explore why the properties of the resulting bursters depend on the timescale of half-(in)activation adjustments, we examined what happens when SP1 is assembled under different half-(in)activation timescales: (1) fast, (2) intermediate (matching the timescale of ion channel density changes), and (3) infinitely slow (i.e., effectively turned off). The effects of these timescales can be seen by comparing the zoomed-in views of the SP1 activity profiles under each condition (Figure 4).

      When half-(in)activations are fast, the time evolution of α, which tracks how far the activity pattern is from its targets (see Methods), shows an abrupt jump as the model searches for a voltage-dependence configuration that meets calcium targets (Figure 4A). As this happens, the channel densities are slightly altered, and the process repeats. Slowing the half-(in)activation alterations reduces these abrupt fluctuations (Figure 4B). Making the alterations infinitely slow effectively removes half-(in)activation changes altogether, leaving the system reliant solely on slower alterations in maximal conductances (Figure 4C). Because each timescale of half-(in)activation produces a different channel repertoire at each time step, different timescales of half-(in)activation alteration led the model through a different path in the space of activity profiles and intrinsic properties. Ultimately, this resulted in distinct final activity patterns, all of which were consistent with the Ca²⁺ targets [22].”
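      The path dependence described in this passage can be illustrated with a deliberately simple toy, which is not the conductance-based model used in the study. The sketch below assumes a hypothetical linear "activity" equal to `g + v`, where `g` stands in for a maximal conductance and `v` for a half-(in)activation shift, and lets both relax toward a shared target with separate time constants. The function name `relax` and all numerical values are assumptions chosen for illustration only.

```python
def relax(tau_g, tau_v, target=1.0, dt=0.01, steps=50_000):
    """Euler-integrate two parameters driven by the same activity error.

    Toy assumption: activity = g + v, so infinitely many (g, v) pairs
    satisfy the target. Each parameter moves at a rate set by its own
    time constant.
    """
    g, v = 0.0, 0.0
    for _ in range(steps):
        error = target - (g + v)
        g += dt * error / tau_g   # slow "conductance-like" adjustment
        v += dt * error / tau_v   # "half-(in)activation-like" adjustment
    return g, v


# Same starting point, same target, three timescales for v.
endpoints = {
    "fast": relax(tau_g=10.0, tau_v=0.1),
    "intermediate": relax(tau_g=10.0, tau_v=10.0),
    "off": relax(tau_g=10.0, tau_v=float("inf")),  # error / inf == 0.0
}

for label, (g, v) in endpoints.items():
    print(f"{label:>12}: g = {g:.3f}, v = {v:.3f}, activity = {g + v:.3f}")
```

      In this toy, every run reaches the target activity, yet the final split between `g` and `v` depends on the ratio of the two time constants. It is intended only as an intuition for how a timescale can select among degenerate solutions, under the stated assumptions.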

      (2) A related issue is the use of bootstrapping to do statistics for a family of models, especially when the question is in fact the width of the distribution of output attributes. I don't buy this. One can run enough models to find say N number of models within a tight range (say 2% cycle period) and the same N number within a loose range (say 20%) and compare the statistics within the two groups with the same N.

      We appreciate the reviewer’s skepticism regarding our statistical approach with the “Group of 5” and “Group of 20.” These groups arose from historical aspects of our analysis, and this analysis does not directly advance the main point: that changes in the timescale of channel voltage-dependence alterations impact the properties of the bursters to which the homeostatic mechanism converges. Therefore, we removed the references to the Group of 5 and now focus on how the Group of 20 responds to variations in the timescale of voltage-dependent alterations.

      (3) The third issue is that many of the results that are presented (but not the main one) are completely expected. If one starts with gmax values that would never work (say all of them 0), then it doesn't matter how much one moves the act/inact curves one probably won't get the desired activity. Alternately, if one starts with gmax values that are known to work and randomizes the act/inact midpoints, then the expectation would be that it converges to something that works. This is Figure 1 B and C, no surprise. But it should work the other way around too. If one starts with random act/inact curves that would never work and fixes those, then why would one expect any set of gmax values would produce the desired response? I can easily imagine setting the half-act/inact values to values that never produce any activity with any gmax.

      We appreciate this observation and agree that it highlights a limitation of our initial condition sampling. Our claim that the half-(in)activation mechanism is subordinate to the maximal conductance mechanism is not intended as a general statement. Rather, we make this observation only within the specific range of initial conditions we explored. Within this restricted set, we found that the conductance mechanism was sufficient for successful assembly, while the half-(in)activation mechanism alone was not. We have revised the manuscript to limit the claim.

      “The results shown in Figure 1A require activity-dependent regulation of the maximal conductances. When activity-dependent regulation of the maximal conductances was turned off, the model failed to assemble SP1 into a burster (Figure 1B). This was also seen in the other 19 Starting Parameters (SP2-SP20) [22].”

      (4) A potential response to my previous criticism would be that you put reasonable constraints on gmax's or half-act/inact values or tie the half-act to half-inact. But that is simply arbitrary ad hoc decisions made to make the model work, much like the L8-norm used to amplify some errors. There is absolutely no reason to believe this is tied to the biology of the system.

      Here the reviewer highlights that model choices (e.g., constraints on maximal conductance and half-(in)activation, use of the L8 norm) are not necessarily justified by biology. A discussion of the constraints on maximal conductance and half-(in)activation is in the Model Assumptions section at the end of Methods. The Methods also contain a longer discussion of the use of the L8 norm:

      “To compute this match score, we adapted a formulation from Alonso et al (2023), who originally used a root-mean-square (RMS) or L2 norm to combine the sensor mismatches. In that approach, each of the three sensor errors is divided by its allowable tolerance to produce a normalized error. These normalized errors are then squared, summed, and square-rooted to produce a single scalar score that reflects how well the model matches the target activity pattern.

      In our version, we instead used an L8 norm, which raises each normalized error to the 8th power before summing and taking the 1/8th root. This formulation emphasizes large deviations in any one sensor, making it easier to pinpoint which feature of the activity is limiting convergence. By amplifying outlier mismatches, this approach provided a clearer view of which sensor was driving model mismatch, helping us both interpret failure modes and tune the model’s sensitivity by adjusting the tolerances for individual sensor errors.

      Although the L8 norm emphasizes large deviations more strongly than the L2 norm, the choice of norm does not fundamentally alter which models can converge: a model that performs well under one norm can also be made to perform well under another by adjusting the allowable tolerances. The biophysical mechanisms by which neurons detect deviations from target activity and convert them into changes in ion channel properties are still not well understood. Given this uncertainty, and the fact that using different norms ultimately shouldn’t affect the convergence of a given model, the use of different norms to combine sensor errors is consistent with the broader basic premise of the model: that intrinsic homeostatic regulation is calcium mediated [22].”
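      As a minimal illustration of the difference between the two norms (a sketch, not the model code itself), the snippet below combines three normalized sensor errors with an exponent of 2 and an exponent of 8. The function name `match_score` and the example error and tolerance values are hypothetical and chosen only to make the contrast visible.

```python
def match_score(errors, tolerances, p):
    """Combine sensor errors into one scalar using a p-norm.

    Each error is first divided by its allowable tolerance; the
    normalized errors are raised to the p-th power, summed, and the
    p-th root is taken (p = 2 and p = 8 are the cases in the text).
    """
    normalized = [abs(e) / t for e, t in zip(errors, tolerances)]
    return sum(x ** p for x in normalized) ** (1.0 / p)


# Hypothetical example: two sensors within tolerance, one well outside it.
errors = [0.5, 0.5, 2.0]
tolerances = [1.0, 1.0, 1.0]

l2_score = match_score(errors, tolerances, p=2)
l8_score = match_score(errors, tolerances, p=8)
largest = max(abs(e) / t for e, t in zip(errors, tolerances))

print(f"L2 score: {l2_score:.4f}")
print(f"L8 score: {l8_score:.4f}")
print(f"largest normalized error: {largest:.4f}")
```

      With these illustrative values, the exponent-8 score sits almost exactly at the largest normalized error, whereas the exponent-2 score also reflects the two smaller errors. This is the sense in which the higher exponent makes the limiting sensor easier to identify.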

      (5) The discussion of this manuscript is at once too long and not adequate. It goes into excruciating detail about things that are simply not explored in this study, such as phosphorylation mechanisms, justification of model assumptions of how these alterations occur, or even the biological relevance. (The whole model is an oversimplification - lack of anatomical structure, three calcium sensors, arbitrary assumptions, and how parameter bounds are implemented.) Lengthy justifications for why channel density & half-act/inact of all currents are obeying the same time constant are answering a question that no one asked. It is a simplified model to make an important point. The authors should make these parts concise and to the point. More importantly, the authors should discuss the mechanism through which these differences may arise. Even if it is not clear, they should speculate.

      We agree. The long discussions of Model Assumptions and of potential biological mechanisms that implement alterations in channel voltage-dependence obscured the main point. The former has been relocated to the Methods section, and the latter has been shortened. A discussion of a potential mechanism is now included in the Results (Figure 4).

      (6) There should be some justification or discussion of the arbitrary assumptions made in the model/methods. I understand some of this is to resolve issues that had come up in previous iterations of this approach and in fact the Alonso et al, 2023 paper was mainly to deal with these issues. However, some level of explanation is needed, especially when assumptions are made simply because of the intuition of the modeler rather than the existence of a biological constraint or any other objective measure.

      A discussion of Model Assumptions is included in the Methods.

      Reviewer #2 (Public review):

      Summary:

      In this study, Mondal and co-authors present the development of a computational model of homeostatic plasticity incorporating activity-dependent regulation of gating properties (activation, inactivation) of ion channels. The authors show that, similar to what has been observed for activity-dependent regulation of ion channel conductances, implementing activity-dependent regulation of voltage sensitivity participates in the achievement of a target phenotype (bursting or spiking). The results however suggest that activity-dependent regulation of voltage sensitivity is not sufficient to allow this and needs to be associated with the regulation of ion channel conductances in order to reliably reach the target phenotype. Although the implementation of this biologically relevant phenomenon is undeniably relevant, the main conclusions of the paper and the insights brought by this computational work are difficult to grasp.

      Strengths:

      (1) Implementing activity-dependent regulation of gating properties of ion channels is biologically relevant.

      (2) The modeling work appears to be well performed and provides results that are consistent with previous work performed by the same group.

      Weaknesses:

      (1) The writing is rather confusing, and the state of the art explaining the need for the study is unclear.

      We reorganized the manuscript to make its focus clearer.

      Introduction: We clarified our explanation of the state-of-the-art. Briefly, prior work on activity-dependent homeostasis has focused on regulating ion channel density. Neurons have also been documented to homeostatically regulate channel voltage-dependence. However, the consequences of channel voltage-dependence alterations on homeostatic regulation remain underexplored. To study this, we extend a computational model of activity-dependent homeostasis, originally developed to alter only channel density, so that it also alters channel voltage-dependence.

      Results: We reorganized this section to underscore the main point: that the timescale of half-(in)activation alterations influences the intrinsic properties and activity patterns targeted by a homeostatic mechanism. Figures 1A and 1B were retained to provide context—Figure 1A illustrates how activity can emerge from random initial conditions, while Figure 1B suggests that in these simulations, modulation of half-(in)activation played a specific limited role. Figure 2 builds on Figure 1A by summarizing how intrinsic properties and activity characteristics vary across a population of 20 bursters. Figure 3 then demonstrates that despite playing this specific limited role, altering the timescale of half-(in)activation in these simulations significantly impacted the intrinsic properties and activity characteristics of the bursters targeted by the homeostatic mechanism. Figure 4 supports this by offering a possible mechanistic explanation. Finally, Figure 5 reinforces the central message by showing how the same population responds to perturbation when the timescale of half-(in)activation alterations is varied—essentially extending the analysis of Figure 3 to a perturbed regime.

      Discussion: The Discussion concentrates more specifically on how the timescale of half-(in)activation alterations shapes the bursters targeted by the homeostatic mechanism. Extended content on model assumptions has been moved to Methods. The discussion of biological pathways that implement channel voltage-dependence has been shortened to avoid distracting from the main message.

      Methods: Aside from moving model assumptions here, we removed discussion of the “Group of 5” and explained in more detail why we chose the L8 norm.

      (2) The main outcomes and conclusions of the study are difficult to grasp. What is predicted or explained by this new version of homeostatic regulation of neuronal activity?

      Our message is general: the timescale of half-(in)activation alterations influences the intrinsic properties and activity characteristics of bursters targeted by a homeostatic mechanism. As such, the implications are general. Their value lies in circumscribing a conceptual framework from which experimentalists may devise and test new hypotheses. We do not aim to predict or explain any specific phenomenon in this work. To address this concern, the Discussion highlights two potential implications of our findings, one for neuronal development and another for pathologies that may arise from disruptions to homeostatic processes:

      “One application for the simulations involving the self-assembly of activity may be to model the initial phases of neural development, when a neuron transitions from having little or no electrical activity to possessing it (Baccaglini & Spitzer 1977). As shown in Figure 6, the timescale of (in)activation curve alterations defines a neuron's activity characteristics and intrinsic properties. As such, neurons may actively adjust these timescales to achieve a specific electrical activity aligned with a developmental phase’s activity targets. Indeed, developmental phases are marked by changes in ion channel density and voltage-dependence, leading to distinct electrical activity at each stage (Baccaglini & Spitzer 1977, Gao & Ziskind-Conhaim 1998, Goldberg et al 2011, Hunsberger & Mynlieff 2020, McCormick & Prince 1987, Moody & Bosma 2005, O'Leary et al 2014, Picken Bahrey & Moody 2003).

      Additionally, our results show that activity-dependent regulation of channel voltage-dependence can play a critical role in restoring neuronal activity during perturbations (Figure 5). Specifically, the presence and timing of half-(in)activation modulation influenced whether the model neuron could successfully return to its target activity pattern. Many model neurons only achieved recovery when a half-(in)activation mechanism was present. Moreover, the speed of this modulation shaped recovery outcomes in nuanced ways: some model neurons reached their targets only when voltage-dependence was adjusted rapidly, while others did so only when these changes occurred slowly. These observations all suggest that impairments in a neuron’s ability to modulate the voltage-dependence of its channels may lead to disruptions in activity-dependent homeostasis. This may have implications for conditions such as addiction (Kourrich et al 2015) and Alzheimer’s disease (Styr & Slutsky 2018), where disruptions in homeostatic processes are thought to contribute to pathogenesis.”

      Reviewer #3 (Public review):

      Mondal et al. use computational modeling to investigate how activity-dependent shifts in voltage-dependent (in)activation curves can complement activity-dependent changes in ion channel conductance to support homeostatic plasticity. While changes in the voltage-dependent properties of ion channels are known to modulate neuronal excitability, their role as a homeostatic plasticity mechanism interacting with channel conductance has been largely unexplored. The results presented here demonstrate that activity-dependent regulation of voltage-dependent properties can interact with plasticity in channel conductance to allow neurons to attain and maintain target activity patterns, in this case, intrinsic bursting. These results also show that the rate of channel voltage-dependent shifts can influence steady-state parameters reached as the model stabilizes into a stable intrinsic bursting state. That is, the rate of these modifications shapes the range of channel conductances and half-(in)activation parameters as well as activity characteristics such as burst period and duration. A major conclusion of the study is that altering the timescale of channel voltage dependence can seamlessly shift a neuron's activity characteristics, a mechanism that the authors argue may be employed by neurons to adapt to perturbations. While the study's conclusions are mostly well-supported, additional analyses, and simulations are needed.

      (1) A main conclusion of this study is that the speed at which (in)activation dynamics change determines the range of possible electrical patterns. The authors propose that neurons may dynamically regulate the timescale of these changes (a) to achieve alterations in electrical activity patterns, for example, to preserve the relative phase of neuronal firing in a rhythmic network, and (b) to adapt to perturbations. The results presented in Figure 4 clearly demonstrate that the timescale of (in)activation modifications impacts the range of activity patterns generated by the model as it transitions from an initial state of no activity to a final steady-state intrinsic burster. This may have important implications for neuronal development, as discussed by the authors.

      However, the authors also argue that the model neuron's dynamics - such as period, and burst duration, etc - could be dynamically modified by altering the timescale of (in)activation changes (Figure 6 and related text). The simulations presented here, however, do not test whether modifications in this timescale can shift the model's activity features once it reaches steady state. In fact, it is unlikely that this would be the case since, at steady-state, calcium targets are already satisfied. It is likely, however, as the authors suggest, that the rate at which (in)activation dynamics change may be important for neuronal adaptation to perturbations, such as changes in temperature or extracellular potassium. Yet, the results presented here do not examine how modifying this timescale influences the model's response to perturbations. Adding simulations to characterize how alterations in the rate of (in)activation dynamics affect the model's response to perturbations-such as transiently elevated extracellular potassium (Figure 5) - would strengthen this conclusion.

      The reviewer suggests that our core message — namely, that the timescale of half-(in)activation alterations influences the intrinsic properties and activity patterns targeted by a homeostatic mechanism — should also hold during perturbations. We agree that this extension strengthens the central message and have incorporated it into the subsection of the Results (“Half-(in)activation Alterations Contribute to Activity Homeostasis”) and Figure 5.

      (2) Another key argument in this study is that small, coordinated changes in channel (in)activation contribute to shaping neuronal activity patterns, but that, these subtle effects may be obscured when averaging across a population of neurons. This may be the case; however, the results presented don't clearly demonstrate this point. This point would be strengthened by identifying correlations, if they exist, between (in)activation curves, conductance, and the resulting bursting patterns of the models for the simulations presented in Figure 2 and Figure 4, for example. Alternatively, or additionally, relationships between (in)activation curves could be probed by perturbing individual (in)activation curves and quantifying how the other model parameters compensate, which could clearly illustrate this point.

      In part of the Discussion, we noted that small, coordinated shifts in half-(in)activation curves could be obscured when averaging across a population of neurons. Our intention was not to present this as a primary result, but to highlight an emergent consequence of the model: that distinct initial maximal conductances may converge to activity targets via different small shifts in half-(in)activation, making such changes difficult to detect at the population level. However, we did not systematically examine correlations between (in)activation parameters, conductances, and activity features, nor how these correlations might vary with the timescale of (in)activation modulation. While this observation is consistent with model behavior, it does not directly advance the study’s main point — that the timescale of half-(in)activation modulation influences the types of bursting patterns that satisfy the activity target. To keep the focus clear, we have removed this remark from the Discussion, though we agree that a more detailed analysis of these correlations may offer a fruitful direction for future work.

      Reviewer #1 (Recommendations for the authors):

      Minor comments:

      (1) Page 5: remove "an" from "achieve a given an activity..."

      The sentence containing this error has been removed.

      (2) Page 7, bottom of page. Explain what prespecifying means here. This requires a conceptual explanation, even if the equations are given in the methods. Was one working ad hoc model built from which the three sensor values were chosen? What was this model and how was it benchmarked? The sensors are never shown. In any figure, but presumably they have different kinetics. What is meant by "average value"? What was the window of averaging and why?

      The intention of this passage was to provide a broad overview of the homeostatic mechanism, with the rationale for using sensor “averages” as homeostatic targets explained in detail in the Methods. We have replaced the word “average” with “target” to maintain this focus.

      (3) Page 9: add "the" in "electrical activity of the neuron as [the] model seeks...".

      Done

      (4) Page 9: say briefly what alpha is before using it. Also, please be consistent in either using the symbol for alpha or spelling it out across the manuscript and the figures.

      Done

      (5) Page 10: the paragraph "In general, ..." is confusing although it becomes clear later on what this is all about. Please rewrite and expand this to clarify some points. For instance, the word "degenerate" is first used here and it is unclear in what sense these models are degenerate. Then it is unclear why the first 5 models were chosen and then 15 more added. What was the point of doing this? What is the intent? Set this up properly before saying that you just did it. This also would clarify the weird terminology used later on of Group of 20 vs. Group of 5. The 20 and 5 are arbitrary. Say what the purpose is. Finally, is the "mean" at the very end the same 416 ms? If not, what do you mean by "the mean"? In fact, I find these 2% and 20% to be imprecise substitutes of (say) two distinct values of CV which are an order of magnitude different. Is that the intent?

      This comment refers to a passage that was removed during revision.

      (6) Page 10: this may be clear to you, but it took me a while to understand that in Figure 1C, you took the working model at the end of 1A, fixed the gmax values and randomized just the half-act/inact values to run it. Perhaps rewrite this to clarify?

      This comment refers to a figure that was removed during revision.

      (7) Page 13: why do channel densities not change much after the perturbation?

      This comment refers to a figure that has since been reworked during revision. In particular, we only study what happens during perturbation. This question is interesting and is the subject of ongoing work.

      Reviewer #2 (Recommendations for the authors):

      The article should be carefully corrected, because the current quality of writing might obscure the interest of the study. Particular attention should be paid to the state-of-the-art section and to the discussion, but even the writing of the results should be carefully reworked. The current state of the article makes it very difficult to understand the motivation behind the study but also what the main result provided by this work is.

      The Introduction, Results, and Discussion have been reworked to build on the central premise of the work: the timescale of half-(in)activation alterations influences the intrinsic properties and activity patterns targeted by the neuron’s homeostatic mechanism. These changes are detailed in Public Comment #1.

      Reviewer #3 (Recommendations for the authors):

      The manuscript presents an interesting computational study exploring how activity-dependent regulation of (in)activation dynamics interacts with conductance plasticity to shape neuronal activity patterns. While the study provides valuable insights, some aspects would benefit from clarification, further analyses, and/or additional simulations to strengthen the conclusions. Below, I outline concerns and comments related to specific details of the model and results presentation that were not included in the public review.

      (1) The results presented in Figure 5 show that adaptation occurs in both channel conductances and (in)activation dynamics; however, the changes in conductance remain relatively permanent after the model recovers from the transient elevation in extracellular potassium. It therefore seems likely that the model would recover bursting more quickly in response to a subsequent exposure to simulated elevated extracellular potassium since large modifications in the slowly changing conductances would not be required. If this is the case, it could provide a plausible mechanism for adaptation to repeated high-potassium exposure, as demonstrated experimentally in Cancer borealis by this group (PMID: 36060056).

      This is an astute observation and the subject of our present follow-up investigation.

      (2) In the text relating to Figure 5, it is argued that the resulting shifts in (in)activation curves may be conceptualized as alterations in window currents. It would be helpful to illustrate this by plotting and comparing changes in window currents of these channels alongside the changes in their (in)activation curves.

      This comment refers to a passage that was removed during revision.
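      For readers less familiar with the term used in the reviewer's comment, a window current arises in the voltage range where a channel's steady-state activation and inactivation curves overlap. The sketch below is a generic illustration under stated assumptions and is not taken from the model in the study: it uses first-order Boltzmann curves, a single shared slope factor, and hypothetical half-(in)activation voltages, and the names `boltzmann` and `window_area` are introduced here for illustration only.

```python
import math


def boltzmann(v, v_half, slope):
    """Sigmoid rising from 0 to 1 with midpoint v_half (all in mV)."""
    return 1.0 / (1.0 + math.exp((v_half - v) / slope))


def window_area(v_half_act, v_half_inact, slope=5.0, n_steps=1500):
    """Approximate the area (mV) under activation * inactivation.

    The product of the two steady-state curves is summed over the
    range -100 mV to +50 mV with a simple rectangle rule.
    """
    v_min, v_max = -100.0, 50.0
    dv = (v_max - v_min) / n_steps
    area = 0.0
    for i in range(n_steps + 1):
        v = v_min + i * dv
        m_inf = boltzmann(v, v_half_act, slope)          # rises with depolarization
        h_inf = 1.0 - boltzmann(v, v_half_inact, slope)  # falls with depolarization
        area += m_inf * h_inf * dv
    return area


# Hypothetical midpoints: shift half-activation 5 mV in the hyperpolarized direction.
baseline = window_area(v_half_act=-30.0, v_half_inact=-50.0)
shifted = window_area(v_half_act=-35.0, v_half_inact=-50.0)

print(f"baseline window area: {baseline:.3f} mV")
print(f"shifted window area:  {shifted:.3f} mV")
```

      Under these assumptions, moving the half-activation voltage toward the half-inactivation voltage enlarges the overlap, which is one way a small shift in a half-(in)activation parameter could be read as a change in window current.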

      (3) Some discussion of the role these homeostatic mechanisms may play when the neuron is synaptically integrated into a rhythmically active network could be informative. Surely, phasic and tonic inputs to the neuron would alter its conductance and voltage-dependent properties. Therefore, the model's parameters in an intact network could be very different from those in the synaptically isolated case.

      This is an excellent point. We agree that synaptic context—particularly tonic and phasic inputs—would likely influence a neuron’s conductances and voltage-dependent properties, potentially leading to different homeostatic outcomes than in the isolated case. While our current study focuses on synaptically isolated neurons, the Marder lab has considered how homeostatically stabilized neurons might interact in network settings. For example, O'Leary et al (2014) presents an example network of three such neurons operating under homeostatic regulation. However, systematically exploring this question remains a challenge. We are currently developing ideas to study this in the context of a simplified half-center oscillator model, where network-level dynamics can be more tractably analyzed.

      (4) Why are the transitions of alpha typically so abrupt, essentially either 1 or 0? Similarly, what happens in the model when there are transient transitions from what appears to be a steady-state alpha that abruptly shifts from 0 to 1 or 1 to 0? For example, what is occurring in Figure 1A at ~150s and ~180s when alpha jumps between 1 and 0, or in Figure 1B when the model transiently jumps up from 0 to 1 at ~400s and ~830s? In Figure 1A, does the bursting pattern change at all after ~250s, or is it identical to the pattern at c?

      This is addressed in the revision (Lines 141 – 150).

      (5) Are the final steady-state parameters of the 25 (sic) models consistent with experimental observations?

      It is difficult to assess — it is hard to design an experiment to do what the reviewer is suggesting.

      (6) Why isn't gL allowed to change dynamically? This seems like the most straightforward way to allow a neuron to adjust its excitability (aside from tonic synaptic inputs).

      Passive currents could, in principle, be subject to homeostatic regulation. However, our study focused on active intrinsic currents. This focus stems from earlier investigations, which showed that active currents are dynamically regulated during homeostasis, for instance Turrigiano et al (1995) and Desai et al (1999).

      Alonso LM, Rue MCP, Marder E. 2023. Gating of homeostatic regulation of intrinsic excitability produces cryptic long-term storage of prior perturbations. Proc Natl Acad Sci U S A 120: e2222016120

      Baccaglini PI, Spitzer NC. 1977. Developmental changes in the inward current of the action potential of Rohon-Beard neurones. J Physiol 271: 93-117

      Desai NS, Rutherford LC, Turrigiano GG. 1999. Plasticity in the intrinsic excitability of cortical pyramidal neurons. Nature Neuroscience 2: 515-20

      Gao BX, Ziskind-Conhaim L. 1998. Development of ionic currents underlying changes in action potential waveforms in rat spinal motoneurons. J Neurophysiol 80: 3047-61

      Goldberg EM, Jeong HY, Kruglikov I, Tremblay R, Lazarenko RM, Rudy B. 2011. Rapid developmental maturation of neocortical FS cell intrinsic excitability. Cereb Cortex 21: 666-82

      Hunsberger MS, Mynlieff M. 2020. BK potassium currents contribute differently to action potential waveform and firing rate as rat hippocampal neurons mature in the first postnatal week. J Neurophysiol 124: 703-14

      Kourrich S, Calu DJ, Bonci A. 2015. Intrinsic plasticity: an emerging player in addiction. Nature Reviews Neuroscience 16: 173-84

      McCormick DA, Prince DA. 1987. Post-natal development of electrophysiological properties of rat cerebral cortical pyramidal neurones. J Physiol 393: 743-62

      Moody WJ, Bosma MM. 2005. Ion channel development, spontaneous activity, and activity-dependent development in nerve and muscle cells. Physiol Rev 85: 883-941

      O'Leary T, Williams AH, Franci A, Marder E. 2014. Cell types, network homeostasis, and pathological compensation from a biologically plausible ion channel expression model. Neuron 82: 809-21

      Picken Bahrey HL, Moody WJ. 2003. Early development of voltage-gated ion currents and firing properties in neurons of the mouse cerebral cortex. J Neurophysiol 89: 1761-73

      Styr B, Slutsky I. 2018. Imbalance between firing homeostasis and synaptic plasticity drives early-phase Alzheimer’s disease. Nature Neuroscience 21: 463-73

      Turrigiano G, LeMasson G, Marder E. 1995. Selective regulation of current densities underlies spontaneous changes in the activity of cultured neurons. J Neurosci 15: 3640-52

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This work focuses on the connection strength of the corticostriatal projections, without considering the involvement of synaptic plasticity in sensory integration.

      Thank you for raising this point. Indeed, sensory integration is a complex process with a multitude of factors beyond connectivity patterns and synaptic strength. In addition, it is true that both connectivity levels and synaptic strength can be modified by plasticity. 

      We modified our conclusion as follows, line 354: 

      “Since the inputs to a single SPN represent only a limited subset of whisker columns, a complete representation of whiskers could emerge at the population level, with each SPN’s representation complementing those of its neighbors (Fig. 7). These observations raise the hypothesis of a selective or competitive process underlying the formation of corticostriatal synapses. The degree of input convergence onto SPNs could be modulated by plasticity, potentially enabling experience-driven reconfiguration of S1 corticostriatal coupling.”

      Reviewer #2 (Public review):

      A few minor changes to the figures and text could be made to improve clarity.

      We thank you for having taken the time to indicate where changes could benefit the paper. We followed your recommendations. 

      Reviewer #3 (Public review):

      (1) Several factors may contribute to an underestimation of barrel cortex inputs to SPNs (and thus an overestimate of the input heterogeneity among SPNs). First, by virtue of the experiments being performed in an acute slice prep, it is probable that portions of recorded SPN dendritic trees have been dissected (in an operationally consistent anatomical orientation). If afferents happen to systematically target the rostral/caudal projections of SPN dendritic fields, these inputs could be missed. Similarly, the dendritic locations of presynaptic cortical inputs remain unknown (e.g., do some inputs preferentially target distal vs proximal dendritic positions?). As synaptic connectivity was inferred from somatic recordings, it's likely that inputs targeting the proximal dendritic arbor are the ones most efficiently detected. Mapping the dendritic organization of synapses is beyond the scope of this work, but these points could be broached in the text.

      Thank you for this analysis. The positions of S1 spines have been mapped on the SPN dendritic arbor by the Margolis group (B.D. Sanabria et al., eNeuro 2024, 10.1523/ENEURO.0503-23.2023). They observed that about 80 % of S1 spines were on dendrites, with a specific distribution that was, on average, rather close to the soma. In this study, S1 spines did not exhibit a specific distribution that would systematically hinder their detection in a slice. Nevertheless, the position in the dendritic arbor at which an S1 input is received does affect its detection in somatic recordings. We modified the discussion as follows, line 275:

      “The LSPS combined with glutamate uncaging mapped projections contained in the slice, intact from the presynaptic cell bodies to the SPN dendrites. Some cortical inputs targeting distal SPN dendrites may have gone undetected, either due to attenuation of synaptic events recorded at the soma or because distal dendritic branches were lost during slice preparation. Indeed, about 80 % of S1 synaptic contacts are distributed along dendrites (Sanabria et al., 2024). However, synapses located distally are proportionally rare (Sanabria et al., 2024), and our estimates suggest that the loss of S1 input was minimal (see Methods). More significantly, our mapping only included projections from neuronal somata located within the S1 barrel field in the slice: projections from cortical columns outside the slice were not stimulated. For this reason, our study characterized connectivity patterns rather than the full extent of connectivity with the barrel cortex.”

      We explain our estimation of truncated S1 contacts in the Methods, line 434:

      “To estimate the loss of S1 synaptic contacts caused by slice preparation, we modeled the SPN dendritic field as a sphere centered on the soma. 80 % of S1 synapses were distributed radially along dendrites, according to the specific distribution described by Sanabria et al. (2024). The simulation also incorporated the known distribution of SPN dendritic length as a function of distance from the soma (Gertler et al., 2008). Finally, it assumed that synapse placement was isotropic, with equal probability in all directions from the soma. Truncation was simulated by removing a spherical cap at one pole of the sphere, reflecting the depth of our recordings (beyond 80 μm). Based on this simulation, the loss of S1 inputs was < 10 %.”
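
      The spherical-cap truncation described above can be illustrated with a minimal sketch. It is not the authors' simulation: the radial distribution of synapses is a placeholder (a gamma distribution with a 60 µm mean, chosen only for illustration), the cut plane is assumed to lie 80 µm from the soma, and the function names are hypothetical. Under isotropy, a synapse at radius r beyond the cut depth d falls in the removed cap with probability (1 - d/r)/2, which gives an analytic check on the Monte Carlo estimate.

```python
import random


def analytic_cap_loss(radii_um, cut_depth_um):
    """Expected fraction of synapses beyond a cut plane at z = cut_depth_um.

    For an isotropic direction, z / r is uniform on [-1, 1], so a synapse at
    radius r > d is lost with probability (1 - d / r) / 2.
    """
    lost = 0.0
    for r in radii_um:
        if r > cut_depth_um:
            lost += 0.5 * (1.0 - cut_depth_um / r)
    return lost / len(radii_um)


def simulated_cap_loss(radii_um, cut_depth_um, rng):
    """Monte Carlo version: draw one isotropic direction per synapse."""
    lost = 0
    for r in radii_um:
        cos_theta = rng.uniform(-1.0, 1.0)  # isotropic placement
        if r * cos_theta > cut_depth_um:    # synapse sits in the removed cap
            lost += 1
    return lost / len(radii_um)


rng = random.Random(0)
# Placeholder radial distribution (gamma, mean 60 um); NOT the published one.
radii = [rng.gammavariate(2.0, 30.0) for _ in range(100_000)]
print(f"simulated loss: {simulated_cap_loss(radii, 80.0, rng):.3f}")
print(f"analytic loss:  {analytic_cap_loss(radii, 80.0):.3f}")
```

      Replacing the placeholder radii with samples drawn from a measured synapse distribution would be the natural next step; the loss estimate depends strongly on how much of that distribution lies beyond the cut depth.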

      (2) In general, how specific (or generalizable) is the observed SPN-specific convergence of cortical barrel cortex projections in the dorsolateral striatum? In other words, does a similar cortical stimulation protocol targeted to a non-barrel sensory (or motor) cortex region produce similar SPN-specific innervation patterns in the dorsolateral striatum?

      This is an interesting question that could be addressed using the LSPS approach in areas for which ex vivo preparations have been designed to maintain the integrity of the corticostriatal projections, such as A1, M1 and S2.  

      We included this point in the discussion, line 299: 

      “The speckled connectivity pattern of individual SPNs, arising from the abundant and diffuse cortical innervation in the DLS, suggests that somatosensory corticostriatal synapses are established through a selective and/or competitive process. It is important to determine whether this sparse innervation of SPNs by S1 is a characteristic shared with other projections. In particular, it will be interesting to test this hypothesis on the auditory projections targeting the posterior striatum, where neurons exhibit clear tone frequency selectivity (Guo et al., 2018).”

      (3) In general, some of the figure legends are extremely brief, making many details difficult to infer. Similarly, some statistical analyses were either not carried out or not consistently reported.

      We thank you for having taken the time to indicate where changes could benefit the paper. We have followed your recommendations. 

      Reviewer #1 (Recommendations for the authors):

      A few limitations should be discussed in the manuscript:

      (1) The manuscript should mention that most corticostriatal synapses are formed at the dendritic spines of the SPNs, not their cell bodies. This is particularly important regarding the analysis and interpretation of the data in Figure 4.

      Thank you for this comment. This characteristic is important with regards to a limitation of electrophysiological recordings. This is now discussed:

      Line 275:

      “The LSPS combined with glutamate uncaging mapped projections contained in the slice, intact from the presynaptic cell bodies to the SPN dendrites. Some cortical inputs targeting distal SPN dendrites may have gone undetected, either due to attenuation of synaptic events recorded at the soma or because distal dendritic branches were lost during slice preparation. Indeed, about 80 % of S1 synaptic contacts are distributed along dendrites (Sanabria et al., 2024). However, synapses located distally are proportionally rare (Sanabria et al., 2024), and our estimates suggest that the loss of S1 input was minimal (see Methods).”

      Line 313:

      “[...], we found that overlaps between the connectivity maps of SPNs were rare and, when present, involved only a small fraction of the connected sites. This indicates that neighboring SPNs predominantly integrated distinct inputs from the barrel cortex, although it is possible that overlapping inputs received in distal dendrites were not all detected.”

      (1) SPNs show up- and down-states in vivo, which were not mimicked by the present study since all cells were held at - 80 mV (Line 364) and recorded at room temperature (Line 368). It should be discussed how the conclusion of the present work may be affected by the up/down states of SPNs in vivo.

      Thank you for raising this point. Indeed, our experimental conditions were not designed to capture the effects of network oscillatory activity. Instead, LSPS conditions were optimized to reveal monosynaptic connectivity between neurons in S1 and their postsynaptic targets. These optimizations include the use of a high concentration of extracellular divalents (4 mM Ca<sup>2+</sup> and Mg<sup>2+</sup>) to generate robust yet moderate and spatially-restricted stimulations of cortical cells and reliable neurotransmitter release (Shepherd, Pologruto and Svoboda, Neuron 2003; 10.1016/s0896-6273(03)00152-1; in our study, see Fig. 1D  and Suppl Fig. 2). Investigating the pre- and postsynaptic modulations of the corticostriatal coupling by up- and down-states would require specific conditions. 

      The conclusion now acknowledges that functional connectivity is subject to plasticity in general, line 358:

      “The degree of input convergence onto SPNs could be modulated by plasticity, potentially enabling experience-driven reconfiguration of S1 corticostriatal coupling.”

      (2) In addition to population-level integration (Line 337), sensory integration is likely to involve synaptic plasticity (like via NMDARs), which was not studied in the present work

      Thank you for raising this point. Indeed, we agree that sensory integration is a complex process with a multitude of factors beyond connectivity patterns and synaptic strength. We also agree that both connectivity levels and synaptic strength can be modified by plasticity. 

      We modified our conclusion as follows, line 354:

      “Since the inputs to a single SPN represent only a limited subset of whisker columns, a complete representation of whiskers could emerge at the population level, with each SPN’s representation complementing those of its neighbors (Fig. 7). These observations raise the hypothesis of a selective or competitive process underlying the formation of corticostriatal synapses. The degree of input convergence onto SPNs could be modulated by plasticity, potentially enabling experience-driven reconfiguration of S1 corticostriatal coupling.”

      (3) The potential corticostriatal connectivity may be underestimated due to loss of axonal branches during slice resection, and this might contribute to the conclusion of "sparse connectivity". Whether the author has considered performing LSPS studies within the striatum (i.e., stimulating ChR2-expressing cortical axon terminals) and whether this experiment may consolidate the conclusion of the present work.

      We appreciate the suggestion to employ Subcellular Channelrhodopsin-2-Assisted Circuit Mapping (sCRACM) to study the density of S1 spines on the SPN dendritic arbor. If ChR2 is broadly expressed in S1, this approach would likely increase spine detection, as spines contacted by presynaptic neurons located inside and outside the slice would now be activated. If ChR2 expression could be restricted to the whisker columns present in our preparation, enhanced detection could still occur, but in this case, it would reflect the activation of spines contacted by specific ChR2<sup>+</sup> axonal branches that exit and re-enter the slice to form synapses on the recorded SPN. The anatomy of corticostriatal axonal arbors suggests that such convoluted axonal trajectories could be relatively rare (T. Zheng and C.J. Wilson, J Neurophysiol. 2001; 10.1152/jn.00519.2001; M. Lévesque et al., Brain Res. 1996; 10.1016/0006-8993(95)01333-4).

      Moreover, it is important to remember that sCRACM does not generate connectivity maps between two structures, but maps of spines on dendritic arbors (Petreanu L.T. et al., Nature 2009; 10.1038/nature07709). Precise localization of presynaptic cell bodies was key for the present study, as it enabled distinguishing between different connectivity patterns and between different degrees of convergence of inputs from adjacent S1 cortical columns present in the slice (schematized in Fig. 1). Distinguishing these inputs using the stimulation of axon terminals would require the possibility to express one distinct opsin in each whisker column (or each cortical layer, depending on the axis of investigation). This is an exciting perspective, but to our knowledge the technology is not yet available.

      To emphasize our reasons for using LSPS, we revised the final paragraph of the Introduction, line 69: 

      “LSPS enabled precise mapping of corticostriatal functional connectivity by identifying cortical sites where stimulation evoked synaptic currents in the recorded SPNs, thereby localizing the cell bodies of their presynaptic neurons. This approach allowed us to determine both the cortical column and layer of origin within the barrel field in the slice for each SPN input.”

      Reviewer #2 (Recommendations for the authors):

      (1)  Figure 2F: SPN and cortical regions - both are shown in green. The distinction between the two would be clearer if SPNs were made a different color.

      Done

      (2)  Figure 2H: Based on their data, the authors conclude that since EPSCs in SPNs had small amplitudes (~40pA), only one or a few presynaptic cortical neurons (< 5) were activated by uncaging. It is not clear how this number was estimated. Either this statement should be qualified with data or citations provided to support it.

      We thank you for noticing it. We modified this part as follows, line 105:

      “Based on known amplitudes of spontaneous and miniature EPSCs in SPNs (10-20 pA on average; Kreitzer and Malenka, 2007; Cepeda et al., 2008; Dehorter et al., 2011; Peixoto et al., 2016), this finding is consistent with the presence of only one or a few presynaptic cells (≤ 5) at each connected site of the map.”

      (3) Figure 2I: The top graph is difficult to understand without already seeing the lower plot. Moving it below or to the side would help the reader follow the data more easily.

      Done

      (4) Figure 3D: In Line 162, the authors state, " Furthermore, SPNs receiving input from a single column were often located near others receiving input from multiple ones (Figure 3D), reinforcing that the low functional connectivity with barrel columns in the slice was genuine in these cases." However, Figure 3D does not show spatial information about SPNs relative to each other. This data should be added or the statement adjusted to reflect what is shown in the panel.

      Corrected as follows, line 167:

      “Furthermore, SPNs receiving input from a single column were often located in slices where other cells received input from multiple ones (Fig. 3D), reinforcing that the low functional connectivity with barrel columns in the slice was genuine in these cases.”

      (5) Figure 3F: Are the authors attempting to show how cluster number, cluster width, and connectivity gaps contribute to input field width? If so, this could be clarified by flipping the x- and y-axes so that the input field width is the y-axis in each case. Additionally, the difference between black and white points should be stated (or, if there is no difference, made to be the same). The significance of the dotted red line vs. the solid red lines should also be stated in the figure legend.

      These plots illustrate how cluster number, cluster width, and ratio of connectivity gaps over total length vary as a function of input field width. As expected, wider input fields contain more clusters (top). However, the overall density of connected sites does not increase with input field width, as indicated by a higher ratio of connectivity gaps over total length (bottom).

      This suggests the presence of a mechanism that regulates the connectivity level of individual SPNs (mentioned in the discussion). We prefer this orientation because the flipped one makes a cluttered panel due to different X axis labels. Symbols and lines were corrected. The correlation coefficients and statistics are now indicated in the panels and in the legend.

      (6) Figure 3H: The schematic is very useful for highlighting the core conclusions and is greatly appreciated. The pie charts are a bit hard to see and could be replaced with the percentages stated simply as text within the figure. It would also help to label the panel as "Summary," so readers can quickly identify its purpose.

      Done

      (7) Figures 4B-D: To clarify the overall percentage, the maximum for the y-axis should be set to 100% in each panel.

      Done

      Reviewer #3 (Recommendations for the authors):

      (1) Though mostly minor, several sentences/statements in the manuscript are confusing or overstated. For example:

      a. Lines 62-63: "Studies have found that inputs received by D1 SPNs were stronger than those received by D2 SPNs" is a broad statement that should be qualified.

      We changed this sentence to:

      “Electrophysiological studies have found that inputs received by D1 SPNs were stronger than those received by D2 SPNs, both in vivo and ex vivo (Reig and Silberberg, 2014; Filipović et al., 2019; Kress et al., 2013; Parker et al., 2016).”

      b. Lines 118-119: "EPSCs evoked with stimulations in L2/3 to L5b had similar amplitudes (Figure 2H), suggesting that L5a dominated these other layers thanks to a greater connectivity with SPNs principally." Here, the word "connectivity" is vague and could easily be misunderstood. Connectivity could refer to the amplitude of corticostriatal EPSCs, which the authors stated are not different between L2/3-L5b. Presumably, connectivity here refers to % of connected SPNs, but for the sake of clarity, the authors should be more explicit, e.g,. "...L5a dominated the other layers because a larger fraction of SPNs received connections from L5a, rather than because L5a synapses were stronger."

      We changed the sentence to (line 122):

      “EPSCs evoked with stimulations in L2/3 to L5b had similar amplitudes (Fig. 2H), suggesting that L5a dominance over these other layers is primarily due to a higher likelihood of SPNs being connected to it, rather than to stronger synaptic inputs.”

      c. In the Figure 4 legend, (A) says "Four example slices with 2 to 4 recordings. Same as in Figure 2A." Did the authors mean Figure 3A?

      Done

      d. Line 184: Should Figure 4B, C actually be Figure 4D?

      Done

      (2) Line 32: typo in Sippy et al. reference.

      Done

      (3) In Figure 2I, the label "dSPN" is confusing, as in the literature, dSPN often refers to the direct pathway SPN.

      Done

      (4) The y-axes in Figure 3C should be better labeled/explained.

      Fig.3C. Median (red) and 25-75th percentiles (box) of cluster width and spacing, expressed in µm (left Y axis) and number of cortical columns (right Y axis). Labels have been changed in the figure.

      (5)  Lines 150-152: "...45 % of the input fields with several clusters produced no synaptic response upon stimulation." This wording is confusing. It can be inferred that the authors mean "no synaptic response in the gaps between clusters." However, their phrasing omits this crucial detail and reads as though those input fields produce no response at all.

      We changed this sentence to (line 154):

      “Strikingly, regions lacking evoked synaptic responses (i.e., connectivity gaps) made up an average of 45 % of the length of input fields with multiple clusters (maps collapsed along the vertical axis; Fig. 3F, bottom).”

      (6)  Lines 184-186: "DLS SPNs could receive inputs from the same domain in the barrel cortex and yet have patterns of cortical innervation without or little redundancy." This should be rephrased to "with little to no redundancy."

      Done

      (7)  Lines 186-187: "They support a connectivity model in which synaptic connections on each SPNs..." should be revised to "connections to each SPN...".

      Done

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review)

      As this code was developed for use with a 4096 electrode array, it is important to be aware of double-counting neurons across the many electrodes. I understand that there are ways within the code to ensure that this does not happen, but care must be taken in two key areas. Firstly, action potentials traveling down axons will exhibit a triphasic waveform that is different from the biphasic waveform that appears near the cell body, but these two signals will still be from the same neuron (for example, see Litke et al., 2004 "What does the eye tell the brain: Development of a System for the Large-Scale Recording of Retinal Output Activity"; figure 14). I did not see anything that would directly address this situation, so it might be something for you to consider in updated versions of the code.

      Thank you for this comment. We have added a routine to SpikeMAP to remove highly correlated spikes detected within a given spatial radius of each other. The following was added to the main text (line 149):

      “As an additional verification step, SpikeMAP allows the computation of spike-count correlations between putative neurons located within a user-defined radius. Signals that exceed a defined threshold of correlation can be rejected as they likely reflect the same underlying cell.”
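
      The verification step quoted above can be sketched as follows. This is an illustration under stated assumptions, not SpikeMAP's routine: the function names, the unit layout (`xy` positions in µm, spike times in seconds), the 10 ms bin, the 100 µm radius, and the 0.9 correlation threshold are all hypothetical choices.

```python
import math


def bin_spikes(spike_times_s, duration_s, bin_s):
    """Convert spike times into spike counts per time bin."""
    n_bins = int(math.ceil(duration_s / bin_s))
    counts = [0] * n_bins
    for t in spike_times_s:
        idx = int(t / bin_s)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def find_duplicates(units, duration_s, bin_s=0.01, radius_um=100.0, r_max=0.9):
    """Return index pairs of nearby units whose spike counts correlate above r_max."""
    counts = [bin_spikes(u["spikes"], duration_s, bin_s) for u in units]
    flagged = []
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            if math.dist(units[i]["xy"], units[j]["xy"]) > radius_um:
                continue  # too far apart to be the same cell
            if pearson(counts[i], counts[j]) > r_max:
                flagged.append((i, j))
    return flagged


# Hypothetical units: 0 and 1 are the same cell seen on adjacent electrodes,
# 3 fires identically to 0 but lies outside the search radius.
units = [
    {"xy": (0.0, 0.0), "spikes": [0.012, 0.105, 0.253, 0.731]},
    {"xy": (42.0, 0.0), "spikes": [0.013, 0.106, 0.254, 0.732]},
    {"xy": (42.0, 42.0), "spikes": [0.055, 0.405, 0.905]},
    {"xy": (900.0, 900.0), "spikes": [0.012, 0.105, 0.253, 0.731]},
]
print(find_duplicates(units, duration_s=1.0))
```

      The bin width matters here: a triphasic axonal signal lags the somatic spike slightly, so bins that are too narrow could split the two detections and hide the correlation.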

      Secondly, spike shapes are known to change when firing rates are high, like in bursting neurons (Harris, K.D., Hirase, H., Leinekugel, X., Henze, D.A. & Buzsáki, G. Temporal interaction between single spikes and complex spike bursts in hippocampal pyramidal cells. Neuron 32, 141-149 (2001)). I did not see this addressed in the present version of the manuscript.

      We have added a routine to SpikeMAP that computes population spike rates to verify stationarity over time. We have also added a routine to identify putative bursting neurons through a Hartigan statistical dip test applied to the inter-spike interval distribution of individual cells.

      We added the following (line 204):

      “Further, SpikeMAP contains a routine to perform a Hartigan statistical dip test on the inter-spike interval distribution of individual cells to detect putative bursting neurons.”

      Another area for possible improvement would be to build on the excellent validation experiments you have already conducted with parvalbumin interneurons. Although it would take more work, similar experiments could be conducted for somatostatin and vasoactive intestinal peptide neurons against a background of excitatory neurons. These may have different spike profiles, but your success in distinguishing them can only be known if you validate against ground truth, like you did for the PV interneurons.

      We have added the following (line 326):

      “future work could include different inhibitory interneurons such as somatostatin (SOM) and vasoactive intestinal polypeptide (VIP) neurons to improve the classification of inhibitory cell types. Another avenue could involve applying SpikeMAP on artificially generated spike data (Buccino & Einevoll 2021; Laquitaine et al., 2024).”

      Reviewer #2 (Public review)

      Summary:

      While I find that the paper is nicely written and easy to follow, I find that the algorithmic part of the paper is not really new and should have been more carefully compared to existing solutions. While the GT recordings to assess the possibilities of a spike sorting tool to distinguish properly between excitatory and inhibitory neurons are interesting, spikeMAP does not seem to bring anything new to state-of-the-art solutions, and/or, at least, it would deserve to be properly benchmarked. I would suggest that the authors perform a more intensive comparison with existing spike sorters.

      Thank you for your insightful comment. A full comparison between SpikeMAP and related methods is provided in Table 1. As can be seen, SpikeMAP is the only method listed that performs E/I sorting on large-scale multielectrodes. Nonetheless, several aspects of SpikeMAP included in the spike sorting pipeline do overlap with existing methods, as these constitute necessary steps prior to performing E/I identification. These steps are not novel to the current work, nor do they constitute rigid options that cannot be substituted by the user. Rather, we aim to offer SpikeMAP users the option to combine E/I identification with preliminary steps performed either through our software or through another package of their choosing. For instance, preliminary spike sorting could be done through Kilosort before importing the spike data into SpikeMAP for E/I identification. To allow greater flexibility, we have now modularized our suite so that E/I identification can be performed as a stand-alone module. We have clarified the text accordingly (line 317):

      “While SpikeMAP is the only known method to enable the identification of putative excitatory and inhibitory neurons on high-density multielectrode arrays (Table 1), several aspects of SpikeMAP included in the spike sorting pipeline (Figure 1) overlap with existing methods, as these constitute required steps prior to performing E/I identification. To enable users to integrate SpikeMAP with existing toolboxes, we provide a modularized suite of protocols so that E/I identification can be performed separately from preliminary spike sorting steps. In this way, a user could carry out spike sorting through Kilosort or another package before importing their data to SpikeMAP for E/I identification.”

      Weaknesses:

      (1) The global workflow of spikeMAP, described in Figure 1, seems to be very similar to that of Hilgen et al. 2020 (10.1016/j.celrep.2017.02.038). Therefore, the first question is what is the rationale of reinventing the wheel, and not using tools that are doing something very similar (as mentioned by the authors themselves). I have a hard time, in general, believing that spikeMAP has something particularly special, given its Methods, compared to state-of-the-art spike sorters.

      The paper by Hilgen et al. is reported in Table 1. As seen, while this paper employs optogenetics, it does not target inhibitory (e.g., PV) cells. We have added the following clarification (line 82):

      “Despite evidence showing differences in action potential kinetics for distinct cell-types as well as the use of optogenetics (Hilgen et al., 2017), there exist no large-scale validation efforts, to our knowledge, showing that extracellular waveforms can be used to reliably distinguish cell-types.”

      This is why, at the very least, the title of the paper is misleading, because it lets the reader think that the core of the paper will be about a new spike sorting pipeline. If this is the main message the authors want to convey, then I think that numerous validations/benchmarks are missing to assess first how good spikeMAP is, with reference to spike sorting in general, before deciding if this is indeed the right tool to discriminate excitatory vs inhibitory cells. The GT validation, while interesting, is not enough to entirely validate the paper. The details are a bit too scarce for me, or would deserve to be better explained (see other comments after).

      We thank the reviewer for this comment, and have amended the title as follows:

      “SpikeMAP: An unsupervised pipeline for the identification of cortical excitatory and inhibitory neurons in high-density multielectrode arrays with ground-truth validation”

      (2) Regarding the putative location of the spikes, it has been shown that the center of mass, while easy to compute, is not the most accurate solution [Scopin et al, 2024, 10.1016/j.jneumeth.2024.110297]. For example, it has an intrinsic bias for finding positions within the boundaries of the electrodes, while some other methods, such as monopolar triangulation or grid-based convolution, might have better performances. Can the authors comment on the choice of the Center of Mass as a unique way to triangulate the sources?

      We agree with the reviewer that the center-of-mass algorithm carries limitations that are addressed by other methods. To address this issue, we have included two additional protocols in SpikeMAP to perform monopolar triangulation and grid-based convolution, offering additional options for users of the package. The text has been clarified as follows (line 429):

      “In addition to center-of-mass triangulation, SpikeMAP includes protocols to perform monopolar triangulation and grid-based convolution, offering additional options to estimate putative soma locations based on waveform amplitudes.”
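
      For reference, center-of-mass localization reduces to an amplitude-weighted average of electrode positions. The sketch below is a generic illustration, not SpikeMAP's code; the function name and the electrode coordinates are hypothetical. Because the estimate is a convex combination of electrode positions, it can never fall outside the area spanned by the contributing electrodes, which is the bias the reviewer refers to.

```python
def center_of_mass(positions_um, amplitudes_uv):
    """Amplitude-weighted mean of electrode positions.

    positions_um: list of (x, y) electrode coordinates in micrometres.
    amplitudes_uv: peak spike amplitude on each electrode (sign is ignored).
    """
    weights = [abs(a) for a in amplitudes_uv]
    total = sum(weights)
    if total == 0:
        raise ValueError("all amplitudes are zero")
    x = sum(w * p[0] for w, p in zip(weights, positions_um)) / total
    y = sum(w * p[1] for w, p in zip(weights, positions_um)) / total
    return x, y


# Hypothetical 2 x 2 patch of electrodes; the largest spike is at the origin.
positions = [(0.0, 0.0), (40.0, 0.0), (0.0, 40.0), (40.0, 40.0)]
amplitudes = [-300.0, -100.0, 0.0, 0.0]
print(center_of_mass(positions, amplitudes))
```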

      (3) Still in Figure 1, I am not sure I really see the point of Spline Interpolation. I see the point of such a smoothing, but the authors should demonstrate that it has a key impact on the distinction of Excitatory vs. Inhibitory cells. What is special about the value of 90kHz for a signal recorded at 18kHz? What is the gain with spline enhancement compared to without? Does such a value depend on the sampling rate, or is it a global optimum found by the authors?

      We clarified the text as follows (line 183):

      “While we found that a resolution of 90 kHz provided a reasonable estimate of spike waveforms, this value can be adjusted as a parameter in SpikeMAP.”
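
      Going from the 18 kHz sampling rate to a 90 kHz resolution corresponds to a five-fold upsampling of each waveform. The sketch below shows one way to do this with a natural cubic spline on evenly spaced samples. The choice of a natural spline (zero curvature at both ends), the function name, and the example waveform are assumptions; the spline variant used in SpikeMAP is not specified in the text above.

```python
def upsample_natural_cubic(samples, factor=5):
    """Upsample evenly spaced samples with a natural cubic spline."""
    n = len(samples)
    if n < 3:
        raise ValueError("need at least 3 samples")
    # Right-hand side for the interior second derivatives (unit spacing).
    rhs = [6.0 * (samples[i - 1] - 2.0 * samples[i] + samples[i + 1])
           for i in range(1, n - 1)]
    m = len(rhs)
    # Thomas algorithm for the tridiagonal system with rows (1, 4, 1).
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = 1.0 / 4.0
    d_prime[0] = rhs[0] / 4.0
    for i in range(1, m):
        denom = 4.0 - c_prime[i - 1]
        c_prime[i] = 1.0 / denom
        d_prime[i] = (rhs[i] - d_prime[i - 1]) / denom
    second = [0.0] * n  # natural boundary: zero curvature at both ends
    second[n - 2] = d_prime[m - 1]
    for i in range(m - 2, -1, -1):
        second[i + 1] = d_prime[i] - c_prime[i] * second[i + 2]
    # Evaluate the spline at `factor` points per original interval.
    out = []
    for i in range(n - 1):
        y0, y1 = samples[i], samples[i + 1]
        m0, m1 = second[i], second[i + 1]
        for j in range(factor):
            t = j / factor
            u = 1.0 - t
            out.append(m0 * u ** 3 / 6.0 + m1 * t ** 3 / 6.0
                       + (y0 - m0 / 6.0) * u + (y1 - m1 / 6.0) * t)
    out.append(float(samples[-1]))
    return out


# Hypothetical spike waveform in microvolts, sampled at 18 kHz.
waveform = [0.0, -20.0, -80.0, -30.0, 10.0, 5.0, 0.0]
upsampled = upsample_natural_cubic(waveform, factor=5)
print(len(waveform), "->", len(upsampled), "samples")
```

      The interpolated curve passes through every original sample, so upsampling adds no information; it only gives a finer time grid on which features such as peak-to-trough width can be measured.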

      (4) Figure 2 is not really clear, especially panel B. The choice of the time scale for the B panel might not be the most appropriate, and the legend filtered/unfiltered with a dot is not clear to me in Bii.

      We apologize for the rendering issues in the Figures that occurred during conversion into PDF format. We have now ensured that all figures are properly displayed.

      In panel E, the authors are making two clusters with PCA projections on single waveforms. Does this mean that the PCA is only applied to the main waveforms, i.e. the ones obtained where the amplitudes are peaking the most? This is not really clear from the methods, but if this is the case, then this approach is a bit simplistic and does not really match state-of-the-art solutions. Spike waveforms are quite often, especially with such high-density arrays, covering multiple channels at once, and thus the extracellular patterns triggered by the single units on the MEA are spatio-temporal motifs occurring on several channels. This is why, in modern spike sorters, the information in a local neighbourhood is often kept to be projected, via PCA, on the lower-dimensional space before clustering. Information on a single channel only might not be informative enough to disambiguate sources. Can the authors comment on that, and what is the exact spatial resolution of the 3Brain device? The way the authors are performing the SVD should be clarified in the methods section. Is it on a single channel, and/or on multiple channels in a local neighbourhood?

      We agree with the reviewer that it would be useful to have the option of performing PCA on several channels at once, since spikes can occur at several channels at the same time. We have now added a routine to SpikeMAP that allows users to define a radius around individual channels prior to performing PCA. The text was clarified as follows (line 131):

      “The SpikeMAP suite also offers a routine to select a radius around individual channels in order to enter groups of adjacent channels in PCA.”

      (5) About the isolation of the single units, here again, I think the manuscript lacks some technical details. The authors are saying that they are using a k-means cluster analysis with k=2. This means that the authors are explicitly looking for 2 clusters per electrode? If so, this is a really strong assumption that should not be held in the context of spike sorting, because, since it is a blind source separation technique, one can not pre-determine in advance how many sources are present in the vicinity of a given electrode. While the illustration in Figure 2E is ok, there is no guarantee that one can not find more clusters, so why this choice of k=2? Again, this is why most modern spike sorting pipelines do not rely on k-means, to avoid any hard-coded number of clusters. Can the authors comment on that?

      We clarified the text as follows (line 135):

      “In SpikeMAP, the optimal number of k-means clusters can be chosen by a Calinski-Harabasz criterion (Calinski and Harabasz, 1974) or pre-selected by the user.”
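
      The Calinski-Harabasz criterion scores a partition by the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom. The sketch below computes that score for a given labelling in plain Python; it is an illustration under our own naming, not the SpikeMAP code. To choose k, one would cluster the data for each candidate k and keep the labelling with the highest score.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz score of a labelled set of points (higher is better).

    `points` is a list of equal-length coordinate tuples and `labels` gives
    the cluster of each point. Requires 2 <= k < n and non-zero within-cluster
    dispersion.
    """
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need at least 2 clusters and fewer clusters than points")
    dim = len(points[0])
    overall = [sum(p[d] for p in points) / n for d in range(dim)]

    between = 0.0
    within = 0.0
    for cluster in clusters:
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum(
            (centroid[d] - overall[d]) ** 2 for d in range(dim)
        )
        within += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members
        )
    return (between / (k - 1)) / (within / (n - k))


# Synthetic example: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
score_k3 = calinski_harabasz(points, [0, 0, 0, 1, 1, 1, 2, 2, 2])
score_k2 = calinski_harabasz(points, [0, 0, 0, 0, 0, 0, 1, 1, 1])
```

      In this synthetic case the three-cluster labelling scores higher than the two-cluster one, so the criterion would not force k=2 on data that contain three sources.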

      (6) I'm surprised by the linear decay of the maximal amplitude as a function of the distance from the soma, as shown in Figure 2H. Is it really what should be expected? Based on the properties of the extracellular media, shouldn't we expect a power law for the decay of the amplitude? This is strange that up to 100um away from the soma, the max amplitude only dropped from 260 to 240 uV. Can the authors comment on that? It would be interesting to plot that for all neurons recorded, in a normed manner V/max(V) as function of distances, to see what the curve looks like.

      We added Supplemental Figure 1 showing the drop in voltage over all putative somas (N=1,950) of one recording, after excluding somas with an increase in voltage away from the electrode peak and computing normed values V/max(V). We see a distribution of slopes as well as intercepts across somas, showing some variability across recording sites. As the reviewer suggests, it is possible that a power law describes these data better than a linear function, and this would need to be investigated further by quantitatively comparing the fit of these functions.
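
      One way such a comparison could be set up is sketched below. It fits a straight line to the normed amplitudes and a power law via linear regression in log-log space, then reports the residual sum of squares of each fit in the original units. The function names and the synthetic values are ours, and distances and amplitudes are assumed to be strictly positive.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sum_squared_error(observed, predicted):
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def compare_decay_models(distances, amplitudes):
    """Compare a linear fit with a power-law fit V = c * d**exponent."""
    slope, intercept = fit_line(distances, amplitudes)
    linear_pred = [slope * d + intercept for d in distances]

    # A power law is a straight line in log-log coordinates.
    exponent, log_c = fit_line(
        [math.log(d) for d in distances], [math.log(v) for v in amplitudes]
    )
    power_pred = [math.exp(log_c) * d ** exponent for d in distances]

    return {
        "linear_sse": sum_squared_error(amplitudes, linear_pred),
        "power_sse": sum_squared_error(amplitudes, power_pred),
        "exponent": exponent,
    }


# Synthetic check: data generated from V = 1/d favour the power law.
power_like = compare_decay_models([1, 2, 4, 8, 16], [1, 0.5, 0.25, 0.125, 0.0625])
# Synthetic check: data generated from a straight line favour the linear fit.
line_like = compare_decay_models([10, 20, 30, 40, 50], [0.9, 0.8, 0.7, 0.6, 0.5])
```

      Because the power law is fitted in log space, its residuals in the original units are not strictly minimized; a nonlinear fit or an information criterion would give a fairer comparison on real recordings.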

      (7) In Figure 3A, it seems that the total number of cells is rather low for such a large number of electrodes. What are the quality criteria that are used to keep these cells? Did the authors exclude some cells from the analysis, and if yes, what are the quality criteria that are used to keep cells? If no criteria are used (because none are mentioned in the Methods), then how come so few cells are detected, and can the authors convince us that these neurons are indeed "clean" units (RPVs, SNRs, ...)?

      The reviewer is correct to point out that a number of stringent criteria were employed to exclude some putative cells. We now outline these criteria directly in the text (line 161):

      “At different steps in the process, conditions for rejecting spikes can be tailored by applying: (1) a stringent threshold to filtered voltages; (2) a minimal cut-off on the signal-to-noise ratio of voltages (see Supplemental Figure 2); (3) an LDA for cluster separability; (4) a minimal spike rate to putative neurons; (5) a Hartigan statistical dip test to detect spike bursting; (6) a decrease in voltage away from putative somas; and (7) a maximum spike-count correlation for nearby channels. Together, these criteria allow SpikeMAP users the ability to precisely control parameters relevant to automated spike sorting.”

      Further, we provide SNRs of individual channels (Supplemental Figure 2), and added to the SpikeMAP software the ability to apply a minimal criterion based on SNR.

      (8) Still in Figure 3A, it looks like there is a bias to find inhibitory cells at the borders, since they do not appear to be uniformly distributed over the MEA. Can the authors comment on that? What would be the explanation for such a behaviour? It would be interesting to see some macroscopic quantities on Excitatory/Inhibitory cells, such as mean firing rates, averaged SNRs... Because again, in Figure 3C, it is not clear to me that the firing rates of inhibitory cells are higher than Excitatory ones, whilst they should be in theory.

      We have added figures showing the distribution of E and I firing rates across a population of N=1,950 putative cells (Supplemental Figure 3). Firing rates of inhibitory neurons are marginally higher than excitatory neurons, and both E and I follow an approximately exponential distribution of rates.

      The reviewer may be right that there are more I neurons at the borders in Fig. 3B because injections were done in medial prefrontal cortex, so this may reflect an experimental artefact related to a high probability of activating I neurons in locations where the opsin was activated. We added a sentence to the text to clarify this point (line 201):

      “It is possible that the spatial location of putative I cells reflects the site of injection of the opsin in medial prefrontal cortex.”

      (9) For Figure 3 in general, I would have performed an exhaustive comparison of putative cells found by spikeMAP and other sorters. More precisely, I think that to prove the point that spikeMAP is indeed bringing something new to the field of spike sorting, the authors should have compared the performances of various spike sorters to discriminate Exc vs Inh cells based on their ground truth recordings. For example, either using Kilosort [Pachitariu et al, 2024, 10.1038/s41592-024-02232-7], or some other sorters that might be working with such large high-density data [Yger et al, 2018, 10.7554/eLife.34518].

      The reviewer is correct to point out that the spike-sorting portion of our pipeline shares similarities with related approaches. Other aspects, however, are unique to SpikeMAP. We have clarified the text accordingly:

      “In sum, SpikeMAP provides an end-to-end pipeline to perform spike-sorting on high-density multielectrode arrays. Some elements of this pipeline are similar to related approaches (Table 1), including the use of voltage filtering, PCA, and k-means clustering. Other elements are novel, including the use of spline interpolation, LDA, and the ability to identify putative excitatory and inhibitory cells.”

      (10) Figure 4 has a big issue, and I guess the panels A and B should be redrawn. I don't understand what the red rectangle is displaying.

      Again, we apologize for the rendering issues in the Figures that occurred during conversion into PDF format. We have now ensured that all figures are properly displayed.

      (11) I understand that Figure 4 is only one example, but I have a hard time understanding from the manuscript how many slices/mice were used to obtain the GT data? I guess the manuscript could be enhanced by turning the data into an open-access dataset, but then some clarification is needed. How many flashes/animals/slices are we talking about? Maybe this should be illustrated in Figure 4, if this figure is devoted to the introduction of the GT data.

      Details of the open access data are now provided in Supplemental Table 1. We also clarified Figure 5B:

      “Quantification of change in firing rate following optogenetic stimulation. Average firing rates are taken over four recordings obtained from 3 mice.”

      (12) While there is no doubt that GT data as the ones recorded here by the authors are the most interesting data from a validation point of view, the pretty low yield of such experiments should not discourage the use of artificially generated recordings such as the ones made in [Buccino et al, 2020, 10.1007/s12021-020-09467-7] or even recently in [Laquitaine et al, 2024, 10.1101/2024.12.04.626805v1]. In these papers, the authors have putative waveforms/firing rate patterns for excitatory and inhibitory cells, and thus, the authors could test how good they are in discriminating the two subtypes.

      We agree with the reviewer that it would be worthwhile for future work to apply SpikeMAP to artificially generated spike trains, and have added the following (line 328):

      “Another avenue could involve applying SpikeMAP on artificially generated spike data (Buccino & Einevoll 2021; Laquitaine et al., 2024).”

      Reviewer #1 (Recommendations for the authors):

      (1) Line 154 seems to include a parenthetical expression left over from editing: "sensitive to noise (contamination? Better than noise?) generated by the signal of proximal units." See also line 186: "use (reliance?) of light-sensitive" and line 245: "In the absence of synaptic blockers (right?)," and line 270: "the size of the data prevents manual intervention (curation?)." Check carefully for all parentheses like that, which should be removed.

      Thank you for pointing this out. We have revised the text and removed parenthetical expressions left over from editing.

      (2) In lines 285-286, you state that: "k-mean clustering of spike waveform properties best differentiated the two principal classes of cells..." But I could not find where you compared k-means clustering to other methods. I think you just argued that k-means seemed to work well, but not better than, another method. If that is so, then you should probably rephrase those lines.

      The reviewer is correct that direct comparisons are not performed here, hence we removed this sentence.

      (3) Methods section, E/I classification, lines 396-405: You give us figures on what fraction was E and I (PV subtype) (94.75% and 5.25%), but there is more that you could have said. First of all, what is the expected fraction of parvalbumin-sensitive interneurons in the cortex - is it near 5%?

      We clarified the text as follows (line 444): “This number is close to the expected percentage of PV interneurons in cortex (4-6%) (Markram et al. 2004).”

      Second, how would these percentages change if you altered the threshold from 3 s.d. to something lower, like 2 s.d.? Giving us some idea of how the threshold affects the fraction of PV interneurons could give us an idea of whether this method agrees with our expectations or not.

      While SpikeMAP offers the flexibility to set the voltage threshold manually, we opted for a stringent threshold to demonstrate the capabilities of the software. As seen in Figure 2D, at 2 and 3 s.d., the signal is largely accounted for by Gaussian noise, while deviation from noise arises around 4 s.d. We clarified the text as follows (line 120):

      “At a threshold of -3 s.d., the signal could be largely accounted for by Gaussian noise, while a separation between signal and noise began around a threshold of -4 s.d.”
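
      To give a sense of scale for the difference between 3 and 4 s.d., the sketch below computes how often pure zero-mean Gaussian noise would fall below a negative threshold. It treats samples as independent, which filtered recordings are not, and the sampling rate used in the example is a hypothetical value, not a property of the device.

```python
import math


def gaussian_lower_tail(k):
    """Probability that a zero-mean Gaussian sample falls below -k s.d."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def expected_noise_crossings(k, sampling_rate_hz, duration_s):
    """Expected number of noise samples below -k s.d. in a recording.

    Assumes independent Gaussian samples; this counts samples, not
    distinct threshold-crossing events.
    """
    return gaussian_lower_tail(k) * sampling_rate_hz * duration_s


# Hypothetical example: one channel, 10 kHz sampling, 60 s of pure noise.
at_3sd = expected_noise_crossings(3, 10_000, 60)
at_4sd = expected_noise_crossings(4, 10_000, 60)
```

      Under this idealization, roughly 0.135% of samples fall below -3 s.d. and about 0.003% below -4 s.d., so the stricter threshold admits about forty times fewer noise samples.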

      Third, did the inhibitory neurons identified by this optogenetic method also have narrow spike widths at half amplitude? Could you do a scatterplot of all the spike widths and inter-peak distances that had color-coded dots for E and I based on your optogenetic method?

      We have added a scatterplot (Supplemental Figure 5).

      (4) Can you compare your methods with others now widely in use, like, for example, Spiking Circus or Kilosort? You do that in Table 1 in terms of features, but not in terms of performance. For example, you could have applied Kilosort4 to your data from the 4096 electrode array and seen how often it sorted the same neurons that SpikeMAP did. I realize this could not give you a comparison of how many were E/I, but it could tell you how close your numbers of neurons agreed with their numbers. Were your numbers within 5% of each other? This would be helpful for groups who are already using Kilosort4.

      As mentioned earlier, packages listed in Table 1 do not provide an identification of putative E/I neurons on high-density electrode arrays. To facilitate the integration of SpikeMAP with other spike sorting packages, our suite now provides a stand-alone module to perform E/I identification. This is now mentioned in the text (see earlier comment).

      Reviewer #2 (Recommendations for the authors):

      I would encourage the authors to decide what the paper is about: is it about a new sorting method (and if yes, more tests/benchmarks are needed to explain the pros and the cons of the pipelines, and the Methods need to be expanded). Or is it about the new data for Ground Truth validation, and again, if yes, then maybe explain more what they are, how many slices/mice/cells, ... Maybe also consider making the data available online as an open dataset.

      We agree with the reviewer that the paper is best slated toward ground truth validation of E/I identification. We now specify how many slices/mice/cells etc. (see Supplemental Table 1) and make the data available online as open source.

    1. Author response:

      (1) Explore the temporal component of neural responses (instead of collapsing responses to a single number, i.e., the average response over 4s), and determine which of the three models can recapitulate the observed dynamics.

      (2) Expand the polar plot visualization to show all three slopes (changes in responses across all three successive concentrations) instead of only two slopes.

      (3) Attempt to collect and analyze, from published papers, data of: (a) first-order neuron responses to odors to determine the role of first-order inhibition towards generating non-monotonic responses, and (b) PN responses in Drosophila to properly compare with corresponding first-order neuron responses.

      (4) Further discuss: (a) why the brain may need to encode absolute concentration, (b) the distinction between non-monotonic responses and cross-over responses, and (c) potential limitations of the primacy model.

      (5) Expand the divisive normalization model by evaluating different values of k and R, and study the effects of divisive normalization on tufted cells.

      (6) Add discussion of other potential inhibitory mechanisms that could contribute towards the observed effects.

      Reviewer #1:

      The article starts from the premise that animals need to know the absolute concentration of an odor over many log units, but the need for this isn't obvious. The introduction cites an analogy to vision and audition. These are cases where we know for a fact that the absolute intensity of the stimulus is not relevant. Instead, sensory perception relies on processing small differences in intensity across space or time. And to maintain that sensitivity to small differences, the system discards the stimulus baseline. Humans are notoriously bad at judging the absolute light level. That information gets discarded even before light reaches the retina, namely through contraction of the pupil. Similarly, it seems plausible that a behavior like olfactory tracking relies on sensing small gradients across time (when weaving back and forth across the track) or space (across nostrils). It is important that the system function over many log units of concentration (e.g., far and close to a source) but not that it accurately represents what that current concentration is [see e.g., Wachowiak et al, 2025 Recalibrating Olfactory Neuroscience..].

      We thank the Reviewer for the insightful input and agree that gradients across time and space are important for various olfactory behaviors, such as tracking. At the same time, we think that absolute concentration is also needed for two reasons. First, in order to extract changes in concentration, the absolute concentration needs to be normalized out; i.e., change needs to be encoded with respect to some baseline, which is what divisive normalization computes. Second, while it is true that representing the exact number of odor molecules present is not important, this number directly relates to distance from the odor source, which does provide ethological value (e.g., is the tiger 100m or 1000m away?). Indeed, our decoding experiments focused on discriminating relative, and not on absolute, concentrations by classifying between each pair of concentrations (i.e., relative distances), which is effectively an assessment of the gradient. In our revision, we will make all of these points clearer.

      Still, many experiments in olfactory research have delivered square pulses of odor at concentrations spanning many log units, rather than the sorts of stimuli an animal might encounter during tracking. Even within that framework, though, it doesn't seem mysterious anymore how odor identity and odor concentration are represented differently. For example, Stopfer et al 2003 showed that the population response of locust PNs traces a dynamic trajectory. Trajectories for a given odor form a manifold, within which trajectories for different concentrations are distinct by their excursions on the manifold. To see this, one must recognize that the PN responds to an odor pulse with a time-varying firing rate, that different PNs have different dynamics, and that the dynamics can change with concentration. This is also well recognized in the mammalian systems. Much has been written about the topic of dynamic coding of identity and intensity - see the reviews of Laurent (2002) and Uchida (2014).

      Given the above comments on the dynamics of odor responses in first- and second-order neurons, it seems insufficient to capture the response of a neuron with a single number. Even if one somehow had to use a single number, the mean firing rate during the odor pulse may not be the best choice. For example, the rodent mitral cells fire in rhythm with the animal's sniffing cycle, and certain odors will just shift the phase of the rhythm without changing the total number of spikes (see e.g., Fantana et al, 2008). During olfactory search or tracking, the sub-second movements of the animal in the odor landscape get superposed on the sniffing cycle. Given all this, it seems unlikely that the total number of spikes from a neuron in a 4-second period is going to be a relevant variable for neural processing downstream.

      To our knowledge, it is not well understood how downstream brain regions read out mitral cell responses to guide olfactory behavior. The olfactory bulb projects to more than a dozen brain regions, and different regions could decode signals in different ways. We focused on the mean response because it is a simple, natural construct.

      The datasets we analyzed may not include all relevant timing information; for example, the mouse data is from calcium imaging studies that did not track sniff timing. Nonetheless, we plan to address this comment within our framework by binning time into smaller-sized windows (e.g., 0-0.2s, 0.2-0.4s, etc.) and repeating our analysis for each of these windows. Specifically, we will determine how each normalization method fares in recapitulating statistics of the population responses of each window, beyond simply assessing the population mean.
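
      The windowing step described above can be sketched as follows. This is only an illustration of the binning itself; the function name is hypothetical, the trace is assumed to be uniformly sampled, and any trailing samples that do not fill a whole window are dropped.

```python
def windowed_means(trace, dt, window):
    """Mean of `trace` in consecutive, non-overlapping time windows.

    `trace` is a list of response samples taken every `dt` seconds and
    `window` is the window length in seconds (for example 0.2).
    """
    samples_per_window = round(window / dt)
    if samples_per_window < 1:
        raise ValueError("window must span at least one sample")
    means = []
    last_start = len(trace) - samples_per_window
    for start in range(0, last_start + 1, samples_per_window):
        chunk = trace[start:start + samples_per_window]
        means.append(sum(chunk) / samples_per_window)
    return means


# A 0.6 s trace sampled every 0.1 s, summarized in 0.2 s windows.
example = windowed_means([1, 1, 3, 3, 5, 5], dt=0.1, window=0.2)
```

      Each neuron then contributes one value per window rather than a single 4 s average, and the population statistics can be compared with the models window by window.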

      Much of the analysis focuses on the mean activity of the entire population. Why is this an interesting quantity? Apparently, the mean stays similar because some neurons increase and others decrease their firing rate. It would be more revealing, perhaps, to show the distribution of firing rates at different concentrations and see how that distribution is predicted by different models of normalization. This could provide a stronger test than just the mean.

      We agree that mean activity is only one measure to summarize a rich data set and will perform the suggested analysis.

      The question "if concentration information is discarded in second-order neurons, which exclusively transmit odor information to the rest of the brain, how does the brain support olfactory behaviors, such as tracking and navigation?" is really not an open question anymore. For example, reference 23 reports in the abstract that "Odorant concentration had no systematic effect on spike counts, indicating that rate cannot encode intensity. Instead, odor intensity can be encoded by temporal features of the population response. We found a subpopulation of rapid, largely concentration-invariant responses was followed by another population of responses whose latencies systematically decreased at higher concentrations."

      Primacy coding does provide one plausible mechanism to decode concentration. Our manuscript demonstrated how such a code could emerge in second-order neurons with the help of divisive normalization, though it does require maintaining at least partial rank invariance across concentrations, which may not be robust. We also showed how concentration could be decoded via spike rates, even if average rates are constant, which provides an alternative hypothesis to that of ref 23.

      Further, ref 23 only considers the piriform cortex, which, as mentioned above, is one of many targets of the olfactory bulb, and it remains unclear what the decoding mechanisms are of each of these targets. In addition, work from the same authors of ref 23 found multiple potential decoding strategies in the piriform cortex itself, including changes in firing rate (see Fig. 2E of ref. 23 - Bolding & Franks, 2017; as well as Fig. 4 in Roland et al., 2017).

      It would be useful to state early in the manuscript what kinds of stimuli are being considered and how the response of a neuron is summarized by one number. There are many alternative ways to treat both stimuli and responses.

      We will add this explanation to the manuscript.

      "The change in response across consecutive concentration levels may not be robust due to experimental noise and the somewhat limited range of concentrations sampled": Yes, a number of the curves just look like "no response". It would help the reader to show some examples of raw data, e.g. the time course of one neuron's firing rate to 4 concentrations, and for the authors to illustrate how they compress those responses into single numbers.

      We agree and will add this information to the manuscript.

      "We then calculated the angle between these two slopes for each neuron and plotted a polar histogram of these angles." The methods suggest that this angle is the arctan of the ratio of the two slopes in the response curve. A ratio of 2 would result from a slope change from 0.0001 to 0.0002 (i.e., virtually no change in slope) or from 1 to 2 (a huge change). Those are completely different response curves. Is it reasonable to lump them into the same bin of the polar plot? This seems an unusual way to illustrate the diversity of response curve shapes.

      We agree that the two changes in the reviewer’s example will be categorized in the same quadrant in our analysis. We did not focus on the absolute changes because our analysis covers many log ratios of concentrations. Instead, we focused on the relative shapes of the concentration response curves, and more specifically, the direction of the change (i.e., the sign of the slope). We will better motivate this style of analysis in the revision. Moreover, in response to comments by Reviewer 2, we will compare response shapes between all three successive levels of concentration changes, as opposed to only two levels.
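
      The reviewer's example can be made concrete with a short sketch. It assumes three responses at concentrations equally spaced in log units, so that successive differences stand in for slopes, and it uses atan2 rather than the arctangent of a ratio so that the signs of both slopes are retained; both choices and the function names are ours.

```python
import math


def slope_angle_deg(responses):
    """Angle (degrees) of the point (first slope, second slope)."""
    first = responses[1] - responses[0]
    second = responses[2] - responses[1]
    return math.degrees(math.atan2(second, first))


def shape_category(responses):
    """Signs of the two successive slopes, e.g. '+/-' for up then down."""
    def sign(value):
        return "+" if value > 0 else "-" if value < 0 else "0"

    first = responses[1] - responses[0]
    second = responses[2] - responses[1]
    return sign(first) + "/" + sign(second)


# Slopes of (1, 2) and (0.0001, 0.0002) have the same ratio and angle.
large = slope_angle_deg([0.0, 1.0, 3.0])
tiny = slope_angle_deg([0.0, 0.0001, 0.0003])
```

      Both curves land at the same angle, which shows that the angle captures the direction of change but discards its magnitude.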

      The Drosophila OSN data are passed through normalization models and then compared to locust PN data. This seems dangerous, as flies and locusts are separated by about 300 M years of evolution, and we don't know that fly PNs act like locust PNs. Their antennal lobe anatomy differs in many ways, as does the olfactory physiology. To draw any conclusions about a change in neural representation, it would be preferable to have OSN and PN data from the same species.

      We are in the process of requesting PN response data in Drosophila from groups that have collected such data and will repeat the analysis once we get access to the data.

      One conclusion is that divisive normalization could account for some of the change in responses from receptors to 2nd order neurons. This seems to be well appreciated already [e.g., Olsen 2010, Papadopoulou 2011, minireview in Hong & Wilson 2013].

      While we agree that these manuscripts do study the effects of divisive normalization in insects and fish, here we show that this computation also generalizes to rodents. In addition, these previous studies do not focus on divisive normalization’s role towards concentration encoding/decoding, which is our focus. We will clarify this difference in the revision.

      Another claim is that subtractive normalization cannot perform that function. What model was used for subtractive normalization is unclear (there is an error in the Methods). It would be interesting if there were a categorical difference between divisive and subtractive normalization.

      We apologize for the mistake in the subtractive normalization equation and will correct it. Thank you for catching it.

      Looking closer at the divisive normalization model, it really has two components: (a) the "lateral inhibition" by which a neuron gets suppressed if other neurons fire (here scaled by the parameter k) , and (b) a nonlinear sigmoid transformation (determined by the parameters n and sigma). Both lateral inhibition and nonlinearity are known to contribute to decorrelation in a neural population (e.g., Pitkow 2012). The "intraglomerular gain control" contains only the nonlinearity. The "subtractive normalization" we don't know. But if one wanted to put divisive and subtractive inhibition on the same footing, one should add a sigmoid nonlinearity in both cases.

      Our intent was not to place all the methods on the “same footing” but rather to isolate the two primary components of normalization methods – non-linearity and lateral inhibition – and determine which of these, and in which combination, could generate the desired effects. Divisive normalization incorporates both components, whereas intraglomerular gain control and subtractive normalization only incorporate one of these components. We will clarify this reasoning in the revision.

      The response models could be made more realistic in other ways. For example, in both locusts and fish, the 2nd order neurons get inputs from multiple receptor types; presumably, that will affect their response functions. Also, lateral inhibition can take quite different forms. In locusts, the inhibitory neurons seem to collect from many glomeruli. But in rats, the inhibition by short axon cells may originate from just a few sparse glomeruli, and those might be different for every mitral cell (Fantana 2008).

      We thank the Reviewer for the input. Instead of fixing k for all second-order neurons, we will apply different k values for different neurons. We will also systematically vary the percentage of neurons used for the divisive normalization calculation in the denominator, and determine the regime under which the effects experimentally observed are reproducible. This approach takes into account the scenario that inter-glomerular inhibitory interactions are sparse.
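
      A minimal sketch of this kind of extension is given below. The equation is a generic divisive normalization form with a sigmoid nonlinearity (exponent n, semi-saturation sigma) and a lateral inhibition term scaled by k; it is an assumption for illustration and may differ from the exact equation in the manuscript. The `pools` argument, which lists the neurons that inhibit each cell, is a hypothetical way to express sparse inter-glomerular inhibition.

```python
def divisive_normalization(inputs, k, n=1.5, sigma=1.0, r_max=1.0, pools=None):
    """Normalize non-negative first-order responses.

    inputs : list of non-negative first-order responses
    k      : one inhibition strength for all neurons, or a list with one per neuron
    pools  : optional list of index lists; pools[i] names the neurons whose
             activity inhibits neuron i (default: the whole population)
    """
    count = len(inputs)
    strengths = [k] * count if isinstance(k, (int, float)) else list(k)
    outputs = []
    for i, x in enumerate(inputs):
        pool = range(count) if pools is None else pools[i]
        inhibition = strengths[i] * sum(inputs[j] for j in pool)
        drive = x ** n
        outputs.append(r_max * drive / (sigma ** n + drive + inhibition ** n))
    return outputs


weak = divisive_normalization([1.0, 2.0, 4.0], k=0.1)
strong = divisive_normalization([1.0, 2.0, 4.0], k=0.5)
sparse = divisive_normalization(
    [1.0, 2.0, 4.0], k=[0.1, 0.3, 0.5], pools=[[1], [2], [0]]
)
```

      With a shared k and a full pool, the inhibition term is common to all neurons, so the rank order of the inputs is preserved; per-neuron k values or sparse pools remove that guarantee.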

      There are questions raised by the following statements: "traded-off energy for faster and finer concentration discrimination" and "an additional type of second-order neuron (tufted cells) that has evolved in land vertebrates and that outperforms mitral cells in concentration encoding" and later "These results suggest a trade-off between concentration decoding and normalization processes, which prevent saturation and reduce energy consumption.". Are the tufted cells inferior to the mitral cells in any respect? Do they suffer from saturation at high concentration? And do they then fail in their postulated role for odor tracking? If not, then what was the evolutionary driver for normalization in the mitral cell pathway? Certainly not lower energy consumption (50,000 mitral cells = 1% of rod photoreceptors, each of which consumes way more energy than a mitral cell).

      The question of what mitral cells are “good for”, compared to tufted cells, remains unclear in our view. We speculate that mitral cells provide superior context-dependent processing and are better for determining stimulus-reward contingencies, but this remains far from settled experimentally.

      We believe the mitral cell pathway evolved earlier than tufted cells, since the former appear akin to projection neurons in insects. Nonetheless, we agree that differences in energy consumption are unlikely to be the primary distinguishing factor, and in the revision, we will drop this argument.

      Reviewer #2:

      The main premise that divisive normalization generates this diversity of dose-response curves in the second-order neurons is a little problematic. … The analysis in [Figure 3] indicates that divisive normalization does what it is supposed to do, i.e., compresses concentration information and not alter the rank-order of neurons or the combinatorial patterns. Changes in the combinations of neurons activated with intensity arise directly from the fact that the first-order neurons did not have monotonic responses with odor intensity (i.e., crossovers). This was the necessary condition, and not the divisive normalization for changes in the combinatorial code. There seems to be a confusion/urge to attribute all coding properties found in the second-order neurons to 'divisive normalization.' If the input from sensory neurons is monotonic (i.e., no crossovers), then divisive normalization did not change the rank order, and the same combinations of neurons are activated in a similar fashion (same vector direction or combinatorial profile) to encode for different odor intensities. Concentration invariance is achieved, and concentration information is lost. However, when the first-order neurons are non-monotonic (i.e., with crossovers), that causes the second-order neurons to have different rank orders with different concentrations. Divisive normalization compresses information about concentrations, and rank-order differences preserve information about the odor concentration. Does this not mean that the non-monotonicity of sensory neuron response is vital for robustly maintaining information about odor concentration? Naturally, the question that arises is whether many of the important features of the second-order neuron's response simply seem to follow the input. Or is my understanding of the figures and the write-up flawed, and are there more ways in which divisive normalization contributes to reshaping the second-order neural response? This must be clarified. 
      Lastly, the tufted cells in the mouse OB are also driven by this sensory input with crossovers. How does the OB circuit convert the input with crossovers into one that is monotonic with concentration? I think that is an important question that this computational effort could clarify.

      It appears that there is confusion about the definitions of “non-monotonicity” and “crossovers”. These are two independent concepts – one does not necessarily lead to the other. Non-monotonicity concerns the response of a single neuron to different concentration levels. A neuron’s response is considered non-monotonic if its response goes up then down, or down then up, across increasing concentrations. A “cross-over” is defined based on the responses of multiple neurons. A cross-over occurs when the response of one neuron is lower than another neuron at one concentration, but higher than the other at a different concentration. For example, the responses of both neurons could increase monotonically with increasing concentration, but one neuron might start lower and grow faster, hence creating a cross-over. We will clarify this in the manuscript, which we believe will resolve the questions raised above.
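
      The two definitions can be written down directly. The sketch below uses hypothetical function names and made-up response values; it encodes the example given above, in which two monotonically increasing neurons still cross over.

```python
def is_non_monotonic(responses):
    """True if a single neuron's response both rises and falls across concentrations."""
    steps = [later - earlier for earlier, later in zip(responses, responses[1:])]
    return any(step > 0 for step in steps) and any(step < 0 for step in steps)


def has_crossover(neuron_a, neuron_b):
    """True if neuron A is above B at one concentration and below it at another."""
    a_above = any(a > b for a, b in zip(neuron_a, neuron_b))
    a_below = any(a < b for a, b in zip(neuron_a, neuron_b))
    return a_above and a_below


# Both neurons increase monotonically, but B starts lower and grows faster.
neuron_a = [1.0, 2.0, 3.0, 4.0]
neuron_b = [0.0, 2.0, 4.0, 6.0]
```

      Here neither neuron is non-monotonic, yet the pair has a crossover, so the rank order of the population changes with concentration.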

      The way the decoding results and analysis are presented does not add a lot of information to what has already been presented. For example, based on the differences in rank-order with concentration, I would expect the combinatorial code to be different. Hence, a very simple classifier based on cosine or correlation distance would work well. However, since divisive normalization (DN) is applied, I would expect a simple classification scheme that uses the Euclidean distance metric to work equally as well after DN. Is this the case?

      Yes, we used a simple classification scheme, logistic regression with a linear kernel, which is essentially a Euclidean distance-based classification. This scheme works better for tufted cells because they are more monotonic; i.e., if neuron A and B both increase their responsiveness with concentration, then Euclidean distance would be fine. But if neuron A’s response amplitude goes up and neuron B’s response goes down – as often happens for mitral cells – then Euclidean distance does not work as well. We will add intuition about this in the manuscript.

      Leave-one-trial/sample-out seems too conservative. How robust are the combinatorial patterns across trials? Would just one or two training trials suffice for creating templates for robust classification? Based on my prior experience (https://elifesciences.org/reviewed-preprints/89330), I do expect that the combinatorial patterns would be more robust to adaptation and hence also allow robust recognition of odor intensity across repeated encounters.

      As suggested, we will compute the correlation coefficient of the similarity of neural responses for each odor (across trials). We will repeat this analysis for both mitral and tufted cells. To determine the effect of adaptation, we will compute correlation coefficients of responses between the 1st and 2nd trials vs the 1st and final trial.
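
      One way such a trial-to-trial comparison could be set up is sketched below. The population vectors are made-up values and `pearson` is a hand-written helper, so this is only an illustration of comparing the 1st trial with the 2nd and with the final trial, not the analysis pipeline used in the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical population responses (one value per neuron) to one odor
# at one concentration, for three trials.
trials = [
    [1.0, 0.2, 0.8, 0.1],  # 1st trial
    [0.9, 0.3, 0.7, 0.1],  # 2nd trial
    [0.5, 0.4, 0.3, 0.2],  # final trial
]

r_first_second = pearson(trials[0], trials[1])
r_first_last = pearson(trials[0], trials[-1])
print(r_first_second, r_first_last)
```

      A drop from the first comparison to the second would be consistent with adaptation reshaping the population pattern across repeated presentations.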

      Lastly, in the simulated data, since the affinity of the first-order sensory neurons to odorants is expected to be constant across concentration, and "Jaccard similarity between the sets of highest-affinity neurons for each pair of concentration levels was > 0.96," why would the rank-order change across concentration? DN should not alter the rank order.

      We agree that divisive normalization should not alter the rank order, but the rank order may change in first-order neurons, which carries through to second-order neurons. This confusion may be related to the one mentioned above re: cross-overs vs non-monotonicity. Moreover, in the simulated data (Fig. 4D-H), the Jaccard similarity was calculated based on only the 50 neurons with the highest affinity, not the entire population of neurons. As shown in Fig. 4H, most of the rank-order change happens in the remaining 150 neurons.
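
      The difference between set similarity and rank order can be illustrated with a small sketch. The affinity values, the choice of k = 3, and the helper names are assumptions made for illustration only; the simulation in the manuscript uses the 50 highest-affinity neurons.

```python
def rank_order(values):
    """Neuron indices sorted from the largest to the smallest value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


def top_k(values, k):
    """Set of indices of the k neurons with the largest values."""
    return set(rank_order(values)[:k])


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two non-empty sets."""
    return len(a & b) / len(a | b)


# Hypothetical per-neuron values at a low and a high concentration.
low = [9, 8, 7, 1, 2, 3]
high = [8, 9, 7, 3, 1, 2]

k = 3
print(jaccard(top_k(low, k), top_k(high, k)))  # similarity of the top-k sets
print(rank_order(low), rank_order(high))       # full rank orders
```

      Here the top-k sets are identical, yet the rank order differs both within the top-k set and among the remaining neurons, so a high Jaccard similarity does not imply an unchanged rank order.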

      Note that in response to a comment by Reviewer 3, we will change the presentation of Fig. 4H in the revision.

      If the set of early responders does change, how will the decoder need to change, and what precise predictions can be made that can be tested experimentally? The lack of exploration of this aspect of the results seems like a missed opportunity.

      In the Discussion, we wrote about how downstream circuits will need to learn which set of neurons are to be associated with each distinct concentration level. We will expand upon this point and include experimentally testable predictions.

      Based on the methods, for Figures 1 and 2, it appears the responses across time, trials, and odorants were averaged to get a single data point per neuron for each concentration. Would this averaging not severely dilute trends in the data? The one that particularly concerns me is the averaging across different odorants. If you do odor-by-odor analysis, is the flattening of second-order neural responses still observable? Because some odorants activate more globally and some locally, I would expect a wide variety of dose-response relationships that vary with odor identity (more compressed in second-order neurons, of course). It would be good to show some representative neural responses and show how the extracted values for each neuron are a faithful/good representation of its response variation across intensities.

      It appears there is some confusion here; we will clarify in the text and figure captions that we did not average across different odors in our analysis. We will also add figure panels showing some representative neural responses as suggested by the Reviewer.

      A lot of neurons seem to have responses that flat line closer to zero (both firing rate and dF/F in Figure 1). Are these responsive neurons? The mean dF/F also seems to hover not significantly above zero. Hence, I was wondering if the number of neurons is reducing the trend in the data significantly.

      Yes, if a neuron responds to at least one concentration level in at least 50% of the trials, it is considered responsive. So it is possible that some neurons respond to one concentration level and otherwise flatline near zero.  We will highlight a few example neurons to visualize this scenario.

      I did not fully understand the need to show the increase in the odor response across concentrations as a polar plot. I see potential issues with the same. For example, the following dose-response trend at four intensities (C4 being the highest concentration and C1 the lowest): response at C3 > response at C1 and response at C4 > response at C2. But response at C3 < response at C2. Hence, it will be in the top right segment of the polar plot. However, the responses are not monotonic with concentrations. So, I am not convinced that the polar plot is the right way to characterize the dose-response curves. Just my 2 cents.

      Your 2 cents are valuable! Thank you for raising this point. Instead of computing two slopes (C1-C3 and C2-C4), we will expand our analysis to include all three slopes (C1-C2, C2-C3, C3-C4). Consequently, there are 2^3 = 8 different response shapes, and we will list them and quantify the fraction of the responses that fall into each shape category.
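
      The eight shape categories can be enumerated with a short sketch. The response values are invented, the `shape` helper is hypothetical, and treating a zero slope as "-" is an arbitrary assumption made here to keep the categories binary.

```python
from collections import Counter
from itertools import product


def shape(responses):
    """Sign pattern of the three consecutive slopes (C1-C2, C2-C3, C3-C4)."""
    # Assumption: a flat or decreasing step is labelled "-".
    return "".join("+" if b > a else "-" for a, b in zip(responses, responses[1:]))


# All 2^3 = 8 possible sign patterns, from "+++" to "---".
all_shapes = ["".join(p) for p in product("+-", repeat=3)]

# Hypothetical responses of four neurons at concentrations C1..C4.
responses = [
    [0.1, 0.2, 0.4, 0.8],  # monotonic increase
    [0.8, 0.4, 0.2, 0.1],  # monotonic decrease
    [0.2, 0.5, 0.3, 0.6],  # C3 > C1 and C4 > C2, but C3 < C2
    [0.1, 0.3, 0.5, 0.4],  # increase, then a drop at C4
]

counts = Counter(shape(r) for r in responses)
fractions = {s: counts[s] / len(responses) for s in all_shapes}
for s in all_shapes:
    print(s, fractions[s])
```

      The third example is the case raised by the Reviewer: it has positive C1-C3 and C2-C4 slopes, yet the three-slope pattern exposes its non-monotonicity.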

      In many analyses, simulated data were used (Figures 3 and 4). However, there is no comparison of how well the simulated data fit the experimental data. For example, the Simulated 1st order neuron in Figure 3D does not show a change in rank-order for the first-order neuron. In Figure 3E, temporal response patterns in second-order neurons look unrealistic. Some objective comparison of simulated and experimental data would help bolster confidence in these results.

      We believe the Reviewer is referring to Figs. 4D and 4E, since Fig. 3D does not show a first-order neuron simulation, and there is no Fig 3E. In Fig. 4D there is no change of rank order because the simulation is for a single odor and single concentration level, and the change of rank-order (i.e., cross-overs) as we define occurs between concentration levels. We will clarify this in the manuscript.

      Reviewer #3:

      While the authors focus on concentration-dependent increases in first-order neuron activity, reflecting the majority of observed responses, recent work from the Imai group shows that odorants can also lead to direct first-order neuron inhibition (i.e., reduction in spontaneous activity), and within this subset, increasing odorant concentration tends to increase the degree of inhibition. Some discussion of these findings and how they may complement divisive normalization to contribute to the diverse second-order neuron concentration-dependence would be of interest and help expand the context of the current results.

      We thank the Reviewer for the suggestion. We will request datasets of first-order neuron responses from the groups who acquired them. We will analyze this data to determine the role of inhibition or antagonistic binding and quantify what percentage of first-order neurons respond less strongly with larger concentrations.

      Related to the above point, odorant-evoked inhibition of second-order neurons is widespread in mammalian mitral cells and significantly contributes to the flattened concentration-dependence of mitral cells at the population level. Such responses are clearly seen in Figure 1D. Some discussion of how odorant-evoked mitral cell inhibition may complement divisive normalization, and likewise relate to comparatively lower levels of odorant-evoked inhibition among tufted cells, would further expand the context of the current results. Toward this end, replication of analyses in Figures 1D and E following exclusion of mitral cell inhibitory responses would provide insight into the contribution of such inhibition to the flattening of the mitral cell population concentration dependence.

      We will perform the suggested analysis: specifically, we will set the negative mitral cell responses to 0 and assess whether the population mean remains flat.
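
      A minimal sketch of this rectification step is given below. The response matrix is made up (rows are cells, columns are increasing concentrations) and `population_mean` is a hypothetical helper; it only illustrates how zeroing negative responses changes the population mean at each concentration.

```python
def population_mean(matrix):
    """Mean response across cells at each concentration (column-wise mean)."""
    n_cells = len(matrix)
    return [sum(column) / n_cells for column in zip(*matrix)]


# Hypothetical mitral cell responses (rows: cells, columns: concentrations).
responses = [
    [0.2, 0.5, 0.9, 1.4],      # excitatory, increasing with concentration
    [-0.1, -0.3, -0.6, -1.0],  # inhibitory, increasingly negative
    [0.3, 0.3, 0.2, 0.2],      # weakly responsive
]

# Set negative (inhibitory) responses to 0, leaving the others unchanged.
rectified = [[max(r, 0.0) for r in row] for row in responses]

raw_mean = population_mean(responses)
rectified_mean = population_mean(rectified)
print(raw_mean)
print(rectified_mean)
```

      If the rectified mean rises with concentration while the raw mean stays nearly flat, inhibition is contributing to the flattening at the population level.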

      The idea of concentration-dependent crossover responses across the first-order population being required for divisive normalization to generate individually diverse concentration response functions across the second-order population is notable. The intuition of the crossover responses is that first-order neurons that respond most sensitively to any particular odorant (i.e., at the lowest concentration) respond with overall lower activity at higher concentrations than other first-order neurons less sensitively tuned to the odorant. Whether this is a consistent, generalizable property of odorant binding and first-order neuron responsiveness is not addressed by the authors, however. Biologically, one mechanism that may support such crossover events is intraglomerular presynaptic/feedback inhibition, which would be expected to increase with increasing first-order neuron activation such that the most-sensitively responding first-order neurons would also recruit the strongest inhibition as concentration increases, enabling other first-order neurons to begin to respond more strongly. Discussion of this and/or other biological mechanisms (e.g., first-order neuron depolarization block) supporting such crossover responses would strengthen these results.

      We thank the reviewer for providing additional mechanisms to consider. As suggested, we will add discussion of these alternatives to divisive normalization.

      It is unclear to what degree the latency analysis considered in Figures 4D-H works with the overall framework of divisive normalization, which in Figure 3 we see depends on first-order neuron crossover in concentration response functions. Figure 4D suggests that all first-order neurons respond with the same response amplitude (R in eq. 3), even though this is supposed to be pulled from a distribution. It's possible that Figure 4D is plotting normalized response functions to highlight the difference in latency, but this is not clear from the plot or caption. If response amplitudes are all the same, and the response curves are, as plotted in Figure 4D, identical except for their time to half-max, then it seems somewhat trivial that the resulting second-order neuron activation will follow the same latency ranking, regardless of whether divisive normalization exists or not. However, there is some small jitter in these rankings across concentrations (Figure 4G), suggesting there is some randomness to the simulations. It would be helpful if this were clarified (e.g., by showing a non-normalized Figure 4D, with different response amplitudes), and more broadly, it would be extremely helpful in evaluating the latency coding within the broader framework proposed if the authors clarified whether the simulated first-order neuron response timecourses, when factoring in potentially different amplitudes (R) and averaging across the entire response window, reproduces the concentration response crossovers observed experimentally. In summary, in the present manuscript, it remains unclear if concentration crossovers are captured in the latency simulations, and if not, the authors do not clearly address what impact such variation in response amplitudes across concentrations may have on the latency results. 
      It is further unclear to what degree divisive normalization is necessary for the second-order neurons to establish and maintain their latency ranks across concentrations, or to exhibit concentration-dependent changes in latency.

      As suggested by the Reviewer, we will add another simulation scenario where the response amplitudes (R) are different for different neurons. For each concentration, we will then average each neuron’s response across the entire response window and determine if the simulation reproduces the cross-overs as observed experimentally.

      How the authors get from Figure 4G to 4H is not clear. Figure 4G shows second-order neuron response latencies across all latencies, with ordering based on their sorted latency to low concentration. This shows that very few neurons appear to change latency ranks going from low to high concentration, with a change in rank appearing as any deviation in a monotonically increasing trend. Focusing on the high concentration points, there appear to be 2 latency ranks switched in the first 10 responding neurons (reflecting the 1 downward dip in the points around neuron 8), rather than the 7 stated in the text. Across the first 50 responding neurons, I see only ~14 potential switches (reflecting the ~7 downward dips in the points around neurons 8, 20, 32, 33, 41, 44, 50), rather than the 32 stated in the text. It is possible that the unaccounted rank changes reflect fairly minute differences in latencies that are not visible in the plot in Figure 4G. This may be clarified by plotting each neuron's latency at low concentration vs. high concentration (i.e., similar to Figure 4H, but plotting absolute latency, not latency rank) to allow assessment of the absolute changes. If such minute differences are not driving latency rank changes in Fig. 4G, then a trend much closer to the unity line would be expected in Figure 4H. Instead, however, there are many massive deviations from unity, even within the first 50 responding neurons plotted in Figure 4G. These deviations include a jump in latency rank from 2 at low concentration to ~48 at high concentration. Such a jump is simply not seen in Figure 4G.

      We apologize that Fig. 4H was a poor choice for visualization. What is plotted in Fig. 4H is the sorted identity of neurons under low and high concentrations, and points on the y=x line indicate that the two corresponding neurons have the same rank under the two concentrations. We will replace this panel with a more intuitive visualization in which the x and y axes are the ranks of each neuron at the two concentrations, so that deviation from the y=x line indicates how much a neuron’s rank differs between the two concentrations.

      In the text, the authors state that "Odor identity can be encoded by the set of highest-affinity neurons (which remains invariant across concentrations)." Presumably, this is a restatement of the primacy model and refers to invariance in latency rank (since the authors have not shown that the highest-affinity neurons have invariant response amplitudes across concentration). To what degree this statement holds given the results in Figure 4H, however, which appear to show that some neurons with the earliest latency rank at low concentration jump to much later latency ranks at high concentration, remains unclear. Such changes in latency rank for only a few of the first responding neurons may be negligible for classifying odor identity among a small handful of odorants, but not among 1-2 orders of magnitude more odors, which may feasibly occur in a natural setting. Collectively, these issues with the execution and presentation of the latency analysis make it unclear how robust the latency results are.

      The original primacy model states that the latency of a neuron decreases with increasing concentration, while the ranks of neurons remain unaltered. Our results, on the other hand, suggest that the ranks do at least partially change across concentrations. This leads to two possible decoding mechanisms. First, if the top K responding neurons remain invariant across concentrations (even if their individual ranks change within the top K), then the brain could learn to associate a population of K neurons with a response latency; lower response latency means higher concentration. Second, if the top K responding neurons do not remain invariant across concentrations, then the brain would need to learn to associate a different set of neurons with each concentration level. The latter imposes additional constraints on the robustness of the primacy model and the corresponding read-out mechanism. We will include more discussion of these possibilities in the revision.

      Analysis in Figures 4A-C shows that concentration can be decoded from first-order neurons, second-order neurons, or first-order neurons with divisive normalization imposed (i.e., simulating second-order responses). This does not say that divisive normalization is necessary to encode concentration, however. Therefore, for the authors to say that divisive normalization is "a potential mechanism for generating odor-specific subsets of second-order neurons whose combinatorial activity or whose response latencies represent concentration information" seems too strong a conclusion. Divisive normalization is not generating the concentration information, since that can be decoded just as well from the first-order neurons. Rather, divisive normalization can account for the different population patterns in concentration response functions between first- and second-order neurons without discarding concentration-dependent information.

      We agree that the word “generating” is faulty. We thank the reviewer for their more precise wording, which we will adopt.

      Performing the same polar histogram analysis of tufted vs. mitral cell concentration response functions (Figure 5B) provides a compelling new visualization of how these two cell types differ in their concentration variance. The projected importance of tufted cells to navigation, emerging directly through the inverse relationship between average concentration and distance (Figure 5C), is not surprising, and is largely a conceptual analysis rather than new quantitative analysis per se, but nevertheless, this is an important point to make. Another important consideration absent from this section, however, is whether and how divisive normalization may impact tufted cell activity. Previous work from the authors, as well as from Schoppa, Shipley, and Westbrook labs, has compellingly demonstrated that a major circuit mediating divisive normalization of mitral cells (GABA/DAergic short-axon cells) directly targets external tufted cells, and is thus very likely to also influence projection tufted cells. Such analysis would additionally provide substantially more justification for the Discussion statement "we analyzed an additional type of second-order neuron (tufted cells)", which at present instead reflects fairly minimal analysis.

      We agree that tufted cells are subject to divisive normalization as well, albeit probably to a lesser degree than mitral cells. To determine the effect of this, we will alter the strength (and degree of sparseness of interglomerular interactions) of divisive normalization and determine if there is a regime where response features of tufted cells match those observed experimentally.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Zhang et al. used a conditional knockout mouse model to re-examine the role of the RNA-binding protein PTBP1 in the transdifferentiation of astroglial cells into neurons. Several earlier studies reported that PTBP1 knockdown can efficiently induce the transdifferentiation of rodent glial cells into neurons, suggesting potential therapeutic applications for neurodegenerative diseases. However, these findings have been contested by subsequent studies, which in turn have been challenged by more recent publications. In their current work, Zhang et al. deleted exon 2 of the Ptbp1 gene using an astrocyte-specific, tamoxifen-inducible Cre line and investigated, using fluorescence imaging and bulk and single-cell RNA-sequencing, whether this manipulation promotes the transdifferentiation of astrocytes into neurons across various brain regions. The data strongly indicate that genetic ablation of PTBP1 is not sufficient to drive efficient conversion of astrocytes into neurons. Interestingly, while PTBP1 loss alters splicing patterns in numerous genes, these changes do not shift the astroglial transcriptome toward a neuronal profile.

      Strengths:

      Although this is not the first report of PTBP1 ablation in mouse astrocytes in vivo, this study utilizes a distinct knockout strategy and provides novel insights into PTBP1-regulated splicing events in astrocytes. The manuscript is well written, and the experiments are technically sound and properly controlled. I believe this study will be of considerable interest to a broad readership.

      Weaknesses:

      (1) The primary point that needs to be addressed is a better understanding of the effect of exon 2 deletion on PTBP1 expression. Figure 4D shows successful deletion of exon 2 in knockout astrocytes. However, assuming that the coverage plots are CPM-normalized, the overall PTBP1 mRNA expression level appears unchanged. Figure 6A further supports this observation. This is surprising, as one would expect that the loss of exon 2 would shift the open reading frame and trigger nonsense-mediated decay of the PTBP1 transcript. Given this uncertainty, the authors should confirm the successful elimination of PTBP1 protein in cKO astrocytes using an orthogonal approach, such as Western blotting, in addition to immunofluorescence. They should also discuss possible reasons why PTBP1 mRNA abundance is not detectably affected by the frameshift.

      We thank the reviewer for raising this important point. Indeed, the deletion of exon 2 introduces a frameshift that is predicted to disrupt the PTBP1 open reading frame and trigger nonsense-mediated decay (NMD). While our CPM-normalized coverage plots (Figure 4D) and gene-level expression analysis (Figure 6A) suggest that PTBP1 mRNA levels remain largely unchanged in cKO astrocytes, we acknowledge that this observation is counterintuitive and merits further clarification.

      We suspect that the process of brain tissue dissociation and FACS sorting for bulk or single-cell RNA-seq may enrich for nuclear material and thus dilute the NMD signal, which occurs in the cytoplasm. Alternatively, the transcripts (like those of some other genes) may escape NMD through unknown mechanisms. Although a frameshift is a strong indicator for triggering NMD, it does not guarantee that NMD will occur in every case. We will include this discussion in the revised manuscript to provide additional context for the apparent discrepancy between mRNA abundance and protein loss.

      Regarding the validation of PTBP1 protein depletion in cKO astrocytes by Western blotting, we acknowledge that orthogonal approaches to confirm PTBP1 elimination would address uncertainty around the effect of exon 2 deletion on PTBP1 expression. However, the low cell yield of cKO astrocytes makes it difficult to obtain sufficient sample for immunoblotting detection of PTBP1 depletion: on average, 3-5 adult animals per genotype are needed for each biological replicate. Our characterization of this Ptbp1 deletion allele in other contexts shows the loss of full-length PTBP1 protein in ESCs and NPCs by Western blotting. Furthermore, germline homozygous mutant mice do not survive beyond embryonic day 6, supporting that it is a loss-of-function allele.

      (2) The authors should analyze PTBP1 expression in WT and cKO substantia nigra samples shown in Figure 3 or justify why this analysis is not necessary.

      We thank the reviewer for raising this important question. We used Aldh1l1-CreERT2, which is designed to be active in all astrocytes throughout the mouse brain. Although we have systematically verified PTBP1 elimination in different mouse brain regions (cortex and striatum) at multiple time points (from 4w to 12w after tamoxifen administration), we agree that it remains necessary and important to demonstrate whether the observed lack of astrocyte-to-neuron conversion is indeed associated with sufficient PTBP1 depletion. We will analyze PTBP1 expression in the substantia nigra, as we did in the cortex and striatum.

      (3) Lines 236-238 and Figure 4E: The authors report an enrichment of CU-rich sequences near PTBP1-regulated exons. To better compare this with previous studies on position-specific splicing regulation by PTBP1, it would be helpful to assess whether the position of such motifs differs between PTBP1-activated and PTBP1-repressed exons.

      We thank the reviewer for this insightful comment. We agree that assessing the positional distribution of CU-rich motifs between PTBP1-activated and PTBP1-repressed exons would provide valuable insight into the position-specific regulatory mechanisms of PTBP1. In response, we will perform separate motif enrichment analyses for PTBP1-activated and PTBP1-repressed exons and examine whether their positional patterns differ. This will help clarify whether these exons are differentially regulated by PTBP1 through distinct motif positioning in mature astrocytes.

      (4) The analyses in Figure 5 and its supplement strongly suggest that the splicing changes in PTBP1-depleted astrocytes are distinct from those occurring during neuronal differentiation. However, the authors should ensure that these comparisons are not confounded by transcriptome-wide differences in gene expression levels between astrocytes and developing neurons. One way to address this concern would be to compare the new PTBP1 cKO data with publicly available RNA-seq datasets of astrocytes induced to transdifferentiate into neurons using proneural transcription factors (e.g., PMID: 38956165).

      We would like to express our gratitude for the thoughtful feedback. We agree that transcriptome-wide differences in gene expression between astrocytes and developing neurons could confound the interpretation of splicing differences. To address this concern, we will incorporate publicly available RNA-seq datasets from studies in which astrocytes are reprogrammed into neurons using proneural transcription factors (PMID: 38956165).

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Zhang and colleagues describes a study that investigated whether the deletion of PTBP1 in adult astrocytes in mice led to an astrocyte-to-neuron conversion. The study revisited the hypothesis that reduced PTBP1 expression reprogrammed astrocytes to neurons. More than 10 studies have been published on this subject, with contradicting results. Half of the studies supported the hypothesis while the other half did not. The question being addressed is an important one because if the hypothesis is correct, it can lead to exciting therapeutic applications for treating neurodegenerative diseases such as Parkinson's disease.

      In this study, Zhang and colleagues conducted a conditional mouse knockout study to address the question. They used the Cre-LoxP system to specifically delete PTBP1 in adult astrocytes. Through a series of carefully controlled experiments, including cell lineage tracing, the authors found no evidence for the astrocyte-to-neuron conversion.

      The authors then carried out a key experiment that none of the previous studies on the subject did: investigating alternative splicing pattern changes in PTBP1-depleted cells using RNA-seq analysis. The idea is to compare the splicing pattern change caused by PTBP1 deletion in astrocytes to what occurs during neurodevelopment. This is an important experiment that will help illuminate whether the astrocyte-to-neuron transition occurred in the system. The result was consistent with that of the cell staining experiments: no significant transition was detected.

      These experiments demonstrate that, in this experimental setting, PTBP1 deletion in adult astrocytes did not convert the cells to neurons.

      Strengths:

      This is a well-designed, elegantly conducted, and clearly described study that addresses an important question. The conclusions provide important information to the field.

      To this reviewer, this study provided convincing and solid experimental evidence to support the authors' conclusions.

      Weaknesses:

      The Discussion in this manuscript is short and can be expanded. Can the authors speculate what led to the contradictory results in the published studies? The current study, in combination with the study published in Cell in 2021 by Wang and colleagues, suggests that the observed difference is not caused by the difference between knockdown and knockout. Is it possible that other glial cell types are responsible for the transition? If so, what cells? Oligodendrocytes?

      We are grateful for the reviewer’s careful reading and valuable suggestions. These will help us improve the manuscript. We will expand the Discussion. The contradictory results in the previously published studies may be due to the limited stringency and neuronal leakage of the astrocyte-specific GFAP promoter that some investigators chose. Other possibilities include alternative cell origin, increased neuronal resilience, or combinations of as yet unidentified factors.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The manuscript's logical flow is challenging and hard to follow, and key arguments could be more clearly structured, particularly in transitions between mechanistic components.

      We have revised our manuscript so as to make it easy for readers to follow the logical flow in transitions between mechanistic components by adding the descriptions of Figure S1E-J, Figure S2F-K, Figure S3A-H, Figure S4A-F, Figure S5, and Figure S6 in the revised manuscript.

      (2) The causality between stress-induced α2A-AR internalization and the enhanced MAO-A remains unclear. Direct experimental evidence is needed to determine whether α2A-AR internalization itself or Ca2+ drives MAO-A activation, and how they activate MAO-A should be considered.

      We believe that the causality between stress-induced α2A-AR internalization and the enhancement of MAO-A is clearly demonstrated by our current experiments, although our explanations could be made easier to understand, especially for those who are not experts in electrophysiology.

      Firstly, it is well established that autoinhibition in LC neurons is mediated by α2A-AR coupled-GIRK (Arima et al., 1998, J Physiol; Williams et al., 1985, Neuroscience). We found that spike frequency adaptation in LC neurons was also mediated by α2A-AR coupled GIRK-I (Figure 1A-I), and that α2A-AR coupled GIRK-I underwent [Ca<sup>2+</sup>]<sub>i</sub> dependent rundown (Figures 2, S1, S2), leading to an abolishment of spike-frequency adaptation (Figure S4). [Ca<sup>2+</sup>]<sub>i</sub> dependent rundown of α2A-AR coupled GIRK-I was prevented by barbadin (Figure 2G-J), which prevents the internalization of G-protein coupled receptor (GPCR) channels.

      Abolishment of spike frequency adaptation itself, i.e., “increased spike activity” can increase [Ca<sup>2+</sup>]<sub>i</sub> because [Ca<sup>2+</sup>]<sub>i</sub> is entirely dependent on the spike activity as shown by [Ca<sup>2+</sup>]<sub>i</sub> imaging method in Figure S3.

      Thus, α2A-AR internalization can increase [Ca<sup>2+</sup>]<sub>i</sub> through the abolishment of autoinhibition or spike frequency adaptation, and a [Ca<sup>2+</sup>]<sub>i</sub> increase drives MAO-A activation as reported previously (Cao et al., 2007, BMC Neurosci). The mechanism by which Ca<sup>2+</sup> activates MAO-A is beyond the scope of the current study.

      Our study focused on the mechanism by which chronic or severe stress can cause persistent overexcitation and how this results in LC degeneration.

      (3) The connection between α2A-AR internalization and increased cytosolic NA levels lacks direct quantification, which is necessary to validate the proposed mechanism.

      Direct quantification of the relationship between α2A-AR internalization and increased cytosolic NA levels may not be possible, and may not be necessary, as explained below.

      The internalization of α2A-AR can increase [Ca<sup>2+</sup>]<sub>i</sub> through the abolishment of autoinhibition or spike frequency adaptation, and [Ca<sup>2+</sup>]<sub>i</sub> increases can facilitate autocrine NA release (Huang et al., 2007), similar to the transmitter release from nerve terminals (Kaeser & Regehr, 2014, Annu Rev Physiol).

      Autocrine-released NA must be re-uptaken by NAT (NA transporter), which is firmly established (Torres et al., 2003, Nat Rev Neurosci). Re-uptake of NA by NAT is the only source of intracellular NA, and NA re-uptake by NAT should be increased as the internalization of the NA binding site (α2A-AR) progresses in association with [Ca<sup>2+</sup>]<sub>i</sub> increases (see page 11, lines 334-336).

      Thus, the connection between α2A-AR internalization and increased cytosolic NA levels is logically compelling, and the quantification of such connection may not be possible at present (see the response to the comment made by the Reviewer #1 as Recommendations for the authors (2) and beyond the scope of our current study.

      (4) The chronic stress model needs further validation, including measurements of stress-induced physiological changes (e.g., corticosterone levels) to rule out systemic effects that may influence LC activity. Additional behavioral assays for spatial memory impairment should also be included, as a single behavioral test is insufficient to confirm memory dysfunction.

      It is well established that restraint stress (RS) increases corticosterone levels depending on the period of RS (García-Iglesias et al., 2014, Neuropharmacology), although we are not reluctant to measure the corticosterone levels. In addition, there are numerous reports showing increased activity of LC neurons in response to various stresses (Valentino et al., 1983; Valentino and Foote, 1988; Valentino et al., 2001; McCall et al., 2015), as described in the text (page 4, lines 96-98). Measurement of corticosterone levels may not be able to rule out systemic effects of CRS on the whole brain.

      We had already performed another behavioral test, the elevated plus maze (EPM) test. By combining the two tests, it may be possible to evaluate the results of the Y-maze test more accurately by differentiating memory impairment from anxiety. However, the results obtained by these behavioral tests are only supplementary to our current aim of elucidating the cellular mechanisms for the accumulation of cytosolic free NA. Therefore, we have softened the implication of anxiety and memory impairment (page 13, lines 397-400 in the revised manuscript).

      (5) Beyond β-arrestin binding, the role of alternative internalization pathways (e.g., phosphorylation, ubiquitination) in α2A-AR desensitization should be considered, as current evidence is insufficient to establish a purely Ca<sup>2+</sup>-dependent mechanism.

      We can hardly agree with this comment. 

      It was clearly demonstrated that repeated application of NA itself did not cause desensitization of α2A-AR (Figure S1A-D), and that the blockade of β-arrestin binding by barbadin completely suppressed the Ca<sup>2+</sup>-dependent downregulation of GIRK (Figure 2G-K). These observations can clearly rule out the possible involvement of phosphorylation or ubiquitination in the desensitization.

      Not only the barbadin experiment, but also the immunohistochemistry and western blot method clearly demonstrated the decrease of α2A-AR expression on the cell membrane (Figure 3).

      Ca<sup>2+</sup>-dependent mechanism of the rundown of GIRK was convincingly demonstrated by a set of different protocols of voltage-clamp study, in which Ca<sup>2+</sup> influx was differentially increased. The rundown of GIRK-I was orderly potentiated or accelerated by increasing the number of positive command pulses each of which induces Ca<sup>2+</sup> influx (compare Figure S1E-J, Figure S2A-E and Figure S2F-K along with Figure 2A-F). The presence or absence of Ca<sup>2+</sup> currents and the amount of Ca<sup>2+</sup> currents determined the trend of the rundown of GIRK-I (Figures 2, S1 and S2). Because the same voltage protocol hardly caused the rundown when it did not induce Ca<sup>2+</sup> currents in the absence of TEA (Figure S1F; compare with Figure 2B), blockade of Ca<sup>2+</sup> currents by nifedipine would not be so beneficial.

      We believe the series of voltage-clamp protocols convincingly demonstrated the orderly involvement of [Ca<sup>2+</sup>]<sub>i</sub> in accelerating the rundown of GIRK-I.

      (6) NA leakage for free NA accumulation is also influenced by NAT or VMAT2. Please discuss the potential role of VMAT2 in NA accumulation within the LC in AD. 

      It has been demonstrated that reduced VMAT2 levels increase susceptibility to neuronal damage: VMAT2 heterozygote mice displayed increased vulnerability to MPTP, as evidenced by reductions in nigral dopamine cell counts (Takahashi et al., 1997, PNAS). Thus, if the activity of VMAT2 in LC neurons were impaired by chronic restraint stress, cytosolic NA levels in LC neurons would increase. We have added this discussion to the revised manuscript (page 12, lines 381-384).

      (7) Since the LC is a small brain region, proper staining is required to differentiate it from surrounding areas. Please provide a detailed explanation of the methodology used to define LC regions and how LC neurons were selected among different cell types in brain slices for whole-cell recordings.

      LC neurons were identified immunohistochemically and electrophysiologically as we previously reported (see Fig. 2 in Front. Cell. Neurosci. 16:841239. doi: 10.3389/fncel.2022.841239). We have added this explanation in the method section of the revised manuscript (page 15, lines 474-475). A delayed spiking pattern in response to depolarizing pulses (Figure S10 in the revised manuscript) applied at a hyperpolarized membrane potential was commonly observed in LC neurons in many studies (Masuko et al., 1986; van den Pol et al., 2002; Wagner-Altendorf et al., 2019).

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The manuscript reports that chronic stress for 5 days increases MAO-A levels in LC neurons, leading to the production of DOPEGAL, activation of AEP, and subsequent tau cleavage into the tau N368 fragment, ultimately contributing to neuronal damage. However, the authors used wild-type C57BL/6 mice, and previous literature has indicated that AEP-mediated tau cleavage in wild-type mice is minimal and generally insufficient to cause significant behavioral alterations. Please clarify and discuss this apparent discrepancy.

      In our study, the normalized relative value of AEP-mediated tau cleavage (Tau N368) was much higher in CRS mice than in non-stress wild-type mice. It is not possible to compare AEP-mediated tau cleavage between our non-stress wild-type mice and those observed in a previous study (Zhang et al., 2014, Nat Med), because band intensity is largely dependent on the exposure time and its numerical value is a normalized relative value. In view of such differences, our apparent band expression might have been intensified to detect small changes.

      (2) It is recommended that the authors include additional experiments to examine the effects of different durations and intensities of stress on MAO-A expression and AEP activity. This would strengthen the understanding of stress-induced biochemical changes and their thresholds.

      GIRK rundown was almost saturated after 3-day RS and remained the same in 5-day RS mice (Fig. 4A-G), which is consistent with the downregulation of α2A-AR and GIRK1 expression by 3-day RS (Fig. 3C, F and G; Fig. 4J and K). However, we examined the protein levels of MAO-A, pro/active-AEP and Tau N368 only in 5-day RS mice without examining in 3-day RS mice. This is because we considered the possibility that a high [Ca<sup>2+</sup>]<sub>i</sub> condition may have to be sustained for some period of time to induce changes in MAO-A, AEP and Tau N368, and therefore 3-day RS may be insufficient to induce such changes. We have added this in the revised manuscript (page 17, lines 521-525).

      (3) Please clarify the rationale for the inconsistent stress durations used across Figures 3, 4, and 5. In some cases, a 3-day stress protocol is used, while in others, a 5-day protocol is applied. This discrepancy should be addressed to ensure clarity and experimental consistency.

      Please see our response to the comment (2).

      (4) The abbreviation "vMAT2" is incorrectly formatted. It should be "VMAT2," and the full name (vesicular monoamine transporter 2) should be provided at first mention.

      Thank you for your suggestion. We have revised accordingly.

      Reviewer #3 (Public review):

      Weaknesses:

      Nevertheless, the manuscript currently reads as a sequence of discrete experiments rather than a single causal chain. Below, I outline the key points that should be addressed to make the model convincing.

      Please see the responses to the recommendation for the authors made by reviewer #3.

      Reviewer #1 (Recommendations for the authors):

      (1) Improve the clarity and organization of the manuscript, ensuring smoother transitions between concepts and mechanisms.

      Please see the response to the comment raised by Reviewer #1 as Weakness

      (2) Adjust any quantifying method for cytosolic NA levels under different conditions to support the link between receptor internalization and NA accumulation.

      If a fluorescent indicator of cytosolic free NA were available, it would be possible to measure changes in cytosolic NA levels. However, at present, there appears to be no fluorescent probe that labels cytosolic NA. For example, NS521 labels both dopamine and norepinephrine inside neurosecretory vesicles (Hettie & Glass et al., 2014, Chemistry), and the BPS3 fluorescence sensor labels NA around the cell membrane by anchoring on the cell membrane (Mao et al., 2023, Nat Comm). Furthermore, the method reported in “A Genetically Encoded Fluorescent Sensor for Rapid and Specific In Vivo Detection of Norepinephrine” can detect NA only when α2AR is expressed. In the present study, increases in cytosolic NA levels are caused by internalization of α2AR. Cytosolic NA measurements with GRAB NE photometry may therefore not be applicable in the present study. However, we have discussed the availability of such fluorescent methods to directly prove the increase in cytosolic NA as a limitation of our study (page 14, lines 429-436 in the revised manuscript).

      (3) Include validation of the chronic stress model with physiological and behavioral measures (e.g., corticosterone levels and another behavioral test).

      Please see the response to the comment raised by Reviewer #1 as Weakness (4).

      (4) All supplemental figures should be explicitly explained in the Results section. Specifically, clarify and describe the details of Figure S1G-K, Figure S2F-K, Figure S3A-H, Figure S4A-F, Figure S5, and Figure S6 to ensure all supplementary data are fully integrated into the main text.

      We have more explicitly and clearly described the details of Figure S1E-J, Figure S2F-K, Figure S3A-H, Figure S4A-F, Figure S5, and Figure S6 and fully integrated those explanations into the main text in the revised manuscript.

      (5) In Figure 3, the morphology of TH-positive cells differs between panels D and E. Additionally, TH is typically expressed in the cytosol, but in the provided images, it appears to be localized only to the membrane. Please clarify this discrepancy and provide a lower-magnification image to display a larger area, not one cell.

      In a confocal image, TH is not necessarily expressed homogeneously in the cytosol, but is expressed in a ring-shaped pattern inside the plasma membrane, avoiding the cell nucleus and its surrounding Golgi apparatus and endoplasmic reticulum (ER) (Henrich et al., 2018, Acta Neuropathol Commun; see Fig. 4a and 6e), especially when the number of z-stacks of confocal images is small. This is presumably because LC neurons are especially enriched with numerous Golgi apparatus and ER (Groves & Wilson, 1980, J Comp Neurol).

      In Figure S7, we showed a lower-magnification image of LC and its adjacent area (mesencephalic trigeminal nucleus). In the LC area, there are a variety of LC neurons, which include oval shaped neurons (open arrowhead; similar to Figure 3D) and also rhombus-like shaped neurons (open double arrowheads, similar to Figure 3E). A much lower-magnification image of LC neurons constituting LC nucleus was shown in Figure 5A.

      (6) In Figure 5, the difference in MAO-A expression is not clearly visible in the fluorescence images. Enzymatic assays for AEP and MAO-A should be included to demonstrate the increased activity better.

      In the current study, we did not focus on detecting changes in TH, MAO-A and AEP by immunohistochemistry. Instead, we focused on detecting such changes by western blotting. The main conclusions of the current study were drawn primarily from electrophysiological techniques, as we devoted much of our effort to electrophysiological experiments. Because the relative quantification of active AEP and Tau N368 proteins by western blotting may accurately reflect changes in those enzyme activities, an enzymatic assay may not be strictly required, but would help to better demonstrate AEP and MAO-A activity. We have described the need for an enzymatic assay to better demonstrate the AEP and MAO-A activities (page 10, lines 314-315).

      Reviewer #3 (Recommendations for the authors):

      (1) Causality across the pathway

      Each step (α2A internalisation, GIRK rundown, Ca<sup>2+</sup> rise, MAO-A/AEP upregulation) is demonstrated separately, but no experiment links them in a single preparation. Consider in vivo Ca<sup>2+</sup> or GRAB NE photometry during restraint stress while probing α2A levels with i.p. clonidine injection or optogenetic over excitation coupled to biochemical readouts. Such integrated evidence would help to overcome the correlational nature of the manuscript to a more mechanistic study.

      It is not possible to measure free cytosolic NA levels with GRAB NE photometry when α2A AR is internalized as described above (see the response to the comment made by reviewer #1 as the recommendation for the authors).

      (2) Pharmacology and NE concentration

      The use of 100 µM noradrenaline saturates α and β adrenergic receptors alike. Please provide ramp measurements of GIRK current in dose-response at 1-10 µM NE (blocked by atipamezole) to confirm that the rundown really reflects α2A activity rather than mixed receptor effects.

      It is true that 100 µM noradrenaline activates both α and β adrenergic receptors alike. However, it was clearly shown that the enhancement of GIRK-I by 100 µM noradrenaline was completely antagonized by 10 µM atipamezole and that the Ca<sup>2+</sup>-dependent rundown of NA-induced GIRK-I was prevented by 10 µM atipamezole. Considering the Ki values of atipamezole for α2A AR (=1~3 nM) (Vacher et al., 2010, J Med Chem) and β AR (>10 µM) (Virtanen et al., 1989, Arch Int Pharmacodyn Ther), these results reflect α2A AR activity but not β AR activity (Figure S5). Furthermore, because it is already well established that NA-induced GIRK-I is mediated by α2A AR activity in LC neurons (Arima et al., 1998, J Physiol; Williams et al., 1985, Neuroscience), it is not necessary to re-examine the effect of 1-10 µM NA on GIRK-I.
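
      As a rough illustration of the selectivity argument above, the following Python sketch computes simple single-site fractional occupancy, [L] / ([L] + Ki), for 10 µM atipamezole using the Ki values cited in the response. It is a deliberate simplification: it ignores competition with the 100 µM NA that is also present (whose affinity is not stated here), and the function and variable names are hypothetical.

```python
def fractional_occupancy(ligand_conc_m: float, ki_m: float) -> float:
    """Single-site occupancy [L] / ([L] + Ki); assumes no competing ligand."""
    return ligand_conc_m / (ligand_conc_m + ki_m)


ATIPAMEZOLE_M = 10e-6  # 10 µM atipamezole, as used in the response

# Upper end of the cited Ki range for α2A AR (1~3 nM).
occ_alpha2a = fractional_occupancy(ATIPAMEZOLE_M, 3e-9)

# The cited Ki for β AR is only a lower bound (>10 µM), so this value is an
# upper bound on β AR occupancy rather than an estimate of it.
occ_beta_upper_bound = fractional_occupancy(ATIPAMEZOLE_M, 10e-6)

print(f"alpha2A occupancy: {occ_alpha2a:.4f}")
print(f"beta occupancy upper bound: {occ_beta_upper_bound:.2f}")
```

      Under these assumptions, occupancy at α2A AR exceeds 99.9%, whereas a Ki above 10 µM bounds occupancy at β AR below 50%.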

      (3) Calcium dependence is not yet definitive

      The rundown is induced with a TEA-enhanced pulse protocol. Blocking L-type channels with nifedipine (or using Cd²⁺) during this protocol should show whether Ca<sup>2+</sup> entry is necessary. Without such a control, the Ca<sup>2+</sup> link remains inferential.

      The Ca<sup>2+</sup> link was precisely demonstrated by a series of voltage-clamp experiments, in which Ca<sup>2+</sup> influx was orderly potentiated by increasing the number of positive voltage pulses (Figures S1 and S2). As the number of positive voltage pulses was increased, the rundown of GIRK-I was further accelerated or enhanced. The relationship between the number of spikes and the Ca<sup>2+</sup> influx detected as Ca<sup>2+</sup> transients was well documented in Ca<sup>2+</sup> imaging experiments using fura-2 (Figure S3).

      The presence or absence of Ca<sup>2+</sup> currents and the amount of Ca<sup>2+</sup> currents determined the trend of the rundown of GIRK-I (Figs. 2, S1 and S2). The same voltage protocol hardly caused the rundown when it did not induce Ca<sup>2+</sup> currents in the absence of TEA (Fig. S1F; compare with Fig. 2B), and the series of voltage-clamp protocols convincingly demonstrated the orderly involvement of [Ca<sup>2+</sup>]<sub>i</sub> in accelerating the rundown of GIRK-I. Therefore, blockade of Ca<sup>2+</sup> currents by nifedipine may not be so beneficial.

      (4) Age mismatch and disease claims

      All electrophysiology and biochemical data come from juvenile (< P30) mice, yet the conclusions stress Alzheimer-related degeneration. Key endpoints need to be replicated in adult or aged mice, or the manuscript should soften its neurodegenerative scope.

      As described in the Conclusion section, we never stress Alzheimer-related degeneration, although we may have given such an impression. To avoid such a misunderstanding, we have added a description “However, the present mechanism must be proven to be valid in adult or old mice, to validate its involvement in the pathogenesis of AD.” (page 14, lines 448-450).

      (5) Direct evidence for extracellular/cytosolic NE

      The proposed rise in reuptake NA is inferred from electrophysiology. Modern fluorescent sensors (GRAB NE, nLight) or fast scan voltammetry could quantify NE overflow and clearance during stress, directly testing the model.

      Please see the response to the comment made by Reviewer #1 as the Recommendations for the authors (2) as described above.

      (6) Quantitative histology

      Figure 5 presents attractive images but no numerical analysis. Please provide ROI-based fluorescence quantification (with n values) or move the images to the supplement and rely on the Western blots.

      We have moved the immunohistochemical results in Fig. 5 to the supplement as we believe the quantification of immunohistochemical staining is not necessarily correct.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript investigates beta burst dynamics in the primate motor cortex during movement and recovery from stroke. The authors differentiate between "global" beta bursts, which are synchronous across cortical and often subcortical regions, and more spatially confined "local" bursts. Global bursts are associated with reduced spiking variability, slower movements, and are more frequent after stroke, while local bursts increase during recovery and grasp execution. The study provides compelling evidence that beta bursts with different spatial and temporal characteristics may play distinct roles in motor control and recovery.

      We thank the reviewer for their assessment that the manuscript provides compelling evidence for distinct roles of local and global beta bursts in motor control and recovery.

      Strengths:

      The major strength of this paper lies in its conceptual advance: the identification and characterization of distinct global and local beta bursts in the primate motor cortex. This distinction builds upon and considerably extends previous work on the heterogeneity of beta bursts. The paper is methodologically rigorous, using simultaneous cortical and subcortical recordings, detailed behavioral tracking, and thorough analyses of spike-LFP interactions. The use of stroke models and neurotypical animals provides converging evidence for the functional dissociation between burst types. The observation that local bursts increase with motor recovery and occur during grasping is particularly novel and may prove valuable for developing biomarkers of motor function.

      We thank the reviewer for recognizing the strengths of this manuscript. 

      Weaknesses:

      There are several conceptual and methodological limitations that should be addressed. First, the burst detection method relies on an amplitude threshold (median + 1 SD), which is susceptible to false positives and variability (Langford & Wilson, 2025). The classification into global or local bursts then depends on the number of co-bursting channels, compounding the arbitrariness. Second, the imposition of a minimum of three co-bursting cortical channels may bias against the detection of truly local bursts. 

      We thank the reviewer for bringing up these methodological details. We plan to conduct a follow-up analysis using alternative burst detection methods to verify that the paper’s main results hold when using different burst detection methodologies. We anticipate this will improve confidence in our results. 
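
      The threshold-based detection and channel-count classification discussed in this exchange can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the median + 1 SD threshold and the minimum of three co-bursting channels come from the reviewer's description, while the function names, the per-sample labelling, and the `global_fraction` cutoff separating global from local bursts are assumptions made for the example.

```python
import numpy as np


def detect_bursts(amplitude, n_sd=1.0):
    """Return a boolean mask (channels x samples) of supra-threshold samples.

    amplitude: beta-band amplitude envelope, shape (n_channels, n_samples).
    Each channel gets its own threshold of median + n_sd * SD.
    """
    amplitude = np.asarray(amplitude, dtype=float)
    median = np.median(amplitude, axis=1, keepdims=True)
    sd = np.std(amplitude, axis=1, keepdims=True)
    return amplitude > (median + n_sd * sd)


def classify_samples(burst_mask, min_channels=3, global_fraction=0.5):
    """Label each sample 'none', 'local' or 'global' by co-bursting channel count.

    min_channels follows the three-channel minimum mentioned in the review;
    global_fraction is a hypothetical cutoff chosen only for this sketch.
    """
    n_channels, n_samples = burst_mask.shape
    counts = burst_mask.sum(axis=0)
    labels = np.full(n_samples, "none", dtype=object)
    labels[counts >= min_channels] = "local"
    labels[counts >= global_fraction * n_channels] = "global"
    return labels


# Synthetic example: 10 channels, 100 samples, flat baseline of 1.0.
envelope = np.ones((10, 100))
envelope[:, 10:15] = 5.0    # all channels burst together
envelope[:3, 50:55] = 5.0   # only three channels burst
envelope[0, 80] = 5.0       # a single channel, below the three-channel minimum

labels = classify_samples(detect_bursts(envelope))
```

      Because both the amplitude threshold and the channel-count cutoffs are free parameters, re-running a sketch like this with different values is one simple way to check how sensitive the global versus local split is to those choices.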

      Third, the classification is entirely cortical; subcortical activity is considered post hoc rather than integrated into the classification, despite the key role of subcortical-cortical synchrony in motor control. 

      We thank the reviewer for this comment. First, because the different animals had subcortical recording sites in different locations, we hesitate to use subcortical activity in the classification of bursts since we were not sure we would be identifying the same burst phenomenon (e.g. thalamo-cortical bursts vs. capsule-cortical bursts may differ). Second, we believe that having a cortical-only criterion allows the designation of local vs. global bursts to be more widely applied in preparations that only have access to cortical data (e.g. surface ECoG recordings, EEG, Utah array recordings). Thus, in this study we chose to analyze the subcortical data post-hoc (after burst detection and classification) to support our “global” vs. “local” designation of burst types.

      Fourth, the apparent dissociation between global and local bursts raises important questions about their spatial distribution across areas like M1 and PMv, which are not thoroughly analyzed. 

      We thank the reviewer for this comment. In our study’s stroke animals, we chose to study PMv due to its role in compensating for damage to M1; thus, we hesitate to make any comparisons between PMv (which was recorded in stroke animals) and M1 (recorded in healthy unimpaired animals). Furthermore, the animals performed different tasks (e.g. reaching vs. reaching and grasping), which may also influence the spatial distribution. We agree that future work should certainly investigate the spatial distribution of global vs. local beta bursts across areas of sensorimotor cortex and subcortex, and that this comparison would be best done in healthy animals with both reaching and grasping behaviors.

      Finally, while the authors interpret local bursts during grasping as novel, similar findings have been reported (e.g., Szul et al., 2023; Rayson et al., 2023), and a deeper discussion of these precedents would strengthen the argument.

      Thank you for these references! We will review them and incorporate them into our discussion of our results. 

      Impact:

      This work is likely to have a substantial impact on the field of motor systems neuroscience. The distinction between global and local beta bursts offers a promising framework for understanding the dual roles of beta in motor inhibition and sensorimotor computation. The findings are relevant not only for basic research but also for translational efforts in stroke rehabilitation and neuromodulation, particularly given the emerging interest in beta burst-based biomarkers and stimulation targets. The dataset and analytical framework will be useful to researchers investigating beta dynamics, spike-field relationships, and recovery from neural injury.

      We thank the reviewer for their assessment that our work will likely have a substantial impact on the field of motor systems neuroscience.

      Reviewer #2 (Public review):

      Summary:

      The paper by Khanna et al. describes global vs local beta synchrony between a cortical premotor area (PMv) and subcortical structures during motor tasks in the non-human primate, specifically investigating the progression following M1 injury. They found increases in global beta synchrony between PMv and subcortical structures during the sub-acute phase of injury, and that global synchrony was associated with relatively slower motor movements. As recovery progressed, they report a shift from global synchrony to local synchrony and a subsequent reduction in the movement time. The authors suggest that global changes in subcortical and cortical beta synchrony may generally underpin a variety of movement disorders, including Parkinson's disease, and that shifting from global to local (or reducing global synchrony) might improve functional outcomes.

      Strengths:

      Ischemic insults and other acquired brain injuries have a significant public health impact. While there is a large body of clinical and basic science studies describing the behavioral, neurophysiological, and mechanistic outcomes of such injury, there is a significant lack of studies looking at longitudinal, behaviorally-related neurophysiological measures following cortical injury, so any information makes an outsized contribution to understanding how brain injury disrupts underlying neural activity and how this may contribute to injury presentation and recovery.

      A significant percentage of pre-clinical stroke studies tend to focus on peri-infarct or other cortical structures and their role in recovery. The addition of subcortical recordings, which allows for the investigation of the role of thalamo-basal gangliar-cortical loops that may be contributing to the degree of impairment or to the recovery process, is important for the field. Here, there are longitudinal (up to 3 months post-injury) recordings in the ventral premotor area (PMv) and either the internal capsule or sensorimotor thalamus that can be synchronized with phases of behavioral recovery.

      The methods are well described and can act as a framework for assessing synchrony across other data sets with similar recording locations. Limitations in methodology, recordings, and behavior were noted.

      We thank the reviewer for their comments on the strengths of this paper.  

      Weaknesses:

      A major limitation of this paper is that it is a set of case studies rather than a well-designed, well-controlled study of beta synchrony following motor cortex injury. While non-human primate neurophysiological studies are almost always limited by extremely low animal numbers, they are made up for by the fact that they can acquire significant numbers of units or channels, and in the case of normal behavior, can obtain many behavioral trials over months of individual sessions. Here, there were two NHPs used, but they had different subcortical implant locations (thalamus vs internal capsule). They had different injury outcomes, with one showing a typical recovery curve following injury while one had complications and worsening behavior before ultimately recovering. Further, there were significant differences in the ability to record at different times, with one NHP having poor recordings early in the recovery process while one had poor recordings late in the process. Due to the injury, the authors report sessions in which they were not able to record many trials (~10). Assuming that recovery after a cortical injury is an evolving process, breaking analysis into "Early" and "Late" phases reduces the interpretation of where these shifts occur relative to recovery on the task, especially given different thresholds for recovery were used between animals. Because of this, despite a careful analysis of the data and an extensive discussion, the conclusions derived are not particularly compelling. To overcome this, the authors present data from neurotypical NHPs, but with electrodes in M1 rather than PMv, doing a completely different task with no grasping component, again making accurate conclusions about the results difficult. Even with low numbers, the study would have been much stronger if there were within-animal longitudinal data prior to and after the injury on the same task, so the impact of M1 injury could be better assessed.

      We thank the reviewer for these comments. Below we address some of these in more detail: 

      Different subcortical implant locations: We would like to clarify that the subcortical recordings were only used to confirm that global beta bursts (as characterized by cortical recordings alone) did indeed occur on subcortical sites coincidentally with cortical sites more frequently than local beta bursts. Neither the beta burst categories nor the beta bursts themselves were influenced by the subcortical recordings.

      Different injury outcomes: It is difficult to create strokes that result in identical deficits across animals, as we and others have noted in previous work[1-3]. As a field, we are still understanding what factors give rise to variability in recovery curves. For example, one recent study noted that biological sex is a factor in predicting differences in recovery rates[4], and another noted that baseline white matter hyperintensities are also predictive of post-stroke recovery[5]. Overall, our methodology, which creates structurally consistent lesions, can still result in very different functional outcomes depending on a variety of factors. Given this state of the field, we have done our best to match the recovery curves between our two animals, especially the initial recovery curves before Monkey H’s secondary decline.

      Differences in ability to record at different times: We note this as a strength. One concern with studies that induce stroke at the same time as implanting electrode arrays is that, as is well appreciated, single-unit yield right after array implantation is low and then improves in the following weeks[6]. There is always the concern that having more units later in recovery may drive results, but in this case, since one animal showed the opposite trend, we are more confident that results are not driven by increases in unit yield. We also note that we broadly see similar unit quality metrics in the early and late stages in both animals (Fig. S7).

      Breaking the continuous recovery curve into early and late: We note that this division was only made for one main analysis in the paper (Fig. 5CD): the assessment of the mean and variance of single-unit firing rates. Without this split our analyses would be underpowered and inconclusive, and we would not be able to provide any comment on how firing rates change, even coarsely, with recovery.

      Presentation of data from M1 of healthy animals doing a different task: We agree that the strongest data would be longitudinally recorded from the same animals/brain areas pre-stroke and then post-stroke. However, we also view our inclusion of separate healthy animals doing a different task as evidence that our global vs. local segregation of beta bursts generalizes beyond the reach-to-grasp task to reaching-only tasks.  

      Overall, we appreciate the reviewer pointing out these notes about our data. In some cases we do not think these notes are concerning; in others, we acknowledge that we have done the best we can given the state of the neurophysiology stroke recovery field.

      It is unclear to what extent the subpial aspiration used is a stroke model. While it is much more difficult to perform a pure ischemic motor injury using electrocoagulatory methods in animal models that do not have a lissencephalic cortex, the suction ablation method that the authors use leads to different outcomes than an ischemic injury alone. For instance, in rat models, ischemic vs suction ablation leads to very different electrophysiological profiles and differences in underlying anatomical reorganization (see Carmichael and Chesselet, 2002), even if the behavioral outcomes were similar. There is a concern that the effects shown may be an artifact of the lesion model rather than informing underlying mechanisms of recovery.

      We thank the reviewer for bringing this up. 

      Clarification of our stroke model methodology: We wish to highlight that when we create the stroke, we perform surface vessel occlusion as the first step. This is designed to match true ischemic injury. After a waiting period, the injured tissue is then aspirated to reduce the effects of edema and secondary mass effect in the model.

      Carmichael and Chesselet 2002: The rodent work cited did show differential effects of a suction ablation method (without any surface vessel occlusion first) versus an ischemic method. The effects observed in this work were in the first 5 days following stroke. In our case, we started recording on day 7 and examined recovery over extended periods (weeks to months). 

      Effects of acute insult on rehabilitation: From a rehabilitation perspective, it remains unclear how the acute insult affects outcomes weeks and months later. One line of evidence suggesting that the manner in which the acute insult occurs may not matter for rehabilitation is the observation that one therapeutic approach (vagus nerve stimulation) has been found to improve rehabilitation outcomes in a range of injury models (intracranial hemorrhage, stroke, spinal cord injury). We agree that additional work is required in this area.

      Human stroke data show similar results: Lastly, we note that neurophysiology performed in humans with clinical strokes supports the results we see here (e.g. [7]; see Discussion section for full elaboration), suggesting that our stroke model methodology is similar enough to clinical stroke to yield similar results.

      The injury model leads to seemingly mild impairments in grasp (but not reach), with rapid and complete recovery occurring within 2-3 weeks from the time of injury. Because of the rapid recovery, relating the physiological processes of recovery to beta synchronization becomes challenging: Are the global bursts the result of the loss of M1 input to subcortical structures? Are they due to the lack of M1 targets, so there is a more distributed response? Is this due to other post-injury sub-acute mechanisms? How specific is this response - is it limited to peri-infarct areas (and to what extent is the PMv electrode truly in peri-infarct cortex), or would this synchrony be seen anywhere in the sensorimotor networks? Are the local bursts present because global synchrony wanes over time as a function of post-injury homeostatic mechanisms, or is local beta synchrony increasing as new motor plans are refined and reinforced during task re-acquisition? How closely are they coupled to recovery - if it is motor plan refinement, the shift from global to local seemingly should lag the recovery?

      We think these are all wonderful questions that could be addressed in follow-up studies! 

      While the study has significant limitations in design that reduce the impact of the results, it should act as a useful baseline/pilot data set in which to build a more complete picture of the role of subcortical-cortical beta synchrony following cortical injury.

      We agree that this is a study that should be treated as a starting point for further investigation. 

      Reviewer #3 (Public review):

      Summary:

      Khanna et al. use a well-conceived and well-executed set of experiments and analyses primarily to document the interaction between neural oscillations in the beta range (here, 13-30 Hz) and recovery of function in an animal model of stroke. Specifically, they show that cortical "beta bursts", or short-term increases in beta power, correlate strikingly with the timeline of behavioral recovery as quantified with a reach-to-grasp task. A key distinction is made between global beta bursts (here, those that synchronize between cortical and subcortical areas) and local bursts (which appear on only a few electrodes). This distinction of global vs. local is shown to be relevant to task performance and movement speed, among other quantities of interest.

      A secondary results section explores the relationship between beta bursts and neuronal firing during the grasp portion of the behavioral task. These results are valuable to include, though mostly unsurprising, with global beta in particular associated with lower mean and variance in spike rates.

      Last, a partial recapitulation of the primary results is offered with a neurologically intact (uninjured) animal. No major contradictions are found with the primary results.

      Highlights of the Discussion section include a thoughtful review of atypical movements executed by individuals with Parkinson's disease or stroke survivors, placing the current results in an appropriate clinical context. Potential physiological mechanisms that could account for the observed results are also discussed effectively.

      Strengths:

      Overall, this is a very interesting paper. The ultimate impact will be enhanced by the authors' choice to analyze beta bursts, which remain a relatively under-explored aspect of neural coding.

      The reach-and-grasp task was also a well-considered choice; the combination of a relatively simple movement (reaching towards a target in the same location each time) and a more complex movement (a skilled object-manipulation grasp) provides an internal control of sorts for data analysis. In addition, the task's two sub-movements provide a differential in terms of their likelihood to be affected by the stroke-like injury: proximal muscles (controlling reach) are likely to be less affected by stroke, while distal muscles (controlling grasp) are highly likely to be affected. Lastly, the requirement of the task to execute an object lift maximizes its difficulty and also the potential translational impact of the results on human injury.

      The above comments about the task exemplify a strength that is more generally evident: a welcome awareness of clinical relevance, which is in evidence several times throughout the Results and Discussion.

      Weaknesses:

      The study's weaknesses are mostly minor and, for the most part, correctable.

      One concern that may not be correctable in this study: the results about the spatial extent of beta activity seem constrained by relatively poor-quality data. It seems half or more of the electrodes are marked as too noisy to provide useful data in Figure 3. If this reflects the wider reality for all analyses, as mentioned, it may not be correctable for the present study. In that case, perhaps some of the experiments or analyses can be revisited or expanded for a future study, when better electrode yields are available.

      We thank the reviewer for their comments. We note that we have chosen to be particularly conservative with which channels we considered noise-free and acceptable for analysis as our animals were not head-posted (see methods: “On each day, trials were manually inspected alongside camera data for any movement or chewing artifacts (note that animals were not head-posted) and were discarded from neural data analysis if there were any artifacts”). After re-visiting our analysis, we note that the data shown in Fig. 3 (spatial distribution of local bursts) is not representative from a data quality perspective – this data was from a session that had a particularly large number of channels discarded due to artifacts. We plan to correct this to show a more representative figure. 

      Other concerns:

      In some places, there is a lack of clarity in the presentation of the results. This is not serious but should be addressed to aid readers' comprehension.

      We thank the reviewer for this comment and for their numerous suggestions in the notes to the authors. We plan to address as many of these as we can to improve clarity and comprehension.  

      Lastly, given the central role of beta oscillations within the study, it would be better for completeness to include even a brief exploration of sustained beta power (rather than bursts), and the modulation of sustained beta (or lack thereof) in the study's areas of concern: behavioral recovery, task performance, etc.

      We thank the reviewer for this suggestion – we plan to include this in our revisions.  

      References cited in response to public reviewer comments: 

      (1) Ganguly, K., Khanna, P., Morecraft, R. J. & Lin, D. J. Modulation of neural co-firing to enhance network transmission and improve motor function after stroke. Neuron 110, 2363–2385 (2022).

      (2) Khanna, P. et al. Low-frequency stimulation enhances ensemble co-firing and dexterity after stroke. Cell 184, 912-930.e20 (2021).

      (3) Darling, W. G. et al. Sensorimotor Cortex Injury Effects on Recovery of Contralesional Dexterous Movements in Macaca mulatta. Exp Neurol 281, 37–52 (2016).

      (4) Bottenfield, K. R. et al. Sex differences in recovery of motor function in a rhesus monkey model of cortical injury. Biology of Sex Differences 12, 54 (2021).

      (5) Schwarz, A. et al. Association that Neuroimaging and Clinical Measures Have with Change in Arm Impairment in a Phase 3 Stroke Recovery Trial. Ann Neurol 97, 709–719 (2025).

      (6) Gulati, T. et al. Robust Neuroprosthetic Control from the Stroke Perilesional Cortex. J. Neurosci. 35, 8653–8661 (2015).

      (7) Silberstein, P. et al. Cortico-cortical coupling in Parkinson’s disease and its modulation by therapy. Brain 128, 1277–1291 (2005).

    1. Author response:

      Reviewer #1 (Public review): 

      The manuscript by Butler et al. explores a novel physiological role for connexin 32 (Cx32) hemichannels in Schwann cells at peripheral nerves. Building on the authors' prior work on CO₂-sensitive gating of connexins, this study proposes that mitochondrial CO₂ production dependent on neuronal activity promotes the opening of Cx32 hemichannels in the paranode, which in turn modulates neuronal activity by reducing conduction velocity. This hypothesis is addressed using a multifaceted approach that includes immunofluorescence microscopy, dye uptake assays, calcium imaging, computational modeling, and extracellular recordings in isolated sciatic nerves. 

      Among the strengths of the study are the interdisciplinary integration of imaging, in silico approaches, and functional data. Also, this study proposes a new mechanism with profound physiological relevance. Specifically, Butler et al. provide new insights into glial modulation of electrical conduction in sensory/motor myelinated nerves. 

      In the current state, the study has some limitations. The evidence linking Cx32 to the observed dye uptake and conduction velocity changes relies primarily on pharmacological inhibition with carbenoxolone, which lacks specificity. The imaging data show overlapping marker signals that preclude the anatomical distinction between nodes and paranodes. FITC uptake, while convincing to test Cx32 hemichannel gating, lacks spatial-temporal information and validation of distribution and localization to viable intracellular compartments. Moreover, while the findings are intriguing, functional proof that Cx32 regulates conduction velocity through ATP release or other downstream effects remains incomplete. Further work using targeted genetic tools, live-tissue imaging, and additional controls would strengthen the mechanistic conclusions. 

      Overall, the manuscript offers compelling preliminary evidence that supports a new role for Cx32 in peripheral nerve physiology and raises important questions for future investigation. 

      We thank the reviewer for their comments and agree that the evidence for involvement of Cx32 is indirect. We are planning to perform genetic manipulations to strengthen this link. We shall review our presentation of the morphology in terms of the node/paranode/juxtaparanode distribution and adjust accordingly. We have in the interim generated new data using GCaMP transduced into Schwann cells that provides the live-tissue imaging that the reviewer requests. This strengthens our conclusions, and we will add these data into the paper.

      Reviewer #2 (Public review): 

      Summary: 

      This article aims to demonstrate that local production of CO₂ at the axonal node opens Cx32 hemichannels in the Schwann cell paranode, and that CO₂ diffuses through the AQP1 channel to reach Cx32 and trigger its opening. The authors also present evidence supporting a physiological role for this regulatory mechanism. They propose that CO₂-dependent Cx32 activation mediates activity-dependent Ca²⁺ influx into the paranode, and by increasing the leak current across the myelin sheath, it contributes to a slowing of action potential conduction velocity. 

      The study presents a very interesting and novel mechanism for the physiological regulation of Cx32 hemichannels. The findings are relevant to the field, and the methods and results are of good quality, with some improvements in interpretation and explanation required, and some minor experimental suggestions. 

      Strengths: 

      The article is solid in terms of the novelty of the findings and relevance for the physiology of myelinated axons. In addition, it is of major interest for the Connexin field because it explores a physiological way to open Cx32 hemichannels. The experiments are well elaborated, and most of them are sufficient for the main points described by the authors. The finding that nervous activity will trigger the mechanism of hemichannel opening by CO₂ is probably the most relevant biological mechanism derived from this article.

      Weaknesses: 

      Throughout the manuscript, the authors interpret their findings as if the described mechanism specifically occurs in the node and paranode regions. However, there is no direct evidence identifying the precise site of CO₂ production or the activation site of Cx32 hemichannels. Therefore, statements such as the one in the title ("activity-dependent CO₂ production in the axonal node opens Cx32 in the Schwann cell paranode") should be reconsidered or removed, as they may be misleading and are not essential to the interpretation of the data. In addition, the participation of aquaporin AQP1 as the main conduit for CO₂ diffusion through the plasma membrane could have another interpretation.

      We thank the reviewer for their comments and agree that we do not have direct evidence for the site of CO₂ production or the site of activation of Cx32 hemichannels. This direct evidence is extremely difficult to obtain, and we therefore depend on indirect arguments. Mitochondria represent the major source of CO₂, and their distribution will therefore indicate where CO₂ is likely to be produced. We agree that this is not essential to the interpretation of the data and will adjust the text as recommended. We will add a section to the Discussion to consider this point in more detail.

    1. Author response:

      Reviewer #1 (Public review):

      The main limitations of this article are that it provides insufficient detail on VR implementation. The design of the VR environment is, at this stage, under-described. Crucial information is missing, such as the number of pineapples per block, timing precision, details on how motion is mapped to the virtual movement, etc. This aspect strongly limits the reproducibility of the experiments. A second limitation lies in the lack of clarity regarding the study hypotheses. Although two overarching hypotheses can be inferred, they are not explicitly formulated. To this end, it is unclear which analyses were merely exploratory, especially for physiological and EEG outcomes.

      In Experiment 2, the reduction in vigor during tonic pain could plausibly reflect attentional load rather than pain per se. As recognized by the authors, there is no control condition involving an innocuous salient stimulus to rule out non-specific effects of distraction. Perhaps a tonic non-painful but salient somatosensory stimulus (e.g., a strong vibrotactile stimulus applied on the same arm) could have been used as a control stimulus.

      We appreciate the reviewer's comments regarding the insufficient implementation details. We hope the newly uploaded software for reproducing the experiment can improve the reader's understanding of the task. In addition to making the software available, we will expand the Methods section in the revised manuscript to include greater detail on the task description.

      The hypothesised functions of phasic and tonic pain, and their collaborative interaction, are both broad and deep topics. In the revised manuscript, we will more explicitly formulate our hypotheses and clarify the distinction between a priori predictions and exploratory analyses, particularly concerning the extent to which our evidence supports these hypotheses.

      We agree that examining the potential role of attentional load on the interaction between tonic and phasic pain is an important area for future investigation. Adding control conditions matched for attentional salience in further experiments is possible but introduces other confounds related to their different qualities (e.g. a salient vibrotactile stimulus might invigorate behaviour). More fundamentally, however, attentional processes are a core part of pain function and should not necessarily be viewed as a confound (i.e. the way that pain mediates some of its core functional effects may be directly through its salient attentional nature). This view is formalised in Wall and Melzack’s classical tripartite model of pain, and distinguishes pain from purely sensory systems such as somatosensation, vision and so on.

      Reviewer #2 (Public review):

      Two critical issues require clarification or justification. First, phasic pain was induced using electrical stimulation, which typically elicits somatosensory evoked potentials (SEPs). These responses may not reflect pain-specific processes and thus complicate interpretation. This issue bears directly on the study's conclusions, especially when discussing interactions between phasic and tonic pain. For example, tonic pain is known to reduce perceived intensity or cortical responses to phasic pain stimuli delivered elsewhere on the body - an effect not expected for SEPs elicited by electrical stimuli.

      We acknowledge the reviewer’s concern regarding the specificity of evoked potentials elicited by electrical stimulation. We agree that traditional SEPs—particularly those evoked by large surface electrodes—primarily reflect activation of non-nociceptive A-beta fibres and thus may not reliably index pain-specific processes or be modulated by tonic pain via descending nociceptive control. However, we would like to clarify that phasic pain was administered in the present study using small-diameter concentric ‘Wasp’ electrodes. These are comparable to intraepidermal electrodes shown to preferentially activate nociceptive A-delta fibres, thereby eliciting ERPs more closely associated with nociceptive processing rather than mixed somatosensory input [1, 2]. Accordingly, our ERP results demonstrated a reliable increase in N1-P2 amplitude with higher phasic pain intensity, suggesting that the evoked responses captured stimulus-evoked nociceptive processing.

      We acknowledge that these ERPs may still reflect mixed sensory processing and thus may not be fully modulated by tonic pain. Previous studies have shown that ERPs elicited by nociceptive electrical stimulation can be attenuated during tonic pain using cold-water immersion in CPM paradigms [3, 4]. However, these studies typically employ passive tasks, whereas our paradigm involved continuous voluntary behaviour during sustained tonic pressure pain. This difference in task context may engage distinct modulatory systems, possibly prioritising behavioural adaptation over sensory gating.

      We will revise the manuscript to acknowledge these factors and to encourage a more nuanced interpretation of the ERP findings in light of this literature.

      Second, additional control experiments are necessary to rule out alternative explanations. For instance, we suggest that the authors deliver phasic pain to the contralateral arm (e.g., at 1-2 Hz), which might also reduce action velocity. Similarly, tonic pain applied to the grasping hand should be tested to disentangle hand-specific effects.

      We are grateful to the reviewer for this suggestion. In the current study, phasic pain was delivered to the grasping hand to generate a coherent, spatially congruent representation of virtual stimuli (painful fruit) and behavioural consequences (pain upon grasp). Delivering phasic pain stimuli to the contralateral hand would be incongruent with the task design and may alter the interpretation of the learning signal, which was central to our computational modelling framework. Similarly, tonic pain was not applied to the grasping hand to avoid interfering with motor control. Applying tonic pain to the grasping hand would make it extremely difficult for participants to effectively grasp the hand controller, thereby complicating the interpretation of behavioural and neural measures. Therefore, while we agree that such manipulations could be informative for future studies, they were not the focus of the current investigation. We will discuss these issues in the revision.

      Reviewer #3 (Public review):

      Despite these strengths, the manuscript would benefit significantly from more precise definitions of key concepts and an overall clearer, more coherent presentation of its main arguments. The writing, in its current form, often presents claims that are too vague or insufficiently connected with the experimental findings. Moreover, certain aspects of the computational modeling and statistical analysis appear flawed or inadequately justified.

      We thank the reviewer for highlighting the need for clearer definitions and a more coherent presentation. In the revised manuscript, we will refine our definitions of key concepts and improve the presentation of hypothesised functions of phasic and tonic pain. As stated previously, we will clarify the extent to which our evidence supports these hypotheses. We also appreciate the feedback on our statistical analysis and computational modelling. We will address these points and provide the necessary clarifications and justifications in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      To elucidate the mechanisms and evolution of animal biomineralization, Voigt et al. focused on the sponge phylum - the earliest branching extant metazoan lineages exhibiting biomineralized structures - with a particular emphasis on deciphering the molecular underpinnings of spicule formation. This study centered on calcareous sponges, specifically Sycon ciliatum, as characterized in previous work by Voigt et al. In S. ciliatum, two morphologically distinct spicule types are produced by a set of two different types of cells that secrete extracellular matrix proteins, onto which calcium carbonate is subsequently deposited. Comparative transcriptomic analysis between a region with active spicule formation and other body regions identified 829 candidate genes involved in this process. Among these, the authors focused on the calcarin gene family, which is analogous to the Galaxins, the matrix proteins known to participate in coral calcification. The authors performed three-dimensional structure prediction using AlphaFold, examined mRNA expression of Calcarin genes in spicule-forming cell types via in situ hybridization, conducted proteomic analysis of matrix proteins isolated from purified spicules, and carried out chromosome arrangement analysis of the Calcarin genes.

      Based on these analyses, it was revealed that the combination of Calcarin genes expressed during spicule formation differs between the founder cells, responsible for producing diactines and triactines, and the thickener cells that differentiate from them, underscoring the necessity for precise regulation of Calcarin gene expression in proper biomineralization. Furthermore, the observation that 4 Calcarin genes are arranged in tandem arrays on the chromosome suggests that two rounds of gene duplication followed by neofunctionalization have contributed to the intricate formation of S. ciliatum spicules. Additionally, similar subtle spatiotemporal expression patterns and tandem chromosomal arrangements of Galaxins during coral calcification indicate parallel evolution of biomineralization genes between S. ciliatum and aragonitic corals.

      Strengths: 

      (1) An integrative research approach, encompassing transcriptomic, genomic, and proteomic analyses as well as detailed FISH. 

      (2) High-quality FISH images of Calcarin genes, along with a concise summary clearly illustrating their expression patterns, are appreciated.

      (3) It was suggested that thickener cells originate from founder cells. To the best of my knowledge, this is the first study to demonstrate trans-differentiation of sponge cells based on cell-type-specific gene expression, as determined by in situ hybridization.

      (4) The comparison between Calcarins of calcitic sponges and Galaxins of aragonitic corals from various perspectives, including protein tertiary structure predictions, gene expression profiling during calcification, and chromosomal sequence analysis, reveals significant similarities between them.

      We thank the reviewer for this assessment. 

      (1) The conclusions of this paper are generally well supported by the data; however, some FISH images require clearer indication or explanation.

      We have modified Fig. 3 by including some insets indicating the depicted part of the sponge body and by changing the color scheme of the FISH images as suggested by Reviewer #3. In accordance with the following comment, we decided to remove the single-channel views in Fig. 3A.

      (2) Figure S2 (B, C, D): The fluorescent signals in these images are difficult to discern. If the authors choose to present signals at such low magnification, enhancing the fluorescence signals would improve clarity. Additionally, incorporating Figure S2A as an inset within Figure S2E may be sufficient to convey the necessary information about signal localization. 

      We changed the figure according to the suggestions.

      (3) Figure S3A: The claim that Cal2-expressing spherical cells are closely associated with the choanoderm at the distal end of the radial tube is difficult to follow. Are these Cal2-expressing spherical cells interspersed among choanoderm cells, or are they positioned along the basal surface of the choanoderm? Clarifying their precise localization and indicating it in the image would strengthen the interpretation. 

      In the figure, the view is onto the choanoderm that lines the inner surface of the radial tube. Our interpretation is that the spherical cells are positioned at the basal surface of the choanoderm. We updated Fig. S3, which now includes another view to support our interpretation and also indicates some choanocytes.

      (4) To further highlight the similarities between S. ciliatum and aragonitic corals in the molecular mechanisms of calcification, consider including a supplementary figure providing a concise depiction of the coral calcification process. This would offer valuable context for readers.

      We considered this suggestion, and have included such a supplementary figure (Fig. S9).

      Reviewer #2 (Public review): 

      Summary: 

      This paper reports on the discovery of calcarins, a protein family that seems involved in calcification in the sponge Sycon ciliatum, based on specific expression in sclerocytes and detection by mass spectrometry within spicules. Two aspects stand out: (1) the unexpected similarity between Sycon calcarins and the galaxins of stony corals, which are also involved in mineralization, suggesting a surprising, parallel co-option of similar genes for mineralization in these two groups; (2) the impressively cell-type-specific expression of specific calcarins, many of which are restricted to either founder or thickener cells, and to either diactines, triactines, or tetractines. The finding that calcarins likely diversified at least partly by tandem duplications (giving rise to gene clusters) is a nice bonus. 

      Strengths: 

      I enjoyed the thoroughness of the paper, with multiple lines of evidence supporting the hypothesized role of calcarins: spatially and temporally resolved RNAseq, mass spectrometry, and whole-mount in situ hybridization using CISH and HCR-FISH (the images are really beautiful and very convincing). The structural predictions and the similarity to galaxins are very surprising and extremely interesting, as they suggest parallel evolution of biomineralization in sponges and cnidarians during the Cambrian explosion by co-option of the same "molecular bricks". 

      Weaknesses: 

      I did not detect any major weakness, beyond those inherent to working with sponges (lack of direct functional inhibition of these genes) or with fast-evolving gene families with complex evolutionary histories (lack of a phylogenetic tree that would clarify the history of galaxins/calcarins and related proteins). 

      We thank the reviewer for this assessment and address the detailed comments below.

      Reviewer #3 (Public review):

      Summary: 

      The study explores the extent to which the biomineralization process in the calcitic sponge Sycon ciliatum resembles aragonitic skeleton formation in stony corals. To investigate this, the authors performed transcriptomic, genomic, and proteomic analyses on S. ciliatum and examined the expression patterns of biomineralization-related genes using in situ hybridization. Among the 829 differentially expressed genes identified in sponge regions associated with spicule formation, the authors focused on calcarin genes, which encode matrix proteins analogous to coral galaxins. The expression patterns of calcarins were found to be diverse but specific to particular spicule types. Notably, these patterns resemble those of galaxins in stony corals. Moreover, the genomic organization of calcarin genes in S. ciliatum closely mirrors that of galaxin genes in corals, suggesting a case of parallel evolution in carbonate biomineralization between calcitic sponges and aragonitic corals.

      Strengths: 

      The manuscript is well written, and the figures are of high quality. The study design and methodologies are clearly described and well-suited to addressing the central research question. Particularly noteworthy is the authors' integration of various omics approaches with molecular and cell biology techniques. Their results support the intriguing conclusion that there is a case of parallel evolution in skeleton-building gene sets between calcitic sponges and aragonitic corals. The conclusions are well supported by the data and analyses presented.

      Weaknesses: 

      The manuscript is strong, and I have not identified any significant weaknesses in its current form. 

      We thank the reviewer for the insight and have addressed the detailed comments below.

      Reviewer #1 (Recommendations for the authors): 

      The description of the region "radial tube" is unclear. Please define and explain it at its first mention in the manuscript, and, if possible, refer to the appropriate figure(s) (e.g., Figure 1A). 

      We now explain radial tubes at the beginning of the Results and have added a label in Fig. 1A: “Sycon ciliatum is a tube-shaped sponge with a single apical osculum and a sponge wall of radial tubes around the central atrium (Fig. 1A). The radial tubes are internally lined with choanoderm, which forms elongated chambers in an angle of approximately 90° to the tube axis”.

      Reviewer #2 (Recommendations for the authors): 

      Scientific suggestions: 

      (1) Page 13: "Despite their presence in the same orthogroups, the octocoral and stony coral proteins were only distantly related to the calcareous sponge calcarins (e.g., 12-24% identity between octocoral and calcareous sequences in orthogroup Cal 2-4-6), resulting in poor alignment. Their homology to calcarins, therefore, remains to be determined." Could 3D structures of these coral proteins be predicted with AlphaFold to substantiate (or nuance) the comparison with calcarins? 

      We ran additional AlphaFold predictions for two octocoral and two scleractinian galaxins. A galaxin-like sequence from Pinnigorgia flava was only a short fragment, so we did not attempt a structure prediction for it. The results show that the octocoral galaxin-like proteins share some structural similarity with their sponge counterparts (12 beta-hairpins), while the scleractinian galaxin-like proteins differ from the sponge counterparts of the same orthogroup. We added this information to the results and to the new Fig. S7.

      Minor improvements to the text: 

      (1) Page 7: "The expression of Cal1 to Cal8 was investigated using chromogenic in situ hybridization (CISH) and hairpin-chain reaction fluorescence in situ hybridization (HCR-FISH), confirming their presence in sclerocytes." - Figure 3 should be cited here.

      We refer to the figure now.

      (2) Page 8-9: "Cal6 expression mirrors that of Cal2, occurring in rounded cells at the distal tip of radial tubes and in a ring of cells around the oscular ring." - Please cite a figure here. 

      We now refer to Fig. 3K.

      (3) Page 11-12: Please define eigengene, this term is not necessarily common knowledge. 

      We provide now a short definition in this sentence: “ The analysis provided eight meta-modules, of which four showed significant changes in expression module eigengenes —summary profiles that capture the overall expression pattern of each module— between samples with high spicule formation context (osculum region and regeneration stages older than four days) and samples with low spicule formation (sponge-wall and early regeneration stages until day 3-4) (Fig. S5).” 

      (4) Page 13: "Species without skeletons, such as the cnidarians Hydra, Actinia, Exaiptasia, and Nematostella, also possess galaxin-like proteins." This is too concise - can you explain what evidence was used? PANTHER, AlphaFold, OrthoFinder, Blastp...? 

      The evidence comes from PANTHER; we clarified this by modifying the last sentence of the section.

      (5) Page 20: "We have identified calcarins, galaxin-like proteins, as crucial components of the biomineralization toolkit in calcareous sponges." I'm not sure you showed they are crucial (this would require functional evidence). Perhaps "novel" components or some other adjective would fit better. 

      We changed the adjective to “novel”.
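
      As a rough illustration of the eigengene definition given in response to point (3) above: in WGCNA-style analyses, a module eigengene is commonly computed as the first principal component of the standardized expression matrix of the genes in a module. The Python sketch below assumes that convention. The function name and the toy matrix are hypothetical and are not taken from the manuscript's pipeline.

```python
import numpy as np


def module_eigengene(expr):
    """Return one summary value per sample for a single module.

    expr: samples x genes matrix (hypothetical toy input).
    The eigengene is taken here as the first left singular vector of the
    standardized matrix, i.e. the first principal component score.
    """
    # Standardize each gene (column) to zero mean and unit variance.
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]
    # The sign of a singular vector is arbitrary; orient it so that it
    # correlates positively with the mean standardized expression.
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene


# Toy module: 4 samples x 3 genes, all genes rising across samples.
toy_expr = np.array(
    [[1.0, 2.0, 1.0],
     [2.0, 3.0, 2.0],
     [3.0, 5.0, 4.0],
     [4.0, 6.0, 5.0]]
)
toy_eigengene = module_eigengene(toy_expr)
```

      In this toy case every gene increases from the first to the last sample, so the eigengene increases as well, which is the sense in which it summarizes the overall expression pattern of the module.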

      Suggestions for the figures: 

      (1) Figure 1A: radial tubes should be labelled. 

      A label was added.

      (2) Figure 3 is beautiful but hard to parse. The name of all markers should be written on each panel (notably B, C, and D) and ideally placed in a consistent position (top right corner?) so that the reader's eye doesn't have to look for them anew in each panel. Consider depicting the same gene with the same color in all panels if possible (confocal imaging gives virtual colors anyway, there's no reason to be bound to the real-life color of the fluorophores used - if that was the original intent). Finally, the red/green color scheme is not colorblind-readable, so please consider switching to another scheme (white/cyan/magenta, for example).

      We have updated the figure according to the suggestions. The names of all markers are now included on each panel. Placing them in the upper right corner was not feasible for all panels, so we adjusted their placement as needed. Reoccurring genes are shown in the same color where possible. To improve accessibility for individuals with red/green color vision deficiency, we adopted a cyan/magenta/yellow color scheme. Each HCR-FISH image was processed in ImageJ by splitting the image into channels, applying cyan, magenta, or yellow lookup tables, converting each channel to RGB, and then stacking and blending them using the Z-Project function with maximum intensity projection. Since the original channel information is not preserved after this processing, we provide the original red/green/blue version of the figure in the supplementary material in Fig S11. Additionally, we added small sketches of Figure 1A to indicate the sponge body regions depicted, where relevant.

      (3) Figure S3: the blue staining is not explained. It is also unclear where choanocytes are - could individual choanocytes be indicated with arrows or lines? 

      We added the information to the figure legend. The blue channel shows “Autofluorescence detected with the Leica TXR filter (approx. 590–650 nm), included to help distinguish true signal from background autofluorescence observed in the FITC channel (used for Spiculin detection).”
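
      The channel recoloring described in the response to point (2) above can be sketched outside ImageJ as follows. This is a minimal NumPy illustration, not the authors' ImageJ procedure: the lookup tables are reduced to simple RGB weights, the function names and toy channels are hypothetical, and the maximum-intensity projection is taken across the recolored RGB images.

```python
import numpy as np

# Hypothetical lookup tables: RGB weights for each pseudo-color.
LUTS = {
    "cyan": (0.0, 1.0, 1.0),
    "magenta": (1.0, 0.0, 1.0),
    "yellow": (1.0, 1.0, 0.0),
}


def apply_lut(channel, color):
    """Map a 2-D grayscale channel (values 0-255) to an RGB image."""
    weights = np.asarray(LUTS[color], dtype=float)
    rgb = channel[..., np.newaxis].astype(float) * weights
    return rgb.astype(np.uint8)


def merge_channels(channels, colors):
    """Recolor each channel, stack the RGB images, take the per-pixel maximum."""
    stack = np.stack([apply_lut(ch, col) for ch, col in zip(channels, colors)])
    return stack.max(axis=0)


# Toy 2x2 image: each channel has signal in a different pixel.
ch1 = np.array([[255, 0], [0, 0]], dtype=np.uint8)
ch2 = np.array([[0, 255], [0, 0]], dtype=np.uint8)
ch3 = np.array([[0, 0], [255, 0]], dtype=np.uint8)
merged = merge_channels([ch1, ch2, ch3], ["cyan", "magenta", "yellow"])
```

      Because the per-pixel maximum blends the channels into a single RGB image, the individual channels cannot be recovered from the result, which is consistent with supplying the unprocessed version of the figure separately.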

      Reviewer #3 (Recommendations for the authors): 

      I have no major concerns about the manuscript - only minor edits and comments, which are listed below: 

      (1) On page 13, the authors refer to Figure S8; however, I believe this should be Figure S7. 

      We now refer to the correct figure. Because a new Fig. S7 was introduced, the correct reference is now Fig. S8.

      (2) On page 16, please correct "Spciulin" to "Spiculin". 

      Now corrected.

      (3) On page 17, there are two commas following "(Sycon)"; please remove one. 

      Corrected.

      (4) In the Data Accessibility section, none of the provided links appear to work. Please ensure all links are functional. 

      We apologize for this oversight and now provide working links. 

      (5) In Figure 3, the description of panel L is missing from the figure legend. 

      We added the description of this panel.

      (6) On page 39, change "Fig. 4" to "Figure 4" to maintain consistency throughout the manuscript. 

      Changed.

      (7) Figure S7 is not cited in the main text. Please, address this. 

      Corrected (see above at point 1)

      (8) In the legend for Table S2, the reference to Soubigou et al. (3) is incorrect, as it is not listed in the SI reference section. Please correct this. 

      Soubigou et al. (2020) is now included in the SI reference list.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Migration of the primordial germ cells (PGCs) in mice is asynchronous, such that leading and lagging populations of migrating PGCs emerge. Prior studies found that interactions between PGCs and the cells they encounter along their migration routes regulate their proliferation. In this study, the authors used single cell RNAseq to investigate PGC heterogeneity and to characterize their niches during their migration along the AP axis. Unlike prior scRNAseq studies of mammalian PGCs, the authors conducted a time course covering 3 distinct stages of PGC migration (pre, mid, and post migration) and isolated PGCs from defined somite positions along the AP axis. Doing so allowed the authors to uncover differences in gene expression between leading and lagging PGCs and their niches and to investigate how their transcript profiles change over time. Among the pathways with the biggest differences were regulators of actin polymerization, epigenetic programming factors, and Nodal response genes. In addition, the authors report changes in somatic niches, specifically greater non-canonical WNT in posterior PGCs compared to anterior PGCs. This relationship between the hindgut epithelium and migrating PGCs was also detected in reanalysis of a previously published dataset of human PGCs. Using whole mount immunofluorescence, the authors confirmed elevated Nodal signaling based on detection of the LEFTY antagonists and targets of Nodal during late stage PGC migration. Taken together, the authors have assembled a temporal and spatial atlas of mouse PGCs and their niches. This resource and the data herein provide support for the model that interactions of migrating mouse PGCs with their niches influence their proliferation, cytoskeletal regulation, epigenetic state and pluripotent state.

      Overall, the findings provide new insights into heterogeneity among leading and lagging PGC populations and their niches along the AP axis, as well as comparisons between mouse and human migrating PGCs. The data are clearly presented, and the text is clear and well-written. This atlas resource will be valuable to reproductive and developmental biologists as a tool for generating hypotheses and for comparisons of PGCs across species.

      Strengths:

      (1) High quality atlas of individual PGCs prior to, during and post migration and their niches at defined positions along the AP axis.

      (2) Comparisons to available datasets, including human embryos, provide insight into potentially conserved relationships among PGCs and the identified pathways and gene expression changes.

      (3) Detailed picture of PGC heterogeneity.

      (4) Valuable resource for the field.

      (5) Some validation of Nodal results and further support for models in the literature based on less comprehensive expression analysis.

      Weaknesses:

      (1) No indication of which sex(es) were used for the mouse data and whether or not sex-related differences exist or can be excluded at the stages examined. This should be clarified.

      We have added: “Embryos of both sexes were pooled without genotyping, as the timepoints analyzed were prior to sex specification” to both the Animals section of the Materials and Methods and the Figure 1 legend. In addition, bioinformatic evaluation of potential sex biases in Nodal-Lefty signaling using Y-chromosome gene expression is reported in supplementary figure 4 and discussed in Discussion paragraph 2.

      Reviewer #2 (Public review):

      Summary:

      This work addresses the question of how 'leading' and 'lagging' PGCs differ, molecularly, during their migration to the mouse genital ridges/gonads during fetal life (E9.5, E10.5, E11.5), and how this is regulated by different somatic environments encountered during the process of migration. E9.5 and E10.5 cells differed in expression of genes involved in canonical WNT signaling and focal adhesions. Differences in cell adhesion, actin cytoskeletal dynamics were identified between leading and lagging cells, at E9.5, before migration into the gonads. At E10.5, when some PGCs have reached the genital ridges, differences in Nodal signaling response genes and reprogramming factors were identified. This last point was verified by whole mount IF for proteins downstream of Nodal signaling, Lefty1/2. At E11.5, there was upregulation of genes associated with chromatin remodeling and oxidative phosphorylation. Some aspects of the findings were also found to be likely true in human development, established via analysis of a dataset previously published by others.

      Strengths:

      The work is strong in that a large number of PGCs were isolated and sequenced, along with associated somatic cells. The authors dealt with the problem of the very small number of migrating mouse PGCs by pooling cells from embryos (after ascertaining age matching using somite counting). 'Leading' and 'lagging' populations were separated by anterior and posterior embryo halves and the well-established Oct4-deltaPE-eGFP reporter mouse line was used.

      Weaknesses:

      The work seems to have been carefully done, but I do not feel the manuscript is very accessible, and I do not consider it well written. The novel findings are not easy to find. The addition of at least one figure to show the locations of putative signaling etc. would be welcome.

      Thank you for the excellent suggestion. Fig. 6 has been added to highlight the main novel findings of this work and integrate them among contributions of earlier studies to provide a more complete view of signaling pathways and cell behaviors governing PGC migration.

      (1) The initial discussion of CellRank analysis (under 'Transcriptomic shifts over developmental time...' heading) is somewhat confusing - e.g. If CellRank's 'pseudotime analysis' produces a result that seems surprising (some E9.5 cells remain in a terminal state with other E9.5 cells) and 'realtime analysis' produces something that makes more sense, is there any point including the pseudotime analysis (since you have cells from known timepoints)? Perhaps the 'batch effects' possible explanation (in Discussion) should be introduced here. Do we learn anything novel from this CellRank analysis? The 'genetic drivers' identified seem to be genes already known to be key to cell transitions during this period of development.

      Thank you for this important observation. We have clarified the text in this section and added “This discrepancy may reflect differences in differentiation potential of some E9.5 PGCs that end in a terminal state among anterior E9.5 PGCs, but could also result from technical batch effects generated during library preparation. These possible interpretations are further discussed in the Discussion section.” to the pertinent results section, and we added further thoughts on the implications of this finding in Discussion paragraphs 4 and 7. We feel that it is important to present both results to the reader, as it is challenging to differentiate between heterogeneous developmental and migratory potential among E9.5 anterior PGCs and differential influence of batch effects across sequencing libraries with the data available.

      (2) In Discussion - with respect to Y-chromosome correlation, it is not clear why this analysis would be done at E10.5, when E11.5 data is available (because some testis-specific effect might be more apparent at the later stage).

      Since we had identified autocrine Nodal signaling primarily in anterior late migratory PGCs at E10.5 and knew that Nodal signaling was involved in sex specification of testicular germ cells into prospermatogonia by E12.5, we wanted to determine whether the Nodal signaling in late migratory PGCs at E10.5 was likely to be a sex-specific effect or was common to PGCs in both sexes. This was assessed in supplementary figure 4 and determined unlikely to be related to sex specification of PGCs as Nodal signaling was not strongly correlated with Y-chromosome transcripts in migratory PGCs. Assessing the relationship between Nodal signaling and Y-chromosome transcription at E11.5, when migration is complete, would be unlikely to help us further understand the dynamics of Nodal signaling during late PGC migration.

      (3) Figure 2A - it seems surprising that there are two clusters of E9.5 anterior cells

      Thank you for the interesting observation! One possibility is that the two states represent differential developmental competence as is suggested by the presence of one E9.5 anterior cluster along the differentiation trajectory in Fig 2A and one not within this differentiation trajectory. Another is that technical aspects of generating these sequencing libraries affected some cells more than others, resulting in clustering of highly affected and less affected cells, which would also be consistent with some E9.5 anterior cells lying within the differentiation trajectory and some not. Since it is challenging to differentiate between these possibilities with the data available, we have intentionally avoided overstating interpretations of this result in the manuscript text. We have included discussion of the potential implications of the transcriptional divergence you identify in Discussion paragraphs 4 and 7.

      (4) Figure 5F - there does seem to be more LEFTY1/2 staining in the anterior region, but also more germ cells as highlighted by GFP

      This is true; based on our selected anatomic landmarks for “anterior” and “posterior” as indicated in Methods, the “anterior” compartment typically contains more PGCs. Thus, we have included violin plots with all data points shown of signal intensities of both LEFTY1/2 and pSMAD2/3 in Fig. 5G and 5I so that the reader can evaluate the entire distribution of PGC signal intensities for each embryo.

      Reviewer #3 (Public review):

      Summary:

      The migration of primordial germ cells (PGCs) to the developing gonad is a poorly understood, yet essential step in reproductive development. Here, the authors examine whether there are differences in leading and lagging migratory PGCs using single-cell RNA sequencing of mouse embryos. Cleverly, the authors dissected embryonic trunks along the anterior-to-posterior axis prior to scRNAseq in order to distinguish leading and lagging migratory PGCs. After batch corrections, their analyses revealed several known and novel differences in gene expression within and around leading and lagging PGCs, intercellular signaling networks, as well as number of genes upregulated upon gonad colonization. The authors then compared their datasets with publicly available human datasets to identify common biological themes. Altogether, this rigorous study reveals several differences between leading and lagging migratory PGCs, hints at signatures for different fates among the population of migratory PGCs, and provides new potential markers for post-migratory PGCs in both humans and mice. While many of the interesting hypotheses that arise from this work are not extensively tested, these data provide a rich platform for future investigations.

      Strengths:

      The authors have successfully navigated significant technical challenges to obtain a substantial number of mouse migratory primordial germ cells for robust transcriptomic analysis. Here the authors were able to collect quality data on ~13,000 PGCs and ~7,800 surrounding somatic cells, which is ten times more PGCs than previous studies.

      The decision to physically separate leading and lagging primordial germ cells was clever and well-validated based on expected anterior-to-posterior transcriptional signatures.

      Within the PGCs and surrounding tissues, the authors found many gene expression dynamics they would expect to see both along the PGC migratory path as well as across developmental time, increasing confidence in the new differentially expressed genes they found.

      The comparison of their mouse-based migratory PGC datasets with existing human migratory PGC datasets is appreciated.

      The quality control, ambient RNA contamination elimination, batch correction, cell identification and analysis of scRNAseq data were thorough and well-done such that the new hypotheses and markers found through this study are dependable.

      The subsetting of cells in their trajectory analysis is appreciated, further strengthening their cell terminal state predictions.

      Weaknesses:

      Although it is useful to compare their mouse-based dataset with human datasets, the authors used two different analysis pipelines for each dataset. While this may have been due to the small number of cells in the human dataset as mentioned, it does make it difficult to compare them.

      Direct comparisons between findings in human and mouse focused on CellChat cell-cell communication prediction results, which were conducted in an identical fashion using the same analysis methods for both datasets.

      There were few validation experiments within this study. For one such experiment, whether there is a difference in pSMAD2/3 along the AP axis is unclear and not quantified as was nicely done for Lefty1/2.

      Additional validation of the pSMAD2/3 signal intensity along the AP axis was performed and is now included in Fig. 5.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Public review): 

      In this manuscript, Tiedje and colleagues longitudinally track changes in parasite numbers across four time points as a way of assessing the effect of malaria control interventions in Ghana. Some of the study results have been reported previously, and in this publication, the authors focus on age-stratification of the results. Malaria prevalence was lower in all age groups after IRS. Follow-up with SMC, however, maintained lower parasite prevalence in the targeted age group but not the population as a whole. Additionally, they observe that diversity measures rebound more slowly than prevalence measures. This adds to a growing literature that demonstrates the relevance of asymptomatic reservoirs. 

      Strengths:  

      Overall, I found these results clear, convincing, and well-presented. There is growing interest in developing an expanded toolkit for genomic epidemiology in malaria, and detecting changes in transmission intensity is one major application. As the authors summarize, there is no one-size-fits-all approach, and the Bayesian MOIvar estimate developed here has the potential to complement currently used methods, particularly in regions with high diversity/transmission. I find its extension to a calculation of absolute parasite numbers appealing as this could serve as both a conceptually straightforward and biologically meaningful metric.

      We thank the reviewer for this positive review of our results and approach.

      Weaknesses:

      While I understand the conceptual importance of distinguishing among parasite prevalence, mean MOI, and absolute parasite number, I am not fully convinced by this manuscript's implementation of "census population size".

      This reviewer remains unconvinced of the use of the term “census population size”. This appears to be due to the dependence of the term on sample size rather than representing a count of a whole population. To give context to our use we are clear in the study presented that the term describes a count of the parasite “strains” in an age-specific sample of a human population in a specified location undergoing malaria interventions. 

      They have suggested instead using “sample parasite count”.  We argue that this definition is too specific and less applicable when we extrapolate the same concept to a different denominator, such as the population in a given area. Importantly, our ecological use of a census allows us to count the appearance of the same strain more than once should this occur in different people. 

      The authors reference the population genetic literature, but within the context of that field, "census population size" refers to the total population size (which, if not formally counted, can be extrapolated) as opposed to "effective population" size, which accounts for a multitude of demographic factors. There is often interesting biology to be gleaned from the magnitude of difference between N and Ne.

      As stated in the introduction we have been explicit in saying that we are not using a population genetic framework. Exploration of N and Ne in population genetics has merit. How this is reconciled when using a “strain” definition and not neutral markers would need to be assessed.  

      In this manuscript, however, "census population size" is used to describe the number of distinct parasites detected within a sample, not a population. As a result, the counts do not have an immediate population genetic interpretation and cannot be directly compared to Ne. This doesn't negate their usefulness but does complicate the use of a standard population genetic term.

      We are clear we are defining a census of parasite strains in an age-specific sample of a population living in two catchment areas of Bongo District. We appreciate the concern of the reviewer and have now further edited the relevant paragraphs in both the Introduction (Lines 75-80) and the Discussion (Lines 501-506) to make very clear the dependence of the reported quantity on sample size, but also its feasible extrapolation consistent with the census of a population. 

      In contrast, I think that sample parasite count will be most useful in an epidemiological context, where the total number of sampled parasites can be contrasted with other metrics to help us better understand how parasites are divided across hosts, space and time. However, for this use, I find it problematic that the metric does not appear to correct for variations in participant number. For instance, in this study, participant numbers especially varied across time for 1-5 year-olds (N=356, 216, 405, and 354 in 2012, 2014, 2015, and 2017 respectively).

      The reviewer has made an important point that for the purpose of comparisons across the four surveys or study time points (i.e., 2012, 2014, 2015, and 2017), we should "normalize" the number of individuals considered for the calculation of the "census population size". Given that this quantity is a sum of the estimated MOI<sub>var</sub>, we need to have constant numbers for its values to be compared across the surveys, within age group and the whole population. This is needed not only to get around the issue of the drop in 1-5 year olds surveyed in 2014 but also to stabilize the total number of individuals for the whole sample and for specific age groups. One way to do this is to take the smallest sample size for each age group across time, and to repeatedly resample that number of individuals from the surveys where we have a larger sample size. This has now been included in the manuscript as described in the Materials and Methods (Lines 329-341) and in the Results (Lines 415-430; see updated Figure 4 and Table supplement 7). This correction produces very similar results to those we had presented before.

      As stated in our previous response we have used participant number in an interrupted time series where the population was sampled by age to look at age-specific effects of sequential interventions IRS and SMC. As shown in Table supplement 1 of the 16 age-specific samples of the total population, we have sampled very similar proportions of the population by age group across the four surveys. The only exception was the 1-5 year-old age group during the survey in 2014. We are happy to provide additional details to further clarify the lower number (or percentage) of 1-5 year olds (based on the total number of participants per survey) in 2014 (~12%; N = 216) compared to the other surveys conducted 2012, 2015, and 2017 (~18-20%; N = 356, 405, and 354, respectively). Please see Table supplement 1 for the total number of participants surveyed in each of the four surveys (i.e., 2012, 2014, 2015, and 2017).   
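
      The sample-size normalization described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: sampling is done without replacement, the number of resamples and the per-person MOI values are hypothetical, and the function name is invented for the example.

```python
import numpy as np


def resampled_census(moi_by_survey, n_resamples=1000, seed=0):
    """Average summed MOI after subsampling each survey to a common size.

    moi_by_survey: dict mapping a survey label to the per-person MOI
    estimates for one age group (hypothetical toy values below).
    Returns the common sample size and the mean resampled sum per survey.
    """
    rng = np.random.default_rng(seed)
    # Common denominator: the smallest number of individuals in any survey.
    n_min = min(len(values) for values in moi_by_survey.values())
    result = {}
    for survey, values in moi_by_survey.items():
        values = np.asarray(values)
        # Assumption: draw n_min individuals without replacement each time.
        sums = [
            rng.choice(values, size=n_min, replace=False).sum()
            for _ in range(n_resamples)
        ]
        result[survey] = float(np.mean(sums))
    return n_min, result


# Toy per-person MOI values for one age group in three surveys.
toy_moi = {
    "2012": [1, 2, 3, 1, 2, 4],
    "2014": [1, 1, 2],
    "2015": [2, 2, 2, 2],
}
n_common, census = resampled_census(toy_moi)
```

      A survey that already has the smallest sample size is returned unchanged, since every resample draws all of its individuals; larger surveys are averaged over many equally sized subsamples, so the summed MOI values become comparable across time points.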

      This sample size variability is accounted for with other metrics like mean MOI. 

      We agree that mean MOI by age presents a way forward with variable samples to scale up. Please see updated Figure supplement 8.  

      In sum, while the manuscript opens up an interesting discussion, I'm left with an incomplete understanding of the robustness and interpretability of the new proposed metric.

      We thank you for your opinion. We have further edited the manuscript to make clear our choice of the term and the issue of sample size.  We believe the proposed terminology is meaningful as explained above.

      Reviewer #3 (Public review): 

      Summary

      The manuscript coins a term "the census population size" which they define from the diversity of malaria parasites observed in the human community. They use it to explore changes in parasite diversity in more than 2000 people in Ghana following different control interventions. 

      Strengths:

      This is a good demonstration of how genetic information can be used to augment routinely recorded epidemiological and entomological data to understand the dynamics of malaria and how it is controlled. The genetic information does add to our understanding, though by how much is currently unclear (in this setting it says the same thing as age stratified parasite prevalence), and its relevance moving forward will depend on the practicalities and cost of the data collection and analysis. Nevertheless, this is a great dataset with good analysis and a good attempt to understand more about what is going on in the parasite population.

      Thank you to the reviewer for their supportive assessment of our research.

      Weaknesses

      None

      Reviewer #3 (Recommendations for the authors): 

      New figure supplement 8 - x-axis says percentage but goes between 0-1, so is a proportion

      We thank the reviewer for bringing this to our attention. We have amended the x-axis labels accordingly for Figure supplement 8.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Ozkan et al. presents compelling evidence demonstrating the latent potential of glial precursors of the adult cerebral cortex for neuronal reprogramming. The findings substantially advance our understanding of the potential of endogenous cells in the adult brain to be reprogrammed. Moreover, they describe a molecular cocktail that directs reprogramming toward corticospinal neurons (CSN).

      Strengths:

      Experimentally, the work is compelling and beautifully designed, with no major caveats. The main conclusions are fully supported by the experiments. The work provides a characterization of endogenous progenitors, genetic strategies to isolate them, and proof of concept of exploiting these progenitors' potential to produce a specific desired neuronal type with "a la carte" combination of transcription factors.

      Weaknesses:

      Some issues need to be addressed or clarified before publication. The manuscript requires editing. It is dense and rich in details while in other parts there are a few mistakes.

      We thank the reviewer for their excellent summary and for their extremely positive review of our paper. We are pleased that the experimental design and conclusions were judged to be well-supported.

      We have revised the paper to enhance clarity, include additional relevant citations, and refine terminology in some sections of the original version.

      We appreciate the reviewer’s thoughtful review and agree that these revisions enhance the paper.

      Reviewer #2 (Public Review):

      Summary:

      Here the authors show a novel direct neuronal reprogramming model using a very pure culture system of oligodendrocyte progenitor cells and demonstrate hallmarks of corticospinal neurons to be induced when using Neurogenin2, a dominant-negative form of Olig2 in combination with the CSN master regulator Fezf2.

      Strengths:

      This is a major achievement as the specification of reprogrammed neurons towards adequate neuronal subtypes is crucial for repair and still largely missing. The work is carefully done and the comparison of the neurons induced only by Neurogenin 2 versus the NVOF cocktail is very interesting and convincingly demonstrates a further subtype specification by the cocktail.

      Weaknesses:

      As carefully as it is done in vitro, the identity of projection neurons can best be assessed in vivo. If this is not possible, it could be interesting to co-culture different brain regions and see if these neurons reprogrammed with the cocktail, indeed preferentially send out axons to innervate a co-cultured spinal cord versus other brain region tissue.

      We appreciate the reviewer’s positive evaluation of our work and their recognition of its significance in advancing neuronal subtype specification through directed differentiation of endogenous progenitors. 

      We agree with the reviewer’s suggestion that a very interesting future stage of this work would be to investigate the projection neuron identity in vivo. We aim to pursue follow-up studies to investigate in vivo integration and connectivity of such neurons generated by directed differentiation from endogenous SOX6+/NG2+ cortical progenitors. As the reviewer insightfully suggests, co-culturing different brain regions with these neurons could offer an alternative strategy to partially assess potential preferential connectivity into cultured spinal cord vs. alternate tissue.

      We agree with the reviewer that future investigation in vivo will further strengthen the implications of this work.

      Reviewer #3 (Public Review):

      Summary:

      Ozkan, Padmanabhan, and colleagues aim to develop a lineage reprogramming strategy towards generating subcerebral projection neurons from endogenous glia with the specificity needed for disease modelling and brain repair. They set out by targeting specifically Sox6-positive NG2 glia. This choice is motivated by the authors' observation that the early postnatal forebrain of Sox6 knockout mice displays marked ectopic expression of the proneural transcription factor (TF) Neurog2, suggesting a latent neurogenic program may be derepressed in NG2 cells, which normally express Sox6. Cultured NG2 glia transfected with a construct ("NVOF") encoding Neurog2, the corticofugal neuron-specifying TF Fezf2, and a constitutive repressor form of Olig2 are efficiently reprogrammed to neurons. These acquire complex morphologies resembling those of mature endogenous neurons and are characterized by fewer abnormalities when compared to neurons induced by Neurog2 alone. NVOF-induced neurons, as a population, also express a narrower range of cortical neuron subtype-specific markers, suggesting narrowed subtype specification, a potential step forward for Neurog2-driven neuronal reprogramming. Comparison of NVOF- and Neurog2-induced neurons to endogenous subcerebral projection neurons (SCPN) also indicates Fezf2 may aid Neurog2 in directing the generation of SCPN-like neurons at the expense of other cortical neuronal subtypes.

      Strengths:

      The report describes a novel, highly homogeneous in vitro system amenable to efficient reprogramming. The authors provide evidence that Fezf2 shapes the outcome of Neurog2-driven reprogramming towards a subcerebral projection neuron identity, consistent with its known developmental roles. Also, the use of the modified RNA for transient expression of Neurog2 is very elegant.

      Weaknesses:

      The molecular characterization of NVOF-induced neurons is carried out at the bulk level and therefore does not allow heterogeneity among NVOF-induced neurons to be fully assessed. The suggestion of a latent neurogenic potential in postnatal cortical glia is only partially supported by the data from the Sox6 knockout. Finally, some of the many exciting implications of the study remain untested.

      Discussion:

      The study has many exciting implications that could be further tested. For example, an ultimate proof of the subcerebral projection neuron identity would be to graft NVOF cells into neonatal mice and study their projections. Another important implication is that Sox6-deficient NG2 glia may not only express Neurog2 but activate a more complete neurogenic programme, a possibility that remains untested here.

      Also, is the subcerebral projection neuron identity dependent on the starting cell population? Could other NG2 glia, not expressing Sox6, also be coaxed by the NVOF cocktail into subcerebral projection neurons? And if not, do they express other (Sox) transcription factors that render them more amenable to reprogramming into other cortical neuron subtypes? The authors state that SOX6-positive NG2 glia are a quiescent progenitor population. Given that NG2 glia are believed to undergo proliferation as a whole, are Sox6-positive NG2 glia an exception to this rule? Finally, the authors seem to imply that subcerebral projection neurons and Sox6-positive NG2 glia are lineage-related. However, direct evidence for this conjecture seems to be missing.

      We appreciate the reviewer’s thoughtful and detailed review of this work. We especially appreciate the positive evaluation of the work and the highlighting of multiple strengths of our approach, including the role of Fezf2 in refining neuronal subtype identity and the use of modified RNA to enable transient expression of Neurog2.

      We acknowledge the reviewer’s comment that single-cell transcriptomic analysis would indeed provide a more granular view of likely heterogeneity. This current study focuses on investigating the feasibility of directed differentiation of corticospinal-like neurons from endogenous progenitors. Future work employing single-cell sequencing could indeed help delineate the heterogeneity of neurons generated by directed differentiation, and potentially contribute toward identification of potential molecular roadblocks in different subsets.

      Regarding the suggestion that SOX6-deficient NG2+ progenitors might activate a broader neurogenic program, we agree that this is an intriguing possibility. We are currently conducting an in-depth investigation of the loss of SOX6 function in NG2+ progenitors, and we aim to submit this quite distinct work for separate publication.

      The reviewer raises an important point about whether SOX6+/NG2+ progenitors and subcerebral projection neurons are indeed normally lineage-related. In the current work, we utilized postnatal cortical SOX6+/NG2+ progenitors that are thought to be largely derived from EMX1+ and GSH2+ ventricular zone neural progenitors. Our unpublished data from the separate study noted above indicate that SOX6 is expressed by both these lineages in vivo. Since subcerebral projection neurons are derived from EMX1+ ventricular zone progenitors (SOX6-expressing), at least some of the SOX6+/NG2+ progenitors are expected to share a lineage relationship with subcerebral projection neurons. While our data strongly suggest such a link, we agree that direct lineage tracing could be pursued in future work.

      Finally, we agree with the reviewer’s suggestion that in vivo transplantation to assess the identity and connectivity of neurons generated by directed differentiation would be very interesting, and is a natural next phase of this work. We aim to pursue such work in future investigations.

      We again thank the reviewer for their insightful comments.

      Reviewer #1 (Recommendations For The Authors): 

      The most important clarification for me concerns the initial description of the progenitors. I think there is a mistake with the transgenic line NG2. The DsRed mouse used in Figure 1C is not described until later in the results describing Figure 2. This was confusing. Moreover, perhaps this is the reason why I am confused and do not understand how the authors conclude that SOX6+ cells are a subset of NG2-positive cells. Panel C shows the opposite. Please correct the description and show the quantification of data in panel 1C.

      We thank the reviewer for their thoughtful review and for highlighting this important point. We appreciate the reviewer pointing out the benefit of further clarity regarding the NG2.DsRed transgenic mouse description in Figure 1C. We have revised the text to clarify the use of the transgenic line and ensure that the DsRed mouse is properly introduced. Additionally, we have further clarified the description explaining the basis for concluding that SOX6+ cells are a subset of NG2+ cells and further integrate this conclusion with the data presented.

      During cell sorting from the cortices of NG2.DsRed mice, we observe two distinct populations of NG2-DsRed+ cells based on fluorescence intensity in FACS: NG2-DsRed “bright” and NG2-DsRed “dim” populations. The NG2-DsRed “dim” population consists of a heterogenous mix of NESTIN+ progenitors, GFAP+ astrocytes/progenitors, a subset of NG2+ cells, and other unidentified cells. In contrast, the DsRed “bright” population includes a broader group of progenitors that also give rise to oligodendrocytes (please see Zhu, Bergles, and Nishiyama 2008), along with pericytes. 

      Previous studies have shown that, while dorsal/pallial VZ progenitors express SOX6 during embryonic development, SOX6 expression becomes restricted to interneurons postnatally (these do not express NG2 proteoglycan; Azim et al., 2009) and to the broader group of NG2+ progenitors that also give rise to oligodendrocytes. The ICC image in Fig. 1C shows bright NG2+ cells in the cortex, many of which express SOX6. Thus, we conclude that SOX6+ cells constitute a subset of NG2-DsRed+ cells. 

      In a similar line, the work is beautiful, but the manuscript can gain a lot from shortening and some more editing. For example:

      (1) In the abstract, the word inappropriate should be removed. It seems to me that is an unnecessary subjective qualification - it is hardly possible that in biology we found repression of something inappropriate.

      We have removed the word “inappropriate”.

      (2) FACS-purify these genetically accessible....establish a pure culture. Genetically accessible is nice, and I understand that it conveys that they can be traced in the mouse, but everything is genetically accessible with the right tool, and perhaps it is more informative to explain which gene or reporter is used for the isolation. These cells are not accessible in humans. Also, I consider it best to remove "pure"; the culture already consists of FACS-purified cells.

      We have revised the text to specify the gene/reporter used for isolation instead of using "genetically accessible", and we removed "pure", since FACS purification is already explicitly mentioned.

      (3) In the initial paragraph in the results: "They are exposed to the same morphogen gradients throughout embryonic development, and thus, compared to distant cell types, have similar epigenomic and transcription landscapes." This is proven in the cited publication, but the way it is stated here seems a bit of an unnecessary overstatement. The hypothesis stated after this paragraph is as good as it is with or without this argument.

      We have revised the text and simplified the statement. We agree that the hypothesis remains clear and well-supported without this emphasis.

      (4) In the results section, "two distinct populations of DsRed-positive cells were identified based on fluorescence intensity": I know it is correct, but when reading the percentages, I was confused because those percentages divided the population into three fractions. What the authors do not explain is that they discard the intermediate-expressing population.

      We appreciate the reviewer highlighting this inadvertent point of confusion. We erred by discussing only the two populations of central interest to us (DsRed-bright and DsRed-dim), and did not explicitly mention the DsRed-negative population. We have now clarified the text to include all three cell populations and their percentages of the total cells in all three populations (in the original manuscript and still now, ~75-78% were DsRed-negative). We have also further clarified that only DsRed-bright cells (identified as progenitors) were used for all subsequent experiments.

      These examples illustrate the type of editing that would be appreciated but which is entirely up to the authors.

      We thank the reviewer for their thoughtful suggestions toward improving clarity and precision. We have incorporated these recommendations, along with suggestions from the other two reviewers, in the revised paper.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors start their results section by showing in situ hybridization for Ngn2 in control and Sox6 KO mice. These control sections do not look convincing, as there is not even some signal in the adult VZ/SVZ region and virtually no background. Please show sections where some positive signal can also be detected in the control sections.

      We agree with the reviewer that making direct comparisons in ISH experiments is an important point. In our ISH experiments, to ensure consistency and appropriate comparisons, we process WT and KO sections together and stop the signal development simultaneously. We could have extended the development time to enhance WT signal to a detectable level, but that would have led to excessive background and over-saturated signal in the KO sections.

      To address the reviewer’s point, we have added a new supplementary figure with an additional pair of WT and KO sections, along with reference data from the Allen Brain Atlas. The WT section shows faint Neurog2 expression in the dentate gyrus region of the hippocampus, while the KO section confirms very substantial upregulation of Neurog2 in the absence of SOX6 function. These additional data enhance the clarity and depth of our results.

      Please see the following link for the Allen Brain Atlas ISH data demonstrating that Neurog2 expression in the postnatal (P4) SVZ/SGZ is inherently low (https://developingmouse.brainmap.org/experiment/show/100093831).

      (2) As a hallmark of projection neurons is where they send their axons, it would be important to include a biological assay for this. Of course, in vivo experiments would be great, but if this is not possible, the authors could co-culture sections from the late embryonic cortex, striatum, and spinal cord to see if the reprogrammed neurons preferentially extend their axons towards one of these targets (as normally developing neurons would, see e.g. Bolz et al., 1990).

      We agree with the reviewer’s suggestion that a very interesting future stage of this work would be to investigate the projection neuron identity, including connectivity, in vivo. We aim to pursue follow-up studies to investigate in vivo integration and connectivity of such neurons generated by directed differentiation from endogenous SOX6+/NG2+ cortical progenitors. As the reviewer insightfully suggests, co-culturing different brain regions with these neurons could offer an alternative strategy to partially assess potential preferential connectivity into cultured spinal cord vs. alternate tissue. This area of investigation is of substantial interest to our lab, and we aim to pursue it in the coming years; it is a very large undertaking by either approach.

      (3) However, if the loss of Sox6 is sufficient for Ngn2 to be upregulated, why did the authors not pursue this approach in their reprogramming experiments? Are these endogenous levels sufficient for reprogramming? Please add some OPC cultures from WT and KO mice to explore their conversion to neurons and possibly combine them with Olig2VP16 and Fezf2.

      We thank the reviewer for this insightful comment and for raising this broader area of inquiry regarding whether SOX6 might be down-regulated to enhance induction of neurogenesis. We are writing a separate manuscript regarding function of SOX6 in these progenitors during normal or molecularly manipulated development. We investigate function of SOX6 using both whole body null mice and a series of conditional null mice. We aim to post that work as a preprint and submit it for review and publication in the coming months. Beyond that work, the potential strategy of downregulating SOX6 function while simultaneously upregulating other molecular controls to refine directed neuronal differentiation is also of substantial interest to us, and we aim to pursue this in follow-up work. Though these are both interesting questions/topics, we respectfully submit that these broad areas of parallel, complex, and future investigation would substantially expand the scope of work in this paper, so we aim to address them in separate studies.

      (4) Please indicate independent biological replicates as individual data points in all histograms, i.e. also in Figure 2K, Figure 4I, S2H.

      We have updated the figure legends indicating the biological replicates, and explained the broad media optimization that was used successfully in all further experiments.

      (5) GFP labelling in Figures S2K-N is not convincing - too high background. Please optimize.

      We have redesigned this figure and now present it as a new supplementary figure, with GFP pseudocolored in gray and enlarged subpanels for improved visualization of cell morphology.

      Reviewer #3 (Recommendations For The Authors):

      This is an extremely well-written manuscript with very exciting implications. Obviously, not all can be tested here. Some of the suggestions are relatively easy and may be worth testing right away, others may require more extensive study in the future. In my view, completing some of the points below could make this paper a landmark study.

      I start with the key questions:

      (1) Do grafted NVOF cells give rise to subcerebral projection neurons in vivo?

      We agree with the reviewer’s suggestion that a very interesting future stage of this work would be to investigate the projection neuron identity, including connectivity, in vivo. As noted above in response to Reviewer 2, we aim to pursue follow-up studies to investigate in vivo integration and connectivity of such neurons generated by directed differentiation from endogenous SOX6+/NG2+ cortical progenitors. This question is of substantial interest to us, and we aim to pursue it in the coming years; as the reviewer notes, this is a very large undertaking, and beyond the scope of this paper.

      (2) What is the fate of the Sox6 deficient NG2 glia that express Neurog2? One could isolate these cells and subject them to scRNA sequencing to see how far neurogenesis proceeds without addition of exogenous factors.

      We thank the reviewer for this insightful question. As noted in our response to Reviewer 2, we are writing a separate manuscript regarding function of SOX6 in these progenitors during normal or molecularly manipulated development. We investigate function of SOX6 using both whole body null mice and a series of conditional null mice. We aim to post that work as a preprint and submit it for review and publication in the coming months, likely in early summer. We respectfully submit that this broad area of parallel, complex investigation would substantially expand the scope of work in this paper and make this paper too complex and multi-directional, so we aim to publish them as separate papers for the benefit of clarity for readers.

      (3) Obviously, what happens to Sox6-deficient (or non-deficient) cells when forced to express NVOF? In this context, it might be fair to cite Felske et al (PLoS Biol, 2023), who report Neurog2- and Fezf2-induced reprogramming in the postnatal brain. In their model, these authors did not distinguish between converted astrocytes and NG2 glia. Thus, some of the reprogrammed cells may comprise the SOX6-positive cells described here.

      We thank the reviewer for highlighting for us that we inadvertently omitted referencing the important paper by Felske et al., 2023. We have now included this citation. 

      We thank the reviewer for raising this broader area of inquiry regarding whether SOX6 might be down-regulated to enhance induction of neurogenesis. Beyond the work noted above regarding function of SOX6 in these progenitors during normal or molecularly manipulated development, the potential strategy of downregulating SOX6 function while simultaneously upregulating other molecular controls to refine directed neuronal differentiation is of substantial interest to us, and we aim to pursue this in follow-up work. We again respectfully submit that this area of complex, future investigation should be addressed in future studies.

      Very interesting unaddressed questions include:

      (1) Are Sox6+ NG2 glia of dorsal origin? This is implied but not shown. One could use Emx1Cre lines to assess this. Are Sox6+ glia and subcerebral projection neurons clonally related? This may be more challenging. In this context, it might again be fair to refer to Herrero-Navarro et al (Science Advances 2021), who show that glia lineage-related to nearby neurons give rise to induced neurons with regional specificity.

      The reviewer raises an important question regarding the competence of SOX6+/NG2+ progenitors from distinct origins to generate corticospinal-like neurons by directed differentiation. In ongoing unpublished work, we have identified SOX6 expression by NG2+ progenitors of the three lineages derived from ventricular zone progenitors that express either Emx1, Gsh2, or Nkx2.1 transcription factors. The EMX1+ lineage-derived SOX6+/NG2+ progenitors are directly lineage related to cortical projection neurons. As the reviewer suggests, future experiments could explore potential differences in competence between these three populations.

      We again thank the reviewer for highlighting for us that we also inadvertently omitted referencing the exciting study by Herrero-Navarro that addresses the question of regional heterogeneity within astrocytes and the differential reprogramming potential related to their origins. We have now cited this paper in the manuscript.

      (2) Do other NG2 glia not give rise to subcerebral projection neurons when challenged with NVOF? Thus, how important is Sox6 expression really?

      The question of the specific competence of dorsal/cortical SOX6+/NG2+ progenitors to differentiate into corticospinal-like neurons, and the strategy of downregulating SOX6 function while simultaneously upregulating other molecular controls to direct neuronal differentiation, are both of great interest to us. In pilot experiments, we observed reduced competence of ventrally derived SOX6+/NG2+ progenitors to generate similar neurons. We plan to pursue the SOX6 manipulation in follow-up work.

      (3) Do Sox6+ NG2 glia proliferate like other NG2 glia and thereby represent a replenishable pool of progenitors?

      Yes; as noted in the text shortly after Figure 1, and as presented in Figure S3I-L, these progenitors proliferate robustly in response to the mitogens PDGF-A and FGF2.

      (4) How heterogenous are the NVOF-induced neurons? The bulk highlights the overall specificity, but does not tell whether all cells make it equally well.

      We agree with the reviewer that this is an interesting question. ICC analysis (Fig. 4G-4H) presents the variation in the levels of a few functionally important proteins in the population of NVOF-induced neurons. This variation could be due to any or all of at least three possibilities: 1) diversity in the population of purified SOX6+/NG2+ progenitors; 2) technical variability in the amount of NVOF plasmid delivered to individual progenitors during transfection; and/or 3) natural stochastic TF-level variations generating closely related neuron types, as also occur during normal development. Future experiments could explore these questions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer#1 (Public review):

      This work regards the role of Aurora Kinase A (AurA) in trained immunity. The authors claim that AurA is essential to the induction of trained immunity. The paper starts with a series of experiments showing the effects of suppressing AurA on beta-glucan-trained immunity. This is followed by an account of how AurA inhibition changes the epigenetic and metabolic reprogramming that are characteristic of trained immunity. The authors then zoom in on specific metabolic and epigenetic processes (regulation of S-adenosylmethionine metabolism & histone methylation). Finally, an inhibitor of AurA is used to reduce beta-glucan's anti-tumour effects in a subcutaneous MC-38 model.

      Strengths:

      With the exception of my confusion around the methods used for relative gene expression measurements, the experimental methods are generally well-described. I appreciate the authors' broad approach to studying different key aspects of trained immunity (from comprehensive transcriptome/chromatin accessibility measurements to detailed mechanistic experiments). Approaching the hypothesis from many different angles inspires confidence in the results (although not completely - see weaknesses section). Furthermore, the large drug-screening panel is a valuable tool as these drugs are readily available for translational drug-repurposing research.

      We thank the reviewer for the positive and encouraging comments.

      Weaknesses:

      (1) The manuscript contains factual inaccuracies such as:

      (a) Intro: the claim that trained cells display a shift from OXPHOS to glycolysis based on the paper by Cheng et al. in 2014; this was later shown to be dependent on the dose of stimulation and actually both glycolysis and OXPHOS are generally upregulated in trained cells (pmid 32320649).

      We thank the reviewer for pointing out this inaccuracy, and we have revised our statement to ensure an accurate and updated description in the manuscript. We are aware that trained immunity involves different metabolic pathways, including both glycolysis and oxidative phosphorylation [1, 2]. We also measured the oxygen consumption rate (please see the response to comment 8 of Reviewer #1) but observed no obvious increase in oxygen consumption in trained BMDMs in our experimental setting. As the reviewer pointed out, this might depend on the dose of stimulation.

      (b) Discussion: Trained immunity was first described as such in 2011, not decades ago.

      We are sorry for the inaccurate description, and we have corrected the statement in our revised manuscript as “Although the concept of ‘trained immunity’ has been proposed since 2011, the detailed mechanisms that regulate trained immunity are still not completely understood.”

      (2) The authors approach their hypothesis from different angles, which inspires a degree of confidence in the results. However, the statistical methods and reporting are underwhelming.

      (a) Graphs depict mean +/- SEM, whereas mean +/- SD is almost always more informative. (b) The use of 1-tailed tests is dubious in this scenario. Furthermore, in many experiments/figures the case could be made that the comparisons should be considered paired (the responses of cells from the same animal are inherently not independent due to their shared genetic background and, up until cell isolation, the same host factors like serum composition/microbiome/systemic inflammation etc). (c) It could be explained a little more clearly how multiple testing correction was done and why specific tests were chosen in each instance.

      We sincerely thank the reviewer for this thoughtful comment. (a) In the revised manuscript, data from animal experiments in which trained immunity was induced in vivo are presented as mean ± SD, while statistical results from cell-based experiments are presented as mean ± SEM. (b) We have replaced the one-tailed test with a two-tailed test (see Figure 3J in the revised manuscript, with updated P value label). We agree that cells derived from the same animal and subjected to different treatment conditions may be deemed paired data, and we reanalyzed our data using paired statistical tests. While this led to a slight reduction in statistical significance for some comparisons, the overall trends remained consistent, and our biological interpretation remains unchanged. Because unpaired statistical tests are commonly used for in vitro experiments in the literature [3, 4], we have retained the unpaired test results here. (c) We have provided a detailed description of how multiple comparisons were performed in the revised figure legends.
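
      The distinctions discussed in this exchange (SD versus SEM, and paired versus unpaired tests) can be illustrated with a minimal Python sketch. The values are hypothetical and are not data from the manuscript, and the function names are illustrative only. The sketch computes test statistics rather than P values, so it is not a substitute for the analysis software the authors used.

```python
import math
import statistics

# Hypothetical readouts from three animals; cells from each animal were
# split into an untreated and a treated condition (illustrative values only).
untreated = [10.0, 12.0, 14.0]
treated = [11.0, 13.5, 15.0]

def sd_and_sem(values):
    """Return (sample SD, SEM). SD describes spread; SEM = SD / sqrt(n)."""
    sd = statistics.stdev(values)
    return sd, sd / math.sqrt(len(values))

def welch_t(a, b):
    """Unpaired (Welch) t statistic: treats the two groups as independent."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def paired_t(a, b):
    """Paired t statistic: works on within-animal differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

sd, sem = sd_and_sem(untreated)
print(f"SD = {sd:.3f}, SEM = {sem:.3f}")
print(f"unpaired t = {welch_t(treated, untreated):.3f}")
print(f"paired t   = {paired_t(treated, untreated):.3f}")
```

      In this toy example the between-animal spread is large relative to the treatment effect, so the paired statistic is much larger than the unpaired one. The direction of the change depends on the data, and a paired test on three animals has only n − 1 = 2 degrees of freedom, so a larger t statistic does not by itself guarantee a smaller P value.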

      (d) Most experiments are done with n = 3, some experiments are done with n = 5. This is not a lot. While I don't think power analyses should be required for simple in vitro experiments, I would be wary of drawing conclusions based on n = 3. It is also not indicated if the data points were acquired in independent experiments. ATAC-seq/RNA-seq was, judging by the figures, done on only 2 mice per group. No power calculations were done for the in vivo tumor model.

      We are sorry for the confusion in our description in the figure legends. For the in vivo experiments, we determined the sample size (n=5, where n refers to the number of mice used as biological replicates) by referring to the animal numbers used for similar experiments in the literature. In addition, according to a reported resource equation approach for calculating sample size in animal studies [5], n=5-7 is suitable for most of our mouse experiments. The in vitro cell assays were performed as at least three independent experiments (BMs isolated from different mice), each experiment was independently replicated at least three times, and points represent biological replicates in our revised manuscript. In Figure 1A, 5 biological replicates are presented in order to carefully determine a working concentration of alisertib that would not significantly affect the viability of trained macrophages; this concentration was subsequently used in all related cell-based experiments. As for the sequencing data, we acknowledge the reviewer's concern regarding the small sample size (n=2) in our RNA-seq/ATAC-seq experiments. We consider the sequencing experiments mainly as an exploratory/screening approach, and we performed rigorous quality control and normalization of the sequencing data to ensure the reliability of our findings. For RNA-seq data analysis, we referred to the DESeq2 manual, which specifies that its statistical framework is based on the negative binomial distribution and is capable of robustly inferring differential gene expression with a minimum of two replicates per group. Therefore, the inclusion of two replicates per group was deemed sufficient for our analysis. Nevertheless, the genomic and transcriptome sequencing data were used primarily for preliminary screening, and the candidates have been extensively validated through additional experiments. For example, we conducted ChIP followed by qPCR to detect enrichment of active histone modifications in the Il6 and Tnf regions, further verifying the increased accessibility of trained immunity-induced inflammatory genes.
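
      For readers unfamiliar with the resource equation approach mentioned above, the following Python sketch shows the commonly cited form for a simple group comparison, in which the error degrees of freedom E = N − k (total animals minus number of groups) should fall between 10 and 20. It is assumed, not confirmed by the text, that this is the form used in reference [5]; the function names and the example group count are hypothetical.

```python
import math

def resource_equation_range(groups, df_min=10, df_max=20):
    """Per-group sample size range that keeps the error degrees of freedom
    E = groups * n - groups within [df_min, df_max] (assumed 10 to 20)."""
    if groups < 2:
        raise ValueError("a group comparison needs at least two groups")
    n_min = math.ceil(df_min / groups + 1)   # smallest n with E >= df_min
    n_max = math.floor(df_max / groups + 1)  # largest n with E <= df_max
    return n_min, n_max

def error_df(groups, n_per_group):
    """Error degrees of freedom for equal group sizes: N - k."""
    return groups * n_per_group - groups

# Hypothetical design with three groups (the group count is an assumption).
low, high = resource_equation_range(3)
print(f"animals per group: {low} to {high}")
print(f"error df at n = 5: {error_df(3, 5)}")
```

      Under these assumptions, a three-group design yields 5 to 7 animals per group. The acceptable range shifts with the number of groups, so the sketch would need to be rerun with the actual design of each experiment.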

      (e) Furthermore, the data spread in many experiments (particularly BMDM experiments) is extremely small. I wonder if these are true biological replicates, meaning each point represents BMDMs from a different animal? (disclaimer: I work with human materials where the spread is of course always much larger than in animal experiments, so I might be misjudging this.).

      Thanks for your comments. In our initially submitted manuscript, some of the statistical results were presented as representative data (technical replicates) from one of three independent biological replicates. These included the BMDM experiments showing suppression and rescue of trained immunity under different inhibitors or activators (see original Figure 1B-C, Figure 5D, and Figure 5H, corresponding to Figure 1B-C, Figure 5D, and Figure 5H, respectively, in our revised manuscript). Other experimental data, including the CCK8 experiment, metabolic assay, and ChIP-qPCR, were biological replicates. In response to your valuable suggestion, we have revised the manuscript to present all statistical results as biological replicates from three independent experiments (presented as mean ± SEM), and we have provided all the original data for the statistical analysis results (please see Appendix 2 in the resubmission system).

      (3) Maybe the authors are reserving this for a separate paper, but it would be fantastic if the authors would report the outcomes of the entire drug screening instead of only a selected few. The field would benefit from this as it would save needless repeat experiments. The list of drugs contains several known inhibitors of training (e.g. mTOR inhibitors) so there must have been more 'hits' than the reported 8 Aurora inhibitors.

      Thank you for your suggestion. We have briefly reported the outcomes of the entire drug screening in the revised manuscript. The targets of our epigenetic drug library are primarily categorized into several major classes, including the Aurora kinase family, histone methyltransferases and demethylases (HMTs and KDMs), acetyltransferases and deacetylases (HDACs and SIRTs), the JAK-STAT kinase family, AKT/mTOR/HIF, the PARP family, and the BRD family (see New Figure 1, related to Figure 1-figure supplement 1B in the revised manuscript). Notably, previous studies have reported that inhibition of the mTOR-HIF1α signaling axis suppresses trained immunity [6]. Our screening results also indicated that most inhibitors targeting mTOR-HIF1α signaling exhibit an inhibitory effect on trained immunity. Additionally, cyproheptadine, a specific inhibitor of SETD7, which was previously reported to be required for trained immunity [7], was also identified in our screening.

      JAK-STAT signaling is closely linked to the interferon signaling pathway, and certain JAK kinase inhibitors also target SYK and TYK kinases. A previous drug library screening study has reported that SYK inhibitors suppressed trained immunity [8]. Consistently, our screening results reveal that most JAK kinase inhibitors exhibit suppressive effects on trained immunity.

      BRD (Bromodomain) and Aurora are well-established kinase families in the field of oncology. Compared to BRD, the clinical applications of the Aurora kinase inhibitor are still at early stage. In previous studies using inflammatory arthritis models where trained immunity was established, both adaptive and innate immune cells exhibited upregulated expression of AurA [9, 10]. Our study provides further evidence supporting an essential role of AurA in trained immunity, showing that AurA inhibition leads to the suppression of trained immunity.

      (4) Relating to the drug screen and subsequent experiments: it is unclear to me in supplementary figure 1B which concentrations belong to secondary screens #1/#2 - the methods mention 5 µM for the primary screen and "0.2 and 1 µM" for secondary screens, is it in this order or in order of descending concentration?

      Thank you for your comments; we apologize for the unclear labelling in the original manuscript (related to Figure 1-supplement 1C). We performed the secondary drug screen at two concentrations: secondary screens #1 and #2 correspond to 0.2 and 1 μM, respectively. The concentrations are therefore listed in this order, not in descending order.

      (a) It is unclear if the drug screen was performed with technical replicates or not - the supplementary figure 1B suggests no replicates and quite a large spread (in some cases lower concentration works better?)

      Thank you for your question. The drug screen was performed without technical replicates for initial screening purpose, and we need to verify any hit in the following experiment individually. Yes, we observed that lower concentration works better in some cases. We speculate that it might be due to the fact that the drug's effect correlates positively with its concentration only within a specific range. But in our primary screening, we simply choose one concentration for all the drugs. This is a limitation for our screening, and we acknowledge this limitation in our discussion part.

      (5) The methods for (presumably) qPCR for measuring gene expression in Figure 1C are missing. Which reference gene was used and is this a suitably stable gene?

      We are sorry for this omission. The mRNA expression of Il6 and Tnf in trained BMDMs was analyzed by quantitative real-time PCR using the ΔΔCt method, and the results were normalized to untrained BMDMs with Actb (β-actin) as the reference gene, which is well documented to be stably expressed in macrophages. We have added a description of the gene expression measurements to the Material and Methods of our revised manuscript.
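
      For readers less familiar with the ΔΔCt (2^-ΔΔCt) calculation, a minimal sketch follows. It assumes roughly 100% amplification efficiency for both the target and the reference gene. The function name and the Ct values are hypothetical and are not taken from our data.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ΔΔCt method.

    Assumes ~100% PCR efficiency for target and reference genes.
    """
    # ΔCt: target gene normalized to the reference gene (e.g. Actb) per sample.
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    # ΔΔCt: treated sample relative to the control (e.g. untrained BMDMs).
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: the target gene crosses threshold 3 cycles earlier
# in trained cells, while the reference gene is unchanged.
print(fold_change_ddct(22.0, 16.0, 25.0, 16.0))
```

      With these hypothetical inputs ΔΔCt is -3, which corresponds to an 8-fold increase relative to the control.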

      (6) From the complete unedited blot image of Figure 1D it appears that the p-Aurora and total Aurora are not from the same gel (discordant number of lanes and positioning). This could be alright if there are no/only slight technical errors, but I find it misleading as it is presented as if the actin (loading control to account for aforementioned technical errors!) counts for the entire figure.

      We are very sorry for this omission. In the original data, p-Aurora and total Aurora were from different gels. In this experiment, membrane stripping and reprobing after the p-Aurora antibody did not work well, so we could not obtain all results from one gel; we had to run another gel with the same samples to blot with the anti-Aurora antibody, and used β-tubulin as the loading control for total AurA (please see New Figure 2A, also related to original Figure 1D). We have provided the source data for β-tubulin from the same membrane as total AurA (please see Figure 1-source data). To avoid any potential confusion, we have repeated this experiment and updated this figure (please see New Figure 2B, also related to Figure 1D in revised manuscript) with phospho-AurA, total AurA and β-actin from the same gel. The bands for phospho-AurA (T288) were obtained using a new antibody (Invitrogen, 44-1210G), and we have revised this information in Material and Methods. We have provided data from three biological replicates to confirm the experimental result (also see New Figure 2B, related to Figure 1D in revised manuscript; the raw data have been added to the source data for Figure 1).

      (7) Figure 2: This figure highlights results that are by far not the strongest ones - I think the 'top hits' deserve some more glory. A small explanation on why the highlighted results were selected would have been fitting.

      We appreciate the valuable suggestion. Figure 2 (see also Figure 2 in revised manuscript) presents the chromatin landscape affected by AurA inhibition, to confirm that AurA inhibition impaired the activation by β-glucan of key genes involved in pro-inflammatory macrophage activation. In Figure 2B we highlighted a few classical downregulated GO terms, including “regulation of growth”, “myeloid leukocyte activation” and “MAPK cascade” (see also Figure 2B in revised manuscript). Among these, “regulation of growth” is a known function of Aurora A and was highlighted simply to show that alisertib indeed inhibited Aurora A function in vivo as expected, while “myeloid leukocyte activation” and “MAPK cascade” show the impaired accessibility of pro-inflammatory genes. We highlighted downregulated KEGG terms such as “JAK-STAT signaling pathway”, “TNF signaling pathway” and “NF-kappa B signaling pathway” in Figure 2F (see also Figure 2F in revised manuscript), as these pathways are highly relevant to trained immunity. Meanwhile, the KEGG term “FOXO signaling pathway” (see also Figure 2G in revised manuscript) was highlighted to confirm the anti-inflammatory effect of alisertib in trained BMDMs, which is further illustrated in Figure 5 (see also Figure 5 in revised manuscript, illustrating that FOXO3 acts downstream of AurA). Some top hits, such as “positive regulation of cell adhesion” in Figure 2B and “pathway of neurodegeneration” and "ubiquitin mediated proteolysis" in Figure 2F and 2G, are not directly related to trained immunity, so we did not highlight them; they may nonetheless provide useful information for future investigation of other functions of Aurora A.

      (8) Figure 3 incl supplement: the carbon tracing experiments show more glucose-carbon going into TCA cycle (suggesting upregulated oxidative metabolism), but no mito stress test was performed on the seahorse.

      We appreciate this question raised by the reviewer. We previously performed Seahorse XF analysis to measure the oxygen consumption rate (OCR) in β-glucan-trained BMDMs. The results showed no obvious increase in oxidative phosphorylation (OXPHOS), as indicated by OCR, under β-glucan stimulation (related to Figure 3-figure supplement 1A), although the carbon tracing experiments showed more glucose-derived carbon entering the TCA cycle. We speculate that this discrepancy between increased glucose incorporation into the TCA cycle and unchanged OXPHOS may reflect a characteristic metabolic reprogramming induced by trained immunity. The increased incorporation of glucose-derived carbon into the TCA cycle likely serves a biosynthetic purpose, supplying intermediates for anabolic processes, rather than augmenting mitochondrial respiration[6]. Moreover, the unchanged OXPHOS may be attributed to a reduced reliance on fatty acid oxidation (catabolism), with glucose-derived acetyl-CoA becoming the predominant substrate. Thus, while overall OXPHOS remains stable, the glucose contribution to the TCA cycle increases. This is in line with reports showing that trained immunity promotes fatty acid synthesis (anabolism)[11]. Alternatively, the partial decoupling of the TCA cycle from OXPHOS could result from the diversion of intermediates such as fumarate out of the cycle.

      Oxygen consumption rate (OCR) after a mito stress test upon sequential addition of oligomycin (Oligo, 1 μM), FCCP (1 mM), and rotenone/antimycin (R/A, 0.5 μM), in BMDMs with different treatments for 24 h. β-glucan, 50 μg/mL; alisertib, 1 μM.

      (9) Inconsistent use of an 'alisertib-alone' control in addition to 'medium', 'b-glucan', 'b-glucan + alisertib'. This control would be of great added value in many cases, in my opinion.

      Thank you for your comment. We appreciate that including “alisertib-alone” group throughout all the experiments may further solidify the results. We set the aim of the current study to investigate the role of Aurora kinase A in trained immunity. Therefore, in most settings, we did not include the group of alisertib only without β-glucan stimulation.

      (10) Figure 4A: looking at the unedited blot images, the blot for H3K36me3 appears in its original orientation, whereas other images appear horizontally mirrored. Please note, I don't think there is any malicious intent but this is quite sloppy and the authors should explain why/how this happened (are they different gels and the loading sequence was reversed?)

      Thank you for pointing out this error. After checking the original data, we found that we had indeed misassembled the orientation of several blots in the original data submitted. On retracing the assembly process, we found that the blots in the original data had been oriented according to the loading sequence but were not saved correctly, so the orientations in Figure 4A were inconsistent with the unedited blot images. We are sorry for this careless mistake, and we have double-checked that all blots are correctly assembled in the revised manuscript. We have also provided three replicates of the Western blot results showing that the level of H3K36me3 in trained BMDMs was inhibited by alisertib (as seen in New Figure 7 at recommendation 2 of reviewer#2).

      (11) For many figures, for example prominently figure 5, the text describes 'beta-glucan training' whereas the figures actually depict acute stimulation with beta-glucan. While this is partially a semantic issue (technically, the stimulation is 'the training-phase' of the experiment), this could confuse the reader.

      Thanks for the reviewer’s suggestion and we have reorganized our language to ensure clarity and avoid any inconsistencies that might lead to misunderstanding.

      (12) Figure 6: Cytokines, especially IL-6 and IL-1β, can be excreted by tumour cells and have pro-tumoral functions. This is not likely in the context of the other results in this case, but since there is flow cytometry data from the tumour material it would have been nice to see also intracellular cytokine staining to pinpoint the source of these cytokines.

      Thanks for the reviewer’s suggestion. In Figure 6, we used a mouse tumor model and found that trained immunity upregulated the levels of cytokines such as IL-6 in tumor tissue, which were downregulated by alisertib administration. To rule out the possibility that the detected cytokines such as IL-6 came from tumor cells, we performed intracellular cytokine staining of single cells isolated from tumor tissues (please see New Figure 4). The results showed that only a small fraction of non-immune cells (CD45<sup>-</sup> population) expressed IL-6 (0.37% ± 0.11%), whereas a significantly higher proportion of IL-6-positive cells was observed among the CD45<sup>+</sup> population (deemed immune cells, 13.66% ± 1.82%), myeloid cells (CD45<sup>+</sup>CD11b<sup>+</sup>, 15.60% ± 2.19%), and in particular, macrophages (CD45<sup>+</sup>CD11b<sup>+</sup>F4/80<sup>+</sup>, 37.24% ± 3.04%). These findings strongly suggest that immune cells, especially macrophages, are the predominant source of IL-6 within the tumor microenvironment. Moreover, we also detected a higher IL-6-positive population in myeloid cells and macrophages (please see Figure 6I in revised manuscript).

      Reviewer#2 (Public review):

      Summary:

      This manuscript investigates the inhibition of Aurora A and its impact on β-glucan-induced trained immunity via the FOXO3/GNMT pathway. The study demonstrates that inhibition of Aurora A leads to overconsumption of SAM, which subsequently impairs the epigenetic reprogramming of H3K4me3 and H3K36me3, effectively abolishing the training effect.

      Strengths:

      The authors identify the role of Aurora A through small molecule screening and validation using a variety of molecular and biochemical approaches. Overall, the findings are interesting and shed light on the previously underexplored role of Aurora A in the induction of β-glucan-driven epigenetic change.

      We thank the reviewer for the positive and encouraging comments.

      Weaknesses:

      Given the established role of histone methylations, such as H3K4me3, in trained immunity, it is not surprising that depletion of the methyl donor SAM impairs the training response. Nonetheless, this study provides solid evidence supporting the role of Aurora A in β-glucan-induced trained immunity in murine macrophages. The in vivo part on the antitumor effect of trained immunity is insufficient to support the final claim, as alisertib could inhibit Aurora A in cell types other than myeloid cells.

      We appreciate the question raised by the reviewer. Although SAM generally acts as a methyl donor, whether epigenetic reprogramming in trained immunity is directly linked to SAM metabolism had not been formally tested. In our study, we provide evidence suggesting that SAM maintenance is necessary to support trained immunity. As for the in vivo tumor model, we agree that alisertib may inhibit Aurora A in many cell types besides myeloid cells. To further address the reviewer’s concern, we have performed the suggested bone marrow transplantation experiment (trained mice as donors and naïve mice as recipients) to verify the contribution of myeloid cell-mediated trained immunity to the antitumor effect (please see New Figure 8, also related to Figure 6C, 6D and Figure 6-figure supplement 1B and 1C in revised manuscript).

      Reviewer #1 (Recommendations for the authors):

      Some examples of spelling errors and other mistakes (by far not a complete list):

      (a) Introduction, second sentence: reads as if Candida albicans (which should be italicised and capitalised properly) and BCG are microbial polysaccharide components.

      (b) Methods: ECAR is ExtraCellular Acidification Rate, not 'Extracellular Acid Ratio'

      (c) Figure 2C: β-glucan is misspelled in the graph title.

      (d) TNFα has been renamed to 'TNF' for a long time now.

      (e) Inconsistent use of Tnf and Tfnα (the correct gene symbol is Tnf) (NB: this field does not allow me to italicise gene symbols)

      (f) Figure supplement 1B: 'secdonary'

      (g) Caption of figure 4: "Turkey's multiple-comparison test"

      (h) etc

      I would ask the authors that they please go over the entire manuscript very carefully to correct such errors.

      We apologize for these errors and careless mistakes. We greatly appreciate your suggestions and have carefully proofread the revised manuscript to make sure that no further mistakes remain.

      Please also address the points I raised in the public review about statistical approaches. Even more important than the relatively low 'n' is my question about biological replicates. Please clarify what you mean by 'biological replicate'. If you are able to repeat at least the in vitro experiments (if this is too much work pick the most important ones) a few more times this would really strengthen the results.

      Thank you for your comment. Our biological replicates refer to independently repeated experiments using bone marrow cells isolated from different mice, and n represents the number of mice used. We repeated each experiment at least three times using BMDMs isolated from different mice (n = 3, biological replicates). Specifically, we repeated several in vitro experiments showing that inhibition of AurA upregulated GNMT in trained BMDMs, that the transcription factor FOXO3 acted as a key protein in AurA-mediated GNMT expression to control trained immunity, and that an mTOR agonist rescued trained immunity inhibited by alisertib (see New Figure 5, related to Figure 5B-C, Figure 5H in revised manuscript). Additionally, we have provided data from three biological replicates showing the β-glucan-induced phosphorylation of AurA (see comment 6 of reviewer#1) and the changes in histone modification markers under AurA inhibition and GNMT deficiency (see recommendation 2 of reviewer#2). We also repeated the in vivo tumor model to analyze intratumoral cytokines (see recommendation 12 of reviewer#1).

      Finally: the authors report 'no funders' during submission, but the manuscript contains funding details. Please modify this in the eLife submission system if possible.

      Thank you for your kind reminder and we have modified funding information in the submission system.

      Reviewer #2 (Recommendations for the authors):

      (1) I have the following methodological and interpretative comments for consideration:

      Aurora A has been previously implicated in M1 macrophage differentiation and NF-κB signaling. What is the effect of Aurora A inhibition on basal LPS stimulation? Considering that β-glucan + Ali also skews macrophage priming towards an M2 phenotype, as shown in Fig. 2E, further clarification on this point would strengthen the study.

      Thanks for your suggestion. A previous study showed that AurA was upregulated in LPS-stimulated macrophages and that inhibition of AurA downregulated M1 markers in LPS-stimulated macrophages through the NF-κB pathway but did not affect IL-4-induced M2 macrophage polarization [12]. Consistently, we also found that AurA inhibition downregulated the inflammatory response upon basal LPS stimulation, as shown by decreased IL-6 levels (see New Figure 6). In original Figure 2E (also related to Figure 2E in revised manuscript), we showed increased accessibility of Mrc1 and Chil3 under “β-glucan +Ali” before re-challenge, both of which are typical M2 macrophage markers. Motif analysis showed that AurA inhibition would upregulate genes controlled by PPARγ (STAT6 was not predicted). Unlike STAT6, a classical transcription factor that controls IL-4- or IL-13-dependent M2 polarization (M2a), PPARγ mediates polarization toward M2c and mainly controls cellular metabolism related to anti-inflammation, independently of IL-4 or IL-13. Thus, we speculate that inhibition of AurA might promote non-classical M2 polarization, and the details warrant future investigation.

      (2) In Figure 4A, it looks like that H3K27me3 is also significantly upregulated by β-glucan and inhibited by Ali. How many biological replicates were performed for these experiments? It would be beneficial to include densitometric analyses to visualize differences across multiple Western blot experiments for better reproducibility and quantitative assessment. In addition, what is the effect of treatment of Ali alone on the epigenetic profiling of macrophages?

      We are sorry for this confusion. Each experiment was performed with at least three independent biological replicates. In original Figure 4-figure supplement 1 (also related to Figure 4-figure supplement 1 in the revised manuscript), we presented the densitometric analysis results from three independent Western blot experiments, which showed that β-glucan did not affect H3K27me3 levels under our experimental conditions. Data from three biological replicates for histone modification are shown below (New Figure 7, related to Figure 4-figure supplement 1 in revised manuscript). We appreciate that an “Ali alone” assay in macrophages may add more value to the findings. However, the aim of the current study was to investigate the role of Aurora kinase A in trained immunity, and we know that alisertib itself would not induce or suppress trained immunity. Therefore, in most settings, we did not test the effect of alisertib alone without β-glucan stimulation.

      (3) The IL-6 and TNF concentrations exhibit considerable variability (Fig. 3K and Fig. 5H), ranging from below 10 pg/mL to 500-1000 pg/mL. Please specify the number of replicates for these experiments and provide more detail on how variability was managed. Including this information would enhance the robustness of the conclusions.

      Thank you for your comment. These experiments were replicated at least three times using BMDMs isolated from different mice. The observed variation in cytokine concentrations may be attributed to factors such as differences in cell density, variability among individual mice, and the passage number of the MC38 cells used for supernatant collection. We have prepared a new batch of BMDMs, repeated the experiment, and provided consistent results in the revised manuscript (please see Figure 5H in revised manuscript). Data for biological replicates have been provided (please see Appendix 2 in resubmit system).

      (4) The impact of Aurora A inhibition on β-glucan-induced anti-tumor responses appears complex. Specifically, GNMT expression is significantly upregulated in F4/80- cells, with stronger effects compared to F4/80+ cells as seen in Fig. 6D. To discern whether this is due to the abolishment of trained immunity in myeloid cells or an effect of Ali on tumor cells which inhibit tumor growth, I suggest performing bone marrow transplantation. Transplant naïve or trained donor BM into naïve recipients, followed by MC38 tumor transplantation, to clarify the mechanistic contribution of trained immunity versus off-target effects.

      Thanks for your valuable suggestion. Following it, we performed bone marrow transplantation to clarify that alisertib acts on BM cells to inhibit the anti-tumor effect induced by trained immunity (see New Figure 8, related to Figure 6C-D in revised manuscript). As shown in the results below, transplantation of trained BM cells conferred antitumor activity in recipient mice, whereas trained BM cells treated with alisertib lost this activity, further demonstrating that alisertib inhibited AurA in trained BM cells to impair their antitumor activity.

      References

      (1) Ferreira, A.V., et al., Metabolic Regulation in the Induction of Trained Immunity. Semin Immunopathol, 2024. 46(3-4): p. 7.

      (2) Keating, S.T., et al., Rewiring of glucose metabolism defines trained immunity induced by oxidized low-density lipoprotein. J Mol Med (Berl), 2020. 98(6): p. 819-831.

      (3) Cui, L., et al., N(6)-methyladenosine modification-tuned lipid metabolism controls skin immune homeostasis via regulating neutrophil chemotaxis. Sci Adv, 2024. 10(40): p. eadp5332.

      (4) Yu, W., et al., One-Carbon Metabolism Supports S-Adenosylmethionine and Histone Methylation to Drive Inflammatory Macrophages. Mol Cell, 2019. 75(6): p. 1147-1160 e5.

      (5) Arifin, W.N. and W.M. Zahiruddin, Sample Size Calculation in Animal Studies Using Resource Equation Approach. Malays J Med Sci, 2017. 24(5): p. 101-105.

      (6) Cheng, S.C., et al., mTOR- and HIF-1α-mediated aerobic glycolysis as metabolic basis for trained immunity. Science, 2014. 345(6204): p. 1250684.

      (7) Keating, S.T., et al., The Set7 Lysine Methyltransferase Regulates Plasticity in Oxidative Phosphorylation Necessary for Trained Immunity Induced by β-Glucan. Cell Rep, 2020. 31(3): p. 107548.

      (8) John, S.P., et al., Small-molecule screening identifies Syk kinase inhibition and rutaecarpine as modulators of macrophage training and SARS-CoV-2 infection. Cell Rep, 2022. 41(1): p. 111441.

      (9) Glant, T.T., et al., Differentially expressed epigenome modifiers, including aurora kinases A and B, in immune cells in rheumatoid arthritis in humans and mouse models. Arthritis Rheum, 2013. 65(7): p. 1725-35.

      (10) Jeljeli, M.M. and I.E. Adamopoulos, Innate immune memory in inflammatory arthritis. Nat Rev Rheumatol, 2023. 19(10): p. 627-639.

      (11) Ferreira, A.V., et al., Fatty acid desaturation and lipoxygenase pathways support trained immunity. Nat Commun, 2023. 14(1): p. 7385.

      (12) Ding, L., et al., Aurora kinase a regulates m1 macrophage polarization and plays a role in experimental autoimmune encephalomyelitis. Inflammation, 2015. 38(2): p. 800-11.

    1. Author response:

      Reviewer #1 (Public review):

      Summary: 

      As a general phenomenon, adaptation of populations to their respective local conditions is well-documented, though not universally. In particular, local adaptation has been amply demonstrated in Arabidopsis thaliana, the focal species of this research, which is naturally highly selfing. Here, the authors report assays designed to evaluate the spatial scale of fitness variation among source populations and sites, as well as temporal variability in fitness expression. Further, they endeavor to identify traits and genomic regions that contribute to the demonstrated variation in fitness.  

      Strengths: 

      With many (200) inbred accessions drawn from throughout Sweden, the study offers an unusually fine sampling of genetic variation within this much-studied species, and through assays in multiple sites and years, it amply demonstrates the context-dependence of fitness expression. It supports the general phenomenon of local adaptation, with multiple nuances. Other examples exist, but it is of value to have further cases illustrating not only the context-dependence of fitness expression but also the sometimes idiosyncratic nature of fitness variation. I commend the authors on their cautionary language in relation to inferences about the roles of particular genomic regions (e.g. l. 140-144; l. 227).

      Weaknesses: 

      To my mind, the manuscript is written primarily for the Arabidopsis community. This community is certainly large, but there are many evolutionary biologists who could appreciate this work but are not invited to do so. The authors could address the broader evolution community by acknowledging more of the relevant work of others (I've noted a few references in my comments to the authors). At least as important, the authors could make clearer the fact that A. thaliana is (almost) strictly selfing and how this feature of its biology both enables such a study and also limits inferences from it. Further, it seems to me, though I could be wrong, that readers would appreciate a more direct, less discursive style of writing, and one that makes the broader import of the focal questions clearer.

      We agree that connecting the paper better to the broader field is desirable, and will try to do this in the revision. As for how selfing matters, there certainly are some things we can discuss, but a general discussion is probably a suitable topic for a review/opinion article!

      As a reader, I would value seeing estimates of the overall fitness of the accessions in the different conditions, i.e., by combining the survival and fecundity results of the common garden experiments.

      Combining estimates would be possible in the common garden experiments, and would bring us somewhat closer to total fitness estimates, although as noted by another reviewer (and also emphasized by us), the time scale of our experiment is not sufficient to evaluate the trade-off between survival and fecundity. Furthermore, we would still be missing the establishment component of fitness, which we found to be extremely important. Therefore little would be gained by combining the estimates, while at the same time losing resolution to disentangle the fitness components. We thus decided to focus on the individual fitness components and leave consideration of their joint effect for the Discussion.

      Reviewer #2 (Public review):

      Summary: 

      The goal of this study was to find evidence for local adaptation in survival and fecundity of the model plant Arabidopsis thaliana. The authors grew a large set of Swedish Arabidopsis accessions at four common garden sites in northern and southern Sweden. Accessions were grown from seed in trays, which were laid on the ground at each site in late summer, screened for survival in fall and the following spring, and fecundity was determined from rosette size and seed production in spring. Experiments were complemented by 'selection experiments', in which seeds of the same accessions were sown in plots, and after two years of growth, plants were sampled to determine fitness from genotype frequencies, providing a more comprehensive evaluation of lifetime fitness than can be gleaned from fecundity alone. 

      To clarify, fecundity was determined from total plant area using photos of the mature stems, not the rosettes or direct counting of seeds. That said, it is true that our fecundity estimate was well correlated with rosette area. Furthermore, we validated our fecundity estimates by showing that they were highly correlated with seed production estimated by measuring and counting siliques on a separate set of plants grown under common garden conditions in one of our sites (Brachi et al. 2022).

      As the main result, southern accessions had higher mortality in northern sites in one of two years, but also suffered more slug damage in southern sites in one year, indicating a potential link between frost tolerance and herbivore resistance. Fecundity of accession was highest when growing close to the 'home' environment, but while accessions from one sand dune population in southern Sweden had among the lowest fecundities overall, they consistently had the highest fitness in the selection experiment. Accessions from this population had large seed size and rapid root growth, which might be related to establishment success when arriving in a new, partially occupied habitat. However, neither trait could fully explain the very high fitness of this population, suggesting the presence of other, unmeasured traits. 

      Overall, the authors could provide clear evidence of local adaptation in different traits for some of their experiments, but they also highlight high temporal and spatial variability that makes prediction of microevolutionary change so challenging. 

      Strengths: 

      A major strength of this study is the highly comprehensive evaluation of different fitness-related traits of Arabidopsis under natural conditions. The evaluation of survival and fecundity in common garden experiments across four sites and two years provides an estimate of variability and consistency of results. The addition of the 'selection experiment' provides an extended view on plant fitness that is both original and interesting, in particular highlighting potential limitations of 'fitness-proxies' such as seed production that don't take into account seedling establishment and competitive exclusion. 

      Throughout the study, the authors have gone to impressive depths in exploring their data, and particularly the discovery of 'native volunteers' in selection experiment plots and their statistical treatment is very elegant and has resulted in compelling conclusions. Also, while the authors are careful in the interpretation of their GWAS results, they nonetheless highlight a few interesting gene candidates that may be underlying the observed plant adaptations, and which likely will stimulate further research. 

      Overall, the authors provide a rich new resource that is relevant and interesting both in the context of general evolutionary theory as well as more specifically for molecular biology. 

      Weaknesses:

      While the repetition of the common garden experiments over two years is certainly better than no repetition (hence its mention also under 'strengths'), the very high variability found between the two years highlights the need for more extensive temporal replication. In this context, two temporal replicates are the bare minimum, and more repeats in time would be necessary to draw any kind of conclusion about the role of 'high mortality' and 'low mortality' years for the microevolution of Arabidopsis. It also seems that the authors missed an opportunity to explore potentially causal variation among years, as they did not attempt to relate winter mortality to actual climatic variables, even though they discuss winter harshness as a potential predictor.

      We agree that two years is insufficient to understand how variation in selective pressures compounds over time to generate micro-evolutionary change. The eight-year data in Oakley et al. (2023), which we discuss in the paper, support this. Our results are nonetheless sufficient to demonstrate the idiosyncratic nature of selection. In the revision, we will further emphasize that far longer time series would be needed for definitive conclusions.

      Our short time series is also why we do not try to correlate with climate data, as this would amount to doing statistics with four data points (mostly two groups of accessions, N vs S, with mostly homogeneous climates within groups, and two years).

      The low temporal variation also makes the accidental slug herbivory appear somewhat random. Potted plants are notoriously susceptible to slug herbivory, and while it is certainly nice that slug damage predominantly affected one group of accessions, it nonetheless raises the question whether this reflects a 'real' selection pressure that plants commonly face in their respective local environments. 

      We agree with this point as well. The evidence for selection on glucosinolates by generalist herbivores such as slugs is fairly strong, but the precise agent is not known, and probably varies over time and space. Our results merely demonstrate one possibility (and we will clarify this in the revision).

      The addition of the 'selection experiment' is certainly original and provides valuable additional insights, but again, it seems a bit questionable which natural process really has affected this outcome. While the genetic and statistical analysis of this experiment seems to be state-of-the-art, the experimental design is rather rudimentary compared to more standard selection experiments. Specifically, the authors added seeds from greenhouse-grown mothers to experimental plots and only sampled plants two years later. This means that, potentially, the first very big bottleneck was germination under natural conditions, which may have already excluded many of the accessions before they had a chance to grow. While this certainly is one type of selection, it is not exactly the type of selection that a 2-year selection experiment is set up to measure. Either initially establishing the selection experiment from plants instead of seeds, or genotyping the population over several generations, would have substantially strengthened the conclusions that could be drawn from this experiment.

      We agree that more data would have been beneficial, and we do not make strong claims about the nature of selection. Among other phenotypes, we mention dormancy, and note that existing dormancy estimates do not predict fitness in our selection experiments. In addition, the same seed batches germinated uniformly in the common-garden experiments with minimal stratification (we will note this in the revision).

      Also, the complete lack of information on population density is a bit problematic. It is not clear if there were other (non-Arabidopsis) plants present in the plots, how many Arabidopsis plants were established, if numbers changed over the year, etc. Given all of these limitations, calling this a 'selection experiment' is in fact somewhat misleading. 

      Seeds were introduced into sites that appeared appropriate for A. thaliana, leaving the background community intact. We provided information on sowing density; the density of plants (A. thaliana and other species) that we obtained during the course of the experiments varied considerably between sites, much like in natural populations, although we lack systematic measurements. We will provide more information (including photos) in the revision.  

      Despite these weaknesses, the authors could achieve their main goals, and despite the somewhat minimal temporal replication, they were lucky to sample two fairly distinct years that provided them with interesting variation, which they could partially explain using the variation among their accessions. Overall, this study will likely make an important contribution to the field of evolutionary biology, and it is another very strong example of how the extensive molecular tools in Arabidopsis can be leveraged to address fundamental questions in evolution and ecology, to an extent that is not (yet) possible in other plant systems. 

      Reviewer #3 (Public review)

      Summary: 

      The manuscript presents a large common garden experiment across Sweden using solely local germplasm. Additionally, there is a collection of selection experiments that begin investigating the factors shaping fecundity in these populations. This provides an impressive amount of data and analysis investigating the underlying factors involved. Together, this helps support the data showing that fluctuations and interactions are key components determining Arabidopsis fitness and are more broadly applicable across plant and non-plant species. 

      Strengths: 

      The field trials are well conducted with extensive effort and sampling. Similarly while the genetic analysis is complex it is well conducted and reflects the complexity of dealing with population structure that may be intricately linked to adaptive structure. This has no real solution and the option of presenting results with and without correction is likely the only appropriate option. 

      Weaknesses: 

      A significant finding from this study was that fecundity is shaped more by yearly fluctuations and their interaction with genotype than it is by the main effect of location or genotype. Another significant finding is that the strength of selection can be quite strong, with nearly 5x ranges across accessions. It should be noted that there are a number of other studies using Arabidopsis in the wild with multiple years and locations that found similar observations beyond the Oakley citation. In general, the context of how these findings relate to existing knowledge in Arabidopsis is a bit underdeveloped. 

      We shall remedy this in the revision (see also comments by Reviewer #1).

      The effects of the populations across the locations seem to rely on individual tests and PC analysis. It would seem to be possible to incorporate these tests more directly in the linear modeling analysis, and it isn't quite clear why this wasn't conducted. 

      The fecundity estimates were modelled for all experiments simultaneously, and the results are presented in Figure 6 to explore the relative importance of genotype effects and interaction terms including genotypes. For survival and fecundity, the BLUPs are generated from linear mixed models fitted for all experiments simultaneously, including a random intercept effect for the genotypes within experiments. A principal component analysis is used to explore the pattern of accession effects (BLUPs) on fecundity (Figure 7); this will be explained in the Methods.

      I'm a bit puzzled by the discussion on how to find causative loci. This seems to focus solely on GWAS as the solution, with a goal to sequence vast numbers of individuals. But the loci that the manuscript discussed were found by a combination of structured mapping populations followed by molecular validation that then informed the GWAS. As such, I'm unsure if the proposed future approach of more sequencing is the best when a more balanced approach integrating diverse methods and population types will be more useful.

      We are puzzled by this comment in return. Our statement about more sequencing (penultimate sentence of discussion) was referring to achieving a better understanding of the history of migration and selection rather than identifying causative loci. Happy for clarification!

      References

      Brachi, Benjamin, Daniele Filiault, Hannah Whitehurst, Paul Darme, Pierre Le Gars, Marine Le Mentec, Timothy C. Morton, et al. 2022. “Plant Genetic Effects on Microbial Hubs Impact Host Fitness in Repeated Field Trials.” Proceedings of the National Academy of Sciences of the United States of America 119 (30): e2201285119.

      Oakley, Christopher G., Douglas W. Schemske, John K. McKay, and Jon Ågren. 2023. “Ecological Genetics of Local Adaptation in Arabidopsis: An 8-Year Field Experiment.” Molecular Ecology, June. https://doi.org/10.1111/mec.17045.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      The paper is well written and the figures well laid out. The methods are easy to follow, as are the rationale and logic for each experiment. The introduction sets the scene well, and the discussion is appropriate. The summary sentences throughout the text help the reader.

      The authors have done a lot of work addressing my previous concerns and those of the other Reviewers.

      We are pleased that the revised manuscript satisfactorily addresses the previous concerns of the reviewer.

      Reviewer #2 (Public review):

      Summary

      Le Roy et al. quantify wing morphology and wing kinematics across twenty-eight and eight hoverfly species, respectively; the aim is to identify how weight support during hovering is ensured across body sizes. Wing shape and relative wing size vary non-trivially with body mass, but wing kinematics are reported to be size-invariant. On the basis of these results, it is concluded that weight support is achieved solely through size-specific variations in wing morphology, and that these changes enabled hoverflies to decrease in size. Adjusting wing morphology may be preferable compared to the alternative strategy of altering wing kinematics, because kinematics may be subject to stronger evolutionary and ecological constraints, dictated by the highly specialised flight and ecology of the hoverflies.

      Strengths

      The study deploys a vast array of challenging techniques, including flight experiments, morphometrics, phylogenetic analyses, and numerical simulations; it so illustrates both the power and beauty of an integrative approach to animal biomechanics. The question is well motivated, the methods appropriately designed, and the discussion elegantly places the results in broad biomechanical, ecological, and evolutionary context.

      We thank the reviewer for appreciating the strengths of our study.

      Weaknesses

      (1) In assessing evolutionary allometry, it is key to pinpoint the variation expected from changes in size alone. The null hypothesis for wing morphology is well-defined (isometry), but the equivalent predictions for kinematic parameters, although specified, are insufficiently justified, and directly contradict classic scaling theory. A detailed justification of the "kinematic similarity" assumption, or a change in the null hypothesis, would substantially strengthen the paper, and clarify its evolutionary implications.

      We agree with the reviewer that a clearly articulated null hypothesis is crucial for interpreting scaling relationships. In fact, when carefully reviewing our manuscript, we realized that we had nowhere stated one, which might have led to misinterpretation. In the revised manuscript, we therefore now explicitly state our newly defined null hypotheses (lines 120–125, 340-352), and how we tested these (lines 359-360).

      In fact, we define two alternative null hypotheses: (1) weight support is maintained across sizes using allometric scaling of wing morphology only, and thus wingbeat kinematics are kept constant (kinematic similarity); (2) weight support is maintained across sizes using allometric scaling of wingbeat kinematics, while wing morphology scales isometrically (morphological similarity).

      According to the first null hypothesis, the second-moment-of-area of the wing should scale linearly with body mass, resulting in negative allometry of S<sub>2</sub> relative to body mass (S<sub>2</sub>∼m<sup>1</sup> < m<sup>4/3</sup>). According to the second null hypothesis, the product of wingbeat frequency and amplitude should scale with mass under negative allometry (ω∼ƒ A<sub>ϕ</sub>∼m<sup>-1/6</sup>). We test these alternative null hypotheses using Phylogenetic Generalized Least Squares (PGLS) regressions of the morphology and kinematics metrics against body mass.
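
      The exponents in these two null hypotheses can be recovered with a short scaling argument. The sketch below assumes the quasi-steady relation in which aerodynamic force scales with the squared angular wing velocity times the second moment of area of the wing, and that geometric similarity gives l ∝ m<sup>1/3</sup> for any length l. The force relation is an assumption made for illustration and is not quoted from the manuscript.

```latex
% Assumed quasi-steady force scaling (illustrative assumption):
%   F \propto \omega^{2} S_{2}, with \omega \propto f A_{\phi}
% Weight support requires F \propto m g \propto m^{1}.
% Geometric similarity: S_{2} \propto l^{4} \propto m^{4/3}.
\begin{align*}
\text{Null hypothesis 1 } (\omega \propto m^{0}):\quad
  & S_{2} \propto \frac{m}{\omega^{2}} \propto m^{1} < m^{4/3} \\
\text{Null hypothesis 2 } (S_{2} \propto m^{4/3}):\quad
  & \omega^{2} \propto \frac{m}{S_{2}} \propto m^{1 - 4/3} = m^{-1/3}
    \;\Longrightarrow\; \omega \propto f A_{\phi} \propto m^{-1/6}
\end{align*}
```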

      Furthermore, in our revised manuscript, we now also better explain our use of the "kinematic similarity" assumption as a theoretical scenario that is neither physically, biomechanically, nor physiologically sustainable across sizes, but that we merely use to define our null hypotheses (lines 340-351). This is made particularly explicit in a new subsection named “Theoretical considerations” (lines 448–461). Note that our second null hypothesis is thus not that hoverflies fly under "kinematic similarity", but that wingbeat kinematics scales under negative allometry (ω∼ƒ A<sub>ϕ</sub>∼m<sup>-1/6</sup>), which we assume is in line with the classic scaling theory that the reviewer refers to.

      We sincerely thank the reviewer for making us aware that we did not explicitly state our null hypotheses; introducing them has removed the confusion about the assumptions in our study.

      (2) By relating the aerodynamic output force to wing morphology and kinematics, it is concluded that smaller hoverflies will find it more challenging to support their body mass--a scaling argument that provides the framework for this work. This hypothesis appears to stand in direct contrast to classic scaling theory, where the gravitational force is thought to present a bigger challenge for larger animals, due to their disadvantageous surface-to-volume ratios. The same problem ought to occur in hoverflies, for wing kinematics must ultimately be the result of the energy injected by the flight engine: muscle. Much like in terrestrial animals, equivalent weight support in flying animals thus requires a positive allometry of muscle force output. In other words, if a large hoverfly is able to generate the wing kinematics that suffice to support body weight, an isometrically smaller hoverfly should be, too (but not vice versa). Clarifying the relation between the scaling of muscle mechanical input, wing kinematics, and weight support would help resolve the conflict between these two contrasting hypotheses, and considerably strengthen the biomechanical motivation and evolutionary interpretation.

      We agree with the reviewer that, due to disadvantageous surface-to-volume ratios, larger animals are more challenged to maintain weight-support, and that this is also the case for hovering hoverflies. In the current manuscript, we do not aim to challenge this universal scaling law of muscle force with body mass.

      Instead, we here focus merely on how the flight propulsion system (wing morphology and kinematics) scales with size, and how this allows hovering hoverflies to maintain weight support. We also fully agree with the reviewer that in theory, “if a large hoverfly is able to generate the wing kinematics that suffice to support body weight, an isometrically smaller hoverfly should be, too”. This in fact aligns with our second null hypothesis, in which wingbeat frequency should scale as ƒ∼m<sup>-1/6</sup> to maintain weight support under morphological isometry.

      In our study, we show that this null hypothesis is rejected (lines 511-517, and line 525), and thus hoverflies primarily adjust their wing morphology to maintain in-hovering weight-support across sizes, and wingbeat kinematics are in fact highly conserved. Why these specific flight kinematics are so strongly conserved is not known, and this is thus a key topic in the discussion section of our manuscript.
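
      A test of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: it fits a generalized least squares regression of log kinematics on log body mass using a phylogenetic covariance matrix, then compares the fitted slope with the null value of -1/6. The function names `solve` and `gls_slope`, the four-species covariance matrix, and all data values are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gls_slope(x, y, C):
    """GLS fit of y = intercept + slope * x with error covariance C.

    Returns (slope, intercept, standard error of the slope).
    C is assumed symmetric and positive definite.
    """
    n = len(x)
    u = solve(C, [1.0] * n)  # C^-1 * 1
    v = solve(C, x)          # C^-1 * x
    a11 = sum(u)                                  # 1' C^-1 1
    a12 = sum(v)                                  # 1' C^-1 x
    a22 = sum(xi * vi for xi, vi in zip(x, v))    # x' C^-1 x
    b1 = sum(ui * yi for ui, yi in zip(u, y))     # 1' C^-1 y
    b2 = sum(vi * yi for vi, yi in zip(v, y))     # x' C^-1 y
    det = a11 * a22 - a12 * a12
    intercept = (a22 * b1 - a12 * b2) / det
    slope = (a11 * b2 - a12 * b1) / det
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    w = solve(C, resid)
    sigma2 = sum(r * wi for r, wi in zip(resid, w)) / (n - 2)
    se = math.sqrt(max(sigma2, 0.0) * a11 / det)
    return slope, intercept, se


# Hypothetical example: four species on an ultrametric tree of depth 1.
# Off-diagonal entries are shared branch lengths (Brownian-motion model).
C = [
    [1.0, 0.6, 0.2, 0.2],
    [0.6, 1.0, 0.2, 0.2],
    [0.2, 0.2, 1.0, 0.7],
    [0.2, 0.2, 0.7, 1.0],
]
log_mass = [0.5, 0.8, 1.6, 2.0]       # made-up log10 body mass
log_omega = [2.31, 2.30, 2.32, 2.29]  # made-up log10 of f * A_phi

slope, intercept, se = gls_slope(log_mass, log_omega, C)
null_slope = -1.0 / 6.0
t_stat = (slope - null_slope) / se  # compare with a t distribution, n - 2 d.f.
print(f"slope = {slope:.3f} +/- {se:.3f}, t vs. -1/6 = {t_stat:.2f}")
```

      In a real analysis the covariance matrix would be derived from the branch lengths of the phylogeny, and a dedicated comparative-methods package would normally replace the hand-written solver used here to keep the sketch self-contained.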

      We agree with the reviewer that muscle physiology might be an important driver of these conserved kinematics, but aerodynamic efficiency and maneuverability could also be key aspects here. In our revised manuscript, we now discuss these three aspects in more detail (lines 762-775). We also now mention that we aim to address this outstanding question in future studies, by including muscle physiology in our animal flight studies, and by studying the aerodynamics and maneuver kinematics of hoverflies in more detail.

      Moreover, in our revised introduction section, we now also mention explicitly that the capability for maintaining in-flight weight-support scales inversely with animal size, due to the negative isometric scaling of muscle force with body mass (lines 52-56). Furthermore, we removed all statements that might suggest the opposite. We hope that these adjustments helped resolve the apparent conflict between our null hypotheses and general muscle scaling laws.

      Finally, in the Discussion section (lines 770-775), we now more explicitly acknowledge that wing motion is ultimately driven by the flight motor musculature, and that a full biomechanical interpretation must consider the scaling of muscle mechanical input alongside wing kinematics and morphology. While we decided to keep the focus primarily on aerodynamic constraints in this study, we agree that future work integrating both aerodynamic and physiological scaling will be essential to fully resolve these contrasting perspectives.

      (3) One main conclusion-- that miniaturization is enabled by changes in wing morphology--is insufficiently supported by the evidence. Is it miniaturization or "gigantism" that is enabled by (or drives) the non-trivial changes in wing morphology? To clarify this question, the isolated treatment of constraints on the musculoskeletal system vs the "flapping-wing based propulsion" system needs to be replaced by an integrated analysis: the propulsion of the wings, is, after all, due to muscle action. Revisiting the scaling predictions by assessing what the engine (muscle) can impart onto the system (wings) will clarify whether non-trivial adaptations in wing shape or kinematics are necessary for smaller or larger hovering insects (if at all!).

      In many ways, this work provides a blueprint for work in evolutionary biomechanics; the breadth of both the methods and the discussion reflects outstanding scholarship.

      In response to the first review round, we have removed all references to “miniaturization,” as our data does not allow us to infer evolutionary trajectories of body size (i.e., whether lineages have become smaller or larger over time). We now frame our conclusion more conservatively: that changes in wing morphology enable small hoverflies to maintain weight support despite the aerodynamic disadvantages imposed by isometric scaling.

      We fully agree that an integrated biomechanical framework, explicitly linking muscle mechanical output with wing kinematics and morphology, would significantly strengthen the study. However, we believe that performing an integrated analysis assessing the scaling of muscle input into the wing is beyond the current scope, which focuses specifically on the aerodynamic consequences of morphological and kinematic variation (see reply above).

      Reviewer #3 (Public review):

      This paper addresses an important question about how changes in wing morphology vs. wing kinematics change with body size across an important group of high-performance insects, the hoverflies. The biomechanics and morphology convincingly support the conclusions that there is no significant correlation between wing kinematics and size across the eight specific species analyzed in depth and that instead wing morphology changes allometrically. The morphological analysis is enhanced with phylogenetically appropriate tests across a larger data set incorporating museum specimens.

      The authors have made very extensive revisions that have significantly improved the manuscript and brought the strength of conclusions in line with the excellent data. Most significantly, they have expanded their morphological analysis to include museum specimens and removed the conclusions about evolutionary drivers of miniaturization. As a result, the conclusion about morphological changes scaling with body size rather than kinematic properties is strongly supported and very nicely presented with a strong complementary set of data. I only have minor textual edits for them to consider.

      We thank the reviewer for this positive feedback. We are pleased to hear that the revised manuscript is satisfactory.

      Reviewer #2 (Recommendations For The Authors):

      My main remaining qualm remains the null hypothesis for the scaling of kinematic parameters - all weaknesses come back to this point. I appreciate that the authors now specify an expectation, but they offer no justification. This is a problem, because the expectation dictates the interpretation of the results and is thus crucial to some of the key claims (including one in the paper title!): the choice made by the authors indeed implies that hovering is harder for small hoverflies, so that the reported changes in size-specific wing morphology are to be interpreted as an adaptation that enables miniaturization. However, why is this choice appropriate over alternatives that would predict the exact opposite, namely that hovering is harder for larger hoverflies?

      In my original review, I suggested that the authors may address this key question by considering the scaling of muscle mechanical output, and provided a quick sketch of what such an argument would look like, both in classic textbook scaling theory, and in the framework of more recent alternative approaches. The authors have decided against an implementation of this suggestion, providing various versions of the following justification in their reply: "our study focuses precisely on this constraint on the wing-based propulsion system, and not on the muscular motor system." I am puzzled by this distinction, which also appears in the paper: muscle is the engine responsible for wing propulsion. How can one be assessed independent of the other? The fact that the two must be linked goes straight to the heart of the difficulty in determining the null hypotheses for the allometry of kinematic and dynamic parameters: they must come from assertions on how muscle mechanical output is expected to vary with size, and so couple muscle mechanical output to the geometry of the wing-based propulsion system. What if not muscle output dictates wing kinematics?

      I fully agree with the authors that null hypotheses on kinematic parameters are debatable. But then the authors should debate their choice, and at least assess the plausibility of its implications (note that the idea of "similarity" in scaling does not translate to equal or invariant, but is tied closely to dimensional analysis - so one cannot just proclaim that kinematic similarity implies no change in kinematic parameters). I briefly return to the same line of argument I laid out in the initial review to provide such an assessment:

      Conservation of energy implies:

      W = 1/2 I ω<sup>2</sup>

      where I is the mass moment of inertia and W is the muscle work output. Under isometry, I ∝ m<sup>5/3</sup>, the authors posit ω ∝ m<sup>0</sup>, and it follows at once that they predict W ∝ m<sup>5/3</sup>. That is, the "kinematic similarity" hypothesis presented in the paper implies that larger animals can do substantially more work per unit body mass than small animals (unless the authors have an argument why wing angular velocity is independent of muscle work capacity, and I cannot think of one). This increase in work output is in contradiction with the textbook prediction, going all the way back to Borelli and Hill: isogeometric and isophysiological animals ought to have a constant mass-specific work output. So why, according to the authors, is this an incorrect expectation, i.e., how do they justify the assumption ω ∝ m<sup>0</sup> and its implication W ∝ m<sup>5/3</sup>? How can larger animals do more mass-specific work, or, equivalently, what stops smaller animals from delivering the same mass-specific work? If non-trivial adaptations such as larger relative muscle mass enable larger animals to do more work, how does this fit within the interpretation suggested by the authors that the aerodynamics of hovering require changes in small animals?

      A justification of the kinematic similarity hypothesis, alongside answers to the above questions, is necessary, not only to establish a relation to classic scaling theory, but also because a key claim of the paper hinges on the assumed scaling relationship: that changes in wing morphology enable hovering in small hoverflies. If I were to believe Borelli, Hill and virtually all biomechanics textbooks, the opposite should be the case: combining constant mass-specific work output with eq. 1, one retrieves F ∝ m<sup>2/3</sup>, so that weight support presents a bigger challenge for larger animals; the allometry of wing morphology should then be seen as an adaptation that enables hovering in larger hoverflies - the exact opposite of the interpretation offered by the authors.
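
      Written out, the two competing predictions in this argument are as follows. The sketch assumes that eq. 1 has the form F ∝ ω<sup>2</sup> S<sub>2</sub> and that geometric similarity gives S<sub>2</sub> ∝ m<sup>4/3</sup>; both are assumptions made here for illustration, chosen because they reproduce the exponents quoted in the argument.

```latex
% Energy balance and isometric inertia (from the argument above):
%   W = \tfrac{1}{2} I \omega^{2}, \qquad I \propto m\, l^{2} \propto m^{5/3}
\begin{align*}
\text{(a) size-invariant kinematics, } \omega \propto m^{0}:\quad
  & W \propto I \omega^{2} \propto m^{5/3}, \qquad \frac{W}{m} \propto m^{2/3} \\
\text{(b) constant mass-specific work, } W \propto m^{1}:\quad
  & \omega^{2} \propto \frac{W}{I} \propto m^{1 - 5/3} = m^{-2/3} \\
  & F \propto \omega^{2} S_{2} \propto m^{-2/3}\, m^{4/3} = m^{2/3},
    \qquad \frac{F}{m g} \propto m^{-1/3}
\end{align*}
```

      Under (a), mass-specific work has to increase with size as m<sup>2/3</sup>; under (b), the ratio of aerodynamic force to body weight falls with size as m<sup>-1/3</sup>, which is why weight support would become harder for larger animals.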

      Now, as it so happens, I disagree with classic scaling theory on this point, and instead believe that there are good reasons to assume that muscle work output varies non-trivially with size. The authors can find a summary of the argument for this disagreement in the initial review, or in any of the following references:

      Labonte, D. A theory of physiological similarity for muscle-driven motion. PNAS, 2023, 120, e2221217120

      Labonte, D.; Bishop, P.; Dick, T. & Clemente, C. J. Dynamics similarity and the peculiar allometry of maximum running speed. Nat Comms., 2024, 15, 2181

      Labonte, D. & Holt, N. Beyond power limits: the kinetic energy capacity of skeletal muscle. J Exp Bio, 2024, 227, jeb247150

      Polet, D. & Labonte, D. Optimal gearing of musculoskeletal systems. Integr Org Biol, 2024, 64, 987-1006

      I am asking neither that the authors agree with the above references nor that they cite them. But I do expect that they critically discuss and justify their definition of kinematic similarity, its relation to expectation from classic scaling theory, and the implications for their claim that hovering is harder for small animals. I do note that the notion of "physiological similarity" introduced in the above references predicts a size-invariant angular velocity for small animals, that small animals should be able to do less mass-specific work, and that average muscle force output can grow with positive allometry even for isogeometric systems. These predictions appear to be consistent with the data presented by the authors.

      We agree with the reviewer that our null hypothesis was not clearly articulated in our previous version of the manuscript, and that this might have led to a misinterpretation of the merits and limitations of our study. In the revised manuscript, we therefore now explicitly introduce our null hypotheses in the Introduction (lines 120–125), define them in the Methods section (lines 340–360), test them in the Results section (lines 511–517), and reflect on the results in the Discussion (lines 602–610). We thank the reviewer for pointing out this lack of clarity in our manuscript, because revising it clarified the study significantly. See our replies in the “Public Review” section for details.

      Minor points

      L56: This is somewhat incomplete and simplistic; to just give one alternative option, weight support with equivalent muscle effort could also be ensured by a change in gearing (see eg Biewener's work). It is doubtful whether weight support is a strong selective force, as any animal that can move will be able to support its weight. The impact of scaling on dynamics is thus arguably more relevant.

      We thank the reviewer for pointing out that our original sentence may be too simplistic. We now briefly mention alternative mechanisms (suggested by the reviewer) to provide more nuance (lines 56-58).

      L58: I am not aware of any evidence that smaller animals have reduced the musculature dedicated to locomotion beyond what is expected from isometry; please provide a reference for this claim or remove it.

      We removed that claim.

      The authors use both isometry and geometric similarity. As they also talk about muscle, solely geometric similarity (or isogeometry) may be preferable, to avoid confusion with isometric muscle contractions.

      To avoid confusion, we now use “geometric similarity” wherever the use of isometry might be ambiguous.

      L86: negative allometry only makes sense if there is a justified expectation for isometry - I suggest to change to "The assumed increase in wingbeat frequency in smaller animals" or similar, or to clarify the kinematic similarity hypothesis.

      We edited the sentence as suggested.

      L320: This assertion is somewhat misleading. Musculoskeletal systems are unlikely to be selected for static weight support. Instead, they need to allow movement. Where movement is possible, weight support is trivially possible, and so weight support should rarely, if ever, be a relevant constraint. At most, the negative consequence of isometry on weight support would be that a larger fraction of the muscle mass needs to be active in larger animals to support the weight.

      We fully agree with the reviewer that musculoskeletal systems are unlikely to be selected for static loads, as the ability to move dynamically in the real world is crucial for survival. That said, we here look at hovering flight, which is far from static. In fact, hovering flight is among the energetically most costly movement patterns found in nature, due to the required high-frequency wingbeat motions (Dudley 2002). Rapid maneuvers are of course more power demanding, but hovering is a good proxy for this. For example, in fruit flies, maximum force production in rapid evasive maneuvers is only two times the force produced during hovering (Muijres et al., 2014).

      We agree with the reviewer that it is important to explicitly mention the differences in functional demands on the motor system in hovering and maneuvering flight, and thus we now do so in both the introduction and discussion sections (lines 116-118 and 762-765, respectively).

      Dudley, Robert. The biomechanics of insect flight: form, function, evolution. Princeton University Press, 2002.

      Muijres, F. T., et al. "Flies evade looming targets by executing rapid visually directed banked turns." Science 344.6180 (2014): 172-177.

      Reviewer #3 (Recommendations For The Authors):

      Throughout, check use of "constrains" vs. "constraints"

      Thank you for pointing this out. We have corrected these errors.

      Line 52 do you mean lift instead of thrust?

      We agree with the reviewer that the use of “thrust” might be confusing in the context of hovering flight, and thus we replaced “flapping-wing-based aerodynamic thrust-producing system” with the “flapping-wing-based propulsion system”. This way, we no longer use the word thrust in this context, and only use lift as the upward-directed force required for weight-support.

      Line 60 "face also constrains" wording

      Corrected.

      Line 79 Viscous forces only "dominate" at Re<1 and so this statement only refers to very very small insects which I suspect are far below the scale of the hoverflies considered (likely Re ~100) although maybe not for the smallest 3 mg ones?

      Indeed, viscous forces do not “dominate” force production at the Reynolds numbers of our flying insects. We thank the reviewer for pointing out this incorrect statement, which we corrected in the revised manuscript.

      Line 85 again thrust doesn't seem to be right

      Agreed. See reply 3.2.

      533 "maximized" should probably be "increased"

      We now use “increased”.

      Line 705-710 The new study by Darveau might help resolve this a bit because of the reliability of this relationship across and between orders. Darveau, C.-A. (2024). Insect Flight Energetics And the Evolution of Size, Form, And Function. Integrative And Comparative Biology icae028.

      We thank the reviewer for this highly relevant reference, which was unfortunately not included in the original manuscript. In connection with this work, we now further discuss the relationship between wing size allometry and deviations from the expected scaling of wingbeat frequency (lines 730-735).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This Tanzanian study focused on the relationship between human genetic ancestry, Mycobacterium tuberculosis complex (MTBC) diversity, and tuberculosis (TB) disease severity. The authors analyzed the genetic ancestry of 1,444 TB patients and genotyped the corresponding MTBC strains isolated from the same individuals. They found that the study participants predominantly possess Bantu-speaking genetic ancestry, with minimal European and Asian ancestry. The MTBC strains identified were diverse and largely resulted from introductions from South or Central Asia. Unfortunately, no associations were identified between human genetic ancestry, the MTBC strains, or TB severity. The authors suggest that social and environmental factors are more likely to contribute to TB severity in this setting.

      Strengths:

      In comparison to other studies investigating the role of human genetics in TB phenotypes, this study is relatively large, with more than 1,400 participants.

      The matched human-MTBC strain collection is valuable and offers the opportunity to address questions about human-bacterium co-evolution.

      Weaknesses:

      Although the authors had genome-wide genotyping and whole genome sequencing data, they only compared the associations between human ancestry and MTBC strains. Given the large sample size, they had the opportunity to conduct a genome-wide association study similar to that of Muller et al. (https://doi.org/10.1016/j.ygeno.2021.04.024).

      Thank you very much for taking the time to carefully review our manuscript and for your suggestions and comments. In another published study using the same cohort (https://doi.org/10.1101/2023.05.11.23289848), we performed a genome-wide association analysis between the genome-wide SNPs of the host and the genome-wide SNPs from the paired MTBC strains. In the current work we were interested in testing specifically whether host ancestry and pathogen genotype family, as well as their interaction, were associated with differences in disease severity, a clinical phenotype with direct consequences for both host and pathogen fitness. The study of Müller et al., referred to by the reviewer, investigated whether MTBC families of strains causing disease in two patient cohorts (South Africa and Ghana) were associated with particular human SNPs assessed genome-wide. In that study, clinical phenotypes were not assessed and human ancestries, in a much broader sense than the ones used in our current study, were used as covariates. To leverage the genome-wide information and the clinical variables collected in our study, we have now added a genome-wide association analysis of all the human SNPs with disease severity measures while adjusting for covariates (age, sex, smoking, cough duration, socioeconomic status, history of previous TB, malnutrition, education level, and drug resistance status) and for human population stratification. Yet, no statistically significant associations were detected (L243-249).

      The authors tested whether human genetic ancestry is associated with TB severity. However, the basis for this hypothesis is unclear. The studies cited as examples all focused on progression to active TB (from a latent infection state), which should not be conflated with disease severity. It is difficult to ascertain whether the role of genetic ancestry in disease severity would be detectable through this study design, as some participants might simply have been sicker for longer before being diagnosed (despite the inquiry about cough duration). This delay in diagnosis would not be influenced solely by human genetics, which is the conclusion of the study.

      Evidence that mortality and natural recovery from TB vary across the spectrum of disease presentation comes from studies carried out before the introduction of anti-TB chemotherapy. Patients with mild disease presentation, as measured by radiology at the time of diagnosis, had higher odds of recovering naturally than those with advanced disease (doi: 10.5588/ijtld.23.0254, doi: 10.1164/arrd.1960.81.6.839). Given the deleterious effects on human fitness of an MTBC infection leading to symptomatic disease, we hypothesized that natural selection has acted on human traits underlying TB disease severity. If those traits are heritable, one would expect to find underlying genetic variation in human populations. In addition, because certain MTBC genotype families and human populations have co-existed for at least a few centuries to a few millennia, we hypothesized that some of that genetic variation could be related to human ancestry. We have added more details to the introduction to make our rationale clearer (L118-127). In our patient cohort, we observed large variation in disease severity using three approximations: TB-score, X-ray score and bacterial burden in sputa (Ct-value as determined with GeneXpert). However, the reviewer is absolutely correct that patients in our study were diagnosed at different stages of disease, which confounds our analysis. This is a limitation of our study that cannot be fully accounted for by including cough duration, as we also acknowledge in the manuscript (L343-346).

      Additionally, the study only included participants who attended the TB clinic.

      Yes, this is related to the previous point: our study only considers patients who felt ill enough to visit the TB clinic, potentially excluding patients with less severe disease, as acknowledged.

      Including healthy controls from the general population would have provided an interesting comparison to see if ancestry proportions differ.

      We agree that it would be interesting to compare the ancestries of healthy controls to the ancestries of TB patients from the same population. However, that comparison would be especially informative with respect to TB susceptibility and would not necessarily inform disease severity traits and their underlying genetics. The similarity between the ancestry proportions of our cohort and those in publicly available genomic data from neighboring countries such as Kenya, Malawi and Mozambique suggests that there would be no major differences between TB patients and healthy controls.

      Although the authors suggest that social and environmental factors contribute to TB severity, only age, smoking, and HIV status were characterised in the study.

      Based on the comments of both reviewers, we added the following additional variables as covariates in the regression models: the socioeconomic status representing the ratio between the household income and the number of individuals in the household, malnutrition, the education level and whether it was a relapse/reinfection or a new case.

      Reviewer #2 (Public review):

      Summary:

      This manuscript reports the results of an observational study conducted in Dar es Salaam, Tanzania, investigating potential associations between genetic variation in M. tuberculosis and human host vs. disease severity. The headline finding is that no such associations were found, either for host / bacillary genetics as main effects or for interactions between them.

      Strengths:

      Strengths of the study include its large size and rigorous approaches to classification of genetic diversity for host and bacillus.

      Weaknesses:

      (1) There are some limitations of the disease severity read-outs employed: X-ray scores and Xpert cycle thresholds from sputum analysis can only take account of pulmonary disease. CXR is an insensitive approach to assessing 'lung damage', especially when converted to a binary measure. What was the basis for selection of Ralph score of 71 to dichotomise patients? If outcome measures were analysed as continuous variables, would this have been more sensitive in capturing associations of interest?

      Thank you very much for taking the time to carefully review our manuscript and for your suggestions and comments.  

      We recruited active TB patients with pulmonary TB disease who were sputum smear-positive and GeneXpert-positive. In this study we aimed at obtaining paired samples from both the patient and the strain, and in the current analysis we aimed at testing whether human ancestry and its interaction with the strain genotype could explain differences in disease severity. It is often difficult to obtain microbiological cultures from extra-pulmonary cases, and including those cases would not have been possible at the scale of this cohort. We also believe that extra-pulmonary TB is of less relevance for the question we are addressing because, in exclusively extra-pulmonary cases, disease severity is not linked with bacterial transmission. However, extra-pulmonary TB can be extremely severe, and it would be very interesting to explore the potential role of human genetic variation underlying extra-pulmonary TB in future studies.

      As to the insensitivity of CXR to measure lung damage, we would argue that it depends on what is being assessed. As a rationale for the Ralph score, its inventors argue that, as in other grading methods, the proportion of affected lung and/or cavitation is important to assess severity. It has been described as a “validated method for grading CXR severity in adults with smear-positive pulmonary TB that correlates with baseline clinical and microbiological severity and response to treatment, and is suitable for use in clinical trials” (https://thorax.bmj.com/content/thoraxjnl/65/10/863.full.pdf). While the validation of the score is convincing in that study, and the score has been used in several TB studies and trials, the low proportion of HIV co-infections might have been a limitation. Indeed, as shown in our previous publication, in our cohort of patients, chest X-ray scores were significantly lower in HIV-infected TB patients (https://doi.org/10.1371/journal.ppat.1010893). In the current analysis, regression analyses performed for the CXR severity and for the other severity measures did not include HIV co-infected patients.

      We obtained the same pattern of results using a continuous outcome. However, an assumption of linear regression was violated: the residuals were not normally distributed, which stems from the bimodal distribution of the scores in our dataset. The threshold of 71 for the Ralph score has been used by others in previous studies; in its original description it was suggested as the optimal cut-off point for predicting a positive sputum smear status after two months, which in turn has been shown to predict unfavorable outcomes (https://doi.org/10.1136/thx.2010.136242). Another study showed that a Ralph score higher than 71 was significantly associated with a longer duration of symptoms, higher clinical scores and a lower BMI (doi: 10.5603/ARM.2018.0032).

      (2) There is quite a lot of missing data, especially for TB scores - could this have introduced bias? This issue should be mentioned in the discussion.

      While we have a TB-score available for each patient, the chest X-ray score is missing for many patients. However, this missingness is random and due either to the absence of an X-ray picture or to X-ray pictures of such poor quality that the radiologists could not assess them. When stating that there is a lot of missing data for the TB scores, we assume that the reviewer was referring to the “missing N” columns in Table 1. There, the number of observations missing in each of the disease severity measures actually relates to the explanatory variables (i.e., MTBC genotype and human ancestries). This table includes all patients that had either a bacterial genome or a human genome/genotype available (N = 1904). As an example, for the TB-score as outcome variable, the MTBC genotype was determined for 1471 patients while it was missing for 433 patients. For X-ray scores, on the other hand, 177 patients had a severe X-ray score, 849 a mild one, and for 878 patients no X-ray score was available. As for the Ct-value, despite the fact that the patients were recruited based on a positive GeneXpert result by the clinical team, these results were not always available to us.

      (3) The analysis adjusted for age, sex, HIV status, age, smoking and cough duration - but not for socio-economic status. This will likely be a major determinant of disease severity. Was adjustment made for previous TB (i.e. new vs repeat episode) and drug-sensitivity of the isolate? Cough duration will effectively be a correlate/consequence of more severe disease - thus likely highly collinear with disease severity read-outs - not a true confounder. How does removal of this variable from the model affect results? Data on socioeconomic status should be added to models, or if not possible then lack of such data should be noted as a limitation.

      Out of the 1904 patients that have either human or bacterial genomic data available, 48 were relapses (2.5%). The means of the disease severity measures suggest that relapses have a higher CXR score, but the TB-score and Ct-values did not differ. Based on the comments of both reviewers, we added the following additional variables as covariates to the regression models: the socioeconomic status representing the ratio between the household income and the number of individuals in the household, malnutrition as assessed by a doctor, the education level, whether it was a relapse/reinfection or a new case, and whether the causative strain had resistance to any anti-TB drugs. The results did not change. Cough duration could also be a consequence of more severe disease, as pointed out by the reviewer. We now also present the results excluding cough duration as a variable from the model; however, this did not affect the results either.

      (4) Recruitment at hospitals may have led to selection bias due to exclusion of less severe, community cases. The authors already acknowledge this limitation in the Discussion however.

      (5) Introduction: References refer to disease susceptibility, but the authors should also consider the influences of host/pathogen genetics on host response - both in vitro (PMIDs 11237411, 15322056) and in vivo (PMID 23853590). The last of these studies encompassed a broader range of ethnic variation than the current study, and showed associations between host ancestry and immune response - null results from the current study may reflect the relative genetic homogeneity of the population studied.

      We thank the reviewer for these suggestions which we have added to the introduction. 

      Reviewer #1 (Recommendations for the authors):

      Minor Comments:

      (1) The authors should be careful when using the term "Bantu" as opposed to "Bantu-speaking". (i.e. referring to the language group). The term is considered offensive in some settings.

      We thank the reviewer for raising this important concern; we have revised the term throughout the manuscript.

      (2) There are several "(Error! Reference source not found)" phrases in the place of references throughout the document.

      We thank the reviewer for pointing this out; this has been corrected in the revised version.

      (3) Please correct line 365: "... sequencing (WGS) the patient...." to "... sequencing (WGS) of the patient...."

      (4) The figures in the supplementary PDF are not numbered and some are cut-off (I think it is Supplementary Figure S2).

      This has been corrected in the revised version.

      Reviewer #2 (Recommendations for the authors):

      Typographical errors

      (1) There are multiple instances where references have not pulled through to the text, e.g. line 126 (Error! Reference source not found.)

      We thank the reviewer for pointing this out; this has been corrected in the revised version.

      (2) Line 239: have been show - have been shown?

      Thank you, this mistake has been corrected in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This is a new and important system that can efficiently train mice to perform a variety of cognitive tasks in a flexible manner. It is innovative and opens the door to important experiments in the neurobiology of learning and memory. 

      Strengths: 

      Strengths include: high n's, a robust system, task flexibility, comparison of manual-like training vs constant training, circadian analysis, comparison of varying cue types, long-term measurement, and machine teaching. 

      Weaknesses: 

      I find no major problems with this report. 

      Minor weaknesses: 

      (1) Line 219: Water consumption per day remained the same, but number of trials triggered was more as training continued. First, is this related to manual-type training? Also, I'm trying to understand this result quantitatively, since it seems counter-intuitive: I would assume that with more trials, more water would be consumed since accuracy should go up over training (so more water per average trial). Am I understanding this right? Can the authors give more detail or understanding to how more trials can be triggered but no more water is consumed despite training?

      Thanks for the comment. We would like to clarify the phenomenon described in Line 219: as training advanced, the number of trials triggered by mice per day gradually decreased (rather than increased, as stated in the comment) for both the manual and autonomous groups of mice (Fig. 2H left). Performance, as you mentioned, improved over time (Fig. 2D and 2E), leading to an increased probability of obtaining water per trial and thus a relatively stable daily water intake (Fig. 2H middle). We believe the stable daily intake is the minimum amount of water required by the mice under the conditions of autonomous behavioral training. To make the statement clearer, we indicated the corresponding figure numbers in the text.

      Results “… As shown in Fig. 2H, autonomous training yielded significantly higher number of trial/day (980 ± 25 vs. 611 ± 26, Fig. 2H left) and more volume of water consumption/day (1.65 ± 0.06 vs. 0.97 ± 0.03 ml, Fig. 2H middle), which resulted in monotonic increase of body weight that was even comparable to the free water group (Fig.2H right). In contrast, the body weight in manual training group experienced a sharp drop at the beginning of training and was constantly lower than autonomous group throughout the training stage (Fig. 2H right).”

      (2) Figure 2J: The X-axis should have some label: at least "training type". Ideally, a legend with colors can be included, although I see the colors elsewhere in the figure. If a legend cannot be added, then the color scheme should be explained in the caption.

      Thanks for the suggestion. The labels with corresponding colors for x-axis have been added for Fig. 2J.

      (3) Figure 2K: What is the purple line? I encourage a legend here. The same legend could apply to 2J.

      Thanks for the suggestion. The legend has been added for Fig. 2K.

      (4) Supplementary Figure S2 D: I do not think the phrase "relying on" is correct. Instead, I think "predicted by" or "correlating with" might be better. 

      We thank the reviewer for the valuable suggestion. The phrase has been changed to ‘predicted by’ for better suitability.

      Figure S2 “(D), percentage of trials significantly predicted by different regressors during task learning. …”

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript by Yu et al. describes a novel approach for collecting complex and different cognitive phenotypes in individually housed mice in their home cage. The authors report a simple yet elegant design that they developed for assessing a variety of complex and novel behavioral paradigms autonomously in mice. 

      Strengths: 

      The data are strong, the arguments are convincing, and I think the manuscript will be highly cited given the complexity of behavioral phenotypes one can collect using this relatively inexpensive ($100/box) and high throughput procedure (without the need for human interaction). Additionally, the authors include a machine learning algorithm to correct for erroneous strategies that mice develop which is incredibly elegant and important for this approach as mice will develop odd strategies when given complete freedom. 

      Weaknesses:

      (1) A limitation of this approach is that it requires mice to be individually housed for days to months. This should be discussed in depth. 

      Thank you for raising this important point. We agree that the requirement for individual housing of mice during the training period is a limitation of our approach, and we appreciate the opportunity to discuss this in more depth. In the manuscript, we added a section to the Discussion to address this limitation, including the potential impact of individual housing on the mice, the rationale for individual housing in our study, and alternatives that could mitigate the effects of individual housing.

      Discussion “… Firstly, our experiments were confined to single-housed mice, which is known to influence murine behavior and physiology, potentially affecting social interaction and stress levels [76]. In our study, individual housing was necessary to ensure precise behavioral tracking, eliminate competitive interactions during task performance, and maintain consistent training schedules without disruptions from cage-mate disturbances. However, the potential of group-housed training has been explored with technologies such as RFID [28,29,32–34] to distinguish individual mice, which potentially improving the training efficiency and facilitating research of social behaviors [77]. Notably, it has shown that simultaneous training of group-housed mice, without individual differentiation, can still achieve criterion performance [25].”

      (2) A major issue with continuous self-paced tasks such as the autonomous d2AFC used by the authors is that the inter-trial intervals can vary significantly. Mice may do a few trials, lose interest, and disengage from the task for several hours. This is problematic for data analysis that relies on trial duration to be similar between trials (e.g., reinforcement learning algorithms). It would be useful to see the task engagement of the mice across a 24-hour cycle (e.g., trials started, trials finished across a 24-hour period) and approaches for overcoming this issue of varying inter-trial intervals. 

      Thank you for your insightful comment regarding the variability in inter-trial intervals and its potential impact on data analysis. We agree that this is an important consideration for continuous self-paced tasks.

      In our original manuscript, we showed the general task engagement across the 24-hour cycle (Fig. 2K), which revealed two peaks of engagement during the dark cycle and relatively fewer trials during the light cycle. To facilitate analyses requiring consistent trial durations, we defined trial blocks as sequences of trials between two no-response trials. Notably, approximately 66.6% of trials occurred within blocks of >5 consecutive trials (Fig. 2L), which may be particularly suitable for such analyses.
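
      As an illustrative sketch only (not the analysis code used in the study), the block definition above could be computed as follows. The input format (one boolean per trial, False marking a no-response trial), the function names, and the choice to take responded trials as the denominator are all assumptions made for this example.

```python
from typing import List, Sequence


def split_into_blocks(responded: Sequence[bool]) -> List[int]:
    """Return the lengths of runs of responded trials.

    A no-response trial (False) closes the current block; empty runs
    between consecutive no-response trials are ignored.
    """
    blocks: List[int] = []
    run = 0
    for r in responded:
        if r:
            run += 1
        else:
            if run > 0:
                blocks.append(run)
            run = 0
    if run > 0:  # close a block that reaches the end of the session
        blocks.append(run)
    return blocks


def fraction_in_long_blocks(responded: Sequence[bool], min_len: int = 5) -> float:
    """Fraction of responded trials that fall in blocks longer than min_len."""
    blocks = split_into_blocks(responded)
    total = sum(blocks)
    if total == 0:
        return 0.0
    return sum(b for b in blocks if b > min_len) / total


# Hypothetical session: blocks of 6, 2 and 4 trials separated by no-response trials.
session = [True] * 6 + [False] + [True] * 2 + [False] + [True] * 4
print(split_into_blocks(session))        # [6, 2, 4]
print(fraction_in_long_blocks(session))  # 0.5
```

      Restricting downstream analyses to trials from blocks above a chosen length is one way to obtain sequences with comparable trial timing.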

      In the revised manuscript, we also added a histogram of inter-trial intervals for both the autonomous and manual training paradigms in HABITS (Fig. S2H), which shows that around 55.2% and 77.5% of the intervals are shorter than 2 seconds in autonomous and manual training, respectively.

      Results “… We found more than two-third of the trials was done in >5-trial blocks (Fig. 2L left) which resulted in more than 55% of the trials were with inter-trial-interval less than 2 seconds (Fig. S2H).”
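
      A minimal sketch of how the share of short inter-trial intervals could be computed is given below. It is not taken from the HABITS code base: the definition of the interval (start of the next trial minus end of the previous one), the timestamp format in seconds, and the function names are assumptions for illustration.

```python
from typing import List, Sequence


def inter_trial_intervals(starts: Sequence[float], ends: Sequence[float]) -> List[float]:
    """Intervals between the end of each trial and the start of the next one."""
    if len(starts) != len(ends):
        raise ValueError("starts and ends must have the same length")
    return [starts[i + 1] - ends[i] for i in range(len(starts) - 1)]


def fraction_below(intervals: Sequence[float], threshold_s: float = 2.0) -> float:
    """Fraction of intervals strictly shorter than threshold_s seconds."""
    if not intervals:
        return 0.0
    return sum(1 for x in intervals if x < threshold_s) / len(intervals)


# Hypothetical timestamps (seconds): three quick trials, then a long pause.
starts = [0.0, 5.0, 11.5, 3600.0]
ends = [4.0, 10.0, 15.0, 3604.0]
itis = inter_trial_intervals(starts, ends)
print(itis)                  # [1.0, 1.5, 3585.0]
print(fraction_below(itis))  # two of the three intervals are below 2 s
```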

      Regarding approaches to mitigate the issue of varying inter-trial intervals, we observed that manual training (i.e., manually transferring mice to HABITS for ~2 hr/day) resulted in more trials with short inter-trial intervals (Fig. S2H), suggesting that constrained access time promotes task engagement and reduces interval variability. Fig. 2L also indicates that the average correct rate increased and the early-lick rate decreased as block length increased. Constraining access time could therefore be valuable for studies where consistent trial timing is critical. In the context of our study, we could, for example, introduce a light cue that prompts the animals to engage during a fixed period each day.

      Discussion “… In contrast, the self-paced nature of autonomous training may permit greater variability in attentional engagement [83] and inter-trial-intervals, which could be problematic for data analysis relaying on consistent intervals and/or engagements. Future studies should explore how controlled contextual constraints enhance learning efficiency and whether incorporating such measures into HABITS could optimize its performance.”

      (3) Movies - it would be beneficial for the authors to add commentary to the video (hit, miss trials). It was interesting watching the mice but not clear whether they were doing the task correctly or not. 

      Thanks for the reminder. We have added subtitles to both videos. Because Supplementary Video 1 was recorded without sound, the correctness of the trials was hard to judge. We therefore replaced it with another video with clear sound recordings and added detailed subtitles.

      (4) The strength of this paper (from my perspective) is the potential utility it has for other investigators trying to get mice to do behavioral tasks. However, not enough information was provided about the construction of the boxes, interface, and code for running the boxes. If the authors are not willing to provide this information through eLife, GitHub, or their own website then my evaluation of the impact and significance of this paper would go down significantly. 

      Thanks for this important comment. We would like to clarify that the construction methods, GUI, and code for our system, as well as the PCB and CAD files (newly uploaded), are publicly available at https://github.com/Yaoyao-Hao/HABITS. Additionally, we have open-sourced all the code and raw data for all training protocols (https://doi.org/10.6084/m9.figshare.27192897). We will continue to maintain these resources in the future.

      Minor concerns: 

      (5) Learning rate is confusing for Figure 3 results as it actually refers to trials to reach the criterion, and not the actual rate of learning (e.g., slope).

      Thanks for pointing this out. The term ‘learning rate’, which referred to the number of trials needed to reach criterion, has been changed to ‘the number of trials to reach criterion’.

      Reviewer #3 (Public review): 

      Summary: 

      In this set of experiments, the authors describe a novel research tool for studying complex cognitive tasks in mice, the HABITS automated training apparatus, and a novel "machine teaching" approach they use to accelerate training by algorithmically providing trials to animals that provide the most information about the current rule state for a given task. 

      Strengths: 

      There is much to be celebrated in an inexpensively constructed, replicable training environment that can be used with mice, which have rapidly become the model species of choice for understanding the roles of distinct circuits and genetic factors in cognition. Lingering challenges in developing and testing cognitive tasks in mice remain, however, and these are often chalked up to cognitive limitations in the species. The authors' findings, however, suggest that instead, we may need to work creatively to meet mice where they live. In some cases, it may be that mice may require durations of training far longer than laboratories are able to invest with manual training (up to over 100k trials, over months of daily testing) but the tasks are achievable. The "machine teaching" approach further suggests that this duration could be substantially reduced by algorithmically optimizing each trial presented during training to maximize learning. 

      Weaknesses: 

      (1) Cognitive training and testing in rodent models fill a number of roles. Sometimes, investigators are interested in within-subjects questions - querying a specific circuit, genetically defined neuron population, or molecule/drug candidate, by interrogating or manipulating its function in a highly trained animal. In this scenario, a cohort of highly trained animals that have been trained via a method that aims to make their behavior as similar as possible is a strength. 

      However, often investigators are interested in between-subjects questions - querying a source of individual differences that can have long-term and/or developmental impacts, such as sex differences or gene variants. This is likely to often be the case in mouse models especially, because of their genetic tractability. In scenarios where investigators have examined cognitive processes between subjects in mice who vary across these sources of individual difference, the process of learning a task has been repeatedly shown to be different. The authors do not appear to have considered individual differences except perhaps as an obstacle to be overcome. 

      The authors have perhaps shown that their main focus is highly-controlled within-subjects questions, as their dataset is almost exclusively made up of several hundred young adult male mice, with the exception of 6 females in a supplemental figure. It is notable that these female mice do appear to learn the two-alternative forced-choice task somewhat more rapidly than the males in their cohort.

      Thank you for your insightful comments and for highlighting the importance of considering both within-subject and between-subject questions in cognitive training and testing in rodent models. We acknowledge that our study primarily focused on highly controlled within-subject questions. However, the datasets we provided do offer preliminary evidence relevant to ‘between-subject’ questions. Key observations include:

      The large variability in learning rates among mice observed in Fig. 2I;

      The overall learning rate difference between male and female subjects (Fig. 2D vs. Fig. S2G);

      The varying nocturnal behavioral patterns (Fig. 2K), etc.

      We recognize the value of exploring between-subject differences in mouse models and now discuss this in more detail in the Discussion.

      Discussion “Our study was designed to standardize behavior for the precise interrogation of neural mechanisms, specifically addressing within-subject questions. However, investigators are often interested in between-subject differences—such as sex differences or genetic variants—which can have long-term behavioral and cognitive implications [72,74]. This is particularly relevant in mouse models due to their genetic tractability [75]. Although our primary focus was not on between-subject differences, the dataset we generated provides preliminary evidence for such investigations. Several behavioral readouts revealed individual variability among mice, including large disparities in learning rates across individuals (Fig. 2I), differences in overall learning rates between male and female subjects (Fig. 2D vs. Fig. S2G), variations in nocturnal behavioral patterns (Fig. 2K), etc.”

      (2) Considering the implications for mice modeling relevant genetic variants, it is unclear to what extent the training protocols and especially the algorithmic machine teaching approach would be able to inform investigators about the differences between their groups during training. For investigators examining genetic models, it is unclear whether this extensive training experience would mitigate the ability to observe cognitive differences, or select the animals best able to overcome them - eliminating the animals of interest. Likewise, the algorithmic approach aims to mitigate features of training such as side biases, but it is worth noting that the strategic uses of side biases in mice, as in primates, can benefit learning, rather than side biases solely being a problem. However, the investigators may be able to highlight variables selected by the algorithm that are associated with individual strategies in performing their tasks, and this would be a significant contribution.

      Thank you for the insightful comments. We acknowledge that the extensive training experience, particularly through the algorithmic machine teaching approach, could potentially influence the ability to observe cognitive differences between groups of mice with relevant genetic variants. However, our study design and findings suggest that this approach can still provide valuable insights into individual differences and strategies used by the animals during training. First, the behavioral readouts mentioned above (including learning rate, engagement pattern, etc.) can reveal a number of differences among mice. Second, detailed modelling with logistic regression can further dissect the strategies that mice use over the course of training (Fig. S2B). We have highlighted some variables selected by the regression that are associated with individual strategies in performing the tasks (Fig. S2C), and these strategies can differ between the manual and autonomous training groups (Fig. S2D). We have included these comments in the Discussion for further clarity.

      Discussion “… Furthermore, a detailed logistic regression analysis dissected the strategies mice employed during training (Fig. S2B). Notably, the regression identified variables associated with individual task-performance strategies (Fig. S2C), which also differed between manually and autonomously trained groups (Fig. S2D). Thus, our system could facilitate high-throughput behavioral studies exploring between-subject differences in the future.”

      (3) A final, intriguing finding in this manuscript is that animal self-paced training led to much slower learning than "manual" training, by having the experimenter introduce the animal to the apparatus for a few hours each day. Manual training resulted in significantly faster learning, in almost half the number of trials on average, and with significantly fewer omitted trials. This finding does not necessarily argue that manual training is universally a better choice because it leads to more limited water consumption. However, it suggests that there is a distinct contribution of experimenter interactions and/or switching contexts in cognitive training, for example by activating an "occasion setting" process to accelerate learning for a distinct period of time. Limiting experimenter interactions with mice may be a labor-saving intervention, but may not necessarily improve performance. This could be an interesting topic of future investigation, of relevance to understanding how animals of all species learn.

      Thank you for your insightful comments. We agree that the finding that manual training led to significantly faster learning compared to self-paced training is both intriguing and important. One possible reason, we think, is the limited duration of engagement provided by the experimenter in manual training, which forces the mice to concentrate more on the trials (and thus omit fewer trials) than in autonomous training. Your suggestion that experimenter interactions might activate an "occasion setting" process is particularly interesting. In the context of our study, we could introduce, for example, a light serving as a cue that prompts the animals to engage; when the light is off, the task would no longer be accessible to the mice, simulating the manual training situation. We agree that this could be an interesting topic for future investigation that might create a more conducive environment for learning, thereby accelerating the learning rate.

      Discussion “… Lastly, while HABITS achieves criterion performance in a similar or even smaller number of days overall compared to manual training, it requires more trials to reach the same learning criterion (Fig. 2G). We hypothesize that this difference in trial efficiency may stem from the constrained engagement duration imposed by the experimenter in manual training, which could compel mice to focus more intensely on task execution, resulting in fewer trial omissions (Fig. 2F). In contrast, the self-paced nature of autonomous training may permit greater variability in attentional engagement [83] and inter-trial intervals, which could be problematic for data analysis relying on consistent intervals and/or engagement. Future studies should explore how controlled contextual constraints enhance learning efficiency and whether incorporating such measures into HABITS could optimize its performance.”

      Reviewer #2 (Recommendations for the authors):

      As I mentioned in the weaknesses, I did not see code or CAD drawings for their home cages and how these interact with a computer.

      Thanks for the comment. We would like to clarify that the construction methods, GUI, code for our system, PCB and CAD files (newly uploaded) have already been made publicly available on https://github.com/Yaoyao-Hao/HABITS.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study highlights the strengths of using predictive computational models to inform C. elegans screening studies of compounds' effects on aging and lifespan. The authors primarily focus on all-trans retinoic acid (atRA), one of the 5 compounds (out of 16 tested) that extended C. elegans lifespan in their experiments. They show that atRA has positive effects on C. elegans lifespan and age-related health, while it has more modest and inconsistent effects (i.e., some detrimental impacts) for C. briggsae and C. tropicalis. In genetic experiments designed to evaluate contributing mediators of lifespan extension with atRA exposure, it was found that 150 µM of atRA did not significantly extend lifespan in akt-1 or akt-2 loss-of-function mutants, nor in animals with loss of function of aak-2, or skn-1 (in which atRA had toxic effects); these genes appear to be required for atRA-mediated lifespan extension. hsf-1 and daf-16 loss-of-function mutants both had a modest but statistically significant lifespan extension with 150 µM of atRA, suggesting that these transcription factors may contribute towards mediating atRA lifespan extension, but that they are not individually required for some lifespan extension. RNAseq assessment of transcriptional changes in day 4 atRA-treated adult wild-type worms revealed some interesting observations. Consistent with the study's genetic mutant lifespan observations, many of the atRA-regulated genes with the greatest fold-change differences are known regulated targets of daf-2 and/or skn-1 signaling pathways in C. elegans. hsf-1 loss-of-function mutants show a shifted atRA transcriptional response, revealing a dependence on hsf-1 for ~60% of the atRA-downregulated genes. On the other hand, RNAseq analysis in aak-2 loss-of-function mutants revealed that aak-2 is only required for less than a quarter of the atRA transcriptional response. All together, this study is proof of the concept that computational models can help optimize C. elegans screening approaches that test compounds' effects on lifespan, and provide comprehensive transcriptomic and genetic insights into the lifespan-extending effects of all-trans retinoic acid (atRA).

      Strengths:

      (1) A clearly described and well-justified account describes the approach used to prioritize and select compounds for screening, based on using the top candidates from a published list of computationally ranked compounds (Fuentealba et al., 2019) that were cross-referenced with other bioinformatics publications to predict anti-aging compounds, after de-selecting compounds previously evaluated in C. elegans as per the DrugAge database. 16 compounds were tested at 4-5 different concentrations to evaluate effects on C. elegans lifespan.

      (2) Robust experimental design was undertaken evaluating the lifespan effects of atRA, as it was tested on three strains each of C. elegans, C. briggsae, and C. tropicalis, with trial replication performed at three distinct laboratories. These observations extended beyond lifespan to include evaluations of health metrics related to swimming performance.

      (3) In-depth analyses of the RNAseq data of whole-worm transcriptional responses to atRA revealed interesting insights into regulator pathways and novel groups of genes that may be involved in mediating lifespan-extension effects (e.g., atRA-induced upregulation of sphingolipid metabolism genes, atRA-upregulation of genes in a poorly-characterized family of C. elegans paralogs predicted to have kinase-like activity, and disproportionate downregulation of collagen genes with atRA).

      We thank the reviewer for highlighting the strengths of our paper.

      Weaknesses:

      (1) The authors' computational-based compound screening approach led to a ~30% prediction success rate for compounds that could extend the median lifespan of C. elegans. However, follow-up experiments on the top compounds highlighted the fact that some of these observed "successes" could be driven by indirect, confounding effects of these compounds on the bacterial food source, rather than direct beneficial effects on C. elegans physiology and lifespan. For instance, this appeared to be the case for the "top" hit of propranolol; other compounds were not tested with metabolically inert or killed bacteria. In addition, there are no comparative metrics provided to compare this study's ~30% success rate to screening approaches that do not use computational predictions.

      We do test whether compounds have a direct effect on bacterial growth. We have modified the text to clarify that fact. There may be potential lifespan effects from atRA due to changes in bacterial metabolites; however, exploring that more fully is beyond the scope of the current work.

      We very much appreciate the question regarding relative success. An appropriate benchmark for “hit rate” is perhaps best provided by Petrascheck, Ye & Buck (2007), who conducted a large-scale screen of 88,000 compounds for effects on adult lifespan in C. elegans. They found an initial screening hit rate of 1.2% (1083/88000); these hits were then retested, giving a verified hit rate of 0.13% (115/88000) and a retest failure rate of 89% (968/1083). Similarly, Lucanic et al. (2016) screened 30,000 compounds, with an initial hit rate of approximately 1.7% (~500/30000); of these, 180 were selected for retesting, resulting in a final verified hit rate of 0.19% (57/29680), which is comparable to the Petrascheck et al. result. The text in the discussion has been modified to include these studies.
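
      The percentages quoted above can be rechecked with a few lines of arithmetic. The sketch below is only an illustration: the counts are those given in this response and in Reviewer #1's summary (5 lifespan-extending compounds out of 16 tested), while the function name `percent` and the dictionary labels are hypothetical and are not taken from either screen or from the CITP analysis code.

```python
# Hit-rate arithmetic using only the counts quoted in the text above.
def percent(hits: int, total: int) -> float:
    """Return hits as a percentage of total."""
    return 100.0 * hits / total

# (hits, total) pairs; the labels are illustrative, the counts come from the text.
rates = {
    "Petrascheck et al. 2007, initial hits": (1083, 88000),
    "Petrascheck et al. 2007, verified hits": (115, 88000),
    "Petrascheck et al. 2007, retest failures": (968, 1083),
    "Lucanic et al. 2016, verified hits": (57, 29680),
    "This study, lifespan-extending compounds": (5, 16),
}

for label, (hits, total) in rates.items():
    print(f"{label}: {hits}/{total} = {percent(hits, total):.2f}%")
```

      Note that the rows use different denominators (all compounds screened versus hits retested), so they should be compared with that in mind.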

      (2) Transcriptomic analyses of atRA effects were extensive in this study, but evaluations and discussions of non-transcriptional effects of key proposed regulators (such as AMPK) were limited. For instance, non-transcriptional effects of aak-2/AMPK might account for its requirement for mediating lifespan extension effects, since aak-2 was not required for a major proportion of atRA transcriptional responses.

      We naturally agree with the reviewer that non-transcriptional effects are possible and well worth pursuing in future work. However, these effects will still show within our study, as any upstream non-transcriptional effects are likely to reveal themselves in downstream transcriptional changes, as measured here.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Banse et al. experimentally validate the power of computational approaches that predict anti-aging molecules using the multi-species approach of the Caenorhabditis Intervention Testing Program (CITP). Filtering candidate molecules based on transcriptional profiles, ML models, literature searches, and the DrugAge database, they selected 16 compounds for testing. Of those, eight did not affect C. elegans lifespan, three shortened it, and five extended C. elegans lifespan, resulting in a hit rate of over 30%. Of those five, they then focused on all-trans-retinoic acid (atRA), a compound that has previously resulted in contradictory effects. The lifespan-extending effect of atRA was consistent in all C. elegans strains tested, was absent in C. briggsae, and a small effect was observed in some C. tropicalis strains. Similar results were obtained for measures of healthspan. The authors then investigated the mechanism of action of atRA and showed that it was only partially dependent on daf-16 but required akt-1, akt-2, skn-1, hsf-1, and, to some degree, pmk-1. The authors further investigate the downstream effects of atRA exposure by conducting RNAseq experiments in both wild-type and mutant animals to show that some, but surprisingly few, of the gene expression changes that are observed in wild-type animals are lost in the hsf-1 and aak-2 mutants.

      Strengths:

      Overall, this study is well conceived and executed as it investigates the effect of atRA across different concentrations, strains, and species, including life and health span. Revealing the variability between sites, assays, and the method used is a powerful aspect of this study. It will do a lot to dispel the nonsensical illusion that we can determine a percent increase in lifespan to the precision of two floating point numbers.

      An interesting and potentially important implication arises from this study. The computational selection of compounds was agnostic regarding strain or species differences and was predominantly based on observations made in mammalian systems. The hit rate calculated is based on the results of C. elegans and not on the molecules' effectiveness in C. briggsae or C. tropicalis. If it were, the hit rate would be much lower. How is that? It would suggest that ML models and transcriptional data obtained from mammals have a higher predictive value for C. elegans than for the other two species. This selectivity for C. elegans over C. tropicalis and C. briggsae seems both puzzling and unexpected. The predictions for longevity were based on the transcriptional data in cell lines.

      This is a common observation in the CITP for which we do not currently have a satisfying explanation. For whatever reason, C. elegans is much more responsive to compounds than other species, much like it is more responsive to RNAi and other environmental interventions. It may be less active in detoxifying external agents than the other species, although this is just speculation at the moment. We continue to investigate this question, but that work is beyond the scope of the present paper.

      Would it be feasible to compare the mammalian data to the transcriptional data in Figure 5 and see how well they match? While this is clearly beyond the focus of this study, an implied prediction is that running RNAseqs for all these strains exposed to atRA would reveal that the transcriptional changes observed in the strains where it extends lifespan the most should match the mammalian data best. Otherwise, how could the mammalian datasets be more predictive for C. elegans than for C. briggsae or C. tropicalis? There are a lot of IFs in this prediction, but such an experiment would reconsider and validate the basis on which the original predictions were made.

      These questions are worth pursuing in the future but are beyond the scope of the current work.

      Weaknesses:

      Many of the most upregulated genes, such as cyps and pgps, are xenobiotic response genes upregulated in many transcriptional datasets from C. elegans drug studies. Their expression might be necessary to deal with atRA breakdown metabolites to prevent toxicity rather than confer longevity. Because atRA is very light sensitive, the toxicity of its breakdown metabolites may explain some of the differences observed between the lifespan machine assays and standard assay practices.

      This is certainly a possibility, although we often observe longer lifespans on the ALM, perhaps because they themselves are stressful, thereby providing a more sensitive background environment for detecting positive stress response modulators.

      Reviewer #3 (Public review):

      Summary:

      In this study, Banse et al., demonstrate that combining computer prediction with genetic analysis in distinct Caenorhabditis species can streamline the discovery of aging interventions by taking advantage of the diverse pool of compounds that are currently available. They demonstrate that through careful prioritization of candidate compounds, they are able to accomplish a 30% positive hit rate for interventions that produce significant lifespan extensions. Within the positive hits, they focus on all-trans retinoic acid (atRA) and discover that it modulates lifespan through conserved longevity pathways such as AKT-1 and AKT-2 (and other conserved Akt-targets such as Nrf2/SKN-1 and HSF1/HSF-1) as well as through AAK-2, a conserved catalytic subunit of AMPK. To better understand the genetic mechanisms behind lifespan extension upon atRA treatment, the authors perform RNAseq experiments using a variety of genetic backgrounds for cross-comparison and validation. Using this current state-of-the-art approach for studying gene expression, the authors determine that atRA treatment produces gene expression changes across a broad set of stress-response and longevity-related pathways. Overall, this study is important since it highlights the potential of combining traditional genetic analysis in the genetically tractable organism C. elegans with computational methods that will become even more powerful with the swift advancements being made in artificial intelligence. The study possesses both theoretical and practical implications not only in the field of aging but also in related fields such as health and disease. Most of the claims in this study are supported by solid evidence, but the conclusions can be refined with a small set of additional experiments or re-analysis of data.

      Strengths:

      (1) The criteria for prioritizing compounds for screening are well-defined and easy to replicate (Figure 1), even for scientists with limited experience in computational biology. The approach is also adaptable to other systems or model organisms.

      (2) I commend the researchers for doing follow-up experiments with the compound propranolol to verify its effect on lifespan (Figure 2 Supplement 2), given the observation that it affected the growth of OP50. To prevent false hits in the future, the reviewer recommends the use of inactivated OP50 for future experiments to remove this confounding variable.

      (3) The sources of variation (Figure 3, Figure Supplement 2) are taken into account and demonstrate the need for advancing our understanding of the lifespan phenotype due to inter-individual variation.

      (4) The addition of the C. elegans swim test in addition to the lifespan assays provides further evidence of atRA-induced improvement in longevity.

      (5) The RNAseq approach was performed in a variety of genetic backgrounds, which allowed the authors to determine the relationship between AAK-2 and HSF-1 regulation of the retinoic acid pathway in C. elegans, specifically, that the former functions downstream of the latter.

      We thank the reviewer for highlighting these strengths.

      Weaknesses:

      (1) The filtering of compounds for testing using the DrugAge database requires that the database is consistently updated. In this particular case, even though atRA does not appear in the database, the authors themselves cite literature that has already demonstrated atRA-induced lifespan extension, which should have precluded this compound from the analysis in the first place.

      As often happens in science, this work was initiated before Statzer et al. (2021) was published. As such, it is included in the test set.

      (2) The threshold for determining positive hits is arbitrary, and in this case, a 30% positive hit rate was observed when the threshold is set to a lifespan extension of around 5% based on Figure 1B (the authors fail to explicitly state the cut-off for what is considered a positive hit).

      Any compound that statistically increases lifespan is considered a positive hit by the CITP. The CITP in general is powered to detect minimum effect sizes of 5%.

      (3) The authors demonstrate that atRA extends lifespan in a species-specific manner (Figure 3). Specifically, this extension only occurs in the species C. elegans, yet the title implies that atRA-induced lifespan extension occurs in different Caenorhabditis species when it is clearly not the case. While the authors state that failure to observe phenotypes in C. briggsae and C. tropicalis is a common feature of CITP tests, they do not speculate as to why this phenomenon occurs.

      Please see the comment above.

      (4) There are discrepancies between the lifespan curves by hand (Figure 3 Figure Supplement 1) and using the automated lifespan machine (Figure 3 Supplement 3). Specifically, in the automated lifespan assays, there are drastic changes in the slope of the survival curve which do not occur in the manual assays. This may be due to improper filtering of non-worm objects, improper annotation of death times, or improper distribution of plates in each scanner.

      Our storyboarding SOP ensures that discrepancies in the shape of the curve are unlikely to be due to annotation errors. We check every page of the storyboard by hand, so all non-worm objects are excluded. Furthermore, the first and last ~10% of deaths are checked by hand (as we observed that these time points are the most likely to be wrongly called by the software), with a few deaths chosen at random from the middle to ensure that the software is calling death times accurately. If we find a large number of inaccurately called deaths, the entire plate is annotated by hand. For this specific experiment, 18% of the total deaths were hand annotated. Plates are randomly distributed across each scanner in an effort to prevent bias. As noted above, it does appear that the ALM environment and the “by hand” environment are somewhat different.

      (5) The authors miss an opportunity to determine whether the lifespan extension phenotype attributed to the retinoic acid pathway is mostly transcriptional in nature or whether some of it is post-transcriptional. The authors even state "that while aak-2 is absolutely required for the longevity effects of atRA, aak-2 is required only for a small proportion (~1/4) of the transcriptional response", suggesting that some of the effects are post-transcriptional. Further information could have been obtained had the authors also performed RNAseq analysis on the tol-1 mutant which exhibited an enhanced response to atRA compared to wild-type animals, and comparing the magnitude of gene expression changes between the tol-1 mutant and all other genetic backgrounds for which RNAseq was performed.

      Reviewer #1 (Recommendations for the authors):

      (1) Will the raw RNA-seq data be publicly deposited? Please clarify. This would strengthen the value of the study.

      All data is available. We have clarified this in the text.

      (2) Since all-trans retinoic acid is a metabolite of vitamin A, it seems important to include a discussion of and reference to the recent study "SKN-1/NRF2 upregulation by vitamin A is conserved from nematodes to mammals and is critical for lifespan extension in Caenorhabditis elegans" (Sirakawin et al., Cell Reports, 2024). Sirakawin et al. include data that corroborates and expands on the findings of the current study, including the observation that vitamin A reduces whole-body lipid deposition (agrees with some of the transcriptional findings in the current study); that vitamin A protects against oxidative stress; that vitamin A elevates expression of gst-5, skn-1, and pmk-1; and that loss-of-function mutation of skn-1 has similar effects to the current study, in terms of suppressing lifespan-extending effects of vitamin A. In addition, adding some discussion of oxidative stress would strengthen this work, in light of widespread perceptions of the antioxidant properties of vitamin A (and its metabolites).

      Thank you for this suggestion. We have added this citation to the discussion.

      (3) Minor typo: Lines 341-342 - After a sentence that contains the phrase "collagen and neuropeptide related genes", the next sentence uses the term "the latter" in reference to the collagen genes (should be "the former").

      Edited in text.

      (4) Minor correction: In Figure 6, the information in the figure legend is swapped for figure panels A) and B).

      Edited in figure caption.

      (5) To me, the subtitle heading "Loss of AMPK leads to a unique transcriptional profile in response to atRA treatment" (Line 403) is misleading, considering the contents of the text in that section, and the data presented in Figure 6.

      We have altered this heading to reflect this comment.

      Reviewer #2 (Recommendations for the authors):

      Using different colors for the different testing sites would make Figure 3 more readable.

      Edited so that each lab is represented by a different shade of green.

      Reviewer #3 (Recommendations for the authors):

      It would be interesting to investigate the effect of even higher concentrations of atRA as it has been reported that atRA accumulation is associated with deleterious phenotypes in mice (Snyder et al., 2020, FASEB J).

      We tested the highest concentration (150 µM) based on the solubility of the compound using our standardized plate treatment protocol, so we are unable to test higher concentrations.

      A good first guess for a downstream retinoid receptor is nhr-23, which is the homolog of the vertebrate ROR genes. Stehlin-Gaon et al. (2003, Nat Struct Mol Biol) have shown that atRA is a ligand for the orphan nuclear receptor RORβ. It might be interesting to study the effects of atRA on an nhr-23::AID (auxin inducible degron) background. This would allow you to circumvent the developmental phenotypes that result from nhr-23 knockdown.

      A few notes on the text/figures:

      Line 342: I believe the authors meant "former" instead of "latter".

      Corrected in text.

      Line 346: Can you also highlight col-144 in Fig. 5 S1?

      This is not really feasible, as it is in the cluster near where the axes meet (red arrow).

      Line 400: CUB pathogen - based on Figure 6 Supp 1, this occurs in aak-2 and not in hsf-1.

      Great catch by the reviewer. We have updated the figure with the correct information.

      Line 414: hedgehog-like signaling - occurs in hsf-1 instead of aak-2. Similar inconsistencies occur in lines 415 (sterol), 417 (C-type lectin), and 418 (unassigned pathogens)

      We have updated the text to eliminate potential conflicts/confusion in the presentation here.

      Line 434: I believe the authors meant Figure "6" instead of "7"

      Edited in text.

      Line 475: Is it "fifteen" or "sixteen" compounds initially targeted?

      Edited in text.

      Can you please include the population sizes for the lifespan assays if not yet included in the detailed protocol to be published in FigShare (to which I currently do not have access to)?

      Added “50 animals per petri plate” to Lifespan Assay methods section; additionally, all sample sizes are included as a summary tab in each dataset on figshare.com (10.6084/m9.figshare.c.6320690).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors in this study extensively investigate how telomere length (TL) regulates hTERT expression via non-telomeric binding of the telomere-associated protein TRF2. They conclusively show that TRF2 binding to long telomeres results in a reduction in its binding to the hTERT promoter. In contrast, short telomeres restore TRF2 binding in the hTERT promoter, recruiting repressor complexes like PRC2, and suppressing hTERT expression. The study presents several significant findings revealing a previously unknown mechanism of hTERT regulation by TRF2 in a TL-dependent manner.

      Strengths:

      (1) A previously unknown mechanism linking telomere length and hTERT regulation through the non-telomeric TRF2 protein has been established, strengthening the understanding of telomere biology.

      (2) The authors used both cancer cell lines and iPSCs to showcase their hypothesis and multiple parameters to validate the role of TRF2 in hTERT regulation.

      (3) Comprehensive integration of the recent literature findings and implementation in the current study.

      (4) In vivo validation of the findings.

      (5) Rigorous controls and well-designed assays have been used.

      Weaknesses:

      (1) The authors should comment on the cell proliferation and morphology of the engineered cell lines with ST or LT.

      The cell proliferation and morphology of the engineered cells were monitored during experiments. With a doubling time within 16-18 hours, all the cancer cell line pairs used in the study were counted and seeded equally before experiments.

      No significant difference in morphology or cell count (before harvesting for experiments) was noted for the stable cell lines, namely, HT1080 ST-HT1080 LT, HCT116 p53 null scrambled control-HCT116 p53 null hTERC knockdown.

      MDAMB 231 cells were treated with guanine-rich telomere repeats (GTR) over a period of 12 days, as per the protocol described in the Methods. Because GTR treatment was given on alternate days in serum-free media, followed by replenishment with serum-supplemented media, we noted that the cells underwent a periodic delay in proliferation (or transient arrest) aligning with the GTR oligo-feeding cycles and appeared somewhat larger than their parental untreated cells.

      Next, the cells with Cas9-telomeric sgRNA mediated telomere trimming were maintained transiently (till 3 days after transfection). During this time, no significant change in morphology or cell proliferation was observed in any of the cell lines, namely HCT116 or HEK293T Gaussia Luciferase reporter cells. iPSCs were also monitored. However, no change in morphology or cellular proliferation was observed during the 5 days post-transfection and antibiotic selection.  

      (2) Also, the entire study uses engineered cell lines, with artificially elongated or shortened telomeres that conclusively demonstrate the role of hTERT regulation by TRF2 in a telomere length-dependent manner, but using ALT-negative cell lines with naturally short telomere length vs those with long telomeres will give a better perspective. Primary cells can also be used in this context.

      The reviewer correctly highlights (as we also acknowledge in the Discussion) that our study primarily utilizes engineered cell lines with artificially elongated or shortened telomeres. We agree that using ALT-negative cells with naturally short versus long telomeres would provide additional perspective. However, a key challenge in this experimental setup is the inherent variation in TRF2 protein levels among these cell types—a parameter central to our hypothesis. Comparing observations across such non-isogenic cell line pairs presents experimental limitations as these would require extensive normalization for multiple factors and introduce additional complexities, which would be difficult to interpret with clarity.

      We had also explored primary cells, specifically foreskin fibroblasts and MRC5 lung fibroblasts, as suggested by the reviewer. However, we encountered two significant challenges. To achieve a notable telomere length difference of at least 20%, these primary cells had to undergo a minimum of 25 passages. During this period, we observed a substantial decline in their proliferation capacity and an increased tendency toward replicative senescence. Additionally, we noted a significant reduction in TRF2 protein levels as the primary cells aged, consistent with findings from Fujita K et al., 2010 (Nat Cell Biol.), which reported p53-induced, Siah-1-mediated proteasomal degradation of TRF2. Due to these practical limitations, we focused on cancer cell lines with respective isogenic backgrounds, ensuring a controlled experimental framework. On the other hand, this opens new avenues for future research to explore broader implications. Investigating other primary cell types that may not present these challenges could be a valuable direction for future studies.

      (3) The authors set up time-dependent telomere length changes by dox induction, which may differ from the gradual telomere attrition or elongation that occurs naturally during aging, disease progression, or therapy. This aspect should be explored.

      In this study, we utilized a Doxycycline-inducible hTERT expression system to modulate telomere length in cancer cells, aiming to capture any gradual changes that might occur upon steady telomerase induction or overexpression—an event frequently observed in cancer progression. We monitored telomere length and telomerase activity at regular intervals (Supplementary Figure 2), noting a gradual increase until a characteristic threshold was reached, followed by a reversal to the initial telomere length.

      While this model provides interesting insights in the context of cancer cells, it does not replicate the conditions of aging or therapeutic intervention. We agree that exploring telomere length-dependent regulation of hTERT in normal aging cells is an important avenue for future research. Investigating TRF2 occupancy on the hTERT promoter in response to telomere length alterations through therapeutic interventions, such as telomestatin or imetelstat (telomerase inhibitors) and 6-thio-2’-deoxyguanosine (telomere damage inducer), would provide valuable insights and warrants further exploration.

      (4) How does the hTERT regulation by TRF2 in a TL-dependent manner affect the ETS binding on hTERT mutant promoter sites?

      In our previous study (Sharma et al., 2021, Cell Reports), we experimentally demonstrated that GABPA and TRF2 do not compete for binding at the mutant hTERT promoter (Figure 4M-R). Silencing GABPA in various mutant hTERT promoter cells did not increase TRF2 binding. While GABPA has been reported to show increased binding at the mutant promoter compared to the wild-type (Bell et al., 2015, Science), no telomere length (TL) sensitivity has been noted yet. In the current manuscript we show that telomere alterations in hTERT mutant cells (which do not form the promoter G-quadruplex) do not significantly affect TRF2 occupancy at the promoter, reinforcing our earlier findings that G-quadruplex formation is crucial for TRF2 recruitment. Since TRF2 binding is not affected, GABPA binding would not be impacted either. Therefore, a change in TL is unlikely to influence ETS binding by GABPA.

      (5) Stabilization of the G-quadruplex structures in ST and LT conditions along with the G4 disruption experimentation (demonstrated by the authors) will strengthen the hypothesis.

      We agree with the reviewer’s suggestion that stabilizing G-quadruplex (G4) structures in mutant promoter cells under ST and LT conditions would further strengthen our hypothesis. From our ChIP experiments on hTERT promoter mutant cells following G4 stabilization with ligands, as reported in Sharma et al. 2021 (Figure 5G), we observed that TRF2 occupancy was regained in the telomere-length unaltered versions of -124G>A and -146G>A HEK293T Gaussia luciferase cells (referred to as LT cells in the current manuscript).

      (6) The telomere length and the telomerase activity are not very consistent (Figure 2A, and S1A, Figure 4B and S3). Please comment.

      In this study, we employed both telomerase-dependent and independent methods for telomere elongation.

      HT1080 model: Telomere elongation resulted from constitutive overexpression of hTERC and hTERT, leading to a direct correlation with telomerase activity.

      HCT116 (p53-null) model: Silencing of hTERC, a known limiting factor for telomerase activity, in ST cells resulted in significantly lower telomerase activity and a 1.5-fold telomere length difference.

      MDAMB231 model: Guanine-rich telomeric repeat (GTR) feeding induced telomere elongation through recombinatorial mechanisms (Wright et al., 1996), leading to significant telomere length gain but no notable change in telomerase activity.

      HCT116 Cas9-telomeric sgRNA model: Telomere shortening occurred without modifying telomerase components, resulting in a minor, insignificant increase in telomerase activity (Figure 2A, S1).

      Regarding xenograft-derived HT1080 ST and LT cells (Figure 4B, S3), the observed variability in telomere length and telomerase activity may stem from infiltrating mouse cells, which naturally have longer telomeres and higher telomerase activity than human cells. Since tumour masses were not sorted to exclude mouse cells in the reported assay, using species-specific markers or fluorescently labelled HT1080 cells in future experiments would minimize this bias. However, even though the telomere length and telomerase activity assays cannot distinguish human cells from mouse cells, the mRNA analysis and ChIP experiments performed specifically for hTERT and hTERC mRNA levels, TRF2 occupancy, and H3K27me3 enrichment on the hTERT promoter (Figure 4B–E) strongly support our conclusions.

      (7) Please comment on the other telomere-associated proteins or regulatory pathways that might contribute to hTERT expression based on telomere length.

      The current study provides experimental evidence that TRF2, a well-characterized telomere-binding protein, mediates crosstalk between telomeres and the regulatory region of the hTERT gene in a telomere length-dependent manner. Given the observed link between hTERT expression and telomere length, it is likely that additional telomere-associated proteins and regulatory pathways contribute to this regulation.

      The remaining shelterin complex components (POT1, hRap1, TRF1, TIN2, and TPP1) may play crucial roles in this context, as they are integral to telomere maintenance and protection (Stewart J et al., 2012 Mutat Res.). Additionally, several DNA damage response (DDR) proteins, which interact with telomere-binding factors and help preserve telomere integrity, could potentially influence hTERT regulation in a telomere length-dependent manner (Longhese M, 2008 Genes & Development). However, direct interactions or regulatory roles would require further experimental validation. Another group of proteins with potential relevance in this mechanism are the sirtuins, which directly associate with telomeres and are known to positively regulate telomere length, undergoing repression upon telomere shortening (Amano H et al., 2019 Cell Metabolism; Amano H, Sahin E 2019 Molecular & Cellular Oncology). Notably, SIRT1 has been reported to interact with telomerase (Lee SE et al., 2024, Biochem Biophys Res Commun.), while SIRT6 has been implicated in TRF2 degradation (Rizzo et al. 2017) and telomerase activation (Chen J et al. 2021, Aging). Given their roles in telomere homeostasis, sirtuins may serve as key mediators of telomere length-dependent hTERT regulation.

      Based on this suggestion, we have included the above in Discussion.

      Reviewer #2 (Public review):

      Summary:

      Telomeres are key genomic structures linked to everything from aging to cancer. These key structures at the end of chromosomes protect them from degradation during replication and rely on a complex made up of human telomerase RNA gene (hTERC) and human telomerase reverse transcriptase (hTERT). While hTERC is expressed in all cells, the amount of hTERT is tightly controlled. The main hypothesis being tested is whether telomere length itself could regulate the hTERT enzyme. The authors conducted several experiments with different methods to alter telomere length and measured the binding of key regulatory proteins to this gene. It was generally observed that the shortening of telomere length leads to the recruitment of factors that reduce hTERT expression and lengthening of telomeres has the opposite effect. To rule out direct chromatin looping between telomeres and hTERT as driving this effect artificial constructs were designed and inserted a significant distance away and similar results were obtained.

      Overall, the claims of telomere length-dependent regulation of hTERT are supported throughout the manuscript.

      Strengths:

      The paper has several important strengths. Firstly, it uses several methods and cell lines that consistently demonstrate the same directionality of the findings. Secondly, it builds on established findings in the field but still demonstrates how this mechanism is separate from that which has been observed. Specifically, designing and implementing luciferase assays in the CCR5 locus supports that direct chromatin looping isn't necessary to drive this effect with TRF2 binding. Another strength of this paper is that it has been built on a variety of other studies that have established principles such as G4-DNA in the hTERT locus and TRF2 binding to these G4 sites.

      Weaknesses:

      The largest technical weakness of the paper is that minimal replicates are used for each experiment. I understand that these kinds of experiments are quite costly, and many of the effects are quite large, however, experiments such as the flow cytometry or the IPSC telomere length and activity assays appear to be based on a single sample, and several are based upon two maximum three biological replicates. If samples were added the main effects would likely hold, and many of the assays using GAPDH as a control would result in significant differences between the groups. This unnecessarily weakens the strength of the claims.

      We appreciate the reviewer’s recognition of the resource-intensive nature of our experiments, and we are confident in the robustness of the observed results. Due to the project’s timeline constraints and the need for consistency across experiments, we have reported findings based on 3 biological replicates with appropriate statistical analysis.

      Regarding the fibroblast-iPSC model, we would like to clarify that we have presented data from two independent biological replicates, each consisting of a fibroblast and its derived iPS cell pair, rather than a single sample. Additionally, the Tel-FACS assays involved analysing at least 10,000 events, ensuring statistical significance in all cases.

      Another detail that weakens the confidence in the claims is that throughout the manuscript there are several examples of the control group with zero variance between any of the samples: e.g. Figure 2K, Figure 3N, and Figure 6G. It is my understanding that a delta delta method has been used for calculation (though no exact formula is reported and would assist in understanding). If this is the case, then an average of the control group would be used to calculate that fold change and variance would exist in the group. The only way I could understand those control group samples always set to 1 is if a tube of cells was divided into conditions and therefore normalized to the control group in each case. A clearer description in the figure legend and methods would be required if this is what was done and repeated measures ANOVA and other statistics should accompany this.

      The above point has been raised by the reviewer in the 'Recommendations for Authors' section as well. We have addressed it in detail in that section, citing each figure where the reviewer noted a concern regarding the lack of variance. Changes made in the manuscript have also been highlighted there.

      We would like to clarify that, throughout the manuscript, fold changes were previously calculated independently for each biological replicate by normalizing treated conditions to their corresponding control (untreated or Day 0) sample within the same replicate. This means that the control group is normalized to 1 individually in each replicate, resulting in an apparent lack of variance in the control when plotted. The normalization was not performed using an averaged control value across replicates. As such, the absence of visible variance in the control group reflects the normalization method rather than a true lack of variability in the underlying data.
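
      As an illustration only, the short Python sketch below contrasts the two normalization schemes discussed here. The signal values and function names are hypothetical and are not taken from the study; the sketch simply shows why normalizing each replicate to its own control yields a control group fixed at 1, whereas normalizing to the averaged control retains variance in that group.

```python
from statistics import mean, stdev

# Hypothetical raw signals (arbitrary units) from three biological replicates.
# These numbers are illustrative only and are not data from the manuscript.
control = [10.0, 14.0, 8.0]
treated = [20.0, 35.0, 12.0]


def per_replicate_fold_change(control, treated):
    """Normalize each replicate to its own control (paired design)."""
    ctrl_fc = [c / c for c in control]                    # always exactly 1.0
    trt_fc = [t / c for c, t in zip(control, treated)]
    return ctrl_fc, trt_fc


def mean_control_fold_change(control, treated):
    """Normalize every sample to the mean of the control group."""
    ref = mean(control)
    return [c / ref for c in control], [t / ref for t in treated]


paired_ctrl, paired_trt = per_replicate_fold_change(control, treated)
pooled_ctrl, pooled_trt = mean_control_fold_change(control, treated)

# Per-replicate normalization: control spread collapses to zero by construction.
print("per-replicate control:", paired_ctrl, "SD =", stdev(paired_ctrl))
# Mean-control normalization: control averages 1 but keeps its spread.
print("mean-control control:", pooled_ctrl, "SD =", stdev(pooled_ctrl))
```

      Under the per-replicate scheme the spread of the raw control values is absorbed into the treated fold changes, which is why a flat control bar does not by itself indicate a lack of biological variability.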

      In the revised version of the manuscript, we have carefully considered the reviewer’s comments and applied changes wherever appropriate. For example (detailed response in the ‘Recommendations for Authors’ section), in datasets where two distinct stable cell lines are compared (e.g., HT1080 ST/LT and HCT p53-null ST/LT), unpaired statistical analysis is more appropriate. Hence, we have updated these panels accordingly and indicated the statistical methods used in the figure legends and Methods section. However, in experiments where cells were indeed seeded separately and subsequently subjected to experimental conditions—representing paired samples—we have chosen not to make any changes. A clearer description of this procedure has, however, been added to the Methods and figure legends to ensure full transparency.

      We believe this approach accurately reflects the experimental design, appropriately addresses the reviewer’s concerns regarding variance and statistical analysis, and ensures clarity and rigor in data reporting.

      A final technical weakness of the paper is the data in Figure 5 where the modified hTERT promoter was inserted upstream of the luciferase gene. Specifically, it is unclear why data was not directly compared between the constructs that could and could not form G4s to make this point. For this reason, the large variance in several samples, and minimal biological replicates, this data was the least convincing in the manuscript (though other papers from this laboratory and others support the claim, it is not convincing standalone data).

      We appreciate the reviewer's thoughtful feedback on the presentation of the luciferase assay data in Figure 5. The data for the wild-type hTERT promoter (capable of forming G4 structures) was previously reported in Figure 2G-K. To avoid redundancy in data presentation, we initially chose to report the results of the mutated promoter separately. However, we recognize that directly comparing the wild-type and mutated promoter constructs within the same figure would provide clearer context and strengthen the interpretation of the results. In light of this, we have updated Figure 5 in the revised manuscript to include the data for both constructs, ensuring a more comprehensive and informative comparison.

      The second largest weakness of the paper is formatting.

      When I initially read the paper without a careful reading of the methods, I thought that the authors did not have appropriate controls meaning that if a method is applied to lengthen, there should be one that is not lengthened, and when a method is applied to shorten, one which is not shortened should be analysed as well. In fact, this is what the authors have done with isogenic controls. However, by describing all samples as either telomere short or telomere long, while this simplifies the writing and the colour scheme, it makes it less clear that each experiment is performed relative to an unmodified. I would suggest putting the isogenic control in one colour, the artificially shortened in another, and the artificially lengthened in another.

      Similarly, the graphs, in general, should be consistent with labelling. Figure 2 was the most confusing. I would suggest one dotted line with cell lines above it, and then the method of either elongation or shortening below it. I.e. HT1080 above, hTERC overexpression below, MDAMB-231 above guanine terminal repeats below, like was done on the right. Figure 2 readability would also be improved by putting hTERT promoter GAPDH (-ve control) under each graph that uses this (Panel B and Panel C not just Panel C). All information is contained in the manuscript but one must currently flip between figure legends, methods, and figures to understand what was done and this reduces clarity for the reader.

      We thank the reviewer again for their thoughtful suggestions regarding figure formatting and colour coding to improve clarity. We fully understand the rationale for proposing separate colours for unmodified, telomere-shortened, and telomere-lengthened groups, as this could make the experimental design more immediately apparent. However, after careful consideration, we believe that implementing this change across all figures may unintentionally reduce clarity in other aspects (presented in other figures) of the data presentation. This is further explained below.

      Specifically, applying three distinct colours throughout would make it harder to visually track key biological trends—such as changes in chromatin occupancy—across different models. For instance, the same colour could represent opposing regulatory patterns in distinct contexts (e.g., upregulation in one model and downregulation in another), which will make these figures difficult to understand. We feel that maintaining a consistent colour scheme based on telomere status—i.e., long telomeres (LT) vs short telomeres (ST)—across figures facilitates better comparison of biological outcomes across different experimental systems.

      Nevertheless, to address the reviewer’s concern about clarity in experimental design, we have added more detailed descriptions of the methodology and model systems used, in both the Methods and figure legend sections. These updates aim to make it easier for the reader to follow which groups serve as isogenic controls versus modified samples, without disrupting the consistency of data visualization.

      We hope this strikes a balance between improving clarity and preserving the interpretability of the broader biological trends presented in our manuscript.

      Please note that we have incorporated the reviewer’s suggestion to indicate details of model generation for the HT1080 and MDAMB231 cell lines in Figure 2. To quote the reviewer:

      “I would suggest one dotted line with cell lines above it, and then the method of either elongation or shortening below it. I.e. HT1080 above, hTERC overexpression below, MDAMB-231 above guanine terminal repeats below, like was done on the right.”

      We have also placed the hTERT promoter GAPDH (-ve control) label under each graph, rather than only at the end of Panel C in Figure 2, as suggested by the reviewer.

      Reviewer #1 (Recommendations for the authors):

      (1) Please check for grammatical errors throughout the manuscript.

      We have gone through the manuscript thoroughly and corrected the grammatical errors we found.

      (2) Please use both the FACS and qPCR-based assays to check telomere length in all the experiments to strengthen the observations.

      We would like to thank the reviewer for this valuable suggestion. We confirm that both FACS- and qPCR-based assays were performed to assess telomere length in our experiments. In the original submission, we chose to present primarily the FACS-based data in the main figures. This decision was based on the inherent differences in the measurement principles of the two methods, which can lead to discrepancies in the reported fold changes. We were concerned that presenting both datasets side by side in the main figures might lead to confusion for readers who are not directly familiar with the nuances of telomere length assays.

      However, in light of the reviewer’s suggestion, we have now included the qPCR-based data as Supplementary Figure 1A, and updated the manuscript text and figure legends accordingly to reflect this addition.

      (3) Correct the labeling in the legend (Figure 2).

      We have corrected the legend of Figure 2 and thank the reviewer for pointing this out.

      (4) In Figure 6B, why TRF WT condition have higher hTERT expression than the UT condition?

      We thank the reviewer for noting that the hTERT mRNA levels, as estimated by FISH in Figure 6B, appear slightly higher in TRF2 WT overexpressing HT1080 cells compared to the untransfected (UT) condition. Specifically, the average mean intensity values (a.u.) were 53 for UT and 57 for WT. Although this difference was not statistically significant, we acknowledge the reviewer's observation. Currently, we do not have a clear explanation for this small, non-significant variation.

      Importantly, using the same FISH-based method, we observed a significant upregulation of hTERT mRNA levels upon TRF2 R17H overexpression compared to both UT and TRF2 WT conditions, supporting our key conclusions.

      Additionally, qRT-PCR analysis of hTERT mRNA levels in cells stably expressing TRF2 WT (induced by doxycycline) consistently showed a significant downregulation compared to the uninduced (equivalent to UT in the microscopy experiments) state. These results were robust and reproducible across three different cell lines, including HT1080. Consistently, TRF2 R17H expression led to significant upregulation of hTERT mRNA levels upon induction.

      Together, these complementary findings strengthen the validity of our observations.

      (5) Is the difference in telomere length between ST and LT in Fig. 5B significant? (especially the right panel, -146G>A).

      We consistently worked with approximately 20–30% telomere shortening in HEK293 cells across all three cell types (WT promoter, -124G>A, and -146G>A), as this range was reproducibly achieved within the experimental timeframe without risking excessive telomere trimming. The reported telomere length differences are based on FACS analysis of more than 10,000 events per condition, providing strong statistical significance. Importantly, while the absolute differences in telomere length may appear modest, their biological impact is evident in the distinct cellular characteristics observed between ST and LT cell pairs.

      Reviewer #2 (Recommendations for the authors):

      As mentioned above it was somewhat unclear why so many instances of control groups had no variance between them. A more complete reporting of the formulas used to calculate the results, and methods (if samples were divided from a single source into different conditions) would be appreciated.

      We thank the reviewer for their valuable and detailed feedback. The instances where the control groups appeared to lack variance were mainly mRNA data (Figure 2D, 3G, 3N), luciferase activity (Figure 2K), and in vitro methyltransferase activity (Figure 6G). We address each of them in turn below.

      In Figure 2D, for the MDA-MB-231 GTR oligo and HCT116 telomere trimming datasets, the untreated cells were seeded separately and subsequently used to generate the treated conditions within the same experiment. Thus, these two datasets represent paired experimental conditions. Fold changes were calculated independently for each replicate (paired samples), and the fold changes across replicates were plotted. Because the control group serves as a common baseline within each pair and fold changes are normalized individually, minimal variance appears across controls. Given the experimental design, we believe no change is necessary for these panels. However, we have provided additional clarification regarding the calculation formulas and sample handling in the Methods section to avoid any ambiguity.

      For the ST/LT versions in HT1080 and HCT p53-null background cells, while each replicate could technically be treated as paired, these are more appropriately regarded as four distinct stable cell lines. Hence, we agree that unpaired statistical analysis is appropriate for these datasets. We have updated the plots accordingly and described the statistical methods in detail in the figure legends and Methods section.
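
      To make the distinction between paired and unpaired analysis concrete, the sketch below computes both t statistics on the same hypothetical measurements. The values and function names are illustrative assumptions, not data or code from the study, and the unpaired case is shown with Welch's formulation purely as an example of an unpaired test.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(a, b):
    """t statistic for paired samples: is the mean of (b - a) different from zero?"""
    diffs = [y - x for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def welch_t(a, b):
    """t statistic for independent samples with unequal variances (Welch)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se


# Hypothetical measurements (arbitrary units) for three replicates.
group_a = [10.0, 14.0, 8.0]
group_b = [13.0, 18.0, 10.0]

# Paired: each value in group_b is matched to the same-index value in group_a,
# so replicate-to-replicate baseline differences cancel out.
print("paired t  =", round(paired_t(group_a, group_b), 3))
# Unpaired: the two groups are treated as independent, so baseline spread
# between replicates inflates the standard error.
print("unpaired t =", round(welch_t(group_a, group_b), 3))
```

      With these example numbers the paired statistic is considerably larger than the unpaired one, which illustrates why the choice must follow the experimental design: matched untreated and treated samples justify pairing, whereas independently derived stable cell lines do not.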

      Figure 3G and 3N depict the doxycycline-induced cells which follow the design: untreated and dox-treated conditions were seeded from the same batch of cells into separate flasks and treated differently. Hence, these are also paired cases, and fold changes were calculated per replicate before plotting. Therefore, we believe no changes are necessary for these panels. However, we have provided more details regarding sample handling in the Methods section to avoid any ambiguity.

      In Figure 2K, we had previously plotted the fold change in luciferase activity over short telomere (ST) cells for each independent biological replicate. However, to address the reviewer’s concern that variance was not shown in the control group, we have now plotted the luminescence signal (normalised to total protein). We have also updated Figure 5E accordingly and included the WT promoter data along with the mutant cell line data, as suggested in the reviewer’s public comment.

      In Figure 6G, as each replicate of the in vitro methyltransferase activity used different batches of purified protein, there are inherent batch differences that were accounted for by normalizing each replicate internally. Fold changes were then determined for each replicate separately, as previously described. The fold changes across replicates were plotted, and significance between different conditions was tested using two-way ANOVA. To address the reviewer’s comment to show variance in the control, we have now plotted individual replicates.

      We believe these revisions, along with the expanded methods clarification, will fully address the reviewer's concerns and accurately reflect the experimental design and statistical analysis applied.

      Many times, in the manuscript a / is used to indicate both directions. For example: "Genes distal from telomeres (for instance 60 Mb from the nearest telomere) were activated/repressed in a TL-dependent way"... "Resulting increase/decrease in non-telomeric promoter-bound TRF2 affected gene expression". For readability, either this can be replaced with a directionless word like altered, changed, etc, or the writer can list both directions.

      We thank the reviewer for the careful reading and thoughtful suggestions. In the manuscript, we have used the ‘/’ symbol to indicate opposing directions, followed by the word ‘respectively’ to relate these directions to their corresponding outcomes, wherever appropriate. However, as rightly pointed out, certain sentences would benefit from alternative constructions for improved clarity and readability. We have therefore reviewed the manuscript and revised such sentences, making minor modifications wherever necessary, as outlined below.

      We found hTERT was transcriptionally altered depending on telomere length (TL).

      Notably, another conceptually distinct mechanism of TL-dependent gene regulation was reported which influenced genes spread throughout the genome: expression of genes distal from telomeres (for instance 60 Mb from the nearest telomere) was altered in a TL-dependent way, but without physical telomere looping interactions.

      Second, the shortening or elongation of telomeres led to the release or sequestration of telomeric TRF2, respectively, thereby increasing or decreasing the availability of TRF2 at non-telomeric promoters and affecting gene expression.

      A non-necessary, but potentially extra convincing experiment to perform would be to use a combination of light-activated, or ligand-activated cas9 telomere trimming and guanine terminal repeat additions in the same cell line. Like the dox experiments, this would show over time how altering telomere length alters the recruitment of heterochromatin factors and hTERT levels. Executing the experiment this way would be more definitive as it does not rely on changing hTERT itself. Authors do already have examples that support their claims.

      We thank the reviewer for suggesting this additional experiment (which the reviewer notes is not essential); it would indeed provide valuable insights into the relationship between telomere length, heterochromatin factor recruitment, and hTERT levels. While we recognize the potential of this approach, we are currently unable to execute this experiment due to constraints on resources. However, we believe that the existing data presented in the manuscript already support our conclusions effectively.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors showed that enalapril was able to reduce cellular senescence and improve health status in aged mice. The authors further showed that phosphorylated Smad1/5/9 was significantly elevated and blocking this pathway attenuated the protection of cells from senescence. When middle-aged mice were treated with enalapril, the physiological performance in several tissues, including memory capacity, renal function, and muscle strength, exhibited significant improvement.

      Strengths:

      The strength of the study lies in the identification of the pSMAD1/5/9 pathway as the underlying mechanism mediating the anti-senescence effects of enalapril with comprehensive evaluation both in vitro and in vivo.

      Thank you for your patient reading and great efforts to advance our research! Your comments are shown in bold font below, and specific concerns have been numbered. Our point-by-point answers are provided in standard blue font, with all modifications and additions to the MS highlighted in red text.

      Weaknesses:

      (1) The major weakness of the study is the in vivo data. Despite the evidence shown in the in vitro study, there is no data to show that blocking the pSmad1/5/9 pathway is able to attenuate the anti-aging effects of enalapril in the mice. In addition, the aging phenotypes mitigation by enalapril is not evidenced by the extension of lifespan.

      Many thanks for your careful reading and valuable comments! We fully agree with this comment. In accordance with your suggestion, we administered LDN193189 to investigate its suppressive effects on pSmad1/5/9 signaling in vivo. Notably, pharmacological inhibition of pSmad1/5/9 resulted in upregulation of enalapril-suppressed SASP factors, while conversely leading to a marked decrease in the expression of downstream antioxidant genes across multiple organ systems (Revised Fig. S7). These analyses and corresponding sentences have been added to the Results section of the revised MS (Revised Fig. S7, Lines 222–223, 444–448).

      Additionally, aging-related behavioral phenotypes were also examined following pSmad1/5/9 inhibition, including decreased muscle strength and endurance, impaired spatial memory and increased anxiety behaviors (Revised Fig. S8). These analyses and corresponding sentences have been added to the Results section of the revised MS (Revised Fig. S8, Lines 476–480). Collectively, these findings demonstrate that the anti-aging effects of enalapril in mice are mediated through the pSmad1/5/9 pathway.

      In this study, we focused exclusively on assessing the improvement in the health status of aged mice, which indicates that enalapril can extend the healthspan of aged mice. While we agree that lifespan extension is an important indicator of anti-aging potential, recent studies have emphasized that healthspan, rather than lifespan alone, provides a more relevant and translational measure of aging interventions, particularly in the context of chronic disease and quality of life in aged individuals (Kennedy et al., 2014; Lopez-Otin et al., 2023). Moreover, given the strong influence of genetic background, environmental factors and stochastic events on lifespan, focusing on functional rejuvenation and delayed onset of aging-related pathologies may offer a more practical and mechanistically informative approach. Our study aims to elucidate how enalapril enhances healthy phenotypes in aged mice; however, we acknowledge the critical need for direct lifespan evaluation and intend to address this limitation in subsequent research. We sincerely hope that these explanations address your concerns.

      (2) If it is necessary to show that NAC is able to attenuate enalapril effects in the aging mice. In addition, it would be beneficial to test if enalapril is able to achieve similar rescue in a premature aging mouse model.

      Thanks for your suggestion. We apologize for any confusion that may have arisen due to the wording in the original manuscript. N-acetylcysteine (NAC) is widely reported as an antioxidant that scavenges reactive oxygen species (ROS) (Huang et al., 2020; Zafarullah et al., 2003). In our study, enalapril was also observed to reduce ROS levels. Therefore, NAC is unlikely to antagonize the effects of enalapril in this context, as both compounds act in a similar direction with respect to oxidative stress mitigation. To avoid potential misunderstanding, we have carefully reviewed the relevant statements in the MS and revised the text to clarify this point.

      We sincerely appreciate this valuable suggestion to evaluate enalapril in a premature aging mouse model. However, premature aging mouse models represent a pathological form of aging, whereas the naturally aged mouse models used in our study reflect physiological aging processes. While we observed beneficial effects of enalapril in naturally aged mice, these effects may not necessarily extend to premature aging models due to fundamental differences in the underlying mechanisms and progression of aging. Natural aging is characterized by the gradual accumulation of cellular damage, driven by multifactorial processes such as inflammaging and mitochondrial dysfunction. In this context, enalapril appears effective, in part by modulating SASP factors and reducing oxidative stress through the BMP-Smad signaling axis (Revised Fig. 4, 5) (Lopez-Otin et al., 2023). In contrast, premature aging models are driven by distinct mechanisms such as nuclear lamina defects, which may not respond similarly to modulation of the BMP-Smad axis. Moreover, genetic background, strain variability, and specific model characteristics can significantly influence treatment outcomes (Mitchell et al., 2016). For instance, rapamycin extends lifespan in wild-type mice but shows limited effects on aging, underscoring the challenge of extrapolating findings across distinct aging models (Neff et al., 2013). We sincerely hope that these explanations address your concerns. Thank you again for your great efforts in advancing our research!

      Reviewer #2 (Public review):

      This manuscript presents an interesting study of enalapril for its potential impact on senescence through the activation of Smad1/5/9 signaling with a focus on antioxidative gene expression. Repurposing enalapril in this context provides a fresh perspective on its effects beyond blood pressure regulation. The authors make a strong case for the importance of Smad1/5/9 in this process, and the inclusion of both in vitro and in vivo models adds value to the findings. Below, I have a few comments and suggestions which may help improve the manuscript.

      We appreciate your great efforts in advancing our research! Your comments are shown in bold font below, and specific concerns have been numbered. Our point-by-point answers are provided in standard blue font, with all modifications and additions to the MS highlighted in red text.

      (1) A major finding in the study is that phosphorylated Smad1/5/9 mediates the effects of enalapril. However, the manuscript focused on the Smad pathway relatively abruptly, and the rationale behind targeting this specific pathway is not fully explained. What makes Smad1/5/9 particularly relevant to the context of this study?

      Thank you for your informative guidance, and we regret the unclear description. As stated in the MS, after we found that enalapril could improve the cellular senescence phenotype, we screened and examined key targets in important aging-related signaling pathways, such as AKT, mTOR, ERK, Smad2/3 and Smad1/5/9 (Revised Fig. S2A, Revised Fig. 2A). We found that only the phosphorylation levels of Smad1/5/9 increased significantly after enalapril treatment. Therefore, the subsequent focus of this study is on pSmad1/5/9. We sincerely hope that these explanations address your concerns.

      (2) Furthermore, their finding that activation of Smad1/5/9 leads to a reduction of senescence appears somewhat contradictory to the established literature on Smad1/5/9 in senescence. For instance, studies have shown that BMP4-induced senescence involves the activation of Smad1/5/8 (Smad1/5/9), leading to the upregulation of senescence markers like p16 and p21 (JBC, 2009, 284, 12153). Similarly, phosphorylated Smad1/5/8 has been shown to promote and maintain senescence in Ras-activated cells (PLOS Genetics, 2011, 7, e1002359). Could the authors provide more detailed mechanistic insights into why enalapril seems to reverse the typical pro-senescent role of Smad1/5/9 in their study?

      Many thanks for your helpful comments! The downstream regulatory network of BMP-pSmad1/5/9 is highly complex. The BMP-SMAD-ID axis has been mentioned in many studies, and its downstream signaling inhibits the expression of p16 and p21 (Hayashi et al., 2016; Ying et al., 2003). Additionally, studies have also found that the Smad1-Stat1-P21 axis inhibits osteoblast senescence (Xu et al., 2022). In our study, enalapril was found to increase the expression of ID1, which is a classic downstream target of pSmad1/5/9 (Genander et al., 2014). Therefore, pSmad1/5/9 inhibits cellular senescence markers such as p16, p21 and SASP through ID1, thereby promoting cell proliferation (Revised Fig. 3). Furthermore, we also found that pSmad1/5/9 increases the expression of antioxidant genes and reduces ROS levels, exerting antioxidant effects (Revised Fig. 4). Together, ID1 and antioxidant genes enable pSmad1/5/9 to exert its anti-senescence effects. We sincerely hope that these explanations address your concerns.

      (3) While the authors showed that enalapril increases pSmad1/5/9 phosphorylation, what are the expression levels of other key and related factors like Smad4, pSmad2, pSmad3, BMP2, and BMP4 in both senescent and non-senescent cells? These data will help clarify the broader signaling effects.

      Thanks for your insightful suggestions. We observed an increase in pSmad1/5/9 and Smad4 expression, while the levels of pSmad2 and pSmad3 remained unchanged after enalapril treatment (Revised Fig. 2A). Consistently, we found that the levels of pSmad1/5/9 and Smad4 were markedly reduced in senescent cells, aligning with the upregulation of these proteins by enalapril (Revised Fig. S2B). In contrast, pSmad2 and pSmad3 showed a slight increase during senescence, while BMP2 and BMP4 were slightly decreased, though these changes were not statistically significant (Revised Fig. S2B). These findings suggest that enalapril primarily exerts its effects by enhancing pSmad1/5/9 and Smad4 levels, thereby regulating downstream target genes and contributing to the restoration of a more youthful cellular state. These analyses and corresponding sentences have been added to the Results section of the revised MS (Revised Fig. S2B, Lines 303–306, 311–313).

      (4) They used BMP receptor inhibitor LDN193189 to pharmacologically inhibit BMP signaling, but it would be more convincing to also include genetic validation (e.g., knockdown or knockout of BMP2 or BMP4). This will help confirm that the observed effects are truly due to BMP-Smad signaling and not off-target effects of the pharmacological inhibitor LDN.

      Many thanks for your careful reading and valuable comments! We used shRNA to knock down the BMP receptor BMPR1A, which led to a reduction in Smad1/5/9 phosphorylation (Revised Fig. S4D, E). This was accompanied by senescence-associated phenotypes, including increased expression of p16 and SA-β-gal and decreased Ki67 staining (Revised Fig. S4F, G). Notably, the addition of enalapril failed to reverse these senescence phenotypes under BMPR1A knockdown conditions, mirroring the results observed with the BMP receptor inhibitor LDN193189 (Revised Fig. S4F, G, Revised Fig. 2F, G). Furthermore, knockdown of BMPR1A also resulted in a marked decrease in the expression of downstream targets, such as ID1 and antioxidative genes (Revised Fig. S4D). These findings strongly support the notion that enalapril exerts its anti-senescence effects through BMP-Smad signaling. These analyses and corresponding sentences have been added to the Results section of the revised MS (Revised Fig. S4D–G, Lines 323–329, 335–337, 348–351, 416–418).

      (5) I don't see the results on the changes in senescence markers p16 and p21 in the mouse models treated with enalapril. Similarly, the effects of enalapril treatment on some key SASP factors, such as TNF-α, MCP-1, IL-1β, and IL-1α, are missing, particularly in serum and tissues. These are important data to evaluate the effect of enalapril on senescence.

      Thanks for your comments. As for the markers p16 and p21, we observed no change in p16, while the changes in p21 varied across different organs and tissues. Nevertheless, behavioral experiments and physiological and biochemical indicators at the individual level consistently demonstrated the significant anti-aging effects of enalapril (Revised Fig. 6).

      We also examined the changes in SASP factors in the serum of mice after enalapril treatment. Notably, SASP factors such as CCL (MCP), CXCL and TNFRS11B showed significant decreases (Revised Fig. 5C). The expression changes of SASP factors varied across different organs. In the liver, kidneys and spleen, the expression of IL1a and IL1b decreased, while TNFRS11B expression decreased in both the liver and muscles (Revised Fig. 5B). Additionally, CCL (MCP) levels decreased in all organs (Revised Fig. 5B). We sincerely hope that these explanations address your concerns.

      (6) Given that enalapril is primarily known as an antihypertensive, it would be helpful to include data on how it affects blood pressure in the aged mouse models, such as systolic and diastolic blood pressure. This will clarify whether the observed effects are independent of or influenced by changes in blood pressure.

      Thanks for your comments. While enalapril is primarily recognized for its antihypertensive properties, in our experimental setting involving aged, normotensive mice, we did not observe notable changes in systolic or diastolic blood pressure following enalapril administration. This observation aligns with previous reports indicating that enalapril does not significantly affect blood pressure in similar non-hypertensive aging models (Keller et al., 2019). Based on these findings, we cautiously interpret that the beneficial effects of enalapril observed in our study are unlikely to be driven by changes in blood pressure. We sincerely hope that these explanations address your concerns. Again, thank you for the constructive comments to advance the understanding of our work!

      Reviewer #1 (Recommendations for the authors):

      This is an interesting study that reveals enalapril is able to elevate the pSmad1/5/9 pathway to reduce ROS and inflammation to improve the health status in vitro and in vivo. While the pathway is clearly shown in cells to be involved in the enalapril-mediated mitigation of aging, little was done to demonstrate this pathway is responsible for the in vivo effects in the physiological improvements. This can be done by ROS-reduction chemicals such as NAC and also the use of BMP receptor inhibitor LDN193189 (LDN). It is critical to show the lifespan extension in enalapril-treated animals given the significantly improved physiological functions.

      Thanks very much for your constructive recommendations. This part has already been addressed in our response to the public review.

      Reviewer #2 (Recommendations for the authors):

      The term "anti-aging" appears frequently throughout the manuscript, including in the title. However, the study doesn't directly address lifespan or a comprehensive range of aging symptoms, which are also difficult to define and measure. Many of the observed effects appeared to be driven by senescence. To be more accurate, I recommend avoiding terms like "anti-aging" and "mitigates aging", and instead replacing them with more specific phrases such as "anti-senescence", "senescence reduction/suppression", or "mitigates age-related symptoms" to better reflect the scope of the study and avoid overstating the findings.

      Thanks very much for your constructive recommendations. In accordance with your suggestion, we have revised all uses of the term “aging” in the MS. To facilitate review, all changes have been clearly marked in red text.

      Please provide detailed information on the antibodies used, particularly those targeting pSmad1/5/9 and other Smads.

      Thanks for your helpful comment. In response, we have now provided detailed information regarding the antibodies used in this study in Revised Table S4 (Revised MS, Page 120–121).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This fundamental study identifies a new mechanism that involves a mycobacterial nucleomodulin manipulation of the host histone methyltransferase COMPASS complex to promote infection. Although other intracellular pathogens are known to manipulate histone methylation, this is the first report demonstrating the specific targeting of the COMPASS complex by a pathogen. The rigorous experimental design using state-of-the art bioinformatic analysis, protein modeling, molecular and cellular interaction, and functional approaches, culminating with in vivo infection modeling, provides convincing, unequivocal evidence that supports the authors' claims. This work will be of particular interest to cellular microbiologists working on microbial virulence mechanisms and effectors, specifically nucleomodulins, and cell/cancer biologists that examine COMPASS dysfunction in cancer biology. 

      Strengths: 

      (1) The strengths of this study include the rigorous and comprehensive experimental design that involved numerous state-of-the-art approaches to identify potential nucleomodulins, define molecular nucleomodulin-host interactions, cellular nucleomodulin localization, intracellular survival, and inflammatory gene transcriptional responses, and confirmation of the inflammatory and infection phenotype in a small animal model. 

      (2) The use of bioinformatic, cellular, and in vivo modeling that are consistent and support the overall conclusions is a strength of the study. In addition, the rigorous experimental design and data analysis, including the supplemental data provided, further strengthen the evidence supporting the conclusions. 

      Weaknesses: 

      (1) This work could be stronger if the MgdE-COMPASS subunit interactions that negatively impact COMPASS complex function were better defined. Since the COMPASS complex consists of many enzymes, examining the functional impact on each of the components would be interesting. 

      We thank the reviewer for this insightful comment. Biochemical assays could help to interpret the functional impact of the MgdE interaction on each of the components. However, purifying the COMPASS complex would itself be difficult, given the complexity of the full complex, its dynamic structural properties, and its limited solubility.

      (2) Examining the impact of WDR5 inhibitors on histone methylation, gene transcription, and mycobacterial infection could provide additional rigor and provide useful information related to the mechanisms and specific role of WDR5 inhibition on mycobacterial infection. 

      We thank the reviewer for the comment. A previous study showed that WIN-site inhibitors, such as compound C6, can displace WDR5 from chromatin, leading to a reduction in global H3K4me3 levels and suppression of immune-related gene expression (Hung et al., Nucleic Acids Res, 2018; Bryan et al., Nucleic Acids Res, 2020). These results closely mirror the functional effects we observed for MgdE, suggesting that MgdE may act as a functional mimic of WDR5 inhibition. This supports our proposed model in which MgdE disrupts COMPASS activity by targeting WDR5, thereby dampening host pro-inflammatory responses.

      (3) The interaction between MgdE and COMPASS complex subunit ASH2L is relatively undefined, and studies to understand the relationship between WDR5 and ASH2L in COMPASS complex function during infection could provide interesting molecular details that are undefined in this study. 

      We thank the reviewer for the comment. In this study, we constructed single and multiple point mutants of MgdE at residues S<sup>80</sup>, D<sup>244</sup>, and H<sup>247</sup> to identify key amino acids involved in its interaction with ASH2L (Figure 5A and B; Figure S5). However, these mutations did not disrupt the interaction between MgdE and ASH2L, suggesting that more residues are involved in the interaction.

      ASH2L and WDR5 function cooperatively within the WRAD module to stabilize the SET domain and promote H3K4 methyltransferase activity under physiological conditions (Couture and Skiniotis, Epigenetics, 2013; Qu et al., Cell, 2018; Rahman et al., Proc Natl Acad Sci U S A, 2022). ASH2L interacts with RbBP5 via its SPRY domain, whereas WDR5 bridges MLL1 and RbBP5 through the WIN and WBM motifs (Chen et al., Cell Res, 2012; Park et al., Nat Commun, 2019). The interaction status between ASH2L and WDR5 during mycobacterial infection could not be determined in our current study.

      (4) The AlphaFold prediction results for all the nuclear proteins examined could be useful. Since the interaction predictions with COMPASS subunits range from 0.77 for WDR5 and 0.47 for ASH2L, it is not clear how the focus on COMPASS complex over other nuclear proteins was determined.  

      We thank the reviewer for the comment. We employed AlphaFold to predict the interactions between MgdE and the major nuclear proteins. This screen identified several subunits of the SET1/COMPASS complex as high-confidence candidates for interaction with MgdE (Supplementary Figure 4A). This result is consistent with a proteomic study by Penn et al. which reported potential interactions between MgdE and components of the human SET1/COMPASS complex based on affinity purification-mass spectrometry analysis (Penn et al., Mol Cell, 2018).

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript by Chen et al addresses an important aspect of pathogenesis for mycobacterial pathogens, seeking to understand how bacterial effector proteins disrupt the host immune response. To address this question, the authors sought to identify bacterial effectors from M. tuberculosis (Mtb) that localize to the host nucleus and disrupt host gene expression as a means of impairing host immune function. 

      Strengths: 

      The researchers conducted a rigorous bioinformatic analysis to identify secreted effectors containing mammalian nuclear localization signal (NLS) sequences, which formed the basis of quantitative microscopy analysis to identify bacterial proteins that had nuclear targeting within human cells. The study used two complementary methods to detect protein-protein interaction: yeast two-hybrid assays and reciprocal immunoprecipitation (IP). The combined use of these techniques provides strong evidence of interactions between MgdE and SET1 components and suggests that the interactions are, in fact, direct. The authors also carried out a rigorous analysis of changes in gene expression in macrophages infected with the mgdE mutant BCG. They found strong and consistent effects on key cytokines such as IL6 and CSF1/2, suggesting that nuclear-localized MgdE does, in fact, alter gene expression during infection of macrophages. 

      Weaknesses: 

      There are some drawbacks in this study that limit the application of the findings to M. tuberculosis (Mtb) pathogenesis. The first concern is that much of the study relies on ectopic overexpression of proteins either in transfected non-immune cells (HEK293T) or in yeast, using 2-hybrid approaches. Some of their data in 293T cells is hard to interpret, and it is unclear if the protein-protein interactions they identify occur during natural infection with mycobacteria. The second major concern is that pathogenesis is studied using the BCG vaccine strain rather than virulent Mtb. However, overall, the key findings of the paper - that MgdE interacts with SET1 and alters gene expression are well-supported. 

      We thank the reviewer for the comment. We agree that ectopic overexpression cannot completely reflect the natural state, although these approaches have been adopted in many similar experiments (Drerup et al., Molecular plant, 2013; Chen et al., Cell host & microbe, 2018; Ge et al., Autophagy, 2021). Further, an MgdE localization experiment using Mtb-infected macrophages will be performed to strengthen the evidence under natural infection.

      We agree with the reviewer that the BCG strain cannot fully recapitulate the pathogenicity or immunological complexity of M. tuberculosis infection. We employed BCG as a biosafe surrogate model, as has been accepted in many related studies (Wang et al., Nat Immunol, 2025; Wang et al., Nat Commun, 2017; Péan et al., Nat Commun, 2017; Li et al., J Biol Chem, 2020).

      Reviewer #3 (Public review): 

      In this study, Chen L et al. systematically analyzed the mycobacterial nucleomodulins and identified MgdE as a key nucleomodulin in pathogenesis. They found that MgdE enters into host cell nucleus through two nuclear localization signals, KRIR<sup>108-111</sup> and RLRRPR<sup>300-305</sup>, and then interacts with COMPASS complex subunits ASH2L and WDR5 to suppress H3K4 methylation-mediated transcription of pro-inflammatory cytokines, thereby promoting mycobacterial survival. This study is potentially interesting, but there are several critical issues that need to be addressed to support the conclusions of the manuscript.

      (1) Figure 2: The study identified MgdE as a nucleomodulin in mycobacteria and demonstrated its nuclear translocation via dual NLS motifs. The authors examined MgdE nuclear translocation through ectopic expression in HEK293T cells, which may not reflect physiological conditions. Nuclear-cytoplasmic fractionation experiments under mycobacterial infection should be performed to determine MgdE localization. 

      We thank the reviewer for the comment. The MgdE localization experiment using Mtb-infected macrophages will be performed.

      (2) Figure 2F: The authors detected MgdE-EGFP using an anti-GFP antibody, but EGFP as a control was

      We thank the reviewer for pointing this out. The new uncropped blots containing the EGFP band will be provided in Supplementary Information.

      (3) Figure 3C-3H: The data showing that the expression of all detected genes in 24 h is comparable to that in 4 h (but not 0 h) during WT BCG infection is beyond comprehension. The issue is also present in Figure 7C, Figure 7D, and Figure S7. Moreover, since Il6, Il1β (proinflammatory), and Il10 (anti-inflammatory) were all upregulated upon MgdE deletion, how do the authors explain the phenomenon that MgdE deletion simultaneously enhanced these gene expressions? 

      We thank the reviewer for the comment. A relative quantification method was used in our qPCR experiments to normalize the WT expression levels in Figure 3C–3H, Figure 7C, 7D, and Figure S7. 
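      As an illustration only, the sketch below shows how one widely used relative quantification scheme (the 2^-ΔΔCt approach) behaves when the WT sample at each time point serves as the calibrator. The response does not state which scheme or calibrator was actually used, so the choice of method, the function name, and all Ct values here are assumptions for illustration.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change by the 2^-ddCt scheme, relative to a calibrator sample.

    ct_target / ct_reference: Ct of the gene of interest and of the
    housekeeping gene in the sample being measured.
    ct_target_cal / ct_reference_cal: the same two Ct values in the
    calibrator sample (assumed here to be WT at the same time point).
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values, not taken from the manuscript.
# WT is its own calibrator at each time point, so it is 1.0 by construction.
wt_4h = relative_expression(26.0, 18.0, 26.0, 18.0)
wt_24h = relative_expression(24.0, 18.0, 24.0, 18.0)

# A mutant sample at 24 h, normalized to WT at 24 h.
mutant_24h = relative_expression(22.0, 18.0, 24.0, 18.0)
```

      Under this assumed normalization, WT reads as 1.0 at every time point regardless of its absolute Ct, so a plot of WT values across time points would look flat even if absolute expression changed.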

      The concurrent induction of both types of cytokines likely represents a dynamic host strategy to fine-tune immune responses during infection. This interpretation is supported by previous studies (Podleśny-Drabiniok et al., Cell Rep, 2025; Cicchese et al., Immunological Reviews, 2018).

      (4) Figure 5: The authors confirmed the interactions between MgdE and WDR5/ASH2L. How does the interaction between MgdE and WDR5 inhibit COMPASS-dependent methyltransferase activity? Additionally, the precise MgdE-ASH2L binding interface and its functional impact on COMPASS assembly or activity require clarification. 

      We thank the reviewer for this insightful comment. We cautiously speculate that the MgdE interaction inhibits COMPASS-dependent methyltransferase activity by interfering with the integrity and stability of the COMPASS complex. Accordingly, we have incorporated the following discussion into the revised manuscript (Lines 298-310):

      “The COMPASS complex facilitates H3K4 methylation through a conserved assembly mechanism involving multiple core subunits. WDR5, a central scaffolding component, interacts with RbBP5 and ASH2L to promote complex assembly and enzymatic activity (Qu et al., 2018; Wysocka et al., 2005). It also recognizes the WIN motif of methyltransferases such as MLL1, thereby anchoring them to the complex and stabilizing the ASH2L-RbBP5 dimer (Hsu et al., Cell, 2018). ASH2L further contributes to COMPASS activation by interacting with both RbBP5 and DPY30 and by stabilizing the SET domain, which is essential for efficient substrate recognition and catalysis (Qu et al., Cell, 2018; Park et al., Nat Commun, 2019). Our work shows that MgdE binds both WDR5 and ASH2L and inhibits the methyltransferase activity of the COMPASS complex. Site-directed mutagenesis revealed that residues D<sup>224</sup> and H<sup>247</sup> of MgdE are critical for WDR5 binding, as the double mutant MgdE-D<sup>224</sup>A/H<sup>247</sup>A fails to interact with WDR5 and shows diminished suppression of H3K4me3 levels (Figure 5D).”

      Regarding the precise MgdE-ASH2L binding interface, we attempted to identify the key interaction site by introducing point mutations into ASH2L. However, these mutations did not disrupt the interaction (Figure 5A and B; Figure S5), suggesting that more residues are involved in the interaction.

      (5) Figure 6: The authors proposed that the MgdE-regulated COMPASS complex-H3K4me3 axis suppresses pro-inflammatory responses, but the presented data do not sufficiently support this claim. H3K4me3 inhibitor should be employed to verify cytokine production during infection. 

      We thank the reviewer for the comment. We have now revised the description in lines 824-825 “MgdE may suppresses COMPASS complex-mediated inflammatory responses by inhibiting H3K4 methylation” and in lines 219-220 “MgdE suppresses host inflammatory responses probably by inhibition of COMPASS complex-mediated H3K4 methylation.”

      (6) There appears to be a discrepancy between the results shown in Figure S7 and its accompanying legend. The data related to inflammatory responses seem to be missing, and the data on bacterial colonization are confusing (bacterial DNA expression or CFU assay?). 

      We thank the reviewer for the comment. Figure S7 specifically addresses the effect of MgdE on bacterial colonization in the spleens of infected mice, which was assessed by quantitative PCR rather than by CFU assay. 

      We have now revised the legend of Figure S7 as below (Lines 934-938):

      “MgdE facilitates bacterial colonization in the spleens of infected mice. Bacterial colonization was assessed in splenic homogenates from infected mice (as described in Figure 7A) by quantifying bacterial DNA using quantitative PCR at 2, 14, 21, 28, and 56 days post-infection.”

      (7) Line 112-116: Please provide the original experimental data demonstrating nuclear localization of the 56 proteins harboring putative NLS motifs. 

      We thank the reviewer for the comment. We will provide these data in the new Supplementary Table 2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Asthenospermia, characterized by reduced sperm motility, is one of the major causes of male infertility. The "9 + 2" arranged MTs and over 200 associated proteins constitute the axoneme, the molecular machine for flagellar and ciliary motility. Understanding the physiological functions of axonemal proteins, particularly their links to male infertility, could help uncover the genetic causes of asthenospermia and improve its clinical diagnosis and management. In this study, the authors generated Ankrd5 null mice and found that ANKRD5-/- males exhibited reduced sperm motility and infertility. Using FLAG-tagged ANKRD5 mice, mass spectrometry, and immunoprecipitation (IP) analyses, they confirmed that ANKRD5 is localized within the N-DRC, a critical protein complex for normal flagellar motility. However, transmission electron microscopy (TEM) and cryo-electron tomography (cryo-ET) of sperm from Ankrd5 null mice did not reveal any structural abnormalities.

      Strengths:

      The phenotypes observed in ANKRD5-/- mice, including reduced sperm motility and male infertility, are convincing. The authors demonstrated that ANKRD5 is an N-DRC protein that interacts with TCTE1 and DRC4. Most of the experiments are thoughtfully designed and well executed.

      Weaknesses:

      The cryo-FIB and cryo-ET analyses require further investigation, as detailed below. The molecular mechanism by which the loss of ANKRD5 affects sperm flagellar motility remains unclear. The current conclusion that Ankrd5 knockout reduces axoneme stability is not well-supported. Specifically, are other axonemal proteins diminished in Ankrd5 knockout sperm? Conducting immunofluorescence analyses and revisiting the quantitative proteomics data may help address these questions.

      Reviewer #2 (Public review):

      Summary:

      The manuscript investigates the role of ANKRD5 (ANKEF1) as a component of the N-DRC complex in sperm motility and male fertility. Using Ankrd5 knockout mice, the study demonstrates that ANKRD5 is essential for sperm motility and identifies its interaction with N-DRC components through IP-mass spectrometry and cryo-ET. The results provide insights into ANKRD5's function, highlighting its potential involvement in axoneme stability and sperm energy metabolism.

      Strengths:

      The authors employ a wide range of techniques, including gene knockout models, proteomics, cryo-ET, and immunoprecipitation, to explore ANKRD5's role in sperm biology.

      Weaknesses:

      (1) Limited Citations in Introduction: Key references on the role of N-DRC components (e.g., DRC1, DRC2, DRC3, DRC5) in male infertility are missing, which weakens the contextual background.

      (2) Lack of Functional Insights: While interacting proteins outside the N-DRC complex were identified, their potential roles and interactions with ANKRD5 are not adequately explored or discussed.

      (3) Mitochondrial Function Uncertainty: Immunofluorescence suggests possible mitochondrial localization for ANKRD5, but experiments on its role in energy metabolism (e.g., ATP production, ROS) are insufficient, especially given the observed sperm motility defects.

      (4) Glycolysis Pathway Impact: Proteomic analysis indicates glycolysis pathway disruptions in Ankrd5-deficient sperm, but the link between these changes and impaired motility is not well explained.

      (5) Cryo-ET Data Limitations: The structural analysis of the DMT lacks clarity on how ANKRD5 influences N-DRC or RS3. The low quality of RS3 data hinders the interpretation of ANKRD5's impact on axoneme structure.

      (6) Discussion of Findings: The manuscript could benefit from a deeper discussion on the broader implications of ANKRD5's interactions and its role in sperm energy metabolism and motility mechanisms.

      Reviewer #1 (Recommendations for the authors):

      EMD-35210/35211 are 16-nm maps while the Ankrd5 null map is 8-nm repeat. To generate a difference map, the authors should use maps of the same periodicity.

      Thank you for your suggestion. We have replaced the old 16-nm maps with an 8-nm map and updated the images (Fig. 7). The 8-nm repeat DMT density map we used was obtained by summing two 16-nm repeat DMT maps staggered 8 nm apart from each other (EMD-35229). Replacing the 16-nm repeat DMT density map with the 8-nm repeat map has no effect on our scientific findings or experimental conclusions.
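      A toy one-dimensional sketch of why this summation gives an 8-nm periodicity is shown below. It is not the authors' processing pipeline: real maps are three-dimensional volumes handled with dedicated cryo-ET software, and the profile values, the 2-nm sampling step, and the function name are hypothetical.

```python
def sum_staggered(profile, shift):
    """Sum a periodic profile with a copy of itself shifted by `shift` samples.

    The profile is treated as one full repeat, so the shift wraps around.
    """
    n = len(profile)
    return [profile[i] + profile[(i + shift) % n] for i in range(n)]


# Hypothetical density along the filament axis for one 16-nm repeat,
# sampled every 2 nm (8 samples). Values are made up for illustration.
profile_16nm = [5, 1, 0, 2, 3, 1, 0, 2]

# Staggering by 8 nm corresponds to a shift of 4 samples at 2 nm per sample.
profile_8nm = sum_staggered(profile_16nm, shift=4)
```

      In the summed profile the first and second halves are identical, so features that differed between the two 8-nm halves of the 16-nm repeat are averaged together and the result repeats every 8 nm.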

      "We were able to detect the N-DRC structure in WT sperm, but we failed to find the density of N-DRC adjacent to RS3 in Ankrd5 null sperm". Do the authors imply that the N-DRC is lost in Ankrd5 null sperm? To draw a conclusion, they need to compare the 96-nm map of WT sperm axoneme with that of Ankrd5 null sperm axoneme. Quantitative proteomics shows that the levels of most N-DRC components in Ankrd5 null sperm are comparable with those of WT sperm. Why are the quantitative proteomics results not consistent with the structural observation?

      We apologize for this imprecise description, which led to misunderstanding. Our intention was to say that the quality of the density map makes the N-DRC difficult to recognize, not that the N-DRC has disappeared. In addition, our attempts to classify the 96-nm-repeat DMT structure during data processing failed, and during classification we found that the RS density was poor. We have therefore changed the figure and the description.

      We have changed the description in the text: "During the STA process, many particles were misaligned or deformed in the classification results, revealing various degrees of deformation, particularly affecting the B-tube (Figure 9, Fig. S9E). We could retain only ~10% of the DMT particles to obtain the final density map for ANKRD5-KO sperm (Fig. S9E), whereas ~70% were usable in WT dataset as reported previously [59]. The mutant DMT density map also displayed roughness at its periphery, indicating substantial structural heterogeneity (Fig. S9E). Even after discarding a large fraction of deformed particles, the final density map still showed evident artifacts, implying that although the mutant DMT preserves the fundamental features of both tubes, its shape is highly heterogeneous (Fig. S9E). Furthermore, attempts to classify the 96-nm repeats did not yield a clear density for radial spokes (RSs) (Fig. S9F), indicating that ANKRD5 deficiency may affect the stability of other accessory structures, such as RSs [24-26]. In the raw tomograms, RSs in ANKRD5-KO sperm appeared less regularly arranged than those in WT (Fig. S9A and C)."

      Figure S9. The states of DMT particles in sperm of Ankrd5-KO mouse. (A) and (C) Tomogram slices of WT and Ankrd5-KO in Dynamo (The data for WT mouse sperm was EMPIARC-200007). DMT and RS are marked with white dashed lines and white arrows, respectively. (B) and (D) Comparison of DMT particle states between WT and Ankrd5-KO in Dynamo. The visual angles of the DMT particles shown in (B) and (D) show that the DMT fibers within the white box in (A) and (B) are divided equally into 10 slices along the direction of the white arrow, respectively. The DMT particle shapes of WT and Ankrd5-KO are marked by white dashed lines on the right of (B) and (D). The white arrow in (D) identifies the junction of A-tube and B-tube that is suspected to be disconnected. (E) Deformed particles discarded in 3D classification and final aligned DMT artifacts. (F) 3D classification of attempted RS locations.

      In the process of obtaining DMT with a period of 8nm, we discarded about 90% of the particles (some were mis-aligned particles and some were deformed particles). Although the final DMT density showed complete A-tube and B-tube, both the particles in our calculation process and the projection of the final structure showed strong particle heterogeneity.

      Our results show that in ANKRD5-KO mice the A-tube and B-tube of the sperm DMT are not apparently affected, although the DMTs in the original tomograms were not smooth. We speculate that loss of ANKRD5 may reduce the interaction between the N-DRC and neighboring DMTs, resulting in nonuniform force on the axoneme during sperm swimming, which may limit our ability to obtain an average structure of the more dynamic components (RS, N-DRC, ODA, IDA). Therefore, when trying to classify 96-nm-repeat DMTs, we could see only the density of suspected RS3 and RS2 and could not obtain a confident 96-nm-repeat DMT density, which makes it difficult to further discuss the effects of ANKRD5 on RS3 and the N-DRC. To test this conjecture, we further classified the density of suspected RS3, and the results exhibited a variety of mixed states (Fig. S9). To avoid confusion, we have removed the discussion of RS3 and the related images from the original text.

      It's not clear whether N-DRC proteins and ODA, IDA, RS proteins are affected in DMT of Ankrd5 null sperm. Immunofluorescence staining would help to resolve this problem.

      Thank you for your suggestion. The levels of N-DRC proteins and ODA, IDA, RS were detected by immunofluorescence, and no difference was found between ANKRD5-null sperm and control. We added figure S6 as a new figure and added the following description in red font on page 7 of the article:

      Figure S6. Immunofluorescence results of ANKRD5-null sperm and control. DRC11 serves as a marker protein for N-DRC (nexin-dynein regulatory complex), NME5 as a marker for RS (radial spoke), DNALI1 as a marker for IDA (inner dynein arm), and DNAI1 as a marker for ODA (outer dynein arm).

      In addition, ODA and RS were also marked in the figure when we further analyzed the Cryo-ET data (Figure 7 and Figure S9).

      Does Ankrd5 express in other cilia cells except for sperm?

      We stained mouse respiratory cilia using immunofluorescence and found that the protein was also expressed in mouse respiratory cilia. To support this finding, we added Figure S3 as a new figure and included a description in red font on page 6 of the article.

      Page 7, "However, in the process of manual selection of DMT fibers, we found that they were not as smooth as WT particles." This description is too subjective. Please show the data.

      Thank you for your suggestion. We have added a supplementary figure showing the difference between mutant samples and WT samples during particle picking (Fig. S9).

      Abstract, "These findings establish that ANKRD5 is critical for maintaining axoneme stability, "Page 7, "This suggests that the knockout of Ankrd5 may affect the structural stability of the axoneme," I do not see direct evidence that Ankrd5 KO reduces the axoneme stability.

      Our phrasing was not sufficiently precise. These findings suggest that ANKRD5 plays a crucial role in limiting the relative sliding between adjacent microtubule doublets during axoneme bending, rather than directly contributing to the stability of the axoneme. This sentence has already been modified in the abstract and marked in red. We have added the description in the text: "These findings suggest that loss of ANKRD5 may weaken the N-DRC’s "car bumper" role, reducing the buffering effect between adjacent DMTs and thereby destabilizing axoneme structures during intense axoneme motility." and "To further investigate the RS, IDA, and ODA structures of the axonemes, we conducted immunofluorescence assays in both Ankrd5<sup>-/-</sup> mice and the control group. No significant differences were detected between the two groups (Fig. S6)."

      Page 8, "but our study offers new perspectives for male contraceptive research". Could the authors expand this a bit - how this study may offer new perspectives for male contraceptive research?

      We sincerely appreciate the reviewer's insightful feedback regarding the translational potential of our findings. This is indeed a critical aspect that we sought to highlight. In response, we have added a paragraph on page 9 (marked in red) to further emphasize this point. We have added the description in the text: "The potential for male contraceptive development arises from ANKRD5's critical structural role mediated through its ANK domain, which facilitates interaction with the N-DRC complex in sperm flagella. Recent structural evidence suggests the protein's positively charged surface may engage with glutamylated tubulin in adjacent microtubules[41], presenting a druggable interface. Targeted disruption of this interaction through small-molecule inhibitors could transiently impair sperm motility. Sperm function relies more on ANKRD5 than respiratory cilia, so inhibiting ANKRD5 has less impact on the latter. This makes ANKRD5 a promising drug target. This tissue-specific phenotypic uncoupling is not uncommon among axonemal-associated proteins, such as DNAH17 and IQUB[65,66]."

      Abstract, "reveals its interaction with TCTE1 and DRC4/GAS8", please provide the alias symbol DRC5 for TCTE1 for clarity.

      Thank you for your suggestion. We have revised the abstract by replacing "TCTE1" with "DRC5/TCTE1" to clarify the alias. The changes have been highlighted in red in the manuscript for easy reference.

      Introduction, "Fertilization relies on successful spermatogenesis and normal sperm motility (4), which occurs in the testes." Does spermatogenesis or normal sperm motility occur in the testes?

      Thank you for pointing out the ambiguity in the sentence. We have revised the sentence in the Introduction and highlighted it in red as follows: Fertilization relies on successful spermatogenesis and normal sperm motility.

      Introduction, "The axoneme exhibits a 9+2 microtubule doublet structure". The description is not accurate. The "2" are singlet microtubules.

      Thank you for your suggestion. We have revised the sentence to accurately describe the axoneme structure and highlight in red as follows: The axoneme features a 9+2 architecture, comprising nine doublet microtubules encircling a central pair of singlet microtubules, with the N-DRC forming cross-bridges between adjacent doublets.

      Page 4, "control sperm successfully fertilized both cumulus-intact eggs". "control" should be a capital "C".

      We thank the reviewer for noting this oversight. The correction has been implemented on page 5 with the term highlighted in red (now reading: "Control sperm successfully fertilized both cumulus-intact eggs"), and we have verified capitalization consistency throughout the manuscript.

      Page 6, "applied RELION, M, and other software". "other software" is not an appropriate description, please be precise.

      We have revised the description as suggested. Specifically, on page 7, the phrase "and other software" has been replaced with "Dynamo and Warp/M," and this change is highlighted in red for clarity.

      Reviewer #2 (Recommendations for the authors):

      Several components of the N-DRC complex (e.g., DRC1, DRC2, DRC3, DRC5) have been reported to be associated with male infertility in both humans and mice. However, the introduction lacks proper citations for these studies. Adding these references would provide a more comprehensive background for readers.

      Thank you for your suggestion to strengthen the research background by incorporating additional literature. We now cite more studies related to DRC1, DRC2, DRC3, and DRC5 in the background of the paper. We have rewritten and reorganized the last paragraph of the introduction, and the entire paragraph is highlighted in red. The content of the paragraph is as follows:

      "It was previously believed that N-DRC comprised 11 protein components[13,18]. However, a new component CCDC153 (DRC12) was found to interact with DRC1[19]. In situ cryoelectron tomography (cryo-ET) has significantly advanced understanding of the N-DRC architecture in Chlamydomonas, demonstrating that DRC1, DRC2/CCDC65, and DRC4/GAS8 constitute its core framework[16], while proteins DRC3/5/6/7/8/11 associate with this framework and engage with other axonemal complexes[20]. Biochemical experiments corroborate these findings and validate this structural model[12,21,22]. The N-DRC functions between the DMTs to convert sliding into axonemal bending motion by restricting the relative sliding of outer microtubule doublets[23,24,25]. Mutations of N-DRC subunits demonstrate that the structural integrity of the N-DRC is crucial for flagellar movements. Mutations in DRC1, DRC2/CCDC65, and DRC4/GAS8 are linked to ciliary motility disorders, causing primary ciliary dyskinesia (PCD)[12,26]. Biallelic truncating mutations in DRC1 induce multiple morphological abnormalities of sperm flagella (MMAF), including outer DMT disassembly, mitochondrial sheath disorganization, and incomplete axonemal structures in human sperm[22,27,28]. Similarly, CCDC65 loss disrupts N-DRC stability, leading to disorganized axonemes, global microtubule dissociation, and complete asthenozoospermia[12,29].  Homozygous frameshift mutations in DRC3 impair N-DRC assembly and intraflagellar transport (IFT), resulting in severe motility defects despite normal sperm morphology[30,31]. TCTE1 knockout mice maintain normal sperm axoneme structure but show impaired glycolysis, leading to reduced ATP levels, lower sperm motility, and male infertility[32]. Both Drc7 and Iqcg (Drc9) knockout mice exhibit disrupted '9+2' axonemal architecture, sperm immotility, and male infertility[21,33]. Drc7 knockout sperm also display head deformities and shortened tails[21]. 
Although the N-DRC is critical for sperm motility, whether additional regulators coordinate its function remains unclear. Our findings indicate that ANKRD5 (Ankyrin repeat domain 5; also known as ANK5 or ANKEF1) interacts with N-DRC structure, serving as an auxiliary element to facilitate collaboration among DRC members. The absence of ANKRD5 results in diminished sperm motility and consequent male infertility."

      While many N-DRC components were identified as interacting with ANKRD5, other proteins outside the N-DRC complex were also detected. Notably, GAS8 (DRC4) ranked 165th among the identified proteins. What are the functions of the higher-ranking proteins, and why do they interact with ANKRD5? Discussing their potential roles would enhance the mechanistic understanding of ANKRD5's function.

      We thank the reviewer for highlighting the importance of non-N-DRC proteins interacting with ANKRD5 (ANKEF1). Below, we provide a detailed analysis of the roles and interaction mechanisms of the top-ranked non-N-DRC proteins (Krt77, Rab2a, Gm7429) to elucidate their functional relevance to ANKRD5. We have added the following text to page 6 to clarify and highlight this in red:

      As for other proteins in the LC-MS results, KRT77 is a classic protein that maintains cytoskeletal stability. It may enhance the physical connection between the N-DRC and adjacent DMTs through interaction with ANKRD5. Recent studies indicate that ANKRD5, a newly identified component in the distal lobe of the N-DRC, has a positively charged surface, which may facilitate binding to glutamylated tubulin on adjacent DMTs[41]. Thus, KRT77 may also regulate its interaction with ANKRD5 via post-translational modifications (PTMs, e.g., phosphorylation), thereby strengthening sperm resistance to shear forces during flagellar movement. Rab family proteins participate in intraflagellar transport and membrane dynamics. RAB2A may promote targeted transport of ANKRD5 or other N-DRC components to axonemal assembly sites by recruiting vesicles, and its GTPase activity might link cellular signals to ANKRD5-mediated axoneme remodeling. However, the observed signals could be false positives due to nonspecific factors such as electrostatic adsorption, high-abundance protein interference, detergent-induced membrane disruption, or protein aggregation tendencies.

      The immunofluorescence localization of ANKRD5-Flag appears more aligned with the mitochondrial sheath rather than the axoneme. There is a finer red fluorescent signal extending from the mitochondrial sheath that might correspond to the axoneme. Could this suggest that ANKRD5 has a functional role in the mitochondria? While the authors measured ROS levels, this might not fully clarify whether ANKRD5 is involved in sperm energy metabolism. Considering the motility defects in Ankrd5 knockout mice, further experiments to explore ANKRD5's potential involvement in energy metabolism are necessary.

      The increased detection of ANKRD5 in the midpiece region of the sperm axoneme does not necessarily indicate its localization in mitochondria. Immunofluorescence signals of multiple axonemal Nexin-Dynein Regulatory Complex (N-DRC) components (e.g., TCTE1, DRC1, CCDC65, DRC3, GAS8, and DRC7) are also non-uniformly distributed along the entire flagellum[1]. Similar localization patterns are observed in other structural components, such as radial spoke protein NME5[2] and outer dynein arm protein DNAH5[3]. Furthermore, mitochondria are membrane-bound organelles, and ANKRD5 predominantly resides in the SDS-soluble fraction under varying lysis conditions, confirming its association with the axoneme rather than mitochondria. Thus, the spatial distribution of ANKRD5 does not support a functional role in mitochondria. Importantly, we validated intact mitochondrial function through measurements of reactive oxygen species (ROS) levels (Figure S5C, D), ATP content (Figure 6E), and mitochondrial membrane potential (Figure S5A, B).

      Proteomic analysis of Ankrd5-deficient sperm revealed disruptions in the glycolysis pathway. While these changes do not appear to affect ATP production, the mechanism by which these disruptions impact sperm motility remains unclear. Further investigation into how glycolysis pathway alterations contribute to impaired motility is warranted.

      We appreciate the reviewer's careful consideration of our proteomic data. However, our Gene Set Enrichment Analysis (GSEA) of glycolysis/gluconeogenesis pathways showed no significant enrichment (p-value=0.089, NES=0.708; Fig. 6D), which does not meet the statistical thresholds for biological significance (|NES|>1, p-value<0.05). This observation is further corroborated by our direct ATP measurements showing no difference between genotypes (Fig. 6E). We agree that further studies on metabolic regulation could be valuable, but current evidence does not support glycolysis disruption as a primary mechanism for the motility defects observed in Ankrd5-null sperm. This misinterpretation likely arose from the reviewer's overinterpretation of non-significant proteomic trends. We request that this specific claim be excluded from the assessment to avoid misleading readers.
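      The threshold check stated in this response can be written out explicitly. The sketch below simply encodes the two cutoffs quoted above (|NES|>1 and p-value<0.05) and applies them to the reported values; the function name and its default arguments are illustrative assumptions, not part of any GSEA software.

```python
def meets_gsea_thresholds(nes, p_value, nes_cutoff=1.0, p_cutoff=0.05):
    """Return True only if both |NES| > nes_cutoff and p_value < p_cutoff."""
    return abs(nes) > nes_cutoff and p_value < p_cutoff


# Values reported in the response for the glycolysis/gluconeogenesis gene set.
glycolysis_enriched = meets_gsea_thresholds(nes=0.708, p_value=0.089)
```

      With the reported values, both conditions fail, so the gene set is not called enriched under these cutoffs.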

      Weaknesses:

      Cryo-ET Data Limitations: The structural analysis of the DMT lacks clarity on how ANKRD5 influences N-DRC or RS3. The low quality of RS3 data hinders the interpretation of ANKRD5's impact on axoneme structure.

      We tried to further calculate the DMT at 96-nm periodicity using the present data to analyze the effect of ANKRD5 deletion on RS and N-DRC. However, owing to the heterogeneity of the data, we were only able to obtain the DMT at 8-nm periodicity (we have added a figure in the supplementary material for presentation). In the process of obtaining the 8-nm-period DMT, we discarded about 90% of the particles (some misaligned, some deformed). Although we were not able to obtain the structure of the 96-nm-repeat DMT, we noticed enhanced DMT heterogeneity caused by ANKRD5 knockout, as shown by the 3D classification and other results in the new supplementary figure (Fig. S9), and we added a corresponding description to the article.

      We have changed the description in the text: "During particle picking of DMT fibers, we observed that transverse sections of axonemal DMT particles from ANKRD5-KO sperm differ markedly from those in WT sperm. Although both A- and B-tubes were visible in both samples, the DMTs in ANKRD5-KO sperm showed a more irregular profile. In WT sperm, DMTs typically appeared circular, whereas ANKRD5-KO DMTs seemed to be extruded into polygonal shapes (Fig. S9B,D). Notably, ANKRD5-KO DMTs seemed partially open at the junction between the A- and B-tubes (Fig. S9B,D).

      During the STA process, many particles were misaligned or deformed in the classification results, revealing various degrees of deformation—particularly affecting the B-tube (Fig. S9E). We could retain only ~10% of the DMT particles to obtain the final density map for ANKRD5-KO sperm (Fig. S9E), whereas ~70% were usable in WT dataset as reported previously [59]. The mutant DMT density map also displayed roughness at its periphery, indicating substantial structural heterogeneity (Fig. S9E). Even after discarding a large fraction of deformed particles, the final density map still showed evident artifacts, implying that although the mutant DMT preserves the fundamental features of both tubes, its shape is highly heterogeneous (Fig. S9E). Furthermore, attempts to classify the 96-nm repeats did not yield a clear density for radial spokes (RSs) (Fig. S9F), indicating that ANKRD5 deficiency may affect the stability of other accessory structures, such as RSs [23,24,25]. In the raw tomograms, RSs in ANKRD5-KO sperm appeared less regularly arranged than those in WT (Fig. S9A and C).

      Most recently, following the submission of this work, ANKRD5 was reported to localize at the head of the N-DRC, simultaneously binding DRC11, DRC7, DRC4, and DRC5 [46]. This structural insight agrees with our in vitro findings that ANKRD5 interacts with DRC4 and DRC5 (Fig. 8C-F). However, that study used isolated and purified DMT samples, leaving the precise positioning of ANKRD5 between adjacent axonemal DMTs unconfirmed. We therefore fitted the published structure (PDB entry: 9FQR) into the in situ DMT structure of mouse sperm 96-nm repeats (EMD-27444), revealing that ANKRD5 lies a mere ~3 nm from the adjacent DMT (Fig. 8G). Notably, the N-DRC is often likened to a "car bumper", buffering two neighboring DMTs during vigorous axonemal motion. Given the extensive DMT deformation observed in our cryo-ET data (Fig. S9E), we propose that ANKRD5 contributes to this buffering function at the N-DRC. The loss of ANKRD5 may weaken the "bumper" effect and consequently increase structural damage to adjacent DMTs under intense conditions, while also compromising the stability of associated DMT accessory structures [19,46,60]."

      Figure S9. The states of DMT particles in sperm of Ankrd5-KO mouse. (A) and (C) Tomogram slices of WT and Ankrd5-KO in Dynamo (The data for WT mouse sperm was EMPIARC-200007). DMT and RS are marked with white dashed lines and white arrows, respectively. (B) and (D) Comparison of DMT particle states between WT and Ankrd5-KO in Dynamo. The visual angles of the DMT particles shown in (B) and (D) show that the DMT fibers within the white box in (A) and (B) are divided equally into 10 slices along the direction of the white arrow, respectively. The DMT particle shapes of WT and Ankrd5-KO are marked by white dashed lines on the right of (B) and (D). The white arrow in (D) identifies the junction of A-tube and B-tube that is suspected to be disconnected. (E) Deformed particles discarded in 3D classification and final aligned DMT artifacts. (F) 3D classification of attempted RS locations.

      Although the loss of ANKRD5 did not affect the density of DMT itself in A Tube and B Tube, we found that DMT particles were not smooth in the original tomogram. We speculate that the loss of ANKRD5, a component of the N-DRC that is close to the neighboring DMT, may reduce the interaction between N-DRC and the neighboring DMT, resulting in uneven force on the axoneme during sperm swimming, which may limit our ability to obtain the average structure of the more dynamic components (RS, N-DRC, ODA, IDA). Therefore, when trying to classify 96nm repeat DMT, we could only see the density of suspected RS3 and RS2, but it was difficult to obtain the complete 96nm repeat DMT density, so that we could not further analyze the effect of ANKRD5 deletion on RS and N-DRC. To test this conjecture, we further classified the density of suspected RS3, and the results obtained exhibited a variety of mixed states (which have been added to the supplementary material). To avoid confusion, we have already removed the discussion of RS3 and the related images from the original text.

      The cryo-ET data on the internal structure of the DMT seems to have limited relevance to the N-DRC complex. Additionally, the quality of the RS3 data appears suboptimal, making it difficult to understand how the absence of ANKRD5 influences RS3. Further refinement of the data or alternative approaches may be needed to address this question.

      Thank you very much for your suggestions. For the 96 nm periodic DMT, we have conducted multiple rounds of classification, including applying different masks at the positions of ODA, RS, and DMT. We have also tried classifying with both a single reference and multiple references. However, we were unable to obtain a suitable 96 nm periodic DMT. Regarding the heterogeneity of the particles, we have added a discussion in the manuscript. Following your advice, we have reanalyzed the data, but unfortunately, we still could not further optimize the experimental results.

      In the process of obtaining the 8 nm periodic DMT, we discarded approximately 90 percent of the particles through multiple rounds of classification and alignment, in order to obtain high-quality 8 nm periodic DMT. We classified the remaining particles and found that the densities of RS3 and RS2 were not in their normal states. RS3 might be a mixture of different states of RS3, which makes it difficult for us to further discuss the effects of ANKRD5 on RS3.

      To avoid confusion, we have already removed the discussion of RS3 and the related images from the original text.

      Regarding the effects of ANKRD5 deficiency, we speculate that as the head of the N-DRC, its absence might affect the interaction between the N-DRC and the adjacent DMT, thereby influencing the forces experienced by the DMT during sperm movement. The uneven and irregular forces on the nine pairs of DMTs do not affect the structure of the A and B tubes of the DMT itself, but result in some heterogeneity in the peripheral microtubule parts of the DMT particles. We have added a discussion on these hypotheses in the manuscript. In addition, our 3D classification results demonstrate the structural heterogeneity of DMT caused by ANKRD5 knockout. We have changed the description in the text: "During particle picking of DMT fibers, we observed that transverse sections of axonemal DMT particles from ANKRD5-KO sperm differ markedly from those in WT sperm. Although both A- and B-tubes were visible in both samples, the DMTs in ANKRD5-KO sperm showed a more irregular profile. In WT sperm, DMTs typically appeared circular, whereas ANKRD5-KO DMTs seemed to be extruded into polygonal shapes (Fig. S9B,D). Notably, ANKRD5-KO DMTs seemed partially open at the junction between the A- and B-tubes (Fig. S9B,D).

      During the STA process, many particles were misaligned or deformed in the classification results, revealing various degrees of deformation—particularly affecting the B-tube (Figure 9, Fig. S9E). We could retain only ~10% of the DMT particles to obtain the final density map for ANKRD5-KO sperm (Fig. S9E), whereas ~70% were usable in WT dataset as reported previously [59]. The mutant DMT density map also displayed roughness at its periphery, indicating substantial structural heterogeneity (Fig. S9E). Even after discarding a large fraction of deformed particles, the final density map still showed evident artifacts, implying that although the mutant DMT preserves the fundamental features of both tubes, its shape is highly heterogeneous (Fig. S9E). Furthermore, attempts to classify the 96-nm repeats did not yield a clear density for radial spokes (RSs) (Fig. S9F), indicating that ANKRD5 deficiency may affect the stability of other accessory structures, such as RSs [23,24,25]. In the raw tomograms, RSs in ANKRD5-KO sperm appeared less regularly arranged than those in WT (Fig. S9A and C).

      Most recently, following the submission of this work, ANKRD5 was reported to localize at the head of the N-DRC, simultaneously binding DRC11, DRC7, DRC4, and DRC5 [46]. This structural insight agrees with our in vitro findings that ANKRD5 interacts with DRC4 and DRC5 (Fig. 8C-F). However, that study used isolated and purified DMT samples, leaving the precise positioning of ANKRD5 between adjacent axonemal DMTs unconfirmed. We therefore fitted the published structure (PDB entry: 9FQR) into the in situ DMT structure of mouse sperm 96-nm repeats (EMD-27444), revealing that ANKRD5 lies a mere ~3 nm from the adjacent DMT (Fig. 8G). Notably, the N-DRC is often likened to a "car bumper", buffering two neighboring DMTs during vigorous axonemal motion. Given the extensive DMT deformation observed in our cryo-ET data (Fig. S9E), we propose that ANKRD5 contributes to this buffering function at the N-DRC. The loss of ANKRD5 may weaken the "bumper" effect and consequently increase structural damage to adjacent DMTs under intense conditions, while also compromising the stability of associated DMT accessory structures [19,46,60]."

      To further enhance the readability of our manuscript, we created a Graphic Abstract to visually illustrate the biological functions of ANKRD5. The figure is placed immediately after the Abstract section and has been designated as Figure 9.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The major result in the manuscript is the observation of the higher order structures in a cryoET reconstruction that could be used for understanding the assembly of toroid structures. The cross-linking ability of ZapD dimers results in bending of FtsZ filaments to a constant curvature. Many such short filaments are stitched together to form a toroid-like structure. The geometry of assembly of filaments - whether they form straight bundles or toroid-like structures - depends on the relative concentrations of FtsZ and ZapD.

      Strengths:

      In addition to a clear picture of the FtsZ assembly into ring-like structures, the authors have carried out basic biochemistry and biophysical techniques to assay the GTPase activity, the kinetics of assembly, and the ZapD to FtsZ ratio.

      Weaknesses:

      The discussion does not provide an overall perspective that correlates the cryoET structural organisation of filaments with the biophysical data. The current version has improved in terms of addressing this weakness and clearly states the lacuna in the model proposed based on the technical limitations.

      Future scope of work includes the molecular basis of curvature generation and how molecular features of FtsZ and ZapD affect the membrane binding of the higher order assembly.

      Reviewer #3 (Public review):

      Summary:

      Previous studies have analyzed the binding of ZapD to FtsZ and provided images of negatively stained toroids and straight bundles, where FtsZ filaments are presumably crosslinked by ZapD dimers. Toroids without ZapD have also been previously formed by treating FtsZ with crowding agents. The present study is the first to apply cryoEM tomography, which can resolve the structure of the toroids in 3D. This shows a complex mixture of filaments and sheets irregularly stacked in the Z direction and spaced radially. The most important interpretation would be to distinguish FtsZ filaments from ZapD crosslinks; this is less convincing. The authors seem aware of the ambiguity: "However, we were unable to obtain detailed structural information about the ZapD connectors due to the heterogeneity and density of the toroidal structures, which showed significant variability in the conformations of the connections between the filaments in all directions." Therefore, the reader may assume that the crosslinks identified and colored red are only suggestions, and look for their own structural interpretations. But readers should also note some inconsistencies in stoichiometry and crosslinking arrangements that are detailed under "weaknesses."

      Strengths.

      This is the first cryoEM tomography to image toroids and straight bundles of FtsZ filaments bound to ZapD. A strength is the resolution, which, at least for the straight bundles, is sufficient to resolve the ~4.5 nm spacing of ZapD dimers attached to and projecting from subunits of an FtsZ filament. Another strength is the pelleting assay to determine the stoichiometry of ZapD:FtsZ (although this also leads to weaknesses of interpretation).

      Weaknesses

      The stoichiometry presents some problems. Fig. S5 uses pelleting to convincingly establish the stoichiometry of ZapD:FtsZ. Although ZapD is a dimer, the concentration of ZapD is always expressed as that of its subunit monomers. Fig. S5 shows the stoichiometry of ZapD:FtsZ to be 1:1 or 2:1 at equimolar or high concentrations of ZapD. Thus at equimolar ZapD, each ZapD dimer should bridge two FtsZ's, likely forming crosslinks between filaments. At high ZapD, each FtsZ should have its own ZapD dimer. However, this seems contradicted by later statements in Discussion and Results. (1) "At lower concentrations of ZapD, .. toroids are the most prominent structures, containing one ZapD dimer for every four to six FtsZ molecules." Shouldn't it be one ZapD dimer for every two FtsZ? (2) "at the high ZapD concentration...a ZapD dimer binds two FtsZ molecules connecting two filaments." Doesn't Fig. S5 show that each FtsZ subunit has its own ZapD dimer? And wouldn't this saturate the CTD sites with dimers and thus minimize crosslinking?

      We thank the reviewer for these insightful comments. The affinity of ZapD for FtsZ is relatively low and a higher concentration of ZapD is required in solution to effectively saturate the binding sites of all FtsZ molecules forming macrostructures. It is important to clarify that the concentrations mentioned in the text refer to the amounts and ratios of protein added to the total volume of the sample, rather than the proteins actively interacting and forming bundles or macrostructures.

      To differentiate, two aspects can be considered: the ratio of added protein (as mentioned in the text) and the fraction of proteins that contribute to the formation of the macrostructures. Under polymerization conditions, FtsZ-GTP recruits additional monomers to form polymers. Therefore, more FtsZ than ZapD would be involved in forming filaments and bundles. Our results support this hypothesis and show that a higher amount of ZapD is required in the sample to pellet with FtsZ bundles.

      We propose that starting with the same initial concentration of FtsZ and ZapD in solution, only a small fraction of ZapD will bind to the structures, favoring the formation of toroidal structures despite the initial 1:1 ratio of proteins added to the sample. When considering a higher FtsZ:ZapD ratio (1:6), the increased amount of ZapD in solution would facilitate the saturation of all FtsZ binding sites, consistent with the observation of straight bundles. Analytical sedimentation velocity data further supported this finding, indicating a binding ratio of approximately 0.3-0.4, suggesting that one ZapD dimer binds for every 4-6 FtsZ monomers. The binding ratio indicates that two FtsZ monomers will bind to a single dimer of ZapD, but this only occurs when there is a significant excess of ZapD over FtsZ in the solution mixture. 

      These findings align qualitatively with the relative intensities of the electrophoretic bands observed for FtsZ and ZapD in the pelleting assay with different FtsZ-ZapD mixtures, as shown in Suppl. Fig. 5 as % of FtsZ in the fractions. Without prior staining calibration of the gels, there is no simple quantitative relationship between gel band intensities after Coomassie staining and the amount of protein in a band (Darawshe et al. 1993 Anal Biochem - DOI: 10.1006/abio.1993.1581). This last point precludes a quantitative comparison between pelleting/SDS-PAGE data and analytical sedimentation measurements. For this reason, we have decided to present pelleting results as % of FtsZ in supernatant and pellet to avoid overestimations. 

      A major weakness is the interpretation of the cryoEM tomograms, specifically distinguishing ZapD from FtsZ. The distinction of crosslinks seems based primarily on structure: long continuous filaments (which often appear as sheets) are FtsZ, and small masses between filaments are ZapD. The density of crosslinks seems to vary substantially over different parts of the figures. More important, the density of ZapD's identified and colored red seems much lower than the stoichiometry detailed above. Since the mass of the ZapD monomer is half that of FtsZ, the 1:1 stoichiometry in toroids means that 1/3 of the mass should be ZapD and 2/3 FtsZ. However, the connections identified as ZapD seem much fewer than the expected 1/3 of the mass. The authors conclude that connections run horizontally, diagonally and vertically, which implies no regularity. This seems likely, but I would suggest that readers need to consider for themselves what they would identify as a crosslink.

      The amount of ZapD in the toroids will be significantly less than one third. Although the theoretical addition of protein to the samples is at a 1:1 ratio, the actual amount of protein in the macrostructures containing ZapD is much lower, as shown by the sedimentation velocity and pelleting assays.

      In contrast to the toroids formed at equimolar FtsZ and ZapD, thin bundles of straight filaments are assembled in excess ZapD. Here the stoichiometry is 2:1, which would mean that every FtsZ should have a bound ZapD DIMER. The segmentation of a single filament in Fig. 5e seems to agree with this, showing an FtsZ filament with spikes emanating like a picket fence, with a 4.5 nm periodicity. This is consistent with each spike being a ZapD dimer, and every FtsZ subunit along the filament having a bound ZapD dimer. But if each FtsZ has its own dimer, this would seem to eliminate crosslinking. The interpretative diagram in Fig. 6, far right, which shows almost all ZapD dimers bridging two FtsZs on opposite filaments, would be inconsistent with this 2:1 stoichiometry.

      Assessing the precise stoichiometry of FtsZ and ZapD within the macrostructures is challenging. We interpret the spikes as ZapD dimers bridging two FtsZ filaments, implying a theoretical 1:1 stoichiometry in the straight bundle. However, ZapD may be enriched in certain areas, indicating that a single FtsZ monomer is binding to one side of the dimer. In contrast, the other side remains available for additional connections, resulting in a potential 2:1 stoichiometry. A combination of both scenarios is likely, although our resolution does not allow further characterization. Considering these complexities, we assume these connections represent a dimer of ZapD binding to two FtsZ monomers.

      Figure 6 shows a simplified scheme illustrating how the bundles could be assembled based on the Cryo-ET data. We acknowledge the limitations of this diagram; its purpose is to depict the mesh formed by the stabilization of ZapD. We have not included interactions that do not lead to filament crosslinking, such as dimers binding to only one FtsZ filament. This focus enhances the interpretation of the scheme and the FtsZ-ZapD interaction. A sentence has been added to the caption to highlight the possibility of other interactions not considered in the scheme.

      In the original review I suggested a control that might help identify the structures of ZapD in the toroids. Popp et al (Biopolymers 2009) generated FtsZ toroids that were identical in size and shape to those here, but lacking ZapD. These toroids of pure FtsZ were generated by adding 8% polyvinyl chloride, a crowding agent. The filamentous substructure of these toroids in negative stain seemed very similar to that of the ZapD toroids here. CryoET of these toroids lacking ZapD might have been helpful in confirming the identification of ZapD crosslinks in the present toroids. However, the authors declined to explore this control.

      The mechanisms by which methylcellulose (MC) promotes the assembly of FtsZ macrostructures reported by Popp et al. involve more than simple excluded volume effects, as the low concentration of MC (less than 1 mg/ml) falls below the typical crowding regime. The latter suggests the existence of poorly characterized additional interactions between MC and FtsZ. These complexities preclude the use of FtsZ polymers formed in the presence of MC as a true control for the FtsZ toroidal structures reported here.

      Finally, it should be noted that the CTD binding sites for ZapD should be on the outside of curved filaments, the side facing the membrane in the cell. All bound ZapD should project radially outward, and if it contacted the back side of the next filament, it should not bind (because the CTD is on the front side). The diagram second to right in Fig. 6 seems to incorporate this abortive contact.

      The role of the flexible linker and its biological implications are still under debate in the field. The flexible linker allows ZapD-driven connections to be made in different directions. While these implications are not the primary focus of our manuscript, the flexible linker could allow connections between filaments in different orientations.

      Reviewer #1 (Recommendations for the authors):

      Most of the concerns which I had raised in the earlier version have been taken care of, as detailed in the response.

      A few minor points, mostly related to re-phrasing are listed below:

      Page 2: line 21: The use of the term 'C-terminal domain' for the C-terminal unstructured region of FtsZ is confusing. The term C-terminal domain or CTD for FtsZ is commonly used to describe part of the globular domain, while C-terminal tail or CCTP will be a more apt usage for all the instances in this manuscript.

      We refer to the C-terminal domain as the carboxy-terminal region of the protein. This domain includes the C-terminal linker (CTL), which varies in length between species, followed by a conserved 11-residue sequence (CTC) and shorter, variable C-terminal sequences (CTV). We used the term "C-terminal domain" primarily to improve the readability of the manuscript, but we appreciate the reviewer's feedback. We have now adopted the term "CCTP" instead of "C-terminal domain" to improve the clarity of our manuscript.

      On a related note, the schematic in Fig 1 shows the interaction with CCTP rather than the C-terminal domain of the globular FtsZ. Please provide an explanation.

      We refer to the unstructured C-terminal domain of FtsZ as the C-terminal tail. To avoid confusion, we have introduced the term CCTP in this manuscript.

      Supple Fig 2: "The FCS analysis demonstrated an increasing diffusion time of ZapD along with the FtsZ concentration as result of higher proportion of ZapD bound to FtsZ."

      The increased diffusion time need not be interpreted as increased ZapD bound; it could also mean that FtsZ polymerises in the presence of increasing ZapD. Was this possibility ruled out? Including a comment on this aspect will be useful.

      In these experiments, we monitored fluorescently labeled ZapD. Due to their interaction, we found that its diffusion time increased at high FtsZ concentrations. The data presented in Supplementary Figure 2 shows ZapD in the presence of FtsZ-GDP (i.e. under non-polymerization conditions).

      Was it possible to get a molecular weight estimate based on the diffusion time?

      It is possible to estimate hydrodynamic volumes using the Stokes-Einstein equation if the diffusion coefficient of the diffusing particles is known, assuming that the particles are small and spherical. A molecular weight can then be estimated using a standard density of 1.35 g/cm3 (Fisher et al. Protein science 2009 DOI: 10.1110/ps.04688204). This estimate depends heavily on the shape of the diffusing particle. Because we assume that our protein of interest is far from spherical, owing to the interaction through the flexible linker, the hydrodynamic volumes are overestimated, which in turn leads to an overestimation of the molecular weight. In addition, a more accurate estimation of the sizes, and thus molecular weights, of proteins requires a modified form of the Stokes-Einstein equation (Tyn and Gusek Biotechnology and Bioengineering DOI: 10.1002/bit.260350402), in which additional information about the shape of the diffusing particle is obtained by measuring its radius of gyration. These calculations are complex and beyond the scope of our manuscript.
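
      To make the spherical-particle estimate described above concrete, the following is a minimal sketch, not the authors' analysis. It assumes the standard free-diffusion FCS relation between diffusion time and diffusion coefficient, water-like viscosity at 25 °C, and the 1.35 g/cm3 density quoted above. The beam waist, the diffusion coefficient in the example, and all function names are hypothetical and chosen only for illustration.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
N_A = 6.02214076e23   # Avogadro constant, 1/mol


def diffusion_coefficient_from_fcs(tau_d_s, beam_waist_m):
    """Free 3D diffusion in a Gaussian focus: D = w_xy^2 / (4 * tau_D).

    beam_waist_m is the lateral 1/e^2 radius of the focus; it must be
    calibrated for the instrument (assumed known here).
    """
    return beam_waist_m ** 2 / (4.0 * tau_d_s)


def hydrodynamic_radius(diff_coeff_m2_s, temp_k=298.15, viscosity_pa_s=0.89e-3):
    """Stokes-Einstein for a sphere: R_h = k_B * T / (6 * pi * eta * D)."""
    return K_B * temp_k / (6.0 * math.pi * viscosity_pa_s * diff_coeff_m2_s)


def molecular_weight_sphere(radius_m, density_g_cm3=1.35):
    """Mass of a compact sphere of the given radius, in g/mol."""
    radius_cm = radius_m * 100.0
    volume_cm3 = (4.0 / 3.0) * math.pi * radius_cm ** 3
    return volume_cm3 * density_g_cm3 * N_A


# Hypothetical diffusion coefficient, for illustration only.
example_d = 5.0e-11  # m^2/s
example_radius = hydrodynamic_radius(example_d)
example_mw = molecular_weight_sphere(example_radius)
print(f"R_h = {example_radius * 1e9:.2f} nm, MW = {example_mw / 1e3:.0f} kDa")
```

      Because the estimated mass scales with the cube of the radius, any shape-related overestimate of the hydrodynamic radius is strongly amplified in the molecular weight, which is the caveat raised in the response.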

      Supple Fig 4:

      Does FtsZ GTPase activity (without ZapD) also vary with KCl concentrations? It will be useful to comment on this in Supplementary Figure 4.

      Yes, it has been previously reported that moderate concentration of KCl is optimal for FtsZ GTPase activity. We added a comment to the caption.

      Page 6, line 42: short filament segments arranged nearly 'parallel' to each other Since FtsZ filaments are polar, it is better to rephrase as 'parallel or antiparallel'.

      Corrected.

      Page 7, line 41: cross linking of short 'FtsZ' filaments and not ZapD?

      It was a typo. Corrected.

      Page 8: delete 'from above' in the title?

      Corrected

      The use of the phrases such as 'cross linking from the top'; 'binds to FtsZ from above' is vague. (Figure 5b legend; discussion page 10, line 18; page 8, line 26; page 12, line 27). Similarly labelling on a schematic figure on the use of vertical, diagonal/lateral will be useful for the readers.

      We thank the reviewer for the suggestions to improve the understanding of our data. We have simplified them by renaming these interactions as vertical.

      Page 13, lines 6 -10

      Rather than an orientation of top or from the side, just the presence of multiple crosslinks along coaxial filaments suffices for a straight bundle. The average spacing will be more uniform in such a straight bundle compared to a toroid where there might be regions without ZapD. I do not find the data on an upward orientation convincing. ZapD binding need not be above to have the C-terminal ends of FtsZ pointing towards the membrane. On the other hand, having ZapD bind above is likely to occlude membrane binding of FtsZ?

      The flexibility of the FtsZ linker suggests that ZapD can bind filaments oriented in different directions. In a cellular environment, FtsZ molecules interact with other division proteins that compete with ZapD for binding sites. This competition could prevent the membrane from occluding and instead create binding sites between the filaments, stabilizing them.

      Page 11, lines 32 - 34: Please rephrase the sentence, with focus on the main point to be conveyed. Do the authors want to say that the 'Same molecule contributes to variability in spacing based on the number of connections formed.'

      Thank you for your comment. We have rephrased the sentence for clarity.

      Page 11: paragraphs 1,2, and 3 appears to convey similar, related ideas and are redundant. Could these be shortened further into one paragraph highlighting how the ratio leads to differences in higher order FtsZ organisation?

      These paragraphs discuss different ideas, and it is better to keep them separate.

      In the response to reviewers, page 19, point 5 (iii), it is given that 5000 FtsZ molecules correspond to 2/3rd of the total, while in the manuscript text, it is given as one-third. Please correct the response text/manuscript text accordingly. The numbers in the cited reference appears to suggest 1/3rd.

      Yes, it was 1/3rd. Thanks for pointing that out. 

      Fig 1b. Y-axis: Absorbance spelling has a typo.

      Page 14, line 11: Healthcare ('h' missing)

      Page 14, line 15: HCl, KCl (L should be in small letter)

      Page15, line 18: 43 - 48K rpm (not Krpm)

      Supple Fig 1 legend: line 5: 's' missing for species

      Corrected.

  2. Jul 2025
    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors aim to explore the effects of the electrogenic sodium-potassium pump (Na<sup>+</sup>/K<sup>+</sup>-ATPase) on the computational properties of highly active spiking neurons, using the weakly-electric fish electrocyte as a model system. Their work highlights how the pump's electrogenicity, while essential for maintaining ionic gradients, introduces challenges in neuronal firing stability and signal processing, especially in cells that fire at high rates. The study identifies compensatory mechanisms that cells might use to counteract these effects, and speculates on the role of voltage dependence in the pump's behavior, suggesting that Na<sup>+</sup>/K<sup>+</sup>-ATPase could be a factor in neuronal dysfunctions and diseases.

      Strengths:

      (1) The study explores a less-examined aspect of neural dynamics: the effects of Na<sup>+</sup>/K<sup>+</sup>-ATPase electrogenicity. It offers a new perspective by highlighting the pump's role not only in ion homeostasis but also in its potential influence on neural computation.

      (2) The mathematical modeling used is a significant strength, providing a clear and controlled framework to explore the effects of the Na+/K+-ATPase on spiking cells. This approach allows for the systematic testing of different conditions and behaviors that might be difficult to observe directly in biological experiments.

      (3) The study proposes several interesting compensatory mechanisms, such as sodium leak channels and extracellular potassium buffering, which provide useful theoretical frameworks for understanding how neurons maintain firing rate control despite the pump's effects.

      Weaknesses:

      (1) While the modeling approach provides valuable insights, the lack of experimental data to validate the model's predictions weakens the overall conclusions.

      (2) The proposed compensatory mechanisms are discussed primarily in theoretical terms without providing quantitative estimates of their impact on the neuron's metabolic cost or other physiological parameters.

      We thank the reviewer for their concise and accurate summary and appreciate the constructive feedback on the article’s strengths and weaknesses. Experimental work is beyond the scope of our modeling-based study. However, we would like our work to serve as a framework for future experimental studies into the role of the electrogenic pump current (and its possible compensatory currents) in disease, and its role in evolution of highly specialized excitable cells (such as electrocytes).

      Quantitative estimates of metabolic costs in this study are limited to the ATP that is required to fuel the pump. By integrating the net pump current over a given time window and dividing by one elementary charge, one can find the amount of ATP that the Na<sup>+</sup>/K<sup>+</sup> pump consumes in that window, and hence its rate of ATP consumption, for either compensatory mechanism. The difference in net pump current is thus proportional to ATP consumption, which allows for a direct comparison of the cost efficiency of the Na<sup>+</sup>/K<sup>+</sup> pump for each proposed compensatory mechanism. The Na<sup>+</sup>/K<sup>+</sup> pump is, however, not the only ATP-consuming element in the electrocyte, and some of the compensatory mechanisms induce other costs related to cell ‘housekeeping’ or presynaptic processes. We now added a section in the appendix titled ‘Considerations on metabolic costs of compensatory mechanisms’ (section 11.4), where we provide ballpark estimates for the influence of the compensatory mechanisms on the total metabolic costs of the cell and membrane space occupation. Although we argue that, according to these estimates, the impact of the discussed compensatory mechanisms could be significant, a plausible quantitative cost approximation at the whole-cell level remains beyond the scope of this article, owing to the absence of more detailed experimental quantification.
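
      The conversion from net pump current to ATP consumption described above can be sketched as follows. This is an illustrative calculation rather than the code used in the study. It relies on the pump stoichiometry of three sodium ions out and two potassium ions in per ATP, so that each hydrolysed ATP corresponds to one net elementary charge. The function names and the example current trace are hypothetical.

```python
E_CHARGE = 1.602176634e-19  # elementary charge, C


def atp_consumed(times_s, pump_current_a):
    """Number of ATP molecules hydrolysed by the pump over the trace.

    The net pump current (in amperes) is integrated with the trapezoidal
    rule to obtain the net charge moved, which is then divided by one
    elementary charge because each pump cycle exports one net charge.
    """
    if len(times_s) != len(pump_current_a):
        raise ValueError("times and currents must have the same length")
    charge_c = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        charge_c += 0.5 * (pump_current_a[i] + pump_current_a[i - 1]) * dt
    return charge_c / E_CHARGE


def atp_rate(times_s, pump_current_a):
    """Average ATP consumption rate (molecules per second) over the trace."""
    duration_s = times_s[-1] - times_s[0]
    return atp_consumed(times_s, pump_current_a) / duration_s


# Hypothetical example: a constant net pump current of 1 nA for 1 s.
example_times = [0.0, 0.5, 1.0]
example_current = [1e-9, 1e-9, 1e-9]
print(f"{atp_rate(example_times, example_current):.3e} ATP molecules per second")
```

      Comparing this quantity between two simulated conditions gives the relative pump cost of each compensatory mechanism; it does not capture the other ATP-consuming processes mentioned in the response.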

      Reviewer #1 (Recommendations for the authors):

      (1)  For the f-I curves in Figures 1 and 6, the firing rate increases as the input current increases. I am curious to know: (a) whether the amplitudes of the action potentials (APs) vary with increased input current; (b) whether the waveform of APs (such as in Fig. 1I) transitions into smaller amplitude oscillations at higher input currents; and (c) if the waveform does change at higher input currents, how do the "current contributions," "current," and "ion exchanges per action potential" in Figures 1HJ and 6AB respond?

      To fully answer these questions, we added a supplemental figure with accompanying text in section 11.1 (Fig. A1). We also added a reference to this figure in the main text (section 4.1). Here, it is shown that, as previously illustrated in [1], AP amplitude decreases when the input current increases (Fig. A1 A, left). This effect remains upon addition of either a pump with constant pump rate and co-expressed sodium leak channels (Fig. A1 A, center), or a voltage-dependent pump (Fig. A1 A, right). Interestingly, even though the shape of the current contributions (Fig. A1 B) and the APs (Fig. A1 C) look very different for low (Fig. A1 C, top) and high inputs (Fig. A1 C, bottom), the total sodium and potassium displacement per AP, and thus the pump rate, is roughly the same (Fig. A1 D). Under the assumption that voltage-gated sodium channel (NaV) expression is adjusted to facilitate fixed-AP amplitudes, however, (as in [1]) more NaV channels would be expressed in fish with higher synaptic drives. This would then result in an additional sodium influx per AP and result in higher energetic requirements per AP for electrocytes with higher firing rates (also shown in [1]).

      (2) Could the authors clarify what the vertical dashed line represents in Figures 1B and 1F? Does it correspond to an input current of 0.63uA?

      (Reviewer comment refers to Fig. 1C and 1F in new version): Yes, it corresponds to the input current that is also used in figures 1D and 1G. We clarified this by adding an additional tick label on the x-axis in 1F. The current input of 0.63uA was chosen as a representative input for this cell as follows: we first modeled an electrocyte with a periodic synaptic drive as in [1]. The frequency of this drive was set to 400 Hz, which is an intermediate value in the range of reported EODfs (and thus presumably pacemaker firing rates) of 200-600Hz [2]. Then, acetylcholine receptor currents I<sub>AChRNa</sub> and I<sub>AChRK</sub> were summed and averaged to obtain the average input current of 0.63uA. This is now also explained in new Methods section 6.2.1.

      (3) What input current was used for Figures 1H, 1I, and 1J?

      Response: In a physiological setting, where the electrocyte is electrochemically coupled to the pacemaker nucleus, stimulation of the electrocyte occurs through neurotransmitter release in the synaptic cleft, which then leads to the opening of acetylcholine receptor channels. As figures 1H-J concern different ion fluxes, we aimed to also include currents stemming from acetylcholine receptor channels. We therefore did not stimulate the electrocyte with a constant input current as in Fig. 1C and F, but simulated elevated constant neurotransmitter levels in the synaptic cleft, which then leads to elevated acetylcholine receptor currents. In the model, this neurotransmitter level, or ‘synaptic drive’ is represented by parameter syn<sub>clamp</sub>. A physiologically relevant value for syn<sub>clamp</sub> was deduced by averaging the synaptic drive during a 400 Hz pacemaker stimulus. This is now also explained in new Methods section 6.2.1.

      (4) In Figure 4A, there is a slight delay between the PN spikes (driver) and the EO (receiver), and no EO spikes occur without PN spikes. However, the firing rate of EO (receiver) appears to decrease before the chirp initiations in Fig 4B; and this delay seems to disappear in Fig 4C. Could the authors explain these observations?

      As shown in the bottom right of figure 4A, when plotting the instantaneous firing rate as one over the inter-spike-interval (1/ISI), the firing rate of a cell is only plotted at the end of every ISI. Therefore, even though the PN drives the electrocyte and thus spikes earlier in time than the electrocyte, when it initiates chirps, these will only be plotted as an instantaneous firing rate at the end of the chirp. If the electrocyte fires spontaneously within this chirp, its instantaneous firing rate will appear earlier in time than the initiation of the chirp of the PN. The PN did, however, initiate the chirp before that and causality between the PN and electrocyte is not disturbed.
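
      The plotting convention described above can be illustrated with a short sketch. The spike times below are hypothetical and are not taken from the model. The example only shows that, when the instantaneous rate 1/ISI is assigned to the end of each inter-spike interval, a spontaneous receiver spike inside a driver pause makes the receiver's rate drop appear earlier on the time axis than the driver's.

```python
def instantaneous_rate(spike_times_s):
    """Return (time, rate) points with rate = 1/ISI placed at the END of each ISI."""
    points = []
    for previous, current in zip(spike_times_s, spike_times_s[1:]):
        points.append((current, 1.0 / (current - previous)))
    return points


def first_time_below(points, threshold_hz):
    """Time of the first plotted point whose rate falls below the threshold."""
    for time_s, rate_hz in points:
        if rate_hz < threshold_hz:
            return time_s
    return None


# Hypothetical driver (pacemaker): 400 Hz firing, then a pause from 5 ms to 15 ms.
driver_spikes = [0.0, 0.0025, 0.005, 0.015]
# Hypothetical receiver (electrocyte): follows the driver with a 0.5 ms delay
# and fires once spontaneously at 10 ms, inside the pause.
receiver_spikes = [0.0005, 0.003, 0.0055, 0.010, 0.0155]

driver_drop = first_time_below(instantaneous_rate(driver_spikes), 300.0)
receiver_drop = first_time_below(instantaneous_rate(receiver_spikes), 300.0)
print(f"driver rate drop plotted at {driver_drop * 1e3:.1f} ms")
print(f"receiver rate drop plotted at {receiver_drop * 1e3:.1f} ms")
```

      In this toy example the driver stops firing first, after its spike at 5 ms, yet its low-rate point is only drawn when the pause ends, whereas the receiver's low-rate point is drawn at its spontaneous spike. Causality between the two cells is therefore preserved even though the plotted curves suggest otherwise.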

      (5) Regarding Figure 6, could the authors specify the input current used in Figures 6A and 6B?

      Figure 6A and 6B have the same synaptic drive as Fig. 1 H, I and J (syn<sub>clamp</sub>=0.13).

      (6) In Section 6, I would recommend that the authors provide a table of parameters and their corresponding values for clarity.

      Thank you for your suggestion. We now reorganized the method section and added two tables with parameters for clarity. Table 1 (see Methods 6.1) includes all parameters that differ from the parameters reported in [1], and parameters that arise from the additionally modeled equations to simulate ion concentration dynamics and pump. We also added the parameters used to simulate the different stimulus protocols (and corresponding tuned parameters) that are presented in the article in Table 2 (see Methods 6.2).

      Reviewer #2 (Public review):

      Summary:

      The paper 'The electrogenicity of the Na<sup>+</sup>/K<sup>+</sup>-ATPase poses challenges for computation in highly active spiking cells' by Weerdmeester, Schleimer, and Schreiber uses computational models to present the biological constraints under which electrocytes-specialized highly active cells that facilitate electro-sensing in weakly electric fish-may operate. The authors suggest potential solutions these cells could employ to circumvent these constraints.

      Electrocytes are highly active or spiking (greater than 300 Hz) for sustained periods (for minutes to hours), and such activity is possible due to an influx of sodium ions into, and an efflux of potassium ions from, these cells with each spike. This ion imbalance must be restored after each spike, which in electrocytes, as with many other biological cells, is facilitated by the Na-K pumps at the expense of biological energy, i.e., ATP molecules. For each ATP molecule the pump uses, three positively charged sodium ions from the intracellular space are exchanged for two positively charged potassium ions from the extracellular volume. This creates a net efflux of positive ions into the extracellular space, resulting in hyperpolarized potentials for the cell over time. This does not pose an issue in most cells since the firing rate is much slower, and other compensatory mechanisms and other pumps can effectively restore the ion imbalances. In electrocytes of weakly electric fish, however, that operate under very different circumstances, the firing rate is exceptionally high. On top of this, these cells are also involved in critical communication and survival behaviors, emphasizing their reliable functioning.

      In a computational model, the authors test four increasingly complex solutions to the problem of counteracting the hyperpolarized states that occur due to continuous NaK pump action to sustain baseline activity. First, they propose a solution for a well-matched Na leak channel that operates in conjunction with the NaK pump, counteracting the hyperpolarizing states naturally. Additionally, their model shows that when such an orchestrated Na leak current is not included, quick changes in the firing rates could have unexpected side effects. Secondly, they study the implication of this cell in the context of chirps - a means of communication between individual fishes. Here, an upstream pacemaking neuron entrains the electrocyte to spike, which ceases to produce a so-called chirp - a brief pause in the sustained activity of the electrocytes. In their model, the authors show that it is necessary to include the extracellular potassium buffer to have a reliable chirp signal. Thirdly, they tested another means of communication in which there was a sudden increase in the firing rate of the electrocyte followed by a decay to the baseline. For reliable occurrence of this, they emphasize that a strong synaptic connection between the pacemaker neuron and the electrocyte is warranted. Finally, since these cells are energy-intensive, they hypothesize that electrocytes may have energy-efficient action potentials, for which their NaK pumps may be sensitive to the membrane voltages and perform course correction rapidly.

      Strengths:

      The authors extend an existing electrocyte model (Joos et al., 2018) based on the classical Hodgkin and Huxley conductance-based models of Na and K currents to include the dynamics of the NaK pump. The authors estimate the pump's properties based on reasonable assumptions related to the leak potential. Their proposed solutions are valid and may be employed by weakly electric fish. The authors explore theoretical solutions that compound and suggest that all these solutions must be simultaneously active for the survival and behavior of the fish. This work provides a good starting point for exploring and testing in in vivo experiments which of these proposed solutions the fish use and their relative importance.

      Weaknesses:

      The modeling work makes assumptions and simplifications that should be listed explicitly. For example, it assumes only potassium ions constitute the leak current, which may not be true as other ions (chloride and calcium) may also cross the cell membrane. This implies that the leak channels' reversal potential may differ from that of potassium. Additionally, the spikes are composed of sodium and potassium currents only and no other ion type (no calcium). Further, these ion channels are static and do not undergo any post-translational modifications. For instance, a sodium-dependent potassium pump could fine-tune the potassium leak currents and modulate the spike amplitude (Markham et al., 2013).

      This model considers only NaK pumps. In many cell types, several other ion pumps/exchangers/symporters are simultaneously present and actively participate in restoring the ion gradients. It may be true that only NaK pumps are expressed in the weakly electric fish Eigenmannia virescens. This limits the generalizability of the results to other cell types. While this does not invalidate the results of the present study, biological processes may find many other solutions to address the non-electroneutral nature of the NaK pump. For example, each spike could include a small calcium ion influx that could be buffered or extracted via a sodium-calcium exchanger.

      Finally, including testable hypotheses for these computational models would strengthen this work.

      We thank the reviewer for the detailed summary and the identified weaknesses according to which we improved our article. Our model assumptions and simplifications are now mentioned in more detail in the introduction of the article (section 3), and justified in the Methods (section 6.1).

      Furthermore, we added a discussion section (section 5.1) where we outline the conditions under which the present study can be extended to other cell types. We now also state more clearly that the pump current will be present for any excitable cell with significant sodium flux (assuming that the NaK pump carries out the majority of its active transport), but that compensatory mechanisms (if employed at all in a particular cell) could also be implemented via other ionic currents and transporters. We furthermore now highlight the testable hypotheses that we put forward with our computational study on the weakly electric fish electrocyte more explicitly in the first paragraph of the discussion.

      Reviewer #2 (Recommendations for the authors):

      Main text

      Please explicitly state this model's assumptions in the introduction and elaborate on them in the discussion if necessary. For example, some assumptions that I find relevant to mention are: - The Na and K channels are classic HH conductance-based channels, with no post-translational modifications or beta subunit modifications as seen in other high-frequency firing cells (10.1523/JNEUROSCI.23-12-04899.2003).

      Neither calcium nor chloride ions are considered in the spike generation. Nor are Na-dependent K channels (10.1152/jn.00875.2012).

      Only the Na-K pump (and not the Na-Ca exchanger, Ca-pump, or Cl pumps) is modeled,

      Calmodulin, which can buffer calcium, is highly expressed in electric eels, but it is not considered. If some of these assumptions have valid justifications in weakly electric fish electrocytes, please state so with the citations. I recognize that including these in your models is beyond the scope of the current paper.

      We thank the reviewer for pointing out this issue. We now specified in the introduction that the model only contains sodium and potassium ions and only classic HH conductance-based channels. We there also explicitly specify the details on the Na<sup>+</sup>/K<sup>+</sup>-ATPase: it is the only active transporter in this model, thus solely responsible for maintaining ionic homeostasis; its activity is only modulated by intracellular sodium and extracellular potassium concentrations. In the discussion (6.1), we now elaborate on how ion-channel-related aspects (i.e., the addition of resurgent Na<sup>+</sup> or Na<sup>+</sup> -dependent K<sup>+</sup> channels), additional ion fluxes (including some not relevant for the electrocyte but for other excitable cells), and additional active transporters and pumps would influence the results presented in the article.

      In addition, there might be other factors that the authors and the reviewers have yet to consider. The model is a specific case study about the weakly electric fish electrocyte with high-frequency firing. It is almost guaranteed that biology will find other compensatory ways in different cell types, systems, and species (auditory nerve, for example). Given this, it would be prudent to use phrases such as 'this model suggests,' 'perhaps,' 'could,' 'may,' and 'alludes to,' etc., to accommodate other possible solutions to ion homeostasis in rapidly spiking neurons. The solutions the authors are proposing are some of many.

      We rephrased some of the statements to highlight more the hypothetical nature of the compensatory mechanisms in specific cells and to draw attention to the fact that there can be many more such factors. This fact is now also explicitly mentioned in discussion section 5.2.

      Figures

      Some of my comments on the figures are stylistic, others are to improve clarity, and some are critical for accuracy.

      The research problem concerns weakly electric fish E. virescens. I suggest introducing a picture of an electric fish in the beginning (such as that in Figure 3, but not exactly; see specific comments on this fish figure) along with a schema of the research question. 

      We agree, and added an overview schema in Fig. 1A.

      Font sizes change between the panels in all the figures. Please maintain consistency. The figure panel titles and axis labels should start with a capital letter.

      Thank you for pointing this out. Both issues have been resolved in the new version of the article.

      Figure 1:

      Please rearrange the figure - BCFG belong together and should appear in the same order. The x-axis labels could be better placed.

      Consider using fewer pump current f-I curves (B, D, E, F). Five is sufficient to make the point. Having 10 curves adds to the clutter. The placement of the color bar could be better. Similarly, the placement of the panel titles 'without co-expression' and 'with co-expression' and the panel labeling (BCFG) makes it confusing. The panel labels should be above the panel title.

      Response (C, D, F, G in new version): We improved the layout of figure 1. Panels B, C, F, G are now C, D, F, G. We opted to include panel E before panels F and G, because it shows the coexpression mechanism before its effect on the tuning curve. We did move the colorbar, added x-axis labels to B and C, and adjusted the location of the panel labels for clarity. We also plotted fewer pump currents.

      B, F: What does the dashed line indicate?

      Response (C, F in new version): The dashed line indicates the input current that was used in figures 1D and 1G. We now clarified this by adding this value on the x-axis.

      C: Any reason not to show the lower firing rates?

      Response (B in new version): In the previous version of the article, pump currents were estimated for electrocytes that were stimulated with the mean synaptic drive that stems from periodic stimulation in the 200-600 Hz regime. We now extended the range of synaptic inputs to obtain lower (and higher) firing rates. The linear relationship between firing rate and pump current also holds for these additional firing rates.

      D: There is no difference between the curves at the top and the bottom. One fills the area between the curve and the zero line; the other shows the curve itself. Please use only one of the two representations.

      Response (panel I in new version): In the previous version, the difference between the plots was that one showed the absolute values of the currents (the curves), and the other plot showed the contributions of the currents to the total (area between the curves). We now only depict the current contributions.

      The I and H orders can be swapped.

      Thank you, they are now swapped.

      The colors used for Na and K are very dull (light blue and pink).

      We now use darker colors in the new version of the article.

      Figure 2:

      Please verify that without the synaptic input perturbations (i.e., baseline in A, D), the firing rate (B, E) and pump current (C, F) converge to the baseline. There is a noticeable drift (downward for firing rate and upward for pump currents) at the 10-second time point.

      Thanks to your observation, we identified a version mismatch in the code that estimates the pump current required for ionic homeostasis (see Methods 6.1.2). We have now corrected the code and made sure to start the simulation in the steady state so that there is no drift at baseline firing. We also used this corrected code to present tuned parameters for different stimulus protocols in Table 2 (Methods 6.2).

      Figure 3:

      A. The dipole orientation with respect to the fish in panel B needs to be corrected. Consider removing this as this work is not about the dipole.

      This panel has been removed.

      B. This figure has already been overused in multiple papers; please redraw it. Localized expressions of different pumps and ion channels are present within each electrocyte, which generates the dipole. Either show this correctly or don't at all (the subfigure pointed out by the red arrow).

      This panel has been moved to Fig. 1A. We opted to remove the localized expressions.

      C and D belong together; please place them next to each other. Consider introducing panel D first since it follows a similar protocol to the last figure.

      Response (A in new version): Panel placement has been adjusted. We opted to maintain the order to maintain the flow of the text, but we do now combine them in one panel.

      E and F are very similar in that they are swapped on the x and y axes. Either that or I have severely misunderstood something, in which case it needs to be shown better.

      Response (B and C in new version): We adjusted the placement of these panels. They are not the same: panel B shows the mean of physiological periodic inputs, and panel C shows that when this mean is fed to the electrocyte, it also induces tonic firing. The range of mean currents that result from periodic synaptic stimulation in the physiological regime (panel B, y-axis) is now indicated in panel C by a grey box along the x-axis.

      G. Why show the lines with double arrow ends? The curves are diverging - that's enough.

      Good point, we updated this panel accordingly (now panel D).

      Figure 4

      Please verify the time units in these plots. Something seems amiss. B and D lower plots-perhaps this is seconds? B could use an inset box/ background gray color (t1, t2) indicating the plots of the C panel (left, right). Likewise, for D (t1, t2), connect to E (left, right).

      You are right, the x-axes were supposed to be in seconds; we updated this. We indicated the relations between B-C and D-E by gray backgrounds and by adding the corresponding panel label on the x-axis.

      A: Indicate the perturbation in the schematic, i.e., extracellular K buffer.

      The perturbation is now indicated.

      D: Even with the extracellular K buffer, there is a decay (slower than in B) of the pump current over time. Please verify (you do not have to show in your paper) that this decay saturates.

      After the ten chirps are initiated, pacemaker firing goes back to baseline. In both cases (panel B and panel D), the pump current goes back to baseline after some time. With extracellular potassium buffering, this happens more slowly due to a decreased reaction speed of the pump to changes in firing rate (in comparison to the case without extracellular potassium buffer).

      The decrease in reaction speed however merely delays the effects of changes in firing rates on the pump current in time. Therefore, even with an extracellular potassium buffer, when more chirps are initiated in a short period of time, the pump current can still decrease to an extent that impairs entrainment. Using the same protocol as in panel B and D, we increased the number of chirps and found that with an extracellular potassium buffer, a maximum of 13 chirps could be encoded without entrainment failure (as opposed to 2 chirps without the buffer as shown in panel B).

      Figure 5

      Please verify the time units in these plots, as for Figure 4. B and E lower plots-perhaps this is seconds? B could use an inset box/ background gray color (t1, t2) indicating the plots of the panels C and D. Likewise, for E (t1, t2), connect to F and G.

      The time axis in this figure was indeed also in seconds, which we corrected here. The relations between plots B-C/D and E-F/G are now indicated through gray backgrounds and corresponding panel references on the x-axis.

      A: Indicate the perturbation in the schematic, i.e., the synapse's strength. There is no need to include the arrow or to mention freq. rise. The placement of the time scale can be misinterpreted as a current clamp. Instead, plot it as a zoomed inset.

      The arrow is removed and we now also show a zoomed inset. Also, the perturbation is now indicated.

      E: Verify that the pump current in the strong synapse case already starts at 1.25

      We verified this and noticed that the pump current in the strong synapse case is indeed lower than that in the weak synapse case. This is because to ensure a fair comparison for this stimulation protocol, voltage-gated sodium channel conductance was tuned to maintain a spike amplitude of 13 mV in both cases (see Methods 6.2). In this case, a weak synapse leads to a lower influx of sodium via AChR channels, but a higher influx via voltage-gated sodium channels. The total sodium influx in this case is larger than that for a stronger synapse with relatively less voltage-gated sodium currents, and thus a larger pump current. In the previous version of the article, this was wrongly commented on in the figure captions, and we removed the erroneous statement.

      This is not critical, but because the R-value here can be obtained as a continuous value, it would be appropriate to show it for the whole duration of the weak and strong synapses in B and E. Maybe consider including a schema that shows how R is calculated in panel A. The caption has a typo, 'during frequency rises before (D) and after (E)'. It should be before (C) and after (D) instead.

      The caption typo has been corrected. The R-value for the whole duration of the weak and strong synapses in B and E is 1.000. This is because the R-value is the variance of all phase relations between the PN and the electrocyte, and for the entire duration of the stimulus protocol, there are only a few outliers in phase relations at the maxima of the frequency rises. We decided to include this R-value to show that in general, synchronization between the PN and the electrocyte is very stable. The schema that explains how R is calculated has not been included in favor of not overcrowding the figure. We did add a reference in the figure caption to the methods section in which the calculation of R is explained.
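
      As an illustration of how a phase-locking index of this kind can be computed, the following minimal Python sketch uses the mean resultant length of the spike phases, which equals 1 when every spike falls at the same phase of the pacemaker cycle. This is an assumption made for illustration only: the calculation of R used in the article is described in its methods section and is not reproduced here, and the function names `spike_phases` and `vector_strength` are hypothetical.

```python
import cmath
import math


def spike_phases(spike_times, period):
    """Map spike times to phases (radians) within a pacemaker cycle.

    Hypothetical helper: assumes a fixed pacemaker period.
    """
    return [2.0 * math.pi * ((t % period) / period) for t in spike_times]


def vector_strength(phases):
    """Mean resultant length of a set of phases.

    Returns 1.0 for identical phases and values near 0.0 for phases
    spread evenly around the cycle. This is one common phase-locking
    measure, not necessarily the definition used in the article.
    """
    if not phases:
        raise ValueError("at least one phase is required")
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(resultant)


# Toy example: one spike per cycle, always 10% into the cycle.
locked = spike_phases([0.1, 1.1, 2.1, 3.1], period=1.0)
print(vector_strength(locked))
```

      With this measure, a few outliers among many tightly locked spikes change the index only slightly, which is in line with the near-unity value reported for the whole stimulus duration.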

      Figure 6:

      A: The top and bottom plots are redundant. Use one of the two. They show the same thing. It may be better to plot Na, K, pump, and net currents on the top panels and the Na leak, which is of smaller magnitude, in a different panel.

      We now only show current contributions.

      B: Please change the color schema. It is barely visible on my prints.

      D: Pump current, instantaneous case, is barely visible

      Color schemes were adjusted.

      Figure A1: It's all good.

      Methods:

      Please provide some internal citations for where specific equations were used in the results/figures. You do this for sections 6.2.3, referencing Figure 5 (c,d,e,g), and 6.2.4, referencing Fig 5 C-E.

      There are now internal references in each methods section to where in the figures they were used. We also included a table with stimulus parameters for each figure with a stimulus protocol (Table 2).

      Also, the methods could be ordered in the same order as the results are presented. Please consider if some details in the methods could be moved to the appendix.

      The ordering of the methods has now been changed to separately explain the model expansions (6.1) and the stimulus protocols (6.2). Both sections are in corresponding order of the figures presented in the article. We opted to maintain all details in the methods.

      6.1.1 Please cite 26 after the first line. Where was this used? In Figure 3C, 4, 5?

      We added the citation. The effects of co-expressed leak channels are shown in Fig. 1 EG, and were used to compensate for pump currents at baseline firing in figures 1 D, H-J (left, with pump), 2, 4, 5, and 6 A-B (left), C (top). This is now also added to the text for clarity.

      Traditionally (Hodgkin, A. L. and Huxley, A. F. (1952). J. Physiol. (Lond.), 117:500-544. Table 3; & Hodgkin, A. L. and Huxley, A. F. (1952). J. Physiol. (Lond.), 116:473-496 Table 5 and the paragraph around it), leak potential is set such that it accounts for all leak from all ions. While in your work, this potential is equal to the reversal of potassium - it need not be so in the animal. There may be leaks from other ions as well, particularly sodium and chloride. Please verify that assuming the leak reversal is the same as that of potassium (Ek, in Equation 3) does not lead to having to model Na leak currents separately.

      In the original model [1], it was assumed that the reversal potential of the leak was the same as that of potassium, which contains the implicit assumption that only potassium ions contribute to the leak. In our article, we also assume that sodium ions contribute to the leak. This can be modeled by adjusting the leak reversal potential accordingly, or by adding an additional leak current that solely models the sodium leak. We opted for the latter in order to track all sodium and potassium ions separately so that ion concentration dynamics could also be modeled properly. Chloride ions were neglected in this study; in our model they do not contribute to the leak. If one were to also model chloride currents and chloride concentration dynamics, it would be beneficial to model these as an additional separate leak current.
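
      The equivalence between the two options described above (one lumped leak with an adjusted reversal potential versus separate potassium and sodium leak currents) can be checked numerically. The Python sketch below is illustrative only: the conductances and reversal potentials are hypothetical placeholder values, not the parameters of the electrocyte model, and the function names are made up for this example.

```python
def separate_leaks(v, g_k_leak, e_k, g_na_leak, e_na):
    """Return (I_K_leak, I_Na_leak) for two ohmic leak currents."""
    i_k_leak = g_k_leak * (v - e_k)
    i_na_leak = g_na_leak * (v - e_na)
    return i_k_leak, i_na_leak


def lumped_leak(v, g_k_leak, e_k, g_na_leak, e_na):
    """Return (I_leak, E_leak) for a single equivalent leak current.

    The lumped reversal potential is the conductance-weighted mean of
    the individual reversal potentials.
    """
    g_leak = g_k_leak + g_na_leak
    e_leak = (g_k_leak * e_k + g_na_leak * e_na) / g_leak
    return g_leak * (v - e_leak), e_leak


# Hypothetical values (arbitrary conductance units, potentials in mV).
params = dict(g_k_leak=1.0, e_k=-90.0, g_na_leak=0.05, e_na=50.0)
for v in (-90.0, -70.0, 0.0):
    i_k, i_na = separate_leaks(v, **params)
    i_total, e_leak = lumped_leak(v, **params)
    print(v, i_k + i_na, i_total, e_leak)
```

      Both formulations give the same total membrane current, but only the separate version exposes the sodium and potassium components individually, which is what is needed when each ion's concentration is tracked.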

      The notation of I_pump_0 needs to be more convenient. Please consider another notation instead of the _0 (pump at baseline). Similarly for [Na<sup>+</sup>]_in_0, [Na<sup>+</sup>]_out_0, [K<sup>+</sup>]_in_0, and [K<sup>+</sup>]_out_0.

      We changed the notation for baseline similarly to [3], with ‘0’ as a superscript instead of a subscript.

      Equation 11: Please mention why AChRs do not let calcium ions through. Please cite a justification for this. If this is an assumption of the model, please state this explicitly.

      The AChR channels that were found in the E. virescens electrocytes are muscle-type acetylcholine nicotinic receptors [4], which are non-selective cation channels that could indeed support calcium flux [5]. No calcium currents were, however, modeled in the original electrocyte model [1], presumably due to the lack of significant contributions of calcium currents or extracellular calcium concentrations to electrocyte action potentials of a similar weakly electric electrogenic wave-type fish Sternopygus macrurus [6].

      Due to the lack of calcium currents in the original electrocyte model, and due to the limitation of this study to sodium and potassium ions, we chose not to include calcium currents stemming from AChR channels. This assumption is now explicitly stated in Methods 6.1.

      Equation 12: V_in is the intracellular volume. If possible, avoid the notation 'V', since you already use a small v for membrane potential.

      We changed the notation for volume to ‘ω’ similarly to [3]. As we previously used ω as a notation for the firing rate, we changed the notation for firing rate to ‘r’.

      Equation 17: Does this have any assumptions? Would the I_AChRNa, and thus Sum(mean(I_Na)), not change depending on the synaptic drive?

      The assumptions of this equation are the following (now also mentioned in Methods 6.1.2):

      The sum of all sodium currents also includes sodium currents through acetylcholine channels (I_AChRNa).

      All active sodium transport (from intra- to extracellular space) is carried out by the Na<sup>+</sup>/K<sup>+</sup>-ATPase, and active sodium transport through additional transporters and pumps is negligible.

      The time-average of sodium currents is either taken in a tonic firing regime, where the time interval that is averaged over is a multiple of the spiking period, nT, or, if it is taken for a more variable firing regime, the averaging window should be sufficiently large to properly sample all firing statistics.

      Under these assumptions, Eq. 17 can be used to compute suitable pump currents for different synaptic drives (Sum(mean(I_Na)), and thus I_pump0, indeed change with the synaptic drive; see Table 2 in Methods 6.2).
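
      The three assumptions above can be made concrete with a short Python sketch of a homeostatic pump-current estimate. It is not a reproduction of Eq. 17, whose exact form is not given in this response. It assumes the standard 3 Na<sup>+</sup> : 2 K<sup>+</sup> stoichiometry of the Na<sup>+</sup>/K<sup>+</sup>-ATPase (so a net pump current I_pump removes sodium at a rate of 3 I_pump) and a sign convention in which inward currents are negative. The function names, trace names, and toy numbers are hypothetical.

```python
NA_PER_PUMP_CYCLE = 3  # assumed stoichiometry: 3 Na+ out, 2 K+ in


def mean_over_spike_periods(trace, dt, period, n_periods):
    """Average a sampled current over an integer number of spike periods."""
    n_samples = round(n_periods * period / dt)
    if n_samples < 1 or n_samples > len(trace):
        raise ValueError("averaging window does not fit inside the trace")
    return sum(trace[:n_samples]) / n_samples


def baseline_pump_current(sodium_traces, dt, period, n_periods):
    """Pump current that balances the time-averaged sodium influx.

    sodium_traces maps a current name to its sampled values and must
    include every sodium pathway (voltage-gated, leak, and AChR).
    """
    total_mean_na = sum(
        mean_over_spike_periods(trace, dt, period, n_periods)
        for trace in sodium_traces.values()
    )
    # Inward sodium current is negative, so the balancing pump current
    # (outward, positive) is minus one third of the summed mean.
    return -total_mean_na / NA_PER_PUMP_CYCLE


# Toy traces: two spike periods sampled at four points per period.
traces = {
    "I_Na": [-6.0, 0.0, 0.0, 0.0] * 2,
    "I_NaL": [-0.3] * 8,
    "I_AChRNa": [0.0, -1.2, 0.0, 0.0] * 2,
}
print(baseline_pump_current(traces, dt=0.25, period=1.0, n_periods=2))
```

      Because the AChR sodium current is part of the sum, a different synaptic drive changes the estimate, which is why the tuned pump current depends on the stimulus protocol.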

      6.2: Please rewrite the first sentence of this paragraph.

      The first sentence of this paragraph, which has been moved to section 6.2.2 for improved structuring of the text, has been rewritten.

      6.2.1: The text section could use a rewrite.

      Please elaborate on what t_p is. If it is not time, please do not use 't.' What is p here? What are the units of the equation (22), t_p < 0.05 (?)

      This section has now also been moved to 6.2.2. It has been rewritten to improve clarity and t_p has been renamed to t_pn (as it does reflect time, which is now better explained). The units have now also been added to the equation (which is now Eq. 26).

      6.2.4: Please rewrite this.

      This section has been rewritten (and has been moved to section 6.1.4).

      Bibliography

      Some references are omitted (left anonymous) or inconsistent on multiple occasions.

      Thank you for pointing this out! It is now rectified.

      References used for author response

      (1) Joos B, Markham MR, Lewis JE, Morris CE. A model for studying the energetics of sustained high frequency firing. PLOS ONE. 2018 Apr;13:e0196508.

      (2) Hopkins CD. Electric communication: Functions in the social behavior of Eigenmannia virescens. Behaviour. 1974;50(3-4):270–304.

      (3) Hübel N, Dahlem MA. Dynamics from seconds to hours in Hodgkin-Huxley model with time-dependent ion concentrations and buffer reservoirs. PLoS Computational Biology. 2014;10(12):e1003941.

      (4) Ban Y, Smith BE, Markham MR. A highly polarized excitable cell separates sodium channels from sodium-activated potassium channels by more than a millimeter. Journal of Neurophysiology. 2015;114(1):520–30.

      (5) Vernino S, Rogers M, Radcliffe KA, Dani JA. Quantitative measurement of calcium flux through muscle and neuronal nicotinic acetylcholine receptors. Journal of Neuroscience. 1994;14(9):5514-5524.

      (6) Ferrari M, Zakon H. Conductances contributing to the action potential of Sternopygus electrocytes. Journal of Comparative Physiology A. 1993;173:281–92.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors):

      Thank you for your thorough review of our manuscript and your valuable suggestions. Here are our responses to each point you raised:

      (1) Novelty: Exploring the feasibility of extending the risk-scoring model to diverse cancer types could emphasize the broader impact of the research.

      Thank you so much for your thoughtful and insightful feedback. Your suggestion to explore extending the risk-scoring model to diverse cancer types is truly valuable and demonstrates your broad vision in this field. We deeply appreciate your interest in our research and the effort you put into providing such constructive input.

      After careful consideration, we have decided to focus our current study on the specific cancer type(s) we initially set out to explore. This decision was made to ensure that we can thoroughly address the research questions at hand, given our current resources, time constraints, and the complexity of the topic. By maintaining this focused approach, we aim to achieve more in-depth and reliable results that can contribute meaningfully to the understanding of this particular area.

      However, we fully recognize the potential significance of your proposed direction and firmly believe that it could be an excellent avenue for future research. We will definitely keep your suggestion in mind and may explore it in subsequent studies as our research progresses and evolves.

      (2) Improvement in Figure Presentation: The inconsistency in font formatting across figures, particularly in Figure 2 (A-D, E, F-H, I), Figure 3 (A-C, D-J, H, K), and the distinct style change in Figure 5, raises concerns about the professionalism of the visual presentation. It is recommended to standardize font sizes and styles for a more cohesive and visually appealing layout. This ensures that readers can easily follow and comprehend the graphical data presented in the article.

      The text in the picture has been revised as requested.

      (3) Enhancing Reliability of Immune Cell Infiltration Data: Address the potential limitations associated with relying solely on RNASeq data for immune cell infiltration analysis between ICD low and ICD high groups in Figure 2. It is advisable to discuss the inherent challenges and potential biases in this methodology. To strengthen the evidence, consider incorporating bladder cancer single-cell sequencing data, which could provide a more comprehensive and reliable understanding of immune cell dynamics within the tumor microenvironment.

      Thank you very much for your meticulous review and the highly constructive suggestions. Your insight regarding the limitations of relying on RNASeq data for immune cell infiltration analysis and the proposal to incorporate bladder cancer single-cell sequencing data truly reflect your profound understanding of the field. We deeply appreciate your efforts in guiding our research and the valuable perspectives you've offered.

      After careful deliberation, given our current research scope, timeline, and available resources, we've decided to focus on further discussing and addressing the challenges and biases inherent in RNASeq-based immune cell infiltration analysis. By delving deeper into the methodological limitations and conducting more in-depth statistical validations, we aim to provide a comprehensive and reliable interpretation of the data within our study framework. This focused approach allows us to maintain the integrity of our original research design and deliver robust findings on the relationship between immune cell infiltration and ICD in the current context.

      However, we fully acknowledge the significant value of your proposed single-cell sequencing approach. It is indeed a powerful method that could offer more detailed insights into immune cell dynamics, and we believe it holds great promise for future research in this area. We will keep your suggestion in mind as an important direction for potential future studies, especially when we plan to expand and deepen our exploration of the tumor microenvironment.

      (4) Clarity in Data Sources and Interpretation of Figure 5: In the results section, provide a detailed and transparent explanation of the sources of data used in Figure 5. This includes specifying the databases or platforms from which the chemotherapy, targeted therapy, and immunotherapy data were obtained. Additionally, elucidate the rationale behind the chosen data sources and how they contribute to the overall interpretation of the study's findings. And, strangely, these immune-related genes are associated with cancer sensitivities to different targeted therapies.

      Thank you very much for your detailed and valuable feedback on Figure 5. We sincerely appreciate your careful review and insightful suggestions, which have provided us with important directions for improvement.

      Regarding the data sources in Figure 5, we used the pRRophetic algorithm to conduct a drug sensitivity analysis on the TCGA database. The reason for choosing these data sources is multifaceted. Firstly, these databases and platforms are well-established and widely recognized in the field. They have strict data collection and verification processes, ensuring the accuracy and reliability of the data. For example, TCGA has a large-scale, long-term-accumulated chemotherapy case database, which can comprehensively reflect the clinical application and treatment effects of various chemotherapeutic drugs.

      Secondly, these data sources cover a wide range of cancer types and patient information, which can meet the requirements of our study's diverse sample size and variety. This comprehensiveness enables us to conduct a more in-depth and representative analysis of the relationships between different therapies and immune-related genes.

      In terms of the overall interpretation of the study's findings, the use of these data sources provides a solid foundation. The accurate chemotherapy, targeted therapy, and immunotherapy data help us clearly demonstrate the associations between immune-related genes and cancer sensitivities to different treatments. This allows us to draw more reliable conclusions and provides a scientific basis for understanding the complex mechanisms of cancer treatment from the perspective of immune-gene-therapy interactions.

      As for the unexpected association between immune-related genes and cancer sensitivities to different targeted therapies, this is indeed a fascinating discovery. In our analysis, we hypothesized that immune-related genes may affect the tumor microenvironment, thereby influencing the response of cancer cells to targeted therapies. Although this finding is currently beyond our initial expectations, it has opened up a new research direction for us. We will further explore and verify the underlying mechanisms in future research.

      Once again, thank you for your guidance. We will make corresponding revisions and improvements according to your suggestions to make our research more rigorous and complete.

      (5) Legends and Methods: Address the brevity and lack of crucial details in the figure legends and methods section. Expand the figure legends to include essential information, such as the number of samples represented in each figure. In the methods section, provide comprehensive details, including the release dates of databases used, versions of coding packages, and any other pertinent information that is crucial for the reproducibility and reliability of the study.

      We would like to express our sincere gratitude for your valuable feedback on the figure legends and methods section of our study. We highly appreciate your sharp observation of the issues regarding the brevity and lack of key details, which are crucial for further improving our research.

      We have supplemented the methods section with data including the number of samples, the release dates of the databases used, and the versions of the coding packages. For TCGA samples: 421 tumor samples and 19 normal samples. Database release date: March 29, 2022 (version v36). Coding package version: R version 4.1.1. We will immediately proceed to supplement these key details, making the research process and methods transparent. This will allow other researchers to reproduce our study more accurately and enhance the persuasiveness of our research conclusions.

      (6) Evidence Supporting Immunotherapy Response Rates: The importance of providing a robust foundation for the conclusion regarding lower immunotherapy response rates. Strengthen this section by offering a more detailed description of sample parameters, specifying patient demographics, and presenting any statistical measures that validate the observed trends in Figure 5Q-T. More survival data are required to conclude. Avoid overinterpretation of the results and emphasize the need for further investigation to solidify this aspect of the study.

      Thank you very much for your professional and meticulous feedback on the content related to immunotherapy response rates in our study! Your suggestions, such as providing a solid foundation for the conclusions and supplementing key information, are of great value in enhancing the quality of our research, and we sincerely appreciate them.

      The data in Figures 5Q to T are from the TCGA database, which has already been provided. The statistical measure used for Figures 5Q to T is the P-value, which has been marked in the figures. The survival data have been provided in Figure 3D.

      Reviewer #2 (Recommendations for the authors):

      Thank you for your thorough review of our manuscript and your valuable suggestions. Here are our responses to each point you raised:

      (1) There is no information on the samples studied. Are all TCGA bladder cancer samples studied? Are these samples all treatment naïve? Were any excluded? Even simply, how many samples were studied?

      Thank you so much for pointing out the lack of sample-related information. Your attention to these details has been extremely helpful in identifying areas for improvement in our study.

      All the samples in our study were sourced from the TCGA (The Cancer Genome Atlas) and TCIA (The Cancer Immunome Atlas) databases. It should be noted that the patient data in the TCIA database are originally from the TCGA database. Regarding whether the patients received prior treatment, this information was not specifically mentioned in our current report. Instead, we mainly relied on the scores of the prediction model for evaluation. Since all samples were obtained from publicly available databases, we understand the importance of clarifying their origin and characteristics.

      We sincerely apologize for the omission of the sample size and other relevant details. We will promptly supplement this crucial information in the revised version, including a detailed description of the sample sources and any relevant characteristics. This will ensure greater transparency and help readers better understand the basis of our research.

      For TCGA samples: 421 tumor samples and 19 normal samples. Database release date: March 29, 2022 (version v36). Software version: R version 4.1.1.

      (2) What clustering method was used to divide patients into ICD high/low? The authors selected two clusters from their "unsupervised" clustering of samples with respect to the 34 gene signatures. A Delta area curve showing the relative change in area under the cumulative distribution function (CDF) for k clusters is omitted, but looking at the heatmap one could argue there are more than k=2 groups in that data. Why was k=2 chosen? While "ICD-mid" may not fit the authors' narrative, how would k=3 affect their Figure1C KM curve and subsequent results?

      Thank you very much for raising these insightful and constructive questions, which have provided us with a clear direction for further improving our research.

      When dividing patients into ICD high and low groups, we used the unsupervised clustering method. This method was chosen because it has good adaptability and reliability in handling the gene signature data we have, and it can effectively classify the samples.

      Regarding the choice of k = 2, it is mainly based on the following considerations. Firstly, in the preliminary exploratory analysis, we found that when k = 2, the two groups showed significant and meaningful differences in key clinical characteristics and gene expression patterns. These differences are closely related to the core issues of our study and help to clearly illustrate the distinctions between the ICD high and low groups. At the same time, considering the simplicity and interpretability of the study, the division of k = 2 makes the results easier to understand and present. Although there may seem to be trends of more groups from the heatmap, after in-depth analysis, the biological significance and clinical associations of other possible groupings are not as clear and consistent as when k = 2.

      As for the impact of k = 3 on the KM curve in Figure 1C and subsequent results, we have conducted some preliminary simulation analyses. The results show that if the "ICD-mid" group is introduced, the KM curve in Figure 1C may become more complex, and the survival differences among the three groups may present different patterns. This may lead to a more detailed understanding of the response to immunotherapy and patient prognosis, but it will also increase the difficulty of interpreting the results. Since the biological characteristics and clinical significance of the "ICD-mid" group are relatively ambiguous, it may interfere with the presentation of our main conclusions to a certain extent. Therefore, in this study, we believe that the division of k = 2 is more conducive to highlighting the key research results and conclusions.

      Thank you again for your valuable comments. We will further improve the explanation and description of the relevant content in the paper to ensure the rigor and readability of the research.
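
      The delta area curve the reviewer asks about can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis performed in the study: it assumes a consensus matrix with entries in [0, 1] is available for each k, and it uses one common convention for the relative change in CDF area (the area itself for the smallest k, then the change relative to the previous k). The function names and the example areas are hypothetical.

```python
from itertools import combinations


def cdf_area(consensus):
    """Area under the empirical CDF of the off-diagonal consensus values.

    For values confined to [0, 1], the integral of the empirical CDF over
    [0, 1] equals 1 minus the mean of those values.
    """
    n = len(consensus)
    values = [consensus[i][j] for i, j in combinations(range(n), 2)]
    return 1.0 - sum(values) / len(values)


def delta_area(areas_by_k):
    """Relative change in CDF area between consecutive values of k.

    areas_by_k maps k (2, 3, ...) to its CDF area. The smallest k reports
    the area itself; each larger k reports (A_k - A_prev) / A_prev.
    """
    ks = sorted(areas_by_k)
    deltas = {ks[0]: areas_by_k[ks[0]]}
    for prev, k in zip(ks, ks[1:]):
        deltas[k] = (areas_by_k[k] - areas_by_k[prev]) / areas_by_k[prev]
    return deltas


# Hypothetical areas, for illustration only (not values from the study).
example_areas = {2: 0.40, 3: 0.50, 4: 0.52}
example_deltas = delta_area(example_areas)
print(example_deltas)
```

      Under this convention, a k beyond which the relative change becomes small is often taken as a reasonable number of clusters, which is the kind of evidence the reviewer requests for choosing k = 2 over k = 3.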

      (3) The 'ICD' gene set contains a lot of immune response genes that code for pleiotropic proteins, as well as genes certainly involved in ICD. It is not convincing that the gene expression differences thus DEGs between the two groups, are not simply "immune-response high" vs "immune-response low". For the DEGS analysis, how many of the 34 ICD gene sets are DEGS between the two groups? Of those, which markers of ICD are DEGs vs. those that are related to immune activation?

      a. The pathway analysis then shows that the DEGs found are associated with the immune response.

      b. Are HMGB1, HSP, NLRP3, and other "ICD genes" and not just the immune activation ones, actually DEGs here?

      c. Figures D, I-J are not legible in the manus.

      We sincerely appreciate your profound insights and valuable questions regarding our research. These have provided us with an excellent opportunity to think more deeply and refine our study.

      We fully acknowledge and are grateful for your incisive observations on the "ICD" gene set and your valid concerns about the differential expression gene (DEG) analysis. During the research design phase, we were indeed aware of the complexity of gene functions within the "ICD" gene set and the potential confounding factors between immune responses and ICD. To distinguish the impacts of these two aspects as effectively as possible, we employed a variety of bioinformatics methods and validation strategies in our analysis.

      Regarding the DEG analysis, of the 34 genes in the ICD gene set, 30 showed significant differential expression between the groups; the exceptions were HMGB1, HSP90AA1, ATG5, and PIK3CA. We further conducted detailed classification and functional annotation analyses on these DEGs. The ICD gene set is taken from a previous article and is related to the process of ICD; the relevant literature is cited in the materials section. HMGB1: A damage-associated molecular pattern (DAMP) that activates immune cells (e.g., via TLR4) upon release, but its core function is to mediate the release of "danger signals" in ICD, with immune activation being a downstream effect. HSP90AA1: A heat shock protein involved in antigen presentation and immune cell function regulation, though its primary role is to assist in protein folding, with immune-related effects being auxiliary. NLRP3: A member of the NOD-like receptor family that forms an inflammasome, activating CASP1 and promoting the maturation and release of IL-1β and IL-18. Among the DEGs, the majority are associated with immune activation, such as IL1B, IL6, IL17A/IL17RA, IFNG/IFNGR1, etc.

      (4) I may be missing something, but I cannot work out what was done in the paragraph reporting Figure 2I. Where is the ICB data from? How has this been analysed? What is the cohort? Where are the methods?

      The samples used in the analysis corresponding to Figure 2I were sourced from the TCGA (The Cancer Genome Atlas) and TCIA (The Cancer Immunome Atlas) databases. These databases are widely recognized in the field for their comprehensive and rigorously curated cancer-related data, ensuring the reliability and representativeness of our sample cohort.

      Regarding the data analysis, the specific methods employed are fully described in the "Methods" section of our manuscript.

      (5) How were the four genes for your risk model selected? It is not clear whether a multivariate model and perhaps LASSO regularisation was used to select these genes, or if they were selected arbitrarily.

      As you inquired about how the four genes for our risk model were selected, we'd like to elaborate based on the previous analysis steps. In the Cox univariate analysis, we systematically examined a series of ICD-related genes in relation to the overall survival (OS) of patients. Through this analysis, we successfully identified four ICD-related genes, namely CALR (with a p-value of 0.003), IFNB1 (p = 0.037), IFNG (p = 0.022), and IL1R1 (p = 0.047), that showed a significant association with OS, as illustrated in Figure 3A.

      Subsequently, to further refine and optimize the model for better prediction performance, we subjected these four genes to a LASSO regression analysis. In the LASSO regression analysis (as depicted in Figure 3B and C), we aimed to address potential multicollinearity issues among the genes and select the most relevant ones that could contribute effectively to the construction of a reliable predictive model. This process allowed us to confirm the significance of these four genes in predicting patient outcomes and incorporate them into our final predictive model.

      (6) How related are the high-risk and ICD-high groups? It is not clear. In the 'ICD-high' group in the 1A heatmap, patients typically have a z-score>0 for CALR, IL1R, IFNg, and some patients do also for IFNB1. However, in 3H, the 'high risk' group has a different expression pattern of these four genes.

      Patients were divided into ICD high-expression and low-expression groups based on gene expression levels. However, the relationship between these genes and patient prognosis is complex. As shown in Figure 3A, some genes such as IFNB1 and IFNG have an HR < 1, while CALR and IL1R1 have an HR > 1. Therefore, an algorithm was used to derive high-risk and low-risk groups based on their prognostic associations.
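
      To make the distinction between expression-based and risk-based grouping concrete, the following is a minimal sketch of how a four-gene risk score is commonly derived. It is an assumption for illustration, not the algorithm used in the study: the score is taken as the sum of ln(HR) times expression, and patients are split at the cohort median. The hazard ratios and expression values below are hypothetical and were chosen only to match the stated directions (HR < 1 for IFNB1 and IFNG, HR > 1 for CALR and IL1R1).

```python
import math
from statistics import median

# Hypothetical hazard ratios; only their direction follows the response text.
HYPOTHETICAL_HR = {"CALR": 1.3, "IL1R1": 1.2, "IFNB1": 0.8, "IFNG": 0.9}


def risk_score(expression, hazard_ratios=HYPOTHETICAL_HR):
    """Linear predictor: sum of ln(HR) * expression over the signature genes."""
    return sum(math.log(hr) * expression[gene] for gene, hr in hazard_ratios.items())


def split_by_median(scores):
    """Label a patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Made-up expression values for three illustrative patients.
patients = {
    "P1": {"CALR": 2.0, "IL1R1": 1.5, "IFNB1": 0.1, "IFNG": 0.2},
    "P2": {"CALR": 0.5, "IL1R1": 0.4, "IFNB1": 1.0, "IFNG": 2.0},
    "P3": {"CALR": 1.0, "IL1R1": 1.0, "IFNB1": 0.5, "IFNG": 0.5},
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(groups)
```

      Because genes with HR < 1 contribute negatively, a patient with high IFNG and IFNB1 expression can fall in the low-risk group even when overall expression of the signature is high, which is why the two groupings need not coincide.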

      (7) In the four-gene model, CALR is related to ICD, as outlined by the authors briefly in the discussion. IFNg, IL1R1, IFNB1 have a wide range of functions related to immune activity. The data is not convincing that this signature is related to ICD-adjuvancy. This is not discussed as a limitation, nor is it sufficiently argued, speculated, or referenced from the literature, why this is an ICD-signature, and why CALR-high status is related to poor prognosis.

      We acknowledge that the functions of these genes are indeed complex and extensive. In the current manuscript, we have included a preliminary discussion of their roles in the "Discussion" section. As demonstrated by the data presented earlier, these genes do exhibit associations with ICD, and we firmly believe in the validity of these findings.

      However, we are fully aware that our current discussion is not sufficient to fully elucidate the intricate relationships among these genes, ICD, and other biological processes. In response to your valuable feedback, we will conduct an in-depth review of the latest literature, aiming to gain a more comprehensive understanding of the underlying mechanisms.

      (8) Score is spelt incorrectly in Figures 3F-J.

      Figures 3F-J have been revised as requested.

      (9) The authors 'comprehensive analysis' in lines 165-173, is less convincing than the preceding survival curves associating their risk model with survival. Their 'correlations' have no statistics.

      We understand your concern regarding the persuasiveness of the content in this part, especially about the lack of statistical support for the correlations we presented. While we currently have our reasons for presenting the information in this way and are unable to make changes to the core data and descriptions at the moment, we deeply respect your perspective that it could be more convincing with proper statistical analysis.

      (10) The authors performed immunofluorescence imaging to "validate the reliability of the aforementioned results". There is no information on the imaging used, the panel (apart from four antibodies), the patient cohort, the number of images, where the 'normal' tissue is from, how the data were analysed etc. This data is not interpretable without this information.

      a. Is CD39 in the panel? CD8, LAG3? It's not clear what this analysis is.

      The color of each antibody has been marked in Fig 2B. The cohort information and its source have been supplemented. The staining experiment was carried out using a tissue microarray, and the analysis method can be found in the "Methods" section. Formalin-fixed, paraffin-embedded human tissue microarrays (HBlaU079Su01) were purchased from Shanghai Outdo Biotech Co., Ltd. (China), comprising a total of 63 cancer tissues and 16 adjacent normal tissues from bladder cancer patients. Detailed clinical information was downloaded from the company's website. Remmele and Stegner’s semiquantitative immunoreactive score (IRS) scale was employed to assess the expression levels of each marker, as detailed in Methods 2.5. CD39, CD8, and LAG3 were also stained, but the results were not presented.
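
      For readers unfamiliar with the IRS, the sketch below shows the commonly cited form of the Remmele and Stegner score: staining intensity (0 to 3) multiplied by a category for the percentage of positive cells (0 to 4), giving a range of 0 to 12. The percentage cut-offs used here are the widely quoted ones and are an assumption; the exact scheme applied in Methods 2.5 may differ. The function names are hypothetical.

```python
def percentage_score(percent_positive):
    """Map the percentage of positive cells to a 0-4 category (assumed cut-offs)."""
    if percent_positive <= 0:
        return 0  # no positive cells
    if percent_positive < 10:
        return 1  # fewer than 10%
    if percent_positive <= 50:
        return 2  # 10-50%
    if percent_positive <= 80:
        return 3  # 51-80%
    return 4      # more than 80%


def immunoreactive_score(intensity, percent_positive):
    """IRS = staining intensity (0-3) x percentage category (0-4), range 0-12."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0 (none), 1 (weak), 2 (moderate) or 3 (strong)")
    return intensity * percentage_score(percent_positive)


print(immunoreactive_score(2, 30))
```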

      (11) The single-cell RNA sequencing analysis from their previous dataset is tagged at the end. CALR expression in most identified cells is interesting. Not clear what this adds to the work beyond 'we did scRNA-seq'. How were these data analysed? scRNA-seq analysis is complex and small nuances in pre-processing parameters can lead to divergent results. The details of such analysis are required!

      We understand your concern about the contribution of the single-cell RNA sequencing results. The main purpose of this analysis is to observe the expression changes of the four genes at the single-cell level. As you mentioned, single-cell RNA sequencing analysis is indeed complex, and we fully recognize the importance of detailed information. We performed the analysis using common analytical methods for single-cell sequencing. The details have been supplemented in the Methods section.

    1. Author response:

      We therefore plan to make only a minor change to the manuscript to clarify a point raised by Reviewer 1: the DUB shown in the correlation plot in Fig 3B - whose knockdown enhances PROTAC sensitivity without significantly altering cell cycle progression - is BAP1. Since BAP1 subsequently showed no significant effect on endogenous AURKA levels (Fig 3E) it was excluded from further analysis.

      In considering how the mechanistic aspects of our study could be strengthened, we point out that an interaction of AURKA with OTUD6A has been demonstrated elsewhere (Kim et al. 2021). We also argue that an interaction of AURKA with UCHL5 would not be expected since UCHL5 is a proteasomal DUB shown to act on substrates recruited to the proteasome via capture of ubiquitin chains by the ubiquitin receptors of the proteasome lid. We agree that mechanistically we have not provided complete evidence for a direct deubiquitinating activity of UCHL5 on AURKA. We cannot explain why there is no change in AURKA ubiquitination upon UCHL5 knockdown in our ubiquitin pulldown experiment, but indeed there is considerable uncertainty in the scientific literature on the precise role of UCHL5 at the proteasome.

      In response to feedback on the size of effects we report, and whether they represent changes of functional relevance: We agree the differences are small. Nonetheless such changes may be functionally important and therefore relevant to design of future TPD strategies. Our previous characterization of PROTAC-D (Wang et al. 2021) provides evidence that differential degradation of subcellular pools can have functional relevance. We showed in our study that the lack of degradation of the centrosomal pool (even if this represents only a small fraction of the total pool) led to unexpected phenotypic consequences that were distinct from those observed upon treatment with ATP-competitive inhibitor or siRNA. Therefore we believe our specific finding of spatially restricted action of AURKA-selective OTUD6A to be of clear functional relevance to AURKA TPD strategies and of conceptual importance in establishing the paradigm of TPD modulation by DUBs.

      As Reviewer 1 notes, we do not directly test our hypothesis that combining PROTACs with DUB inhibition could enhance degradation. We would have done so had there been suitable small molecule inhibitors available for OTUD6A or UCHL5 at the time of our study. We plan a broader study of OTUD6A mechanisms and its role in PROTAC sensitivity in cancer cell lines, and appreciate Reviewer 3’s suggestion that the impact of our findings would be strengthened if key results were validated in one or more cancer cell lines. The scope of this new study means we plan to report it in a separate, future publication.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors):

      Line 122: There were a number of qualitative descriptors in the paper. For instance, if the authors want to say massive campaign, how massive? How rapid? These are relative terms in this context.

      We have revised the text to minimize qualitative descriptors and to provide concrete numbers where possible. The revised sentence (line 121) now reads “We began our structural investigation of nitrogenase evolutionary history by conducting a large-scale structure prediction analysis of 5378 protein structures, a more than threefold increase compared to available nitrogenase structures in the PDB. We then analyzed our phylogenetic dataset to identify notable structural changes.”

      Line 179: "massively scale up" How massive?

      We agree with the reviewer’s observation. In response, we have removed the phrase “massively scale up” and revised the text.

      Line 182: "no compromise on alignment depth and negligible cost to prediction accuracy". How do you know this? Is this shown somewhere? Was there a comparison between known structures and the predicted structure for those nitrogenases that have structures?

      In response to this comment, we have made several clarifications and revisions in the manuscript:

      We modified Figure S1, which now shows the pLDDT (per-residue confidence metric from AlphaFold) values of all our predictions. These scores are consistently high (over 90 for the D and K subunits, and approximately 90 for the H subunits) regardless of whether the recycling protocol or the bona-fide protocol was used.

      The reviewer’s comment showed us that Figure S1 needed to represent these values more clearly; we have therefore updated it accordingly.

      To prevent any misinterpretation of our claims about the accuracy and cost of the method, we have revised the text at line 179, as follows:

      “In total, 2,689 unique extant and ancestral nitrogenase variants were targeted. All structures were generated in approximately 805 hours, including GPU computations and MMseqs2 alignments performed using two different protocols: one for extant or most likely ancestral sequences, and another for ancestral variants.”

      To support our analyses further, Figure S10A compares our model predictions with available PDB structures for nitrogenases.

      Additionally, Figure S10B compares our predicted structures with the experimental structures reported in this article. In all cases, we observe low RMSD values.

      Line 220: "fall within 2 angstroms" instead of "fall 2A"?

      We have updated it in the text.

      Line 315: It is not clear how the binding affinities and other measurements in Figure 4 and S6C were measured, and it is not discussed in the material and methods.

      We thank the reviewer for pointing out this lack of clarity. The binding affinity estimations were performed using Prodigy. We have updated the main text (see line 322) to explicitly state that binding affinities were estimated using Prodigy. In addition, we have expanded the Materials and Methods section to include additional information about the structure characterization methods (lines 745-749). Previously, these details were only noted in Supplementary Table S6.

      Line 510-511: "Subtle, modular structural adjustments away from the active site were key to the evolution and persistence of nitrogenases over geologic time". This seems like a bit of an overstatement. While the authors see structural differences in the ancestral nitrogenase and speculate these differences could be involved in oxygen protection, there is no evidence that the ancestral nitrogenase is more sensitive to oxygen than the extant nitrogenase.

      We appreciate the reviewer’s comment. Our intention was to emphasize that subtle, modular structural adjustments might have contributed to oxygen protection rather than to assert that ancestral nitrogenases are more oxygen-sensitive than their extant counterparts. We have revised the text to clarify.

      Reviewer #2 (Recommendations for the authors):

      What is the reference for the measured RMSDs in Fig 2A? What is the value on the y-axis? The range of 'Count' is unclear, given that there are 5000 structures predicted in the study.

      Figure 2A presents a histogram of RMSD values from all pairwise alignments among 769 structures (385 extant and 384 ancestral DDKK), totaling 591,361 comparisons. We excluded ancestral DDKK variants due to computational limitations.  
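
      As a quick arithmetic check on the histogram counts, the sketch below compares two ways of counting pairwise alignments among the 769 structures. The quoted total of 591,361 equals 769 squared, which is consistent with counting every ordered pair including self-alignments; this reading is an inference from the numbers and is not stated in the response. The variable names are illustrative.

```python
n_structures = 385 + 384  # extant + ancestral structures quoted in the response

# Every ordered pair (i, j), including i == j.
ordered_with_self = n_structures ** 2

# Unordered pairs with i < j, i.e. each distinct comparison counted once.
unique_pairs = n_structures * (n_structures - 1) // 2

print(n_structures, ordered_with_self, unique_pairs)
```

      If only distinct pairs were counted, the histogram would hold 295,296 values, so stating the counting convention in the figure legend would help readers interpret the 'Count' axis.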

      Similarly, what is the sequence identity in Figure 2B calculated relative to?

      In Figure 2B, sequence identities are derived from pairwise comparisons across all structures in our dataset. Each value represents the identity between two specific structures, rather than being measured against a single reference.

      The claim that 'structural analysis could reproduce sequence-based phylogenetic variation' should probably be tempered or qualified, given that the RMSD differences calculated are so low.

      We hope to have addressed the concerns about the low RMSD values in the previous comments. We have revised the text (line 204), which now reads: “it still strongly correlates with sequence identity (Figure 2B), indicating that even minor structural variations can recapitulate sequence-based phylogenetic distinctions.”

      How are binding affinities (Figure 4) calculated?

      We have now clarified the binding affinity calculations in the main text. The model used is now detailed at line 322, with additional information provided in the Methods section.

      Presumably, crystallized proteins (Anc1A, Anc1B, Anc2) were also among those whose structures were predicted with AF. A comparison should be provided of the predicted and crystallized structures, as this is an excellent opportunity to further comment on the reliability of AlphaFold.

      In the revised manuscript, Figure S10 now presents structural comparisons between the crystallized proteins and their AlphaFold-predicted counterparts.

      The labels in Figure 5B are not clear. Are the 3rd and 4th panels also comparative RMSD values? But only one complex name is provided.

      We appreciate this feedback and have now revised Figure 5B for clarity.

      Page 9 line 220, missing word: 'variants fall within/under 2 angstroms'

      We thank the reviewer for the correction; we have updated the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Munday, Rosello, and colleagues compared predictions from a group of experts in epidemiology with predictions from two mathematical models on the question of how many Ebola cases would be reported in different geographical zones over the next month. Their study ran from November 2019 to March 2020 during the Ebola virus outbreak in the Democratic Republic of the Congo. Their key result concerned predicted numbers of cases in a defined set of zones. They found that neither the ensemble of models nor the group of experts produced consistently better predictions. Similarly, neither model performed consistently better than the other, and no expert's predictions were consistently better than the others. Experts were also able to specify other zones in which they expected to see cases in the next month. For this part of the analysis, experts consistently outperformed the models. In March, the final month of the analysis, the models' accuracy was lower than in other months and consistently poorer than the experts' predictions. 

      A strength of the analysis is the use of consistent methodology to elicit predictions from experts during an outbreak that can be compared to observations, and that are comparable to predictions from the models. Results were elicited for a specified group of zones, and experts were also able to suggest other zones that were expected to have diagnosed cases. This likely replicates the type of advice being sought by policymakers during an outbreak. 

      A potential weakness is that the authors included only two models in their ensemble. Ensembles of greater numbers of models might tend to produce better predictions. The authors do not address whether a greater number of models could outperform the experts. 

      The elicitation was performed in four months near the end of the outbreak. The authors address some of the implications of this. A potential challenge to the transferability of this result is that the experts' understanding of local idiosyncrasies in transmission may have improved over the course of the outbreak. The model did not have this improvement over time. The comparison of models to experts may therefore not be applicable to the early stages of an outbreak when expert opinions may be less well-tuned.

      This research has important implications for both researchers and policy-makers. Mathematical models produce clearly-described predictions that will later be compared to observed outcomes. When model predictions differ greatly from observations, this harms trust in the models, but alternative forms of prediction are seldom so clearly articulated or accurately assessed. If models are discredited without proper assessment of alternatives then we risk losing a valuable source of information that can help guide public health responses. From an academic perspective, this research can help to guide methods for combining expert opinion with model outputs, such as considering how experts can inform models' prior distributions and how model outputs can inform experts' opinions. 

      Reviewer #2 (Public review):

      Summary: 

      The manuscript by Munday et al. presents real-time predictions of geographic spread during an Ebola epidemic in north-eastern DRC. Predictions were elicited from individual experts engaged in outbreak response and from two mathematical models. The authors found comparable performance between experts and models overall, although the models outperformed experts in a few dimensions. 

      Strengths: 

      Both individual experts and mathematical models are commonly used to support outbreak response but rarely used together. The manuscript presents an in-depth analysis of the accuracy and decision-relevance of the information provided by each source individually and in combination. 

      Weaknesses: 

      A few minor methodological details are currently missing.

      We thank the reviewers for taking the time to consider our paper and for their positive reflections and suggestions for our study. We recognise and endorse their characterisation of the study in the public reviews and are grateful for their interest and support for this work.

      Reviewer #1 (Recommendations For The Authors): 

      I initially found Table 1 difficult to interpret. In the final two columns, the rows relate to each other but in the other columns, rows within months don't relate to each other. Could this be made clearer? 

      Thank you for your helpful suggestion. We agree that this is a little confusing and have now added vertical dividers to the table to indicate which parts of the table relate to each other.

      In Figure 1A, the colours are the same as in the colour-bar for Figure 1B but don't have the same meaning. Could different colours be used or could Figure 1A have its own colour-bar to aid clarity? 

      Thank you for your query. The colours are not the same palette, but we appreciate that they look very similar. To help the reader we have changed the colour palette of panel A and added a legend to the left.

      In Figure 3, can labels for each expert be aligned horizontally, rather than moving above and below the timeline each month? 

      Thank you for your perspective on this. We made the conscious decision to display the experts in this way as it allows the timeline to be presented in a shorter horizontal space. We appreciate that others may prefer a different design, but we are happy with this one.

      On lines 292 and 293, the authors state that experts were less confident that case numbers would cross higher thresholds. It seems that this would be inevitable given the number of cases is cumulative. Could this be clarified, please? 

      Thank you for raising this point. We agree that this wording is confusing. We have now reworked the entire section in response to another reviewer. The equivalent section now reads: 

      Experts correctly identified Mabalako as the highest-risk HZ in December. They attributed an average 82% probability of exceeding 2 cases; Mabalako reported 38 cases that month, exceeding all thresholds, although the probability assigned to exceeding the higher thresholds was similar to that of Beni (3 cases)

      Reviewer #2 (Recommendations For The Authors): 

      (1) Some methodological details seem to be missing. Most importantly, the results present multiple ensembles (experts, models, and both), but I can't seem to find anywhere in the Methods that details how these ensembles are calculated. Also, I think it would be useful to define the variables in each equation. It would have been easier to connect the equations to the description if the variables were cited explicitly in the text. 

      Thank you for pointing out these omissions. We have included the following paragraph to detail how ensemble forecasts were calculated. 

      “Ensemble forecasts

      Ensemble forecasts were calculated as an average of the probabilities attributed by the members of the ensemble. For the expert ensemble the arithmetic mean was calculated across all experts with equal weighting. Similarly the model ensemble used the unweighted mean of the model forecasts. For the mixed (model and expert) ensemble, the mean was weighted such that the combined weight of the experts' forecasts and the combined weight of the models' forecasts were equal.”
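
      The three ensembles described in this paragraph can be sketched as follows. This is a minimal illustration of the stated weighting, not the authors' code: function names and the example probabilities are hypothetical, and the mixed ensemble is implemented as the simple average of the expert mean and the model mean, which gives the two groups equal combined weight whatever their sizes.

```python
from statistics import fmean


def equal_weight_ensemble(probabilities):
    """Unweighted arithmetic mean of the members' probabilities."""
    return fmean(probabilities)


def mixed_ensemble(expert_probs, model_probs):
    """Experts as a group and models as a group each carry half the weight."""
    return 0.5 * fmean(expert_probs) + 0.5 * fmean(model_probs)


# Hypothetical probabilities that cases in one health zone exceed one threshold.
expert_probs = [0.9, 0.8, 0.7, 0.6]
model_probs = [0.2, 0.4]

expert_forecast = equal_weight_ensemble(expert_probs)
model_forecast = equal_weight_ensemble(model_probs)
mixed_forecast = mixed_ensemble(expert_probs, model_probs)
print(expert_forecast, model_forecast, mixed_forecast)
```

      With four experts and two models, each expert contributes a weight of 0.125 and each model 0.25 to the mixed forecast, so the larger group does not dominate the combined prediction.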

      (2) Overall, I think the results provide a strong analysis of model vs. expert performance. However, some sections were highly detailed (e.g., the text usually discusses results for every month and all health zones), which clouded my ability to see the salient points. For example, I found it difficult to follow all the details about expert/model predictions vs. observations in the "Expert panel and health zones..." subsection; instead, the graphical illustration of predictions vs. observations in Figure 4 was much easier to interpret. Perhaps some of these details could be trimmed or moved to the supplementary material. 

      Thank you for your honest feedback on this point. We have shortened this section to highlight the key points that we feel are the most important. We have also simplified the text where we discuss the health zones nominated by experts. 

      (3) Figure 5C is a nice visualization of the fallibility of relying on a single individual expert (or model). I wonder if it would be useful to summarize these results into the probability that a randomly selected expert outperforms a single model. Is it the case that a single expert is more unreliable than a single model? The discussion emphasizes the importance of ensembles and compares a single model to an ensemble of experts, but eliciting predictions from multiple experts may not always be possible. 

      Thank you for raising this. We agree that this is an important point: eliciting expert opinions is not a trivial task and should not be taken for granted. We agree with the principle of your suggestion that it would be useful to understand how the models compare to individual experts. We do not, however, believe that an additional analysis would add sufficiently more information than is already shown in Figure 5, which displays the full distribution of individual experts for each month and threshold. If you would like to try this analysis yourself, the relevant data (the individual score for each combination of expert, threshold, health zone and month) are included in the GitHub repository (https://github.com/epiforecasts/Ebola-Expert-Elicitation/blob/main/outputs/indevidual_results_with_scores.csv).

      Minor comments: 

      (1) Figure 2: the color scales in each panel are meant to represent different places, correct? The figure might be easier to interpret if the colors used were different.  

      Thank you for bringing this to our attention. We have now changed the palette of panel A to differ from panel B.  

      (2) Equation 7: is o(c>c_thresh) meant to be the indicator function (i.e., 1 if c>c_thresh and 0 otherwise)? 

      Thanks for raising this. The function o is the same as in the previous equation – an observation count function. We appreciate that this is not immediately clear, so we have added a sentence to explain the notation after the equation.

      (3) Table 1: a brief description of the column headers would be useful.  

      Thank you for the suggestion. We have now extended the table caption to include more description of the columns. 

      “Table 1: Experts and health zones included in each round of the survey. The left part of the table details the experts interviewed (highlighted in green) and the health zones included in the main survey in each month. In addition, the right part of the table details the health zones nominated by experts and the number of experts that nominated each one.”

    1. Author response:

      (1) We will clarify statements comparing regeneration and developmental processes. Additionally, we will include a new supplemental figure with published data showing that the pou4-2 clone dd_Smed_v6_30562_0_1 (cross-referenced as SMED30002016) is expressed during stages corresponding to organ development in Schmidtea mediterranea (https://planosphere.stowers.org/feature/Schmidtea/mediterranea-sexual/transcript/SMED30002016).

      (2) We will reorganize the figures by combining Figures 3 and 4 for improved clarity.

      (3) We will address experimental and interpretive concerns regarding the role of atonal in the pou4-2 gene regulatory network.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This study investigates how ant group demographics influence nest structures and group behaviors of Camponotus fellah ants, a ground-dwelling carpenter ant species (found locally in Israel) that build subterranean nest structures. Using a quasi-2D cell filled with artificial sand, the authors perform two complementary sets of experiments to try to link group behavior and nest structure: first, the authors place a mated queen and several pupae into their cell and observe the structures that emerge both before and after the pupae eclose (i.e., "colony maturation" experiments); second, the authors create small groups (of 5,10, or 15 ants, each including a queen) within a narrow age range (i.e., "fixed demographic" experiments) to explore the dependence of age on construction. Some of the fixed demographic instantiations included a manually induced catastrophic collapse event; the authors then compared emergency repair behavior to natural nest creation. Finally, the authors introduce a modified logistic growth model to describe the time-dependent nest area. The modification introduces parameters that allow for age-dependent behavior, and the authors use their fixed demographic experiments to set these parameters, and then apply the model to interpret the behavior of the colony maturation experiments. The main results of this paper are that for natural nest construction, nest areas, and morphologies depend on the age demographics of ants in the experiments: younger ants create larger nests and angled tunnels, while older ants tend to dig less and build predominantly vertical tunnels; in contrast, emergency response seems to elicit digging in ants of all ages to repair the nest.

      We sincerely thank Reviewer #1 for the time and effort dedicated to our manuscript's detailed review and assessment. The revision suggestions were constructive, and we have provided a point-by-point response to address them.

      Reviewer #2 (Public review):

      I enjoyed this paper and the approach to examining an accepted wisdom of ants determining overall density by employing age polyethism that would reduce the computational complexity required to match nest size with population (although I have some questions about the requirement that growth is infinite in such a solution). Moreover, the realization that models of collective behaviour may be inappropriate in many systems in which agents (or individuals) differ in the behavioural rules they employ, according to age, location, or information state. This is especially important in a system like social insects, typically held as a classic example of individual-as-subservient to whole, and therefore most likely to employ universal rules of behaviour. The current paper demonstrates a potentially continuous age-related change in target behaviour (excavation), and suggests an elegant and minimal solution to the requirement for building according to need in ants, avoiding the invocation of potentially complex cognitive mechanisms, or information states that all individuals must have access to in order to have an adaptive excavation output.

      We sincerely thank reviewer #2 for the time and effort dedicated to our manuscript's detailed review and assessment. We have provided a point-by-point response to the reviewer's comments, which we have incorporated into the revised version of the manuscript.

      The only real reservation I have is in the question of how this relationship could hold in properly mature colonies in which there is (presumably) a balance between the birth and death of older workers. Would the prediction be that the young ants still dig, or would there be a cessation of digging by young ants because the area is already sufficient? Another way of asking this is to ask whether the innate amount of digging that young ants do is in any way affected by the overall spatial size of the colony. If it is, then we are back to a problem of perfect information - how do the young ants know how big the overall colony is? Perhaps using density as a proxy? Alternatively, if the young ants do not modify their digging, wouldn't the colony become continuously larger? As a non-expert in social insects, I may be misunderstanding and it may be already addressed in the citations used.

      We thank the reviewer for this interesting question. We find that nest excavation is predominantly performed by the younger ants in the nest, and that increases in nest area follow increases in the population. However, if the young ants dug without restriction, this could result in unnecessary nest growth, as suggested by reviewer #2. Therefore, we believe that the innate digging behavior of ants could potentially be regulated by various cues, such as:

      (a) Density-based: If the colony becomes less dense as its area expands, this could serve as a feedback signal for young ants to reduce or stop digging, as described in references (25, 29, 30).

      (b) Pheromone deposition: If the colony reaches a certain population density, pheromone signals, or space usage as a proxy for the nest area, could inhibit further digging by young ants (references 25, 29).

      Thus, rather than perfect information, decentralized control and local digging-related cues probably regulate the level of age-dependent digging, without the ants needing to estimate the overall colony size or nest area.

      In any case, this is an excellent paper. The modelling approach is excellent and compelling, also allowing extrapolation to other group sizes and even other species. This to me is the main strength of the paper, as the answer to the question of whether it is younger or older ants that primarily excavate nests could have been answered by an individual tracking approach (albeit there are practical limitations to this, especially in the observation nest setup, as the authors point out). The analysis of the tunnel structure is also an important piece of the puzzle, and I really like the overall study.

      We thank the reviewer for the comments. We completely agree that individual tracking of ants within our experimental setup would have been the ideal approach, but we were constrained by technical and practical limitations of the setup, as pointed out by the reviewer, such as:

      (a) Continuous tracking of ants in our nests would have required a camera to be positioned at all times in front of the nest, which necessitates a light background. Since Camponotus fellah ants are subterranean, we aimed to allow them to perform nest excavation in conditions as close to their natural dark environment as possible. Additionally, implementing such a system in front of each nest would have reduced the sample sizes for our treatments.

      (b) The experimental duration of our colony maturation and fixed demographics experiments extended for up to six months (unprecedented durations in these kinds of measurements). These naturally limited our ability to conduct individual tracking while maintaining the identity of each ant based on the current design.

      These points are described in detail in the revised version of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      In this study, Harikrishnan Rajendran, Roi Weinberger, Ehud Fonio, and Ofer Feinerman measured the digging behaviours of queens and workers for the first 6 months of colony development, as well as groups of young or old ants. They also provide a quantitative model describing the digging behaviours and allowing predictions. They found that young ants dig more slanted tunnels, while older ants dig more vertically (straight down). This finding is important, as it describes a new form of age polyethism (a division of labour based on age). Age polyethism is described as a "yes or no" mechanism, where individuals perform or not a task according to their age (usually young individuals perform in-nest tasks, and older ones foraging). Here, the way of performing the task is modified, not only the propensity to carry it or not. This data therefore adds in an interesting way to the field of collective behaviours and division of labour.

      The conclusions of the paper are well supported by the data. Measurements of the same individuals over time would have strengthened the claims.

      We sincerely thank reviewer #3 for the time and effort dedicated to our manuscript's detailed review and assessment. We completely agree with the reviewer’s comment on measuring the same individuals over time; however, we were constrained by the technical and experimental limitations described above and pointed out by reviewer #2.

      Strengths:

      I find that the measure of behaviour through development is of great value, as those studies are usually done at a specific time point with mature colonies. The description of a behaviour that is modified with age is a notable finding in the world of social insects. The sample sizes are adequate and all the information clearly provided either in the methods or supplementary.

      We thank reviewer #3 for this assessment.

      Weaknesses:

      I think the paper is failing to take into consideration or at least discuss the role of inter-individual variabilities. Tasks have been known to be undertaken by only a few hyper-active individuals for example. Comments on the choice to use averages and the potential roles of variations between individuals are in my opinion lacking. Throughout the paper wording should be modified to refer to the group and not the individuals, as it was the collective digging that was measured. Another issue I had was the use of "mature colony" for colonies with very few individuals and only 6 months of age. Comments on the low number of workers used compared to natural mature colonies would be welcome.

      Regarding the main comment 1:

      We completely agree with the reviewer’s comment on considering inter-individual variability based on activity levels. We have discussed how individual morphological variability could influence digging behavior (references: 28, 31), and we will elaborate further on this aspect in future revisions.

      Regarding the main comment 2:

      The term ‘colony maturation’ in our study refers to the progressive development of colonies from a single queen, distinguishing it from experiments that begin with pre-established, demographically stable colonies. We provide a detailed explanation for this terminology in the revised version of the manuscript. In practice, we were unable to continue the experiments beyond 6 months, predominantly because of the limited stability of the nests, which were made with a sand-soil mix. We also acknowledge that the colony sizes attained in our maturation experiments may be smaller than those of naturally matured colonies. This trend is generally observed in lab-reared colonies and could be attributed to differences in microclimatic conditions, foraging opportunities, space availability, and other factors. We have explicitly described these details in the revised version of the manuscript.

      Reviewer #1 (Recommendations for the authors):

      The experimental design is fantastic. The large quasi-2D should allow for the direct visualization of the movements of individuals and the creation of the nest, and the inclusion of non-workers (specifically, a mated queen and pupae) is new and important. However, I have some questions and concerns about the results, as outlined below. Also, I found the paper difficult to read, and the connections between the various experiments and the model were not always clear. 

      We thank the reviewer for the time and effort dedicated to reviewing our manuscript. We have modified the manuscript substantially to address the comments and readability. 

      The assumption that the digging rate is constant across ants may be a strong one. Previous work (see, for instance, Aguilar, et al, Science 2018) has demonstrated a very heterogeneous workload distribution among ants. I am not sure what implications that may have for the results here, but the authors should comment on this choice. Related to the point above, given a constant digging rate, the variation in digging is attributed to an age-dependent "desired target area". Can the authors comment on the implications of this, specifically in contrast to a variable digging rate? The distinction between digging rate differences and target area differences seems to be important for the authors. However, the way this is presented, it is difficult to fully understand or appreciate this importance and its implications. What is the consequence of this difference, and why is this important?

      We apologize to the reviewer for the confusion.

      Our model does not assume that the digging rate (da/dt, Equation 1) remains constant throughout the experiment. Instead, we only treat the basal digging rate (r) as a constant.

      The variable digging rate (da/dt, Equation 1) is obtained by multiplying the basal rate constant (r) by the term (1 - a/a<sub>age</sub>), which accounts for deviations from the age-dependent target area that the ants aim to achieve. This makes the actual digging rate dynamic, as it responds to changes in excavated area (e.g., expansion or rapid collapse).

      For example, according to our model (Equation 1), two ants with the same basal digging rate (r) may exhibit markedly different actual digging rates at a given time if they differ in age. This occurs because the variable digging rate (da/dt) depends not only on ‘r’ but also on the age-dependent term (1 - a/a<sub>age</sub>). Also, we emphasize that the use of a basal digging rate constant aligns with prior studies (refs. 24, 29, 30).

      In our work, we demonstrate that after a collapse event, ants of all ages dig at rates comparable to those observed in the initial (pre-collapse) phase of the experiment. This occurs because the ants are far from their age-dependent target area, effectively resetting their digging behavior. By comparing maximum digging rates pre- and post-collapse, we provide strong empirical evidence that this rate is age-independent (SI Fig. 6A, 6B), supporting the conclusion that the basal digging rate constant (r) is a fundamental property of the ants' behavior, unaffected by age.
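
      The relationship described above can be sketched numerically. The following is a minimal illustration, assuming the form da/dt = r(1 - a/a<sub>age</sub>) stated in this response, with a single group and a fixed target area. The function names, parameter values, and the choice to model a collapse as resetting the excavated area to a chosen value are illustrative assumptions, not the fitted values or the simulation code used in the manuscript.

```python
def digging_rate(area, basal_rate, target_area):
    """Instantaneous digging rate da/dt = r * (1 - a / a_age)."""
    return basal_rate * (1.0 - area / target_area)


def simulate(basal_rate, target_area, days, dt=0.01,
             initial_area=0.0, collapse_day=None, collapse_to=0.0):
    """Forward-Euler integration of the excavated area over time.

    If collapse_day is given, the area is reset to collapse_to at that time
    (a hypothetical stand-in for the artificial collapse procedure).
    Returns the list of areas, one entry per time step plus the initial value.
    """
    n_steps = int(round(days / dt))
    collapse_step = None if collapse_day is None else int(round(collapse_day / dt))
    area = initial_area
    areas = [area]
    for step in range(n_steps):
        if step == collapse_step:
            area = collapse_to
        area += digging_rate(area, basal_rate, target_area) * dt
        areas.append(area)
    return areas
```

      In this sketch, two groups with the same basal rate but different target areas dig at different rates once some area has been excavated, yet both dig at exactly the basal rate when the excavated area is zero, which mirrors the argument that post-collapse digging returns to the basal rate regardless of age.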

      We agree with the reviewer that individual tracking of ants within our experimental setup would have been the ideal approach, since it would have allowed us to take the inter-individual variability of digging activity into account. However, we were prevented from doing so by technical and practical limitations of the setup, such as:

      (a) Continuous tracking of ants in our nests would have required a camera to be positioned at all times in front of the nest, which necessitates a light background. Since Camponotus fellah ants are subterranean, we aimed to allow them to perform nest excavation in conditions as close to their natural dark environment as possible. Additionally, implementing such a system in front of each nest would have reduced the sample sizes for our treatments.

      (b) The experimental duration of our colony maturation experiments extended for up to six months (unprecedented durations in these kinds of measurements). These naturally limited our ability to conduct individual tracking while maintaining the identity of each ant based on the current design.

      In light of these points, the following lines are added to the discussion (line numbers: 283-295), signifying the above points:

      “Our age-dependent model demonstrates that the digging behavior in Camponotus fellah is governed by a basal digging rate constant (r) modulated by the age-dependent feedback (1 − a/a<sub>age</sub>). Crucially, we show that after a collapse, the maximum digging rates return to their pre-collapse levels, suggesting that this basal rate ‘r’ represents an age-independent ceiling on how fast ants can dig, regardless of age or context (SI Fig. 6 A, B). Previous studies have demonstrated both homogeneous and heterogeneous workload distribution, with varying digging rates among ants (24, 29, 30, 35). Studies showing heterogeneous workload distribution relied on continuous individual tracking of ants to quantify digging rates (35). However, this approach was not feasible in our current design due to the experimental durations of both our colony maturation and fixed demographics experiments. Additionally, sample size requirements naturally limited our ability to conduct continuous individual tracking during nest construction in our study. Thus, based on empirical measurements from our fixed-demographics experiments and supported by the age-independent post-collapse digging rates, we adopted a constant basal digging rate for simulating our age-dependent model, an assumption aligned with both prior literature and the collective dynamics observed in our system (24, 29, 30)”.

      Model: as presented, the model seems to lack independent validation. The model seems to have built-in that there is an age-dependent target area, and this is what is recovered from the model. I am failing to see what is learned from the model that the experiments do not already show. Also, the model has no ant interactions, though ants are eusocial and group size is known to have a large effect on behavior (this is acknowledged by the authors at the beginning of the discussion). Can the authors comment on this? My recommendation would be to remove the model from this paper or improve the text to address the above comments.

      We did not draw the conclusion of the age-dependent target area from our model. We used the fixed demographics experiments to quantify the age-dependent area target as a function of the age of individuals. We then used this age-dependent area target in our model to quantify the excavation dynamics of the colony maturation experiments, where ants span a variety of ages, as the nest population changes over time, resulting in natural variation in the ages of individuals within the nest.  These results could not have been obtained by performing any of the individual experiments, whether colony maturation or the fixed demographics, young or old, on their own. The need for different age demographics was crucial to quantify the age-dependent effects in nest excavation, which were lacking in previous studies. 

      First, the age-dependent model provides a very good estimate for the natural growth of the nest. More importantly, after fixing an age threshold of 56 days (mean + standard deviation of the young ant age), the model provides an estimate of which ants are doing the majority of the digging during natural nest expansion. This teaches us that during natural expansion, the older ants are far from their density target and therefore do not engage in any substantial digging, which is shown in Figure 4C. 

      On the other hand, the younger ants are close to their area targets and induced to dig. Indeed, the target area fitted for the age-independent model closely approximates the empirically measured age-dependent target when extrapolated to very young ants. This provides further support for the idea that, in the colony maturation experiments, the youngest ants are responsible for most of the digging.

      Our model is a simple analytical model, inspired by earlier models that used a fixed area target (such as density models) for nest construction. However, because we knew the precise age of workers in our experiments, we were able to obtain age-dependent area targets, thereby challenging the use of a constant area target (as employed in prior studies) in light of our findings from the fixed demographics of young and old colonies.

      Empirically Quantifiable Parameters: We wanted our model to have empirically quantifiable parameters. Since we did not continuously record the experiment, we could not quantify agent-agent interactions, pheromonal depositions, or similar factors.

      Minimal Model Design: We aimed to keep the model as minimal as possible, which is why we did not include complex interactions such as those found in continuous tracking experiments.

      However, the model does set up some interesting hypotheses that could easily be tested with the experimental setup (e.g., marking the ants / tracking individual activity levels). For instance, it is hypothesized that older ants dig less often, but when they do dig, they do so at the same rate. Given the 2D setup, the authors could track individual ants and test this hypothesis. Also, if the desired target area does decrease with age, the authors could verify this hypothesis by placing older ants into arenas with different-sized pre-formed nests to observe how structure is changed to achieve the desired area/ant.

      We thank the reviewer for this comment.

      We believe that the confusion with the usage of a constant basal digging rate is resolved now. To briefly reiterate, ants dig at variable rates that can be decomposed to a (constant on short time scales but age-dependent) basal rate times the (variable) distance from the density target. The suggested experiments are beyond the scope of our current study, and further studies could utilize the suggested experimental design with better time-resolved imaging for individual ant tracking that could verify the predictions from our model. 

      Specific comments:

      Title:

      The title suggests a broad result, yet the study focuses on one ant species. Please modify the title to more accurately reflect the scope of the work.

      We thank the reviewer for the comment.

      The title is modified as “Colony demographics shape nest construction in Camponotus fellah ants.”

      Introduction:

      Important information and context are missing about this ant species. For instance, please add the following about this species in the introduction:

      What is their natural habitat and substrate? How does the artificial soil compare?

      What is their (rough) colony size? [later, discuss experiment group size choice and potential insights/limitations of results when applied to the natural system].

      The details have been added to the introduction (line numbers: 49-55) and the materials and methods section (Study species).

      “Camponotus fellah ants are native to the Near East and North Africa, particularly found in countries like Israel, Egypt, and surrounding arid and semi-arid regions, where they prefer to nest in moist, decaying wood, including tree trunks, branches, or stumps (49,50). The species lives in monogynous colonies with tens to thousands of individuals. Nests are commonly found in a sand-loamy mix, which is a combination of sand, soil, clay, or gravel, providing structural stability and moisture retention (51). They are typically found under rocks, in the crevices of dried vegetation, or dry, sandy soils, sometimes in areas with loose gravel, with a colony size ranging from tens to thousands of workers”.

      What is the natural life expectancy of a worker? A queen? [later, discuss fixed demographic age choices in this context and/or why were age ranges chosen for experiments?].

      The lifespan of ants, including both queens and workers, varies significantly based on caste, species, and environmental conditions.

      (1) Queen Longevity: From the literature, Camponotus fellah queens can live up to 20 years, with one documented case reaching 26 years (50). 

      (2) Worker Longevity: In contrast to queens, the lifespan of workers is much shorter. Lab studies on Camponotus fellah (82) and other Camponotus species (83) suggest that workers can live for several months depending on environmental conditions, colony health, and caste-specific roles (e.g., minor vs. major workers).

      (3) Laboratory vs. Natural Conditions: Worker longevity is highly variable between laboratory and natural conditions.

      Therefore, in the context of the old worker lifespan in our experiments, ~200 days (roughly 6–7 months), we strongly believe that the worker lifespan used in our experiments represents a substantial portion of a worker's expected life. While exact figures for C. fellah workers are unavailable, inferences from related species suggest that workers nearing 200 days are approaching the latter stages of their lifespan, making them meaningfully "old". 

      The details are added to the main text (line numbers: 124-127) and discussion (line numbers: 278-282).

      Why was this species chosen? Convenience, or is there something special about this species that the readers should know? Specifically, is there something that might make the results more general or of broader interest?

      Camponotus fellah was chosen for this study because it is native to Israel, making it convenient to collect and maintain in the lab. Additionally, its nuptial flights occur close to the study location, ensuring a steady supply of colonies. We were able to provide them with a nesting substrate similar to what they naturally use, as their nests are typically found in a sand-loamy mix, similar to the sand-soil mix in our artificial nests. This was possible because we had the opportunity to observe their habitat and nesting behavior in the wild, allowing us to gather preliminary information on their natural nesting conditions.

      Results:

      Line 60: "several brood items" - how many exactly? Was this consistent across experiments? Do mated queens ever produce more pupae during the experiments?

      Yes, the same number of brood items (5) was added in every experiment. Additionally, the mated queen did produce pupae during the course of the experiments, which was evident from the noticeable increase in the number of workers in the nest. This number was significantly higher than the number of brood items present at the start of the study.

      The above points are added to the section (line numbers: 68-69).

      Figure 1: Panel A - The food ports are never mentioned in the text. Are the ants fed during the experiments? If so, what? With what frequency? Is the water column replenished/maintained? If so, how and how often? panel C - how long did this experiment last?

      We thank the reviewer for pointing this out. We have now updated the nest maintenance section in the Materials and Methods (line numbers : 349-354) part to include all the necessary details and clarifications.

      “We provided food to the ants ad libitum through three separate tubes containing water, 20 % sucrose water, and protein food. The protein mixture included egg powder, tuna, prawns, honey, agar, and vitamins. Each of the three tubes was filled with 5 ml of their respective contents and sealed with a cotton stopper to prevent overflow. The tubes were positioned at a slight angle and connected using a custom-made plexiglass adapter to facilitate the flow of liquids. These tubes were replenished once depleted, and regularly replaced once the nest maintenance was carried out bi-weekly.”

      Line 76: "...excavation was commenced by the founding queen". How were the queen and pupae introduced into the system?

      We initiated colony maturation experiments by introducing a single mated queen and several brood items (pupae) at random positions on the soil layer of the nest (line numbers: 68-69).

      Line 87: Please provide bounds for 11cm2/ant value. Is there any biological or physical justification for this number?

      We thank the reviewer for the suggestion. We have now provided the bounds as requested (line numbers : 97-101). 

      We were unable to pinpoint a specific biological justification based solely on this treatment. However, on extrapolating the age-dependent area fit we derived from the fixed demographics experiment, we found that at the age of 1 day, an ant has a target area of approximately 11.17 cm², which is the largest age-dependent area target possible within our experimental setup.

      From the colony maturation experiment, we obtained a value of 11.6 (±1.15) cm² as the area per ant. The area per ant obtained from these two completely different treatments, across different colonies, is therefore consistent. We propose that under standardized conditions, a 1-day-old ant has a theoretical maximum target area of 11.17 cm², the highest value observed in our experimental framework.

      Lines 98-99: "one straightforward possibility would be that newborn ants are the ones that dig". This statement contradicts the results presented in Figures 1 and S1 - the population increase seems to occur at least a few days before increased excavation in nearly all cases.

      We apologize for any confusion caused by our initial phrasing. To clarify, we proposed that a lag likely exists between population growth and nest area expansion. This lag could arise from two sequential processes: (1) newborn ants require time to mature and become active (first delay), and (2) digging to expand the nest takes additional time (second delay; estimated at ~10 days from the cross-correlation analysis). Thus, our results suggest that it is not the population that lags behind the area, but rather the area that lags behind the population, as demonstrated in Figure 2D and SI Figure S1.

      The sentence “one straightforward possibility would be that newborn ants are the ones that dig” is modified as below (line numbers : 112-119) to prevent further confusion.

      “One possible explanation is that, although all ants are capable of digging, it is primarily the newly emerged ants who perform this task. In this case, nest expansion would lag behind colony growth due to two delays: first, the time needed for young ants to mature enough to begin digging, and second, the physical time required to excavate additional space (e.g., around 10 days). This mechanism could eliminate the need for ants to assess overall colony density, as each new group of active workers simply enlarges the nest as they become ready. An alternative possibility is that all ants, regardless of age, respond to increased density by initiating excavation. In that scenario, nest expansion would follow more immediately after the emergence of new individuals, making delays less prominent (24, 29, 30)”.
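
      To make the lag estimate concrete, a lagged cross-correlation between population change and area change could be sketched as follows. This is an illustration only: the response does not state whether the analysis used raw time series or daily increments, so the use of daily increments, the function names, and the synthetic data generator are all assumptions.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def best_lag(population, area, max_lag):
    """Find the lag (in samples) at which area best tracks earlier population.

    For each candidate lag, population at time t is compared with area at
    time t + lag. Returns the best lag and the score for every lag tested.
    """
    scores = {}
    for lag in range(max_lag + 1):
        n = len(population) - lag
        scores[lag] = pearson(population[:n], area[lag:lag + n])
    return max(scores, key=scores.get), scores


def synthetic_example(true_lag=10, n_days=60, seed=1):
    """Hypothetical data: daily area gain copies daily new workers, delayed."""
    rng = random.Random(seed)
    new_workers = [rng.randint(0, 5) for _ in range(n_days)]
    new_area = [0] * true_lag + new_workers[:n_days - true_lag]
    return new_workers, new_area
```

      With real data, a peak at a positive lag would indicate that area changes follow population changes, consistent with the interpretation given above; a peak at zero lag would instead favor the alternative in which all ants respond immediately to increased density.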

      Line 105: How do group sizes compare to natural colony size? Line 106: How do "young" and "old" classifications compare to natural life expectancy?

      We have already addressed this question in an earlier comment. The details are added to the main text (line numbers: 124-127) and discussion (line numbers: 278-282).

      Line 118-119: How are nests artificially collapsed?

      We have added a new section in the Materials and Methods section that describes the nest collapsing procedure (Nest artificial collapse - line numbers : 386-399).

      Figure 2 Panel A: The white dotted line is nearly impossible to see. Please use a more visible color.

      We thank the reviewer for the comment.

      We changed the solid circles to violet and replaced the dotted white line with a continuous white line.

      Figure 3: The use of circle markers as post-collapse recovery in young and old as well as old pre-collapse is confusing. Use different symbols for old pre-collapse vs young and old post-collapse.

      We thank the reviewer for pointing out the confusion. We have revised the figure markers as suggested and modified the main text accordingly.

      • Young; pre-collapse : star

      • Young; post-collapse : diamond

      • Old; pre-collapse : circle

      • Old; post-collapse: triangle.

      Figure 3 Panel C: Indicate that fixed demographic values here are pre-collapse. Also, as presented, it appears that there is a large group-size dependence that is not commented on. Previous results (Line 87 and Figure 2C) suggest a constant excavation area per ant of 11cm2/ant. Figure 3, panel C appears to suggest a group-size dependence. If these values are divided by group size, is excavated area per ant nearly constant across groups? How does the numerical value compare to the slope from Figure 2C?

      We thank the reviewer for their insightful comments.

      First, we would like to clarify that the area target of 11.1 (±1) cm²/ant, as described in Line 87, was obtained from the colony maturation experiments. In these experiments, we were unable to track the age of each individual ant, so the area target was calculated by normalizing the total excavated area by the number of ants.

      We normalized the excavated area by the group size for both young and old colonies as suggested, and found that the area per ant was not significantly different across the group sizes (see new SI Fig. 5A). This indicates that the excavated area per ant remains relatively constant within each demographic group. Moreover, this shows that the total excavated area is proportional to group size, in agreement with previous works (24, 29, and 30). 

      We have explicitly described the above information in the main text (line numbers: 142-146).

      Regarding the slope comparisons, the slope of Figure 2C (10.71), from the colony maturation experiments, is the largest, followed by the area per ant from the short-term young experiments (8.79 ± 0.98 cm²/ant) and the short-term old experiments (5.16 ± 0.44 cm²/ant).

      Lines 128-129: "...younger ants aim to approach a higher target area". Seems hard to know what they "aim" to do... rephrase to report what they are observed to do.

      We thank the reviewer for the comment. The sentence is rephrased as suggested (line numbers : 158-161).

      “In the previous sections, we showed that in fixed-demographics experiments, younger ants excavated a significantly larger nest area compared to older ants (Fig. 3. C).  This difference emerged despite similar temporal patterns in digging rates across age groups, with excavation activity peaking within the first 7 days before asymptotically decaying as nest expansion approached saturation (SI Fig. 8).”

      Lines 133-141: The model description is not clear. Specifically, what parameters are ant-dependent? How does A relate to a?

      We appreciate the reviewer's request for clarification. In our model:

      (1) Equation 1 describes the change in the excavated area due to the digging activity of a single ant. Here, the variable 'a' represents the area excavated by one ant. This formulation allows us to capture the individual digging behavior and its impact on the excavation process.

      (2) Equation 2 extends this concept to the total area excavated in the nest, denoted by 'A'. Specifically, 'A' is the sum of the areas excavated by all ants present in the nest. In other words, it aggregates the individual contributions of each ant, linking the microscopic digging behavior to the macroscopic excavation dynamics.

      Therefore, the relationship between 'a' and 'A' is as follows:

      ●     'a' = Area excavated by a single ant.

      ●     'A' = ∑ 'a' (Summed over all ants in the nest).

      We have explicitly stated this in the main text (line numbers: 161-179), where we also describe the model assumptions and parameters in detail.
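
      In symbols, the aggregation described above can be written as follows. The index $i$, the time argument $t$, and the ant count $N(t)$ are notation introduced here for illustration and may differ from the manuscript's Equations 1 and 2, which are not reproduced in this response.

```latex
A(t) \;=\; \sum_{i=1}^{N(t)} a_i(t)
```

      Here $a_i(t)$ is the area excavated by ant $i$ and $A(t)$ is the total excavated area of the nest.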

      Figure 4:

      Figure 4, Panel A: The equation quoted in the caption does not match the data in the figure. The equation has a positive slope and negative intercept, while the figure has a negative slope and a positive intercept. Please provide the correct equation and bounds on fit parameters.

      We thank the reviewer for spotting this typing mistake.

      The equation was already updated in the reviewed preprint published online. The correct equation and the fit bound are provided in the figure caption.

      “Target areas decrease linearly with the ant age (y = −0.032x + 11.22, 95% CI (slope: (−0.035, −0.027), intercept: (10.53, 11.91)), R² = 0.96).”

      Figure 4, Panel A: There seem to be three "fixed target area per ant values" in the paper: around 11cm2/ant (line 87), 11.6 cm2/ant (SI Figure 2), and linearly dependent value from fit to Figure 4A. The distinctions between these values and their significance are hard to keep track of. Can the authors add a discussion somewhere that helps the reader better understand? Is there a way to connect/rationalize/explain these different values in terms of demographics?

      We thank the reviewer for the suggestion. We have added a paragraph in the discussion (line numbers: 270-277) describing the area targets.

      “In our colony maturation experiments, we found that area per ant was highest when the workers were youngest, with values around 11.1–11.6 (±1–1.15). This aligns with observations from naturally growing nests, where newly eclosed ants dominate the population and nest volumes are relatively large. Supporting this, fixed-demographics experiments showed that the area excavated per ant declines linearly with worker age, indicating that the youngest ants contribute most to excavation. Notably, the target area we fit for the age-independent model (11.6 ± 1.15) closely matches the extrapolated value for very young workers (Fig. 4. A), reinforcing the idea that young ants are the primary excavators during early colony growth. In contrast, during events like collapses or displacement, when space is urgently needed, ants of all ages participate in excavation.”

      Figure 4, Panel A: What are various symbols and colors for data with error bars? If consistent with Figure 3, then this panel and subsequent model confound two factors: (1) the age dependence and (2) the behavioral differences pre- and post-collapse (structures are different pre-and post-collapse, according to SI Figure 6; line 120: "...colonies ceased digging when they recovered 93{plus minus}3% of the area lost by the manual collapse..."; lines 201-202: "We find significant quantitative and qualitative differences between nests constructed within this natural context and nests constructed in the context of an emergency") and behavior is different (according to SI Figure 7 and line 119: "...all ants dig after collapse...")). Therefore, without further supporting evidence, it does not seem that these data should be used to fit a single line that defines a model parameter a_age for each ant in equation 2.

      The symbols show the area per ant quantified from the fixed-demographics experiments with young and old ants, as follows:

      A.  Star - Young, pre-collapse

      B.  Diamond - Young, post-collapse 

      C.  Circle - Old, pre-collapse

      D.  Triangle - Old, post-collapse.

      The details are clearly described in the figure caption. 

      We apologize to the reviewer for the confusion. We argue that the data can be fit by a single line to quantify the parameter ‘a_age’ as follows. 

      A. All data presented in Figure 4A were obtained from the same fixed-demographics experiments (containing only young and old ants) under experimental collapse conditions, pre- and post-collapse. These results, therefore, exclusively reflect nest-building behavior in emergency scenarios and do not include any observations from natural colony maturation processes.

      B. Age-dependent excavation differences: As correctly noted by the reviewer, the observed difference in excavated area before versus after collapse reflects the natural aging of ants in our experimental colonies. While colonies recovered >90% of lost area post-collapse, the residual variation was not negligible—instead, it systematically correlated with colony age structure. By tracking colonies across this demographic transition, we obtained additional data points spanning a broader developmental spectrum. This extended range strengthened our ability to detect and quantify the linear relationship between worker age and excavation output.

      C. The quoted sentence (lines 201-202, submitted version) refers to comparisons across all three experimental cases: (1) fixed-demographics young ants, (2) fixed-demographics old ants, and (3) the natural scenario (mixed-age colonies). Importantly, these comparisons are based on pre-collapse steady-state excavation areas, ensuring a consistent baseline across treatments. We highlight quantitative and qualitative differences between these distinct experimental groups, not between pre- and post-collapse phases within the same treatment. The pre- and post-collapse data within fixed-demographics groups were analyzed separately to avoid conflating aging effects with emergency responses.

      To avoid confusion, the whole paragraph in the discussion (line numbers : 253-260) is rephrased.

      In lines 201-202; “We find significant quantitative and qualitative differences between nests constructed within this natural context and nests constructed in the context of an emergency”. 

      Here, by natural context, we mean the nests excavated in the colony maturation experiments. We recognize that this wording could have been confusing, and the sentence has been modified as described in our response to the previous question.

      Figure 4, Panel B: This uses the model with a_age determined by from Figure 4A and the life table (as shown in the supplemental), whereas the supplemental Figure SI 8 uses the fixed blue line a_age value for the model, which comes from the colony maturation experiments. The age-independent model in the supplemental fits the data better, yet the authors claim the supplemental model cannot be applied to the data because of their experimentally determined age-dependent target area. Given the age-independent target area model fits better, additional evidence/justification is needed to support the choice of the model.

      We agree with the reviewer that the age-independent model fits the data well. However, we believe that the fixed area target cannot be used to explain the excavation dynamics for the following reasons.

      We make an important assumption in our model: that the ants rely on local cues and that individual ants cannot distinguish between the fixed demographics and colony maturation experiments (line numbers: 161-166). Given this assumption, the ants cannot change their behavior between experiments, meaning the same model should fit all of our results. However, the fixed demographics experiments revealed a significant difference in the areas excavated by young vs. old cohorts, despite having the same group size. If the ants regulated the excavated area based on an age-independent constant density target model, then the excavated area in the fixed demographics of young and old colonies would have been similar. This discrepancy indicates that the target area per ant is not constant, as assumed in the age-independent density model (SI. Fig. 8). We emphasize that while the age-independent model provides a better fit for the excavated area in colony maturation experiments, the age-dependence of excavation is empirically supported by fixed-demographics experiments. Therefore, we implemented this age-dependence through a variable target area within the age-dependent model framework to explain excavation dynamics in the colony maturation experiments.

      These details are explicitly mentioned in the main text (line numbers : 187 - 198)
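
      The argument above can be illustrated numerically. The sketch below compares the saturated total area predicted by an age-independent target (11.6 cm² per ant, as quoted for the age-independent model) with the total predicted by the linear age dependence quoted for Figure 4A (y = −0.032x + 11.22). It is a simplification rather than the authors' model: it assumes every ant in a cohort has the cohort's mean age (40 and 171.56 days), it ignores excavation dynamics, and all function names are hypothetical.

```python
# Values quoted in this response; everything else is an illustrative assumption.
A_FIXED = 11.6       # cm^2 per ant, age-independent target
SLOPE = -0.032       # cm^2 per ant per day, linear fit for Figure 4A
INTERCEPT = 11.22    # cm^2 per ant at age 0


def per_ant_target(age_days):
    """Age-dependent target area (cm^2) for one ant, from the linear fit."""
    return SLOPE * age_days + INTERCEPT


def age_independent_total(ages):
    """Total target area if every ant has the same fixed target."""
    return A_FIXED * len(ages)


def age_dependent_total(ages):
    """Total target area as the sum of per-ant, age-dependent targets."""
    return sum(per_ant_target(age) for age in ages)


# Two cohorts of equal size, each ant assigned the cohort mean age (assumption).
young_cohort = [40.0] * 10
old_cohort = [171.56] * 10

for name, cohort in (("young", young_cohort), ("old", old_cohort)):
    print(
        f"{name}: age-independent = {age_independent_total(cohort):.1f} cm^2, "
        f"age-dependent = {age_dependent_total(cohort):.1f} cm^2"
    )
```

      Under the age-independent target, both cohorts are predicted to excavate the same total area. Under the age-dependent target, the old cohort is predicted to excavate less, which is the direction of the difference reported in the fixed-demographics experiments.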

      Figure 4, Panel C: Is this plot entirely from the model, or are the data points measured from experiments? Please label this more clearly.

      We apologize to the reviewer for the confusion.

      Figure 4C is based on the age-dependent digging model. We applied the model to population data from the long-term experiments (n = 22). By setting an age threshold of 56 days (since ants used in the short-term young experiment had an average age of 40 ± 16 days), we categorized the ants into young and old groups. We then quantified the area dug by the young ants, the queen, and the old ants in terms of the percentage of the total area excavated. We hypothesized that, because young ants have a lower digging threshold, they would perform the majority of the digging. We indeed confirm this in Figure 4C.

      This information is added to the main text and described in detail (line numbers: 200 - 208).

      Lines 162-165: "...Furthermore, we quantified the area dug by each ant in the normal colony growth experiment as estimated from the age-dependent model and found that all ants excavated more or less the same amount...". Figure 4D shows a distribution with significant values ranges from 1-16 cm2... how is this interpreted as "more or less the same amount" and what is the significance of this?

      We apologize to the reviewer for the confusion.

      We quantified the percentage contribution to the excavated area of each histogram bin (provided in the new SI table: 4), and found that the area excavated between 5 cm² and 13 cm² accounts for 73.76% of the total excavated area. This indicates that most ants dug within this range rather than exhibiting extreme variations. Additionally, the mean excavation amount is 7.84 cm², with a standard deviation of 3.44 cm², meaning that most values fall between 4.4 cm² and 11.28 cm², which aligns well with the 5–13 cm² range. Since the majority of the excavation is concentrated within this narrow interval, and the mean is well centered within it, this suggests that ants excavated more or less the same amount, rather than forming distinct groups with highly different excavation behaviors.

      We have modified the main text (line numbers: 209-216) to include these points.

      The biological significance of this finding is that since all ants in the colony maturation experiments are born inside the nest, we hypothesize that they should excavate similar amounts. To test this, we quantified the area contribution of each ant over the entire duration of the experiment using the age-dependent digging model as described above and found that they indeed excavated more or less the same amount. From our analysis of fixed demographics experiments, we showed that the youngest ants excavate the largest area. Since the majority of the youngest ants participated in the colony maturation experiments, this further supports our hypothesis.

      Figure 5.

      Figure 5, Panels A-C: Please provide a scale bar. 

      The scale bar is provided in the figure as suggested. The algorithm for the cutoffs for tunnel vs wide tunnels is described in detail in the section “Nest skeletonization, segmentation, and orientation.”

      Figure 5, Panel E: Why does the chamber error bar for 5 ants go to zero?

      In Figure 5E, we plot the standard error, as described in the figure caption. In the four experiments, the chamber area contributions were 0, 0, 39.94, and 0, respectively. The mean of these four values is 9.985, the standard deviation is 19.97, and the standard error is 9.985. Because the mean and the standard error are equal, the lower error bar reaches zero and the upper error bar reaches 19.97. This implies that in these experiments, the chamber area is often zero.
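
      The arithmetic above can be checked with a few lines of Python. This sketch assumes the standard deviation is the sample standard deviation (n − 1 denominator), which reproduces the quoted value of 19.97. The function name is hypothetical.

```python
import math
import statistics

# Chamber area contributions for the four group-size-5 experiments,
# as quoted in the response.
chamber_areas = [0.0, 0.0, 39.94, 0.0]


def mean_sd_se(values):
    """Return the mean, sample standard deviation, and standard error."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample SD (n - 1 denominator)
    se = sd / math.sqrt(len(values))       # standard error of the mean
    return mean, sd, se


mean, sd, se = mean_sd_se(chamber_areas)
lower_bar, upper_bar = mean - se, mean + se
print(f"mean = {mean:.3f}, SD = {sd:.2f}, SE = {se:.3f}")
print(f"error bar spans {lower_bar:.3f} to {upper_bar:.2f}")
```

      With a single non-zero value among four, the sample standard deviation is exactly twice the mean, so the standard error equals the mean and the lower error bar touches zero.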

      Figure 5, Panel I: Why are there no chambers for young colonies in I when they are in the histogram in E?

      We apologize to the reviewer for the confusion. We initially missed adding the chamber orientation data of the young colonies to Panel I, but it has now been included.

      Line 212: "...densities of ants never become too high...". What is too high? Is there some connection to biological or physical constraints?

      Under normal growth conditions, nest volume is kept proportional to the number of ants, ensuring that the density remains within a specific range. This prevents overcrowding, which could otherwise lead to excessively high densities.

      Yes, we believe there is likely a connection to both biological and physical constraints. The proportional relationship between nest volume and the number of ants is likely driven by factors such as:

      (1) Biological Constraints:

      Ant Colony Size: Ants typically adjust their behavior and social structure to maintain an optimal population size relative to available resources and space. Overcrowding could potentially lead to a breakdown in colony function.

      Colony Health: High densities can lead to faster epidemic spread, leading to negative effects on reproduction, foraging efficiency, and overall colony health. By maintaining density within a specific range, the colony can thrive without these adverse effects.

      (2) Physical Constraints:

      Spatial Limitations: The physical space within the nest limits how many ants can occupy it before space becomes constrained. The nest’s structure and size must physically accommodate the ants, and the volume must be large enough to prevent overcrowding and to allow efficient resource distribution.

      Lines 272 and 302: How often were photos taken? These two statements seem to suggest different data collection rates.

      As stated in line 272, photos were taken every 1 to 3 days. During each photo session, four photos were taken, with each photo separated by 2 seconds, as mentioned in line 302. To avoid confusion, we rephrased the sentence (line numbers: 359-361).

      “We photographed the nest development every 1-3 days. During each photography session, four pictures of the nest were taken, with a 2-second interval between each.”

      Reviewer #2 (Recommendations for the authors):

      Some more minor points/questions/clarifications:

      This might be pedantic, but I don't think the nest serves as the skeleton of the superorganism, while it does change and grow, the analogy becomes weak beyond that point. The skeleton serves to protect the internal organs of the organism, facilitates movement and muscle attachment, and creates new blood cells. I would be more comfortable with a statement that the nest can grow or shrink according to need.

      We sincerely thank the reviewer for their time and effort in providing a detailed review and assessment of our manuscript. A point-by-point response to the comments is provided below.

      The analogy between a nest structure and the skeleton of a superorganism was based on the following points:

      (a) Protection: A nest protects the colony on a collective scale. This is analogous to protecting "organs" by a skeletal framework.

      (b) Organization and Division of Space: The skeletal structure organizes the body's internal layout, just as nest structures are organized into various spatial compartments for various colony functions, with specific regions designated for brood chambers, food storage, and waste disposal.

      Thus, we believe that the analogy can still be valid in a metaphorical way.

      Does this statement need justification with a citation, or is that information contained in the subsequent clause? "However, for more complex structures where ants congregate in specific chambers, workers are less likely to assess the overall nest density." The idea that workers do (or do not) assess overall density touches on many issues, including that of perfect information and adaptive responses, that it seems it needs to be well founded in previous work to be stated in such unequivocal terms.

      We thank the reviewer for this comment. The references for this argument are provided in the next sentence. We have now moved these references to the relevant sentence (reference numbers: 24, 29, 30; line numbers: 30-31).

      Can you give some more information on this statement? "Experiments were terminated either when the queen died or when she became irreversibly trapped after a structural collapse." Why was this collapse irreversible and therefore unlike treatment 2? Did the queen die in these instances? Was this event more likely than in natural colonies? And if so, was there something inherently different about your experiments that limit interpretation under natural conditions (e.g. the narrow nature of the observation setup? The consistency of the sand?)

      Our nest excavation experiments were terminated under two primary scenarios: (1) the queen died of natural causes, reflecting the baseline mortality expected when queens are brought into laboratory conditions, or (2) the nest experienced a structural collapse that left the queen irreversibly trapped. The second scenario is further elaborated below:

      Irreversible Collapses: These collapses were classified as irreversible because the queen could not be rescued alive. This occurred when the structural stability of the nest failed, burying the queen in a manner that prevented recovery. In some cases, the collapse resulted in the queen's immediate death, while in others, she was trapped beyond reach, and any rescue attempt risked further structural damage.

      Collapse and Experimental Context: These collapses were not uniquely associated with natural colonies or fixed-demographic experiments; rather, they occurred across various experimental setups.

      The sentence is modified as below to improve clarity (line numbers : 70-72 ).

      “In all instances where a collapse resulted in the queen's death or her being irreversibly trapped in the nest, the experiment was excluded from analysis starting from the point of the collapse, as such events did not reflect normal colony dynamics.”

      I want to make sure I understand the following statement: "Moreover, the area excavated by the young cohorts was similar to that excavated by naturally maturing colonies at the point in which they reached the same population size (Tukey's HSD; group size: 5; p = 0.61, group size: 10; p = 0.46, group size: 15; p = 0.20)." Do I have it right that this means a group of (e.g. 10) young ants excavates an area similar to that of a group of 10 naturally maturing ants at the same age as the young ants?

      Yes, the interpretation provided is correct. We apologize to the reviewer for the confusion. We have rephrased the sentence for better readability (line numbers : 146-148).

      “Furthermore, the area excavated by the young cohorts was comparable to that excavated by naturally maturing colonies when they reached the same population size (Tukey's HSD; group size: 5, p = 0.61; group size: 10, p = 0.46; group size: 15, p = 0.20)”

      How old do ants get? Is the 'old' demographic (~200 days) meaningfully old in the context of the overall worker lifespan? While the results certainly demonstrate there is an age effect, I would like to understand how rapid this is in terms of overall lifespan.

      The lifespan of ants, including both queens and workers, varies significantly based on caste, species, and environmental conditions.

      (1) Queen Longevity: From the literature, Camponotus fellah queens can live up to 20 years, with one documented case reaching 26 years. This remarkable longevity underscores the queen's central role in maintaining the colony.

      (2) Worker Longevity: In contrast to queens, the lifespan of workers is much shorter.

      However, specific data on worker longevity in Camponotus fellah colonies are lacking. Studies on other Camponotus species (50, 82) suggest that workers can live for several months depending on environmental conditions, colony health, and caste-specific roles (e.g., minor vs. major workers).

      (3) Laboratory vs. Natural Conditions: Worker longevity is highly variable between laboratory and natural conditions.

      Therefore, we believe that the age of the old workers in our experiments, ~200 days (roughly 6–7 months), represents a substantial portion of a worker's expected life. While exact figures for C. fellah workers are unavailable, inferences from related species suggest that workers nearing 200 days are approaching the latter stages of their lifespan, making them meaningfully "old."

      These details are added to the main text (line numbers: 124-127) and to the discussion (line numbers: 278-282).

      Reviewer #3 (Recommendations for the authors):

      We sincerely thank the reviewer for their time and effort in providing a detailed review and assessment of our manuscript. A point-by-point response to the comments is provided below.

      L10: "fixed demographics": I find this term unclear, what does it mean, it should specify if the groups are with or without a queen.

      We thank the reviewer for the comment. The sentence is modified in the abstract, and definitions are later added in detail in the introduction (line numbers : 8-10) and the Materials and Methods section (Fixed demographics colonies). 

      “We experimentally compared nest excavation in colonies seeded from a single mated queen and allowed to grow for six months to excavation triggered by a catastrophic event in colonies with fixed demographics, where the age of each individual worker, including the queen, is known”.

      The details of the “fixed demographics” treatments were explained in the later portion of the text (line numbers: 58-61).

      L36: I think it is documented that younger individuals are the ones who involved in nest construction in many species.

      Previous studies on nest construction were predominantly performed on mature colonies of specific age demographics or rather mixed demographics, where age was not considered as a factor influencing nest construction. Some studies have speculated that young ants could be the most probable ones to dig, but this has not been experimentally verified to the best of our knowledge.

      L50: I do not think the colony should be called mature after only 6 months, given that colonies reach thousands of workers.

      The sentence is changed as suggested (line numbers : 56-57).

      “The "Colony-Maturation" experiment observed the development of colonies up to six months, starting from a single fertile queen and progressing to colonies with established worker populations.” 

      L60: Where was the queen introduced? It is specified in the Methods but a word here would be helpful.

      The detail is added as suggested (line numbers : 68-69).

      “We initiated colony maturation experiments by introducing a single mated queen and several brood items (n = 5, across all experiments) at random positions on the soil layer of the nest.”

      L106: Young vs Old workers 40 vs 171 days. Maybe cite a reference or provide a reason for the selection of those ages?

      Previous studies have shown that the Camponotus fellah queens can live up to 20 years, with one documented case reaching 26 years (50). To the best of our knowledge, specific data on worker longevity in Camponotus fellah colonies in natural conditions are lacking. Lab studies on Camponotus fellah (82) and other Camponotus species (50) suggest that workers can live for several months depending on environmental conditions, colony health, and caste-specific roles (e.g., minor vs. major workers). 

      We intentionally selected workers from two distinct age groups: younger ants (40 ± 16 days old) and older ants (171.56 ± 20 days old). These ages represent functionally different life stages - the younger group had completed about 25% of their expected lifespan at the start of the experiment, while the older group had lived through most of theirs (50, 82). This 4-fold age difference allowed us to compare excavation behaviors across fundamentally different phases of adult life.

      Our experiments lasted for 60-90 days, during which all participating workers continued to age. To ensure all ants remained alive throughout the experiments, and given the constraints of the experimental timeline, we selected young and old workers within the specified age range. 

      These details are added to the main text (line numbers :  124 -127), and the discussion (line numbers  : 278-282)

      L122-123: But usually ants can vary highly in their behaviours. Can the authors comment on their choice to consider an average, implying that all ants of the same age had the same digging rates?

      We thank the reviewer for the comment.

      In our experiments, we could not track each worker's activity over time. As described in the methods, we took snapshots of the nest structure over days and recorded the population size of the nest. Thus, we could not capture the activity of single ants in the nest as described in the response to major comments in the reviewed preprint.

      We agree that individual tracking of ants within our experimental setup would have been the ideal approach, as it would have allowed us to take the inter-individual variability of digging activity into account. However, we were prevented from doing so by technical and practical limitations of the setup, such as:

      (a) Continuous tracking of ants in our nests would have required a camera to be positioned at all times in front of the nest, which necessitates a light background. Since Camponotus fellah ants are subterranean, we aimed to allow them to perform nest excavation in conditions as close to their natural dark environment as possible. Additionally, implementing such a system in front of each nest would have reduced the sample sizes for our treatments.

      (b) The experimental duration of our colony maturation and fixed demographics experiments extended for up to six months (unprecedented durations in these kinds of measurements). This naturally limited our ability to conduct individual tracking while maintaining the identity of each ant based on the current design.

      To clarify this, we have added the following to the discussion (line numbers: 286-292).

      “Previous studies have demonstrated both homogeneous and heterogeneous workload distribution, with varying digging rates among ants (24,29,30,35). Studies showing heterogeneous workload distribution relied on continuous individual tracking of ants to quantify digging rates (35). However, this approach was not feasible in our current design due to the experimental durations of both our colony maturation and fixed demographics experiments. Additionally, sample size requirements naturally limited our ability to conduct continuous individual tracking during nest construction in our study.”

      L171: A line on how the nest structure was acquired and data extracted would be welcome here.

      The algorithm for the nest structure segmentation, data extraction, and analysis is added in detail to the SI section: Nest skeletonization, segmentation, and orientation. The line is modified (line numbers : 221-224) in the main text as suggested.

      “We compared nest architectures by segmenting raw nest images into chambers and tunnels (see SI Section: Nest Skeletonization, Segmentation, and Orientation). Chambers were identified as flat, horizontal structures, while tunnels were narrower and more vertical in orientation (see SI Fig. 9, SI Section: Nest Skeletonization, Segmentation, and Orientation)”.  

      Figure 3: Where does the data of the mean in panel C come from: is it the mean of the first 30 days, before the collapse? How is it comparable with the rest?

      We apologize to the reviewer for the confusion.

      In panel C, the mean values (solid stars and circles) for fixed-demography colonies (young/old groups) represent pre-collapse excavation areas. For colony maturation experiments (where no collapses were induced), we instead plot the mean saturated excavation area for each group size. This allows direct comparison of mean excavated areas across experimental conditions at equivalent colony sizes.

      To improve readability, the following sentences are added to the main text (line numbers : 139 - 146 ) 

      “We compared the saturated excavation areas (pre-collapse) from fixed-demographics experiments (young and old groups) with those from colony maturation experiments of the same colony sizes (Fig. 3C). We find that, for a given age cohort (young or old), the saturation areas increase linearly with the colony size (GLMM, F(35,37); p < 0.0001) (Fig. 3 C, SI. Fig 7 A). The observed proportional scaling between excavated area and group size aligns with previous studies, even though those studies did not explicitly account for age demographics (24, 29, 30). After normalizing the pre-collapse excavated area by group size for both young and old colonies, we found no significant difference in area per ant across group sizes (SI Fig. 5. A). This indicates that the excavated area per ant remains relatively constant within each demographic group”.

      L209-210: I would be more parsimonious in saying that the results presented prove that the target area decreases with age, as the individual behaviour of the ants was not monitored. Suggestion: rephrase to "the target of the group decreases with age".

      The sentence is rephrased as suggested (line numbers : 265-266).

      “Our results reveal that this target area of the group decreases linearly with age, such that young ants are more sensitive to shortages in space.”

      L246: Are C.fellah colonies really found with such few workers?

      Previous studies have speculated that Camponotus fellah is a monogynous species whose colonies are typically founded by a single queen following nuptial flights (50,51,82) and can range from tens to thousands of workers. However, during the founding stage (as in our experiments), colonies naturally pass through smaller developmental sizes before reaching maturity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary:  

      The Szczupak lab published a very interesting paper in 2012 (Rodriguez et al. J Neurophysiol 107:1917-1924) on the effects of the segmentally-distributed non-spiking (NS) cell on crawl-related motoneurons. As far as I can tell, the working model presented in 2012, for how the non-spiking (NS) cell impacts the crawling motor pattern, is the same functional model presented in this new paper. Unfortunately, the Discussion does not address any of the findings in the previous paper or cite them in the context of NS alterations of fictive crawling. Aside from different-looking figures and some new analyses, the results and conclusions are the same. 

      Reviewers #1 and #2 called our attention to our failure to cite the Rodriguez et al. 2012 article in the context of the main goal of the present work. We do now explain how the present study is framed by the published work. See lines 74-79.

      In Rodriguez et al. 2012, we hypothesized that the inhibitory signals onto NS originated from motoneuron firing. We now cite this reference in line 104. In the current manuscript we further investigated the connection between the inhibitory signals onto NS and the motoneuron activity (Figure 2) and proved that the hypothesis was wrong. Thus, the model presented here differs from the one proposed in Rodriguez et al. 2012.

      In Rodriguez et al. 2012, we speculated that the inhibitory signals received by NS were transmitted to the motoneurons, but an important control was missing in that study. In the current study, depolarization of NS during crawling is tested against a control series that allows the hypothesis to be properly examined (lines 138-147). Most importantly, because NS is so widely connected with the layer of motoneurons, it was necessary to test the effect on other motoneurons during the fictive crawling cycle. We now explain this rationale in lines 249-257.

      Strengths: 

      The figures are well illustrated. 

      Weaknesses:  

      The paper is a mix of what appears to be two different studies and abruptly switches gears to examine how closely the crawl patterning is in the intact animal as compared to the fictive crawl patterning in the intact animal. Unfortunately, previous studies in other labs are not cited even though identical results have been obtained and similar conclusions were made. Thus, the novelty of the results is missing for those who are familiar with the leech preparation. The lack of appropriate citations and discussion of previous studies also deprives the scientific community of fully comprehending the impact of the data presented and the science it was built upon.  

      The main aim of the manuscript is to learn the role of premotor NS neurons in the crawling motor pattern, studied using spike sorting in extracellular nerve recordings. This readout allows simultaneous monitoring of a larger number of units than in any previous study. This approach aims to determine whether and how a recurrent inhibitory peripheral circuit is involved in coordinating or modulating the rhythmic motor pattern.

      Our rationale was that the known effect of NS on one particular motoneuron (DE-3) may have overlooked a more general effect on crawling (lines 253-257). Moreover, we wanted to investigate whether this effect was due to the recurrent inhibitory circuit or if other elements were involved, and to study whether the modulation was mediated by the recurrent synapse between NS and the motoneurons.

      In the context of this aim we studied the rhythmic activity of cell DE-3, together with motoneurons that fire in-phase and anti-phase, in isolated ganglia (Figure 4). To reveal the effect of NS manipulation we applied a quantitative analysis that showed the phase-specific effect of NS (Figure 6). 

      Given that this is the first study using a spike sorting algorithm to detect and describe the activity of motoneurons in nerve recordings, we found it reasonable to compare these results with an in vivo study, thus providing the general reader with information that supports the correspondence between the ex vivo and the in vivo patterns.

      (1) Results, Lines 167-170: "While multiple extracellular recordings have been performed previously (Eisenhart et al., 2000), these results present the first quantitative analysis of motor units activated throughout the crawling cycle. The In-Phase units are expected to control the contraction stage by exciting or inhibiting the longitudinal or circular muscles, respectively, and the Anti-Phase units to control the elongation stage by exciting or inhibiting the circular or longitudinal muscles, respectively."  

      Reviewer: The first line above is misleading. The study by Puhl and Mesce (2008, J. Neurosci, 28:4192- 420) contains a comprehensive analysis of the motoneurons active during fictive crawling with the aim of characterizing their roles and phase relationships and solidifying the idea that the oscillator for crawling resides in a single ganglion. Intracellular recordings from a number of key crawl-related motoneurons were made in combination with extracellular recordings of motoneuron DE-3, a key monitor of crawling. In their paper, it was shown that motoneurons AE, VE-4, DI-1, VI-2, and CV were all correlated with crawl activity, and fired repeatedly either in phase or out-of-phase with DE-3. They were shown to be either excitatory or inhibitory. At a minimum, the above paper should be cited. 

      The sentence in the submitted manuscript explicitly refers to the quantitative analysis of extracellular recordings, but we recognize that it may lead to confusion. We have now added a clarification (lines 197-199). 

      The article by Puhl and Mesce 2008 shows very nice intracellular recordings of the AE, CV, VE-4, DE-3, DI-1, and VI-2, accompanied by extracellular recordings of DE-3 in the DP nerve. In all cases, there is only one intracellular recording paired with the DP nerve recording.

      While it is possible to perform up to 3-4 simultaneous intracellular recordings, these are technically challenging, and more so when the recordings have to last 10-20 minutes. Due to this difficulty, and because our objective was to record multiple units simultaneously in order to comprehensively describe the different crawling stages, we implemented the spike sorting analysis on multiple extracellular recordings. This approach enabled us to reliably obtain multiple units per experiment and thus execute a quantitative analysis of the activity of each identified unit.

      The article by Puhl and Mesce 2008 mentions several quantitative aspects of the neurons that fire in-phase or out-of-phase with DE-3, but, as far as we understand, there is no figure that summarizes activity levels and span in the way Figures 4 and 6 do in the current manuscript. To the best of our knowledge, no previous work renders this information.

      It is very important for us to emphasize that the work by Puhl and Mesce was seminal for our research. We cited it four times in the original manuscript and 10 times in the present version. But, like any important discovery, it sets the ground for further work that can refine certain measurements that in the original discovery were not central.

      This is why we believe that the cited sentence in our manuscript is not misleading.  However, to comply with the requirement of Reviewer #1, we added a sentence preceding the mentioned paragraph (lines 185-187) that acknowledges the description made using intracellular recordings, and explains the need for implementing the approach we chose.

      The submitted paper would be strengthened if some of these previously identified motoneurons were again recorded with intracellular electrodes and concomitant NS cell stimulation. The power of the leech preparation is that cells can be identified as individuals with dual somatic (intracellular) and axonal recordings (extracellular). 

      Most of the motoneurons mentioned by Reviewer #1 are located on the opposite side (dorsal) of the ganglion to NS (ventral), and therefore, simultaneous intracellular recordings in the context of fictive crawling are challenging.

      In the publication of Rodriguez et al. 2009, Mariano Rodriguez did manage to record NS from the dorsal side together with DE-3 and MN-L (!), and this led to the discovery that these motoneurons are electrically coupled, but that the recurrent inhibitory circuit masks this interaction. Repeating this type of experiment during crawling, which requires stable recordings for around 15 minutes, is not a reasonable experimental setting.

      Rodriguez et al. 2012 shows intracellular recordings of motoneurons AE and CV during crawling in conjunction with NS, and their activity presented the expected correlation. 

      The shortfall of this aspect of the study (Figure 5) is that the extracellular units have not been identified here. 

      The Reviewer is right in that the extracellular units have not been identified in terms of cell identity. As we explained earlier, most motoneurons are on the opposite side (ventral/dorsal) of the ganglion relative to NS. 

      However, we do characterize the units in terms of the nerve through which they project to the periphery and their activity phase. In lines 345-349 we use this information and, based on published work, we propose possible cellular identities of the different units.

      In fact, these units might not even be motoneurons. 

      We are surprised by this comment. The classical work of Ort and collaborators (1974) showed that spikes detected in extracellular nerve recordings were emitted by specific motoneurons, and several previous publications have validated extracellular nerve recordings as a means to study fictive motor patterns (Wittenberg & Kristan 1992, Shaw & Kristan 1997, Eisenhart et al. 2000).

      For further reassurance, we only took into consideration units whose activity was locked to DE-3; any non-rhythmical activity was filtered out (see lines 433-435). 
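      To illustrate what "locked to DE-3" can mean in practice, the sketch below assigns each spike of a unit a phase within the DE-3 cycle and summarizes the spread of those phases with a vector strength (1 for perfectly locked firing, near 0 for firing spread evenly over the cycle). This is only an illustrative assumption: the function names, the use of vector strength, and the example spike times are hypothetical, and the selection criterion actually applied is the one described in the manuscript's Methods.

```python
import math


def spike_phases(spike_times, reference_onsets):
    """Phase (0 to 1) of each spike within the reference cycle that contains it.

    A cycle runs from one reference burst onset to the next; spikes outside
    every complete cycle are ignored.
    """
    phases = []
    for t in spike_times:
        for start, end in zip(reference_onsets, reference_onsets[1:]):
            if start <= t < end:
                phases.append((t - start) / (end - start))
                break
    return phases


def vector_strength(phases):
    """Length of the mean resultant vector of the spike phases."""
    if not phases:
        return 0.0
    c = sum(math.cos(2 * math.pi * p) for p in phases)
    s = sum(math.sin(2 * math.pi * p) for p in phases)
    return math.hypot(c, s) / len(phases)


# Hypothetical data (seconds): reference burst onsets and two candidate units.
onsets = [0.0, 10.0, 20.0, 30.0]
locked_unit = [5.0, 15.0, 25.0]       # always fires at mid-cycle
unlocked_unit = [0.0, 2.5, 5.0, 7.5]  # fires evenly across one cycle

locked_vs = vector_strength(spike_phases(locked_unit, onsets))
unlocked_vs = vector_strength(spike_phases(unlocked_unit, onsets))
```

      Under this assumed criterion, a unit would be retained only if its vector strength exceeded a chosen threshold; the threshold itself would have to be justified separately.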

      They could represent activity from the centrally located sensory neurons, dopamine-modulated afferent neurons or peripherally projecting modulatory neurons. 

      Peripheral nerves also contain axons from sensory neurons. However, in a previous article, we studied the activity of mechanosensory neurons (Alonso et al. 2020) and showed that they remain silent during crawling. Moreover, the low-threshold T sensory neurons are inhibited in phase with DE-3 bursts and NS IPSPs (Kearney et al. 2022). Alonso et al. 2020 showed that spiking activity of T cells affects the crawling motor pattern, revealing the relevance of keeping them silent.

      What does the Reviewer mean by “dopamine-modulated afferents”? We are not aware of this category of leech neurons.

      The neuromodulatory Rz neurons project peripherally through the recorded nerves, but intracellular recordings of these neurons from our lab show no rhythmic activity in those cells during dopamine-induced crawling.

      Essentially, they may not have much to do with the crawl motor pattern at all.

      Does the Reviewer consider that neurons engaged in a coherent rhythmic firing could be unrelated to the pattern? As indicated above, the units reported in our manuscript were selected because dopamine evoked their rhythmic activity, locked to DE-3. 

      Does the Reviewer consider that dopamine could evoke spurious neuronal activity?

      (2) Results Lines 206-210: "with the elongation and contraction stages of in vivo behavior. However the isometric stages displayed in vivo have no obvious counterpart in the electrophysiological recordings. It is important to consider that the rhythmic movement of successive segments along the antero-posterior axis of the animal requires a delay signal that allows the appropriate propagation of the metachronal wave, and this signal is probably absent in the isolated ganglion." 

      Reviewer: The so-called isometric stages, indeed, have an electrophysiological counterpart due in part to the overlapping activities across segments. This submitted paper would be considerably strengthened if it referred to the body of work that has examined how the individual crawl oscillators operate in a fully intact nerve cord, excised from the body but with all the ganglia (and cephalic ganglion) attached. Puhl and Mesce 2010 (J. Neurosci 30: 2373-2383) and Puhl et al. 2012 (J. Neurosci, 32:17646 -17657) have shown that "appropriate propagation of the metachronal wave" requires the brain, especially cell R3b-1. They also show that the long-distance projecting cell R3b-1 synapses with the CV motoneuron, providing rhythmic excitatory input to it.  

      We would like to draw the Reviewer’s attention to the fact that Puhl and Mesce 2008, 2010 and Puhl et al. 2012 characterized crawling in intact (or nearly intact) animals considering the whole body. In our in vivo analysis, we studied the changes in length of the whole animal and of sections demarcated by the drawn points, as described in the Materials and Methods/Behavioral Experiments. Because of this different analysis, we defined “isometric” stages as those in which a given section of the animal does not change its length. We now clarify this (line 230).

      In the paragraph cited by the Reviewer, we intended to state that, in the context of our study, the intersegmental lag caused by the coordinating mechanisms has no counterpart “in the electrophysiological recordings of motoneurons in the isolated ganglia”. We have now completed this idea with the expression underlined in the previous sentence (line 231).

      As the Reviewer indicates, in the intact nerve cord the behavioral isometric stages correspond to the “waiting time” between segments. We did refer to the metachronal order but did not cite the articles by Puhl and Mesce 2010 and Puhl et al. 2012; we now do so (line 234).

      For this and other reasons, the paper would be much more informative and exciting if the impacts of the NS cell were studied in a fully intact nerve cord. Those studies have never been done, and it would be exciting to see how and if the effects of NS cell manipulation deviated from those in the single ganglion.  

      The Reviewer may consider that a systematic analysis of multiple nerves in several ganglia along the whole nerve cord would have been a different enterprise than the one we carried out. The Reviewer is right in recognizing the interest of such a study, but in our opinion, the value of the present work lies in presenting a thorough quantitative analysis of multiple nerves to demonstrate its usefulness for the study of the network underlying leech crawling. In this manuscript, we used it to analyze the role of the premotor NS neuron. Without the recording of units firing in-phase and out-of-phase with DE-3, we would have been unable to assess the span of NS effects.

      (3) Discussion Lines 322-324. "The absence of descending brain signals and/or peripheral signals are assumed as important factors in determining the cycle period and the sequence at which the different behavioral stages take place." 

      Reviewer: The authors could strengthen their paper by including a more complete picture of what is known about the control of crawling. For example, Puhl et al. 2012 (J Neurosci, 32:17646-17657) demonstrated that the descending brain neuron R3b-1 plays a major role in establishing the crawl-cycle frequency. With increased R3b-1 cell stimulation, DE-3 periods substantially shortened throughout the entire nerve cord. Thus, the importance of descending brain inputs should not be merely assumed; empirical evidence exists.  

      We now strengthen the concept using “known descending brain signals” (line 358) and cite Puhl et al. 2012. We believe that extending the discussion to cell R3b-1 does not contribute meaningfully to the focus of this manuscript.

      (4) Discussion Lines 325-327: "the sequence of events, and the proportion of the active cycle dedicated to elongation and contraction were remarkably similar in both experimental settings. This suggests that the network activated in the isolated ganglion is the one underlying the motor behavior." 

      Reviewer: The results and conclusions drawn in the current manuscript mirror those previously reported by Puhl and Mesce (2008, J. Neurosci, 28:4192- 420) who first demonstrated that the essential pattern-generating elements for leech crawling were contained in each of the segmental ganglia comprising the nerve cord. Furthermore, the authors showed that the duty cycle of DE-3, in a single ganglion treated with dopamine, was statistically indistinguishable from the DE-3 duty cycle measured in an intact nerve cord showing spontaneous fictive crawling, in an intact nerve cord induced to crawl via dopamine, and in the intact behaving animal. What was statistically significant, however, was that the DE-3 burst period was greatly reduced in the intact animal (i.e., a higher crawl frequency), which was replicated in the submitted paper.  

      There is no doubt that the article by Puhl and Mesce 2008 is seminal to the work we present here. The Reviewer seems to suggest that we do not recognize the value of this work. The contrary is true: all our related papers cite this important breakthrough. We cite the paper very early in the article, in the Introduction (see lines 51 and 52-53). Likewise, we would like the Reviewer to recognize the novelty of the current report. To clarify what has been shown and what is new in our manuscript, consider the following:

      i. Figures 1-6 in Puhl and Mesce 2008 provide representative intracellular recordings that describe neurons that fire in phase and out of phase relative to DE-3. Some general measurements are given in the text, but none of these figures quantify the relative activity of neurons that fire in different stages; only DE-3 activity was quantified. A quantitative description of multiple units active in phase and out of phase with DE-3 is presented here for the first time, are we wrong? This quantification is particularly relevant when assessing how a treatment affects the function of the circuit.

      ii. Regarding the cycle period, we referred to the work from the Kristan lab, which reported this value long before the requested reference. We now cite Puhl and Mesce 2008 in line 222 regarding in vivo measurements, and in line 221 regarding isolated ganglia.

      iii. Regarding the duty cycle: 

      Puhl and Mesce 2008 measured the duty cycle of DE-3 in three configurations: a. spontaneous whole cord, b. DA-mediated whole cord and c. DA-mediated single-ganglion crawling. However, it does not report the duty cycle of neurons out-of-phase with DE-3. Our current manuscript carried out this analysis. One could argue that the silence between DE-3 bursts captures that value, but this is a speculation that needed a proper measure.

      Puhl and Mesce 2008 does not indicate the duty cycle of the contraction and elongation stages in vivo. Our current manuscript does. 

      Therefore, the sentence cited by the Reviewer refers to data presented in this manuscript, and not in any prior manuscript. It is true that Puhl and Mesce 2008 inspires the intuition that the sentence is true, but it does not present the data that the current manuscript does.

      Finally, our study focused only on the body sections corresponding to the same segmental range used in the ex vivo experiments, rather than the whole animal. The comparison was made only to validate that the duty cycles of neurons firing in phase and out of phase with DE-3 matched the dynamic stages in the studied sections of the leech (line 364).
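      For readers less familiar with the term, the duty cycle discussed in point iii is the fraction of each cycle period during which a unit is bursting. The sketch below shows one common way to compute it from burst onset and offset times, taking the cycle period as the interval between consecutive burst onsets. The function name and the example times are hypothetical; this is a minimal illustration, not the analysis code used in the manuscript.

```python
def duty_cycle(burst_onsets, burst_offsets):
    """Mean fraction of the cycle period occupied by a burst.

    burst_onsets[i] and burst_offsets[i] delimit the i-th burst. The cycle
    period is the interval between consecutive onsets, so the last burst
    (which has no following onset) is not used.
    """
    if len(burst_onsets) < 2 or len(burst_onsets) != len(burst_offsets):
        raise ValueError("need at least two bursts with matching onsets and offsets")
    fractions = []
    for i in range(len(burst_onsets) - 1):
        period = burst_onsets[i + 1] - burst_onsets[i]
        duration = burst_offsets[i] - burst_onsets[i]
        fractions.append(duration / period)
    return sum(fractions) / len(fractions)


# Hypothetical burst times (seconds) for an in-phase and an anti-phase unit.
in_phase = duty_cycle([0.0, 20.0, 40.0], [8.0, 28.0, 48.0])
anti_phase = duty_cycle([10.0, 30.0, 50.0], [18.0, 38.0, 58.0])
```

      Measuring the anti-phase unit directly, rather than inferring it from the silent interval between DE-3 bursts, is the distinction drawn in point iii above.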

      In my opinion, the novelty of the results reported in the submitted manuscript is diminished in the light of previously published studies. At a minimum, the previous studies should be cited, and the authors should provide additional rationale for conducting their studies. They need to explain in the discussion how their approach provided additional insights into what has already been reported.  

      Throughout our reply, we have provided a detailed explanation of the rationale and necessity behind each experiment. Following the Reviewer’s suggestion, we have rephrased the research objectives, included what is known from our previously published work, and highlighted the substantial new data contributed by the present study. See lines 80-85. 

      Additionally, we further cite our published article in lines 93, 104, 138, 146 and 250. 

      Reviewer #2 (Public review):  

      The paper is well-written overall. The findings are clearly presented, and the data seems solid overall. I do have, however, a few major and some minor comments representing some concerns.

      My major comments are below. 

      (1) This may seem somewhat semantic, yet, it has implications on the way the data is presented and moreover on the conclusions drawn - a single ganglion cannot show fictive crawling. It can demonstrate rhythmic patterns of activity that may serve in the (fictive) crawling motor pattern. The latter is a result of the intrinsic within single-ganglion connectivity AND the inter-ganglia connections and interactions (coupling) among the sequential ganglia. It may be affected by both short-range and long-range connections (e.g., descending inputs) along the ganglia chain. 

      Semantics is not a trivial issue in science communication. It entails metaphors that enter the bibliography as commonly used “shortcuts” to a complex concept that are adopted by a community of researchers. And yes, indeed, they can be misleading.

      However, if recording the activity in an isolated ganglion shows that a wide group of motoneurons, that control known muscle movements, presents a rhythmic output that maintains the appropriate cycle period and phase relationships, the “shortcut” is incomplete but could be valid (Puhl and Mesce 2008). If we were to include the phase lag component, a single ganglion cannot generate the fictive motor output.

      Because any new study builds knowledge on the basis of the cited bibliography, the way we name concepts is a sensitive point. Adopting the terminology used by previous publications (Puhl and Mesce 2008) seems important to allow readers to follow the development of knowledge. However, attending to the observation made by Reviewer #2, we included a sentence clarifying that the concept “fictive crawling” does not include intersegmental connectivity (lines 54-57).

      (2) The point above is even more critical where the authors set to compare the motor pattern in single ganglia with the intact animals. It would have made much more sense to add a description of the motor pattern of a chain of interconnected ganglia. The latter would be expected to better resemble the intact animal. Furthermore, this project would have benefitted from a three-way comparison (isolated ganglion-interconnected ganglia-intact animal).  

      As we answered Reviewer #1, the present manuscript does not intend to present a thorough study on how the activity in the isolated nervous system compares with the animal behavior. To do so, we would have needed to perform a completely different set of experiments. To better define the relevance of our comparison with the in vivo experiments, we rephrased the objective of the behavioral analysis (lines 197-199).

      The main aim of the manuscript is to learn the role of premotor NS neurons in the crawling motor pattern studied using a readout (spike sorting in extracellular nerve recordings) that allows simultaneous screening of a larger number of units than in any previous study, in order to determine whether and how a recurrent inhibitory peripheral circuit is involved in coordinating or modulating the rhythmic motor pattern.

      Our rationale was that the known effect of NS on one particular motoneuron (DE-3) may have overlooked a more general effect on crawling (lines 253-257). Moreover, we wanted to investigate whether this effect was due to the recurrent inhibitory circuit or if other elements were involved, and to study whether the modulation was mediated by the recurrent synapse between NS and the motoneurons.

      In the context of this aim we studied the rhythmic activity of cell DE-3, together with motoneurons that fire in-phase and anti-phase, in isolated ganglia (Figure 4). To reveal the effect of NS manipulation we applied a quantitative analysis that showed the phase-specific effect of NS (Figure 6). 

      Given that this is the first study using a spike sorting algorithm to detect and describe the activity of motoneurons in nerve recordings, we found it reasonable to compare these results with an in vivo study, thus providing the general reader with information that supports the correspondence between the ex vivo and the in vivo patterns.

      (3) Two previous studies by the same group are repeatedly mentioned (Rela and Szczupak, 2003; Rodriguez et al., 2009) and serve as a basis for the current work. The aim of one of these previous studies was to assess the role of the NS neurons in regulating the function of motor networks. The other (Rodriguez et al., 2009) reported on a neuron (the NS) that can regulate the crawling motor pattern. LL 71-74 of the current report presents the aim of this study as evaluating the role of the known connectivity of the premotor NS neuron in shaping the crawling motor pattern. The authors should make it very clear what indeed served as background knowledge, what exactly was known about the circuitry beforehand, and what is different and new in the current study. 

      Rela and Szczupak 2003 and Rodriguez et al. 2009 analyze the interactions of motoneurons with NS. We believe that Reviewer #2 refers here to Rodriguez et al. 2012. A similar observation was made by Reviewer #1. Below, we copy the answer previously stated:

      Following the Reviewer’s suggestion, we have rephrased the research objectives, included what is known from our previously published work, and highlighted the substantial new data contributed by the present study. See lines 80-85. 

      Additionally, we further cite our published article in lines 93, 104, 138, 146 and 250. 

      Reviewer #1 (Recommendations for the authors):  

      Please edit for correct word usage. 

      Reviewer #2 (Recommendations for the authors):  

      Minor Concerns 

      (1) LL33-36: These lines are somewhat vague and non-informative. Why is the functional organization of motor systems an open question? What are the mechanisms at the level of the nerve cord that are an open question? Maybe be more explicit? 

      We did as suggested (lines 30-32).

      (2) L62: The homology between the NS neurons and the vertebrate Renshaw cells is mentioned already in the Abstract and here again. While a reference is provided (citing the lead author of this current work), the reader would benefit from some further short words of explanation regarding the alleged homology. 

      We included a description of Renshaw cell connectivity (lines 64-65).

      (3) LL90-92: The NS recording in Figure 1 (similar to Figure 3 in Rodriguez et al.) demonstrates clear distinct IPSPs. Could these be correlated with DE-3 spikes? 

      We investigated this correlation in detail, and the answer is that there is no strict 1:1 correlation between DE-3 spikes and IPSPs. NS receives inputs from other dorsal and ventral excitors of longitudinal muscles, and the NS trace is too “noisy” to reflect any short-term correlation. Originally we proposed that the NS IPSPs were due to the polysynaptic interaction between the MN and NS (Rodríguez et al. 2012). However, the present work demonstrates that the IPSPs in NS are caused by a source upstream from the MNs. 

      (4) LL145-145: Do you mean - inhibitory signals FROM NS premotor neurons? Not clear. 

      We see the confusion, and we rewrote the sentence (line 164). We hope it is clearer now: “…inhibitory signals onto NS premotor neurons were transmitted to DE-3 motoneurons via rectifying electrical synapses and counteracted their excitatory drive during crawling, limiting their firing frequency.”

      (5) LL153-154: Why isn't AA included in Figure 4A? 

      Reading our original text, the Reviewer is right in expecting to see the AA recording. We changed the sentence: “we performed extracellular recordings of DP along with AA and/or PP root nerves” (lines 171-172).

      We dissected the three nerves but, unfortunately, we did not always obtain good recordings from the three of them.

      (6) LL237-238: The statistical significance (B- antiphase) is not clear. Furthermore, with N of 7-8, I'm not sure the parametric tests utilized are appropriate. 

      Regarding the Reviewer's concern about the tests, please note that all the assumptions made for each model were tested (see now Materials and Methods lines 466-467). The information on each model is provided in Supplementary Table 2 under the column 'Model, random effect,' which specifies whether a Linear Mixed Model (LMM) or a Generalized Linear Mixed Model (GLMM) was implemented. For GLMMs, the corresponding distribution and link function are also specified. For the analysis of Max bFF of Anti-Phase motor units, we found a significant interaction between epoch and treatment, indicating a difference between treatments. This is indicated on the left of the y-axis (##). In control experiments, all three comparisons (pre-test, pre-post, test-post) show significant differences in Max bFF: this variable decreased (slightly but significantly) along the subsequent epochs, suggesting a change over time. We have now corrected the text to indicate that these changes were small (line 268). In contrast, Max bFF in depo experiments remained stable between pre-test and pre-post, but significantly decreased between the depo and post epochs. Thus, in our view, the comparison between control and the test supports the conclusion that NS depolarization was limited to counteracting this decrease (lines 270-273). Supplementary Table 2 provides the significance and modeled estimated ratio for each comparison in the column for pairwise simple contrasts.

      Thanks to this question, we realized that the nomenclature used in the table for the epochs (pre - depo - post) needed to be changed to pre - test - post, and we have now corrected it.

      (7) LL240-241: I fail to see a difference from Control. 

      For the Relative HW of In-Phase units, we also found a significant interaction between epoch and treatment, indicating a difference between treatments, as denoted to the left of the y-axis (#). The significance of the comparisons across epochs within each treatment is shown in the figure (*). It is important to note that obtaining the same significance for each treatment does not imply identical results; we failed to describe this in our original text and do so now in lines 275-279.

      (8) LL244-245: I must admit that Table 2 is beyond me. Maybe add some detail or point out to the reader what is important (if at all). 

      We have now clarified what each column of the tables indicates in the corresponding legends. 

      Here, we also share an insight into how the experiments were designed and analyzed:

      To account for possible temporal drifts of the variables during the recordings that could mask or confuse the results, we compared two experimental series: one in which NS was subjected to depolarizing current pulses (depo), and another series (ctrl) in which the neurons were not depolarized.

      The statistical analysis was made using Linear Mixed Models (LMMs) or Generalized Linear Mixed Models (GLMMs). In these analyses treatments and epochs are used as explanatory variables to evaluate the interaction between these factors. These models allow us to determine whether changes in each variable across epochs differ depending on the treatment. For example, whether the variation in firing frequency from pre to test to post differs between control experiments and those in which NS was depolarized.

      A significant interaction between treatment and epoch indicates that NS depolarization affected the variable. In such cases, we performed pairwise comparisons between epochs (pre-test, test-post, pre-post) within each treatment. In contrast, the absence of a significant interaction can result from two possibilities: either the variable did not change across epochs in either treatment, or a similar temporal drift occurred in both cases.
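      The logic of the treatment-by-epoch interaction can be illustrated with a minimal sketch. This is not the model used in the study (which relied on LMMs/GLMMs with random effects and formal inference); it only computes a descriptive difference-in-differences on hypothetical toy numbers. The data, the dictionary layout, and the function names `mean_change` and `interaction_contrast` are assumptions introduced purely for illustration.

```python
from statistics import mean

# Hypothetical toy data (NOT from the study): firing frequency (Hz) of
# individual motor units in each epoch, for the two experimental series.
toy = {
    "ctrl": [  # NS not depolarized
        {"pre": 10.0, "test": 9.0, "post": 8.0},
        {"pre": 12.0, "test": 11.0, "post": 10.0},
    ],
    "depo": [  # NS depolarized during the test epoch
        {"pre": 10.0, "test": 10.0, "post": 8.0},
        {"pre": 12.0, "test": 12.0, "post": 10.0},
    ],
}


def mean_change(units, start, end):
    """Mean within-unit change from epoch `start` to epoch `end`."""
    return mean(u[end] - u[start] for u in units)


def interaction_contrast(data, start, end):
    """Difference between treatments in the epoch-to-epoch change.

    A value near zero means both series drifted alike (no interaction);
    a non-zero value means the change depended on the treatment.
    """
    return mean_change(data["depo"], start, end) - mean_change(data["ctrl"], start, end)


pre_test = interaction_contrast(toy, "pre", "test")
pre_post = interaction_contrast(toy, "pre", "post")
```

      In this toy case the pre-test contrast is non-zero (the control series drifts downward while the depolarized series does not), whereas the pre-post contrast is zero (both series show the same overall drift). A mixed model additionally estimates the uncertainty of such contrasts and accounts for repeated measurements of the same unit.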

      (9) LL245-256: Move this paragraph to the discussion. 

      Because we introduced a rationale for the experiments described in Figure 6 (lines 282-284), most of the paragraph was removed, but the part that supports the methodological approach was retained.

      (10)  LL259-260: see my second minor point above. This is explained in LL270-272 for the first time. 

      We amended the text according to comment (2).

      (11) Figures: The quantitative analysis shown in Figure 3B is very useful. Why isn't this type of analysis utilized for the comparisons shown in Figures 4 and 6? 

      We chose different ways of plotting the data based on their nature. In Figure 3B, we present data from an identified neuron (DE-3) recorded in different experiments. In contrast, in Figure 6 we analyze data from neurons classified into the same group based on their activity during the fictive crawling cycle, but their individual identity was not ascertained. Therefore, we consider it important to plot the results for each unit individually, to assess the effect of temporal drift and NS depolarization.

      (12) Figures: Figure 7 is meant to be compared to Figure 1C; the point being the addition of an inhibitory connection onto the NS neuron. Why are other details of the figure also different (different colored M)? 

      While Figure 1C illustrates the known connection between NS and both DE-3 and CV motoneurons, Figure 7 shows the connections between NS and the different groups of motor units described in this study. The units are represented in the circuit using the same colors that identify them in Figures 4 and 6. Since the CV motoneuron was not recorded in this study, the circuit represents the AntiPhase neurons but does not identify them with CV. Figure 7 legend now clarifies what the colors represent, and Figure 1C has been updated to match the same color scheme.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This work addresses an important question in the field of Drosophila aggression and mating- prior social isolation is known to increase aggression in males by increased lunging, which is suppressed by group housing (GH). However, it is also known that single-housed (SH) males, despite their higher attempts to court females, are less successful. Here, Gao et al., developed a modified aggression assay, to address this issue by recording aggression in Drosophila males for 2 hours, over a virgin female which is immobilized by burying its head in the food. They found that while SH males frequently lunge in this assay, GH males switch to higher intensity but very low-frequency tussling. Constitutive neuronal silencing and activation experiments implicate cVA sensing Or67d neurons promoting high-frequency lunging, similar to earlier studies, whereas Or47b neurons promote low-frequency but higher intensity tussling. Using optogenetic activation they found that three pairs of pC1 neurons- pC1SS2 increase tussling. While P1a neurons, previously implicated in promoting aggression and courtship, did not increase tussling in optogenetic activation (in the dark), they could promote aggressive tussling in thermogenetic activation carried out in the presence of visible light. It was further suggested, using a further modified aggression assay that GH males use increased tussling and are able to maintain territorial control, providing them mating advantage over SI males and this may partially overcome the effect of aging in GH males.

      Strengths

      Using a series of clever neurogenetic and behavioral approaches, subsets of ORNs and pC1 neurons were implicated in promoting tussling behaviors. The authors devised a new paradigm to assay for territory control which appears better than earlier paradigms that used a food cup (Chen et al, 2002), as this new assay is relatively clutter-free, and can be eventually automated using computer vision approaches. The manuscript is generally well-written, and the claims made are largely supported by the data.

      Thank you for your precise summary of our study and for your positive assessment of its novelty and significance.

      Weaknesses

      I have a few concerns regarding some of the evidence presented and claims made as well as a description of the methodology, which needs to be clarified and extended further.

      (1) Typical paradigms for assaying aggression in Drosophila males last for 20-30 minutes in the presence of nutritious food/yeast paste/females or all of these (Chen et al. 2002, Nilsen et al., 2004, Dierick et al. 2007, Dankert et al., 2009, Certel & Kravitz 2012). The paradigm described in Figure 1 A, while important and more amenable for video recording and computational analysis, seems a modification of the assay from Kravitz lab (Chen et al., 2002), which involved using a female over which males fight on a food cup. The modifications include a flat surface with a central food patch and a female with its head buried in the food, (fixed female) and much longer adaptation and recording times respectively (30 minutes, 2 hours), so in that sense, this is not a 'new' paradigm but a modification of an existing paradigm and its description as new should be appropriately toned down. It would also be important to cite these earlier studies appropriately while describing the assay.

      We have now toned down the description of the paradigm and cited more related references.

      (2) Lunging is described as a 'low intensity' aggression (line 111 and associated text), however, it is considered a mid to high-intensity aggressive behavior, as compared to other lower-intensity behaviors such as wing flicks, chase, and fencing. Lunging therefore is lower in intensity 'relative' to higher intensity tussling but not in absolute terms and it should be mentioned clearly.

      We have modified the description as suggested.

      (3) It is often difficult to distinguish faithfully between boxing and tussling and therefore, these behaviors are often clubbed together as box, tussle by Nielsen et al., 2004 in their Markov chain analysis as well as a more detailed recent study of male aggression (Simon & Heberlein, 2020). Therefore, authors can either reconsider the description of behavior as 'box, tussle' or consider providing a video representation/computational classifier to distinguish between box and tussle behaviors.

      Indeed, we could not faithfully distinguish boxing and tussling. To address this concern, we have now made textual changes in the Results section: "we occasionally observed the high-intensity boxing and tussling behavior in male flies, which are difficult to distinguish and hereafter simply referred to as tussling."

      We also added this information in the Materials and Methods section: "Tussling is often mixed with boxing, in which both flies rear up and strike the opponent with forelegs. Since boxing is often transient and difficult to distinguish from tussling, we referred to the mixed boxing and tussling behavior simply as tussling."

      (4) Simon & Heberlein, 2020 showed that increased boxing & tussling precede the formation of a dominance hierarchy in males, and lunges are used subsequently to maintain this dominant status. This study should be cited and discussed appropriately while introducing the paradigm.

      We have now cited this important study in both the Introduction and Discussion sections.

      (5) It would be helpful to provide more methodological details about the assay, for instance, a video can be helpful showing how the males are introduced in the assay chamber, are they simply dropped to the floor when the film is removed after 30 minutes (Figures 1-2)?

      We have now provided a more detailed description of the behavioral assays and how we analyze them. For example: "All testers were loaded by cold anesthesia. After a 30-minute adaptation, the film was gently removed to allow the two males to fall into the behavioral chamber, and the aggressive behavior was recorded for 2 hours."

      (6) The strain of Canton-S (CS) flies used should be mentioned as different strains of CS can have varying levels of aggression, for instance, CS from Martin Heisenberg lab shows very high levels of aggressive lunges. Are the CS lines used in this study isogenized? Are various genetic lines outcrossed into this CS background? In the methods, it is not clear how the white gene levels were controlled for various aggression experiments as it is known to affect aggression (Hoyer et al. 2008).

      We used wtcs flies from the Baker lab at Janelia Research Campus and are not sure where they originated. We appreciate your concern about the use of wild-type strains, as they may show different fighting levels, but this study mainly used wild-type strains to compare behavioral differences between SH and GH males. All flies tested in this study are in a w+ background, based on w+ balancer flies, but are not backcrossed. We have listed the detailed genotypes of all tested flies in Table S1 of the revised manuscript.

      (7) How important it is to use a fixed female for the assay to induce tussling? Do these females remain active throughout the assay period of 2.5 hours? Is it possible to use decapitated virgin females for the assay? How will that affect male behaviors?

      We used a fixed female to restrict it to the center of the food. These females remain active throughout the assay, as their legs and abdomens can still move. This design is intended to combine the attractive effects of both the female and the food. One can also use decapitated females, but in that case males can push the decapitated female anywhere in the behavioral chamber. The rationale for using fixed females has now been added to the Materials and Methods section of the revised manuscript.

      (8) Raster plots in Figure 2 suggest a complete lack of tussling in SH males in the first 60 minutes of the encounter, which is surprising given the longer duration of the assay as compared to earlier studies (Nielsen et al. 2004, Simon & Heberlein, 2020 and others), which are able to pick up tussling in a shorter duration of recording time. Also, the duration for tussling is much longer in this study as compared to shorter tussles shown by earlier studies. Is this due to differences in the paradigm used, strain of flies, or some other factor? While the bar plots in Figure 2D show some tussling in SH males, maybe an analysis of raster plots of various videos can be provided in the main text and included as a supplementary figure to address this.

      Indeed, tussling is very low in SH males in our paradigm, which may be due to different genetic backgrounds and behavioral assays. Since tussling behavior is a rare fighting form, it is not surprising to see variation between studies from different labs. Nevertheless, this study compared tussling behaviors in SH and GH males, and our finding that GH males show much more tussling behaviors is convincing. The longer duration of tussling in our paradigm may also be due to the modified behavioral paradigm, which also supports that tussling is a high-level fighting form.

      (9) Neuronal activation experiments suggesting the involvement of pC1SS2 neurons are quite interesting. Further, the role of P1a neurons was demonstrated to be involved in increasing tussling in thermogenetic activation in the presence of light (Figure 4, Supplement 1), which is quite important as the role of vision in optogenetic activation experiments, which required to be carried out in dark, is often not mentioned. However, in the discussion (lines 309-310) it is mentioned that PC1SS2 neurons are 'necessary and sufficient' for inducing tussling. Given that P1a neurons were shown to be involved in promoting tussling, this statement should be toned down.

      Thank you for this important comment. We have now toned down the statement on pC1SS2 function.

      (10) Are Or47b neurons connected to pC1SS2 or P1a neurons?

      We conducted pathway analysis in the FlyWire electron microscopy database to investigate the connection between Or47b neurons and pC1 neurons. The results indicate that at least three levels of interneurons are required to establish a connection from Or47b neurons to pC1 neurons. Although the FlyWire database currently only contains neuronal data from female brains, it provides a reference for circuit connectivity in males.
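      As an illustration of what counting "levels of interneurons" means, the sketch below runs a breadth-first search for the shortest directed path between two neurons. It is only a schematic: the graph is a hypothetical toy example, not FlyWire data, and the neuron names (`PN_a`, `LN_b`, `IN_c`, `IN_d`) and the function `shortest_path` are assumptions that do not reflect the actual query or tools used in the analysis.

```python
from collections import deque

# Hypothetical toy connectivity (NOT FlyWire data): neuron -> downstream partners.
toy_graph = {
    "Or47b_ORN": ["PN_a"],
    "PN_a": ["LN_b", "IN_c"],
    "LN_b": ["IN_c"],
    "IN_c": ["IN_d"],
    "IN_d": ["pC1"],
    "pC1": [],
}


def shortest_path(graph, source, target):
    """Breadth-first search; returns the shortest source->target path or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


path = shortest_path(toy_graph, "Or47b_ORN", "pC1")
n_interneurons = len(path) - 2  # exclude the source and target neurons
```

      In this toy graph the shortest route passes through three intermediate neurons. A real connectome query would also need to weigh synapse counts and neurotransmitter identity, which this sketch ignores.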

      (11) The paradigm for territory control is quite interesting and subsequent mating advantage experiments are an important addition to the eventual outcome of the aggressive strategy deployed by the males as per their prior housing conditions. It would be important to comment on the 'fitness outcome' of these encounters. For instance, is there any fitness advantage of using tussling by GH males as compared to lunging by SH males? The authors may consider analyzing the number of eggs laid and eclosed progenies from these encounters to address this.

      Thank you for this suggestion. We agree with you and the other reviewers that increased tussling behavior correlates with better mating competition, but it is difficult for us to establish a direct link between them. Thus, in the revised manuscript, we prefer to tone down this statement rather than expand on this part.

      Reviewer #2 (Public review):

      Summary

      Gao et al. investigated the change of aggression strategies by the social experience and its biological significance by using Drosophila. Two modes of inter-male aggression in Drosophila are known: lunging, a high-frequency but weak mode, and tussling, a low-frequency but more vigorous mode. Previous studies have mainly focused on the lunging. In this paper, the authors developed a new behavioral experiment system for observing tussling behavior and found that tussling is enhanced by group rearing while lunging is suppressed. They then searched for neurons involved in the generation of tussling. Although olfactory receptors named Or67d and Or65a have previously been reported to function in the control of lunging, the authors found that these neurons do not function in the execution of tussling, and another olfactory receptor, Or47b, is required for tussling, as shown by the inhibition of neuronal activity and the gene knockdown experiments. Further optogenetic experiments identified a small number of central neurons pC1[SS2] that induce the tussling specifically. In order to further explore the ecological significance of the aggression mode change in group rearing, a new behavioral experiment was performed to examine territorial control and mating competition. Finally, the authors found that differences in the social experience (group vs. solitary rearing) are important in these biologically significant competitions. These results add a new perspective to the study of aggressive behavior in Drosophila. Furthermore, this study proposes an interesting general model in which the social experience-modified behavioral changes play a role in reproductive success.

      Strengths

      A behavioral experiment system that allows stable observation of tussling, which could not be easily analyzed due to its low frequency, would be very useful. The experimental setup itself is relatively simple, just the addition of a female to the platform, so it should be applicable to future research. The finding about the relationship between the social experience and the aggression mode change is quite novel. Although the intensity of aggression changes with the social experience was already reported in several papers (Liu et al., 2011, etc), the fact that the behavioral mode itself changes significantly has rarely been addressed and is extremely interesting. The identification of sensory and central neurons required for the tussling makes appropriate use of the genetic tools and the results are clear. A major strength of the neurobiology in this study is the finding that another group of neurons (Or47b-expressing olfactory neurons and pC1[SS2] neurons), distinct from the group of neurons previously thought to be involved in low-intensity aggression (i.e. lunging), function in the tussling behavior. Further investigation of the detailed circuit analysis is expected to elucidate the neural substrate of the conflict between the two aggression modes.

      Thank you for the acknowledgment of the novelty and significance of the study, and your suggestions for improving the manuscript.

      Weaknesses

      The experimental systems examining the territory control and the reproductive competition in Figure 5 are novel and have advantages in exploring their biological significance. However, at this stage, the authors' claim is weak since they only show the effects of age and social experience on territorial and mating behaviors, but do not experimentally demonstrate the influence of aggression mode change itself. In the Abstract, the authors state that these findings reveal how social experience shapes fighting strategies to optimize reproductive success. This is the most important perspective of the present study, and it would be necessary to show directly that the change of aggression mode by social experience contributes to reproductive success.

      We agree that our data did not directly show that it is the change of aggression mode that results in territory and reproductive advantages in GH males. To address this concern, we have toned down the statement throughout the manuscript. For example, we made textual changes in the abstract as follows:

      Moreover, shifting from lunging to tussling in socially enriched males is accompanied with better territory control and mating success, mitigating the disadvantages associated with aging. Our findings identify distinct sensory and central neurons for two fighting forms and suggest how social experience shapes fighting strategies to optimize reproductive success.

      In addition, a detailed description of the tussling is lacking. For example, the authors state that the tussling is less frequent but more vigorous than lunging, but while experimental data are presented on the frequency, the intensity seems to be subjective. The intensity is certainly clear from the supplementary video, but it would be necessary to evaluate the intensity itself using some index. Another problem is that there is no clear explanation of how to determine the tussling. A detailed method is required for the reproducibility of the experiment.

      Thank you for this important suggestion. We have now analyzed the duration of tussling and lunging, and found that a lunging event is often very short (less than 0.2 s), while a tussling event may last from seconds to minutes. These new data are added as Figure 2G. In addition, we also provided more detailed methods regarding tussling behavior.

      Reviewer #3 (Public review):

      In this manuscript, Gao et al. presented a series of intriguing data that collectively suggest that tussling, a form of high-intensity fighting among male fruit flies (Drosophila melanogaster) has a unique function and is controlled by a dedicated neural circuit. Based on the results of behavioral assays, they argue that increased tussling among socially experienced males promotes access to resources. They also concluded that tussling is controlled by a class of olfactory sensory neurons and sexually dimorphic central neurons that are distinct from pathways known to control lunges, a common male-type attack behavior.

      A major strength of this work is that it is the first attempt to characterize the behavioral function and neural circuit associated with Drosophila tussling. Many animal species use both low-intensity and high-intensity tactics to resolve conflicts. High-intensity tactics are mostly reserved for escalated fights, which are relatively rare. Because of this, tussling in the flies, like high-intensity fights in other animal species, has not been systematically investigated. Previous studies on fly aggressive behavior have often used socially isolated, relatively young flies within a short observation duration. Their discovery that 1) older (14-days-old) flies tend to tussle more often than younger (2-days-old) flies, 2) group-reared flies tend to tussle more often than socially isolated flies, and 3) flies tend to tussle at a later stage (mostly ~15 minutes after the onset of fighting), are the result of their creativity to look outside of conventional experimental settings. These new findings are keys for quantitatively characterizing this interesting yet under-studied behavior.

      Precisely because their initial approach was creative, it is regrettable that the authors missed the opportunity to effectively integrate preceding studies in their rationale or conclusions, which sometimes led to premature claims. Also, while each experiment contains an intriguing finding, these are poorly related to each other. This obscures the central conclusion of this work. The perceived weaknesses are discussed in detail below.

      Thank you for the precise summary of the key findings and novelty of the study, and your insightful suggestions.

      Most importantly, the authors' definition of "tussling" is unclear because they did not explain how they quantified lunges and tussling, even though the central focus of the manuscript is behavior. Supplemental movies S1 and S2 appear to include "tussling" bouts in which 2 flies lunge at each other in rapid succession, and supplemental movie S3 appears to include bouts of "holding", in which one fly holds the opponent's wings and shakes vigorously. These cases raise a concern that their behavior classification is arbitrary. Specifically, lunges and tussling should be objectively distinguished because one of their conclusions is that these two actions are controlled by separate neural circuits. It is impossible to evaluate the credibility of their behavioral data without clearly describing a criterion of each behavior.

      Thank you for this very important suggestion. We have now provided a more detailed description of the two fighting forms in the Materials and Methods section. See below:

      Lunging is characterized by a male raising its forelegs and quickly striking the opponent, and each lunge typically lasts less than 0.2 seconds through detailed analysis. Tussling is characterized by both males using their forelegs and bodies to tumble over each other, and this behavior may last from seconds to minutes. Tussling is often mixed with boxing, in which both flies rear up and strike the opponent with forelegs. Since boxing is often transient and difficult to distinguish from tussling, we referred to the mixed boxing and tussling behavior simply as tussling. As we manually analyze tussling for 2 hours for each pair of males, it is possible that we may miss some tussling events, especially those quick ones.

      It is also confusing that the authors completely skipped the characterization of the tussling-controlling neurons they claimed to have identified. These neurons (a subset of so-called pC1 neurons labeled by previously described split-GAL4 line pC1SS2) are central to this manuscript, but the only information the authors have provided is its gross morphology in a low-resolution image (Figure 4D, E) and a statement that "only 3 pairs of pC1SS2 neurons whose function is both necessary and sufficient for inducing tussling in males" (lines 310-311). The evidence that supports this claim isn't provided. The expression pattern of pC1SS2 neurons in males has been only briefly described in reference 46. It is possible that these neurons overlap with previously characterized dsx+ and/or fru+ neurons that are important for male aggressions (measured by lunges), such as in Koganezawa et al., Curr. Biol. 2016 and Chiu et al., Cell 2020. This adds to the concern that lunge and tussling are not as clearly separated as the authors claim.

      Thank you very much for this important question. Indeed, there are many experiments that could be done to better understand the function of pC1SS2 neurons, and we only provide an initial characterization of them due to the limited scope of this study. My lab has been focused on studying P1/pC1 function in both male and female flies and will continue to do so.

      To partially address your concern, we made the following revisions:

      (1) We provided higher-resolution images of P1a and pC1SS2 (Figure 4C-4E). While their cell bodies are very close, they project to distinct brain regions, in addition to some shared ones.

      (2) By staining these neurons with GFP and co-staining with anti-FruM or anti-DsxM antibodies, we showed that P1a neurons are partially FruM-positive and partially DsxM-positive, while pC1SS2 neurons are DsxM-positive and FruM-negative (Figure 5A-5D).

      (3) As pC1SS2 neurons are DsxM-positive and FruM-negative, we also examined how DsxM regulates the development of these neurons. We found that knocking down DsxM expression in pC1SS2 neurons using RNAi significantly affected pC1 development with regard to both cell number (Figure 5G) and projections (Figure 5H).

      (4) We further found that DsxM in pC1SS2 neurons is crucial for executing their tussling-promoting function: optogenetic activation of these neurons with DsxM knockdown failed to induce tussling behavior in the initial activation period, and induced a much lower level of tussling in the second activation period compared to control males (Figure 5I-5K).

      (5) While it is very difficult to identify the upstream and downstream neurons of P1a and pC1SS2 neurons, we made an initial step by utilizing trans-tango and retro-Tango to visualize potential downstream and upstream neurons of P1a and pC1SS2 (Figure 4-figure supplement 2), which certainly needs future investigation.  

      While their characterizations of tussling behaviors in wild-type males (Figures 1 and 2) are intriguing, the remaining data have little link with each other, making it difficult to understand what their main conclusion is. Figure 3 suggests that one class of olfactory sensory neurons (OSN) that express Or47b is necessary for tussling behavior. While the authors acknowledged that Or47b-expressing OSNs promote male courtship toward females presumably by detecting cuticular compounds, they provided little discussion on how a class of OSN can promote two different types of innate behavior. No evidence of a functional or circuitry relationship between the Or47b pathway and the pC1SS2 neurons was provided. It is unclear how these two components are relevant to each other.

      It has been previously found that Or47b-expressing ORNs respond to fly pheromones common to both sexes, and that group housing enhances their sensitivity. Regarding how Or47b ORNs promote two different types of innate behaviors, a simple explanation is that they act on multiple second-order and further downstream neurons to regulate both courtship and aggression, not to mention that the neural circuits for courtship and aggression are partially shared. We did not include this in the discussion as we would like to focus on aggression modes, and on how different ORNs (Or47b and Or67d) mediate distinct aggression modes.

      Regarding the relationship between Or47b ORNs and pC1SS2 neurons, or in general between ORNs and P1/pC1, it is interesting and important to explore, but probably in a separate study. We tried to conduct pathway connection analyses from Or47b to pC1 using the FlyWire database, and found that Or47b neurons can act on pC1 neurons via three layers of interneurons. Although the FlyWire database currently only contains neuronal data from female brains, it can provide a certain degree of reference. We hope the editor and reviewers will agree with us that identifying the intermediate neurons involved in this connection is beyond the scope of this study.

      Lastly, the rationale of the experiment in Figure 5 and the interpretation of the results is confusing. The authors attributed a higher mating success rate of older, socially experienced males over younger, socially isolated males to their tendency to tussle, but tussling cannot happen when one of the two flies is not engaged. If, for instance, a socially isolated 14-day-old male does not engage in tussling as indicated in Figure 2, how can they tussle with a group-housed 14-day-old male? Because aggressive interactions in Figure 5 were not quantified, it is impossible to conclude that tussling plays a role in copulation advantage among pairs as authors argue (lines 282-288).

      Indeed, we do not have direct evidence to show that it is tussling that makes socially experienced males dominate over socially isolated males. To address your concern, we have made the following revisions:

      (1) We toned down the statements about the relationship between fighting strategies and reproductive success throughout the manuscript. For example, in the abstract: "Moreover, shifting from lunging to tussling in socially enriched males is accompanied with better territory control and mating success."

      (2) Regarding whether an SH male can engage in tussling with a GH male, we found that while two SH males rarely perform tussling, paired SH and GH males displayed levels of tussling similar to those of two GH males, although tussling duration in paired SH and GH males is significantly lower than that in two GH males (Figure 6-figure supplement 2).

      (3) To support the potential role of tussling in territory control and mating competition, we performed additional experiments in which we silenced Or47b or pC1SS2 neurons, which almost abolished tussling, and paired these males with control males. We found that males with Or47b or pC1SS2 neurons silenced cannot compete with control males, further suggesting the involvement of tussling in territory control and mating competition.

      Despite these weaknesses, it is important to acknowledge the authors' courage to initiate an investigation into a less characterized, high-intensity fighting behavior. Tussling requires the simultaneous engagement of two flies. Even if there is confusion over the distinction between lunges and tussling, the authors' conclusion that socially experienced flies and socially isolated flies employ distinct fighting strategies is convincing. Questions that require more rigorous studies are 1) whether such differences are encoded by separate circuits, and 2) whether the different fighting strategies are causally responsible for gaining ethologically relevant resources among socially experienced flies. Enhanced transparency of behavioral data will help readers understand the impact of this study. Lastly, the manuscript often mentions previous works and results without citing relevant references. For readers to grasp the context of this work, it is important to provide information about methods, reagents, and other key resources.

      Thank you very much for this comment; we largely agree.

      (1) Our results suggest the involvement of distinct sensory neurons and central neurons for lunging and tussling, but do not exclude the possibility that they may also utilize shared neurons. For example, activation of P1a neurons promotes both lunging and tussling in the presence of light.

      (2) We have now toned down the statements about the relationship between fighting strategies and reproductive success throughout the manuscript.

      (3) We provided more detailed methods and fly genotypes to improve the transparency of the manuscript.

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1 Supplement 1 shows that increased aging has a linear and inverse relationship with the number of lunges, this is in contrast to a previous study from Dierick lab (Chowdhury, 2021), where using Divider assays they showed that aggressive lunges increased up to day 10 and subsequently decreased in 30-day old flies. Given that this study did not use 14-day-old flies, it might be useful to comment on this.

      Thank you for this comment. Indeed, Chowdhury et al. suggested a decline in lunging after 10 days, which does not contradict our finding that lunging in 14d-old males is lower than in 7d-old males. Ideally, a time-series experiment would reveal the detailed relationship between age and aggression (lunging or tussling) levels, but given our initial finding that 14d-old males showed stable tussling behavior, we prefer to use this time point for the rest of this study.

      (2) For Figure 3, do various manipulations also affect the duration of tussling and boxing besides frequency and latency?

      Thank you for this comment. We analyzed only latency and frequency, not duration, because the analysis was performed manually rather than automatically, covering about 2 hours for every fly pair, which is very labor-intensive. We hope you agree that these two parameters (frequency and latency) are representative for assaying tussling.

      (3) For Figure 3 A-F, the housing status of the males is not clearly mentioned either in the main text or the figure. What is the status of the tussling and lunging status when this housing condition is reversed when Or47b neurons are silenced, or the gene is knocked down? Do these manipulations overcome the effect of housing conditions similar to what is seen in NaChBac-mediated activation experiments?

      Figure 3A-F used group-housed males and we have now added such information in the figure legends as well as Table S1.

      We appreciate your suggestion on using different housing conditions. As silencing Or47b neurons or knocking down Or47b reduced tussling, it is reasonable to use GH males (as we did in Figure 3A-F) that performed stable tussling behavior, but not SH males that rarely tussle.

      (4) The connections between Or47b neurons and pC1SS2 or P1a neurons can be addressed by available connectomic datasets or TransTango/GRASP approaches.

      Thank you for this important suggestion. We used the FlyWire electron microscopy database to analyze the pathway connections between these two types of neurons. The results indicate that at least three layers of interneurons lie between Or47b and pC1 neurons. Although the FlyWire database currently contains neuronal data only from female brains, it can provide a certain degree of reference for males.

      The lack of a direct synaptic connection also suggests that it is challenging to resolve the connection between these two neuronal types using methods like trans-Tango/GRASP. To partially address this question, we used trans-Tango and retro-Tango to visualize potential downstream and upstream neurons of P1a and pC1SS2 (Figure 4-figure supplement 2). Future investigations are certainly needed to clarify the functional connections between Or47b/Or67d and P1a/pC1SS2 neurons.

      (5) Figure 5, 'Winning index' and 'Copulation advance index' while described in Material and Methods, should be referred to in the main text.

      We now describe these two indices briefly in the main text, and in more detail in the Discussion section.

      (6) Figure 6 shows comparisons for territorial control and mating outcomes where four different housing and aging conditions are organized in a hierarchical sequence. It is not clear from the data in Figure 5, how this conclusion was arrived at. A supplementary table with various outcomes with statistical analysis would help with this.

      We have now added a supplementary table (Table S2) with the various outcomes and their statistical analysis.

      Minor Comments

      (1) Line 26 says that the courtship levels in SH and GH males are not different; however, unilateral wing extension is higher in SH males as compared to GH males (Pan & Baker, 2014; Inagaki et al., 2014). It was also shown that courtship attempts are higher in D. paulistorum (Kim & Ehrman, 1998). It would be better to clarify this statement.

      Indeed, it is found in some cases that SH males court more vigorously than GH males. We have added more references on this matter in the introduction.

      (2) Figure 4, correct 'Tussing' to 'Tussling' or 'Box, Tussling' as appropriate.

      Corrected.

      (3) Duistermars, 2018 should be cited while discussing the role of vision in aggression (Figure 4). [A Brain Module for Scalable Control of Complex, Multi-motor Threat Displays]

      We have now cited this reference and added more discussion in the revised manuscript.

      (4) Reviews on Drosophila aggression and social isolation can be cited in the introduction/discussion to incorporate recent literature e.g., Palavicino-Maggio, 2022 [The Neuromodulatory Basis of Aggression Lessons From the Humble Fruit Fly]; Yadav et al., 2024[Lessons from lonely flies Molecular and neuronal mechanisms underlying social isolation], etc.

      We have now cited these references in both the introduction and discussion sections.

      (5) The concentration of apple juice agar should be mentioned in the methods.

      We added this and other necessary information for materials in the Materials and Methods section of the study.

      (6) Source of the LifeSongX software and, if available, a Github link would be helpful to include in the materials and methods section.

      We have now provided the source of the LifesongY software (website: https://sourceforge.net/projects/lifesongy/), which is a Windows version of LifesongX (Bernstein et al., 1992).

      Reviewer #2 (Recommendations for the authors):

      (1) Major comment 1

      As pointed out in the public review, the weakness of this study is that the relationship between the aggression strategy and reproductive success is an inference that is not based on experimental facts; I understand that the frequency of tussling is not so high, but at least tussling-like behavior can be observed in the territory control experiment shown in Video 3. Wouldn't it be possible to re-analyse data and examine the correlation between aggressive behavior and territory control? Even if the analysis of tussling itself in this setup is difficult, for example, additional experiments using Or47b knock-out fly or pC1[SS2]-inactivated fly could provide stronger support.

      Indeed, we can only draw a correlation between the type of aggressive behavior and territory control. We have now toned down this statement throughout the manuscript. For example, in the abstract, we changed our conclusions as follows:

      Moreover, shifting from lunging to tussling in socially enriched males is accompanied with better territory control and mating success. Our findings identify distinct sensory and central neurons for two fighting forms and suggest how social experience shapes fighting strategies to optimize reproductive success.

      To further address the concern, we performed additional experiments in which we silenced Or47b or pC1SS2 neurons, which almost abolished tussling, and paired these males with control males. Males with silenced Or47b or pC1SS2 neurons could not compete with control males (Figure 6-figure supplement 3), further suggesting the involvement of tussling in territory control and mating competition.

      In relation to the above, some of the text in the Abstract should be changed. Line 28: These findings "reveal" how social experience shapes fighting strategies to optimise reproductive success.

      "suggest" is more accurate at this stage.

      Changed as suggested.

      (2) Major comment 2

      The tussling is the central subject of this paper. However, neither the main text nor Materials and Methods section provides a clear explanation of how this aggression mode was detected. Did the authors determine this behavior manually? Or was it automatically detected by some kind of image analysis? In either case, the criteria and method for detecting the tussling should be clearly described.

      The behavioral data analysis in this study was performed manually. We have now provided a more detailed description of the two fighting forms in the Materials and Methods section, reproduced below:

      Lunging is characterized by a male raising its forelegs and quickly striking the opponent, and each lunge typically lasts less than 0.2 seconds through detailed analysis. Tussling is characterized by both males using their forelegs and bodies to tumble over each other, and this behavior may last from seconds to minutes. Tussling is often mixed with boxing, in which both flies rear up and strike the opponent with forelegs. Since boxing is often transient and difficult to distinguish from tussling, we referred to the mixed boxing and tussling behavior simply as tussling. As we manually analyze tussling for 2 hours for each pair of males, it is possible that we may miss some tussling events, especially those quick ones.

      For the experimental groups where tussling cannot be observed, the latency is regarded as 120 min, but this is a value depending on the observation time. While it is reasonable to use the latency to evaluate the behavior such as the lunging that is observed at relatively early times, care should be taken when using it to evaluate the tussling. Since similar trends to those obtained for the latency are observed for Number of tussles and % of males performing tussling, it may be better to focus on these two indices.

      We initially intended to provide all three statistical metrics. However, we found that using the "% of males performing tussling" would require a significantly larger sample size for subsequent statistical analysis (using chi-square tests), greatly increasing the workload. At the same time, we believe that the trend observed with "% of males performing tussling" is consistent with the other two indices, and the percentage information can also be derived from the individual sample scatter data of the other two metrics. Therefore, we opted to use "latency" and "numbers" as the statistical metrics, despite the caveat you mentioned.
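
      The two conventions discussed in this exchange (assigning a latency equal to the observation window when no tussling is seen, and recovering the percentage of pairs that tussled from the per-pair counts) can be illustrated with a minimal sketch. The function names and the example values are hypothetical and are not taken from the authors' analysis; only the 120-minute window comes from the text.

```python
from typing import List, Optional

OBSERVATION_WINDOW_MIN = 120.0  # observation time stated in the review exchange


def cap_latency(first_tussle_min: Optional[float],
                window_min: float = OBSERVATION_WINDOW_MIN) -> float:
    """Return the tussling latency, using the window length when no tussle occurred.

    `first_tussle_min` is None for a pair that never tussled (hypothetical encoding).
    """
    if first_tussle_min is None:
        return window_min
    return min(first_tussle_min, window_min)


def percent_performing(tussle_counts: List[int]) -> float:
    """Percentage of pairs with at least one tussle, derived from per-pair counts."""
    if not tussle_counts:
        raise ValueError("tussle_counts must not be empty")
    performers = sum(1 for n in tussle_counts if n > 0)
    return 100.0 * performers / len(tussle_counts)


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    counts = [0, 2, 0, 5]
    first_times = [None, 35.5, None, 12.0]
    print([cap_latency(t) for t in first_times])
    print(percent_performing(counts))
```

      Because the capped latency depends on the chosen window, a group in which no pair tussles always reports the window length itself, which is the caveat the reviewer raises.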

      The authors repeatedly mention that tussling is less frequent but more vigorous. The low frequency can be understood from the data in Fig. 1 and Fig. 2, but there are no measured data on the intensity. As the authors mention in line 125, each tussling event appears to be sustained for a relatively long period, as can be seen from the ethogram in Fig. 2. For example, it would be possible to evaluate the intensity by measuring the duration of the tussling event.

      Thank you for your valuable suggestion. We have now analyzed the duration of tussling and lunging and found that a lunging event is often very short (less than 0.2 s), while a tussling event may last from seconds to minutes, further supporting their relative intensities. These new data are included as Figure 2G.

      (3) Minor comments

      a) Line 117 How many flies were placed in one vial for group-rearing (GH)? Were males and females grouped together? Please specify in the Materials and Methods section.

      We have added this information in the Materials and Methods section. In brief, 30-40 virgin males were collected after eclosion and group-housed in each food vial.

      b) Line 174 The trans-Tango is basically a postsynaptic cell labeling technique. It is unlikely that the labeling intensity changes depending on neuronal activity. Do the authors want to say in this text the high activity of Or47b-expressing neurons under GH conditions? Or are they trying to show that the expression level of the Or47b gene, which is supposedly monitored by the expression of GAL4, is increased by GH conditions? The authors should clarify which is the case.

      Although the primary function of the trans-Tango technique is to label downstream neurons, the original literature indicates that the signal strength in downstream neurons depends on the use of upstream neurons, as evidenced by age-dependent trans-Tango signals. Therefore, the trans-Tango technique can indirectly reflect the usage of upstream neurons. Our finding that GH males showed broader Or47b trans-Tango signals than SH males indirectly suggests that group-housing experience acts on Or47b neurons. We made textual changes to clarify this.

      c) Line 178 Which fly line labels the mushroom body; R19B03-GAL4?

      Yes. We have now provided the detailed genotypes for all tested flies in Table S1.

      d) Line 184 It was reported in Koganezawa et al., 2016 that some dsx-expressing pC1 neurons are involved in aggressive behavior. The authors should also refer to this paper as they include tussling in the observed aggressive behavior.

      Thank you for this comment, and we now cited this reference in the revised manuscript.

      e) Line 339 I think you misspelled fruM RNAi.

      Thank you for pointing this out. fruMi refers to microRNAi targeting fruM, and we have now clearly stated this information in the main text.

      f) Line 681 Is tussling time (%) the total duration of tussling occurrences during the observation time? Or is it the percentage of individuals observed tussling during the observation time? This needs to be clarified.

      It is the former. We have now clearly stated this definition in the Materials and Methods section.

      Reviewer #3 (Recommendations for the authors):

      For authors to support their conclusion that enhanced tussling among socially experienced flies allows them to better retain resources, it is necessary to quantify aggressive behaviors (mainly tussling and lunging) in Figure 5.

      We agree that we can only draw a correlation between enhanced tussling behavior and mating competition. We have now toned down this statement throughout the manuscript. For example, in the abstract, we changed our conclusions as follows: "Moreover, shifting from lunging to tussling in socially enriched males is accompanied with better territory control and mating success. Our findings identify distinct sensory and central neurons for two fighting forms and suggest how social experience shapes fighting strategies to optimize reproductive success."

      To further address the concern, we performed additional experiments in which we silenced Or47b or pC1SS2 neurons, which almost abolished tussling, and paired these males with control males. Males with silenced Or47b or pC1SS2 neurons could not compete with control males (Figure 6-figure supplement 3), further suggesting the involvement of tussling in territory control and mating competition.

      In contrast to the authors' data in Figure 4, movies in ref 36 clearly show instances of 2 flies exchanging lunges after the optogenetic activation of P1a neurons, like the examples shown in supplementary movies S1-S3. It is a clear discrepancy that requires discussion (and raises a concern about the lack of transparency about behavioral quantification).

      In our study, optogenetic activation of P1a neurons failed to induce obvious tussling behavior, and temperature-dependent activation of P1a neurons induced tussling only in the presence of light. These data differ from those of Hoopfer et al. (2015) but are generally consistent with a new study (Sten et al., Cell, 2025), in which pC1SS2 neurons but not P1a neurons promote aggression. This discrepancy is now discussed in the revised manuscript.

      The authors often fail to cite relevant references while discussing previous results, which compromises the scholarship of the manuscript. Examples include (but are not limited to)

      (1) Line 85-86: Simon and Heberlein, J. Exp. Biol. 223: jeb232439 (2020) suggested that tussling is an important factor for flies to establish a dominance hierarchy.

      Reference added.

      (2) Line 142-143 Cuticular compounds such as palmitoleic acid are characterized to be the ligands of Or47b by ref #18.

      Reference added.

      (3) Line 185-187 pC1SS1 and pC1SS2 are first characterized by ref #46. Expression data of this paper also implies that pC1SS1 and pC1SS2 label different neurons in the male brain.

      We have now added this reference at the appropriate place in the revised manuscript. In addition, we have clarified that these two drivers exhibit sexually dimorphic expression patterns in the brain.

      (4) Line 196-199 Cite ref #36, which describes the behavior induced by the optogenetic activation of P1a neurons.

      Reference added.

      (5) Line 233-235: The authors' observation that control males do not form a clear dominance directly contradicts previous observations by others (Nilsen et al., PNAS 101:12342 (2002); Yurkovic et al., PNAS 103:17519 (2006); also see Trannoy et al., PNAS 113:4818 (2016) and Simon and Heberlein above). The authors must at least discuss why their results are different.

      There is a misunderstanding here. We clearly state that there is a ‘winner takes all’ phenomenon. However, for wild-type males of the same age and housing condition, we calculated the winning index as (num. of wins by unmarked males – num. of wins by marked males)/10 encounters * 100%, which is roughly zero due to the randomness of marking.
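
      As an illustration of why the index is close to zero for matched males, the formula quoted above can be written as a short function. This is a minimal sketch under stated assumptions: the function name, the argument names, and the input checks are hypothetical, and only the formula itself (difference in wins over 10 encounters, expressed as a percentage) comes from the response.

```python
def winning_index(unmarked_wins: int, marked_wins: int, encounters: int = 10) -> float:
    """Winning index in percent: (unmarked wins - marked wins) / encounters * 100.

    The default of 10 encounters follows the formula quoted in the response;
    the validation below is an added assumption for illustration.
    """
    if encounters <= 0:
        raise ValueError("encounters must be positive")
    if unmarked_wins < 0 or marked_wins < 0:
        raise ValueError("win counts must be non-negative")
    if unmarked_wins + marked_wins > encounters:
        raise ValueError("total wins cannot exceed the number of encounters")
    # Multiply before dividing so that whole-number inputs give exact results.
    return 100.0 * (unmarked_wins - marked_wins) / encounters


if __name__ == "__main__":
    # Illustrative inputs only, not data from the study.
    print(winning_index(5, 5))   # wins split evenly between marked and unmarked males
    print(winning_index(10, 0))  # unmarked male wins every encounter
```

      Within a single pair, a "winner takes all" outcome yields an index near +100% or -100%. Because marking is assigned at random, those values should average out to roughly zero across pairs of the same age and housing condition, which is the point the response makes.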

      (6) Line 251-254 The authors' observation that aged males are less competitive than younger males contradicts the conclusion in ref #18. Discussion is required.

      We have now added a discussion on this matter. In brief, Lin et al. showed that 7d-old males are more competitive than 2d-old males, probably because of differences in sexual maturity rather than age per se, whereas our study used males up to 21 days old.

      (7) Line 274-275 It is unclear which "previous studies" "have found that social isolation generally enhances aggression but decreases mating competition in animal models". Cite relevant references.

      Reference added.

      (8) Line 309-310 The evidence supporting the statement that "there are only three pairs of pC1SS2 neurons". If there is a reference, cite it. If it is based on the authors' observation, data is required.

      We have now provided additional data on the number of pC1SS2 neurons in Figure 5G of the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      The manuscript by Feng et al. reported that the Endothelin B receptor (ETBR) expressed by the satellite glial cells (SGCs) in the dorsal root ganglions (DRG) acted to inhibit sensory axon regeneration in both adult and aged mice. Thus, pharmacological inhibition of ETBR with specific inhibitors resulted in enhanced sensory axon regeneration in vitro and in vivo. In addition, sensory axon regeneration significantly reduces in aged mice and inhibition of ETBR could restore such defect in aged mice. Moreover, the study provided some evidence that the reduced level of gap junction protein connexin 43 might act downstream of ETBR to suppress axon regeneration in aged mice. Overall, the study revealed an interesting SGC-derived signal in the DRG microenvironment to regulate sensory axon regeneration. It provided additional evidence that non-neuronal cell types in the microenvironment function to regulate axon regeneration via cell-cell interaction. 

      However, the molecular mechanisms by which ETBR regulates axon regeneration are unclear, and the manuscript's structure is not well organized, especially in the last section. Some discussion and explanation about the data interpretation are needed to improve the manuscript. 

      We thank the reviewer for the positive comments. We agree that the mechanisms by which ETBR signaling functions as a brake on axon growth and regeneration remain to be elucidated. We believe that unraveling the detailed molecular pathways downstream of ETBR signaling in SGCs that promote axon regeneration is beyond the scope of this manuscript. Answering these questions would first require cell specific KO of ETBR and Cx43 to confirm that this pathway is operating in SGCs to control axon regeneration. We would also need to identify how SGCs communicate with neurons to regulate axon regeneration, which is a large area of ongoing research that remains poorly understood. Our data showing that pharmacological inhibition of ETBR with specific FDA-approved inhibitors enhances sensory axon regeneration provide not only new evidence for non-neuronal mechanisms in nerve repair, but also a new potential clinical avenue for therapeutic intervention.

      As suggested by the reviewer, we have extensively revised the organization of the manuscript, especially the last section of results. We have performed additional snRNAseq experiments to establish the impact of aging in DRG. We have also performed additional experiments to determine if blocking ETBR improves target tissue reinnervation. Following the reviewer’s suggestion, we have also expanded the Discussion section to discuss alternative mechanisms and offer additional interpretation of our data. Below we describe how we address each point in detail.

      (1) The result showed that the level of ETBR did not change after the peripheral nerve injury. Does this mean that its endogenous function is to limit spontaneous sensory axon regeneration? In other words, the results suggest that SGCs expressing ETBR or vascular endothelial cells expressing its ligand ET-1 act to suppress sensory axon regeneration. Some explanation or discussion about this is necessary. Moreover, does the protein level of ETBR or its ligand change during aging?  

      We thank the reviewer for this point. Our results indeed indicate that one endogenous function of ETBR is to limit the extent of sensory axon regeneration. This may be a part of a mechanism to limit spontaneous sensory axon growth or plasticity and maladaptive neural rewiring after nerve injury. While the increased growth capacity of damaged peripheral axons can lead to reconnection with their targets and functional recovery, the increased growth capacity can also lead to axonal sprouting of the central axon terminals of injured neurons in the spinal cord, and to pain (see for example Costigan et al 2010, PMID: 19400724).  In the context of aging that we describe here, this protective mechanism may hinder beneficial recovery. Other mechanisms that slow axon regeneration have been reported, and include, for example, axonally synthesized proteins, which typically support nerve regeneration through retrograde signaling and local growth mechanisms. RNA binding proteins (RBP) are needed for this process. One such RBP, the RNA binding protein KHSRP is locally translated following nerve injury. Rather than promoting axon regeneration, KHSRP promotes decay of other axonal mRNAs and slows axon regeneration.  Another example includes the Rho signaling pathway, which was shown to function as an inhibitory mechanism that slows the growth of spiral ganglion neurites in culture. We have now included these examples in the Discussion section.

      To address the reviewer’s second question, we have checked protein levels of ETBR and ET-1 in adult and aged DRG tissue. We observed a robust increase in ET-1 in aged DRG, while the levels of ETBR did not appear to change significantly. These results are now presented in Figure 4- Figure Supplement 1, and further support the notion that in aging, activation of the ETBR signaling hinders axon regeneration.

      (2) In ex vivo experiments, NGF was added to the culture medium. Previous studies have shown that adult sensory neurons could initiate fast axon growth in response to NGF within 24 hours. In addition, dissociated sensory neurons could also initiate spontaneous regenerative axon growth without NGF after 48 hours. Some discussion or rationale is needed to explain the difference between NGF-induced or spontaneous axon growth of culture adult sensory neurons and the roles of ETBR and SGCs.

      We appreciate the reviewer’s suggestion. In adult DRG explant or dissociated cultures, NGF is not typically required for survival or axon outgrowth. However, in dissociated culture, the addition of NGF to the medium stimulates growth from more neurons compared to controls (Smith and Skene 1997). In the DRG explant, NGF does not promote significant effects on axon growth, but stimulates glial cell migration (Klimovich et al 2020). We opted to include NGF in our explant assay to increase the potential of stimulating axon regeneration with pharmacological manipulations of ETBR. We have now clarified these considerations in the Methods section.

      (3) In cultured dissociated sensory neurons, inhibiting ETBR also enhanced axon growth, which meant the presence of SGCs surrounding the sensory neurons. Some direct evidence is needed to show the cellular relationship between them in culture.  

      We thank the reviewer for raising this point and have added new data, now presented in Figure 2B, to show that in mixed DRG cultures, SGCs labeled with Fabp7 are present in the culture in proximity to neurons labeled with TUJ1, but they do not fully wrap the neuronal soma. These results are consistent with prior findings reporting that as time in culture progresses, SGCs lose their adhesive contacts with neuronal soma and adhere to the coverslip (PMID: 22032231, PMID: 27606776).  While in some cases SGCs can maintain their association with neuronal soma in the first day in culture after plating, in our hands, most SGCs have left the soma at the 24h time point we examined. 

      (4) In Figure 3, the in vivo regeneration experiments first showed enhanced axon regeneration either 1 day or 3 days after the nerve injury. The study then showed that inhibiting ETBR could enhance sensory axon growth in vitro from uninjured naïve neurons or conditioning lesioned neurons. To my knowledge, in vivo sensory axon regeneration is relatively slow during the first 2 days after the nerve injury and then enters the fast regeneration mode on the 3rd day, representing the conditioning lesion effect in vivo. Some discussion is needed to compare the in vitro and the in vivo model of axon regeneration.

      We agree that axon growth is relatively slow during the first 2 days and enters a fast growth mode on day 3. This has been elegantly demonstrated in Shin et al Neuron 2012 (PMID: 22726832), where an in vivo conditioning injury 3 days prior increases axon growth one day after injury. In vitro, similar effects have been described: a prior in vivo injury accelerates growth capacity within the first day in culture, but a similar growth mode occurs in naive adult neurons after 2-3 days in vitro (Smith and Skene 1996). We also know that neurite growth in culture is stimulated by higher cell density, likely because non-neuronal cells can secrete trophic factors (Smith and Skene 1996). Our in vitro results thus suggest that blocking ETBR in SGCs in these mixed cultures may alter the media towards a more growth-promoting state. In vivo, our data show that Bosentan treatment for 3 days partially mimics the conditioning injury and potentiates the effect of the conditioning injury. One possible interpretation is that inhibition of ETBR alters the release of trophic factors from SGCs. Future studies will be required to unravel how ETBR signaling influences the SGC secretome and its influence on axon growth. We have now included these discussion points in the Results and Discussion sections.

      (5) In Figure 5, the study showed that the level of connexin 43 increased after ETBR inhibition in either adult or aged mice, proposing an important role of connexin 43 in mediating the enhancing effect of ETBR inhibition on axon regeneration. However, in the study, there was no direct evidence supporting that ETBR directly regulates connexin 43 expression in SGCs. Moreover, there was no functional evidence that connexin 43 acted downstream of ETBR to regulate axon regeneration.

      We thank the reviewer for this point and agree that we do not provide direct evidence that connexin 43 acts downstream of ETBR to regulate axon regeneration. Obtaining such functional evidence would require selective KO of ETBR and Cx43 in SGCs, which we believe is beyond the scope of the current study. We have revised the Results and Discussion sections to emphasize that while we observe that ETBR inhibition increases Cx43 levels and that Cx43 levels correlate with axon regeneration, whether Cx43 directly mediates the effect on axon regeneration remains to be established. We also discuss potential alternative mechanisms downstream of ETBR in SGCs that could contribute to the observed effects on axon regeneration. Specifically, we discuss the possibility that ETBR signaling may limit axon regeneration by regulating the glutamate reuptake functions of SGCs, for the following reasons: 1) similarly to astrocytes, glutamate uptake by SGCs is important for regulating neuronal function; 2) exposure of cultured cortical astrocytes to endothelin results in a decrease in glutamate uptake that correlates with a major loss of basal glutamate transporter expression (GLT-1 and GLAST); 3) both glutamate transporters are expressed in SGCs in sensory ganglia; 4) GLAST and glutamate reuptake function are important for lesion-induced plasticity in the developing somatosensory cortex.

      Reviewer #2 (Public Review): 

      Summary: 

      In this interesting and original study, Feng and colleagues set out to address the effect of manipulating endothelin signaling on nerve regeneration, focusing on the crosstalk between endothelial cells (ECs) in dorsal root ganglia (DRG), which secrete ET-1 and satellite glial cells (SGCs) expressing ETBR receptor. The main finding is that ETBR signaling is a default brake on axon growth, and inhibiting this pathway promotes axon regeneration after nerve injury and counters the decline in regenerative capacity that occurs during aging. ET-1 and ETBR are mapped in ECs and SGCs, respectively, using scRNA-seq of DRGs from adult or aged mice. Although their expression does not change upon injury, it is modulated during aging, with a reported increase in plasma levels of ET-1 (a potent vasoconstrictive signal). Using in vitro explant assays coupled with pharmacological inhibition in mouse models of nerve injury, the authors demonstrate that ET-1/ETBR curbs axonal growth, and the ETAR/ETBR antagonist Bosentan boosts regrowth during the early phase of repair. In addition, Bosentan restores the ability of aged DRG neurons to regrow after nerve lesions. Despite Bosentan inhibiting both endothelin receptors A and B, comparison with an ETAR-specific antagonist indicates that the effects can be attributed to the ET-1/ETBR pathway. In the DRGs, ETBR is mostly expressed by SGCs (and a subset of Schwann cells), a cell type that previous studies, including work from this group, have implicated in nerve regeneration. SGCs ensheath and couple with DRG neurons through gap junctions formed by Cx43. Based on their own findings and evidence from the literature, the pro-regenerative effects of ETBR inhibition are in part attributed to an increase in Cx43 levels, which are expected to enhance neuron-SGC coupling. Finally, gene expression analysis in adult vs aged DRGs predicts a decrease in fatty acid and cholesterol metabolism, for which previous work by the authors has shown a requirement in SGCs to promote axon regeneration.

      Strengths: 

      The study is well-executed and the main conclusion that "ETBR signaling inhibits axon regeneration after nerve injury and plays a role in age-related decline in regenerative capacity" (line 77) is supported by the data. Given that Bosentan is an FDA-approved drug, the findings may have therapeutic value in clinical settings where peripheral nerve regeneration is suboptimal or largely impaired, as it often happens in aged individuals. In addition, the study highlights the importance of vascular signals in nerve regeneration, a topic that has gained traction in recent years. Importantly, these results further emphasize the contribution of long-neglected SGCs to nerve tissue homeostasis and repair. Although the study does not reach a complete mechanistic understanding, the results are robust and are expected to attract the interest of a broader readership.

      We thank the reviewer for the positive comments, especially in regard to the rigor and originality of our study.

      Weaknesses: 

      Despite these positive comments provided above, the following points should be considered: 

      (1) This study examines the contribution of the ET-1 pathway in the ganglia, and in vitro assays are consistent with the idea that important signaling events take place there. Nevertheless, it remains to be determined whether the accelerated axon regrowth observed in vivo depends also on cellular crosstalk mediated by ET-1 at the lesion site. Are ECs along the nerve secreting ET-1? What cells are present in the nerve stroma that could respond and participate in the repair process? Would these interactions be sensitive to Bosentan? It may be difficult to dissect this contribution, but it should at least be discussed.

      We thank the reviewer for this important point and agree that the in vivo effects observed cannot rule out the contribution of ECs or SCs at the lesion site in the nerve. Dissecting the contribution of ETBR-expressing cells in the nerve would require cell-specific manipulations that go beyond the scope of this manuscript. We have revised the Discussion section to highlight the potential contribution of ECs, fibroblasts and SCs in the nerve.

      (2) It is suggested that the permeability of DRG vessels may facilitate the release of "vascular-derived signals" (lines 82-84). Is it possible that the ET-1/ETBR pathway modulates vascular permeability, and that this, in turn, contributes to the observed effects on regeneration?

      We thank the reviewer for raising this interesting point. ET-1 can have an impact on vascular permeability. It was indeed shown that in high glucose conditions, increased trans-endothelial permeability is associated with increased Edn1, Ednra and Ednrb expression and augmented ET-1 immunoreactivity (PMID: 10950122). It is thus possible that part of the effects observed results from altered vascular permeability. We have included this point in the Discussion section. Future experiments will be required to test how injury and age affect vascular permeability in the DRG.

      (3) Is the affinity of ET-3 for ETBR similar to that of ET-1? Can it be excluded that ET-3 expressed by fibroblasts is relevant for controlling SGC responses upon injury/aging?

      We thank the reviewer for raising this point. ET-1 binds to ETAR and ETBR with the same affinity, but ET-3 shows a higher affinity for ETBR than for ETAR (Davenport et al. Pharmacol. Rev 2016 PMID: 26956245). We attempted to examine ET-3 levels in adult and aged DRG by western blot, but in our hands the antibody did not work well enough, and we could not obtain clear results. We thus cannot exclude the possibility that ET-3 released by fibroblasts contributes to the effects we observe on axon regeneration. Indeed, in cultured cortical astrocytes, application of either ET-1 or ET-3 leads to inhibition of Cx43 expression. We have revised the text in the Discussion section to highlight the possibility that both ET-1 and ET-3 could participate in the ETBR-dependent effect on axon regeneration.

      (4) ETBR inhibition in dissociated (mixed) cultures uncovers the restraining activity of endothelin signaling on axon growth (Figure 2C). Since neurons do not express ET-1 receptors, based on scRNA-seq analysis, these results are interpreted as an indication that basal ETBR signaling in SGC curbs the axon growth potential of sensory neurons. For this to occur in dissociated cultures, however, one should assume that SGC-neuron association is present, similar to in vivo, or to whole DRG cultures (Figure 2C). Has this been tested?

      We thank the reviewer for this point. In dissociated DRG culture, neurons, SGCs and other non-neuronal cells are present, but SGCs do not retain the surrounding morphology as they do in vivo. Within 24 hours in culture, SGCs lose their adhesive contacts with neuronal soma and adhere to the coverslip (PMID: 22032231, PMID: 27606776). We have included new data in Figure 2B to show that in our culture conditions, SGCs are present but do not wrap neuronal somata as they do in vivo. We also know from prior studies that the density of the culture affects axon growth, an effect that was attributed to trophic factors released from non-neuronal cells (Smith and Skene 1997). Therefore, although SGCs do not surround neurons, the signaling pathway downstream of ETBR may be present in culture and contribute to the release of trophic factors that influence axon growth. We have revised the Results section to better explain our in vitro results and their interpretation.

      In both in vitro experimental settings (dissociated and whole DRG cultures) how is ETBR stimulated over up to 7 days of culture? In other words, where does endothelin come from in these cultures (which are unlikely to support EC/blood vessel growth)? Is it possible that the relevant ligand here derives from fibroblasts (see point #6)? Or does it suggest that ETBR can be constitutively active (i.e., endothelin-independent signaling)? Is there any chance that endothelin is present in the culture media or Matrigel? 

      We thank the reviewer for raising this point. Our single-cell data indicate that ET-1 is expressed by endothelial cells and ET-3 by fibroblasts. In dissociated DRG culture at the 24h time point, all DRG cells are present, including endothelial cells and fibroblasts, and could represent the source of ET-1 or ET-3. In the explant setting, it is also possible that both ET-1 and ET-3 are released by endothelial cells and fibroblasts during the 7 days in culture. According to information from the suppliers, endothelin is present neither in the culture media nor in the Matrigel. While mutations can facilitate the constitutive activity of the ETBR receptor, we are not aware of data showing that endogenous ETBR can be constitutively active. Because the molecular mechanisms governing ETBR-mediated signaling remain incompletely understood (see for example PMID: 39043181, PMID: 39414992), future studies will be required to elucidate the detailed mechanisms activating ETBR in SGCs and its downstream signaling mechanisms. We have now expanded the Results and Discussion sections to clarify these points.

      (5) The discovery that ET-1/ETBR signaling in SGC curtails the growth capacity of axons at baseline raises questions about the physiological role of this pathway. What happens when ETBR signaling is prevented over a longer period of time? This could be addressed with pharmacological inhibitors, or better, with cell-specific knock-out mice. The experiments would certainly be of general interest, although not within the scope of this story. Nevertheless, it could be worth discussing the possibilities. 

      We agree that this is an interesting point. As mentioned above in response to point #1 of reviewer 1, the physiological role of this pathway could be to limit plasticity and prevent maladaptive neural rewiring that can happen after injury (Costigan et al 2009, PMID: 19400724), but it can also hinder beneficial recovery after injury. Other mechanisms that limit axon regeneration capacity have been described and involve local mRNA translation and Rho signaling. We have revised the Discussion section to include these points. We agree that understanding the consequence of blocking ETBR over longer time periods is beyond the scope of the current study, but we now discuss the possibility that blocking ETBR with a cell-specific KO approach could unravel its physiological function in target innervation and behavior.

      (6) Assessing Cx43 levels by measuring the immunofluorescence signal (Figure 5E-F) is acceptable, particularly when the aim is to restrict the analysis to SGCs. The modulation of Cx43 expression by ET-1/ETBR plays an important part in the proposed model. Therefore, a complementary analysis of Cx43 expression by quantitative RT-PCR on sorted SGCs would be a valuable addition to the immunofluorescence data. Is this attainable? 

      We agree and have attempted to perform these types of experiments but encountered technical difficulties. We attempted to sort SGCs from transgenic mice in which SGCs are fluorescently labeled. However, the cells did not survive the sorting process and died in culture. We think that increasing the viability of cells after sorting would require capillary-free fluorescent sorting approaches. However, we do not currently have access to such technology. We also attempted this experiment with cultured SGCs, following a previously published protocol (Tonello et al. 2023 PMID: 38156033). In these experiments, SGCs are cultured for 8 days to obtain purity. We did not observe any difference in Cx43 protein or mRNA levels upon treatment with ET-1 with or without BQ788. However, in these SGC cultures, Cx43 displayed a diffuse localization, rather than the puncta observed in vivo. Therefore, despite our multiple attempts, quantifying Cx43 on sorted or purified SGCs was not attainable.

      (7) The conclusions "We thus hypothesize that ETBR inhibition in SGCs contributes to axonal regeneration by increasing Cx43 levels, gap junction coupling or hemichannels and facilitating SGC-neuron communication" (lines 303-305) are consistent with the findings but seem in contrast with the effect of aging on gap junction coupling reported by others and cited in line 210: "the number of gap junctions and the dye coupling between these cells increases (Huang et al., 2006)". I am confused by what distinguishes a potential, and supposedly beneficial, increase in coupling after ETBR inhibition, from what is observed in aging.

      We agree that the reported impact of aging on Cx43 levels and gap junction number appears contradictory. Procacci et al 2008 reported that Cx43 expression in SGCs decreases in aged mice. Huang et al 2006 reported that both the number of gap junctions and the dye coupling between these cells increase with aging. Procacci et al suggested as a possible explanation for this apparent discrepancy that additional connexin types other than Cx43 may contribute to the gap junctions between SGCs in aged mice. Our snRNAseq data did not allow us to verify this hypothesis, because there were fewer SGCs in aged mice compared to adult, and connexin genes were detected in only 20% or less of SGCs. Furthermore, our quantification did not look specifically at gap junctions, but just at Cx43 puncta. Cx43 can also form hemichannels in addition to gap junctions, and can also perform non-channel functions, such as protein interaction, cell adhesion, and intracellular signaling. Thus, more research examining the role of Cx43 in SGCs is necessary to address this discrepancy in the literature. We have expanded the Discussion section to include these points.

      (8) I find it difficult to reconcile the results in Figure 5F with the proposed model since (1) injury increases Cx43 levels in both adult and aged mice, (2) the injured aged/vehicle group has a similar level to the uninjured adult group, (3) upon injury, aged+Bosentan is much lower than adult+Bosentan (significance not tested). It seems hard to explain the effect of Bosentan only through the modulation of Cx43 levels. Whether the increase in Cx43 levels following ETBR inhibition actually results in higher SGC-neuron coupling has not been assessed experimentally.

      We thank the reviewer for this point and agree that the effect of Bosentan is likely not exclusively through the modulation of Cx43 levels in SGCs, and that Cx43 levels may simply correlate with axon regenerative capacity. We have revised the manuscript to clarify this point. We have also added the missing significance test in Figure 5F.

      Cell-specific KO of Cx43 and ETBR would allow us to test this hypothesis directly but is beyond the scope of the current study. We have not tested SGC-neuron coupling, as these experiments are currently beyond our area of expertise. Cx43 also has other functions beyond gap junction coupling, such as protein interaction, cell adhesion, and intracellular signaling. Investigating the precise function of Cx43 would require in-depth biochemical and cell-specific experiments that are beyond the scope of this study. Furthermore, as we mentioned in response to reviewer #2 point 5, ETBR signaling may also have other downstream effects in SGCs, such as on glutamate transporter expression, or affect other cells in the nerve during the regeneration process. We have revised the Discussion section to include these alternative mechanisms.

      Reviewer #3 (Public Review):

      Summary: 

      This manuscript suggests that inhibiting ETBR via the FDA-approved compound Bosentan can disrupt ET-1-ETBR signalling that they found detrimental to nerve regeneration, thus promoting repair after nerve injury in adult and aged mice. 

      Strengths: 

      (1) The clinical need to identify molecular and cellular mechanisms that can be targeted to improve repair after nerve injury. 

      (2) The proposed mechanism is interesting. 

      (3) The methodology is sound. 

      We thank the reviewer for highlighting the strengths of our study.

      Weaknesses: 

      (1) The data appear preliminary and the story appears incomplete. 

      We appreciate the reviewer’s point. We would like to emphasize that our results provide compelling evidence that ETBR signaling is a default brake on axon growth, and inhibiting this pathway promotes axon regeneration after nerve injury and counters the decline in regenerative capacity that occurs during aging. We also provide evidence that ETBR signaling regulates the levels of Cx43 in SGCs. Furthermore, our results documenting the use of an FDA-approved compound to increase axon regeneration may be of interest to the broader readership, as there are currently no therapies to improve or accelerate nerve repair after injury. We agree that the detailed mechanisms operating downstream of ETBR will need to be elucidated. Answering these questions would first require cell-specific KO of ETBR and Cx43 to confirm that this pathway is operating in SGCs to control axon regeneration. We would also need to identify how SGCs communicate with neurons to regulate axon regeneration, which is a large area of ongoing research that remains poorly understood. This extensive and highly complex set of experiments is beyond the scope of the current study. As we discussed in our response to reviewers #1 and #2, we attempted to perform numerous additional experiments to better define the role of ETBR signaling in SGCs in aging and have included additional results in Fig. 2B, Fig 3G-H, Fig 5A-E, and Figure 4- Figure Supplement 1 and Figure 5- Figure Supplement 1. We have expanded the Discussion to acknowledge the limitations of our study and to discuss possible mechanisms.

      (2) Lack of causality and clear cellular and molecular mechanism. There are also some loose ends such as the role of connexin 43 in SGCs: how is it related to ET-1- ETBR signalling?  

      We thank the reviewer for this point and agree that the molecular mechanisms downstream of ETBR remain to be elucidated. However, we believe that our manuscript reports an interesting potential of an FDA-approved compound in promoting nerve repair. We focused on Cx43 downstream of ETBR signaling because decreased Cx43 expression in SGCs in ageing was previously established, but the mechanisms were not elucidated. Furthermore, it was reported that ET-1 signaling in cultured astrocytes, which share functional similarities with SGCs, leads to the closure of gap junctions and reduction in Cx43 expression. Our study thus provides a mechanism by which ETBR signaling in SGCs regulates Cx43 expression. Whether Cx43 directly impacts axon regeneration remains to be tested. Cell-specific KO of Cx43 and ETBR would be required to answer this question. We have revised the Introduction and Discussion sections extensively to provide a link between ETBR and Cx43 and to acknowledge the lack of causality for Cx43 in SGCs, as well as to provide additional potential mechanisms by which ETBR inhibition may promote nerve repair.

      Reviewer #2 (Recommendations For The Authors): 

      In addition to the points listed in the Public Review section, please consider the following comments: 

      (1) ETAR, which is high in mural cells, does not seem to be implicated in the reported pro-regenerative effects. Even so, can vasoconstriction be ruled out as an underlying cause of the age-dependent decline in axon regrowth potential and, more generally, in the effects of ET-1 inhibition on regeneration? This could be discussed.

      We agree that we can’t exclude a role of vasoconstriction or an effect on vascular permeability in the age-dependent decline in axon regrowth potential. However, our in vitro and ex vivo experiments, in which vascular-related mechanisms are unlikely, suggest that vasoconstriction may not be a major contributor to the effects we observed.

      (2) The manuscript (e.g. line 287-288) would benefit from a discussion of the role that blood vessels play in the peripheral nervous system, and possibly CNS, repair. Vessels were shown to accompany regenerating fibers and instruct the reorganization of the nerve tissue to favor repair potentially through the release of pro-regenerative signals acting on stromal cells, glia, and other cellular components. Highlighting these processes will help put the current findings into perspective. 

      We agree and have revised the Discussion section to better explain the role of blood vessels in orienting Schwann cell migration and guiding axon regeneration.

      (3) The vast majority of the cells that are sequenced and shown in the UMAP in Figure 1C are from adult (3-month-old) mice [16,923 out of 18,098]. It would be useful to include the UMAP split (or color-coded) by timepoint to appreciate changes in cell clustering that may occur with aging.  

      We apologize for this misunderstanding. Figure 1C included all cells from all ages. However, the number of cells we obtained from the aged group was insufficient to perform in-depth analysis of each cell type. We have thus revised this section and Figure 1, now only presenting the data from adult mice.

      It is not discussed why fewer cells were sequenced at later stages. Additionally, I do not know how to interpret the double asterisks next to the labeling "18,098 samples" in Figure 1C. 

      Since our original sequencing of adult and aged mice using 10x yielded so few cells from the aged DRG, we tested and optimized a new technology for single cell preparation of DRG using Illumina Single Cell 3’ RNA Prep. This preparation creates templated emulsions using a vortex mixer to capture and barcode single-cell mRNA instead of a microfluidics system. This method yielded much better results for nuclei recovery from aged DRG, with more nuclei and better quality of nuclei. Thus, we now present in Figure 5 and Figure 5- Figure Supplement 1 the results from snRNA-sequencing of aged and adult DRG using the Illumina single cell kit. The results of the snRNA-sequencing show a decreased abundance of SGCs in aged mice, consistent with the results from our morphology analysis with EM. We were also able to perform SGCs-specific pathway analysis because of the increased number of nuclei captured in the aged SGCs, which we included in the manuscript.

      (4) The in vivo studies are designed to examine the effects of ETBR inhibition during the first phase of axon regrowth after nerve injury (1-3 days post-injury, dpi). Is there a reason why later stages have not been studied? It would be interesting to understand whether ETBR inhibition improves long-term recovery or is only effective at boosting the initial growth of axons through the lesion. It is possible that early inhibition will be enough for long-term recovery. If so, these experiments would define a sensitivity window with therapeutic value.

      We agree that assessing functional recovery requires proper behavioral tests or morphological evaluations of reinnervation. To determine if Bosentan treatment has long-term effects on recovery, we administered Bosentan or vehicle for 3 weeks (daily for 1 week, and then once a week for the subsequent 2 weeks) after sciatic nerve crush. At 24 days after SNC, we assessed intraepidermal nerve fiber density (IENFD) in the injured paw and saw a trend towards increased fibers/mm in the treated animals (new Figure 3G,H). Future studies will examine how long-term Bosentan treatment affects functional recovery and innervation at later time points. Additionally, behavior assays will be needed to determine if these morphological changes relate to behavioral improvements.

      (5) I am unsure if the gene expression analysis shown in Figure 6 fits well into this story. It is interesting per se and in line with previous work from this group showing the relevance of fatty acid metabolism in SGCs for axon regeneration. Nevertheless, without a mechanistic link to endothelin signaling and Cx43/gap junction modulation, the observations derived from DEG analysis are not well integrated with the rest and may be more distracting than helpful. One limitation is that there is no cell-type information for the DEGs due to the small number of cells recovered from aged mice. For instance, if ETBR inhibition rescued gene downregulation associated with fatty acid/cholesterol metabolism, then the DGE results would become more relevant for understanding the cellular basis of the pro-regenerative effect, which at this point remains quite speculative (lines 264-265; lines 318-319).

      We agree and have added new snRNA sequencing data to replace these findings (see above response to point #4, new Figure 5 and Figure 5- Figure Supplement 1). The new data show a decreased abundance of SGCs in aged mice, consistent with our TEM results. Pathway analysis revealed that aging triggers extensive transcriptional reprogramming in SGCs, reflecting heightened demands for structural integrity, cell junction remodeling, and glia–neuron interactions within the aged DRG microenvironment.

      (6) It would be interesting to determine whether Bosentan increases SGC coverage of neuronal cell bodies in aged mice (Figures 6A-C). 

      We agree that this would be very interesting, but it will require extensive EM analysis at different time points and is beyond the scope of the current manuscript.

      (7) Finally, adding a summary model would help the readers. 

      We agree and have made a summary model, now presented in Figure 6F.

      Reviewer #3 (Recommendations For The Authors): 

      Longer time points post-injury and assessment of functional recovery after Bosentan would be of great value here. 

      We agree that assessing functional recovery requires proper behavioral tests or morphological evaluations of reinnervation. To determine if Bosentan treatment has long-term effects on recovery, we administered Bosentan or vehicle for 3 weeks (daily for 1 week, and then once a week for the subsequent 2 weeks) after sciatic nerve crush. At 24 days after SNC, we assessed intraepidermal nerve fiber density in the injured paw and saw a trend towards increased fibers/mm in the treated animals (Fig 3). While the results do not reach significance, we decided to include these new data as they provide evidence that Bosentan treatment may also improve long-term recovery. Future studies will be required to examine how long-term Bosentan treatment affects functional recovery and innervation at later time points. Additionally, behavior assays will be needed to determine if these morphological changes relate to behavioral improvements.

      It would be important to know how the ET-1-ETBR signalling axis promotes the regeneration of axons: this remains unaddressed. What are the cells that are specifically involved? Endothelial cells-SGC-neurons-SC? There are no experiments addressing the role of any of these?

      We agree that the molecular and cellular mechanisms by which ETBR signaling in SGCs promotes axon regeneration remain to be elucidated. Answering these questions would first require cell-specific KO of ETBR and Cx43 to confirm that this pathway is operating in SGCs to control axon regeneration. We would also need to identify how SGCs communicate with neurons to regulate axon regeneration, which is a large area of ongoing research that remains poorly understood. While these are important experiments, because of numerous technical and temporal constraints, we believe they are beyond the scope of the current manuscript.

      How does connexin 43 in SGCs relate to ET-1-ETBR signalling?

      The relation between connexin 43 and ETBR signaling stems from observations made in astrocytes. ET-1 signaling in cultured astrocytes, which share functional similarities with SGCs, was shown to lead to the closure of gap junctions and the reduction in Cx43 expression. Because Cx43, a major connexin expressed in SGCs as in astrocytes, was previously shown to be reduced at the protein level in SGCs from aged mice, we decided to explore whether this ETBR-Cx43 mechanism also operates in SGCs. We have revised the Introduction and Discussion sections extensively to acknowledge the lack of causality for Cx43 expression in SGCs and to provide additional potential mechanisms by which ETBR inhibition may promote nerve repair.

    1. Author response:

      We thank the Reviewers for their thorough reviews and helpful suggestions. We will provide additional quantification as requested for several aspects of the study.

      The methods that we developed were meant to provide candidates for regulatory elements for a gene of interest. These candidates could be used to further understand the regulation of a gene, a complex and difficult task, especially for dynamically regulated genes in the context of development. These candidates could also, or instead, be used to drive gene expression specifically in a target cell of interest for applications such as gene therapy or perturbations that need this type of specificity. In the first case, to use the candidates to understand the regulation of a gene, one would need to validate the candidates using the types of methods typically employed for this purpose, most rigorously in the in vivo genomic context. We did not pursue this level of validation as it would encompass a great deal of work outside the scope of the current study. However, by initially testing loci and CRMs which have been studied by several groups (Rho, Grm6, Vsx2, and Cabp5), and at least in the cases of Rho and Vsx2, shown to be relevant in the genomic context in vivo, we provide evidence that the LS-MPRA can identify relevant CRMs. These data show that the method is worth using for loci of interest, particularly when only one or a few loci are of interest, i.e. one does not need to use genome-wide approaches. It is also apparent that our methods are not perfect and that the LS-MPRA does not pick up all CRMs. We do not know of a method that has been shown to do so.

      Some of the statistical and quantitative data asked for by the Reviewers will be provided. However, it is important to note that the types of statistics using peak callers asked for regarding candidate choice will be of limited value. If one is testing a library in a single cell type in vitro, and/or running genome-wide assays, these statistics could aid in the choice of candidates. However, here we are electroporating a complex and dynamic set of cells, present at very different frequencies. In addition, at least for Olig2 and Ngn2, their expression is very transient, and each is expressed in only a small subset of cells. An additional confound is that the level of expression of each gene that one might test is variable. All of these variables render a statistical prediction of strong candidates less valuable than one might hope, and might lead one to miss those CRMs of interest. Instead, we suggest that one use one’s own level of interest and knowledge in choosing CRM candidates. We provide several examples of experimental, rather than purely statistical, approaches that might help in one’s choice of candidates. We used a functional read-out of CRM activity (Notch perturbation), carried out in the context of the entire LS-MPRA library, as one method. Co-expression in single cells of candidate regulators identified by the d-MPRA is another. One can of course use chromatin structure and sequence conservation, as used in many studies of regulatory regions, as other ways to narrow down candidates. The d-MPRA predictions also can be viewed in light of previous genetic studies, i.e. mutations in TFs that affect the cell type of interest or the regulation of the gene of interest, as we were able to do here for CRMs predicted to be regulated by Otx2.

      If one wishes to use a candidate CRM to drive gene expression in a targeted cell type, one needs to establish specificity. In particular, specificity needs to be established in the context of the vector that is being used. Non-integrated vs integrated vectors, different types of viral vectors with their own confounding regulatory sequences, and copy number can all affect specificity. We provided a double in situ hybridization method for the examination of specificity for some of the novel candidate CRMs. It was quite difficult in the case of Olig2 and Ngn2 as their RNAs and proteins are unstable. We would need to provide further evidence should we wish to use these candidate CRMs for directing expression specifically in Olig2- or Ngn2-expressing cells. We suggest that an investigator can choose the vector and method for establishing specificity depending upon the goals of the application.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The authors investigated sleep and circadian rhythm disturbances in Fmr1 KO mice. Initially, they monitored daily home cage behaviors to assess sleep and circadian disruptions. Next, they examined the adaptability of circadian rhythms in response to photic suppression and skeleton photic periods. To explore the underlying mechanisms, they traced retino-suprachiasmatic connectivity. The authors further analyzed the social behaviors of Fmr1 KO mice and tested whether a scheduled feeding strategy could mitigate sleep, circadian, and social behavior deficits. Finally, they demonstrated that scheduled feeding corrected cytokine levels in the plasma of mutant mice. 

      Strengths: 

      (1) The manuscript addresses an important topic-investigating sleep deficits in an FXS mouse model and proposing a potential therapeutic strategy. 

      (2) The study includes a comprehensive experimental design with multiple methodologies, which adds depth to the investigation. 

      We thank the reviewer for the positive comments.

      Weaknesses: 

      (1) The first serious issue in the manuscript is the lack of a clear description of how they performed the experiments and the missing definitions of various parameters in the results.  

      We thank the reviewer for pointing out lapses in the editing of the manuscript. We were trying to keep the descriptions of previously published methods brief but must have gone too far. The manuscript has been carefully checked for grammar and readability. The description of the experimental design has been refined, and a graphical presentation has been added as Suppl Fig 3. The sleep and circadian parameters are now thoroughly explained in the methods and briefly in the figure legends.

      (2) Although the manuscript has a relatively long Methods section, some essential information is missing. For instance, the definition of sleep bout, as described above, is unclear. Additional missing information includes:

      Figure 2: "Rhythmic strength (%)" and "Cycle-to-cycle variability (min)." 

      Figure 3: "Activity suppression." 

      Figure 4: "Rhythmic power (V%)" (is this different from rhythmic strength (%)?) and "Subjective day activity (%)." 

      We have provided definitions for the general audience of the terms used in the field of circadian rhythms, such as sleep bout, rhythm power, cycle-to-cycle, masking, and % of activity during the day in the methods and Fig legends. Most of the techniques used in this study, for example, the behavioral measurement of sleep or locomotor activity, are well established and have been used in multiple published works, including our own. We have made sure to include citations for interested readers.

      Figure 5: Clear labeling of the SCN's anatomical features and an explanation for quantifying only the ventral part instead of the entire SCN. 

      We have added more landmarks (position of the third ventricle and optic chiasm) to Fig 5, and have outlined the shell and core of the SCN in two additional images of the ventral hypothalamus in Suppl fig 4.

      We had actually quantified the fluorescence in the whole SCN as well as in the ventral part. This is described in the methods and reported in the results section and Table 4: “Likewise, a subtle decrease in the intensity of the labelled fibers was found in the whole SCN (Table 4) of the Fmr1 KO mice as compared to WT.”

      Methods: “Two methods of analyses were carried out on the images of 5 consecutive sections per animal containing the middle SCN. First, the relative intensity of the Cholera Toxin fluorescent processes was quantified in the whole SCN, both left and right separately, by scanning densitometry using the Fiji image processing package of the NIH ImageJ software (https://imagej.net). A single ROI of fixed size (575.99 μm x 399.9 μm, width x height) was used to measure the relative integrated density (mean gray values x area of the ROI) in all the images. The values from the left and right SCN were averaged per section and 5 sections per animal were averaged to obtain one value per animal…”

      The retinal innervation of the SCN is strongest in the ventral aspect, where the retino-hypothalamic fibers reach the SCN, and our goal was to identify differences in the input to the SCN (e.g., defects in the retino-SCN connectivity, as suggested by some deficits in circadian behaviour). We therefore also looked at the intensity of Cholera Toxin in the fibers arriving at the ventral SCN from the retina.

      We have added a sentence to the methods explaining the rationale for measuring the intensity of the cholera toxin-labelled fibers both in the whole SCN and in the ventral part alone: “Second, the retinal innervation of the SCN is strongest in the ventral aspect, where the retino-hypothalamic fibers reach the SCN, hence, the distribution….”
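
      The averaging scheme quoted above can be sketched as follows. This is only an illustration of the arithmetic described in the methods (integrated density = mean gray value x ROI area, left and right averaged per section, then sections averaged per animal); the function names and the input layout are assumptions, and the sketch does not reproduce the Fiji workflow or whatever normalization makes the density "relative".

```python
from statistics import mean

# ROI size taken from the quoted methods (width x height, in micrometres).
ROI_AREA_UM2 = 575.99 * 399.9


def integrated_density(mean_gray: float, roi_area: float) -> float:
    """Integrated density as described in the methods: mean gray value x ROI area."""
    return mean_gray * roi_area


def per_animal_value(sections, roi_area=ROI_AREA_UM2):
    """Collapse per-section measurements into one value per animal.

    `sections` is a hypothetical input: a list of (left_mean_gray, right_mean_gray)
    tuples, one tuple per section (5 consecutive sections in the study).
    """
    per_section = [
        mean([integrated_density(left, roi_area), integrated_density(right, roi_area)])
        for left, right in sections
    ]
    return mean(per_section)
```

      Because the ROI has a fixed size, the area term is a constant scale factor, so group comparisons would rank animals the same way whether mean gray value or integrated density is used.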

      Figure 6: Inconsistencies in terms like "Sleep frag. (bout #)" and "Sleep bouts (#)." Consistent terminology throughout the manuscript is essential.

      We have now clearly explained, throughout the manuscript and in the figure legends, that the number of sleep bouts is a measure of sleep fragmentation. In addition, we have corrected the figures and reconciled the terminology, which is now consistent throughout the results and methods.

      Methods: “Sleep fragmentation was determined by the number of sleep bouts, which were operationally defined as episodes of continuous immobility with a sleep count greater than 3 per minute, persisting for at least 60 secs.”
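
      As a rough illustration of the operational definition quoted above, the sketch below counts bouts in a series of per-minute sleep counts. It assumes that the data are binned in 1-minute epochs, so that "at least 60 secs" corresponds to a run of at least one qualifying bin; the function name, the input format, and the `min_bins` parameter are hypothetical and are not taken from the authors' analysis software.

```python
def count_sleep_bouts(sleep_counts, threshold=3, min_bins=1):
    """Count runs of consecutive bins whose sleep count exceeds `threshold`.

    A run is scored as a bout only if it spans at least `min_bins` bins
    (one bin = one minute under the assumption stated above).
    """
    bouts = 0
    run_length = 0
    for count in sleep_counts:
        if count > threshold:
            run_length += 1
        else:
            if run_length >= min_bins:
                bouts += 1
            run_length = 0
    # Close a run that extends to the end of the recording.
    if run_length >= min_bins:
        bouts += 1
    return bouts
```

      Under this definition, a higher bout count for the same total sleep time means that sleep is broken into more, shorter episodes, which is why the bout number serves as the fragmentation measure.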

      (3) Figure 1A shows higher mouse activity during ZT13-16. It is unclear why the authors scheduled feeding during ZT15- 21, as this seems to disturb the rhythm. Consistent with this, the body weights of WT and Fmr1 KO mice decreased after scheduled feeding. The authors should explain the rationale for this design clearly.

      We have added to the rationale for the feeding schedule. This protocol was initially used by the Panda group to counter metabolic dysfunction (Hatori et al., 2012). We have used it for many years now (see citations below) in various mouse models presenting with circadian disruption to reset the clock and improve sleep. This study represents our first application/intervention in a mouse model of a neurodevelopmental disease.

      Hatori M, Vollmers C, Zarrinpar A, DiTacchio L, Bushong EA, Gill S, Leblanc M, Chaix A, Joens M, Fitzpatrick JA, Ellisman MH, Panda S. Time-restricted feeding without reducing caloric intake prevents metabolic diseases in mice fed a high-fat diet. Cell Metab. 2012 Jun 6;15(6):848-60. doi: 10.1016/j.cmet.2012.04.019. Epub 2012 May 17. PMID: 22608008; PMCID: PMC3491655.

      Chiem E, Zhao K, Dell'Angelica D, Ghiani CA, Paul KN, Colwell CS. Scheduled feeding improves sleep in a mouse model of Huntington's disease. Front Neurosci. 2024 18:1427125. doi: 10.3389/fnins.2024.1427125. PMID: 39161652.

      Whittaker DS, Akhmetova L, Carlin D, Romero H, Welsh DK, Colwell CS, Desplats P. Circadian modulation by time-restricted feeding rescues brain pathology and improves memory in mouse models of Alzheimer's disease. Cell Metab. 2023 35(10):1704- 1721.e6. doi: 10.1016/j.cmet.2023.07.014. PMID: 37607543

      Brown MR, Sen SK, Mazzone A, Her TK, Xiong Y, Lee JH, Javeed N, Colwell CS, Rakshit K, LeBrasseur NK, Gaspar-Maia A, Ordog T, Matveyenko AV. Time-restricted feeding prevents deleterious metabolic effects of circadian disruption through epigenetic control of β cell function. Sci Adv. 2021 7(51):eabg6856. doi: 10.1126/sciadv.abg6856. PMID: 34910509

      Whittaker DS, Loh DH, Wang HB, Tahara Y, Kuljis D, Cutler T, Ghiani CA, Shibata S, Block GD, Colwell CS. Circadian-based Treatment Strategy Effective in the BACHD Mouse Model of Huntington's Disease. J Biol Rhythms. 2018 33(5):535-554. doi: 10.1177/0748730418790401. PMID: 30084274.

      Wang HB, Loh DH, Whittaker DS, Cutler T, Howland D, Colwell CS. Time-Restricted Feeding Improves Circadian Dysfunction as well as Motor Symptoms in the Q175 Mouse Model of Huntington's Disease. eNeuro. 2018 Jan 3;5(1):ENEURO.0431-17.2017. doi: 10.1523/ENEURO.0431-17.2017.

      Loh DH, Jami SA, Flores RE, Truong D, Ghiani CA, O'Dell TJ, Colwell CS. Misaligned feeding impairs memories. Elife. 2015 4:e09460. doi: 10.7554/eLife.09460.

      (4) The interpretation of social behavior results in Figure 6 is questionable. The authors claim that Fmr1 KO mice cannot remember the first stranger in a three-chamber test, writing, "The reduced time in exploring and staying in the novel mouse chamber suggested that the Fmr1 KO mutants were not able to distinguish the second novel mouse from the first now-familiar mouse." However, an alternative explanation is that Fmr1 KO mice do remember the first stranger but prefer to interact with it due to autistic-like tendencies. Data in Table 5 show that Fmr1 KO mice spent more time interacting with the first stranger in the 3-chamber social recognition test, which supports this possibility. Similarly, in the five-trial social test, Fmr1 KO mice's preference for familiar mice might explain the reduced interaction with the second stranger.

      Thank you for this interesting interpretation of the social behavior experiments. We used the common interpretations for both the three-chamber test and the 5-trial social interaction test, but we have now modified the text to leave space for alternative interpretations, softened the language, and mentioned decreased sociability in the Fmr1 KO mice: “The reduced time spent exploring the novel-mouse chamber suggests that the mutants were, perhaps, unable to distinguish the second novel mouse from the first, now familiar, mouse, along with decreased sociability.”

      In Figure 6C (five-trial social test results), only the fifth trial results are shown. Data for trials 1-4 should be provided and compared with the fifth trial. The behavioral features of mice in the 5-trial test can then be shown completely. In addition, the total interaction times for trials 1-4 (154 ± 15.3 for WT and 150 ± 20.9 for Fmr1 KO) suggest normal sociability in Fmr1 KO mice (it is different from the results of 3-chamber). Thus, individual data for trials 1-4 are required to draw reliable conclusions.

      We have added a suppl figure showing the individual trial results for both WT and Fmr1 KO mice as requested (Suppl. Fig. 2).  

      In Table 6 and Figure 6G-6J, the authors claim that "Sleep duration (Figures 6G, H) and fragmentation (Figures 6I, J) exhibited a moderate-strong correlation with both social recognition and grooming." However, Figure 6I shows a p-value of 0.077, which is not significant. Moreover, Table 6 shows no significant correlation between SNPI of the three-chamber social test and any sleep parameters. These data do not support the authors' conclusions. 

      Thank you for pointing out the error in the statement about Fig. 6I. The text now reads:

      “…. Sleep duration (Fig. 6G, H; Table 6) exhibited a moderate to strong correlation with both social recognition and grooming time, while sleep fragmentation (measured by sleep bouts number) only correlated with the latter (Fig. 6J); the length of sleep bouts (Table 6) showed moderate correlation with both social recognition and repetitive behavior. In addition, a moderate correlation was seen between grooming time and the circadian parameters, rhythmic power and activity onset variability (Table 6). In short, our work suggests that even when tested during their circadian active phase, the Fmr1 KO mice exhibit robust repetitive and social behavioral deficits. Moreover, the shorter and more fragmented the daytime sleep, the more severe the behavioral impairment in the mutants.”

      (5) Figure 7 demonstrates the effect of scheduled feeding on circadian activity and sleep behaviors, representing another critical set of results in the manuscript. Notably, the WT+ALF and Fmr1 KO+ALF groups in Figure 7 underwent the same handling as the WT and Fmr1 KO groups in Figures 1 and 2, as no special treatments were applied to these mice. However, the daily patterns observed in Figures 7A, 7B, 7F, and 7G differ substantially from those shown in Figures 2B and 1A, respectively. Additionally, it is unclear why the WT+ALF and Fmr1 KO+ALF groups did not exhibit differences in Figures 7I and 7J, especially considering that Fmr1 KO mice displayed more sleep bouts but shorter bout lengths in Figures 1C and 1D. 

      We appreciate the reviewer’s attention to the subtle details of the behavioral measurement of sleep and believe the reviewer to be referring to differences in the behavioral measurements of sleep between the data shown in Table 1 and Table 7. The first set of experiments described in this study was carried out between 2016 and 2017 and involves the comparison between WT and Fmr1 KO mice. The WT and mutants were obtained from JAX. In this initial set of experiments (Table 1), the total amount of sleep in 24 hrs was reduced in the KO, albeit not significantly, and the KO mice also exhibited sleep bouts of significantly reduced duration. The pandemic forced us to greatly slow down the research and reduce our mouse colonies. Post-pandemic, we used new cohorts of Fmr1 KO ordered again from JAX for the TRF experiment presented in this study. In these cohorts, the KO mice exhibited a significant reduction in total sleep (Table 7) and the sleep bouts were still shorter, but not significantly so. We have added to our text to explain that the description of the mutants and the TRF interventions were carried out at different times (2017 vs 2022). We would like to emphasize that we always run control and experimental groups contemporaneously for use in the statistical analyses. We believe that the data are remarkably consistent over these years, even with different students doing the measurements.

      Furthermore, it is not specified whether the results in Figure 7 were collected after two weeks of scheduled feeding (for how many days?) or if they represent the average data from the two-week treatment period.

      This is another good point raised by the reviewer. The activity measurements were collected during the 2 weeks (14 days) of TRF; the TRF was then extended for 3 more days to allow the behavioral sleep measurements.

      We have added a supplementary figure (Supp Fig 3) depicting the different experimental designs.

      The rationale behind analyzing "ZT 0-3 activity" in Figure 7D instead of the parameters shown in Figures 2C and 2D is also unclear. 

      We have added to our explanation. In prior work, we found that the TRF protocol has a big impact on the beginning of the sleep time; hence, we specifically targeted this 3-hour interval in the analysis.

      In Figure 7F, some data points appear to be incorrectly plotted. For instance, the dark blue circle at ZT13 connects to the light blue circle at ZT14 and the dark blue circle at ZT17. This is inconsistent, as the dark blue circle at ZT13 should link to the dark blue circle at ZT14. Similarly, it is perplexing that the dark blue circle at ZT16 connects to both the light blue and dark blue circles at ZT17. Such errors undermine confidence in the data. The authors need to provide a clear explanation of how these data were processed. 

      Thank you for bringing this to our attention. The data were plotted correctly; however, those data points completely overlapped with the ones behind them, masking them. We have now offset them slightly for clarity.

      Lastly, in the Figure 7 legend, Table 6 is cited; however, this appears to be incorrect. It seems the authors intended to refer to Table 7. 

      We have corrected this error, thank you.  

      (6) Similar to the issue in Figure 7F, the data for day 12 in Supplemental Figure 2 includes two yellow triangles but lacks a green triangle. It is unclear how the authors constructed this chart, and clarification is needed. 

      We have corrected this error. As the reviewer pointed out, we filled the triangle on day 12 with yellow instead of green.  

      (7) In Figure 8, a 5-trial test was used to assess the effect of scheduled feeding on social behaviors. It is essential to present the results for all trials (1 to 4). Additionally, it is unclear whether the results for familiar mice in Figure 8A correspond to trials 1, 2, 3, or 4.

      The legend for Figure 8 also appears to be incorrect: "The left panels show the time spent in social interactions when the second novel stranger mouse was introduced to the testing mouse in the 5-trial social interaction test. The significant differences were analyzed by two-way ANOVA followed by Holm-Sidak's multiple comparisons test with feeding treatment and genotype as factors." This description does not align with the content of the left panels. Moreover, two-way ANOVA is not the appropriate statistical analysis for Figure 8A. The authors need to provide accurate details about the analysis and revise the figure legend accordingly. 

      We apologize for the confusing figure legend, which has been revised:

      “Fig. 8: TRF improved social memory and stereotypic grooming behavior in the Fmr1 KO mice. (A) Social memory was evaluated with the 5-trial social interaction test as described above. The social memory recognition was significantly augmented in the Fmr1 KO by the intervention, suggesting that the treated mutants were able to distinguish the novel mouse from the familiar mouse. The time spent in social interactions with the novel mouse in the 5th trial was increased to WT-like levels in the mutants on TRF. Paired t-tests were used to evaluate significant differences in the time spent interacting with the test mouse in the 4th (familiar mouse) and 5th (novel mouse) trials. *P < 0.05 indicates the significant time spent with the novel mouse compared to the familiar mouse. (B) Grooming was assessed in a novel arena in mice of each genotype (WT, Fmr1 KO) under each feeding condition and the resulting data analyzed by two-way ANOVA followed by the Holm-Sidak’s multiple comparisons test with feeding regimen and genotype as factors. *P < 0.05 indicates the significant difference within genotype (between diet regimens), and #P < 0.05 those between genotypes (same feeding regimen). (C) TRF did not alter the overall locomotion in the treated mice. See Table 8.”

      To assess social recognition memory, mice underwent a five-trial social interaction paradigm in a neutral open-field arena. Each trial lasted 5 minutes and was separated by a 1-minute inter-trial interval. During trials 1–4, the test mouse was exposed to the same conspecific (Stimulus A) enclosed within a wire cup to permit olfactory and limited tactile interaction. In trial 5, a novel conspecific (Stimulus B) was introduced. Time spent investigating the stimulus B mouse (defined as sniffing or directing the nose toward the enclosure within close proximity) was scored using AnyMaze software. A progressive decrease in investigation time across trials 1–4 reflects habituation, while a significant increase in trial 5 indicates dishabituation and intact social recognition memory. In our data, there was little habituation in either genotype, but clear differences can be appreciated between trial 4 with the now familiar mouse and trial 5 with the novel mouse. Fig. 8A plots the results from individual animals in Trial 4 with a familiar mouse and in Trial 5 with a novel mouse; we have now specified this in the legend. As such, these data were analyzed with a paired t-test.

      We used two-way ANOVA to analyse the data reported in Panel 8B as well as the results in Table 8. This has been clarified in the legend.
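
      For readers less familiar with the within-animal comparison described above, the following sketch computes the paired t statistic for trial 4 versus trial 5. It is a minimal standard-library illustration of the test itself and is not the authors' analysis code; the function name is hypothetical, and the p-value lookup (which needs a t distribution, for example from a statistics package) is deliberately left out.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(trial4, trial5):
    """Return (t statistic, degrees of freedom) for paired observations.

    Each position in `trial4` and `trial5` must refer to the same animal.
    """
    if len(trial4) != len(trial5) or len(trial4) < 2:
        raise ValueError("need equal-length inputs with at least two animals")
    # Within-animal change from the familiar-mouse trial to the novel-mouse trial.
    diffs = [novel - familiar for familiar, novel in zip(trial4, trial5)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    n = len(diffs)
    t_stat = mean(diffs) / (sd / sqrt(n))
    return t_stat, n - 1
```

      The pairing matters because each animal serves as its own control: the test evaluates the per-animal change in interaction time rather than the difference between two independent group means.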

      (8) The circadian activity and sleep behaviors of Fmr1 KO mice have been reported previously, with some findings consistent with the current manuscript, while others contradict it. Although the authors acknowledge this discrepancy, it seems insufficiently thorough to simply state that the reasons for the conflicts are unknown. Did the studies use the same equipment for behavior recording? Were the same parameters used to define locomotor activity and sleep behaviors? The authors are encouraged to investigate these details further, as doing so may uncover something interesting or significant. 

      We agree with the reviewer and believe that the main differences likely arose from the experimental design and possibly from the interpretation.

      (9) Some subtitles in the Results section and the figure legends do not align well with the presented data. For example, in the section titled "Reduced rhythmic strength and nocturnality in the Fmr1 KOs," it is unclear how the authors justify the claim of altered nocturnality in Fmr1 KO mice. How do the authors define changes in nocturnality? Additionally, the tense used in the subtitles and figure legends is incorrect. The authors are encouraged to carefully review all subtitles and figure legends to correct these errors and enhance readability. 

      Nocturnality is defined as the % of total activity within a 24-h cycle that occurred in the night. Since this term can be confusing, and we agree that it was not well explained, we have removed it from the subtitle and figure legends.
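
      The definition above amounts to a simple ratio, sketched below. The sketch assumes 24 hourly activity bins indexed by ZT hour and a 12:12 light/dark cycle, so that the night spans ZT12 to ZT24; both the binning and the function name are illustrative assumptions rather than details taken from the manuscript.

```python
def nocturnality_percent(hourly_activity, night_hours=range(12, 24)):
    """Percentage of total 24-h activity that falls within the night bins.

    `hourly_activity` is a hypothetical list of 24 activity totals, where
    index 0 corresponds to ZT0-1 and index 23 to ZT23-24.
    """
    if len(hourly_activity) != 24:
        raise ValueError("expected 24 hourly bins")
    total = sum(hourly_activity)
    if total == 0:
        raise ValueError("no activity recorded")
    night = sum(hourly_activity[hour] for hour in night_hours)
    return 100.0 * night / total
```

      A strongly nocturnal animal would score close to 100%, whereas activity spread evenly across day and night would give a value near 50%.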

      We have adjusted the subtitles as recommended; however, the tense of the verbs might be a matter of writing style.

      Reviewer #2 (Public review): 

      Summary: 

      In the present study, the authors, using a mouse model of Fragile X syndrome, explore the very interesting hypothesis that restricting food access over a daily schedule will improve sleep patterns and, subsequently, behavioral capacities. By restricting food access from 12h to 6h over the nocturnal period (active period for mice), they show, in these KO mice, an improvement of the sleep pattern accompanied by reduced systemic levels of inflammatory markers and improved behavior. Using a classical mouse model of neurodevelopmental disorder (NDD), these data suggest that eating patterns might improve sleep quality, reduce inflammation and improve cognitive/behavioral capacities in children with NDD. 

      Strengths: 

      Overall, the paper is very well-written and easy to follow. The rationale of the study is generally well-introduced. The data are globally sound. The provided data support the interpretation overall. 

      Thank you for the positive comments.  

      Weaknesses:  

      (1) The introduction part is quite long in the Abstract, leaving limited space for the data provided by the present study.

      We have revised the Abstract to better focus on the most impactful findings as suggested. 

      (2) A couple of points are not totally clear for a non-expert reader:  - The Fmr1/Fxr2 double KO mice are not well described. What is the rationale for performing both LD and DD measures? 

      We did not use the Fmr1/Fxr2 double KO mice in this study.  

      While measurements of day/night differences in activity rhythms are routinely made in a light/dark (LD) cycle, the organisms must be under constant conditions (DD) to measure their endogenous circadian rhythms (free-running activity); this is often needed to uncover a compromised clock, as entrainment to the LD cycle can mask deficits in the endogenous circadian rhythms.

      (3) The data on cytokines and chemokines are interesting. However, the rationale for the selection of these molecules is not given. In addition, these measures have been performed in the systemic blood. Measures in the brain could be very informative. 

      The panel that we used had 16 cytokines/chemokines which are reported in Table 9. The experiment included WT and mutants held under 2 different feeding conditions with an n=8 per group. If we are able to obtain more resources, we would like to also carry out a comprehensive investigation of immunomediator levels as well as RNA-seq or Nanostring in selected brain regions associated with ASD aberrant behavioural phenotypes, for instance the prefrontal cortex.

      (4) An important question is the potential impact of fasting vs the impact of the food availability restriction. Indeed, fasting has several effects on brain functioning including cognitive functions. 

      We did not address this issue in the present study. Briefly, the distinction between caloric restriction (CR) and TRF, in which no calories are restricted, has important mechanistic implications in mouse models. While both interventions can impact metabolism, circadian rhythms, and aging, they operate via overlapping but distinct molecular pathways. These have been the topic of recent reviews and investigations. Importantly, the fast-feed cycle can also act as a circadian entrainer (Zeitgeber).

      Ribas-Latre A, Fernández-Veledo S, Vendrell J. Time-restricted eating, the clock ticking behind the scenes. Front Pharmacol. 2024 Aug 8;15:1428601. doi: 10.3389/fphar.2024.1428601. PMID: 39175542; PMCID: PMC11338815.

      Wang R, Liao Y, Deng Y, Shuang R. Unraveling the Health Benefits and Mechanisms of Time-Restricted Feeding: Beyond Caloric Restriction. Nutr Rev. 2025 Mar 1;83(3):e1209-e1224. doi: 10.1093/nutrit/nuae074.

      (5) How do the authors envision the potential translation of the present study to human patients? How to translate the 12 to 6 hours of food access in mice to children with Fragile X syndrome? 

      Time-restricted feeding (TRF), a type of intermittent fasting that limits food intake to a specific window of time each day (usually 8–12 hours in humans), is being actively studied in adults for benefits on metabolic health, sleep, and circadian rhythms. However, applying TRF to children is not currently recommended as a general intervention, and there are important developmental, medical, and ethical considerations to take into account.

      On the other hand, we believe that the Fmr1 KO mouse is a good preclinical model for FXS because it closely recapitulates key molecular, cellular, and behavioral phenotypes observed in humans with the disorder. A number of the behavioral phenotypes seen in the mouse mirror those seen in patients, including increased anxiety-like behavior, sensory hypersensitivity, social interaction deficits and repetitive behaviors, so there is strong face validity.

      As we show in this study, Fmr1 KO mice present with disrupted sleep/wake cycles and reduced amplitude of circadian rhythms, consistent with findings in individuals with FXS. This makes the Fmr1 KO an excellent model to test circadian-based interventions such as scheduled feeding.

      We believe that pre-clinical research in Fmr1 KO mice bridges the gap between basic discovery and human clinical application. It provides a controlled, cost-effective, and biologically relevant platform for understanding disease mechanisms and testing interventions. These types of experiments need to be done before jumping to humans to ensure that the human trials are scientifically justified and ethically sound.

      Reviewer #1 (Recommendations for the authors): 

      The authors should: 

      (1) Revise the Methods section for clarity and completeness.  

      We have re-worked the methods for clarity and completeness. 

      (2) Provide consistent and precise definitions for all parameters and terms.  

      We believe that we have provided definitions for all terms.  

      (3) Clarify the rationale for experimental designs, such as the feeding schedule.  

      We have added to the rationale for the feeding schedule.  This feeding schedule has been used in a number of prior studies including our own.  All this work is cited in the manuscript.   

      (4) Reanalyze and transparently present data, including individual trial results.  

      We have added a figure showing the individual trial results for the 5-trial tests as requested (Supplementary Fig. 2).

      (5) Conduct appropriate statistical tests and correct figure legends.  

      We believe that we have carried out appropriate statistical tests and have carefully rechecked the figure legends.  

      (6) Investigate discrepancies with prior studies to enhance the discussion. 

      We have added to our discussion of prior work. 

      (7) Improve language quality and ensure consistency in terminology and grammar.  

      We have edited the manuscript to improve language quality.  

      Reviewer #2 (Recommendations for the authors): 

      (1) The Abstract should be rewritten to provide more room for the obtained data.  

      We have re-written the Abstract to focus on the most impactful findings. 

      (2) An additional sentence describing the double KO mice should be added.  

      We did not use double KO mice in this study.  

      (3) The rationale for studying LD and DD should be provided. 

      Measurements of day/night differences are routinely made in a light/dark (LD) cycle. To measure the endogenous circadian rhythms, the organisms must be under constant conditions (constant darkness, DD).

      (4) The data on cytokines/chemokines should be strengthened by performing a larger panel of measures both in blood and the brain.  

      The panel that we used had 16 cytokines/chemokines which we report in Table 9.  This was a large experiment with 2 genotypes being held under 2 feeding conditions with n=8 mice per group. If we are able to obtain more resources, we would like to also carry out RNA-seq in different brain regions.  

      (5) The authors should discuss in more detail the potential role of fasting vs restriction of food access.

      We did not address this issue in the present study. Briefly, the distinction between caloric restriction (CR) and TRF, in which no calories are restricted, has important mechanistic implications in mouse models. While both interventions can impact metabolism, circadian rhythms, and aging, they operate via overlapping but distinct molecular pathways.

      (6) The authors should also provide some insight into their view on the potential translation of their experimental studies.  

      We believe that the Fmr1 KO mouse is a good preclinical model for FXS because it closely recapitulates key molecular, cellular, and behavioral phenotypes observed in humans with the disorder. A number of the behavioral phenotypes seen in the mouse mirror those seen in patients, including increased anxiety-like behavior, sensory hypersensitivity, social interaction deficits and repetitive behaviors, so there is strong face validity. As we demonstrate in this study, Fmr1 KO mice exhibit disrupted sleep/wake cycles and reduced amplitude of circadian rhythms, consistent with findings in individuals with FXS. This makes the Fmr1 KO an excellent model to test circadian-based interventions such as scheduled feeding.

      Still, we are mindful that the translation of therapeutic findings from mouse to human has proven challenging; for example, mGluR5 antagonists failed in clinical trials despite strong preclinical data (Berry-Kravis et al., 2016). Therefore, we are cautious about overreaching in our translational interpretations.

      Berry-Kravis, E., Des Portes, V., Hagerman, R., Jacquemont, S., Charles, P., Visootsak, J., Brinkman, M., Rerat, K., Koumaras, B., Zhu, L., Barth, G. M., Jaecklin, T., Apostol, G., & von Raison, F. (2016). Mavoglurant in fragile X syndrome: Results of two randomized, double-blind, placebo-controlled trials. Science translational medicine, 8(321), 321ra5. https://doi.org/10.1126/scitranslmed.aab4109

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The manuscript proposes that 5mC modifications to DNA, despite being ancient and widespread throughout life, represent a vulnerability, making cells more susceptible to both chemical alkylation and, of more general importance, reactive oxygen species. Sarkies et al take the innovative approach of introducing enzymatic genome-wide cytosine methylation system (DNA methyltransferases, DNMTs) into E. coli, which normally lacks such a system. They provide compelling evidence that the introduction of DNMTs increases the sensitivity of E. coli to chemical alkylation damage. Surprisingly they also show DNMTs increase the sensitivity to reactive oxygen species and propose that the DNMT generated 5mC presents a target for the reactive oxygen species that is especially damaging to cells. Evidence is presented that DNMT activity directly or indirectly produces reactive oxygen species in vivo, which is an important discovery if correct, though the mechanism for this remains obscure.

      Strengths:

      This work is based on an interesting initial premise, it is well-motivated in the introduction and the manuscript is clearly written. The results themselves are compelling.

      We thank the reviewer for their positive response to our study.  We also really appreciate the thoughtful comments raised.  We have addressed the comments raised as detailed below. 

      Weaknesses:

      I am not currently convinced by the principal interpretations and think that other explanations based on known phenomena could account for key results. Specific points below.

      (1) As noted in the manuscript, AlkB repairs alkylation damage by direct reversal (DNA strands are not cut). In the absence of AlkB, repair of alkylation damage/modification is likely through BER or other processes involving strand excision and resulting in single stranded DNA. It has previously been shown that 3mC modification from MMS exposure is highly specific to single stranded DNA (PMID:20663718) occurring at ~20,000 times the rate as double stranded DNA. Consequently, the introduction of DNMTs is expected to introduce many methylation adducts genome-wide that will generate single stranded DNA tracts when repaired in an AlkB deficient background (but not in an AlkB WT background), which are then hyper-susceptible to attack by MMS. Such ssDNA tracts are also vulnerable to generating double strand breaks, especially when they contain DNA polymerase stalling adducts such as 3mC. The generation of ssDNA during repair is similarly expected to follow the H2O2 or TET based conversion of 5mC to 5hmC or 5fC, neither of which can be directly repaired and both of which depend on single strand excision for their removal. The potential importance of ssDNA generation in the experiments has not been considered.

      We thank the reviewer for this interesting and insightful suggestion.  Our interpretation of our findings is that a subset of MMS-induced DNA damage, specifically 3mC, overlaps with the damage introduced by DNMTs and this accounts for increased sensitivity to MMS when DNMTs are expressed.  However, the idea that the introduction of 3mC by DNMT actually makes the DNA more liable to damage by MMS, potentially through increasing the level of ssDNA, is also a potential explanation, which could operate in addition to the mechanism that we propose.

      (2) The authors emphasise the non-additivity of the MMS + DNMT + alkB experiment but the interpretation of the result is essentially an additive one: that both MMS and DNMT are introducing similar/same damage and AlkB acts to remove it. The non-additivity noted would seem to be more consistent with the ssDNA model proposed in #1. More generally non-additivity would also be seen if the survival to DNA methylation rate is non-linear over the range of the experiment, for example if there is a threshold effect where some repair process is overwhelmed. The linearity of MMS (and H2O2) exposure to survival could be directly tested with a dilution series of MMS (H2O2).

      We thank the reviewer for this point.  As in the response to point #1, the reviewer’s hypothesis of increased potency of MMS, potentially through increased ssDNA, downstream of 3mC induction by DNMT, is a good one.  We have added a dose-response curve for DNMT-expressing cells to MMS to the revised version of the manuscript.  This shows that there is a non-linear response to MMS in the WT background.  Sensitivity is exacerbated by expression of DNMT and alkB mutation individually but there is also a strong non-additive effect that is particularly marked at low MMS concentrations where sensitivity is much higher in the double mutant than predicted from the two single mutants.  This is consistent with induction of DNA damage by DNMT that is repaired by alkB because alkB can be ‘overwhelmed’ even in WT backgrounds as the reviewer suggests.  However, it is also perfectly possible that the effect is due to increased levels of DNA damage induction in DNMT-expressing cells.  Both these results are compatible with our central hypothesis, namely that DNMT expression induces 3mC.  We have included these results along with discussion of them in the revised text in the results section:

      In order to investigate the non-additivity between DNMT expression and alkB mutation further, we investigated the effect of MMS over a range of concentrations for the different strains (Supplemental Figure 1A).  We quantified the non-additivity by comparing the survival of alkB mutants expressing DNMT to the predicted combined effect of either alkB mutation alone or DNMT expression alone (Supplemental Figure 1B).  Survival was significantly lower than expected, most notably at low concentrations of MMS, which could be due to the saturation of the effect at high concentrations of MMS for alkB mutants expressing DNMT, where extremely high levels of sensitivity were observed.  The non-linear shape of the graph observed for WT cells expressing DNMTs further suggests that the ability of AlkB to repair the DNA is overwhelmed at high MMS concentrations even in the WT background.  These results are consistent with the idea that AlkB repairs a form of DNA damage from MMS that is more prevalent when DNMT is expressed.  This could be because DNMT induces 3mC, repaired by AlkB, and further 3mC is induced by MMS leading to much higher 3mC levels in the absence of AlkB activity.  Alternatively, 3mC induction by DNMT may lead to increased levels of ssDNA, particularly in alkB mutants, which could increase the risk of further DNA damage by MMS exposure and heighten sensitivity.  Either of these mechanisms is consistent with induction of 3mC by DNMT, and both indicate that the induction of DNA damage by DNMT expression has a fitness cost for cells when exposed to genotoxic stress in their environment.
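
      The comparison between observed and predicted survival described above can be sketched as follows. This is only an illustration under stated assumptions: the response does not say how the predicted combined effect was computed, so the sketch assumes a multiplicative (independent action) null model, and all survival fractions and function names are hypothetical rather than taken from the study.

```python
# Minimal sketch of a non-additivity check, assuming a multiplicative null model.
# All numbers are made up for illustration; they are NOT data from the study.

def expected_double_survival(s_alkb: float, s_dnmt: float) -> float:
    """Predicted survival of the alkB mutant expressing DNMT if the two
    effects acted independently (product of the single-effect survivals)."""
    return s_alkb * s_dnmt


def non_additivity_ratio(observed: float, s_alkb: float, s_dnmt: float) -> float:
    """Observed / expected survival; values below 1 mean the combined strain
    is more sensitive than the independent-action prediction."""
    return observed / expected_double_survival(s_alkb, s_dnmt)


# Hypothetical survival fractions (relative to untreated cells) at two MMS doses.
survival = {
    "low MMS": {"alkB": 0.8, "DNMT": 0.9, "alkB+DNMT": 0.18},
    "high MMS": {"alkB": 0.1, "DNMT": 0.5, "alkB+DNMT": 0.04},
}

ratios = {
    dose: non_additivity_ratio(s["alkB+DNMT"], s["alkB"], s["DNMT"])
    for dose, s in survival.items()
}

for dose, ratio in ratios.items():
    print(f"{dose}: observed/expected = {ratio:.2f}")
```

      In this toy example the ratio is further below 1 at the low dose than at the high dose, which is the pattern one would expect if the effect saturates at high MMS concentrations.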

      (3) The substantial transcriptional changes induced by DNMT expression (Supplemental Figure 4) are a cause for concern and highlight that the ectopic introduction of methylation into a complex system is potentially more confounded than it may at first seem. Though the expression analysis shows bulk transcription properties, my concern is that the disruptive influence of methylation in a system not evolved with it adds not just consistent transcriptional changes but transcriptional heterogeneity between cells which could influence net survival in a stressed environment. In practice I don't think this can be controlled for, possibly quantified by single-cell RNA-seq but that is beyond the reasonable scope of this paper.

      We fully agree with the reviewer and, indeed, we are very interested in what is driving the transcriptional changes that we observed.  Work is currently underway in the lab to investigate this further but, as the reviewer suggests, is beyond the scope of this paper.  Importantly, we have used the transcriptional data to determine that the effect of DNMTs on ROS is unlikely to be due to failure of ROS-induced detoxification mechanisms by investigating the expression of oxyR regulated genes.  Nevertheless we have explicitly mentioned the concern raised by the reviewer in the revised manuscript as follows:

      “The substantial transcriptional responses could potentially affect how individual cells respond to genotoxic stress and thus could be contributing to some of the excess sensitivity to MMS and H2O2 in cells expressing DNMTs. However, the induction of oxyR regulated genes such as catalase was unaffected by 5mC (Supplementary Figure 4B).  Thus, the increased sensitivity to H2O2 is unlikely to be caused by failure of detoxification gene induction by DNMT expression.”

      (4) Figure 4 represents a striking result. From its current presentation it could be inferred that DNMTs are actively promoting ROS generation from H2O2 and also to a lesser extent in the absence of exogenous H2O2. That would be very surprising and a major finding with far-reaching implications. It would need to be further validated, for example by in vitro reconstitution of the reaction and monitoring ROS production. Rather, I think the authors are proposing that some currently undefined, indirect consequence of DNMT activity promotes ROS generation, especially when exogenous H2O2 is available. It would help if this were clarified.

      We thank the reviewer for picking this up.  In the discussion, we raise two possible explanations for why DNMT (even without H2O2) increases the ROS levels.  One idea is direct activity of DNMT, and one is through the product of DNMT activity (5mC) acting as a platform to generate more ROS from endogenous or exogenous sources.  Whilst we attempted to measure ROS from mSSSI activity in vitro, this experiment gave inconsistent results and therefore we cannot distinguish between these two possibilities.  However, we argued that direct activity is less likely, exactly as the reviewer points out.  We have clarified our discussion in the revised version, rewriting the entire section titled “Oxidative stress as a new source of DNA damage induction by DNMT expression” to more clearly set out these possibilities.

      Reviewer #2 (Public review):

      5-methylcytosine (5mC) is a key epigenetic mark in DNA and plays a crucial role in regulating gene expression in many eukaryotes including humans. The DNA methyltransferases (DNMTs) that establish and maintain 5mC, are conserved in many species across eukaryotes, including animals, plants, and fungi, mainly in a CpG context. Interestingly, 5mC levels and distributions are quite variable across phylogenies with some species even appearing to have no such DNA methylation.

      This interesting and well-written paper discusses the continuation of some of the authors' work published several years ago. In that previous paper, the laboratory demonstrated that DNA methylation pathways coevolved with DNA repair mechanisms, specifically with the alkylation repair system. Specifically, they discovered that DNMTs can introduce alkylation damage into DNA, specifically in the form of 3-methylcytosine (3mC). (This appears to be an error in the DNMT enzymatic mechanism where the generation 3mC as opposed to its preferred product 5-methylcytosine (5mC), is caused by the flipped target cytosine binding to the active site pocket of the DNMT in an inverted orientation.) The presence of 3mC is potentially toxic and can cause replication stress, which this paper suggests may explain the loss of DNA methylation in different species. They further showed that the ALKB2 enzyme plays a crucial role in repairing this alkylation damage, further emphasizing the link between DNA methylation and DNA repair.

      The co-evolution of DNMTs with DNA repair mechanisms suggests there can be distinct advantages and disadvantages of DNA methylation to different species which might depend on their environmental niche. In environments that expose species to high levels of DNA damage, high levels of 5mC in their genome may be disadvantageous. This present paper sets out to examine the sensitivity of an organism to genotoxic stresses such as alkylation and oxidation agents as the consequence of DNMT activity. Since such a study in eukaryotes would be complicated by DNA methylation controlling gene regulation, these authors cleverly utilize Escherichia coli (E. coli) and incorporate into it the DNMTs from other bacteria that methylate the cytosines of DNA in a CpG context like that observed in eukaryotes; the active sites of these enzymes are very similar to eukaryotic DNMTs and basically utilize the same catalytic mechanism (also this strain of E. coli does not specifically degrade this methylated DNA).

      The experiments in this paper more than adequately show that E. coli expressing these DNMTs (compared to the same strain without the DNMTs) do indeed show increased sensitivity to alkylating agents and this sensitivity was even greater than expected when a DNA repair mechanism was inactivated. Moreover, they show that E. coli expressing this DNMT is more sensitive to oxidizing agents such as H2O2 and has exacerbated sensitivity when a DNA repair glycosylase is inactivated. Both propensities suggest that DNMT activity itself may generate additional genotoxic stress. Intrigued that DNMT expression itself might induce sensitivity to oxidative stress, the experimenters used a fluorescent sensor to show that H2O2 induced reactive oxygen species (ROS) are markedly enhanced with DNMT expression. Importantly, they show that DNMT expression alone gave rise to increased ROS amounts and that H2O2 addition and DNMT expression together have a greater effect than the linear combination of the two separately. They also carefully checked that the increased sensitivity to H2O2 was not potentially caused by some effect on gene expression of detoxification genes by DNMT expression and activity. Finally, by using mass spectroscopy, they show that DNMT expression led to production of the 5mC oxidation derivatives 5-hydroxymethylcytosine (5hmC) and 5-formylcytosine (5fC) in DNA. 5fC is a substrate for base excision repair while 5hmC is not; more 5fC was observed. Introduction of non-bacterial enzymes that produce 5hmC and 5fC into the DNMT expressing bacteria again showed a greater sensitivity than expected. Remarkably, in their assay with addition of H2O2, bacteria showed no growth with this dual expression of DNMT and these enzymes.

      Overall, the authors conduct well thought-out and simple experiments to show that a disadvantageous consequence of DNMT expression leading to 5mC in DNA is increased sensitivity to oxidative stress as well as alkylating agents.

      Again, the paper is well-written and organized. The hypotheses are well-examined by simple experiments. The results are interesting and can impact many scientific areas such as our understanding of evolutionary pressures on an organism by environment to impacting our understanding about how environment of a malignant cell in the human body may lead to cancer.

      We thank the reviewer for their response to our study, and value the time taken to produce a public review that will aid readers in understanding the key results of our study. 

      Reviewer #3 (Public review):

      Summary:

      Krwawicz et al., present evidence that expression of DNMTs in E. coli results in (1) introduction of alkylation damage that is repaired by AlkB; (2) confers hypersensitivity to alkylating agents such as MMS (and exacerbated by loss of AlkB); (3) confers hypersensitivity to oxidative stress (H2O2 exposure); (4) results in a modest increase in ROS in the absence of exogenous H2O2 exposure; and (5) results in the production of oxidation products of 5mC, namely 5hmC and 5fC, leading to cellular toxicity. The findings reported here have interesting implications for the concept that such genotoxic and potentially mutagenic consequences of DNMT expression (resulting in 5mC) could be selectively disadvantageous for certain organisms. The other aspect of this work which is important for understanding the biological endpoints of genotoxic stress is the notion that DNA damage per se somehow induces elevated levels of ROS.

      Strengths:

      The manuscript is well-written, and the experiments have been carefully executed providing data that support the authors' proposed model presented in Fig. 7 (Discussion, sources of DNA damage due to DNMT expression).

      Weaknesses:

      (1) The authors have established an informative system relying on expression of DNMTs to gauge the effects of such expression and subsequent induction of 3mC and 5mC on cell survival and sensitivity to an alkylating agent (MMS) and exogenous oxidative stress (H2O2 exposure). The authors state (p4) that Fig. 2 shows that "Cells expressing either M.SssI or M.MpeI showed increased sensitivity to MMS treatment compared to WT C2523, supporting the conclusion that the expression of DNMTs increased the levels of alkylation damage." This is a confusing statement and requires revision as Fig. 2 does ALL cells shown in Fig. 2 are expressing DNMTs and have been treated with MMS. It is the absence of AlkB and the expression of DNMTs that causes the MMS sensitivity.

      We thank the reviewer for this and agree that this needs to be clarified with regards to the figure presented and will do so in the revised manuscript. The key comparison is between the active and inactive mSSSI which shows increased sensitivity when active methyltransferases are expressed.  We have clarified this in the revised version of the manuscript as follows:

      “Cells expressing either M.SssI or M.MpeI showed increased sensitivity to MMS treatment compared to cells expressing inactive M.SssI”

      (2) It would be important to know whether the increased sensitivity (toxicity) to DNMT expression and MMS is also accompanied by substantial increases in mutagenicity. The authors should explain in the text why mutation frequencies were not also measured in these experiments.

      This is an important point because it is not immediately obvious that increased sensitivity would be associated with increased mutagenicity (if, for example, 3mC was never a cause of inaccurate DNA repair even in the absence of AlkB).  We have now added a Rif resistance assay which demonstrates increased mutagenesis in the presence of DNMT, and that this is exacerbated by loss of AlkB. This is now added as Supplemental Figure 2 and described in the manuscript as follows:

      “One potential consequence of DNMT activity in inducing DNA damage might be increased mutagenesis.  To test this we performed a rifampicin resistance mutagenesis assay, in the absence of MMS, to determine whether DNMT-induced damage was sufficient to increase the mutation rate.  Mutation rate was increased by DNMT expression (p = 1.6e-12; two-way ANOVA; Supplemental Figure 2) and by alkB mutation (p < 1e-16; two-way ANOVA) separately.  Moreover, there was a significant interaction such that combined alkB mutation and DNMT expression led to a further increased mutation rate compared to the expectation from alkB mutation and DNMT expression separately (p = 7.9e-10; Supplemental Figure 2).  Importantly, DNMT induction alone would be expected to lead to increased mutations due to cytosine deamination (Sarkies, 2022a); however, there is a synergistic effect on mutations when this is combined with loss of AlkB function in alkB mutants. This is consistent with 3mC induction by DNMTs which is repaired by AlkB in WT cells but leads to mutations in alkB mutant cells.”
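
      The notion of an interaction between DNMT expression and alkB mutation in a 2x2 design can be illustrated with a simple interaction contrast. This is a hedged sketch, not the authors' analysis: it computes only the point estimate of the interaction (not the ANOVA test itself), the mutation rates are invented, and the choice between the raw and the log scale is an assumption that the response does not specify.

```python
import math

# Minimal sketch of a 2x2 interaction contrast.
# The mutation rates are hypothetical and NOT taken from the study.

def interaction_contrast(wt: float, dnmt: float, alkb: float, double: float,
                         log_scale: bool = False) -> float:
    """Effect of DNMT in the alkB background minus effect of DNMT in the WT
    background. Zero means no interaction on the chosen scale; a positive
    value means the combined effect exceeds the sum of the separate effects."""
    transform = math.log if log_scale else (lambda value: value)
    effect_in_alkb = transform(double) - transform(alkb)
    effect_in_wt = transform(dnmt) - transform(wt)
    return effect_in_alkb - effect_in_wt


# Hypothetical mutation rates, in arbitrary units relative to WT.
rates = {"WT": 1.0, "DNMT": 2.0, "alkB": 3.0, "alkB+DNMT": 12.0}

additive = interaction_contrast(
    rates["WT"], rates["DNMT"], rates["alkB"], rates["alkB+DNMT"]
)
multiplicative = interaction_contrast(
    rates["WT"], rates["DNMT"], rates["alkB"], rates["alkB+DNMT"], log_scale=True
)

print(f"interaction on raw scale: {additive:.2f}")
print(f"interaction on log scale: {multiplicative:.2f}")
```

      Checking both scales is a cautious design choice: an effect that looks synergistic on the raw scale can vanish on the log scale, so a contrast that is positive on both, as in this toy example, is the stronger indication of synergy.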

      (3) Materials and Methods. ROS production monitoring. The "Total Reactive Oxygen Species (ROS) Assay Kit" has not been adequately described. Who is the Vendor? What is the nature of the ROS probes employed in this assay? Which specific ROS correspond to "total ROS"?

      The ROS measurement was with a kit from ThermoFisher: https://www.thermofisher.com/order/catalog/product/88-5930-74.  The probe is DCFH-DA.  This is a general ROS sensor that is oxidised by a large number of cellular reactive oxygen species hence we cannot attribute the signal to a single species.  Use of a technique with the potential to more precisely identify the species involved is something we plan to do in future, but is beyond what we can do as part of this study.  We have added a comment as to the specificity of the ROS sensor in the revised version as follows:

      “The ROS detection reagent in this system is DCFH-DA, a generalised ROS sensor that is not specific to any particular ROS molecule.”     

      (4) The demonstration (Fig. 4) that DNMT expression results in elevated ROS and its further synergistic increase when cells are also exposed to H2O2 is the basis for the authors' discussion of DNA damage-induced increases in cellular ROS. S. cerevisiae does not possess DNMTs/5mC, yet exposure to MMS also results in substantial increases in intracellular ROS (Rowe et al, (2008) Free Rad. Biol. Med. 45:1167-1177. PMC2643028). The authors should be aware of previous studies that have linked DNA damage to intracellular increases in ROS in other organisms and should comment on this in the text.

      We thank the reviewer for this point.  We note that the increased ROS that we observed occur in the presence of DNMTs alone and in the presence of H2O2, not in the presence of MMS; however, the point that DNA damage in general can promote increased ROS in some circumstances is well taken.  We have included a comment on this in the revised version as follows:

      “We believe this is a plausible mechanism to explain both increased ROS and increased sensitivity to oxidative stress when DNMT is expressed.  However, other explanations are possible, and it is notable that DNA damaging agents such as MMS can lead to ROS generation (Rowe et al., 2008).  A more detailed chemical and kinetic study of the ROS formation in DNMT-expressing cells would be needed to resolve these questions.”

    1. Author Response:

      Reviewer #1 (Public review):

      The reviewer raised two main concerns: the potential confound between XOR and motor coding, and the relationship between neural coding and behaviour.

      First, we appreciate the consideration of the collinearity between the XOR and motor dimensions. We fully agree that this confound may have contributed to the observed increase in XOR decoding over the course of learning. In response, we will merge the XOR and motor features in the main figures, tone down our interpretation of the XOR learning effect, and clarify how motor signals may obscure or mimic XOR-related changes. As the reviewer noted, this confound does not affect the colour/context cross-generalisation analyses, which remain central to our conclusions regarding flexible and prospective working memory coding.

      We also thank the reviewer for the suggestion to examine the behavioural relevance of the neural representations more directly. We agree entirely, and will incorporate new analyses relating coding strength to reaction times, as well as reflect on the implications of these results in the revised Discussion.

      Reviewer #2 (Public Review):

      The reviewer rightly noted that our manuscript overlooks the established concept of retrospective/prospective coding in working memory, giving the impression that we attempted to reframe it using newer machine learning terminology. We thank the reviewer for catching this important omission. Our intention was not to override this well-established conceptual framework with a newer machine learning term, but rather to build upon it. In fact, prospective coding and the idea of working memory as a resource for computation are closely related—one helps define the functions (prospective and retrospective coding) and the other explains the computational rationale behind applying them. For example, prospective codes specify what is being stored (future-relevant information), while the “memory-as-computation” view addresses why such representation is useful: to enable temporal decomposition of complex tasks and reduce computational load at decision time. We will revise the relevant paragraphs to explicitly reference this cognitive framework and clarify how it relates to — and is complemented by — the newer computational perspective we introduce. Thank you again for highlighting this.

      Reviewer 2 also argues that the evidence presented does not support dimensionality reduction, noting that participants likely transition from processing the sensory cue (e.g., blue) to a rule-based representation (e.g., context 1 vs context 2) later in the trial, and that this remapping does not inherently require dimensionality reduction. We agree that our results are consistent with such a transformation into an abstract rule representation during the delay period, as supported by the observed cross-colour context generalisation (Figure 3b), and that this process does not require dimensionality reduction per se. However, we would like to clarify that a shared decision boundary between two colour pairs (e.g., context 1 vs context 2) can manifest in two types of neural geometries. In one case, which is the one observed in our data, the irrelevant colour dimension is not maintained after the presentation period, such that blue and pink are both maintained as context 1 but variance along the blue vs pink dimension is not represented in neural activity. In the other case, it is possible for the same abstract rule (context 1) to be constructed while maintaining the sensory representation of colour (e.g., “blue” or “pink”), resulting in a change in representational geometry without a reduction in dimensionality. Our data do not support the latter scenario: irrelevant colour information is not maintained in the delay period, suggesting that the abstraction is accompanied by a loss of variance along irrelevant sensory dimensions, i.e., a form of dimensionality reduction. We will clarify this point in the revised manuscript and include a new analysis that explicitly tests whether shattering dimensionality changes as a function of trial time.
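
      The logic of cross-colour context generalisation can be sketched with a toy decoder. This is purely illustrative and rests on assumptions not in the response: the two-feature "neural" patterns, the nearest-centroid classifier, and all names below are hypothetical stand-ins, not the decoding pipeline used in the study. A context decoder is fitted on trials from one colour pair and evaluated on trials from the other pair; high accuracy indicates a context code that is shared across colours.

```python
from statistics import mean

# Toy sketch of cross-condition generalisation with a nearest-centroid decoder.
# Feature vectors are (context axis, colour axis); all values are invented.

def centroid(points):
    """Mean pattern across trials, computed feature by feature."""
    return tuple(mean(axis) for axis in zip(*points))


def nearest_label(pattern, centroids):
    """Label of the centroid closest to the pattern (squared Euclidean distance)."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(pattern, centroids[label]))


def cross_generalisation_accuracy(train, test):
    """Fit context centroids on one colour pair, then score on the other pair."""
    centroids = {label: centroid(points) for label, points in train.items()}
    trials = [(pattern, label) for label, points in test.items() for pattern in points]
    correct = sum(nearest_label(pattern, centroids) == label for pattern, label in trials)
    return correct / len(trials)


# Delay-period patterns in which the colour axis carries almost no variance.
train_pair_a = {
    "context 1": [(1.0, 0.1), (0.9, -0.1)],
    "context 2": [(-1.0, 0.1), (-0.9, -0.1)],
}
test_pair_b = {
    "context 1": [(1.1, 0.0), (0.8, 0.1)],
    "context 2": [(-1.1, 0.0), (-0.8, -0.1)],
}

accuracy = cross_generalisation_accuracy(train_pair_a, test_pair_b)
print(f"cross-colour context decoding accuracy: {accuracy:.2f}")
```

      Note that high cross-generalisation alone would not distinguish the two geometries described above; that distinction additionally requires showing that colour itself can no longer be decoded during the delay.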

      The reviewer also raised concerns about inconsistencies in our terminology, particularly the use of “colour pair” and “irrelevant colour.” We agree with the reviewer that the term “colour pair” was a conceptual device rather than a literal aspect of the task, and we will revise the text to make this clear. We recognise that our wording around “irrelevant colour” might have caused confusion. We did not mean “colour” in the broad sense of all colour processing, but rather referred to specific colour dimensions that are not relevant for task performance—for example, when context 1 is cued by both pink and blue, the dimension carrying variance between blue and pink can be considered irrelevant. We will clarify this point in the revised manuscript, using the reviewer’s suggestion to incorporate the description we had already provided in the Methods section.

      While we respectfully disagree with the reviewer’s interpretation of our findings—particularly regarding the absence of dimensionality reduction, which they associate with the failure of the direct test of cross-colour context decoding (see Fig. 3b, which shows a significant effect)—we appreciate the opportunity to clarify our position and will revise the manuscript to ensure our reasoning is as transparent and rigorous as possible.

      Reviewer #3 (Public review):

      The reviewer values the study’s demonstration that learning promotes abstraction in task representations, but raises concerns about the lack of direct evidence linking delay-period activity to specific working memory mechanisms and the ambiguous dissociation between XOR and motor representations. We thank the reviewer for their careful reading of the manuscript and will address both concerns in the revised version. As mentioned in our response to Reviewer #1, we will merge the motor and XOR analyses, tone down our interpretations, and clarify why these signals are entangled. Additionally, we will link delay-period neural activity to behavioural performance to establish a more direct connection to working memory processes. Notably, in Figure 4f, we show that early in learning, participants who exhibit stronger cross-generalisation of context during the delay are also more likely to exhibit decreased shattering dimensionality at decision time — providing an early link between the preparation of a contextual signal and the subsequent reduction in computational complexity at decision time. We will include additional analyses to further strengthen this link in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      In Figure 1, it is very difficult to identify where CySCs end and GSCs begin without using a cell surface marker for these different cell types. In addition, the methods for quantifying the mitochondrial distribution in GSCs vs. CySCs are very much unclear and appear to rely on colocalization with molecular markers that are not in the same cellular compartment (Tj-nuclear vs. Vasa-perinuclear and cytoplasmic), so the reader has no way to determine the validity of the mitochondrial distribution. Similarly, the labelling with gstD1-GFP is also very much unclear - I see little to no GFP signal in either GSCs or CySCs in panels 1GK. Lastly, while the expression of SOD in CySCs does increase the gstD1-GFP signal in CySCs, the effects on GSCs claimed by the authors are not apparent.

      We appreciate the reviewer’s detailed feedback on Figure 1 and the concerns raised regarding identifying CySCs and GSCs, as well as the methods used for quantifying mitochondrial distribution and gstD1-GFP labeling. Below, we address each point and describe the revisions made to improve clarity and rigor.

      Distinguishing CySCs and GSCs and Mitochondrial Distribution in GSCs vs. CySCs in Figure 1:

      We acknowledge the difficulty in distinguishing CySCs from GSCs without the use of additional cell surface markers. To improve clarity, we have now included the membrane marker Discs large (Dlg) in our revised Figure 1 and S1 to delineate cell boundaries more clearly. Additionally, we provide higher-magnification images to indicate the mitochondria in CySCs and GSCs. We also agree that commenting on mitochondrial distribution might be far-fetched. In the revised manuscript, we have limited our analysis to mitochondrial shape, which was found to differ between GSCs and CySCs (Fig. 1D, F, G, and S1B). We have clarified our quantification methods in the revised Methods section, providing details on the image processing and analysis pipeline used to assess mitochondrial distribution.

      Clarity of gstD1-GFP Labelling:

      We recognize the reviewer’s concern regarding the weak GFP signal in these panels. To improve visualization, we have included a fresh set of images with optimized contrast and present additional monochrome images with higher exposure settings to better illustrate gstD1-GFP expression (Figure 1L, 1Q, and S1C’’’-D’’’). Additionally, we have demarcated the cell boundaries using Dlg along with individual labelling of Vasa+ and Tj+ cells. Due to technical difficulties associated with image acquisition, we could not co-stain Vasa, Tj and Dlg together. Therefore, we quantified the gstD1-GFP intensity separately for GSCs and CySCs under similar acquisition conditions (Figure 1R).

      Effects of SOD depletion on GSCs:

      While our initial analysis suggested changes in gstD1-GFP expression in GSCs upon Sod1 depletion in CySCs, we acknowledge that the effects may not be as apparent in the provided images. In response, we have expanded our quantification, included a statistical analysis of gstD1-GFP intensity specifically in GSCs and CySCs (Figure 1S), and added more representative images in the revised figure panels (Figure S1C-D’’’) to support our claims.

      In Figure 2, while the cell composition of the niche region does appear to be different from controls when SOD1 is knocked down in the CySCs, at least in the example images shown in Figures 2A and B, how cell type is quantified in figures 2E-G is very much unclear in the figure and methods. Are these counts of cells contacting the niche? If so, how was that defined? Or were additional regions away from the niche also counted and, if so, how were these regions defined?

      Thank you for your comment regarding the quantification of cell types in Figures 2E-G. We counted all cells that were Tj-positive and Zfh1-positive in each testis, while for GSCs, only those in direct contact with the hub were included. This clarification has been incorporated into the revised figure legend and methods (line no. 400-407). We have now provided a clearer description in the text to improve transparency in our analysis.

      In Figure 3, it is quite interesting that there is an increase in Eya+, differentiating cyst cells in SOD1 knockdown animals, and that these Eya+ cells appear closer to the niche than in controls. However, this seems at odds with the proliferation data presented in Figure 2, since Eya+ somatic cells do not normally divide at all. Are they suggesting that now differentiating cyst cells are proliferative? In addition, it is important for them to show example images of the changes in Socs36E and ptp61F expression.

      Thank you for your insightful observations. We acknowledge the apparent contradiction and appreciate the opportunity to clarify our interpretation.

      Regarding the increase in Eya<sup>+</sup> differentiating cyst cells in Sod1RNAi individuals and their proximity to the niche, we do not suggest that these differentiating cells are proliferative. Instead, we propose that the knockdown of Sod1 may alter the timing or regulation of cyst cell differentiation, leading to an accumulation of Eya<sup>+</sup> cells near the niche. To clarify this point, we have revised the manuscript (line no. 186-189) to emphasize that our proliferation data refer specifically to early-stage somatic cells, not Eya<sup>+</sup> differentiating cyst cells.

      We also appreciate the reviewer's request for example images illustrating the changes in Socs36E and Ptp61F expression. We could not obtain antibodies specific to Socs36E and Ptp61F. Hence, we had to rely on measurements obtained by real-time PCR from the tip region of the testis. We have clarified this in the figure legends (line 700).

      Overall, the various changes in signaling are quite puzzling: while Jak/Stat signaling from the niche is reduced, hh signaling appears to be increased. Similarly, while the authors conclude that premature differentiation occurs close to the niche, EGF signaling, which occurs from germ cells to cyst cells during differentiation, is decreased. Many times, these changes are contradictory, and the authors do not provide a suitable explanation to resolve these contradictions.

      We appreciate the reviewer’s thoughtful feedback on the signaling changes described in our study. We acknowledge that the observed alterations in Jak/Stat, Hedgehog (Hh), and EGF signaling may appear contradictory at first glance. However, our data suggest that these changes reflect a complex interplay between different signaling pathways that regulate cyst cell behavior in response to specific genetic perturbation.

      Regarding Jak/Stat and Hh signaling, while Jak/Stat activity is reduced in the niche, the increase in Hh signaling may reflect a compensatory mechanism or a context-dependent response of cyst cells to reduced Jak/Stat input. Prior studies have suggested that Hh signaling can function in parallel and independently of Jak/Stat signaling (PMID: 23175633) and our findings align with this possibility. 

      The reduction in EGFR signaling in this context appears to contradict the existing literature. One possible explanation is that the altered GSC-CySC balance and loss of contact in Tj>Sod1i testes lead to an insufficient ligand response, thereby failing to activate EGFR signaling (line no. 222-224, 313-318).

      Reviewer #2 (Public review):

      We sincerely appreciate the reviewer’s detailed feedback, which has helped refine our manuscript. In this study we have focussed on the role of ROS generated due to manipulation of Sod1 in the interplay between GSC and CySCs. In this regard, we have conducted additional experiments and incorporated quantitative data into the revised manuscript. Additionally, we have refined the text and provided further context to enhance the clarity. Key revisions include:

      (1) Clarification of Quantification Methods – We have refined intensity measurements by incorporating a membrane marker (Dlg) to better delineate cell boundaries and have normalized Ptc and Ci expression per cell to improve clarity.

      (2) Cell-Specific ROS Measurement – We separately measured ROS in germ cells and cyst cells and performed independent Sod1 depletion in GSCs to determine its direct effects.

      (3) Mitochondrial Analysis – We revised our approach, focusing on mitochondrial shape rather than asymmetric distribution, and removed overreaching claims.

      (4) Proliferation Analysis – We reanalyzed FUCCI data by normalizing to total cell count, supporting the conclusion that increased proliferation, rather than differentiation delay, underlies the observed phenotype.

      (5) E-Cad Quantification – We specifically analyzed E-Cad levels at the GSC-hub interface to strengthen conclusions on GSC attachment.

      (6) JAK/STAT Signaling – While we could not obtain a STAT92E antibody, we clarified the spatial limitations of our current analysis and revised the text accordingly.

      (7) Rescue Experiments and Gal4 Titration Control – We performed additional control experiments to confirm that observed effects are not due to Gal4 dilution.

      (8) Image Quality and Terminology Corrections – We enhanced figure resolution, corrected terminology (e.g., "cystic" to "cyst"), and revised ambiguous phrasing for clarity and accuracy.

      As suggested, we have also changed the manuscript title to better align with our results:

      Previous Manuscript Title: Non-autonomous cell redox-pairs dictate niche homeostasis in multi-lineage stem populations

      Updated Manuscript Title: Superoxide Dismutases maintain niche homeostasis in stem cell populations

      Specific responses to the reviewer’s comments:

      While the decrease in pERK in CySCs is clear from the image and matched in the quantification, the increase in cyst cells is not apparent from the fire LUT used. The change in fluorescence intensity therefore may be that more cells have active ERK, rather than an increase per cell (similar arguments apply to the quantifications for p4E-BP or Ptc). Therefore, it is hard to know whether Sod1 knockdown results in increased or decreased signaling in individual cells.

      Thank you for your insightful comment. To clarify, the Fire LUT images show only pERK intensity, not cyst cell number. In our context, although there are more cells, the overall pERK intensity is lower, eliminating any ambiguity about whether the change is occurring per cell or due to an increased number of circulating cells. Moreover, we have normalized Ptc and Ci expression intensity per cell to enhance clarity and ensure an accurate interpretation of signaling changes.

      There are several places in which the authors could strengthen their manuscript by explaining the methods more clearly. For example, it is unclear how the intensity graphs in Figure 1Q are obtained. The curves appear smoothed and therefore unlikely to be from individual samples, but this is not clearly explained. However, this quantification method is clearly not helpful, as it shows the overlap between somatic and germline markers, suggesting it cannot accurately distinguish between the two cell types. Additionally, using a nuclear marker (Tj) for the cyst cells and cytoplasmic marker (Vasa) for the germ cells risks being misleading, as one would not expect much overlap between cytoplasmic gstD1-GFP and nuclear Tj. Also related to the methods, it is unclear how Vasa+ cells at the hub were counted. The methods suggest this was from a single plane, but this runs the risk of being arbitrary since GSCs can be distributed around the hub in 3D. (As a note, the label on the graph "Vasa+ cells" is misleading, as there are many more cells that are Vasa-positive than the ones counted.)

      We appreciate the reviewer’s careful evaluation of our manuscript and their insightful suggestions for improving the clarity of our methods. Below, we address each concern raised and describe the revisions made accordingly.

      Clarification of Intensity Graphs in Figure 1Q

      We have removed this graph, as we recognize that the markers previously used were not appropriate for distinguishing the different cell types. To address this concern, we have revised the text and now include the membrane marker discs-large (Dlg) in our revised Figures 1 and S1 to delineate cell boundaries more clearly. Owing to technical difficulties in image acquisition, we could not co-stain Vasa, Tj and Dlg together. Therefore, we quantified the gstD1-GFP intensity separately for GSCs and CySCs under similar acquisition conditions (Figure 1R).

      Counting of Vasa<sup>+</sup> Cells at the Hub

      We appreciate the reviewer’s concern regarding our method for counting Vasa+ cells. In our original analysis, we scored as GSCs the Vasa-positive cells that were in direct contact with the hub. To account for the three-dimensional arrangement of GSCs, we used the Cell Counter plugin of Fiji and counted across different focal planes to ensure that all hub-associated cells were considered. For better clarity on cell distribution around the hub, we have presented a single focal plane image through the middle of the hub zone. To enhance transparency, we have now provided a more detailed explanation of our counting approach in the Methods section (line no. 400-403).

      We agree that the label "Vasa+ cells" may be misleading, as many cells express Vasa beyond the specific subset being counted. To address this, we have changed the label to "GSCs" to reflect the subset analyzed more accurately.

      The crucial experiment for this manuscript is presented in Figures 1 G-S, arguing that Sod1 knockdown with Tj-Gal4 increases gstD1-GFP expression in germ cells. This needs strengthening as the current quantifications are not convincing and appear to show an overlap between Tj (a nuclear cyst cell marker) and Vasa (a cytoplasmic germ cell marker). Labeling cell outlines would help, or alternatively, labeling different cell types genetically can be used to determine whether the expression is increased specifically within that cell type. Similarly, the measurement of ROS shown in the supplemental data should be conducted in a cell-specific manner. To clearly make the case that Sod1 knockdown in cyst cells is impacting ROS in the germline, it would be important to manipulate germ cell ROS independently. Without this, it will be difficult to prove that any effects observed are a result of increased ROS in the germline rather than indirect effects on the germline of altered cyst cell behaviour. 

      We appreciate the reviewer’s insightful feedback regarding the specificity of Sod1 knockdown effects in germ cells and the need for clearer quantification in Figures 1G–S. Below, we address each concern and outline the modifications made:

      Clarification of Cell Type-Specific Expression:

      We acknowledge the overlap observed between Tj (nuclear cyst cell marker) and Vasa (cytoplasmic germ cell marker) in the presented images. To strengthen our claim that gstD1-GFP expression increases specifically in germ cells upon Sod1 knockdown, we have now labelled cell outlines using the membrane marker discs-large (Dlg) to better distinguish cell boundaries, along with individual labelling of Vasa<sup>+</sup> and Tj<sup>+</sup> cells. Owing to technical difficulties in image acquisition, we could not co-stain Vasa, Tj and Dlg together.

      Cell-Specific Measurement of ROS:

      We agree that a cell-type-specific ROS measurement is critical to establishing a direct effect on germ cells. To address this, we have now performed ROS measurements separately in germ cells and cyst cells under similar acquisition conditions. These data are now included in the revised manuscript (Figure 1R). Similarly, upon CySC-specific Sod1 depletion, we measured gstD1-GFP intensity, which was enhanced in GSCs along with the expected increase in CySCs (Fig 1S). We have also independently manipulated ROS levels in GSCs (Nos Gal4> Sod1i) and observed that elevated ROS negatively impacts GSCs, reducing their number while having an insignificant effect on adjacent CySCs (Fig S2 E, F).

      Quantifications of mitochondrial localization in Figure 1 should include some adequate statistical method to evaluate whether the distribution is random or oriented towards the GSC/CySC interface. From the image provided (Figure 1B), it would appear that there are two clusters of mitochondria, on either side of a CySC nucleus, one cluster towards a GSC and one cluster away. Therefore evaluating bias would be important. Additional experiments will be necessary to support the statement that "Redox state of GSC is maintained by asymmetric distribution of CySC mitochondria". This would require manipulating mitochondrial distribution in CySCs.

      We appreciate the reviewer’s suggestion regarding the quantification of mitochondrial localization. We agree that commenting on mitochondrial distribution might be far-fetched. In the revised manuscript, we have demarcated the cell boundary and limited our analysis to mitochondrial shape, which was found to differ between GSCs and CySCs (Fig. 1, D, F, G and S1B). Mitochondrial shape was quantified based on mitochondrial area and circularity (Figure 1F and G). To prevent any misinterpretation, we have removed the statement, "Redox state of GSC is maintained by asymmetric distribution of CySC mitochondria."
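
      For readers unfamiliar with the circularity metric mentioned above, a common definition is the shape descriptor reported by ImageJ/Fiji. The response does not state which definition was used, so the formula below is an assumption offered for orientation only; $A$ and $P$ denote the area and perimeter of a segmented mitochondrial object.

```latex
% Circularity of a segmented object with area A and perimeter P
% (assumed definition; equals 1 for a perfect circle and approaches 0
% for increasingly elongated or branched shapes)
\[
  C = \frac{4\pi A}{P^{2}}, \qquad 0 < C \le 1 .
\]
```

      Under this definition, rounded (fragmented) mitochondria score close to 1, whereas tubular or networked mitochondria score markedly lower, which is why area and circularity together can separate the two morphologies.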

      One point raised by the authors is that the increase of somatic cell numbers is driven by accelerated proliferation, based on an increased number of cells in various stages of the cell cycle as assessed by the FUCCI reporter. However, there are more somatic cells in this genetic background, so it could be argued that the observed increase in different phases of the cell cycle is due to an increased number of cells. In order to argue for an increased proliferation rate, the number of cells in each phase should be divided by the total number of cells, expecting to see an increase in S and G2/M phases along with a decrease in G1. Otherwise, the simplest explanation is a block or delay in differentiation, meaning that more cells remain in the cell cycle.

      We appreciate the comment regarding the interpretation of our FUCCI reporter data. We acknowledge that the observed increase in the number of cells in various phases of the cell cycle could be influenced by the overall higher number of somatic cells in this genetic background.

      To address this concern, we have re-analyzed our FUCCI data by normalizing the number of cells in each phase to the total number of cells. We did not observe a significant shift in the proportion of cells in S and G2/M phases relative to G1. This suggests the presence of more proliferative cells, that is, fewer cells in G0 phase, rather than alterations in the timing of cell cycle progression. We are not convinced that there is a block in differentiation, because we see an enhanced accumulation of Eya+ cells near the niche. We have also supported our FUCCI data with pH3 staining, in which we found more pH3+ spots in the Sod1-depleted background. We have revised our manuscript accordingly (Figure 2I, K and S2U) to reflect this interpretation and appreciate the constructive feedback.
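
      The normalization the reviewer requested (cells in each phase divided by total cells) can be sketched as follows. This is a minimal illustration, not the analysis script used for the manuscript; the function name `phase_fractions` and all counts are hypothetical and chosen only to show how a larger cell population can yield unchanged phase proportions.

```python
def phase_fractions(counts):
    """Return the fraction of cells in each cell-cycle phase.

    counts: dict mapping a phase label (e.g. "G1") to the number of
    cells scored in that phase for one genotype.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cells were scored")
    return {phase: n / total for phase, n in counts.items()}


# Hypothetical counts for illustration only (not data from the study):
# the knockdown has twice as many cells in every phase.
control = {"G1": 10, "S": 5, "G2/M": 5}
knockdown = {"G1": 20, "S": 10, "G2/M": 10}

control_fractions = phase_fractions(control)
knockdown_fractions = phase_fractions(knockdown)

for phase in control:
    print(phase, control_fractions[phase], knockdown_fractions[phase])
```

      In this toy case the absolute counts double in every phase while the fractions are identical, which is the pattern that distinguishes "more cycling cells" from "a shift in cell-cycle phase distribution".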

      In Figure 3, the authors claim that knockdown of Sod1 in the soma decreases the attachment of GSCs to the hub, based on lower E-Cad levels compared to controls. Previous work has shown that in GSCs, E-Cad localizes to the Hub-GSC interface (PMID: 20622868). Therefore, the authors should quantify E-Cad staining at the interface between the germ cells and the niche.

      We appreciate the reviewer’s comment. As suggested, we have now quantified E-Cad staining specifically at the interface between the germ cells and the niche. Our analysis confirms that E-Cad levels are significantly reduced at this interface upon Sod1 knockdown in the soma compared to controls, supporting our conclusion that Sod1 depletion affects GSC attachment to the hub as well as the whole niche. The revised Figure 3M now includes these quantifications, and we have updated the figure legend and results section accordingly.

      The authors show decreased expression of the JAK/STAT targets socs36E and ptp61F, arguing that this could be a reason for decreased GSC adhesion to the hub. However, these data were obtained from whole testes and lacked spatial resolution, whereas a STAT92E staining in control and tj>Sod1 RNAi testes could easily prove this point. Indeed, previous work has shown that socs36E is expressed in the CySCs, not GSCs (PMID: 19797664), suggesting that any decrease in JAK/STAT may be autonomous to the CySCs.

      We appreciate the reviewer’s observation regarding the spatial resolution of our JAK/STAT target expression analysis. To improve accuracy, we collected only the tip of the testes and excluded the rest; however, we acknowledge that this approach may still obscure cell-specific changes. We attempted to procure the STAT92E antibody but, despite multiple inquiries, did not receive a positive response. While we agree that STAT92E staining would have strengthened our findings, we are currently unable to perform this experiment. Nevertheless, our observations align with prior work indicating that socs36E is predominantly expressed in CySCs (PMID: 19797664). We have revised the manuscript text accordingly to clarify this limitation.

      Additional considerations should be taken regarding the rescue experiments where PI3KDN and Hh RNAi are expressed in a Tj>Sod1 RNAi background. To rule out that any rescue can be attributed to titration of the Gal4 protein when an additional UAS sequence is present, a titration control would be useful. These pathways are not described accurately since Insulin signaling is necessary for the differentiation of somatic cells (not maintenance as written in the text), and its inhibition has been shown to increase the number of undifferentiated somatic cells (PMID:27633989). As far as Hh is concerned, the expression of this molecule is restricted to the niche. It would be important to establish whether the expression is altered in this case, especially as the authors rescue the Sod1 knockdown by also knocking down Hh. One possibility that the authors need to rule out is that some of the effects they observe are due to the knockdown of Sod1 (and/or Hh) in the hub as Tj-Gal4 is expressed in the hub as well as the CySCs (PMID:27546574).

      We appreciate the reviewer’s insightful comments and suggestions. Below, we address each concern and describe the steps we have taken to incorporate the necessary modifications in our revised manuscript.

      Titration Control for Rescue Experiments  

      We acknowledge the reviewer’s concern regarding potential Gal4 titration effects when introducing additional UAS constructs. To address this, we conducted a control experiment quantifying SOD1 levels in control, Tj > Sod1 RNAi, and Tj > Sod1 RNAi, UAS hhRNAi backgrounds using real-time PCR (Figure S4 M). The Sod1 levels in single and double UAS copy conditions were comparable, indicating that Gal4 titration does not significantly affect the results.

      Clarification of Insulin Signaling Role 

      We appreciate the reviewer’s insight regarding the involvement of insulin signaling in this context. Initially, we included data on PI3K/TOR as we found it intriguing. However, as the data didn’t add much to the overall observations, we have removed them to ensure clarity and prevent any potential confusion.

      Hh Expression and Niche Consideration 

      We recognize the importance of evaluating whether Hedgehog (Hh) expression is altered in the Sod1 RNAi background. We have quantified hh transcript levels by qRT-PCR (Figure S4C).

      Potential Effects of Sod1 and Hh Knockdown in the Hub 

      We acknowledge the concern that Tj-Gal4 is expressed in both the hub and CySCs, potentially affecting hub function upon Sod1 and Hh knockdown. To address this, we have included additional data using the CySC-specific driver C587-Gal4 to distinguish CySC-intrinsic effects from potential hub contributions. Our results show that while the phenotypic changes are consistent across both drivers, the effects are significantly stronger with Tj-Gal4, suggesting a role of the hub in this process. These findings have been incorporated into the revised manuscript (Fig S1G-H, M-N).

      In general, the GSCs (and other aspects) are difficult to see in the images; enlargements or higher-resolution images should be provided. Additionally, the manuscript contains several mistakes or inaccuracies (examples include referring to ROS having "evolved" in the abstract when it is cells that have evolved to use ROS, or the references to "cystic" cells when they are usually referred to as "cyst" cells, or that "CySCs also repress GSC differentiation by suppressing transcription of bag-of-marbles" when CySCs produce BMPs that lead to suppression of bam expression in the germline). These would need editing for both clarity and accuracy.

      We appreciate the reviewer’s insightful feedback and have made the necessary revisions to address the concerns raised.

      Image Clarity and Resolution: 

      We have provided higher-resolution images in several of the revised figures, which now offer better clarity for key observations.

      Clarification of Terminology and Accuracy:

      The phrase regarding ROS in the abstract has been revised to reflect that cells have evolved to utilize ROS, rather than ROS itself evolving (line no. 27).

      References to "cystic" cells have been corrected to "cyst" cells for consistency with standard terminology.

      The statement about CySCs repressing GSC differentiation has been revised for accuracy, clarifying that CySCs produce BMPs, which lead to the suppression of bam expression in the germline (line no. 84).

      We have carefully reviewed the manuscript for any additional inaccuracies or ambiguities to ensure clarity and precision. We appreciate the reviewer’s constructive comments, which have helped improve the manuscript.

      Reviewer #3 (Public review):

      In response to Reviewer 3’s comments, we would like to highlight the point that in the present study we have focussed on the interplay between CySC and GSC and have accordingly conducted our experiments. We did observe some changes in the hub and do not rule out the effect of hub cells in exacerbating some of our phenotypes. We have included additional controls to highlight the effect of CySC ROS. These points have been appropriately discussed in the manuscript. Key revisions include:  

      (1)  Data Clarity & Visualization: To improve mitochondrial lineage association, we incorporated a membrane marker (Dlg) in Figure 1, enhancing the distinction between CySCs and GSCs. Additionally, we refined gstD-GFP quantifications in individual cell types and provided high-resolution images.

      (2) ROS Transfer & Measurement: We revised our discussion to acknowledge indirect ROS transfer mechanisms and added separate ROS quantifications in GSCs and CySCs, confirming higher ROS levels in CySCs (Figure 1R).

      (3) Tj-Gal4 Specificity & Niche Characterization: Recognizing Tj-Gal4 expression in hub cells, we included C587-Gal4 as a CySC-specific driver, demonstrating that hub cells contribute partially to the phenotype (Figure S1G,H,M,N).

      (4) Signaling Pathway Validation: We optimized dpERK staining, included controls (Tj>EGFRi), and clarified limitations regarding MAPK signaling. Due to lethality, we could not perform an EGFR gain-of-function rescue. We also validated increased Hh signaling via qPCR and a Tj>UAS Ci control (Figure S4).

      (5) Conceptual & Terminological Refinements: We revised our discussion of BMP signaling, ROS gradients, and testis-specific terminology. All figures and labels now accurately represent GSC scoring (single Vasa⁺ cells in contact with the niche).

      (6) Figure & Methods Improvements: We enhanced image resolution, provided grayscale versions where needed, and expanded Materials & Methods to clarify experimental conditions.

      These revisions strengthen our conclusions and address the reviewer’s concerns, ensuring a more precise and transparent presentation of our findings. To align with the reviewer’s comments, we have changed the title of the manuscript to “Superoxide Dismutases maintain niche homeostasis in stem cell populations”.

      Specific responses to the reviewer’s comments: 

      (1) Data

      a.  Problems proving which mitochondria are associated with which lineage.

      We acknowledge the challenge of distinguishing CySCs from GSCs without additional cell surface markers. To enhance clarity, we have incorporated the membrane marker Discs-large (Dlg) in our revised Figure 1 to better delineate cell boundaries, providing a clearer depiction of mitochondrial distribution in GSCs and CySCs.

      b. There is no evidence that ROS diffuses from CySCs into GSCs.

      We acknowledge the reviewer’s concern. There are reports describing the diffusion of ROS across cells, on which we have included a few lines in the discussion (line no. 274-276). We understand that our previous quantifications showed ROS diffusion from CySCs to GSCs only indirectly. Therefore, in the revised manuscript, we have measured ROS separately in the two cell populations. We found that CySCs show a higher ROS profile than GSCs (Fig 1R).

      c. The changes in GST-GFP (redox readout) are possibly seen in differentiating germ cells (i.e., spermatogonia) but not in GSCs. This weakens their model that ROS in CySC is transferred to GSCs.

      Thank you for your observation. We acknowledge that the changes in gstD1-GFP (redox readout) are more prominent in differentiating germ cells. Differentiating cells are known to show a higher ROS profile than stem cells. Hence, as expected, the intensity of gstD1-GFP was lower in the stem cell zone than in the differentiating zone. Our manuscript focuses on the redox state among stem cell populations. Therefore, we have included better-quality images and measured the gstD1-GFP intensity individually in GSCs and CySCs (Figure 1R) by demarcating the cell boundaries (Figure 1M, S1C-D’’’). We found that CySCs show a higher ROS profile than GSCs, and that enhancing ROS in CySCs by Sod1 depletion resulted in a consequent increase in ROS in GSCs. We believe this revision strengthens our model by addressing the potential discrepancy and providing a more comprehensive understanding of ROS dynamics within the GSC niche.

      d. Most of the paper examines the effect of SOD depletion (which should increase ROS) on the CySC lineage and GSC lineage. One big caveat is that Tj-Gal4 is expressed in hub cells (Fairchild, 2016), so the loss of SOD from hub cells may also contribute to the phenotype. In fact, the niche in Figure 2D looks larger than the niche in the control in Figure 2C, arguing that the expression of Tj in niche cells may be contributing to the phenotype. The authors need to better characterize the niche in tj>SOD-RNAi testes.

      We appreciate the reviewer’s insightful comment regarding the potential contribution of hub cells to the observed phenotype. We acknowledge that Tj-Gal4 is expressed in hub cells and that this could influence the niche size and overall phenotype.

      To address this concern, we have included an additional control using C587-Gal4, a CySC-specific driver, to distinguish CySC-specific effects from potential hub contributions. All the effects on cell number observed in Tj>Sod1i testes were replicated in C587>Sod1i testes, although the phenotypes were comparatively weaker. This indicates a partial contribution of hub cells to the observed phenotype, exacerbating its severity. However, the effect of Sod1 depletion in CySCs on the GSC lineage remains significant. These findings have been incorporated into Figure S1 (G, H, M and N) and into the discussion (line no. 308-311).

      e. The Tj>SOD1-RNAi phenotype is an expansion of the Zfh1<sup>+</sup> CySC pool, expansion of the Tj<sup>+</sup> Zfh1- cyst cells (both due to increased somatic proliferation) and a non-autonomous disruption of the germline.

      We appreciate the reviewer’s observation. Our data confirm that Tj>SOD-RNAi leads to an expansion of both Zfh1<sup>+</sup> CySCs and Tj<sup>+</sup> Zfh1- cyst cells, which we attribute to increased somatic proliferation. Additionally, we observe a non-autonomous disruption of the germline, likely due to dysregulated signaling from the altered somatic niche.

      f. I am not convinced that MAPK signaling is decreased in tj>SOD-i testes. Not only is this antibody finicky, but the authors don't have any follow-up experiments to see if they can restore SOD-depleted CySCs by expressing an EGFR gain of function. Additionally, reduced EGFR activity causes fewer somatic cells (not more) (Amoyel, 2016) and also inhibits abscission between GSCs and gonial blasts (Lenhart 2015), which causes interconnected cysts of 8- to 16 germ cells with one GSC emanating from the hub.

      We acknowledge that the dpERK antibody can be challenging. We took the necessary precautions, including optimizing staining conditions and using a positive control (Tj>EGFRi) (Figure S4B). Our results consistently showed a decrease in dpERK levels in Tj>Sod1i testes, supporting our conclusion.

      We agree that an experiment using EGFR gain-of-function to rescue the effects of CySC-Sod1 depletion would have strengthened our findings. We attempted this experiment; however, the progeny constitutively expressing EGFR in the Sod1RNAi background were lethal, preventing us from completing the analysis.

      We agree that our observations do not align with the reported effects of EGFR signaling on somatic cell numbers and abscission, and we appreciate the references provided. Based on our observations, we feel that modulation of MAPK signaling in the niche probably happens in a context-dependent manner. One possible explanation is that the altered GSC-CySC balance and loss of contact in Tj>Sod1i testes lead to an insufficient ligand response, thereby failing to activate EGFR signaling. While it is well established that ROS can enhance EGFR signaling to promote cellular proliferation and early differentiation, our results indicate a more nuanced regulation in this context. However, further detailed analysis is required to fully understand the regulatory controls. We have clarified this point in the manuscript (line no. 313-320).

      g. The increase in Hh signaling in SOD-depleted CySCs would increase their competitiveness against GSCs and GSCs would be lost (Amoyel 2014). The authors need to validate that Hh protein expression is indeed increased in SOD-depleted CySCs/cyst cells and which cells are producing this Hh. Normally, only hub cells produce Hh (Michel,2012; Amoyel 2013) to promote self-renewal in CySCs.

      We appreciate the reviewer’s suggestion regarding the validation of Hh protein expression and its source. Since Tj-Gal4 is expressed in the hub, it is likely activating the Hh pathway and promoting CySC proliferation. Unfortunately, we could not procure an Hh antibody to directly assess its protein levels. Instead, we performed real-time PCR on RNA derived from the tip region and found a significant increase in hh mRNA levels in SOD-depleted cyst cells. These findings support our hypothesis that elevated Hh signaling enhances CySC competitiveness, leading to GSC loss. To support this idea, we have included a Tj>Ci positive control, which caused abnormal proliferation of Tj<sup>+</sup> cells and resulted in ablation of GSCs. We have incorporated these results in the revised manuscript (Results section, Figure S4).

      h.The increase in p4E-BP is an indication that Tor signaling is increased, but an increase in Tor in the CySC lineage does not significantly affect the number of CySCs or cyst cells (Chen, 2021). So again I am not sure how increased Tor factors into their phenotype.

      We acknowledge the reviewer’s concern regarding the role of increased Tor signaling in our phenotype. The observed increase in Tor could indeed be a downstream effect of elevated ROS levels. However, establishing a direct causal relationship between Sod1 and Tor would require additional experiments, which we feel might be a good study in its own merit. To maintain clarity and focus in the revised manuscript, we have opted not to include this preliminary data at this stage.

      i. The over-expression of SOD in CySCs part is incomplete. The authors would need to monitor ROS in these testes. They would also need to examine whether tj>SOD affects the size of the hub.

      We value the reviewer's comment. To address this, we have now monitored ROS levels in the testes upon SOD overexpression in CySCs using DHE (Figure S5 I). Our results indicate a significant reduction in ROS levels compared to controls.

      Additionally, we examined hub size upon Sod1 overexpression and observed a slight, but statistically insignificant, reduction. As our study primarily focuses on ROS-mediated GSC-CySC interactions, we did not include a detailed investigation of hub size regulation.

      (2) Concept

      Why would it be important to have a redox gradient across adjacent cells? The authors mention that ROS can be passed between cells, but it would be helpful for them to provide more details about where this has been documented to occur and what biological functions ROS transfer regulates.

      We thank the reviewer for this insightful comment. We acknowledge that the concept of a redox gradient was not adequately conveyed, as the cell boundary was not clearly defined. To address this, we have revised our interpretation to propose that high ROS levels in one cell may influence the ROS levels in an adjacent cell through either direct transfer or as a secondary effect of altered niche maintenance signaling, rather than through the establishment of a gradient.

      Regarding ROS transfer between cells, it has been documented in several biological contexts. For instance, hydrogen peroxide (H<sub>2</sub>O<sub>2</sub>) can diffuse through aquaporins, influencing signaling pathways in neighbouring cells (PMID: 17105724). We have incorporated these details and relevant references into the revised manuscript to enhance the conceptual understanding of ROS transfer. 

      (3) Issues with the scholarship of the testis

      a. Line 82 - There is no mention of BMPs, which are the only GSC-self-renewal signal. Upd/Jak/STAT is required for the adhesion of GSCs to the niche but not self-renewal (Leatherman and Dinardo, 2008, 2010). The author should read a review about the testis. I suggest Greenspan et al 2015. The scholarship of the testis should be improved.

      We appreciate the reviewer’s feedback regarding the role of BMPs in GSC self-renewal and have added this to the revised manuscript (line no. 83). We have now incorporated a discussion of BMP signaling as the primary self-renewal signal for GSCs, distinguishing it from the role of Upd/JAK/STAT in niche adhesion, as highlighted in Leatherman and Dinardo (2010). Additionally, we have cited and reviewed the work by Greenspan et al. (2015) to ensure a more comprehensive discussion of GSC regulation. These revisions can be found at line no. 285-289 of the revised manuscript.

      b. Line 82-84 - BMPs are produced by both hub cells and CySCs. BMP signaling in GSCs represses bam. So it is not technically correct to say the CySCs repress bam expression in GSCs.

      We acknowledge the reviewer’s clarification regarding BMP signaling and its role in repressing bam expression in GSCs. We have revised the relevant section (line no.83-85). 

      c. Throughout the figures the authors score Vasa<sup>+</sup> cells for GSCs. This is technically not correct. What they are counting is single, Vasa<sup>+</sup> cells in contact with the niche. All graphs should be updated with the label "GSCs" on the Y-axis.

      We appreciate the reviewer’s careful assessment of our methodology. We acknowledge that scoring Vasa<sup>+</sup> cells alone does not definitively identify GSCs. Our quantification specifically considers single Vasa<sup>+</sup> cells in direct contact with the niche. To ensure clarity and accuracy, we have updated all figure legends and Y-axis labels in the relevant graphs to explicitly state "GSCs" instead of "Vasa<sup>+</sup> cells."

      (4) Issues with the text

      a. Line 1: multi-lineage is not correct. Multi-lineage refers to stem cells that produce multiple types of daughter cells. GSCs produce only one type of offspring and CySCs produce only one type of offspring. So both are uni-lineage. Please change accordingly.

      We acknowledge the incorrect usage of "multi-lineage" and agree that both GSCs and CySCs are uni-lineage, as they each produce only one type of offspring. We have revised Line 1 accordingly and also updated the title. 

      b. Lines 62-75 - Intestinal stem cells have constitutively high ROS (Jaspar lab paper), so low ROS in stem cell cells is not an absolute.

      We appreciate the clarification. We have revised Lines 62–75 to acknowledge that low ROS is not universal in stem cells, citing the Jaspar lab study on intestinal stem cells (Line 70). Thank you for the valuable insight.

      c.  Line 79: The term cystic is not used in the Drosophila testis. There are cyst stem cells (CySCs) that produce cyst cells. Please revise.

      We have revised the text to replace "cystic" with the correct terminology, referring to cyst stem cells (CySCs) in the manuscript.

      d. Line 90 - perfectly balanced is an overstatement and should be toned down.

      Thank you for the suggestion. We have revised it to “balanced” instead of "perfectly balanced."  

      e. Line 98 - division of labour is not supported by the data and should be rephrased.

      Thank you for the feedback. We have rephrased it (line no. 98-101) to avoid the term "division of labor".

      f. Line 200 - the authors provide no data on BMPs - the GSC self-renewal cue - so they should avoid discussing an absence of self-renewal cues.

      We appreciate the reviewer’s point. We have revised it to avoid discussing the absence of self-renewal cues, given that we do not present data on BMP signaling. This ensures that our conclusions remain within the scope of the provided data.

      (5) Issues with the figures

      a. The images are too small to appreciate the location of mitochondria in GSCs and CySCs.

      b. Figure 1

      i. Cell membranes are not marked, reducing the precision of assigning mitochondria to GSC or CySCs. It would be very helpful if the authors depleted ATP5A from GSCs and showed that the puncta are reduced in these cells, and did a similar set of experiments for the Tj-Gal4 lineage. It would also be very helpful if the authors expressed membrane markers (like myrGFP) in the GSC and then in the CySC lineage and then stained with ATP5A. This would pinpoint in which cells ATP5A immunoreactivity is occurring.

      ii. The presumed changes in gst-GFP (redox readout) are possibly seen in differentiating germ cells (i.e., spermatogonia) but not in GSC.

      iii. Panels F, Q, and S are not explained and currently are irrelevant.

      c. Figure 3K - The evidence to support less Ecad in GSCs in tj>SOD-i testes is not compelling as the figure is too small and the insets show changes in Ecad in somatic cells, not GSC.

      d. Figure 4:

      i. Panel A, B The apparent decline (not quantified) may not contribute to the phenotype.

      ii. dpERK is a finicky antibody and the authors are showing a single example of each genotype. This is an important experiment because the authors are going to use it to conclude that MAPK is decreased in the tj>SOD-i samples. However, the authors don't have any positive (dominant-active EGFR) or negative (tj>mapk-i) controls. As it stands, the data is not compelling. The graph in F does not convey any useful information.

      e. Figure S1D - cannot discern green on black. It is critical for the authors to show monochromes (grayscale) for thereabouts that they want to emphasize. I cannot see the green on black in Figure S1D.

      f. Figure S4 - there is no quantification of the number of Tj cells in K-N.

      We appreciate your detailed feedback regarding the figures in our manuscript. Below, we address each concern and outline the revisions we have made.

      (a) Image Size and Mitochondrial Localization in GSCs and CySCs 

      We acknowledge the need for larger images to better visualize mitochondrial localization. We have now increased the resolution and size of the images in Figure 1. Additionally, we have included high-magnification insets to enhance clarity (Figure 1 B#)

      (b) Figure 1 B,B#,C 

      (i) We have now marked cell membranes using Dlg to improve the precision of mitochondrial assignment to GSCs and CySCs and then stained for ATP5A, which clearly demarcates ATP5A immunoreactivity in specific cell types.

      (ii) We have revisited the gstD-GFP (redox readout) data and now provide revised images (Figure S1C-D’’’) and quantification (Figure 1 R,S) to better illustrate changes in the redox state. It is indeed intense in differentiating germ cells as expected but also present in the stem cell zone.

      (iii) Panels F, Q, and S have now been removed in the revised figure legend. 

      (c) Figure 3K: We have digitally magnified the figure size and improved contrast to better visualize E-cadherin levels. The insets have been revised to ensure they focus specifically on GSCs rather than somatic cells. Earlier, we quantified the E-cadherin intensity changes in the GSC-hub interface and provided statistical analysis to support our findings (Figure 3M).

      (d) Figure 4: (i) Panels A and B have now been quantified, and we provide statistical comparisons to support our observations. (ii) We acknowledge the variability of dpERK staining. To strengthen our conclusions, we have provided negative (Tj>MAPK-i) controls (Figure S4 B). Additionally, we have removed panel F (MAPK area cover) to avoid confusion.

      (e) We appreciate the suggestion regarding grayscale images and have provided the monochrome images for mitochondria and gstD-GFP image representation. We have now removed Figure S1D as it was no longer required.

      (f) Figure S4: The quantification of the number of Tj-positive cells was actually included in the main figure along with statistical analysis.

      (g) We sincerely appreciate the reviewer’s insightful comments, which have significantly improved the quality and clarity of our manuscript. We hope that our revisions adequately address the concerns raised.

      (6) Issues with Methods

      a.  Materials and Methods are not described in sufficient depth - please revise.

      b.  Note that Tj-Gal4 has real-time expression in hub cells and this is not considered by the authors. The ideal genotype for targeting CySCs is Tj-Gal4, Gal80TS, hh-Gal80. Additionally, the authors do not mention whether they are depleting throughout development into adulthood or only in adults. If the latter, then they must have used a temperature shift, growing the flies at 18C and then upshifting to 25C or 29C during adult stages.

      c.  The authors need to show data points in all of the graphs. Some graphs do this but others do not.

      d.  The authors state that all data points are from three biological replicates. This is not sufficient for GSC and CySC counts. Most labs count GSCs and CySCs from at least 10 testes of the correct genotype.

      We appreciate the reviewer’s valuable feedback and have made the necessary revisions to improve the clarity and rigor of our study. Below, we address each concern in detail:

      Materials and Methods

      We have revised the Materials and Methods section to provide a more detailed description of the experimental procedures, including genotypes, sample preparation, and quantification methods.

      Tj-Gal4 Expression and Experimental Design

      We acknowledge the reviewer’s point regarding Tj-Gal4 expression in hub cells. While Tj-Gal4 is active in hub cells, our focus was on CySCs, and we have now included a discussion of this caveat in the revised manuscript (line no. 308-311).

      Thank you for your suggestion on the ideal genotype for targeting CySCs. While we attempted to procure hh-Gal80, we couldn’t manage to get it, so we opted for another well-established Gal4 driver, C-587 Gal4, to target CySCs. Our results indicate that although the phenotypic changes are consistent across both drivers, the effects are significantly stronger with Tj-Gal4, highlighting the role of CySCs in this process with partial contributions from the hub. These findings have been incorporated into the revised manuscript (lines 309–311).

      We now clarify whether gene depletion was conducted throughout development or restricted to adulthood. For adult-specific depletion using the UAS-Gal4 system, crosses were set up at 25°C, and after two days, progenies were shifted to 29°C and aged for 3–5 days at 29°C. This process is now explicitly detailed in the revised Methods section (line no. 345-348).

      Data Presentation in Graphs

      We have updated all graphs to ensure that individual data points are shown consistently across all figures.

      Sample Size for GSC and CySC Counts

      We acknowledge the reviewer’s concern regarding biological replicates. Our initial study was based on 10 biological replicates, each set consisting of at least 7-8 testes per genotype, in line with standard practice in the field. This change is reflected in the revised Results and Methods sections.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Comments:

      (1) HCC shows heterogeneity, and it is unclear what tissues (tumor or normal) were used from the DKO mice and human HCC gene expression dataset to obtain the gene signature, and how the authors reconcile these gene signatures with HCC prognosis.

      Mice studies: Aged DKO mice develop aggressive tumors (major and minor nodules, See Figure 1), and the entire liver is burdened with multiple tumor nodules. It is technically challenging to demarcate the tumor boundaries as most of the surrounding tissues do not display normal tissue architecture. Therefore, livers from age- and sex-matched wild-type C57/BL6 mice were used as control tissue. All the mice were inbred in our facility. Spatial transcriptomics and longitudinal studies are ongoing to collect tumors at earlier time points wherein we can differentiate tumor and non-tumor tissue.

      Human Studies: We mined five separate clinical data sets. The human HCC gene expression data comprised samples from the (i) National Cancer Institute (NCI) cohort (GEO accession numbers, GSE1898 and GSE4024) and (ii) Korea, (iii) Samsung, (iv) Modena, and (v) Fudan cohorts as previously described (GEO accession numbers, GSE14520, GSE16757, GSE43619, GSE36376, and GSE54236). We have added a new supplemental table 4, giving details of these datasets. Depending on the cohort, they are primarily HCC samples (surgical resections of HCC) and control samples, with some tumors and paired non-tumor tissues.

      (2) The authors identified a unique set of gene expression signatures that are linked to HCC patient outcomes, but analysis of these gene sets to understand the causes of cancer promotion is still lacking. The studies of urea cycle metabolism and estrogen signaling were preliminary and inconclusive. These mechanistic aspects may be followed up in revision or future studies.

      We agree. Experiments to elicit HCC causality and promotion are complex, given the heterogeneous nature of liver cancer. Moreover, the length of time (12 months) needed to spontaneously develop cancer in this DKO mouse model makes it challenging. As mentioned by the reviewer, mechanistic studies are ongoing, and longitudinal time course experiments are actively being pursued to delineate causality. Having said that, we mined the TCGA LIHC (The Cancer Genome Atlas Liver Hepatocellular Carcinoma) database to examine the expression of the individual urea cycle genes and found them suppressed in liver tumorigenesis (new Supplementary Figure 4). We also evaluated whether estrogen receptor α (ERα) targets altered in DKO females (DKO_Estrogen) correlate with overall survival in HCC (new Supplementary Figure 6). We note that ERα expression per se is reduced in males and females upon liver tumorigenesis. Also, the DKO_Estrogen signature correlated positively with better overall survival (new Supplementary Figure 6). These findings further bolster the relevance of urea cycle metabolism and estrogen signaling during HCC.

      (3) While high levels of bile acids are convincingly shown to promote HCC progression, their role in HCC initiation is not established. The DKO model may be limited to conditions of extremely high levels of organ bile acid exposure. The DKO mice do not model the human population of HCC patients with various etiology and shared liver pathology (i.e. cirrhosis). Therefore, high circulating bile acids may not fully explain the male prevalence of HCC incidence.

      We agree with this comment that our studies do not show bile acids can initiate HCC and may act as one of the many factors that contribute to the high male prevalence of HCC. This is exactly the reason why throughout the manuscript we do not write about HCC initiation. To clarify further, in the revised discussion of the manuscript, we have added a sentence to highlight this aspect, “while this study demonstrates bile acids promote HCC progression it does not investigate or provide evidence if excess bile acids are sufficient for HCC initiation.”

      (4) The authors showed lower circulating bile acids and increased fecal bile acid excretion in female mice and hypothesized that this may be a mechanism underlying the lower bile acid exposure that contributed to lower HCC incidence in female DKO mice. Additional analysis of organ bile acids within the enterohepatic circulation may be performed because a more accurate interpretation of the circulating bile acids and fecal bile acids can be made in reference to organ bile acids and total bile acid pool changes in these mice.

      As shown in this manuscript, we provide BA compositional analyses from the liver, serum, urine, and feces (Figures 5 and 6, new Supplementary Figure 8, Supplementary Tables 4 and 5). Unfortunately, we did not collect the intestinal tissue or gallbladders for BA analysis in this study. Separate cohorts of mice are being aged for future BA analyses from different organs within the enterohepatic loop. We thank you for this suggestion. Nevertheless, we have previously measured and reported BA values to be elevated in the intestines and the gallbladder of young DKO mice (PMC3007143).

      Reviewer #2 (Public review)

      Weaknesses:

      (1) The translational value to human HCC is not so strong yet. Authors show that there is a correlation between the female-selective gene signature and low-grade tumors and better survival in HCC patients overall. However, these data do not show whether this signature is more highly correlated with female tumor burden and survival. In other words, whether the mechanisms of female protection may be similar between humans and mice. In that respect, it would also be good to elaborate on whether women have higher fecal BA excretion and lower serum BA concentration.

      The reviewer poses an interesting question to test if the DKO female-specific signatures are altered differently in male vs. female HCC samples. As we found the urea cycle and estrogen signaling to be protective and enriched in our mouse model, we tested their expression pattern using the TCGA-LIHC RNA-seq data. We found urea cycle genes and ERα transcripts broadly reduced in tumor samples irrespective of sex (new Supplementary Figure 4 and Supplementary Figure 6), indicating that these pathways are compromised upon tumorigenesis even in the female livers.

      While prior studies have shown (i) a smaller BA pool w synthesis in men than women (PMID: 22003820), we did not find a study that systematically investigated BA excretion between the sexes in HCC context. The reviewer is spot on in suggesting BA analysis from HCC and unaffected human fecal samples from both sexes. Designing and performing such studies in the future will provide concrete proof of whether BA excretion protects female livers from developing liver cancer. We thank you for these suggestions.

      (2) The authors should perform a thorough spelling and grammar check.

      We apologize for the typos, which have been fixed, and as suggested by the reviewer, we have performed a grammar check.

      (3) There are quite some errors and inaccuracies in the result section, figures, and legends. The authors should correct this.

      We apologize for the inadvertent errors in the manuscript, and we have clarified these inaccuracies in the revised version. Thank you.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Xie and colleagues presents transcriptomic experiments that measure gene expression in eight different tissues taken from adult female and male mice from four species. These data are used to make inferences regarding the evolution of sex-biased gene expression across these taxa.

      Strengths:

      The experimental methods and data analysis appear appropriate. The authors promote their study as unprecedented in its size and technical precision.

      We do not understand the statement "the authors promote", as if there were a doubt about this. If there is a doubt, we would welcome seeing it specified.

      Weaknesses:

      The manuscript does not present a clear set of novel evolutionary conclusions. The major findings recapitulate many previous comparative transcriptomics studies - gene expression variation is prevalent between individuals, sexes, and species; and genes with sex-biased expression evolve more rapidly than genes with unbiased expression - but it is not clear how the study extends our understanding of gene expression or its evolution.

      There have been no "previous comparative transcriptomics studies" at a micro-evolutionary scale in animals; hence, we do not "replicate" these. And our contrast between somatic and gonadal patterns reveals insights that have not been recognized before, namely that gonadal sex-specific expression turnover is actually not faster than the corresponding non-sex-specific turnover. We have now further clarified this distinction throughout the text and have also adapted the title of the paper accordingly.

      We agree with the overall statement that "gene expression variation is prevalent between individuals, sexes, and species" but the aspect of "sex-biased gene expression between individuals" has not been systematically analysed before in such a context.

      Concerning the statement that "genes with sex-biased expression evolve more rapidly than genes with unbiased expression", we note that this is mostly derived from gonadal data and that there is no study that has quantified this so far at a population level and between subspecies in comparison to somatic data.

      Our results further show that previous assumptions of a substantial set of genes with sex-biased expression conserved between mice and humans are due to underestimating the convergence issues when there is an extremely fast turnover of sex-biased gene expression. This has a major implication for using mice as a model for gender-specific medicine questions in humans.

      Many gene expression differences between individual animals are selectively neutral, because these differences in mRNA concentration are buffered at the level of translation, or differences in protein abundance have no effect on cellular or organismal function. The hypothesis that sex-biased genes are enriched for selectively neutral expression differences is supported by the excess of inter-individual expression variance and inter-specific expression differences in sex-biased genes.

      This statement repeats a statement from the first round of reviews. We had added new data and extensive discussion on this topic. We do not understand why this has not been taken into account. In fact, a major strength of our paper is that it shows that most sex-biased gene expression differences are not neutral!

      There are two major issues here. First, to identify sex-biased gene expression in the first place, we (and all other papers in the field) use the neutral model as the null hypothesis. Genes that are not compatible with this null hypothesis are considered sex-biased. In contrast to most previous papers, we have the possibility to take into account the variances between individuals to add an additional significance test. Hence, we can apply a much more rigorous two-step process: a ratio cutoff plus a Wilcoxon rank sum test with correction for multiple testing to identify significant deviations from the null hypothesis. We have added some additional statements in the Results and Discussion sections to emphasize this. Second, by focusing on the genes that do not follow a neutral model, the variance and divergence data support the action of selection, rather than neutral drift.
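
      For readers unfamiliar with this kind of filter, the two-step process described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline. Several details are assumptions that the response does not specify: the ratio is taken between group medians, the multiple-testing correction is Benjamini-Hochberg, the rank sum p-value uses a normal approximation without tie correction of the variance, and all function names and the 0.05 threshold are hypothetical. Only the 1.25 ratio cutoff comes from the text.

```python
import math
from statistics import median


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank sum p-value (normal approximation, assumed here)."""
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average rank shared by tied values
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs((w - mean) / sd) / math.sqrt(2))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (the correction method is an assumption)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_sex_biased(expr, ratio_cutoff=1.25, fdr=0.05):
    """expr maps gene -> (female_values, male_values); values must be positive."""
    genes = list(expr)
    qvals = benjamini_hochberg([rank_sum_p(*expr[g]) for g in genes])
    calls = {}
    for g, q in zip(genes, qvals):
        f, m = (median(v) for v in expr[g])
        ratio = max(f, m) / min(f, m)
        # A gene is called sex-biased only if it passes both the ratio
        # cutoff and the corrected significance test.
        if ratio >= ratio_cutoff and q < fdr:
            calls[g] = "female-biased" if f > m else "male-biased"
        else:
            calls[g] = "unbiased"
    return calls
```

      In this sketch, a gene with a large median ratio but heavily overlapping individual values fails the significance test, and a gene with cleanly separated but nearly equal expression fails the ratio cutoff; both are treated as compatible with the null hypothesis.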

      A higher rate of adaptive coding evolution is inferred among sex-biased genes as a group, but it is not clear whether this signal is driven by many sex-biased genes experiencing a little positive selection, or a few sex-biased genes experiencing a lot of positive selection, so the relationship between expression and protein-coding evolution remains unclear.

      Again, there are two major issues here. First, the distribution of alpha-values shown in Figure 3B is rather homogeneous, i.e., there is no support for a scenario in which the average is driven by only a few genes.

      Second, it seems that the referee wants to see an analysis where dn/ds ratios are broken down for every single gene. This has been done in previous papers, but it is now understood that this procedure is fraught with error because of the demographic contingencies inherent to natural populations that can yield wrong results for individual loci. We have added some statements to the text to clarify this further.

      It is likely that only a subset of the gene expression differences detected here will have phenotypic effects relevant for fitness or medicine, but without some idea of how many or which genes comprise this subset, it is difficult to interpret the results in this context.

      It is the basic underlying assumption for the whole research field that significantly sex-biased genes are phenotypically relevant for fitness, since they would otherwise not be sex-biased in the first place.

      Throughout the paper the concepts of sexual selection and sexually antagonistic selection are conflated; while both modes of selection can drive the evolution of sexually dimorphic gene expression, the conditions promoting and consequence of both kinds of selection are different, and the manuscript is not clear about the significance of the results for either mode of selection.

      We had explained in our previous response that our data collection was not designed to distinguish between these two processes. But given that the issue is being brought up again, we have now added some discussion on this issue.

      The manuscript's conclusion that "most of the genetic underpinnings of sex-differences show no long-term evolutionary stability" is not supported by the data, which measured gene expression phenotypes but did not investigate the underlying genetic variation causing these differences between individuals, sexes, or species.

      We agree that - under a strict definition - our use of the term "genetic underpinning" in this conclusion sentence can be criticized. The most correct term would be "transcriptional underpinnings", but of course, given that it is the current practice of the whole field to assume that "transcriptional" is part of the overall genetics, we do not consider our initial statement as incorrect. Still, we have changed the term accordingly.

      Furthermore, most of the gene expression differences are observed between sex-specific organs such as testes and ovaries, which are downstream of the sex-determination pathway that is conserved in these four mouse species, so these conclusions are limited to gene expression phenotypes in somatic organs shared by the sexes.

      Yes - correct. But the whole focus of the paper is on somatic expression, i.e. organs that share the same cell compositions. Of course, the comparison between gonadal organs is conflated by being composed of different cell types. We have extended the discussion of this point.

      The differences between sex-biased expression in mice and humans are attributed to differences in the two species effective population sizes; but the human samples have significantly more environmental variation than the mouse samples taken from age-matched animals reared in controlled conditions, which could also explain the observed pattern.

      These are indeed the two alternative explanations that we had discussed (last paragraph of the discussion section, now the penultimate paragraph).

      The smoothed density plots in Figure 5 are confusing and misleading. Examining the individual SBI values in Table S9 reveals that all of the female and male SBI values for each species and organ are non-overlapping, with the exception of the heart in domesticus and mammary gland in musculus, where one male and one female individual fall within the range of the other sex. The smoothed plots therefore exaggerate the overlap between the sexes;

      Smoothing across discrete values is an entirely standard procedure for continuous variables. It allows one to visualize the inherent data trends that cannot easily be gleaned from simple inspection of the actual values. This is a mathematical procedure, not an "exaggeration". We used the same smoothing procedure for all the comparisons, and it is clear that the distributions between females and males of the sex organs and a few somatic organs are well separated (non-overlapping), which serves as a control.

      in particular, the extreme variation shown in the SBI in the mammary glands in spretus females and spicilegus males is hard to understand given the normalized values in Table S3. The R code used to generate the smoothed plots is not included in the Github repository, so it is not possible to independently recreate those plots from the underlying data.

      We apologize that there was indeed an error in the Figure - the columns for SPR and SPI were accidentally interchanged. We have corrected this figure. Generally, the smoothened patterns we show are easily verified by looking up the respective primary values. We apologize that the code lines for the plots were accidentally omitted. We have used a standard function from ggplot2: geom_density, with "adjust=3, alpha=0.5" for all plots and included this description in the Methods. We have now added this to the R code in the GitHub repository.
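
      To make the effect of the smoothing parameter concrete, the following sketch re-implements a Gaussian kernel density estimate in plain Python. It is not the authors' R code. It assumes the ggplot2 defaults for geom_density (Gaussian kernel, "nrd0" rule-of-thumb bandwidth) with the bandwidth multiplied by "adjust", evaluates the kernel sum directly rather than with R's binned approximation, and uses made-up SBI values and hypothetical function names.

```python
import math
from statistics import stdev


def quantile7(sorted_vals, p):
    """Linear-interpolation quantile (R's default, type 7)."""
    h = (len(sorted_vals) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bw_nrd0(values):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    s = sorted(values)
    sd = stdev(s)
    iqr = quantile7(s, 0.75) - quantile7(s, 0.25)
    spread = min(sd, iqr / 1.34) or sd or abs(values[0]) or 1.0
    return 0.9 * spread * len(s) ** -0.2


def gaussian_kde(values, grid, adjust=1.0):
    """Density on `grid`, with the bandwidth scaled by `adjust`."""
    h = bw_nrd0(values) * adjust
    norm = len(values) * h * math.sqrt(2 * math.pi)
    return [sum(math.exp(-0.5 * ((g - v) / h) ** 2) for v in values) / norm
            for g in grid]


def overlap_area(a, b, adjust=1.0, points=2001):
    """Area shared by the two smoothed densities (0 = disjoint, 1 = identical)."""
    pad = 4 * adjust * max(bw_nrd0(a), bw_nrd0(b))
    lo, hi = min(a + b) - pad, max(a + b) + pad
    step = (hi - lo) / (points - 1)
    grid = [lo + i * step for i in range(points)]
    da, db = gaussian_kde(a, grid, adjust), gaussian_kde(b, grid, adjust)
    return sum(min(x, y) for x, y in zip(da, db)) * step


# Hypothetical SBI values: the raw ranges of the two sexes do not overlap.
females = [0.2, 0.3, 0.35, 0.4, 0.5, 0.55, 0.6, 0.7, 0.8]
males = [-v for v in females]
narrow = overlap_area(females, males, adjust=1.0)
wide = overlap_area(females, males, adjust=3.0)
```

      With these invented values the shared area is larger at adjust=3 than at adjust=1, even though the raw values are fully separated. How much apparent overlap the published plots contain depends on the real data, so the smoothed curves are best read alongside the primary values in the tables.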

      The correlations provided in Table S9 are confusing - most of the reported correlations are 1.0, which are not recovered when using the SBI values in Table S9, and which does not support the manuscript's assertion that sex-biased gene expression can vary between organs within an individual. Indeed, using the SBI values in Table S9, many correlations across organs are negative, which is expected given the description of the result in the text.

      There is a misunderstanding here. The tables do not report correlations, but only p-values for correlations, the raw ones and the ones after corrections for multiple testing. P = 1.0 means no significant correlation. We have adjusted the caption of this table to clarify this further.

      Reviewer #3 (Public review):

      This manuscript reports interesting data on sex differences in expression across several somatic and reproductive tissues among 4 mice species or subspecies. The focus is on sex-biased expression in the somatic tissues, where the authors report high rates of turnover such that the majority of sex-biased genes are only sex-biased in one or two taxa. The authors show sex-biased genes have higher expression variance than unbiased genes but also provide some evidence that sex-bias is likely to evolve from genes with higher expression variance. The authors find that sex-biased genes (both female- and male-biased) experience more adaptive evolution (i.e., higher alpha values) than unbiased genes. The authors develop a summary statistic (Sex-Bias Index, SBI) of each individual's degree of sex-bias for a given tissue. They show that the distribution of SBI values often overlap considerably for somatic (but not reproductive) tissues and that SBI values are not correlated across tissues, which they interpret as indicating an individual can be relatively "male-like" in one tissue and relatively "female-like" in another tissue.

      This is a good summary of the data, but we are puzzled that it does not include the completely new module analysis and the finding of extremely fast evolution of sex-biased somatic gene expression compared to the gonadal one.

      Though the data are interesting, there are some disappointing aspects to how the authors have chosen to present the work. For example, their criteria for sex-bias requires an expression ratio of one sex to the other of 1.25. A reasonably large fraction of the "sex-biased genes" have ratios just beyond this cut-off (Fig. S1). A gene which has a ratio of 1.27 in taxa 1 can be declared as "sex-biased" but which has a ratio of 1.23 in taxa 2 will not be declared as "sex-biased". It is impossible to know from how the data are presented in the main text the extent to which the supposed very high turnover represents substantial changes in dimorphic expression. A simple plot of the expression sex ratio of taxa 1 vs taxa 2 would be illuminating but the authors declined this suggestion.

      Choosing a cutoff is the standard practice when dealing with continuously distributed data. As we have pointed out, we looked at various cutoff options and decided to use the present one, based on the observed data distributions. Note that some studies have used even lower ones (e.g. 1.1). To visualize the data distribution, we had provided the overall distribution of ratios, because one would have to look at many more plots otherwise. But we have now also added individual plots as Figure 1, Figure supplement 2, as requested. They confirm what is also evident from the overall plots, namely that most ratio changes are larger than the incremental values suggested by the reviewer. Note that the original data are of course also available for inspection.

      I was particularly intrigued by the authors' inference of the proportion of adaptive substitutions ("alpha") in different gene sets. They show alpha is higher for sex-biased than unbiased genes and nicely show that the genes that are unbiased in focal taxa but sex-biased in the sister taxa also have low alpha. It would be even stronger that sex-bias is associated with adaptive evolution to estimate alpha for only those genes that are sex-biased in the focal taxa but not in the sister taxa (the current version estimates alpha on all sex-biased genes within the focal taxa, both those that are sex-biased and those that are unbiased in the sister taxa).

      We have added the respective values in the results section, but since fewer genes are involved, they are less comparable to the other sets of genes. Still, the tendencies remain.

      The authors' Sex-Bias Index is measured in an individual sample as: SBI = median(TPM of female-biased genes) - median(TPM of male-biased genes). This index has some strange properties when one works through some toy examples (though any summary statistic will have limitations). The authors do little to jointly discuss the merits and limitations of this metric. It would have been interesting to examine their two key points (degree of overlapping distributions between sexes and correlation across tissues) using other individual measures of sex-bias.

      We had responded to this comment before (including the explanation that it has no strange properties when one applies the normalization that is now implemented) and we have added a whole section devoted to the discussion of the merits of the SBI. We do not know which other "individual measures of sex-bias" this should be compared to. Still, we have now added a paragraph in the discussion about using PCA as an alternative to show that this would result in similar conclusions, but is technically less suitable for this purpose.
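      For reference, the index as written in the reviewer's comment can be sketched in a few lines of Python. This is an illustration only: it implements the plain difference of medians quoted above, does not include the normalization mentioned in the response (its details are not given here), and the function name, gene names and TPM values are hypothetical.

```python
from statistics import median


def sex_bias_index(tpm_by_gene, female_biased_genes, male_biased_genes):
    """SBI = median(TPM of female-biased genes) - median(TPM of male-biased genes)."""
    female_median = median(tpm_by_gene[g] for g in female_biased_genes)
    male_median = median(tpm_by_gene[g] for g in male_biased_genes)
    return female_median - male_median


# Hypothetical expression values (TPM) for one individual and one tissue.
example_tpm = {"geneA": 10.0, "geneB": 20.0, "geneC": 30.0, "geneD": 2.0, "geneE": 4.0}
example_sbi = sex_bias_index(
    example_tpm,
    female_biased_genes=["geneA", "geneB", "geneC"],
    male_biased_genes=["geneD", "geneE"],
)
```

      Without normalization, the value depends on the absolute expression levels of the two gene sets, which is one reason a normalization step matters for comparisons across tissues.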

      Figure 5 shows symmetric gaussian-looking distributions of SBI but it makes me wonder to what extent this is the magic of model fitting software as there are only 9 data points underlying each distribution. Whereas Figure 5 shows many broadly overlapping distributions for SBI, Figure 6 seems to suggest the sexes are quite well separated for SBI (e.g., brain in MUS, heart in DOM).

      We use a standard fitting function in R (see above), which tries to fit a normalized distribution, but this function can also add an additional peak when the data are too heterogeneous (e.g. Mammary in Figure 7).

      Fig. S1 should be shown as the log(F/M) ratio so it is easier to see the symmetry, or lack thereof, of female and male-biased genes.

      The log will work differently for values <1, compared to values >1 when used in a single plot. We have now generated combined plots with symmetric values to allow a better comparability.

      It is important to note that for the variance analysis that IQR/median was calculated for each gene within each sex for each tissue. This is a key piece of information that should be in the methods or legend of the main figure (not buried in Supplemental Table 17).

      We have now moved these descriptions into the Methods section.
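      The variance measure named in the comment above (IQR/median, computed per gene within each sex and tissue) can be illustrated with a short Python sketch. The quartile convention is an assumption (linear interpolation, as in R's default quantile type), and the function name and example values are hypothetical.

```python
from statistics import median, quantiles


def iqr_over_median(expression_values):
    """Interquartile range divided by the median, a scale-free spread measure."""
    q1, _, q3 = quantiles(expression_values, n=4, method="inclusive")
    mid = median(expression_values)
    if mid == 0:
        raise ValueError("median is zero; IQR/median is undefined")
    return (q3 - q1) / mid


# Hypothetical expression values of one gene in one sex and one tissue.
example_spread = iqr_over_median([1.0, 2.0, 3.0, 4.0, 5.0])
```

      Dividing by the median makes the measure comparable between genes with very different expression levels.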

    1. Author response:

      Evidence, reproducibility and clarity

      Reviewer 1:

      In this manuscript, the role of the insulin receptor and the insulin-like growth factor receptor was investigated in podocytes. Mice in which both receptors were deleted developed glomerular dysfunction, proteinuria and glomerulosclerosis over several months. Because of concerns about incomplete KO, the authors generated podocyte cell lines where both receptors were deleted. Loss of both receptors was highly deleterious with greater than 50% cell death. To elucidate the mechanism, the authors performed global proteomics and find that spliceosome proteins are downregulated. They confirm this by using long-range sequencing. These results suggest a novel role for these pathways in podocytes.

      Thank you

      This is primarily a descriptive study and no technical concerns are raised. The mechanism of how insulin and IGF1 signaling are linked to the spliceosome is not addressed.

      We do not think the paper is descriptive as we used non-biased phospho and total proteomics in the DKO cells to uncover the alterations in the spliceosome (that have not been previously described) that were detrimental. However, we are happy to look further into the underlying mechanism.

      We would propose:

      (1) Stimulating/inhibiting insulin/IGF signalling pathways in the wild-type and DKO cells and checking expression levels and/or phosphorylation status of splice factors (including those in Figure 3E) and those revealed by phospho-proteomic data; a variety of inhibitors of insulin/IGF1 pathways could also be used along the pathways that are shown in Fig 2.

      (2) Looking at the RNA-seq data bioinformatically in more detail – the introns/exons that move up or down are targets of the splice factors involved; most splice factor binding sequences are known, so it should be possible to ask bioinformatically – from the sequences around the splice sites of the exons and introns that move in the DKO, which splice factor binding sites are seen most frequently? To uncover splice factors/RNA-binding proteins (RBPs) that are involved in insulin signaling, we will use software named MATT, which was specifically designed to look for RNA-binding motifs (PMID 30010778). In brief, using the long-read sequencing data, we will test 250 nt sequences flanking the splice sites of all regulated splicing events (intronic and exonic) against all RNA-binding proteins in the CISBP-RNA database (PMID 23846655) using MATT. This will result in a list of RBPs potentially involved in insulin signaling. We will validate these by activating insulin signaling (similar to Figures 2 B,C) and probing whether the RBPs are activated (e.g. phosphorylated or changed in expression), or we will manipulate expression of the candidate RBPs and measure how they affect insulin signaling.

      (3) Examining the phospho and total proteomic data for IGF1R and Insulin receptor knockout alone podocytes (which we have already generated) and analysing these in more detail and include this data set to elucidate the relative importance of both receptors to spliceosome function.

      The phenotype of the mouse is only superficially addressed. The main issues are that the completeness of the mouse KO is never assessed nor is the completeness of the KO in cell lines. The absence of this data is a significant weakness.

      We apologise for not making this clear, but we did assess the level of receptor knockdown in the animal and cell models. The in vivo model showed variable and incomplete levels of insulin receptor and IGF1 receptor podocyte knockdown (shown in Supplementary Figure 1B). This is why we made the in vitro floxed podocyte cell lines, in which we could robustly knock down both the insulin receptor and IGF1 receptor (shown in Figure 2A).

      The mouse experiments would be improved if the serum creatinines were measured to provide some idea how severe the kidney injury is.

      We can address this:

      We have further urinary albumin:creatinine ratio (uACR) data at 12, 16 and 20 weeks. We also have more blood tests of renal function that can be added. There is variability in creatinine levels, which is not uncommon in transgenic mouse models (probably partly due to variability in receptor knockdown with the Cre-lox system). This is part of the rationale for developing the robust double receptor knockout cell models, in which we knocked out both receptors by >80%.

      An attempt to rescue the phenotype by overexpression of SF3B4 would also be useful. If this didn't work, an explanation in the text would suffice.

      We would consider overexpressing SF3B4 in the wild-type and DKO cells and assessing the effects on the spliceosome if deemed necessary. However, we think it is unlikely to rescue the phenotype, as so many other spliceosome components are downregulated in the DKO cells.

      As insulin and IGF are regulators of metabolism, some assessment of metabolic parameters would be an optional add-on.

      We have some detail on this and can add it to the manuscript. However, it is not extensive, as it was not a major driver of this work.

      Lastly, the authors should caveat the cell experiments by discussing the ramifications of studying the 50% of the cells that survive vs the ones that died.

      Thank you, we appreciate this. It was the rationale for studying the cells after 2 days of differentiation, before significant cell loss, in order to avoid the issue of studying only the 50% of cells that survive.

      Reviewer 2:

      In this manuscript, submitted to Review Commons (journal agnostic), Coward and colleagues report on the role of the insulin/IGF axis in podocyte gene transcription. They knocked out both the insulin receptor and IGFR1 in mice. Dual KO mice manifested a severe phenotype, with albuminuria, glomerulosclerosis, renal failure and death at 4-24 weeks.

      Long read RNA sequencing was used to assess splicing events. Podocyte transcripts manifesting intron retention were identified. Dual knock-out podocytes manifested more transcripts with intron retention (18%) compared with wild-type controls (14%), with an overlap between experiments of ~30%.

      Transcript productivity was also assessed using FLAIR-mark-intron-retention software. Intron retention was seen in 18% of ciDKO podocyte transcripts compared to 14% of wild-type podocyte transcripts (P=0.004), with an overlap between experiments of ~30% (indicating the variability of results with this method). Interestingly, ciDKO podocytes showed downregulation of proteins involved in spliceosome function and RNA processing, as suggested by LC/MS and confirmed by Western blot.

      Pladienolide (a spliceosome inhibitor) was cytotoxic to HeLa cells and to mouse podocytes but no toxicity was seen in murine glomerular endothelial cells.

      Specific comments.

      The manuscript is generally clear and well-written. Mouse work was approved in advance. The six figures are generally well-designed, bars/superimposed dot-plots.

      Thank you

      Evaluation.

      Methods are generally well described. It would be helpful to say that tissue scoring was performed by an investigator masked to sample identity.

      We did this and will add this information to the methods/figure legend.

      Specific comments.

      (1) Data are presented as mean/SEM. In general, mean/SD or median/IQR are preferred to allow the reader to evaluate the spread of the data. There may be exceptions where only SEM is reasonable.

      Graphs can be changed to SD rather than SEM.

      (2) It would be useful for the reader to be told the number of over-lapping genes (with similar expression between mouse groups) and the results of a statistical test comparing WT and KO mice. The overlap of intron retention events between experimental repeats was about 30% in both knock-out podocytes. This seems low and I am curious to know whether this is typical for this method; a reference could be helpful.

      This is an excellent question. We had 30% overlap because the parameters used for analysis were very stringent. We suspect we could get more than 30% by being less stringent, while still identifying events that would be considered similar, and can do so if requested. Our methods were based on FLAIR analysis (PMID: 32188845).

      (3) Please explain "adjusted p value of 0.01." It is not clear how was it adjusted. The number of differentially-expressed proteins between the two cell types was 4842.

      We used the Benjamini-Hochberg method to adjust our data. We think the reviewer is referring to the transcriptomic data and not the proteomic data.
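      For readers unfamiliar with the adjustment named above, the Benjamini-Hochberg procedure can be sketched in plain Python as follows. This is a generic illustration of the standard method, not the authors' analysis pipeline; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four features.
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

      A feature is then called significant when its adjusted value falls below the chosen threshold, for example the 0.01 mentioned in the reviewer's question.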

      Minor comments

      Page numbers in the text would help the reviewer communicate more effectively with the author.

      We will do this.

      Reviewer 3:

      These investigators have previously shown important roles for either insulin receptor (IR) or insulin-like growth factor receptor (IGF1R) in glomerular podocyte function. They now have studied mice with deletion of both receptors and find significant podocyte dysfunction. They then made a podocyte cell line with inducible deletion of both receptors and find abnormalities in transcriptional efficiency with decreased expression of spliceosome proteins and increased transcripts with impaired splicing or premature termination.

      The studies appear to be performed well and the manuscript is clearly written.

      Thank you

      Referees cross-commenting

      I am in agreement with Reviewer 1 that the studies are overly descriptive and do not provide sufficient mechanism and the lack of more investigation of the in vivo model is a significant weakness.

      Please see our responses to reviewer 1 above.

      Significance

      Reviewer 1:

      With the GLP1 agonists providing renal protection, there is great interest in understanding the role of insulin and other incretins in kidney cell biology. It is already known that Insulin and IGFR signaling play important roles in other cells of the kidney. So, there is great interest in understanding these pathways in podocytes. The major advance is that these two pathways appear to have a role in RNA metabolism, the major limitations are the lack of information regarding the completeness of the KO's. If, for example, they can determine that in the mice, the KO is complete, that the GFR is relatively normal, then the phenotype they describe is relatively mild.

      Thank you. The receptor KO in the mice is unlikely to be complete (please see comments above and Supplementary Figure 1B). There are many examples of KO models targeting other tissues showing that complete KO of these receptors seems difficult to achieve, particularly for the IGF1 receptor. In the brain (which also consists of terminally differentiated cells; PMID: 28595357), barely 50% IGF1R knockdown was achieved in the target cells. In ovarian granulosa cells (PMID: 28407051), several tissue-specific drivers were tried but could not achieve better than 80%. That paper states that 10% of IGF1R is sufficient for function in these cells, so the authors conclude that their knockdown animals are probably still responding to IGF1. Finally, in our recent IGF1R podocyte knockdown model we found that Cre levels were important for excision of a single floxed gene (PMID: 38706850); hence we were not surprised that trying to excise two floxed genes (insulin receptor and IGF1 receptor) was challenging. This is the rationale for making the double receptor knockout cell lines to understand the process and biology in more detail.

      Reviewer 2:

      The manuscript is generally clear and well-written. Mouse work was approved in advance. The figures are generally well-designed, bars/superimposed dot-plots.

      Evaluation.

      Methods are generally well described. It would be helpful to say that tissue scoring was performed by an investigator masked to sample identity.

      Thank you, we will do this.

      Reviewer 3:

      There are a number of potential issues and questions with these studies.

      (1) For the in vivo studies, the only information given is for mice at 24 weeks of age. There needs to be a full time course of when the albuminuria was first seen and the rate of development. Also, GFR was not measured. Since the podocin-Cre utilized was not inducible, there should be a determination of whether there was a developmental defect in glomeruli or podocytes. Were there any differences in either prenatal or postnatal development or number of glomeruli?

      Thank you, we will add further phenotyping data. We do not think there was a major developmental phenotype, as albuminuria did not become significantly different until several months of age. We could have used a doxycycline-inducible model, but we know the excision efficiency is much lower than in the podocin-Cre driven model (Supplementary Figure 1). This would likely give a very mild (if any) phenotype and not reveal the biology adequately.

      (2) Although the in vitro studies are of interest, there are no studies to determine if this is the underlying mechanism for the in vivo abnormalities seen in the mice. Cultured podocytes may not necessarily reflect what is occurring in podocytes in vivo.

      Thank you for this. We are happy to employ immunohistochemistry (IHC) and immunofluorescence (IF) using spliceosome antibodies on tissue sections from DKO and control mice to examine spliceosome changes. However, as the DKO results in podocyte loss, there may not be that many DKO podocytes still present in the tissue sections. This will be taken into consideration.

      (3) Given that both receptors are deleted in the podocyte cell line, it is not clear if the spliceosome defect requires deletion of both receptors or if there is redundancy in the effect. The studies need to be repeated in podocyte cell lines with either IR or IGFR single deletions.

      Thank you. We have full total and phospho-proteomic data sets from single insulin receptor and IGF1 receptor knockout cell lines that we will investigate for this point.

      (4) There are no studies investigating signaling mechanisms mediating the spliceosome abnormalities.

      Thank you. As outlined above in our response to Reviewer 1 (point 1), we are very happy to investigate insulin/IGF signalling pathways in more detail.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      In this study, Ma et al. aimed to determine previously uncharacterized contributions of tissue autofluorescence, detector afterpulse, and background noise on fluorescence lifetime measurement interpretations. They introduce a computational framework they named "Fluorescence Lifetime Simulation for Biological Applications (FLiSimBA)" to model experimental limitations in Fluorescence Lifetime Imaging Microscopy (FLIM) and determine parameters for achieving multiplexed imaging of dynamic biosensors using lifetime and intensity. By quantitatively defining sensor photon effects on signal-to-noise in either fitting or averaging methods of determining lifetime, the authors contradict any claims of FLIM sensor expression insensitivity to fluorescence lifetime and highlight how these artifacts occur differently depending on the analysis method. Finally, the authors quantify how statistically meaningful experiments using multiplexed imaging could be achieved. 

      A major strength of the study is the effort to present results in a clear and understandable way given that most researchers do not think about these factors on a day-to-day basis. The model code is available and written in Matlab, which should make it readily accessible, although a version in other common languages such as Python might help with dissemination in the community. One potential weakness is that the model uses parameters that are determined in a specific way by the authors, and it is not clear how vastly other biological tissue and microscope setups may differ from the values used by the authors. 

      Overall, the authors achieved their aims of demonstrating how common factors (autofluorescence, background, and sensor expression) will affect lifetime measurements and they present a clear strategy for understanding how sensor expression may confound results if not properly considered. This work should bring to awareness an issue that new users of lifetime biosensors may not be aware of and that experts, while aware, have not quantitatively determined the conditions where these issues arise. This work will also point to future directions for improving experiments using fluorescence lifetime biosensors and the development of new sensors with more favorable properties. 

      We appreciate the comments and helpful suggestions. We now also include FLiSimBA simulation code in Python in addition to Matlab to make it more accessible to the community.

      One advantage of FLiSimBA is that the simulation package is flexible and adaptable, allowing users to input parameters based on the specific sensors, hardware, and autofluorescence measurements for their biological and optical systems. We used parameters based on a FRET-based sensor, measured autofluorescence from mouse tissue, and measured dark count/afterpulse of our specific GaAsP PMT in this manuscript as examples. In Discussion and Materials and methods, we now emphasize this advantage and further clarify how these parameters can be adapted to diverse tissues, imaging systems, and sensors based on individual experiments. We further explain that these input parameters will not affect the conclusions of our study, but the specific input parameters would alter the quantitative thresholds.

      Reviewer #2 (Public review): 

      Summary: 

      By using simulations of common signal artefacts introduced by acquisition hardware and the sample itself, the authors are able to demonstrate methods to estimate their influence on the estimated lifetime, and lifetime proportions, when using signal fitting for fluorescence lifetime imaging. 

      Strengths: 

      They consider a range of effects such as after-pulsing and background signal, and present a range of situations that are relevant to many experimental situations. 

      Weaknesses: 

      A weakness is that they do not present enough detail on the fitting method that they used to estimate lifetimes and proportions. The method used will influence the results significantly. They seem to only use the "empirical lifetime" which is not a state of the art algorithm. The method used to deconvolve two multiplexed exponential signals is not given. 

      We appreciate the comments and constructive feedback. Our revision based on the reviewer’s suggestions has made our manuscript clearer and more user friendly. We originally described the detail of the fitting methods in Materials and methods. Given the importance of these methodological details for evaluating the conclusions of this study, we have moved the description of the fitting method from Materials and methods to Results. In addition, we provide further clarification and more details of the rationale of using these different methods of lifetime estimates in Discussion to aid users in choosing the best metric for evaluating fluorescence lifetime data.

      More specifically, we modified our writing to highlight the following.

      (1) In Results, we describe that lifetime histograms were fitted to Equation 3 with the Gauss-Newton nonlinear least-square fitting algorithm and the fitted P<sub>1</sub> was used as lifetime estimation.

      (2) In Results, we clarify that our simulation of multiplexed imaging was modeled with two sensors, each displaying a single exponential decay, but the two sensors have different decay constants. We also describe that Equation 3 with the Gauss-Newton nonlinear least-square fitting algorithm was used to deconvolve the two multiplexed exponential signals (Fig. 8).
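      The kind of fit described in points (1) and (2) can be sketched in plain Python. This is a simplified illustration, not the FLiSimBA implementation and not Equation 3 itself: it assumes a double exponential decay with fixed, known time constants (2.14 ns and 0.69 ns are used here as assumed example values), omits the instrument response function, autofluorescence, afterpulse and background terms, and uses hypothetical function names.

```python
import math

TAU1_NS = 2.14  # assumed time constant of the first component
TAU2_NS = 0.69  # assumed time constant of the second component


def decay_model(t, f0, p1):
    """Double exponential decay with amplitude f0 and first-component fraction p1."""
    return f0 * (p1 * math.exp(-t / TAU1_NS) + (1.0 - p1) * math.exp(-t / TAU2_NS))


def fit_gauss_newton(times, counts, iterations=20):
    """Estimate (f0, p1) by Gauss-Newton least squares with fixed time constants."""
    f0, p1 = max(counts), 0.5  # simple starting guesses
    for _ in range(iterations):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for t, y in zip(times, counts):
            e1 = math.exp(-t / TAU1_NS)
            e2 = math.exp(-t / TAU2_NS)
            j_f0 = p1 * e1 + (1.0 - p1) * e2  # d(model)/d(f0)
            j_p1 = f0 * (e1 - e2)             # d(model)/d(p1)
            r = y - f0 * j_f0                 # residual
            a11 += j_f0 * j_f0
            a12 += j_f0 * j_p1
            a22 += j_p1 * j_p1
            g1 += j_f0 * r
            g2 += j_p1 * r
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            break
        # Solve the 2x2 normal equations for the parameter update.
        f0 += (a22 * g1 - a12 * g2) / det
        p1 += (a11 * g2 - a12 * g1) / det
    return f0, p1


# Noise-free synthetic histogram with a known fraction, for illustration.
example_times = [i * 12.5 / 256 for i in range(256)]
example_counts = [decay_model(t, 1000.0, 0.6) for t in example_times]
fitted_f0, fitted_p1 = fit_gauss_newton(example_times, example_counts)
```

      With real photon-count data the recovered fraction scatters around the true value, and that scatter is what the simulations quantify as a function of photon number.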

      Reviewer #3 (Public review): 

      Summary: 

      This study presents a useful computational tool, termed FLiSimBA. The MATLAB-based FLiSimBA simulations allow users to examine the effects of various noise factors (such as autofluorescence, afterpulse of the photomultiplier tube detector, and other background signals) and varying sensor expression levels. Under the conditions explored, the simulations unveiled how these factors affect the observed lifetime measurements, thereby providing useful guidelines for experimental designs. Further simulations with two distinct fluorophores uncovered conditions in which two different lifetime signals could be distinguished, indicating multiplexed dynamic imaging may be possible. 

      Strengths: 

      The simulations and their analyses were done systematically and rigorously. FLiSimBA can be useful for guiding and validating fluorescence lifetime imaging studies. The simulations could define useful parameters such as the minimum number of photons required to detect a specific lifetime, how sensor protein expression level may affect the lifetime data, the conditions under which the lifetime would be insensitive to the sensor expression levels, and whether certain multiplexing could be feasible. 

      Weaknesses: 

      The analyses have relied on a key premise that the fluorescence lifetime in the system can be described as two-component discrete exponential decay. This means that the experimenter should ensure that this is the right model for their fluorophores a priori and should keep in mind that the fluorescence lifetime of the fluorophores may not be perfectly described by a two-component discrete exponential (for which alternative algorithms have been implemented: e.g., Steinbach, P. J. Anal. Biochem. 427, 102-105, (2012)). In this regard, I also couldn't find how good the fits were for each simulation and experimental data to the given fitting equation (Equation 2, for example, for Figure 2C data). 

      We thank the reviewer for the constructive feedback. We agree that the FLiSimBA users should ensure that the right decay equations are used to describe the fluorescent sensors. In this study, we used a FRET-based PKA sensor FLIM-AKAR to provide proof-of-principle demonstration of the capability of FLiSimBA. The donor fluorophore of FLIM-AKAR, truncated monomeric enhanced GFP, displays a single exponential decay. FLIM-AKAR, a FRET-based sensor, displays a double exponential decay. The time constants of the two exponential components were determined and reported previously (Chen et al., Neuron (2017)). Thus, a double exponential decay equation with known τ<sub>1</sub> and τ<sub>2</sub> was used for both simulation and fitting. The goodness of fit is now provided in Supplementary Fig. 1 for both simulated and experimental data. In addition to referencing our prior study characterizing the double exponential decay model of FLIM-AKAR in Materials and methods, we have emphasized in Discussion the versatility of FLiSimBA to adapt to different sensors, tissues, and analysis methods, and the importance of using the right mathematical models to describe the fluorescence decay of specific sensors. 

      Also, in Figure 2C, the 'sensor only' simulation without accounting for autofluorescence (as seen in Sensor + autoF) or afterpulse and background fluorescence (as seen in Final simulated data) seems to recapitulate the experimental data reasonably well. So, at least in this particular case where experimental data is limited by its broad spread with limited data points, being able to incorporate the additional noise factors into the simulation tool didn't seem to matter too much.  

      In the original Fig 2C, the sensor fluorescence was much higher than the contributions from autofluorescence, afterpulse, and background signals, resulting in minimal effects of these other factors, as the reviewer noted. This original figure was based on photon counts from single neurons expressing FLIM-AKAR. For the rest of the manuscript, photon counts were based on whole fields of view (FOV). Since the FOV includes cells that do not express fluorescent sensors, the influence of autofluorescence, dark currents, and background is much more pronounced, as shown in Fig. 2B. 

      Both approaches – using photon counts from the whole FOV or from individual neurons – have their justifications. Photon counts from the whole FOV simulate data from fluorescence lifetime photometry (FLiP), whereas photon counts from individual neurons simulate data from fluorescence lifetime imaging microscopy (FLIM). However, the choice of approach does not affect the conclusions of the manuscript, as a range of photon count values are simulated. To maintain consistency throughout the manuscript, we have revised the photon counts in this figure (now Supplementary Fig. 1C) to match those from the whole FOV.

      Additionally, we have made some modifications in our analyses of Supplementary Fig. 1C and Fig. 2B, detailed in the “FLIM analysis” section of Materials and methods. For instance, to minimize system artifact interference at the histogram edges, we now use a narrower time range (1.8 to 11.5 ns) for fitting and empirical lifetime calculation.

      Reviewer #1 (Recommendations for the authors): 

      (1) The authors report how autofluorescence was measured from "imaged brain slices from mice at postnatal 15 to 19 days of age without sensor expression." However, it remains unclear how many acute slices and animals were used (for example, were all 15 µm x 15 µm FOV from a single slice) and if mouse age affects autofluorescence quantification. Furthermore, would in vivo measurements have different autofluorescence conditions given that blood flow would be active? It would help if the authors more clearly explained how reliable their autofluorescence measurement is by clarifying how they obtained it, whether this would vary across brain areas, and whether in vitro vs in vivo conditions would affect autofluorescence. 

      We have added description in Materials and methods that for autofluorescence ‘Fluorescence decay histograms from 19 images of two brain slices from a single mouse were averaged.’ We have added in Discussion that users should carefully ‘measure autofluorescence that matches the age, brain region, and data collection conditions (e.g., ex vivo or in vivo) of their tissue…’, and emphasize that FLiSimBA offers customization of inputs, and it is important for users to adapt the inputs such as autofluorescence to their experimental conditions. We also clarify in Discussion that the change of input parameters such as autofluorescence across age and brain region would not affect the general insights from this study, but will affect quantitative values.

      (2) Does sensor expression level issues arise more with in-utero electroporation compared to AAV-based delivery of biosensors? A brief comment on this in the discussion may help as most users in the field today may be using AAV strategies to deliver biosensors.

      In our experience, in-utero electroporation results in higher sensor expression than AAV-based delivery, and so poses less concern for expression-level dependence. However, both delivery methods can result in expression-level dependence, especially with a sensor that is not bright. We have added in Discussion ‘For a sensor with medium brightness delivered via in utero electroporation, adeno-associated virus, or as a knock-in gene, the brightness may not always fall within the expression level-independent regime.’

      (3) Figure 1. Should the x-axis on the top figures be "Time (ns)" instead of "Lifetime (ns)"?

      Similarly in Figure 8A&B, wouldn't it make more sense to have the x-axis be Time not Lifetime?

      The x-axis labels in Fig. 1 and Fig. 8A-8B have been changed to ‘Time (ns)’.   

      (4) Figure 2b: why is the empirical lifetime close to 3.5ns? Shouldn't it be somewhere between 2.14 and 0.69?

      In our empirical lifetime calculation, we did not set the peak channel to have a time of 0.0488 ns (i.e. the laser cycle 12.5 ns divided by 256 time channels). Rather, we set the first time channel within a defined calculation range (i.e. 1.8 ns in Supplementary Fig. 1B) to have a time of 0.0488 ns. Thus, the empirical lifetime exceeds 2.14 ns and depends on the time range of the histogram used for calculation.

      For Fig. 2B and Supplementary Fig. 1C, we have now adjusted the range to 1.8-11.5 ns to eliminate FLIM artifacts at the histogram edges in our experimental data, resulting in an empirical lifetime around 2.255 ns. In contrast, the range for calculating the empirical lifetime of simulated data in the rest of the study (e.g. Fig. 4D) is 0.489-11.5 ns, yielding a larger lifetime of ~3.35 ns. 

      We have clarified these details and our rationale in Materials and methods.
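
      The range dependence described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the idealized decay without an instrument response, and the 2.0 ns decay onset are assumptions made only for illustration. The 12.5 ns laser cycle, the 256 time channels, the lifetimes of 2.14 and 0.69 ns, and the two calculation ranges are taken from the response.

```python
import numpy as np

LASER_PERIOD_NS = 12.5                         # laser cycle stated in the response
N_CHANNELS = 256                               # number of time channels stated in the response
BIN_WIDTH_NS = LASER_PERIOD_NS / N_CHANNELS    # about 0.0488 ns per channel


def toy_histogram(p1=0.5, tau1=2.14, tau2=0.69, onset_ns=2.0):
    """Idealized double-exponential decay histogram.

    Assumptions for illustration only: no instrument response function,
    no noise, and a hypothetical decay onset at `onset_ns`.
    """
    t = np.arange(N_CHANNELS) * BIN_WIDTH_NS
    dt = t - onset_ns
    decay = p1 * np.exp(-dt / tau1) + (1.0 - p1) * np.exp(-dt / tau2)
    return np.where(dt >= 0, decay, 0.0)


def empirical_lifetime(counts, t_start_ns, t_end_ns):
    """Photon-weighted mean time within [t_start_ns, t_end_ns).

    The first channel inside the range is assigned a time of one bin
    width, as described in the response, so the result depends on where
    the range starts relative to the decay.
    """
    channel_start = np.arange(N_CHANNELS) * BIN_WIDTH_NS
    in_range = (channel_start >= t_start_ns) & (channel_start < t_end_ns)
    selected = counts[in_range]
    t = (np.arange(selected.size) + 1) * BIN_WIDTH_NS
    return float(np.sum(selected * t) / np.sum(selected))


hist = toy_histogram()
wide_range = empirical_lifetime(hist, 0.489, 11.5)    # range used for simulated data
narrow_range = empirical_lifetime(hist, 1.8, 11.5)    # range used for Fig. 2B
print(f"wide range: {wide_range:.3f} ns, narrow range: {narrow_range:.3f} ns")
```

      In this toy setting the wider range starts further before the decay onset, so every photon is assigned a later time and the empirical lifetime is larger, even though the underlying decay constants are unchanged. The absolute values will differ from those in the manuscript because the sketch omits the instrument response, autofluorescence, and background.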

      (5) Figure 2b: how come the afterpulse+background contributes more to the empirical lifetime than the autofluorescence (shorter lifetime). This was unclear in the results text why autofluorescence photons did not alter empirical lifetime as much as did the afterpulse/background.

      With a histogram range from 1.8 ns to 11.5 ns used in Fig. 2B, the empirical lifetime for FLIM-AKAR sensor fluorescence, autofluorescence, and background/afterpulse are: 2-2.3 ns, around 1.69 ns, and around 4.90 ns. The larger difference of background/afterpulse from FLIM-AKAR sensor fluorescence leads to larger influence of afterpulse+background than autofluorescence. We have added an explanation of this in Results.
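
      As a rough worked example of this argument: when all components are evaluated over the same time range, the empirical lifetime of a mixture is the photon-weighted average of the component empirical lifetimes. The sketch below uses the component values quoted above. The sensor value of 2.15 ns (chosen within the stated 2-2.3 ns range), the 5% contaminant photon fraction, and the function name are assumptions for illustration, not values from the manuscript.

```python
def mixed_empirical_lifetime(components):
    """Photon-weighted average of component empirical lifetimes.

    `components` is a list of (photon_fraction, empirical_lifetime_ns)
    pairs, all computed over the same histogram range.
    """
    total = sum(fraction for fraction, _ in components)
    return sum(fraction * tau for fraction, tau in components) / total


SENSOR_NS = 2.15             # assumed value within the stated 2-2.3 ns range
AUTOF_NS = 1.69              # autofluorescence, from the response
BACKGROUND_NS = 4.90         # background/afterpulse, from the response
CONTAMINANT_FRACTION = 0.05  # hypothetical photon fraction, equal for both cases

sensor_fraction = 1.0 - CONTAMINANT_FRACTION
shift_autof = mixed_empirical_lifetime(
    [(sensor_fraction, SENSOR_NS), (CONTAMINANT_FRACTION, AUTOF_NS)]
) - SENSOR_NS
shift_background = mixed_empirical_lifetime(
    [(sensor_fraction, SENSOR_NS), (CONTAMINANT_FRACTION, BACKGROUND_NS)]
) - SENSOR_NS

print(f"autofluorescence shift: {shift_autof:+.4f} ns")
print(f"background/afterpulse shift: {shift_background:+.4f} ns")
```

      For an equal photon fraction, the shift scales with the difference between the contaminant and sensor empirical lifetimes, which is why the background/afterpulse term moves the result more than autofluorescence does. In real data the two photon fractions are generally not equal, so the actual shifts depend on the measured photon counts.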

      (6) One overall suggestion for an improvement that could help active users of lifetime biosensors understand the consequences would be to show either a real or simulated example of a "typical experiment" conducted using FLIM-AKAR and how an incorrect interpretation could be drawn as a consequence of these artifacts. For example, do these confounds affect experiments involving comparisons across animals more than within-subject experiments such as washing a drug onto the brain slice, and the baseline period is used to normalize the change in signal? I think this type of direct discussion will help biosensor users more deeply grasp how these factors play out in common experiments being conducted.

      We have added the following in Discussion, ‘…While this issue is less problematic when the same sample is compared over short periods (e.g. minutes), it can lead to misinterpretation when fluorescence lifetime is compared across chronic time periods or between samples with different sensor expression levels. For example, apparent changes in fluorescence lifetime observed over days, across cell types, or subcellular compartments may actually reflect variations in sensor expression levels rather than true differences in biological signals (Fig. 6). Therefore, considering biologically realistic factors in FLiSimBA is essential, as it qualitatively impacts the conclusions.’

      Reviewer #2 (Recommendations for the authors): 

      The paper would be improved with more detail on the fitting methods, and the use of state-of-the-art methods. Consult for example the introduction of this paper where many methods are listed: https://www.mdpi.com/1424-8220/22/19/7293

      We have moved the description of the Gauss-Newton nonlinear least-square fitting algorithm from Materials and methods to Results to enhance clarity. We appreciate the reviewer’s suggestion to combine FLiSimBA with various analysis methods. However, the primary focus of our manuscript is to call attention to how specific contributing factors in biological experiments influence FLIM data, and to provide a tool that rigorously considers these factors to simulate FLIM data, which can then be used for fitting. Therefore, we did not expand the scope of our manuscript. Instead, we have added in the Discussion that ‘FLiSimBA can be used to test multiple fitting methods and lifetime metrics as an exciting future direction for identifying the best analysis method for specific experimental conditions’, citing relevant references.

      I would also improve the content of the GitHub repository as it is very hard to identify the source code used for simulation and fitting.

      We have reorganized and relabeled our GitHub repository and now have three folders labeled as ‘Simulation_inMatlab’, ‘DataAnalysis_inMatlab’, and ‘SimulationAnalysis_inPython’. We also updated the clarification of the contents of each folder in the README file.

      Reviewer #3 (Recommendations for the authors): 

      (1) P. 10 "For example, to detect a P1 change of 0.006 or a lifetime change of 5 ps with one sample measurement in each comparison group, approximately 300,000 photons are needed." If I am reading the graphs in Figures 3B and C, this sentence is talking about the red line. However, the intersection of 0.006 in the MDD of P1 in 3B and red is not 3E5 photons. And the intersection of 0.005 ns and red in 3C is not 3E5 photons either. Are you sure you are talking about n=1? Maybe the values are correct for the blue curve with n=5.

      Thank you for catching our error. We have corrected the text to ‘with five sample measurements’.

      (2) Figure 2 (B) legend: It would be helpful to specify what is being compared in the legend. For example, consider revising "* p < 0.05 vs sensor only; n.s. not significant vs sensor + autoF; # p < 0.05 vs sensor + autoF. Two-way ANOVA with Šídák's multiple comparisons test" to "* p < 0.05 for sensor + autoF (cyan) vs sensor only; n.s. not significant for final simulated data (purple) vs sensor + autoF; # p < 0.05 for final simulated data (purple) vs sensor + autoF. Two-way ANOVA with Šídák's multiple comparisons test".

      We’ve made the change and thanks for the suggestion to make it clearer.

      (3) Figure 2 (c) Can you please show the same Two-way ANOVA test values for Experimental vs. Sensor only and for Experimental vs. Sensor + autoF? Currently, the value (n.s.) is marked only for Experimental vs. Final simulation. Given that the experimental data are sparse (compared to the simulations), it seems likely that there may be no significant difference among the 3 different simulations regarding how well they match the experimental data. Also, can you specify the P1 and P2 of the experimental data  used to generate the simulated data on this panel? Also, what is the reason why P1=0.5 was used for panels A and B, instead of the value matching the experimental value?

      As the reviewer suggested, we have included statistical tests in the figure (now Supplementary Fig. 1C). Please see our response to Reviewer 3’s Public Review comments, as well as Materials and Methods, for the other changes to this figure and their rationale. We have now specified the P<sub>1</sub> value of the experimental data used to generate the simulated data on this panel both in Figure Legends and Materials and Methods. Based on the suggestion, we have now used the same P<sub>1</sub> value in Fig. 2B.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #3 (Public Review)

      Summary:

      In this paper the authors examined the effects of strip cropping, a relatively new agricultural technique of alternating crops in small strips of several meters wide, on ground beetle diversity. The results show an increase in species diversity (i.e. abundance and species richness) of the ground beetle communities compared to monocultures.

      Strengths:

      The article is well written; it has an easily readable tone of voice without too much jargon or overly complicated sentence structure. Moreover, as far as reviewing the models in depth without raw data and R scripts allows, the statistical work done by the authors looks good. They have well thought out how to handle heterogeneous, unbalanced and taxonomically unspecific yet spatially and temporally correlated field data. The models applied and the model checks performed are appropriate for the data at hand. Combining RDA and PCA axes together is a nice touch. Moreover, after the first round of reviews, the authors have done a great job at rewriting the paper to make it less overstated, more relevant to the data at hand and more solid in the findings. Many of the weaknesses noted in the first review have been dealt with. The overall structure of the paper is good, with a clear introduction, hypotheses, results section and discussion.

      We are grateful for this positive feedback. We are glad that our extensive revision, following thorough review by three reviewers, has paid off in addressing the earlier weaknesses of our manuscript.

      Weaknesses:

      The weaknesses that remain are mainly due to a difficult dataset and choices that could have stressed certain aspects more, like the relationship between strip cropping and intercropping. The mechanistic understanding of strip cropping is what is at stake here. Does strip cropping behave similarly to intercropping, a technique which has been proven to be beneficial to biodiversity because of added effects due to increased resource efficiency and greater plant species richness?

      Unfortunately, the authors do not go into this in the introduction or otherwise and simply state that they consider strip cropping a form of intercropping.

      We agree with the reviewer that a mechanistic understanding of how intercropping and strip cropping differ would be very interesting. However, we also feel that this topic is somewhat beyond the scope of the current manuscript. We are already planning work to elucidate mechanisms that may explain the pest-suppressive effects of strip cropping.

      I also do not like the exclusive focus on percentages, as these are dimensionless. I think more could have been done to show underlying structure in the data, even after rarefaction.

      While we generally agree with this point raised by the reviewer, for our heterogeneous dataset it was difficult to come up with meaningful units with dimensions. Therefore, we believe that percentages are the most suitable approach to present readers a fair comparison of the treatments.

      A further weakness is a limited embedding into the larger scientific discourses other than providing references. But this may be a matter of style and/or taste.

      We believe our manuscript to be well-embedded within the relevant scientific discourse, but as indicated by reviewer 3 this might indeed be a matter of style/taste. Without exact examples it is difficult for us to judge this point.

      Reviewer #3 (Recommendations for the authors): 

      Suggestion for title: "Strip cropping shows promising preliminary increases in ground beetle community diversity compared to monocultures"

      We agree that the title could indeed be nuanced. We incorporated the suggested title, except for the word “preliminary”, as we felt that this is slightly misplaced for a 4-year study conducted at 4 locations.

      line 26: the word previous may be confusing to readers, as it suggests previous research on beetles or insects. I think it would be better to use for instance "related" or "productivity focused research"

      We agree that this wording might be confusing, and changed it to “other studies showed”.

      Line 84-85: this is vague. can you make explicit what you are trying to answer here?

      We made “biodiversity metric changes” more explicit, and changed the sentence accordingly.

      Line 88-89: I think this would fit better with the first question in line 83-84, so I suggest placing it upwards. Also, I think you mean abundant instead of common. Common suggests commonness in the entire population. Abundant suggests found often in this study. While these definitions may very much overlap, they are distinctly different.

      We have moved this sentence up and changed “common” to “abundant”. To make the result section more in line with this section, we also moved the section on the relationship between crop configuration and abundant genera up.  

      Line 146: defining rareness of species should be in the methods section. Also "following" would be better than "according"

      We now added a sentence on how we examine habitat preferences and rarity in the methods section (line 316-317). We also changed “according to” to “following”.

      Line 291: it is called being "flush" with the soil surface. This expression is not much used by non-native speakers, but is regularly encountered in studies on pitfalls, so the authors could decide to change the sentence using the proper English vernacular.

      Suggestion incorporated.

      Line 322-327, this method could do with a reference

      This method is a relatively standard calculation to calculate relative changes and to center variation around zero. Nevertheless, we added a reference to a paper that used the same method.

      Line: 333-335. I would still like to see a reference for this method.

      This methodology has not been described in literature to the best of our knowledge. As we compared two crops within strip cropping with their respective monoculture references, we compare one strip cropping field with two monocultural fields. Here we took a conservative approach by comparing the strip crop field with the monoculture with the highest richness and activity density, to see if strip cropped fields outperformed monocultures with diverse ground beetle communities.

      Line 364-366. references?

      We have added references for these R packages.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We would like to thank you and your chosen reviewers for the diligent work and insightful comments. Following the latest round of feedback, we have made the following changes to the manuscript:

      (1) We have added details regarding the specific versions of Cryosparc and cryoDRGN used in our analysis.

      (2) We have addressed Reviewer 2’s comment concerning the negative RMSF values in Figure S12. The negative values occur because this display shows the difference in RMSF values from the MD simulations of glycosylated versus non-glycosylated ACE. To avoid similar confusion, we have split Figure S12 into three panels. Panels A and B show the RMSF values for each residue in the glycosylated and non-glycosylated sACE MD simulations, respectively, and all values here are positive. Panel C (the original Figure S12) now includes expanded labeling to clarify that it depicts the difference in RMSF values between the presence and absence of glycans. In this panel, a negative value indicates that the residues exhibit higher RMSF in simulations where glycans are present. The figure legend has been revised to accurately describe the updated figure.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors claim that they can use a combination of repetitive transcranial magnetic stimulation (intermittent theta burst-iTBS) and transcranial alternating current stimulation (gamma tACS) to cause slight improvements in memory in a face/name/profession task.

      Strengths:

      The idea of stimulating the human brain non-invasively is very attractive because, if it worked, it could lead to a host of interesting applications. The current study aims to evaluate one such exciting application.

      Weaknesses:

      (1) It is highly unclear what, if anything, transpires in the brain with non-invasive stimulation. To cite one example of many, a rigorous study in rats and human cadavers, compellingly showed that traditional parameters of transcranial electrical stimulation lead to no change in brain activity due to the attenuation by the soft tissue and skull (Mihály Vöröslakos et al Nature Communications 2018): https://www.nature.com/articles/s41467-018-02928-3. It would be very useful to demonstrate via invasive neurophysiological recordings that the parameters used in the current study do indeed lead to any kind of change in brain activity. Of course, this particular study uses a different non-invasive stimulation protocol.

      Thank you for raising the important issue regarding the actual neurophysiological effects of non-invasive brain stimulation. Unfortunately, invasive neurophysiological recordings in humans for this type of study are not feasible due to ethical constraints, while studies on cadavers or rodents would not fully resolve our question. Indeed, the authors of the cited study (Mihály Vöröslakos et al., Nature Communications, 2018) highlight the impossibility of drawing definitive conclusions about the exact voltage required in the in-vivo human brain due to significant differences between rats and humans, as well as the in-vivo human brain and cadavers due to alterations in electrical conductivity that occur in postmortem tissue. Huang and colleagues addressed the difficulties in reaching direct evidence of non-invasive brain stimulation (NIBS) effects in a review published in Clinical Neurophysiology in 2017. They conclude that the use of EEG to assess brain response to TMS has great potential for a less indirect demonstration of plasticity mechanisms induced by NIBS in humans.

      To address this challenge, we conducted Experiments 3 and 4, which respectively examined the neurophysiological and connectivity changes induced by the stimulation in a non-invasive manner using TMS-EEG and fMRI. The observed changes in brain oscillatory activity (increased gamma oscillatory activity), cortical excitability (enhanced posteromedial parietal cortex reactivity), and brain connectivity (strengthened connections between the precuneus and hippocampi) provided evidence of the effects of our non-invasive brain stimulation protocol, further supporting the behavioral data.

      Additionally, we carefully considered the issue of stimulation distribution and, in response, performed a biophysical modeling analysis and E-field calculation using the parameters employed in our study (see Supplementary Materials).

      We acknowledge that further exploration of this aspect would be highly valuable, and we agree that it is worth discussing both as a technical limitation and as a potential direction for future research. We have therefore modified the Discussion accordingly (main text, lines 280-289).

      “Although we studied TMS and tACS propagation through the E-field modeling and observed an increase in the precuneus gamma oscillatory activity, excitability and connectivity with the hippocampi, we cannot exclude that our results might reflect the consequences of stimulating more superficial parietal regions other than the precuneus nor report direct evidence of microscopic changes in the brain after the stimulation. Invasive neurophysiological recordings in humans for this type of study are not feasible due to ethical constraints. Studies on cadavers or rodents would not fully resolve our question due to significant differences between them (i.e. rodents do not have an anatomical correspondence while cadavers have alterations in electrical conductivity occurring in postmortem tissue). However, further exploration of this aspect in future studies would help in the understanding of γtACS+iTBS effects.”

      (2) If there is any brain activity triggered by the current stimulation parameters, then it is extremely difficult to understand how this activity can lead to enhancing memory. The brain is complex. There are hundreds of neuronal types. Each neuron receives precise input from about 10,000 other neurons with highly tuned synaptic strengths. Let us assume that the current protocol does lead to enhancing (or inhibiting) simultaneously the activity of millions of neurons. It is unclear whether there is any activity at all in the brain triggered by this protocol, it is also unclear whether such activity would be excitatory, or inhibitory. It is also unclear how many neurons, let alone what types of neurons would change their activity. How is it possible that this can lead to memory enhancement? This seems like using a hammer to knock on my laptop and hope that the laptop will output a new Mozart-like sonata.

      Thank you for your comment. As you correctly point out, we still do not have precise knowledge of which neurons—and to what extent—are activated during non-invasive brain stimulation in humans. However, this challenge is not limited to brain stimulation but applies to many other therapeutic interventions, including psychiatric medications, without limiting their use.

      Nevertheless, a substantial body of research has investigated the mechanisms underlying the efficacy of TMS and tACS in producing behavioral after-effects, primarily through its ability to induce long-term potentiation (Bliss & Collingridge, The Journal of Physiology, 1993a; Ridding & Rothwell, Nature Reviews Neuroscience, 2007; Huang et al., Clinical Neurophysiology, 2017; Koch et al., Neuroimage 2018; Koch et al., Brain 2022; Jannati et al., Neuropsychopharmacology, 2023; Wischnewski et al., Trends in Cognitive Science, 2023; Griffiths et al., Trends in Neuroscience, 2023).

      We acknowledge that we took this important aspect for granted. We consequently expanded the introduction accordingly (main text, lines 48-60).

      “Repetitive transcranial magnetic stimulation (rTMS) and transcranial alternating current stimulation (tACS) are two forms of NIBS widely used to enhance memory performances (Grover et al., 2022; Koch et al., 2018; Wang et al., 2014). rTMS, based on the principle of Faraday, induces depolarization of cortical neuronal assemblies and leads to after-effects that have been linked to changes in synaptic plasticity involving mechanisms of long-term potentiation (LTP) (Huang et al., 2017; Jannati et al., 2023). On the other hand, tACS causes rhythmic fluctuations in neuronal membrane potentials, which can bias spike timing, leading to an entrainment of the neural activity (Wischnewski et al., 2023). In particular, the induction of gamma oscillatory activity has been proposed to play an important role in a type of LTP known as spike timing-dependent plasticity, which depends on a precise temporal delay between the firing of a presynaptic and a postsynaptic neuron (Griffiths and Jensen, 2023). Both LTP and gamma oscillations have a strong link with memory processes such as encoding (Bliss and Collingridge, 1993; Griffiths and Jensen, 2023; Rossi et al., 2001), pointing to rTMS and tACS as good candidates for memory enhancement.”

      (3) Even if there is any kind of brain activation, it is unclear why the authors seem to be so sure that the precuneus is responsible. Are there neurophysiological data demonstrating that the current protocol only activates neurons in the precuneus? Of note, the non-invasive measurements shown in Figure 3 are very weak (Figure 3A top and bottom look very similar, and Figure 3C left and right look almost identical). Even if one were to accept the weak alleged differences in Figure 3, there is no indication in this figure that there is anything specific to the precuneus, rather a whole brain pattern. This would be the kind of minimally rigorous type of evidence required to make such claims. In a less convincing fashion, one could look at different positions of the stimulation apparatus. This would not be particularly compelling in terms of making a statement about the precuneus. But at least it would show that the position does matter, and over what range of distances it matters, if it matters.

      Thank you for your feedback. Our assumption that the precuneus plays a key role in the observed effects is based on several factors:

      (1) The non-invasive stimulation protocol was applied to an individually identified precuneus for each participant. Given existing evidence on TMS propagation, we can reasonably assume that the precuneus was at least a mediator of the observed effects (Ridding & Rothwell, Nature Reviews Neuroscience 2007). For further details about target identification and TMS and tACS propagation, please refer to the MRI data acquisition section in the main text and Biophysical modeling and E-field calculation section in the supplementary materials.

      (2) To investigate the effects of the neuromodulation protocol on cortical responses, we conducted a whole-brain analysis using multiple paired t-tests comparing each data point between different experimental conditions. To minimize the type I error rate, data were permuted with the Monte Carlo approach and significant p-values were corrected with the false discovery rate method (see the Methods section for details). The results identified the posterior-medial parietal areas as the only regions showing significant differences across conditions.

      (3) To control for potential generalized effects, we included a control condition in which TMS-EEG recordings were performed over the left parietal cortex (adjacent to the precuneus). This condition did not yield any significant results, reinforcing the cortical specificity of the observed effects.

      However, as stated in the Discussion, we do not claim that precuneus activity alone accounts for the observed effects. As shown in Experiment 4, stimulation led to connectivity changes between the precuneus and hippocampus, a network widely recognized as a key contributor to long-term memory formation (Bliss & Collingridge, Nature 1993). These connectivity changes suggest that precuneus stimulation triggered a ripple effect extending beyond the stimulation site, engaging the broader precuneus-hippocampus network.

      Regarding Figure 3A, it represents the overall expression of oscillatory activity detected by TMS-EEG. Since each frequency band has a different optimal scaling, the figure reflects a graphical compromise. A more detailed representation of the significant results is provided in Figure 3B. The effect sizes for gamma oscillatory activity in the delta T1 and T2 conditions were 0.52 and 0.50, respectively, which correspond to a medium effect based on Cohen’s d interpretation.

      We have added a paragraph to the Discussion to clarify this important aspect of the manuscript (lines 193-198).

      “Given the existing evidence on TMS propagation and the computation of the Biophysical model with the E-field, we can reasonably assume that the individually identified PC was a mediator of the observed effects (Ridding and Rothwell, 2007). Moreover, we observed specific cortical changes in the posteromedial parietal areas, as evidenced by the whole-brain analysis conducted on TMS-EEG data and the absence of effect on the lateral posterior parietal cortex used as a control condition.”
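
      The false discovery rate correction mentioned in point (2) above can be sketched as follows. The response does not state which FDR procedure was used, so the Benjamini-Hochberg step-up procedure shown here is an assumption, and the function name and example p-values are hypothetical. The Monte Carlo permutation step that produces the p-values is not shown.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of tests rejected at FDR level `alpha`.

    Benjamini-Hochberg step-up procedure: sort the p-values, find the
    largest rank k with p_(k) <= alpha * k / m, and reject every test
    whose rank is at most k.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                        # indices that sort p ascending
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = int(np.max(np.nonzero(below)[0]))    # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject


# Hypothetical p-values, one per data point compared between conditions
example_p = [0.5, 0.001, 0.9, 0.02, 0.01]
rejected = benjamini_hochberg(example_p, alpha=0.05)
print(rejected)
```

      With many data points tested across the whole brain, a correction of this kind limits the expected proportion of false positives among the points reported as significant, which is the role it plays in the analysis described above.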

      (4) In the absence of any neurophysiological documentation of a direct impact on the brain, an argument in this type of study is that the behavioral results show that there must be some kind of effect. I agree with this argument. This is also the argument for placebo effects, which can be extremely powerful and useful even if the mechanism is unrelated to what is studied. Then let us dig into the behavioral results.

      Hoping to have already addressed your concern regarding the neurophysiological impact of the stimulation on the brain, we would like to emphasize that the behavioral results were obtained controlling for placebo effects. This was achieved by having participants perform the task under different stimulation conditions, including a sham condition.

      4a. There does not seem to be any effect on the STMB task, therefore we can ignore this.

      4b. The FNAT task is minimally described in the supplementary material. There are no experimental details to understand what was done. What was the size of the images? How long were the images presented for? Were there any repetitions of the images? For how long did the participants study the images? Presumably, all the names and occupations are different? What were the genders of the faces? What is chance level performance? Presumably, the same participant saw different faces across the different stimulation conditions. If not, then there can be memory effects across different conditions that are even more complex to study. If yes, then it would be useful to show that the difficulty is the same across the different stimuli.

      Thank you for pointing out the missing details in the description of the FNAT task. We have added the required information to the supplementary information (lines 93-101).

      “Each picture's face size was 19x15cm. In the learning phase, faces were shown along with names and occupations for 8 seconds each (totaling approximately 2 minutes). During immediate recall, the faces were displayed alone for 8 seconds. In the delayed recall and recognition phase, pictures were presented until the subject provided answers. We used a different set of stimuli for each stimulation condition, resulting in a total of 3 parallel task forms balanced across conditions and session order. All parallel forms comprised 6 male and 6 female faces; for each sex, there were 2 young adults (around 30 years old), 2 middle-aged adults (around 50 years old), and 2 elderly adults (around 70 years old). Before the experiments, we conducted a pilot study to ensure no differences existed between the parallel forms of the task.”

      The chance level in the immediate and delayed recall is not quantifiable since the participants had to freely recall the name and the occupation without a multiple choice. In the recognition, the chance level was around 33% (since the possible answers were 3).

      4c. Although not stated clearly, if I understand FNAT correctly, the task is based on just 12 presentations. Each point in Figure 2A represents a different participant. Unfortunately, there is no way of linking the performance of individual participants across the conditions with the information provided. Lines joining performance for each participant would be useful in this regard. Because there are only 12 faces, the results are quantized in multiples of 100/12 % in Figure 3A. While I do not doubt that the authors did their homework in terms of the statistical analyses, it is difficult to get too excited about these 12 measurements. For example, take Figure 3A immediate condition TOTAL, arguably the largest effect in the whole paper. It seems that on average, the participants may remember one more face/name/occupation.

      Thank you for the suggestion. We added graphs showing lines linking the performance of individual participants across conditions to improve clarity, please see Fig.2 revised. We apologize for the lack of clarity in the description of the FNAT. As you correctly pointed out, we used the percentage based on the single association between face, name and occupation (12 in total). However, each association consisted of three items, resulting in a total of 36 items to learn and associate – we added a paragraph to make it more explicit in the manuscript (lines 425-430).

      “We considered a correct association when a subject was able to recall all the information for each item (i.e. face, name and occupation), resulting in a total of 36 items to learn and associate. To further investigate the effect on FNAT we also computed a partial recall score accounting for those items where subjects correctly matched only names with faces (FNAT NAME) and only occupations with faces (FNAT OCCUPATION). See supplementary information for score details.”

      In the example you mentioned, participants were, on average, able to correctly recall and associate three more items compared to the other conditions. While this difference may not seem striking at first glance, it is important to consider that we assessed memory performance after a single, three-minute stimulation session. Similar effects are typically observed only after multiple stimulation sessions (Koch et al., NeuroImage, 2018; Grover et al., Nature Neuroscience, 2022). Moreover, memory performance changes are often measured with a limited set of stimuli due to methodological constraints related to memory capacity. For example, the Rey Auditory Verbal Learning Test, which requires participants to learn and recall 15 words, is typically used to detect memory changes (Koch et al., Neuroimage, 2018; Benussi et al., Brain stimulation 2021; Benussi et al., Annals of Neurology, 2022).

      4d. Block effects. If I understand correctly, the experiments were conducted in blocks. This is always problematic. Here is one example study that articulated the big problems in block designs (Li et al TPAMI 2021): https://ieeexplore.ieee.org/document/9264220

      Thank you for the interesting reference. According to this paper, in a block design, EEG or fMRI recordings are performed in response to different stimuli of a given class presented in succession. If this is the case, it does not correspond to our experimental design where both TMS-EEG and fMRI were conducted in resting state on different days according to the different stimulation conditions.

      4e. Even if we ignore the lack of experimental descriptions, problems with lack of evidence of brain activity, the minimalistic study of 12 faces, problems with the block design, etc. at the end of the day, the results are extremely weak. In FNAT, some results are statistically significant, some are not. The interpretation of all of this is extremely complex. Continuing with Figure 3A, it seems that the author claims that iTBS+gtACS > iTBS+sham-tACS, but iTBS+gtACS ~ sham+sham. I am struggling to interpret such a result. When separating results by name and occupation, the results are even more perplexing. There is only one condition that is statistically significant in Figure 3A NAME and none in the occupation condition.

      Thank you again for your feedback. Hoping to have thoroughly addressed your initial concerns in our previous responses, we now move on to your observations regarding the behavioral results, assuming you were referring to Figure 2A. The main finding of this study is the improvement in long-term memory performance, specifically the ability to correctly recall the association between face, name, and occupation (total FNAT), which was significantly enhanced in both Experiments 1 and 2. However, we also aimed to explore the individual contributions of name and occupation separately to gain a deeper understanding of the results. Our analysis revealed that the improvement in total FNAT was primarily driven by an increase in name recall rather than occupation recall. We understand that this may have caused some confusion. We consequently modified the manuscript (lines 97-99; 107-111; 425-430) to make it clearer and moved the graph for FNAT NAME and OCCUPATION from Fig. 2 in the main text to Fig. S4 in the supplementary information.

      “Dual iTBS+γtACS increased the performances in recalling the association between face, name and occupation (FNAT accuracy) both for the immediate (F<sub>2,38</sub>=7.18; p =0.002; η<sup>2</sup><sub>p</sub>=0.274) and the delayed (F<sub>2,38</sub>=5.86; p =0.006; η<sup>2</sup><sub>p</sub>=0.236) recall performances (Fig. 2, panel A).”

      “The in-depth analysis of the FNAT accuracy investigating the specific contribution of face-name and face-occupation recall revealed that dual iTBS+γtACS increased the performances in the association between face and name (FNAT NAME) delayed recall (F<sub>2,38</sub> =3.46; p =0.042; η<sup>2</sup><sub>p</sub> =0.154; iTBS+γtACS vs. sham-iTBS+sham-tACS: 42.9±21.5 % vs. 33.8±19 %; p=0.048 Bonferroni corrected) (Fig. S4, supplementary information).”

      “We considered a correct association when a subject was able to recall all the information for each item (i.e. face, name and occupation), resulting in a total of 36 items to learn and associate. To further investigate the effect on FNAT we also computed a partial recall score accounting for those items where subjects correctly matched only names with faces (FNAT NAME) and only occupations with faces (FNAT OCCUPATION). See supplementary information for score details.”

      Regarding the stimulation conditions, your concerns about the performance pattern (iTBS+gtACS > iTBS+sham-tACS, but iTBS+gtACS ~ sham+sham) are understandable. However, this new protocol was developed precisely in response to the variability observed in behavioral outcomes following non-invasive brain stimulation, particularly when used to modulate memory functions (Corp et al., 2020; Pabst et al., 2022). As discussed in the manuscript, it is intended as a boost to conventional non-invasive brain stimulation protocols, leveraging the mechanisms outlined in the Discussion section.

      (5) In sum, it would be amazing to be able to use non-invasive stimulation for any kind of therapeutic purpose as the authors imagine. More work needs to be done to convince ourselves that this kind of approach is viable. The evidence provided in this study is weak.

      We hope our response will be carefully considered, fostering a constructive exchange and leading to a reassessment of your evaluation.

      Reviewer #2 (Public review):

      Summary:

      The manuscript "Dual transcranial electromagnetic stimulation of the precuneus-hippocampus network boosts human long-term memory" by Borghi and colleagues provides evidence that the combination of intermittent theta burst TMS stimulation and gamma transcranial alternating current stimulation (γtACS) targeting the precuneus increases long-term associative memory in healthy subjects compared to iTBS alone and sham conditions. Using a rich dataset of TMS-EEG and resting-state functional connectivity (rs-FC) maps and structural MRI data, the authors also provide evidence that dual stimulation increased gamma oscillations and functional connectivity between the precuneus and hippocampus. Enhanced memory performance was linked to increased gamma oscillatory activity and connectivity through white matter tracts.

      Strengths:

      The combination of personalized repetitive TMS (iTBS) and gamma tACS is a novel approach to targeting the precuneus, and thereby, connected memory-related regions to enhance long-term associative memory. The authors leverage an existing neural mechanism engaged in memory binding, theta-gamma coupling, by applying TMS at theta burst patterns and tACS at gamma frequencies to enhance gamma oscillations. The authors conducted a thorough study that suggests that simultaneous iTBS and gamma tACS could be a powerful approach for enhancing long-term associative memory. The paper was well-written, clear, and concise.

      Weaknesses:

      (1) The study did not include a condition where γtACS was applied alone. This was likely because a previous work indicated that a single 3-minute γtACS did not produce significant effects, but this limits the ability to isolate the specific contribution of γtACS in the context of this target and memory function

      Thank you for your comments. As you pointed out, we did not include a condition where γtACS was applied alone. This decision was based on the findings of Guerra et al. (Brain Stimulation 2018), who investigated the same protocol and reported no aftereffects. Given the substantial burden of the experimental design on patients and our primary goal of demonstrating an enhancement of effects compared to the standalone iTBS protocol, we decided to leave out this condition. However, you raise an important aspect that should be further discussed, and we modified the limitation section accordingly (lines 290-297).

      “We did not assess the effects of γtACS alone. This decision was based on the findings of Guerra et al. (Guerra et al., 2018), who investigated the same protocol and reported no aftereffects. Given the substantial burden of the experimental design on patients and our primary goal of demonstrating an enhancement of effects compared to the standalone iTBS protocol, we decided to leave out this condition. While examining the effects of γtACS alone could help isolate its specific contribution to this target and memory function, extensive research has shown that achieving a cognitive enhancement aftereffect with tACS alone typically requires around 20–25 minutes of stimulation (Grover et al., 2023).”

      (2) The authors applied stimulation for 3 minutes, which seems to be based on prior tACS protocols. It would be helpful to present some rationale for both the duration and timing relative to the learning phase of the memory task. Would you expect additional stimulation prior to recall to benefit long-term associative memory?

      Thank you for your comment and for raising this interesting point. As you correctly noted, the protocol we used has a duration of three minutes, a choice based on previous studies demonstrating its greater neurophysiological efficacy compared with single stimulation. Specifically, these studies have shown that the combined stimulation enhanced gamma-band oscillations and increased cortical plasticity (Guerra et al., Brain Stimulation 2018; Maiella et al., Scientific Reports 2022). Given that the precuneus (Brodt et al., Science 2018; Schott et al., Human Brain Mapping 2018), gamma oscillations (Osipova et al., Journal of Neuroscience 2006; Deprés et al., Neurobiology of Aging 2017; Griffiths et al., Trends in Neurosciences 2023), and cortical plasticity (Brodt et al., Science 2018) are all associated with memory formation and encoding processes, we decided to apply the co-stimulation immediately before the encoding phase to enhance its efficacy. We added this paragraph to the manuscript rationale (lines 48-60).

      “Repetitive transcranial magnetic stimulation (rTMS) and transcranial alternating current stimulation (tACS) are two forms of NIBS widely used to enhance memory performances (Grover et al., 2022; Koch et al., 2018; Wang et al., 2014). rTMS, based on the principle of Faraday, induces depolarization of cortical neuronal assemblies and leads to after-effects that have been linked to changes in synaptic plasticity involving mechanisms of long-term potentiation (LTP) (Huang et al., 2017; Jannati et al., 2023). On the other hand, tACS causes rhythmic fluctuations in neuronal membrane potentials, which can bias spike timing, leading to an entrainment of the neural activity (Wischnewski et al., 2023). In particular, the induction of gamma oscillatory activity has been proposed to play an important role in a type of LTP known as spike timing-dependent plasticity, which depends on a precise temporal delay between the firing of a presynaptic and a postsynaptic neuron (Griffiths and Jensen, 2023). Both LTP and gamma oscillations have a strong link with memory processes such as encoding (Bliss and Collingridge, 1993; Griffiths and Jensen, 2023; Rossi et al., 2001), pointing to rTMS and tACS as good candidates for memory enhancement.”

      Regarding the question of whether stimulation could also benefit recall, the answer is yes. We can speculate that repeating the stimulation before recall might provide an additional boost. This is supported by evidence showing that both the precuneus and gamma oscillations are involved in recall processes (Flanagin et al., Cerebral Cortex 2023; Griffiths et al., Trends in Neurosciences 2023). Furthermore, previous research suggests that reinstating the same brain state as during encoding can enhance recall performance (Javadi et al., The Journal of Neuroscience 2017). We added this consideration to the discussion (lines 305-311).

      “Future studies should further investigate the effects of stimulation on distinct memory processes. In particular, stimulation could be applied before retrieval (Rossi et al., 2001), to better elucidate its specific contribution to the observed enhancements in memory performance. Additionally, it would be worth examining whether repeated stimulation - administered both before encoding and before retrieval - could produce a boosting effect. This is especially relevant in light of findings showing that matching the brain state between retrieval and encoding can significantly enhance memory performance (Javadi et al., 2017).”

      (3) How was the burst frequency of theta iTBS and gamma frequency of tACS chosen? Were these also personalized to subjects' endogenous theta and gamma oscillations? If not, were increases in gamma oscillations specific to patients' endogenous gamma oscillation frequencies or the tACS frequency?

      The stimulation protocol was chosen based on previous studies (Guerra et al., Brain Stimulation 2018; Maiella et al., Scientific Reports 2022). The γtACS sinusoidal waveform was set at 70 Hz, while iTBS consisted of ten bursts of three pulses at 50 Hz lasting 2 s, repeated every 10 s with an 8 s pause between consecutive trains, for a total of 600 pulses lasting 190 s (see iTBS+γtACS neuromodulation protocol section). In particular, the theta iTBS was inspired by protocols used in animal models to elicit LTP in the hippocampus (Huang et al., Neuron 2005). Consequently, neither the theta iTBS nor the gamma frequency of tACS was personalized. The increase in gamma oscillations was computed relative to the patient’s baseline and did not correspond to the administered tACS frequency.
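
      As a quick consistency check on the iTBS timing described above, the following sketch recomputes the number of trains, the burst repetition rate, and the approximate session length from the stated parameters. The variable names are hypothetical, and the number of trains is derived here rather than stated in the response; the sketch also assumes that no pause follows the final train.

```python
# Illustrative arithmetic only; names are hypothetical, values come from the text.
PULSES_PER_BURST = 3    # three pulses at 50 Hz
BURSTS_PER_TRAIN = 10   # ten bursts per train
TRAIN_DURATION_S = 2    # each train lasts 2 s
PAUSE_S = 8             # pause between consecutive trains
TOTAL_PULSES = 600      # total pulses delivered

pulses_per_train = PULSES_PER_BURST * BURSTS_PER_TRAIN   # pulses in one 2 s train
n_trains = TOTAL_PULSES // pulses_per_train               # derived, not stated in the text
burst_rate_hz = BURSTS_PER_TRAIN / TRAIN_DURATION_S       # bursts per second (theta range)
cycle_s = TRAIN_DURATION_S + PAUSE_S                      # one train plus its pause

# Assumption: the last train is not followed by a pause.
total_duration_s = (n_trains - 1) * cycle_s + TRAIN_DURATION_S

print(n_trains, burst_rate_hz, total_duration_s)
```

      Under these assumptions the session comes out a couple of seconds longer than the rounded 190 s quoted above, which suggests the stated parameters are mutually consistent.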

      (4) The authors do a thorough job of analyzing the increase in gamma oscillations in the precuneus through TMS-EEG; however, the authors may also analyze whether theta oscillations were also enhanced through this protocol due to the iTBS potentially targeting theta oscillations. This may also be more robust than gamma oscillations increases since gamma oscillations detected on the scalp are very low amplitude and susceptible to noise and may reflect activity from multiple overlapping sources, making precise localization difficult without advanced techniques.

      Thank you for the suggestion. We analyzed theta oscillations, finding no changes.

      (5) Figure 4: Why are connectivity values pre-stimulation for the iTBS and sham tACS stimulation condition so much higher than the dual stimulation? We would expect baseline values to be more similar.

      We acknowledge that the pre-stimulation connectivity values for the iTBS and sham tACS conditions appear higher than those for the dual stimulation condition. However, as noted in our statistical analyses, there were no significant differences at baseline between conditions (p-FDR= 0.3514), suggesting that any apparent discrepancy is due to natural variability rather than systematic bias. One potential explanation for these differences is individual variability in baseline connectivity measures, which can fluctuate due to factors such as intrinsic neural dynamics, participant state, or measurement noise. Despite these variations, our statistical approach ensures that any observed post-stimulation effects are not confounded by pre-existing differences.

      (6) Figure 2: How are total association scores significantly different between stimulation conditions, but individual name and occupation associations are not? Further clarification of how the total FNAT score is calculated would be helpful.

      We apologize for any lack of clarity. The total FNAT score reflects the ability to correctly recall all the information associated with a person—specifically, the correct pairing of the face, name, and occupation. Participants received one point for each triplet they accurately recalled. The scores were then converted into percentages, as detailed in the Face-Name Associative Task Construction and Scoring section in the supplementary materials.

      Total FNAT was the primary outcome measure. However, we also analyzed name and occupation recall separately to better understand their partial contributions. Our analysis revealed that the improvement in total FNAT was primarily driven by an increase in name recall rather than occupation recall.

      We acknowledge that this distinction may have caused some confusion. To improve clarity, we revised the manuscript accordingly (lines 97-98; 107-111; 425-430).

      “Dual iTBS+γtACS increased the performances in recalling the association between face, name and occupation (FNAT accuracy) both for the immediate (F<sub>2,38</sub>=7.18; p=0.002; η<sup>2</sup><sub>p</sub>=0.274) and the delayed (F<sub>2,38</sub>=5.86; p=0.006; η<sup>2</sup><sub>p</sub>=0.236) recall performances (Fig. 2, panel A).”

      “The in-depth analysis of the FNAT accuracy investigating the specific contribution of face-name and face-occupation recall revealed that dual iTBS+γtACS increased the performances in the association between face and name (FNAT NAME) delayed recall (F<sub>2,38</sub> =3.46; p =0.042; η<sup>2</sup><sub>p</sub> =0.154; iTBS+γtACS vs. sham-iTBS+sham-tACS: 42.9±21.5 % vs. 33.8±19 %; p=0.048 Bonferroni corrected) (Fig. S4, supplementary information).”

      “We considered a correct association when a subject was able to recall all the information for each item (i.e. face, name and occupation), resulting in a total of 36 items to learn and associate. To further investigate the effect on FNAT we also computed a partial recall score accounting for those items where subjects correctly matched only names with faces (FNAT NAME) and only occupations with faces (FNAT OCCUPATION). See supplementary information for score details.”

      We also moved the data on the specific contribution of name and occupation recall to the supplementary information (Fig. S4) and further specified how we computed the score (lines 102-104).

      “The score was computed by deriving an accuracy percentage index dividing by 12 and multiplying by 100 the correct association sum. The partial recall scores were computed in the same way only considering the sum of face-name (NAME) and face-occupation (OCCUPATION) correctly recollected.”
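
      The scoring rule quoted above (correct associations divided by 12, multiplied by 100) can be written as a minimal sketch. The function name and the input check are hypothetical and are not taken from the manuscript.

```python
N_FACES = 12  # number of face-name-occupation associations per task form (from the text)

def fnat_accuracy(n_correct: int, n_faces: int = N_FACES) -> float:
    """Return the accuracy percentage for a count of correct associations.

    Hypothetical helper: the same rule is assumed to apply to the total score
    and to the partial NAME and OCCUPATION scores.
    """
    if not 0 <= n_correct <= n_faces:
        raise ValueError("n_correct must lie between 0 and n_faces")
    return 100.0 * n_correct / n_faces

# One additional correct association shifts the score by 100/12 percentage points.
step = fnat_accuracy(1)
print(round(step, 2))
```

      Because the score is a count out of 12, individual results can only take values that are multiples of this step.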

      Reviewer #3 (Public review):

      Summary:

      Borghi and colleagues present results from 4 experiments aimed at investigating the effects of dual γtACS and iTBS stimulation of the precuneus on behavioral and neural markers of memory formation. In their first experiment (n = 20), they found that a 3-minute offline (i.e., prior to task completion) stimulation that combines both techniques leads to superior memory recall performance in an associative memory task immediately after learning associations between pictures of faces, names, and occupation, as well as after a 15-minute delay, compared to iTBS alone (+ tACS sham) or no stimulation (sham for both iTBS and tACS). Performance in a second task probing short-term memory was unaffected by the stimulation condition. In a second experiment (n = 10), they show that these effects persist over 24 hours and up to a full week after initial stimulation. A third (n = 14) and fourth (n = 16) experiment were conducted to investigate the neural effects of the stimulation protocol. The authors report that, once again, only combined iTBS and γtACS increase gamma oscillatory activity and neural excitability (as measured by concurrent TMS-EEG) specific to the stimulated area at the precuneus compared to a control region, as well as precuneus-hippocampus functional connectivity (measured by resting-state MRI), which seemed to be associated with structural white matter integrity of the bilateral middle longitudinal fasciculus (measured by DTI).

      Strengths:

      Combining non-invasive brain stimulation techniques is a novel, potentially very powerful method to maximize the effects of these kinds of interventions that are usually well-tolerated and thus accepted by patients and healthy participants. It is also very impressive that the stimulation-induced improvements in memory performance resulted from a short (3 min) intervention protocol. If the effects reported here turn out to be as clinically meaningful and generalizable across populations as implied, this approach could represent a promising avenue for the treatment of impaired memory functions in many conditions.

      Methodologically, this study is expertly done! I don't see any serious issues with the technical setup in any of the experiments (with the only caveat that I am not an expert in fMRI functional connectivity measures and DTI). It is also very commendable that the authors conceptually replicated the behavioral effects of experiment 1 in experiment 2 and then conducted two additional experiments to probe the neural mechanisms associated with these effects. This certainly increases the value of the study and the confidence in the results considerably.

      The authors used a within-subject approach in their experiments, which increases statistical power and allows for stronger inferences about the tested effects. They also used individualized stimulation locations and intensities, which should further optimize the signal-to-noise ratio.

      Weaknesses:

      I want to state clearly that I think the strengths of this study far outweigh the concerns I have. I still list some points that I think should be clarified by the authors or taken into account by readers when interpreting the presented findings.

      I think one of the major weaknesses of this study is the overall low sample size in all of the experiments (between n = 10 and n = 20). This is, as I mentioned when discussing the strengths of the study, partly mitigated by the within-subject design and individualized stimulation parameters. The authors mention that they performed a power analysis but this analysis seemed to be based on electrophysiological readouts similar to those obtained in experiment 3. It is thus unclear whether the other experiments were sufficiently powered to reliably detect the behavioral effects of interest. That being said, the authors do report significant effects, so they were per definition powered to find those. However, the effect sizes reported for their main findings are all relatively large and it is known that significant findings from small samples may represent inflated effect sizes, which may hamper the generalizability of the current results. Ideally, the authors would replicate their main findings in a larger sample. Alternatively, I think running a sensitivity analysis to estimate the smallest effect the authors could have detected with a power of 80% could be very informative for readers to contextualize the findings. At the very least, however, I think it would be necessary to address this point as a potential limitation in the discussion of the paper.

      Thank you for the observation. As you mentioned, our power analysis was based on our previous study investigating the same neuromodulation protocol with a corresponding experimental design. The relatively small sample could be considered a possible limitation of the study, which we will add to the discussion. A fundamental future step will be to replicate these results in a larger population. However, to strengthen our results, we performed the sensitivity analysis you suggested.

      In detail, we performed a sensitivity analysis for repeated-measures ANOVA with α=0.05 and power (1-β)=0.80 with no sphericity correction. For experiment 1, a sensitivity analysis with 1 group and 3 measurements showed a minimal detectable effect size of f=0.524 with 20 participants. In our paper, the ANOVA on total FNAT immediate performance revealed an effect size of η<sup>2</sup>=0.274 corresponding to f=0.614; the ANOVA on FNAT delayed performance revealed an effect size of η<sup>2</sup>=0.236 corresponding to f=0.556. For experiment 2, a sensitivity analysis for total FNAT immediate performance (1 group and 3 measurements) showed a minimal detectable effect size of f=0.797 with 10 participants. In our paper, the ANOVA on total FNAT immediate performance revealed an effect size of η<sup>2</sup>=0.448 corresponding to f=0.901. The sensitivity analysis for total FNAT delayed performance (1 group and 6 measurements) showed a minimal detectable effect size of f=0.378 with 10 participants. In our paper, the ANOVA on total FNAT delayed performance revealed an effect size of η<sup>2</sup>=0.484 corresponding to f=0.968. Thus, in both experiments the observed effect sizes exceeded the minimal detectable effect size estimated by the sensitivity analysis. We have now added this information to the statistical analysis and results sections of the manuscript (lines 99-100; 127-128; 130-131; 543-545), and we thank the reviewer for the suggestion.

      “The sensitivity analysis showed a minimal detectable effect size of  η<sup>2</sup>=0.215 with 20 participants.”

      “The sensitivity analysis showed a minimal detectable effect size of  η<sup>2</sup>=0.388 with 10 participants.”

      “The sensitivity analysis showed a minimal detectable effect size of η<sup>2</sup>=0.125 with 10 participants.”

      “Since we do not have an a priori effect size for experiment 1 and 2, we performed a sensitivity power analysis to ensure that these experiments were able to detect the minimum effect size with 80% power and alpha level of 0.05.”
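
      The response reports effect sizes both as η<sup>2</sup> and as Cohen's f. A minimal sketch of the standard conversion between the two, f = sqrt(η<sup>2</sup> / (1 - η<sup>2</sup>)) and its inverse, is given below. The function names are hypothetical, and the sketch assumes this is the conversion the authors applied; the input values are the ones quoted in the response.

```python
import math

def eta2_to_f(eta2: float) -> float:
    """Convert (partial) eta squared to Cohen's f."""
    return math.sqrt(eta2 / (1.0 - eta2))

def f_to_eta2(f: float) -> float:
    """Convert Cohen's f back to (partial) eta squared."""
    return f * f / (1.0 + f * f)

# Observed effect sizes quoted in the response (eta squared -> f).
for eta2 in (0.274, 0.236, 0.448, 0.484):
    print(f"eta2={eta2:.3f} -> f={eta2_to_f(eta2):.3f}")

# Minimal detectable effect sizes quoted in the response (f -> eta squared).
for f in (0.524, 0.797, 0.378):
    print(f"f={f:.3f} -> eta2={f_to_eta2(f):.3f}")
```

      Under this assumption the conversions match the figures given in the response to three decimal places, both for the observed effects and for the minimal detectable effects quoted from the manuscript.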

      It seems that the statistical analysis approach differed slightly between studies. In experiment 1, the authors followed up significant effects of their ANOVAs by Bonferroni-adjusted post-hoc tests whereas it seems that in experiment 2, those post-hoc tests were "exploratory", which may suggest those were uncorrected. In experiment 3, the authors use one-tailed t-tests to follow up their ANOVAs. Given some of the reported p-values, these choices suggest that some of the comparisons might have failed to reach significance if properly corrected. This is not a critical issue per se, as the important test in all these cases is the initial ANOVA but non-significant (corrected) post-hoc tests might be another indicator of an underpowered experiment. My assumptions here might be wrong, but even then, I would ask the authors to be more transparent about the reasons for their choices or provide additional justification. Finally, the authors sometimes report exact p-values whereas other times they simply say p < .05. I would ask them to be consistent and recommend using exact p-values for every result where p >= .001.

      Thank you again for the suggestions. Your observations are correct: we used slightly different statistical approaches depending on our hypotheses. Here are the details:

      In experiment 1, we used a repeated-measure ANOVA with one factor “stimulation condition” (iTBS+γtACS; iTBS+sham-tACS; sham-iTBS+sham-tACS). Following the significant effect of this factor we performed post-hoc analysis with Bonferroni correction.

      In experiment 2, we used a repeated-measures ANOVA with two factors, “stimulation condition” and “time”. As expected, we observed a significant effect of condition, confirming the result of experiment 1, but not of time. This means that the neuromodulatory effect was present regardless of the time point. However, to explore whether the effects of stimulation condition were present at each time point, we performed exploratory t-tests with no correction for multiple comparisons, since this was only an exploratory analysis.

      In experiment 3, we used the same approach as experiment 1. However, since we had a specific hypothesis on the direction of the effect already observed in our previous study, i.e. increase in spectral power (Maiella et al., Scientific Reports 2022), our tests were 1-tailed.

      For the p-values, we corrected the manuscript reporting the exact values for every result.
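
      For readers less familiar with the Bonferroni correction mentioned for experiment 1, a minimal sketch follows. The function name is hypothetical, and the raw p-values are invented purely for illustration; they are not results from the study.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw p-values for three pairwise comparisons among three conditions.
raw_p = [0.010, 0.030, 0.200]
adjusted_p = bonferroni(raw_p)

for raw, adj in zip(raw_p, adjusted_p):
    print(f"raw p={raw:.3f} -> Bonferroni-adjusted p={adj:.3f}")
```

      With three stimulation conditions there are three pairwise comparisons, so each raw p-value is multiplied by three before being compared with the 0.05 threshold.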

      While the authors went to great lengths trying to probe the neural changes likely associated with the memory improvement after stimulation, it is impossible from their data to causally relate the findings from experiments 3 and 4 to the behavioral effects in experiments 1 and 2. This is acknowledged by the authors and there are good methodological reasons for why TMS-EEG and fMRI had to be collected in separate experiments, but it is still worth pointing out to readers that this limits inferences about how exactly dual iTBS and γtACS of the precuneus modulate learning and memory.

      Thank you for your comment. We fully agree with your observation, which is why this aspect has been considered in the study's limitations. To address your concern, we added this sentence to the limitations section of the discussion (lines 299-301).

      “Consequently, these findings do not allow precise inferences regarding the specific mechanisms by which dual iTBS and γtACS of the precuneus modulate learning and memory.”

      There were no stimulation-related performance differences in the short-term memory task used in experiments 1 and 2. The authors argue that this demonstrates that the intervention specifically targeted long-term associative memory formation. While this is certainly possible, the STM task was a spatial memory task, whereas the LTM task relied (primarily) on verbal material. It is thus also possible that the stimulation effects were specific to a stimulus domain instead of memory type. In other words, could it be possible that the stimulation might have affected STM performance if the task taxed verbal STM instead? This is of course impossible to know without an additional experiment, but the authors could mention this possibility when discussing their findings regarding the lack of change in the STM task.

      Thank you for your interesting observation. We argue that the intervention primarily targeted long-term associative memory formation, as our findings demonstrated effects only on FNAT. However, as you correctly pointed out, we cannot exclude the possibility that the stimulation may also influence short-term verbal associative memory. We added this aspect when discussing the absence of significant findings in the STM task (lines 205-210).

      “Visual short-term associative memory, measured by STBM performance, was not modulated by any experimental condition. Even if we cannot exclude the possibility that the stimulation could have influenced short-term verbal associative memory, we expected this result since short-term associative memory is known to rely on a distinct frontoparietal network while FNAT, used to investigate long-term associative memory, has already been associated with the neural activity of the PC and the hippocampus (Parra et al., 2014; Rentz et al., 2011).”

      While the authors discuss the potential neural mechanisms by which the combined stimulation conditions might have helped memory formation, the psychological processes are somewhat neglected. For example, do the authors think the stimulation primarily improves the encoding of new information or does it also improve consolidation processes? Interestingly, the beneficial effect of dual iTBS and γtACS on recall performance was very stable across all time points tested in experiments 1 and 2, as was the performance in the other conditions. Do the authors have any explanation as to why there seems to be no further forgetting of information over time in either condition when even at immediate recall, accuracy is below 50%? Further, participants started learning the associations of the FNAT immediately after the stimulation protocol was administered. What would happen if learning started with a delay? In other words, do the authors think there is an ideal time window post-stimulation in which memory formation is enhanced? If so, this might limit the usability of this procedure in real-life applications.

      Thank you for your comment and for raising these important points.

      We hypothesized that co-stimulation would enhance encoding processes. Previous studies have shown that co-stimulation can enhance gamma-band oscillations and increase cortical plasticity (Guerra et al., Brain Stimulation 2018; Maiella et al., Scientific Reports 2022). Given that the precuneus (Brodt et al., Science 2018; Schott et al., Human Brain Mapping 2018), gamma oscillations (Osipova et al., Journal of Neuroscience 2006; Deprés et al., Neurobiology of Aging 2017; Griffiths et al., Trends in Neurosciences 2023), and cortical plasticity (Brodt et al., Science 2018) have all been associated with encoding processes, we decided to apply co-stimulation before the encoding phase, to boost it. We enlarged the introduction to specify the link between neural mechanisms and the psychological process of the encoding (lines 55-60).

      “In particular, the induction of gamma oscillatory activity has been proposed to play an important role in a type of LTP known as spike timing-dependent plasticity, which depends on a precise temporal delay between the firing of a presynaptic and a postsynaptic neuron (Griffiths and Jensen, 2023). Both LTP and gamma oscillations have a strong link with memory processes such as encoding (Bliss and Collingridge, 1993; Griffiths and Jensen, 2023; Rossi et al., 2001), pointing to rTMS and tACS as good candidates for memory enhancement.”

      We applied the co-stimulation immediately before the learning phase to maximize its potential effects. While we observed a significant increase in gamma oscillatory activity lasting up to 20 minutes, we cannot determine whether the behavioral effects we observed would have been the same with a co-stimulation applied 20 minutes before learning. Based on existing literature, a reduction in the efficacy of co-stimulation over time could be expected (Huang et al., Neuron 2005; Thut et al., Brain Topography 2009). However, we hypothesize that multiple stimulation sessions might provide an additional boost, helping to sustain the effects over time (Thut et al., Brain Topography 2009; Koch et al., Neuroimage 2018; Koch et al., Brain 2022).

      Regarding the absence of further forgetting in both stimulation conditions, we think that the clinical and demographic characteristics of the sample (i.e., young and healthy subjects) explain the near absence of forgetting after one week.

      Reviewer #1 (Recommendations for the authors):

      To address the concerns, the authors should:

      (1) Include invasive neuronal recordings (e.g., in rats or monkeys if not possible in humans) demonstrating that the current stimulation protocol leads to direct changes in brain activity.

      We understand the first reviewer's interest in the neurophysiological correlates of the stimulation protocol; however, we are skeptical about this request, as we think it goes beyond the aims of the study. As already mentioned in the response to the reviewer, invasive neurophysiological recordings in humans for this type of study are not feasible due to ethical constraints. At the same time, studies on cadavers or rodents would not fully resolve the question. Indeed, the authors of the study cited by the reviewer (Mihály Vöröslakos et al., Nature Communications, 2018) highlight the impossibility of drawing definitive conclusions about the exact voltage required in the in-vivo human brain, both from rats, due to significant differences between rats and humans, and from human cadavers, due to alterations in electrical conductivity that occur in postmortem tissue. Huang and colleagues addressed the difficulties in obtaining direct evidence of non-invasive brain stimulation (NIBS) effects in a review published in Clinical Neurophysiology in 2017. They conclude that the use of EEG to assess the brain response to TMS has great potential for a less indirect demonstration of plasticity mechanisms induced by NIBS in humans.

      It is exactly to meet the need to investigate the changes in brain activity after the stimulation protocol that we conducted Experiments 3 and 4. These experiments respectively examined the neurophysiological and connectivity changes induced by the stimulation in a non-invasive manner using TMS-EEG and fMRI. The observed changes in brain oscillatory activity (increased gamma oscillatory activity), cortical excitability (enhanced posteromedial parietal cortex reactivity), and brain connectivity (strengthened connections between the precuneus and hippocampi) provided evidence of the effects of our non-invasive brain stimulation protocol, further supporting the behavioral data.

      Additionally, we carefully considered the issue of stimulation distribution and, in response, performed a biophysical modeling analysis and E-field calculation using the parameters employed in our study (see Supplementary Materials).

      Acknowledging the reviewer's point of view, we modified the manuscript accordingly, discussing this aspect both as a technical limitation and as a potential direction for future research (main text, lines 280-289).

      “Although we studied TMS and tACS propagation through the E-field modeling and observed an increase in the precuneus gamma oscillatory activity, excitability and connectivity with the hippocampi, we cannot exclude that our results might reflect the consequences of stimulating more superficial parietal regions other than the precuneus, nor can we report direct evidence of microscopic changes in the brain after the stimulation. Invasive neurophysiological recordings in humans for this type of study are not feasible due to ethical constraints. Studies on cadavers or rodents would not fully resolve our question due to significant differences between them (i.e. rodents do not have an anatomical correspondence while cadavers have alterations in electrical conductivity occurring in postmortem tissue). However, further exploration of this aspect in future studies would help in the understanding of γtACS+iTBS effects.”

      (2) Address all the technical questions about the experimental design.

      We addressed all the technical questions about the experimental design.

      (3) Repeat the experiments with randomized trial order and without a block design.

      The experiments were conducted with randomized trial order and we did not use a block design.

      (4) Add many more faces to the study. It is extremely difficult to draw any conclusion from merely 12 faces. Ideally, there would be lots of other relevant memory experiments where the authors show compelling positive results.

      We understand your perplexity about drawing conclusions from 12 faces; however, this is not the case. As we explained in the response to the reviewer, the task we implemented did not rely on the recall of merely 12 faces. Instead, participants had to correctly learn, associate and recall 12 faces, 12 names and 12 occupations, for a total of 36 items. To improve the clarity of the manuscript, we added a paragraph to make this aspect more explicit (lines 425-430).

      “We considered a correct association when a subject was able to recall all the information for each item (i.e. face, name and occupation), resulting in a total of 36 items to learn and associate. To further investigate the effect on FNAT we also computed a partial recall score accounting for those items where subjects correctly matched only names with faces (FNAT NAME) and only occupations with faces (FNAT OCCUPATION). See supplementary information for score details.”

      The behavioral changes we observed are similar to those typically observed after multiple stimulation sessions (Koch et al., NeuroImage, 2018; Grover et al., Nature Neuroscience, 2022; Benussi et al., Annals of Neurology, 2022). Moreover, memory performance changes are often measured with a limited set of stimuli due to methodological constraints related to memory capacity. For example, the Rey Auditory Verbal Learning task, which requires participants to learn and recall 15 words, is a typical test used to detect memory changes (Koch et al., NeuroImage, 2018; Benussi et al., Brain Stimulation, 2021; Benussi et al., Annals of Neurology, 2022).

      (5) Provide a clear explanation of the apparent randomness of which results are statistically significant or not in Figure 3. But perhaps with many more experiments, a lot more memory evaluations, many more stimuli, and addressing all the other technical concerns, either the results will disappear or there will be a more interpretable pattern of results.

      We provided explanations for all the concerns shown by the reviewer.

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      (1) Figure 4: Why are connectivity values pre-stimulation for the iTBS and sham tACS stimulation condition so much higher than the dual stimulation? We would expect baseline values to be more similar.

      We acknowledge that the pre-stimulation connectivity values for the iTBS and sham tACS conditions appear higher than those for the dual stimulation condition. However, as noted in our statistical analyses, there were no significant differences at baseline between conditions (p-FDR= 0.3514), suggesting that any apparent discrepancy is due to natural variability rather than systematic bias. One potential explanation for these differences is individual variability in baseline connectivity measures, which can fluctuate due to factors such as intrinsic neural dynamics, participant state, or measurement noise. Despite these variations, our statistical approach ensures that any observed post-stimulation effects are not confounded by pre-existing differences.

      (2) Figure 2: How are total association scores significantly different between stimulation conditions, but individual name and occupation associations are not? Further clarification of how the total FNAT score is calculated would be helpful.

      We apologize for any lack of clarity. The total FNAT score reflects the ability to correctly recall all the information associated with a person—specifically, the correct pairing of the face, name, and occupation. Participants received one point for each triplet they accurately recalled. The scores were then converted into percentages, as detailed in the Face-Name Associative Task Construction and Scoring section in the supplementary materials.

      Total FNAT was the primary outcome measure. However, we also analyzed name and occupation recall separately to better understand their partial contributions. Our analysis revealed that the improvement in total FNAT was primarily driven by an increase in name recall rather than occupation recall.

      We acknowledge that this distinction may have caused some confusion. To improve clarity, we revised the manuscript accordingly (lines 97-98; 107-111; 425-430).

      “Dual iTBS+γtACS increased the performances in recalling the association between face, name and occupation (FNAT accuracy) both for the immediate (F<sub>2,38</sub>=7.18; p=0.002; η<sup>2</sup><sub>p</sub>=0.274) and the delayed (F<sub>2,38</sub>=5.86; p =0.006; η<sup>2</sup><sub>p</sub>=0.236) recall performances (Fig. 2, panel A).”

      “The in-depth analysis of the FNAT accuracy investigating the specific contribution of face-name and face-occupation recall revealed that dual iTBS+γtACS increased the performances in the association between face and name (FNAT NAME) delayed recall (F<sub>2,38</sub>=3.46; p=0.042; η<sup>2</sup><sub>p</sub>=0.154; iTBS+γtACS vs. sham-iTBS+sham-tACS: 42.9±21.5 % vs. 33.8±19 %; p=0.048 Bonferroni corrected) (Fig. S4, supplementary information).”

      “We considered a correct association when a subject was able to recall all the information for each item (i.e. face, name and occupation), resulting in a total of 36 items to learn and associate. To further investigate the effect on FNAT we also computed a partial recall score accounting for those items where subjects correctly matched only names with faces (FNAT NAME) and only occupations with faces (FNAT OCCUPATION). See supplementary information for score details.”

      We also moved the data regarding the specific contribution of name and occupation recall to the supplementary information (Fig. S4) and further specified how we computed the score (lines 102-104).

      “The score was computed by deriving an accuracy percentage index dividing by 12 and multiplying by 100 the correct association sum. The partial recall scores were computed in the same way only considering the sum of face-name (NAME) and face-occupation (OCCUPATION) correctly recollected.”
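
      As an illustration only, the scoring rule quoted above can be sketched in a few lines of Python. The function name, the input layout (one record per face with two boolean fields) and the reading of the partial scores as "all correct face-name pairs" and "all correct face-occupation pairs" are assumptions made for this sketch, not details taken from the manuscript.

```python
def fnat_scores(recalls, n_items=12):
    """Return FNAT accuracy percentages for one participant.

    recalls: list of dicts, one per face, each with boolean keys
             "name" and "occupation" (hypothetical layout).
    n_items: number of face-name-occupation triplets (12 in the task).
    """
    if len(recalls) != n_items:
        raise ValueError("expected one record per face")

    # A triplet counts as correct only if both name and occupation are recalled.
    total = sum(1 for r in recalls if r["name"] and r["occupation"])
    # Partial scores: assumed here to count every correct pairing of that type.
    name = sum(1 for r in recalls if r["name"])
    occupation = sum(1 for r in recalls if r["occupation"])

    # Divide the sum of correct associations by 12 and multiply by 100.
    return {
        "total": total / n_items * 100,
        "name": name / n_items * 100,
        "occupation": occupation / n_items * 100,
    }
```

      Under these assumptions, a participant who recalls 6 full triplets, 3 additional names and 1 additional occupation would obtain a total score of 50%, a NAME score of 75% and an OCCUPATION score of about 58%.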

      Reviewer #3 (Recommendations for the authors):

      A very small detail, in the caption for Figure 2A, OCCUPATION is described as being shown on the 'left' but it should be 'right'.

      We corrected this error.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Figure 1: It might be simpler to streamline acronyms for different test cases, e.g., E01contra, E01 ipsi (rather than EO1IPS), E02, and control. Thus, it would be possible to label each of the three schematic panels as E01, E02, control.

      Please describe what the dots in the brain mean and move the V1 label so it does not occlude dots.

      Please make clear that the "track reconstructions" are the bright spheres in the micrographs (there are track-like elements in some micrographs which may be tears or?)

      Thank you. We relabeled the groups as control, EO1contra, EO1ipsi, and EO2. These were changed in all figures and in the document at several places.

      We indicated in the new caption that “Dots schematize ocular dominance columns”.

      We indicated that electrode track penetrations were the “(bright spots at right/posterior)”.

      (2) Figure 2: Should "horizontal" be vertical (line 556) of the caption? When describing the scale bar for firing rate, please explain the meaning of italicized vs regular font.

      Please make the purple lines in Figures I and J easier to see (invisible in my PDF).

      Not quite clear what is significantly different from what when viewing the figure at a glance. Would it be possible to clarify using standard methods?

      Yes, it should say vertical, thank you. We explained the italics (they denote the standard scale bar size if no number is provided).

      We changed the purple lines to yellow in all figures.

      We added comparison bars that help indicate significance.

      (3) Figures 3-5. Please make corrections like those noted above.

      Yes, we applied the previous changes to Figures 3 - 5.

      (4) Minor. Sometimes the authors spell out temporal frequency and sometimes abbreviate it. Perhaps adopt a consistent style.

      Fixed, thanks.

      Reviewer #2 (Public Review):

      (1) The assessment of the tuning properties is based on fits to the data. Presumably, neurons for which the fits were poor were excluded? It would be useful to know what the criteria were, how many neurons were excluded, and whether there was a significant difference between the groups in the numbers of neurons excluded (which could further point to differences between the groups).

      Yes, this is an important omission, thank you for catching it. We now write in methods (line 213): “Inclusion/exclusion: For each stimulus type, we examined the set of all responses to visual stimuli and blanks with an ANOVA test to evaluate the null hypothesis that the mean response to all of these stimuli were the same; cells with a p<0.05 to this visual responsiveness test were included in fits and analyses, and cells with p>0.05 were excluded.”
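
      The inclusion rule can be sketched as follows. This is a minimal, dependency-free illustration rather than the analysis code used in the study: the function names and the data layout (one list of trial responses per stimulus, blanks included as their own group) are hypothetical, and the p-value is approximated with a label-permutation test on the one-way ANOVA F statistic instead of the parametric F distribution, which is an assumption of this sketch.

```python
import random
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of response groups."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(groups, n_perm=2000, seed=0):
    """Approximate p-value: share of label shuffles with F >= observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def is_visually_responsive(groups, alpha=0.05):
    """Include a cell when the null of equal mean responses is rejected."""
    return permutation_p_value(groups) < alpha
```

      A cell whose responses differ clearly across stimuli and blanks would pass this test and enter the fits, whereas a cell with indistinguishable mean responses would be excluded.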

      (2) For the temporal frequency data, low- and high-frequency cut-offs are defined, but then only used for the computation of the bandwidth. Given that the responses to low temporal frequencies change profoundly with premature eye opening, it would be useful to directly compare the low- and high-frequency cut-offs between groups, in addition to the index that is currently used.

      We now provide this data in Figure 3 - figure supplement 1.

      (3) In addition to the tuning functions and firing rates that have been analyzed so far, are there any differences in the temporal profiles of neural responses between the groups (sustained versus transient responses, rates of adaptation, latency)? If the temporal dynamics of the responses are altered significantly, that could be part of an explanation for the altered temporal tuning.

      This is a great topic for future studies. Unfortunately, with drifting gratings, it is difficult to establish these properties, which could be better assessed with standing or square-wave-modulated gratings or other stimuli. We did not run standing gratings in our battery of stimuli for this initial study.

      (4) It would be beneficial for the general interpretation of the results to extend the discussion. First, it would be useful to provide a more detailed discussion of what type of visual information might make it through the closed eyelids (the natural state), in contrast to the structured information available through open eyes. Second, it would be useful to highlight more clearly that these data were collected in peripheral V1 by discussing what might be expected in binocular, more central V1 regions. Third, it would be interesting to discuss the observed changes in firing rates in the context of the development of inhibitory neurons in V1 (which still undergo significant changes through the time period of premature visual experience chosen here).

      Thank you, good ideas. Let’s take these three suggestions in turn.

      First, in the discussion, we added a subsection “Biology of early development in mustelids” that focuses on the developmental conditions of wild and laboratory animals:

      In the wild, mustelids raise their young in nests in the ground, in cavities such as holes in trees or caves, or in areas of dense vegetation (Ruggiero et al. 1994). They may move the young from one nest to another as they grow, but otherwise the young are primarily in the relatively dark nest. It is highly likely that some light penetrates and that information about the 24-hour cycle is available, but the light is likely to be dim and unlikely to provide a basis for high luminance, high contrast stimulation through the closed lids. The animals begin to spend substantial time outside the nest after eye opening.

      The ferret is a domesticated strain of the European polecat. In laboratory settings, ferret jills give birth and keep their kits in a nest box. A laboratory typically maintains a 24-hour cycle with 12 or 14 hours of light, and the light reaching the closed lids must first pass through the cage, the nest box, and the nesting material. Therefore, developing ferrets have an obvious circadian light signal but the light available for image formation is likely dim and of low contrast.

      Although the light that reaches the closed lids in developing ferrets is likely to be relatively dim, and any image-forming signal passing through the closed lids would be highly filtered in luminance, spatial frequency, and contrast, it is important to remember that visual input before natural eye opening (through the closed lids) can drive activity in retina, LGN, and cortex (Huttenlocher 1967, Chapman and Stryker 1993, Krug et al., 2001, Akerman et al., 2002, Akerman et al., 2004). Further, orientation selectivity can be observed through the closed lids (Krug et al., 2001), indicating that some coarse image-forming information does make it through the closed lids.

      Second, we added text speculating about binocular cortex (lines 492 - 500): … our recordings were performed in monocular cortex so that we could be sure of the developmental condition of the eye that drove the classic responses. It is interesting to speculate about what might occur more centrally in binocular visual cortex. Ocular dominance shifts are not induced when one eye is opened prematurely (Issa et al 1999), indicating that ocular dominance plasticity is not engaged at this early stage, but one might imagine that the impacts on temporal frequency and spontaneous firing rates would still be present.

      Third, on inhibition, we added a paragraph (lines 502 - 509):

      We introduced premature patterned vision at a time when cortical inhibition is undergoing substantial changes. GABAergic signaling has already undergone its switch (Ben-Ari, 2002) from providing primarily depolarizing input to hyperpolarizing input by P21-23 (Mulholland et al., 2021). In the days prior to eye opening, inhibitory cells exhibit activity that is closely associated with the emerging functional modules that will reflect orientation columns (Mulholland et al., 2021), but do not yet exhibit selectivity to orientation, in contrast to excitatory neurons, which do exhibit selectivity to orientation at that time (Chang and Fitzpatrick, 2022).

      (5) In the methods section, the statement 'actively kept in nesting box' is unclear. Presumably this means that the jill prevents the kits from leaving the nesting box? It also would be worth at least mentioning in this context that there obviously are still visual events in the nesting box too.

      Thanks. We improved this description (lines 118 - 121): Ferret kits in laboratory housing receive limited visual stimulation through their closed lids, as the mother actively keeps the kits in their relatively dark nest. In order to ensure that animals with early-opened eyes actually had patterned visual experience (and animals with closed lids had the same stimulation filtered through the lids), animals were brought to the lab for 2 hours a day for 4 consecutive days beginning at P25.

      (6) The stimulus presentation could be more clearly described. Is every stimulus presented in an individual trial (surrounded by periods with a blank screen), or are all stimuli shown as a continuous sequence? The description of the parameter screening is also potentially confusing ('orientation was co-varied with stimuli consisting of drifting gratings at different spatial frequencies' sounds as if there are separate stimuli for orientation; might be better to say something like 'in the first set, orientation, spatial frequency, ... were covaried...')

      Yes, thank you, we fixed this (lines 184 - 201). We deleted the text indicated and added a sentence “Each individual grating stimulus was full screen and had a single set of parameters (direction, spatial frequency, temporal frequency), and was separated from the other stimuli by a gray screen interstimulus interval.”. We also deleted a repetition of 100% contrast in the description of the second set.

      (7) Description of low-pass index is unclear. What is the 'largest temporal frequency response observed'? The maximum response or the response to the largest temporal frequency tested?

      Thanks. We added a paragraph at line 236:

      We defined a low pass index as the ratio of the response to the lowest temporal frequency tested (in this case 0.5 Hz) to the maximum response obtained to the set of temporal frequencies shown. LPI = R(TF=0.5 Hz)/max(R(TF=0.5Hz), R(TF=1Hz), … R(TF=32Hz)). If a cell exhibited the highest firing for a temporal frequency of 0.5 Hz, then it would have a low pass index of 1. If it exhibited a similar firing rate in response to a temporal frequency of 0.5 Hz even if the preferred temporal frequency were higher, then the low pass index would still be near 1. If the cell responded poorly at a temporal frequency of 0.5 Hz, then it would have a low pass index near 0.
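
      A minimal sketch of this definition is given below. The function name and the dictionary layout (temporal frequency in Hz mapped to the mean response) are assumptions made for illustration; only the 0.5, 1 and 32 Hz values are stated in the text, and any intermediate frequencies used in an example are placeholders.

```python
def low_pass_index(responses):
    """Low pass index for one cell.

    responses: dict mapping temporal frequency (Hz) to mean response
               (hypothetical layout for this sketch).
    Returns R(lowest TF) / max over all tested TFs of R(TF).
    """
    if not responses:
        raise ValueError("no responses provided")
    lowest_tf = min(responses)        # 0.5 Hz in the stimulus set described
    peak = max(responses.values())    # maximum response across all TFs
    if peak <= 0:
        raise ValueError("maximum response must be positive")
    return responses[lowest_tf] / peak
```

      With this sketch, a cell that fires most strongly at 0.5 Hz yields an index of 1, a cell that prefers a higher frequency but responds almost as well at 0.5 Hz yields an index near 1, and a cell that responds poorly at 0.5 Hz yields an index near 0.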

      (8) The discussion should also cite the results of strobe-reared cats by Pasternak et al (1981 and 1985).

      Thank you for pointing out the omission. We now write (lines 430-435): Cats raised in a strobe-light environment (mostly after eye opening) exhibited strong changes in subsequent direction selectivity (Kennedy and Orban 1983; Humphrey and Saul 1998) and behavioral sensitivity to motion (Pasternak et al., 1981; Pasternak et al., 1985) that partially recovers with motion detection training. However, temporal frequency tuning of these animals has not been reported in detail. Pasternak et al (1981) reported that strobe-reared ferrets exhibited greater difficulty in distinguishing slow moving stimuli from static stimuli compared to controls, an ability that slightly improved with practice, suggesting possible temporal frequency deficits.

      (9) Finally, it would be useful to include a mention of the early development of MT in marmosets in the discussion of impacts of prematurity on motion vision (Bourne & Rosa 2006).

      Yes, thank you. We cited Bourne & Rosa and also Lempel and Nielsen (for ferret PSS). (Lines 492-501):

      Several other basic mechanistic questions remain unanswered. It is unclear where in the visual circuit cascade these deficits first arise. Does the lateral geniculate nucleus or retina exhibit altered temporal frequency tuning? Is the influence of the patterned visual stimulation instructive, so that if one provided premature stimulation with only certain temporal frequencies, one would see selectivity for those temporal frequencies, or would tuning always be broad? Other questions remain concerning the top-down influence on V1 from “higher” motion areas such as MT (monkeys) or PSS (ferret); MT exhibits mature neural markers earlier than V1 (Bourne and Rosa, 2006), and suppression of PSS impacts motion selectivity in V1 (Lempel and Nielsen, 2021). Future studies will be needed to address these questions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Phytopathogens, including fungal pathogens such as F. graminearum, remain a major threat to agriculture and food security. Several agriculturally relevant fungicides, including the potent Quinofumelin, have been discovered to date, yet the mechanisms of their action and specific targets within the cell remain unclear. This paper sets out to contribute to addressing these outstanding questions.

      We appreciate the reviewer's accurate summary of our manuscript.

      Strengths:

      The paper is generally well-written and provides convincing data to support their claims for the impact of Quinofumelin on fungal growth, the target of the drug, and the potential mechanism. Critically the authors identify an important pyrimidine pathway dihydroorotate dehydrogenase (DHODH) gene FgDHODHII in the pathway or mechanism of the drug from the prominent plant pathogen F. graminearum, confirming it as the target for Quinofumelin. The evidence is supported by transcriptomic, metabolomic as well as MST, SPR, molecular docking/structural biology analyses.

      We appreciate the reviewer's recognition of the strengths of our manuscript.

      Weaknesses:

      Whilst the study adds to our knowledge about this drug, it is worth stating that previous reports (although in different organisms) by Higashimura et al., 2022 (https://pmc.ncbi.nlm.nih.gov/articles/PMC9716045/) had already identified DHODH as the target for Quinofumelin, and hence this knowledge is not new. The authors may therefore want to tone down the claim that they discovered this mechanism and give sufficient credit to the previous authors' work at the start of the write-up, in the introduction section, rather than in passing as they did with reference 25. Other specific recommendations to improve the text are provided in the recommendations for authors section below.

      We appreciate the reviewer's suggestion. In the revised manuscript, we have incorporated the reference in the introduction section and expanded the discussion of previous work on quinofumelin by Higashimura et al., 2022 in the discussion section to more effectively contextualize their contributions. Moreover, we have made revisions and provided responses in accordance with the recommendations.

      Reviewer #2 (Public review):

      Summary:

      In the current study, the authors aim to identify the mode of action/molecular mechanism of a characterized fungicide, quinofumelin, and its biological impact on transcriptomics and metabolomics in Fusarium graminearum and other Fusarium species. Two sets of data were generated comparing the quinofumelin and no-treatment groups, and differentially abundant transcripts and metabolites were identified. The authors further focused on the uridine/uracil biosynthesis pathway, considering the significant up- and down-regulation observed in final metabolites and some of the genes in the pathways. Using a deletion mutant of one of the genes and in vitro biochemical assays, the authors concluded that quinofumelin binds to the dihydroorotate dehydrogenase.

      We appreciate the reviewer's accurate summary of our manuscript.

      Strengths:

      Omics datasets were leveraged to understand the physiological impact of quinofumelin, showing the intracellular impact of the fungicide. The characterization of FgDHODHII deletion strains with supplemented metabolites clearly showed the impact of the enzyme on fungal growth.

      We appreciate the reviewer's recognition of the strengths of our manuscript.

      Weaknesses:

      Some interpretation of results is not accurate and some experiments lack controls. The comparison between quinofumelin-treated deletion strains, in the presence of different metabolites didn't suggest the fungicide is FgDHODHII specific. A wild type is required in this experiment.

      Potential Impact: Confirming the target of quinofumelin may help understand its resistance mechanism, and further development of other inhibitory molecules against the target.

      The manuscript would benefit more in explaining the study rationale if more background on previous characterization of this fungicide on Fusarium is given.

      We appreciate the reviewer's suggestion. Without quinofumelin treatment, mycelial growth remains normal and does not require restoration. In the presence of quinofumelin treatment, the supplementation of downstream metabolites in the de novo pyrimidine biosynthesis pathway can restore mycelial growth that is inhibited by quinofumelin. The wild-type control group is illustrated in Figure 4. Figure 5b depicts the phenotypes of the deletion mutants. With respect to the relationship among quinofumelin, FgDHODHII, and other metabolites, quinofumelin specifically targets the key enzyme FgDHODHII in the de novo pyrimidine biosynthesis pathway, disrupting the conversion of dihydroorotate to orotate, which consequently inhibits the synthesis of downstream metabolites including uracil. In our previous study, quinofumelin not only exhibited excellent antifungal activity against the mycelial growth and spore germination of F. graminearum, but also inhibited the biosynthesis of deoxynivalenol (DON). We have added this part to the introduction section.

      Reviewer #3 (Public review):

      Summary:

      The manuscript shows the mechanism of action of quinofumelin, a novel fungicide, against the fungus Fusarium graminearum. Through omics analysis, phenotypic analysis, and in silico approaches, the role of quinofumelin in targeting DHODH is uncovered.

      We appreciate the reviewer's accurate summary of our manuscript.

      Strengths:

      The phenotypic analysis and mutant generation are nice data and add to the role of metabolites in bypassing pyrimidine biosynthesis.

      We appreciate the reviewer's recognition of the strengths of our manuscript.

      Weaknesses:

      The role of DHODH in this class of fungicides has been known and this data does not add any further significance to the field. The work of Higashimura et al is not appreciated well enough as they already showed the role of quinofumelin upon DHODH II.

      There is no mention of the other fungicide within this class ipflufenoquin, as there is ample data on this molecule.

      We sincerely appreciate the reviewer's insightful comment regarding the work of Higashimura et al. We agree that their investigation into the role of quinofumelin in DHODH II inhibition provides critical foundational insights for this field. In the revised manuscript, we have incorporated the reference in the introduction section and expanded the discussion of their work in the discussion section to more effectively contextualize their contributions. Information on the mechanism of action of ipflufenoquin against filamentous fungi has been added to the discussion section.

      Reviewer #1 (Recommendations for the authors):

      (1) Given that the DHODH gene had been identified as a target earlier, could the authors perform blast experiments with this gene instead and let us know the percentage similarity between the FgDHODHII gene and the Pyricularia oryzae class II DHODH gene in the report by Higashimura et al., 2022.

      A BLAST search revealed that the percentage similarity between the FgDHODHII gene and the class II DHODH gene of P. oryzae was 55.41%. We have added the description ‘Additionally, the amino acid sequence of the FgDHODHII exhibits 55.41% similarity to that of DHODHII from Pyricularia oryzae, as previously reported (Higashimura et al., 2022)’ to the Results section.
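      To make the kind of figure quoted above concrete, the sketch below computes percent identity over a pairwise alignment. It is an illustration only: the function name `percent_identity`, the toy sequences, and the choice of alignment length as the denominator are assumptions, and the response does not state whether the 55.41% refers to BLAST identities or positives.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns where both sequences carry the same residue.

    Both inputs must already be aligned (equal length, gaps marked with `gap`).
    The denominator is the full alignment length, which is one common
    convention; other tools divide by the shorter sequence length instead.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


# Toy alignment (hypothetical residues, not real DHODH sequences).
print(percent_identity("MKT-LVA", "MKSALVA"))
```

      Reporting the denominator alongside the percentage avoids ambiguity when values from different tools are compared.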

      (2) Abstract:

      The authors started abbreviating new terms e.g. DEG, DMP, etc but then all of a sudden stopped and introduced UMP with no full meaning of the abbreviation. Please give the full meaning of all abbreviations in the text, UMP, STC, RM, etc.

      We have provided the full meaning for all abbreviations as requested.

      (3) Introduction section:

      The introduction talks very little about the work of other groups on quinofumelin. Perhaps add this information in and reference them including the work of Higashimura et al., 2022 which has done quite significant work on this topic but is not even mentioned in the background

      We have added the work of other groups on quinofumelin to the introduction section.

      (4) General statements:

      Please show a model of the pyrimidine pathway that quinofumelin attacks to make it easier for the reader to understand the context. They could just copy this from KEGG

      We have added the model (Fig. 7).

      (5) Line 186:

      The authors did a great job of demonstrating interactions with the Quinofumelin and went to lengths to perform MST, SPR, molecular docking, and structural biology analyses yet in the end provide no details about the specific amino acid residues involved in the interaction. I would suggest that site-directed mutagenesis studies be performed on FgDHODHII to identify specific amino acid residues that interact with Quinofumelin and show that their disruption weakens Quinofumelin interaction with FgDHODHII.

      Thank you for this insightful suggestion. We fully agree with the importance of elucidating the interaction mechanism. At present, we are conducting site-directed mutagenesis studies based on interaction sites from docking results and the mutation sites of FgDHODHII from the resistant mutants; however, due to the limitations in the accuracy of existing predictive models, this work remains ongoing. Additionally, we are undertaking co-crystallization experiments of FgDHODHII with quinofumelin to directly and precisely reveal their interaction pattern.

      (6) Line 76:

      What is the reference or evidence for the statement 'In addition, quinofumelin exhibits no cross-resistance to currently extensively used fungicides, indicating its unique action target against phytopathogenic fungi'?

      If two fungicides share the same mechanism of action, they will exhibit cross resistance. Previous studies have demonstrated that quinofumelin retains effective antifungal activity against fungal strains resistant to commercial fungicides, indicating that quinofumelin does not exhibit cross-resistance with other commercially available fungicides and possesses a novel mechanism of action. Additionally, we have added the relevant inference.

      (7) Line 80-82:

      Again, considering the work of previous authors, this target is not newly discovered. Please consider toning down this statement 'This newly discovered selective target for antimicrobial agents provides a valuable resource for the design and development of targeted pesticides.'

      We have rewritten the description of this sentence.

      (8) Line 138: If the authors have identified DHODH in experimental groups (I assume in F. graminearum), what was the exact locus tag or gene name in F. graminearum, and why not just continue with this gene you identified or what is the point of doing a blast again to find the gene if the DHODH gene if it already came up in your transcriptomic or metabolic studies? This unfortunately doesn't make sense but could be explained better.

      The information of FgDHODHII (gene ID: FGSG_09678) has been added. We have revised this part.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 40:

      Please add a reference.

      We have added the reference.

      (2) Line 47:

      Please add a reference.

      We have added the reference.

      (3) Line 50:

      The lack of target diversity in existing fungicides doesn't necessarily serve as a reason for discovering new targets being more challenging than identifying new fungicides within existing categories, please consider adjusting the argument here. Instead, the authors can consider reasons for the lack of new targets in the field.

      We have revised the description.

      (4) Line 63:

      Please cite your source with the new technology.

      We have added the reference.

      (5) Line 68:

      What are you referring to for "targeted medicine", do you have a reference?

      We have revised the description and the reference.

      (6) Line 74:

      One of the papers referred to "quinoxyfen", what are the similarities and differences between the two? Please elaborate for the readership.

      Quinoxyfen, like quinofumelin, contains a quinoline ring structure. It inhibits mycelial growth by disrupting the MAP kinase signaling pathway in fungi (https://www.frac.info). In addition, quinoxyfen still exhibits excellent antifungal activity against the quinofumelin-resistant mutants (findings from our group), indicating that the mechanisms of action of quinofumelin and quinoxyfen differ.

      (7) Line 84:

      Please introduce why RNA-Seq was designed in the study first. What were the groups compared? How was the experiment set up? Without this background, it is hard to know why and how you did the experiment.

      According to your suggestions, we have added the description in the Results section. In addition, the experimental process is described in the Materials and methods section as follows: A total of 20 mL of YEPD medium containing 1 mL of conidia suspension (1×10<sup>5</sup> conidia/mL) was incubated with shaking (175 rpm) at 25°C. After 24 h, quinofumelin was added to the medium at a concentration of 1 μg/mL, while an equal amount of dimethyl sulfoxide was added as the control (CK). The incubation continued for another 48 h, followed by filtration and collection of hyphae. Gene expression was then quantified, and differences between groups were analyzed with DESeq2.

      (8) Figures:

      The figure labeling is missing (Figures 1,2,3 etc). Please re-order your figure to match the text

      The figures have been inserted.

      (9) Line. 97:

      "Volcano plot" is a common plot to visualize DEGs, you can directly refer to the name.

      We have revised the description.

      (10) Figure 1d, 1e:

      Can you separate down- and up-regulated genes here? Does the count refer to gene number?

      The expression information for down- and up-regulated genes is presented in Figure 1a and 1b. However, these bubble plots do not distinguish down- and up-regulated genes. Instead, they only display the significant enrichment of differentially expressed genes in specific metabolic pathways. To more clearly represent the data, we have added the detailed counts of down- and up-regulated genes for each metabolic pathway in Supplementary Table S1 and S2. Here, the term "count" refers to differentially expressed genes that fall within a certain pathway.

      (11) Line 111:

      Again, no reasoning or description of why and how the experiment was done here.

      Based on the results of KEGG enrichment analysis, DEMs are associated with pathways such as thiamine metabolism, tryptophan metabolism, nitrogen metabolism, amino acid sugar and nucleotide sugar metabolism, pantothenic acid and CoA biosynthesis, and nucleotide sugar production compounds synthesis. To specifically investigate the metabolic pathways involved in the mechanism of action of quinofumelin, we performed further metabolomic experiments. We have added this description according to the reviewer’s suggestions.

      (12) Figure 2a:

      It seems many more metabolites were reduced than increased. Is this expected? Due to the antifungal activity of this compound, how sick is the fungus upon treatment? A physiological study on F. graminearum (in a dose-dependent manner) should be done prior to the omics study. Why do you think there's a stark difference between positive and negative modes in terms of number of metabolites down- and up-regulated?

      Quinofumelin demonstrates exceptional antifungal activity against Fusarium graminearum. The results indicate that the number of reduced metabolites significantly exceeds the number of increased metabolites upon quinofumelin treatment. Mycelial growth is markedly inhibited under quinofumelin exposure. Prior to conducting omics studies, we performed a series of physiological and biochemical experiments (refer to Qian Xiu's dissertation https://paper.njau.edu.cn/openfile?dbid=72&objid=50_49_57_56_49_49&flag=free). Upon quinofumelin treatment, the number of down-regulated metabolites notably surpasses that of up-regulated metabolites compared to the control group. Based on the findings from the down-regulated metabolites, we conducted experiments by exogenously supplementing these metabolites under quinofumelin treatment to investigate whether mycelial growth could be restored. The results revealed that only the exogenous addition of uracil can restore mycelial growth impaired by quinofumelin.

      Quinofumelin exhibits an excellent antifungal activity against F. graminearum. At a concentration of 1 μg/mL, quinofumelin inhibits mycelial growth by up to 90%. This inhibitory effect indicates that life activities of F. graminearum are significantly disrupted by quinofumelin. Consequently, there is a marked difference in down- and up-regulated metabolites between quinofumelin-treated group and untreated control group. The detailed results were presented in Figures 1 and 2.

      (13) Figure 2e:

      This is a good analysis. To help represent the data more clearly, the authors can consider representing the expression using fold change with a p-value for each gene.

      To more clearly represent the data, we have incorporated the information on significant differences in metabolites in the de novo pyrimidine biosynthesis pathway, as affected by quinofumelin, in accordance with the reviewer’s suggestions.

      (14) Line 142:

      Please indicate fold change and p-value for statistical significance. Did you validate this by RT-qPCR?

      We validated the expression level of the DHODH gene under quinofumelin treatment using RT-qPCR. The results indicated that, upon treatment with the EC50 and EC90 concentrations of quinofumelin, the expression of the DHODH gene was significantly reduced by 11.91% and 33.77%, respectively (P<0.05). The corresponding results have been shown in Figure S4.

      (15) Line 145:

      It looks like uracil is the only metabolite differentially abundant in the samples - how did you conclude this whole pathway was impacted by the treatment?

      The experiments involving the exogenous supplementation of uracil revealed that the addition of uracil could restore mycelial growth inhibited by quinofumelin. Consequently, we infer that quinofumelin disrupts the de novo pyrimidine biosynthesis pathway. In addition, as uracil is the end product of the de novo pyrimidine biosynthesis pathway, the disruption of this pathway results in a reduction in uracil levels.

      (16) Figure 3:

      What sequence was used as the root of the tree? Why were the species chosen? Since the BLAST query was Homo sapiens sequence, would it be good to use that as the root?

      FgDHODHII sequence was used as the root of the tree. These selected fungal species represent significant plant-pathogenic fungi in agriculture production. According to your suggestion, we have removed the BLAST query of Homo sapiens in Figure 3.

      (17) Figure 4:

      How were the concentrations used to test chosen?

      Prior to this experiment, we carried out concentration-dependent exogenous supplementation experiments. The results indicated that 50 μg/mL of uracil can fully restore mycelial growth inhibited by quinofumelin. Consequently, we chose 50 μg/mL as the testing concentration.

      (18) Line 164:

      Why do you hypothesize supplementing dihydroorotate would restore resistance? The metabolite seemed accumulated in the treatment condition, whereas downstream metabolites were comparable or even depleted. The DHODH gene expression was suppressed. Would accumulation of dihydroorotate be associated with growth inhibition by quinofumelin? Please include the hypothesis and rationale for the experimental setup.

      DHODH regulates the conversion of dihydroorotate to orotate in the de novo pyrimidine biosynthesis pathway. The inhibition of DHODH by quinofumelin results in the accumulation of dihydroorotate and the depletion of the downstream metabolites, including UMP, uridine and uracil. Consequently, downstream metabolites were considered as positive controls, while the upstream metabolite dihydroorotate served as a negative control. This design further demonstrates that DHODH is the target of quinofumelin in F. graminearum. In addition, the accumulation of dihydroorotate is not associated with growth inhibition by quinofumelin; rather, the depletion of downstream metabolites in the de novo pyrimidine biosynthesis pathway is closely associated with growth inhibition by quinofumelin.

      (19) Line 168:

      I'm not sure if this conclusion is valid from your results in Figure 4 showing which metabolites restore growth.

      To minimize the potential influence of strain-specific effects, five strains were tested in the experiments shown in Figure 4. For each strain, the first row (first column) corresponds to the control condition, while the second row (first column) represents treatment with 1 μg/mL of quinofumelin, which completely inhibits mycelial growth. The second row (second column) for each strain shows that supplementation with 50 μg/mL of dihydroorotate fails to restore mycelial growth inhibited by quinofumelin. In contrast, the second row (third, fourth, and fifth columns) for each strain demonstrates that supplementation with 50 μg/mL of UMP, uridine, and uracil, respectively, can effectively restore mycelial growth inhibited by quinofumelin.

      (20) Figure 5a:

      The fact you saw growth of the deletion mutant means it's not lethal. However, the growth was severely inhibited.

      Our experimental results indicate that the growth of the deletion mutant is lethal. The mycelial growth observed originates from mycelial plugs that were not exposed to quinofumelin, rather than from the plates amended with quinofumelin.

      (21) Figure 5b:

      Would you expect different restoration of growth in the presence of quinofumelin vs. no treatment? The wild type control is missing here. Any conclusions about the relationship between quinofumelin, FgDHODHII, and other metabolites in the pathway?

      Without quinofumelin treatment, mycelial growth remains normal and does not require restoration. In the presence of quinofumelin, supplementation with downstream metabolites of the de novo pyrimidine biosynthesis pathway can restore the mycelial growth that quinofumelin inhibits. The wild-type control group is illustrated in Figure 4. Figure 5b depicts the phenotypes of the deletion mutants. With respect to the relationship among quinofumelin, FgDHODHII, and other metabolites, quinofumelin specifically targets the key enzyme FgDHODHII in the de novo pyrimidine biosynthesis pathway, disrupting the conversion of dihydroorotate to orotate, which consequently inhibits the synthesis of downstream metabolites including uracil.

      (22) Figure 6b:

      Lacking positive and negative controls (known binder and non-binder). What does the Kd (in comparison to other interactions) indicate in terms of binding strength?

      We tested the antifungal activities of publicly reported DHODH inhibitors (such as leflunomide and teriflunomide) against F. graminearum. The results showed that these inhibitors exhibited no significant inhibitory effects against the strain PH-1. Therefore, we lacked an effective chemical for use as a positive control in subsequent experiments. The Biacore experiments offer detailed insights into the molecular interactions between quinofumelin and DHODHII. As shown in Figure 6b, the left panel illustrates the time-dependent kinetic curve of quinofumelin binding to DHODHII. Within the first 60 s after quinofumelin was introduced onto the DHODHII surface, it bound to the immobilized DHODHII on the chip surface, with the response value increasing proportionally to the quinofumelin concentration. Following cessation of the injection at 60 s, quinofumelin spontaneously dissociated from the DHODHII surface, leading to a corresponding decrease in the response value. The fitted curve in the right panel indicates that the affinity constant KD of quinofumelin for DHODHII is 6.606×10<sup>-6</sup> M, which falls within the typical range of KD values (10<sup>-3</sup> to 10<sup>-6</sup> M) for protein-small molecule interactions. A lower KD value indicates a stronger affinity; thus, quinofumelin exhibits strong binding affinity towards DHODHII.
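      As a rough illustration of what the reported KD implies, the sketch below computes equilibrium occupancy under a simple 1:1 binding model. The model choice, the function name `fraction_bound`, and the example concentrations are assumptions made for illustration; only the KD value is taken from the response above.

```python
def fraction_bound(ligand_conc_m: float, kd_m: float) -> float:
    """Equilibrium fraction of protein occupied under a 1:1 binding model.

    theta = [L] / ([L] + KD), with both quantities in mol/L.
    Assumes free ligand is approximately equal to total ligand.
    """
    if ligand_conc_m < 0:
        raise ValueError("ligand concentration must be non-negative")
    if kd_m <= 0:
        raise ValueError("KD must be positive")
    return ligand_conc_m / (ligand_conc_m + kd_m)


KD_REPORTED_M = 6.606e-6  # KD quoted in the author response (mol/L)

# Hypothetical ligand concentrations spanning the KD, for illustration only.
for conc_m in (1e-7, 1e-6, KD_REPORTED_M, 1e-5, 1e-4):
    theta = fraction_bound(conc_m, KD_REPORTED_M)
    print(f"[L] = {conc_m:.3e} M -> fraction bound = {theta:.3f}")
```

      By construction, occupancy is one half when the ligand concentration equals KD, which is why a lower KD corresponds to tighter binding.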

      Reviewer #3 (Recommendations for the authors):

      The authors should add information about the other molecule within this class, ipflufenoquin, and what is known about it. There are already published data on its mode of action on DHODH and the role of pyrimidine biosynthesis.

      We have added information on the mechanism of action of ipflufenoquin against filamentous fungi to the discussion section.

      The work of Higashimura et al is not appreciated well enough as they already showed the role of quinofumelin upon DHODH II.

      We sincerely appreciate the reviewer's insightful comment regarding the work of Higashimura et al. We agree that their investigation into the role of quinofumelin in DHODH II inhibition provides critical foundational insights for this field. In the revised manuscript, we have incorporated the reference in the introduction section and expanded the discussion of their work in the discussion section to more effectively contextualize their contributions.

      It is unclear how the protein model was established and this should be included. What species is the molecule from and how was it obtained? How are they different from Fusarium?

      The three-dimensional structural model of F. graminearum DHODHII protein, as predicted by AlphaFold, was obtained from the UniProt database. Additionally, a detailed description along with appropriate citations has been incorporated in the ‘Manuscript’ file.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      We thank the reviewer for the positive feedback on the work. The reviewer has raised two weaknesses and in the following we discuss how those can be addressed.  

      Weaknesses:

      The impact of the article is limited by using a network with discrete time-steps, and only a small number of time steps from stimulus to reward. They assume that each time step is on the order of hundreds of ms. They justify this by pointing to some slow intrinsic mechanisms, but they do not implement these slow mechanisms in a network with short time steps; instead they assume without demonstration that these could work as suggested. This is a reasonable first approximation, but its validity should be explicitly tested.

      Our goal here was to give a proof of concept that online random feedback is sufficient to train an RNN to estimate value. Indeed, it is important to show that the idea works in a model where the slow mechanisms are explicitly implemented. However, this is a non-trivial task, and we leave it for future work.

      As the delay between cue and reward increases the performance decreases. This is not surprising given the proposed mechanism, but is still a limitation, especially given that we do not really know what is a reasonable value of a single time step.

      In reply to this comment and the other reviewer's related comment, we have conducted two sets of additional simulations, one examining the incorporation of eligibility traces, and the other considering (though not mechanistically implementing) behavioral time-scale synaptic plasticity (BTSP). We have added their results to the revised manuscript as an Appendix. We think that the results address this point to some extent, although how longer cue-reward delays can be learnt by elaborating the model remains an open issue for future work.

      Reviewer #2 (Public Review):

      We thank the reviewer for the positive feedback on the work. The reviewer gave comments on our revisions, and here we discuss how those can be addressed.

      Comments on revisions: I would still want to see how well the network learns tasks with longer time delays (on the order of 100 or even 1000 timesteps). Previous work has shown that random feedback struggles to encode longer timescales (see Murray 2019, Figure 2), so I would be interested to see how that translates to the RL context in your model.

      We would like to note that in Murray 2019 the random feedback per se appeared not to be primarily responsible for the difficulty in encoding longer timescales. In Figure 2d (Murray 2019), the author compared his RFLO (random feedback local online) and BPTT with two intermediate algorithms, each of which incorporated one of the two approximations made in RFLO: i) random feedback instead of symmetric feedback, and ii) omission of non-local effects (i.e., dependence of the derivative of the loss with respect to a given weight on the other weights). The performance difference between RFLO and BPTT was actually mostly explained by ii), as the author mentioned "The results show that the local approximation is essentially fully responsible for the performance difference between RFLO and BPTT, while there is no significant loss in performance due to the random feedback alone. (Line 6-8, page 7 of Murray, 2019, eLife)".

      Meanwhile, regarding the difference in the performance of the model with random feedback vs the model with symmetric feedback in our settings, actually it appeared (already) in the case with 6 time-steps or less (the biologically constrained model with random feedback performed worse: Fig. 6J, left).

      In practice, our model, either with random or symmetric feedback, would not be able to learn the cases with very long delays. This is indeed a limitation of our model. However, our model is critically different from the model of Murray 2019 in that we use RL rather than supervised learning and we use a scalar bootstrapped (TD) reward-prediction-error rather than the true output error. We would think that these differences may be major reasons for the limited learning ability of our model.
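      The distinction drawn above can be sketched in a few lines. This is not the authors' model: the function names, the linear value readout, the feedback vector, and all numerical values are hypothetical, chosen only to show how a scalar bootstrapped TD reward-prediction error differs from a true output error, and how random feedback differs from symmetric feedback.

```python
import numpy as np


def td_error(reward: float, value_now: float, value_next: float,
             gamma: float = 0.9) -> float:
    """Scalar bootstrapped TD error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_now


def credit_signals(delta: float, readout_weights: np.ndarray,
                   random_feedback: np.ndarray):
    """Per-unit teaching signals sent back to the recurrent units.

    Symmetric feedback reuses the readout weights; random feedback
    replaces them with a fixed random vector of the same shape.
    """
    return delta * readout_weights, delta * random_feedback


rng = np.random.default_rng(0)
w = np.array([0.5, -0.2, 0.1])      # hypothetical value readout weights
b = rng.uniform(0.0, 1.0, size=3)   # hypothetical fixed random feedback

x_now = np.array([1.0, 0.0, 0.5])   # hypothetical unit activities at time t
x_next = np.array([0.0, 1.0, 0.5])  # hypothetical unit activities at time t+1

delta = td_error(reward=1.0, value_now=float(w @ x_now),
                 value_next=float(w @ x_next))
symmetric, random_fb = credit_signals(delta, w, b)
print("TD error:", delta)
print("symmetric feedback signal:", symmetric)
print("random feedback signal:", random_fb)
```

      Because the TD target itself depends on the current value estimate, the error is only an approximation of the true return error, which is one plausible reason learning over long delays is harder than in the supervised setting.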

      Regarding the feasibility of the model when tasks involve longer time delays: Indeed this is a problem and the other reviewers have also raised the same point. Our model can be extended by incorporating either a kind of eligibility trace (similar to those contained in RFLO and e-prop) or behavioral time-scale synaptic plasticity (BTSP), and we have added the results of simulations incorporating each to the revised manuscript as an Appendix. However, how longer cue-reward delays can be learnt by elaborating the model remains an open issue for future work.

      Reviewer #3 (Public Review):

      Comments on revisions: Thank you for addressing all my comments in your reply.

      We are happy to learn that all concerns raised by the reviewer in the previous round were addressed adequately. We agree with the reviewer that there are several ways the work can be improved.

      We hope to take up the various points that the reviewers raised as weaknesses in future work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript provides an initial characterization of three new missense variants of the PLCG1 gene associated with diverse disease phenotypes, utilizing a Drosophila model to investigate their molecular effects in vivo. Through the meticulous creation of genetic tools, the study assesses the small wing (sl) phenotype - the fly's ortholog of PLCG1 - across an array of phenotypes from longevity to behavior in both sl null mutants and variants. The findings indicate that the Drosophila PLCG1 ortholog displays aberrant functions. Notably, it is demonstrated that overexpression of both human and Drosophila PLCG1 variants in fly tissue leads to toxicity, underscoring their pathogenic potential in vivo.

      Strengths:

      The research effectively highlights the physiological significance of sl in Drosophila. In addition, the study establishes the in vivo toxicity of disease-associated variants of both human PLCG1 and Drosophila sl.

      Weaknesses:

      The study's limitations include the human PLCG1 transgene's inability to compensate for the Drosophila sl null mutant phenotype, suggesting potential functional divergence between the species. This discrepancy signals the need for additional exploration into the mechanistic nuances of PLCG1 variant pathogenesis, especially regarding their gain-of-function effects in vivo.

      Overall:

      The study offers compelling evidence for the pathogenicity of newly discovered disease-related PLCG1 variants, manifesting as toxicity in a Drosophila in vivo model, which substantiates the main claim by the authors. Nevertheless, a deeper inquiry into the specific in vivo mechanisms driving the toxicity caused by these variants in Drosophila could significantly enhance the study's impact.

      Reviewer #2 (Public Review):

      The manuscript by Ma et al. reports the identification of three unrelated people who are heterozygous for de novo missense variants in PLCG1, which encodes phospholipase C-gamma 1, a key signaling protein. These individuals present with partially overlapping phenotypes including hearing loss, ocular pathology, cardiac defects, abnormal brain imaging results, and immune defects. None of the patients present with all of the above phenotypes. PLCG1 has also been implicated as a possible driver for cell proliferation in cancer.

      The three missense variants found in the patients result in the following amino acid substitutions: His380Arg, Asp1019Gly, and Asp1165Gly. PLCG1 (and the closely related PLCG2) have a single Drosophila ortholog called small wing (sl). sl-null flies are viable but have small wings with ectopic wing veins and supernumerary photoreceptors in the eye. As all three amino acids affected in the patients are conserved in the fly protein, in this work Ma et al. tested whether they are pathogenic by expressing either reference or patient variant fly or human genes in Drosophila and determining the phenotypes produced by doing so.

      Expression in Drosophila of the variant forms of PLCG1 found in these three patients is toxic; highly so for Asp1019Gly and Asp1165Gly, much more modestly for His380Arg. Another variant, Asp1165His which was identified in lymphoma samples and shown by others to be hyperactive, was also found to be toxic in the Drosophila assays. However, a final variant, Ser1021Phe, identified by others in an individual with severe immune dysregulation, produced no phenotype upon expression in flies.

      Based on these results, the authors conclude that the PLCG1 variants found in patients are pathogenic, producing gain-of-function phenotypes through hyperactivity. In my view, the data supporting this conclusion are robust, despite the lack of a detectable phenotype with Ser1021Phe, and I have no concerns about the core experiments that comprise the paper.

      Figure 6, the last in the paper, provides information about PLCG1 structure and how the different variants would affect it. It shows that the His380, Asp1019, and Asp1165 all lie within catalytic domains or intramolecular interfaces and that variants in the latter two affect residues essential for autoinhibition. It also shows that Ser1021 falls outside the key interface occupied by Asp1019, but more could have been said about the potential effects of Ser1021Phe.

      Overall, I believe the authors fully achieved the aims of their study. The work will have a substantial impact because it reports the identification of novel disease-linked genes, and because it further demonstrates the high value of the Drosophila model for finding and understanding gene-disease linkages.

      Reviewer #3 (Public Review):

      Summary:

      The paper attempts to model the functional significance of variants of PLCG2 in a set of patients with variable clinical manifestations.

      Strengths:

      A study attempting to use the Drosophila system to test the function of variants reported from human patients.

      Weaknesses:

      Additional experiments are needed to shore up the claims in the paper. These are listed below.

      Major Comments:

      (1) Does the pLI/ missense constraint Z score prediction algorithm take into consideration whether the gene exhibits monoallelic or biallelic expression?

      To our knowledge, pLI and missense Z don't consider monoallelic or biallelic expression. Instead, they reflect sequence constraint and are calculated based on the observed versus expected variant frequencies in population databases.

      (2) Figure 1B: Include human PLCG2 in the alignment that displays the species-wide conserved variant residues.

      We have updated Figure 1B and incorporated the alignment of PLCG2.

      (3) Figure 4A:

      Given that

      (i) sl is predicted to be the fly ortholog for both mammalian PLCγ isozymes: PLCG1 and PLCG2 [Line 62]

      (ii) they are shown to have non-redundant roles in mammals [Line 71]

      (iii) reconstituting PLCG1 is highly toxic in flies, leading to increased lethality.

      This raises questions about whether sl mutant phenotypes are specifically caused by the absence of PLCG1 or PLCG2 functions in flies. Can hPLCG2 reconstitution in sl mutants be used as a negative control to rule out the possibility of the same?

      The studies about the non-redundant roles of PLCG1 and PLCG2 mainly concern the immune system.

      We have assessed the phenotypes in the sl<sup>T2A</sup>/Y; UAS-hPLCG2 flies. Expression of human PLCG2 in flies is also toxic and leads to severely reduced eclosion rate.

      We have updated the manuscript with these results, and included the eclosion rate of sl<sup>T2A</sup>/Y; UAS-hPLCG2 flies in the new Figure 4B.

      (4) Do slT2A/Y; UAS-PLCG1Reference flies survive when grown at 22°C? Since transgenic flies expressing PLCG1 cDNA under the ubiquitous gal4 drivers Tubulin and Da can yield viable progeny at 22°C, the survival of slT2A/Y; UAS-PLCG1Reference flies should be possible.

      The eclosion rate of sl<sup>T2A</sup>/Y >PLCG1<sup>Reference</sup> flies at 22°C is slightly higher than at 25°C, but remains severely reduced compared to the UAS-Empty control. We have presented these results in the updated Figure S3.

      and similarly

      Do slT2A flies exhibit the phenotypes of (i) reduced eclosion rate, (ii) reduced wing size and ectopic wing veins, and (iii) an extra R7 photoreceptor in the fly eye at 22°C?

      The mutant phenotypes are still observed at 22 °C.

      If so, will it be possible to get a complete rescue of the slT2A mutant phenotypes with the hPLCG1 cDNA at 22°C? This dataset is essential to establish Drosophila as an ideal model to study the PLCG1 de novo variants.

      Thank you for the suggestion. It is difficult to directly assess the rescue ability of the PLCG1 cDNAs due to the toxicity. However, our ectopic expression assays show that the variants are more toxic than the reference with variable severities, suggesting that the variants are deleterious.

      The ectopic expression strategy has been used to evaluate the consequence of genetic variants and has significantly contributed to the interpretation of their pathogenicity in many cases (reviewed in Her et al., Genome, 2024, PMID: 38412472).

      (5) Localisation and western blot assays to check if the introduction of the de novo mutations can have an impact on the sub-cellular targeting of the protein or protein stability respectively.

      Thank you for the suggestion.

      We expressed PLCG1 cDNAs in the larval salivary glands and performed antibody staining (rabbit anti-Human PLCG1; 1:100, Cell Signaling Technology, #5690). The larval salivary glands are composed of large columnar epithelial cells that are ideal for analyzing the subcellular localization of proteins. The PLCG1 proteins are cytoplasmic and localize near the cell surface, with some enrichment in the plasma membrane region. The variant proteins are detected and did not show a significant difference in expression level or subcellular distribution compared to the reference. We did not include these data.

      (6) Analysing the nature of the reported gain of function (experimental proof for the same is missing in the manuscript) variants:

      Instead of directly showing the effect of introducing the de novo variant transgenes in the Drosophila model especially when the full-length PLCG1 is not able to completely rescue the slT2A phenotype;

      (i) Show that the gain-of-function variants can have an impact on the protein function or signalling via one of the three signalling outputs in the mammalian cell culture system: (i) inositol-1,4,5-trisphosphate production, (ii) intracellular Ca2+ release or (iii) increased phosphorylation of extracellular signal-related kinase, p65, and p38.

      We appreciate the reviewer’s suggestion. We utilized the CaLexA (calcium-dependent nuclear import of LexA) system (Masuyama et al., J Neurogenet, 2012, PMID: 22236090) to assess the intracellular Ca<sup>2+</sup> change associated with the expression of PLCG1 cDNAs in fly wing discs. The results show that, compared to the reference, expression of the D1019G or D1165G variants leads to elevated intracellular Ca<sup>2+</sup> levels, similar to the hyperactive S1021F and D1165H variants. However, the H380R and L597F variants did not show a detectable phenotype in this assay. These results suggest that D1019G and D1165G are hyperactive variants, whereas the H380R and L597F variants are not, or their effect is too mild to be detected in this assay. We have updated the related sections in the manuscript and Figures 5A and S5.

      OR

      (ii) Run a molecular simulation to demonstrate how the protein's auto-inhibited state can be disrupted and basal lipase activity increased by introducing D1019G and D1165G, which destabilise the association between the C2 and cSH2 domains. The H380R variant may also exhibit characteristics similar to the previously documented H335A mutation which leaves the protein catalytically inactive as the residue is important to coordinate the incoming water molecule required for PIP2 hydrolysis.

      We utilized the DDMut platform, which predicts changes in the Gibbs Free Energy (ΔΔG) upon single and multiple point mutations (Zhou et al., Nucleic Acids Res, 2023, PMID: 37283042), to gain insight into the molecular dynamics changes of variants. The results are now presented in Figure S7.

      Additionally, we performed molecular dynamics (MD) simulations. The results show that, similar to the hyperactive D1165H variant, the D1019G and D1165G variants exhibit increased disorganization, with a higher root mean square deviation (RMSD) compared to the reference PLCG1. The data are also presented in the updated Figure S7.
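
      For readers unfamiliar with the RMSD metric mentioned above, the following is a minimal sketch of how it is commonly computed between one simulation frame and a reference structure. It is an illustration only: the response does not state which MD package, atom selection, or alignment procedure was used, so the function name `rmsd` and the assumption that the two coordinate sets are already superimposed are ours.

```python
import math


def rmsd(coords, reference):
    """Root mean square deviation between two equally sized coordinate sets.

    Each argument is a sequence of (x, y, z) tuples, one per atom.
    Assumption: the structures are already superimposed; no fitting is done here.
    """
    if len(coords) != len(reference) or len(coords) == 0:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    # Sum of squared distances between matching atoms
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(coords, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(coords))


# Hypothetical two-atom example: every atom is displaced by the vector (3, 4, 0)
frame = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(rmsd(frame, reference))
```

      A larger value means the frame has drifted further from the reference, which is the sense in which a higher RMSD is read as increased disorganization.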

      (7) Clarify the reason for carrying out the wing-specific and eye-specific experiments using nub-gal4 and eyless-gal4 at 29˚C despite the high gal4 toxicity at this temperature.

      We used high temperature and high expression level to see if the mild H380R and L597F variants could show phenotypes in this condition.

      The toxicity of the two strong variants (D1019G and D1165G) has been consistently confirmed in multiple assays at different temperatures.

      (8) For the sake of completeness the authors should also report other variants identified in the genomes of these patients that could also contribute to the clinical features.

      Thank you!

      The additional variants and their potential contributions to the clinical features are listed and discussed in Table 1 and its legend.

      Reviewer #1 (Recommendations For The Authors):

      The manuscript's significant contribution is tempered by a lack of comprehensive analysis using the generated genetic reagents in Drosophila. To enhance our understanding of the PLCG1 orthologs, I suggest the following:

      (1) A more detailed molecular analysis to distinguish the actions of sl variants from the wild-type could be very informative. For example, utilizing the HA-epitope tag within the current UAS-transgenes could reveal more about the cellular dynamics and abundance of these variants, potentially elucidating mechanisms beyond gain-of-function.

      We appreciate the reviewer’s suggestion. The UAS-sl cDNA constructs contain a stop codon and do not express an HA-epitope tag. As an alternative, we used commercially available antibodies against human PLCG1 to assess subcellular localization and protein stability by expressing the reference and variant PLCG1 cDNAs in Drosophila larval salivary glands. The reference proteins are cytoplasmic with some enrichment along the plasma membrane. However, we did not observe significant differences between the reference and variant proteins in this assay. We did not include these data.

      (2) I suggest further investigating the relative contributions of developmental processes and acute (Adult) effects on the sl-variant phenotypes observed. For example, employing systems that allow for precise temporal control of gene expression, such as the temperature-sensitive Gal80, could differentiate between these effects, shedding light on the mechanisms that affect longevity and locomotion. This knowledge would be vital for a deeper understanding of the corresponding human disorders and for developing therapeutic interventions.

      We appreciate the reviewer’s suggestion. We utilized Tub-GAL4, Tub-GAL80<sup>ts</sup> to drive the expression of sl wild-type or variant cDNAs, and performed temperature shifts after eclosion to induce expression of the cDNAs only in adult flies. The sl<sup>D1184G</sup> variant (corresponding to PLCG1<sup>D1165G</sup>) caused severely reduced lifespan and the flies mostly die within 10 days. The sl<sup>D1041G</sup> variant (corresponding to PLCG1<sup>D1019G</sup>) led to reduced longevity and locomotion. The sl<sup>H384R</sup> variant (corresponding to PLCG1<sup>H380R</sup>) showed only a mild effect on longevity and no significant effect on climbing ability. These results suggest that the two strong variants (sl<sup>D1041G</sup> and sl<sup>D1184G</sup>) contribute to both developmental and acute effects while the H384R variant mainly contributes to developmental stages.

      I also suggest a more refined analysis of overexpression toxicity. Rather than solely focusing on ubiquitous transgene expression, overexpressing transgene in endogenous pattern using sl-t2a-Gal4 may yield a more nuanced understanding of the pathogenic mechanisms of gain-of-function mutations, particularly in the pathogenesis associated with these variants exclusively located in the coding regions.

      We appreciate the reviewer’s suggestion. We therefore performed the experiments using sl<sup>T2A</sup> to drive overexpression of PLCG1 cDNAs in heterozygous female progeny with one copy of wild-type sl+ (sl<sup>T2A</sup>/ yw > UAS-cDNAs). In this context, expression of PLCG1<sup>Reference</sup>, PLCG1<sup>H380R</sup> or PLCG1<sup>L597F</sup> is viable whereas expression of PLCG1<sup>D1019G</sup> or PLCG1<sup>D1165G</sup> is lethal, suggesting that the PLCG1<sup>D1019G</sup> and PLCG1<sup>D1165G</sup> variants exert a strong dominant toxic effect while the PLCG1<sup>H380R</sup> and PLCG1<sup>L597F</sup> variants are comparatively milder. Similar patterns have been consistently observed in other ectopic expression assays with varying degrees of severity. These results are updated in the manuscript and figures.

      Reviewer #2 (Recommendations For The Authors):

      The work in the paper could be usefully extended by determining the effects of expressing His380Phe and His380Ala in flies. These variants suppress PLCG1 activity, so their phenotype, if any, would be predicted not to be the same as His380Arg. Determining this would add further strength to the conclusions of the paper.

      We thank the reviewer for the constructive suggestions! We have tested the enzymatically dead H380A variant, which still exhibits toxicity when expressed in sl<sup>T2A</sup>/Y hemizygous flies but is not toxic in heterozygous females, suggesting that the reduced eclosion rate is likely not directly associated with enzymatic activity. We have updated the manuscript and figures accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors):

      Suggestions:

      Although this study has an impressive dataset, I felt that some parts of the discussion would benefit from further explanation, specifically when discussing the differences in female aggression direction between groups with different sex compositions. In the discussion is suggested that males buffer female-on-female aggression and that they 'support' lower-ranking females (see line 212), however, the study only tested the sex composition of the group and does not provide any evidence of this buffering. Thus, I would suggest adding more information on how this buffering or protection from males might manifest (for example, listing male behaviours that might showcase this protection) or referencing other studies that support this claim. Another example of this can be found in lines 223-224, which suggests that females choose lower-ranking individuals when they are presented with a larger pool of competitors; however, in lines 227-228, it's stated that this result contradicts previous work in baboons, which makes the previous claim seem unjustified. I recommend adding other examples from studies that support the results of this paper and adding a line that addresses reasons why these differences between gorillas and baboons might be caused (for example, different social dynamics or ecological constraints). In addition, I suggest the inclusion of physiological data such as direct measures of energy expenditure, caloric intake, or hormone levels, as it would strengthen the claims made in the second paragraph of the discussion. However, I understand this might not be possible due to data or time constraints, so I suggest adding more robust justification on why lactation and pregnancy were used as a proxy for energetic need. In the methods (lines 127-128), it is unclear which phase of the pregnancy or lactation is more energetically demanding. I would also suggest adding a comment on the limitations of using reproductive state to infer energetic need. 
      Lastly, if the data is available, I believe it would be interesting to add body size and age of the females or the size difference between aggressor and target as explanatory variables in the models to test if physiological characteristics influence female-on-female aggression.

      Male support:

      We have now added more references (Watts 1994, 1997) and enriched our arguments regarding male presence buffering aggression. Previous research suggests that male gorillas may support lower-ranking females and may intervene in female-female conflicts (Sicotte 2002). Unfortunately, our dataset did not allow us to test for male protection. We conduct proximity scans every 10 minutes, and these scans are not associated with each interaction, meaning that we cannot reliably test whether proximity to a male influences the likelihood of receiving aggression.

      Number of competitors and choice of weaker competitors:

      We added a very relevant reference in humans, showing that people choose weaker competitors when they can choose. We removed the baboon example because it used sex ratio and its relevance to our study was not straightforward.

      Reproductive state as a proxy for energetic needs:

      We now mention clearly that reproductive state is an indirect measure of energetic needs.

      We rephrased our methods to: “Lactation is often considered more energetically demanding than pregnancy as a whole but the latest stages of pregnancy are highly energetically demanding, potentially even more than lactation”

      Unfortunately, we do not have access to physiological and body size data. Regarding female age, for many females, ages are estimates with errors of up to a decade, and thus we chose not to use them as a reliable predictor. Having accurate values for all these variables would indeed be very valuable and would improve the predictive power of our study.

      Recommendations for writing and presentation:

      Overall, the manuscript is well-organised and well-written, but there are certain areas that could improve in clarity. In the introduction, I believe that the term 'aggression heuristic' should be introduced earlier and properly defined in order to accommodate a broader audience. The main question and aims of the study are not stated clearly in the last paragraph of the introduction. In the methods, I think it would improve the clarity to add a table for the classification of each type of agonistic interaction instead of naming them in the text. For example, a table that showcases the three intensity categories (severe, mild and moderate), that then dives into each behaviour (e.g. hit, bite, attack, etc.) and gives a short description of these behaviours. I think this would be helpful since some of the behaviours mentioned can be confusing (what's the difference between attack, hit and fight?). In addition, in line 104, it states that all interactions were assigned equal intensity, which needs to be explained.

      We now define aggression heuristics in both the abstract and the first paragraph of the introduction. We have also explained the aggressive interactions whose nature was not obvious from their names. Hopefully, these explanations make clear the differences among the recorded behaviours.

      We have now specified that the “equal intensity” refers to avoidances and displacements used to infer power relationships: “We assigned to all avoidance/displacement interactions equal intensity, that is, equal influence to the power relationship of the interacting individuals”

      Minor corrections:

      (1) In line 41, there is a 1 after 'similar'. I am unsure if it's a mistake or a reference.

      We corrected the typo.

      (2) In lines 68-69, there is mention of other studies, but no references are provided.

      We added citations as suggested.

      (3) Remove the reference to Figure 1 (line 82) from the introduction; the figure should be referenced in the text just before the image, however, your figure is in a different section.

      We removed the reference as suggested.

      (4) Line 98 and 136, it's written 'ad libtum' but the correct spelling is 'ad libitum'.

      We corrected the typo.

      (5) Figure 3, remove the underscores between the words in the axis titles.

      We removed the underscores.

      Reviewer #2 (Recommendations for the authors):

      Here, I have outlined some specific suggestions that require attention. Addressing these comments will enhance the readability and enhance the quality of the manuscript.

      (1) L69. Add citation here, indicating the studies focusing on aggression rates.

      We added citations as suggested.

      (2) L88. The study periods used in this study and the authors' previous study (Reference 11) are different. So please add one table as Table 1 showing the details info on the sampling efforts and data included in their analysis of this study. For example, the study period, the numbers of females and males, sampling hours, the number of avoidance/displacement behaviors used to calculate individual Elo-ratings, and the number of mild/moderate/severe aggressive interactions, etc.

      We have now added another table, as suggested (new Table 1) and we have also made clear that we used the hierarchies presented in detail in (Smit & Robbins 2025).

      (3) L103. If readers do not look over Reference 25 on purpose, they do not know what the authors want to talk about and why they mention the optimized Elo-rating method. Clarify this statement and add more content explaining the differences between the two methods, or just remove it.

      We rephrased the text and, in response to the previous comment, we clearly state that there are more details about our approach in Smit & Robbins 2025. At the end of the relevant sentence, we added the following parenthesis “(see “traditional Elo rating method”; we do not use the “optimized Elo-rating method” as it yields similar results and it is not widely used)” and we removed the sentence referring to the optimized Elo-rating method.

      (4) L110. Here, the authors stated that the individual with the standardized Elo-score 1 was the highest-ranking. L117, the "aggression direction" score of each aggressive interaction was the standardized Elo-score of the aggressor, subtracting that of the recipient. So, when the "aggression direction" score was 1, it should mean that the aggressor was the highest-ranking and the recipient was the lowest-ranking female. This is not as the authors stated in L117-120 (where the description was incorrectly reversed). Please clarify.

      The highest-ranking individual indeed has an Elo_score equal to 1, and we calculated the interaction score (or "aggression direction score") of each aggressive interaction by subtracting the standardized Elo-score of the aggressor from that of the recipient (Elo_recipient – Elo_aggressor). So, when the aggressor is the lowest-ranking female (Elo_score=0) and the recipient the highest-ranking female (Elo_score=1), the "aggression direction score" is 1-0 = 1.
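
      The sign convention described above can be illustrated with a short sketch. The function name and the range check are hypothetical and only mirror the definition given in this response (recipient score minus aggressor score, both standardized to the range 0 to 1); this is not the authors' analysis code.

```python
def aggression_direction_score(elo_aggressor: float, elo_recipient: float) -> float:
    """Interaction score as defined in the response: Elo_recipient - Elo_aggressor.

    Both inputs are standardized Elo scores, where 1 is the highest-ranking
    female and 0 the lowest-ranking female.
    """
    for name, value in (("elo_aggressor", elo_aggressor), ("elo_recipient", elo_recipient)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a standardized Elo score in [0, 1]")
    return elo_recipient - elo_aggressor


# Lowest-ranking aggressor directing aggression up the hierarchy at the top female
print(aggression_direction_score(elo_aggressor=0.0, elo_recipient=1.0))  # 1.0
# Highest-ranking aggressor directing aggression down at the lowest-ranking female
print(aggression_direction_score(elo_aggressor=1.0, elo_recipient=0.0))  # -1.0
```

      Under this convention, positive scores correspond to aggression directed at higher-ranking females and negative scores to aggression directed at lower-ranking females.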

      (5) Regarding point 3 of the Public Review, please also revise/expand the paragraph L193-208 in the Discussion section accordingly.

      Please see our response to the public review. We have enriched the results section, added pairwise comparisons in a new table (Table 2) and modified the discussion accordingly.

      (6) Table 1. It's not clear why authors added the column 'Aggression Rate' but did not provide any explanation in the Methods/Results section. How did they calculate the correlation between each tested variable and the "overall adult female aggression rates"? Correlating the number of females in the first trimester of female pregnancy with the female aggression rates in each study group? What did the correlation coefficients mean? L202-204 may provide some hints as to why the authors introduced the Aggression Rate. But it should be made clear in the previous text.

      We now added more details in the legend of the table to make our point clear: “To highlight that aggression rates can increase due to increase in interactions of different score, we also include the effect of some of the tested variables on overall adult female aggression rates, based on results of linear mixed effects models from (Smit & Robbins 2024).” We did not include detailed methods to calculate those results because they are detailed in (Smit & Robbins 2024). We find it valuable to show the results of both aggression rates and aggression directionality according to the same predictor variables as a means to clarify that aggression rates and aggression directionality are not always coordinated to one another (they do not always change in a consistent manner relative to one another).

      (7) L166. This is not rigorous. Please rephrase. There is only one western gorilla group containing only one resident male included in the analysis.

      We have toned down our text: “Our results did not show any significant difference between female-female aggression patterns within the one western and four mountain gorilla groups”

      (8) L167. I don't think the interaction scores in the third trimester of female pregnancy were significantly higher than those in the first trimester. The same concern applies in L194-195.

      We have now added a new table with post hoc pairwise comparisons among the different reproductive states that clarifies that.

      (9) L202. There is no column 'Aggression rates' in Table 1 of Reference 11.

      We have rephrased to make clear that we refer to Table 1 of the present study.

      (10) L204-205. Reference 49. Maybe not a proper citation here. This claim requires stronger evidence or further justification. Additionally, please rephrase and clarify the arguments in L204-208 for better readability and precision.

      We have added three more references and rephrased to clarify our argument.

      Reviewer #3 (Recommendations for the authors):

      (1) Line 41: The word "similar" is misspelled.

      We corrected the typo.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations for the authors):

      (1) I am not convinced by the figures the authors present on Shh protein expression. The "bright tiny dots" of Shh protein in the cortex are not visible on the images in Figure 7. I wonder whether the authors could present higher magnification and/or black and white images with increased contrast.

      We have modified Figure 7: we now present a higher magnification and a black and white image with increased contrast to better visualize SHH (+) bright tiny dots in the lateral cortex.

      (2) The manuscript also contains several typos.

      We apologize for these mistakes which have all been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The study "Monitoring of Cell-free Human Papillomavirus DNA in Metastatic or Recurrent Cervical Cancer: Clinical Significance and Treatment Implications" by Zhuomin Yin and colleagues focuses on the relationship between cell-free HPV (cfHPV) DNA and metastatic or recurrent cervical cancer patients. It expands the application of cfHPV DNA in tracking disease progression and evaluating treatment response in cervical cancer patients. The study is overall well-designed, including appropriate analyses.

      Strengths:

      The findings provide valuable reference points for monitoring drug efficacy and guiding treatment strategies in patients with recurrent and metastatic cervical cancer. The concordance between HPV cfDNA fluctuations and changes in disease status suggests that cfDNA could play a crucial role in precision oncology, allowing for more timely interventions. As with similar studies, the authors used Droplet Digital PCR to measure cfDNA copy numbers, a technique that offers ultrasensitive nucleic acid detection and absolute quantification, lending credibility to the conclusions.

      Weaknesses:

      Despite including 28 clinical cases, only 7 involved recurrent cervical cancer, which may not be sufficient to support some of the authors' conclusions fully. Future studies on larger cohorts could solidify HPV cfDNA's role as a standard in the personalized treatment of recurrent cervical cancer patients.

      (1) The authors should provide source data for Figures 2, 3, and 4 as supplementary material.

      We greatly appreciate your evaluation of our study and fully agree with the limitations you have pointed out. We appreciate your constructive feedback. Based on your suggestions, we have made the following additions to the article. We have realized that the information provided in Figures 2, 3, and 4 is limited. Therefore, we have presented the original data from Figures 2, 3, and 4 in tabular form in Supplementary Table 2.

      (2) Description of results in Figure 2: Figure 2 would benefit from clearer annotations regarding HPV virus subtypes. For example, does the color-coding in Figure 2B imply that all samples in the LR subgroup are of type HPV16? If that is the case, is it possible that detection variations are due to differences in subtype detection efficiency rather than cfDNA levels? The authors should clarify these aspects. Annotation of Figure 2B suggests that the p-value comes from comparing the LR and LN + H + DSM groups. This should be clarified in the legend. If this p-value comes from comparing HPV cfDNA copies for the (LR, LNM, HM) and (LN + HM, LN + HM + DSM) groups, did the authors carry out post-hoc pairwise comparisons? It would be helpful to include acronyms for these groups in the legend also.

      We fully agree with your point regarding the need for clearer labeling of HPV genotypes in Figures 2B and 2C. If each data point could be color-coded to represent the HPV genotype, Figures 2B and 2C would be clearer and provide more information. However, we must acknowledge that due to the limitations of our current graphing software and our graphical expertise, we were unable to fully represent each HPV genotype in the figures. To address this, we have presented the data in Supplementary Table 2. This table shows the HPV genotype for each patient, the corresponding metastasis patterns, and the baseline HPV copy numbers. We hope this will address the limitation of insufficient information in Figure 2.

      The point you raised regarding whether the differences in detection results might stem from variations in subtype detection efficiency rather than cfDNA levels is a valid limitation of this study. Due to the limited sample size, we did not perform subgroup analyses based on different HPV genotypes, which may have introduced bias in the results presented in Figures 2B and 2C. In response, we have added the following clarification in the discussion section (lines 416-422) and addressed this limitation in the limitations section (lines 499-502). Based on your suggestion, we believe that it is essential to expand the sample size and perform subgroup analysis of the baseline copy numbers for each HPV genotype before treatment. We hope to achieve this goal in future studies.

      Thank you for your thoughtful comments regarding the statistical analyses in the study. The p-value in Figure 2B comes from the comparison among five groups, using a two-sided Kruskal-Wallis test. Your suggestion to perform post-hoc pairwise comparisons is excellent and has made the data presentation in the article more rigorous. Following your advice, we conducted pairwise comparisons between the groups. We used the Mann-Whitney U test to compare HPV cfDNA copy numbers between two groups. Since the LR group only had one value, it could not be included in the pairwise comparisons. Significant differences were observed in two comparisons: LNM vs. LN + H + DSM (P = 0.006) and HM vs. LN + H + DSM (P = 0.036). No significant differences were found between the other groups: LNM vs. HM (P = 0.768), LNM vs. LN + HM (P = 0.079), HM vs. LN + HM (P = 0.112), and LN + HM vs. LN + H + DSM (P = 0.145), as determined by the Mann-Whitney U test (Figure 2B). (Lines 258-263).
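
      The pairwise procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: the copy numbers are hypothetical placeholders (the real values are in Supplementary Table 2), the helper names are ours, the p-value is an exact permutation version of the Mann-Whitney U test that is only practical for small groups, and no multiple-comparison correction is applied. Groups with a single value, such as LR, are skipped, as in the response.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic for x: number of (x_i, y_j) pairs with x_i > y_j, ties scored 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact two-sided p-value obtained by enumerating every relabelling of the pooled data."""
    pooled = list(x) + list(y)
    n_x = len(x)
    center = n_x * len(y) / 2.0  # expected value of U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - center)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [v for i, v in enumerate(pooled) if i not in chosen]
        total += 1
        # Count relabellings at least as extreme as the observed split
        if abs(mann_whitney_u(gx, gy) - center) >= observed - 1e-12:
            hits += 1
    return hits / total


def pairwise_tests(groups, min_size=2):
    """Run the test on every pair of groups that has at least min_size values."""
    usable = {name: vals for name, vals in groups.items() if len(vals) >= min_size}
    return {
        (a, b): exact_two_sided_p(usable[a], usable[b])
        for a, b in combinations(usable, 2)
    }


# HYPOTHETICAL baseline copy numbers, for illustration only (not data from the study)
example_groups = {
    "LR": [120.0],  # single value, excluded from pairwise testing
    "LNM": [310.0, 450.0, 520.0, 800.0],
    "HM": [400.0, 650.0, 900.0],
    "LN + HM": [700.0, 1500.0, 2100.0],
    "LN + H + DSM": [5200.0, 8800.0, 12000.0, 30000.0],
}

for pair, p_value in pairwise_tests(example_groups).items():
    print(pair, round(p_value, 3))
```

      With four usable groups this yields six comparisons, matching the six pairs reported in the response.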

      Thank you for your thoughtful suggestion regarding the inclusion of group acronyms in the legends of Figures 2B and 2C. Including the full names corresponding to the abbreviations would indeed enhance clarity. While we attempted to add both acronyms and full names to the figure legend, the full names were too lengthy and impacted the figure's presentation. Therefore, we have provided the full names corresponding to the abbreviations in the figure caption below, to help readers easily understand the abbreviations used in the figure.

      (3) Interpretation of results in Figure 2 and elsewhere: Significant differences detected in Figure 2B could imply potential associations between HPV cfDNA levels (or subtypes) and recurrence/metastasis patterns. Figure 2C shows that there is a difference in cfDNA levels between the groups compared, suggesting an association but this would not necessarily be a direct "correlation". Overall, interpretation of statistical findings would benefit from more precise language throughout the text and overstatement should be avoided.

      Thank you for your insightful comments regarding the interpretation of results in Figure 2 and elsewhere. We acknowledge that there are several limitations in this study, and the interpretation of the results should be more careful and cautious. Indeed, in the results section, there were issues with inaccurate wording and exaggeration. We have made revisions in the discussion section, which are presented as follows: Preliminary results indicate that baseline HPV cfDNA levels may be linked to recurrence/metastasis patterns, potentially reflecting tumor burden and spread (Lines 411-413). Additionally, we have also made changes in the conclusion section, which are presented as follows: The baseline copy number of HPV cfDNA may be associated with metastatic patterns, thereby reflecting tumor burden and the extent of spread to some extent (Lines 511-513).

      (4) The authors state that six patients showed cfDNA elevation with clinically progressive disease, yet only three are represented in Figure 3B1 under "Patients whose disease progressed during treatment." What is the expected baseline variability in cfDNA for patients? If we look at data from patients with early-stage cancer would we see similar fluctuations? And does the degree of variability vary for different HPV subtypes? Without understanding the normal fluctuations in cfDNA levels, interpreting these changes as progression indicators may be premature.

      Thank you for your feedback. We appreciate your thorough review and attention to detail. Six cervical squamous cell carcinoma (SCC) patients exhibited elevated HPV cfDNA levels as their clinical condition progressed. In the previous Figures 3A1 and 3A2, we only presented data from three patients, as we initially believed that displaying the cfDNA curves from three patients would offer a clearer view, while including six patients might lead to overlap and reduce clarity. However, this may have caused confusion for readers. Based on your suggestion, we have revised Figure 3A1 to include the cfDNA curves for all six patients with squamous cell carcinoma who experienced clinical disease progression during treatment (Figure 3A1), along with the corresponding SCC-Ag curves (Figure 3A2).

      Thank you for highlighting the issue of baseline variability in HPV cfDNA. This is indeed a limitation of our study, which did not address this aspect. If baseline variability is defined as changes in HPV cfDNA levels measured at different time points before treatment in the same patient, fluctuations at different time points are inevitable and objective. Following your suggestion, we have added a discussion on baseline variability in the limitations section of the manuscript to provide readers with a more objective understanding of our study's findings (Lines 501-502). In future studies, we will incorporate baseline variability into the research design to better understand pre-treatment HPV cfDNA fluctuations and provide support for clinical decision-making.

      (5) It would be helpful if where p-values are given, the test used to derive these values was also stated within parentheses e.g. (P < 0.05, permutation test with Benjamini-Hochberg procedure).

      Thank you for your valuable suggestions and examples. Following your advice, we have included the statistical test methods used to obtain the p-values in parentheses wherever they appear in the results section. Additionally, we have specified the statistical test methods for the p-values below the figures in the results section.

      Reviewer #2 (Public review):

      Summary:

      The authors conducted a study to evaluate the potential of circulating HPV cell-free DNA (cfDNA) as a biomarker for monitoring recurrent or metastatic HPV+ cervical cancer. They analyzed serum samples from 28 patients, measuring HPV cfDNA levels via digital droplet PCR and comparing these to squamous cell carcinoma antigen (SCC-Ag) levels in 26 SCC patients, while also testing the association between HPV cfDNA levels and clinical outcomes. The main hypothesis that the authors set out to test was whether circulating HPV cfDNA levels correlated with metastatic patterns and/or treatment response in HPV+ CC.

      The main claims put forward by the paper are that:

      (1) HPV cfDNA was detected in all 28 CC patients enrolled in the study and levels of HPV cfDNA varied over a median 2-month monitoring period.

      (2) 'Median baseline' HPV cfDNA varied according to 'metastatic pattern' in individual patients.

      (3) Positivity rate for HPV cfDNA was more consistent than SCC-Ag.

      (4) In 20 SCC patients monitored longitudinally, concordance with changes in disease status was 90% for HPV cfDNA.

      This study highlights HPV cfDNA as a promising biomarker with advantages over SCC-Ag, underscoring its potential for real-time disease surveillance and individualized treatment guidance in HPV-associated cervical cancer.

      Strengths:

      This study presents valuable insights into HPV+ cervical cancer with potential translational significance for management and guiding therapeutic strategies. The focus on a non-invasive approach is particularly relevant for women's cancers, and the study exemplifies the promising role of HPV cfDNA as a biomarker that could aid personalized treatment strategies.

      Weaknesses:

      While the authors acknowledge the study's small cohort and variability in sequential sampling protocols as a limitation, several revisions should be made to ensure that (1) the findings are presented in a way that aligns more closely with the data without overstatement and (2) that the statistical support for these findings is made more clear. Specific suggestions are outlined below.

      (1) Line 54 in the abstract refers to 'combined multiple-metastasis pattern' but it is not clear what this refers to at this point in the text.

      Thank you for your detailed feedback. You are correct that the "combined multi-metastatic pattern" was not adequately explained in the abstract, which may have caused confusion. To address this, we have clarified the definitions of the combined multi-metastatic pattern and single-metastatic pattern in lines 53-55 of the manuscript. Patients with a combined multi-metastatic pattern (lymph node + hematogenous ± diffuse serosal metastasis) exhibited a higher median baseline HPV cfDNA level compared to those with a single-metastasis pattern (local recurrence, lymph node metastasis, or hematogenous metastasis) (P = 0.003).

      (2) Line 90 The reference to 'prospective clinical study (NCT03175848) in primary stage IVB CC to investigate the role of radiotherapy (RT) in combination therapy' seems not to be at all relevant at this point in the text. I would limit the description of this study to the methods.

      Thank you for your thoughtful and thorough review. Your suggestions are highly relevant. Upon further reflection, we recognized that this sentence was redundant in its original placement. Following your recommendation, we have removed it from this section and moved it to the methods section (Lines 109-111). The revised statement is as follows: "Notably, 19 cases from the primary CC group participated in our prospective clinical study (NCT03175848), focused on stage IVB cervical cancer."

      (3) Line 56 refers to HPV cfDNA levels (range 0.3-16.9) but what units?

      Thank you for your feedback regarding the manuscript format. While you highlighted this specific issue, we have since identified several other instances of omitted units in parentheses throughout the manuscript. We acknowledge that such formatting oversights can create ambiguity for readers. Following your suggestions, we have corrected all such issues in the manuscript. We greatly appreciate your careful and thorough review.

      (4) Lines 247-248 claim that higher baseline HPV cfDNA levels correlated with a more substantial post-chemotherapy decrease. This correlation should be statistically validated, and the p-value should be included.

      Thank you for your insightful comments, which highlighted an issue with this sentence. Upon review, I have made the necessary revisions. Since no statistical analysis was conducted and the P-value was not provided, the original sentence was imprecise. Given the small sample size, statistical analysis is not feasible. I have revised the sentence as follows: “For patients in whom systemic cytotoxic chemotherapy was effective, a significant decrease in HPV cfDNA levels could be detected after chemotherapy” (Lines 297-298).

      (5) The authors mention that baseline samples were collected "between Day -14 and Day +30 preceding initial treatment." If Day -14 indicates two weeks before treatment, then this would imply some samples were taken up to 30 days post-treatment. This notation should be clarified. To what extent might outliers or more extreme values in Figure 2 be driven by variability in how baseline sampling was carried out?

      Thank you for your insightful comments. This is indeed a major limitation of our study, and it could introduce a degree of bias into the measurements. The primary reason is that the study was conducted during the COVID-19 pandemic, which sometimes made regular sampling difficult. In accordance with your suggestion, we have added this point to the results section of the article (Lines 266-275). We have also included the variation in baseline sampling as a limitation in the discussion section (Lines 497-499). In future studies, we will strive to improve the study design by ensuring baseline samples are collected prior to treatment, thereby enhancing the reliability of statistical and analytical results.

      (6) Would be useful to amend Figure 1 to show a subset of patients with SCC and a subset of patients who underwent longitudinal monitoring.

      Thank you for your detailed suggestion. Including a subset of pathological types could indeed add more information to Figure 1. However, regarding the pathological types of the patients in this group, we have listed them in Table 1 and Supplementary Table 2. Among the 28 patients, 26 are diagnosed with squamous cell carcinoma, so 92.9% of the patients in this study have squamous cell carcinoma. To avoid making Figure 1 too complex, we decided not to include the pathological type in the figure.

      (7) Line 120 "a time point matching or closely following HPV cfDNA sampling" - what is the time range for 'closely following' here? A couple of hours or days after sampling?

      Thank you for your detailed feedback. Based on your suggestion, we have revised the sentence as follows:

      "For patients with squamous cell CC in the sequential sampling group, concurrent SCC-Ag testing was performed at a time point that matched, or was within 7 days before or after, the HPV cfDNA sampling." (Line 123-125)

      (8) Lines 178-190 and lines 179-180 seem to make exactly the same point.

      Thank you very much for your careful review. Indeed, these two sentences were repetitive and conveyed the same point. I have removed the previous sentence here (lines 206-207).

      (9) In Figure 4, please indicate the number of patients in each group in the legend e.g. HPV16+ (n=x number of patients).

      Thank you for your feedback on the details of Figure 4 and the examples provided. We have updated Figure 4 according to your suggestions and included the number of patients in each group in the figure legend.

      (10) Lines 322-3 'HPV cfDNA predicted treatment response or disease progression at an earlier time point than imaging assessments' - based on the data available and the numbers of patients, I would argue that this is too bold a claim.

      Thank you very much for pointing out this issue. We fully agree with your view. We have modified this sentence as follows: "Secondly, dynamically monitored HPV cfDNA levels appeared to predict treatment response and disease progression." (Lines 391-392).

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Chao et al. produced an updated version of the SpliceAI package using modern deep learning frameworks. This includes data preprocessing, model training, direct prediction, and variant effect prediction scripts. They also added functionality for model fine-tuning and model calibration. They convincingly evaluate their newly trained models against those from the original SpliceAI package and investigate how to extend SpliceAI to make predictions in new species. While their comparisons to the original SpliceAI models are convincing on the grounds of model performance, their evaluation of how well the new models match the original's understanding of non-local mutation effects is incomplete. Further, their evaluation of the new calibration functionality would benefit from a more nuanced discussion of what set of splice sites their calibration is expected to hold for, and tests in a context for which calibration is needed.

      Strengths:

      (1) They provide convincing evidence that their new implementation of SpliceAI matches the performance of the original model on a similar dataset while benefiting from improved computational efficiencies. This will enable faster prediction and retraining of splicing models for new species as well as easier integration with other modern deep learning tools.

      (2) They produce models with strong performance on non-human model species and a simple, well-documented pipeline for producing models tuned for any species of interest. This will be a boon for researchers working on splicing in these species and make it easy for researchers working on new species to generate their own models.

      (3) Their documentation is clear and abundant. This will greatly aid the ability of others to work with their code base.

      We thank the reviewer for these positive comments.  

      Weaknesses:

      (1) The authors' assessment of how much their model retains SpliceAI's understanding of "nonlocal effects of genomic mutations on splice site location and strength" (Figure 6) is not sufficiently supported. Demonstrating this would require showing that for a large number of (non-local) mutations, their model shows the same change in predictions as SpliceAI or that attribution maps for their model and SpliceAI are concordant even at distances from the splice site. Figure 6A comes close to demonstrating this, but only provides anecdotal evidence as it is limited to 2 loci. This could be overcome by summarizing the concordance between ISM maps for the two models and then comparing across many loci. Figure 6B also comes close, but falls short because instead of comparing splicing prediction differences between the models as a function of variants, it compares the average prediction difference as a function of the distance from the splice site. This limits it to only detecting differences in the model's understanding of the local splice site motif sequences. This could be overcome by looking at comparisons between differences in predictions with mutants directly and considering non-local mutants that cause differences in splicing predictions.

      We agree that two loci are insufficient to demonstrate preservation of non-local effects. To address this, we have extended our analysis to a larger set of sites: we randomly sampled 100 donor and 100 acceptor sites, applied our ISM procedure over a 5,001 nt window centered at each site for both models, and computed the ISM map as before. We then calculated the Pearson correlation between the collection of OSAI<sub>MANE</sub> and SpliceAI ISM importance scores. We also created 10 additional ISM maps similar to those in Figure 6A, which are now provided in Figure S23.

      The following is the revised paragraph in the manuscript’s Results section:

      First, we recreated the experiment from Jaganathan et al. in which they mutated every base in a window around exon 9 of the U2SURP gene and calculated its impact on the predicted probability of the acceptor site. We repeated this experiment on exon 2 of the DST gene, again using both SpliceAI and OSAI<sub>MANE</sub>. In both cases, we found a strong similarity between the resultant patterns from SpliceAI and OSAI<sub>MANE</sub>, as shown in Figure 6A. To evaluate concordance more broadly, we randomly selected 100 donor and 100 acceptor sites and performed the same ISM experiment on each site. The Pearson correlation between SpliceAI and OSAI<sub>MANE</sub> yielded an overall median correlation of 0.857 (see Methods; additional DNA logos in Figure S23).

      To characterize the local sequence features that both models focus on, we computed the average decrease in predicted splice-site probability resulting from each of the three possible single-nucleotide substitutions at every position within 80 bp for 100 donor and 100 acceptor sites randomly sampled from the test set (Chromosomes 1, 3, 5, 7, and 9). Figure 6B shows the average decrease in splice site strength for each mutation in the format of a DNA logo, for both tools.

      We added the following text to the Methods section:

      Concordance evaluation of ISM importance scores between OSAI<sub>MANE</sub> and SpliceAI

      To assess agreement between OSAI<sub>MANE</sub> and SpliceAI across a broad set of splice sites, we applied our ISM procedure to 100 randomly chosen donor sites and 100 randomly chosen acceptor sites. For each site, we extracted a 5,001 nt window centered on the annotated splice junction and, at every coordinate within that window, substituted the reference base with each of the three alternative nucleotides. We recorded the change in predicted splice-site probability for each mutation and then averaged these Δ-scores at each position to produce a 5,001-score ISM importance profile per site.

      Next, for each splice site we computed the Pearson correlation coefficient between the paired importance profiles from ensembled OSAI<sub>MANE</sub> and ensembled SpliceAI. The median correlation was 0.857 for all splice sites. Ten additional zoom-in representative splice site DNA logo comparisons are provided in Supplementary Figure S23.
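
      The ISM and correlation steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `make_toy_scorer` is a hypothetical stand-in for a splice-site model (it is neither SpliceAI nor OSAI), the window is 21 nt rather than 5,001 nt, and the Δ-score is taken as the decrease in predicted probability (reference minus mutant), which is an assumed sign convention.

```python
import math
import random

BASES = "ACGT"


def ism_profile(seq, score_fn):
    """Return one importance score per position.

    At each position, substitute the reference base with each of the three
    alternative bases, record the decrease in the model score, and average.
    """
    ref_score = score_fn(seq)
    profile = []
    for i, ref_base in enumerate(seq):
        deltas = []
        for alt in BASES:
            if alt == ref_base:
                continue
            mutated = seq[:i] + alt + seq[i + 1:]
            deltas.append(ref_score - score_fn(mutated))
        profile.append(sum(deltas) / len(deltas))
    return profile


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def make_toy_scorer(weights, preferred):
    """Hypothetical scoring model: logistic of summed per-position weights."""
    def score(seq):
        z = sum(w for w, p, b in zip(weights, preferred, seq) if b == p)
        return 1.0 / (1.0 + math.exp(-(z - 1.0)))
    return score


rng = random.Random(0)
window = 21  # toy window; the procedure above uses 5,001 nt
center = window // 2
seq = "".join(rng.choice(BASES) for _ in range(window))
preferred = "".join(rng.choice(BASES) for _ in range(window))

# Two toy "models" that share base preferences but differ slightly in weights.
weights_a = [1.0 / (1 + abs(i - center)) for i in range(window)]
weights_b = [w * (1 + 0.2 * rng.uniform(-1, 1)) for w in weights_a]

profile_a = ism_profile(seq, make_toy_scorer(weights_a, preferred))
profile_b = ism_profile(seq, make_toy_scorer(weights_b, preferred))
r = pearson(profile_a, profile_b)
print(f"Pearson r between the two toy ISM profiles: {r:.3f}")
```

      Repeating this per site and taking the median of the resulting coefficients would give a summary comparable in form to the one reported above.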

      (2) The utility of the calibration method described is unclear. When thinking about a calibrated model for splicing, the expectation would be that the models' predicted splicing probabilities would match the true probabilities that positions with that level of prediction confidence are splice sites. However, the actual calibration that they perform only considers positions as splice sites if they are splice sites in the longest isoform of the gene included in the MANE annotation. In other words, they calibrate the model such that the model's predicted splicing probabilities match the probability that a position with that level of confidence is a splice site in one particular isoform for each gene, not the probability that it is a splice site more broadly. Their level of calibration on this set of splice sites may very well not hold to broader sets of splice sites, such as sites from all annotated isoforms, sites that are commonly used in cryptic splicing, or poised sites that can be activated by a variant. This is a particularly important point as much of the utility of SpliceAI comes from its ability to issue variant effect predictions, and they have not demonstrated that this calibration holds in the context of variants. This section could be improved by expanding and clarifying the discussion of what set of splice sites they have demonstrated calibration on, what it means to calibrate against this set of splice sites, and how this calibration is expected to hold or not for other interesting sets of splice sites. Alternatively, or in addition, they could demonstrate how well their calibration holds on different sets of splice sites or show the effect of calibrating their models against different potentially interesting sets of splice sites and discuss how the results do or do not differ.

      We thank the reviewer for highlighting the need to clarify our calibration procedure. Both SpliceAI and OpenSpliceAI are trained on a single “canonical” transcript per gene: SpliceAI on the hg19 Ensembl/Gencode canonical set and OpenSpliceAI on the MANE transcript set. To calibrate each model, we applied post-hoc temperature scaling, i.e. a single learnable parameter that rescales the logits before the softmax. This adjustment does not alter the model’s ranking or discrimination (AUC/precision–recall) but simply aligns the predicted probabilities for donor, acceptor, and non-splice classes with their observed frequencies. As shown in our reliability diagrams (Fig. S16-S22), temperature scaling yields negligible changes in performance, confirming that both SpliceAI and OpenSpliceAI were already well-calibrated. However, we acknowledge that we didn’t measure how calibration might affect predictions on non-canonical splice sites or on cryptic splicing. It is possible that calibration might have a detrimental effect on those, but because this is not a key claim of our paper, we decided not to do further experiments. We have updated the manuscript to acknowledge this potential shortcoming; please see the revised paragraph in our next response.
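
      As a rough illustration of post-hoc temperature scaling, the sketch below divides the logits by a single scalar T before the softmax and picks T by minimizing the negative log-likelihood on held-out examples. Several things here are assumptions made for the example: the class order (0 = non-splice, 1 = acceptor, 2 = donor), the toy logits (deliberately overconfident, unlike the models discussed above), and the use of a simple grid search in place of the gradient-based optimizer that is typically used to fit T.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a scalar temperature (> 0)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    loss = 0.0
    for logits, y in zip(logit_rows, labels):
        loss -= math.log(softmax(logits, temperature)[y])
    return loss / len(labels)


def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature on a grid that minimizes held-out NLL."""
    if grid is None:
        grid = [0.5 + 0.05 * k for k in range(91)]  # 0.5 .. 5.0
    return min(grid, key=lambda t: nll(logit_rows, labels, t))


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


# Toy held-out logits; two of the six confident predictions are wrong.
logit_rows = [
    [4.0, 0.0, 0.0],
    [0.0, 4.0, 0.5],
    [0.0, 0.5, 4.0],
    [4.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [3.0, 0.0, 1.0],
]
labels = [0, 1, 2, 1, 2, 0]

t_fit = fit_temperature(logit_rows, labels)
print(f"fitted temperature: {t_fit:.2f}")
print(f"NLL before: {nll(logit_rows, labels, 1.0):.3f}, "
      f"after: {nll(logit_rows, labels, t_fit):.3f}")
```

      Because every logit in a row is divided by the same positive T, the ordering of classes within each prediction is unchanged, so the top-ranked class is unaffected; only the confidence attached to it moves.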

      (3) It is difficult to assess how well their calibration method works in general because their original models are already well calibrated, so their calibration method finds temperatures very close to 1 and only produces very small and hard to assess changes in calibration metrics. This makes it very hard to distinguish if the calibration method works, as it doesn't really produce any changes. It would be helpful to demonstrate the calibration method on a model that requires calibration or on a dataset for which the current model is not well calibrated, so that the impact of the calibration method could be observed.

      It’s true that the models we calibrated didn’t need many changes. It is possible that the calibration methods we used (which were not ours, but which were described in earlier publications) can’t improve the models much. We toned down our comments about this procedure, as follows.

      Original:

      “Collectively, these results demonstrate that OSAIs were already well-calibrated, and this consistency across species underscores the robustness of OpenSpliceAI’s training approach in diverse genomic contexts.”

      Revised:

      “We observed very small changes after calibration across phylogenetically diverse species, suggesting that OpenSpliceAI’s training regimen yielded well‐calibrated models, although it is possible that a different calibration algorithm might produce further improvements in performance.”

      Reviewer #2 (Public review):

      Summary:

      The paper by Chao et al offers a reimplementation of the SpliceAI algorithm in PyTorch so that the model can more easily/efficiently be retrained. They apply their new implementation of the SpliceAI algorithm, which they call OpenSpliceAI, to several species and compare it against the original model, showing that the results are very similar and that in some small species, pretraining on other species helps improve performance.

      Strengths:

      On the upside, the code runs fine, and it is well documented.

      Weaknesses:

      The paper itself does not offer much beyond reimplementing SpliceAI. There is no new algorithm, new analysis, new data, or new insights into RNA splicing. There is no comparison to many of the alternative methods that have since been published to surpass SpliceAI. Given that some of the authors are well-known with a long history of important contributions, our expectations were admittedly different. Still, we hope some readers will find the new implementation useful.

      We thank the reviewer for the feedback. We have clarified that OpenSpliceAI is an open-source PyTorch reimplementation optimized for efficient retraining and transfer learning, designed to analyze cross-species performance gains, and supported by a thorough benchmark and the release of several pretrained models to clearly position our contribution.

      Reviewer #3 (Public review):

      Summary:

      The authors present OpenSpliceAI, a PyTorch-based reimplementation of the well-known SpliceAI deep learning model for splicing prediction. The core architecture remains unchanged, but the reimplementation demonstrates convincing improvements in usability, runtime performance, and potential for cross-species application.

      Strengths:

      The improvements are well-supported by comparative benchmarks, and the work is valuable given its strong potential to broaden the adoption of splicing prediction tools across computational and experimental biology communities.

      Major comments:

      Can fine-tuning also be used to improve prediction for human splicing? Specifically, are models trained on other species and then fine-tuned with human data able to perform better on human splicing prediction? This would enhance the model's utility for more users, and ideally, such fine-tuned models should be made available.

      We evaluated transfer learning by fine-tuning models pretrained on mouse (OSAI<sub>Mouse</sub>), honeybee (OSAI<sub>Honeybee</sub>), Arabidopsis (OSAI<sub>Arabidopsis</sub>), and zebrafish (OSAI<sub>Zebrafish</sub>) on human data. While transfer learning accelerated convergence compared to training from scratch, the final human splicing prediction accuracy was comparable between fine-tuned and scratch-trained models, suggesting that performance on our current human dataset is nearing saturation under this architecture.

      We added the following paragraph to the Discussion section:

      We also evaluated pretraining on mouse (OSAI<sub>Mouse</sub>), honeybee (OSAI<sub>Honeybee</sub>), zebrafish (OSAI<sub>Zebrafish</sub>), and Arabidopsis (OSAI<sub>Arabidopsis</sub>) followed by fine-tuning on the human MANE dataset. While cross-species pretraining substantially accelerated convergence during fine-tuning, the final human splicing-prediction accuracy was comparable to that of a model trained from scratch on human data. This result indicates that our architecture seems to capture all relevant splicing features from human training data alone, and thus gains little or no benefit from cross-species transfer learning in this context (see Figure S24).

      Reviewer #1 (Recommendations for the authors):

      We thank the editor for summarizing the points raised by each reviewer. Below is our point-by-point response to each comment:

      (1) In Figure 3 (and generally in the other figures) OpenSpliceAI should be replaced with OSAI_{Training dataset} because otherwise it is hard to tell which precise model is being compared. And in Figure 3 it is especially important to emphasize that you are comparing a SpliceAI model trained on Human data to an OSAI model trained and evaluated on a different species.

      We have updated the labels in Figure 3, replacing “OpenSpliceAI” with “OSAI_{training dataset}” to more clearly specify which model is being compared.

      (2) Are genes paralogous to training set genes removed from the validation set as well as the test set? If you are worried about data leakage in the test set, it makes sense to also consider validation set leakage.

      Thank you for this helpful suggestion. We fully agree, and to avoid any data leakage we implemented the identical filtering pipeline for both validation and test sets: we excluded all sequences paralogous or homologous to sequences in the training set, and further removed any sequence sharing > 80 % length overlap and > 80 % sequence identity with training sequences. The effect of this filtering on the validation set is summarized in Supplementary Figure S7C.

      Figure S7. (C) Scatter plots of DNA sequence alignments between validation and training sets for Human-MANE, mouse, honeybee, zebrafish, and Arabidopsis. Each dot represents an alignment, with the x-axis showing alignment identity and the y-axis showing alignment coverage. Alignments exceeding 80% for both identity and coverage are highlighted in the red-shaded region and were excluded from the test sets.
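
      The threshold step of this filter can be sketched as below. It is an illustration only: the `Alignment` record, the function names, and the sequence IDs are hypothetical, alignments against the training set are assumed to have been computed beforehand by an external aligner, and the separate removal of annotated paralogs is not shown.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """One alignment of a held-out sequence against a training sequence."""
    query_id: str    # validation or test sequence ID (hypothetical names)
    identity: float  # percent identity, 0-100
    coverage: float  # percent of the query covered by the alignment, 0-100


def leaking_ids(alignments, min_identity=80.0, min_coverage=80.0):
    """IDs with any alignment exceeding BOTH thresholds."""
    return {
        a.query_id
        for a in alignments
        if a.identity > min_identity and a.coverage > min_coverage
    }


def filter_sequences(seq_ids, alignments, min_identity=80.0, min_coverage=80.0):
    """Drop held-out sequences that are too similar to the training set."""
    drop = leaking_ids(alignments, min_identity, min_coverage)
    return [s for s in seq_ids if s not in drop]


alignments = [
    Alignment("val_gene_1", identity=95.0, coverage=90.0),  # exceeds both: removed
    Alignment("val_gene_2", identity=95.0, coverage=40.0),  # low coverage: kept
    Alignment("val_gene_3", identity=70.0, coverage=95.0),  # low identity: kept
    Alignment("val_gene_5", identity=80.0, coverage=80.0),  # not above 80: kept
]
seq_ids = ["val_gene_1", "val_gene_2", "val_gene_3", "val_gene_4", "val_gene_5"]

kept = filter_sequences(seq_ids, alignments)
print(kept)
```

      Applying the same function to both the validation and test ID lists mirrors the statement above that an identical filter was used for the two sets.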

      Reviewer #3 (Recommendations for the authors):

      (1) The legend in Figure 3 is somewhat confusing. The labels like "SpliceAI-Keras (species name)" may imply that the model was retrained using data from that species, but that's not the case, correct?

      Yes, “SpliceAI-Keras (species name)” was not retrained; it refers to the released SpliceAI model evaluated on the specified species dataset. We have revised the Figure 3 legends, changing “SpliceAI-Keras (species name)” to “SpliceAI-Keras” to clarify this.

      (2) Please address the minor issues with the code, including ensuring the conda install works across various systems.

      We have addressed the issues you mentioned. OpenSpliceAI is now available on Conda and can be installed with:  conda install openspliceai. 

      The conda package homepage is at: https://anaconda.org/khchao/openspliceai

      We’ve also corrected all broken links in the documentation.

      (3) Utility:

      I followed all the steps in the Quick Start Guide, and aside from the issues mentioned below, everything worked as expected.

      I attempted installation using conda as described in the instructions, but it was unsuccessful. I assume this method is not yet supported.

      In Quick Start Guide: predict, the link labeled "GitHub (models/spliceai-mane/10000nt/)" appears to be incorrect. The correct path is likely "GitHub (models/openspliceai-mane/10000nt/)".

      In Quick Start Guide: variant (https://ccb.jhu.edu/openspliceai/content/quick_start_guide/quickstart_variant.html#quick-startvariant), some of the download links for input files were broken. While I was able to find some files in the GitHub repository, I think the -A option should point to data/grch37.txt, not examples/data/input.vcf, and the -I option should be examples/data/input.vcf, not data/vcf/input.vcf.

      Thank you for catching these issues. We’ve now addressed all issues concerning Conda installation and file links. We thank the editor for thoroughly testing our code and reviewing the documentation.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This fundamental work employed multidisciplinary approaches and conducted rigorous experiments to study how a specific subset of neurons in the dorsal striatum (i.e., "patchy" striatal neurons) modulates locomotion speed depending on the valence of the naturalistic context.

      Strengths:

      The scientific findings are novel and original and significantly advance our understanding of how the striatal circuit regulates spontaneous movement in various contexts.

      We appreciate the reviewer’s positive evaluation.

      Weaknesses:

      This is extensive research involving various circuit manipulation approaches. Some of these circuit manipulations are not physiological. A balanced discussion of the technical strengths and limitations of the present work would be helpful and beneficial to the field. Minor issues in data presentation were also noted.

      We have incorporated the recommended discussion of technical limitations and addressed the physiological plausibility of our manipulations on Page 33 of the revised Discussion section. Specifically, we wrote:

      “Judicious interpretation of the present data must consider the technical limitations of the various methods and circuit-level manipulations applied. Patchy neurons are distributed unevenly across the extensive structure of the striatum, and their targeted manipulation is constrained by viral spread in the dorsal striatum. Somatic calcium imaging using single-photon microscopy captures activity from only a subset of patchy neurons within a narrow focal plane beneath each implanted GRIN lens. Similarly, limitations in light diffusion from optical fibers may reduce the effective population of targeted fibers in both photometry and optogenetic experiments. For example, the more modest locomotor slowing observed with optogenetic activation of striatonigral fibers in the SNr compared to the stronger effects seen with Gq-DREADD activation across the dorsal striatum could reflect limited fiber optic coverage in the SNr. Alternatively, it may suggest that non-striatonigral mechanisms also contribute to generalized slowing. Our photometry data does not support a role for striatopallidal projections from patchy neurons in movement suppression. The potential contribution of intrastriatal mechanisms, discussed earlier, remains to be empirically tested. Although the behavioral assays used were naturalistic, many of the circuit-level interventions were not. Broad ablation or widespread activation of patchy neurons and their efferent projections represent non-physiological manipulations. Nonetheless, these perturbation results are interpreted alongside more naturalistic observations, such as in vivo imaging of patchy neuron somata and axon terminals, to form a coherent understanding of their functional role”.

      Reviewer #2 (Public review):

      Hawes et al. investigated the role of striatal neurons in the patch compartment of the dorsal striatum. Using the Sepw1-Cre line, the authors combined a modified version of the light/dark transition box test, which allows them to examine locomotor activity under different environmental valences, with a variety of approaches, including cell-type-specific ablation, miniscope calcium imaging, fiber photometry, and opto-/chemogenetics. First, they found that ablation of patchy striatal neurons resulted in an increase in movement vigor when mice stayed in a safe area or when they moved back from more anxiogenic to safe environments. The following miniscope imaging experiment revealed that a larger fraction of striatal patchy neurons was negatively correlated with movement speed, particularly in an anxiogenic area. Next, the authors investigated differential activity patterns of patchy neurons' axon terminals, focusing on those in GPe, GPi, and SNr, showing that the patchy axons in SNr reflect movement speed/vigor. Chemogenetic and optogenetic activation of these patchy striatal neurons suppressed the locomotor vigor, thus demonstrating their causal role in the modulation of locomotor vigor when exposed to valence differentials. Unlike the activation of striatal patches, such a suppressive effect on locomotion was absent when optogenetically activating matrix neurons by using the Calb1-Cre line, indicating distinctive roles in the control of locomotor vigor by striatal patch and matrix neurons. Together, they have concluded that nigrostriatal neurons within striatal patches negatively regulate movement vigor, dependent on behavioral contexts where motivational valence differs.

      We are grateful for the reviewer’s thorough summary of our main findings.

      In my view, this study will add to the important literature by demonstrating how patch (striosomal) neurons in the striatum control movement vigor. This study has applied multiple approaches to investigate their functionality in locomotor behavior, and the obtained data largely support their conclusions. Nevertheless, I have some suggestions for improvements in the manuscript and figures regarding their data interpretation, accuracy, and efficacy of data presentation.

      We appreciate the reviewer’s overall positive assessment and have made substantial improvements to the revised manuscript in response to reviewers’ constructive suggestions. 

      (1) The authors found that the activation of the striatonigral pathway in the patch compartment suppresses locomotor speed, which contradicts the canonical role of the direct pathway. It would be great if the authors could provide mechanistic explanations in the Discussion section. One possibility is that striatal D1R patch neurons directly inhibit dopaminergic cells that regulate movement vigor (Nadel et al., Sci. Rep., 2021; Okunomiya et al., J Neurosci., 2025). Providing plausible explanations will help readers infer possible physiological processes and give them ideas for future follow-up studies.

      We have added the recommended data interpretation and future perspectives on Page 30 of the revised Discussion section. Specifically, we wrote:

      “Potential mechanisms by which striatal patchy neurons reduce locomotion involve the suppression of dopamine availability within the striatum. Dopamine, primarily supplied by neurons in the SNc and VTA, broadly facilitates locomotion (Gerfen and Surmeier 2011, Dudman and Krakauer 2016). Recent studies have shown that direct activation of patchy neurons leads to a reduction in striatal dopamine levels, accompanied by decreased walking speed (Nadel, Pawelko et al. 2021, Dong, Wang et al. 2025, Okunomiya, Watanabe et al. 2025). Patchy neuron projections terminate in structures known as “dendron bouquets”, which enwrap SNc dendrites within the SNr and can pause tonic dopamine neuron firing (Crittenden, Tillberg et al. 2016, Evans, Twedell et al. 2020). The present work highlights a role for patchy striatonigral inputs within the SN in decelerating movement, potentially through GABAergic dendron bouquets that limit dopamine release back to the striatum (Dong, Wang et al. 2025). Additionally, intrastriatal collaterals of patch spiny projection neurons (SPNs) have been shown to suppress dopamine release and associated synaptic plasticity via dynorphin-mediated activation of kappa opioid receptors on dopamine terminals (Hawes, Salinas et al. 2017). This intrastriatal mechanism may further contribute to the reduction in striatal dopamine levels and the observed decrease in locomotor speed, representing a compelling avenue for future investigation.”

      (2) On page 14, Line 301, the authors stated that "Cre-dependent mCherry signals were colocalized with the patch marker (MOR1) in the dorsal striatum (Fig. 1B)". But I could not find any mCherry on that panel, so please modify it.

      We have included representative images of mCherry and MOR1 staining in Supplementary Fig. S1 of the revised manuscript.

      (3) From data shown in Figure 1, I've got the impression that mice ablated with striatal patch neurons were generally hyperactive, but this is probably not the case, as two separate experiments using LLbox and DDbox showed no difference in locomotor vigor between control and ablated mice. For the sake of better interpretation, it may be good to add a statement in Lines 365-366 that these experiments suggest the absence of hyperactive locomotion in general by ablating these specific neurons.

      As suggested by the reviewer, we have added the following statement on Page 17 of the revised manuscript: “These data also indicate that PA elevates valence-specific speed without inducing general hyperactivity”.

      (4) In Line 536, where Figure 5A was cited, the authors mentioned that they used inhibitory DREADDs (AAV-DIO-hM4Di-mCherry), but I could not find the associated data in Figure 5. Please cite Figure S3 accordingly.

      We have added the citation for what is now Fig. S4 on Page 25 of the revised manuscript.

      (5) Personally, the Figure panel labels of "Hi" and "ii" were confusing at first glance. It would be better to have alternatives.

      As suggested by the reviewer, we have now labeled each figure panel with a distinct single alphabetical letter.

      (6) There is a typo on Figure 4A: tdTomata → tdTomato

      We have made the correction on the figure.

      Reviewer #3 (Public review):

      Hawes et al. combined behavioral, optical imaging, and activity manipulation techniques to investigate the role of striatal patch SPNs in locomotion regulation. Using Sepw1-Cre transgenic mice, they found that patch SPNs encode locomotion deceleration in a light-dark box procedure through optical imaging techniques. Moreover, genetic ablation of patch SPNs increased locomotion speed, while chemogenetic activation of these neurons decreased it. The authors concluded that a subtype of patch striatonigral neurons modulates locomotion speed based on external environmental cues. Below are some major concerns:

      The study concludes that patch striatonigral neurons regulate locomotion speed. However, unless I missed something, very little evidence is presented to support the idea that it is specifically striatonigral neurons, rather than striatopallidal neurons, that mediate these effects. In fact, the optogenetic experiments shown in Fig. 6 suggest otherwise. What about the behavioral effects of optogenetic stimulation of striatonigral versus striatopallidal neuron somas in Sepw1-Cre mice?

      Our photometry data implicate striatonigral neurons in locomotor slowing, as evidenced by a negative cross-correlation with acceleration and a negative lag, indicating that their activity reliably precedes—and may therefore contribute to—deceleration. In contrast, photometry results from striatopallidal neurons showed no clear correlation with speed or acceleration.

      Figure 6 demonstrates that optogenetic manipulation within the SNr of Sepw1-Cre<sup>+</sup> striatonigral axons recapitulated context-dependent locomotor changes seen with Gq-DREADD activation of both striatonigral and striatopallidal Sepw1-Cre<sup>+</sup> cells in the dorsal striatum but failed to produce the broader locomotor speed change observed when targeting all Sepw1-Cre<sup>+</sup> cells in the dorsal striatum using either ablation or Gq-DREADD activation. The more subtle speed-restrictive phenotype resulting from ChR activation in the SNr could, as the reviewer suggests, implicate striatopallidal neurons in broad locomotor speed regulation. However, our photometry data indicate that this scenario is unlikely, as activity of striatopallidal Sepw1-Cre<sup>+</sup> fibers is not correlated with locomotor speed. Another plausible explanation is that the optogenetic approach may have affected fewer striatonigral fibers, potentially due to the limited spatial spread of light from the optical fiber within the SNr. Broad locomotor speed change in LDbox might require the recruitment of a larger number of striatonigral fibers than we were able to manipulate with optogenetics. We have added discussion of these technical limitations to the revised manuscript. Additionally, we now discuss the possibility that intrastriatal collaterals may contribute to reduced local dopamine levels by releasing dynorphin, which acts on kappa opioid receptors located on dopamine fibers (Hawes, Salinas et al. 2017), thereby suppressing dopamine release.

      The reviewer also suggests an interesting experiment involving optogenetic stimulation of striatonigral versus striatopallidal somata in Sepw1-Cre mice. While we agree that this approach would yield valuable insights, we have thus far been unable to achieve reliable results using retroviral vectors. Moreover, selectively targeting striatopallidal terminals optogenetically remains technically challenging, as striatonigral fibers also traverse the pallidum, and the broad anatomical distribution of the pallidum complicates precise targeting. This proposed work will need to be pursued in a future study, either with improved retrograde viral tools or the development of additional mouse lines that offer more selective access to these neuronal populations as we documented recently (Dong, Wang et al. 2025).

      In the abstract, the authors state that patch SPNs control speed without affecting valence. This claim seems to lack sufficient data to support it. Additionally, speed, velocity, and acceleration are very distinct qualities. It is necessary to clarify precisely what patch neurons encode and control in the current study.

      We believe the reviewer’s interpretation pertains to a statement in the Introduction rather than the Abstract: “Our findings reveal that patchy SPNs control the speed at which mice navigate the valence differential between high- and low-anxiety zones, without affecting valence perception itself.” Throughout our study, mice consistently preferred the dark zone in the Light/Dark box, indicating intact perception of the valence differential between illuminated areas. While our manipulations altered locomotor speed, they did not affect time spent in the dark zone, supporting the conclusion that valence perception remained unaltered. We appreciate the reviewer’s insight and agree it is an intriguing possibility that locomotor responses could, over time, influence internal states such as anxiety. We addressed this in the Discussion, noting that while dark preference was robust to our manipulations, future studies are warranted to explore the relationship between anxious locomotor vigor and anxiety itself.

      We report changes in scalar measures of animal speed across Light/Dark box conditions and under various experimental manipulations. Separately, we show that activity in both patchy neuron somata and striatonigral fibers is negatively correlated with acceleration, indicating a positive correlation with deceleration. Notably, the direction of the cross-correlational lag between striatonigral fiber activity and acceleration suggests that this activity precedes and may causally contribute to mouse deceleration, thereby influencing reductions in speed. To clarify this, we revised a sentence in the Results section: “Moreover, patchy neuron efferent activity at the SNr may causally contribute to deceleration, as indicated by the negative cross-correlational lag, thereby reducing animal speed.” We also updated the Discussion to read: “Together, these data specifically implicate patchy striatonigral neurons in slowing locomotion by acting within the SNr to drive deceleration.”
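
      The lag argument above can be illustrated with a minimal sketch on synthetic data. This is not the analysis pipeline used in the study. The function name `peak_cross_correlation_lag`, the 20 Hz sampling rate, the 0.5 s delay, and the noise level are all hypothetical choices made for illustration. The sketch assumes the sign convention in which a negative lag means the neural signal leads the behavioral signal.

```python
import numpy as np


def peak_cross_correlation_lag(neural, behavior, fs):
    """Return (lag_seconds, r) at the largest-magnitude normalized cross-correlation.

    Assumed convention: a negative lag means `neural` leads `behavior`.
    """
    neural = np.asarray(neural, dtype=float)
    behavior = np.asarray(behavior, dtype=float)
    n = len(neural)
    # z-score both traces so the cross-correlation is roughly in [-1, 1]
    x = (neural - neural.mean()) / neural.std()
    y = (behavior - behavior.mean()) / behavior.std()
    # In "full" mode, output index i holds sum_n x[n + k] * y[n] with k = i - (n - 1)
    xcorr = np.correlate(x, y, mode="full") / n
    lags = np.arange(-(n - 1), n)
    i = int(np.argmax(np.abs(xcorr)))
    return lags[i] / fs, xcorr[i]


# Synthetic example (hypothetical values): acceleration is an inverted,
# noisy copy of the neural trace delayed by 10 samples (0.5 s at 20 Hz).
rng = np.random.default_rng(0)
fs = 20.0
delay_samples = 10
neural = rng.standard_normal(2000)
accel = -np.roll(neural, delay_samples) + 0.5 * rng.standard_normal(2000)

lag_s, r = peak_cross_correlation_lag(neural, accel, fs)
print(f"peak lag = {lag_s:.2f} s, r = {r:.2f}")
```

      In this toy case the peak correlation is negative and sits at a negative lag, which is the pattern described for striatonigral fiber activity and acceleration. As in the response above, a leading signal is compatible with, but does not by itself establish, a causal contribution.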

      One of the major results relies on chemogenetic manipulation (Figure 5). It would be helpful to demonstrate through slice electrophysiology that hM3Dq and hM4Di indeed cause changes in the activity of dorsal striatal SPNs, as intended by the DREADD system. This would support both the positive (Gq) and negative (Gi) findings, where no effects on behavior were observed.

      We were unable to perform this experiment; however, hM3Dq has previously been shown to be effective in striatal neurons (Alcacer, Andreoli et al. 2017). The lack of effect observed in Gi-DREADD mice serves as an unintended but valuable control, helping to rule out off-target effects of the DREADD agonist JHU37160 and thereby reinforcing the specificity of hM3Dq-mediated activation in our study. We have now included an important caveat regarding the Gi-DREADD results, acknowledging the possibility that they may not have worked effectively in our target cells: “Potential explanations for the negative results in Gi-DREADD mice include inherently low basal activity among patchy neurons or insufficient expression of GIRK channels in striatal neurons, which may limit the effectiveness of Gi-coupling in suppressing neuronal activity (Shan, Fang et al. 2022).”

      Finally, could the behavioral effects observed in the current study, resulting from various manipulations of patch SPNs, be due to alterations in nigrostriatal dopamine release within the dorsal striatum?

      We agree that this is an important potential implication of our work, especially given that we and others have shown that patchy striatonigral neurons provide strong inhibitory input to dopaminergic neurons involved in locomotor control (Nadel, Pawelko et al. 2021, Lazaridis, Crittenden et al. 2024, Dong, Wang et al. 2025, Okunomiya, Watanabe et al. 2025). Accordingly, we have expanded the discussion section to include potential mechanistic explanations that support and contextualize our main findings.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Here are some minor issues for the authors' reference:

      (1) This work supports the motor-suppressing effect of patchy SPNs, and >80% of them are direct pathway SPNs. This conclusion is not expected from the traditional basal ganglia direct/indirect pathway model. Most experiments were performed using nonphysiological approaches to suppress (i.e., ablation) or activate (i.e., continuous chemo-optogenetic stimulation). It remains uncertain if the reported observations are relevant to the normal biological function of patchy SPNs under physiological conditions. Particularly, under what circumstances an imbalanced patch/matrix activity may be induced, as proposed in the sections related to the data presented in Figure 6. A thorough discussion and clarification remain needed. Or it should be discussed as a limitation of the present work.

      We have added discussion and clarification of physiological limitations in response to reviewer feedback. Additionally, we revised the opening sentence of an original paragraph in the discussion section to emphasize that it interprets our findings in the context of more physiological studies reporting natural shifts in patchy SPN activity due to cognitive conflict, stress, or training. The revised opening sentence now reads: “Together with previous studies of naturally occurring shifts in patchy neuron activation, these data illustrate ethologically relevant roles for a subgroup of genetically defined patchy neurons in behavior.”

      (2) Lines 499-500: How striatonigral cells encode speed and deceleration deserves a thorough discussion and clarification. These striatonigral cells can target both SNr GABAergic neurons and the dendrites of dopaminergic neurons. A discussion of the microcircuits formed by patchy SPN axons with SNr GABAergic and SNc DAergic neurons should be presented.

      We have added this point at lines 499–500, including a reference to a relevant review of microcircuitry. Additionally, we expanded the discussion section to address microcircuit mechanisms that may underlie our main findings.

      (3) Line 70: "BNST" should be spelled out at the first time it is mentioned.

      This has been done.

      (4) Line 133: only GCaMP6 was listed in the method, but GCaMP8 was also used (Figure 4). Clarification or details are needed.

      Thank you for your careful attention to detail. We have corrected the typographical errors in the Methods section. Specifically, in the Stereotaxic Injections section, we corrected “GCaMP83” to “GCaMP8s.” In the Fiber Implant section, we removed the incorrect reference to “GCaMP6s” and clarified that GCaMP8s was used for photometry, and hChR2 was used for optogenetics.

      (5) Line 183: Can the authors describe more precisely what "a moment" means in terms of seconds or minutes?

      This has been done.

      (6) Line 288: typo: missing / in ΔF.

      Thank you; this has been fixed.

      (7) Line 301-302: the statement of "mCherry and MOR1 colocalization" does not match the images in Figure 1B.

      This has been corrected by providing a new Supplementary Figure S1.

      (8) Related to the statement between Lines 303-304: Figure 1C data may reflect changes in MOR1 protein or cell loss. Quantification of NeuN+ neurons within the MOR1 area would strengthen the conclusion of 60% patchy cell loss in Figure 1C.

      Since the efficacy of AAV-FLEX-taCasp3 in cell ablation has been well established in our previous publications and those of others (Yang, Chiang et al. 2013, Wu, Kung et al. 2019), we do not believe the observed loss of MOR1 staining in Fig. 1C merely reflects reduced MOR1 expression. Moreover, a general neuronal marker such as NeuN may not reliably detect the specific loss of patchy neurons in our ablation model, given the technical limitations of conventional cell-counting methods like MBF’s StereoInvestigator, which typically exhibit a variability margin of 15–20%.

      (9) Lines 313-314: "Similarly, PA mice demonstrated greater stay-time in the dark zone (Figure 1E)." Revision is needed to better reflect what is shown in Figure 1E and avoid misunderstandings.

      Thank you; this has been addressed.

      (10) The color code in Figure 2Gi seems inconsistent with the others? Clarifications are needed.

      Color coding in Figure 2Gi differs from that in 2Eii out of necessity. For example, the "Light" cells depicted in light blue in 2Eii are represented by both light gray and light red dots in 2Gi. Importantly, Figure 2G does not encode specific speed relationships; instead, any association with speed is indicated by a red hue.

      (11) Lines 538-539: the statement of "Over half of the patch was covered" was not supported by Figure 5C. Clarification is needed.

      Thank you. For clarity, we updated the x-axis labels in Figures 1C and 5C from “% area covered” to “% DS area covered,” and defined “DS” as “dorsal striatal” in the corresponding figure legends. Additionally, we revised the sentence in question to read: “As with ablation, histological examination indicated that a substantial fraction of dorsal patch territories, identified through MOR1 staining, were impacted (Fig. 5C).”

      (12) Figure 3: statistical significance in Figure 3 should be labeled in various panels.

      We believe the reviewer's concern pertains to the scatter plot in panel F—specifically, whether the data points are significantly different from zero. In panel 3F, the 95% confidence interval clearly overlaps with zero, indicating that the results are not statistically significant.

      (13) Figures 6D-E: the absence of a difference in speed between control mice and ChR2 mice under continuous optical stimulation was not expected. It differs from the Gq-DREADD study in Figure 5E-F. Clarifications are needed.

      For mice undergoing constant ChR2 activation of Sepw1-Cre<sup>+</sup> SNr efferents, overall locomotor speed does not differ from controls. However, the BIL (bright-to-illuminated) effect on zone transitions is disrupted: activating Sepw1-Cre<sup>+</sup> fibers in the SNr blunts the typical increase in speed observed when mice flee from the light zone toward the dark zone. This impaired BIL-related speed increase upon exiting the light was similarly observed in the Gq-DREADD cohort. The reviewer is correct that this optogenetic manipulation within the SNr did not produce the more generalized speed reductions seen with broader Gq-DREADD activation of all Sepw1-Cre<sup>+</sup> cells in the dorsal striatum. A likely explanation is the difference in targeting—ChR2 specifically activates SNr-bound terminals, whereas Gq-DREADD broadly activates entire Sepw1-Cre<sup>+</sup> cells. Notably, many of the generalized speed profile changes observed with chemogenetic activation are opposite to those resulting from broad ablation of Sepw1-Cre<sup>+</sup> cells.

      The more subtle speed-restrictive phenotype observed with ChR2 activation targeted to the SNr may suggest that fewer striatonigral fibers were affected by this technique, possibly due to the limited spread of light from the fiber optic. Broad locomotor speed change in LDbox might require the recruitment of a larger number of striatonigral fibers than we were able to manipulate with an optogenetic approach. Alternatively, it could indicate that non-striatonigral Sepw1-Cre<sup>+</sup> projections, such as striatopallidal or intrastriatal pathways, play a role in more generalized slowing. If striatopallidal fibers contributed to locomotor slowing, we would expect to see non-zero cross-correlations between neural activity and speed or acceleration, along with negative lag indicating that neural activity precedes the behavioral change. However, our fiber photometry data do not support such a role for Sepw1-Cre<sup>+</sup> striatopallidal fibers.

      We have also referenced the possibility that intrastriatal collaterals could suppress striatal dopamine levels, potentially explaining the stronger slowing phenotype observed when the entire striatal population is affected, as opposed to selectively targeting striatonigral terminals.

      These technical considerations and interpretive nuances have been incorporated and clarified in the revised discussion section.

      (14) Line 632: "compliment": a typo?

      Yes, it should be “complement”.

      (15) Figure 4 legend: descriptions of panels A and B were swapped.

      Thank you. This has been corrected.

      (16) Friedman (2020) was listed twice in the bibliography (Lines 920-929).

      Thank you. This has been corrected.

      Reviewer #3 (Recommendations for the authors):

      It will be helpful to label and add figure legends below each figure.

      Thank you for the suggestion.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript. We noted some instances where only p values are reported.

      Readers would also benefit from coding individual data points by sex and noting N/sex.

      We have included detailed statistical information in the revised manuscript. Both male and female mice were used in all experiments in approximately equal numbers. Since no sex-related differences were observed, we did not report the number of animals by sex.

      References

      Alcacer, C., L. Andreoli, I. Sebastianutto, J. Jakobsson, T. Fieblinger and M. A. Cenci (2017). "Chemogenetic stimulation of striatal projection neurons modulates responses to Parkinson's disease therapy." J Clin Invest 127(2): 720-734.

      Crittenden, J. R., P. W. Tillberg, M. H. Riad, Y. Shima, C. R. Gerfen, J. Curry, D. E. Housman, S. B. Nelson, E. S. Boyden and A. M. Graybiel (2016). "Striosome-dendron bouquets highlight a unique striatonigral circuit targeting dopamine-containing neurons." Proc Natl Acad Sci U S A 113(40): 11318-11323.

      Dong, J., L. Wang, B. T. Sullivan, L. Sun, V. M. Martinez Smith, L. Chang, J. Ding, W. Le, C. R. Gerfen and H. Cai (2025). "Molecularly distinct striatonigral neuron subtypes differentially regulate locomotion." Nat Commun 16(1): 2710.

      Dudman, J. T. and J. W. Krakauer (2016). "The basal ganglia: from motor commands to the control of vigor." Curr Opin Neurobiol 37: 158-166.

      Evans, R. C., E. L. Twedell, M. Zhu, J. Ascencio, R. Zhang and Z. M. Khaliq (2020). "Functional Dissection of Basal Ganglia Inhibitory Inputs onto Substantia Nigra Dopaminergic Neurons." Cell Rep 32(11): 108156.

      Gerfen, C. R. and D. J. Surmeier (2011). "Modulation of striatal projection systems by dopamine." Annu Rev Neurosci 34: 441-466.

      Hawes, S. L., A. G. Salinas, D. M. Lovinger and K. T. Blackwell (2017). "Long-term plasticity of corticostriatal synapses is modulated by pathway-specific co-release of opioids through kappa-opioid receptors." J Physiol 595(16): 5637-5652.

      Lazaridis, I., J. R. Crittenden, G. Ahn, K. Hirokane, T. Yoshida, A. Mahar, V. Skara, K. Meletis, K. Parvataneni, J. T. Ting, E. Hueske, A. Matsushima and A. M. Graybiel (2024). "Striosomes Target Nigral Dopamine-Containing Neurons via Direct-D1 and Indirect-D2 Pathways Paralleling Classic Direct-Indirect Basal Ganglia Systems." bioRxiv.

      Nadel, J. A., S. S. Pawelko, J. R. Scott, R. McLaughlin, M. Fox, M. Ghanem, R. van der Merwe, N. G. Hollon, E. S. Ramsson and C. D. Howard (2021). "Optogenetic stimulation of striatal patches modifies habit formation and inhibits dopamine release." Sci Rep 11(1): 19847.

      Okunomiya, T., D. Watanabe, H. Banno, T. Kondo, K. Imamura, R. Takahashi and H. Inoue (2025). "Striosome Circuitry Stimulation Inhibits Striatal Dopamine Release and Locomotion." J Neurosci 45(4).

      Shan, Q., Q. Fang and Y. Tian (2022). "Evidence that GIRK Channels Mediate the DREADD-hM4Di Receptor Activation-Induced Reduction in Membrane Excitability of Striatal Medium Spiny Neurons." ACS Chem Neurosci 13(14): 2084-2091.

      Wu, J., J. Kung, J. Dong, L. Chang, C. Xie, A. Habib, S. Hawes, N. Yang, V. Chen, Z. Liu, R. Evans, B. Liang, L. Sun, J. Ding, J. Yu, S. Saez-Atienzar, B. Tang, Z. Khaliq, D. T. Lin, W. Le and H. Cai (2019). "Distinct Connectivity and Functionality of Aldehyde Dehydrogenase 1a1-Positive Nigrostriatal Dopaminergic Neurons in Motor Learning." Cell Rep 28(5): 1167-1181 e1167.

      Yang, C. F., M. C. Chiang, D. C. Gray, M. Prabhakaran, M. Alvarado, S. A. Juntti, E. K. Unger, J. A. Wells and N. M. Shah (2013). "Sexually dimorphic neurons in the ventromedial hypothalamus govern mating in both sexes and aggression in males." Cell 153(4): 896-909.

    1. Author response:

      Reviewer #1 (Public Review):

      The study would benefit from clearer evidence and additional experiments that would help to establish the molecular and cellular mechanisms underlying the brain phenotype, the central topic of the work.

      We agree that additional experiments are necessary to elucidate the mechanism(s) by which EML3 deficiency causes the observed developmental phenotypes. However, as no further experimentation is possible due to the closure of our laboratory, we are committed to sharing available materials—including custom antibodies and cryopreserved sperm from our mouse lines. We will include previously generated experimental data not presented in the original submission. While these additional data do not reveal the mechanisms, we believe that sharing hypotheses that were experimentally ruled out will benefit the scientific community.

      Reviewer #2 (Public Review):

      While the manuscript presents valuable data, there are also several weaknesses that limit the overall impact of the study. Most notably, there is no clear mechanistic link established between the loss of Eml3 function and the observed phenotype, leaving the biological significance of the findings somewhat speculative, as it is not straightforward how a microtubule-associated protein can have an impact on the stability of the pial basement membrane. In this respect, but also in general for the whole manuscript, there seems to be a considerable amount of experimental work that has been conducted but is not presented, possibly due to the negative nature of the results. At least some of those results could be shown, particularly (but not only) the stainings for the composition of the ECM components.

      We agree that additional experiments are necessary to elucidate the mechanisms at play. While we cannot conduct further experiments, we will include additional existing data, including supplemental ECM component staining, in a new figure or panel. As this reviewer rightly anticipated, these results might not clarify the mechanism but sharing the hypotheses that were already experimentally tested will be helpful.

      Additionally, the phenotype reported appears to be dependent on the genetic background, as it is absent in the CD1 strain. This observation raises concerns as to how robust the results are and how much they can be generalized to other mouse strains, but, more importantly, to humans.

      Indeed, we have determined that genetic background greatly influences the manifestation of developmental defects caused by absence or mutation of the EML3 protein in mice. Modifier genes appear to play a significant role in phenotypic expression. In humans, the presence or absence of such modifiers may result in a broad spectrum of outcomes—from no clinical relevance, as seen in CD1 mice, to potential intrauterine mortality. We agree that this underscores the challenge of translating mouse model findings to human implications. Future studies could include a search for EML3 non-coding regulatory mutations and expanded analysis of neuronal development defects, such as COB, as well as cases of intrauterine growth restriction (IUGR).

      There is no data included in the manuscript about the generation and analysis of the Eml3AAA/AAA mouse line. This is an important omission, especially as no details on the validation or phenotypic characterization of this additional mouse line are provided. Including these elements would greatly strengthen the rigor and interpretability of the work, especially if that mouse line is to be shared with the scientific community.

      We acknowledge this oversight and will add a Materials and Methods section describing the generation of Eml3 TQT86AAA mice as well as validation and phenotypic characterizations that were done for that mouse line.

      Reviewer #3 (Public Review):

      Besides the data provided in the figures, the authors report a significant amount of experiments/results as "Data not shown". Negative data is still important data to report, and the authors may want to choose some crucial "not shown data" to report in the manuscript.

      We will incorporate key datasets previously omitted, with priority given to those requested by Reviewer #2.

      Results in Figure 3A apparently contradict results in 3B. A better explanation of the results should improve understanding of the data. Even though the conclusion that the "onset and progression of neurogenesis is normal in Eml3 null mice" seems logical based on the data, the final numbers are not (Figure 3A) and this should be acknowledged, as well.

      We will provide further explanation of the data presented in Figures 3A and 3B to better convey that the two datasets do not contradict each other. In essence, since Eml3 null mice are developmentally delayed (as determined by the number of somites at a specific age, Fig. 1C), the milestones in neurogenesis are reached at a later age in Eml3 null mice (Fig. 3A). However, Eml3 null mice have reached the same neurogenesis milestones as their WT counterparts when they have the same number of somites (Fig. 3B).

      The authors should define which cell types are identified by SOX1 and PAX6.

      We will expand our manuscript to define the expression timing and cell identity marked by SOX1 and PAX6 in neural progenitors during cortical development.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary: 

      During early Drosophila pupal development, a subset of larval abdominal muscles (DIOMs) is remodelled using an autophagy-dependent mechanism. 

      To better understand this not very well studied process, the authors have generated a transcriptomics time course using dissected abdominal muscles of various stages from wild-type and autophagy-deficient mutants. The authors have further identified a function for BNIP3 in muscle mitophagy using this system. 

      Strengths: 

      (1) The paper does provide a detailed mRNA time course resource for DIOM remodeling. 

      (2) The paper does find an interesting BNIP3 loss of function phenotype, a block of mitophagy during muscle remodeling, and hence identifies a specific linker between mitochondria and the core autophagy machinery. This adds to the mechanism of how mitochondria are degraded. 

      (3) Sophisticated fly genetics demonstrates that the larval muscle mitochondria are, to a large extent, degraded by autophagy during DIOM remodeling. 

      Weaknesses: 

      (1) Mitophagy during DIOM remodeling is not novel (earlier papers from Fujita et al.). 

      (2) The transcriptomics time course data are not well connected to the autophagy part. Both could be separated into 2 independent manuscripts. 

      (3) The muscle phenotypes need better quantifications, both for the EM and light microscopy data in various figures. 

      (4) The transcriptomics data are hard to browse in the provided PDF format. 

      Thank you for reviewing our manuscript and for your feedback. While we understand and appreciate the suggestion to divide the manuscript into two separate studies, we believe that presenting the work as a single manuscript is more appropriate. This is because the time-course RNA-seq of DIOMs provides critical insight into BNIP3-mediated mitophagy during DIOM remodeling, which ties together the two components of our study. In response to Reviewer #1’s recommendations, we have quantified data from both EM and confocal images, and we have revised the RNA counts table in Supplementary File 1 accordingly. Please see our detailed responses and revisions on the following pages.

      Reviewer #2 (Public review): 

      Summary: 

      Autophagy (macroautophagy) is known to be essential for muscle function in flies and mammals. To date, many mitophagy (selective mitochondrial autophagy) receptors have been identified in mammals and other species. While the loss of mitophagy receptors has been shown to impair mitochondrial degradation (e.g., OPTN and NDP52 in Parkin-mediated mitophagy and NIX and BNIP3 in hypoxia-induced mitophagy) at the level of cultured cells, it remains unclear, especially under physiological conditions in vivo. In this study, the authors revealed that one of the receptors BNIP3 plays a critical role in mitochondrial degradation during muscle remodeling in vivo. 

      Overall, the manuscript provides solid evidence that BNIP3 is involved in mitophagy during muscle remodeling with in vivo analyses performed. In particular, all experiments in this study are well-designed. The text is well written and the figures are very clear. 

      Strengths: 

      (1) In each experiment, appropriate positive and negative controls are used to indicate what is responsible for the phenomenon observed by the authors: e.g. FIP200, Atg18, Stx17 siRNAs during DIOM remodeling in Figure 2 and Full, del-LIR, del-MER in Figure 5. 

      (2) Although the transcriptional dynamics of DIOM remodeling during metamorphosis is autophagy-independent, the transcriptome data obtained by the authors would be valuable for future studies. 

      (3) In addition to the simple observation that loss of BNIP3 causes mitochondrial accumulation, the authors further observed that, by combining siRNA against STX17, which is required for fusion of autophagosomes with lysosomes, BNIP3 KO abolishes mitophagosome formation, which will provide solid evidence for BNIP3-mediated mitophagy. Furthermore, using a Gal80 temperature-sensitive approach, the authors showed that mitochondria derived from larval muscle, but not those synthesized during hypertrophy, remain in BNIP3 KO fly muscles. 

      Weaknesses: 

      (1) Because BNIP3 KO causes mitochondrial accumulation, it is expected that adult flies will have some physiological defects, but this has not been fully analyzed or sufficiently mentioned in the manuscript. 

      (2) In Figure 5, the authors showed that BNIP3 binds to Atg18a by co-IP, but no data are provided on whether MER-mut or del-MER attenuates the affinity for Atg18a. 

      Thank you for pointing out the critical issues in the previous version of our manuscript. In this revision, we have conducted several physiological assays using BNIP3 KO flies, as well as co-IP experiments to confirm that the ΔMER mutation weakens the interaction with Atg18a. We have also addressed all the recommendations provided. Please see our detailed point-by-point responses below.

      Reviewer #3 (Public review): 

      Summary: 

      Fujita et al build on their earlier, 2017 eLife paper that showed the role of autophagy in the developmental remodeling of a group of muscles (DIOM) in the abdomen of Drosophila. Most larval muscles undergo histolysis during metamorphosis, while DIOMs are programmed to regrow after initial atrophy to give rise to temporary adult muscles, which survive for only 1 day after eclosion of the adult flies (J Neurosci. 1990;10:403-1. and BMC Dev Biol 16, 12, 2016). The authors carry out transcriptomics profiling of these muscles during metamorphosis, which is in agreement with the atrophy and regrowth phases of these muscles. Expression of the known mitophagy receptor BNIP3/NIX is high during atrophy, so the authors have started to delve more into the role of this protein/mitophagy in their model. BNIP3 KO indeed impairs mitophagy and muscle atrophy, which they convincingly demonstrate via nice microscopy images. They also show that the already known Atg8a-binding LIR and Atg18a-binding MER motifs of human NIX are conserved in the Drosophila protein, although the LIR turned out to be less critical for in vivo protein function than the MER motif. 

      Strengths: 

      Established methodology, convincing data, in vivo model. 

      Weaknesses: 

      The significance for Drosophila physiology and for human muscles remains to be established. 

      Thank you for reviewing our manuscript. In response to the comment, we have performed lifespan, adult locomotion, and eclosion assays in BNIP3 KO flies. Although we observed substantial mitochondrial accumulation in the DIOMs of BNIP3 KO flies, no significant differences were detected in these physiological assays under our experimental conditions. We plan to further investigate the physiological role of BNIP3 in flies and extend our studies to human muscle in future work. Please see our detailed responses below.

      Reviewer #1 (Recommendations for the authors): 

      Major points: 

      (1) Unfortunately, the RNA counts file table in Supplementary file 1 is a PDF and not an Excel sheet. The labelling makes it unclear from which time points and genotype the listed values on the 650-page files are. 

      We have now corrected the labelling of time points and genotypes in Supplementary File 1 to improve clarity and have provided the updated Excel file.

      Looking at these counts it seems that sarcomere genes (Mhc, bt, sls, wupA, TpnC ) are 10x to 100x lower in sample "ctrl_1" compared to the three other control samples. Which time point is that? It is essential to have access to the full dataset, wild type and autophagy-deficient, to be able to assess the quality of the RNA SEQ data. These need to be deposited in a public database or to be provided in a useful format. 

      Thank you for pointing that out. In the previous version, “Ctrl_1” referred to the Control sample at 1 day APF, when atrophy occurs. We have corrected the labeling in Supplementary File 1 accordingly and have deposited the RNA-seq data to GEO, where it is now publicly available (GSE293359).

      (2) Which statistical test was used to assess the differences in muscle volumes in Figure 2E? I was not able to find a table with the measured data.

      In Figure 2E, we used the Mann-Whitney test for statistical analysis. The raw data used for quantification have also been provided (Supplementary File 2).

      The shown volumes do not correlate with the scheme shown in Figure 2A, in particular at the larval stage the muscle seems much larger.

      We have revised the schematic models of muscle cells in Figures 1C and 2A in accordance with the reviewer’s suggestion.

      (3) It is important to remember that adult Drosophila muscles are not homogenous, at least not the adult leg and abdominal muscles, as they are organised as tubes with myofibrils closer to the surface, and nuclei as well as mitochondria largely in the centre (see PMID 33828099). Hence, only showing a single plane in the muscle images can be very misleading. The authors should at least provide virtual XZ-cross section views in Figure 3G to ensure that similar muscle planes are compared. This applies to the interpretation of both, the mitochondria and the myofibril phenotypes in wildtype vs BNIP3-KO. 

      Thank you for your comment. As suggested, we have added XZ-cross-sectional views in Figure 3G. The XY plane corresponds to a central section of the Z-stack, as indicated in the figure.

      (4) The EM images are nice, however only 2 of the 4 conditions shown were quantified. As the section plane can be misleading, at least several planes should be analysed also for wild type and BNIP3-KO, and not only for stx17 RNAi and the double mutant. 

      In response to the comment, we quantified the TEM images of wild-type and BNIP3-KO DIOMs and added the resulting graph to Figure 4C. The corresponding raw data have also been provided (Supplementary File 2).

      (5) How was Figure 5D, 5D' quantified? What corresponds to "regular", "medium", "high"? A statistical test is missing. I would rather conclude that MIR and LIR are redundant as double mutant appears to be stronger than both singles. This is also concluded in some sections of the text, so the authors seem to contradict themselves. Why not measure the mitochondria areas as done in Figure 6A' instead? 

      In the previous version, we manually categorized pooled, blinded images from different genotypes. However, as the reviewer pointed out, this approach was not quantitative. In the revised version, we analyzed the images using ImageJ to quantify the mitochondrial area per cell. Statistical significance was assessed using the Kruskal-Wallis test. Accordingly, we have revised Figure 5D, the method section, and the figure legend.
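      For readers who want to see what the Kruskal-Wallis comparison involves, the rank-based H statistic can be sketched as below. This is a minimal illustration, not the authors' analysis pipeline: the function name and the example values are hypothetical, and in practice a statistics package would also report the p-value.

```python
def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")

    # Pool all values, remembering which group each one came from.
    pooled = sorted((value, gi) for gi, g in enumerate(groups) for value in g)
    n = len(pooled)

    # Assign 1-based ranks, giving tied values the average of their ranks.
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # Sum the ranks within each group.
    rank_sums = [0.0] * len(groups)
    for (_, gi), r in zip(pooled, ranks):
        rank_sums[gi] += r

    h = 12.0 / (n * (n + 1)) * sum(
        rs ** 2 / len(g) for rs, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)

    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Hypothetical mitochondrial areas per cell for three genotypes (arbitrary units).
example_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_statistic = kruskal_wallis_h(example_groups)
```

      With more than two genotypes, H is compared against a chi-square distribution with (number of groups - 1) degrees of freedom, which is why a rank-based test suits area measurements that may not be normally distributed.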

      (6) Figure 6B data seem to come from a single image per genotype only. At least 3 or 4 animals should be measured and the values reported. 

      We analyzed Pearson’s correlation coefficients (R values) from at least five images per genotype and performed statistical analysis. The resulting quantification is presented in Figure 6B’, and the corresponding text has been revised accordingly.
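      As an illustration of the quantity reported here, Pearson's R between two image channels can be computed from paired pixel intensities as sketched below. The function name and the example intensities are hypothetical; the authors' own image-analysis settings are not described in this response.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Covariance and variance terms (the common 1/n factor cancels in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical pixel intensities from two channels of the same image.
channel_a = [10, 20, 30, 40]
channel_b = [12, 22, 29, 41]
r_value = pearson_r(channel_a, channel_b)
```

      Computing one R value per image and then comparing the per-image values across genotypes, as described above, treats each image rather than each pixel as the unit of replication.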

      (7) As BNIP3 mutants are viable, it would be interesting to report if they can fly and how long they live. 

      Additional data on adult lifespan, climbing ability, and elapsed time for eclosion in BNIP3 KO flies have been included as supplemental information (Figure 3-figure supplement 2). No significant differences were observed in those assays under our experimental conditions.

      (8) The transcriptomics data are not well linked to the autophagy mechanism. In particular, the mutant transcriptomics data are confusing, as the abstract seems to suggest that blocking autophagy impacts transcriptomics, which is not (strongly) the case. I would at least re-write this part, as it is currently misleading and sparks wrong expectations to the reader. Also throughout the text, the authors need to make clear if there are transcriptomic changes or not and if there are, how these are linked to autophagy. 

      In the abstract, we described the findings as “transcriptional dynamics independent of autophagy” (line 49) because the loss of autophagy had only a minimal effect on transcriptional changes. This conclusion is supported by the data presented in our manuscript. In the Results section, we state: “In contrast to our prediction, the knockdown of Atg18a, FIP200, or Stx17 only had a slight impact on transcriptomic dynamics in DIOM remodeling (Fig. 2C), with only minor changes detected (Fig. 2-figure supplement 2G)” (lines 199-201). In the Discussion section, we further note: “The transcriptional dynamics associated with DIOM remodeling are largely independent of autophagy (Fig.2). Instead, our RNA-seq data suggest that it is regulated primarily by ecdysone signaling, with minimal influence from autophagy inhibition” (lines 326-328).

      (9) No table with the measured data is provided. 

      We have provided the raw data files corresponding to all quantified results as Supplementary File 2.

      Minor points: 

      (1) To my knowledge, it is standard to indicate the time after puparium formation in hours, instead of days, (e.g. 24h, 48h etc.). 

      Thank you for the comments. In our previous publications on DIOM remodeling during metamorphosis (PMID: 28063257 and 33077556), we used days rather than hours to indicate developmental time points. To maintain consistency across our studies, we have chosen to continue using days in the present manuscript.

      (2) "Myofibrils typically form beneath the sarcolemma (Mao et al., 2022; Sanger et al., 2010); therefore, when mitochondria accumulate, myofibrils are restricted to the cell periphery." This is quite a general statement that does not always hold, in particular not in Drosophila flight muscles and likely also not in abdominal muscles (see PMIDs 29846170, 28174246). 

      Thank you for pointing that out. We rewrote the sentence as follows: In the absence of BNIP3, mitochondria derived from the larval muscle accumulate and cluster in the cell center, physically obstructing myofibril formation during hypertrophy and restricting myofibrils to the cell periphery (Fig. 6E) (lines 392-394).

      Reviewer #2 (Recommendations for the authors): 

      Suggestions for improved or additional experiments, data or analyses. 

      The authors should test, by a co-IP experiment, whether BNIP3 mutants lose the interaction with HA-Atg18a. 

      As requested, we tested the effect of MER deletion on the interaction between BNIP3 and Atg18a in a co-IP experiment. As shown in the new Fig. 5C, the deletion of MER weakened the interaction. This result was confirmed in three independent experiments. The corresponding text has also been revised as follows: “We confirmed that HA-tagged Drosophila Atg18a co-immunoprecipitated with GFP-tagged full-length Drosophila BNIP3, and that this interaction was attenuated by the deletion of the MER (residues 42-53) (Fig. 5C)” (lines 270-273).

      Minor corrections to the text and figures 

      (1) In the list of authors, Kawaguchi Kohei could be Kohei Kawaguchi. 

      Thank you very much. It has been corrected.

      (2) In Fig3D, other receptors (Zonda, CG12511, Key, Ref2P) should be mentioned briefly. 

      Thank you for the suggestion. We have revised the sentences as follows: “The time course RNA-seq data (Fig. 1 and 2) indicated that, among the known mitophagy regulators, only BNIP3 was robustly expressed in 1 d APF DIOMs. In contrast, Zonda, CG12511, Pink1, Park, Key, Ref(2)P, and IKKe—the Drosophila orthologs of FKBP8, FUNDC1, PINK1, Parkin, Optineurin, p62, and TBK1, respectively—showed little or undetectable expression at this stage (Fig. 3D).” (lines 230-234).

      Reviewer #3 (Recommendations for the authors): 

      Remarks: 

      (1) What is the consequence of impaired muscle remodeling on the organismal level? Is the eclosion of adult flies impaired? One could think of assays for this, such as quantifying failed eclosions and/or video microscopy of the eclosion process. Is muscle function impaired? One could measure the contractile force of isolated fibers during electrical stimulation as well, etc. I believe that showing the physiological importance of muscle remodeling would be the biggest advantage that could arise from using a complete animal model.

      We appreciate the comments. We have added data on adult lifespan, climbing ability, and the elapsed time for eclosion in BNIP3 KO flies as supplemental information (Figure 3-figure supplement 2). In BNIP3 KO DIOMs, despite the massive accumulation of mitochondria, an organized peripheral myofibril layer with contractile function is retained. However, we have not measured the contractile force of isolated muscle cells due to technical limitations. We plan to address this in future studies.

      A related note is that I missed the proper discussion of the function and fate of these short-lived adult muscles (please see references in my summary). 

      We have added a sentence regarding the function and fate of DIOMs in the introduction (lines 80-82) as follows: “The remodeled adult DIOMs function during eclosion, persist for approximately 12 hours, and are subsequently eliminated via programmed cell death (Kimura and Truman, 1990; J Neurosci. 1990;10:403-1)”.

      (2) I don't think that "data not shown" should be used these days, when supplemental data allow the inclusion of not-so-critical results. 

      We have added the data as Figure 5-figure supplement 2. As shown in the figure, overexpression of GFP-BNIP3 in 3IL BWMs did not induce the formation of tdTomato-positive autolysosomes, which are abundantly accumulated in DIOMs at 1 and 2 d APF.

      (3) The term "naked mitochondria" does not sound scientific enough to this reviewer. I suggest "cytosolic mitochondria" or "unengulfed mitochondria". 

      In accordance with the reviewer’s suggestion, we have replaced “naked mitochondria” with “unengulfed mitochondria” (lines 251 and 670).

    1. Author Response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      This work uses a novel, ethologically relevant behavioral task to explore decision-making paradigms in C. elegans foraging behavior. By rigorously quantifying multiple features of animal behavior as they navigate in a patch food environment, the authors provide strong evidence that worms exhibit one of three qualitatively distinct behavioral responses upon encountering a patch: (1) "search", in which the encountered patch is below the detection threshold; (2) "sample", in which animals detect a patch encounter and reduce their motor speed, but do not stay to exploit the resource and are therefore considered to have "rejected" it; and (3) "exploit", in which animals "accept" the patch and exploit the resource for tens of minutes. Interestingly, the probability of these outcomes varies with the density of the patch as well as the prior experience of the animal. Together, these experiments provide an interesting new framework for understanding the ability of the C. elegans nervous system to use sensory information and internal state to implement behavioral state decisions.

      Strengths:

      The work uses a novel, neuroethologically-inspired approach to studying foraging behavior

      The studies are carried out with an exceptional level of quantitative rigor and attention to detail

      Powerful quantitative modeling approaches including GLMs are used to study the behavioral states that worms enter upon encountering food, and the parameters that govern the decision about which state to enter

      The work provides strong evidence that C. elegans can make 'accept-reject' decisions upon encountering a food resource

      Accept-reject decisions depend on the quality of the food resource encountered as well as on internally represented features that provide measurements of multiple dimensions of internal state, including feeding status and time

      Reviewer #2 (Public review):

      This study provides an experimental and computational framework to examine and understand how C. elegans make decisions while foraging in environments with patches of food. The authors show that C. elegans reject or accept food patches depending on a number of internal and external factors.

      The key novelty of this paper is the explicit demonstration of behavior analysis and quantitative modeling to elucidate decision-making processes. In particular, the description of the exploring vs. exploiting phases, and sensing vs. non-sensing categories of foraging behavior based on the clustering of behavioral states defined in a multi-dimensional behavior-metrics space, and the implementation of a generalized linear model (GLM) whose parameters can provide quantitative biological interpretations.

      The work builds on the literature of C. elegans foraging by adding the reject/accept framework.

      Reviewer #3 (Public review):

      Summary:

      In this study by Haley et al, the authors investigated explore-exploit foraging using C. elegans as a model system. Through an elegant set of patchy environment assays, the authors built a GLM based on past experience that predicts whether an animal will decide to stay on a patch to feed and exploit that resource, instead of choosing to leave and explore other patches.

      Strengths:

      I really enjoyed reading this paper. The experiments are simple and elegant, and address fundamental questions of foraging theory in a well-defined system. The experimental design is thoroughly vetted, and the authors provide a considerable volume of data to prove their points. My only criticisms have to do with the data interpretation, which I think are easily addressable.

      Weaknesses:

      History-dependence of the GLM

      The logistic GLM seems like a logical way to model a binary choice, and I think the parameters you chose are certainly important. However, the framing of them seems odd to me. I do not doubt the animals are assessing the current state of the patch with an assessment of past experience; that makes perfect logical sense. However, it seems odd to reduce past experience to the categories of recently exploited patch, recently encountered patch, and time since last exploitation. This implies the animals have some way of discriminating these past patch experiences and committing them to memory. Also, it seems logical that the time on these patches, not just their density, should also matter, just as the time without food matters. Time is inherent to memory. This model also imposes a prior categorization in trying to distinguish between sensed vs. not-sensed patches, which I criticized earlier. Only "sensed" patches are used in the model, but it is questionable whether worms genuinely do not "sense" these patches.

      It seems more likely that the worm simply has some memory of chemosensation and relative satiety, both of which increase on patches and decrease while off of patches. The magnitudes are likely a function of patch density. That being said, I leave it up to the reader to decide how best to interpret the data.

      Model design: We agree with the reviewer that past experience is not likely to be discretized into the exact parameters of our model. We have added to our manuscript to further clarify this point (lines 645-647). Investigating the mechanisms behind this behavior is beyond the scope of this project but is certainly an exciting trajectory for future C. elegans research.
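      For readers less familiar with this model class, the structure of a logistic GLM for an accept/reject decision can be sketched as follows. This is an illustration only: the covariate names loosely follow the categories named in the review, while the coefficient values, units, and function name are hypothetical and are not taken from the manuscript.

```python
import math


def exploit_probability(covariates, coefficients, intercept=0.0):
    """Return P(exploit) as the logistic function of a weighted sum of covariates."""
    linear_predictor = intercept + sum(
        coefficients[name] * value for name, value in covariates.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical covariates for a single patch encounter (illustrative values).
encounter = {
    "current_patch_density": 1.0,
    "recently_exploited_density": 0.5,
    "recently_encountered_density": 0.2,
    "time_since_last_exploitation": 0.3,
}

# Hypothetical coefficients; signs and magnitudes are assumptions, not fitted values.
betas = {
    "current_patch_density": 2.0,
    "recently_exploited_density": -1.0,
    "recently_encountered_density": -0.5,
    "time_since_last_exploitation": 1.5,
}

p_exploit = exploit_probability(encounter, betas, intercept=-1.0)
```

      In such a model each coefficient is the change in the log-odds of exploiting per unit change in its covariate, which is what makes the fitted parameters open to biological interpretation without implying that the animal stores those exact quantities.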

      osm-6

      The argument is that osm-6 animals can't sense food very well, so when they sense it, they enter the exploitation state by default. That is what they appear to do, but why? Clearly they are sensing the food in some other way, correct? Are ciliated neurons the only way worms can sense food? Don't they also actively pump on food, and can therefore sense the food entering their pharynx? I think you could provide further insight by commenting on this. Perhaps your decision model is dependent on comparing environmental sensing with pharyngeal sensing? Food intake certainly influences their decision, no? Perhaps food intake triggers exploitation behavior, which can be over-run by chemo/mechanosensory information?

      osm-6 behavior: We thank the reviewer for pointing out the need to further elaborate on a mechanistic hypothesis to explain the behavior of osm-6 sensory mutants. We agree with the reviewer’s speculation that post-ingestive and other non-ciliary sensory cues likely drive detection of food. We have added additional commentary to our manuscript to state this (lines 529-538).

      Impact

      I think this work will have a solid impact on the field, as it provides tangible variables to test how animals assess their environment and decide to exploit resources. I think this research could be strengthened by a reassessment of their model that would both simplify it and provide testable timescales of satiety/starvation memory.

      Reviewer #2 (Recommendations for the authors):

      The authors have addressed most of my concerns.

      Reviewer #3 (Recommendations for the authors):

      The authors provide a considerable amount of processed data (great, thank you!), but it would be even better if they provided the raw data of the worm coordinates, and when and where these coordinates overlapped with patches. This is the raw data that was ultimately used for all the quantifications in the paper, and would be incredibly useful to readers who are interested in modeling the data themselves.

      This should not be prohibitive.

      Data Availability: We thank the reviewer for pointing out this need. We are uploading all processed data (e.g. worm coordinates relative to the arena and patches) to a curated data storage server. We have updated our data availability statement to state this (lines 684-688).

      Search vs. sample & sensing vs. non-sensing.

      The different definitions of behaviors in Figures 2H-K are a bit confusing. I think the confusion stems in part from the changing terms and color associations in Figures 2 H-K. Essentially the explore density in Figure 2 H is split into two densities based on the two densities (sensing vs. non-responding) observed in Figure 2I. In turn, the sensing density in Figure 2I is split into two densities (explore vs exploit) based on the two densities observed in Figure 2 H. But the way the figures are colored, yellow means search (Figure 2H) and non-responding (Figure 2I), green means exploit (Figure 2H) which includes sensing and non-responding, but also exclusively sensing (Figure 2I), and blue consistently means exploit in both figures. It might help to use two different color codes for Figures 2H and 2I, and then in 2J you define search as explore AND non-responding, sample as explore AND sensing, and exploit as exploit.

      Color schema: While we understand the confusion, we believe that introducing additional colors may also present some misunderstandings. We have decided to leave the figure as it is.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Two important factors in visual performance are the resolving power of the lens and the signal-to-noise ratio of the photoreceptors. These both compete for space: a larger lens has improved resolving power over a smaller one, and longer photoreceptors capture more photons and hence generate responses with lower noise. The current paper explores the tradeoff of these two factors, asking how space should be allocated to maximize eye performance (measured as encoded information).

      Your summary is clear, concise and elegant. The competition is not just for space, it is for space, materials and energy. We now emphasise that we are considering these three costs in our rewrites of the Abstract and the first paragraph of the Discussion.

      Strengths:

      The topic of the paper is interesting and not well studied. The approach is clearly described and seems appropriate (with a few exceptions - see weaknesses below). In most cases, the parameter space of the models is well explored and tradeoffs are clear.

      Weaknesses:

      Light level

      The calculations in the paper assume high light levels (which reduces the number of parameters that need to be considered). The impact of this assumption is not clear. A concern is that the optimization may be quite different at lower light levels. Such a dependence on light level could explain why the model predictions and experiment are not in particularly good agreement. The paper would benefit from exploring this issue.

      Thank you for raising this point. We briefly explained in our original Discussion, under Understanding the adaptive radiation of eyes (Version 1, lines 756 – 762), how our method can be modified to investigate eyes adapted for lower light levels. We have some thoughts on how eyes might be adapted. In general, transduction rates are increased by increasing D, reducing f, increasing d<sub>rh</sub> and increasing L. In addition, d<sub>rh</sub> is increased to allow for a larger D within the constraint of eye radius/corneal surface area, and to avoid wasteful oversampling (the changes in D, f and d<sub>rh</sub> increase acceptance angle ∆ρ). We suspect that in eyes optimised for the efficient use of space, materials and energy the increases in L will be relatively small, for two reasons. First, increasing D, reducing f and increasing d<sub>rh</sub> are much more effective at increasing transduction rate than increasing L. Second, increasing sensitivity by reducing f decreases the cost V<sub>o</sub>, whereas increasing sensitivity by increasing L increases the cost V<sub>ph</sub>. This disadvantage, together with exponential absorption, might explain why L is only 10% - 20% longer in the apposition eyes of nocturnal bees (Somanathan et al, J. comp. Physiol. A195, 571-583, 2009). Because this line of argument is speculative and enters new territory, we have not included it in our revised version. We already present a lot of new material for readers to digest, and we agree with referee 2 that “It is possible to extend the theory to other types of eyes, although it would likely require more variables and assumptions/constraints to the theory. It is thus good to introduce the conceptual ideas without overdoing the applications of the theory”. Nonetheless, we take your point that some of the eyes in our data set might be adapted for lower light levels, and we have rewritten the Discussion section, How efficiently do insects allocate resources within their apposition eyes, accordingly.
      On lines 827 – 843 we address the assumption that eyes are adapted for full daylight, and also take the opportunity to mention two more reasons for increasing the eye parameter p, namely increasing image velocity (Snyder, 1979) and constructing bright zones that increase the detectability of small targets (van Hateren et al., 1989; Straw et al., 2006).
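      The qualitative argument above can be made concrete with a widely used textbook-style approximation for the optical sensitivity of a photoreceptor viewing an extended source. This is offered only as an illustrative sketch and is not necessarily the expression used in the manuscript: the absorption coefficient k is an assumed symbol, monochromatic light is assumed, and the acceptance angle is approximated geometrically as d<sub>rh</sub>/f, ignoring diffraction.

```latex
% Sensitivity S: proportional to aperture area, receptor solid angle, and absorbed fraction.
% D = lens diameter, f = focal length, d_{rh} = rhabdom diameter,
% L = photoreceptor length, k = absorption coefficient (assumed symbol).
S \;\approx\; \left(\frac{\pi}{4}\right)^{2} D^{2}
      \left(\frac{d_{rh}}{f}\right)^{2}
      \left(1 - e^{-kL}\right)

% Marginal gain from lengthening the photoreceptor falls off exponentially:
\frac{\partial S}{\partial L} \;\propto\; k\, e^{-kL}
```

      Under these assumptions S grows quadratically with D and with d<sub>rh</sub>/f, but saturates with L, which is consistent with the suggestion that lengthening photoreceptors is the least effective of the four routes to higher transduction rates.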

      Discontinuities

      The discontinuities and non-monotonicity of the optimal parameters plotted in Figure 4 are concerning. Are these a numerical artifact? Some discussion of their origin would be quite helpful.

      Good points. We now address the discontinuities in the Results, where they are first observed (lines 311 - 319).

      Discrepancies between predictions and experiment

      As the authors clearly describe, experimental measurements of eye parameters differ systematically from those predicted. This makes it difficult to know what to take away from the paper. The qualitative arguments about how resources should be allocated are pretty general, and the full model seems a complex way to arrive at those arguments. Could this reflect a failure of one of the assumptions that the model rests on - e.g. high light levels, or that the cost of space for photoreceptors and optics is similar? Given these discrepancies between model and experiment, it is also hard to evaluate conclusions about the competition between optics and photoreceptors (e.g. at the end of the abstract) and about the importance for evolution (end of introduction).

      Your misgivings boil down to two issues: what use is a model that fails to fit the data, and do we need a complicated model to show something that seems to be intuitively obvious? Our study is useful because it introduces new approaches, methods, factors and explanations which advance our analysis and understanding of eye design and evolution. Your comments make it clear that we failed to get this message across and we have revised the manuscript accordingly. We have rewritten the Abstract and the first paragraph of the Discussion to emphasise the value of our new measure of cost, specific volume, by including more of its practical advantages. In particular, our use of specific volume 1) opens the door to the morphospace of all eyes of given type and cost; 2) allows one to construct performance surfaces across morphospace that not only identify optima but, by evaluating the sub-optimal, cast light on efficiency and adaptability; 3) shows that photoreceptor energy costs have a major impact on design and efficiency; and 4) allows us to calculate and compare the capacities and efficiencies of compound eyes and simple eyes using a superior measure of cost. It is also possible that your dissatisfaction was deepened by disappointment. The first sentence of our original Abstract said that the goal of design is to maximize performance, so you might have expected to see that eyes are optimised. Given that optimization provides cast iron proof that a system is designed to be efficient, and previous studies of coding by fly LMCs (Laughlin, 1981; Srinivasan et al., 1982; van Hateren, 1992) validated Barlow’s Efficient Coding Hypothesis by showing that coding is optimised, your expectation is reasonable. However, our investigation of how the allocation of resources to optics and photoreceptors affects an eye’s performance, efficiency and design does not depend a priori on finding optima, therefore we have removed the word “maximized”. Our revised Abstract now says, “to improve performance”.

      In short, our study illustrates an old adage in statistics “All models fail to fit, but some are useful”. As is often the case, the way in which our model fails is useful. In the original version of the Results and Discussion, we argued that the allocation of resources is efficient, and identified factors that can, in principle, explain the scattering of data points. Indeed, our modelling identifies two deficiencies: a lack of data on species-specific energy usage, and the need for models that quantify the relationship between the quality of the captured image and the behavioural tasks for which an eye might be specialised. Thus, by examining the model’s failings we identify critical factors and pose new questions for future research. We have rewritten the Discussion section How efficiently do insects allocate resources…. to make these points. We hope that these revisions will convince you that we have established a starting point for definitive studies, invented a vehicle that has travelled far enough to discover new territory, and shown that it can be modified to cope with difficult terrain.

      Turning to the need for a complicated model, because the costs and benefits depend on elementary optics and geometry, we too thought that there ought to be a simple model. However, when we tried to formulate a simple set of equations that approximate the definitive findings of our more complicated model we discovered that this is not as straightforward as we thought.  Many of the parameters in our model interact to determine costs and benefits, and many of these interactions are non-linear (e.g. the volumes of shells in spheres involve quadratic and cubic terms, and information depends on the log of a square root). So, rather than hold back publication of our complicated model, we decided to explain how it works as clearly as we can and demonstrate its value.
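      To illustrate the kind of non-linearity we mean, consider a minimal sketch that assumes a spherical eye of radius R whose photoreceptor array occupies a shell of thickness L, and a Gaussian-channel form for information per sample. The symbols R and H, and the specific form of H, are assumptions made for illustration; they are not equations copied from the manuscript.

```latex
% Volume of a spherical shell of outer radius R and thickness L:
% it contains linear, quadratic and cubic terms in L.
V_{\mathrm{shell}} = \frac{4}{3}\pi\left[R^{3} - (R-L)^{3}\right]
                   = 4\pi\left(R^{2}L - RL^{2} + \frac{L^{3}}{3}\right)

% Information per sample (assumed Gaussian-channel form):
% the log of a square root of a signal-to-noise term.
H = \log_{2}\sqrt{1 + \mathrm{SNR}} = \tfrac{1}{2}\log_{2}\left(1 + \mathrm{SNR}\right)
```

      Because cost grows polynomially in L while benefit grows only logarithmically with signal-to-noise ratio, the two cannot be balanced by a simple linear rule.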

      In response to your final comment, “it is hard to evaluate conclusions about the competition between optics and photoreceptors (e.g. at the end of the abstract) and about the importance for evolution (end of introduction)”, we stand by our original argument. There must be competition in an eye of fixed cost, and because competition favours a heavy investment in photoreceptors, both in theory and in practice, it  is a significant factor in eye design. A match between investments in optics and photoreceptors is predicted by theory and observed in fly NS eyes, therefore this is a design principle. As for evolution, no one would deny that it is important to view the adaptive radiation of eyes through a cost-benefit lens. Our lens is the first to view the whole eye, optics and photoreceptor array, and the first to treat the costs of space, materials and energy. Although the view through our lens is a bit fuzzy, it reveals that costs, benefits and trade-offs are important. Thus we have established a promising starting point for a new and more comprehensive cost-benefit approach to understanding eye design and evolution.  As for the involvement of genes, when there are heritable changes in phenotype genes must be involved and if, as we suggest, efficient resource allocation is beneficial, the developmental mechanisms responsible for allocating resources to optics and photoreceptor array will be playing a formative role in eye evolution.

      Reviewer #2 (Public Review):

      Summary:

      In short, the paper presents a theoretical framework that predicts how resources should be optimally distributed between receptors and optics in eyes.

      Strengths:

      The authors build on the principle of resource allocation within an organism and develop a formal theory for optimal distribution of resources within an eye between the receptor array and the optics. Because the two parts of eyes, receptor arrays and optics, share the same role of providing visual information to the animal it is possible to isolate these from resource allocation in the rest of the animal. This allows for a novel and powerful way of exploring the principles that govern eye design. By clever and thoughtful assumptions/constraints, the authors have built a formal theory of resource allocation between the receptor array and the optics for two major types of compound eye as well as for camera-type eyes. The theory is formalized with variables that are well characterized in a number of different animal eyes, resulting in testable predictions.

      The authors use the theory to explain a number of design features that depend on different optimal distribution of resources between the receptor array and the optics in different types of eyes. As an example, they successfully explain why eye regions with different spatial resolution should be built in different ways. They also explain differences between different types of eyes, such as long photoreceptors in apposition compound eyes and much shorter receptors in camera type eyes. The predictive power in the theory is impressive.

      To keep the number of parameters at a minimum, the theory was developed for two types of compound eye (neural superposition, and apposition) and for camera-type eyes. It is possible to extend the theory to other types of eyes, although it would likely require more variables and assumptions/constraints to the theory. It is thus good to introduce the conceptual ideas without overdoing the applications of the theory.

      The paper extends a previous theory, developed by the senior author, that develops performance surfaces for optimal cost/benefit design of eyes. By combining this with resource allocation between receptors and optics, the theoretical understanding of eye design takes a major leap and provides entirely new sets of predictions and explanations for why eyes are built the way they are.

      The paper is well written and even though the theory development in the Results may be difficult to take in for many biologists, the Discussion very nicely lists all the major predictions under separate headings, and here the text is more tuned for readers that are not entirely comfortable with the formalism of the Results section. I must point out though that the Results section is kept exemplary concise. The figures are excellent and help explain concepts that otherwise may go above the head of many biologists.

      We are heartened by your appreciation of our manuscript - it persuaded us not to undertake extensive revisions – thank you.

      Reviewer #3 (Public Review):

      Summary:

      This is a proposal for a new theory for the geometry of insect eyes. The novel cost-benefit function combines the cost of the optical portion with the photoreceptor portion of the eye. These quantities are put on the same footing using a specific (normalized) volume measure, plus an energy factor for the photoreceptor compartment. An optimal information transmission rate then specifies each parameter and resource allocation ratio for a variable total cost. The elegant treatment allows for comparison across a wide range of species and eye types. Simple eyes are found to be several times more efficient across a range of eye parameters than neural superposition eyes. Some trends in eye parameters can be explained by optimal allocation of resources between the optics and photoreceptors compartments of the eye.

      Strengths:

      Data from a variety of species roughly align with rough trends in the cost analysis, e.g. as a function of expanding the length of the photoreceptor compartment.

      New data could be added to the framework once collected, and many species can be compared.

      Eyes of different shapes are compared.

      Weaknesses:

      Detailed quantitative conclusions are not possible given the approximations and simplifying assumptions in the models and poor accounting for trends in the data across eye types.

      Reviewer #1 (Recommendations For The Authors):

      Figure 1: Panel E defines the parameters described in panel d. Consider swapping the order of those panels (or defining D and Delta Phi in the figure legend for d). Order follows narrative, eye types then match 

      We think that you are referring to Figure 1. We modified the legend.

      Lines 143-145: How does a different relative cost impact your results?

      Thank you for raising this question. Because our assumption that relative costs are the same is our starting point, and for optics it is not an obvious mistake, we do not raise your question here. We address your question where you next raise it because, for photoreceptors, the assumption is obviously wrong. We now emphasise that our method for accounting for photoreceptor energy costs can be applied to other costs.

      Lines 187-190: Same as above - how do your results change if this assumption is not accurate?

      We have revised our manuscript to emphasise that we are dealing with the situation in which our initial assumption (costs per unit volume are equal) breaks down. On lines 203 - 208 we write “However, this assumption breaks down when we consider specific metabolic rates. To enable and power phototransduction, photoreceptors have an exceptionally high specific metabolic rate (energy consumed per gram, and hence unit volume, per second) (Laughlin et al., 1998; Niven et al., 2007; Pangršič et al., 2005). We account for this extra cost by applying an energy surcharge, S<sub>E</sub>. To equate…”.

      We also revised part of the Discussion section, Specific volume is a useful measure of cost, to make it clear that we are able to account for situations in which the costs per unit volume are not equal, and we give our treatment of photoreceptor energy costs as an example of how this is done. On lines 626 - 640 we say:

      “Cost estimates can be adjusted for situations in which costs per unit volume are not equal, as illustrated by our treatment of photoreceptor energy consumption. To support transduction the photoreceptor array has an exceptionally high metabolic rate (Laughlin et al., 1998; Niven et al., 2007; Pangršič et al., 2005). We account for this higher energy cost by using the animal’s specific metabolic rate (power per unit mass and hence power per unit volume) to convert an array’s power consumption into an equivalent volume (Methods). Photoreceptor ion pumps are the major consumers of energy and the smaller contribution of pigmented glia (Coles, 1989) is included in our calculation of the energy tariff K<sub>E</sub> (Methods). The higher costs of materials and their turnover in the photoreceptor array can be added to the energy tariff K<sub>E</sub> but given the magnitude of the light-gated current (Laughlin et al., 1998) the relative increase will be very small. Thus for our intents and purposes the effects of these additional costs are covered by our models. For want of sufficient data…”.
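      The conversion described in this passage can be sketched as follows. This is a minimal illustration, assuming that power per unit volume is obtained from the specific metabolic rate and a tissue density, and that the equivalent volume is simply added to the array's physical volume. The function names, the density and all numerical values are hypothetical; the manuscript's Methods give the actual calculation.

```python
import math

def energy_equivalent_volume(array_power_w, specific_metabolic_rate_w_per_kg,
                             tissue_density_kg_per_m3=1000.0):
    """Convert an array's power consumption (W) into an equivalent volume (m^3).

    The animal's specific metabolic rate (W per kg) times an ASSUMED tissue
    density gives power per unit volume (W per m^3); dividing the array's
    power by this figure expresses energy use in units of volume.
    """
    power_per_unit_volume = specific_metabolic_rate_w_per_kg * tissue_density_kg_per_m3
    return array_power_w / power_per_unit_volume

def total_array_cost(array_volume_m3, array_power_w, specific_metabolic_rate_w_per_kg):
    """Physical volume plus energy-equivalent volume (ASSUMED additive)."""
    return array_volume_m3 + energy_equivalent_volume(
        array_power_w, specific_metabolic_rate_w_per_kg)

# Hypothetical values, for illustration only.
v_energy = energy_equivalent_volume(array_power_w=1e-6,
                                    specific_metabolic_rate_w_per_kg=10.0)
v_total = total_array_cost(array_volume_m3=5e-11, array_power_w=1e-6,
                           specific_metabolic_rate_w_per_kg=10.0)
print(v_energy, v_total)
```

      Expressing energy in units of volume puts it on the same footing as the costs of space and materials, which is why the energy cost can be traded directly against investment in optics.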

      Reviewer #2 (Recommendations For The Authors):

      A few comments for consideration by the authors:

      (1) In the abstract, Maybe give another example explaining why other eyes should be different to those of fast diurnal insects.

      This worthwhile extrapolation is best kept to the Discussion.

      (2) Would it be worthwhile mentioning that the photopigment density is low in rhabdoms compared to vertebrate outer segments? This will have major effects on the relative size of retina and optics.

      Thank you, we now make this good point in the Discussion (lines 698-702).

      (3) It took me a while to understand what you mean by an energy tariff. For the less initiated reader many other variables may be difficult to comprehend. A possible remedy would be to make a table with all variables explained first very briefly in a formal way and then explained again with a few more words for readers less fluent in the formalism.

      A very useful suggestion. We have taken your advice (p.4).

      (4) The "easy explanation" on lines 356-357 need a few more words to be understandable.

      We have expanded this argument and corrected a mistake: the width of the head front to back is not 250 μm, it is 600 μm (lines 402-407).

      (5) Maybe devote a short paragraph in the Discussion to other types of eye, such as optical superposition eyes and pinhole eyes. This could be done very shortly and without formalism. I'm sure the authors already have a good idea of the optimal ratio of receptor arrays and optics in these eye types.

      We do not discuss this because we have not found a full account of the trade-offs and their  effects on costs and benefits. We hope that our analysis of apposition and simple eyes will encourage people to analyse the relationships between costs and benefits in other eye types. To this end we pointed out in the Discussion that recent advances in imaging and modelling could be helpful.

      (6)  Could the sentence on lines 668-671 be made a little clearer?

      “Efficiency is also depressed by increasing the photoreceptor energy tariff K<sub>E</sub>, and in line with the greater impact of photoreceptor energy costs in simple eyes, the reduction in efficiency is much greater in simple eyes (Figure 8b).”

      We replaced this sentence with “In both simple and apposition eyes efficiency is reduced by increasing the photoreceptor energy tariff K<sub>E</sub>. This effect is much greater in simple eyes, thus as found for reductions in photoreceptor length (Figure 7b), K<sub>E</sub> has more impact on the design of simple eyes” (lines 749 – 752).

      (7)  I have some reservations about the text on lines 789-796. The problem is that optics can do very little to improve the performance of a directional photoreceptor where delrho should optimally be very wide. Here, membrane folding is the only efficient way to improve performance (SNR). The option to reduce delrho for better performance comes later when simultaneous spatial resolution (multiple pixels) is introduced.

      Yes, we have been careless. We have rewritten this paragraph to say (lines 920-931)

      “Two key steps in the evolution of eyes were the stacking of photoreceptive membranes to absorb more photons, and the formation of optics to intercept more photons and concentrate them according to angle of incidence to form an image (Nilsson, 2013, 2021). Our modelling of well-developed image forming eyes shows that to improve performance stacked membranes (rhabdomeres) compete with optics for the resources invested in an eye, and this competition profoundly influences both form and function. It is likely that competition between optics and photoreceptors was shaping eyes as lenses evolved to support low resolution spatial vision. Thus the developmental mechanisms that allocate resources within modern high resolution eyes (Casares & MacGregor, 2021), by controlling cell size and shape, and as our study emphasises, gradients in size and shape across an eye, will have analogues or homologues in more ancient eyes. Their discovery….”

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for major revisions:

      While the approach is novel and elegant, the results from the analysis of insect morphology do not broadly support the optimization argument and hardly constrain parameters, like the energy tariff value, at all. The most striking result of the paper is the flat plateau in information across a broad range of shape parameters and the length, and resolution trend in Figure 5.

      At no point in the Results and Discussion do we argue that resource allocation is optimized. Indeed, we frequently observe that it is not. Our mistake was to start the Abstract by observing that animals evolve to minimise costs. We have rewritten the Abstract accordingly.

      The information peaks are quite shallow. This might actually be a very important and interesting result in the paper - the fact that the information plateaus could give the insect eye quite a wide range of parameters to slide between while achieving relatively efficient sensing of the environment. Instead of attempting to use a rather ad hoc and poorly supported measure of energetics in PR cost, perhaps the pitch could focus on this flexibility. K<sub>E</sub> does not seem to constrain eye parameters and does not add much to the paper.

      We agree, being able to construct performance surfaces across morphospace is an important advance in the field of eye design and evolution, and the performance surface’s flat top has interesting implications for the evolution of adaptations. Encouraged by your remarks, we have rewritten the Abstract and the introductory paragraph of the Discussion to draw attention to these points. 

      We are disappointed that we failed to convince you that our energy tariff, K<sub>E</sub>, is more than a poorly supported ad hoc parameter that adds little to the paper. In our opinion a resource allocation model that ignores photoreceptor energy consumption is obviously inadequate because the high energy cost of phototransduction is both well-known and considered to be a formative factor in eye evolution (Niven and Laughlin, 2008). One of the advantages of modelling is that one can assess the impact of factors that are known to be present, are thought to be important, but have not been quantified. We followed standard modelling practice by introducing a cost that has the same units as the other costs and, for good physiological reasons, increases linearly with the number of microvilli, according to K<sub>E</sub>. We then vary this unknown cost parameter to discover when and why it is significant. We were pleased to discover that we could combine data on photoreceptor energy demands and whole animal metabolic rates to establish the likely range of K<sub>E</sub>. This procedure enabled us to unify the cost-benefit analyses of optics and photoreceptors, and to discover that realistic values of K<sub>E</sub> have a profound impact on the structure and performance of an efficient eye. We hope that this advance will encourage people to collect the data needed to evaluate K<sub>E</sub>. To emphasise the importance of K<sub>E</sub> and dispel doubts associated with the failure of the model to fit the data, we have revised two sections: Flies invest efficiently in costly photoreceptor arrays in the Results, and How efficiently do insects allocate resources within their apposition eyes? in the Discussion. These rewrites also explain why it is impossible for us to infer K<sub>E</sub> by adjusting its value so that the model’s predictions fit the data.

      The graphics after Figure 3 are quite dense and hard to follow. None of the plateau extent shown in Fig 3 is carried through to the subsequent plots, which makes the conclusions drawn from these figures very hard to parse. If the peak information occurs on a flat plateau, it would be more helpful to see those ranges of parameters displayed in the figures.

      Ideally one should do as you suggest and plot the extent of the plateau, but in our situation this is not very helpful. In the best data set, flies, optimised models predict D well, get close to ∆φ in larger eyes, and demonstrate that these optimum values are not very sensitive to K<sub>E</sub>. L is a different matter: it is very sensitive to K<sub>E</sub>, which, as we show (and frequently remind), is poorly constrained by experimental data. The best we can do is estimate the envelope of L vs C<sub>tot</sub> curves, as defined by a plausible range of K<sub>E</sub>. Because most of the plateau boundaries you ask for will fall within this envelope, plotting them does little to clear the fog of uncertainty. We note that all three referees agree that our model can account for two robust trends: i) in apposition eyes L increases with optical resolving power and acuity, both within individual eyes and among eyes of different sizes, and ii) L is much longer in apposition eyes than in simple eyes. Nonetheless, the scatter of data points and their failure to fit creates a bad impression. We gave a number of reasons why the model does not fit the data points, but these were scattered throughout the Results and Discussion and, as referees 1 and 3 point out, this makes it difficult to draw convincing conclusions. To rectify this failing, we have rewritten two sections, in the Results Flies invest efficiently in costly photoreceptor arrays and in the Discussion, How efficiently do insects allocate resources within their apposition eyes?, to discuss these reasons en bloc, draw conclusions and suggest how better data and refinements to modelling could resolve these issues.

      Throughout the figures, the discontinuities in the optimal cuts through parameter space are not sufficiently explained.

      We added a couple of sentences that address the “jumps” (lines 313 – 318)

      None of the data seems to hug any of the optimal lines and only weakly follow the trends shown in the plots. This makes interpretation difficult for the reader and should be better explained. The text can be a little telegraphic in the Results after roughly page 10, and requires several readings to glean insight into the manuscript's conclusions.

      We revised the Results section in which we compare the best data set, flies’  NS eyes with theoretical predictions, Flies invest efficiently in costly photoreceptor arrays,  to expand our interpretation of the data and clarify our arguments. The remaining sections have not been expanded. In the next section, which is on fused rhabdom apposition eyes, our interpretation of the scattering of data points follows the same line of argument. The remaining Results sections are entirely theoretical.  

      Overall, the rough conclusions outlined in the Results seem moderately supported by the matches of the data to the optimal information transmission cuts through parameter space, but only weakly.

      We agree, more data is required to test and refine our theoretical predictions.

      The Discussion is long and well-argued, and contains the most cogent writing in the manuscript.

      Thank you: this is most pleasing. We submitted our study to eLife because it allows longer Discussions, but we worried that ours was too long. However, we felt that our extensive Discussion was necessary for two reasons. First, we are introducing a new approach to understanding of eye design and evolution. Second, because the data on eye morphology and costs are limited, we had to make a number of assumptions and by discussing these, warts and all, we hoped to encourage experimentalists to gather more data and focus their efforts on the most revealing material.  

      Minor comments:

      We have acted upon most of your minor comments and we confine our remarks to our disagreements. We are grateful for your attention to details that we should have picked up on.

      It's a more standard convention to say "cost-benefit" rather than with a colon. 

      "equation" should be abbreviated "eq" or "eqn", never with a "t"

      when referring to the work of van Hateren, quote the paper and the database using "van Hateren" not just "Hateren"

      small latex note: use "\textit{SNR}" to get the proper formatting for those letters when in the math environment

      Line 100-110: "f" is introduced, but only f' is referenced in the figure. This should be explained in order. d_rh is not included in the figure. Also in this section, d_rh/f is also referenced before \Delta \rho_rf, which is the same quantity, without explanation.  

      Figure 1 shows eye structure and geometry. f’ is a lineal dimension of the eye but f is not, so f is not shown in Fig 1e. We eliminated the confusion surrounding ∆ρ<sub>rh</sub>  by deleting “and changing the acceptance angle of the photoreceptive waveguide ∆ρ<sub>rh</sub> (Snyder, 1979)”.  

      Fig 1 caption: this says "From dorsal to ventral," then describes trends that run ventral to dorsal, which is a confusing typo.

      Fig 3 - adding some data points to these plots might help the reader understand how (or if) K_E is constrained by the data.

      It is not possible to add data points because the total cost, C<sub>tot</sub>, is unknown.

      Fig 4c (and in other subplots): the jumps in L with C_tot could be explained better in the text - it wasn't clear to this reviewer why there are these discontinuities.

      Dealt with in the revised text (lines  310-318).

      Fig 4d: The caption for this subplot could be more clearly written.

      We have rewritten the caption for subplot 4d.

      Fig 5 and other plots with data: please indicate which symbols are samples from the same species. This info is hard to reconstruct from the tables.

      We have revised Figure 5 accordingly. Species were already indicated in Figure 6.

      Line 328: missing equation number

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The objective of this research is to understand how the expression of key selector transcription factors, Tal1, Gata2, Gata3, involved in GABAergic vs glutamatergic neuron fate from a single anterior hindbrain progenitor domain is transcriptionally controlled. With suitable scRNAseq, scATAC-seq, CUT&TAG, and footprinting datasets, the authors use an extensive set of computational approaches to identify putative regulatory elements and upstream transcription factors that may control selector TF expression. This data-rich study will be a valuable resource for future hypothesis testing, through perturbation approaches, of the many putative regulators identified in the study. The data are displayed in some of the main and supplemental figures in a way that makes it difficult to appreciate and understand the authors' presentation and interpretation of the data in the Results narrative. Primary images used for studying the timing and coexpression of putative upstream regulators, Insm1, E2f1, Ebf1, and Tead2 with Tal1 are difficult to interpret and do not convincingly support the authors' conclusions. There appears to be little overlap in the fluorescent labeling, and it is not clear whether the signals are located in the cell soma nucleus.

      Strengths:

      The main strength is that it is a data-rich compilation of putative upstream regulators of selector TFs that control GABAergic vs glutamatergic neuron fates in the brainstem. This resource now enables future perturbation-based hypothesis testing of the gene regulatory networks that help to build brain circuitry.

      We thank Reviewer #1 for the thoughtful assessment and recognition of the extensive datasets and computational approaches employed in our study. We appreciate the acknowledgment that our efforts in compiling data-rich resources for identifying putative regulators of key selector transcription factors (TFs)—Tal1, Gata2, and Gata3—are valuable for future hypothesis-driven research.

      Weaknesses:

      Some of the findings could be better displayed and discussed.

      We acknowledge the concerns raised regarding the clarity and interpretability of certain figures, particularly those related to expression analyses of candidate upstream regulators such as Insm1, E2f1, Ebf1, and Tead2 in relation to Tal1. We agree that clearer visualization and improved annotation of fluorescence signals are crucial to accurately support our conclusions. In our revised manuscript, we will enhance image clarity and clearly indicate sites of co-expression for Tal1 and its putative regulators, ensuring the results are more readily interpretable. Additionally, we will expand explanatory narratives within the figure legends to better align the figures with the results section.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript, the authors seek to discover putative gene regulatory interactions underlying the lineage bifurcation process of neural progenitor cells in the embryonic mouse anterior brainstem into GABAergic and glutamatergic neuronal subtypes. The authors analyze single-cell RNA-seq and single-cell ATAC-seq datasets derived from the ventral rhombomere 1 of embryonic mouse brainstems to annotate cell types and make predictions of where TFs bind upstream and downstream of the effector TFs using computational methods. They add data on the genomic distributions of some of the key transcription factors and layer these onto the single-cell data to get a sense of the transcriptional dynamics.

      Strengths:

      The authors use a well-defined fate decision point from brainstem progenitors that can make two very different kinds of neurons. They already know the key TFs for selecting the neuronal type from genetic studies, so they focus their gene regulatory analysis squarely on the mechanisms that are immediately upstream and downstream of these key factors. The authors use a combination of single-cell and bulk sequencing data, prediction and validation, and computation.

      We also appreciate the thoughtful comments from Reviewer #2, highlighting the strengths of our approach in elucidating gene regulatory interactions that govern neuronal fate decisions in the embryonic mouse brainstem. We are pleased that our focus on a critical cell-fate decision point and the integration of diverse data modalities, combined with computational analyses, has been recognized as a key strength.

      Weaknesses:

      The study generates a lot of data about transcription factor binding sites, both predicted and validated, but the data are substantially descriptive. It remains challenging to understand how the integration of all these different TFs works together to switch terminal programs on and off.

      Reviewer #2 correctly points out that while our study provides extensive data on predicted and validated transcription factor binding sites, clearly illustrating how these factors collectively interact to regulate terminal neuronal differentiation programs remains challenging. We acknowledge the inherently descriptive nature of the current interpretation of our combined datasets.

      In our revision, we will clarify how the different data types support and corroborate one another, highlighting what we consider the most reliable observations of TF activity. Additionally, we will revise the discussion to address the challenges associated with interpreting the highly complex networks of interactions within the gene regulatory landscape.

      We sincerely thank both reviewers for their constructive feedback, which we believe will significantly enhance the quality and accessibility of our manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The results in Figure 3 and several associated supplements are mainly a description/inventory of putative CREs some of which are backed to some extent by previous transgenic studies. But given the way the authors chose to display the transgenic data in the Supplements, it is difficult to fully appreciate how well the transgenic data provide functional support. Take, for example, the Tal +40kb feature that maps to a midbrain enhancer: where exactly does +40kb map to the enhancer region? Is Tal +40kb really about 1kb long? The legend in Supplemental Figure 6 makes it difficult to interpret the bar charts; what is the meaning of: features not linked to gene -Enh? Some of the authors' claims are not readily evident or are inscrutable. For example, Tal locus features accessible in all cell groups are not evident (Fig 2A,B). Other cCREs are said to closely correlate with selector expression for example, Tal +.7kb and +40kb. However, inspection of the data seems to indicate that the two cCREs have very different dynamics and only +40kb seems to correlate with the expression track above it. Some features are described redundantly such as the Gata2 +22 kb, +25.3 kb, and +32.8 kb cCREs above and below the Gata3 cCRE. What is meant by: The feature is accessible at 3' position early, and gains accessibility at 5' positions ... Detailed feature analysis later indicated the binding of Nkx6-1 and Ascl1 that are expressed in the rV2 neuronal progenitors, at 3' positions, and binding of Insm1 and Tal1 TFs that are activated in early precursors, at 5' positions (Figure 3C).

      To allow easier assessment of how the features described in this study overlap with those from the transgenic studies, we have added further information about the scATAC features, cCREs and previously published enhancers, as well as visual schematics of the feature-enhancer overlaps, in Supplementary Table 4. The column contents of Supplementary Table 4 are now also explained in detail in the table legend (under the table). We hope these changes make the feature descriptions clearer. To answer the reviewer's question about the Tal1 +40 kb enhancer, the length of the published enhancer element is 685 bp and the length of the overlapping scATAC feature is 2067 bp (Supplementary Table 3, sheet Tal1, row 103).

      The legend and the chart labelling in the Supplementary Figure 5 (formerly Supplementary figure 6) have been elaborated, and the shown categories explained more clearly.

      Regarding the features at the Tal1 locus, the text has been revised and the references to the features accessible in all cell groups were removed. These features showed differences in the intensity of signal but were accessible in all cell groups. As the accessibility of these features does not correlate with Tal1 expression, they are of less interest in the context of this paper.

      The gain in accessibility of the +0.7 kb and +40 kb features correlates with the onset of Tal1 RNA expression. This is now more clearly stated in the text, as "For example, the gain in the accessibility of Tal1 cCREs at +0.7 and +40 kb correlated temporally with the expression of Tal1 mRNA (Figure 2B), strongly increasing in the earliest GABAergic precursors (GA1) and maintained at a lower level in the more mature GABAergic precursor groups (GA2-GA6)," (Results, page 4). The reviewer is right that the later dynamics of the +0.7 kb and +40 kb cCREs differ, and this is now stated more clearly in the text (Results, page 5, last chapter).

      The repetition in the description of the Gata2 +22 kb, +25.3 kb, and +32.8 kb cCREs has been removed.

      The Tal1 +23 kb cCRE showed within-feature differences in accessibility signal. This is explained in the text on page 5, referring to the relevant figure 2A, showing the accessibility or scATAC signal in cell groups and the features labelled below, and 3C, showing the location of the Nkx6-1 and Ascl1 binding sites in this feature: "The Tal1 +23 kb cCRE contained two scATAC-seq peaks, having temporally different patterns of accessibility. The feature is accessible at 3' position early, and gains accessibility at 5' positions concomitant with GABAergic differentiation (Figure 2A, accessibility). Detailed feature analysis later indicated that the 3' end of this feature contains binding sites of Nkx6-1 and Ascl1 that are expressed in the rV2 neuronal progenitors, while the 5' end contains TF binding sites of Insm1 and Tal1 TFs that are activated in early precursors (described below, see Figure 3C)."

      (2) Supplementary Figure 3 is not presented in the Results.

      Essential parts of the previous Supplementary Figure 3 have been incorporated into Figure 4, and the previous Supplementary Figure has been omitted.

      (3) The significance of Figure 3 and the many related supplements is difficult to understand. A large number of footprints with wide-ranging scores, many very weak or unbound, are displayed in the various temporal cell groups in different epigenomic regions of Tal1 and Vsx2. The footprints for GA1 and Ga2 are combined despite Tal1 showing stronger expression in GA1 and stronger accessibility (Figure 2). Many possibilities are outlined in the Results for how the many different kinds of motifs in the cCREs might bind particular TFs to control downstream TF expression, but no experiments are performed to test any of the possibilities. How well do the TOBIAS footprints align with C&T peaks? How was C&T used to validate footprints? Are Gata2, 3, and Vsx2 known to control Tal1 expression from perturbation experiments?

      Figure 3 and related supplements present examples of the primary data and summarise the results of the comprehensive analysis. The methods of identifying the selector TF regulatory features and the regulators are described in the Methods (Materials and Methods, page 16). Briefly, the correlation between feature accessibility and selector TF RNA expression (assessed by the LinkPeaks score and p-value) was used to select the features shown in Figure 3.

      We are aware of the differences in Tal1 expression and accessibility between GA1 and GA2. However, the number of cells in GA2 was not high enough for reliable footprint calculations, and we therefore opted to combine related groups throughout the rV2 lineage for footprinting.

      As suggested, CUT&Tag could be used to validate the footprinting results, with some restrictions. In the revised manuscript, we included an analysis of CUT&Tag peak locations and footprints, similar to an earlier study (Eastman et al. 2025). In summary, we analysed whether CUT&Tag peaks overlap locations in which footprints were also detected, and vice versa. For each TF with CUT&Tag data we calculated (a) the total number of CUT&Tag consensus peaks, (b) the total number of bound TFBS (footprints), (c) the percentage of CUT&Tag peaks overlapping bound TFBS, and (d) the percentage of bound TFBS overlapping CUT&Tag peaks. These results are shown in Supplementary Table 6 and in Supplementary Figure 11, with the analysis described in Methods (Materials and Methods, page 19). There is considerable overlap between CUT&Tag peaks and bound footprints, comparable to that shown in Eastman et al. 2025. However, these two methods are not expected to match completely, for several reasons: binding by related/redundant TFs, antigen masking in the TF complex, chromatin association without DNA binding, etc. In addition, some CUT&Tag peaks with unbound footprints could arise from non-rV2 cells that were part of the bulk CUT&Tag analysis but not of the scATAC footprint analysis.
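
      The four summary statistics (a) to (d) described above can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the interval representation (chromosome, start, end; half-open), the function names and the toy coordinates are all assumptions made for illustration, and a real analysis would typically rely on a dedicated genomic-interval library.

```python
from typing import Dict, List, Tuple

# Hypothetical interval type: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def percent_overlapping(query: List[Interval], target: List[Interval]) -> float:
    """Percentage of query intervals overlapping at least one target interval."""
    if not query:
        return 0.0
    hits = sum(1 for q in query if any(overlaps(q, t) for t in target))
    return 100.0 * hits / len(query)


def overlap_summary(peaks: List[Interval], footprints: List[Interval]) -> Dict[str, float]:
    """Compute the four per-TF statistics (a)-(d) for one transcription factor."""
    return {
        "n_peaks": len(peaks),                                              # (a)
        "n_bound_tfbs": len(footprints),                                    # (b)
        "pct_peaks_with_bound_tfbs": percent_overlapping(peaks, footprints),  # (c)
        "pct_bound_tfbs_in_peaks": percent_overlapping(footprints, peaks),    # (d)
    }


# Toy data (invented coordinates, for illustration only).
peaks = [("chr4", 100, 400), ("chr4", 1000, 1300), ("chr6", 50, 250)]
footprints = [
    ("chr4", 150, 170),
    ("chr4", 380, 395),
    ("chr4", 2000, 2020),
    ("chr6", 300, 320),
]
summary = overlap_summary(peaks, footprints)
print(summary)
```

      The two percentages are deliberately asymmetric: (c) is normalised by the number of peaks and (d) by the number of bound footprints, so they need not agree when several footprints fall within a single peak.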

      The evidence for cross-regulation of selector genes and the regulation of Tal1 by Gata2, Gata3 and Vsx2 is now discussed (Discussion, chapter Selector TFs directly autoregulate themselves and cross-regulate each other, page 12-13). The regulation of Tal1 expression by Vsx2 has, to our knowledge, not been earlier studied.

      (4) Figure 4 findings are problematic as the primary images seem uninterpretable and unconvincing in supporting the authors' claims. There is a lack of clear evidence in support of TF coexpression and that their expression precedes Tal1.

      Figure 4 has been entirely redrawn with higher resolution images and a more logical layout. In the revised Figure 4, only the most relevant ISH images are shown and arrowheads are added showing the colocalization of the mRNA in the cell cytoplasm. Next to the plots of RNA expression along the apical-basal axis of r1, an explanatory image of the quantification process is added (Figure 4D).

      (5) What was gained from also performing ChromVAR other than finding more potential regulators and do the results of the two kinds of analyses corroborate one another? What is a dual GATA:TAL BS?

      Our motivation for the ChromVAR analysis is now more clearly stated in the text (Results, page 9): "In addition to the regulatory elements of GABAergic fate selectors, we wanted to understand the genome-wide TF activity during rV2 neuron differentiation. To this aim we applied ChromVAR (Schep et al., 2017)". Further explanation about the Tal1 and Gata binding sites has also been added in this chapter (Results, page 9).

      The dual GATA:Tal BS (TAL1.H12CORE.0.P.B) is a 19-bp motif that consists of an E-box and a GATA sequence. It is likely bound by a heteromeric Gata2-Tal1 TF complex, but may also be bound by Gata2, Gata3 or Tal1 TFs separately. The other TFBSs of Tal1 contain a strong E-box motif and showed either a lower activity (TAL1.H12CORE.1.P.B) or an earlier peak of activity in common precursors with a decline after differentiation (TAL1.H12CORE.2.P.B) (Results, page 9).

      (6) The way the data are displayed it is difficult to see how the C&T confirmed the binding of Ebf1 and Insm1, Tal1, Gata2, and Gata3 (Supplementary Figures 9-11). Are there strong footprints (scores) centered at these peaks? One can't assess this with the way the displays are organized in Figure 3. What is the importance of the H3K4me3 C&T? Replicate consistency, while very strong for some TFs, seems low for other TFs, e.g. Vsx2 C&T on Tal1 and Gata2. The overlaps do not appear very strong in Supplementary Figure 10. Panels are not letter labeled.

      We have added an analysis of footprint locations within the CUT&Tag peaks (Supplementary Figure 11). The Figure shows that the footprints are enriched at the middle regions of the CUT&Tag peaks, which is expected if TF binding at the footprinted TFBS site was causative for the CUT&Tag peaks.

      The aim of the Supplementary Figures 9-11 (Supplementary Figures 8-10 in the revised manuscript) was to show the quality and replicability of the CUT&Tag.

      The anti-H3K4me3 antibody, as well as the anti-IgG antibody, was used in CUT&Tag as part of experiment technical controls. A strong CUT&Tag signal was detected in all our CUT&Tag experiments with H3K4me3. The H3K4me3 signal was not used in downstream analyses.

      We have now labelled the H3K4me3 data more clearly as "positive controls" in the Supplementary Figure 8. The control samples are shown only on Supplementary Figure 8 and not in the revised Supplementary Figure 10, to avoid repetition. The corresponding figure legends have been modified accordingly.

      To show replicate consistency, the genome view showing the Vsx2 CUT&Tag signal at Gata2 gene has been replaced by a more representative region (Supplementary Figure 8, Vsx2). The Vsx2 CUT&Tag signal at the Gata2 locus is weak, explaining why the replicability may have seemed low based on that example.

      Panel labelling is added on Supplementary Figures S8, S9, S10.  

      (7) It would be illuminating to present 1-2 detailed examples of specific target genes fulfilling the multiple criteria outlined in Methods and Figure 6A.

      We now present examples of the supporting evidence used in the definition of selector gene target features and target genes. The new Supplementary Figure 12 shows an example gene Lmo1 that was identified as a target gene of Tal1, Gata2 and Gata3.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors perform CUT&Tag to ask whether Tal1 and other TFs indeed bind putative CREs computed. However, it is unclear whether some of the antibodies (such as Gata3, Vsx2, Insm1, Tead2, Ebf1) used are knock-out validated for CUT&Tag or a similar type of assay such as ChIP-seq and therefore whether the peaks called are specific. The authors should either provide specificity data for these or a reference that has these data. The Vsx2 signal in Figure S9 looks particularly unconvincing.

      Information about the target specificity of the antibodies can be found in previous studies or in the product information. The references to the studies have now been added in the Methods (Materials and Methods, CUT&Tag, pages 18-19). Some of the antibodies are indeed not yet validated for ChIP-seq, Cut-and-Run or CUT&Tag. This is now clearly stated in the Materials and Methods (page 19): "The anti-Ebf1, anti-Tal1, anti-IgG and anti-H3K4me3 antibodies were tested on Cut-and-Run or ChIP-seq previously (Boller et al., 2016b; Courtial et al., 2012; and Cell Signalling product information). The anti-Gata2 and anti-Gata3 antibodies are ChIP-validated (Ahluwalia et al., 2020a, and Abcam product information). There are no previous results on ChIP, ChIP-seq or CUT&Tag with the anti-Insm1, anti-Tead2 and anti-Vsx2 antibodies used here. The specificity and nuclear localization have been demonstrated in immunohistochemistry with anti-Vsx2 (Ahluwalia et al., 2020b) and anti-Tead2 (Biorbyt product information). We observed good correlation between replicates with anti-Insm1, similar to all antibodies used here, but its specificity to target was not specifically tested". We admit that specificity testing with knockout samples would increase confidence in our data. However, we have observed robust signals and good replicability in the CUT&Tag for the antibodies shown here.

      The Vsx2 CUT&Tag signal at the loci previously shown in Supplementary Figure S9 (now Supplementary Figure 8) is weak, explaining why the replicability may seem low based on those examples. The genome view showing the Vsx2 CUT&Tag signal at the Gata2 gene locus in Supplementary Figure 8 (previously Supplementary Figure 9) has now been replaced by a view of the Vsx2 locus that is more representative of the signal.

      (2) It is unclear why the authors chose to focus on the transcription factor genes described in line 626 as opposed to the many other putative TFs described in Figure 3/Supplementary Figure 8. This is the major challenge of the paper - the authors are trying to tell a very targeted story but they show a lot of different names of TFs and it is hard to follow which are most important.

      We agree with the reviewer that the process of selection of the genes of interest is not always transparent. We are aware that interpretations of a paper are based on the known functions of the putative regulatory TFs, however additional aspects of regulation could be revealed even if the biological functions of all the TFs were known. This is now stated in the Discussion “Caveats of the study” chapter. It would be relevant to study all identified candidate genes, but as often is the case, our possibilities were limited by the availability of materials (probes, antibodies), time, and financial resources. In the revised manuscript, we now briefly describe the biological processes related to the selected candidate regulatory TFs of the Tal1 gene (Results, page 8, "Pattern of expression of the putative regulators of Tal1 in the r1"). We hope this justifies the focus on them in our RNA co-expression analysis. The TFs analysed by RNAscope ISH are examples, which demonstrate alignment of the tissue expression patterns with the scRNA-seq data, suggesting that the dynamics of gene expression detected by scRNA-seq generally reflects the pattern of expression in the developing brainstem.

      (3) How is the RNA expression level in Figure 5B and 4D-L computed? These are the clusters defined by scATAC-seq. Is this an inferred RNA expression? This should be made more clear in the text.

      The charts in Figures 5B and 4G,H,I show inferred RNA expression. The Y-axis labels have now been corrected and include the term 'inferred'. RNA expression in the scATAC-seq cell clusters is inferred from the scRNA-seq cells after the integration of the datasets.

      (4) The convergence of the GABA TFs on a common set of target genes reminds me of a nice study from the Rubenstein lab PMID: 34921112 that looked at a set of TFs in cortical progenitors. This might be a good comparison study for the authors to use as a model to discuss the convergence data.

      We thank the reviewer for bringing this article to our attention. The article is now discussed in the manuscript (Discussion, page 11).

      (5) The data in Figure 4, the in-situ figure, needs significant work. First, the images especially B, F, and J appear to be of quite low resolution, so they are hard to see. It is unclear exactly what is being graphed in C, G, and K and it does not seem to match the text of the results section. Perhaps better labeling of the figure and a more thorough description will make it clear. It is not clear how D, H, and L were supposed to relate to the images - presumably, this is a case where cell type is spatially organized, but this was unclear in the text if this is known and it needs to be more clearly described. Overall, as currently presented this figure does not support the descriptions and conclusions in the text.

      Figure 4 has been entirely redrawn with higher resolution images and a more logical layout. In the revised Figure 4, the ISH data and the quantification plots are better presented, arrows showing the colocalization of the mRNA in the cell cytoplasm have been added, and an explanatory image of the quantification process has been added in (D).

      Minor points

      (1) Helpful if the authors include scATAC-seq coverage plots for neuronal subtype markers in Figure 1/S1.

      We are unfortunately uncertain what is meant by this request. Subtype markers in the Figure 1/S1 scATAC-seq-based clusters are shown from inferred RNA expression, and therefore these marker expression plots do not have any coverage information available.

      (2) The authors in line 429 mention the testing of features within TADs. They should make it clear in the main text (although tadmap is mentioned in the methods) that this is a prediction made by aggregating HiC datasets.

      Good point; this detail has been added on both page 3 and page 16.

      (3) The authors should include a table with the phastcons output described between lines 511 and 521 in the main or supplementary figures.

      We have now clarified in the text that we did not recalculate any phastCons results; we merely used the already published and available per-nucleotide conservation scores as provided by the original authors (Siepel et al. 2005). (Results, page 5: the revised text is "To that aim, we used nucleotide conservation scores from UCSC (Siepel et al., 2005). We overlaid conservation information and scATAC-seq features to both validate feature definition as well as to provide corroborating evidence to recognize cCRE elements.")

      (4) It is very difficult to read the names of the transcription factor genes described in Figure 3B-D and Supplementary Figure 8 - it would be helpful to resize the text.

      The Figures 3B-D and Supplementary Figure 7 (former Supplementary figure 8) have been modified, removing unnecessary elements and increasing the size of text.

      (5) It is unclear what strain of mouse is used in the study - this should be mentioned in the methods.

      The outbred NMRI mouse strain was used in this study. Information about the mouse strain has been added in Materials and Methods: scRNA-seq samples (page 14), scATAC-seq samples (page 15), RNAscope in situ hybridization (page 17) and CUT&Tag (page 18).

      (6) Text size in Figure 6 should be larger. R-T could be moved to a Supplementary Figure.

      Figure 6 has been revised, making the charts clearer and the chart labels larger. Figures 6R-S have been replaced by Supplementary Table 8, and Figure 6T is now shown as a new figure (Figure 7).

      Additional corrections in figures

      Figure 6 D,I,N had the wrong y-axis scale. It has been corrected, although this does not affect the interpretation of the data, as the Pos.link and Neg.link counts were compared to each other (as a ratio).

      On Figure 2B, the heatmap labels were shifted making it difficult to identify the feature name per row. This is now corrected.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer 1 (Public Review):

      Many thanks for the positive and constructive feedback on the manuscript.

      This study reveals a great deal about how certain neural representations are altered by expectation and learning on shorter and longer timescales, so I am loath to describe certain limitations as 'weaknesses'. But one limitation inherent in this experimental design is that, by focusing on implicit, task-irrelevant predictions, there is not much opportunity to connect the predictive influences seen at the neural level to the perceptual performance itself (e.g., how participants make perceptual decisions about expected or unexpected events, or how these events are detected or appear).

      Thank you for the interesting comment. We now discuss the limitation of task-irrelevant prediction. In brief, some studies which showed sharpening found that task demands were relevant, while some studies which showed dampening were based on task-irrelevant predictions, but it is unlikely that task relevance - which was not manipulated in the current study - would explain the switch between sharpening and dampening that we observe within and across trials.

      The behavioural data that is displayed (from a post-recording behavioural session) shows that these predictions do influence perceptual choice - leading to faster reaction times when expectations are valid. In broad strokes, we may think that such a result is broadly consistent with a 'sharpening' view of perceptual prediction, and the fact that sharpening effects are found in the study to be larger at the end of the task than at the beginning. But it strikes me that the strongest test of the relevance of these (very interesting) EEG findings would be some evidence that the neural effects relate to behavioural influences (e.g., are participants actually more behaviourally sensitive to invalid signals in earlier phases of the experiment, given that this is where the neural effects show the most 'dampening' a.k.a., prediction error advantage?)

      Thank you for the suggestion. We calculated Pearson’s correlation coefficients for behavioural responses (difference in mean reaction times), neural responses during the sharpening effect (difference in decoding accuracy), and neural responses during the dampening effect for each participant, which resulted in null findings.
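
      For readers who want to see the shape of such a brain-behaviour check, a minimal Python sketch is given below. It is an illustration only, not the analysis code used in the study: the function name, the per-participant difference scores and all numbers are invented assumptions, and a significance test would normally be obtained from a statistics package.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-participant difference scores (one value per participant).
# rt_benefit: mean RT (invalid) minus mean RT (valid), in ms.
# decoding_benefit: decoding accuracy (valid) minus decoding accuracy (invalid).
rt_benefit = [12.0, 30.5, -4.2, 18.3, 25.1, 7.7]
decoding_benefit = [0.010, -0.004, 0.021, 0.003, 0.015, -0.008]

r = pearson_r(rt_benefit, decoding_benefit)
print(f"Pearson r across participants: {r:.3f}")
```

      In this sketch each participant contributes one behavioural and one neural difference score, and the correlation is computed across participants separately for each neural effect.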

      Reviewer 2 (Public Review):

      Thank you for your helpful and constructive comments on the manuscript.

      The strength in controlling for repetition effects by introducing a neutral (50% expectation) condition also adds a weakness to the current version of the manuscript, as this neutral condition is not integrated into the behavioral (reaction times) and EEG (ERP and decoding) analyses. This procedure remained unclear to me. The reported results would be strengthened by showing differences between the neutral and expected (valid) conditions on the behavioral and neural levels. This would also provide a more rigorous check that participants had implicitly learned the associations between the picture category pairings.

      Following the reviewer's suggestion, we have included the neutral condition in the behavioural analysis and performed a repeated measures ANOVA on all three conditions.
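
      As a hedged illustration of what a one-way repeated measures ANOVA over three conditions computes, the sketch below partitions the variance into condition, participant and residual terms. It is not the authors' analysis code: the column order (valid, neutral, invalid), the function name and the toy reaction times are assumptions, and the p-value (from the F distribution) and any sphericity correction are left to a statistics package.

```python
from typing import List, Tuple


def rm_anova_one_way(data: List[List[float]]) -> Tuple[float, int, int]:
    """One-way repeated measures ANOVA.

    data[i][j] is the score of participant i in condition j.
    Returns (F, df_condition, df_error).
    """
    n = len(data)      # number of participants
    k = len(data[0])   # number of conditions
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete n x k table with n >= 2 and k >= 2")

    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj  # condition x participant residual

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    if ss_error <= 0:
        raise ValueError("residual variance is zero; F is undefined")
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


# Hypothetical mean reaction times (ms); columns: valid, neutral, invalid.
rts = [
    [512.0, 530.0, 541.0],
    [478.0, 490.0, 505.0],
    [601.0, 598.0, 620.0],
    [550.0, 561.0, 559.0],
]
f_value, df1, df2 = rm_anova_one_way(rts)
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

      Removing the between-participant variance from the error term is what distinguishes this from a between-subjects ANOVA on the same numbers.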

      It is not entirely clear to me what is actually decoded in the prediction condition and why the authors did not perform decoding over trial bins in prediction decoding as potential differences across time could be hidden by averaging the data. The manuscript would generally benefit from a more detailed description of the analysis rationale and methods.

      In the original version of the manuscript, prediction decoding aimed to test whether the upcoming stimulus category can be decoded from the response to the preceding (leading) stimulus. However, in response to the other Reviewers’ comments we have decided to remove the prediction decoding analysis from the revised manuscript, as it is now apparent that prediction decoding cannot be separated from category decoding based on pixel information.

      Finally, the scope of this study should be limited to expectation suppression in visual perception, as the generalization of these results to other sensory modalities or to the action domain remains open for future research.

      We have clarified the scope of the study in the revised manuscript.

      Reviewer 3 (Public Review):

      Thank you for the thought-provoking and interesting comments and suggestions.

      (1) The results in Figure 2C seem to show that the leading image itself can only be decoded with ~33% accuracy (25% chance; i.e. ~8% above chance decoding). In contrast, Figure 2E suggests the prediction (surprisingly, valid or invalid) during the leading image presentation can be decoded with ~62% accuracy (50% chance; i.e. ~12% above chance decoding). Unless I am misinterpreting the analyses, it seems implausible to me that a prediction, but not actually shown image, can be better decoded using EEG than an image that is presented on-screen.

      Following this and the remaining comments by the Reviewer (see below), we have decided to remove the prediction analysis from the manuscript. Specifically, we have focused on the Reviewer’s concern that it is implausible that a predicted image would be better decoded than an image that is presented on-screen. This led us to perform a control analysis, in which we tried to decode the leading image category based on pixel values alone (rather than on EEG responses). Since this decoding was above chance, we could not rule out the possibility that EEG responses to leading images reflect physical differences between image categories. This issue does not extend to trailing images, as the results of the decoding analysis based on trailing images are based on accuracy comparisons between valid and invalid trials, and thus image features are counterbalanced. We would like to thank the Reviewer for raising this issue.

      (2) The "prediction decoding" analysis is described by the authors as "decoding the predictable trailing images based on the leading images". How this was done is however unclear to me. For each leading image decoding the predictable trailing images should be equivalent to decoding validity (as there were only 2 possible trailing image categories: 1 valid, 1 invalid). How is it then possible that the analysis is performed separately for valid and invalid trials? If the authors simply decode which leading image category was shown, but combine L1+L2 and L4+L5 into one class respectively, the resulting decoder would in my opinion not decode prediction, but instead dissociate the representation of L1+L2 from L4+L5, which may also explain why the time-course of the prediction peaks during the leading image stimulus-response, which is rather different compared to previous studies decoding predictions (e.g. Kok et al. 2017). Instead for the prediction analysis to be informative about the prediction, the decoder ought to decode the representation of the trailing image during the leading image and inter-stimulus interval. Therefore I am at present not convinced that the utilized analysis approach is informative about predictions.

      In this analysis, we attempted to decode (from the response to leading images) which trailing categories ought to be presented. The analysis was split between trials where the expected category was indeed presented (valid) vs. those in which it was not (invalid). The separation of valid vs. invalid trials in the prediction decoding analysis served as a sanity check, as no information about trial validity was yet available to participants. However, as mentioned above, we have decided to remove the “prediction decoding” analysis based on leading images as we cannot disentangle prediction decoding from category decoding.

      (3) I may be misunderstanding the reported statistics or analyses, but it seems unlikely that >10 of the reported contrasts have the exact same statistic of Tmax = 2.76. Similarly, it seems implausible, based on visual inspection of Figure 2, that the Tmax for the invalid condition decoding (reported as Tmax = 14.903) is substantially larger than for the valid condition decoding (reported as Tmax = 2.76), even though the valid condition appears to have superior peak decoding performance. Combined these details may raise concerns about the reliability of the reported statistics.

      Thank you for bringing this to our attention. This copy error has now been rectified.

      (4) The reported analyses and results do not seem to support the conclusion of early learning resulting in dampening and later stages in sharpening. Specifically, the authors appear to base this conclusion on the absence of a decoding effect in some time-bins, while in my opinion a contrast between time-bins, showing a difference in decoding accuracy, is required. Or better yet, a non-zero slope of decoding accuracy over time should be shown (not contingent on post-hoc and seemingly arbitrary binning).

      Thank you for the helpful suggestion. To address this issue, we performed an additional analysis: we calculated the trial-by-trial time series of the decoding accuracy benefit for valid vs. invalid trials for each participant and averaged this benefit across time points for each of the two significant time windows. Based on this, we fitted a logarithmic model to quantify the change of this benefit over trials, then found the trial index at which the change of the logarithmic fit was < 0.1% (i.e., accuracy had stabilized). Given the results of this analysis and to ensure a sufficient number of trials, we focussed our further analyses on bins 1-2 to directly assess the effects of learning. This is explained in more detail in the revised manuscript.
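
      The logic of this stabilization criterion can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a model of the form benefit(t) = a + b ln(t) fitted by ordinary least squares, interprets "change < 0.1%" as an absolute change of less than 0.001 (accuracy expressed as a proportion) between consecutive trials, and uses hypothetical function names and noise-free toy data.

```python
import math
from typing import List, Optional, Tuple


def fit_log_model(benefit: List[float]) -> Tuple[float, float]:
    """Least-squares fit of benefit(t) = a + b * ln(t) for trials t = 1..N."""
    n = len(benefit)
    if n < 2:
        raise ValueError("need at least two trials to fit the model")
    x = [math.log(t) for t in range(1, n + 1)]
    mean_x = sum(x) / n
    mean_y = sum(benefit) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, benefit))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def stabilisation_trial(b: float, n_trials: int, threshold: float = 0.001) -> Optional[int]:
    """First trial t whose fitted change from trial t-1 falls below threshold.

    For a logarithmic model the trial-to-trial change is b * ln(t / (t - 1)),
    which shrinks monotonically as t grows. Returns None if the criterion is
    never met within n_trials.
    """
    for t in range(2, n_trials + 1):
        change = abs(b) * math.log(t / (t - 1))
        if change < threshold:
            return t
    return None


# Toy data: a noise-free logarithmic learning curve over 200 trials
# (assumed values; accuracy benefit expressed as a proportion).
trials = range(1, 201)
benefit = [0.02 + 0.05 * math.log(t) for t in trials]

a_hat, b_hat = fit_log_model(benefit)
t_stable = stabilisation_trial(b_hat, n_trials=len(benefit))
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, stabilised at trial {t_stable}")
```

      Because the slope of a logarithmic curve decays as 1/t, the chosen threshold directly determines the stabilization trial, so the cut-off should be reported alongside the result.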

      (5) The present results both within and across trials are difficult to reconcile with previous studies using MEG (Kok et al., 2017; Han et al., 2019), single-unit and multi-unit recordings (Kumar et al., 2017; Meyer & Olson 2011), as well as fMRI (Richter et al., 2018), which investigated similar questions but yielded different results; i.e., no reversal within or across trials, as well as dampening effects after more training. The authors do not provide a convincing explanation as to why their results should differ from previous studies, arguably further compounding doubts about the present results raised by the methods and results concerns noted above.

      The discussion of these findings has been expanded in the revised manuscript. In short, the experimental design of the above studies did not allow for an assessment of these effects prior to learning. Several of them also used repeated stimuli (albeit some studies changed the pairings of stimuli between trials), potentially allowing for RS to confound their results.

      Recommendations for the Authors:

      Reviewer 1 (Recommendations for the authors):

      (1) On a first read, I was initially very confused by the statement on p.7 that each stimulus was only presented once - as I couldn't then work out how expectations were supposed to be learned! It became clear after reading the Methods that expectations are formed at the level of stimulus category (so categories are repeated multiple times even if exemplars are not). I suspect other readers could have a similar confusion, so it would be helpful if the description of the task in the 'Results' section (e.g., around p.7) was more explicit about the way that expectations were generated, and the (very large) stimulus set that examples are being drawn from.

      Following your suggestion, we have clarified the paradigm by adding details about the categories and the manner in which expectations are formed.

      (2) p.23: the authors write that their 1D decoding images were "subjected to statistical inference amounting to a paired t-test between valid and invalid categories". What is meant by 'amounting to' here? Was it a paired t-test or something statistically equivalent? If so, I would just say 'subjected to a paired t-test' to avoid any confusion, or explain explicitly what the statistical inference was done over.

      We have rephrased this as “subjected to (1) a one-sample t-test against chance-level, equivalent to a fixed-effects analysis, and (2) a paired t-test”.

      Relatedly, this description of an analysis amounting to a 'paired t-test' only seems relevant for the sensory decoding and memory decoding analyses (where there are validity effects) rather than the prediction decoding analysis. As far as I can tell the important thing is that the expected image category can be decoded, not that it can be decoded better or worse on valid or invalid trials.

      In the previous version of the manuscript, the comparison of prediction decoding between valid and invalid trials was meant as a sanity check. However, in response to the other Reviewers’ comments we have decided to remove the prediction decoding analysis from the revised manuscript due to confounds.

      It would be helpful if authors could say a bit more about how the statistical inferences were done for the prediction decoding analyses and the 'condition against baseline' contrasts (e.g., when it is stated that decoding accuracy in valid trials *in general* is above 0 at some cluster-wise corrected value). My guess is that this amounts to something like a one-sample t-test - but it may be worth noting that one-sample t-tests on information measures like decoding accuracy cannot support population-level inference, because these measures cannot meaningfully be below 0 (see Allefeld et al., 2016).

      When testing for decoding accuracy against baseline, we used one-sample t-tests against chance level (rather than against 0) throughout the manuscript. We now clarify in the manuscript that this corresponds to a fixed-effects analysis (Allefeld et al., 2016). In contrast, when testing for differences in decoding accuracy between valid and invalid conditions, we used paired-sample t-tests. As mentioned above, the prediction decoding analysis has been removed from the analysis.
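      The two tests described above can be sketched as follows. This is a minimal illustration with hypothetical per-participant accuracies, not the manuscript's analysis, which additionally applies cluster-based FWE correction across time points (not shown here). The function names are assumptions introduced for the example.

```python
import math
import statistics


def one_sample_t(values, mu):
    """One-sample t statistic for the mean of `values` against `mu`.

    For decoding accuracy, `mu` is the chance level (e.g. 0.5 for two
    classes), not 0. Degrees of freedom are len(values) - 1.
    """
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu) / standard_error


def paired_t(a, b):
    """Paired t statistic: a one-sample test of the differences against 0."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    differences = [x - y for x, y in zip(a, b)]
    return one_sample_t(differences, 0.0)


# Hypothetical accuracies for three participants (illustration only).
valid = [0.6, 0.7, 0.8]
invalid = [0.5, 0.65, 0.7]

t_valid_vs_chance = one_sample_t(valid, 0.5)   # condition against chance
t_valid_vs_invalid = paired_t(valid, invalid)  # valid vs. invalid contrast
```

      The paired test is simply the one-sample test applied to within-participant differences, which is why the two contrasts can share one implementation while answering different questions.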

      (3) By design, the researchers focus on implicit predictive learning, which means the expectations being formed are (by definition) task-irrelevant. I thought it could be interesting if the authors might speculate in the discussion on how they think their results may or may not differ when predictions are deployed in task-relevant scenarios - particularly given that some studies have found sharpening effects do not seem to depend on task demands (e.g., Kok et al., 2012; Yon et al., 2018) while other studies have found that some dampening effects do seem to depend on what the observer is attending to (e.g., Richter et al., 2018). Do these results hint at a possible explanation for why this might be? Even if the authors think they don't, it might be helpful to say so!

      Thank you for the interesting comment. We have expanded on this in the revised manuscript.

      Reviewer 2 (Recommendations for the authors):

      Methods/results

      (1) The goal of this study is the assessment of expectation effects during statistical learning while controlling for repetition effects, one of the common confounds in prediction suppression studies (see, Feuerriegel et al., 2021). I agree that this is an important aspect and I assume that this was the reason why the authors introduced the P=0.5 neutral condition (Figure 1B, L3). However, I completely missed the analyses of this condition in the manuscript. In the figure caption of Figure 1C, it is stated that the reaction times of the valid, invalid, and neutral conditions are shown, but only data from the valid and invalid conditions are depicted. To ensure that participants had built up expectations and had learned the pairing, one would not only expect a difference between the valid and invalid conditions but also between the valid and neutral conditions. Moreover, it would also be important to integrate the neutral condition in the multivariate EEG analysis to actually control for repetition effects. Instead, the authors constructed another control condition based on the arbitrary pairings. But why was the neutral condition not compared to the valid and invalid prediction decoding results? Besides this, I also suggest calculating the ERP for the neutral condition and adding it to Figure 2A to provide a more complete picture.

      As mentioned above, we have included the neutral condition in the behavioural analysis, as outlined in the revised manuscript. We have also included a repeated measures ANOVA on all 3 conditions. The purpose of the neutral condition was not to avoid RS, but rather to provide a control condition. We avoided repetition by using individual, categorised stimuli. Figure 1C has been amended to include the neutral condition. In response to the remaining comments, we have decided to remove the prediction decoding analysis from the manuscript.

      (2) One of the main results that is taken as evidence for the OPT is that there is higher decoding accuracy for valid trials (indicating sharpening) early in the trial and higher decoding accuracy for invalid trials (indicating dampening) later in the trial. I would have expected this result for prediction decoding, which surprisingly showed neither of the two effects. Instead, the result pattern occurred in sensory decoding only, and partly (early sharpening) in memory decoding. How do the authors explain these results? Additionally, I would have expected similar results in the ERP; however, only the early effect was observed. I missed a more thorough discussion of this rather complex result pattern. The lack of the opposing effect in prediction decoding limits the overall conclusion, which needs to be revised accordingly.

      Since sharpening vs. dampening rests on the comparison between valid and invalid trials, evidence for sharpening vs. dampening could only be obtained from decoding based on responses to trailing images. In prediction decoding (removed from the current version), information about the validity of the trial is not yet available. Thus, our original plan was to compare this analysis with the effects of validity on the decoding of trailing images (i.e. we expected valid trials to be decoded more accurately after the trailing image than before). The results of the memory decoding did mirror the sensory decoding of the trailing image in that we found significantly higher decoding accuracy of the valid trials from 123-180 ms. As with the sensory decoding, there was a tendency towards a later flip (280-296 ms) where decoding accuracy of invalid trials became nominally higher, but this effect did not reach statistical significance in the memory decoding.

      (3) To increase the comprehensibility of the result pattern, it would be helpful for the reader to clearly state the hypotheses for the ERP and multivariate EEG analyses. What did you expect for the separate decoding analyses? How should the results of different decoding analyses differ and why? Which result pattern would (partly, or not) support the OPT?

      Our hypotheses are now stated in the revised manuscript.

      (4) I was wondering why the authors did not test for changes during learning for prediction decoding. Despite the fact that there were no significant differences between valid and invalid conditions within-trial, differences could still emerge when the data set is separated into bins. Please test and report the results.

      As mentioned above, we have decided to remove the prediction decoding analysis from the current version of the manuscript.

      (5) To assess the effect of learning the authors write: 'Given the apparent consistency of bins 2-4, we focused our analyses on bins 1-2.' Please explain what you mean by 'apparent consistency'. Did you test for consistency or is it based on descriptive results? Why do the authors not provide the complete picture and perform the analyses for all bins? This would allow for a better assessment of changes over time between valid and invalid conditions. In Figure 3, were valid and invalid trials different in any of the QT3 or QT4 bins in sensory or memory encoding?

      We have performed an additional analysis to address this issue. The reasoning behind the decision to focus on bins 1-2 is now explained in the revised manuscript. In short, fitting a learning curve to trial-by-trial decoding estimates indicates that decoding stabilizes within <50% of the trials. To quantify changes in decoding occurring within these <50% of the trials while ensuring a sufficient number of trials for statistical comparisons, we decided to focus on bins 1-2 only.
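      To make the link between a stabilisation trial and the choice of bins concrete, the following sketch maps a trial index onto equal-width bins. It assumes four equally sized bins over the session, consistent with the bins referred to above. The total trial count and stabilisation index used in the example are hypothetical, and the helper names are introduced only for illustration.

```python
def bin_of_trial(trial, n_trials, n_bins=4):
    """Return the 1-based bin that contains a 1-based trial index.

    Bins are assumed to be equal-width partitions of the session.
    """
    if not 1 <= trial <= n_trials:
        raise ValueError("trial index out of range")
    return (trial - 1) * n_bins // n_trials + 1


def bins_up_to_stabilization(stable_trial, n_trials, n_bins=4):
    """Bins that cover all trials up to and including the stabilisation trial."""
    last_bin = bin_of_trial(stable_trial, n_trials, n_bins)
    return list(range(1, last_bin + 1))


# Hypothetical session of 1000 trials in which the fit stabilises at trial 480.
selected_bins = bins_up_to_stabilization(480, 1000)
```

      Under these assumed numbers, a stabilisation point anywhere in the first half of the session selects the first two bins, which is the logic behind restricting the learning analysis to bins 1-2.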

      (6) Please provide the effect size for all statistical tests.

      Effect sizes have now been provided.

      (7) Please provide exact p-values for non-significant results and significant results larger than 0.001.

      Exact p-values have now been provided.

      (8) Decoding analyses: I suppose there is a copy/paste error in the T-values as nearly all T-values on pages 11 and 12 are identical (2.76) leading to highly significant p-values (0.001) as well as non-significant effects (>0.05). Please check.

      Thank you for bringing this to our attention. This error has now been corrected.

      (9) Page 12: There were some misleading phrases in the result section. To give one example: 'control analyses was slightly above change' - this sounds like a close to non-significant effect, but it was indeed a highly significant effect of p<0.001. Please revise.

      This phrase was part of the prediction decoding analysis and has therefore been removed.

      (10) Sample size: How was the sample size of the study determined (N=31)? Why did only a subgroup of participants perform the behavioral categorization task after the EEG recording? With a larger sample, it would have been interesting to test if participants who showed better learning (larger difference in reaction times between valid and invalid conditions) also showed higher decoding accuracies.

      This has been clarified in the revised manuscript. In short, the larger sample size of N=31 was based on previous research; ten participants were initially tested as part of a pilot which was then expanded to include the categorisation task.

      (11) I assume catch trials were removed before data analyses?

      We have clarified that catch trials were indeed removed prior to analyses.

      (12) Page 23, 1st line: 'In each, the decoder...' Something is missing here.

      Thank you for bringing this to our attention. This sentence has now been rephrased as “In both valid and invalid analyses” in the revised manuscript.

      Discussion

      (1) The analysis over multiple trials showed dampening within the first 15 min followed by sharpening. I found the discussion of this finding very lengthy and speculative (page 17). I recommend shortening this part and providing only the main arguments that could stimulate future research.

      Thank you for the suggestion. Since Reviewer 3 has requested additional details in this part of the discussion, we have opted to keep this paragraph in the manuscript. However, we have also made it clearer that this section is relatively speculative and the arguments provided for the across trials dynamics are meant to stimulate further research.

      (2) As this task is purely perceptual, the results support the OPT for the area of visual perception. For action, different results have been reported. Suppression within-trial has been shown to be larger for expected than unexpected features of action targets and suppression even starts before the start of the movement without showing any evidence for sharpening (e.g., Fuehrer et al., 2022, PNAS). For suppression across trials, it has been found that suppression decreases over the course of learning to associate a sensory consequence to a specific action (e.g., Kilteni et al., 2019, eLife). Therefore, expectation suppression might function differently in perception and action (an area that still requires further research). Please clarify the scope of your study and results on perceptual expectations in the introduction, discussion, and abstract.

      We have clarified the scope of the study in the revised manuscript.

      Figures

      (1) Figure 1A: Add 't' to the arrow to indicate time.

      This has been rectified.

      (2) Figure 3: In the figure caption, sensory and memory decoding seem to be mixed up. Please correct. Please add what the dashed horizontal line indicates.

      Thank you for bringing this to our attention. This has been rectified.

      Reviewer 3 (Recommendations for the authors):

      I applaud the authors for a well-written introduction and an excellent summary of a complicated topic, giving fair treatment to the different accounts proposed in the literature. However, I believe a few additional studies should be cited in the Introduction, particularly time-resolved studies such as Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011. This would provide the reader with a broader picture of the current state of the literature, as well as point the reader to critical time-resolved studies that did not find evidence in support of OPT, which are important to consider in the interpretation of the present results.

      The introduction has been expanded to include the aforementioned studies in the revised manuscript.

      Given previous neuroimaging studies investigating the present phenomenon, including with time-resolved measures (e.g. Kok et al., 2017; Han et al., 2019; Kumar et al., 2017; Meyer & Olson 2011), why do the authors think that their data, design, or analysis allowed them to find support for OPT but not previous studies? I do not see obvious modifications to the paradigm, data quantity or quality, or the analyses that would suggest a superior ability to test OPT predictions compared to previous studies. Given concerns regarding the data analyses (see points below), I think it is essential to convincingly answer this question to convince the reader to trust the present results.

      The most obvious alteration to the paradigm is the use of non-repeated stimuli. Each of the above time-resolved studies utilised repeated stimuli (either repeated, identical stimuli, or paired stimuli where pairings are changed but the pool of stimuli remains the same), allowing for RS to act as a confound as exemplars are still presented multiple times. By removing this confound, it is entirely plausible that we may find different time-resolved results given that it has been shown that RS and ES are separable in time (Todorovic & de Lange, 2012). We also test during learning rather than training participants on the task beforehand. By foregoing a training session, we are better equipped to assess OPT predictions as they emerge. In our across-trial results, learning appears to take place after approximately 15 minutes or 432 trials, at which point dampening reverses to sharpening. Had we trained the participants prior to testing, this effect would have been lost.

      What is actually decoded in the "prediction decoding" analysis? The authors state that it is "decoding the predictable trailing images based on the leading images" (p.11). The associated chance level (Figure 2E) is indicated as 50%. This suggests that the classes separated by the SVM are T6 vs T7. How this was done is however unclear. For each leading image decoding the predictable trailing images should be equivalent to decoding validity (as there are only 2 possible trailing images, where one is the valid and the other the invalid image). How is it then possible that the analysis is performed separately for valid and invalid trials? Are the authors simply decoding which leading image was shown, but combine L1+L2 and L4+L5 into one class respectively? If so, this needs to be better explained in the manuscript. Moreover, the resulting decoder would in my opinion not decode the predicted image, but instead learn to dissociate the representation of L1+L2 from L4+L5, which may also explain why the time course of the prediction peaks during the leading image stimulus-response, which is rather different compared to previous studies decoding (prestimulus) predictions (e.g. Kok et al. 2017). If this is indeed the case, I find it doubtful that this analysis relates to prediction. Instead for the prediction analysis to be informative about the predicted image the authors should, in my opinion, train the decoder on the representation of trailing images and test it during the prestimulus interval.

      As mentioned above, the prediction decoding analysis has been removed from the manuscript. The prediction decoding analysis was intended as a sanity check, as validity information was not yet available to participants.

      Related to the point above, were the leading/trailing image categories and their mapping to L1, L2, etc. in Figure 1B fixed across subjects? I.e. "'beach' and 'barn' as 'Leading' categories would result in 'church' as a 'Trailing' category with 75% validity" (p.20) for all participants? If so, this poses additional problems for the interpretation of the analysis discussed in the point above, as it may invalidate the control analyses depicted in Figure 2E, as systematic differences and similarities in the leading image categories could account for the observed results.

      Image categories and their mapping were indeed fixed across participants. While this may result in physical differences and similarities between images influencing results, counterbalancing categories across participants would not have addressed this issue. For example, had we swapped “beach” with “barn” in another participant, physical differences between images may still be reflected in the prediction decoding. On the other hand, counterbalancing categories across trials was not possible given our aim of examining the initial stages of learning over trials. Had we changed the mappings of categories throughout the experiment for each participant, we would have introduced reversal learning and nullified our ability to examine the initial stages of learning under flat priors. In any case, the prediction decoding analysis has been removed from the manuscript, as outlined above.

      Why was the neutral condition L3 not used for prediction decoding? After all, if during prediction decoding both the valid and invalid image can be decoded, as suggested by the authors, we would also expect significant decoding of T8/T9 during the L3 presentation.

      In the neutral condition, L3 was followed by T8 vs. T9 with 50% probability, precluding prediction decoding. While this could have served as an additional control analysis for EEG-based decoding, we have opted for removing prediction decoding from the analysis. However, in response to the other Reviewers’ comments, the neutral condition has now been included in the behavioral analysis.

      The following concern may arise due to a misunderstanding of the analyses, but I found the results in Figures 2C and 2E concerning. If my interpretation is correct, then these results suggest that the leading image itself can only be decoded with ~33% accuracy (25% chance; i.e. ~8% above chance decoding). In contrast, the predicted (valid or invalid) image during the leading image presentation can be decoded with ~62% accuracy (50% chance; i.e. ~12% above chance decoding). Does this seem reasonable? Unless I am misinterpreting the analyses, it seems implausible to me that a prediction but not actually shown image can be better decoded than an on-screen image. Moreover, to my knowledge studies reporting decoding of predictions can (1) decode expectations just above chance level (e.g. Kok et al., 2017; which is expected given the nature of what is decoded) and (2) report these prestimulus effects shortly before the anticipated stimulus onset, and not coinciding with the leading image onset ~800ms before the predicted stimulus onset. For the above reasons, the key results reported in the present manuscript seem implausible to me and may suggest the possibility of problems in the training or interpretation of the decoding analysis. If I misunderstood the analyses, the analysis text needs to be refined. If I understood the analyses correctly, at the very least the authors would need to provide strong support and arguments to convince the reader that the effects are reliable (ruling out bias and explaining why predictions can be decoded better than on-screen stimuli) and sensible (in the context of previous studies showing different time-courses and results).

      As explained above, we have addressed this concern by performing an additional analysis, implementing decoding based on image pixel values. Indeed, we could not rule out the possibility that “prediction” decoding reflected stimulus differences between leading images.

      Relatedly, the authors use the prestimulus interval (-200 ms to 0 ms before predicted stimulus onset) as the baseline period. Given that this period coincides with prestimulus expectation effects (Kok et al., 2017), would this not result in a bias during trailing image decoding? In other words, the baseline period would contain an anticipatory representation of the expected stimulus (Kok et al., 2017), which is then subtracted from the subsequent EEG signal, thereby allowing the decoder to pick up on this "negative representation" of the expected image. It seems to me that a cleaner contrast would be to use the 200 ms before leading image onset as the baseline.

      The analysis of trailing images aimed at testing specific hypotheses related to differences between decoding accuracy in valid vs. invalid trials. Since the baseline was by definition the same for both kinds of trials (since information about validity only appears at the onset of the trailing image), changing the baseline would not affect the results of the analysis. Valid and invalid trials would have the same prestimulus effect induced by the leading image.

      Again, maybe I misunderstood the analyses, but what exactly are the statistics reported on p. 11 onward? Why is the reported Tmax identical for multiple conditions, including the difference between conditions? Without further information this seems highly unlikely, further casting doubts on the rigor of the applied methods/analyses. For example: "In the sensory decoding analysis based on leading images, decoding accuracy was above chance for both valid (Tmax= 2.76, pFWE < 0.001) and invalid trials (Tmax= 2.76, pFWE < 0.001) from 100 ms, with no significant difference between them (Tmax= 2.76, pFWE > 0.05) (Fig. 2C)" (p.11).

      Thank you for bringing this to our attention. As previously mentioned, this copy error has been rectified in the revised manuscript.

      Relatedly, the statistics reported below in the same paragraph also seem unusual. Specifically, the Tmax difference between valid and invalid conditions seems unexpectedly large given visual inspection of the associated figure: "The decoding accuracy of both valid (Tmax = 2.76, pFWE < 0.001) and invalid trials (Tmax = 14.903, pFWE < 0.001)" (p.12). In fact, visual inspection suggests that the largest difference should probably be observed for the valid not invalid trials (i.e. larger Tmax).

      This copy error has also been rectified in the revised manuscript.

      Moreover, multiple subsequent sections of the Results continue to report the exact same Tmax value. I will not list all appearances of "Tmax = 2.76" here but would recommend the authors carefully check the reported statistics and analysis code, as it seems highly unlikely that >10 contrasts have exactly the same Tmax. Alternatively, if I misunderstand the applied methods, it would be essential to better explain the utilized method to avoid similar confusion in prospective readers.

      This error has also now been rectified. As mentioned above the prediction decoding analysis has been removed.

      I am not fully convinced that Figures 3A/B and the associated results support the idea that early learning stages result in dampening and later stages in sharpening. The inference made requires, in my opinion, not only a significant effect in one time bin and the absence of an effect in other bins. Instead, to reliably make this inference one would need a contrast showing a difference in decoding accuracy between bins, or ideally an analysis not contingent on seemingly arbitrary binning of data, but a decrease (or increase) in the slope of the decoding accuracy across trials. Moreover, the decoding analyses seem to be at the edge of SNR, hence making any interpretation that depends on the absence of an effect in some bins yet more problematic and implausible.

      Thank you for the helpful suggestion. As previously mentioned, we fitted a logarithmic model to quantify the change of the decoding benefit over trials, then found the trial index for which the change of the logarithmic fit was < 0.1%. Given the results of this analysis, and to ensure a sufficient number of trials, we focussed our further analyses on bins 1-2. This is explained in more detail in the revised manuscript.

      Relatedly, based on the literature there is no reason to assume that the dampening effect disappears with more training, thereby placing more burden of proof on the present results. Indeed, key studies supporting the dampening account (including human fMRI and MEG studies, as well as electrophysiology in non-human primates) usually seem to entail more learning than has occurred in bin 2 of the present study. How do the authors reconcile the observation that more training in previous studies results in significant dampening, while here the dampening effect is claimed to disappear with less training?

      The discussion of these findings has been expanded in the revised manuscript. As previously outlined, many of the studies supporting dampening did not explicitly test the effects of learning as they emerge, nor did they control for RS to the same extent.

      The Methods section is quite bare bones. This makes an exact replication difficult or even impossible. For example, the sections elaborating on the GLM and cluster-based FWE correction do not specify enough detail to replicate the procedure. Similarly, how exactly the time points for significant decoding effects were determined is unclear (e.g., p. 11). Relatedly, the explanation of the decoding analysis, e.g. the choice to perform PCA before decoding, is not well explained in the present iteration of the manuscript. Additionally, it is not mentioned how many PCs the applied threshold on average resulted in.

      Thank you for this suggestion, we have described our methods in more detail.

      To me, it is unclear whether the PCA step, which to my knowledge is not the default procedure for most decoding analyses using EEG, is essential to obtain the present results. While PCA is certainly not unusual, to my knowledge decoding of EEG data is frequently performed on the sensor level as SVMs are usually capable of dealing with the (relatively low) dimensionality of EEG data. In isolation this decision may not be too concerning; however, in combination with other doubts concerning the methods and results, I would suggest the authors replicate their analyses using a conventional decoding approach on the sensor level as well.

      Thank you for this suggestion, we have explained our decision to use PCA in the revised manuscript.

      Several choices, like the binning and the focus on bins 1-2 seem rather post-hoc. Consequently, frequentist statistics may strictly speaking not be appropriate. This further compounds above mentioned concerns regarding the reliability of the results.

      The reasoning behind our decision to focus on bins 1-2 is now explained in more detail in the revised manuscript.

      A notable difference in the present study, compared to most studies cited in the introduction motivating the present experiment, is that categories instead of exemplars were predicted.

      This seems like an important distinction to me, which surprisingly goes unaddressed in the Discussion section. This difference might be important, given that exemplar expectations allow for predictions across various feature levels (i.e., even at the pixel level), while category predictions only allow for rough (categorical) predictions.

      The decision to use categorical predictions over exemplars lies in the issue of RS, as it is impossible to control for RS while repeating stimuli over many trials. This has been discussed in more detail in the revised manuscript.

      While individually minor problems, I noticed multiple issues across several figures or associated figure texts. For example: Figure 1C only shows valid and invalid trials, but the figure text mentions the neutral condition. Why is the neutral condition not depicted but mentioned here? Additionally, the figure text lacks critical information, e.g. what the asterisk represents. The error shading in Figure 2 would benefit from transparency settings to not completely obscure the other time-courses. Increasing the figure content and font size within the figure (e.g. axis labels) would also help with legibility (e.g. consider compressing the time-course but therefore increasing the overall size of the figure). I would also recommend using more common methods to indicate statistical significance, such as a bar at the bottom of the time-course figure typically used for cluster permutation results instead of a box. Why is there no error shading in Figure 2A but all other panels? Fig 2C-F has the y-axis label "Decoding accuracy (%)" but certainly the y-axis, ranging roughly from 0.2 to 0.7, is not in %. The Figure 3 figure text gives no indication of what the error bars represent, making it impossible to interpret the depicted data. In general, I would recommend that the authors carefully revisit the figures and figure text to improve the quality and complete the information.

      Thank you for the suggestions. Figure 1C now includes the neutral condition. Asterisks denote significant results. The font size in Figure 2C-E has been increased. The y-axis on Figure 2C-E has been amended to accurately reflect decoding accuracy in percentage. Figure 2A has error shading, however, the error is sufficiently small that the error shading is difficult to see. The error bars in Figure 3 have been clarified.

      Given the choice of journal (eLife), which aims to support open science, I was surprised to find no indication of (planned) data or code sharing in the manuscript.

      Plans for sharing code/data are now outlined in the revised manuscript.

      While it is explained in sufficient detail later in the Methods section, it was not entirely clear to me, based on the method summary at the beginning of the Results section, whether categories or individual exemplars were predicted. The manuscript may benefit from clarifying this at the start of the Results section.

      Thank you for this suggestion. Following this and suggestions from other reviewers, the experimental paradigm and the mappings between categories have been further explained in the revised manuscript, to make it clearer that predictions are made at the categorical level.

      "Unexpected trials resulted in a significantly increased neural response 150 ms after image onset" (p.9). I assume the authors mean the more pronounced negative deflection here. Interpreting this, especially within the Results section as "increased neural response" without additional justification may stretch the inferences we can make from ERP data; i.e. to my knowledge more pronounced ERPs could also reflect increased synchrony. That said, I do agree with the authors that it is likely to reflect increased sensory responses, it would just be useful to be more cautious in the inference.

      Thank you for the interesting comment. This has been rephrased as a “more pronounced negative deflection” in the revised manuscript.

      Why was the ERP analysis focused exclusively on Oz? Why not a cluster around Oz? For object images, we may expect a rather wide dipole.

      Feuerriegel et al. (2021) have outlined issues questioning the robustness of univariate analyses for ES; as such, we opted for a targeted ROI approach on the channel showing the peak amplitude of the visually evoked response (Fig. 2B). More details on this are in the revised manuscript.

      How exactly did the authors perform FWE? The description in the Method section does not appear to provide sufficient detail to replicate the procedure.

      FWE as implemented in SPM is a cluster-based method of correcting for multiple comparisons using random field theory. We have explained our thresholding methods in more detail in the revised manuscript.

      If I misunderstand the authors and they did indeed perform standard cluster permutation analyses, then I believe the results of the timing of significant clusters cannot be so readily interpreted as done here (e.g. p.11-12); see: Maris & Oostenveld 2007; Sassenhagen & Draschkow 2019.

      All statistics were based on FWE under random field theory assumptions (as implemented in SPM) rather than on cluster permutation tests (as implemented in, e.g., FieldTrip).

      Why did the authors choose not to perform spatiotemporal cluster permutation for the ERP results?

      As mentioned above, we opted to target our ERP analyses on Oz due to controversies in the literature regarding univariate effects of ES (Feuerriegel et al., 2021).

      Some results, e.g. on p.12 are reported as T29 instead of Tmax. Why?

      As mentioned above, prediction decoding analyses have been removed from the manuscript.

    1. Author response:

      Reviewer #1 (Public Review):

      (1) The network they propose is extremely simple. This simplicity has pros and cons: on the one hand, it is nice to see the basic phenomenon exposed in the simplest possible setting. On the other hand, it would also be reassuring to check that the mechanism is robust when implemented in a more realistic setting, using, for instance, a network of spiking neurons similar to the one they used in the 2008 paper. The more noisy and heterogeneous the setting, the better.

      The choice of a minimal model to illustrate our hypothesis is deliberate. Our main goal was to suggest a physiologically-grounded mechanism to rapidly encode temporally-structured information (i.e., sequences of stimuli) in Working Memory, where none was available before. Indeed, as discussed in the manuscript, previous proposals were unsatisfactory in several respects. In view of our main goal, we believe that a spiking implementation is beyond the scope of the present work.

      We would like to note that the mechanism originally proposed in Mongillo et al. (2008) has been repeatedly implemented, by many different groups, in various spiking network models with different levels of biological realism (see, e.g., Lundqvist et al. (2016), for an especially ‘detailed’ implementation) and, in all cases, the relevant dynamics has been observed. We take this as an indication of ‘robustness’; the relevant network dynamics doesn’t critically depend on many implementation details and, importantly, this dynamics is qualitatively captured by a simple rate model (see, e.g., Mi et al. (2017)).

      In the present work, we make a relatively ‘minor’ (from a dynamical point of view) extension of the original model, i.e., we just add augmentation. Accordingly, we are fairly confident that a set of parameters for the augmentation dynamics can be found such that the spiking network behaves, qualitatively, as the rate model. A meaningful study, in our opinion, would then require extensively testing the (large) parameter space (different models of augmentation?) to see how the network behavior compares with the relevant experimental observations (which ones? behavioral? physiological?). As said above, we believe that this is beyond the scope of the present work.

      This being said, we definitely agree with the reviewer that not presenting a spiking implementation is a limitation of the present work. We will clearly acknowledge, and discuss, this limitation in the revised version.

      (2) One major issue with the population spike scenario is that (to my knowledge) there is no evidence that these highly synchronized events occur in delay periods of working memory experiments. It seems that highly synchronized population spikes would imply (a) a strong regularity of spike trains of neurons, at odds with what is typically observed in vivo (b) high synchronization of neurons encoding for the same item (and also of different items in situations where multiple items have to be held in working memory), also at odds with in vivo recordings that typically indicate weak synchronization at best. It would be nice if the authors at least mention this issue, and speculate on what could possibly bridge the gap between their highly regular and synchronized network, and brain networks that seem to lie at the opposite extreme (highly irregular and weakly synchronized). Of course, if they can demonstrate using a spiking network simulation that they can bridge the gap, even better.

      Direct experimental evidence (in monkeys) in support of the existence of highly synchronized events -- to be identified with the ‘population spikes’ of our model -- during the delay period of a memory task is available in the literature and we have cited it, i.e., Panichello et al. (2024). In the revised version, we will provide an explicit discussion of the results of Panichello et al. (2024) and how these results directly relate to our model. After submission, we became aware of another experimental study (in humans) specifically dealing with sequence memory, i.e., Liebe et al. (2025). Their results, again, are fully consistent with our model. We will also provide an explicit discussion of these results in the revised version.

      We note that there is no fundamental contradiction between highly synchronized events in ‘small’ neural populations (e.g., a cell assembly) on one hand, and temporally irregular (i.e., Poisson-like) spiking at the single-neuron level and weakly synchronized activity at the network level, on the other hand. This was already illustrated in our original publication, i.e., Mongillo et al. (2008) (see, in particular, Fig. S2).

      We further note that the mechanism we propose to encode temporal order -- a temporal gradient in the synaptic efficacies brought about by synaptic augmentation -- would also work if the memory of the items is maintained by ‘tonic’ persistent activity (i.e., without highly synchronized events), provided this activity occurs at suitably low rates such as to prevent the saturation of the synaptic augmentation.

      We will include a detailed discussion of these points in the revised version.

      Reviewer #2 (Public Review):

      The study relates to the well-known computational theory for working memory, which suggests short-term synaptic facilitation is required to maintain working memory, but doesn't rely on persistent spiking. This previous theory appears similar to the proposed theory, except for the change from facilitation to augmentation. A more detailed explanation of why the authors use augmentation instead of facilitation in this paper is warranted: is the facilitation too short to explain the whole process of WM? Can the theory with synaptic facilitation also explain the immediate storage of novel sequences in WM?

      In the model, synaptic dynamics displays both short-term facilitation and augmentation (and short-term depression). Indeed, synaptic facilitation, alone, would be too short-lived to encode novel sequences. This is illustrated in Fig. 1B. We will provide a more detailed discussion of this point in the revised version.

      In Figure 1, the authors mention that synaptic augmentation leads to an increased firing rate even after stimulus presentation. It would be good to determine, perhaps, what the lowest threshold is to see the encoding of a WM task, and whether that is biologically plausible.

      We believe that this comment is related to the above point. The reviewer is correct; augmentation alone would require fairly long stimulus presentations to encode an item in WM. ‘Fast’ encoding, indeed, is guaranteed by the presence of short-term facilitation. We will emphasize this important point in the revised version.

      In the middle panel of Figure 4, after 15-16 sec, when the neuronal population prioritizes with the second retro-cue, although the second retro-cue item's synaptic spike dominates, why is the augmentation for the first retro-cue item higher than the second-cue augmentation until the 20 sec?

      This is because of the slow build-up and slow decay of the augmentation. When the second item is prioritized, and the corresponding neuronal population re-activates, its augmentation level starts to increase. At the same time, as the first item is now de-prioritized and the corresponding neuronal population is now silent, its augmentation level starts to decrease. Because of the ‘slowness’ of both processes (i.e., augmentation build-up and decay), it takes about 5 seconds for the augmentation level of the second item to overcome the augmentation level of the first item.

      We note that the slow time scales of the augmentation dynamics, consistent with experimental observations, are necessary for our mechanism to work.
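
      The lag of a few seconds described above follows from first-order kinetics alone. As a minimal sketch (not the model used in the manuscript), assume that the augmentation of the re-activated population relaxes exponentially toward a ceiling while that of the silent population decays exponentially toward zero. The function name, the starting levels, the ceiling, and the 10-second time constants below are illustrative assumptions, chosen only to show how slow build-up and slow decay delay the crossover.

```python
import math


def augmentation_crossover(a_old=0.8, a_new=0.3, a_max=1.0,
                           tau_build=10.0, tau_decay=10.0,
                           dt=0.01, t_max=60.0):
    """Return the first time (in seconds) at which the newly prioritized
    item's augmentation reaches that of the de-prioritized item,
    or None if this does not happen within t_max.

    All parameter values are hypothetical, not taken from the manuscript.
    """
    n_steps = int(round(t_max / dt))
    for i in range(n_steps + 1):
        t = i * dt
        # Active population: exponential relaxation toward the ceiling a_max.
        rising = a_max + (a_new - a_max) * math.exp(-t / tau_build)
        # Silent population: exponential decay toward zero.
        falling = a_old * math.exp(-t / tau_decay)
        if rising >= falling:
            return t
    return None


if __name__ == "__main__":
    print(augmentation_crossover())
```

      With these assumed values and equal time constants, the crossover time has the closed form `tau * ln(1.5)`, roughly 4 seconds; shortening both time constants shrinks the lag proportionally.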

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors introduce a novel algorithm for the automatic identification of long-range axonal projections. This is an important problem as modern high-throughput imaging techniques can produce large amounts of raw data, but identifying neuronal morphologies and connectivities requires large amounts of manual work. The algorithm works by first identifying points in three-dimensional space corresponding to parts of labelled neural projections, these are then used to identify short sections of axons using an optimisation algorithm and the prior knowledge that axonal diameters are relatively constant. Finally, a statistical model that assumes axons tend to be smooth is used to connect the sections together into complete and distinct neural trees. The authors demonstrate that their algorithm is far superior to existing techniques, especially when dense labelling of the tissue means that neighbouring neurites interfere with the reconstruction. Despite this improvement, however, the accuracy of reconstruction remains below 90%, so manual proofreading is still necessary to produce accurate reconstructions of axons.

      Strengths:

      The new algorithm combines local and global information to make a significant improvement on the state-of-the-art for automatic axonal reconstruction. The method could be applied more broadly and might have applications to reconstructions of electron microscopy data, where similar issues of high-throughput imaging and relatively slow or inaccurate reconstruction remain.

      We thank the reviewer for their positive comments and for taking the time to review our manuscript. We are truly grateful that the reviewer recognized the value of our method in automatically reconstructing long-range axonal projections. While we report that our method achieves reconstruction accuracy of approximately 85%, we fully acknowledge that manual proofreading is still necessary to ensure accuracy greater than 95%. We also appreciate the reviewer’s insightful suggestion regarding the potential adaptation of our algorithm for reconstructing electron microscopy (EM) data, where similar challenges in high-throughput imaging and relatively slow or inaccurate reconstruction persist. We look forward to exploring ways to integrate our method with EM data in future work.

      Weaknesses:

      There are three weaknesses in the algorithm and manuscript.

      (1) The best reconstruction accuracy is below 90%, which does not fully solve the problem of needing manual proofreading.

      We sincerely appreciate the reviewer's valuable insights regarding reconstruction accuracy. Indeed, as illustrated in Figure S4, our current best automated reconstruction accuracy on fMOST data is still below 90%. This indicates that manual proofreading remains essential to ensure reliability.

      For the reconstruction of long-range axonal projections, ensuring the accuracy of the reconstruction process necessitates manual revision of the automatically generated results. Existing literature has demonstrated that a higher accuracy in automatic reconstruction correlates with a reduced need for manual revisions, thereby facilitating an accelerated reconstruction process (Winnubst et al., Cell 2019; Liu et al., Nature Methods 2025).

      As the reviewer rightly points out, achieving an accuracy exceeding 95% currently necessitates manual proofreading. Although our method does not completely eliminate this requirement, it significantly alleviates the proofreading workload by: 1) Minimizing common errors in regions with dense neuron distributions; 2) Providing more reliable initial reconstructions; and 3) Reducing the number of corrections needed during the proofreading process.

      In the future, we will continue to enhance our reconstruction framework. As imaging systems achieve higher signal-to-noise ratios and deep learning techniques facilitate more accurate foreground detection, we anticipate that our method will attain even greater reconstruction accuracy. Furthermore, we plan to develop a software system capable of predicting potential error locations in our automated reconstruction results, thereby streamlining manual revisions. This approach distinguishes itself from existing models by obviating the need for individual traversal of the brain regions associated with each neuron reconstruction.

      (2) The 'minimum information flow tree' model the authors use to construct connected axonal trees has the potential to bias data collection. In particular, the assumption that axons should always be as smooth as possible is not always correct. This is a good rule-of-thumb for reconstructions, but real axons in many systems can take quite sharp turns and this is also seen in the data presented in the paper (Figure 1C). I would like to see explicit acknowledgement of this bias in the current manuscript and ideally a relaxation of this rule in any later versions of the algorithm.

      We appreciate the reviewer's insightful comment regarding the potential bias introduced by our minimum information flow tree model. The reviewer is absolutely correct in noting that while axon smoothness serves as a useful reconstruction heuristic, it should not be treated as an absolute constraint, given that real axons can exhibit sharp turns (as shown in Figure 1C). In response to this valuable feedback, we have added an explicit discussion of this limitation to the Discussion section, as follows: “Finally, the minimal information flow tree’s fundamental assumption, that axons should be as smooth as possible does not always hold true. In fact, real axons can take quite sharp turns leading the algorithm to erroneously separate a single continuous axon into disjoint neurites.”

      In our reconstruction process, the post-processing approach partially mitigates erroneous reconstructions derived from this rule. Specifically, the minimum information flow tree will decompose such structures into two separate branches (Fig. S7A), but the decomposition node is explicitly recorded. The newly decomposed branches attempt to reconnect by searching for plausible neurites starting from their head nodes (determined by the minimum information flow tree). If no connectable neurites are found, the branch is automatically reconnected to its originally recorded decomposition node (Fig. S7B). In Fig. S7C, two reconstruction examples demonstrate the effectiveness of the post-processing approach.
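
      The reconnection fallback described above can be sketched as follows. This is only an illustration of the control flow, not the PointTree implementation: the `Branch` class, the `reconnect` function, and the `find_plausible_neurite` callback are hypothetical names, and the actual search for connectable neurites is left abstract.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Branch:
    head: int                      # id of the branch's head node
    decomposition_node: int        # node recorded when the branch was split off
    parent: Optional[int] = None   # node the branch is finally attached to


def reconnect(branch: Branch,
              find_plausible_neurite: Callable[[int], Optional[int]]) -> Branch:
    """Attach a decomposed branch to a plausible neurite if one is found;
    otherwise fall back to the recorded decomposition node."""
    # Hypothetical search starting from the head node; returns a node id or None.
    candidate = find_plausible_neurite(branch.head)
    if candidate is not None:
        branch.parent = candidate
    else:
        # No connectable neurite: restore the original connection.
        branch.parent = branch.decomposition_node
    return branch
```

      Recording the decomposition node before the search is what makes the fallback safe: a failed search leaves the branch exactly where it was before decomposition.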

      As pointed out by the reviewer, the proposed rule for revising neuron reconstruction does not encompass all scenarios. Relaxing the constraints of this rule may lead to numerous new erroneous connections. Currently, the proposed rule is solely based on the positions of neurite centerlines and does not integrate information regarding the intensity of the original images or segmentation data. Incorporating these elements into the rule could potentially reduce reconstruction errors.

      (3) The writing of the manuscript is not always as clear as it could be. The manuscript would benefit from careful copy editing for language, and the Methods section in particular should be expanded to more clearly explain what each algorithm is doing. The pseudo-code of the Supplemental Information could be brought into the Methods if possible as these algorithms are so fundamental to the manuscript.

      We sincerely thank the reviewer for these valuable suggestions to improve our manuscript’s clarity and methodological presentation. We have implemented the following revisions:

      (1) Language Enhancement: we have conducted rigorous internal linguistic reviews to address grammatical inaccuracies and improve textual clarity.

      (2) Methods Expansion and Pseudo-code Integration: we have incorporated all relevant derivations from the Supplementary Materials into the Methods section, with additional explanatory text to clarify the purpose and implementation of each algorithm. All mathematical formulations have been systematically rederived, with corrections to variable nomenclature, subscript/superscript notation, and the errors identified in the original submission. All pseudocode from the Supplementary Materials has been integrated into the corresponding Methods subsections.

      Reviewer #2 (Public review):

      In this manuscript, Cai et al. introduce PointTree, a new automated method for the reconstruction of complex neuronal projections. This method has the potential to drastically speed up the process of reconstructing complex neurites. The authors use semi-automated manual reconstruction of neurons and neurites to provide a 'ground-truth' for comparison between PointTree and other automated reconstruction methods. The reconstruction performance is evaluated for precision, recall, and F1-score and positions. The performance of PointTree compared to other automated reconstruction methods is impressive based on these 3 criteria.

      As an experimentalist, I will not comment on the computational aspects of the manuscript. Rather, I am interested in how PointTree's performance decreases in noisy samples. This is because many imaging datasets contain some level of background noise for which the human eye appears essential for the accurate reconstruction of neurites. Although the samples presented in Figure 5 represent an inherent challenge for any reconstruction method, the signal-to-noise ratio is extremely high (also the case in all raw data images in the paper). It would be interesting to see how PointTree's performance changes in increasingly noisy samples, and for the author to provide general guidance to the scientific community as to what samples might not be accurately reconstructed with PointTree.

      We thank the reviewer for her/his time reviewing our manuscript and for the interest in how PointTree performs on noisy samples. It is important to clarify that PointTree is solely responsible for the reconstruction of neurons from the foreground regions of neural images. The foreground regions of these neuronal images are obtained through a deep learning segmentation network. In cases where the image has a low signal-to-noise ratio, if the segmentation network can accurately identify the foreground areas, then PointTree will be able to accurately reconstruct neurons. In fact, existing deep learning networks have demonstrated their capability to effectively extract foreground regions from low signal-to-noise ratio images; therefore, PointTree is well-suited for processing neuronal images characterized by low signal-to-noise ratios.

      In the revised manuscript, we conducted experiments on datasets with varying signal-to-noise ratios (SNR). The results demonstrate that Unet3D is capable of identifying the foreground regions in low-SNR images, thereby supporting the assertion that PointTree has broad applicability across diverse neuronal imaging datasets. 

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      It would be interesting to see how PointTree's performance changes in increasingly noisy samples, and for the author to provide general guidance to the scientific community as to what samples might not be accurately reconstructed with PointTree.

      We extend our heartfelt gratitude to the reviewer for their insightful suggestion concerning experiments involving different noisy samples. Here are the details of the datasets used:

      LSM dataset: Mean SNR = 5.01, with 25 samples, and a volume size of 192×192×192.

      fMOST dataset: Mean SNR = 8.68, with 25 samples, and a volume size of 192×192×192.

      HD-fMOST dataset: Mean SNR = 11.4, with 25 samples, and a volume size of 192×192×192.

      The experimental results reveal that, thanks to the deep learning network's robust feature extraction capabilities, even when working with low-SNR data (as depicted in Figure 4B, first two columns of the top row), satisfactory segmentation results (Figure 4B, first two columns of the third row) were achieved. These results laid a solid foundation for subsequent accurate reconstruction.

      PointTree demonstrated consistent mean F1-scores of 91.0%, 90.0%, and 93.3% across the three datasets, respectively. This underscores its reconstruction robustness under varying SNR conditions when supported by the segmentation network. For more in-depth information, please refer to the manuscript section titled "Reconstruction of data with different signal-to-noise ratios" and Figure 4.
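
      For reference, the F1-score reported above is the harmonic mean of precision and recall. The sketch below shows the standard computation from match counts; how reconstructed points are matched to the gold-standard reconstruction is not specified here, so the counts are treated as given, and the function name is illustrative.

```python
def precision_recall_f1(true_pos: int, false_pos: int, false_neg: int):
    """Return (precision, recall, F1) from match counts.

    true_pos:  reconstructed elements that match the gold standard
    false_pos: reconstructed elements with no gold-standard match
    false_neg: gold-standard elements missed by the reconstruction
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

      Because it is a harmonic mean, F1 is pulled toward the lower of the two values, so a high F1 requires both few spurious connections and few missed neurites.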

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigated how partial loss of SynGap1 affects inhibitory neurons derived from the MGE in the auditory cortex, focusing on their synaptic inputs and excitability. While haploinsufficiency of SynGap1 is known to lead to intellectual disabilities, the underlying mechanisms remain unclear.

      Strengths:

      The questions are novel.

      Weaknesses:

      Despite the interesting and novel questions, there are significant issues regarding the experimental design and potential misinterpretations of key findings. Consequently, the manuscript contributes little to our understanding of SynGap1 loss mechanisms.

      Major issues in the second version of the manuscript:

      In the review of the first version, there were major issues and contradictions with the sEPSC and mEPSC data; these were not resolved after the revision, and the new control experiments instead confirmed the contradiction.

      In the original review I stated: "One major concern is the inconsistency and confusion in the intermediate conclusions drawn from the results. For instance, while the sEPSC data indicates decreased amplitude in PV+ and SOM+ cells in cHet animals, the frequency of events remains unchanged. In contrast, the mEPSC data shows no change in amplitudes in PV+ cells, but a significant decrease in event frequency. The authors conclude that the former observation implies decreased excitability. However, traditionally, such observations on mEPSC parameters are considered indicative of presynaptic mechanisms rather than changes of network activity. The subsequent synapse counting experiments align more closely with the traditional conclusions. This issue can be resolved by rephrasing the text. However, it would remain unexplained why the sEPSC frequency shows no significant difference. If the majority of sEPSC events were indeed mediated by spiking (which is blocked by TTX), the average amplitudes and frequency of mEPSCs should be substantially lower than those of sEPSCs. Yet, they fall within a very similar range, suggesting that most sEPSCs may actually be independent of action potentials. But if that was indeed the case, the changes of purported sEPSC and mEPSC results should have been similar." Contradictions remained after the revision of the manuscript. On one hand, the authors claimed in the revised version that "We found no difference in mEPSC amplitude between the two genotypes (Fig. 1g), indicating that the observed difference in sEPSC amplitude (Figure 1b) could arise from decreased network excitability". On the other hand, later they show "no significative difference in either amplitude or inter-event intervals between sEPSC and mEPSC, suggesting that in acute slices from adult A1, most sEPSCs may actually be AP independent." The latter means that sEPSCs and mEPSCs are the same type of events, which should have the same sensitivity to manipulations.

      We thank the reviewer for the detailed comments. Our results suggest a diverse population of PV+ cells, with varying reliance on action potential-dependent and -independent release. Several PV+ cells indeed show TTX sensitivity (reduced EPSC event amplitudes following TTX application: See new Supplementary Figure 2b-e), but their individual responses are diluted when all cells are pooled together. To account for this variability, we recorded sEPSC followed by mEPSC from more mice of both genotypes (new Figure 1f-j). Further, following the editors and reviewers’ suggestions, we removed speculations about the role of network activity changes.

      In summary, our data confirmed that TTX blocked APs in PV+ cells and that recordings were stable, as indicated by the lack of changes in series resistance during the recording period in our experimental setup (new Suppl. Figure 2f-i). We found no difference in mEPSC amplitude between the two genotypes (Fig. 1g, right), indicating that the observed difference in sEPSC amplitude (Figure 1c, right) could be due to impaired AP-dependent release in cHet mice and the presence of large-amplitude sEPSCs that are preferentially affected by TTX in control mice (new Suppl. Figure 2b-e). Conversely, cHet mice showed longer inter-mEPSC time intervals (cumulative distribution in Figure 1g, left), and significantly lower charge transfer and DQ*f (Figure 1j) compared to control littermates, suggesting a decrease of glutamatergic presynaptic release sites onto PV+ cells.

      Concerns about the quality of the synapse counting experiments were addressed by showing additional images in a different and explaining quantification. However, the admitted restriction of the analysis of excitatory synapses to the somatic region represents a limitation, as it includes only a small fraction of the total excitation, even if the slightly larger amplitudes of their EPSPs are considered.

      We agree with the reviewer that restricting the anatomical analysis of excitatory synapses to the PV cell somatic region is a limitation, as highlighted in the discussion of the revised manuscript. Recent studies, based on serial block-face scanning electron microscopy, suggest that cortical PV+ interneurons receive more robust excitatory inputs to their perisomatic region as compared to pyramidal neurons (see for example, Hwang et al. 2021, Cerebral Cortex, http://doi.org/10.1093/cercor/bhaa378). It is thus possible that putative glutamatergic synapses, analysed by vGlut1/PSD95 colocalisation around PV+ cell somata, may be representative of a substantially major excitatory input population. Since analysing putative excitatory synapses onto PV+ dendrites would be difficult and require a much longer time, we re-phrased the text to more clearly highlight the rationale and limitation of this approach.

      New experiments using paired-pulse stimulation provided an answer to issues 3 and 4. Note that the numbering of the Figures in the responses and manuscript are not consistent.

      We are glad that the reviewer found that the new paired-pulse experiments answered previously raised concerns. We corrected the discrepancy in figure numbers in the manuscript. Thank you for noticing.

      I agree that low sampling rate of the APs does not change the observed large differences in AP threshold, however, the phase plots are still inconsistent in a sense that there appears to be an offset, as all values are shifted to more depolarized membrane potentials, including threshold, AP peak, AHP peak. This consistent shift may be due to a non-biological differences in the two sets of recordings, and, importantly, it may negate the interpretation of the I/f curves results (Fig. 5e).

      We agree with the reviewer that a higher sampling rate would allow a more accurate assessment of different parameters, such as AP peak, half-width, rise time, etc., while it would not affect the large differences in AP threshold we observed between control and mutant mice. Since the phase plots do not add to our result analysis, we removed them from the revised manuscript.

      Additional issues:

      The first paragraph of the Results mentioned that the recorded cells were identified by immunolabelling and axonal localization. However, neither the Results nor the Methods mention the criteria and levels of measurements of axonal arborization.

      Recorded MGE-derived interneurons were filled with biocytin, and their identity was confirmed by immunolabeling for neurochemical markers (PV or SST) and analysis of anatomical properties. In particular, whole biocytin-positive immunolabelled neurons were acquired using a Leica SP8-DLS confocal microscope (20x objective, NA 0.75; Z-step 1 1μm). For each imaged neuron, which was the result of multiple merged confocal stacks, we visually determined the spatial distribution of the axonal arbor across cortical layers and whether its dendrites carried spines. We added this information to the Methods section. Furthermore, to better represent our methodological approach, we added a new figure (Supplemental Figure 1) including 1) two examples of PV+ interneurons, showing dendrites devoid of spines and axons spreading from Layer II to Layer V (new Suppl. Figure 1a); and 2) two examples of SST+ interneurons, showing dendrites with spines and axons projecting from Layer IV to Layer I, where they gave rise to multiple collaterals (new Suppl. Figure 1b).

      The other issues of the first review were adequately addressed by the Authors and the manuscript improved by these changes.

      We are happy the reviewer found that the other issues were well addressed.

      Reviewer #3 (Public review):

      This paper compares the synaptic and membrane properties of two main subtypes of interneurons (PV+, SST+) in the auditory cortex of control mice vs mutants with Syngap1 haploinsufficiency. The authors find differences between control and mutants in both interneuron populations, although they claim a predominance in PV+ cells. These results suggest that altered PV+ interneuron functions in the auditory cortex may contribute to the network dysfunctions observed in Syngap1 haploinsufficiency-related intellectual disability.

      The subject of the work is interesting, and most of the approach is rather direct and straightforward, which are strengths. There are also some methodological weaknesses and interpretative issues that reduce the impact of the paper.

      (1) Supplementary Figure 3: recording and data analysis. The data of Supplementary Figure 3 show no differences either in the frequency or amplitude of synaptic events recorded from the same cell in control (sEPSCs) vs TTX (mEPSCs). This suggests that, under the experimental conditions of the paper, sEPSCs are AP-independent quantal events. However, I am concerned by the high variability of the individual results included in the Figure. Indeed, several datapoints show dramatically different frequencies in control vs TTX, which may be explained by unstable recording conditions. It would be important to present these data as time course plots, so that stability can be evaluated. Also, the claim of lack of effect of TTX should be corroborated by positive control experiments verifying that TTX is working (block of action potentials, for example). Lastly, it is not clear whether the application of TTX was consistent in time and duration in all the experiments and the paper does not clarify what time window was used for quantification.

      We understand the reviewer’s concern about high variability. To account for this variability, we recorded sEPSC followed by mEPSC from more mice of both genotypes (see new Figure 1f-j). We confirmed that TTX worked as expected several times through the time course of this study, in different aliquots prepared from the same TTX vial that was used for all experiments. The results of the last test we performed, showing that TTX application blocks action potentials in a PV+ cell, are depicted in new Suppl. Figure 2a. Furthermore, new Suppl. Figure 2f-i shows series resistance (Rs) over time for 4 different PV+ interneurons, indicating recording stability. These results are representative of the entire population of recorded neurons, which we have meticulously analysed one by one. TTX was applied using the same protocol for all recorded neurons. In particular, sEPSCs were first sampled over a 2 min period. A TTX (1μM; Alomone Labs)-containing solution was then perfused into the recording chamber at a flow rate of 2 mL/min. We then waited for 5 min before sampling mEPSCs over a 2 min period. We added this information in the revised manuscript methods.

      (2)  Figure 1 and Supplementary Figure 3: apparent inconsistency. If, as the authors claim, TTX does not affect sEPSCs (either in the control or mutant genotype, Supplementary Figure 3 and point 1 above), then comparing sEPSC and mEPSC in control vs mutants should yield identical results. In contrast, Figure 1 reports a _selective_ reduction of sEPSCs amplitude (not in mEPSCs) in mutants, which is difficult to understand. The proposed explanation relying on different pools of synaptic vesicles mediating sEPSCs and mEPSCs does not clarify things. If this was the case, wouldn't it also imply a decrease of event frequency following TTX addition? However, this is not observed in Supplementary Figure 3. My understanding is that, according to this explanation, recordings in control solution would reflect the impact of two separate pools of vesicles, whereas, in the presence of TTX, only one pool would be available for release. Therefore, TTX should cause a decrease in the frequency of the recorded events, which is not what is observed in Supplementary Figure 3.

      To account for the large variability and clarify these results, we recorded sEPSCs followed by mEPSCs from more mice of both genotypes (new Figure 1f-j). We found no difference in mEPSC amplitude between the two genotypes (Fig. 1g, right), indicating that the observed difference in sEPSC amplitude (Figure 1c, right) could be due to impaired AP-dependent release in cHet mice and the presence of large-amplitude sEPSCs that are preferentially affected by TTX in control mice (new Suppl. Figure 2b-e). Conversely, cHet mice showed a longer inter-mEPSC time interval (cumulative distribution in Figure 1g, left), and significantly lower charge transfer and DQ*f (Figure 1j) compared to control littermates, suggesting a decrease in glutamatergic presynaptic release sites. We rephrased the text in the revised manuscript according to the updated data and, following the reviewer’s suggestions, we removed speculations relying on different pools of synaptic vesicles.

      (3) Figure 1: statistical analysis. Although I do appreciate the efforts of the authors to illustrate both cumulative distributions and plunger plots with individual data, I am confused by how the cumulative distributions of Figure 1b (sEPSC amplitude) may support statistically significant differences between genotypes, but this is not the case for the cumulative distributions of Figure 1g (inter mEPSC interval), where the curves appear even more separated. A difference in mEPSC frequency would also be consistent with the data of Supplementary Fig 2b, which otherwise are difficult to reconcile. I would encourage the authors to use the Kolmogorov-Smirnov test rather than a t-test for the comparison of cumulative distributions.

      We thank the reviewer for this thoughtful suggestion. We recorded more mice of both genotypes and the updated data now show a significant difference between the cumulative distributions of the inter mEPSC intervals recorded from the two genotypes (new Figure 1g). For statistical analysis, we based our conclusion on the statistical results generated by LMM, modelling animal as a random effect and genotype as a fixed effect. We used this statistical analysis because we considered the number of mice as independent replicates and the number of cells in each mouse as repeated measures (Berryer et al., 2016; Heggland et al., 2019; Yu et al., 2022). For cumulative distributions, the same number of events was chosen randomly from each cell and analysed by LMM, modelling animal as a random effect and genotype as a fixed effect. We decided to use LMM for our statistical analyses because of the growing concern over reproducibility in biomedical research and the ongoing discussion on how data are analysed (see for example, Yu et al. (2022), Neuron 110:21-35, https://doi.org/10.1016/j.neuron.2021.10.030; Aarts et al. (2014), Nat Neurosci 17, 491–496, https://doi.org/10.1038/nn.3648). We acknowledge that patch-clamp data have historically been analysed using the t-test and analysis of variance (ANOVA), or equivalent nonparametric tests. However, these tests assume that individual observations (recorded neurons in this case) are independent of each other. Whether neurons from the same mouse are independent or correlated variables is an unresolved question, but independence does not appear likely from a biological point of view. Statisticians have developed effective methods to analyze correlated data, including LMM.
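
      The concern about non-independent cells can be made concrete with a minimal sketch. It is not the LMM used in the study; it only computes a one-way intraclass correlation (ICC) and the resulting design effect for a balanced design, to show how correlation among cells from the same animal shrinks the effective number of independent observations. The function names and the threshold values are hypothetical and invented for illustration only.

```python
from statistics import mean


def icc_oneway(groups):
    """ICC(1) from a one-way ANOVA on a balanced design.

    groups: one list of per-cell measurements per animal (same length each).
    """
    k = len(groups[0])  # cells per animal
    if k < 2 or len(groups) < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 animals, each with the same number (>= 2) of cells")
    n = len(groups)  # number of animals
    grand = mean(v for g in groups for v in g)
    # Mean squares between animals and within animals.
    ms_between = k * sum((mean(g) - grand) ** 2 for g in groups) / (n - 1)
    ms_within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def effective_sample_size(n_cells, cells_per_animal, icc):
    """Number of independent observations implied by the design effect."""
    return n_cells / (1 + (cells_per_animal - 1) * icc)


# Hypothetical AP thresholds (mV): 3 animals x 3 cells, invented for illustration only.
thresholds = [[-43.0, -42.0, -44.0], [-38.0, -37.0, -39.0], [-41.0, -40.0, -42.0]]
rho = icc_oneway(thresholds)
print(f"ICC = {rho:.2f}, effective n = {effective_sample_size(9, 3, rho):.1f} of 9 cells")
```

      When the ICC is close to zero, cells behave like independent replicates and a cell-level t-test loses little; when it is high, the effective sample size approaches the number of animals, which is the situation a random effect for animal is meant to handle.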

      (4) Methods. I still maintain that a threshold at around -20/-15 mV for the first action potential of a train seems too depolarized (see some datapoints of Fig 5c and Fig 7c) for a healthy spike. This suggests that some cells were either in precarious conditions or that the capacitance of the electrode was not compensated properly.

      As suggested by the reviewer, in the revised figures we excluded the neurons with threshold at -20/-15 mV. In addition, we performed statistical analysis with and without these cells (data reported below) and found that whether these cells are included or excluded, the statistical significance of the results does not change.

      Fig. 5c: including the 2 outliers from the cHet group with values of -16.5 and -20.6 mV: -42.6±1.01 mV in control, n=33 cells from 15 mice vs -35.3±1.2 mV in cHet, n=40 cells from 17 mice, ***p<0.001, LMM; excluding the 2 outliers from the cHet group: -42.6±1.01 mV in control, n=33 cells from 15 mice vs -36.2±1.1 mV in cHet, n=38 cells from 17 mice, ***p<0.001, LMM.

      Fig. 7c: including the 2 outliers from the cHet group with values of -16.5 and -20.6 mV: -43.4±1.6 mV in control, n=12 cells from 9 mice vs -33.9±1.8 mV in cHet, n=24 cells from 13 mice, **p=0.002, LMM; excluding the 2 outliers from the cHet group: -43.4±1.6 mV in control, n=12 cells from 9 mice vs -35.4±1.7 mV in cHet, n=22 cells from 13 mice, *p=0.037, LMM.

      (5) The authors claim that "cHet SST+ cells showed no significant changes in active and passive membrane properties (Figure 8d,e); however, their evoked firing properties were affected with fewer AP generated in response to the same depolarizing current injection".

      This sentence is intrinsically contradictory. Action potentials triggered by current injections are dependent on the integration of passive and active properties. If the curves of Figure 8f are different between genotypes, then some passive and/or active property MUST have changed. It is an unescapable conclusion. The general _blanket_ statement of the authors that there are no significant changes in active and passive properties is in direct contradiction with the current/#AP plot.

      We agree with the reviewer and have rephrased the abstract, results and discussion accordingly to better represent the data. As discussed in the previous revision, it is possible that other intrinsic factors, not assessed in this study, may have contributed to the effect shown in the current/#AP plot.

      (6) The phase plots of Figs 5c, 7c, and 7h suggest that the frequency of acquisition/filtering of current-clamp signals was not appropriate for fast waveforms such as spikes. The first two papers indicated by the authors in their rebuttal (Golomb et al., 2007; Stevens et al., 2021) did not perform a phase plot analysis (like those included in the manuscript). The last work quoted in the rebuttal (Zhang et al., 2023) did perform phase plot analysis, but data were digitized at a frequency of 20KHz (not 10KHz as incorrectly indicated by the authors) and filtered at 10 kHz (not 2-3 kHz as by the authors in the manuscript). To me, this remains a concern.

      We agree with the reviewer that a higher sampling rate would allow a more accurate assessment of different AP parameters, such as AP peak, half-width, rise time, etc. The papers were cited in the context of determining AP threshold, not performing phase plot analysis. We apologize for the confusion and error. Finally, we removed the phase plots since they did not add relevant information.

      (7)  The general logical flow of the manuscript could be improved. For example, Fig 4 seems to indicate no morphological differences in the dendritic trees of control vs mutant PV cells, but this conclusion is then rejected by Fig 6. Maybe Fig 4 is not necessary. Regarding Fig 6, did the authors check the integrity of the entire dendritic structure of the cells analyzed (i.e. no dendrites were cut in the slice)? This is critical as the dendritic geometry may affect the firing properties of neurons (Mainen and Sejnowski, Nature, 1996).

      As suggested by the reviewer, we removed Fig.4. All the reconstructions used for dendritic analysis contained intact cells with no evidently cut dendrites.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank the editors at eLife and the one reviewer for engaging with our revised manuscript. As we noted in our previous response to reviewers, which we wrote in October 2024 when we submitted our initial revision, the majority of the critique we received was targeted not so much at the argument of this manuscript but at the debate regarding the evidence in the two other manuscripts that this one accompanied: “Evidence for deliberate burial of the dead by Homo naledi” and “241,000 to 335,000 Years Old Rock Engravings Made by Homo naledi in the Rising Star Cave system, South Africa.” Because of that critique we revised this manuscript to emphasize that the key element in constructing our argument is that H. naledi engaged in mortuary behavior (the movement of dead H. naledi by living H. naledi into the Rising Star cave system) and to place that in the context of a) the increasingly complex later Pleistocene record of meaning-making activity and b) the assumed correlations between brain size and cognitive capacities in Pliocene and Pleistocene hominins. This framing, as noted in the eLife editorial comment, is the main thrust of our manuscript. There is a growing convergence of evidence that the totality of the currently available data and analyses for H. naledi in the Rising Star cave system supports mortuary behavior: that is, the agential and intentional action by H. naledi individuals in the transport of bodies to the Lesedi Chamber and Dinaledi Subsystem (see Berger et al. 2025 plus the 2nd round reviews and the eLife editorial comment associated with it, and also Van Rooyen et al. 2025). We acknowledge the serious debates around the assertion of funerary behavior (cultural burial) and seek to illustrate that while we believe the data support the funerary behavior hypothesis, it is not a necessary requirement for our main argument.

      A few specific responses to the reviewer in this revised manuscript:

      The reviewer states: “Claims for a positive correlation between absolute and/or relative brain size and cognitive ability are not common in discussions surrounding the evolution of Middle- and Late Pleistocene hominin behavior.” We are not making the argument that absolute brain size in the later Pleistocene is a point of focus, rather that there are many arguments and assertions about EQ and cognitive capacity that are central in the proposals for the evolution of hominins in general and genus Homo in particular across the Plio-Pleistocene period. We offer a brief review of this in the text and suggest, as noted by this reviewer, that “exploration of the specific/potential socio-cultural, neuro-structural, ecological and other factors will be more informative than the emphasis on absolute/relative brain size”; this (in their words) is exactly our main point. However, we contend that such a framing should not be exclusive to later Pleistocene contexts, but rather that the examination of earlier hominins might also be better served by moving away from the traditional assumptions of cognitive complexity associated with absolute/relative brain size.

      The reviewer states: “The authors use, in a number of instances throughout the paper, secondary sources of information such as review papers (e.g., McBrearty & Brooks 2000; Scerri & Will 2023; Galway-Witham et al. 2019) instead of the original works that are the basis for making the desired case.” We do indeed use review papers in the main body of the text for clarity, brevity, and to acknowledge robust previous review work in these areas; however, in the supplemental text and with the figures and table we offer substantive bibliographies of the original citations and studies. We encourage readers to spend time with those materials as well.

      Finally, the reviewer states: “Given the inadequate analyses in the accompanying papers, and the lack of evidence for stone tools in the naledi sites, the present claims for the expression of culturally and symbolically mediated behaviors by this small-brained hominin must be adequately established.” We are quite specific in this manuscript, and in other publications, that we are not arguing for “symbolically mediated” behavior, but do stand by our non-controversial suggestions of meaning-making, and cultural behavior, as relevant in Pleistocene hominins (e.g. Kissel and Fuentes 2017, 2018). We do not argue that stone tools are necessary as mandatory indicators of such possibilities and lay out the H. naledi information in the context of the broader and increasing datasets and analyses for meaning-making behavior in Pleistocene hominins (see Figure 1 and Table 1, and in the text).

      Our point with this manuscript, which we reiterate here, is that “The increasing data for complex behavior and meaning-making across the Pleistocene should play a major element in structuring how we investigate, explain, and model the origins and patterns of hominin and human evolution”, and we feel that the current evidence for H. naledi behavior contributes to the broader suites of data, hypotheses, analyses, and theory building in this endeavor.


      The following is the authors’ response to the original reviews.

      Before laying out how we addressed the specific comments on this manuscript we want to clarify the goal and intent of this paper to maximize effective critical reading of its contents. We appreciate and look forward to continued critique and enhanced discussion of this topic and argument.

      Our starting point for constructing the argument in this manuscript is that H. naledi engaged in mortuary behavior. This emerges from the totality of the currently available data and analyses for Homo naledi in the Rising Star cave system, which support agential and intentional action by Homo naledi individuals in the transport of bodies to the Lesedi Chamber and Dinaledi Subsystem. We do feel that the data support the cultural burial hypothesis as well as the likelihood that at least some of the markings reported as engravings are non-naturally occurring (see Martinón-Torres et al. 2024) and made by Homo naledi. But these two elements are not necessary for the validity of the argument we pursue in this manuscript.

      Our second key point is that gross brain size does not necessarily correlate with particular patterns of complex behavior in Pleistocene hominins. On this there is wide agreement, yet both scholarly and public arguments for the success of the genus Homo and the success of Homo sapiens have incorporated an assumption of a Rubicon of cerebral size. From this we propose a third point: that smaller brained Pleistocene hominins, including Homo naledi, are part of a Pleistocene hominin niche that includes patterns of complex social and cognitive behavior. Such behavior was historically considered to be exclusive to Homo sapiens but is now documented to occur earlier, across a range of hominin taxa in the latter half of the Pleistocene. We offer the case of H. naledi behavior in the Rising Star system as an example of this. This case contributes to the development of a broader approach to the cognitive, physiological, and behavioral framings of, and explanations for, Pleistocene hominin behavior.

      Responses to specific critiques in the eLife reviews centered on this manuscript:

      Reviewer #1:

      All inferences regarding hominin behaviour and biology of Homo naledi, discussed by Fuentes and colleagues, are wholly dependent on the evidence presented in the archaeology preprints being true.

      Reviewer #2:

      Fuentes et al. provide a detailed and thoughtful commentary on the evolutionary and behavioral implications of complex behaviors associated with a small-brained hominin, Homo naledi…..While the review by Fuentes et al. highlights important assumptions about the relationship between hominin brain size, cognition, and complex behaviors, the evidence presented by Berger et al. 2023a,b does not support the claim that Homo naledi engaged in burial practices or symbolic expression through wall engravings.

      Reviewer #3:

      This paper presents the cognitive implications of claims made in two accompanying papers (Berger et al. 2023a, 2023b) about the creation of rock engravings, the intentional disposal of the dead, and fire use by Homo naledi. The importance of the paper, therefore, relies on the validity of the claims for the presence of socio-culturally complex and cognitively demanding behaviors that are presented in the associated papers. Given the archaeological, hominin, and taphonomic analyses in the associated papers are not adequate to enable the exceptional claims for naledi-associated complex behaviors, the inferences made in this paper are currently inadequate and incomplete.

      We have clarified in the manuscript text and above why we argue that the inferences we are setting as core to our argument do not require that cultural burial or engravings by H. naledi be demonstrated. However, we do clarify in the revision that the current evidence for the transport of dead conspecifics into difficult-to-reach areas deep in the cave system by H. naledi is well supported by the archeological and paleoanthropological data currently available (e.g. Berger et al. 2024, Elliott et al. 2021, Robbins et al. 2021, Hawks et al. 2017) and that this is the basis for our argument.

      Reviewer #3:

      The claimed behaviors are widely recognized as complex and even quintessential to Homo sapiens. The implications of their unequivocal association with such a small-brained Middle Pleistocene hominin are thus far reaching. Accordingly, the main thrust of the paper is to highlight that greater cognition and complex socio-cultural behaviors were not necessarily associated with a positively encephalized brain. This argument begs the obvious question of whether absolute brain size and/or encephalization quotient (i.e., the actual brain volume of a given species relative to the expected brain size for a species of the same average body size) can measure cognitive capacity and the complexity of socio-cultural behaviors among late Middle Pleistocene hominins….Claims for a positive correlation between absolute and/or relative brain size and cognitive ability are not common in discussions surrounding the evolution of Middle- and Late Pleistocene hominin behavior.

      We assert that claims for a positive correlation between absolute and/or relative brain size and cognitive ability are central, either explicitly or implicitly, in most arguments concerning cognitively complex behavior in the genus Homo. This is especially true for ideas about the success of Pleistocene Homo relative to other hominins. We clarify this in the text, offering various citations in support of this position (e.g. Meneganzin and Currie 2022, Galway-Witham, Cole, and Stringer 2019, DeCasien, Barton, and Higham 2022, Dunbar 2003, Kissel and Fuentes 2021, Muthukrishna et al. 2018, Püschel et al. 2021, Tattersall 2023).
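
      For readers unfamiliar with the encephalization quotient (EQ) defined in the reviewer's comment, the following is a minimal sketch of the calculation: observed brain mass divided by the brain mass expected for a given body mass. The manuscript does not state which allometric baseline is meant; the sketch assumes the widely cited Jerison mammalian baseline (coefficient 0.12, exponent 2/3, masses in grams) as default parameters. The function names and the two example taxa are hypothetical and do not correspond to measurements of any hominin.

```python
def expected_brain_mass_g(body_mass_g, coefficient=0.12, exponent=2 / 3):
    """Expected brain mass (g) for a body mass (g) under an allometric baseline.

    The defaults are an assumed Jerison-style mammalian baseline.
    """
    if body_mass_g <= 0:
        raise ValueError("body mass must be positive")
    return coefficient * body_mass_g ** exponent


def encephalization_quotient(brain_mass_g, body_mass_g, **baseline):
    """Observed brain mass relative to the expectation for that body mass."""
    return brain_mass_g / expected_brain_mass_g(body_mass_g, **baseline)


# Hypothetical taxa, invented for illustration: (brain mass g, body mass g).
taxa = {"taxon A": (500.0, 40_000.0), "taxon B": (900.0, 100_000.0)}
for name, (brain, body) in taxa.items():
    print(f"{name}: brain = {brain:.0f} g, EQ = {encephalization_quotient(brain, body):.2f}")
```

      The example is chosen so that absolute brain size and EQ can rank taxa differently, which is why the two measures are discussed separately in the exchange above.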

      Reviewer #3:

      Currently, the bulk of the evidence for early complex technological and social behaviors derives from multiple sites across South Africa and postdates the emergence of H. sapiens by more than 100,000 years. Such lag in the expression of complex technologies and behaviors within our species renders the brain size-implies-cognitive capacity argument moot. Instead, a rich body of research over the past several decades has focused on aspects related to sociocultural, environmental, and even the wiring of the brain in order to understand factors underlying the expression of the capacity for greater behavioral variability. In this regard, even if the claimed evidence for complex behaviors among the small-brained naledi populations proves valid, the exploration of the specific/potential socio-cultural, neuro-structural, ecological and other factors will be more informative than the emphasis on absolute/relative brain size.

      While not at all denying the critically important and rich record of cultural complexity in the Late Pleistocene South African archeological record, we disagree that “the bulk of the evidence for early complex technological and social behaviors derives from multiple sites across South Africa and postdates the emergence of H. sapiens by more than 100,000 years”. We offer a range of examples and citations in support of our assertion in the text (esp. pp. 12-14, Table 1 and Figure 1).

      We lay out the currently available data for such cultural complexity in Figure 1 with extensive documentation and citations for each case in the Supplementary material (both as a table and a bibliography). We wholly agree with Reviewer 3 that “the exploration of the specific/potential socio-cultural, neuro-structural, ecological and other factors will be more informative than the emphasis on absolute/relative brain size” and are attempting to do just that in the manuscript.

      Reviewer #3:

      The paper presents as supporting evidence previous claims for the appearance of similar complex behaviors predating the emergence of our species, H. sapiens, although it does acknowledge their controversial nature. It then uses the current claims for the association of such behaviors with H. naledi as decisive. Given the inadequate analyses in the accompanying papers and the lack of evidence for stone tools in the naledi sites, the present claims for the expression of culturally and symbolically mediated behaviors by this small-brained hominin must be adequately established.

      We respond to the first part of this critique above (regarding the other papers). But again, we emphasize that although we do feel that the argument for cultural burial is supported (see Berger et al. 2024 preprint) what we are arguing for in this paper is that the agential and intentional transportation of dead (mortuary behavior) is the sufficient factor undergirding our proposal. We do not agree that absence of recognizable stone tools at the site negates our proposal and assert that the context provided by Figure 1, and the data in the table for figure 1 in the SOM, in concert with the supported mortuary behavior (transport and emplacement of the dead) offer sufficient support for the argument we make in the text regarding brain size and the role of emotional cognition and complex behavior in the Pleistocene hominin niche and H. naledi’s participation in it.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The paper by Lee and Ouellette explores the role of cyclic-di-AMP in chlamydial developmental progression. The manuscript uses a collection of different recombinant plasmids to up- and down-regulate c-di-AMP production, and then uses classical molecular and microbiological approaches to examine the effects of expression induction in each of the transformed strains.

      Strengths: 

      This laboratory is a leader in the use of molecular genetic manipulation in Chlamydia trachomatis and their efforts to make such approaches mainstream are commendable. Overall, the model described and defended by these investigators is thorough and significant.

      Thank you for these comments.

      Weaknesses: 

      The biggest weakness in the document is their reliance on quantitative data that is statistically not significant, in the interpretation of results. These challenges can be addressed in a revision by the authors. 

      Thank you for these comments. We point out that, while certain RT-qPCR data may not be statistically significant, our RNAseq data indicate late genes are, as a group, statistically significantly increased when increasing c-di-AMP levels and decreased when decreasing c-di-AMP levels. We do not believe running additional experiments to “achieve” statistical significance in the RT-qPCR data is worthwhile. We hope the reviewer agrees with this assessment.

      We have also included new data in this revised manuscript, which we believe further strengthens aspects of the conclusions linked to individual expression of full-length DacA isoforms. We have also quantified inclusion areas and bacterial sizes for critical strains.

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript describes the role of the production of c-di-AMP on the chlamydial developmental cycle. Chlamydia are obligate intracellular bacterial pathogens that rely on eukaryotic host cells for growth. The chlamydial life cycle depends on a cell form developmental cycle that produces phenotypically distinct cell forms with specific roles during the infectious cycle. The RB cell form replicates, amplifying chlamydia numbers, while the EB cell form mediates entry into new host cells, disseminating the infection to new hosts. Regulation of cell form development is a critical question in chlamydia biology and pathogenesis. Chlamydia must balance amplification (RB numbers) and dissemination (EB numbers) to maximize survival in its infection niche. The main findings in this manuscript show that overexpression of the dacA-ybbR operon results in increased production of c-di-AMP and early expression of the transitionary gene hctA and late gene omcB. The authors also knocked down the expression of the dacA-ybbR operon and reported a reduction in the expression of both hctA and omcB. The authors conclude with a model suggesting the amount of c-di-AMP determines the fate of the RB, continued replication, or EB conversion. Overall, this is a very intriguing study with important implications; however, the data is very preliminary and the model is very rudimentary and is not well supported by the data.

      Thank you for your comments. Chlamydia is not an easy experimental system, but we have done our best to address the reviewer’s concerns in this revised submission.

      Describing the significance of the findings: 

      The findings are important and point to very exciting new avenues to explore the important questions in chlamydial cell form development. The authors present a model that is not quantified and does not match the data well. 

      Describing the strength of evidence: 

      The evidence presented is incomplete. The authors do a nice job of showing that overexpression of the dacA-ybbR operon increases c-di-AMP and that knockdown or overexpression of the catalytically dead DacA protein decreases the c-di-AMP levels. However, the effects on the developmental cycle and how they fit the proposed model are less well supported. 

      dacA-ybbR ectopic expression: 

      For the dacA-ybbR ectopic expression experiments they show that hctA is induced early but there is no significant change in OmcB gene expression. This is problematic as when RBs are treated with Pen (this paper) and (DOI 10.1128/MSYSTEMS.00689-20) hctA is expressed in the aberrant cell forms but these forms do not go on to express the late genes suggesting stress events can result in changes in the developmental expression kinetic profile. The RNA-seq data are a little reassuring as many of the EB/Late genes were shown to be upregulated by dacA-ybbR ectopic expression in this assay.

      As the reviewer notes, we also generated RNAseq data, which validate that late gene transcripts (including sigma28 and sigma54 regulated genes) are statistically significantly increased earlier in the developmental cycle in parallel to increased c-di-AMP levels. The lack of statistical significance in the RT-qPCR data for omcB, which show a trend of higher transcripts, is less concerning given the statistically significant RNAseq dataset. We have reported the data from three replicates for the RT-qPCR and do not think it would be worthwhile to run more replicates in an attempt to “achieve” statistical significance.

      We recognize that hctA may also increase during stress as noted by the Grieshaber Lab. In re-evaluating these data, we decided to remove the Penicillin-linked studies from the manuscript since they detract from the focus of the story we are trying to tell given the potential caveat the reviewer mentions.

      The authors also demonstrate that this ectopic expression reduces the overall growth rate but produces EBs earlier in the cycle but overall fewer EBs late in the cycle. This observation matches their model well as when RBs convert early there is less amplification of cell numbers. 

      dacA knockdown and dacA(mut) 

      The authors showed that dacA knockdown and ectopic expression of the dacA mutant both reduced the amount of c-di-AMP. The authors show that for both of these conditions, hctA and omcB expression is reduced at 24 hpi. This was also partially supported by the RNA-seq data for the dacA knockdown as many of the late genes were downregulated. However, a shift to an increase in RB-only genes was not readily evident. This is maybe not surprising as the chlamydial inclusion would just have an increase in RB forms and changes in cell form ratios would need more time points.

      Thank you for this comment. We agree that it is not surprising given the shift in cell forms. The reduction in hctA transcripts argues against a stress state as noted above by the reviewer, and the RNAseq data from dacA-KD conditions indicate at least that secondary differentiation has been delayed. We agree that more time points would help address the reviewer’s point, but the time and cost to perform such studies are prohibitive with an obligate intracellular bacterium.

      Interestingly, the overall growth rate appears to differ in these two conditions, growth is unaffected by dacA knockdown but is significantly affected by the expression of the mutant. In both cases, EB production is repressed. The overall model they present does not support this data well as if RBs were blocked from converting into EBs then the growth rate should increase as the RB cell form replicates while the EB cell form does not. This should shift the population to replicating cells. 

      We agree that it seems that perturbing c-di-AMP production by knockdown or overexpressing the mutant DacA(D164N) has different impacts on chlamydial growth. We have generated new data, which we believe addresses this. Overexpressing membrane-localized DacA isoforms is clearly detrimental to chlamydiae as noted in the manuscript. However, when we removed the transmembrane domain and expressed N-terminal truncations of these isoforms, we observed no effects of overexpression on chlamydial morphology or growth. Importantly, for the wild-type full-length or truncated isoforms, overexpressing each resulted in the same level of c-di-AMP production, further supporting that the negative effect of overexpressing the wild-type full-length is linked to its membrane localization and not c-di-AMP levels. These data have been included as new Figure 3. These data indicate that too much DacA in the membrane is disruptive and suggest that the balance of DacA to YbbR is important since overexpression of both did not result in the same phenotype. This is further described in the Discussion.

      As it relates to knockdown of dacA-ybbR, we have essentially removed/reduced the amount of these proteins from the membrane and have blocked the production of c-di-AMP. This is fundamentally different from overexpression.

      Overall this is a very intriguing finding that will require more gene expression data, phenotypic characterization of cell forms, and better quantitative models to fully interpret these findings. 

      Reviewer #1 (Recommendations for the authors): 

      There is a generally consistent set of experiments conducted with each of the mutant strains, allowing a straightforward examination of the effects of each transformant. There are a few general and specific things that need to be addressed for both the benefit of the reader and the accuracy of interpretation. The following is a list of items that need to be addressed in the document, with an overall goal of making it more readable and making the interpretations more quantitatively defended. 

      Specific comments: 

      (1) The manuscript overall is wordy, and there are quite a few examples of text in the results that should be in the discussion (examples include lines 224-225, 248-262, 282-288, 304-308). The manuscript overall could use careful editing for verbosity. 

      Thank you for this comment. We have removed some of the indicated sentences. However, to maintain the flow and logic of the manuscript, some statements may have been preserved to help transition between sections. As far as verbosity, we have tried to be as clear as possible in our descriptions of the results to minimize ambiguity. Others who read our manuscript appreciated the thoroughness of our descriptions.

      (2) There is also a trend in the document to base fact statements on qualitative and quantitative differences that do not approach statistical significance. Examples of this include lines 156-158, 190-192, 198-199, 230-232, 239-242, and 292-293. This is something the authors need to be careful about, as these statistically insignificant differences may tend to multiply a degree of uncertainty across the entire manuscript. 

      We have quantified inclusion areas and tried to remove instances of qualitative assessments as noted by the reviewer. In regards to some of the transcripts, we can only report the data as they are. In some cases, there are trends that are not statistically significant, but it would seem to be inaccurate to state that they were unchanged. In other cases, a two-fold or less difference in transcript levels may be statistically significant but biologically insignificant. A reader can and should make their own conclusions.

      (3) Any description of inclusion or RB size being modestly different needs to be defended with microscopic quantification. 

      We have quantified inclusion areas and RB sizes and tried to remove instances of qualitative assessments as noted by the reviewer.

      (4) It would be very helpful to reviewers if there was a figure number added to each figure in the reviewer-delivered text. 

      Added.

      (5) Figure 1A: This should indicate that the genes indicated beneath each developmental form are on high (I think that is what that means). 

      We have reorganized Figure 1 to improve the flow.

      (6) Figure 1B is exactly the same as the three images in Figure 8B. I would delete this in Figure 1. This relates to comment 9. 

      We presented this intentionally to clearly illustrate to the reader, who may not be knowledgeable in this area, what we propose is happening in the various strains. As such, we respectfully disagree and have left this aspect of the figure unchanged.

      (7) Figure 1D: It is not clear if the period in E.V has any meaning. I think this is just a typo. Also, the color coding needs to be indicated here. What do the gray bars represent? The labeling for the gene schematic for dacA-KDcom should not be directly below the first graph in D. This makes the reader think this is a label for the graph. This can be accomplished if the image in panel B is removed and the first graph in panel D is moved into B. This will make a better figure. 

      We have reorganized Figure 1 to improve the flow.

      (8) Figure 2 C, G: The utility of these panels is not clear. For them to have any value, they need to be expressed in genome copies. If they are truly just a measure of chlamydia genomic DNA, they have minimal utility to the reader. There are similar panels in several other figures. 

      We have reported genome copies as suggested in lieu of ng gDNA for these measurements. Importantly, it does not alter any interpretations.
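      The conversion from a DNA mass to genome copies can be sketched as follows. This is a minimal illustration of the standard mass-to-copy-number arithmetic, not the authors' actual procedure (which may rely on qPCR standard curves). The average mass of 650 g/mol per base pair is a common approximation, and the genome length and input mass used in the example are assumed values for illustration only.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # approximate average mass of one double-stranded base pair


def genome_copies(ng_dna: float, genome_length_bp: float) -> float:
    """Estimate the number of genome copies in a mass of double-stranded DNA."""
    grams = ng_dna * 1e-9
    grams_per_genome = genome_length_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return grams / grams_per_genome


# Hypothetical input: 1 ng of DNA and an assumed chromosome length of ~1.04 Mb.
copies = genome_copies(1.0, 1.04e6)
print(f"{copies:.3e} genome copies")
```

      Because the conversion is a fixed scaling for a given genome length, it changes the units of the plots but not the relative differences between strains, which is consistent with the statement that interpretations are unaffected.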

      (9) I am not sure about the overall utility of Figure 8. Granted, a summary of their model is useful, but the cartoons in the figure are identical or very nearly identical to model figures shown in two other publications from the same group (PMID: 39576108, 39464112) These are referenced at least tangentially in the current manuscript (Jensen paper- now published- and ref 53). Because the model has been published before, if they are to be included, there needs to be a direct comparison of the results in each of these three papers, as they basically describe the same developmental process. The model images should also be referenced directly to the first of the other papers.

      This was intentional so that readers familiar with our work will see the similarities between these systems. We have added additional comments in the Discussion related to our newly published work. As an aside, Dr. Lee generated the first version of the figure that was adapted by others in the lab. It is perhaps unlucky that those other studies have been published before his work.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review)

      Summary:

      Advances in machine vision and computer learning have meant that there are now state-of-the-art and open-source toolboxes that allow for animal pose estimation and action recognition. These technologies have the potential to revolutionize behavioral observations of wild primates but are often held back by labor-intensive model training and the need for some programming knowledge to effectively leverage such tools. The study presented here by Fuchs et al unveils a new framework (ASBAR) that aims to automate behavioral recognition in wild apes from video data. This framework combines robustly trained and well-tested pose estimate and behavioral action recognition models. The framework performs admirably at the task of automatically identifying simple behaviors of wild apes from camera trap videos of variable quality and contexts. These results indicate that skeletal-based action recognition offers a reliable and lightweight methodology for studying ape behavior in the wild and the presented framework and GUI offer an accessible route for other researchers to utilize such tools.

      Given that automated behavior recognition in wild primates will likely be a major future direction within many subfields of primatology, open-source frameworks, like the one presented here, will present a significant impact on the field and will provide a strong foundation for others to build future research upon.

      Strengths:

      Clearly articulated the argument as to why the framework was needed and what advantages it could convey to the wider field.

      For a very technical paper it was very well written. For every aspect of the framework, the authors clearly explained why it was chosen and how it was trained and tested. This information was broken down in a clear and easily digestible way that will be appreciated by technical and non-technical audiences alike.

      The study demonstrates which pose estimation architectures produce the most robust models for both within-context and out-of-context pose estimates. This is invaluable knowledge for those wanting to produce their own robust models.

      The comparison of skeletal-based action recognition with other methodologies for action recognition helps contextualize the results.

      We thank Reviewer #1 for their thoughtful and constructive review of our manuscript. We are especially grateful for your recognition of the clarity of the manuscript, the strength of the technical framework, and its accessibility to both technical and non-technical audiences. Your feedback highlights exactly the kind of interdisciplinary engagement we hope to foster with this work.

      Weaknesses

      While I note that this is a paper most likely aimed at the more technical reader, it will also be of interest to a wider primatological readership, including those who work extensively in the field. When outlining the need for future work I felt the paper offered almost exclusively very technical directions. This may have been a missed opportunity to engage the wider readership and suggest some practical ways those in the field could collect more ASBAR-friendly video data to further improve accuracy.

      We appreciate this insightful suggestion and fully agree that emphasizing practical relevance is important for engaging a broader readership. In response, we have reformulated the opening of the Discussion section to place stronger emphasis on the value of shared, open-source resources and the real-world accessibility of the ASBAR framework. The revised text explicitly highlights the practical benefits of ASBAR for field researchers working in resource-constrained environments, and underscores the importance of community-driven data sharing to advance behavioral research in natural settings.

      This section now reads: Despite the growing availability of open-source resources, such as large-scale animal pose datasets and machine learning toolboxes for pose estimation and human skeleton-based action recognition, their integration for animal behavior recognition—particularly in natural settings—remains largely unexplored. With ASBAR, a framework combining animal pose estimation and skeleton-based action recognition, we provide a comprehensive data and model pipeline, methodology, and GUI to assist researchers in automatically classifying animal behaviors via pose estimation. We hope these resources will become valuable tools for advancing the understanding of animal behavior within the research community.

      To illustrate ASBAR’s capabilities, we applied it to the challenging task of classifying great ape behaviors in their natural habitat. Our skeleton-based approach achieved accuracy comparable to previous video-based studies for Top-K and Mean Class Accuracies. Additionally, by reducing the input size of the action recognition model by a factor of approximately 20 compared to video-based methods, our approach requires significantly less computational power, storage space, and data transfer resources. These qualities make ASBAR particularly suitable for field researchers working in resource-constrained environments.

      Our framework and results are built on the foundation of shared and open-source materials, including tools like DeepLabCut, MMAction2, and datasets such as OpenMonkeyChallenge and PanAf500. This underscores the importance of making resources publicly available, especially in primatology, where data scarcity often impedes progress in AI-assisted methodologies. We strongly encourage researchers with large annotated video datasets to make them publicly accessible to foster interdisciplinary collaboration and further advancements in animal behavior research.

      Reviewer #2 (Public Review)

      Fuchs et al. propose a framework for action recognition based on pose estimation. They integrate functions from DeepLabCut and MMAction2, two popular machine-learning frameworks for behavioral analysis, in a new package called ASBAR.

      They test their framework by

      Running pose estimation experiments on the OpenMonkeyChallenge (OMC) dataset (the public train + val parts) with DeepLabCut.

      Annotating around 320 image pose data in the PanAf dataset (which contains behavioral annotations). They show that the ResNet-152 model generalizes best from the OMC data to this out-of-domain dataset.

      They then train a skeleton-based action recognition model on PanAf and show that the top-1/3 accuracy is slightly higher than video-based methods (and strong), but that the mean class accuracy is lower (33% vs 42%), likely due to the imbalanced class frequencies. This should be clarified. For Table 1, confidence intervals would also be good (just like for the pose estimation results, where this is done very well).
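      The gap between top-1 accuracy and mean class accuracy under class imbalance can be illustrated with a small sketch. The labels below are hypothetical toy data, not taken from PanAf, and the function names are chosen for illustration only.

```python
from collections import defaultdict


def top1_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_class_accuracy(y_true, y_pred):
    """Average of per-class recall, so every class counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical imbalanced toy set: 8 samples of a common class, 2 of a rare one.
y_true = ["walking"] * 8 + ["climbing"] * 2
# A classifier that always predicts the common class.
y_pred = ["walking"] * 10

print(top1_accuracy(y_true, y_pred))        # dominated by the common class
print(mean_class_accuracy(y_true, y_pred))  # penalized for missing the rare class
```

      In this toy case the always-majority classifier scores well on top-1 accuracy while missing the rare class entirely, which is the pattern that lowers mean class accuracy on imbalanced datasets.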

      We thank Reviewer #2 for their clear and helpful summary of our work, and for the thoughtful suggestions to improve the manuscript. We appreciate this observation. In the revised manuscript, we now clarify that the lower Mean Class Accuracy (MCA) in the initial version was indeed driven by significant class imbalance in the PanAf dataset, which contains highly uneven representation across behavior categories. To address this, we made two key improvements to the action recognition model:

      (1) We replaced the standard cross-entropy loss with a class-balanced focal loss, following the approach of Sakib et al. (2021), to better account for rare behaviors during training.

      (2) We initialized the PoseConv3D model with pretrained weights from FineGym (Shao et al., 2020) rather than training from scratch, which increased performance across underrepresented classes.

      Together, these changes substantially improved model performance on tail classes, increasing the Mean Class Accuracy from 33.6% to 47%, now exceeding that of the video-based baseline.
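      The idea behind a class-balanced focal loss can be sketched as below. This is one common softmax-based formulation, combining weights derived from the "effective number of samples" with the focal modulating factor. Whether it matches the exact variant used in the revised ASBAR pipeline is an assumption, and the class counts, `beta`, and `gamma` values are hypothetical.

```python
import math


def class_balanced_weights(class_counts, beta=0.999):
    """Per-class weights proportional to (1 - beta) / (1 - beta ** n_c)."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    scale = len(raw) / sum(raw)  # normalize so the weights sum to the class count
    return [w * scale for w in raw]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_balanced_focal_loss(logits, target, weights, gamma=2.0):
    """Weighted focal loss for one sample: -w_t * (1 - p_t)^gamma * log(p_t)."""
    p_t = softmax(logits)[target]
    return -weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical counts: one frequent behavior and one rare behavior.
weights = class_balanced_weights([1000, 10])
loss_rare = class_balanced_focal_loss([2.0, 0.5], target=1, weights=weights)
loss_common = class_balanced_focal_loss([2.0, 0.5], target=0, weights=weights)
print(weights, loss_rare, loss_common)
```

      The rare class receives the larger weight, and the focal term shrinks the contribution of samples the model already classifies confidently, so training signal is shifted toward hard and underrepresented examples.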

      Moreover, we sincerely thank Reviewer #2 for the thorough and constructive private feedback. Your comments have greatly helped us improve both the structure and clarity of the manuscript, and we have implemented several key revisions based on your recommendations to streamline the text and sharpen its focus on the core contributions. In particular, we have revised the tone of both the Introduction and Discussion sections to more modestly and accurately reflect the scope of our findings. We removed unnecessary implementation details—such as the description of graph-based models that were not part of the final pipeline—to avoid distracting tangents. The Methods section has been clarified and consolidated to include all evaluation metrics, a description of the data augmentation, and other methodological elements that were previously scattered across the Results section. Additionally, the Discussion now explicitly addresses the limitations of our EfficientNet results, including a dedicated paragraph that acknowledges the use of suboptimal hyperparameters and highlights the need for architecture-specific tuning, particularly with respect to learning rate schedules.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Given the importance that these coupling mechanisms have been given in theory, this is a timely and important contribution to the literature in terms of determining whether these theoretical assumptions hold true in human data.

      Thank you!

      I did not follow the logic behind including spindle amplitude in the meta-analysis. This is not a measure of SO-spindle coupling (which is the focus of the review), unless the authors were restricting their analysis of the amplitude of coupled spindles only. It doesn't sound like this is the case though. The effect of spindle amplitude on memory consolidation has been reviewed in another recent meta-analysis (Kumral et al, 2023, Neuropsychologia). As this isn't a measure of coupling, it wasn't clear why this measure was included in the present meta-analysis. You could easily make the argument that other spindle measures (e.g., density, oscillatory frequency) could also have been included, but that seems to take away from the overall goal of the paper which was to assess coupling.

      Indeed, spindle amplitude refers to all spindle events rather than only coupled spindles. This choice was made because we recognized the challenge of obtaining relevant data from each study—only 4 out of the 23 included studies performed their analyses after separating coupled and uncoupled spindles. This inconsistency strengthens the urgency and importance of this meta-analysis to standardize the methods and measures used for future analysis on SO-SP coupling and beyond. We agree that focusing on the amplitude of coupled spindles would better reveal their relations with coupling, and we have discussed this limitation in the manuscript.

      Nevertheless, we believe including spindle amplitude in our study remains valuable, as it served several purposes. First, SO-SP coupling involves the modulation between spindle amplitude and slow oscillation phase. Different studies have reported conflicting conclusions regarding how overall spindle amplitude was related to coupling as an indicator of oscillation strength overnight: some found significant correlations (e.g., Baena et al., 2023), while others did not (e.g., Roebber et al., 2022). This discrepancy highlights an indirect but potentially crucial insight into the role of spindle amplitude in coupling dynamics. Second, in studies related to SO-SP coupling, spindle amplitude is one of the most frequently reported measures along with other coupling measures that significantly correlated with oversleep memory improvements (e.g., Kurz et al., 2023; Ladenbauer et al., 2021; Niknazar et al., 2015), so we believe that including this measure can provide a more comprehensive review of the existing literature on SO-SP coupling. Third, incorporating spindle amplitude allows for a direct comparison between the measurement of coupling and individual events alone in their contribution to memory consolidation, a question that has been extensively explored in recent research (e.g., Hahn et al., 2020; Helfrich et al., 2019; Niethard et al., 2018; Weiner et al., 2023). Finally, spindle amplitude was identified as the most important moderator for memory consolidation in Kumral et al.'s (2023) meta-analysis. By including it in our analysis, we sought to replicate their findings within a broader framework and introduce conceptual overlaps with existing reviews. Therefore, although we were not able to selectively include coupled spindles, there is still a unique relation between spindle amplitude and SO-SP coupling that other spindle measures do not have. 

      Originally, we also intended to include coupling density or counts in the analysis, which seems more relevant to the coupling metrics. However, the lack of uniformity in methods used to measure coupling density posed a significant limitation. We hope that our study will encourage consistent reporting of all relevant parameters in future research, allowing future meta-analyses to incorporate these measures comprehensively. We have added this discussion to the revised version of the manuscript (p. 3) to further clarify these points.

      All other citations were referenced in the manuscript.

      At the end of the first paragraph of section 3.1 (page 13), the authors suggest their results "... further emphasise the role of coupling compared to isolated oscillation events in memory consolidation". This had me wondering how many studies actually test this. For example, in a hierarchical regression model, would coupled spindles explain significantly more variance than uncoupled spindles? We already know that spindle activity, independent of whether they are coupled or not, predicts memory consolidation (e.g., Kumral meta-analysis). Is the variance in overnight memory consolidation fully explained by just the coupled events? If both overall spindle density and coupling measures show an equal association with consolidation, then we couldn't conclude that coupling compared to isolated events is more important.

      While primary coupling measurements, including coupling phase and strength, showed strong evidence for their associations with memory consolidation, measures of spindles, including spindle amplitude, only exhibited limited evidence (or “non-significant” effect) for their association with consolidation. These results are consistent with multiple empirical studies using different techniques (e.g., Hahn et al., 2020; Helfrich et al., 2019; Niethard et al., 2018; Weiner et al., 2023), which reported that coupling metrics are more robust predictors of consolidation and synaptic plasticity than spindle or slow oscillation metrics alone. However, we agree with the reviewer that we did not directly separate the effect between coupled and uncoupled spindles, and a more precise comparison would involve contrasting the “coupling of oscillation events” with “individual oscillation events” rather than coupling versus isolated events.

      We recognized that Kumral and colleagues’ meta-analysis reported a moderate association between spindle measures and memory consolidation (e.g., for spindle amplitude-memory association they reported an effect size of approximately r = 0.30). However, one of the advantages of our study is that we actively cooperated with the authors to obtain a large number of unreported and insignificant data relevant to our analysis, as well as separated data that were originally reported under mixed conditions. This approach decreases the risk of false positives and selective reporting of results, making the effect size more likely to approach the true value. In contrast, we found only a weak effect size of r = 0.07 with minimal evidence for spindle amplitude-memory relation. However, we agree with the reviewer that using a more conservative term in this context would be a better choice since we did not measure all relevant spindle metrics including the density.

      To improve clarity in our manuscript, we have revised the statement to: “Together with other studies included in the review, our results suggest a crucial role of coupling but did not support the role of spindle events alone in memory consolidation,” and provide relevant references (p. 13). We believe this can more accurately reflect our findings and the existing literature to address the reviewer’s concern.

      It was very interesting to see that the relationship between the fast spindle coupling phase and overnight consolidation was strongest in the frontal electrodes. Given this, I wonder why memory promoting fast spindles shows a centro-parietal topography? Surely it would be more adaptive for fast spindles to be maximally expressed in frontal sites. Would a participant who shows a more frontal topography of fast spindles have better overnight consolidation than someone with a more canonical centro-parietal topography? Similarly, slow spindles would then be perfectly suited for memory consolidation given their frontal distribution, yet they seem less important for memory.

      Regarding the topography of fast spindles and their relationship to memory consolidation, we agree this is an intriguing issue. We have already made significant progress on this topic in our ongoing work and have found evidence that participants with a more frontal topography of fast spindles show better overnight consolidation. These findings will be presented in our future publications. We share a few relevant observations. First, there are significant discrepancies in the definition of “slow spindle” in the field. Some studies defined slow spindles as 9-12 Hz (e.g., Mölle et al., 2011; Kurz et al., 2021), while others performed the event detection within a range of 11-13/14 Hz and found a frontal-dominated topography (e.g., Barakat et al., 2011; D'Atri et al., 2018). Compounding this issue, individual and age differences in spindle frequency are often overlooked, leading to challenges in reliably distinguishing between slow and fast spindles. Some studies have reported difficulty in clearly separating the two types of spindles altogether (e.g., Hahn et al., 2020). Moreover, a critical factor often ignored in past research is the propagating nature of both slow oscillations and spindles across the cortex, where spindles are coupled with significantly different phases of slow oscillations (see Figure 5). In addition, the frontal region, as the origin site of SOs, has the strongest and most active SOs, which may contribute to the role of frontal coupling. In contrast, not all SOs propagate from PFC to centro-parietal sites. The reviewer also raised an interesting idea that slow spindles would be perfectly suited for memory consolidation given their frontal distribution. We propose that one possible explanation is that if SOs couple exclusively with slow SPs, they may lose their ability to coordinate inter-area activity between centro-parietal and frontal regions, which could play a critical role in long-range memory transmission across hippocampus, thalamus, and prefrontal cortex. This hypothesis requires investigation in future studies. We believe a better understanding of coupling in the context of the propagation of these waves will help us better understand the observed frontal relationship with consolidation. Therefore, we believe this result supports our conclusion that coupling precision is more important than intensity, and we have addressed this in the revised manuscript (pp. 15-16).

      The authors rightly note the issues with multiple comparisons in sleep physiology and memory studies. Multiple comparison issues arise in two ways in this literature. First are comparisons across multiple electrodes (many studies now use high-density systems with 64+ channels). Second are multiple comparisons across different outcome variables (at least 3 ways to quantify coupling (phase, consistency, occurrence) x 2 spindle types (fast, slow)). Can the authors make some recommendations here in terms of how to move the field forward, as this issue has been raised numerous times before (e.g., Mantua 2018, Sleep; Cox & Fell 2020, Sleep Medicine Reviews, for just a couple of examples)? Should researchers just be focusing on the coupling phase? Or should researchers always report all three metrics of coupling, and correct for multiple comparisons? I think the use of pre-registration would be beneficial here, and perhaps could be noted by the authors in the final paragraph of section 3.5, where they discuss open research practices.

      There are indeed multiple methods that we can discuss, including cluster-based and non-parametric methods, to correct for multiple comparisons in EEG data with spatiotemporal structures. In addition, encouraging the reporting of all tested but insignificant results, at least in supplementary materials, is an important practice that helps readers understand the findings with reduced bias. We agree with the reviewer’s suggestions and have added more information in sections 3.4-3.5 (p. 17) to advocate for a standardized “template” used to report effect sizes and correct for multiple comparisons in future research.
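      For the second source of multiplicity raised by the reviewer (several outcome variables per study), a simple step-down correction can be sketched as follows. This shows the Holm-Bonferroni procedure as one standard option. It is not the cluster-based approach mentioned above, nor necessarily what the revised manuscript recommends, and the p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected."""
    m = len(p_values)
    # Visit p-values from smallest to largest, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens as hypotheses are rejected: alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Hypothetical p-values for 3 coupling metrics x 2 spindle types (6 tests).
p_values = [0.004, 0.030, 0.020, 0.600, 0.009, 0.250]
decisions = holm_bonferroni(p_values)
print(decisions)
```

      With these illustrative values, four tests fall below an uncorrected 0.05 threshold, but fewer survive the step-down correction, which is the kind of difference a standardized reporting template would make visible.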

      We advocate for the standardization of reporting all three coupling metrics: phase, strength, and prevalence (density, count, and/or percentage coupled). Each coupling metric captures a distinct property of the coupling process and may interact with the others (Weiner et al., 2023). Therefore, we believe it is essential to report all three metrics to comprehensively explore their different roles in the “how, what, and where” of long-distance communication and consolidation of memory. As we advance toward a deeper understanding of the relationship between memory and sleep, we hope this work establishes a standard for the standardization, transparency, and replication of relevant studies.
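      To make the three metrics concrete, the sketch below computes a preferred phase and a coupling strength from the slow-oscillation phase at each spindle peak, using the mean resultant vector from circular statistics, plus a simple percentage-coupled measure. This is one common operationalization, stated here as an assumption. The pooled studies use varied definitions, and the phase values and counts are hypothetical.

```python
import cmath
import math


def coupling_phase_and_strength(phases_rad):
    """Preferred phase (radians) and mean resultant vector length (0 to 1)."""
    mean_vector = sum(cmath.exp(1j * ph) for ph in phases_rad) / len(phases_rad)
    return cmath.phase(mean_vector), abs(mean_vector)


def percent_coupled(n_coupled_spindles, n_total_spindles):
    """Prevalence: share of detected spindles that co-occur with a slow oscillation."""
    return 100.0 * n_coupled_spindles / n_total_spindles


# Hypothetical SO phases (radians) at spindle peaks, clustered near the up-state (0 rad).
phases = [-0.3, -0.1, 0.0, 0.2, 0.4]
preferred_phase, strength = coupling_phase_and_strength(phases)
print(math.degrees(preferred_phase), strength, percent_coupled(45, 120))
```

      Tightly clustered phases give a strength near 1, while phases spread evenly around the cycle give a strength near 0. Phase and strength therefore describe different properties, and prevalence is independent of both.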

      Reviewer #2 (Public review):

      Regarding the Moderator of Age: Although the authors discuss the limited studies on the analysis of children and elders regarding age as a moderator, the figure shows a significant gap between the ages of 40 and 60. Furthermore, there are only a few studies involving participants over the age of 60. Given the wide distribution of effect sizes from studies with participants younger than 40, did the authors test whether removing studies involving participants over 60 would still reveal a moderator effect?

      We agree that there is an age gap between younger and older adults, as current studies often focus on contrasting newly matured and fully aged populations to amplify the effect, while neglecting the gradual changes in memory consolidation mechanisms across the aging spectrum. We suggest that a non-linear analysis of age effects would be highly valuable, particularly when additional child and older adult data become available.

      In response to the reviewer’s suggestion, we re-tested the moderation effect of age after excluding effect sizes from older adults. The results revealed a decrease in the strength of evidence for phase-memory association due to increased variability, but were consistent for all other coupling parameters. The mean estimations also remained consistent (coupling phase-memory relation: -0.005 [-0.013, 0.004], BF10 = 5.51, the strength of evidence reduced from strong to moderate; coupling strength-memory relation: -0.005 [-0.015, 0.008], BF10 = 4.05, the strength of evidence remained moderate). These findings align with prior research, which typically observed a weak coupling-memory relationship in older adults during aging (Ladenbauer et al., 2021; Weiner et al., 2023) but not during development (Hahn et al., 2020; Kurz et al., 2021; Kurz et al., 2023). Therefore, this result is not surprising to us, and there are still observable moderate patterns in the data. We have reported these additional results in the revised manuscript (pp. 6, 11), and interpret them as follows: “the moderator effect of age in the phase-memory association becomes less pronounced during development after excluding the older adult data”. We believe the original findings including the older adult group remain meaningful after cautious interpretation, given that the older adult data were derived from multiple studies and different groups, and they represent the aging effects.

      Reviewer #3 (Public review):

      First, the authors conclude that "SO-SP coupling should be considered as a general physiological mechanism for memory consolidation". However, the reported effect sizes are smaller than what is typically considered a "small effect".

      While we acknowledge the concern about the small effect sizes reported in our study, it is important to contextualize these findings within the field of neuroscience, particularly memory research. Even in individual studies, small effect sizes are not uncommon due to the inherent complexity of the mechanisms involved and the multitude of confounding variables. This is an important factor to be considered in meta-analyses where we synthesize data from diverse populations and experimental conditions. For example, the relationship between SO-slow SP coupling and memory consolidation in older adults is expected to be insignificant.

      As Funder and Ozer (2019) concluded in their highly cited paper, an effect size of r = 0.3 in psychological and related fields should be considered large, with r = 0.4 or greater likely representing an overestimation and rarely found in a large sample or a replication. Therefore, we believe r = 0.1 should not be considered as a lower bound of the small effect. Bakker et al. (2019) also advocate for a contextual interpretation of the effect size. This is particularly important in meta-analyses, where the results are less prone to overestimation compared to individual studies, and we cooperated with all authors to include a large number of unreported and insignificant results. In this context, small correlations may contain substantial meaningful information to interpret. Although we agree that effect sizes reported in our study are indeed small at the overall level, they reflect a rigorous analysis that incorporates robust evidence across different levels of moderators. Our moderator analyses underscore the dynamic nature of coupling-memory relationships, with stronger associations observed in moderator subgroups that have historically exhibited better memory performance, particularly after excluding slow spindles and older adults. For example, both the coupling phase and strength of frontal fast spindles with slow oscillations exhibited "moderate-to-large" correlations with the consolidation of different types of memory, especially in young adults, with r values ranging from 0.18 to 0.32 (see Table S9.1-9.4). We have included discussion about the influence of moderators and hierarchical structures on the dynamics of coupling-memory associations (pp. 17, 20). In addition, we have updated the conclusion to be “SO-fast SP coupling should be considered as a general physiological mechanism for memory consolidation” (p. 1).

      Second, the study implements state-of-the-art Bayesian statistics. While some might see this as a strength, I would argue that it is the greatest weakness of the manuscript. A classical meta-analysis is relatively easy to understand, even for readers with only a limited background in statistics. A Bayesian analysis, on the other hand, introduces a number of subjective choices that render it much less transparent.

      This kind of analysis seems not to be made to be intelligible to the average reader. It follows a recent trend of using more and more opaque methods. Where we had to trust published results a decade ago because the data were not openly available, today we must trust the results because the methods can no longer be understood with reasonable effort.

      This becomes obvious in the forest plots. It is not immediately apparent to the reader how the distributions for each study represent the reported effect sizes (gray dots). Presumably, they depend on the Bayesian priors used for the analysis. The use of these priors makes the analyses unnecessarily opaque, eventually leading the reader to question how much of the findings depend on subjective analysis choices (which might be answered by an additional analysis in the supplementary information).

      We appreciate the reviewer for sharing this viewpoint and we value the opportunity to clarify some key points. To address the concern about clarity, we have included more details in the methods section explaining how to interpret Bayesian statistics including priors, posteriors, and Bayes factors, making our results more accessible to those less familiar with this approach.

      On the use of Bayesian models, we believe there may have been a misunderstanding. Bayesian methods, far from being "opaque" or overly complex, are increasingly valued for their ability to provide nuanced, accurate, and transparent inferences (Sutton & Abrams, 2001; Hackenberger, 2020; van de Schoot et al., 2021; Smith et al., 1995; Kruschke & Liddell, 2018). They have been applied in more than 1,200 meta-analyses as of 2020 (Hackenberger, 2020). In our study, we used priors that assume no effect (mean set to 0, which aligns with the null) while allowing for a wide range of variation to account for large uncertainties. This approach reduces the risk of overestimation or false positives and demonstrates much-improved performance over traditional methods in handling variability (Williams et al., 2018; Kruschke & Liddell, 2018). In addition, priors can increase transparency, since all assumptions are formally encoded and open to critique or sensitivity analysis. In contrast, frequentist methods often rely on hidden or implicit assumptions such as homogeneity of variance, fixed-effects models, and independence of observations that are not directly testable. Sensitivity analyses reported in the supplemental material (Table S9.1-9.4) confirmed the robustness of our choices of priors: our results did not vary across different prior settings.

      As Kruschke and Liddell (2018) described, “shrinkage (pulling extreme estimates closer to group averages) helps prevent false alarms caused by random conspiracies of rogue outlying data,” a well-known advantage of Bayesian over traditional approaches. This explains the observed differences between the distributions and grey dots in the forest plots, which is an advantage of Bayesian models in handling heterogeneity. Unlike p-values, which can be overestimated with a large sample size and underestimated with a small sample size, Bayesian methods make assumptions explicit, enabling others to challenge or refine them, an approach aligned with open science principles (van de Schoot et al., 2021). For example, a credible interval in a Bayesian model can be interpreted as “there is a 95% probability that the parameter lies within the interval”, while a confidence interval in a frequentist model means “in repeated experiments, 95% of the confidence intervals will contain the true value.” We believe the former is much more straightforward and convincing for readers to interpret. We will ensure our justification for using Bayesian models is more clearly presented in the manuscript (pp. 21-23).
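
      The shrinkage described above can be sketched with a deliberately simplified normal-normal model. This is not the model fitted in the manuscript: the group mean `mu`, the between-study spread `tau`, and the three effect sizes are hypothetical values chosen for illustration, and both `mu` and `tau` are treated as known rather than estimated.

```python
def shrink(estimates, std_errors, mu, tau):
    """Conditional posterior means in a normal-normal model with known mu and tau.

    Each study estimate is pulled toward the group mean `mu`; the pull is
    stronger when the study's own standard error is large relative to `tau`.
    """
    shrunk = []
    for y, se in zip(estimates, std_errors):
        w_data = 1.0 / se ** 2    # precision of the study's own estimate
        w_group = 1.0 / tau ** 2  # precision of the group-level distribution
        shrunk.append((w_data * y + w_group * mu) / (w_data + w_group))
    return shrunk


# Hypothetical effect sizes: the last study is extreme and imprecise.
estimates = [0.05, 0.10, 0.60]
std_errors = [0.05, 0.10, 0.30]
shrunk = shrink(estimates, std_errors, mu=0.08, tau=0.10)

for y, s in zip(estimates, shrunk):
    print(f"reported {y:.2f} -> shrunk {s:.3f}")
```

      In this toy setting the precise first study barely moves, while the noisy outlier is pulled most of the way toward the group mean, which is the kind of gap a reader would see between a grey dot and its posterior distribution in a forest plot.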

      We acknowledge that even with these justifications, different researchers may still have discrepancies in their preferences for Bayesian and frequentist models. To further support transparent reporting, we have also reported the traditional frequentist meta-analysis results in Supplemental Material 10 to demonstrate the robustness of our analysis, which suggested non-significant differences between Bayesian and frequentist models. We have included clearer references in the updated version of the manuscript to direct readers to the figures that report the statistics provided by traditional models.

      However, most of the methods are not described in sufficient detail for the reader to understand the proceedings. It might be evident for an expert in Bayesian statistics what a "prior sensitivity test" and a "posterior predictive check" are, but I suppose most readers would wish for a more detailed description. However, using a "Markov chain Monte Carlo (MCMC) method with the no-U-turn Hamiltonian Monte Carlo (HMC) sampler" and checking its convergence "through graphical posterior predictive checks, trace plots, and the Gelman and Rubin Diagnostic", which should then result in something resembling "a uniformly undulating wave with high overlap between chains" is surely something only rocket scientists understand. Whether this was done correctly in the present study cannot be ascertained because it is only mentioned in the methods and no corresponding results are provided. 

      We appreciate the reviewer’s concerns about accessibility and potential complexity in our descriptions of Bayesian methods. Our decision to provide a detailed account serves to enhance transparency and guide readers interested in replicating our study. We acknowledge that some terms may initially seem overwhelming. These steps, such as checking the MCMC chain convergence and robustness checks, are standard practices in Bayesian research and are analogous to “linearity”, “normality” and “equal variance” checks in frequentist analysis. In addition, Hamiltonian Monte Carlo (HMC) is the default algorithm Stan (the software we used to fit Bayesian models) uses to sample from the posterior distribution in Bayesian models. It is a type of MCMC method designed to be faster and more efficient than traditional sampling algorithms, especially for complex or high-dimensional models. We have added exemplary plots in the supplemental material S4.1-4.3 and the method section (pp. 21-22) to explain the results and interpretation of these convergence checks. We hope this will help address any concerns about methodological rigor.
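
      For readers unfamiliar with the Gelman and Rubin diagnostic mentioned above, a minimal sketch of the classic statistic follows. It assumes equal-length chains and uses independent random draws as stand-ins for MCMC output; Stan reports a more refined split-chain version, which is not reproduced here.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Classic potential scale reduction factor (R-hat) for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain variance; B: between-chain variance scaled by n.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


rng = random.Random(1)
# Four stand-in chains that all sample the same distribution (well mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Four stand-in chains stuck in two different regions (not converged).
stuck = [[rng.gauss(shift, 1.0) for _ in range(1000)]
         for shift in (0.0, 0.0, 3.0, 3.0)]

print(f"R-hat, mixed chains: {gelman_rubin(mixed):.3f}")
print(f"R-hat, stuck chains: {gelman_rubin(stuck):.3f}")
```

      Values close to 1 indicate that the chains agree with one another; values well above 1 indicate that they are exploring different regions and that the sampler has not converged.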

      In one point the method might not be sufficiently justified. The method used to transform circular-linear r (actually, all references cited by the authors for circular statistics use r² because there can be no negative values) into "Z_r", seems partially plausible and might be correct under the H0. However, Figure 12.3 seems to show that under the alternative Hypothesis H1, the assumptions are not accurate (peak Z_r=~0.70 for r=0.65). I am therefore, based on the presented evidence, unsure whether this transformation is valid. Also, saying that Z_r=-1 represents the null hypothesis and Z_r=1 the alternative hypothesis can be misinterpreted, since Z_r=0 also represents the null hypothesis and is not half way between H0 and H1.

      First, we realized that in the titles of Figures 12.2 and 12.3, “true r = 0.35” and “true r = 0.65” should be corrected to “true r_z” (note that we use r_z instead of Z_r in the revised manuscript per your suggestion). The method we used here is to first generate an underlying population that has null (0), moderate (0.35), or large (0.65) r_z correlations, then test whether the sampling distribution drawn from these populations followed a normal distribution across varying sample sizes. Nevertheless, the reviewer correctly noticed discrepancies between the reported true r_z and its sampling distribution peak. This discrepancy arises because, when generating large population data, achieving exact values close to a strong correlation like r_z = 0.65 is unlikely. We loop through simulations to generate population data and ensure their r_z values fall within a threshold. For moderate effect sizes (e.g., r_z = 0.35), this is straightforward using a narrow range (0.34 < r_z < 0.35). However, for larger effect sizes like r_z = 0.65, a wider range (0.6 < r_z < 0.7) is required. Therefore, the population we used to draw the sample sometimes has an r_z that deviates slightly from 0.65. This remains reasonable since the main point of this analysis is to ensure that a large r_z still has a normal sampling distribution, rather than to achieve exactly r_z = 0.65.

      We acknowledge that this variability of the range used was not clearly explained in supplemental material 12 and it is not accurate to report “true r_z = 0.65”. In the revised version, we have addressed this issue by adding vertical lines to each subplot to indicate the r_z of the population we used to draw samples, making it easier to check if it aligns with the sampling peak. In addition, we have revised the title to “Sampling distributions of r_z drawn from strong correlations (r_z = 0.6-0.7)”. We confirmed that population r_z and the peak of their sampling distribution remain consistent under both H0 and H1 in all sample sizes with n > 25, and we hope this explanation can fully resolve your concern.

      We agree with the reviewer that claiming r_z = -1 represents the null hypothesis is not accurate. The circlin r_z = 0 is better analogous to Pearson’s r = 0 since both represent the mean drawn from the population under the null hypothesis. In contrast, the mean effect size under null will be positive in the raw circlin r, which is one of the important reasons for the transformation. To provide a more accurate interpretation, we updated Table 6 to describe the following strength levels of evidence: no effect (r < 0), null (r = 0), small (r = 0.1), moderate (r = 0.3), and large (r =0.5). We thank the reviewer again for their valuable feedback.

      Reviewer #2 (Recommendations for the authors):

      (1) There is an extra space in the Notes of Figure 1. "SW R sharp-wave ripple.".

      We thank the reviewer for pointing this out. We have confirmed that the "extra space" is not an actual error but a result of how italicized Times New Roman font is rendered in the LaTeX format. We believe that the journal’s formatting process will resolve this issue.

      (2) In the introduction, slow oscillations (SO) are defined with a frequency of 0.16-4 Hz, sleep spindles (SP) at 8-16 Hz, and sharp-wave ripples (SWR) at 80-300 Hz. The term "fast oscillation" (FO) is first introduced with the clarification "SPs in our case." However, on page 2, the authors state, "SO-FO coupling involving SWRs, SPs, and SOs..." There seems to be a discrepancy in the definition of FO; does it consistently refer to SPs and SWRs throughout the article?

      We appreciate the reviewer’s observation regarding the potential ambiguity of the term "FO." In our manuscript, "FO" is used as a general term to describe the interaction of a "relatively faster oscillation" with a "relatively slower oscillation" in the phase-amplitude coupling mechanism, therefore it is not intended to exclusively refer to SPs or SWRs. For example, it is usually used to describe SO–SP–SWR couplings during sleep memory studies, but Theta–Alpha–Gamma couplings in wakeful memory studies. To address this confusion, we removed the phrase "SPs in our case" and explicitly use "SPs" when referring to spindles. In addition, we have replaced "fast oscillation" with "faster oscillation" to emphasize that it is used in a relative sense (p. 1), rather than to refer to a specific oscillation. Also, we only retained the term “FO” when introducing the PAC mechanism.

      (3) On page 2, the first paragraph contains the phrase: "...which occur in the precise hierarchical temporal structure of SO-FO coupling involving SWRs, SPs, and SOs ..." Since "SO-FO" refers to slow and fast oscillations, it is better to maintain the order of frequencies, suggesting it as: SOs, SPs, and SWRs.

      We sincerely thank the reviewer for their valuable suggestion. We have updated the sentence to maintain the correct order from the lowest to the highest frequencies in the revised version (p. 2).

      (4) References should be provided:

      a. "Studies using calcium imaging after SP stimulation explained the significance of the precise coupling phase for synaptic plasticity."

      b. "Electrophysiology evidence indicates that the association between memory consolidation and SO-SP coupling is influenced by a variety of behavioral and physiological factors under different conditions."

      c. "Since some studies found that fast SPs predominate in the centroparietal region, while slow SPs are more common in the frontal region, a significant amount of studies only extracted specific types of SPs from limited electrodes. Some studies even averaged all electrodes to estimate coupling..."

      This is a great point. These have been referenced as follows:

      a. Rephrased: “Studies using calcium imaging and SP stimulation explained the significance of the precise coupling phase for synaptic plasticity.” We changed “after” to “and” to reflect that these were conducted as two separate experiments. This is a summary statement, with relevant citations provided in the following two sentences of the paragraph, including Niethard et al., 2018, and Rosanova et al., 2005. (p. 2)

      b. Included diverse sources of evidence: “Electrophysiology evidence from studies included in our meta-analysis (e.g. Denis et al., 2021; Hahn et al., 2020; Mylonas et al., 2020) and others (e.g. Bartsch et al., 2019; Muehlroth et al., 2019; Rodheim et al., 2023) reported that the association between memory consolidation and SO-SP coupling is influenced by a variety of behavioral and physiological factors under different conditions.” (p. 3)

      c. Added references and more details: “Since some studies found that fast SPs predominate in the centroparietal region, while slow SPs are more common in the frontal region, a significant amount of studies selectively extracted specific types of SPs from limited electrodes (e.g. Dehnavi et al., 2021; Perrault et al., 2019; Schreiner et al., 2021). Some studies even averaged all electrodes in their spectral and/or time-series analysis to estimate metrics of oscillations and their couplings (e.g. Denis et al., 2022; Mölle et al., 2011; Nicolas et al., 2022).” (p. 4)

      Reviewer #3 (Recommendations for the authors):

      There are a number of terms that are not clearly defined or used:

      (1) SP amplitude. Does this mean only the amplitude of coupled spindles or of spindles in general?

      This refers to the amplitude of spindles in general. We clarified this in the revised text (and see response to reviewer #1, point #1).

      (2) The definition of a small effect

      We thank the reviewer again for raising this important question. As we responded in the public review, small effect sizes are common in neuroscience and meta-analyses due to the complexity of the underlying mechanisms and the presence of numerous confounding variables and hierarchical levels. To help readers better interpret effect sizes, we changed rigid ranges to widely accepted benchmarks for effect size levels in neuroscience research: small (r=0.1), moderate (r=0.3), and large (r=0.5; Cohen, 1988). We also noted that an evidence and context-based framework will provide a more practical way to interpret the observed effect sizes compared to rigid categorizations.

      (3) Can a BF10 based on experimental evidence actually be "infinite" and a probability actually be 1.00?

      We appreciate the reviewer for highlighting this potential confusion. The formula used to calculate BF10 is P(data | H1) / P(data | H0). In the experimental setting with an informative prior, an ‘infinite’ BF10 value indicates that all posterior samples are overwhelmingly compatible with H1 given the data and assumptions (Cox et al., 2023; Heck et al., 2023; Ly et al., 2016). In such cases, the denominator P(data | H0) becomes vanishingly small, leading BF10 to converge to infinity. This scenario occurs when the probability of H1 converges to 1 (e.g., 0.9999999999…).

      It is a well-established convention in Bayesian statistics to report the Bayes factor as "infinity" in cases where the evidence is overwhelmingly strong, and BF10 exceeds the numerical limits of the computation tools to become effectively infinite. To address this ambiguity, we added a footnote in the revised version of the manuscript to clarify the interpretation of an 'infinite' BF10 (p. 8).
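
      One way an "infinite" value can arise in practice is sketched below. The response does not state how BF10 was computed from the posterior samples, so this is an assumption for illustration only: a directional hypothesis (effect > 0) is evaluated as the ratio of posterior samples supporting it to those contradicting it, which equals a Bayes factor when the prior odds are 1 (for example, under a prior symmetric around zero). The function name and sample values are hypothetical.

```python
import math


def directional_evidence_ratio(posterior_samples, threshold=0.0):
    """Ratio of posterior samples above `threshold` to samples at or below it.

    Returns math.inf when no sample contradicts the hypothesis, i.e. when the
    estimated posterior probability of the hypothesis is exactly 1.
    """
    above = sum(1 for s in posterior_samples if s > threshold)
    below = len(posterior_samples) - above
    if below == 0:
        return math.inf
    return above / below


# Hypothetical posterior draws of an effect size.
print(directional_evidence_ratio([0.10, -0.10, 0.20, 0.30]))  # three for, one against
print(directional_evidence_ratio([0.10, 0.20, 0.30]))         # none against
```

      Under this reading, "infinite" means that none of the finite number of posterior draws fell on the other side of the threshold, not that the evidence is literally unbounded.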

      (4) Z_r should be renamed to r_z or similar. These are not Z values (-inf..+inf), but r values (-1..1).

      We thank the reviewers for their suggestions. We agree that r_z would provide a clearer and more accurate interpretation, while z is more appropriate for referring to Fisher's z-transformed r (see point (5)). We have updated the notation accordingly.

      (5) Also, it remains quite unclear at which points in the analyses, "r" values or "Fisher's z transformed r" values are used. Assumptions of normality should only apply to the transformed values. However, the formulas for the random effects model seem to assume normality for r values.

      The correlation values were z-transformed during preprocessing to ensure normality and the correct estimation of sampling variances before running the models. The outputs were then back-transformed to raw r values only when reporting the results to help readers interpret the effect size. We mentioned this in Section 5.5.1, therefore the normality assumptions are not a concern. We have updated the notation r to z (-inf..+inf) in the formula of the random and mixed effect models in the revised version of the manuscript (p. 22).
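
      The transform, model, back-transform sequence described above can be sketched as follows for ordinary Pearson correlations. The transformation applied to circular-linear correlations in the manuscript is not reproduced here, and the fixed-effect pooling is only a stand-in for the Bayesian random and mixed effect models; the input correlations and sample sizes are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher's z-transform: maps r in (-1, 1) onto the whole real line."""
    return math.atanh(r)


def inverse_fisher_z(z):
    """Back-transform a z value to the correlation scale."""
    return math.tanh(z)


def z_sampling_variance(n):
    """Approximate sampling variance of Fisher's z for sample size n (n > 3)."""
    return 1.0 / (n - 3)


def pooled_r(correlations, sample_sizes):
    """Inverse-variance weighted mean on the z scale, reported back as r."""
    weights = [1.0 / z_sampling_variance(n) for n in sample_sizes]
    z_values = [fisher_z(r) for r in correlations]
    z_mean = sum(w * z for w, z in zip(weights, z_values)) / sum(weights)
    return inverse_fisher_z(z_mean)


# Hypothetical study-level correlations and sample sizes.
print(round(pooled_r([0.20, 0.40], [30, 50]), 3))
```

      All averaging happens on the z scale, where the normality assumption is reasonable, and only the final summary is converted back to r for interpretation.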

      Language

      (1) Frequency. In the introduction, the authors use "frequency" when they mean something like the incidence of spindles.

      We agree that the term "frequency" has been used inconsistently to describe both the incidence of events and the frequency bands of oscillations. We have replaced "frequency" with "prevalence" to refer to the incidence of coupling events where applicable (p. 3).

      (2) Moderate and mediate. These two terms are usually meant to indicate two different types of causal influences.

      We thank the reviewer for this suggestion. We agree that "moderate" is more appropriate to describe moderators in this study since it does not directly imply causality. We have replaced "mediate" with "moderate" in relevant contexts.

      (3) "the moderate effect of memory task is relatively weak": "moderator effect" or "moderate effect"?

      We appreciate the reviewer for pointing out this mistake. We have updated the term to "moderator effect" in Section 2.2.2 (p. 6).

      (4) "in frontal regions we found a latest coupled but most precise and strong SO-fast SP coupling" Meaning?

      We thank the reviewer for bringing this concern of clarity to our attention. By 'latest,' we refer to the delayed phase of SO-fast SP coupling observed in the frontal regions compared to the central and parietal regions (see Figure 5). "Precise and strong" describes the high precision and strength of phase-locking between the SO up-state and the fast SP peak in these regions. We have rephrased this sentence to improve clarity: “We found that SO-fast SP coupling in the frontal region occurred at the latest phase observed across all regions, characterized by the highest precision and strength of phase-locking.” (p. 9).

      (5) Figure 5 and others contain angles in degrees and radians.

      We appreciate the reviewer pointing out this inconsistency. We have updated the manuscript and supplementary material to consistently use radians throughout.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      We thank the reviewer for their careful evaluation and positive comments. 

      Adaptation paradigm

      “why is it necessary to use an *adaptation* paradigm to study the link between SF tuning and pRF estimation? Couldn't you just use pRF bar stimuli with varying SFs?” 

      We thank the reviewer for this question. First, by using adaptation we can infer the correspondence between the perceptual and the neuronal adaptation to spatial frequency. We couldn’t draw any inference about perception if we only varied the SF inside the bar. More importantly, while changing the SF inside the bar might help drive different neuronal populations, this is not guaranteed. As we touched on in our discussion, responses obtained from the mapping stimuli are dominated by complex processing rather than the stimulus properties alone. A considerable proportion of the retinotopic mapping signal is probably simply due to spatial attention to the bar (de Haas & Schwarzkopf, 2018; Hughes et al., 2019). So, adaptation is a more targeted way to manipulate different neuronal populations.

      Other pRF estimates: polar angle and eccentricity 

      We included an additional plot showing the polar angle for both adapter conditions (Figure S4), as well as participant-wise scatter plots comparing raw pRF size, eccentricity, and polar angle between two adapter conditions (available in shared data repository). In line with previous work on the reliability of pRF estimates (van Dijk, de Haas, Moutsiana, & Schwarzkopf, 2016; Senden, Reithler, Gijsen, & Goebel, 2014), both polar angle and eccentricity maps are very stable between the two adaptation conditions. 

      Variability in pRF size change

      As the reviewer pointed out, the pRF size changes show some variability across eccentricities, and ROIs (Figure 5A and 5B). It is likely that the variability could relate to the varying tuning properties of different regions and eccentricities for the specific SF we used in the mapping stimulus. So one reason V2 is most consistent could be that the stimulus is best matched for the tuning there. However, what factors contribute to this variability is an interesting question that will require further study. 

      Other recommendations

      We have addressed the other recommendations of the reviewer with one exception. The reviewer suggested we should comment on the perceived contrast decrease after SF adaptation (as seen in Figure 6B) in the main text. However, since we refer the readers to the supplementary analyses (Supplementary section S8) where we discuss this in detail, we chose to keep this aspect unchanged to avoid overcomplicating the main text.

      Reviewer #2 (Public Review):

      We thank the reviewer for their comments - we improved how we report key findings which we hope will clarify matters raised by the reviewer.

      RF positions in a voxel

      The reviewer’s comments suggest that they may have misunderstood the diagram (Figure 1A) illustrating the theoretical basis of the adaptation effect, likely due to us inadvertently putting the small RFs in the middle of the illustration. We changed this figure to avoid such confusion.

      Theoretical explanation of adaptation effect

      The reviewer’s explanation for how adaptation should affect the size of pRF averaging across individual RFs is incorrect. When selecting RFs from a fixed range of semi-uniformly distributed positions (as in an fMRI voxel), the average position of RFs (corresponding to pRF position) is naturally near the center of this range. The average size (corresponding to pRF size) reflects the visual field coverage of these individual RFs. This aggregate visual field coverage thus also reflects the individual sizes. When large RFs have been adapted out, this means the visual field coverage at the boundaries is sparser, and the aggregate pRF is therefore smaller. The opposite happens when adapting out the contribution of small RFs. We demonstrate this with a simple simulation at this OSF link: https://osf.io/ebnky/. The pRF sizes of the simulated voxels illustrate that the adaptation effect should manifest precisely as we hypothesized.
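
      The argument above can be illustrated with a toy one-dimensional sketch. It is not the simulation hosted at the OSF link: the RF positions, the two RF sizes, and the use of the standard deviation of the summed coverage profile as a proxy for pRF size are all assumptions made for illustration, and "adapting out" is simplified to removing a subpopulation entirely.

```python
import math


def aggregate_size(centers, sigmas, grid):
    """Standard deviation of the summed Gaussian RF profiles over the grid.

    The summed profile stands in for a voxel's visual field coverage; its
    spread is used here as a simple proxy for the estimated pRF size.
    """
    profile = [
        sum(math.exp(-0.5 * ((x - c) / s) ** 2) for c, s in zip(centers, sigmas))
        for x in grid
    ]
    total = sum(profile)
    mean = sum(x * p for x, p in zip(grid, profile)) / total
    variance = sum((x - mean) ** 2 * p for x, p in zip(grid, profile)) / total
    return math.sqrt(variance)


grid = [i * 0.1 for i in range(-200, 201)]  # visual field positions (deg)
positions = [i * 0.5 for i in range(-4, 5)]  # evenly spread RF centers
SMALL, LARGE = 0.5, 2.0                      # hypothetical RF sizes (deg)

# Unadapted voxel: one small and one large RF at every position.
baseline = aggregate_size(positions * 2,
                          [SMALL] * len(positions) + [LARGE] * len(positions),
                          grid)
# Large RFs adapted out: only the small RFs contribute.
large_adapted = aggregate_size(positions, [SMALL] * len(positions), grid)
# Small RFs adapted out: only the large RFs contribute.
small_adapted = aggregate_size(positions, [LARGE] * len(positions), grid)

print(f"large RFs adapted out: {large_adapted:.2f}")
print(f"no adaptation:         {baseline:.2f}")
print(f"small RFs adapted out: {small_adapted:.2f}")
```

      In this sketch the RF centers are identical across conditions, so the aggregate position does not move; only the spread of the coverage profile changes, shrinking when the large RFs are removed and growing when the small RFs are removed.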

      Figure S2

      It is not actually possible to compare R<sup>2</sup> between regions by looking at Figure S2 because it shows the pRF size change, not R<sup>2</sup>. Therefore, the arguments Reviewer #2 made based on their interpretation of the figure are not valid. Just as the reviewer expected, V1 is one of the brain regions with good pRF model fits. We included normalized and raw R<sup>2</sup> maps to make this more obvious to the readers.

      V1 appeared essentially empty in that plot primarily due to the sigma threshold we selected, which was unintentionally more conservative than those applied in our analyses and other figures. We apologize for this mistake. We corrected it in the revised version by including a plot with the appropriate sigma threshold.

      Thresholding details 

      Thresholding information was included in our original manuscript; however, we included more information in the figure captions to make it more obvious.

      2D plots replaced histograms

      We thank the reviewer for this suggestion. The original manuscript contained histograms showing the distribution of pRF size for both adaptation conditions for each participant and visual area (Figure S1). However, we agree that 2D plots better communicate the difference in pRF parameters between conditions. So we moved the histogram plots to the online repository, and included scatter plots with a color scheme revealing the 2D kernel density.

      We chose to implement 2D kernel density in scatter plots to display the distribution of individual pRF sizes transparently.

      (proportional) pRF size-change map 

      The reviewer requests pRF size difference maps. Figure S2 in fact demonstrates the proportional difference between the pRF sizes of the two adaptation conditions. Instead of simply taking the difference, we believe showing the proportional change map is more sensible because overall pRF size varies considerably between visual regions. We explained this more clearly in our revision. 

      pRF eccentricity plot 

      “I suspect that the difference in PRF size across voxels correlates very strongly with the difference in eccentricity across voxels.”

      Our original manuscript already contained a supplementary plot (Figure S4 B, now Figure S4 C) comparing the eccentricity between adapter conditions, showing no notable shift in eccentricities except in V3A - but that is a small region and the results are generally more variable. In addition, we included participant-wise plots in the online repository, presenting raw comparisons of pRF size, eccentricity, and polar angle estimates between adaptation conditions. These 2D plots provide further evidence that the SF adapters resulted in a change in pRF size, while eccentricity and polar angle estimates did not show consistent differences.  

      To the reviewer’s point, even if there were an appreciable shift in eccentricity between conditions (as they suggest may have happened for the example participant we showed), this does not mean that the pRF size effect is “due [...] to shifts in eccentricity.” Parameters in a complex multi-dimensional model like the pRF are not independent. There is no way of knowing whether a change in one parameter is causally linked with a change in another. We can only report the parameter estimates the model produces. 

      In fact, it is conceivable that adaptation causes both: changes in pRF size and eccentricity. If more central or peripheral RFs tend to have smaller or larger RFs, respectively, then adapting out one part of the distribution will shift the average accordingly. However, as we already established, we find no compelling evidence that pRF eccentricity changes dramatically due to adaptation, while pRF size does.

      Other recommendations

      We have addressed the other recommendations of the reviewer, except for the y-axis alignment. Different regions in the visual hierarchy naturally vary substantially in pRF size. Aligning axes would therefore lead to incorrect visual inferences that (1) the absolute pRF sizes between ROIs are comparable, and (2) higher regions show the effect most prominently. However, for clarity, we now note this scale difference in our figure captions. Finally, as mentioned earlier, we also present a proportional pRF size change map to enable comparison of the adaptation effect between regions.

      Reviewer #3 (Public Review):

      We thank the reviewer for their comments.

      pRF model

      Top-up adapters were not modelled in our analyses because they are shared events in all TRs, critically also including the “blank” periods, providing a constant source of signal. Therefore modelling them separately cannot meaningfully change the results. However, the reviewer makes a good suggestion that it would be useful to mention this in the manuscript, so we added a discussion of this point in Section 3.1.5.

      pRF size vs eccentricity

      We added a plot showing pRF size in the two adaptation conditions (in addition to the pRF size difference) as a function of eccentricity.

      Correlation with behavioral effect

      In the original manuscript, we pointed out why the correlation between the magnitude of the behavioral effect and the pRF size change is not an appropriate test for our data. First, the reviewer is right that a larger sample size would be needed to reliably detect such a between-subject correlation. More importantly, as per our recruitment criteria for the fMRI experiment, we did not scan participants showing weak perceptual effects. This limits the variability in the perceptual effect and makes correlation inapplicable.

    1. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) It remains unclear how this stimulation protocol is proposed to enhance memory. Memories are believed to be stored by precise inputs to specific neurons and highly tuned changes in synaptic strengths. It remains unclear whether proposed neural activity generated by the stimulation reflects the activation of specific memories or generally increased activity across all classes of neurons.

      Thank you for raising the important issue of the actual neurophysiological effects of non-invasive brain stimulation. Unfortunately, invasive neurophysiological recordings in humans for this type of study are not feasible due to ethical constraints, while studies on cadavers or rodents would not fully resolve our question. Indeed, the authors of the cited study (Mihály Vöröslakos et al., Nature Communications, 2018) highlight the impossibility of drawing definitive conclusions about the exact voltage required in the in-vivo human brain due to significant differences between rats and humans, as well as the in-vivo human brain and cadavers due to alterations in electrical conductivity that occur in postmortem tissue.

      We acknowledge that further exploration of this aspect would be highly valuable, and we agree that it is worth discussing both as a technical limitation and as a potential direction for future research; we will therefore modify the manuscript accordingly. However, to address the challenge of in vivo recordings, we conducted Experiments 3 and 4, which respectively examined the neurophysiological and connectivity changes induced by the stimulation in a non-invasive manner. The observed changes in brain oscillatory activity (increased gamma oscillatory activity), cortical excitability (enhanced posteromedial parietal cortex reactivity), and brain connectivity (strengthened connections between the precuneus and hippocampi) provided evidence of the effects of our non-invasive brain stimulation protocol, further supporting the behavioral data.

      Additionally, we carefully considered the issue of stimulation distribution and, in response, performed a biophysical modeling analysis and E-field calculation using the parameters employed in our study (see Supplementary Materials).

      (2) The claim that effects directly involve the precuneus lacks strong support. The measurements shown in Figure 3 appear to be weak (i.e., Figure 3A top and bottom look similar, and Figure 3C left and right look similar). The figure appears to show a more global brain pattern rather than effects that are limited to the precuneus. Related to this, it would perhaps be useful to show the different positions of the stimulation apparatus. This could perhaps show that the position of the stimulation matters and could perhaps illustrate a range of distances over which position of the stimulation matters.

      Thank you for your feedback. We will improve the clarity of the manuscript to better address this important aspect. Our assumption that the precuneus plays a key role in the observed effects is based on several factors:

      (1) The non-invasive stimulation protocol was applied to an individually identified precuneus for each participant. Given existing evidence on TMS propagation, we can reasonably assume that the precuneus was at least a mediator of the observed effects (Ridding & Rothwell, Nature Reviews Neuroscience 2007). For further details about target identification and TMS and tACS propagation, please refer to the MRI data acquisition section in the main text and Biophysical modeling and E-field calculation section in the supplementary materials.

      (2) To investigate the effects of the neuromodulation protocol on cortical responses, we conducted a whole-brain analysis using multiple paired t-tests comparing each data point between different experimental conditions. To minimize the type I error rate, data were permuted with the Monte Carlo approach and significant p-values were corrected with the false discovery rate method (see the Methods section for details). The results identified the posterior-medial parietal areas as the only regions showing significant differences across conditions.

      (3) To control for potential generalized effects, we included a control condition in which TMS-EEG recordings were performed over the left parietal cortex (adjacent to the precuneus). This condition did not yield any significant results, reinforcing the cortical specificity of the observed effects.
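      The false discovery rate correction mentioned in point (2) can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg step-up procedure, which the response does not name explicitly; the function name and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected under BH-FDR.

    Assumption: the FDR method is Benjamini-Hochberg; the response only
    says "false discovery rate method".
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= alpha * k / m.
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            largest_rank = rank
    # Reject every hypothesis ranked at or below that k.
    reject = [False] * m
    for idx in order[:largest_rank]:
        reject[idx] = True
    return reject


# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
rejected = benjamini_hochberg(example_p)
print(rejected)
```

      In a whole-brain comparison, each entry would correspond to one electrode and time point (or one frequency bin), so the procedure controls the expected proportion of false positives among the data points declared significant.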

      However, as stated in the Discussion, we do not claim that precuneus activity alone accounts for the observed effects. As shown in Experiment 4, stimulation led to connectivity changes between the precuneus and hippocampus, a network widely recognized as a key contributor to long-term memory formation (Bliss & Collingridge, Nature 1993). These connectivity changes suggest that precuneus stimulation triggered a ripple effect extending beyond the stimulation site, engaging the broader precuneus-hippocampus network.

      Regarding Figure 3A, it represents the overall expression of oscillatory activity detected by TMS-EEG. Since each frequency band has a different optimal scaling, the figure reflects a graphical compromise. A more detailed representation of the significant results is provided in Figure 3B. The effect sizes for gamma oscillatory activity in the delta T1 and T2 conditions were 0.52 and 0.50, respectively, which correspond to a medium effect based on Cohen’s d interpretation.

      (3) Behavioral results showing an effect on memory would substantiate claims that the stimulation approach produces significant changes in brain activity. However, placebo effects can be extremely powerful and useful, and this should probably be mentioned. Also, in the behavioral results that are currently presented, there are several concerns:

      a) There does not appear to be a significant effect on the STMB task.

      b) The FNAT task is minimally described in the supplementary material. Experimental details that would help the reader understand what was done are not described. Experimental details are missing for: the size of the images, the duration of the image presentation, the degree of image repetition, how long the participants studied the images, whether the names and occupations were different, genders of the faces, and whether the same participant saw different faces across the different stimulation conditions. Regarding the latter point, if the same participant saw the same faces across the different stimulation conditions, then there could be memory effects across different conditions that would need to be included in the statistical analyses. If participants saw different faces across the different stimulus conditions, then it would be useful to show that the difficulty was the same across the different stimuli.

      We thank you for pointing out the gaps in the description of the FNAT task. We will add all the required information to the manuscript.

      In the meantime, here we provide the answers to your questions. The images measured 19 × 15 cm. They were presented in the learning phase and the immediate recall for 8 seconds each, while in the delayed recall they were shown (after the face recognition phase) until the subject answered. The learning phase, where name and occupation were shown together with the faces, lasted around 2 minutes including the instructions. We used a different set of stimuli for each stimulation condition, for a total of 3 parallel task forms balanced across conditions and session order. All the parallel forms were composed of 6 male and 6 female faces; for each sex there were 2 young adults (aged around 30 years), 2 middle-aged adults (aged around 50 years), and 2 older adults (aged around 70 years). Before the experiments, we ran a pilot study to ensure there were no differences between the parallel forms of the task. We can provide the task with its parallel forms upon request. The chance level in the immediate and delayed recall is not quantifiable since the participants had to freely recall the name and the occupation without multiple-choice options. In the recognition phase, the chance level was around 33% (since there were 3 possible answers).

      c) Also, if I understand FNAT correctly, the task is based on just 12 presentations, and each point in Figure 2A represents a different participant. How the performance of individual participants changed across the conditions is unclear with the information provided. Lines joining performance measurements across conditions for each participant would be useful in this regard. Because there are only 12 faces, the results are quantized in multiples of 100/12 % in Figure 3A. While I do not doubt that the authors did their homework in terms of the statistical analyses, it seems as though these 12 measurements do not correspond to a large effect size. For example, in Figure 3A for the immediate condition (total), it seems that, on average, the participants may remember one more face/name/occupation.

      We will add another graph to the manuscript with lines connecting each participant's performance. Unfortunately, we were not able to incorporate it in the box-and-whisker plot.

      We apologize for the lack of clarity in the description of the FNAT. As you correctly pointed out, we used the percentage based on the single association between face, name and occupation (12 in total). However, each association consisted of three items, resulting in a total of 36 items to learn and associate – we will make it more explicit in the manuscript.

      In the example you mentioned, participants were, on average, able to recall three more items compared to the other conditions. While this difference may not seem striking at first glance, it is important to consider that we assessed memory performance after a single, three-minute stimulation session. Similar effects are typically observed only after multiple stimulation sessions (Koch et al., NeuroImage, 2018; Grover et al., Nature Neuroscience, 2022).

      d) Block effects. If I understand correctly, the experiments were conducted in blocks. This is potentially problematic. An example study that articulates potential problems associated with block designs is described in Li et al (TPAMI 2021, https://ieeexplore.ieee.org/document/9264220). It is unclear if potential problems associated with block designs were taken into consideration.

      Thank you for the interesting reference. According to this paper, in a block design, EEG or fMRI recordings are performed in response to different stimuli of a given class presented in succession. If this is the case, it does not correspond to our experimental design where both TMS-EEG and fMRI were conducted in a resting state on different days according to the different stimulation conditions.

      e) In the FNAT portion of the paper, some results are statistically significant, while others are not. The interpretation of this is unclear. In Figure 3A, it seems as though the authors claim that iTBS+gtACS > iTBS+sham-tACS, but iTBS+gtACS ~ sham+sham. The interpretation of such a result is unclear. Results are also unclear when separated by name and occupation. There is only one condition that is statistically significant in Figure 3A in the name condition, and no significant results in the occupation condition. In short, the statistical analyses, and accompanying results that support the authors’ claims, should be explained more clearly.

      Thank you again for your feedback. We will work on making the large amount of data we reported easier to interpret.

      Hoping to have thoroughly addressed your initial concerns in our previous responses, we now move on to your observations regarding the behavioral results, assuming you were referring to Figure 2A. The main finding of this study is the improvement in long-term memory performance, specifically the ability to correctly recall the association between face, name, and occupation (total FNAT), which was significantly enhanced in both Experiments 1 and 2. However, we also aimed to explore the individual contributions of name and occupation separately to gain a deeper understanding of the results. Our analysis revealed that the improvement in total FNAT was primarily driven by an increase in name recall rather than occupation recall. We understand that this may have caused some confusion. Therefore we will clarify this in the manuscript and consider presenting the name and occupation in a separate plot.

      Regarding the stimulation conditions, your concerns about the performance pattern (iTBS+gtACS > iTBS+sham-tACS, but iTBS+gtACS ~ sham+sham) are understandable. However, this new protocol was developed precisely in response to the variability observed in behavioral outcomes following non-invasive brain stimulation, particularly when used to modulate memory functions (Corp et al., 2020; Pabst et al., 2022). As discussed in the manuscript, it is intended as a boost to conventional non-invasive brain stimulation protocols, leveraging the mechanisms outlined in the Discussion section.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The study did not include a condition where γtACS was applied alone. This was likely because a previous work indicated that a single 3-minute γtACS did not produce significant effects, but this limits the ability to isolate the specific contribution of γtACS in the context of this target and memory function.

      Thank you for your comments. As you pointed out, we did not include a condition where γtACS was applied alone. This decision was based on the findings of Guerra et al. (Brain Stimulation 2018), who investigated the same protocol and reported no aftereffects. Given the substantial burden of the experimental design on patients and our primary goal of demonstrating an enhancement of effects compared to the standalone iTBS protocol, we decided to leave out this condition. However, we agree that investigating the effects of γtACS alone is an interesting and relevant aspect worthy of further exploration. In line with these observations, we will expand the discussion on this point in the study’s limitations section.

      (2) The authors applied stimulation for 3 minutes, which seems to be based on prior tACS protocols. It would be helpful to present some rationale for both the duration and timing relative to the learning phase of the memory task. Would you expect additional stimulation prior to recall to benefit long-term associative memory?

      Thank you for your comment and for raising this interesting point. As you correctly noted, the protocol we used has a duration of three minutes, a choice based on previous studies demonstrating its greater neurophysiological efficacy compared with either stimulation alone. Specifically, these studies have shown that the combined stimulation enhanced gamma-band oscillations and increased cortical plasticity (Guerra et al., Brain Stimulation 2018; Maiella et al., Scientific Reports 2022). Given that the precuneus (Brodt et al., Science 2018; Schott et al., Human Brain Mapping 2018), gamma oscillations (Osipova et al., Journal of Neuroscience 2006; Deprés et al., Neurobiology of Aging 2017; Griffiths et al., Trends in Neurosciences 2023), and cortical plasticity (Brodt et al., Science 2018) are all associated with encoding processes, we decided to apply the co-stimulation immediately before the encoding phase to enhance its efficacy.

      Regarding the question of whether stimulation could also benefit recall, the answer is yes. We can speculate that repeating the stimulation before recall might provide an additional boost. This is supported by evidence showing that both the precuneus and gamma oscillations are involved in recall processes (Flanagin et al., Cerebral Cortex 2023; Griffiths et al., Trends in Neurosciences 2023). Furthermore, previous research suggests that reinstating the same brain state as during encoding can enhance recall performance (Javadi et al., The Journal of Neuroscience 2017).

      We will expand the study rationale and include these considerations in the future directions section.

      (3) How was the burst frequency of theta iTBS and gamma frequency of tACS chosen? Were these also personalized to subjects' endogenous theta and gamma oscillations? If not, were increases in gamma oscillations specific to patients' endogenous gamma oscillation frequencies or the tACS frequency?

      The stimulation protocol was chosen based on previous studies (Guerra et al., Brain Stimulation 2018; Maiella et al., Scientific Reports 2022). The gamma tACS sinusoidal waveform was set at 70 Hz, while iTBS consisted of ten bursts of three pulses at 50 Hz lasting 2 s, repeated every 10 s with an 8 s pause between consecutive trains, for a total of 600 pulses over 190 s (see iTBS+γtACS neuromodulation protocol section). In particular, the theta iTBS was inspired by protocols used in animal models to elicit LTP in the hippocampus (Huang et al., Neuron 2005). Consequently, neither the theta iTBS nor the gamma frequency of tACS was personalized. The increase in gamma oscillations was measured relative to the patient’s baseline and did not correspond to the administered tACS frequency.
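      The iTBS timing quoted above can be checked with a few lines of arithmetic. This is only a sketch built from the numbers in the response; the variable names are illustrative, and the assumption that the total duration runs from the first pulse of the first train to the last pulse of the last train is ours.

```python
# Parameters as stated in the response.
PULSES_PER_BURST = 3         # three pulses at 50 Hz per burst
BURSTS_PER_TRAIN = 10        # ten bursts per 2 s train
TRAIN_DURATION_S = 2         # each train lasts 2 s
TRAIN_ONSET_INTERVAL_S = 10  # a new train starts every 10 s (2 s on, 8 s pause)
TOTAL_PULSES = 600

# Pulses delivered in one train, and the number of trains needed for 600 pulses.
pulses_per_train = PULSES_PER_BURST * BURSTS_PER_TRAIN
n_trains = TOTAL_PULSES // pulses_per_train

# Assumption: duration is counted from the start of the first train to the
# end of the last one, so the final 8 s pause is not included.
total_duration_s = (n_trains - 1) * TRAIN_ONSET_INTERVAL_S + TRAIN_DURATION_S

print(pulses_per_train, n_trains, total_duration_s)
```

      Under these assumptions the protocol comprises 20 trains of 30 pulses, and the computed duration is close to the approximately 190 s reported, that is, roughly three minutes.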

      (4) The authors do a thorough job of analyzing the increase in gamma oscillations in the precuneus through TMS-EEG; however, the authors may also analyze whether theta oscillations were also enhanced through this protocol due to the iTBS potentially targeting theta oscillations. This may also be more robust than gamma oscillations increases since gamma oscillations detected on the scalp are very low amplitude and susceptible to noise and may reflect activity from multiple overlapping sources, making precise localization difficult without advanced techniques.

      Thank you for the suggestion. We analyzed theta oscillations and found no changes.

      (5) Figure 4: Why are connectivity values pre-stimulation for the iTBS and sham tACS stimulation condition so much higher than the dual stimulation? We would expect baseline values to be more similar.

      We acknowledge that the pre-stimulation connectivity values for the iTBS and sham tACS conditions appear higher than those for the dual stimulation condition. However, as noted in our statistical analyses, there were no significant differences at baseline between conditions (p-FDR= 0.3514), suggesting that any apparent discrepancy is due to natural variability rather than systematic bias. One potential explanation for these differences is individual variability in baseline connectivity measures, which can fluctuate due to factors such as intrinsic neural dynamics, participant state, or measurement noise. Despite these variations, our statistical approach ensures that any observed post-stimulation effects are not confounded by pre-existing differences.

      (6) Figure 2: How are total association scores significantly different between stimulation conditions, but individual name and occupation associations are not? Further clarification of how the total FNAT score is calculated would be helpful.

      We apologize for any lack of clarity. The total FNAT score reflects the ability to correctly recall all the information associated with a person—specifically, the correct pairing of the face, name, and occupation. Participants received one point for each triplet they accurately recalled. The scores were then converted into percentages, as detailed in the Face-Name Associative Task Construction and Scoring section in the supplementary materials.

      Total FNAT was the primary outcome measure. However, we also analyzed name and occupation recall separately to better understand their individual contributions. Our analysis revealed that the improvement in total FNAT was primarily driven by an increase in name recall rather than occupation recall.

      We acknowledge that this distinction may have caused some confusion. To improve clarity, we will revise the manuscript accordingly and consider presenting name and occupation recall in separate plots.

      Reviewer #3 (Public review):

      Weaknesses:

      I want to state clearly that I think the strengths of this study far outweigh the concerns I have. I still list some points that I think should be clarified by the authors or taken into account by readers when interpreting the presented findings.

      I think one of the major weaknesses of this study is the overall low sample size in all of the experiments (between n = 10 and n = 20). This is, as I mentioned when discussing the strengths of the study, partly mitigated by the within-subject design and individualized stimulation parameters. The authors mention that they performed a power analysis but this analysis seemed to be based on electrophysiological readouts similar to those obtained in experiment 3. It is thus unclear whether the other experiments were sufficiently powered to reliably detect the behavioral effects of interest. That being said, the authors do report significant effects, so they were per definition powered to find those. However, the effect sizes reported for their main findings are all relatively large and it is known that significant findings from small samples may represent inflated effect sizes, which may hamper the generalizability of the current results. Ideally, the authors would replicate their main findings in a larger sample. Alternatively, I think running a sensitivity analysis to estimate the smallest effect the authors could have detected with a power of 80% could be very informative for readers to contextualize the findings. At the very least, however, I think it would be necessary to address this point as a potential limitation in the discussion of the paper.

      Thank you for the observation. As you mentioned, our power analysis was based on our previous study investigating the same neuromodulation protocol with a corresponding experimental design. The relatively small sample could be considered a possible limitation of the study, which we will add to the discussion. A fundamental future step will be to replicate these results in a larger population. However, to strengthen our results, we performed the sensitivity analysis you suggested.

      In detail, we performed a sensitivity analysis for repeated-measures ANOVA with α = 0.05 and power (1-β) = 0.80 with no sphericity correction. For experiment 1, a sensitivity analysis with 1 group and 3 measurements showed a minimal detectable effect size of f = 0.524 with 20 participants. In our paper, the ANOVA on total FNAT immediate performance revealed an effect size of η² = 0.274, corresponding to f = 0.614; the ANOVA on FNAT delayed performance revealed an effect size of η² = 0.236, corresponding to f = 0.556. For experiment 2, a sensitivity analysis for total FNAT immediate performance (1 group and 3 measurements) showed a minimal detectable effect size of f = 0.797 with 10 participants. In our paper, the ANOVA on total FNAT immediate performance revealed an effect size of η² = 0.448, corresponding to f = 0.901. The sensitivity analysis for total FNAT delayed performance (1 group and 6 measurements) showed a minimal detectable effect size of f = 0.378 with 10 participants. In our paper, the ANOVA on total FNAT delayed performance revealed an effect size of η² = 0.484, corresponding to f = 0.968. Thus, the sensitivity analysis showed that both experiments were sufficiently powered to detect the minimum effect size computed in the power analysis. We have now added this information to the manuscript and we thank the reviewer for her/his suggestion.
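      The correspondence between the reported η² values and Cohen's f can be reproduced with the standard conversion f = sqrt(η² / (1 − η²)). The sketch below assumes this is the conversion that was used; the function name and dictionary labels are illustrative.

```python
import math


def eta_squared_to_cohens_f(eta_sq: float) -> float:
    """Convert an (assumed partial) eta squared to Cohen's f."""
    if not 0.0 <= eta_sq < 1.0:
        raise ValueError("eta squared must lie in [0, 1)")
    return math.sqrt(eta_sq / (1.0 - eta_sq))


# Observed eta squared values quoted in the response, with the minimal
# detectable f from the corresponding sensitivity analysis.
reported = {
    "Exp. 1, FNAT immediate": (0.274, 0.524),
    "Exp. 1, FNAT delayed": (0.236, 0.524),
    "Exp. 2, FNAT immediate": (0.448, 0.797),
    "Exp. 2, FNAT delayed": (0.484, 0.378),
}

for label, (eta_sq, minimal_f) in reported.items():
    observed_f = eta_squared_to_cohens_f(eta_sq)
    # An observed f above the minimal detectable f means the effect lies
    # within the range the design could detect at 80% power.
    print(f"{label}: f = {observed_f:.3f}, detectable = {observed_f >= minimal_f}")
```

      Each observed f exceeds the minimal detectable f of its sensitivity analysis, which is the comparison the paragraph above relies on.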

      It seems that the statistical analysis approach differed slightly between studies. In experiment 1, the authors followed up significant effects of their ANOVAs by Bonferroni-adjusted post-hoc tests whereas it seems that in experiment 2, those post-hoc tests were "exploratory", which may suggest those were uncorrected. In experiment 3, the authors use one-tailed t-tests to follow up their ANOVAs. Given some of the reported p-values, these choices suggest that some of the comparisons might have failed to reach significance if properly corrected. This is not a critical issue per se, as the important test in all these cases is the initial ANOVA but non-significant (corrected) post-hoc tests might be another indicator of an underpowered experiment. My assumptions here might be wrong, but even then, I would ask the authors to be more transparent about the reasons for their choices or provide additional justification. Finally, the authors sometimes report exact p-values whereas other times they simply say p < .05. I would ask them to be consistent and recommend using exact p-values for every result where p >= .001.

      Thank you again for the suggestions. Your observations are correct: we used a slightly different statistical approach depending on our hypothesis. Here are the details:

      In experiment 1, we used a repeated-measures ANOVA with one factor, “stimulation condition” (iTBS+γtACS; iTBS+sham-tACS; sham-iTBS+sham-tACS). Following the significant effect of this factor, we performed post-hoc analysis with Bonferroni correction.

      In experiment 2, we used a repeated-measures ANOVA with two factors, “stimulation condition” and “time”. As expected, we observed a significant effect of condition, confirming the result of experiment 1, but not of time. This means that the neuromodulatory effect was present regardless of the time point. However, to explore whether the effects of stimulation condition were present at each time point, we performed exploratory t-tests with no correction for multiple comparisons, since this was only an exploratory analysis.

      In experiment 3, we used the same approach as in experiment 1. However, since we had a specific hypothesis on the direction of the effect already observed in our previous study, i.e. an increase in spectral power (Maiella et al., Scientific Reports 2022), our tests were one-tailed.

      For the p-values, we will correct the manuscript to report the exact values for every result.

      While the authors went to great lengths trying to probe the neural changes likely associated with the memory improvement after stimulation, it is impossible from their data to causally relate the findings from experiments 3 and 4 to the behavioral effects in experiments 1 and 2. This is acknowledged by the authors and there are good methodological reasons for why TMS-EEG and fMRI had to be collected in separate experiments, but it is still worth pointing out to readers that this limits inferences about how exactly dual iTBS and γtACS of the precuneus modulate learning and memory.

      Thank you for your comment. We fully agree with your observation, which is why this aspect has been considered in the study's limitations. To address your concern, we will further emphasize the fact that our findings do not allow precise inferences regarding the specific mechanisms by which dual iTBS and γtACS of the precuneus modulate learning and memory.

      There were no stimulation-related performance differences in the short-term memory task used in experiments 1 and 2. The authors argue that this demonstrates that the intervention specifically targeted long-term associative memory formation. While this is certainly possible, the STM task was a spatial memory task, whereas the LTM task relied (primarily) on verbal material. It is thus also possible that the stimulation effects were specific to a stimulus domain instead of memory type. In other words, could it be possible that the stimulation might have affected STM performance if the task taxed verbal STM instead? This is of course impossible to know without an additional experiment, but the authors could mention this possibility when discussing their findings regarding the lack of change in the STM task.

      Thank you for your insightful observation. We argue that the intervention primarily targeted long-term associative memory formation, as our findings demonstrated effects only on FNAT. However, as you correctly pointed out, we cannot exclude the possibility that the stimulation may also influence short-term verbal associative memory. We will acknowledge this potential effect when discussing the absence of significant findings in the STM task.

      While the authors discuss the potential neural mechanisms by which the combined stimulation conditions might have helped memory formation, the psychological processes are somewhat neglected. For example, do the authors think the stimulation primarily improves the encoding of new information or does it also improve consolidation processes? Interestingly, the beneficial effect of dual iTBS and γtACS on recall performance was very stable across all time points tested in experiments 1 and 2, as was the performance in the other conditions. Do the authors have any explanation as to why there seems to be no further forgetting of information over time in either condition when even at immediate recall, accuracy is below 50%? Further, participants started learning the associations of the FNAT immediately after the stimulation protocol was administered. What would happen if learning started with a delay? In other words, do the authors think there is an ideal time window post-stimulation in which memory formation is enhanced? If so, this might limit the usability of this procedure in real-life applications.

      Thank you for your comment and for raising these important points.

      We hypothesized that co-stimulation would enhance encoding processes. Previous studies have shown that co-stimulation can enhance gamma-band oscillations and increase cortical plasticity (Guerra et al., Brain Stimulation 2018; Maiella et al., Scientific Reports 2022). Given that the precuneus (Brodt et al., Science 2018; Schott et al., Human Brain Mapping 2018), gamma oscillations (Osipova et al., Journal of Neuroscience 2006; Deprés et al., Neurobiology of Aging 2017; Griffiths et al., Trends in Neurosciences 2023), and cortical plasticity (Brodt et al., Science 2018) have all been associated with encoding processes, we decided to apply co-stimulation before the encoding phase, to boost it.

      We applied the co-stimulation immediately before the learning phase to maximize its potential effects. While we observed a significant increase in gamma oscillatory activity lasting up to 20 minutes, we cannot determine whether the behavioral effects we observed would have been the same with a co-stimulation applied 20 minutes before learning. Based on existing literature, a reduction in the efficacy of co-stimulation over time could be expected (Huang et al., Neuron 2005; Thut et al., Brain Topography 2009). However, we hypothesize that multiple stimulation sessions might provide an additional boost, helping to sustain the effects over time (Thut et al., Brain Topography 2009; Koch et al., Neuroimage 2018; Koch et al., Brain 2022).

      Regarding the absence of further forgetting in both stimulation conditions, we think that the clinical and demographic characteristics of the sample (i.e., young and healthy subjects) explain the near absence of forgetting after one week.

    1. Author response:

      We appreciate the reviewers’ insightful feedback and propose to undertake an extensive revision of the manuscript to strengthen our findings and underscore the significance of this work. We remain convinced that our study offers critical insights into the largely independent dopamine and serotonin neural circuits. Nevertheless, we concur that substantial revisions are warranted, as the current organization may not be ideal to showcase the central findings. In particular, we will increase the number of animals to address data variability and enhance the reproducibility of the observed effects. We also recognize the need to perform additional control experiments and to include complementary anatomical tracing studies. Moreover, we will reformat the manuscript and conduct additional analyses to emphasize that evoked dopamine and serotonin release originate from distinct loci with minimal crosstalk. To address all of these points thoroughly, we estimate that a 12-month revision period will be required.

    1. Author response:

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      The study is well-executed and provides many interesting leads for further experimental studies, which makes it very important. One of the significant hypotheses in this context is metazoan Wnt Lipocone domain interactions with lipids, which remain to be explored.

      The manuscript is generally navigable for interesting reading despite being content-rich. Overall, the figures are easy to follow.

      We thank the reviewer for the thoughtful and favorable assessment.

      Major comments:

      I urge the authors to consider creating a first figure summarizing the broad approach and process involved in discovering the lipocone superfamily. This would help the average reader easily follow the manuscript.

      It will be helpful to have the final model/synthesis figure, which provides a take-home message that combines the main deductions from Fig 1c, Fig 4, Fig 5, and Fig 6 to provide an eagle's eye view (also translating the arguments on Page 38 last para into this potential figure).

      We have generated a two-part figure that synthesizes these two requests, also in line with the recommendations made by Reviewer 3. Depending on the accepting Review Commons journal, we plan to either submit this as a graphical abstract/TOC figure (as suggested by Reviewer 3) or as a single figure. We prefer starting with the first approach as it will keep our figure count the same.

      Minor comments:

      Fig 1C: The authors should provide a statistical estimate of the difference in transmembrane tendency scores between the "membrane" and "globular" versions of the Lipocone domains.

      To address this, we calculated group-wise differences using the Kruskal-Wallis nonparametric test, followed by Dunn’s test with Bonferroni correction for a more stringent evaluation. The results are presented as a critical difference diagram in the new Supplementary Figure S3. The analysis is explained in the Methods section of the revised manuscript, and the statistically significant difference is mentioned in the text. This analysis identifies three groups of significantly different Lipocone families based on their transmembrane tendency: those predicted (or known) to associate with the prokaryotic membranes, those predicted to be diffusible, and a small number of families residing in eukaryotic ER membranes or bacterial outer membranes.
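
      As a rough illustration of the procedure named above, the following minimal sketch computes the Kruskal-Wallis H statistic and Bonferroni-adjusted Dunn pairwise p-values using only the Python standard library. The function names, the group labels, and the scores in the usage note are hypothetical, and the sketch assumes there are no tied scores (no tie correction is applied); it is not the analysis code used for Supplementary Figure S3.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_dunn(groups):
    """groups maps a (hypothetical) family-group name to a list of scores.

    Returns (H, adjusted), where adjusted maps each pair of group names to
    its Bonferroni-adjusted two-sided Dunn p-value. Assumes no tied scores.
    """
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    ranks = average_ranks(pooled)
    n = len(pooled)

    size, mean_rank = {}, {}
    weighted_sq = 0.0
    start = 0
    for name in names:
        m = len(groups[name])
        r = ranks[start:start + m]
        size[name] = m
        mean_rank[name] = sum(r) / m
        weighted_sq += sum(r) ** 2 / m
        start += m

    # Kruskal-Wallis H statistic (without tie correction).
    h = 12 / (n * (n + 1)) * weighted_sq - 3 * (n + 1)

    pairs = list(combinations(names, 2))
    adjusted = {}
    for a, b in pairs:
        se = math.sqrt(n * (n + 1) / 12 * (1 / size[a] + 1 / size[b]))
        z = abs(mean_rank[a] - mean_rank[b]) / se
        p = math.erfc(z / math.sqrt(2))  # two-sided standard-normal p-value
        adjusted[(a, b)] = min(1.0, p * len(pairs))  # Bonferroni correction
    return h, adjusted
```

      Calling `kruskal_dunn({"membrane": [...], "soluble": [...], "other": [...]})` with made-up transmembrane tendency scores returns H and one adjusted p-value per pair of groups; H would then be compared against a chi-square distribution with (number of groups minus one) degrees of freedom.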

      Reviewer #2 (Evidence, reproducibility and clarity):

      This is a remarkable study, one of a kind. The authors trace the entire huge superfamily containing Wnt proteins, whose origins remained obscure before this work. Even more amazingly, they show that Wnts originated from transmembrane enzymes. The work is masterfully executed and presented. The conclusions are strongly supported by multiple lines of evidence. Illustrations are beautifully crafted. This is an exemplary work of how modern sequence and structure analysis methods should be used to gain unprecedented insights into protein evolution and origins.

      We thank the reviewer for the positive evaluation of our work.

      Minor comments.

      (1) In fig 1, VanZ structure looks rather different from the rest and is a more tightly packed helical bundle. It might be useful for the readers to learn more about the arguments why authors consider this family to be homologous with the rest, and what caused these structural changes in packing of the helices.

      First, the geometry of an α-helix can be approximated as a cylinder, resulting in contact points that are relatively small. Fewer contact constraints can lead to structural variation in the angular orientations between the helices of an all α-helical domain, resulting in some dispersion in space of the helical axes. As a result, some of the views can be a bit confounding when presented as static 2D images. Second, of the two VanZ clades, the characteristic structure similar to the other superfamily members is more easily seen in the VanZ-2 clade (as illustrated in supplementary Figure S2).

      Importantly, the membership of the VanZ domains was recovered via significant hits in our sequence analysis of the superfamily. Moreover, when the sequence alignments of the active site are compared (Figure 2), VanZ retains the conserved active site residue positions, which are predicted to reside spatially in the same location and project into an equivalent active site pocket as seen in the other families in the superfamily. Further, this sequence relationship is captured by the edges in the network in Figure 1B: multiple members of the superfamily show edges indicating significant relationships with the two VanZ families (e.g., HHSearch hits of probability greater than 90%; p<0.0001 are observed between VanZ-1 and Skillet-DUF2809, Skillet-1, Skillet-4, YfiM-1, YfiM-DUF2279, Wok, pPTDSS, and cpCone-1). Thus, they occupy relatively central locations in the sequence similarity network, indicating a consistent sequence similarity connection to multiple other families.

      (2) Fig. 4 color bars before names show a functional role. How does the blue bar "described for the first time" fit into this logic? Maybe some other way to mark this (an asterisk?) could be better to resolve this semantic inconsistency.

      We have replaced the blue bars with asterisks that follow the family names; this is now stated in the updated legend.

      Reviewer #3 (Evidence, reproducibility and clarity):

      The manuscript by Burroughs et al. uses informatic sequence analysis and structural modeling to define a very large, new superfamily which they dub the Lipocone superfamily, based on its function on lipid components and cone-shaped structure. The family includes known enzymatic domains as well as previously uncharacterized proteins (30 families in total). Support for the superfamily designation includes conserved residues located on the homologous helical structures within the fold. The findings include analyses that shed light on important evolutionary relationships including a model in which the superfamily originated as membrane proteins where one branch evolved into a soluble version. Their mechanistic proposals suggest possible functions for enzymes currently unassigned. There is also support for the evolutionary connection of this family with the human immune system. The work will be of interest to those in the broad areas of bioinformatics, enzyme mechanisms, and evolution. The work is technically well performed and presented.

      We appreciate the positive evaluation of our work by the reviewer.

      Referees cross-commenting

      All the comments seem useful to me. I like Reviewer 1's suggestion for a flowchart showing the methodology. I think the summarizing figure suggested could be a TOC abstract, which many journals request.

      To accommodate this comment (along with Reviewer 1’s comments), we have generated a two-part figure containing the methodology flowchart and the summary of findings. Combining the two provides some before-and-after symmetry to a TOC figure, while also avoiding further inflation of the figure count, which would likely be an issue at one or more of the Review Commons journals.

      The authors may wish to consider the following points (page numbers from PDF for review):

      (1) It would be useful in Fig 1A, either in the main text or the supporting information, to also have an accompanying topology diagram. I like the coloring of the helices to show the homology, but the connections between them are hard to follow.

      We acknowledge the reviewer’s concern as one shared by ourselves. We have placed such a topology diagram in Figure 1A, and now refer to it at multiple points in the manuscript text.

      (2) Page: 6- In the paragraph marked as an example- please call out Fig1A when the family mentioned is described (I believe SAA is described as one example)

      We have added these pointers in the text, where appropriate.

      (3) Page: 7- The authors state "these 'hydrophobic families' often evince a deeper phyletic distribution pattern than the less-hydrophobic families (Figure S1), implying that the ancestral version of the superfamily was likely a TM domain" there should be more explanation or information here - I am not certain from looking at FigS1 what a deeper phyletic distribution pattern means. Perhaps explaining for a single example? I also see that this important point is discussed in the conclusions- it is useful to point to the conclusion here.

      Our use of ‘deeper’ in this context is meant to convey the concept that more widely conserved families/clades (both across and within lineages) suggest an earlier emergence. In the Lipocone superfamily, this phylogenetic reasoning supports an evolutionary scenario where the membrane-inserted versions generally emerged early, while the solubilized versions, which are found in relatively fewer lineages, emerged later.

      To address this objectively, we have calculated a simple phyletic distribution metric that combines the phyletic spread of a Lipocone clade with its depth within individual lineages, which is then plotted as a bar graph (Supplemental Figure S1). Briefly, this takes the width of the bar as the phyletic spread across the number of distinct taxonomic lineages and its height as a weighted mean of occurrence within each lineage (depth). The latter helps dampen the effects of sampling bias. In the resulting graph, clades with a lower height and width are likely to have been derived later than those with a greater height and width. A detailed description clarifying this has been added to the Methods section of the revised manuscript. The results support two statements that are made in the text: 1) that the Wok and VanZ clades are the most widely and deeply represented clades in the superfamily, and 2) that the predicted transmembrane versions tend to be more widely and deeply distributed. We have also added a statement in the results with a pointer to Figure S1 to clarify this point raised by the referee.
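
      The width-and-height idea described above can be sketched as follows. This is only an illustration under stated assumptions: the response does not give the weighting scheme, so the default weights here (the logarithm of the number of genomes sampled per lineage), the restriction of the mean to lineages where the clade is present, and the function and argument names are all hypothetical placeholders rather than the definition in the revised Methods.

```python
import math


def phyletic_metric(occurrence, weights=None):
    """Return (width, depth) for one clade.

    occurrence maps lineage -> (genomes_with_clade, genomes_sampled).
    width: number of lineages in which the clade occurs at least once.
    depth: weighted mean, over those lineages, of the fraction of sampled
           genomes that carry the clade.
    weights: optional lineage -> weight mapping. The default, log(1 + genomes
             sampled), is an assumed placeholder that limits the influence
             of heavily sampled lineages.
    """
    present = {
        lineage: (hits, total)
        for lineage, (hits, total) in occurrence.items()
        if hits > 0
    }
    width = len(present)
    if width == 0:
        return 0, 0.0
    if weights is None:
        weights = {
            lineage: math.log1p(total) for lineage, (_, total) in present.items()
        }
    weight_sum = sum(weights[lineage] for lineage in present)
    depth = sum(
        weights[lineage] * hits / total
        for lineage, (hits, total) in present.items()
    ) / weight_sum
    return width, depth
```

      Under these assumptions, a clade found in many lineages and in most genomes of each lineage receives a large width and a large depth, which is the pattern the response interprets as early emergence.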

      (4) For figure 3 I would suggest instead of coloring by atom type- to color the leaving group red and the group being added blue so the reader can see where the moieties start and end in substrates and products

      We have retained the atom type coloring in the figure for ease of visualizing the atom types. However, to address the reviewer’s concern, we have added dashed colored circles to highlight attacking and leaving groups in the reactions. The legend has been updated accordingly.

      (5) Page: 13- The authors state "While the second copy in these versions is catalytically inactive, the H1' from the second duplicate displaces the H1 from the first copy," So this results in a "sort of domain swap" correct? It may be more clear to label both copies in Figure 3 upper right so it is easier for the reader to follow.

      We have added these labels to the updated Figure S4 (formerly S3).

      (6) The authors state "In addition to the fusion to the OMP β-barrel, the YfiM-DUF2279 family (Figure 5H) shows operonic associations with a secreted MltG-like peptidoglycan lytic transglycosylase (127,128), a lipid anchored cytochrome c heme-binding domain (129), a phosphoglucomutase/phosphomannomutase enzyme (130), a GNAT acyltransferase (131), a diaminopimelate (DAP) epimerase (132), and a lysozyme like enzyme (133). In a distinct operon, YfiM-DUF2279 is combined with a GT-A glycosyltransferase domain (79), a further OMP β-barrel, and a secreted PDZ-like domain fused to a ClpP-like serine protease (134,135) (Figure 5H)." this combination of enzymes sounds like those in the pathways for oligosaccharide synthesis which is cytoplasmic but the flippase acts to bring the product to the periplasm. Please make sure it is clear that these enzymes may act at different faces of the membrane.

      We have made that point explicit in the revised manuscript in the paragraph following the above-quoted statement.

      (7) Page: 21- the authors should remove the unpublished observations on other RDD domain or explain or cite them

      The analysis of the RDD domain is a part of a distinct study whose manuscript we are currently preparing, and explaining its many ramifications would be outside the scope of this manuscript. Moreover, placing even an account of it in this manuscript would break its flow and take the focus away from the Lipocone superfamily. Further, inclusion of the RDD story would substantially increase the size of the manuscript. However, the RDD domain is commonly fused to the Lipocone domain; hence, it would be remiss of us to remove all reference to it. Accordingly, we retain a brief account of the RDD-fused Lipocone domains in the revised manuscript that is just sufficient to make the relevant functional case.

      (8) Page: 34- The authors state "For instance, the emergence of the outer membrane in certain bacteria was potentially coupled with the origin of the YfiM and Griddle clades (Figure 4)." I don't see the origin point (emergence of the outer membrane) indicated in Figure 4; it may be helpful to indicate this in some way. Also, I am not certain what the dashed circles in Fig 4 are indicating; it's not in the legend.

      This annotation has been added to the revised Figure 4, and the point of recruitment is indicated with an “X” sign, along with a clarification in the legend regarding the dashed circles.

      (9) In terms of the hydrophobicity analysis, it would be good to mark on the plot (Fig 1C) one or two examples of lipocone members with known structure that are transmembrane proteins as a positive control

      We have added these markers (colored triangles and squares for these families) to the plot.

      Grammar, typos

      Page: 3- abstract severance is an odd word to use for hydrolysis or cleavage

      We have changed to “cleavage”.

      Page: 5- "While the structure of Wnt was described over a decade prior" should read "Although the structure of ..."

      Page 7 - "One family did not yield a consistent prediction for orientation"- please state which family

      Page: 8 "While the ancestral pattern is noticeably degraded in the metazoan Wnt (Met-Wnt) family, it is strongly preserved in the prokaryotic Min-Wnt family." Should read "Although the ancestral..."

      throughout- please replace solved with experimentally determined to be clear and avoid jargon

      Please replace "TelC severs the link" with "TelC cleaves the bond "

      We have made the above changes.

      Page: 19- the authors state "a lipobox-containing synaptojanin superfamily phosphoesterase (125) and a secreted R-P phosphatase (126) (see Figure 6, Supplementary Data)" I was uncertain if the authors meant Fig S6 or they meant see Fig 6 and something else in supplementary data. Please fix.

      In this pointer, we intended to flag the relevant gene neighborhoods in both Figures 5H and 6, as well as highlight the additional examples contained in the Supplementary Data. We have updated the point

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper concerns mechanisms of foraging behavior in C. elegans. Upon removal from food, C. elegans first executes a stereotypical local search behavior in which it explores a small area by executing many random, undirected reversals and turns called "reorientations." If the worm fails to find food, it transitions to a global search in which it explores larger areas by suppressing reorientations and executing long forward runs (Hills et al., 2004). At the population level, the reorientation rate declines gradually. Nevertheless, about 50% of individual worms appear to exhibit an abrupt transition between local and global search, which is evident as a discrete transition from high to low reorientation rate (Lopez-Cruz et al., 2019). This observation has given rise to the hypothesis that local and global search correspond to separate internal states with the possibility of sudden transitions between them (Calhoun et al., 2014). The main conclusion of the paper is that it is not necessary to posit distinct internal states to account for discrete transitions from high to low reorientation rates. On the contrary, discrete transitions can occur simply because of the stochastic nature of the reorientation behavior itself.

      Strengths:

      The strength of the paper is the demonstration that a more parsimonious model explains abrupt transitions in the reorientation rate.

      Weaknesses:

      (1) Use of the Gillespie algorithm is not well justified. A conventional model with a fixed dt and an exponentially decaying reorientation rate would be adequate and far easier to explain. It would also be sufficiently accurate - given the appropriate choice of dt - to support the main claims of the paper, which are merely qualitative. In some respects, the whole point of the paper - that discrete transitions are an epiphenomenon of stochastic behavior - can be made with the authors' version of the model having a constant reorientation rate (Figure 2f).

      We apologize, but we are not sure what the reviewer means by “fixed dt”. If the reviewer means taking discrete steps in time (dt), and modeling whether a reorientation occurs, we would argue that the Gillespie algorithm is a better way to do this because it provides floating-point precision, rather than a time resolution limited by dt, which we hopefully explain in the updated text (Lines 107-192).
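
      To make the contrast drawn above concrete, here is a minimal sketch of the two sampling schemes for the simplest case, a constant reorientation rate. The function names and parameters are illustrative assumptions, not the manuscript's code. The Gillespie-style version draws exponential waiting times and so returns continuous-valued event times, whereas the fixed-dt version can only place events on a grid and is valid only when rate * dt is much smaller than 1.

```python
import random


def gillespie_event_times(rate, t_end, rng):
    """Event times of a constant-rate process from exponential waiting times."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)  # waiting time until the next event
        if t >= t_end:
            return times
        times.append(t)


def fixed_dt_event_times(rate, t_end, dt, rng):
    """Same process on a time grid: at most one event per step of width dt."""
    times = []
    steps = int(round(t_end / dt))
    for i in range(steps):
        # Bernoulli approximation; only accurate when rate * dt << 1.
        if rng.random() < rate * dt:
            times.append((i + 1) * dt)
    return times
```

      Both functions take a `random.Random` instance so that runs are reproducible. Shrinking `dt` makes the grid version converge on the continuous one at the cost of more steps, which is the trade-off the response alludes to.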

      The reviewer is correct that discrete transitions are an epiphenomenon of stochastic behavior, as we show in Figure 2f. However, abrupt stochastic jumps that occur with a constant rate do not produce persistent changes in the observed rate because the rate is, by definition, constant. The theory that there are local and global searches is based on the observation that individual worms often abruptly change their reorientation rates. But this observation is only true for a fraction of worms. We argue that it is not observed for all, or even most, worms because the apparent transitions are the result of stochastic sampling, not a sudden change in search strategy.

      (2) In the manuscript, the Gillespie algorithm is very poorly explained, even for readers who already understand the algorithm; for those who do not it will be essentially impossible to comprehend. To take just a few examples: in Equation (1), omega is defined as reorientations instead of cumulative reorientations; it is unclear how (4) follows from (2) and (3); notation in (5), line 133, and (7) is idiosyncratic. Figure 1a does not help, partly because the notation is unexplained. For example, what do the arrows mean, what does "*" mean?

      We apologize for this; you are correct that 𝛀 is cumulative reorientations, and we have edited the text for clarity (Lines 107-192).

      We apologize for the arrow notation confusion. Arrow notation is commonly used in pseudocode to indicate variable assignment, and so we used it to indicate variable assignment updates in the algorithm.

      We added Figure 2a to help explain the Gillespie algorithm for people who are unfamiliar with it, but you are correct that some notation, such as the probabilities, was left unexplained. We have added more text to the figure legend. Hopefully this additional text, along with lines 105-190, provides better clarification.

      (3) In the model, the reorientation rate dΩ⁄dt declines to zero but the empirical rate clearly does not. This is a major flaw. It would have been easy to fix by adding a constant to the exponentially declining rate in (1). Perhaps fixing this obvious problem would mitigate the discrepancies between the data and the model in Figure 2d.

      You are correct that the model deviates slightly at longer times, but this result is consistent with Klein et al., who show a continuous decline of reorientations. However, we have added a constant to the model (b, Equation 2), since an infinite run length is likely not physiological.

      (4) Evidence that the model fits the data (Figure 2d) is unconvincing. I would like to have seen the proportion of runs in which the model generated one as opposed to multiple or no transitions in reorientation rate; in the real data, the proportion is 50% (Lopez). It is claimed that the "model demonstrated a continuum of switching to non-switching behavior" as seen in the experimental data but no evidence is provided.

      We should clarify that the 50% proportion cited by López-Cruz was based on an arbitrary difference in slopes, and by assessing the data visually (López-Cruz, Figure S2). We added a comment in the text to clarify this (Lines 76 – 78). We sought to avoid this subjective assessment by plotting the distribution of slopes and transition times produced by the method used in López-Cruz. We should also clarify what we meant by “a continuum of switching and non-switching” behavior. Neither the transition time distributions nor the slope-difference distributions appear to be the result of two distributions (the distributions in Figure 1 are not bimodal). This is unlike roaming and dwelling on food, where two distinct distributions of behavioral metrics can be identified based on speed and angular speed (Flavell et al., 2009, Fig S2a).
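
      For readers unfamiliar with slope-based transition metrics, the sketch below shows one generic way to obtain a transition time and a slope difference from a single worm's cumulative reorientation trace: fit two straight lines on either side of every candidate breakpoint and keep the split with the lowest total squared error. This is an assumed, simplified stand-in; the exact procedure used by López-Cruz et al. is not described in this response, and the function names and the `min_points` parameter are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope, intercept, and residual sum of squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, rss


def two_segment_fit(times, cumulative, min_points=3):
    """Return (transition_time, slope_difference) for the best two-line fit.

    times must be strictly increasing and contain at least 2 * min_points
    samples. slope_difference is early slope minus late slope, so a drop in
    reorientation rate gives a positive value.
    """
    best = None
    for i in range(min_points, len(times) - min_points + 1):
        early_slope, _, early_rss = fit_line(times[:i], cumulative[:i])
        late_slope, _, late_rss = fit_line(times[i:], cumulative[i:])
        total = early_rss + late_rss
        if best is None or total < best[0]:
            best = (total, times[i], early_slope - late_slope)
    _, transition_time, slope_difference = best
    return transition_time, slope_difference
```

      Applying such a fit to every experimental and simulated trace and plotting the resulting values is one way to compare distributions, as the response describes, instead of classifying individual worms by eye.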

      Based on the advice of Reviewer #3, we have also modeled the data using different starting amounts of M (M<sub>0</sub>). By definition, an initial value of M<sub>0</sub> = 1 is a two-state switching strategy; the worm either uses a reorientation rate of a (when M = 1) or b (when M = 0). As expected, this does produce a bimodal distribution of slope differences (Figure 3b), which is significantly different than the experimental distribution (Figure 3c). We have added a new section to explain this in more detail (Lines 253 – 297).
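
      The effect of varying M<sub>0</sub> can be explored with a small two-reaction Gillespie simulation such as the sketch below. The propensities used here are assumptions chosen to be consistent with the description above (rate a when M = M<sub>0</sub>, rate b when M = 0); they are not copied from the manuscript's Equations 1-3, and the function and parameter names are hypothetical.

```python
import random


def simulate_worm(m0, a, b, k, t_end, rng):
    """Return the reorientation times of one simulated worm.

    Assumed propensities (illustrative only):
      decay of M (M -> M - 1):  k * M
      reorientation:            b + (a - b) * M / m0
    m0 must be a positive integer.
    """
    t, m, events = 0.0, m0, []
    while True:
        decay = k * m
        reorient = b + (a - b) * m / m0
        total = decay + reorient
        if total <= 0:
            return events  # nothing can happen any more
        t += rng.expovariate(total)  # time to the next reaction of any kind
        if t >= t_end:
            return events
        # Choose which reaction fired, in proportion to its propensity.
        if rng.random() * total < decay:
            m -= 1
        else:
            events.append(t)
```

      Under this assumed form each unit of M decays independently, so the expected reorientation rate, b + (a - b)e<sup>-kt</sup>, is the same for every M<sub>0</sub>. Only the granularity of individual trajectories changes: with `m0=1` each simulated worm makes a single jump from rate a to rate b, while large `m0` gives a nearly smooth decline.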

      (5) The explanation for the poor fit between the model and data (lines 166-174) is unclear. Why would externally triggered collisions cause a shift in the transition distribution?

      Thank you, we rewrote the text to clarify this better (Lines 227-233). There were no externally triggered collisions; 10 animals were used per experiment. They would occasionally collide during the experiment, but these collisions were excluded from the data that were provided. However, worms are also known to increase reorientations when they encounter a pheromone trail, and it is unknown (from this dataset) which reorientations may have been a result of this phenomenon.

      (6) The discussion of Levy walks and the accompanying figure are off-topic and should be deleted.

      Thank you, we agree that this topic is tangential, and we removed it.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors build a statistical model that stochastically samples from a time-interval distribution of reorientation rates. The form of the distribution is extracted from a large array of behavioral data, and is then used to describe not only the dynamics of individual worms (including the inter-individual variability in behavior), but also the aggregate population behavior. The authors note that the model does not require assumptions about behavioral state transitions, or evidence accumulation, as has been done previously, but rather that the stochastic nature of behavior is "simply the product of stochastic sampling from an exponential function".

      Strengths:

      This model provides a strong juxtaposition to other foraging models in the worm. Rather than evoking a behavioral transition function (that might arise from a change in internal state or the activity of a cell type in the network), or evidence accumulation (which again maps onto a cell type, or the activity of a network) - this model explains behavior via the stochastic sampling of a function of an exponential decay. The underlying model and the dynamics being simulated, as well as the process of stochastic sampling, are well described and the model fits the exponential function (Equation 1) to data on a large array of worms exhibiting diverse behaviors (1600+ worms from Lopez-Cruz et al). The work of this study is able to explain or describe the inter-individual diversity of worm behavior across a large population. The model is also able to capture two aspects of the reorientations, including the dynamics (to switch or not to switch) and the kinetics (slow vs fast reorientations). The authors also work to compare their model to a few others including the Levy walk (whose construction arises from a Markov process) to a simple exponential distribution, all of which have been used to study foraging and search behaviors.

      Weaknesses:

      This manuscript has two weaknesses that dampen the enthusiasm for the results. First, in all of the examples the authors cite where a Gillespie algorithm is used to sample from a distribution, be it the kinetics associated with chemical dynamics, or a Lotka-Volterra Competition Model, there are underlying processes that govern the evolution of the dynamics, and thus the sampling from distributions. In one of their references, for instance, the stochasticity arises from the birth and death rates, thereby influencing the genetic drift in the model. In these examples, the process governing the dynamics (and thus generating the distributions from which one samples) is distinct from the behavior being studied. In this manuscript, the distribution being sampled is the exponential decay function of the reorientation rate (lines 100-102). This appears to be tautological - a decay function fitted to the reorientation data is then sampled to generate the distributions of the reorientation data. That the model performs well and matches the data is commendable, but it is unclear how that could not be the case if the underlying function generating the distribution was fit to the data.

      Thank you, we apologize that this was not clearer. In the Lotka-Volterra model, the densities of predators and prey are being modeled, with the underlying assumption that rates of birth and death are inherently stochastic. In our model, the number of reorientations is being modeled, with the assumption (based on the experiments) that the occurrence of reorientations is stochastic, just like the occurrence (birth) of a prey animal is stochastic. However, the decay in M is phenomenological, and we speculate about the nature of M later in the manuscript.

      You are absolutely right that the decay function for M was fit to the population average of reorientations and then sampled to generate the distributions of the reorientation data. This was intentional to show that the parameters chosen to match the population average would produce individual trajectories with comparable stochastic “switching” as the experimental data. All we’re trying to show really is that observed sudden changes in reorientation that appear persistent can be produced by a stochastic process without resorting to binary state assignments. In Calhoun et al., 2014, it is reported that all animals produced switch-like behavior, but in Klein et al., 2017, it is reported that no animals showed abrupt transitions. López-Cruz et al. seem to show a mix of these results, which can easily be explained by an underlying stochastic process.

      The second weakness is somewhat related to the first, in that absent an underlying mechanism or framework, one is left wondering what insight the model provides.

      Stochastically sampling a function generated by fitting the data to produce stochastic behavior is where one ends up in this framework, and the authors indeed point this out: "simple stochastic models should be sufficient to explain observably stochastic behaviors." (Line 233-234). But if that is the case, what do we learn about how the foraging is happening? The authors suggest that the decay parameter M can be considered a memory timescale, which offers some suggestion, but then go on to say that the "physical basis of M can come from multiple sources". Here is where one is left wanting: the mechanisms suggested include loss of sensory stimuli, alterations in motor integration, ionotropic glutamate signaling, dopamine, and neuropeptides. These are basically all of the possible biological sources that can govern behavior, and one is left not knowing what insight the model provides. The array of biological processes listed is so variable in dynamics and meaning that their explanation of what governs M is at best unsatisfying. Molecular dynamics models that generate distributions can point to certain properties of the model, such as the binding kinetics (on and off rates, etc.) as explanations for the mechanisms generating the distributions, and therefore point to how a change in the biology affects the stochasticity of the process. It is unclear how this model provides such a connection, especially taken in aggregate with the previous weakness.

      Providing a roadmap of how to think about the processes generating M, the meaning of those processes in search, and potential frameworks that are more constrained and with more precise biological underpinning (beyond the array of possibilities described) would go a long way to assuaging the weaknesses.

      Thank you, these are all excellent points. We should clarify that in López-Cruz et al., they claim that only 50% of the animals fit a local/global search paradigm. We are simply proposing there is no need for designating local and global searches if the data don’t really support it. The underlying behavior is stochastic, so the sudden switches sometimes observed can be explained by a stochastic process where the underlying rate is slowing down, thus producing the persistently slow reorientation rate when an apparent “switch” occurs. What we hope to convey is that foraging doesn’t appear to follow a decision paradigm, but instead a gradual change in reorientations which, for individual worms, can occasionally produce reorientation trajectories that appear switch-like.

      As for M, you are correct, we should be more explicit, and we have added text (Lines 319-359) to expand upon its possible biological origin.

      Reviewer #3 (Public review):

      Summary:

      This intriguing paper addresses a special case of a fundamental statistical question: how to distinguish between stochastic point processes that derive from a single "state" (or single process) and more than one state/process. In the language of the paper, a "state" (perhaps more intuitively called a strategy/process) refers to a set of rules that determine the temporal statistics of the system. The rules give rise to probability distributions (here, the probability for turning events). The difficulty arises when the sampling time is finite, and hence, the empirical data is finite, and affected by the sampling of the underlying distribution(s). The specific problem being tackled is the foraging behavior of C. elegans nematodes, removed from food. Such foraging has been studied for decades, and described by a transition over time from 'local'/'area-restricted' search (roughly in the initial 10-30 minutes of the experiments, in which animals execute frequent turns) to 'dispersion', or 'global search' (characterized by a low frequency of turns). The authors propose an alternative to this two-state description - a potentially more parsimonious single 'state' with time-changing parameters, which they claim can account for the full-time course of these observations.

      Figure 1a shows the mean rate of turning events as a function of time (averaged across the population). Here, we see a rapid transient, followed by a gradual 4-5 fold decay in the rate, and then levels off. This picture seems consistent with the two-state description. However, the authors demonstrate that individual animals exhibit different "transition" statistics (Figure 1e) and wish to explain this. They do so by fitting this mean with a single function (Equations 1-3).

      Strengths:

      As a qualitative exercise, the paper might have some merit. It demonstrates that apparently discrete states can sometimes be artifacts of sampling from smoothly time-changing dynamics. However, as a generic point, this is not novel, and so without the grounding in C. elegans data, is less interesting.

      Weaknesses:

      (1) The authors claim that only about half the animals tested exhibit discontinuity in turning rates. Can they automatically separate the empirical and model population into these two subpopulations (with the same method), and compare the results?

      Thank you, we should clarify that the observation that about half the animals exhibit discontinuity was not made by us, but by López-Cruz et al. The observed fraction of 50% was based on a visual assessment of the dual regression method we described. We added text (Lines 76-79) to clarify this. To make the process more objective, we decided to simply plot the distributions of the metrics they used for this assessment to see if two distinct populations could be observed. However, the distributions of slope differences and transition times do not produce two distinct populations. Our stochastic approach, which does not assume abrupt state-transitions, also produces comparable distributions. To quantify this, we have added a section varying M<sub>0</sub>, including setting M<sub>0</sub> to 1, so that the model by definition is a switch model. This model performs the worst (Lines 253-296, Figure 3).

      (2) The equations consider an exponentially decaying rate of turning events. If so, Figure 2b should be shown on a semi-logarithmic scale.

      We chose to not do this because this average is based on the number of discrete reorientation events observed within a 2-minute window. The number of events ranges from 0 to 6 (hence a rate of 0.5-3 min<sup>-1</sup>), which does not span one order of magnitude. Instead, we included a heat map (Figure 1a, Figure 2b bottom panel) which shows the density that the average is based on. We hope this provides some clarity to the reader.

      (3) The variables in Equations 1-3 and the methods for simulating them are not well defined, making the method difficult to follow. Assuming my reading is correct, Omega should be defined as the cumulative number of turning events over time (Omega(t)), not as a "turn" or "reorientation", which has no derivative. The relevant entity in Figure 1a is apparently <Omega (t)>, i.e. the mean number of events across a population which can be modelled by an expectation value. The time derivative would then give the expected rate of turning events as a function of time.

      Thank you, you are correct. Please see response to Reviewer #1.

      (4) Equations 1-3 are cryptic. The authors need to spell out up front that they are using a pair of coupled stochastic processes, sampling a hidden state M (to model the dynamic turning rate) and the actual turn events, Omega(t), separately, as described in Figure 2a. In this case, the model no longer appears more parsimonious than the original 2-state model. What then is its benefit or explanatory power (especially since the process involving M is not observable experimentally)?

      Thank you, yes we see how as written this was confusing. In our response to Reviewer #1, and in the text, we added an important detail:

      While reorientations are modeled as discrete events, which is observationally true, the amount of M at time t=0 is chosen to be large (M<sub>0</sub> = 1000), so that over the timescale of 40 minutes, the decay in M is practically continuous. This ensures that sudden changes in reorientations are not due to sudden changes in M, but due to the inherent stochasticity of reorientations.

      However you are correct that if M was chosen to have a binary value of 0 or 1, then this would indeed be the two state model. We added a new section to address this (Lines 253-287, Figure 3). Unlike the experiments, the two-state model produces bimodal distributions in slope and transition times, and these distributions are significantly different than the experimental data (Figure 3).

      (5) Further, as currently stated in the paper, Equations 1-3 are only for the mean rate of events. However, the expectation value is not a complete description of a stochastic system. Instead, the authors need to formulate the equations for the probability of events, from which they can extract any moment (they write something in Figure 2a, but the notation there is unclear, and this needs to be incorporated here).

      Thank you, yes please see our response to Reviewer #1. We have clarified the text in Lines 105-190.

      (6) Equations 1-3 have three constants (alpha and gamma which were fit to the data, and M0 which was presumably set to 1000). How does the choice of M0 affect the results?

      Thank you, this is a good question. We address this in lines 253-296. Briefly, the choice of M<sub>0</sub> does not have a strong effect on the results, unless we set M<sub>0</sub> = 1, which, by definition, creates a two-state model. This model was significantly different than the experimental data, relative to the other models (Figure 3c).

      (7) M decays to near 0 over 40 minutes, abolishing omega turns by the end of the simulations. Are omega turns entirely abolished in worms after 30-40 minutes off food? How do the authors reconcile this decay with the leveling of the turning rate in Figure 1a?

      Yes, Reviewer #1 recommended adding a baseline reorientation rate which we did for all models (Equation 2). However, we should also note that in Klein et al they observed a continuous decay over 50 minutes. Though realistically, it is likely not plausible that worms will produce infinitely long runs at long time points.

      (8) The fit given in Figure 2b does not look convincing. No statistical test was used to compare the two functions (empirical and fit). No error bars were given (to either). These should be added. In the discussion, the authors explain the discrepancy away as experimental limitations. This is not unreasonable, but on the flip side, makes the argument inconclusive. If the authors could model and simulate these limitations, and show that they account for the discrepancies with the data, the model would be much more compelling.

      To do this, I would imagine that the authors would need to take the output of their model (lists of turning times) and convert them into simulated trajectories over time. These trajectories could be used to detect boundary events (for a given size of arena), collisions between individuals, etc. in their simulations and to see their effects on the turn statistics.

      Thank you, we have added dashed lines to indicate standard deviation to Figures 2b and 3a. After running the models several times, we found that some of the small discrepancies noted (like s<sub>1</sub>-s<sub>2</sub> < 0 for experiments but not the model), were spurious due to these data points being <1% of the data, so we cut this from the text. To compare how similar the continuous (M<sub>0</sub> > 1) and discrete (M<sub>0</sub> = 1) models were to the experimental data, we calculated a Jensen-Shannon distance for the models, and found that the discrete model was significantly more dissimilar to the experimental data than the continuous models (Lines 289-296, Figure 3c).
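
      For readers unfamiliar with the metric, the Jensen-Shannon distance between two binned distributions can be sketched as follows. This is a minimal illustration, not the analysis code used for Figure 3c: the function name, the use of shared histogram bin edges, and the base-2 logarithm (which bounds the distance between 0 and 1) are assumptions made here, and the sample values are hypothetical.

```python
import numpy as np


def jensen_shannon_distance(sample_a, sample_b, bin_edges):
    """Jensen-Shannon distance (base 2, in [0, 1]) between two samples.

    Both samples are histogrammed on the same bin edges (an assumption of
    this sketch). Each sample must have at least one value inside the bins.
    """
    p, _ = np.histogram(sample_a, bins=bin_edges)
    q, _ = np.histogram(sample_b, bins=bin_edges)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(x, y):
        # Kullback-Leibler divergence, skipping empty bins of x.
        mask = x > 0
        return np.sum(x[mask] * np.log2(x[mask] / y[mask]))

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # The distance is the square root of the divergence; clip tiny
    # negative values caused by floating-point rounding.
    return float(np.sqrt(max(divergence, 0.0)))


# Hypothetical slope differences (s1 - s2) for experiment and one model.
experiment = np.array([0.4, 0.6, 0.9, 1.1, 1.3, 1.8])
model = np.array([0.5, 0.7, 0.8, 1.2, 1.4, 1.6])
edges = np.linspace(0.0, 2.0, 5)
print(jensen_shannon_distance(experiment, model, edges))
```

      A distance of 0 means the two histograms are identical, and 1 means they do not overlap at all, so a larger value for one model indicates a poorer match to the experimental distribution.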

      (9) The other figures similarly lack any statistical tests and by eye, they do not look convincing. The exception is the 6 anecdotal examples in Figure 2e. Those anecdotal examples match remarkably closely, almost suspiciously so. I'm not sure I understood this though - the caption refers to "different" models of M decay (and at least one of the 6 examples clearly shows a much shallower exponential). If different M models are allowed for each animal, this is no longer parsimonious. Are the results in Figure 2d for a single M model? Can Figure 2e explain the data with a single (stochastic) M model?

      We certainly don’t want the panels in Figure 2e to be suspicious! These comparisons were drawn from calculating the correlations between all model traces and all experimental traces, and then choosing the top hits. Every time we run the simulation, we arrive at a different set of examples. Since it was recommended we add a baseline rate, these examples will be a completely different set when we run the simulation, again.

      We apologize for the confusion regarding M. Since the worms do not all start out with identical reorientation rates, we drew the initial M value from a distribution centered on M<sub>0</sub> to match the initial distribution of observed experimental rates (Lines 206-214). However, the decay in M (γ), as well as α and β, are the same for all in silico animals.

      (10) The left axes of Figure 2e should be reverted to cumulative counts (without the normalization).

      Thank you, we made this change.

      (11) The authors give an alternative model of a Levy flight, but do not give the obvious alternative models:

      a) the 1-state model in which P(t) = alpha exp (-gamma t) dt (i.e. a single stochastic process, without a hidden M, collapsing equations 1-3 into a single equation).

      b) the originally proposed 2-state model (with 3 parameters, a high turn rate, a low turn rate, and the local-to-global search transition time, which can be taken from the data, or sampled from the empirical probability distributions). Why not? The former seems necessary to justify the more complicated 2-process model, and the latter seems necessary since it's the model they are trying to replace. Including these two controls would allow them to compare the number of free parameters as well as the model results. I am also surprised by the Levy model since Levy is a family of models. How were the parameters of the Levy walk chosen?

      Thank you, we removed this section completely, as it is tangential to the main point of the paper.

      (12) One point that is entirely missing in the discussion is the individuality of worms. It is by now well known that individual animals have individual behaviors. Some are slow/fast, and similarly, their turn rates vary. This makes this problem even harder. Combined with the tiny number of events concerned (typically 20-40 per experiment), it seems daunting to determine the underlying model from behavioral statistics alone.

      Thank you, yes we should have been more explicit in the reasoning behind drawing the initial M from a distribution (response to comment #9). We assume that not every worm starts out with the same reorientation rate, but that some start out fast (high M) and some start out slow (low M). However, we do assume M decays with the same kinetics, which seems sufficient to produce the observed phenomena. Multiple decay rates are not needed to replicate the experimental data.

      (13) That said, it's well-known which neurons underpin the suppression of turning events (starting already with Gray et al 2005, which, strangely, was not cited here). Some discussion of the neuronal predictions for each of the two (or more) models would be appropriate.

      Thank you, yes we will add Gray et al, but also the more detailed response to Reviewer #2 (Lines 319-359 of manuscript).

      (14) An additional point is the reliance entirely on simulations. A rigorous formulation (of the probability distribution rather than just the mean) should be analytically tractable (at least for the first moment, and possibly higher moments). If higher moments are not obtainable analytically, then the equations should be numerically integrable. It seems strange not to do this.

      Thank you for suggesting this. For the Levy section (which we cut) this would have been an improvement. However, since the distributions of slope differences and transition times are based on a recursive algorithm, rather than an analytical formulation, we decided to use the Jensen-Shannon divergence to compare distributions (Lines 272-296, Figure 3c) since this is a parameter-free approach.

      In summary, while sample simulations do nicely match the examples in the data (of discontinuous vs continuous turning rates), this is not sufficient to demonstrate that the transition from ARS to dispersion in C. elegans is, in fact, likely to be a single 'state', or this (eq 1-3) single state. Of course, the model can be made more complicated to better match the data, but the approach of the authors, seeking an elegant and parsimonious model, is in principle valid, i.e. avoiding a many-parameter model-fitting exercise.

      As a qualitative exercise, the paper might have some merit. It demonstrates that apparently discrete states can sometimes be artifacts of sampling from smoothly time-changing dynamics. However, as a generic point, this is not novel, and so without the grounding in C. elegans data, is less interesting.

      Thank you, we agree that this is a generic phenomenon, which is partly why we did this. The data from López-Cruz et al seem to agree in part with both Calhoun et al, who claim abrupt transitions occur, and Klein et al, who claim they do not occur. Since the underlying phenomenon is stochastic, we propose the mixed observations of sudden and gradual changes in search strategy are simply the result of a stochastic process, which can produce both phenomena for individual observations. We hope this work can help clarify why sudden changes in search strategy are not consistently observed. We propose a simple hypothesis that there is no change in search strategy. The reorientation rate decays in time, and due to the stochastic nature of this behavior, what appears as a sudden change for individual observations is not due to an underlying decision, but rather the result of a stochastic process.

    2. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Public reviews):

      Weaknesses:

      This manuscript has two weaknesses that dampen the enthusiasm for the results. First, in all of the examples the authors cite where a Gillespie algorithm is used to sample from a distribution, be it the kinetics associated with chemical dynamics, or a Lotka-Volterra Competition Model, there are underlying processes that govern the evolution of the dynamics, and thus the sampling from distributions. In one of their references for instance, the stochasticity arises from the birth and death rates, thereby influencing the genetic drift in the model. In these examples, the process governing the dynamics (and thus generating the distributions from which one samples) are distinct from the behavior being studied. In this manuscript, the distribution being sampled from is the exponential decay function of the reorientation rate (lines 100-102). This appears to be tautological - a decay function fitted to the reorientation data is then sampled to generate the distributions of the reorientation data. That the model performs well, and matches the data is commendable, but it is unclear how that could not be the case if the underlying function generating the distribution was fit to the data.

      To use the Lotka-Volterra model as an analogy, the changing reorientation rate (like a changing rate of prey growth) is tied to the decay in M (like a loss of predators). You could infer the loss of predators by measuring the changing rate of prey growth. In our case, we infer the loss of M by observing the changing reorientation rate. In the Lotka-Volterra model, the prey growth rate is negatively associated with predator numbers, but in our model, the reorientation rate is positively associated with M, hence a loss in M leads to a decay in the reorientation rate.

      You are correct that the decay parameters fit to the average should produce a distribution of in silico data that reproduce this average result (Figure 3a). However, this does not necessarily mean that these kinetic parameters should produce the same distributions of switch kinetics observed in Figure 3b. Indeed, a binary model (M ∈ {0, 1}), which produces an average distribution that matches the average experimental data (Figure 3a), produces a fundamentally different (bimodal) distribution of switch kinetics in Figure 3b.

      The second weakness is somewhat related to the first, in that absent an underlying mechanism or framework, one is left wondering what insight the model provides. Stochastic sampling a function generated by fitting the data to produce stochastic behavior is where one ends up in this framework, and the authors indeed point this out: "simple stochastic models should be sufficient to explain observably stochastic behaviors." (Line 233-234). But if that is the case, what do we learn about how the foraging is happening. The authors suggest that the decay parameter M can be considered a memory timescale; which offers some suggestion, but then go on to say that the "physical basis of M can come from multiple sources". Here is where one is left for want: The mechanisms suggested, including loss of sensory stimuli, alternations in motor integration, ionotropic glutamate signaling, dopamine, and neuropeptides are all suggested: this is basically all of the possible biological sources that can govern behavior, and one is left not knowing what insight the model provides. The array of biological processes listed are so variable in dynamics and meaning, that their explanation of what govern M is at best unsatisfying. Molecular dynamics models that generate distributions can point to certain properties of the model, such as the binding kinetics (on and off rates, etc.) as explanations for the mechanisms generating the distributions, and therefore point to how a change in the biology affects the stochasticity of the process. It is unclear how this model provides such a connection, especially taken in aggregate with the previous weakness.

      Providing a roadmap of how to think about the processes generating M, the meaning of those processes in search, and potential frameworks that are more constrained and with more precise biological underpinning (beyond the array of possibilities described) would go a long way to assuaging the weaknesses.

      The insight we (hopefully) are trying to convey is that individual observations of apparent state-switching behavior do not necessarily imply that a state change is actually happening if a large fraction of the population is not producing this behavior. This same observation can be recreated by invoking a stochastic process, which we already know is how reorientation occurrences behave in the first place. Apparent switches to global foraging are simply due to the reorientation rate decaying in time, not necessarily due to a sudden state change. We modeled a stochastic binary switch (when M<sub>0</sub> = 1) which produced a bimodal distribution of switch kinetics (Figure 3b) which was different than the experimental distribution. The biological basis of M is not addressed here, but we clarified the language on lines 342 and 343 to reinforce that it likely represents the timescales of AIA and ADE activities. We reiterated what was described in López-Cruz et al to convey that molecularly, what is governing the timescales of these two neurons is not trivial, and likely multi-faceted.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The presentation of the Gillespie algorithm, though much improved, is tough going and for many biologists will be a barrier to appreciation of what was done and what was achieved. I found the description of the algorithm generated by AI (ChatGPT) to be more accessible and the example given to be better related to the present application of the algorithm. This might provide a template for a more accessible description of the model.

      We are glad the newer draft is clearer, and apologize it is still difficult to read. We made a few changes that hopefully clarify some points (see below).

      It is unclear how instances of >1 transition were automatically distinguished from instances with 1 transition. A related point is how the transition-finding algorithm was kept from detecting too many transitions, as it seems that any quadruplet of points defines a slope change.

      In López-Cruz et al, >1 transitions (and all transitions) were distinguished by eye after running the findchangepts function. We added a clarifying statement on lines 78 and 79 to illuminate this point. As noted on line 72, the function itself only fits two regressions, so by definition, it can only define one transition. This is why we decided to plot the distribution of slope and transition parameters in the first place: to see if there was a clear bimodal distribution (as observed for other observably binary states, like roaming and dwelling). This was not the case for the experimental data, but was observed in the in silico data if we forced the algorithm to be a two-state model (Figure 3b, M<sub>0</sub> = 1).
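
      To make concrete why a two-regression fit can report only one transition, the idea can be sketched as a brute-force search over a single breakpoint. This is a stand-in written for illustration, not MATLAB's findchangepts implementation or the authors' code; the function name, the minimum segment length, and the toy data are assumptions.

```python
import numpy as np


def two_segment_fit(t, omega, min_points=3):
    """Find the single breakpoint that best splits (t, omega) into two lines.

    Returns (transition_time, slope_before, slope_after). Every candidate
    breakpoint is tried, and the one with the smallest total squared
    residual over both linear fits is kept.
    """
    t = np.asarray(t, dtype=float)
    omega = np.asarray(omega, dtype=float)
    best = None
    for k in range(min_points, len(t) - min_points + 1):
        total_sse = 0.0
        slopes = []
        for seg_t, seg_y in ((t[:k], omega[:k]), (t[k:], omega[k:])):
            slope, intercept = np.polyfit(seg_t, seg_y, 1)
            residuals = seg_y - (slope * seg_t + intercept)
            total_sse += float(np.sum(residuals ** 2))
            slopes.append(float(slope))
        if best is None or total_sse < best[0]:
            best = (total_sse, float(t[k]), slopes[0], slopes[1])
    if best is None:
        raise ValueError("not enough points to fit two segments")
    return best[1], best[2], best[3]


# Toy cumulative reorientation trace: 2 events/min, then 0.5 events/min.
minutes = np.arange(20.0)
cumulative = np.where(minutes < 10, 2.0 * minutes, 20.0 + 0.5 * (minutes - 10))
t_switch, s1, s2 = two_segment_fit(minutes, cumulative)
print(t_switch, s1 - s2)
```

      Because exactly one breakpoint is returned for every trace, whether or not the trace contains an abrupt change, the distributions of the transition time and of the slope difference s1 - s2 across animals are the informative quantities, rather than the existence of a breakpoint in any single trace.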

      Line 113-4: I was confused by the distinction between the probability of observing an event and the propensity for it to occur. Are the authors implying that some events occur but are not observed?

      We apologize for this confusion, and added some phrasing in Lines 115-130 to address this. The propensity is analogous to the rate of a reaction. Given this rate, the probability of seeing Ω+1 reorientations in the infinitesimal time interval dt is the product of the propensity and the probability the current state is Ω reorientations.
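
      Written out, and assuming a generic two-reaction system in which a reorientation increments Ω with propensity a<sub>1</sub>(M) and a decay event decrements M with propensity a<sub>2</sub>(M), the distinction between propensity and probability can be sketched as follows. The joint notation P(Ω, M, t) and the symbols a<sub>1</sub> and a<sub>2</sub> are assumptions made for illustration and are not copied from the manuscript's equations.

```latex
% Probability of one reorientation in an infinitesimal interval dt,
% given the current state (propensity times dt):
\Pr\bigl[\Omega \to \Omega + 1 \text{ in } [t, t + dt)\bigr] = a_1(M)\, dt

% Corresponding master equation for the joint state (\Omega, M):
\frac{\partial P(\Omega, M, t)}{\partial t}
  = a_1(M)\,\bigl[P(\Omega - 1, M, t) - P(\Omega, M, t)\bigr]
  + a_2(M + 1)\,P(\Omega, M + 1, t) - a_2(M)\,P(\Omega, M, t)
```

      The propensity plays the role of a rate constant, whereas P(Ω, M, t) is the probability of occupying a given state; their product gives the probability flux between states.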

      Line 120: Shouldn't propensity at t = 0 be alpha + beta?

      Yes, thank you for catching this. We fixed it.

      Why was it necessary to posit two decay processes (equations 2 and 5?). Wouldn't one suffice?

      Thank you, we have added some text to clarify this point (lines 129-132). The Gillespie algorithm models discrete temporal events, which are explicitly dependent on the current state of the system. Since the propensity itself is changing in time, it implies that it is coupled to another state variable that is changing in time, i.e. another propensity. Since an exponential decay is sufficient to model the decay in reorientations, this implies that the reorientation propensity is coupled to a first order decay propensity (equations 4-5).
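
      A minimal sketch of such a pair of coupled propensities, simulated with the Gillespie algorithm, is given below. The functional forms (a reorientation propensity of alpha * M / M0 + beta, which equals alpha + beta at t = 0, and a first-order decay propensity of gamma * M), the function name, and all parameter values are assumptions chosen for illustration; they are not the fitted values or the exact equations of the manuscript.

```python
import numpy as np


def simulate_worm(alpha, beta, gamma, m0, t_max=40.0, rng=None):
    """Return reorientation times (minutes) for one in silico animal.

    Assumed propensities (illustrative only):
      a1 = alpha * M / m0 + beta   reorientation; alpha + beta at t = 0
      a2 = gamma * M               first-order decay of M
    """
    rng = np.random.default_rng() if rng is None else rng
    t, m = 0.0, m0
    turn_times = []
    while True:
        a1 = alpha * m / m0 + beta
        a2 = gamma * m
        a_total = a1 + a2
        if a_total <= 0.0:
            break  # nothing left that can happen
        # Waiting time to the next event of either kind.
        t += rng.exponential(1.0 / a_total)
        if t > t_max:
            break
        # Choose which event occurred, in proportion to its propensity.
        if rng.random() * a_total < a1:
            turn_times.append(t)  # reorientation; M is unchanged
        else:
            m -= 1  # one unit of M decays
    return np.array(turn_times)


rng = np.random.default_rng(0)
# Large M0: M decays almost continuously over 40 minutes.
continuous = simulate_worm(alpha=1.5, beta=0.2, gamma=0.1, m0=1000, rng=rng)
# M0 = 1: M is binary, so the reorientation rate switches exactly once.
binary = simulate_worm(alpha=1.5, beta=0.2, gamma=0.1, m0=1, rng=rng)
print(len(continuous), len(binary))
```

      In this sketch the only difference between the continuous and the switch-like model is the value of m0, which is what allows the two to be compared with the same fitted rate parameters.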

      Line 145: ...sudden changes in [reorientation rate] are not due to...

      Thank you, we have corrected this (Line 157).

      Fig. 2d: Legend implies (but fails to state) that each dot is a worm, raising the question of how single worms with multiple transitions were plotted in this graph as they would have more than one transition point.

      Thank you, we updated the legend. Multiple transitions are not quantified with the two-regression approach. Prior observations, such as by López-Cruz, were simply done by eye.

      Line 153: Does i denote either process 1 or 2?

      Yes, i is the subscript for each propensity a<sub>i</sub>. We have added text on line 166 to clarify this.

      Line 159: Confusing. If an "event" is a reorientation event and a "transition" is a discrete change in slope of Omega vs t, then "The probability that no events will occur for ALL transitions in this time interval" makes no sense.

      Thank you, we have reworded this part (Lines 169-172) to be clearer.

      Equation 17: Unclear what index i refers to.

      Thank you, we have changed this index to j, and modified the text on line 228 to reflect this.

      Line 227-9: Unclear how collisions are thought to have caused the shift in experimental distribution.

      We have clarified the text on lines 246 and 250. Collisions are not being referred to here, but instead the crossing of pheromone trails. This is purely speculative.

      Line 310-317. If M rises on food, then worms should reorient more on food than after long times off food, when M has decayed. But worms don't reorient much on food; they behave as though M is low. This seems like a contradiction, unless one supposes instead that M is low on food and after long times off food but spikes when food is removed.

      Thank you, we have added clarifying language on lines 333-336 to address this point. Worm behavior is fundamentally different on food, as worms transition to a dwell/roam behavioral dynamic which is fundamentally different than foraging behavior while off food.

    1. Author Response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      (1) I think the article is a little too immature in its current form. I'd recommend that the authors work on their writing. For example, the objectives of the article are not completely clear to me after reading the manuscript, composed of parts where the authors seem to focus on SGCs, and others where they study "engram" neurons without differentiating the neuronal type (Figure 5). The next version of the manuscript should clearly establish the objectives and sub-aims.

      We now provide clarification for focusing on the labeling status versus the cell types in Figure 5. Since Figure 5 focuses on inputs to labeled pairs versus labeled-unlabeled pairs, the pairs include mixed groups with GCs and SGCs. Since the question pertains to inputs rather than cell types, we did not specifically distinguish the cell types. This is now explained in the text on page 15: “Note that since the intent was to determine the input correlation depending on labeling status of the cell pairs rather than based on cell type, we do not explicitly consider whether analyzed cell pairs included GCs or SGCs.”

      (2) In addition, some results are not entirely novel (e.g., the disproportionate recruitment as well as the distinctive physiological properties of SGCs), and/or based on correlations that do not fully support the conclusions of the article. In addition to re-writing, I believe that the article would benefit from being enriched with further analyses or even additional experiments before being resubmitted in a more definitive form.

      We now indicate the data comparing labeled versus unlabeled SGCs is novel. Moreover, we also highlight that (1) recruitment of SGCs has not been previously examined in Barnes Maze or Enriched Environment, (2) that our unbiased morphological analysis of SGC recruitment is more robust than subsampling of recorded neurons in prior studies and (3) that our data show that prior studies may have overestimated SGC recruitment to engrams. Thus, the data characterized as “not novel” are essential for appropriate analysis of behaviorally tagged neurons, which is the thrust of our study.

      Reviewer #2 (Public Review):

      (1) The authors conclude that SGCs are disproportionately recruited into cfos assemblies during the enriched environment and Barnes maze task given that their classifier identifies about 30% of labelled cells as SGCs in both cases and that another study using a different method (Save et al., 2019) identified less than 5% of an unbiased sample of granule cells as SGCs. To make matters worse, the classifier deployed here was itself established on a biased sample of GCs patched in the molecular layer and granule cell layer, respectively, at even numbers (Gupta et al., 2020). The first thing the authors would need to show to make the claim that SGCs are disproportionately recruited into memory ensembles is that the fraction of GCs identified as SGCs with their own classifier is significantly lower than 30% using their own method on a random sample of GCs (e.g. through sparse viral labelling). As the authors correctly state in their discussion, morphological samples from patch-clamp studies are problematic for this purpose because of inherent technical issues (i.e. easier access to scattered GCs in the molecular layer).

      We now clarify, on page 9, that a trained investigator classified cell types based on predefined morphological criteria.  No automated classifiers were used to assign cell types in the current study.

      (2) The authors claim that recurrent excitation from SGCs onto GCs or other SGCs is irrelevant because they did not find any connections in 32 simultaneous recordings (plus 63 in the next experiment). Without a demonstration that other connections from SGCs (e.g. onto mossy cells or interneurons) are preserved in their preparation and if so at what rates, it is unclear whether this experiment is indicative of the underlying biology or the quality of the preparation. The argument that spontaneous EPSCs are observed is not very convincing as these could equally well arise from severed axons (in fact we would expect that the vast majority of inputs are not from local excitatory cells). The argument on line 418 that SGCs have compact axons isn't particularly convincing either given that the morphologies from which they were derived were also obtained in slice preparations and would be subject to the same likelihood of severing the axon. Finally, even in paired slice recordings from CA3 pyramidal cells the experimentally detected connectivity rates are only around 1% (Guzman et al., 2016). The authors would need to record from a lot more than 32 pairs (and show convincing positive controls regarding other connections) to make the claim that connectivity is too low to be relevant.

      We have conducted additional control experiments (detailed in response to Editorial comment #3), in which we replicated the results of Stefanelli et al (2016) identifying that optogenetic activation of a focal cohort of ChR2 expressing granule cells leads to robust feedback inhibition of adjacent granule cells. These control experiments demonstrate that the slice system supports the feedback inhibitory circuit which requires GC/SGC to hilar neuron synapses.

      (3) Another troubling sign is the fact that optogenetic GC stimulation rarely ever evokes feedback inhibition onto other cells, which contrasts with both other in vitro (e.g. Braganza et al., 2020) and in vivo (Stefanelli et al., 2016) studies. Without a convincing demonstration that monosynaptic connections between SGCs/GCs and interneurons in both directions are preserved at least at the rates previously described in other slice studies (e.g. Geiger et al., 1997, Neuron, Hainmueller et al., 2014, PNAS, Savanthrapadian et al., 2014, J. Neurosci), the notion that this setting could be closer to naturalistic memory processing than the in vivo experiments in Stefanelli et al. (e.g. lines 443-444) strikes me as odd. In any case, the discussion should clearly state that compromised connectivity in the slice preparation is likely a significant confound when comparing these results.

      We have conducted additional control experiments (detailed in response to Editorial comment #3), in which we replicated the results of Stefanelli et al identifying that optogenetic activation of a focal cohort of ChR2 expressing granule cells leads to robust feedback inhibition of adjacent granule cells. These control experiments demonstrate that the slice system in our studies supports the feedback inhibitory circuit detailed in prior studies. We also clarify that the Stefanelli study labeled random neurons and did not examine natural behavioral engrams, and discuss (on page 20) the correspondence/consistency of our results with that of Braganza et al 2020.

      (4) Probably the most convincing finding in this study is the higher zero-time lag correlation of spontaneous EPSCs in labelled vs. unlabeled pairs. Unfortunately, the fact that the authors use spontaneous EPSCs to begin with, which likely represent a mixture of spontaneous release from severed axons, minis, and coordinated discharge from intact axon segments or entire neurons, makes it very hard to determine the meaning and relevance of this finding. At the bare minimum, the authors need to show if and how strongly differences in baseline spontaneous EPSC rates between different cells and slices are contributing to this phenomenon. I would encourage the authors to use low-intensity extracellular stimulation at multiple foci to determine whether labelled pairs really share higher numbers of input from common presynaptic axons or cells compared to unlabeled pairs as they claim. I would also suggest the authors use conventional Cross correlograms (CCG; see e.g. English et al., 2017, Neuron; Senzai and Buzsaki, 2017, Neuron) instead of their somewhat convoluted interval-selective correlation analysis to illustrate codependencies between the event time series. The references above also illustrate a more robust approach to determining whether peaks in the CCGs exceed chance levels.

      We have included data on sEPSC frequency in the recorded cell pairs (Supplemental Fig 4). We have also conducted additional experiments and present data demonstrating that labeled cells show higher sEPSC frequency and amplitude than corresponding unlabeled cells in both cell types (new Fig 5). We also include data from new experiments showing that over 50% of the sEPSCs represent action potential-driven events (Supplemental Fig 3).

      We thank the reviewer for the suggestion to explore alternative methods of analysis, including CCGs, to further strengthen our findings. We have now conducted CCGs on the same data set and report that “The dynamics of the cross-correlograms generated from our data sets using previously established methods to evaluate monosynaptic connectivity (Bartho et al., 2004; Senzai and Buzsaki, 2017) paralleled that of the CCP plots (Supplemental Fig. 6), illustrating that the methods similarly capture co-dependencies between event time series. We note, here, that while the CCG and CCP are qualitatively similar, the magnitudes of the peaks were different, due to the sparseness of synaptic events.”
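
      As a rough illustration of the conventional cross-correlogram discussed above, the sketch below bins the lags between events in two time series. It is not the authors' analysis code: the function name, the millisecond units, the ±50 ms window, and the 10 ms bin width are all assumptions chosen for the example.

```python
from bisect import bisect_left, bisect_right


def cross_correlogram(ref_times, target_times, window=50, bin_width=10):
    """Histogram of lags (target - reference) within +/- window.

    Times, window and bin_width share one unit (milliseconds assumed here).
    Returns (bin_centers, counts); counts[i] is the number of
    (reference, target) event pairs whose lag falls in bin i.
    """
    n_bins = int(round(2 * window / bin_width))
    counts = [0] * n_bins
    target = sorted(target_times)
    for t_ref in ref_times:
        # Restrict the inner loop to target events inside the window.
        lo = bisect_left(target, t_ref - window)
        hi = bisect_right(target, t_ref + window)
        for t in target[lo:hi]:
            idx = int((t - t_ref + window) // bin_width)
            if 0 <= idx < n_bins:  # a lag exactly equal to +window is dropped
                counts[idx] += 1
    centers = [-window + (i + 0.5) * bin_width for i in range(n_bins)]
    return centers, counts


if __name__ == "__main__":
    # Hypothetical event times: each target event trails a reference by 5 ms.
    centers, counts = cross_correlogram([100, 200, 300], [105, 205, 305, 500])
    for center, count in zip(centers, counts):
        print(center, count)
```

      Deciding whether a peak exceeds chance would additionally require a null distribution, for example from jittered or shuffled event times, which this sketch omits.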

      (5) Finally, one of the biggest caveats of the study is that the ensemble is labelled a full week before the slice experiment and thereby represents a latent state of a memory rather than encoding consolidation, or recall processes. The authors acknowledge that in the discussion but they should also be mindful of this when discussing other (especially in vivo) studies and comparing their results to these. For instance, Pignatelli et al 2018 show drastic changes in GC engram activity and features driven by behavioral memory recall, so the results of the current study may be very different if slices were cut immediately after memory acquisition (if that was possible with a different labelling strategy), or if animals were re-exposed to the enriched environment right before sacrificing the animal.

      As noted by the reviewer, we fully acknowledge and are cognizant of the concern that slices prepared a week after labeling may not reflect ongoing encoding. Although our data show that labeled cells are reactivated in higher proportion during recall, we have discussed this caveat and will include alternative experimental strategies in the discussion.

      Reviewer #3 (Public Review):

      (1) Engram cells are (i) activated by a learning experience, (ii) physically or chemically modified by the learning experience, and (iii) reactivated by subsequent presentation of the stimuli present at the learning experience (or some portion thereof), resulting in memory retrieval. The authors show that exposure to the Barnes Maze and the enriched environment activated semilunar granule cells and granule cells preferentially in the superior blade of the dentate gyrus, and that a significant fraction were reactivated on re-exposure. However, physical or chemical modification by experience was not tested. Experience modifies engram cells, and a common modification is Hebbian, i.e., potentiation of excitatory synapses. The authors recorded EPSCs from labeled and unlabeled GCs and SGCs. Was there a difference in the amplitude or frequency of EPSCs recorded from labeled and unlabeled cells?

      We have included data on sEPSC frequency in the recorded cell pairs (Supplemental Fig 4). We have also conducted additional experiments and present data demonstrating that labeled cells show higher sEPSC frequency and amplitude than corresponding unlabeled cells in both cell types (new Fig 5). We also include data from new experiments showing that over 50% of the sEPSCs represent action potential-driven events (Supplemental Fig 3).

      (2) The authors studied five sequential sections, each 250 μm apart across the septotemporal axis, which were immunostained for c-Fos and analyzed for quantification. Is this an adequate sample? Also, it would help to report the dorso-ventral gradient since more engram cells are in the dorsal hippocampus. Slices shown in the figures appear to be from the dorsal hippocampus. 

      We thank the reviewer for the comment. We analyzed sections along the dorsoventral gradient. As explained in the methods, there is considerable animal-to-animal variability in the number of labeled cells, which is why we had to use matched littermate pairs in our experiments. This variability could make it difficult to tease apart dorsoventral differences.

      (3) The authors investigated the role of surround inhibition in establishing memory engram SGCs and GCs. Surprisingly, they found no evidence of lateral inhibition in the slice preparation. Interneurons, e.g., PV interneurons, have large axonal arbors that may be cut during slicing.

      Similarly, the authors point out that some excitatory connections may be lost in slices. This is a limitation of slice electrophysiology.

      We have conducted additional control experiments (detailed in response to Editorial comment #3), in which we replicated the results of Stefanelli et al identifying that optogenetic activation of a focal cohort of ChR2 expressing granule cells leads to robust feedback inhibition of adjacent granule cells. These control experiments demonstrate that the slice system supports the feedback inhibitory circuit detailed in prior studies. 

      We now discuss (page 21) that “the possibility that slice recordings lead to underestimation of feedback dendritic inhibition cannot be ruled out.”

      Reviewer #1 (Recommendations for the authors):

      (1) I struggle to understand the added value of the Barnes Maze data (Figures 1 and S1), since the authors then focus on the EE for practical reasons. In particular, the analysis of mouse performance (presented in supplemental Figure 1) does not seem traditional to me. For example, instead of the 3 classical exploration strategies (i.e., random, serial, direct), the authors describe 6, and assign each of these strategies a score based on vague criteria (why are "long corrected" and "focused research" both assigned a score of 0.5?). Unless I'm mistaken, no other classic parameters are described (e.g., success rate, latency, number of errors). If the authors decide to keep the BM results, I recommend better justifying its existence and adding more details, including in the method section. Otherwise, perhaps they should consider withdrawing it. Even if we had to use two different behavioral contexts, wouldn't it have made sense to use, in addition to the EE, the fear conditioning test, which is widely used in the study of engrams? Under these conditions (Stefanelli et al., 2016), the number of cells recruited after fear conditioning seems sufficient to reproduce the analyses presented in Figures 2-5 and determine whether or not lateral inhibition is dependent on the type of context (Stefanelli and colleagues suggest significant strong lateral inhibition during fear conditioning, whereas the data from Dovek and colleagues suggest quite the opposite after exposure to EE).

      The Barnes Maze data were included to evaluate DG ensemble activation during a dentate-dependent, non-fear-based behavioral task. This is now introduced and explained in the results. We have now included plots of the primary latency and number of errors in finding the escape hole to confirm the improvement over time (Supplemental Fig. 1). We specifically used the BUNS analysis to evaluate the use of spatial strategy and show that by day 6, the day of tamoxifen induction, the mice are using a spatial strategy for navigation. Our approach to evaluating exploration strategy is based on criteria published in Illouz et al 2016. This is now detailed in the methods on page 25. We hope that the inclusion of the supplemental data and the revisions to the methods and results address the concerns regarding the Barnes Maze experiments.

      Regarding Stefanelli et al., 2016, please note that the study adopted random labeling of neurons using CaMKII promoter-driven reporter expression, which they activated during spatial exploration of fear conditioning behaviors. As such, labeled neurons in the Stefanelli study were NOT behaviorally driven; rather, they were optically activated. This is now clarified in the text. The main drive for our study was to evaluate behaviorally tagged neurons, which is novel, distinct from the Stefanelli study, and, we would argue, more behaviorally realistic and relevant.

      Additionally, the lateral inhibition observed in Stefanelli et al was in response to activation of GCs labeled by virally mediated CaMKII-driven ChR2 expression. Using a similar labeling approach, new control data presented in Supplemental fig. 3 show that we are fully able to replicate the lateral inhibition observed by Stefanelli et al. These control experiments further suggest that the sparse and distributed GC/SGC ensembles activated during non-aversive behavioral tasks may not be sufficient to elicit robust lateral inhibition of the kind observed when a random population of adjacent neurons is activated. Our findings are also consistent with observations by Braganza et al., 2020. This is now discussed on page 21.

      (2) The authors recorded sEPSCs received by recruited and non-recruited GCs and SGCs after EE exposure. However, it appears that they studied them very little, apart from a temporal correlation analysis (Figure 5). Yet it would be interesting to determine whether or not the four neuronal populations possess different synaptic properties.

      What is the frequency and amplitude of sEPSCs in GCs and SGCs recruited or not after EE exposure? Similarly, can the author record the sIPSCs received by dentate gyrus engram and non-engram GCs and SGCs? If so, what is their frequency and amplitude?

      As suggested by editorial comment #2, we now include data on the frequency and amplitude of the sEPSCs in the GCs and SGCs used in our analysis of figure 5. Given the low numbers of unlabeled SGCs and labeled GCs in our paired recordings (Supplemental Fig. 5), we chose not to use this data set for analysis of cell-type and labeling-based differences in EPSC parameters. However, we have previously reported that sIPSC frequency is higher in SGCs than in GCs. Additionally, we have identified that sEPSC frequency in SGCs is higher than in GCs (Dovek et al, in preprint, DOI: 10.1101/2025.03.14.643192).

      To specifically address reviewer concerns, we have conducted new recordings of EPSCs in a cohort of labeled and unlabeled GCs and SGCs and present data demonstrating that labeled cells show higher sEPSC frequency and amplitude than corresponding unlabeled cells in both cell types (new Fig 5). These experiments were conducted in TRAP2-tdT labeled cells, which were not stable in cesium-based recordings. As such, we deferred the IPSC analysis and restricted this study to sEPSCs.

      (3) Previous data showed that dentate gyrus neurons that are recruited or not in a given context could exhibit distinct morphological characteristics (Pléau et al. 2021) and biochemical content (Penk expression, Erwin et al., 2020). In order to enrich the electrophysiological data presented in Figure 2, could the authors take advantage of the biocytin filling to perform a morphological and biochemical comparison of the different neuronal types (i.e., GCs and SGCs recruited or not after EE)?

      Thank you for this suggestion. Unfortunately, detailed morphometry and biochemical analysis on labeled and unlabeled neurons was not conducted as part of this study as our focus was on circuit differences. In our experience, unless the sections are imaged soon after staining, the sections are suboptimal for detailed morphological reconstruction and analysis. Our ongoing studies suggest that PENK is an activity marker and not a selective marker for SGCs and we are undertaking transcriptomic analysis to identify molecular differences between GCs and SGCs. We respectfully submit that these experiments are outside the scope of this study.

      (4) Figures 3 and 4 show only schematic diagrams and representative data. No quantification is shown. Instead of pie charts showing the identity of each pair (which I find unnecessary), I'll use pie charts representing the % of each pair in which an excitatory or inhibitory drive was recorded (with the corresponding n).

      Please note that we did not observe evoked synaptic potentials in any pair except one, which precludes quantification. However, we submit that it is important for readers to have information on the number of pairs and the types of pre- and postsynaptic pairs in which connections were tested.

      (5) Figure 3: Given that GCs form very few recurrences in non-pathological conditions, it hardly surprises me that they form few or no local glutamatergic connections. In contrast, this result surprises me more for SGCs, whose axons form collaterals in the dentate gyrus granular and molecular layers (Williams et al., 2007; Save et al., 2019). To control the reliability of their conditions, could the authors check whether SGCs do indeed form connections with hilar mossy cells, as has been reported in the past? To test whether this lack of interconnectivity is specific to neurons belonging to the same engram (or not), could the authors test whether or not the stimulation of labeled GCs/SGCs (via membrane depolarization or even optogenetics) generates EPSCs in unlabeled GCs?

      As suggested by the reviewer, we have examined whether widefield optical activation of all labeled neurons, including GCs and SGCs, leads to EPSCs in unlabeled GCs (63 cells tested). However, we did not observe eEPSCs. These data are presented on page 13 (Fig 4F) in the results and discussed on page 20. Since widefield stimulation should activate terminals and lead to release even if the axon is severed, our data suggest that the glutamatergic drive from SGCs to GCs may be limited.

      As noted above, we have demonstrated the presence of lateral inhibition consistent with data in Stefanelli et al in our new supplementary figure 3. We have also shown that sustained SGC firing upon perforant path stimulation is associated with sustained firing in hilar interneurons (Afrasiabi et al., 2022), indicating the presence of SGC-to-hilar connectivity in our slice preparation. Therefore, we chose not to undertake the challenging 2P-guided paired recordings of SGCs and mossy cells adjacent to SGC axon terminals reported in Williams et al 2007 to replicate the 9% SGC-to-MC synaptic connections. These 2P-guided slice physiology studies are outside the technical scope of our study.

      (6) Figure 4: The results are relatively in contradiction with the strong lateral inhibition reported in the past (Stefanelli et al., 2016), but the experimental conditions are different in the two studies. Stimulation of a single labeled GC or SGC may not be sufficient to activate an inhibitory neuron, and for the latter to inhibit an unlabeled GC or SGC. Is it possible to measure the sIPSCs received by unlabelled neurons during optogenetic stimulation of all labelled neurons? Could the authors verify whether under their experimental conditions GCs and SGCs do indeed form connections with interneurons, as reported before? Finally, Stefanelli and colleagues (2016) suggest that lateral inhibition is provided by dendrite-targeting somatostatin interneurons. If the authors are recording in the soma, could they underestimate more distal inhibitory inputs? If so, could they record the dendrites of unlabeled neurons?

      Our new control data (Supplementary Fig. 3), using AAV-mediated, CaMKII promoter-driven random expression of ChR2 in GCs, similar to Stefanelli et al (2016), demonstrate our ability to replicate the lateral inhibition observed by Stefanelli et al. (2016). Thus, our findings more accurately represent lateral inhibition supported by a sparse, behaviorally labeled cohort than the findings of Stefanelli et al based on randomly labeled neurons. This is now discussed on page 22-23. We respectfully submit that dendritic recordings are outside the scope of the current study.

      We also discuss the possibility that somatic recordings may undersample dendritic inhibitory inputs on page 23: “the possibility that slice recordings lead to underestimation of feedback dendritic inhibition cannot be ruled out.”

      (7) Figure 5: For ease of reading, I would substantially simplify the Results section related to Figure 5, keeping only the main general points of the analysis and the results themselves. The details of the analysis strategy, and the justification for the choices made, are better placed in the Method section (I advise against "data not shown").

      We thank the reviewer for the suggestion to improve accessibility of the results and have moved text related to justification of strategy and controls to the methods. We have also removed references to data not shown.

      (8) Figure 5: why do the authors no longer discriminate between GCs and SGCs?

      Since figure 5 focuses on inputs to labeled pairs versus labeled-unlabeled pairs the pairs include mixed groups with GCs and SGCs. Since the question pertains to inputs rather than cell types, we did not specifically distinguish the cell types. This is now explained in the text on page 15.

      (9) Figure 5: I would like to know more about the temporally connected inputs and their implication in context-dependent recruitment of dentate gyrus neurons. What could be the origin of the shared input received by the neurons recruited after EE exposure? For example, do labeled neurons receive more (temporally correlated or not) inputs from the entorhinal cortex (or any other upstream brain region) than unlabeled neurons? Is there any way (e.g., PP stimulation or any kind of manipulation) to test the causal relationship between temporally correlated input and the context-dependent recruitment of a given neuron?

      We appreciate the reviewer’s comments on the need to examine the source and nature of the correlated inputs to behaviorally labeled neurons. However, the suggested experiments are nontrivial as artificial stimulation of afferent fibers is unlikely to be selective for labeled and unlabeled cells. Given the complexities in design, implementation and interpretation of these experiments we respectfully submit that these are outside the scope of the current study.

      Reviewer #2 (Recommendations for the authors):

      There are a few minor issues limiting the extent of interpretations of the data:

      (1) Only about 7% of the 'engram' cells are re-activated one week after exposure (line 147), it is unclear how meaningful this assembly is given the high number of cells that may either be labelled unrelated to the EE or no longer be part of the memory-related ensemble.

      We now discuss (page 22-23) that the % labeling is consistent with what has been observed in the DG 1 week after fear conditioning (DeNardo et al., 2019) and discuss the caveat that all labeled cells may not represent an engram.  

      (2) Line 215: The wording '32 pairwise connections examined' suggests that there actually were synaptic connections, would recommend altering the wording to 'simultaneously recorded cells examined' to avoid confusion.

      Revised as suggested.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      Summary

      While DNA sequence divergence, differential expression, and differential methylation analysis have been conducted between humans and the great apes to study changes that "make us human", the role of lncRNAs and their impact on the human genome and biology has not been fully explored. In this study, the authors computationally predict HSlncRNAs as well as their DNA Binding sites using a method they have developed previously and then examine these predicted regions with different types of enrichment analyses. Broadly, the analysis is straightforward and after identifying these regions/HSlncRNAs the authors examined their effects using different external datasets.

      I no longer have any concerns about the manuscript as the authors have addressed my comments in the first round of review.

      We thank the reviewer for the valuable comments, which have helped us improve the manuscript.

      Reviewer #2 (Public Review):

      Lin et al attempt to examine the role of lncRNAs in human evolution in this manuscript. They apply a suite of population genetics and functional genomics analyses that leverage existing data sets and public tools, some of which were previously built by the authors, who clearly have experience with lncRNA binding prediction. However, I worry that there is a lack of suitable methods and/or relevant controls at many points and that the interpretation is too quick to infer selection. While I don't doubt that lncRNAs contribute to the evolution of modern humans, and certainly agree that this is a question worth asking, I think this paper would benefit from a more rigorous approach to tackling it.

      I thank the authors for their revisions to the manuscript; however, I find that the bulk of my comments have not been addressed to my satisfaction. As such, I am afraid I cannot say much more than what I said last time, emphasising some of my concerns with regards to the robustness of some of the analyses presented. I appreciate the new data generated to address some questions, but think it could be better incorporated into the text - not in the discussion, but in the results.

      We thank the reviewer for the careful reading and valuable comments. In this round of revision, we address the two main concerns: (1) there is a lack of suitable methods and/or relevant controls at many points, and (2) the interpretation is too quick to infer selection. Based on these comments, we have carefully revised all sections of the manuscript, including the Introduction, Results, Discussion, and Materials and Methods.

      In addition, we have performed two new analyses. Based on the two analyses, we have added one figure and two sections to Results, two sections to Materials and Methods, one figure to Supplementary Notes, and two tables to Supplementary Tables. These results were obtained using new methods and provided more support to the main conclusion.

      To be thorough, we have revisited the comments made in the first round and respond to them further. Point-by-point responses to the comments follow.

      Since many of the details in the Responses-To-Comments are available in published papers and eLife publishes Responses-To-Comments, we do not greatly revise supplementary notes to avoid ostensibly repeating published materials.

      “lack of suitable methods and/or relevant controls”.

      We carefully chose the methods, thresholds, and controls in the study; now, we provide clearer descriptions and explanations.

      (1) We have expanded the last paragraph in Introduction to briefly introduce the methods, thresholds, and controls.

      (2) In many places in Results and Materials and Methods, revisions are made to describe and justify methods, thresholds, and controls.

      (3) Some methods, thresholds, and controls have good consensus, such as FDR and genome-wide background, but others may not, such as the number of genes that greatly differ between humans and chimpanzees. Now, we describe our reasons for the latter situation. For example, we explain that “About 5% of genes have significant sequence differences in humans and chimpanzees, but more show expression differences due to regulatory sequences. We sorted target genes by their DBS affinity and, to be prudential, chose the top 2000 genes (DBS length>252 bp and binding affinity>151) and bottom 2000 genes (DBS length<60 bp but binding affinity>36) to conduct over-representation analysis”.

      (4) We also carefully choose proper words to make descriptions more accurate.

      Responses to the suggestion “new data generated could be better incorporated into the text”.

      (1) We think that this sentence “The occurrence of HS lncRNAs and their DBSs may have three situations – (a) HS lncRNAs preceded their DBSs, (b) HS lncRNAs and their DBSs co-occurred, (c) HS lncRNAs succeeded their DBSs. Our results support the third situation and the rewiring hypothesis”, previously in Discussion, should be better in section 2.3. We have revised it and moved it into the second paragraph of section 2.3.

      (2) Our two new analyses generated new data, and we describe them in Results.

      (3) It is possible to move more materials from Supplementary Notes to the main text, but it is probably unnecessary because the main text currently has eight sub-sections, two tables, and four figures.

      Responses to the comment “the interpretation is too quick to infer selection”.

      (1) When using XP-CLR, iSAFE, Tajima's D, Fay-Wu's H, the fixation index (Fst), and linkage disequilibrium (LD) to detect selection signals, we used the widely adopted parameters and thresholds but did not mention this clearly in the original manuscript. Now, in the first sentence of the second paragraph of section 2.4, we add the phrase “with widely-used parameters and thresholds” (more details are available in section 4.7 and Supplementary Notes).

      (2) It is not the first time we used these tests. Actually, we used these tests in two other studies (Tang et al. Uncovering the extensive trade-off between adaptive evolution and disease susceptibility. Cell Rep. 2022; Tang et al. PopTradeOff: A database for exploring population-specificity of adaptive evolution, disease susceptibility, and drug responsiveness. Comput Struct Biotechnol J. 2023). In this manuscript, section 2.5 and section 4.12 describe how we use these tests to detect signals and infer selection. We also cite the above two published papers from which the reader can obtain more details.

      (3) Also, in section 2.4, we stress that “Signals in considerable DBSs were detected by multiple tests, indicating the reliability of the analysis”.

      To further respond to the comments of “lack of suitable methods” and “this paper would benefit from a more rigorous approach to tackling it”, we have performed two new analyses. The results of the new analyses agree well with previous results and provide new support for the main conclusion. The result of section 2.5 is novel and interesting.

      We write in Discussion “Two questions are how mouse-specific lncRNAs specifically rewire gene expression in mice and how human- and mouse-specific rewiring influences the cross-species transcriptional differences”. To investigate whether the rewiring of gene expression by HS lncRNA in humans is accidental in evolution, we have made further genomic and transcriptomic analyses (Lin et al. Intrinsically linked lineage-specificity of transposable elements and lncRNAs reshapes transcriptional regulation species- and tissue-specifically. doi: https://doi.org/10.1101/2024.03.04.583292). To verify the obtained conclusions, we analyzed the spermatogenesis data from multiple species and obtained supporting evidence (not published).

      I note some specific points that I think would benefit from more rigorous approaches, and suggest possible ways forward for these.

      Much of this work is focused on comparing DNA binding domains in human-unique long-noncoding RNAs and DNA binding sites across the promoters of genes in the human genome, and I think the authors can afford to be a bit more methodical/selective in their processing and filtering steps here. The article begins by searching for orthologues of human lncRNAs to arrive at a set of 66 human-specific lncRNAs, which are then characterised further through the rest of the manuscript. Line 99 describes a binding affinity metric used to separate strong DBS from weak DBS; the methods (line 432) describe this as being the product of the DBS or lncRNA length times the average Identity of the underlying TTSs. This multiplication, in fact, undoes the standardising value of averaging and introduces a clear relationship between the length of a region being tested and its overall score, which in turn is likely to bias all downstream inference, since a long lncRNA with poor average affinity can end up with a higher score than a short one with higher average affinity, and it's not quite clear to me what the biological interpretation of that should be. Why was this metric defined in this way?

      (1) Using RNA:DNA base-pairing rules, other DBS prediction programs return only DBSs with lengths. Using RNA:DNA base-pairing rules and a variant of Smith-Waterman local alignment, LongTarget returns DBSs with lengths and identity values together with DBDs (local alignment means that DBDs and DBSs are predicted simultaneously). Thus, instead of measuring lncRNA/DNA binding based on DBS length alone, we measure lncRNA/DNA binding based on both DBS length and DBD/DBS identity (simply called identity, which is the percentage of paired nucleotides in the RNA and DNA sequences). This allows us to define “binding affinity”. One may think that binding affinity is a more complex function of length and identity. But, according to in vitro studies (see the review Abu Almakarem et al. 2012 and citations therein, and see He et al. 2015 and citations therein), the strength of a triplex is determined by all paired nucleotides (i.e., triplets). Thus, binding affinity=length * identity is biologically reasonable.

      (2) Further, different from predicting DBS upon individual base-pairing rules such as AT-G and CG-C, LongTarget integrates base-pairing rules into rulesets, each covering A, T, C, and G (see the two figures below, which are from He et al 2015). This makes every nucleotide in the RNA and DNA sequences comparable and allows the computation of identity.

      (3) On whether LongTarget may predict unreasonably long DBSs. Three technical features of LongTarget make this highly unlikely (and more unlikely than other programs). The three features are (a) local alignment, (b) gap penalty, and (c) TT penalty (He et al. 2015).

      (4) Some researchers may think that a higher identity threshold (e.g., 0.8 or even higher) makes the predicted DBSs more reliable. This is not true. To explore plausible identity values, we analyzed the distribution of Kcnq1ot1’s DBSs in the large Kcnq1 imprinting region (which contains many known imprinted genes). We found that a high threshold for identity (e.g., 0.8) will make DBSs in many known imprinted genes fail to be predicted. Upon our analysis of many lncRNAs and upon early in vitro experiments, plausible identity values range from 0.4 to 0.8.

      (5) Is it necessary or advisable to define an identity threshold? Since identity values from 0.4 to 0.8 are plausible and identity is a property of a DBS but does not reflect the strength of the whole triplex, it is more reasonable to define a threshold for binding affinity to control predicted DBSs. As explained above, binding affinity = length*identity is a reasonable measure of the strength of a triplex. The default threshold is 60, and given an identity of 0.6 in many triplexes, a DBS with affinity=60 is about 100 bp. Compared with TF binding sites (TFBS), 100 bp is quite long. As we explain in the main text, “taking a DBS of 147 bp as an example, it is extremely unlikely to be generated by chance (p < 8.2e-19 to 1.5e-48)”.

      (6) How to validate predicted DBSs? Validation faces these issues. (a) DBDs are predicted on the genome level, but target transcripts are expressed in different tissues and cells. So, no single transcriptomic dataset can validate all predicted DBSs of a lncRNA. Whatever techniques and cells are used, only a small portion of predicted DBSs can be experimentally captured (validated). (b) The resolution of current experimental techniques is limited; thus, experimentally identified DBSs (i.e., “peaks”) are much longer than computationally predicted DBSs. (c) Experimental results contain false positives and false negatives. So, validation (or performance evaluation) should also consider the ROC curves (Wen et al. 2022).

      (7) As explained above, a long DBS may have a lower binding affinity than a short DBS. A biological interpretation is that the long DBS may accumulate mutations that decrease its binding ability gradually.

      There is also a strong assumption that identified sites will always be bound (line 100), which I disagree is well-supported by additional evidence (lines 109-125). The authors show that predicted NEAT1 and MALAT1 DBS overlap experimentally validated sites for NEAT1, MALAT1, and MEG3, but this is not done systematically, or genome-wide, so it's hard to know if the examples shown are representative, or a best-case scenario.

      (1) We did not make this assumption. Clearly, binding depends on multiple factors, including co-expression of genes and specific cellular context.

      (2) On the second issue, “this is not done systematically, or genome-wide”: we did perform the analysis genome-wide but did not show all results (Supplementary Fig. 2 shows three genomic regions, where the agreement is impressively good). We describe the overall results in Wen et al. 2022.

      It's also not quite clear how overlapping promoters or TSS are treated - are these collapsed into a single instance when calculating genome-wide significance? If, eg, a gene has five isoforms, and these differ in the 3' UTR but their promoter region contains a DBS, is this counted five times, or one? Since the interaction between the lncRNA and the DBS happens at the DNA level, it seems like not correcting for this uneven distribution of transcripts is likely to skew results, especially when testing against genome-wide distributions, eg in the results presented in sections 5 and 6. I do not think that comparing genes and transcripts putatively bound by the 40 HS lncRNAs to a random draw of 10,000 lncRNA/gene pairs drawn from the remaining ~13500 lncRNAs that are not HS is a fair comparison. Rather, it would be better to do many draws of 40 non-HS lncRNAs and determine an empirical null distribution that way, if possible actively controlling for the overall number of transcripts (also see the following point).

      (1) We predicted DBSs in the promoter region of 179128 Ensembl-annotated transcripts and did not merge DBSs (there is no need to merge them). If multiple transcripts share the same TSS, they may share the same DBS, which is natural.

      (2) If the DBSs of multiple transcripts of a gene overlap, the overlap does not raise a problem for lncRNA/DNA binding analysis in specific tissues because usually only one transcript is expressed in a tissue. Therefore, the situation described as “If, e.g., a gene has five isoforms, and these differ in the 3' UTR but their promoter region contains a DBS, is this counted five times, or one?” does not arise.

      (3) It is unclear to us what “it seems like not correcting for this uneven distribution of transcripts is likely to skew results” means. Regarding testing against genome-wide distributions, statistically, it is beneficial to make many rounds of random draws genome-wide, but this will take a huge amount of time. Since more variables demand more rounds of drawing, to our knowledge, this is not widely practiced in large-scale transcriptomic data analyses.

      (4) If the difference (result) is small and thus calls for rigorous statistical testing, making many rounds of random draws genome-wide is necessary. In our results, “45% of these pairs show a significant expression correlation in specific tissues (Spearman's |rho| >0.3 and FDR <0.05). In contrast, when randomly sampling 10000 pairs of lncRNAs and protein-coding transcripts genome-wide, the percent of pairs showing this level of expression correlation (Spearman's |rho| >0.3 and FDR <0.05) is only 2.3%”.
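
      For readers unfamiliar with the repeated-draw approach discussed in this exchange, the following is a minimal sketch of how an empirical null distribution and an empirical p-value could be obtained. It is not the analysis performed in the manuscript. The background pool is synthetic (10000 pairs, of which 2.3% are flagged as correlated, mirroring only the percentages quoted above), and the function names, the sample size of 200 pairs, and the 1000 draws are assumptions made purely for illustration.

```python
import random


def null_fractions(flags, sample_size, n_draws, seed=0):
    """Repeatedly sample pairs without replacement from a background pool and
    record the fraction flagged as significantly correlated in each draw."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_draws):
        sample = rng.sample(flags, sample_size)
        fractions.append(sum(sample) / sample_size)
    return fractions


def empirical_p_value(observed, null_values):
    """One-sided empirical p-value with the usual +1 correction, so that the
    estimate is never exactly zero."""
    at_least_as_extreme = sum(1 for v in null_values if v >= observed)
    return (at_least_as_extreme + 1) / (len(null_values) + 1)


# Synthetic background (assumption): 230 of 10000 pairs are flagged (2.3%).
background = [True] * 230 + [False] * 9770

# Hypothetical settings: 1000 draws of 200 pairs each.
null = null_fractions(background, sample_size=200, n_draws=1000)

# Compare the observed fraction quoted in the response (45%) with the null.
p_value = empirical_p_value(0.45, null)
```

      With a difference as large as 45% versus 2.3%, no random draw is expected to reach the observed value, so the empirical p-value is bounded only by the number of draws.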

      Thresholds for statistical testing are not consistent, or always well justified. For instance, in line 142 GO testing is performed on the top 2000 genes (according to different rankings), but there's no description of the background regions used as controls anywhere, or of why 2000 genes were chosen as a good number to test? Why not 1000, or 500? Are the results overall robust to these (and other) thresholds? Then line 190 the threshold for downstream testing is now the top 20% of genes, etc. I am not opposed to different thresholds in principle, but they should be justified.

      (1) We used the g:Profiler program to perform over-representation analysis to identify enriched GO terms. This analysis is used to determine what pre-defined gene sets (GO terms) are more present (over-represented) in a list of “interesting” genes than what would be expected by chance. Specifically, this analysis is often used to examine whether the majority of genes in a pre-defined gene set fall in the extremes of a list: the top and bottom of the list, for example, may correspond to the largest differences in expression between the two cell types. g:Profiler always takes the whole genome as the reference; that is why we did not mention the whole genome reference. We now add in section 2.2 “(with the whole genome as the reference)”.

      (2) The choice of 2000 rather than, say, 2500 genes is somewhat subjective. We now explain that “About 5% of genes have significant sequence differences in humans and chimpanzees, but more show expression differences due to regulatory sequences. We sorted target genes by their DBS affinity and, to be prudential, chose the top 2000 genes (DBS length>252 bp and binding affinity>151) and bottom 2000 genes (DBS length<60 bp but binding affinity>36) to conduct over-representation analysis”.
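
      As an illustration of what over-representation analysis computes, the sketch below implements the standard one-sided hypergeometric test for a single gene set. It is a generic textbook calculation, not g:Profiler's implementation, and it omits multiple-testing correction. The function name and all gene counts in the example (20000 background genes, 200 genes annotated with a term, 40 of them among 2000 selected genes) are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, annotated, drawn):
    """P(X >= k) for a hypergeometric variable.

    population: number of genes in the reference (background) set
    annotated:  number of reference genes carrying the GO term
    drawn:      number of genes in the list of interest
    k:          number of list genes carrying the GO term
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # Exact integer arithmetic; math.comb returns 0 for impossible terms.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


# Hypothetical example: 40 of 2000 selected genes carry a term annotated to
# 200 of 20000 background genes (about 20 would be expected by chance).
p_enrichment = hypergeom_sf(40, population=20000, annotated=200, drawn=2000)
```

      The choice of reference set matters here: the same counts can give a different p-value against a smaller, tissue-specific background than against the whole genome.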

      Likewise, comparing Tajima's D values near promoters to genome-wide values is unfair, because promoters are known to be under strong evolutionary constraints relative to background regions; as such it is not surprising that the results of this comparison are significant. A fairer comparison would attempt to better match controls (eg to promoters without HS lncRNA DBS, which I realise may be nearly impossible), or generate empirical p-values via permutation or simulation.

      We used these tests to detect selection signals in DBSs but not in the whole promoter regions. Using promoters without HS lncRNA DBS as the control also has risks because promoter regions contain other kinds of regulatory sequences.

      There are huge differences in the comparisons between the Vindija and Altai Neanderthal genomes that to me suggest some sort of technical bias or the such is at play here. e.g. line 190 reports 1256 genes to have a high distance between the Altai Neanderthal and modern humans, but only 134 Vindija genes reach the same threshold of 0.034. The temporal separation between the two specimens does not seem sufficient to explain this difference, nor the difference between the Altai Denisovan and Neanderthal results (2514 genes for Denisovan), which makes me wonder if it is a technical artefact relating to the quality of the genome builds? It would be worth checking.

      We feel it is hard to know whether or not the temporal separation between these specimens is sufficient to explain the differences because many details of archaic humans and their genomes remain unknown and because mechanisms determining genotype-phenotype relationships remain poorly known. After 0.034 was determined, these numbers of genes were determined accordingly. We chose parameters and thresholds that best suit the most important requirements, but these parameters and thresholds may not best suit other requirements; this is a problem for all large-scale studies.     

      Inferring evolution: There are some points of the manuscript where the authors are quick to infer positive selection. I would caution that GTEx contains a lot of different brain tissues, thus finding a brain eQTL is a lot easier than finding a liver eQTL, just because there are more opportunities for it. Likewise, claims in the text and in Tables 1 and 2 about the evolutionary pressures underlying specific genes should be more carefully stated. The same is true when the authors observe high Fst between groups (line 515), which is only one possible cause of high Fst - population differentiation and drift are just as capable of giving rise to it, especially at small sample sizes.

      (1) We add in Discussion that “Finally, not all detected signals reliably indicate positive selection”.

      (2) Our results are that more signals are detected in CEU and CHB than in YRI; this agrees with all population genetics studies and implies that our results are not wrongly biased because more samples and larger samples were obtained from CEU and CHB.

    1. Author Response:

      We thank the reviewers for their insightful comments on our manuscript. We are encouraged by their positive assessment of our multiscale simulation approach and segment-capture mechanism.

      In our revision, we will address the reviewers' primary concerns, which are summarized into three key points: (1) providing a more comprehensive discussion of the validity, robustness, and limitations of our model; (2) improving contextualization with alternative mechanisms; and (3) enhancing the clarity of our results, figures, and terminology.

      1) Model Validity, Robustness, and Limitations:

      As suggested by Reviewers #1 and #3, we will provide a more thorough discussion of our model's assumptions and limitations. This is essential to evaluate the generalizability and reliability of our conclusions. We will clarify which aspects of the dynamics we believe to be robust, elaborate on the rationale behind key parameter choices, such as the selection criteria for hydrogen-bonding residues and the calibration of their interaction strength, and discuss how these choices may influence the simulation outcomes. Furthermore, we will mention the potential impact of our choices regarding DNA sequence, DNA length, and the high-salt concentration, explaining why we opted for this simulation strategy over alternative enhanced-sampling techniques.

      2) Contextualization with Alternative Mechanisms:

      Following the comments by Reviewer #2, we will expand our discussion to better contextualize our work. We will provide a more detailed comparison between our segment-capture model and alternative mechanisms, particularly the 'scrunching' model (e.g., the theoretical work by Takaki et al., Nat. Commun. 2021). This will help clarify how our high-resolution mechanistic view that reveals stepwise conformational transitions underlying segment capture fits into the broader landscape of SMC loop extrusion research. We believe this will contribute to the ongoing scientific discourse.

      3) Clarity of Results, Figures, and Terminology:

      Based on valuable suggestions from Reviewers #2 and #3, we will revise our manuscript to improve the clarity and accessibility of our findings. We will update figures and their descriptions (e.g., Figure 4I, J), providing a clearer step-by-step explanation of the translocation process within the ATP cycle (related to Figure 2), clarifying the role of each conformational state, elucidating how these transitions contribute to the loop extrusion mechanism, and defining key terms such as "pumping" more precisely.

      We are confident that these revisions will substantially strengthen the mechanistic clarity and scientific contribution of our work.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The manuscript of Odermatt et al. investigates the volatiles released by two species of Desmodium plants and the response of herbivores to maize plants alone or in combination with these species. The results show that Desmodium releases volatiles in both the laboratory and the field. Maize grown in the laboratory also released volatiles, in a similar range. While female moths preferred to oviposit on maize, the authors found no evidence that Desmodium volatiles played a role in lowering attraction to or oviposition on maize.

      Strengths:

      The manuscript is a response to recently published papers that presented conflicting results with respect to whether Desmodium releases volatiles constitutively or in response to biotic stress, the level at which such volatiles are released, and the behavioral effect it has on the fall armyworm. These questions are relevant as Desmodium is used in a textbook example of pest-suppressive sustainable intercropping technology called push-pull, which has supported tens of thousands of smallholder farmers in suppressing moth pests in maize. A large number of research papers over more than two decades have implied that Desmodium suppresses herbivores in push-pull intercropping through the release of large amounts of volatiles that repel herbivores. This premise has been questioned in recent papers. Odermatt et al. thus contribute to this discussion by testing the role of odors in oviposition choice. The paper confirms that ovipositing FAW preferred maize, and also confirmed that odors released from Desmodium appeared not important in their bioassays.

      The paper is a welcome addition to the literature and adds quality headspace analyses of Desmodium from the laboratory and the field. Furthermore, the authors, some of whom have since long contributed to developing push-pull, also find that Desmodium odors are not significant in their choice between maize plants. This advances our knowledge of the mechanisms through which push-pull suppresses herbivores, which is critically important to evolving the technique to fit different farming systems and translating this mechanism to fit with other crops and in other geographical areas.

      Thank you for your careful assessment of our manuscript.

      Weaknesses:

      Below I outline the major concerns:

      (1) Clear induction of the experimental plants, and lack of reflective discussion around this: from literature data and previous studies of maize and Desmodium, it is clear that the plants used in this study, particularly the Desmodium, were induced. Maize appeared to be primarily manually damaged, possibly due to sampling (release of GLV, but little to no terpenoids, which is indicative of mostly physical stress and damage; see, for example, one of the coauthors' own papers, Tamiru et al. 2011), whereas Desmodium releases a blend of many compounds (many terpenoids indicative of herbivore induction). Erdei et al. also clearly show that under controlled conditions maize, silver leaf and green leaf Desmodium release volatiles in very low amounts. While the condition of the plants in Odermatt et al. may be reflective of situations in push-pull fields, the authors should elaborate on the above in the discussion (see comments) so that readers understand the plants' condition during the experiments. This is particularly important because it has been assumed that Desmodium releases typical herbivore-induced volatiles constitutively, which is not the case (see Erdei et al. 2024). This reflection is currently lacking in the manuscript.

      We acknowledge the need for a more reflective discussion on the possible causes of volatile emission due to physical damage. Although the field plants were carefully handled, it is possible that some physical stress may have contributed to the release of volatiles, such as green leaf volatiles (GLVs). We ensured the revised manuscript reflects this nuanced interpretation (lines 282 – 286). However, we also explained more clearly that our aim was to capture the volatile emission of plants used by farmers under realistic conditions and moth responses to these plants, not to be able to attribute the volatile emission to a specific cause (lines 115 – 117). We revised relevant passages throughout the results and discussion to ensure that we do not make any claims about the reason for volatile emissions, and that our claims regarding these plants and their headspace being representative of the system as practiced by farmers are supported. In the revised manuscript we provide a new supplementary table S2 that additionally shows the classification of the identified substances, which also shows that the majority of the substances that were found in the headspace of the sampled plants of Desmodium intortum or Desmodium incanum are monoterpenes, sesquiterpenes, or aromatic compounds, and not GLVs (that are typically emitted following damage).

      (2) Lack of controls that would have provided context to the data: The experiments lack important controls that would have helped in the interpretation:

      2a The authors did not control the conditions of the plants. To understand the release of volatiles and their importance in the field, the authors should have included controlled herbivory in both maize and Desmodium. This would have placed the current volatile profiles in a herbivory context. Now the volatile measurements hang in midair, leading to discussions that are not well anchored (and should be rephrased thoroughly, see eg lines 183-188). It is well known that maize releases only very low levels of volatiles without abiotic and biotic stressors. However, this changes upon stress (GLVs by direct, physical damage and eg terpenoids upon herbivory, see above). Erdei et al. confirm this pattern in Desmodium. Not having these controls, means that the authors need to put the data in the context of what has been published (see above).

      We appreciate this concern. Our study aimed to capture the real-world conditions of push-pull fields, where Desmodium and maize grow in natural environments without the direct induction of herbivory for experimental purposes (lines 115 – 117). We agree that in further studies it would be important to carry out experiments under different environmental conditions, including herbivore damage. However, this was not within the scope of the present study.

      2b It would also have been better if the authors had sampled maize from the field while sampling Desmodium. Together with the above point (inclusion of herbivore-induced maize and Desmodium), the levels of volatile release by Desmodium would have been placed into context.

      We acknowledge that sampling maize and other intercrop plants, such as edible legumes, alongside Desmodium in the push-pull field would have allowed us to make direct comparisons of the volatile profiles of different plants in the push-pull system under shared field conditions. Again, this should be done in future experiments but was beyond the scope of the present study. Due to the number of samples we could handle given cost and workload, we chose to focus on Desmodium because there is much less literature on the volatile profiles of field-grown Desmodium than maize plants in the field: we are aware of one study attempting to measure field volatile profiles from Desmodium intortum (Erdei et al. 2024) and no study attempting this for Desmodium incanum. We pointed out this justification for our focus on Desmodium in the manuscript (lines 435 - 439). Additionally, we suggested in the discussion that future studies should measure volatile profiles from all plants commonly used in push-pull systems alongside Desmodium (lines 267 – 269).

      2c To put the volatiles release in the context of push-pull, it would have been important to sample other plants which are frequently used as intercrop by smallholder farmers, but which are not considered effective as push crops, particularly edible legumes. Sampling the headspace of these plants, both 'clean' and herbivore-induced, would have provided a context to the volatiles that Desmodium (induced) releases in the field - one would expect unsuccessful push crops to not release any of these 'bioactive' volatiles (although 'bioactive' should be avoided) if these odors are responsible for the pest suppressive effect of Desmodium. Many edible intercrops have been tested to increase the adoption of push-pull technology but with little success.

      We very much agree that such measurements are important for the longer-term research program in this field. But again, for the current study this would have exploded the size of the required experiment. Regarding bioactivity, we have been careful to use the phrase "potentially bioactive" solely when referring to findings from the literature (lines 99–103), in order to avoid making any definitive claims about our own results.

      Because of the lack of the above, the conclusions the authors can draw from their data are weakened. The data are still valuable in the current discussion around push-pull, provided that a proper context is given in the discussion along the points above.

      We think our revisions made the specific aims of this study more explicit and help to avoid misleading claims.

      (3) 'Tendency' of the authors to accept the odor hypothesis (i.e. that Desmodium odors are responsible for repelling FAW and thereby reduce infestation in maize under push-pull management) in spite of their own data: The authors tested the effects of odor in oviposition choice, both in a cage assay and in a 'wind tunnel'. From the cage experiments, it is clear that FAW preferred maize over Desmodium, confirming other reports (including Erdei et al. 2024). However, when choosing between two maize plants, one of which was placed next to Desmodium to which FAW had no tactile access (taste, structure, etc.), FAW chose equally. Similarly, in their wind tunnel setup (this term should not be used to describe the assay, see below), no preference was found between maize odor in the presence or absence of Desmodium. This too confirms results obtained by Erdei et al. (but adds an important element to it by using Desmodium plants that had been induced and released volatiles, contrary to Erdei et al. 2024). Even though no support was found for repellency by Desmodium odors, the authors in many instances in the manuscript (lines 30-33, 164-169, 202, 279, 284, 304-307, 311-312, 320) appear to elevate non-significant tendencies as being important. This is misleading readers into thinking that these interactions were significant and in fact confirming this in the discussion. The authors should stay true to their own data obtained when testing the hypothesis of whether odors play a role in the pest-suppressive effect of push-pull.

      We appreciate this feedback and agree that we may have overstated claims that could not be supported by strict significance tests. However, we believe that non-significant tendencies can still provide valuable insights. In the revised version of the manuscript, we ensured a clear distinction between statistically significant findings and non-significant trends and removed any language that may imply stronger support for the odor hypothesis than what the data show in all the lines that were mentioned.

      (4) Oviposition bioassay: with so many assays in close proximity, it is hard to certify that the experiments are independent. Please discuss this in the appropriate place in the discussion.

      We have pointed this out in the submitted manuscript in lines 275 – 279. Furthermore, we included detailed captions to figure 4 - supporting figure 3 & figure 4 - supporting figure 4. We are aware that in all such experiments there is a danger of between-treatment interference, which we pointed out for our specific case. We stated that with our experimental setup we tried to minimize interference between treatments by spacing and temporal staggering. We would like to point out that this common caveat does not invalidate experimental designs when practicing replication and randomization. We assume that insects are able to select suitable oviposition sites in the background of such confounding factors under realistic conditions.

      (5) The wind tunnel has a number of issues (besides being poorly detailed):

      5a. The setup which the authors refer to as a 'wind tunnel' does not qualify as a wind tunnel. First, there is no directional flow: there are two flows entering the setup at opposite sides. Second, the flow is way too low for moths to orient in (in a wind tunnel wind should be presented as a directional cue. Only around 1.5 l/min enters the wind tunnel in a volume of 90 l approximately, which does not create any directional flow. Solution: change 'wind tunnel' throughout the text to a dual choice setup /assay.)

      We agree with these criticisms and changed the terminology accordingly from ‘wind tunnel’ to ‘dual choice assay’. We have now conducted an additional experiment which we called ‘no-choice assay’ that provides conditions closer to a true wind tunnel. The setup of the added experiment features an odor entry point at only one side of the chamber to create a more directional airflow. Each treatment (maize alone, maize + D. intortum, maize + D. incanum, and a control with no plants) was tested separately, with only one treatment conducted per evening to avoid cross-contamination, as described in the methods section of the no-choice assay.

      5b. There is no control over the flows in the flight section of the setup. It is very well possible that moths at the release point may only sense one of the 'options'. Please discuss this.

      We added this to the discussion (lines 369 – 374). The new no-choice assays also address this concern by using a setup with laminar flow.

      5c. Too low a flow (1.5 l per minute) implies a largely stagnant air, which means cross-contamination between experiments. An experiment takes 5 minutes, but it takes minimally 1.5 hours at these flows to replace the flight chamber air (but in reality much longer as the fresh air does not replace the old air, but mixes with it). The setup does not seem to be equipped with e.g. fans to quickly vent the air out of the setup. See comments in the text. Please discuss the limitations of the experimental setup at the appropriate place in the discussion.

      We added these limitations to the discussion and addressed these concerns with new experiments (see answer 5a).

      5d. The stimulus air enters through a tube (what type of tube, diameter, length, etc) containing pressurized air (how was the air obtained into bags (type of bag, how is it sealed?), and the efflux directly into the flight chamber (how, nozzle?). However, it seems that there is no control of the efflux. How was leakage prevented, particularly how the bags were airtight sealed around the plants? 

      We added the missing information to the methods and provided details about types of bags, manufacturers, and pre-treatments in the method section. In short, PTFE tubes connected bagged plants to the bioassay setup and air was pumped in at an overpressure, so leakage was not eliminated but contamination from ambient air was avoided.

      5e. The plants were bagged in very narrowly fitting bags. The maize plants look bent and damaged, which probably explains the GLVs found in the samples. The Desmodium in the picture (Figure 5 supplement, which we should assume is at least a representative picture?) appears to be rather crammed into the bag with maize and looks in rather poor condition to start with (perhaps also indicating why they release these volatiles?). It would be good to describe the sampling of the plants in detail and explain that the way they were handled may have caused the release of GLVs.

      We added a more detailed description of the plant handling and bagging processes to the methods to clarify how the plants were treated during the dual-choice and the no-choice assays reported in the revised manuscript. We politely disagree that the maize plants were damaged and the Desmodium plants not representative of those encountered in the field. The plants were grown in insect-proof screen houses to prevent damage by insects and carefully curved without damaging them to fit into the bag. The Desmodium plant pictured was D. incanum, which has sparser foliage and smaller leaves than D. intortum.

      (6) Figure 1 seems redundant as a main figure in the text. Much of the information is not pertinent to the paper. It can be used in a review on the topic. Or perhaps if the authors strongly wish to keep it, it could be placed in the supplemental material.

      We think that Figure 1 provides essential information about the push-pull system and the FAW. To our knowledge, this partly contradictory evidence so far has not been synthesized in the literature. We realize that such a figure would more commonly be provided in a review article, but we do not think that the small number of studies on this topic so far justifies a stand-alone review. Instead, the introduction to our manuscript includes a brief review of these few studies, complemented by the visual summary provided in Figure 1 and a detailed supplementary table.

      Reviewer #2 (Public review):

      Based on the controversy of whether the Desmodium intercrop emits bioactive volatiles that repel the fall armyworm, the authors conducted this study to assess the effects of the volatiles from Desmodium plants in the push-pull system on behavior of FAW oviposition. This topic is interesting and the results are valuable for understanding the push-pull system for the management of FAW, the serious pest. The methodology used in this study is valid, leading to reliable results and conclusions. I just have a few concerns and suggestions for improvement of this paper:

      (1) The volatiles emitted from D. incanum were analyzed and their effects on the oviposition behavior of FAW moth were confirmed. However, it would be better and useful to identify the specific compounds that are crucial for the success of the push-pull system.

      We fully agree that identifying specific volatile compounds responsible for the push-pull effect would provide valuable insights into the underlying mechanisms of the system. However, the primary focus of this study was to address the still unresolved question whether Desmodium emits detectable or “significant” amounts of volatiles at all under field conditions, and the secondary aim was to test whether we could demonstrate a behavioral effect of Desmodium headspace on FAW moths. Before conducting our experiments, we carefully considered the option of using single volatile compounds and synthetic blends in bioassays. We decided against this because we judged that the contradictory evidence in the literature was not a sufficient basis for composing representative blends. Furthermore, we think it is an important first step to test for behavioral responses to the headspaces of real plants. We consider bioassays with pure compounds to be important for confirmation and more detailed investigation in future studies. There was also contradictory evidence in the literature regarding moth responses to plants. We thus opted to focus on experiments with whole plants to maintain ecological relevance.

      (2) It would be good to add "symbols" of significance in Figure 4 (D).

      We report the statistical significance of the parameters in Figure 4 (D) in Table 3, which shows the mixed model applied for oviposition bioassays. While testing significance between groups is a standard approach, we used a more robust model-based analysis to assess the effects of multiple factors simultaneously. We provided a cross-reference to Table 3 from the figure description of Figure 4 (D) for readers to easily find the statistical details.

      (3) Figure A is difficult for readers to understand.

      Unfortunately, it is not entirely clear which specific figure is being referred to as "Figure A" in this comment. We tried to keep our figures as clear as possible.

      (4) It would be good to discuss in more depth the functions of the important volatile compounds identified here, in comparison with results from previous studies.

      Our study does not provide strong evidence that specific volatiles from Desmodium plants are important determinants of FAW oviposition or choice in the push-pull system. Therefore, we prefer to refrain from detailed discussions of the potential importance of individual compounds. However, in the revised version, we provide an additional table S2 which identifies the overlap with volatiles previously reported from Desmodium, as only the total numbers are summarized in the discussion of the submitted paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The points raised are largely self-explanatory as to what needs to be done to fully resolve them. At a minimum the text needs to be seriously revised to:

      (1) reflect the data obtained.

      (2) reflect on the limitations of their experimental setup and data obtained.

      (3) put the data obtained and its limitations in the context of what these tell us and, particularly, what they do not. Ideally, additional headspace measurements are taken, including from herbivory and 'clean' maize and Desmodium (in which there is better control of biotic and abiotic stress), as well as other crops commonly planted as companion crops with maize (but none of them reducing pest pressure).

      Thank you for this summary. Please see our detailed responses above.

      In addition to the main points of critique provided above, I have provided additional comments in the text (https://elife-rp.msubmit.net/elife-rp_files/2024/07/18/00134767/00/134767_0_attach_28_25795_convrt.pdf). These elaborate on the above points and include some new ones too. These are the major points of critique, which I hope the authors can address.

      Thank you very much for these detailed comments.

      Reviewer #2 (Recommendations for the authors):

      It is important to note that the original push-pull system was developed against stemborers and involved Napier grass (still used) around the field, which attracts stemborer moths, and Molasses grass as the intercrop that repels the moths and attracts parasitoids. Later, Molasses grass was replaced by desmodiums because it is a legume that fixes nitrogen and therefore can increase nitrate levels in the soil, but most importantly because it prevents germination of the parasitic Striga weed. The possible repellent effect of desmodium on pests and attraction of natural enemies was never properly tested but assumed, probably to still be able to use the push-pull terminology. This "mistake" should be recognized here and in future publications. It is a real pity that the controversy over the repellent effect of desmodium distracts from the amazing success of the push-pull system, also against the fall armyworm.

      We thank the reviewer for pointing out these issues, which are part of the reason for our Figure 1 and why we would like to keep it. We have described this development of the system in the introduction to better present the push-pull system. Our aim in Figure 1 and Table S1 is to highlight both the evidence of the system's success, and the gaps in our understanding, regarding specifically control of damage from the FAW.

    1. Author Response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      This study is focused on identifying unique, innovative surface markers for mature Achilles tendons by combining the latest multi-omics approaches and in vitro evaluation, which would address the knowledge gap of the controversial identity of TPSCs with unspecific surface markers. The use of multi-omics technologies, in vivo characterization, in vitro standard assays of stem cells, and in vitro tissue formation is a strength of this work and could be applied for other stem cell quantification in musculoskeletal research. The evaluation and identification of Cd55 and Cd248 in TPSCs have not been conducted in tendons, which is considered innovative. Additionally, the study provided solid sequencing data to confirm co-expressions of Cd55 and Cd248 with other well-described surface markers such as Ly6a, Tpp3, Pdgfra, and Cd34. Generally, the data shown in the manuscript support the claims that the identified surface antigens mark TPSCs in juvenile tendons.

      However, there are missing links between the scientific questions raised in the Introduction and the Methodology/Results. If the study focuses on unsatisfactory healing responses of mature tendons and understanding of mature TPSCs, at least mature Achilles tendons from mice older than 12 weeks should be examined and compared with tendons from juvenile/neonatal mice. However, neither the 2-week-old nor the 6-week-old mice used for characterization here are skeletally mature. Additionally, there is a lack of complete comparison of TPSCs between 2-week and 6-week-old mice at the transcriptional and epigenetic levels.

      In order to distinguish TPSCs and characterize their epigenetic activities, the authors used scRNA-seq, snRNA-seq, and snATAC-seq approaches. The integration, analysis, and comparison of sequencing data across assays and/or time points is confusing and incomplete. For example, it would be more comprehensive to integrate both scRNA-seq and snRNA-seq data (if not, why were both assays used for Achilles tendons at both the 2-week and 6-week time points?). snRNA-seq and snATAC-seq data of 6-week-old mice were analyzed separately. No comparison of the differences and similarities between TPSCs of 2-week and 6-week-old mice was conducted.

      Given the goal of this work to identify specific TPSC markers, the specificity of Cd55 and Cd248 for TPSCs is not clear. First, based on the data shown here, Cd55 and Cd248 mark the same cell population which is identified by Ly6a, TPPP3, and Pdgfra. Although, for instance, Cd34 is expressed by other tissues as discussed here, no data/evidence is provided by this work showing that Cd55 and Cd248 are not expressed by other musculoskeletal tissues/cells. Second, the immunostaining of Cd55 and Cd248 doesn't support their specificity. What is the advantage of using Cd55 and Cd248 for TPSCs compared to using other markers?

      Reviewer #2 (Public review): 

      Summary: 

      The molecular signature of tendon stem cells is not fully identified. The endogenous location of tendon stem cells within the native tendon is also not fully elucidated. Several molecular markers have been identified to isolate tendon stem cells but they lack tendon specificity. Using the declining tendon repair capacity of mature mice, the authors compared the transcriptome landscape and activity of juvenile (2 weeks) and mature (6 weeks) tendon cells of mouse Achilles tendons and identified CD55 and CD248 as novel surface markers for tendon stem cells. CD55+ CD248+ FACS-sorted cells display a preferential tendency to differentiate into tendon cells compared to CD55neg CD248neg cells.

      Strengths: 

      The authors generated a lot of data on juvenile and mature Achilles tendons, using scRNAseq, snRNAseq, and ATACseq strategies. This constitutes a resource dataset.

      Weaknesses: 

      The analyses and validation of identified genes are not complete and could be pushed further. The endogenous expression of newly identified genes in native tendons would be informative. The comparison of scRNAseq and snRNAseq datasets for tendon cell populations would strengthen the identification of tendon cell populations. 

      Reviewer #3 (Public review): 

      Summary: 

      In their report, Tsutsumi et al. use single nucleus transcriptional and chromatin accessibility analyses of mouse Achilles tendon in an attempt to uncover new markers of tendon stem/progenitor cells. They propose CD55 and CD248 as novel markers of tendon stem/progenitor cells. 

      Strengths: 

      This is an interesting and important research area. The paper is overall well written.

      Weaknesses: 

      Major problems: 

      (1) It is not clear what tissue exactly is being analyzed. The authors build a story on tendons, but there is little description of the dissection. The authors claim to detect MTJ and cartilage cells, but not bone or muscle cells. The tendon sheath is known to express CD55, so the population of "progenitors" may not be of tendon origin.

      (2) Cluster annotations are seemingly done with a single gene. Names are given to cells without functional or spatial validation. For example, MTJ cells are annotated based on Postn, but it is never shown that Postn is only expressed at the MTJ, and not in other anatomical locations in the tendon. 

      (3) The authors compare their data to public data based on interrogating single genes in their dataset. It is now standard practice to integrate datasets (eg, using harmony), or at a minimum using gene signatures built into Seurat (eg AddModuleScore).

      (4) Progenitor populations (SP1, SP2). The authors claim these are progenitors but show very clearly that they express macrophage genes. What are they, macrophages or fibroblasts?

      (5) All omics analysis is done on single data points (from many mice pooled). The authors make many claims on n=1 per group for readouts dependent on sample number (eg frequency of clusters).

      (6) The scRNAseq atlas in Figure 1 is made by analyzing 2W and 6W tendons at the same time. The snRNAseq and ATACseq atlas are built first on 2W data, after which the 6W data is compared. Why use the 2W data as a reference? Why not analyze the two time points together as done with the scRNAseq? 

      (7) Figure 5: The authors should show the gating strategy for FACS. Were non-fibroblasts excluded (eg, immune cells, endothelia, etc.)? Was a dead cell marker used? If not, it is not surprising that fibroblasts form colonies and express fibroblast genes when compared to CD55-CD248- immune cells, dead cells, or debris. Can control genes such as Ptprc or Pecam1 be tested to rule out contamination with other cell types?

      Minor problems: 

      (1) Report the important tissue processing details: type of collagenase used. Viability before loading into 10x machine.

      Reviewer #1 (Recommendations for the authors): 

      (1) Better healing responses in neonatal mice than mature mice have been well appreciated in the field and differences in ECM environment, immune responses, and cell function might account for varied injury results. However, direct evidence/data between better healing and abundant TSPCs needs to be discussed in the Introduction. 

      We agree with this insightful comment. We have now enhanced our introduction to include a more direct discussion of the relationship between better healing responses in neonatal mice and the abundance of TSPCs. We specifically highlighted how Howell et al. (2017) demonstrated that tendons in juvenile mice can regenerate functional tissue after injury, while this ability is lost in mature mice. Based on this observation, we articulated our hypothesis that juvenile mouse tendons likely contain abundant TSPCs, which potentially explains their superior healing capacity. Additionally, we have added a statement emphasizing that "investigating TSPCs biology is important for understanding tendon regeneration and homeostasis" (lines 61-62), which clearly articulates the central role that TSPCs play in tendon repair processes and tissue maintenance.

      (2) 6-week-old mouse Achilles tendons are not mature enough and clinically relevant to understand the deficiency of regenerative capacity of TPSCs for undesired healing. If the goal of this study is to identify TSPCs of mature tendons, evaluation of Achilles tendons from at least 12-week-old mice is more reasonable. 


      (3) 40-60 mouse Achilles tendons pooled for one sample seems a lot and there is mixed/missed information about how many total cells were collected for each sample and how they were used for different sequencing assays. This could raise the concern that cell digestion was not complete and possibly abundant resident cells might be missed for sequencing analysis.


      (4) The methods section has necessary information missing, which could create confusion for readers. Which time points are used for scRNA-seq and snATAC-seq? Which time points of cells are integrated and analyzed regarding each assay/combined assays? Why is transcriptional expression evaluated by both scRNA-seq and snRNA-seq and is there any technological difference between the two assays?

      We have thoroughly revised the Methods section to clearly specify which time points were used for each assay (lines 132-133 and 148-149). We have also clarified how cells from different time points were integrated and analyzed (lines 167-170, 179-184 and 494-502). Regarding the use of both scRNA-seq and snRNA-seq, we have explained that this complementary approach allowed us to capture both cytoplasmic and nuclear transcripts, providing a more comprehensive view of gene expression profiles while also enabling direct integration with snATAC-seq data. A comparison of the scRNA-seq integrated data (2-week and 6-week) with the snRNA-seq (2-week) clusters confirmed that the clusters in the two datasets are closely correlated. We added the dot plot and correlation data in Supplemental Figure 5. Additionally, we have included comprehensive lists of differentially expressed genes (DEGs) for each identified cluster across all datasets (Supplementary Tables 1-15), which provide detailed molecular signatures for each cell population and facilitate cross-dataset comparisons.
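      As an illustration only, a cluster-to-cluster comparison of the kind described above could be sketched as follows. This is a minimal sketch, not the authors' pipeline: it assumes two normalized expression matrices restricted to the same genes in the same order, and the function names, cluster labels, and toy values are all hypothetical.

```python
import numpy as np

def cluster_means(expr, labels):
    """Average expression per cluster.

    expr:   cells x genes array (assumed already normalized)
    labels: one cluster label per cell
    Returns the sorted cluster ids and a clusters x genes array of means.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    ids = sorted(set(labels.tolist()))
    means = np.vstack([expr[labels == c].mean(axis=0) for c in ids])
    return ids, means

def cross_dataset_correlation(means_a, means_b):
    """Pearson correlation of every cluster in A with every cluster in B.

    Both inputs are clusters x genes and must share the same gene order.
    A cluster whose mean profile is constant across genes would give a
    zero norm; that case is not handled in this sketch.
    """
    a = means_a - means_a.mean(axis=1, keepdims=True)
    b = means_b - means_b.mean(axis=1, keepdims=True)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Toy values, invented for illustration (not data from the study).
expr_a = np.array([[5, 0, 1], [4, 1, 0], [0, 6, 2], [1, 5, 3]], dtype=float)
labels_a = ["SP", "SP", "TC", "TC"]
expr_b = np.array([[6, 1, 0], [0, 4, 3]], dtype=float)
labels_b = ["SP", "TC"]

ids_a, means_a = cluster_means(expr_a, labels_a)
ids_b, means_b = cluster_means(expr_b, labels_b)
corr = cross_dataset_correlation(means_a, means_b)
print(ids_a, ids_b)
print(np.round(corr, 2))
```

      Rows of the resulting matrix correspond to clusters in the first dataset and columns to clusters in the second, so a matching pair of clusters shows up as a high value on the corresponding entry. Averaging per cluster before correlating keeps the comparison robust to the different cell numbers and sparsity of single-cell and single-nucleus data.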

      (5) snATAC-sequencing data seems to be used to only confirm the findings by snRNA-seq and snATAC-sequencing data is not well explored. This assay directly measures/predicts transcription factor activities and epigenetic changes, which might be more accurate in inferring transcription factors from RNA sequencing data using the R package SCENIC.

      We appreciate the reviewer's insightful comment regarding the utilization of our snATAC-seq data. We agree that snATAC-seq provides valuable direct measurements of chromatin accessibility and transcription factor binding sites that can complement inference-based approaches like SCENIC. To address this concern, we have revised our manuscript to better emphasize the value of our snATAC-seq data in transcription factor activity evaluation. We have modified our text (lines 570-574). This modification emphasizes that our integrated approach leverages the strengths of both methodologies, with snATAC-seq providing direct measurements of chromatin accessibility and transcription factor binding sites that can validate and enhance the inference-based predictions from SCENIC analysis of RNA-seq data.

      (6) The image quality of immunostaining of Cd55 and Cd248 is low. The images show that only part of the tendon sheath has positive staining. Co-localization of Cd55 and Cd248 can't be found.

      We agree with the reviewer regarding the limitations of our immunostaining images. To obtain clearer images, we used paraffin sections for our analysis. Additionally, the antibodies for CD55 and CD248 required different antigen retrieval conditions to work effectively, which unfortunately prevented us from performing co-immunostaining to directly demonstrate co-localization. Despite these technical limitations, we have optimized the processing and imaging parameters to improve the quality of the immunostaining images in Figure 5A. These improved images more clearly demonstrate the expression of CD55 and CD248 in the tendon sheath, although in separate sections. The consistent localization patterns observed in these separate stainings, together with our FACS and functional analyses of double-positive cells, strongly support their co-expression in the same cell population. We have also updated the corresponding Methods section (lines 260-272) to include these optimized immunostaining protocols for better reproducibility.

      (7) Only TEM data of tendon construct formed by sorted cells are shown. Results of mechanical tests will be super helpful to show the capacity of these TPSCs for tendon assembly.

      We appreciate the reviewer's suggestion regarding mechanical testing. We would like to direct the reviewer's attention to Figure 5I in our manuscript, where we have already included tensile strength measurements of the tendon construct. These mechanical test results demonstrate the functional capacity of CD55/CD248+ cells to form tendon-like tissue with appropriate mechanical properties, providing quantitative evidence of their ability for tendon assembly.

      (8) Cells negative for CD55/CD248 could be mixed cell populations, including hematopoietic lineages, cells from tendon mid substance, immune cells, and/or endothelial cells. Under induction of tri-lineage media, these mixed cell populations could possess different, unpredicted phenotypes (shown by no increased gene expression of tenogenic, chondrogenic, and osteogenic markers after induction). Higher tenogenic gene expression in TPSCs after induction does not mean that TPSCs are induced into tenocytes if they are compared to unknown cell populations with/without similar induction. Additionally, the PCR data in Figure 5, presented as ΔΔCT with unclear biological meaning, are challenging to interpret.

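      For readers weighing the ΔΔCT point raised in comment (8), the usual comparative Ct convention can be written out as below. This is the standard textbook form, given here as an assumption; which reference gene and which calibrator sample were used is not stated in this exchange.

```latex
\begin{align}
\Delta C_t &= C_t^{\text{target}} - C_t^{\text{reference}} \\
\Delta\Delta C_t &= \Delta C_t^{\text{sample}} - \Delta C_t^{\text{calibrator}} \\
\text{fold change} &= 2^{-\Delta\Delta C_t}
\end{align}
```

      Under this convention, and assuming near-complete amplification efficiency, a ΔΔCt of -2 corresponds to a fourfold increase relative to the calibrator, so reporting the fold change rather than the raw ΔΔCt value tends to be easier to interpret biologically.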

      Reviewer #2 (Recommendations for the authors): 

      The aim of this study was to identify novel markers for tendon stem cells. The authors used the fact that tendon cells of juvenile tendons have a greater ability to regenerate versus mature tendons. scRNAseq, snRNAseq, and snATACseq datasets were generated and analyzed in juvenile and mature Achilles tendons (mice). 

      The authors generated a lot of data that could be exploited further to show that these two novel surface tendon markers are more tendon-specific than those previously identified. Another concern is that there is no robust data indicative of the endogenous location of CD55+ CD248+ cells in the native tendon. Same comments for the transcription factors regulating the transcription of CD55 and CD248 and that of Scx and Mkx. A validation of the ATACseq data with a location in native tendons would be pertinent.

      The analysis was performed by comparing 2 sub-clusters of the same datasets and not between the two stages. Given the introduction highlighting the differential ability to regenerate between the two stages, the comparison between the two stages was somehow expected. I wonder if there is an explanation for the absence of analysis between the two stages.

      The authors have all the datasets to (bioinformatically) compare scRNAseq and snRNAseq datasets. This comparative analysis would strengthen the clustering of tendon cell populations at both stages. The labeling/identification of clusters associated with tendon cell populations is not obvious. I am surprised that there is no tendon sheath cluster such as endotenon or peritenon. A discussion on the different tendon cell populations (tendon clusters) is lacking.

      (1) Choice of the three markers 

      The authors chose three genes known to be markers for tendon stem cells, Tppp3, PdgfRa, and Ly6a, and investigated clusters (or subclusters) that co-express these three genes. Except for Tppp3, the other two genes lack tendon specificity. Ly6a is a stem cell marker and is recognized to be a marker of epi/perimysium in fetal and perinatal stages in mouse limbs (PMID: 39636726). Pdgfra is a generic marker of all connective tissue fibroblasts. Could it be that the identification of the two novel surface markers was biased with this choice? The identification of CD55 and CD248 has been done by comparing DEGs between cluster 4 (SP2) and cluster 1 (SP1). What about an unbiased comparison of both clusters 4 and 1 (or individual clusters) between mature and juvenile samples? The reader expected such a comparison since it was introduced as the rationale of the paper to compare juvenile and mature tendon cells.

      We selected Tppp3, PdgfRa, and Ly6a based on established literature identifying them as TSPC markers (Harvey et al., 2019; Tachibana et al., 2022). While only Tppp3 has tendon specificity, these genes collectively represent reliable TSPC markers currently available.

      Our identification of CD55 and CD248 came from comparing SP2 and SP1 clusters that showed these three markers plus tendon development genes. We did compare juvenile and mature samples as shown in Figure 1G, revealing decreased stem/progenitor marker expression with maturation. Additionally, we performed a comprehensive comparison between 2-week and 6-week samples visualized as a heatmap in Supplemental Figure 3, which clearly demonstrates the transcriptional changes that occur during tendon maturation. We have also provided the complete lists of differentially expressed genes for each identified cluster (Supplementary Tables 1-15), allowing for unbiased examination of cluster-specific gene signatures across developmental stages.

      Our functional validation confirmed CD55/CD248 positive cells express Tppp3, PdgfRa, and Ly6a while demonstrating high clonogenicity and tenogenic differentiation capacity, confirming their TSPC identity.

      (2) Concerns with cluster identification 

      Cluster 11, named the MTJ cluster, in the 2-week scRNAseq dataset was not detected in the 6-week scRNAseq dataset (Figure 1A). Does this mean that the MTJ disappears at 6 weeks in Achilles tendons? In the snRNAseq data, the MTJ cluster was defined on the basis of Postn expression. «Cluster 11, with high Periostin (Postn) expression, was classified as a myotendinous junction (MTJ).» Line 379.

      What is the basis/reference to set a link between Postn and MTJ? 

      Could the CA clusters be enthesis clusters? Is there any cartilage in the Achilles tendon?

      If there are MTJ clusters, one could expect to see clusters reflecting tendon attachment to cartilage/bone.

      I am surprised to see no cluster reflecting tendon attachments (endotenon or peritenon).

      Cluster 9 was identified as a proliferating cluster in the scRNAseq datasets. Has the cell cycle regression step been performed?

      Thank you for highlighting these important questions about our cluster identification. The MTJ cluster (cluster 11) appears reduced but not absent in 6-week samples. We based our MTJ classification on Postn expression, which is enriched at the myotendinous junction, as documented by Jacobson et al. (2020) in their proteome analysis of myotendinous junctions. We have added this reference to the manuscript to provide clear support for our cluster annotation (lines 400-401).

      Regarding the CA cluster, these cells express chondrogenic markers but are not enthesis clusters. We have revised our manuscript to acknowledge that these could potentially represent enthesis cells, as you suggested (lines 412-414). While Achilles tendons themselves don't contain cartilage, our digestion process likely captured some adjacent cartilaginous tissues from the calcaneus insertion site.

      We acknowledge the absence of clearly defined endotenon/epitenon clusters. We have added more comprehensive explanations about peritenon tissues in our manuscript (lines 431-433 and 584-585), noting that previous studies (Harvey et al., 2019) have reported that Tppp3-positive populations are localized to the peritenon, and our SP clusters might also reflect peritenon-derived cells. This additional context helps clarify the potential tissue origins of our identified cell populations.

      For the proliferating cluster (cluster 9), we confirmed high expression of cell cycle markers (Mki67, Stmn1) but did not perform cell cycle regression to maintain biological relevance of proliferation status in our analysis. We have clarified this methodological decision in the revised Methods section.
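      To make the marker-based check above concrete, a per-cell signature score in the spirit of the gene-signature scoring mentioned by Reviewer #3 could be sketched as follows. This is a simplified illustration, not the authors' analysis and not a reimplementation of Seurat's AddModuleScore, which selects expression-matched control genes by binning. The function name, the choice of control genes, and the toy expression values are assumptions made for the example.

```python
import numpy as np

def signature_score(expr, gene_names, signature, control):
    """Per-cell score: mean of signature genes minus mean of control genes.

    expr:       cells x genes array of normalized expression
    gene_names: gene symbols matching the columns of expr
    signature:  genes of interest (e.g. proliferation markers)
    control:    background genes used as a baseline (hypothetical choice)
    """
    expr = np.asarray(expr, dtype=float)
    index = {gene: i for i, gene in enumerate(gene_names)}
    sig_cols = [index[g] for g in signature if g in index]
    ctl_cols = [index[g] for g in control if g in index]
    if not sig_cols or not ctl_cols:
        raise ValueError("signature and control must each match at least one gene")
    return expr[:, sig_cols].mean(axis=1) - expr[:, ctl_cols].mean(axis=1)

# Toy values, invented for illustration (not data from the study).
genes = ["Mki67", "Stmn1", "Actb", "Gapdh"]
expr = np.array([
    [2.0, 3.0, 4.0, 4.0],   # cell with high proliferation marker expression
    [0.0, 0.5, 4.0, 4.2],   # cell with low proliferation marker expression
])
scores = signature_score(expr, genes, ["Mki67", "Stmn1"], ["Actb", "Gapdh"])
print(scores)
```

      Cells with a higher score express the proliferation markers more strongly relative to the baseline. Scoring cells in this way documents proliferation status without regressing it out, which is consistent with the decision described above to keep the proliferating cluster in the analysis.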

      (3) What is the meaning of all these tendon clusters in scRNAseq, snRNAseq and snATACseq? The authors described 2 or 3 SP clusters (depending on the scRNAseq or snRNAseq datasets), 2 CT clusters, 1 MTJ cluster, and 1 CA cluster. Do genes with enriched expression in these different clusters correspond to different anatomical locations in native tendons? Are there endotenon and peritenon clusters? Is there a correlation between clusters (or subclusters) expressing stem cell markers and peritenon as described for Tppp3?

      Thank you for this important question about the biological significance of our identified clusters. The multiple tendon-related clusters we identified likely represent distinct cellular states and differentiation stages rather than strictly discrete anatomical locations. The SP clusters (stem/progenitor cells) express markers consistent with tendon progenitors reported in the literature, including Tppp3, which has been described in the peritenon. As we mentioned in our response to the previous question, we have added more comprehensive explanations about peritenon tissues in our manuscript (Lines 432-433 and 584-585), noting that previous studies (Harvey et al., 2019) have reported that Tppp3-positive populations are localized to the peritenon, and our SP clusters might reflect peritenon-derived cells. Our immunohistochemistry data in Figure 5A further confirms that CD55/CD248 positive cells are localized primarily to the tendon sheath region, similar to the localization pattern of Tppp3 reported by Harvey et al. (2019). The tenocyte clusters (TC) represent mature tendon cells within the fascicles, and their distinct transcriptional profiles suggest heterogeneity even within mature tenocytes. The MTJ cluster specifically expresses genes enriched at the myotendinous junction, while the CA cluster likely represents cells from the enthesis region, as you suggested. In the revised manuscript, we have clarified this interpretation and added additional discussion about the relationship between cluster identity and anatomical localization, particularly regarding the SP clusters and their correlation with peritenon regions.

      (4) The use of single-cell and single-nuclei RNAseq strategies to analyze tendon cell populations in juvenile and mature tendons is powerful, but the authors do not exploit these double analyses. A comparison between scRNAseq and snRNAseq datasets (2 weeks and 6 weeks) is missing. The similar or different features at the level of the clustering or at the level of gene expression should be explained/shown and discussed. This analysis should strengthen the clustering of tendon cell populations at both stages. In the same line, why are there 3 SP clusters in snRNAseq versus 2 SP clusters in scRNAseq? The MTJ cluster R2-5 expressing Sox9 should be discussed.

      Thank you for highlighting this important gap. We have conducted a comprehensive comparison between scRNA-seq and snRNA-seq datasets, revealing substantial correlation between cell populations identified by both methodologies. We've added a detailed dot plot visualization and correlation heatmap in Supplemental Figure 5 that demonstrates the relationships between clusters across datasets. The additional SP cluster in snRNA-seq likely reflects the greater sensitivity of nuclear RNA sequencing in capturing certain cell states that might be missed during whole-cell isolation. Our analysis shows this SP3 cluster represents a transitional state between stem/progenitor cells and differentiating tenocytes. Regarding the Sox9-expressing MTJ cluster R2-5, we have expanded our discussion in the revised manuscript (lines 500-502) to address this finding, incorporating relevant references (Nagakura et al., 2020) that describe Sox9 expression at the myotendinous junction. This expression pattern suggests that cells at this specialized interface may maintain developmental plasticity between tendon and cartilage fates, which is consistent with the transitional nature of this anatomical region.

      (5) The claim of "high expression of CD55 and CD248 in the tendon sheath" is not supported by the experiments. The images of immunostaining (Figure 5A) are not very convincing. It is not explained if these are sections of 3D-tendon constructs or native tendons. The expression in 3D-tendon constructs is not informative, since tendon sheaths are not present. The endogenous expression of the transcription factors regulating tendon gene expression would be informative to localize tendon stem cells in native tendons.

      Thank you for this important critique. We agree that the original immunostaining images were not sufficiently convincing. To address this, we have used paraffin sections and optimized our staining protocols to improve image quality. It's worth noting that CD55 and CD248 antibodies required different antigen retrieval conditions to work effectively, which unfortunately prevented us from performing co-immunostaining to directly demonstrate co-localization in the same section. Despite these technical limitations, we have significantly improved the quality of the immunostaining images in Figure 5A with enhanced processing and imaging parameters.

      The improved images more clearly demonstrate the preferential expression of CD55 and CD248 in the tendon sheath/peritenon regions. The consistent localization patterns observed in these separate stainings, together with our FACS and functional analyses of double-positive cells, strongly support their co-expression in the same cell population.

      In the revised manuscript, we have also improved the figure legends to clearly indicate the nature of the tissue samples and updated the methods section to provide more detailed protocols for the immunostaining procedures used.

      Your suggestion regarding transcription factor visualization is valuable. While beyond the scope of our current study, we agree that examining the endogenous expression of regulatory transcription factors like Klf3 and Klf4 would provide additional insights into tendon stem cell localization in native tendons, and we plan to pursue this in future work.

      Minor concerns:

      (1) Lines 392-397 « To identify progenitor populations within these clusters, we analyzed expression patterns of previously reported markers Tppp3 and Pdgfra (Harvey et al., 2019; Tachibana, et al., 2022), along with the known stem/progenitor cell marker Ly6a (Holmes et al., 2007; Sung et al., 2008; Hittinger et al., 2013; Sidney et al., 2014; Fang et al., 2022). We identified subclusters within clusters 1 and 4 showing high expression of these genes, which we defined as SP1 and SP2. SP2 exhibited the highest expression of these genes, suggesting it had the strongest progenitor characteristics.» Please cite relevant Figures. Feature and violin plots (scRNAseq) across all cells (not for the only 2 SP1 and SP2 clusters) of Tppp3, Pdgfra and Ly6a are missing.

      Thank you for pointing out this important oversight. We have modified the manuscript to clarify that the text in question describes Figure 1B. Additionally, we have added new feature plots showing the expression of Tppp3, Pdgfra, and Ly6a across all cells in Supplemental Figure 1B.

      (2) The labeling of clusters with numbers in single-cell, single nuclei RNAseq, and ATACseq is difficult to follow.

      We appreciate your feedback on this issue. We recognize that the numerical labeling system across different datasets (scRNA-seq, snRNA-seq, and snATAC-seq) makes it difficult to track the same cell populations. To address this, we have added Supplemental Figure 5, which clearly shows the correspondence between cell populations in single-cell and single-nucleus RNA-seq datasets.

      (3) Figure 1C. It is not clear from the text and Figure legend if the DEGs are for the merged 2 and 6 weeks. If yes, an UMAP of the merged datasets of 2 and 6 weeks would be useful.

      (4) Along the Text, there are a few sentences with obscure rationale. Here are a few examples (not exhaustive):

      Abstract 

      “Combining single-nucleus ATAC and RNA sequencing analyses revealed that Cd55 and Cd248 positive fractions in tendon tissue are TSPCs, with this population decreasing at 6 weeks.”

      The rationale of this sentence is not clear. How can single-nucleus ATAC and RNA sequencing analyses identify Cd55 and Cd248 positive fractions as tendon stem cells?

      Thank you for highlighting this unclear statement in our abstract. We agree that the previous wording did not adequately explain how our sequencing analyses identified CD55 and CD248 positive cells as TSPCs. We have revised this sentence to clarify that our multi-modal approach (combining scRNA-seq, snRNA-seq, and snATAC-seq) enabled us to identify Cd55 and Cd248 positive populations as TSPCs based on their co-expression with established TSPC markers such as Tppp3, Pdgfra, and Ly6a. This comprehensive analysis across different sequencing modalities provided strong evidence for their identity as tendon stem/progenitor cells, which we further validated through functional assays. The revised abstract now more clearly communicates the logical progression of our analysis and findings.

      Line 80-82 

      “Cd34 is known to be highly expressed in mouse embryonic limb buds at E14.5 compared to E11.5 (Havis et al., 2014), making it a potential marker for TSPCs.”

      The rationale of this sentence is not clear. How can "the fact to be expressed in E14.5 mouse limbs" be an indicator of being a "potential marker of tendon stem cells"?

      Line 611 

      “Recent reports have highlighted the role of the Klf family in limb development (Kult et al., 2021), suggesting its potential importance in tendon differentiation”

      Why does the "role of Klf family in limb development" suggest an "importance in tendon differentiation"?

      Thank you for highlighting this logical gap in our manuscript. You're right that involvement in limb development doesn't necessarily indicate specific importance in tendon differentiation. We've revised this statement to more accurately reflect current knowledge, noting that while Klf factors are involved in limb development, their specific role in tendon differentiation requires further investigation (lines 658-659). This revised text better aligns with our findings of Klf3 and Klf4 expression in tendon progenitor cells without making unsupported claims about their functional significance.

      Reviewer #3 (Recommendations for the authors): 

      In addition to the points highlighted above some additional points are listed below.

      (1) Case in point: the authors claim CD55 and CD248 are found at the tendon sheath (line 541), which is not part of the tendon proper (although the IHC seems to show green in the epi/endotenon).

      (2) All cell types seem to express collagen based on Figure 1B, so either there is serious background contamination (eg, ambient RNA), or an error in data analysis.

      Minor problems: 

      (1) The figures are confusingly formatted. It is hard to go between cluster numbers and names. Clusters of similar cell types (eg progenitors) are not grouped to facilitate comparison, as ordering is based on cluster number).

      (2) The introduction does not distinguish between findings in mice and man. A lot of confusion in the tendon literature probably arises from interspecies differences, which are rarely addressed. 

      We appreciate this important point about species distinctions. We have revised our introduction to clearly identify species-specific findings by adding the term "murine" before TSPC references when discussing mouse studies (lines 64, 66, 70, 75, 100, and 108). We agree that interspecies differences are important considerations in tendon biology research, particularly when translating findings between animal models and humans. Our study focuses specifically on mouse models, and we have been careful not to overgeneralize our conclusions to human tendon biology without appropriate evidence. This clarification helps readers better contextualize our findings within the broader tendon literature landscape.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      (1) The use of single-cell RNA and TCR sequencing is appropriate for addressing potential relationships between gene expression and dual TCR.

      Thank you for your detailed review and suggestions. The main advantages of scRNA+TCR-seq are as follows: (1) It enables comparative analysis of features such as the ratio of single TCR paired T cells to dual TCR paired T cells at the level of a large number of individual T cells, through mRNA expression of the α and β chains. In the past, this analysis was limited to a small number of T cells, requiring isolation of single T cells, PCR amplification of the α and β chains, and Sanger sequencing; (2) While analyzing TCR paired T cell characteristics, it also allows examination of mRNA expression levels of transcription factors in corresponding T cells through scRNA-seq.

      (2) The data confirm the presence of dual TCR Tregs in various tissues, with proportions ranging from 10.1% to 21.4%, aligning with earlier observations in αβ T cells.

      Thank you very much for your detailed review and suggestions. Early studies on dual TCR αβ T cells have been very limited in number, with reported proportions of dual TCR T cells ranging widely from 0.1% to over 30%. In contrast, scRNA+TCR-seq can monitor over 5,000 single and paired TCRs, including dual paired TCRs, in each sample, enabling more precise examination of the overall proportion of dual TCR αβ T cells. It is important to note that our analysis focuses on T cells paired with functional α and β chains, while T cells with non-functional chain pairings and those with a single functional chain without pairing were excluded from the total cell proportion analysis. Previous studies generally lacked the ability to determine expression levels of specific chains in T cells without dual TCR pairings.

      (3) Tissue-specific patterns of TCR gene usage are reported, which could be of interest to researchers studying T cell adaptation, although these were more rigorously analyzed in the original works.

      Thank you very much for your detailed review and suggestions. T cell subpopulations exhibit tissue specificity; thus, we conducted a thorough investigation into Treg cells from different tissue sites. This study builds upon the original by innovatively analyzing the differences in VDJ rearrangement and CDR3 characteristics of dual TCR Treg cells across various tissues. This provides new insights and directions for the potential existence of “new Treg cell subpopulations” in different tissue locations. The results of this analysis suggest the necessity of conducting functional experiments on dual TCR Treg cells at both the TCR protein level and the level of effector functional molecules.

      (4) Lack of Novelty: The primary findings do not substantially advance our understanding of dual TCR expression, as similar results have been reported previously in other contexts.

      Thank you for your detailed review and suggestions. Early research on dual TCR T cells primarily relied on transgenic mouse models and in vitro experiments, using limited TCR alpha chain or TCR beta chain antibody pairings. Flow cytometry was used to analyze a small number of T cells to estimate dual TCR T cell proportion. No studies have yet analyzed dual TCR Treg cell proportion, V(D)J recombination, and CDR3 characteristics at high throughput in physiological conditions. The scRNA+TCR-seq approach offers an opportunity to conduct extensive studies from an mRNA perspective. With high-throughput advantages of single-cell sequencing technology, researchers can analyze transcriptomic and TCR sequence characteristics of all dual TCR Treg cells within a study sample, providing new ideas and technical means for investigating dual TCR T cell proportions, characteristics, and origins under different physiological and pathological states.

      (5) Incomplete Evidence: The claims about tissue-specific differences lack sufficient controls (e.g., comparison with conventional T cells) and functional validation (e.g., cell surface expression of dual TCRs).

      Thank you for your detailed review and suggestions. This study indeed only analyzed dual TCR Treg cells from different tissue locations based on the original manuscript, without a comparative analysis of other dual TCR T cell subsets corresponding to these tissue locations. The main reason for this is that, in current scRNA+TCR-seq studies of different tissue locations, unless specific T cell subsets are sorted and enriched, the number of T cells obtained from each subset is very low, making a detailed comparative analysis impossible. In the results of the original manuscript, we observed a relatively high proportion of dual TCR Treg cell populations in various tissues, with differences in TCR composition and transcription factor expression. Following the suggestions, we have included additional descriptions in R1, citing the study by Tuovinen et al., which indicates that the proportion of dual TCR Tregs in lymphoid tissues is higher than other T cell types. This will help understand the distribution characteristics of dual TCR Treg cells in different tissues and provide a basis for mRNA expression levels to conduct functional experiments on dual TCR Treg cells in different tissue locations.

      (6) Methodological Weaknesses: The diversity analysis does not account for sample size differences, and the clonal analysis conflates counts and clonotypes, leading to potential misinterpretation.

      We thank you for your review and suggestions. In response to your question about whether the diversity analysis considered the sample size issue, we conducted a detailed review and analysis. This study utilized the inverse Simpson index to evaluate TCR diversity of Treg cells. A preliminary analysis compared the richness and evenness of single TCR Treg cell and dual TCR Treg cell repertoires. The two datasets analyzed were from four mouse samples with consistent processing and sequencing conditions. However, when analyzing single TCR Tregs and dual TCR Tregs from various tissues, differences in detected T cell numbers by sequencing cannot be excluded from the diversity analysis. Following recommendations, we provided additional explanations in R1: CDR3 diversity analysis indicates TCR composition of dual TCR Treg cells exhibits diversity, similar to single TCR Treg cells; however, diversity indices of single TCR Tregs and dual TCR Tregs are not suitable for statistical comparison. Regarding the "clonal analysis" you mentioned, we define clonality based on unique TCR sequences; cells with identical TCR sequences are part of the same clone, with ≥2 counts defined as expansion. For example, in Blood, there are 958 clonal types and 1,228 cells, of which 449 are expansion cells. In R1, we systematically verified and revised clonal expansion cells across all tissue samples according to a unified standard.
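
      The two quantities discussed above, the inverse Simpson index and the "≥2 counts" definition of clonal expansion, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy clonotype labels, and the optional downsampling step (one common way to reduce the effect of unequal cell numbers) are all assumptions introduced here for illustration.

```python
import random
from collections import Counter


def inverse_simpson(clone_ids):
    """Inverse Simpson index 1 / sum(p_i^2), where p_i is the fraction of
    cells belonging to clonotype i."""
    counts = Counter(clone_ids)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no cells supplied")
    return 1.0 / sum((c / n) ** 2 for c in counts.values())


def expansion_summary(clone_ids):
    """Keep clonotypes and cells as separate numbers. A clonotype is
    treated as expanded when it is seen in two or more cells."""
    counts = Counter(clone_ids)
    return {
        "clonotypes": len(counts),
        "cells": sum(counts.values()),
        "expanded_clonotypes": sum(1 for c in counts.values() if c >= 2),
        "expanded_cells": sum(c for c in counts.values() if c >= 2),
    }


def downsampled_inverse_simpson(clone_ids, depth, n_iter=100, seed=0):
    """Hypothetical helper: mean inverse Simpson index over random
    subsamples of `depth` cells drawn without replacement, so that
    samples with different cell numbers are compared at equal depth."""
    cells = list(clone_ids)
    if depth > len(cells):
        raise ValueError("depth exceeds the number of cells")
    rng = random.Random(seed)
    values = [inverse_simpson(rng.sample(cells, depth)) for _ in range(n_iter)]
    return sum(values) / len(values)


# Toy data: one label per cell, identical labels mean identical TCR sequences.
toy_cells = ["A", "A", "B", "C", "C", "C"]
summary = expansion_summary(toy_cells)
diversity = inverse_simpson(toy_cells)
```

      Reporting clonotype counts and cell counts as separate fields is a deliberate choice in this sketch, since the reviewer's concern was that the two were being conflated.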

      (7) Insufficient Transparency: The sequence analysis pipeline is inadequately described, and the study lacks reproducibility features such as shared code and data.

      Thank you for your review and suggestions. Based on the original manuscript, we have made corresponding detailed additions in R1, providing further elaboration on the analysis process of shared data, screening methods, research codes, and tools. This aims to offer readers a comprehensive understanding of the analytical procedures and results.

      (8) Weak Gene Expression Analysis: No statistical validation is provided for differential gene expression, and the UMAP plots fail to reveal meaningful clustering patterns.

      Thank you very much for your review and suggestions. Based on your recommendations, we conducted an initial differential expression analysis of the top 10 mRNA molecules in single TCR Treg and dual TCR Treg cells using the DESeq2 R package in R1, with statistical significance determined by Padj < 0.05. Regarding the clustering patterns in the UMAP plots, since the analyzed samples consisted of isolated Treg cell subpopulations that highly express immune suppression-related genes, we did not perform a more detailed analysis of subtypes and expression gene differences. This study primarily aims to explore the proportions of single TCR and dual TCR Treg cells from different tissue sources, as well as the characteristics of CDR3 composition, with a focus on showcasing the clustering patterns of samples from different tissue origins and various TCR pairing types.

      (9) A quick online search reveals that the same authors have repeated their approach of reanalysing other scientists' publicly available scRNA-VDJ-seq data in six other publications. In other words, the approach used here seems to be focused on quick re-analyses of publicly available data without further validation and/or exploration.

      Thank you for your review and suggestions. Most current studies utilizing scRNA+TCR-seq overlook analysis of TCR pairing types and related research on single TCR and dual TCR T cell characteristics. Through in-depth analysis of shared scRNA+TCR-seq data from multiple laboratories, we discovered a significant presence of dual TCR T cells in high-throughput T cell research results that cannot be ignored. In this study, we highlight the higher proportion of dual TCR Tregs in different tissue locations, which exhibits a certain degree of tissue specificity, suggesting these cells may participate in complex functional regulation of Tregs. This finding provides new ideas and a foundation for further research into dual TCR Treg functions. However, as reviewers pointed out, findings from scRNA+TCR-seq at the mRNA level require additional functional experiments on dual TCR T cells at the protein level. We have supplemented our discussion in R1 based on these suggestions.

      Reviewer #2 (Public review):

      (1) The existence of dual TCR expression by Tregs has previously been demonstrated in mice and humans (Reference #18 and Tuovinen. 2006. Blood. 108:4063; Schuldt. 2017. J Immunol. 199:33, both omitted from references). The presented results should be considered in the context of these prior important findings.

      Thank you very much for your review and suggestions. Building on the original manuscript, we have read and now cite the closely related literature (Tuovinen, 2006, Blood, 108:4063 (lines 44 and 175 in R1); Schuldt, 2017, J Immunol, 199:33 (lines 44 and 178 in R1)). We once again appreciate the valuable comments from the reviewers, and we will refer to these studies in our subsequent dual TCR T cell research.

      (2) This demonstration of dual TCR Tregs is notable, though the authors do not compare the frequency of dual TCR co-expression by Tregs with non-Tregs. This limits interpreting the findings in the context of what is known about dual TCR co-expression in T cells.

      Thank you very much for your review and suggestions. This analysis is primarily based on the scRNA+TCR-seq study of sorted Treg cells, where we found the proportions and distinguishing features of dual TCR Treg cells in different tissue sites. Given the diversity and complexity of Treg function, conducting a comparative analysis of the origins of dual TCR Treg cells and non-Treg cells with dual TCRs will be a meaningful direction. Currently, peripheral induced Treg cells can originate from the conversion of non-Treg cells; however, little is known about the sources and functions of dual TCR Treg cell subsets in both central and peripheral sites. In R1, we have supplemented the discussion regarding the possible origins and potential applications of the "novel dual TCR Treg" subsets.

      (3) Comparison of gene expression by single- and dual TCR Tregs is of interest, but as presented is difficult to interpret. Statistical analyses need to be performed to provide statistical confidence that the observed differences are true.

      Thank you very much for your review and suggestions. Based on your recommendations, we performed an initial differential expression analysis of the top 10 mRNA molecules in single TCR Treg and dual TCR Treg cells using the DESeq2 R package in R1, with a statistical significance threshold of Padj<0.05 for comparisons.
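
      For readers unfamiliar with the Padj threshold mentioned above: DESeq2 reports adjusted p-values using the Benjamini-Hochberg procedure by default. The sketch below shows only that adjustment step in plain Python, on made-up p-values with hypothetical gene labels. It does not reproduce DESeq2's model fitting or independent filtering, and it is not the authors' code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values for four hypothetical genes (labels are placeholders).
raw = {"gene_a": 0.01, "gene_b": 0.04, "gene_c": 0.03, "gene_d": 0.20}
padj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
significant = [gene for gene, value in padj.items() if value < 0.05]
```

      In this toy example a raw p-value of 0.04 no longer passes the 0.05 threshold after adjustment, which is why the adjusted value rather than the raw one is the appropriate criterion when several genes are tested together.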

      (4) The interpretations of the gene expression analyses are somewhat simplistic, focusing on the single-gene expression of some genes known to have a function in Tregs. However, the investigators miss an opportunity to examine larger patterns of coordinated gene expression associated with developmental pathways and differential function in Tregs (Yang. 2015. Science. 348:589; Li. 2016. Nat Rev Immunol. 16:220; Wyss. 2016. Nat Immunol. 17:1093; Zemmour. 2018. Nat Immunol. 19:291).

      Thank you for your review and suggestions. This study is based on publicly available scRNA+TCR-seq data from different organ sites generated by the original authors, focusing on sorted and enriched Treg cells within each tissue sample. However, there was no corresponding research on other cell types in each tissue sample, preventing analysis of other cells and factors involved in the development and differentiation of single TCR Tregs and dual TCR Tregs. The literature suggested by the reviewer indicates that the development, differentiation, and function of Treg cells have been extensively studied, resulting in significant advances. It also highlights the complexity and diversity of Treg origins and functions. This research aims to investigate "novel dual TCR Treg cell subpopulations" that may exhibit tissue-specific differences in the original authors' data on Treg cells across different organ sites. This points to further experimental research into their development, differentiation, origin, and functional gene expression as an important direction, which we have added to the discussion section of R1.

      Reviewer #3 (Public review):

      (1) Definition of Dual TCR and Validity of Doublet Removal: This study analyzes Treg cells with Dual TCR, but it is not clearly stated how the possibility of doublet cells was eliminated. The authors mention using DoubletFinder for detecting doublets in scRNA-seq data, but is this method alone sufficient? We strongly recommend reporting the details of doublet removal and data quality assessment in the Supplementary Data.

      Thank you very much for your review and suggestions. In the analysis of the shared scRNA+TCR-seq data across multiple laboratories, as you mentioned, this study employed the DoubletFinder R package to exclude suspected doublets. Additionally, we used the nCount values of individual cells (i.e., the total sequencing reads or UMI counts for each cell) as auxiliary parameters to further optimize the assessment of cell quality. Because doublets may contain gene expression information from two or more cells, their nCount values are often abnormally high. In this study, all cells included in the analysis had nCount values not exceeding 20,000. Among the five tissue sample datasets, we further utilized hashtag oligonucleotide (HTO) labeling to eliminate doublets and negative cells. HTO labeling provides each cell with a unique barcode to differentiate cells from different tissue sources, and by analyzing HTO labels, doublets and negative cells can be accurately identified. After the removal of chimeric cells, all samples exhibited T cells that possessed two or more TCR clones. This phenomenon validates the reliability of the methodological approach employed in this study and indicates that the analytical results accurately reflect the proportion of dual TCR T cells. Based on the recommendations of the reviewers, we have supplemented and clarified the methods and discussion sections in the manuscript. It is particularly noteworthy that in our analysis, the discussed dual TCR Treg cells and single TCR Treg cells specifically refer to those T cells that possess both functional α and β chains, which are capable of forming a TCR. We have excluded from this analysis any Treg cells that possess only a single functional α or β chain and do not form TCR pairs, as well as those Treg cells in which the α or β chains involved in TCR pairing are non-functional.
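
      The filtering logic described above can be summarized in a short Python sketch. It is a simplified illustration under stated assumptions, not the authors' code: the `Cell` record, its field names, and the "Singlet"/"Doublet"/"Negative" labels are hypothetical, and the doublet flag stands in for the output of a tool such as DoubletFinder. Only the 20,000 nCount ceiling and the functional-chain inclusion rule are taken from the response.

```python
from dataclasses import dataclass

MAX_NCOUNT = 20_000  # nCount ceiling stated in the response


@dataclass
class Cell:
    barcode: str
    n_count: int            # total counts (reads or UMIs) for the cell
    hto_class: str          # "Singlet", "Doublet" or "Negative" (assumed labels)
    flagged_doublet: bool   # call from a doublet-detection tool
    functional_alpha: int   # number of functional TCR alpha chains detected
    functional_beta: int    # number of functional TCR beta chains detected


def passes_qc(cell: Cell) -> bool:
    """A cell is kept only if every doublet criterion is clear."""
    return (
        cell.n_count <= MAX_NCOUNT
        and cell.hto_class == "Singlet"
        and not cell.flagged_doublet
    )


def tcr_class(cell: Cell) -> str:
    """Classify by functional chains: cells lacking a functional alpha or
    beta chain are excluded; one of each is 'single'; any additional
    functional chain makes the cell 'dual'."""
    a, b = cell.functional_alpha, cell.functional_beta
    if a == 0 or b == 0:
        return "excluded"
    if a == 1 and b == 1:
        return "single"
    return "dual"


cells = [
    Cell("c1", 8_000, "Singlet", False, 1, 1),
    Cell("c2", 9_500, "Singlet", False, 2, 1),
    Cell("c3", 31_000, "Singlet", False, 2, 2),   # fails the nCount ceiling
    Cell("c4", 7_000, "Doublet", False, 2, 1),    # fails the HTO check
    Cell("c5", 6_000, "Singlet", False, 1, 0),    # no functional beta chain
]
kept = {c.barcode: tcr_class(c) for c in cells if passes_qc(c)}
```

      Applying the quality filter before the TCR classification, as in this sketch, keeps a cell that looks "dual" only because it is a doublet from ever entering the dual TCR count.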

      (2) In Figure 3D, the proportion of Dual TCR T cells (A1+A2+B1+B2) in the skin is reported to be very high compared to other tissues. However, in Figure 4C, the proportion appears lower than in other tissues, which may be due to contamination by non-Tregs. The authors should clarify why it was necessary to include non-Tregs as a target for analysis in this study. Additionally, the sensitivity of scRNA-seq and TCR-seq may vary between tissues and may also be affected by RNA quality and sequencing depth in skin samples, so the impact of measurement bias should be assessed.

      We deeply appreciate your review and constructive comments. Building on the original manuscript, we have further elaborated in R1 on the uniqueness and relative proportions of dual TCR T cell pairings in skin tissue samples. Due to the scarcity of T cells in skin samples, some non-Treg cells were included during single-cell RNA sequencing and TCR sequencing to obtain a sufficient number of cells for effective analysis. The presence of non-regulatory T cells may indeed impact the statistical representation of dual TCR T cells as well as the related comparative analyses, as noted by the reviewer. T cells with A1+A2+B1+B2 type dual TCR pairings are primarily found within the non-regulatory T cell population in the skin. In response to this point, we have provided a detailed explanation of this analytical result in the revised manuscript R1. Furthermore, concerning the two datasets included in the study, we conducted a comparative analysis in R1, exploring how factors such as sequencing depth at different tissue sites might introduce biases in our findings, which we have thoroughly elaborated upon in the discussion section. We thank you once again for your valuable suggestions.

      (3) Issue of Cell Contamination: In Figure 2A, the data suggest a high overlap between blood, kidney, and liver samples, likely due to contamination. Can the authors effectively remove this effect? If the dataset allows, distinguishing between blood-derived and tissue-resident Tregs would significantly enhance the reliability of the findings. Otherwise, it would be difficult to separate biological signals from contamination noise, making interpretation challenging.

      We thank you for your review and suggestions. We have carefully verified data sources for tissues such as blood, kidneys, and liver. In the study by Oliver T et al., various techniques were employed to differentiate between leukocytes from blood and those from tissues, ensuring accurate identification of leukocytes from tissue samples. First, anti-CD45 antibody was injected intravenously to label cells in the vasculature, verifying that analyzed cells were indeed resident in the tissue. Second, prior to dissection and cell collection, authors performed perfusion on anesthetized mice to reduce contamination of tissue samples by leukocytes from the vasculature. Additionally, during single-cell sequencing, authors utilized HTO technology to avoid overlap between cells from different tissues.

      Analysis of the scRNA+TCR-seq data shared by the original authors revealed highly overlapping TCR sequences in blood, kidney, and liver, despite distinct cell labels associated with each tissue. While these techniques minimize overlap of cells from different sources, they cannot completely rule out the potential impact of this technical issue. As suggested, we have provided additional clarification in R1 of the manuscript regarding this phenomenon of high overlap in the kidney, liver, and blood, indicating that the possibility of Treg migration from blood to kidney and liver cannot be entirely excluded.
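
      One simple way to quantify the kind of cross-tissue TCR sharing described above is to count the clonotypes common to each pair of tissues. The sketch below does this with plain Python sets. The function name, the Jaccard ratio as the summary measure, and the toy clonotype labels are assumptions made for illustration; the response does not state which overlap metric was used.

```python
from itertools import combinations


def pairwise_overlap(clonotypes_by_tissue):
    """For every pair of tissues, report the number of shared clonotypes
    and the Jaccard index (shared / union) of their clonotype sets."""
    result = {}
    for a, b in combinations(sorted(clonotypes_by_tissue), 2):
        set_a = set(clonotypes_by_tissue[a])
        set_b = set(clonotypes_by_tissue[b])
        shared = set_a & set_b
        union = set_a | set_b
        result[(a, b)] = {
            "shared": len(shared),
            "jaccard": len(shared) / len(union) if union else 0.0,
        }
    return result


# Toy clonotype labels; real input would be paired TCR sequences per tissue.
toy = {
    "blood": ["x", "y", "z"],
    "kidney": ["y", "z", "w"],
    "skin": ["q"],
}
overlap = pairwise_overlap(toy)
```

      A high value for a pair such as blood and kidney cannot by itself distinguish residual blood contamination from genuine migration of cells between compartments, which is the caveat raised in the response.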

      (4) Inconsistency Between CDR3 Overlap and TCR Diversity: The manuscript states that Single TCR Tregs have a higher CDR3 overlap, but this contradicts the reported data that Dual TCR Tregs exhibit lower TCR diversity (higher 1/DS score). Typically, when TCR diversity is low (i.e., specific clones are concentrated), CDR3 overlap is expected to increase. The authors should carefully address this discrepancy and discuss possible explanations.

      Thank you for your review and suggestions. Regarding the potential relationship between CDR3 overlap and TCR diversity, in samples with consistent sequencing depth, lower diversity indeed corresponds to a higher proportion of CDR3 overlap. In our analysis of scRNA+TCR-seq data, we found that single TCR Tregs exhibit both higher diversity and CDR3 overlap, seemingly presenting contradictory analytical results (i.e., dual TCR Tregs show lower TCR diversity and CDR3 overlap). In R1, we supplemented the analysis of possible reasons: the presence of multiple TCR chains in dual TCR Treg cells may lead to a higher uniqueness of CDR3 due to multiple rearrangements and selections, resulting in lower CDR3 overlap; the lower diversity of dual TCR Tregs may be related to the number of T cells sequenced in each sample. The CDR3 diversity analysis in this study merely suggests that the TCR composition of dual TCR Treg cells is diverse, similar to that of single TCR Tregs. However, the diversity indices of single TCR Tregs and dual TCR Tregs are not suitable for statistical comparative analysis. A more in-depth and specific analysis of the diversity and overlap of the VDJ recombination mechanisms and CDR3 composition in dual TCR Tregs during development will be an important technical means to elucidate the function of dual TCR Treg cells.

      (5) Functional Evaluation of Dual TCR Tregs: This study indicates gene expression differences among tissue-resident Dual TCR T cells, but there is no experimental validation of their functional significance. Including functional assays, such as suppression assays or cytokine secretion analysis, would greatly enhance the study's impact.

      We sincerely appreciate your review and suggestions. In this analysis of scRNA+TCR-seq data, we found a higher proportion of dual TCR Treg cells in different tissue sites, with differences between tissues. Furthermore, we conducted a comparative analysis of the homogeneity and heterogeneity between single TCR Treg and dual TCR Treg cells. This result provides a foundation for further research on the origin and characteristics of dual TCR Treg cells in different tissue sites, offering new insights for understanding the complexity and functional diversity of Treg cells. Based on your suggestions, we have added to R1 a discussion of the feasibility of further exploring the functions of tissue-resident dual TCR T cells and of the need for research into their potential applications.

      (6) Appropriateness of Statistical Analysis: When discussing increases or decreases in gene expression and cell proportions (e.g., Figure 2D), the statistical methods used (e.g., t-test, Wilcoxon, FDR correction) should be explicitly described. They should provide detailed information on the statistical tests applied to each analysis.

      Thank you for your review and suggestions. Building on the original manuscript, we have added to R1 the specific statistical methods used to test differences in cell proportions and gene expression.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1:

      (1) Developmental time series:

      It was not entirely clear how this experiment relates to the rest of the manuscript, as it does not compare any effects of transport within or across species.

      Implemented Changes:  

      The importance of species arrival timing for community assembly is addressed in both the introduction and discussion. To accommodate the reviewer’s concerns and further emphasize this point, we have added a clarifying sentence to the results section and included an illustrative example with supporting literature in the discussion.

      Results: Clarifying the timing of initial microbial colonization is essential for determining whether and how priority effects mediate community assembly of vertically transmitted microbes in early life, or whether these microbes arrive into an already established microbial landscape. We used non-sterile frogs of our captive laboratory colony (…)

      Discussion: For example, early microbial inoculation has been shown to increase the relative abundance of beneficial taxa such as Janthinobacterium lividum (Jones et al., 2024), whereas efforts to introduce the same probiotic into established adult communities have not led to long-term persistence (Bletz, 2013; Woodhams et al., 2016).  

      (2) Cross-foster experiment:

      The "heterospecific transport" tadpoles were manually brushed onto the back of the surrogate frog, while the "biological transport" tadpoles were picked up naturally by the parent. It is a little challenging to interpret the effect of caregiver species since it is conflated with the method of attachment to the parent. I noticed that the uptake of Os-associated microbes by Os-transported tadpoles seemed to be higher than the uptake of Rv-associated microbes by Rv-associated tadpoles (comparing the second box from the left to the rightmost boxplot in panel S2C). Perhaps this could be a technical artifact if manual attachment to Os frogs was more efficient than natural attachment to Rv frogs.

      I was also surprised to see so much of the tadpole microbiome attributed to Os in tadpoles that were not transported by Os frogs (25-50% in many cases). It suggests that SourceTracker may not be effectively classifying the taxa.

      Implemented Changes:  

      Methods (Study species, reproductive strategies and life history): Oophaga sylvatica (Os) (Funkhouser, 1956; CITES Appendix II, IUCN Conservation status: Near Threatened) is a large, diurnal poison frog (family Dendrobatidae) inhabiting lowland and submontane rainforests in Colombia and Ecuador. While male Os care for the clutch of up to seven eggs, females transport 1-2 tadpoles at a time to water-filled leaf axils where tadpoles complete their development (Pašukonis et al., 2022; Silverstone, 1973; Summers, 1992). Notably, females return regularly to these deposition sites to provision their offspring with unfertilized eggs.

      Discussion: Most poison frogs transport tadpoles on their backs, but the mechanism of adherence remains unclear. Similar to natural conditions, tadpoles that are experimentally placed onto a caregiver’s back also gradually adhere to the dorsal skin, where they remain firmly attached for several hours as the adult navigates dense terrain. Although transport durations were standardized, species-specific factors (such as microbial density at the contact site, microbial taxa identity, and skin physiology such as moisture) could influence microbial transmission between the transporting frog and the tadpole. While these differences may have contributed to varying transmission efficacies observed between the two frog species in our experiment, none of these factors should compromise the correct microbial source assignment. We thus conclude that transporting frogs serve as a source of microbiota for transported tadpoles. However, further studies on species-specific physiological traits and adherence mechanisms are needed to clarify what modulates the efficacy of microbial transmission during transport, both under experimental and natural conditions.

      Methods (Vertical transmission): Cross-fostering tadpoles onto non-parental frogs has been used previously to study navigation in poison frogs (Pašukonis et al., 2017). According to our experience, successful adherence to both parent and heterospecific frogs depends on the developmental readiness of tadpoles, which must have retracted their gills and be capable of hatching from the vitelline envelope through vigorous movement. Another factor influencing cross-fostering success is the docility of the frog during initial attachment, as erratic movements easily dislodge tadpoles before adherence is established. Rv are small, jumpy frogs that are easily stressed by handling, making experimental fostering of tadpoles, even their own, impractical. Therefore, we favored an experimental design where tadpoles initiate natural transport and parental frogs pick them up with a 100% success rate. We chose the poison frog Os as foster frogs because adults are docile, parental care in this species involves transporting tadpoles, and skin microbial communities differ from Rv, a critical prerequisite for our SourceTracker analysis. The use of the docile Os as the foster species enabled a 100% cross-fostering success rate, with no notable differences in adherence strength after six hours.

      Methods (Sourcetracker Analysis): To assess training quality, we evaluated model self-assignment using source samples. We selected the model trained on a dataset rarefied to the read depth of the adult frog sample with the lowest read count (48162 reads), as it showed the best overall self-assignment performance, whereas models trained on datasets rarefied to the lowest overall read depth performed worse. Unlike studies using technical replicates, our source samples represent distinct biological individuals and sampling timepoints, where natural microbiome variability is expected within each source category. Consequently, we considered self-assignment rates above 70% acceptable. All source samples were correctly assigned to their respective categories (Rv, Os, or control), but with varying proportions of reads assigned as 'Unknown'. Adult frog sources were reliably self-identified with high confidence (Os: 97.2% median, IQR = 1.4; Rv: 76.3% median, IQR = 38.1). Adult R. variabilis frogs displayed a higher proportion of 'Unknown' assignments compared to O. sylvatica, likely reflecting greater biological variability among individuals and/or a higher proportion of rare taxa not well captured in the training set. The control tadpole source showed lower self-assignment accuracy (median = 30.5%, IQR = 17.1), as expected given the low microbial biomass of these samples, which resulted in low read depth. Low read depth limits the information available to inform the iterative updating steps in Gibbs sampling and reduces confidence in source assignments. We therefore verified the robustness of our results by performing the second Sourcetracker analysis as described above, training the model only on adult sources and assigning all tadpoles, including low-biomass controls, as sinks. Self-assignment rates for the second training set varied (O. sylvatica: 79.2% median, IQR = 29; R. variabilis: 96.6% median, IQR = 3.7), while results remained consistent across analyses, supporting the reliability of our findings.
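
      The self-assignment summary above (median, IQR, and the 70% acceptance level) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the per-sample percentages, and the choice to apply the 70% level to the median are assumptions made only for this example.

```python
from statistics import median, quantiles


def summarize_self_assignment(percent_assigned, threshold=70.0):
    """Summarize per-sample self-assignment percentages for one source category.

    percent_assigned: percentage of each source sample assigned to its own category.
    threshold: acceptance level; applying it to the median is an assumption.
    """
    q1, _, q3 = quantiles(percent_assigned, n=4, method="inclusive")
    med = median(percent_assigned)
    return {"median": med, "iqr": q3 - q1, "acceptable": med > threshold}


# Hypothetical values, not taken from the study.
example_source = [95.0, 97.0, 98.0, 96.0, 99.0]
summary = summarize_self_assignment(example_source)
print(summary)
```

      Reporting the IQR next to the median makes visible how much individual source samples differ, which matters here because the sources are distinct biological individuals rather than technical replicates.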

      (3) Cross-species analysis:

      Like the developmental time series, this analysis doesn't really address the central question of the manuscript. I don't think it is fair for the authors to attribute the difference in diversity to parental care behavior, since the comparison only includes n=2 transporting species and n=1 non-transporting species that differ in many other ways. I would also add that increased diversity is not necessarily an expectation of vertical transmission. The similarity between adults and tadpoles is likely a more relevant outcome for vertical transmission, but the authors did not find any evidence that tadpole-adult similarity was any higher in species with tadpole transport. In fact, tadpoles and adults were more similar in the non-transporting species than in one of the transporting species (lines 296-298), which seems to directly contradict the authors' hypothesis. I don't see this result explained or addressed in the Discussion.

      To address the reviewer’s concerns, we implemented the following changes:  

      Results:

      We rephrased the following sentence from the results part:  

      “These variations may therefore be linked to differing reproductive traits: Af and Rv lay terrestrial egg clutches and transport hatchlings to water, whereas Ll, a non-transporting species, lays eggs directly in water.”

      To read

      “These variations may therefore reflect differences in life history traits among the three species.”

      We moved the information on differing reproductive strategies into the Discussion, where it contributes to a broader context alongside other life history traits that may influence community diversity.

      Discussion (1): We added to our discussion that increased microbial diversity was not an expected outcome of vertical transmission.

      “However, increased microbial diversity is not a known outcome of vertical transmission, and further studies across a broader range of transporting and non-transporting species are needed to assess the role of transport in shaping diversity of tadpole-associated microbial communities.”

      Discussion (2): Likewise, communities associated with adults and tadpoles of transporting species were no more similar than those of non-transporting species. While poison frog tadpoles do acquire caregiver-specific microbes during transport, most of these microbes do not persist on the tadpoles' skin long-term. This pattern can likely be attributed to the capacity of tadpole skin and gut microbiota to flexibly adapt to environmental changes (Emerson & Woodley, 2024; Santos et al., 2023; Scarberry et al., 2024). It may also reflect the limited compatibility of skin microbiota from terrestrial adults with aquatic habitats or tadpole skin, which differs structurally from that of adults (Faszewski et al., 2008). As a result, many transmitted microbes are probably outcompeted by microbial taxa continuously supplied by the aquatic environment. Interestingly, microbial communities of the non-transporting Ll were more similar to their adult counterparts than those of poison frogs. This pattern might reflect differences in life history among the species. While adult Ll commonly inhabit the rock pools where their tadpoles develop, adults of the two poison frog species visit tadpole nurseries only sporadically for deposition. These differences in habitat use may result in adult Ll hosting skin microbiota that are better adapted to aquatic environments as compared to Rv and Af. Additionally, their presence in the tadpoles’ habitat could make Ll a more consistent source of microbiota for developing tadpoles.

      (4) Field experiment: The rationale and interpretation of the genus-level network are not clear, and the figure is not legible. What does it mean to "visualize the microbial interconnectedness" or to be a "central part of the community"? The previous sentences in this paragraph (lines 337-343) seem to imply that transfer is parent-specific, but the genus-level network is based on the current adult frogs, not the previous generation of parents that transported them. So it is not clear that the distribution or co-distribution of these taxa provides any insight into vertical transmission dynamics.

      Implemented Changes:  

      We appreciate the reviewer’s close reading and understand how the inclusion of the network visualization without further clarification may have led to confusion. To clarify, the network was constructed from all adult frogs in the population, including—but not limited to—the parental frogs examined in the field experiment. We do not make any claims about the origin of the microbial taxa found on parental frogs. Rather, our aim was to illustrate how genera retained on tadpoles (following potential vertical transmission) contribute to the skin microbial communities of adult frogs of this population beyond just the parental individuals. This finding supports the observation that these retained taxa are generally among the most abundant in adult frogs. However, since this information is already presented in Table S8 and the figure is not essential to the main conclusions, we have removed Supplementary Figure S5 and the accompanying sentence: “A genus-level network constructed from 44 adult frogs shows that the retained genera make up a central part of the community of adult Rv in wild populations (Fig. S5).” We have adjusted the Methods section accordingly.

      Reviewer #2:

      I did not find any major weaknesses in my review of this paper. The work here could potentially benefit from absolute abundance levels for shared ASVs between adults and tadpoles to more thoroughly understand the influences of vertical transmission that might be masked by relative abundance counts. This would only be a minor improvement as I think the conclusions from this work would likely remain the same, however.

      In response to the reviewer’s suggestion, we estimated the absolute abundance of specific ASVs for all samples of tadpoles in which Sourcetracker identified shared ASVs between adults and tadpoles. The resulting scaled absolute abundance values (in copies/μL and copies per tadpole) are provided in Table S10, and a description of the method has been incorporated into the revised Methods section of the manuscript. To support the robustness of this approach in our dataset, we additionally designed an ASV-specific system for ASV24902-Methylocella. Candidate primers were assessed for specificity by performing local BLASTn alignments against the full set of ASV sequences identified in the respective microbial communities of tadpoles. We optimized the annealing temperature via gradient PCR and confirmed primer specificity through Sanger sequencing of the PCR product (Forward: 5′–GAGCACGTAGGCGGATCT–3′ Reverse: 5′–GGACTACNVGGGTWTCTAAT–3′). Using this approach, we confirmed that the relative abundance of ASV24902 (18.05% in the amplicon sequencing data) closely matched its proportion of the absolute 16S rRNA copy number in transported tadpole 6 (18.01%). While we intended to quantify all shared ASVs, we were limited to this single target due to insufficient material for optimizing the assays. As this particular ASV was also detected in the water associated with the same tadpole, we chose not to include this confirmation in the manuscript. Nevertheless, the close match supports the reliability of our approach for scaling absolute abundances in this dataset.

      Results: Absolute abundances of shared ASVs likely originating from the parental source pool (as identified by Sourcetracker) after one month of growth ranged from 7804 to 172326 copies per tadpole (Table S10).

      Methods: Quantitative analysis of 16S rRNA copy numbers with digital PCR (dPCR)

      Absolute abundances were estimated for ASVs that were shared between tadpoles after a one-month growth period and their respective caregivers, and for which Sourcetracker analysis identified the caregiver as a likely source of microbiota. We followed the quantitative sequencing framework described by Barlow et al. (2020), measuring total microbial load via digital PCR (dPCR) with the same universal 16S rRNA primers used to amplify the v4 region in our sequencing dataset. Absolute 16S rRNA copy numbers obtained from dPCR were then multiplied by the relative abundances from our amplicon sequencing dataset to calculate ASV-specific scaled absolute abundances. All dPCR reactions were carried out on a QIAcuity Digital PCR System (Qiagen) using Nanoplates with an 8.5K partition configuration and the following cycling program: 95°C for 2 minutes; 40 cycles of 95°C for 30 seconds, 52°C for 30 seconds, and 72°C for 1 minute; followed by 1 cycle of 40°C for 5 minutes. Reactions were prepared using the QIAcuity EvaGreen PCR Kit (Qiagen, Cat. No. 250111) with 2 µL of DNA template per reaction, following the manufacturer's protocol, and included a negative no-template control and a cleaned and sequenced PCR product as positive control. Samples were measured in triplicate, and serial dilutions were performed to ensure accurate quantification. Data were processed with the QIAcuity Software Suite (v3.1.0.0). The threshold was set based on the negative and positive controls in 1D scatterplots. We report mean copy numbers per microliter with standard deviations, correcting for template input, dPCR reaction volume, and dilution factor. Mean copy numbers per tadpole were additionally calculated by accounting for the DNA extraction (elution) volume.
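
      The scaling described above can be written out as a short worked sketch. It only illustrates the arithmetic (total 16S copies multiplied by relative abundance, then corrected for template input, reaction volume, dilution factor, and elution volume). The function names, the triplicate readouts, the 12 µL reaction volume, the tenfold dilution, and the 50 µL elution volume are hypothetical; only the 2 µL template input and the 18.05% relative abundance appear in the response.

```python
from statistics import mean, stdev


def copies_per_ul_extract(copies_per_ul_reaction, reaction_volume_ul,
                          template_volume_ul, dilution_factor):
    """Convert a dPCR readout (copies/µL of reaction) to copies/µL of undiluted DNA extract."""
    copies_in_reaction = copies_per_ul_reaction * reaction_volume_ul
    return copies_in_reaction / template_volume_ul * dilution_factor


def scaled_absolute_abundance(total_copies, relative_abundance):
    """Scale total 16S copies by an ASV's relative abundance (fraction between 0 and 1)."""
    return total_copies * relative_abundance


def copies_per_tadpole(copies_per_ul, elution_volume_ul):
    """Scale copies/µL of extract to the whole eluate from one tadpole."""
    return copies_per_ul * elution_volume_ul


# Hypothetical triplicate readouts in copies/µL of reaction.
replicates = [100.0, 110.0, 90.0]
per_ul = [
    copies_per_ul_extract(r, reaction_volume_ul=12.0,   # assumed reaction volume
                          template_volume_ul=2.0,       # 2 µL template, as stated above
                          dilution_factor=10.0)         # assumed dilution
    for r in replicates
]
total_mean, total_sd = mean(per_ul), stdev(per_ul)

# Relative abundance of one ASV as a fraction (18.05% is the value quoted for ASV24902).
asv_per_ul = scaled_absolute_abundance(total_mean, 0.1805)
asv_per_tadpole = copies_per_tadpole(asv_per_ul, elution_volume_ul=50.0)  # assumed eluate
print(total_mean, total_sd, asv_per_ul, asv_per_tadpole)
```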

      Recommendations for the authors:

      Reviewer #1:

      (1) Figure 1b summarizes the ddPCR data as a binary (detected/not detected), but this contradicts the main text associated with this figure, which describes bacteria as present, albeit in low abundances, in unhatched embryos (lines 145-147). Could the authors keep the diagram of tadpole development, which I find very useful, but add the ddPCR data from Figure S1c instead of simply binarizing it as present/absent?

      We appreciate the reviewer’s positive feedback on the clarity of the figure. We agree that presenting the ddPCR data in a more quantitative manner provides a more accurate representation of bacterial abundance across developmental stages. In response, we have retained the developmental diagram, as suggested, and replaced the binary (detected/not detected) information in Figure 1B with rounded mean values for each stage. To complement this, we have included mean values and standard deviations in Table S1. The corresponding text in the main manuscript and legends has been revised accordingly to reflect these changes.  

      (2) More information about the foster species, Oophaga sylvatica, would be helpful. Are they sympatric with Rv? Is their transporting behavior similar to that of Rv?

      We thank the reviewer for this helpful comment. In response, we have added further details on the biology and parental care behavior of Oophaga sylvatica, including information on its distribution range. The species does not overlap with Ranitomeya variabilis at the specific study site where the field work was conducted, although the species are sympatric in other countries. These additions have been incorporated into the Methods section under "Study species, reproductive strategies, and life history."  

      (3) Plotting the proportion of each tadpole microbiome attributed to R. variabilis and the proportion attributed to O. sylvatica on the same plot is confusing, as these points are non-independent and there is no way for the reader to figure out which points originated from the same tadpole. I would suggest replacing Figure 1D with Figure S2C, which (if I understand correctly) displays the same data, but is separated according to source.

      We agree with the reviewer that Figure S2C allows for clearer interpretation of our results. In response, we implemented the suggested change and replaced Figure 1D with the alternative visualization previously shown in Figure S2C, which displays the same data separated by source. To provide readers with a complementary overview of the full dataset, we have retained the original combined plot in the supplementary material as Figure S2D.

      (4) On the first read, I found the use of "transport" in the cross-fostering experiment confusing until I understood that they weren't being transported "to" anywhere in particular, just carried for 6 hours. A change of phrasing might help readers here.

      We acknowledge the reviewer’s concern and have replaced “transported” with “carried” to avoid confusion for readers who may be unfamiliar with the behavioral terminology. However, because “transport” is the term widely used by specialists to describe this behavior, we now introduce it in the context of the experimental design with the following phrasing:

      “For this design, sequence-based surveys of amplified 16S rRNA genes were used to assess the composition of skin-associated microbial communities on tadpoles and their adult caregivers (i.e., the frogs carrying the tadpoles, typically referred to as ‘transporting’ frogs).”

      (5) "Horizontal transfer" typically refers to bacteria acquired from other hosts, not environmental source pools (line 394).

      We addressed this concern by rephrasing the sentence in the Discussion to avoid potential confusion. The revised text now reads:

      “Across species, newborns might acquire bacteria not only through transfer from environmental source pools and other hosts (…)”  

      (6) The authors suggest that tadpole transport may have evolved in Rv and Af to promote microbial diversity because "increased microbial diversity is linked to better health outcomes" (lines 477-479). It is often tempting to assume that more diversity is always better/more adaptive, but this is not universally true. The fact that the Ll frogs seem to be doing fine in the same environment despite their lower microbiome diversity suggests that this interpretation might be too far of a reach based on the data here.

      We appreciate the reviewer’s concern. We agree that increased microbial diversity is not inherently advantageous and have revised the paragraph to make this clearer.

      “While increased microbial diversity is not inherently advantageous, it has been associated with beneficial outcomes such as improved immune function, lower disease risk, and enhanced fitness in multiple other vertebrate systems.”

      However, rather than claiming that greater diversity is always advantageous, we suggest that this possibility should not be excluded and consider it a relevant aspect of a comprehensive discussion. We also note that whether poison frog tadpoles perform equally well with lower microbial diversity remains an open question. Drawing such conclusions would require experimental validation and cannot be inferred from comparisons with an evolutionarily distant species that differs in life history.

      Reviewer #2:

      (1) Figure 2: Are the data points in C a subset (just the tadpoles for each species) of B? The numbers look a little different between them. The number of observed ASVs in panel B for Rv look a bit higher than the observed ASVs in panel C.

      The data shown in panel C are indeed a subset of the samples presented in panel B, focusing specifically on tadpoles of each species. The slight differences in the number of observed ASVs between panels result from differences in rarefaction depth between comparisons: due to variation in sequencing depth across species and life stages, we performed rarefaction separately for each comparison in order to retain the highest number of taxa while ensuring comparability within each group. We acknowledge that this is not a standard approach. Results were consistent when rarefying across the full dataset, but we chose the presented approach to better accommodate variation in our sample structure. This methodological detail is described in the Methods section:

      “All alpha diversity analyses were conducted with datasets rarefied to 90% of the read number of the sample with the fewest reads in each comparison and visualized with boxplots.”

      It is also noted in the figure legend: “The dataset was separately rarefied to the lowest read depth of each comparison.” We hope this clarification adequately addresses the reviewer’s concern and therefore have not made additional changes.
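
      A minimal sketch of per-comparison rarefaction, as described in the quoted Methods sentence, may help explain why observed ASV counts differ between panels. It is not the authors' code: the function names and the ASV count tables are hypothetical, and subsampling without replacement is an assumed implementation of rarefaction.

```python
import random


def rarefaction_depth(samples, fraction=0.9):
    """Depth = fraction of the read total of the smallest sample in this comparison."""
    smallest = min(sum(counts.values()) for counts in samples.values())
    return int(fraction * smallest)


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample's ASV counts."""
    pool = [asv for asv, n in counts.items() for _ in range(n)]
    rarefied = {}
    for asv in rng.sample(pool, depth):
        rarefied[asv] = rarefied.get(asv, 0) + 1
    return rarefied


def observed_asvs(counts):
    """Number of ASVs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


# Hypothetical comparison with two samples of unequal sequencing depth.
comparison = {
    "tadpole_a": {"ASV1": 60, "ASV2": 40},
    "tadpole_b": {"ASV1": 150, "ASV3": 50},
}
rng = random.Random(0)  # fixed seed so the subsample is reproducible
depth = rarefaction_depth(comparison)
rarefied = {name: rarefy(counts, depth, rng) for name, counts in comparison.items()}
for name, counts in rarefied.items():
    print(name, sum(counts.values()), observed_asvs(counts))
```

      Because the depth is recomputed from the samples included in each comparison, the same tadpole sample can be rarefied to different depths in different panels, and rare ASVs may be retained in one comparison but lost in another.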

      (2) Lines 304-305: in the Figure 4B plot, there appear to be 12 transported tadpoles and 8 non-transported tadpoles.

      Thank you for catching this. We have corrected the plot and the associated statistics (alpha and beta diversity) in the results section as well as in the figure. Importantly, the correction did not affect any other results, and the overall findings and interpretations remain unchanged.  

      (3) Line 311: I think this should be Figure 4B.

      (4) Line 430: tadpole transport.

      (5) Line 431: I believe commas need to surround this phrase "which range from a few hours to several days depending on the species (Lötters et al., 2007; McDiarmid & Altig, 1999; Pašukonis et al., 2019)".

      We thank the reviewer for the thorough review and have corrected all typographical and formatting errors noted in comments (3) – (5).

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors): 

      One minor question would be whether the authors could expand more on the application of END-Seq to examine the processive steps of the ALT mechanism? Can they speculate if the ssDNA detected in ALT cells might be an intermediate generated during BIR (i.e., is the ssDNA displaced strand during BIR) or a lesion? Furthermore, have the authors assessed whether ssDNA lesions are due to the loss of ATRX or DAXX, either of which can be mutated in the ALT setting?

      We appreciate the reviewer’s insightful questions regarding the application of our assays to investigate the nature of the ssDNA detected in ALT telomeres. Our primary aim in this study was to establish the utility of END-seq and S1-END-seq in telomere biology and to demonstrate their applicability across both ALT-positive and -negative contexts. We agree that exploring the mechanistic origins of ssDNA would be highly informative, and we anticipate that END-seq–based approaches will be well suited for such future studies. However, it remains unclear whether the resolution of S1-END-seq is sufficient to capture transient intermediates such as those generated during BIR. We have now included a brief speculative statement in the revised discussion addressing the potential nature of ssDNA at telomeres in ALT cells.

      Reviewer #2 (Recommendations for the authors):

      How can we be sure that all telomeres are equally represented? The authors seem to assume that END-seq captures all chromosome ends equally, but can we be certain of this? While I do not see an obvious way to resolve this experimentally, I recommend discussing this potential bias more extensively in the manuscript.

      We thank the reviewer for raising this important point. END-seq and S1-END-seq are unbiased methods designed to capture either double-stranded or single-stranded DNA that can be converted into blunt-ended double-stranded DNA and ligated to a capture oligo. As such, if a subset of telomeres cannot be processed using this approach, it is possible that these telomeres may be underrepresented or lost. However, to our knowledge, there are no proposed telomeric structures that would prevent capture using this method. For example, even if a subset of telomeres possesses a 5′ overhang, it would still be captured by END-seq. Indeed, we observed the consistent presence of the 5′-ATC motif across multiple cell lines and species (human, mouse, and dog). More importantly, we detected predictable and significant changes in sequence composition when telomere ends were experimentally altered, either in vivo (via POT1 depletion) or in vitro (via T7 exonuclease treatment). Together, these findings support the robustness of the method in capturing a representative and dynamic view of telomeres across different systems.

      That said, we have now included a brief statement in the revised discussion acknowledging that we cannot fully exclude the possibility that a subset of telomeres may be missed due to unusual or uncharacterized structures.

      I believe Figures 1 and 2 should be merged.

      We appreciate the reviewer’s suggestion to merge Figures 1 and 2. However, we feel that keeping them as separate figures better preserves the logical flow of the manuscript and allows the validation of END-seq and its application to be presented with appropriate clarity and focus. We hope the reviewer agrees that this layout enhances the clarity and interpretability of the data.

      Scale bars should be added to all microscopy figures.

      We thank the reviewer for pointing this out. We have now added scale bars to all the microscopy panels in the figures and included the scale details in the figure legends.

      Reviewer #3 (Recommendations for the authors):

      Overall, the discussion section is lacking depth and should be expanded and a few additional experiments should be performed to clarify the results.

      We thank the reviewer for the suggestions. Based on this reviewer’s comments and comments from the other reviewers, we incorporated several points into the discussion. As a result, we hope that we provide additional depth to our conclusions.

      (1) The finding that the abundance of variant telomeric repeats (VTRs) within the final 30 nucleotides of the telomeric 5' ends is similar in both telomerase-expressing and ALT cells is intriguing, but the authors do not address this result. Could the authors provide more insight into this observation and suggest potential explanations? As the frequency of VTRs does not seem to be upregulated in POT1-depleted cells, what then drives the appearance of VTRs on the C-strand at the very end of telomeres? Is the CST-Polα complex responsible?

      The reviewer raises a very interesting and relevant point. We are hesitant at this point to speculate on why we do not see a difference in variant repeats in ALT versus non-ALT cells, since additional data would be needed. One possibility is that variant repeats in ALT cells accumulate stochastically within telomeres but are selected against when they are present at the terminal portion of chromosome ends. However, to prove this hypothesis, we would need error-free long-read technology combined with END-seq. We feel that developing this approach would be beyond the scope of this manuscript.

      (2) The authors also note that, in ALT cells, the frequency of VTRs in the first 30 nucleotides of the S1-END-SEQ reads is higher compared to END-SEQ, but this finding is not discussed either. Do the authors think that the presence of ssDNA regions is associated with the VTRs? Along this line, what is the frequency of VTRs in the END-SEQ analysis of TRF1-FokI-expressing ALT cells? Is it also increased? Has TRF1-FokI been applied to telomerase-expressing cells to compare VTR frequencies at internal sites between ALT and telomerase-expressing cells?

      Similarly to what is discussed above, short reads have the advantage of being very accurate but do not provide sufficient length to establish the relative frequency of VTRs across the whole telomere sequence. The TRF1-FokI experiment is a good suggestion, but it would still be biased toward non-variant repeats due to the TRF1-binding properties. We plan to address these questions in a future study involving long-read sequencing and END-seq capture of telomeres.

      Finally, in these experiments (S1-END-SEQ or END-SEQ in TRF1-Fok1), is the frequency of VTRs the same on both the C- and the G-rich strands? It is possible that the sequences are not fully complementary in regions where G4 structures form.

      We thank the reviewer for this observation. While we do observe a higher frequency of variant telomeric repeats (VTRs) in the first 30 nucleotides of S1-END-seq reads compared to END-seq in ALT cells, we are currently unable to determine whether this difference is significant, as an appropriate control or matched normalization strategy for this comparison is lacking. Therefore, we refrain from overinterpreting the biological relevance of this observation.

      (3) Based on the ratio of C-rich to G-rich reads in the S1-END-SEQ experiment, the authors estimate that ALT cells contain at least 3-5 ssDNA regions per chromosome end. While the calculation is understandable, this number could be discussed further to consider the possibility that the observed ratios (of roughly 0.5) might result from the presence of extrachromosomal DNA species, such as C-circles. The observed increase in the ratio of C-rich to G-rich reads in BLM-depleted cells supports this hypothesis, as BLM depletion suppresses C-circle formation in U2OS cells. To test this, the authors should examine the impact of POLD3 depletion on the C-rich/G-rich read ratio. Alternatively, they could separate high-molecular-weight (HMW) DNA from low-molecular-weight DNA in ALT cells and repeat the S1-END-SEQ in the HMW fraction.

      The reviewer is absolutely correct. Our calculation did not exclude the possibility of extrachromosomal DNA as a source of telomeric ssDNA. We have now addressed this point in our discussion.

      (4) What is the authors' perspective on the presence of ssDNA at ALT telomeres? Do they attribute this to replication stress? It would be helpful for the authors to repeat the S1-END-SEQ in telomerase-expressing cells with very long telomeres, such as HeLa1.3 cells, to determine if ssDNA is a specific feature of ALT cells or a result of replication stress. The increased abundance of G4 structures at telomeres in HeLa1.3 cells (as shown in J. Wong's lab) may indicate that replication stress is a factor. Similar to Wong's work, it would be valuable to compare the C-rich/G-rich read ratios in HeLa1.3 cells to those in ALT cells with similar telomeric DNA content.

      The reviewer is correct in pointing out that we still do not know what causes ssDNA at telomeres in ALT cells. Replication stress seems the most logical explanation based on the work of many labs in the field. However, our data did not reveal any significant difference in the levels of ssDNA at telomeres in non-ALT cells based on telomere length. We used the HeLa1.2.11 cell line (now clarified in the Materials section), which is the parental line of HeLa1.3 and has similarly long telomeres (~20 kb vs. ~23 kb). Despite their long telomeres and potential for replication-associated challenges such as G-quadruplex formation, HeLa1.2.11 cells did not exhibit the elevated levels of telomeric ssDNA that we observed in ALT cells (Figure 4B). Additional experiments are needed to map the occurrence of ssDNA at telomeres in relation to progression toward ALT.

      Finally, Reviewer #3 raises a list of minor points:

      (1) The Y-axes of Figure 4 have been relabeled to account for the G-strand reads.

      (2) Statistical analyses have been added to the figures where applicable.

      (3) The manuscript has been carefully proofread to improve clarity and consistency throughout the text and figure legends.

      (4) We have revised the text to address issues related to the lack of cross-referencing between the supplementary figures and their corresponding legends.

    1. Author Response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary: 

      Genome-wide association studies have been an important approach to identifying the genetic basis of human traits and diseases. Despite their successes, for many traits, a substantial amount of variation cannot be explained by genetic factors, indicating that environmental variation and individual 'noise' (stochastic differences as well as unaccounted for environmental variation) also play important roles. The authors' goal was to address whether gene expression variation in genetically identical individuals, driven by historical environmental differences and 'noise', could be used to predict reproductive trait differences. 

      Strengths: 

      To address this question, the authors took advantage of genetically identical C. elegans individuals to transcriptionally profile 180 adult hermaphrodite individuals that were also measured for two reproductive traits. A major strength of the paper is its experimental design. While experimenters aim to control the environment that each worm experiences, it is known that there are small differences that each worm experiences even when they are grown together on the same agar plate - e.g., the age of their mother, their temperature, the amount of food they eat, and the oxygen and carbon dioxide levels depending on where they roam on the plate. Instead of neglecting this unknown variation, the authors design the experiment up front to create two differences in the historical environment experienced by each worm: 1) the age of its mother and 2) an 8-hour temperature difference, either 20 or 25 °C. This helped the authors interpret the gene expression differences and trait expression differences that they observed.

      Using two statistical models, the authors measured the association of gene expression for 8824 genes with the two reproductive traits, considering both the level of expression and the historical environment experienced by each worm. Their data supports several conclusions. They convincingly show that gene expression differences are useful for predicting reproductive trait differences, predicting ~25-50% of the trait differences depending on the trait. Using RNAi, they also show that the genes they identify play a causal role in trait differences. Finally, they demonstrate an association with trait variation and the H3K27 trimethylation mark, suggesting that chromatin structure can be an important causal determinant of gene expression and trait variation. 

      Overall, this work supports the use of gene expression data as an important intermediate for understanding complex traits. This approach is also useful as a starting point for other labs in studying their trait of interest. 

      We thank the reviewer for their thorough articulation of the strengths of our study.  

      Weaknesses: 

      There are no major weaknesses that I have noted. Some important limitations of the work (that I believe the authors would agree with) are worth highlighting, however: 

      (1) A large remaining question in the field of complex traits remains in splitting the role of non-genetic factors between environmental variation and stochastic noise. It is still an open question which role each of these factors plays in controlling the gene expression differences they measured between the individual worms. 

      Yes, we agree that this is a major question in the field. In our study, we separate differences driven by known historical environmental factors from those driven by unknown factors, but the ‘unknown factors’ could encompass both unknown environmental factors and stochastic noise.

      (2) The ability of the authors to use gene expression to predict trait variation was strikingly different between the two traits they measured. For the early brood trait, 448 genes were statistically linked to the trait difference, while for egg-laying onset, only 11 genes were found. Similarly, the total R2 in the test set was ~50% vs. 25%. It is unclear why the differences occur, but this somewhat limits the generalizability of this approach to other traits. 

      We agree that the difference in predictability between the two traits is interesting. A previous study from the Phillips lab measured developmental rate and fertility across Caenorhabditis species and parsed sources of variation (1). Results indicated that 83.3% of variation in developmental rate was explained by genetic variation, while only 4.8% was explained by individual variation. In contrast, for fertility, 63.3% of variation was driven by genetic variation and 23.3% was explained by individual variation. Our results, of course, focus only on predicting the individual differences, but not genetic differences, for these two traits using gene expression data. Considering both sets of results, one hypothesis is that we have more power to explain nongenetic phenotypic differences with molecular data if the trait is less heritable, which is something that could be formally interrogated with more traits across more strains.

      (3) For technical reasons, this approach was limited to whole worm transcription. The role of tissue and cell-type expression differences is important to the field, so this limitation is important.

      We agree with this assessment, and it is something we hope to address with future work.

      Reviewer #2 (Public review): 

      Summary: 

      This paper measures associations between RNA transcript levels and important reproductive traits in the model organism C. elegans. The authors go beyond determining which gene expression differences underlie reproductive traits, but also (1) build a model that predicts these traits based on gene expression and (2) perform experiments to confirm that some transcript levels indeed affect reproductive traits. The clever study design allows the authors to determine which transcript levels impact reproductive traits, and also which transcriptional differences are driven by stochastic vs environmental differences. In sum, this is a rather comprehensive study that highlights the power of gene expression as a driver of phenotype, and also teases apart the various factors that affect the expression levels of important genes. 

      Strengths: 

      Overall, this study has many strengths, is very clearly communicated, and has no substantial weaknesses that I can point to. One question that emerges for me is about the extent to which these findings apply broadly. In other words, I wonder whether gene expression levels are predictive of other phenotypes in other organisms. I think this question has largely been explored in microbes, where some studies (PMID: 17959824) but not others (PMID: 38895328) find that differences in gene expression are predictive of phenotypes like growth rate. Microbes are not the primary focus here, and instead, the discussion is mainly focused on using gene expression to predict health and disease phenotypes in humans. This feels a little complicated since humans have so many different tissues. Perhaps an area where this approach might be useful is in examining infectious single-cell populations (bacteria, tumors, fungi). But I suppose this idea might still work in humans, assuming the authors are thinking about targeting specific tissues for RNAseq.

      In sum, this is a great paper that really got me thinking about the predictive power of gene expression and where/when it could inform about (health-related) phenotypes. 

      We thank the reviewer for recognizing the strengths of our study. We are also interested in determining the extent to which predictive gene expression differences operate in specific tissues.

      Reviewer #3 (Public review): 

      Summary: 

      Webster et al. sought to understand if phenotypic variation in the absence of genetic variation can be predicted by variation in gene expression. To this end, they quantified two reproductive traits, the onset of egg laying and early brood size, in cohorts of genetically identical nematodes exposed to alternative ancestral (two maternal ages) and same-generation life histories (either constant 20 °C temperature or an 8-hour temperature shift to 25 °C upon hatching) in a two-factor design; then they profiled genome-wide gene expression in each individual.

      Using multiple statistical and machine learning approaches, they showed that, at least for early brood size, phenotypic variation can be quite well predicted by molecular variation, beyond what can be predicted by life history alone. 

      Moreover, they provide some evidence that expression variation in some genes might be causally linked to phenotypic variation. 

      Strengths: 

      (1) Cleverly designed and carefully performed experiments that provide high-quality datasets useful for the community. 

      (2) Good evidence that phenotypic variation can be predicted by molecular variation. 

      We thank the reviewer for recognizing the strengths of our study.

      Weaknesses:  

      What drives the molecular variation that impacts phenotypic variation remains unknown. While the authors show that variation in expression of some genes might indeed be causal, it is still not clear how much of the molecular variation is a cause rather than a consequence of phenotypic variation. 

      We agree that the drivers of molecular variation remain unknown. While we addressed one potential candidate (histone modifications), there is much to be done in this area of research. We agree that, while some gene expression differences cause phenotypic changes, other gene expression differences could in principle be downstream of phenotypic differences.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      I have a number of suggestions that I believe will improve the Methods section. 

      (1) Strain N2-PD1073 will probably be confusing to some readers. I recommend spelling out that this is the Phillips lab version of N2.

      Thank you for this suggestion; we have added additional explanation of this strain in the Methods.

      (2) I found the details of the experimental design confusing, and I believe a supplemental figure will help. I have listed the following points that could be clarified: 

      a. What were the biological replicates? How many worms per replicate?

      Biological replicates were defined as experiments set up on different days (in this case, all biological replicates were at least a week apart), and the biological replicate of each worm can be found in Supplementary File 1 on the Phenotypic Data tab.

      b. I believe that embryos and L4s were picked to create different aged P0s, and eggs and L4s were picked to separate plates? Is this correct?

      Yes, this is correct.

      c. What was the spread in the embryo age?

      We assume this is asking about the age of the F1 embryos, and these were laid over the course of a 2-hour window.  

      d. While the age of the parents is different, there are also features about their growth plates that will be impacted by the experimental design. For example, their pheromone exposure is different due to the role that age plays in the combination of ascarosides that are released. It is worth noting as my reading of the paper makes it seem that parental age is the only thing that matters.

      The parents (P0) of different ages likely have differential ascaroside exposure because they are in the vicinity of other similarly aged worms, but the F1 progeny were exposed to their parents for only the 2-hour egg-laying window, in an attempt to minimize this type of effect as much as possible.  

      e. Were incubators used for each temperature?

      Yes.

      f. In line 443, why approximately for the 18 hours? How much spread?

      The approximation was based on the time interval between the 2-hour egg-laying window on Day 4 and the temperature shift on Day 5 the following morning. The timing was within 30 minutes of 18 hours in either direction.

      g.  In line 444, "continually left" is confusing. Does this mean left in the original incubator?

      Yes, this means left in the incubator while the worms shifted to 25°C were moved. To avoid confusion, we re-worded this to state they “remained at 20°C while the other half were shifted to 25°C”.

      h. In line 445, "all worms remained at 20 °C" was confusing to me as to what it indicated. I assume, unless otherwise noted, the animals would not be moved to a new temperature.

      This was an attempt to avoid confusion and emphasize that all worms were experiencing the same conditions for this part of the experiment.  

      i. What size plates were the worms singled onto?

      They were singled onto 6-cm plates.

      j. If a figure were to be made, having two timelines (with respect to the P0 and F1) might be useful.

      We believe the methods should be sufficient for someone who hopes to repeat the experiment, and we believe the schematic in Figure 1A labeling P0 and F1 generations is sufficient to illustrate the key features of the experimental design.

      k. Not all eggs that are laid end up hatching. Are these censored from the number of progeny calculations?

      Yes, only progeny that hatched and developed were counted for early brood.

      (3) For the lysis, was the second transfer to dH2O also a wash step?

      Yes.

      (4) What was used for the Elution buffer?

      We used elution buffer consisting of 10 mM Tris, 0.1 mM EDTA. We have added this to the “Cell lysate generation” section of the methods.

      (5) The company that produced the KAPA mRNA-seq prep kit should be listed.

      We added that the kit was from Roche Sequencing Solutions.

      (6) For the GO analysis - one potential issue is that the set of 8824 genes might also be restricted to specific GO categories. Was this controlled for?

      We originally did not explicitly control for this and used the default enrichGO settings with OrgDB = org.Ce.eg.db as the background set for C. elegans. We have now repeated the analysis with the “universe” set to the 8824-gene background set. This did not qualitatively change the significant GO terms, though some have slightly higher or lower p-values. For comparison purposes, we have added the background-corrected sets to the GO_Terms tab of Supplementary File 1 with each of the three main gene groups appended with “BackgroundOf8824”.
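
      To illustrate why the choice of background matters, the over-representation test behind a GO analysis can be sketched as a hypergeometric tail probability. This is a minimal sketch under stated assumptions, not the enrichGO implementation: the function name is hypothetical, and the term sizes, the overlap of 15 genes, and the 20,000-gene whole-genome background are illustrative values. Only the 8824-gene background and the 448 early brood genes are taken from the text.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, term_in_universe, hits_size, term_in_hits):
    """Probability of seeing at least `term_in_hits` annotated genes when
    `hits_size` genes are drawn without replacement from a universe that
    contains `term_in_universe` annotated genes."""
    upper = min(term_in_universe, hits_size)
    total = comb(universe_size, hits_size)
    tail = sum(
        comb(term_in_universe, k)
        * comb(universe_size - term_in_universe, hits_size - k)
        for k in range(term_in_hits, upper + 1)
    )
    return tail / total


# Hypothetical GO term: 15 of the 448 trait-associated genes carry it.
# Whole-genome background (assumed 20,000 genes, 120 annotated to the term).
p_genome = hypergeom_enrichment_p(20000, 120, 448, 15)
# Restricted background: the 8824 expressed genes (assumed 100 annotated).
p_restricted = hypergeom_enrichment_p(8824, 100, 448, 15)

print(f"whole-genome background: p = {p_genome:.3g}")
print(f"8824-gene background:    p = {p_restricted:.3g}")
```

      In this toy case the restricted background gives a larger p-value because the term is relatively more common among expressed genes, which is the kind of shift a background correction is meant to capture.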

      Reviewer #2 (Recommendations for the authors): 

      (1) The abstract, introduction, and experimental design are well thought through and very clear.

      Thank you.

      (2) Figure 1B could use a clearer or more intuitive label on the horizontal axis. The two examples help. Maybe the genes (points) on the left side should be blue to match Figure 1C, where the genes with a negative correlation are in the blue cluster.

      Thank you for these suggestions. We re-labeled the x-axis as “Slope of early brood vs. gene expression (normalized by CPM)”, which we hope gives readers a better intuition of what the coefficient from the model is measuring. We also re-colored the points previously colored red in Figure 1B to be color-coded depending on the direction of association to match Figure 1C, so these points are now color-coded as pink and purple.  

      (3) If red/blue are pos/neg correlated genes in 1C, perhaps different colors should be used to label ELO and brood in Figures 2 and 3. Green/purple?

      We appreciate this point, but since we ended up using the cluster colors of pink and purple in Figure 1, we opted to leave Figures 2 and 3 alone with the early brood and ELO color-coding of red and blue.

      (4) I am unfamiliar with this type of beta value, but I thought the explanation and figure were very clear. It could be helpful to bold beta1 and beta2 in the top panels of Figure 2, so the readers are not searching around for those among all the other betas. It could also be helpful to add an English phrase to the vertical axes in Figures 2C and 2D, in addition to the beta1 and beta2. Something like "overall effect (beta1)" and "environment-controlled effect (beta2)". Or maybe "effect of environment + stochastic expression differences (beta1)" and "effect of stochastic expression differences alone (beta2)". I guess those are probably too big to fit on the figure, but it might be nice to have a label somewhere on this figure connecting them to the key thing you are trying to measure - the effect of gene expression and environment.

      Thank you for these suggestions. We increased the font sizes and bolded β1 and β2 in Figure 2A-B. In Figure 2C-D, we added a parenthetical under β1 to say “(env + noise)” and β2 to say “(noise)”. We agree that this should give the reader more intuition about what the β values are measuring.  

      Reviewer #3 (Recommendations for the authors): 

      The authors collected individuals 24 hours after the onset of egg laying for transcriptomic profiling. This is a well-designed experiment to control for the physiological age of the germline. However, this does not properly control for somatic physiological age. Somatic age can be partially uncoupled from germline age across individuals, and indeed, this can be due to differences in maternal age (Perez et al., 2017). This is because maternal age is associated with increased pheromone exposure (unless you properly controlled for it by moving worms to fresh plates), which causes a germline-specific developmental delay in the progeny, resulting in a delayed onset of egg production compared to somatic development (Perez et al., 2021). You control for germline age; therefore, it is likely that the progeny of day 1 mothers are actually somatically older than the progeny of day 3 mothers. This would predict that many genes identified in these analyses might just be somatic genes that increase or decrease their expression during the young adult stage.

      For example, the abundance of collagen genes among the genes negatively associated (including col-20, which is the gene most significantly associated with early brood) is a big red flag, as collagen genes are known to be changing dynamically with age. If variation in somatic vs germline age is indeed what is driving the expression variation of these genes, then the expectation is that their expression should decrease with age. Vice versa, genes positively associated with early brood that are simply explained by age should be increasing. So I would suggest that the authors first check this using time series transcriptomic data covering the young adult stage they profiled. If this is indeed the case, I would then suggest using RAPToR (https://github.com/LBMC/RAPToR), a method that, using reference time series data, can estimate physiological age (including tissue-specific age) from gene expression. Using this method, they can estimate the somatic physiological age of their samples, quantify the extent of variation in somatic age across individuals, quantify how much of the observed differences in expression are explained just by differences in somatic age, and correct for them during their transcriptomic analysis using the estimated soma age as a covariate (https://github.com/LBMC/RAPToR/blob/master/vignettes/RAPToR-DEcorrection-pdf.pdf).

      This should help enrich a molecular variation that is not simply driven by hidden differences between somatic and germline age. 

      To first address some of the experimental details mentioned for our paper, parents were indeed moved to fresh plates where they were allowed to lay embryos for two hours and then removed. Thus, we believe this minimizes the effects of ascarosides as much as possible within our design. As shown in the paper, we also identified genes that were not driven by parental age and, for all genes, quantified the extent to which each gene’s association was driven by parental age. Thus, it is unlikely that differences in somatic and germline age are the sole explanatory factor, even if they play some role. We also note that we accounted for egg-laying onset timing in our experimental design, and early brood was calculated as the number of progeny laid in the first 24 hours of egg-laying, where egg-laying onset was scored for each individual worm to the hour. The plot of each worm’s ELO and early brood traits is in Figure S1. Nonetheless, we read the RAPToR paper with interest, as we highlighted in the paper that germline genes tend to be positively associated with early brood while somatic genes tend to be negatively associated. While the RAPToR paper discusses using tissue-specific gene sets to stage genetically diverse C. elegans RILs, the RAPToR reference itself was not built using gene expression data acquired from different C. elegans tissues and is based on whole worms, typically collected in bulk. That is, age estimates in RILs differ depending on whether germline or somatic gene sets are used to estimate age when the aging clock is based on N2 samples. Thus, it is unclear whether such an approach would work similarly to estimate age in single-worm N2 samples. In addition, from what we can tell, the RAPToR R package appears to implement the overall age estimate, rather than using the tissue-specific gene sets used for RILs in the paper. Because RAPToR would be estimating the overall age of our samples using a reference that is based on fewer samples than we collected here, and because we already know the overall age of our samples measured using standard approaches, we believe that estimating the age with the package would not give very much additional insight.

      Bonferroni correction: 

      First, I think there is some confusion in how the authors report their p-values: I don't think the authors are using a cut-off of Bonferroni corrected p-value of 5.7 × 10⁻⁶ (it wouldn't make sense). It's more likely that they are using a Bonferroni corrected p of 0.05 or 0.1, which corresponds to a nominal p value of 5.7 × 10⁻⁶, am I right?

      Yes, we used a nominal p-value of 5.7 × 10⁻⁶ to correspond to a Bonferroni-corrected p-value of 0.05, calculated as 0.05/8824. We have re-worded this wherever Bonferroni correction was mentioned.

      Second, Bonferroni is an overly stringent correction method that has now been substituted by the more powerful Benjamini-Hochberg method to control the false discovery rate. Using this might help find more genes and better characterize the molecular variation, especially the one associated with ELO?

      We agree that Bonferroni is quite stringent and because we were focused on identifying true positives, we may have some false negatives. Because all nominal p-values are included in the supplement, it is straightforward for an interested reader to search the data to determine if a gene is significant at any other threshold.   
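
      For readers who wish to re-threshold the nominal p-values themselves, the two procedures discussed here can be sketched as follows. This is a minimal sketch rather than the analysis code used in the paper: the function names are hypothetical and the eight p-values are made-up toy numbers. Only the 0.05 level and the 8824 tests come from the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Nominal p-value cut-off that corresponds to a family-wise alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of the hypotheses rejected by the
    Benjamini-Hochberg step-up procedure at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k with p_(k) <= (k / m) * fdr.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Threshold described in the response: 0.05 / 8824.
threshold = bonferroni_threshold(0.05, 8824)
print(f"{threshold:.2e}")

# Toy p-values (hypothetical) showing that BH can reject more tests.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
toy_cutoff = bonferroni_threshold(0.05, len(toy_p))
bonferroni_hits = [i for i, p in enumerate(toy_p) if p <= toy_cutoff]
bh_hits = benjamini_hochberg(toy_p, fdr=0.05)
print(bonferroni_hits, bh_hits)
```

      With these toy values Bonferroni retains one test and Benjamini-Hochberg retains two, which illustrates the trade-off between false positives and false negatives described above.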

      Minor comments: 

      (1) "In our experiment, isogenic adult worms in a common environment (with distinct historical environments) exhibited a range of both ELO and early brood trait values (Fig S1A)" I think this sentence and the figure are not really needed; Figure S1B is already enough to show the range of the phenotypes and how much variation is driven by the life history traits.

      We agree that the information in S1A is also included in S1B, but we think it is a little more straightforward if one is primarily interested in viewing the distribution for a single trait.

      (2) Line 105 It should be Figure S2, not S3.

      Thank you for catching this mistake.

      (3) Gene Ontology on positive and negatively associated genes together: what about splitting the positive and negative?

      We have added a split of positive and negative GO terms to the GO_Terms tab of Supplementary File 1. Broadly speaking, the most enriched positively associated genes have many of the same GO terms found on the combined list that are germline related (e.g., involved in oogenesis and gamete generation), whereas the most enriched negatively associated genes have GO terms found on the combined list that are related to somatic tissues (e.g., actin cytoskeleton organization, muscle cell development). This is consistent with the pattern we see for somatic and germline genes shown in Figure 4.

      (4) A lot of muscle-related GOs, can you elaborate on that?

      Yes, there are several muscle-related GOs in addition to germline and epidermis. While we do not know exactly why from a mechanistic perspective these muscle-related terms are enriched, it may be important to note that many of these terms have highly overlapping sets of genes which are listed in Supplementary File 1. For example, “muscle system process” and “muscle contraction” have the exact same set of 15 genes causing the term to be significantly enriched. Thus, we tend to not interpret having many GO terms on a given tissue as indicating that the tissue is more important than others for a given biological process. While it is clear there are genes related to muscle that are associated with early brood, it is not yet clear that the tissue is more important than others.  

      (5) "consistent with maternal age affecting mitochondrial gene expression in progeny " - has this been previously reported?

      We do not believe this particular observation has been reported. It is important to note that these genes are involved in mitochondrial processes, but are expressed from the nuclear rather than mitochondrial genome. We re-worded the quoted portion of the sentence to say “consistent with parental age affecting mitochondria-related gene expression in progeny”.

      (6) PCA: "Therefore, the optimal number of PCs occurs at the inflection points of the graph, which is after only 7 PCs for early brood (R2 of 0.55) but 28 PCs for ELO (R2 of 0.56)." 

      Not clear how this is determined: just graphically? If yes, there are several inflection points in the plot. How did you choose which one to consider? Also, a smaller component is not necessarily less predictive of phenotypic variation (as you can see from the graph), so instead of subsequently adding components based on the variance they explain in the transcriptomic data, you might add them based on the variance they explain in the phenotypic data? To this end, have you tried partial least squares regression instead of PCA? This should give gene expression components that are ranked based on how much phenotypic variance they explain.

      Thank you for this thoughtful comment. We agree that, unlike for Figure 3B, there is some interpretation involved on how many PCs is optimal because additional variance explained with each PC is not strictly decreasing beyond a certain number of PCs. Our assessment was therefore made both graphically and by looking at the additional variance explained with each additional PC. For example, for early brood, there was no PC after PC7 that added more than 0.04 to the R2. We could also have plotted early brood and ELO separately and had a different ordering of PCs on the x-axis. By plotting the data this way, we emphasized that the factors that explain the most variation in the gene expression data typically explain most variation in the phenotypic data.  
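
      The incremental-gain criterion described above can be written down explicitly. The sketch below is illustrative only: the function name is hypothetical and the cumulative R2 values are invented to mimic the reported pattern (an R2 of 0.55 at 7 PCs and no later PC adding more than 0.04); they are not the values plotted in the paper.

```python
def last_informative_pc(cumulative_r2, min_gain=0.04):
    """Return the number of PCs after which no additional PC raises the
    cumulative R2 by more than `min_gain` (0 if none ever does)."""
    last = 0
    previous = 0.0
    for n_pcs, r2 in enumerate(cumulative_r2, start=1):
        if r2 - previous > min_gain:
            last = n_pcs
        previous = r2
    return last


# Hypothetical cumulative R2 after adding PC1, PC2, ... in order.
toy_r2 = [0.10, 0.18, 0.27, 0.33, 0.41, 0.46, 0.55, 0.56, 0.57, 0.59]
print(last_informative_pc(toy_r2))
```

      Because the rule looks at the gain from each added PC rather than at a single inflection point, it still returns a well-defined answer when the gains are not strictly decreasing.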

      (7) The fact that there are 7 PC of molecular variation that explain early brood is interesting. I think the authors can analyze this further. For example, could you perform separate GO enrichment for each component that explains a sizable amount of phenotypic variance? Same for the ELO.  

      Because each gene has a loading for each PC, and each PC lacks the explanatory power of combined PCs, we believe doing GO Terms on the list of genes that contribute most to each PC is of minimal utility. The power of the PCA prediction approach is that it uses the entire transcriptome, but the other side of the coin is that it is perhaps less useful to do a gene-by-gene based analysis with PCA. This is why we separately performed individual gene associations and 10-gene predictive analyses. However, we have added the PC loadings for all genes and all PCs to Supplementary File 1.

      (8) Avoid acronyms when possible (i.e. ELO in figures and figure legends could be spelled out to improve readability).

      We appreciate this point, but because we introduced the acronym both in Figure 1 and the text and use it frequently, we believe the reader will understand this acronym. Because it is sometimes needed (especially in dense figures), we think it is best to use it consistently throughout the paper.

      (9) Multiple regression: I see the most selected gene is col-20, which is also the most significantly differentially expressed from the linear mixed model (LMM). But what is the overlap between the top 300 genes in Figure 3F and the 448 identified by the LMM? And how much is the overlap in GO enrichment?

      Genes that showed up in at least 4 out of 500 iterations were selected more often than expected by chance, which includes 246 genes (as indicated by the red line in Figure 3F). Of these genes, 66 genes (27%) are found in the set of 448 early brood genes. The proportion of overlap increases as the number of iterations required to consider a gene predictive increases, e.g., 34% of genes found in 5 of 500 iterations and 59% of genes found in 10 of 500 iterations overlap with the 448 early brood genes. However, likely because of the approach to identify groups of 10 genes that are predictive, we do not find significant GO terms among the 246 genes identified with this approach after multiple test correction. We think this makes sense because the LMM identifies genes that are individually associated with early brood, whereas each subsequent gene included in multiple regression affects early brood after controlling for all previous genes. These additional genes added to the multiple regression are unlikely to have similar patterns as genes that are individually correlated with early brood.  
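
      The selection-frequency threshold and the overlap percentages quoted above can be sketched as follows. This is a minimal illustration, not the analysis code used for the paper; the function names and the toy gene identifiers are hypothetical.

```python
from collections import Counter


def frequently_selected(iterations, min_count):
    """Return genes that appear in at least min_count iterations.
    Each iteration is an iterable of gene names (e.g. one 10-gene model)."""
    counts = Counter(gene for selected in iterations for gene in set(selected))
    return {gene for gene, count in counts.items() if count >= min_count}


def overlap_fraction(selected, reference):
    """Fraction of the selected genes that are also in the reference set."""
    if not selected:
        return 0.0
    return len(selected & reference) / len(selected)


# Toy example with hypothetical gene names.
toy_iterations = [
    ["gene_a", "gene_b"],
    ["gene_a", "gene_c"],
    ["gene_a", "gene_b"],
    ["gene_d"],
]
robust = frequently_selected(toy_iterations, min_count=2)
print(sorted(robust))
print(overlap_fraction(robust, {"gene_a", "gene_x"}))

# Arithmetic check of the percentage quoted in the response:
# 66 of the 246 frequently selected genes are in the set of 448.
print(round(100 * 66 / 246))
```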

      (10) Elastic nets: prediction power is similar or better than multiple regression, but what is the overlap between genes selected by the elastic net (not presented if I am not mistaken) and multiple regression and the linear mixed model?

      For the elastic net models, we used a leave-one-out cross validation approach, meaning there were separate models fit by leaving out the trait data for each worm, training a model using the trait data and transcriptomic data for the other worms, and using the transcriptomic data of the remaining worm to predict the trait data. By repeating this for each worm, the regressions shown in the paper were obtained. Each of these models therefore has its own set of genes. Of the 180 models for early brood, the median model selects 83 genes (range from 72 to 114 genes). Across all models, 217 genes were selected at least once. Interestingly, there was a clear bimodal distribution in terms of how many models a given gene was selected for: 68 genes were selected in over 160 out of 180 models, while 114 genes were selected in fewer than 20 models (and 45 genes were selected only once). Therefore, we consider the set of 68 genes as highly robustly selected, since they were selected in the vast majority of models. This set of 68 exhibits substantial overlap with both the set of 448 early brood-associated genes (43 genes or 63% overlap) and the multiple regression set of 246 genes (54 genes or 79% overlap). For ELO, the median model selected 136 genes (range of 96 to 249 genes) and a total of 514 genes were selected at least once. The distribution for ELO was also bimodal with 78 genes selected over 160 times and 255 genes selected fewer than 20 times. This set of 78 included 6 of the 11 significant ELO genes identified in the LMM.  We have added tabs to Supplementary File 1 that include the list of genes selected for the elastic net models as well as a count of how many times they were selected out of 180 models.
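
      The leave-one-out procedure and the per-gene selection counts described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the implementation used for the paper: the elastic net is fit by plain coordinate descent, the penalty settings (alpha, l1_ratio) are hypothetical, and the toy data and gene names are invented.

```python
from collections import Counter


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_sweeps=200):
    """Coordinate descent for
    (1/2n)*||y - b - Xw||^2 + alpha*l1_ratio*|w|_1 + 0.5*alpha*(1-l1_ratio)*|w|^2.
    Returns (weights, intercept)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    columns = [[row[j] - x_mean[j] for row in X] for j in range(p)]
    resid = [value - y_mean for value in y]
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # constant feature carries no information
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new_w - w[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                w[j] = new_w
    intercept = y_mean - sum(wj * m for wj, m in zip(w, x_mean))
    return w, intercept


def leave_one_out(X, y, genes, **fit_kwargs):
    """Fit one model per held-out sample; return the held-out predictions and
    a count of how many models gave each gene a nonzero weight."""
    predictions, counts = [], Counter()
    for held in range(len(X)):
        w, b = fit_elastic_net(X[:held] + X[held + 1:], y[:held] + y[held + 1:], **fit_kwargs)
        counts.update(g for g, wj in zip(genes, w) if wj != 0.0)
        predictions.append(b + sum(wj * xj for wj, xj in zip(w, X[held])))
    return predictions, counts


# Toy data (hypothetical): the trait depends only on the first "gene".
toy_genes = ["gene_signal", "gene_noise"]
toy_X = [[0.0, 1.0], [1.0, -1.0], [2.0, -1.0], [3.0, 1.0],
         [4.0, 1.0], [5.0, -1.0], [6.0, -1.0], [7.0, 1.0]]
toy_y = [2.0 * row[0] for row in toy_X]
toy_predictions, toy_counts = leave_one_out(toy_X, toy_y, toy_genes)
print(toy_counts)
```

      Counting selections across the held-out models, as in toy_counts, is what allows robustly selected genes to be separated from genes chosen in only a few models.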

      (11) In other words, do these different approaches yield similar sets of genes, or are there some differences? In the end, which approach is actually giving the best predictive power?

      From the perspective of R2, both the multiple regression and elastic net models are similarly predictive for early brood, but elastic net is more predictive for ELO. However, in presenting multiple approaches, part of our goal was identifying predictive genes that could be considered the ‘best’ in different contexts. The multiple regression was set to identify exactly 10 genes, whereas the elastic net model determined the optimal number of genes to include, which was always over 70 genes. Thus, the elastic net model is likely better if one has gene expression data for the entire transcriptome, whereas the multiple regression genes are likely more useful if one were to use reporters or qRT-PCR to measure a more limited number of genes.

      (12) Line 252: "Within this curated set, genes causally affected early brood in 5 of 7 cases compared to empty vector (Figure 4A)." It seems to me 4 out of 7 from Figure 4A.

      In Figure 4A the five genes are (1) cin-4, (2) puf-5; puf-7, (3) eef-1A.2, (4) C34C12.8, and (5) tir-1. We did not count nex-2 (p = 0.10) or gly-13 (p = 0.07), and empty vector is the control.

      (13) Do puf-5 and -7 affect total brood size or only early brood size? Not clear. What's the effect of single puf-5 and puf-7 RNAi on brood?

      We only measured early brood in this paper, but a previous report found that puf-5 and puf-7 act redundantly to affect oogenesis, and RNAi is only effective if both are knocked down together (2). We performed pilot experiments to confirm that this was the case in our hands as well.

      (14)  To truly understand if the noise in expression of puf-5 and/or puf-7 really causes some of the observed difference in early brood, could the authors use a reporter and dose-response RNAi to reduce the level of puf-5/7 to match the lower physiological noise range and observe if the magnitude of the reduction of early brood by the right amount of RNAi indeed matches the observed physiological "noise" effect of puf-5/7 on early brood?

      We agree that it would be interesting to do the dose response of RNAi, measure early brood, and get a readout of mRNA levels to determine the true extent of gene knockdown in each worm (since RNAi can be noisy) and whether this corresponds to early brood when the knockdown is at physiological levels. While we believe we have shown that a dose response of gene knockdown results in a dose response of early brood, this additional analysis would be of interest for future experiments.

      (15) Regulated soma genes (enriched in H3K27me3) are negatively correlated with early brood. What would be the mechanism there? As mentioned before, it is more likely that these genes are just indicative of variation in somatic vs germline age (maybe due to latent differences in parental perception of pheromone).

      We can think of a few potential mechanisms/explanations, but at this point we do not have a decisive answer. Regulated somatic genes marked with H3K27me3 (facultative heterochromatin) are expressed in particular tissues and/or at particular times in development. In this study and others, genes marked with H3K27me3 exhibit more gene expression noise than genes with other marks. This could suggest that there are negative consequences for the animal if genes are expressed at higher levels at the wrong time or place, and one interpretation of the negative association is that higher expression of somatic genes results in lower fitness (where early brood is a proxy for fitness). Another related interpretation is that there are tradeoffs between somatic and germline development and each individual animal lands somewhere on a continuum between prioritizing germline or somatic development, where prioritizing somatic integrity (e.g. higher expression of somatic genes) comes at a cost to the germline, resulting in fewer progeny. Additional experiments, including measurements of histone marks in worms measured for the early brood trait, would likely be required to more decisively answer this question.

      (16) Line 151: "Among significant genes for both traits, β2 values were consistently lower than β1 (Figures 2CD), suggesting some of the total effect size was driven by environmental history rather than pure noise".

      We are interpreting this quote as part of point 17 below.

      (17) It looks like most of the genes associated with phenotypes from the univariate model have a decreased effect once you account for life history, but have you checked for cases where the life history actually masks the effect of a gene? In other words, do you have cases where the effect of gene expression on a phenotype is only (or more) significant after you account for the effect of life history (β2 values higher than β1)?

      This is a good question and one that we did not explicitly address in the paper because we focused on beta values for genes that were significant in the univariate analysis. Indeed, for the sets of 448 early brood genes and 11 ELO genes, there are no genes for which β2 is larger than β1. In looking at the larger dataset of 8824 genes, with a Bonferroni-corrected p-value of 0.05, there are 306 genes with a significant β2 for early brood. The majority (157 genes) overlap with the 448 genes significant in the univariate analysis and do not have a higher β2 than β1. Of the remaining genes, 72 have a larger β2 than β1. However, in most cases, this difference is relatively small (median difference of 0.025) and likely insignificant. There are only three genes in which β1 is not nominally significant, and these are the three genes with the largest difference between β1 and β2 with β2 being larger (differences of 0.166, 0.155, and 0.12). In contrast, the median difference between β1 and β2 for the 448 genes (in which β1 is larger) is 0.17, highlighting that the most extreme examples of β2 > β1 are smaller in magnitude than the typical case of β1 > β2. For ELO, there are no notable cases where β2 > β1. There are eight genes with a significant β2 value, and all of these have a β1 value that is nominally significant. Therefore, while this phenomenon does occur, we find it to be relatively rare overall. For completeness, we have added the β1 and β2 values for all 8824 genes as a tab in Supplementary File 1.

    1. Author Response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary: 

      The authors address a fundamental question for cell and tissue biology using the skin epidermis as a paradigm and ask how stratifying self-renewing epithelia induce differentiation and upward migration in basal dividing progenitor cells to generate suprabasal barrier-forming cells that are essential for a functional barrier formed by such an epithelium. The authors show for the first time that an increase in intracellular actomyosin contractility, a hallmark of barrier-forming keratinocytes, is sufficient to trigger terminal differentiation. Hence the data provide in vivo evidence of the more general interdependency of cell mechanics and differentiation. The data appear to be of high quality and the evidences are strengthened through a combination of different genetic mouse models, RNA sequencing, and immunofluorescence analysis. 

      To generate and maintain the multilayered, barrier-forming epidermis, keratinocytes of the basal stem cell layer differentiate and move suprabasally accompanied by stepwise changes not only in gene expression but also in cell morphology, mechanics, and cell position. Whether any of these changes is instructive for differentiation itself and whether consecutive changes in differentiation are required remains unclear. Also, there are few comprehensive data sets on the exact changes in gene expression between different states of keratinocyte differentiation. In this study, through genetic fluorescence labeling of cell states at different developmental time points the authors were able to analyze gene expression of basal stem cells and suprabasal differentiated cells at two different stages of maturation: E14 (embryonic day 14) when the epidermis comprises mostly two functional compartments (basal stem cells and suprabasal so-called intermediate cells) and E16 when the epidermis comprises three (living) compartments where the spinous layer separates basal stem cells from the barrier-forming granular layer, as is the case in adult epidermis. Using RNA bulk sequencing, the authors developed useful new markers for suprabasal stages of differentiation like MafB and Cox1. The transcription factor MafB was then shown to inhibit suprabasal proliferation in a MafB transgenic model.

      The data indicate that early in development at E14 the suprabasal intermediate cells resemble in terms of RNA expression, the barrier-forming granular layer at E16, suggesting that keratinocytes can undergo either stepwise (E16) or more direct (E14) terminal differentiation. 

      Previous studies by several groups found an increased actomyosin contractility in the barrier-forming granular layer and showed that this increase in tension is important for epidermal barrier formation and function. However, it was not clear whether contractility itself serves as an instructive signal for differentiation. To address this question, the authors use a previously published model to induce premature hypercontractility in the spinous layer by using spastin overexpression (K10-Spastin) to disrupt microtubules (MT) thereby indirectly inducing actomyosin contractility. A second model activates myosin contractility more directly through overexpression of a constitutively active RhoA GEF (K10-Arhgef11CA). Both models induce late differentiation of suprabasal keratinocytes regardless of the suprabasal position in either spinous or granular layer indicating that increased contractility is key to induce late differentiation of granular cells. A potential weakness of the K10-spastin model is the disruption of MT as the primary effect which secondarily causes hypercontractility. However, their previous publications provided some evidence that the effect on differentiation is driven by the increase in contractility (Ning et al., Cell Stem Cell 2021). Moreover, the data are confirmed by the second model directly activating myosin through RhoA. These previous publications already indicated a role for contractility in differentiation but were focused on early differentiation. The data in this manuscript focus on the regulation of late differentiation in barrier-forming cells. These important data help to unravel the interdependencies of cell position, mechanical state, and differentiation in the epidermis, suggesting that an increase in cellular contractility in most apical positions within the epidermis can induce terminal differentiation.

      Importantly, the authors show that despite contractility-induced nuclear localization of the mechanoresponsive transcription factor YAP in the barrier-forming granular layer, YAP nuclear localization is not sufficient to drive premature differentiation when forced to the nucleus in the spinous layer.

      Overall, this is a well-written manuscript and a comprehensive dataset. Only the RNA sequencing result should be presented more transparently providing the full lists of regulated genes instead of presenting just the GO analysis and selected target genes so that this analysis can serve as a useful repository. The authors themselves have profited from and used published datasets of gene expression of the granular cells. Moreover, some of the previous data should be better discussed though. The authors state that forced suprabasal contractility in their mouse models induces the expression of some genes of the epidermal differentiation complex (EDC). However, in their previous publication, the authors showed that major classical EDC genes are actually not regulated like filaggrin and loricrin (Muroyama and Lechler eLife 2017). This should be discussed better and necessitates including the full list of regulated genes to show what exactly is regulated. 

      We thank the reviewers for their suggestions and comments.

      Thank you for the suggestion to include gene lists. We had an Excel document with all these data but neglected to upload it with the initial manuscript. This includes all the gene signatures for the different cell compartments across development. We also include a tab that lists all EDC genes and whether they were up-regulated in intermediate cells and cells in which contractility was induced. Further, we note that all the RNA-Seq datasets are available for use on GEO (GSE295753).

      In our previous publication, we indeed included images showing that loricrin and filaggrin were both still expressed in the differentiated epidermis in the spastin mutant. Both Flg and Lor mRNA were up in the RNA-Seq (although only Flg was statistically significant), though we didn’t see a notable change in protein levels. It is unclear whether this is just difficult to see on top of the normal expression, or whether there are additional levels of regulation where mRNA levels are increased but protein isn’t. That said, our data clearly show that other genes associated with granular fate were increased in the contractile skin. 

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript from Prado-Mantilla and co-workers addresses mechanisms of embryonic epidermis development, focusing on the intermediate layer cells, a transient population of suprabasal cells that contributes to the expansion of the epidermis through proliferation. Using bulk-RNA they show that these cells are transcriptionally distinct from the suprabasal spinous cells and identify specific marker genes for these populations. They then use transgenesis to demonstrate that one of these selected spinous layer-specific markers, the transcription factor MafB is capable of suppressing proliferation in the intermediate layers, providing a potential explanation for the shift of suprabasal cells into a non-proliferative state during development. Further, lineage tracing experiments show that the intermediate cells become granular cells without a spinous layer intermediate. Finally, the authors show that the intermediate layer cells express higher levels of contractility-related genes than spinous layers and overexpression of cytoskeletal regulators accelerates the differentiation of spinous layer cells into granular cells. 

      Overall the manuscript presents a number of interesting observations on the developmental stage-specific identities of suprabasal cells and their differentiation trajectories and points to a potential role of contractility in promoting differentiation of suprabasal cells into granular cells. The precise mechanisms by which MafB suppresses proliferation, how the intermediate cells bypass the spinous layer stage to differentiate into granular cells, and how contractility feeds into these mechanisms remain open. Interestingly, while the mechanosensitive transcription factor YAP appears differentially active in the two states, it is shown to be downstream rather than upstream of the observed differences in mechanics.

      Strengths: 

      The authors use a nice combination of RNA sequencing, imaging, lineage tracing, and transgenesis to address the suprabasal to granular layer transition. The imaging is convincing and the biological effects appear robust. The manuscript is clearly written and logical to follow. 

      Weaknesses: 

      While the data overall support the authors' claims, there are a few minor weaknesses that pertain to the role of contractility. The choice of spastin overexpression to modulate contractility is not ideal as spastin has multiple roles in regulating microtubule dynamics and membrane transport which could also be potential mechanisms explaining some of the phenotypes. Use of Arhgef11 overexpression mitigates this effect to some extent but overall it would have been more convincing to manipulate myosin activity directly. It would also be important to show that these manipulations increase the levels of F-actin and myosin II as shown for the intermediate layer. It would also be logical to address if further increasing contractility in the intermediate layer would enhance the differentiation of these cells.

      We agree with the reviewer that the development of additional tools to precisely control myosin activity will be of great use to the field. That said, our series of publications has clearly demonstrated that ablating microtubules results in increased contractility and that this phenocopies the effects of Arhgef11-induced contractility. Further, we showed that these phenotypes were rescued by myosin inhibition with blebbistatin. Our prior publications also showed a clear increase in junctional actomyosin through expression of either spastin or Arhgef11, as well as increased staining for the tension-sensitive epitope of alpha-catenin (alpha18). We are not aware of tools currently available in mouse models that allow direct manipulation of myosin activity.

      The gene expression analyses are relatively superficial and rely heavily on GO term analyses, which are of course informative but do not give the reader a good sense of what kind of genes and transcriptional programs are regulated. It would be useful to show volcano plots or heatmaps of actual gene expression changes as well as to perform additional analyses, for example gene set enrichment and/or transcription factor enrichment analyses, to better describe the transcriptional programs.

      We have included an Excel document that lists all the gene signatures. In addition, a volcano plot is included in the new Fig 2, Supplement 1. All our NGS data are deposited in GEO for others to perform these analyses. As the paper does not delve further into transcriptional regulation, we do not specifically present this information in the paper.

      Claims of changes in cell division/proliferation are made exclusively by quantifying EdU incorporation. It would be useful to more directly look at mitosis. At minimum, Y-axis labels should be changed from "% Dividing cells" to "% EdU+ cells" to more accurately represent findings.

      We changed the axis label to precisely match our analysis. We note that Figure 1, Supplement 1 also contains data on mitosis.  

      Despite these minor weaknesses the manuscript is overall of high quality, sheds new light on the fundamental mechanisms of epidermal stratification during embryogenesis, and will likely be of interest to the skin research community. 

      Reviewer #3 (Public review): 

      Summary: 

      This is an interesting paper by Lechler and colleagues describing the transcriptomic signature and fate of intermediate cells (ICs), a transient and poorly defined embryonic cell type in the skin. ICs are the first suprabasal cells in the stratifying skin and unlike later-developing suprabasal cells, ICs continue to divide. Using bulk RNA seq to compare ICs to spinous and granular transcriptomes, the authors find that IC-specific gene signatures include hallmarks of granular cells, such as genes involved in lipid metabolism and skin barrier function that are not expressed in spinous cells. ICs were assumed to differentiate into spinous cells, but lineage tracing convincingly shows ICs differentiate directly into granular cells without passing through a spinous intermediate. Rather, basal cells give rise to the first spinous cells. They further show that transcripts associated with contractility are also shared signatures of ICs and granular cells, and overexpression of two contractility inducers (Spastin and ArhGEF-CA) can induce granular and repress spinous gene expression. This contractility-induced granular gene expression does not appear to be mediated by the mechanosensitive transcription factor, Yap. The paper also identifies new markers that distinguish IC and spinous layers and shows the spinous signature gene, MafB, is sufficient to repress proliferation when prematurely expressed in ICs. 

      Strengths: 

      Overall this is a well-executed study, and the data are clearly presented and the findings convincing. It provides an important contribution to the skin field by characterizing the features and fate of ICs, a much-understudied cell type, at high levels of spatial and transcriptomic detail. The conclusions challenge the assumption that ICs are spinous precursors through compelling lineage tracing data. The demonstration that differentiation can be induced by cell contractility is an intriguing finding and adds a growing list of examples where cell mechanics influence gene expression and differentiation. 

      Weaknesses: 

      A weakness of the study is an over-reliance on overexpression and sufficiency experiments to test the contributions of MafB, Yap, and contractility in differentiation. The inclusion of loss-of-function approaches would enable one to determine if, for example, contractility is required for the transition of ICs to granular fate, and whether MafB is required for spinous fate. Second, whether the induction of contractility-associated genes is accompanied by measurable changes in the physical properties or mechanics of the IC and granular layers is not directly shown. The inclusion of physical measurements would bolster the conclusion that mechanics lies upstream of differentiation.

      We agree that loss of function studies would be useful. For MafB, these have been performed in cultured human keratinocytes, where loss of MafB and its ortholog cMaf results in a phenotype consistent with loss of spinous differentiation (Pajares-Lopez et al, 2015). Due to the complex genetics involved, generating these double mutant mice is beyond the scope of this study. Loss of function studies of myosin are also complicated by genetic redundancy of the non-muscle type II myosin genes, as well as the role for these myosins in cell division and in actin cross linking in addition to contractility. In addition, we have found that these myosins are quite stable in the embryonic intestine, with loss of protein delayed by several days from the induction of recombination. Therefore, elimination of myosins by embryonic day e14.5 with our current drivers is not likely possible. Generation of inducible inhibitors of contractility is therefore a valuable future goal. 

      Several recent papers have used AFM of skin sections to probe tissue stiffness. We have not attempted these studies and are unclear about the spatial resolution and whether, in the very thin epidermis at these stages, we could spatially resolve differences. That said, we previously assessed the macro-contractility of tissues in which myosin activity was induced and demonstrated that there was a significant increase in this over a tissue-wide scale (Ning et al, Cell Stem Cell, 2021).  

      Finally, whether the expression of granular-associated genes in ICs provides them with some sort of barrier function in the embryo is not addressed, so the role of ICs in epidermal development remains unclear. Although not essential to support the conclusions of this study, insights into the function of this transient cell layer would strengthen the overall impact.  

      By traditional dye penetration assays, there is no epidermal barrier at the time that intermediate cells exist. One interpretation of the data is that cells are beginning to express mRNAs (and in some cases, proteins) so that they are able to rapidly generate a barrier as they become granular cells. In addition, many EDC genes, important for keratinocyte cornification and barrier formation, are not upregulated in ICs at E14.5. We have attempted experiments to ablate intermediate cells with DTA expression; these resulted in inefficient and delayed death and thus did not yield strong conclusions about the role of intermediate cells. Our findings that transcriptional regulators of granular differentiation (such as Grhl3 and Hopx) are also present in intermediate cells should allow future analysis of the effects of their ablation on the earliest stages of granular differentiation from intermediate cells. In fact, previous studies have shown that Grhl3 null mice have disrupted barrier function at embryonic stages (Ting et al, 2005), supporting a role for ICs in barrier formation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      Overall, this is a well-written manuscript and a comprehensive dataset. Only the RNA sequencing result should be presented more transparently providing the full lists of regulated genes instead of presenting just the GO analysis and selected target genes so that this analysis can serve as a useful repository. The authors themselves have profited from and used the published dataset of gene expression of the granular cells. Moreover, some of the previous data should be better discussed though. The authors state that forced suprabasal contractility in their mouse models induces the expression of some genes of the epidermal differentiation complex (EDC). However, in their previous publication, the authors showed that major classical EDC genes are actually not regulated like filaggrin and loricrin (Muroyama and Lechler eLife 2017). This should be discussed better and necessitates including the full list of regulated genes to show what exactly is regulated. 

      A general point regarding statistics throughout the manuscript. It seems like regular t-tests or ANOVAs have been used assuming Gaussian distribution for sample sizes below N=5, which is technically not correct. Instead, non-parametric tests, e.g. the Mann-Whitney test, should be used. Since GraphPad was used for statistics according to the methods, this is easy to change.
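
      For illustration only, the following is a minimal sketch of an exact two-sided Mann-Whitney U test for small samples, written in pure Python. It is not the procedure implemented in any particular statistics package; the function name is hypothetical, the example values are invented, and the sketch assumes there are no tied values.

```python
from itertools import combinations


def mann_whitney_u_exact(sample_a, sample_b):
    """Exact two-sided Mann-Whitney U test (assumes no tied values).
    Returns (U statistic for sample_a, p-value)."""
    pooled = sorted(list(sample_a) + list(sample_b))
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch does not handle tied values")
    rank_of = {value: rank for rank, value in enumerate(pooled, start=1)}
    n_a, n_b = len(sample_a), len(sample_b)

    def u_from_rank_sum(rank_sum):
        return rank_sum - n_a * (n_a + 1) / 2

    observed = u_from_rank_sum(sum(rank_of[v] for v in sample_a))
    center = n_a * n_b / 2  # expected U under the null hypothesis
    extreme = total = 0
    # Enumerate every way the ranks could have been split between the groups.
    for ranks in combinations(range(1, n_a + n_b + 1), n_a):
        total += 1
        if abs(u_from_rank_sum(sum(ranks)) - center) >= abs(observed - center):
            extreme += 1
    return observed, extreme / total


# Hypothetical measurements: three animals per group, completely separated.
u_stat, p_value = mann_whitney_u_exact([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(u_stat, p_value)
```

      With three values per group there are only 20 possible rank splits, so even complete separation gives a two-sided p-value of 2/20 = 0.1; with four per group the minimum is 2/70. This limit on attainable p-values is worth keeping in mind when switching to rank-based tests at very small N.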

      Figure 1: It would be good to show the FACS plot of the analyzed and sorted population in the supplementary figures. 

      If granular cells can be analyzed and detected by FACS, why were they not included in the RNA sequencing analysis? 

      Figure 1 supplement 1c: cell division numbers are analyzed from only 2 mice and the combined 5 or 4 fields of view are used for statistics using a test assuming normal distribution, which is not really appropriate. Means per mouse should be used or, if accumulated fields of view are used, the number should be increased and more stringent tests applied. Otherwise, the p-values here clearly overstate the significance.
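
      The suggestion to use means per mouse can be sketched as follows: fields of view are collapsed to one value per animal before any test is run, so that the animal, not the field of view, is the unit of replication. The function name, mouse identifiers, and values are hypothetical and are not data from the manuscript.

```python
from collections import defaultdict
from statistics import mean


def per_animal_means(measurements):
    """Collapse (animal_id, value) pairs to one mean value per animal."""
    by_animal = defaultdict(list)
    for animal_id, value in measurements:
        by_animal[animal_id].append(value)
    return {animal: mean(values) for animal, values in by_animal.items()}


# Hypothetical percentages from five fields of view across two mice.
fields_of_view = [
    ("mouse_1", 10.0), ("mouse_1", 12.0), ("mouse_1", 14.0),
    ("mouse_2", 20.0), ("mouse_2", 22.0),
]
animal_means = per_animal_means(fields_of_view)
print(animal_means)
```

      Here five measurements reduce to two independent values, which makes explicit how little replication is available for a formal test.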

      Granular cells could not be specifically isolated in the approach we used. The lectin binds to both upper spinous and granular cells. For this reason, we relied on a separate granular gene list as described.

      For Figure 1 Supplement 1, we removed the statistical analysis and use it simply as a validation of the data in Figure 1.  

      Figure 2: It is not completely clear on which basis the candidate genes were picked. They are described to be the most enriched but how do they compare to the rest of the enriched genes. The full list of regulated genes should be provided. 

      Some markers for IC or granular layer are verified either by RNA scope or immunofluorescence. Is there a technical reason for that? It would be good to compare protein levels for all markers.

      Figure 2-Supplement 1: There is no statement about the number of animals that these images are representative for.

      We have included a volcano plot to show where the genes picked reside. We have also included the full gene lists for interested readers. 

      When validated antibodies were available, we used them. When they were not, we performed RNA-Scope to validate the RNA-Seq dataset. 

      We have included animal numbers in the revised Fig 2-Supplement 2 legend (previously Fig 2-Supplement 1).

      Figure 4b: It would be good to include the E16 spinous cells to get an idea of how much closer ICs are to the granular population. 

      We have included a new Venn diagram showing the overlap between each of the IC and spinous signatures with the granular cell signature in Fig 4B. Overall, 36% of IC signature genes are in common with granular cells, while just 20% of spinous genes overlap.  

      Reviewer #2 (Recommendations for the authors): 

      (1)  Figure 6B is confusing as y-axis is labeled as EdU+ suprabasal cells whereas basal cells are also quantified. 

      We have altered the y-axis title to make it clearer.  

      (2)  Not clear why HA-control is sometimes included and sometimes not. 

      We include the HA when it did not disrupt visualization of the loss of fluorescence. As it was uniform in most cases, we excluded it for clarity in some images. HA staining is now included in Fig 3C.

      (3)  The authors might reconsider the title as it currently is somewhat vague, to more precisely represent the content of the manuscript. 

      We thank the reviewer for the suggestion. We considered other options but felt that this gave an overview of the breadth of the paper.  

      Reviewer #3 (Recommendations for the authors): 

      (1)  ICs are shown to express Tgm1 and Abca12, important for cornified envelope function and formation of lamellar bodies. Do ICs provide any barrier function at E14.5? 

      By traditional dye penetration assays, there is no epidermal barrier at the time that intermediate cells exist. One interpretation of the data is that cells are beginning to express mRNAs (and in some cases, proteins) so that they are able to rapidly generate a barrier as they become granular cells.  

      (2)  Genes associated with contractility are upregulated in ICs and granular cells. And ICs have higher levels of F-actin, MyoIIA, alpha-18, and nuclear Yap. Does this correspond to a measurable difference in stiffness? Can you use AFM to compare the physical properties of ICs, spinous, and granular cells?

      Several recent papers have used AFM of skin sections to probe tissue stiffness. We have not attempted these studies, and it is unclear to us whether the spatial resolution would be sufficient to resolve differences in the very thin epidermis at these stages. It is also important to note that tissue rigidity is influenced by factors other than contractility. That said, we previously assessed the macro-contractility of tissues in which myosin activity was induced and demonstrated that there was a significant increase in this over a tissue-wide scale (Ning et al, Cell Stem Cell, 2021).

      (3)  Overexpression of two contractility inducers (spastin and ArhGEF-CA) can induce granular gene expression and repress spinous gene expression, suggesting differentiation lies downstream of contractility. Is contractility required for granular differentiation? 

      This is an important question and one that we hope to directly address in the future. Published studies have shown defects in tight junction formation and barrier function in myosin II mutants. However, a thorough characterization of differentiation was not performed.  

      (4)  ICs are a transient cell type, and it would be important to know what is the consequence of the epidermis never developing this layer. Does it perform an important temporary structural/barrier role, or patterning information for the skin?

      We have attempted experiments to ablate intermediate cells with DTA expression; this resulted in inefficient and delayed death and thus did not yield strong conclusions. Our findings that transcriptional regulators of granular differentiation (such as Grhl3 and Hopx) are also present in intermediate cells should allow future analysis of the effects of their ablation on the earliest stages of granular differentiation from intermediate cells.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Shi et al. utilizes multiple imaging datasets and one set of samples for analyzing serum EV-miRNAs and EV-RNAs to develop an EV miRNA signature associated with disease-relevant radiomics features for early diagnosis of pancreatic cancer. CT imaging features in two datasets (UMMD & JHC and WUH) were derived from patients with benign pancreatic disease versus pancreatic cancer cases, while circulating EV miRNAs were profiled from samples obtained from a different center (DUH). The EV RNA signature from external public datasets (GSE106817, GSE109319, GSE113486, GSE112264) was analyzed for differences between healthy controls and pancreatic cancer cases. The miRNAs were also analyzed in the TCGA tissue miRNA data from normal adjacent tissue versus pancreatic cancer.

      Strengths:

      The concept of developing EV miRNA signatures associated with disease relevant radiomics features is a strength.

      Weaknesses:

      While the overall concept of developing EV miRNA signature associated with radiomics features is interesting, the findings reported are not convincing for the reasons outlined below:

      (1) Discrepant datasets for analyzing radiomic features with EV-miRNAs: It is not justified how CT images (UMMD & JHC and WUH) and EV-miRNAs (DUH) from different subjects and centers/cohorts, shown in Figures 1 & 2, were analyzed for association. It is stated that the samples were matched according to age, but no information is provided on the stages of pancreatic cancer or the kinds of benign lesions analyzed in each instance.

      Thank you to the reviewer for the valuable comments. We acknowledge that the radiomics data and EV-miRNA data were derived from different patient cohorts. The primary aim of this study was to explore the integration of data from different omics sources in an exploratory manner to identify potential shared biological features.

      We have revised the Methods section accordingly. Regarding the imaging data, we mainly performed batch effect correction on CT images from different centers to eliminate variability. As you correctly pointed out, the EV-miRNA data and CT images from DUH were matched by age. Since all the patients we included had early-stage pancreatic cancer, and the benign pancreatic lesions were predominantly IPMN, we did not specifically highlight this aspect. However, we have now clarified this approach in the data collection section. Thank you for your attention.

      (2) The study is focused on low-abundance miRNAs with no adequate explanation of the selection criteria for the miRNAs analyzed.

      We used the median absolute deviation (MAD) to filter low-abundance miRNAs in the manuscript. Because we are introducing this concept for the first time in this context, we acknowledge that there is still considerable room for refinement and improvement.
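
      For illustration only, a per-miRNA MAD computation might look like the minimal Python sketch below. The response does not state the cutoff or how MAD was applied, so the miRNA names, the expression values, and the `mad_cutoff` threshold are hypothetical assumptions and not the authors' procedure.

```python
from statistics import median


def median_absolute_deviation(values):
    """Return the MAD: the median of absolute deviations from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])


# Hypothetical expression values (keys: miRNAs, list entries: samples).
expression = {
    "miR_x": [1, 1, 2, 1, 1],
    "miR_y": [10, 50, 30, 90, 70],
}
mad_cutoff = 1.0  # hypothetical threshold; the source does not give one

# MAD per miRNA, then flag miRNAs whose MAD falls below the assumed cutoff.
mad_by_mirna = {name: median_absolute_deviation(vals) for name, vals in expression.items()}
below_cutoff = [name for name, mad in mad_by_mirna.items() if mad < mad_cutoff]
```

      Whether miRNAs below such a cutoff would be kept or discarded depends on the authors' definition, which is not given here.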

      (3) While EV-miRNAs were profiled or sequenced (not well described in the Methods section) with two different EV isolation methods, the authors used four public datasets of serum circulating miRNAs to validate the findings. It would be better to show the expression of the three miRNAs in the additional dataset(s) of EV-miRNAs and compare the expressions of the three EV-miRNAs in pancreatic cancer with healthy and benign disease controls.

      Thank you for your suggestion. We have attempted to identify available EV-miRNA datasets; however, due to current limitations in data access, we opted to use serum samples for validation. In our follow-up studies, we are already in the process of collecting relevant EV samples for further validation.

      (4) It is not clear how the 12 EV-miRNAs in Figure 4C were identified.

      These 12 EV-miRNAs were identified through WGCNA analysis and are associated with the high-risk group.

      (5) Box plots in Figures 4D-F and G-I of three miRNAs in serum and tissue should show all quantitative data points.

      We have completed the revisions. Kindly review them at your convenience.

      (6) What is the GBM model in Figure 5?

      Thank you to the reviewer for raising this question. The "GBM model" referred to in Figure 5 is a classification model built using the Gradient Boosting Machine (GBM) algorithm, designed to predict the diagnostic status of pancreatic cancer by integrating EV-miRNA expression and radiomics features. We implemented the model using the `GradientBoostingClassifier` from the scikit-learn library (version 1.2.2), and optimized the model’s hyperparameters—including learning rate, maximum depth, and number of trees—within a five-fold cross-validation framework. The training process and performance evaluation of the model, including the ROC curve and AUC values, are presented in Figure 5.
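
      A minimal sketch of this kind of setup is shown below, assuming scikit-learn is installed. It tunes learning rate, maximum depth, and number of trees for a `GradientBoostingClassifier` with five-fold cross-validation, as the response describes. The synthetic data, the grid values, the train/test split, and the use of `GridSearchCV` are assumptions made for illustration; the response does not specify the search procedure or the actual feature matrix.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic stand-in for the EV-miRNA + radiomics feature matrix (assumption).
X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid over the three hyperparameters named in the response.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
    "n_estimators": [50, 100],
}

# Five-fold stratified cross-validation, scored by ROC AUC.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=cv,
)
search.fit(X_train, y_train)

# AUC of the refitted best model on the held-out split.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(search.best_params_, round(test_auc, 3))
```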

      (7) What are the AUCs of individual EV-miRNAs integrated as a panel of three EV-miRNAs?

      Thank you for your comment. Our GBM model integrates these three EV-miRNAs as a single panel.

      (8) The authors could have compared the performance of CA19-9 with that of the three EV-miRNAs.

      Since our main focus is on the panel of three EV-miRNAs, we did not present the AUC for each individual miRNA separately. However, we have included the performance of CA19-9 in our dataset as a reference. The predictive AUC for CA19-9 is 0.843 (95% CI, 0.762–0.924).

      (9) What was the diagnostic performance of the three EV-miRNAs in the two molecular subtypes identified in Figures 6 & 7? Do the C1 & C2 clusters correlate with the classical/basal subtypes, staging, and imaging features?

      Thank you to the reviewer for raising this important question. In fact, our EV panel is primarily designed to distinguish between normal and tumor samples, whereas both C1 and C2 represent tumor subtypes, and thus the panel is not applicable for diagnostic purposes in this context. Additionally, our subtypes are novel and do not align with the conventional classical and basal-like gene expression profiles. Furthermore, the C1 subtype is more frequently observed in stage III tumors (Figure 6J) and is associated with distinct imaging features such as higher texture heterogeneity and lower CT density.

      Reviewer #2 (Public review):

      Summary:

      This study investigates a low abundance microRNA signature in extracellular vesicles to subtype pancreatic cancer and for early diagnosis. There are several major questions that need to be addressed. Numerous minor issues are also present.

      Strengths:

      The authors did a comprehensive job with numerous analyses of moderately sized cohorts to describe the clinical and translational significance of their miRNA signature.

      Weaknesses:

      There are multiple weaknesses of this study that should be addressed:

      (1) The description of the datasets in the Materials and Methods lacks details. What were the benign lesions from the various hospital datasets? What were the healthy controls from the public datasets? No pancreatic lesions? No pancreatic cancer? Any cancer history or other comorbid conditions? Please define these better.

      We sincerely thank the reviewer for the detailed and important suggestions regarding sample definition. Indeed, the source of the datasets and the definition of control groups are critical for ensuring the rigor and interpretability of the study. In response to this comment, we have added clarifications in the revised "Materials and Methods" section.

      First, for the benign lesion group derived from various clinical centers (DUH, UMMD, WUH, etc.), we have carefully reviewed the pathological and clinical records and defined these samples as histologically confirmed non-malignant pancreatic lesions, primarily IPMN. All patients in the benign lesion group had no diagnosis of pancreatic cancer at the time of sample collection, and for cohorts with available follow-up data, no evidence of malignant progression was observed within at least six months.

      Second, the healthy control group from public databases was derived from healthy individuals.

      Finally, to eliminate potential confounding factors, we excluded any samples with a history of other malignancies (e.g., breast cancer, colorectal cancer, etc.) from all datasets with available clinical information, to ensure the specificity of the EV-miRNA expression analysis.

      (2) It is unclear how many of the controls and cases had both imaging for radiomics and blood for biomarkers.

      Due to limitations in resource availability, our study does not include samples with both CT imaging and serological data from the same individuals. Instead, we integrated blood samples and CT imaging data collected from different clinical centers.

      (3) The authors should define the imaging methods and protocols used in more detail. For the CT scans, what slice thickness? Was a pancreatic protocol used? What phase of contrast is used (arterial, portal venous, non-contrast)? Any normalization or pre-processing?

      Thank you to the reviewer for the professional suggestions regarding the imaging section. We have added detailed technical information on CT imaging in the revised Materials and Methods section. All CT images were acquired using a 64-slice multidetector spiral CT scanner, with a standard slice thickness of 1.0–1.5 mm and a reconstruction interval of 1 mm. All pancreatic cancer patients underwent a standard pancreatic protocol triphasic contrast-enhanced CT examination, which included non-contrast, arterial phase (approximately 25–30 seconds), and portal venous phase (approximately 65–70 seconds) imaging.

      For the radiomics analysis, images from the portal venous phase were selected, as this phase provides consistent clarity in delineating tumor boundaries and surrounding vasculature. To ensure data consistency, all imaging data underwent preprocessing, including resampling, intensity normalization of grayscale values (standardized using z-score normalization to a mean of 0 and a standard deviation of 1), and N4 bias field correction to address potential low-frequency signal inhomogeneities.
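
      The z-score step described above can be sketched as follows. This is a minimal illustration on a flat list of hypothetical voxel intensities using only the Python standard library; the helper name `zscore_normalize` and the sample values are assumptions, and resampling and N4 bias field correction are not shown.

```python
from statistics import fmean, pstdev


def zscore_normalize(intensities):
    """Rescale intensities to mean 0 and standard deviation 1."""
    mean = fmean(intensities)
    std = pstdev(intensities)  # population standard deviation over all voxels
    if std == 0:
        raise ValueError("constant image: z-score is undefined")
    return [(v - mean) / std for v in intensities]


# Hypothetical voxel intensities from one image.
voxels = [0.0, 50.0, 100.0, 150.0, 200.0]
normalized = zscore_normalize(voxels)
```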

      (4) Who performed the segmentation of the lesions? An experienced pancreatic radiologist? A student? How did the investigators ensure that the definition of the lesions was performed correctly? Radiomics features are often sensitive to the segmentation definitions.

      All lesion segmentations were performed on portal venous phase contrast-enhanced CT images. Manual delineation was conducted using 3D Slicer (version 4.11) by two radiologists with extensive experience in pancreatic tumor diagnosis. A consensus was reached between the two radiologists on the ROI definition criteria prior to analysis.

      To further assess the robustness of radiomic features to segmentation boundary variations, we selected a subset of representative cases and created “expanded/shrunk ROIs” by adding or subtracting a 2-pixel margin at the lesion boundary. Feature extraction was then repeated, and the coefficient of variation (CV) for the main features included in the model was found to be below 10%, indicating that the model is stable with respect to minor boundary fluctuations.
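
      The stability check described above can be sketched as a coefficient-of-variation calculation across the original, expanded, and shrunk ROIs. The feature values below are hypothetical, and the use of the sample standard deviation is an assumption; the response reports only that the CV was below 10%.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation over |mean|."""
    avg = mean(values)
    if avg == 0:
        raise ValueError("mean is zero: CV is undefined")
    return stdev(values) / abs(avg) * 100.0


# Hypothetical values of one radiomic feature under the three ROI definitions.
feature_values = {"original": 10.0, "expanded": 10.5, "shrunk": 9.5}

cv_percent = coefficient_of_variation(list(feature_values.values()))
is_stable = cv_percent < 10.0  # 10% threshold as reported in the response
```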

      (5) Figure 1 is full of vague images that do not convey the study design well. Numbers from each of the datasets, a summary of what data was used for training and for validation, definitions of all of the abbreviations, references to the Roman numerals embedded within the figure, and better labeling of the various embedded graphs are needed. It is not clear whether the graphs are real results or just artwork to convey a concept. I suspect that they are just artwork, but this remains unclear.

      We thank the reviewer for the detailed feedback on Figure 1. We would like to clarify that Figure 1 is a conceptual schematic intended to visually illustrate the overall design of the study, the relationships among different data modules, and the logical sequence of the analytical strategy. It is not meant to present actual results or quantitative details.

      Regarding the reviewer’s concerns about sample sizes, the division between training and validation cohorts, explanations of specific abbreviations, and the precise meaning of each panel, we have provided comprehensive and detailed clarifications in Figure 2.

      (6) The DF selection process lacks important details. Please reference your methods with the Boruta and Lasso models. Please explain what machine learning algorithms were used. There is a reference in the "Feature selection.." section of "the model formula listed below" but I do not see a model formula below this paragraph.

      We thank the reviewer for the thoughtful and detailed comments on the feature selection strategy. We first applied the Boruta algorithm (based on random forests, implemented using the Boruta R package) to the original feature set—which included both radiomics and EV-miRNA features—to identify variables that consistently demonstrated importance across multiple rounds of random resampling.

      Subsequently, we used LASSO regression with five-fold cross-validation to further reduce the dimensionality of the Boruta-selected features and to construct the final feature set used for modeling. The formula for the model is as follows: each regression coefficient is multiplied by the corresponding feature expression level, and the resulting products are summed to generate the Risk Score.
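
      The Risk Score described above is a weighted sum, which can be sketched as follows. The feature names, coefficients, and expression values are hypothetical placeholders, not values from the manuscript, and no intercept is included because the response does not mention one.

```python
def risk_score(coefficients, feature_values):
    """Sum of each regression coefficient times its feature's expression level."""
    return sum(coef * feature_values[name] for name, coef in coefficients.items())


# Hypothetical LASSO coefficients for the selected features.
coefficients = {"miR_a": 0.5, "miR_b": -0.25, "radiomic_f1": 2.0}

# Hypothetical feature values for one patient.
patient = {"miR_a": 4.0, "miR_b": 8.0, "radiomic_f1": 1.5}

score = risk_score(coefficients, patient)  # 0.5*4.0 - 0.25*8.0 + 2.0*1.5
```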

      (7) In Figure 2, more quantitative details are needed. How are patients dichotomized into non-obese and obese? What does alcohol/smoking mean? Is it simply no to both versus one or the other as yes? These two risk factors should be separated and pack years of smoking should be reported. The details of alcohol use should also be provided. Is it an alcohol abuse history? Any alcohol use, including social drinking? Similarly, "diabetes" needs to be better explained. Type I, type II, type 3c? P values should be shown to demonstrate any statistically significant differences in the proportions of the patients from one dataset to another.

      Our definition of obesity was based on the standard BMI threshold (30 kg/m²). A history of smoking or alcohol consumption was defined as continuous use for more than one year. Specific details regarding smoking and alcohol use were recorded at baseline under the category of “smoking/alcohol history”; unfortunately, we did not collect follow-up data on these variables. As for diabetes, only type II diabetes was documented. Statistically significant p-values have been added. Thank you.

      (8) In the section "Different expression radiomic features between pancreatic benign lesions and aggressive tumors", there is a reference to "MUJH" for the first time. What is this? There is also the first reference to "aggressive tumors" in the section. Do the authors just mean the cases? Otherwise there is no clear definition of "aggressive" (vs. indolent) pancreatic cancer. This terminology of tumor "aggressiveness" either needs to be removed or better defined.

      We have corrected the abbreviation (MUJH); it should in fact be JHC. Additionally, regarding the term "aggressive," we have reviewed the literature and used it to convey the highly malignant nature of pancreatic cancer.

      (9) Figure 3 needs to have the specific radiomic features defined and how these features were calculated. Labeling them as just f1, f2, etc is not sufficient for another group to replicate the results independently.

      We have presented these features in Supplementary Table 1. Kindly refer to it for details.

      (10) It is not clear what Figure 4A illustrates as regards model performance. What do the different colors represent, and what are the models used here? This is very confusing.

      This represents the correlation between WGCNA modules and miRNAs. Different module colors indicate distinct miRNA clusters—for example, the green module contains 12 miRNAs grouped together. The colors themselves do not carry any intrinsic meaning.

      (11) Figure 5 shows results for many more model runs than the described 10, please explain what you are trying to convey with each row. What are "Test A" and "Test B"? There is no description in the manuscript of what these represent. In the figure caption, there is a reference to "our center data" which is not clear. Be more specific about what that data is.

      We have indicated this with arrows in Figure 5 for Tests A, B, and C. Please check.

      (12) Figure 6 describes the subtypes identified in this study, but the authors do not show a multi-variable cox proportional hazards model to show that this subtype classification independently predicts DFS and OS when incorporating confounding variables. This is essential to show the subtypes are clinically relevant. In particular, the authors need to account for the stage of the patients, and receipt of chemotherapy, surgery, and radiation. If surgery was done, we need to know whether they had R1 or R0 resection. The details about the years in which patients were included is also important.

      We sincerely thank the reviewer for this critical comment. We fully agree that incorporating a multivariate Cox proportional hazards model to control for potential confounding factors would provide a more robust validation of the independent prognostic value of our proposed subtypes for DFS and OS.

      However, as the clinical data used in this study were retrospectively collected and access to certain variables is currently restricted, we were only able to obtain limited clinical information. At this stage, we are unable to systematically include key variables such as tumor staging, adjuvant chemoradiotherapy regimens, and resection margin status (R0 vs. R1), which prevents us from performing a rigorous multivariate Cox analysis.

      Similarly, regarding the postoperative resection status, after reviewing the original surgical reports and pathology records, we regret to confirm that margin status (R0 vs. R1) is missing in a substantial portion of cases, making it unsuitable for reliable statistical analysis.

      We fully acknowledge this as a limitation of the current study and have explicitly addressed it in the Discussion section. To address this gap, we are currently designing a more comprehensive prospective cohort study, which will allow us to validate the clinical independence and utility of the proposed subtypes in future research.

      (13) How do these subtypes compare to other published subtypes?

      We sincerely thank the reviewer for raising this important point. Clusters 1 and 2 represent a novel molecular classification proposed for the first time in this study, driven by EV-miRNA profiles. This classification approach is conceptually independent from traditional transcriptome-based subtyping systems, such as the classical/basal-like subtypes, as well as other existing classification schemes. Comparisons with previously reported subtypes and validation of clinical relevance will require further investigation in future studies.

      Reviewer #3 (Public review):

      Summary:

      The authors appear to be attempting to identify which patients with benign lesions will progress to cancer using a liquid biomarker. They used radiomics and EV miRNAs in order to assess this.

      Strengths:

      It is a strength that there are multiple test datasets. Data is batch-corrected. A relatively large number of patients is included. Only 3 miRNAs are needed to obtain their sensitivity and specificity scores.

      Weaknesses:

      This manuscript is not clearly written, making interpretation of the quality and rigor of the data very difficult. There is no indication from the methods that the patients in their cohorts who are pancreatic cancer patients (from the CT images) had prior benign lesions, limiting the power of their analysis. The data regarding the cluster subtypes is very confusing. There is no discussion or comparison if these two clusters are just representing classical and basal subtypes (which have been well described).

      We regret that we do not have these patient records. Regarding the relationship between Cluster 1/Cluster 2 and the classical subtypes: we are very grateful for the reviewer’s insightful question. We would like to clarify that Clusters 1 and 2, as shown in Figures 6 and 7, are derived from a novel EV-miRNA–driven molecular classification proposed for the first time in this study. This classification system is constructed independently of the traditional transcriptome-based classical/basal-like subtypes.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There are errors in reference citations and several typos, misspellings, and grammatical errors throughout the manuscript.

      We have made the necessary revisions.

      Reviewer #2 (Recommendations for the authors):

      (1) Were the radiomic features associated with the subtypes and prognostic in the subset of patients who had CT scans?

      Unfortunately, there are no corresponding CT imaging results available for these cases, as the genes were identified based on predicted miRNA targets and were not derived from patients who had undergone CT scans.

      (2) There is a whole body of literature on prognostic imaging-based subtypes of pancreatic cancer that needs to be cited.

      Thank you for your suggestion. We have cited the relevant references accordingly in the manuscript.

      (3) Similarly, the authors should be more comprehensive about prognostic and early detection markers for miRNAs for pancreatic cancer. Early detection markers really should be described separately from prognostic markers. The authors did not do a PROBE phase 3 study, so early detection is not really relevant. Please see https://edrn.nci.nih.gov/about-edrn/five-phase-approach-and-prospective-specimen-collection-retrospective-blinded-evaluation-study-design/

      The primary objective of our study is early detection. We acknowledge the absence of third-phase validation results, which we will address in the limitations section. Additionally, the subtype classification represents our secondary objective.

      (4) If they want to couch this as a PROBE phase 2 study, then they should review the PROBE guidelines and ensure they are meeting standards. Many of the comments above regarding methodologies, definitions, and patient cohort descriptions would address this concern.

      We have revised the Methods section accordingly. Please kindly review the updated version.

      (5) The entire manuscript needs to have a review for the use of the English language. There are numerous typos and grammatical errors that make this manuscript difficult to follow and hard to interpret.

      We have revised the Methods section accordingly. Please kindly review the updated version.

      (6) In the section on "Definition and identification of low abundance EV-derived miRNA transcripts", provide a reference for the "edger" function.

      We have revised the Methods section accordingly. Please kindly review the updated version.

      (7) In the Abstract: The purpose section only mentions early diagnosis as the goal of this study. It seems subtyping is also a major goal, but it is not mentioned.

      The primary objective of our study is early detection. Subtype classification is our secondary objective, so we did not add it to the Purpose section.

      (8) The experimental design fails to describe any of the 8 datasets that were used. How many patients? What were the ethnic and racial backgrounds, which is one of the key aspects of this study and mentioned in the title? What range of stages? When were the images and the blood collected in relation to diagnosis? Over what time frame were the patients included? What patients were excluded, if any? These details are important to understand the materials used, along with the methods to design the signatures and models.

      We have revised the Methods section accordingly. Please kindly review the updated version.

      (9) Again, the purpose section of the abstract does not align with the rest of the study, including the description of the experimental design. The last sentence of the experimental design section mentions predicting drug sensitivity and survival, which is unrelated to the aim of early diagnosis.

      We have revised the Methods section accordingly. Please kindly review the updated version.

      (10) The results section lacks key details to indicate the impact of the work. Vague descriptions of the findings are not sufficient. The performance of the biomarkers to differentiate benign from malignant lesions, hazard ratios, survival times, and p values should be reported for key results.

      Our aim was to develop an integrated panel for diagnostic purposes; therefore, we provided the AUC to evaluate its performance. However, since this is a diagnostic model, we did not include hazard ratios or survival time data.

      (11) What are "tow" molecular subtypes of pancreatic cancer? Did you mean "two"? What system was used to subtype the pancreatic cancers? Is it a new subtyping system or a previously published method to subtype the disease?

      Yes, we meant “two.” The subtyping used a previously published method, which we have described in the Methods section.

      Reviewer #3 (Recommendations for the authors):

      The writing of this manuscript needs extensive re-wording and clarification to increase the readability and interpretability of the data presented. The authors could include a dataset of pancreatic cancer patient imaging data where the status of prior benign lesions was detected (as opposed to patients with benign lesions that do not develop pancreatic cancer). The authors could also address if their clusters 1 and 2 are representing (or are correlated with) the classical and basal subtypes that have been well described for pancreatic cancer.

      Thank you to the reviewer for the constructive comments. We sincerely appreciate your careful review, particularly regarding language clarity, data interpretability, and subtype correlation. To enhance the readability and scientific precision of the manuscript, we have conducted a thorough revision and language polishing throughout the text, improving logical structure, terminology consistency, and clarity in result descriptions. We have especially reinforced the Methods and Discussion sections to better explain key analytical steps and data interpretation.

      We fully understand the reviewer’s suggestion to include information on “the presence of benign lesions prior to pancreatic cancer diagnosis.” However, due to the retrospective nature of our study, the current imaging and EV-miRNA datasets do not contain systematically collected follow-up annotations of this type. Therefore, it is not feasible to incorporate such data into the present manuscript.

      That said, we fully recognize the importance of this direction. In future studies, we plan to evaluate longitudinal samples to investigate the dynamic changes in EV-miRNAs and imaging features during the progression from premalignant to malignant states, aiming to clarify their potential value for early cancer warning.

      Regarding the relationship between Cluster 1/Cluster 2 and the classical subtypes: we are very grateful for the reviewer’s insightful question. We would like to clarify that Clusters 1 and 2, as shown in Figures 6 and 7, are derived from a novel EV-miRNA–driven molecular classification proposed for the first time in this study. This classification system is constructed independently of the traditional transcriptome-based classical/basal-like subtypes.

      Although we attempted a cross-comparison with existing TCGA subtypes, differences in data origin, analysis modality (EV-miRNA vs. tissue transcriptome), and limitations in sample matching prevent us from establishing a direct correspondence. In the revised Discussion, we have emphasized that these two classification approaches are complementary rather than equivalent, reflecting different dimensions of tumor heterogeneity. Further integrative multi-omics studies will be needed to validate their biological significance and clinical utility.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: Zhu et al., investigate the cellular defects in glia as a result of loss in DEGS1/ifc encoding the dihydroceramide desaturase. Using the strength of Drosophila and its vast genetic toolkit, they find that DEGS1/ifc is mainly expressed in glia and its loss leads to profound neurodegeneration. This supports a role for DEGS1 in the developing larval brain as it safeguards proper CNS development. Loss of DEGS1/ifc leads to dihydroceramide accumulation in the CNS and induces alteration in the morphology of glial subtypes and a reduction in glial number. Cortex and ensheathing glia appeared swollen and accumulated internal membranes. Astrocyte-glia on the other hand displayed small cell bodies, reduced membrane extension and disrupted organization in the dorsal ventral nerve cord. They also found that DEGS1/ifc localizes primarily to the ER. Interestingly, the authors observed that loss of DEGS1/ifc drives ER expansion and reduced TGs and lipid droplet numbers. No effect on PC and PE and a slight increase in PS.

      The conclusions of this paper are well supported by the data. The study could be further strengthened by a few additional controls and/or analyses.

      Strengths:

      This is an interesting study that provides new insight into the role of ceramide metabolism in neurodegeneration.

      The strength of the paper is the generation of LOF lines, the insertion of transgenes and the use of the UAS-GAL4/GAL80 system to assess the cell-autonomous effect of DEGS1/ifc loss in neurons and different glial subtypes during CNS development.

      The imaging, immunofluorescence staining and EM of the larval brain and the use of the optical lobe and the nerve cord as a readout are very robust and nicely done.

      Drosophila is a difficult model to perform core biochemistry and lipidomics but the authors used the whole larvae and CNS to uncover global changes in mRNA levels related to lipogenesis and the unfolded protein responses as well as specific lipid alterations upon DEGS1/ifc loss.

      Weaknesses:

      (1) The authors performed lipidomics and RTqPCR on whole larvae and larval CNS from which it is impossible to define the cell type-specific effects. Ideally, this could be further supported by performing single cell RNAseq on larval brains to tease apart the cell-type specific effect of DEGS1/ifc loss.

      We agree that using scRNAseq or pairing FACS-sorting of individual glial subtypes with bulk RNAseq would help tease apart the cell-type specific effects of DEGS1/ifc loss on glial cells. At this time, however, this approach extends beyond the scope of the current paper and means of the lab. 

      (2) It's clear from the data that the accumulation of dihydroceramide in the ER triggers ER expansion but it remains unclear how or why this happens. Additionally, the authors assume that, because of the reduction in LD numbers, that the source of fatty acids comes from the LDs. But there is no data testing this directly.

      As CERT, the protein that transports ceramide from the ER to the Golgi, is far more efficient at transporting ceramide than dihydroceramide, we speculate that dihydroceramide accumulates in the ER due to inefficient transport from the ER to the Golgi by CERT. We state this model more explicitly in the results under the subheading “Reduction of dihydroceramide synthesis suppresses the ifc CNS phenotype”.

      We agree with the point on lipid droplets. We observe a correlation, not a causation, between reduction of lipid droplets and a large expansion of ER membrane. We have tried to clarify the text in the last paragraph of the discussion to state this point more clearly. See also response to reviewer 2 point 3. 

      (3) The authors performed a beautiful EMS screen identifying several LOF alleles in ifc. However, the authors decided to only use KO/ifcJS3. The paper could be strengthened if the authors could replicate some of the key findings in additional fly lines.

      We agree. We replicated the observed cortex glia swelling, ER expansion in cortex glia, and observed increase in neuronal cell death markers in late-third instar larvae mutant for either the ifcjs1 or ifcjs2 allele. These data are now provided as Supplementary Figure 7.

      (4) The authors use M{3xP3-RFP.attP}ZH-51D transgene as a general glial marker. However, it would be advised to show the % overlap between the glial marker and the RFP since a lot of cells are green positive but not per se RFP positive and vice versa.

      We visually reexamined the expression of the 3xP3 RFP transgene relative to FABP labeling for cortex glia, Ebony for astrocyte-like glia, and the Myr-GFP transgene driven by glial-subtype specific GAL4 driver lines for perineurial, subperineurial, and ensheathing glia. We note that RFP localizes to the nucleus cytoplasm while FABP and Ebony localize to the cytoplasm and Myr-GFP to the cell membrane. Thus, an observed lack of overlap of expression between RFP and the other markers can arise due to differential localization of the two markers in the same cells (see, for example, Fig. S2D, where Myr-GFP expression in the nuclear envelope encircles that of RFP in the nucleus). Through visual inspection of five larval-brain complexes for each glial subtype marker, we found that essentially all cortex, SPG, and ensheathing glia expressed RFP. Similarly, nearly all astrocyte-like glia also expressed RFP, but they expressed RFP at significantly lower levels than that observed for cortex, SPG, or ensheathing glia. This analysis also confirmed that most perineurial glia do not express RFP. The 3xP3 M{3xP3-RFP.attP}ZH-51D transgene then labels most glia in the Drosophila CNS. We have added text to Supplementary Figure 2 noting the above observations as to which glial cells express RFP. 

      (5) The authors indicate that other 3xP3 RFP and GFP transgenes at other genomic locations also label most glia in the CNS. Do they have a preferential overlap with the different glial subtypes?

      We assessed three different types of 3xP3 RFP and GFP transgenes: M{3xP3RFP.attp} transgenes (n=4), Mi{GFP[E.3xP3]=ET1} transgenes (n=3), and Tl{GFP[3xP3.cLa]=CRIMIC.TG4} transgenes (n>6). All labeled cortex glia, but different lines exhibited differential labeling of astrocyte and ensheathing glia. These data are now included as Supplementary Figure 3.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Zhu et al. describes phenotypes associated with the loss of the gene ifc using a Drosophila model. The authors suggest their findings are relevant to understanding the molecular underpinnings of a neurodegenerative disorder, HLD-18, which is caused by mutations in the human ortholog of ifc, DEGS1.

      The work begins with the authors describing the role for ifc during fly larval brain development, demonstrating its function in regulating developmental timing, brain size, and ventral nerve cord elongation. Further mechanistic examination revealed that loss of ifc leads to depleted cellular ceramide levels as well as dihydroceramide accumulation, eventually causing defects in ER morphology and function. Importantly, the authors showed that ifc is predominantly expressed in glia and is critical for maintaining appropriate glial cell numbers and morphology. Many of the key phenotypes caused by the loss of fly ifc can be rescued by overexpression of human DEGS1 in glia, demonstrating the conserved nature of these proteins as well as the pathways they regulate. Interestingly, the authors discovered a loss of lipid droplet formation in ifc mutant larvae within the cortex glia, presumably driving the deficits in glial wrapping around axons and subsequent neurodegeneration, potentially shedding light on mechanisms of HLD-18 and related disorders.

      Strengths:

      Overall, the manuscript is thorough in its analysis of ifc function and mechanism. The data images are high quality, the experiments are well controlled, and the writing is clear.

      Weaknesses:

      (1) The authors clearly demonstrated a reduction in number of glia in the larval brains of ifc mutant flies. What remains unclear is whether ifc loss leads to glial apoptosis or a failure for glia to proliferate during development. The authors should distinguish between these two hypotheses using apoptotic markers and cell proliferation markers in glia.

      To address this point, we used phospho-histone H3 to assess mitotic index in the thoracic CNS of wild-type versus ifc mutant late third instar larvae and found a mild, but significant reduction in mitotic index in ifc mutant relative to wild-type nerve cords. We also assessed the ability of glial-specific expression of the potent anti-apoptotic gene p35 to rescue the observed loss of cortex glia phenotype in the thoracic region of the CNS of otherwise ifc mutant larvae and observed a clear increase in cortex glia in the presence versus the absence of glial-specific p35 expression (p < 3 × 10⁻⁴). These data are now provided as Supplementary Figure S8 in the paper and referred to on page 8.

      (2) It is surprising that human DEGS1 expression in glia rescues the noted phenotypes despite the different preference for sphingoid backbone between flies and mammals. Though human DEGS1 rescued the glial phenotypes described, can animal lethality be rescued by glial expression of human DEGS1? Are there longer-term effects of loss of ifc that cannot be compensated by the overexpression of human DEGS1 in glia (age-dependent neurodegeneration, etc.)?

      We note explicitly that while glial expression of human DEGS1 does provide rescuing activity, it only partially rescues the ifc mutant CNS phenotype in contrast to glial expression of Drosophila ifc, which fully rescues this phenotype. Thus, the relative activity of human DEGS1 is far below that of Drosophila ifc when assayed in flies. To quantify the functional difference between the two transgenes, we assessed the ability of glial expression of fly ifc or of human DEGS1 to rescue the lethality of otherwise ifc mutant larvae: Glial expression of ifc was sufficient to rescue the adult viability of 57.9% of ifc mutant flies based on expected Mendelian ratios (n=2452), whereas glial expression of DEGS1 was sufficient to rescue just 3.9% of ifc mutant flies (n=1303), uncovering a ~15-fold difference in the ability of the two transgenes to rescue the lethality of otherwise ifc mutant flies. In the absence of either transgene, no ifc mutant larvae reached adulthood (n=1030). These data are now provided in the text on page 9 of the revised manuscript. 
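
      For readers who want to check the arithmetic behind the ~15-fold figure, the ratio can be reproduced from the two rescue percentages quoted above. The snippet below is only an illustrative sketch: the function name fold_difference is hypothetical and is not part of the analysis described in the manuscript, and the only inputs are the percentages stated in this response.

```python
def fold_difference(rescue_a_pct: float, rescue_b_pct: float) -> float:
    """Return the ratio of two rescue percentages (a relative to b)."""
    if rescue_b_pct <= 0:
        raise ValueError("denominator percentage must be positive")
    return rescue_a_pct / rescue_b_pct


# Percentages quoted in the response above (adult viability relative to
# expected Mendelian ratios).
ifc_rescue_pct = 57.9    # glial expression of fly ifc
degs1_rescue_pct = 3.9   # glial expression of human DEGS1

fold = fold_difference(ifc_rescue_pct, degs1_rescue_pct)
print(f"{fold:.1f}-fold")  # about 14.8, which rounds to the ~15-fold quoted
```

      The ratio compares percentages only; it does not account for the different sample sizes (n=2452 versus n=1303), so it should be read as a descriptive effect size rather than a statistical test.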

      (3) The mechanistic link between the loss of ifc and lipid droplet defects is missing. How do defects in ceramide metabolism alter triglyceride utilization and storage? While the author's argument that the loss of lipid droplets in larval glia will lead to defects in neuronal ensheathment, a discussion of how this is linked to ceramides needs to be added.

      We have revised the text to address this point. We speculate that the apparent increased demand for membrane phospholipid synthesis may drive the depletion of lipid droplets, providing a link to ifc function and ceramides. Below we provide the rewritten last paragraph; the underlined section is the new text.  

      “The expansion of ER membranes coupled with loss of lipid droplets in ifc mutant larvae suggests that the apparent demand for increased membrane phospholipid synthesis may drive lipid droplet depletion, as lipid droplet catabolism can release free fatty acids to serve as substrates for lipid synthesis. At some point, the depletion of lipid droplets, and perhaps free fatty acids as well, would be expected to exhaust the ability of cortex glia to produce additional membrane phospholipids required for fully enwrapping neuronal cell bodies. Under wild-type conditions, many lipid droplets are present in cortex glia during the rapid phase of neurogenesis that occurs in larvae. During this phase, lipid droplets likely support the ability of cortex glia to generate large quantities of membrane lipids to drive membrane growth needed to ensheathe newly born neurons. Supporting this idea, lipid droplets disappear in the adult Drosophila CNS when neurogenesis is complete and cortex glia remodeling stops. We speculate that lipid droplet loss in ifc mutant larvae contributes to the inability of cortex glia to enwrap neuronal cell bodies. Prior work on lipid droplets in flies has focused on stress-induced lipid droplets generated in glia and their protective or deleterious roles in the nervous system. Work in mice and humans has found that more lipid droplets are often associated with the pathogenesis of neurodegenerative diseases, but our work correlates lipid droplet loss with CNS defects. In the future, it will be important to determine how lipid droplets impact nervous system development and disease.”

      (4) On page 10, the authors use the words "strong" and "weak" to describe where ifc is expressed. Since the use of T2A-GAL4 alleles in examining gene expression is unable to delineate the amount of gene expression from a locus, the terms "broad" and "sparse" labeling (or similar terms) should be used instead.

      The ifc T2A-GAL4 insert in the ifc locus reports on the transcription of the gene. We agree that the GAL4 system will not reflect differences in gene expression when expression levels are not dramatically different. However, when expression levels differ dramatically, as in our case, the GAL4 system can reflect this difference through the expression of a reporter gene. We reworded this section to suggest that ifc is transcribed at higher levels in glia as compared to neurons. We can’t use sparse or broad, as ifc is expressed in all, or at least in most, glia and neurons. The new text is as follows: “Using this approach, we observed strong nRFP expression in all glial cells (Figures 4D and S10A) and modest nRFP expression in all neurons (Figures 4E and S10B), suggesting ifc is transcribed at higher levels in glial cells than neurons in the larval CNS.”

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors report three novel ifc alleles: ifc[js1], ifc[js2], and ifc[js3]. ifc[js1] and ifc[js2] encode missense mutations, V276D and G257S, respectively. ifc[js3] encodes a nonsense mutation, W162*. These alleles exhibit multiple phenotypes, including delayed progression to the late-third larval instar stage, reduced brain size, elongation of the ventral nerve cord, axonal swelling, and lethality during late larval or early pupal stages.

      Further characterization of these alleles by the authors reveals that ifc is predominantly expressed in glia and localizes to the endoplasmic reticulum (ER). The expression of the ifc gene governs glial morphology and survival. Expression of fly ifc cDNA or human DEGS1 cDNA specifically in glia, but not neurons, rescues the CNS phenotypes of ifc mutants, indicating a crucial role for ifc in glial cells and its evolutionary conservation. Loss of ifc results in ER expansion and loss of lipid droplets in cortex glia. Additionally, loss of ifc leads to ceramide depletion and accumulation of dihydroceramide. Moreover, it increases the saturation levels of triacylglycerols and membrane phospholipids. Finally, the reduction of dihydroceramide synthesis suppresses the CNS phenotypes associated with ifc mutations, indicating the key role of dihydroceramide in causing ifc LOF defects.

      Strengths:

      This manuscript unveils several intriguing and novel phenotypes of ifc loss-of-function in glia. The experiments are meticulously planned and executed, with the data strongly supporting their conclusions.

      Weaknesses:

      I didn't find any obvious weakness.

      Reviewer #1 (Recommendations For The Authors):

      Additional minor comments below:

      (1) The authors state that TGs are the building blocks of membrane phospholipids. This is not exactly true. The breakdown of TGs can result in free FAs which can be used for membrane phospholipid synthesis. Also, membrane phospholipids can also be generated from free FAs that were never in TGs.

      To address this point, we have reworked a number of sentences in the text. On page 12 we reworded two small sections to the following: 

      “In the CNS, lipid droplets form primarily in cortex glia[29] and are thought to contribute to membrane lipid synthesis through their catabolism into free fatty acids versus acting as an energy source in the brain.[41] Consistent with the possibility that increased membrane lipid synthesis drives lipid droplet reduction, RNA-seq assays of dissected nerve cords revealed that loss of ifc drove transcriptional upregulation of genes that promote membrane lipid biogenesis”

      As TG breakdown results in free fatty acids that can be used for membrane phospholipid synthesis, we asked if changes in TG levels and saturation were reflected in the levels or saturation of the membrane phospholipids phosphatidylcholine (PC), phosphatidylethanolamine (PE), and phosphatidylserine (PS).

      (2) Figure 5J what does the dotted line indicate? Please specify in the figure legend or remove it.

      We have added the following text in the figure legend: Dotted line indicates a log2 fold change of 0.5 in the treatment group compared to the control group.
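
      As a reading aid for this threshold, a log2 fold change can be converted back to a linear ratio by raising 2 to that power. The sketch below assumes the fold change is expressed as treatment over control on a linear scale; the helper name log2fc_to_fold is hypothetical and is not taken from the manuscript.

```python
import math


def log2fc_to_fold(log2_fc: float) -> float:
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2_fc


# The dotted line in the figure legend sits at a log2 fold change of 0.5.
threshold_fold = log2fc_to_fold(0.5)
print(f"{threshold_fold:.3f}")  # 1.414, i.e. roughly a 41% increase

# Sanity check: converting back recovers the original log2 value.
assert math.isclose(math.log2(threshold_fold), 0.5)
```

      Under that assumption, points above the dotted line correspond to transcripts increased by more than about 1.4-fold in the treatment group.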

      (3) The text for your graphs is hard to read. Please make the font larger.

      We have increased font size to enhance the readability of the figures.

      (4) The authors mentioned that driving ifc expression in neurons rescues the phenotypes (ref 17). While the glial-specific role presented in this study is robust, I think some readers would appreciate some discussion of this study in light of the data presented here.

      We have added the below text on page 10 to address this point.

      “Results of our gene rescue experiments conflict with a prior study on ifc in which expression of ifc in neurons was found to rescue the ifc phenotype. In this context, we note that elav-GAL4 drives UAS-linked transgene expression not just in neurons, but also in glia at appreciable levels, and thus needs to be paired with repo-GAL80 to restrict GAL4-mediated gene expression to neurons. Thus, “off-target” expression in glial cells may account for the discrepant results. It is, however, more difficult to reconcile how neuronal or glial expression of ifc would rescue the observed lethality of the ifc-KO chromosome given the presence of additional lethal mutations in the 21E2 region of the second chromosome.”

      (5) While the analysis of fatty acid saturation is experimentally well done, I'm not really sure what the significance of this data is.

      We included this information as a reference for future analysis of additional genes in the ceramide biogenesis pathway, as we expect that alteration of the levels and saturation levels of PE, PC, and PS in cell membranes may underlie key changes in the biophysical properties of glial cell membranes and their ability to enwrap or infiltrate their targets. Thus, we expect the significance of these data to grow as more work is done on additional members of the ceramide pathway in the nervous system in flies and other systems.  

      Reviewer #2 (Recommendations For The Authors):

      (1) There is a typo at the top of page 11: "internal membranes and fail enwrap neurons" is missing the word "to" before "enwrap"

      The typo was fixed.

      (2)  PMID: 36718090 should be included in the discussion of SPT and ORMDL complex in human disease.

      The reference was added.

      Reviewer #3 (Recommendations For The Authors):

      In this manuscript, the authors report three novel ifc alleles: ifc[js1], ifc[js2], and ifc[js3]. ifc[js1] and ifc[js2] encode missense mutations, V276D and G257S, respectively. ifc[js3] encodes a nonsense mutation, W162*. These alleles exhibit multiple phenotypes, including delayed progression to the late-third larval instar stage, reduced brain size, elongation of the ventral nerve cord, axonal swelling, and lethality during late larval or early pupal stages.

      Further characterization of these alleles by the authors reveals that ifc is predominantly expressed in glia and localizes to the endoplasmic reticulum (ER). The expression of the ifc gene governs glial morphology and survival. Expression of fly ifc cDNA or human DEGS1 cDNA specifically in glia, but not neurons, rescues the CNS phenotypes of ifc mutants, indicating a crucial role for ifc in glial cells and its evolutionary conservation. Loss of ifc results in ER expansion and loss of lipid droplets in cortex glia. Additionally, loss of ifc leads to ceramide depletion and accumulation of dihydroceramide. Moreover, it increases the saturation levels of triacylglycerols and membrane phospholipids. Finally, the reduction of dihydroceramide synthesis suppresses the CNS phenotypes associated with ifc mutations, indicating the key role of dihydroceramide in causing ifc LOF defects.

      In summary, this manuscript unveils several intriguing and novel phenotypes of ifc loss-of-function in glia. The experiments are meticulously planned and executed, with the data strongly supporting their conclusions. I have no additional comments and fully support the publication of this manuscript in eLife.

      The authors also note that they added one paragraph to the discussion that addresses the possibility that the increased detection of cell death markers could arise due to the inability of glial cells to remove cellular debris. The text of this paragraph is provided below:

      We note that cortex glia are the major phagocytic cell of the CNS and phagocytose neurons targeted for apoptosis as part of the normal developmental process.[23-26] Thus, while we favor the model that ifc triggers neuronal cell death due to glial dysfunction, it is also possible that increased detection of dying neurons arises due at least in part to a decreased ability of cortex glia to clear dying neurons from the CNS. At present, the large number of neurons that undergo developmentally programmed cell death combined with the significant disruption to brain and ventral nerve cord morphology caused by loss of ifc function render this question difficult to address. Additional evidence does, however, support the idea that loss of ifc function drives excess neuronal cell death: Clonal analysis in the fly eye reveals that loss of ifc drives photoreceptor neuron degeneration[17], indicating that loss of ifc function drives neuronal cell death; cortex-glia specific depletion of CPES, which acts downstream of ifc, disrupts neuronal function and induces photosensitive epilepsy in flies[59], indicating that genes in the ceramide pathway can act nonautonomously in glia to regulate neuronal function; recent genetic studies reveal that other glial cells can compensate for impaired cortex glial cell function by phagocytosing dying neurons[62], and we observe that the cell membranes of subperineurial glia enwrap dying neurons in ifc mutant larvae (Fig. S14), consistent with similar compensation occurring in this background, and in humans, loss of function mutations in DEGS1 cause neurodegeneration.[7-9] Clearly, future work is required to address this question for ifc/DEGS1 and perhaps other members of the ceramide biogenesis pathway.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):  

      Summary: 

      Kohno et al. examined whether the anti-inflammatory cytokine IL-4 attenuates neuropathic pain by promoting the emergence of antinociceptive microglia in the dorsal horn of the spinal cord. In two models of neuropathic pain following peripheral nerve injury, intrathecal administration of IL-4 once a day for 3 days from day 14 to day 17 after injury attenuates hypersensitivity to mechanical stimuli in the hind paw ipsilateral to nerve injury. Such an antinociceptive effect correlates with a higher number of CD11c+ microglia in the dorsal horn of the spinal cord, which is the termination area for primary afferent fibres injured in the periphery. Interestingly, CD11c+ microglia emerge spontaneously in the dorsal horn in concomitance with the resolution of pain in the spinal nerve model of pain, but not in the spared nerve injury model where pain does not resolve, confirming that this cluster of microglia is involved in the resolution of pain. 

      Based on existing evidence that the receptor for IL-4, namely IL-4R, is expressed by microglia, the authors suggest that IL-4R mediates IL-4 effect in microglia including up-regulation of Igf1 mRNA. They have previously reported that IGF-1 can attenuate pain neuron activity in the spinal cord. 

      Strengths:

      This study includes cutting-edge techniques such as flow cytometry analysis of microglia and transgenic mouse models. 

      Weaknesses:

      The conclusion of this paper is supported by data, but the interpretation of some data requires clarification.  

      We appreciate the reviewer's careful reading of our paper.  According to the reviewer's comments, we have performed new immunohistochemical experiments and added some discussion in the revised manuscript (please see the point-by-point responses below).

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to investigate how IL-4 modulates the reactive state of microglia in the context of neuropathic pain. Specifically, they sought to determine whether IL-4 drives an increase in CD11c+ microglial cells, a population associated with anti-inflammatory responses and whether this change is linked to the suppression of neuropathic pain. The study employs a combination of behavioral assays, pharmacogenetic manipulation of microglial populations, and characterization of microglial markers to address these questions. 

      Strengths: 

      The methodological approach in this study is robust, providing convincing evidence for the proposed mechanism of IL-4-mediated microglial regulation in neuropathic pain. The experimental design is well thought out, utilizing two distinct neuropathic pain models (SpNT and SNI), each yielding different outcomes. The SpNT model demonstrates spontaneous pain remission and an increase in the CD11c+ microglial population, which correlates with pain suppression. In contrast, the SNI model, which does not show spontaneous pain remission, lacks a significant increase in CD11c+ microglia, underscoring the specificity of the observed phenomenon. This design effectively highlights the role of the CD11c+ microglial population in pain modulation. The use of behavioral tests provides a clear functional assessment of IL-4 manipulation, and pharmacogenetic tools allow for precise control of microglial populations, minimizing off-target effects. Notably, the manipulation targets the CD11c promoter, which presumably reduces the risk of non-specific ablation of other microglial populations, strengthening the experimental precision. Moreover, the thorough characterization of microglial markers adds depth to the analysis, ensuring that the changes in microglial populations are accurately linked to the behavioral outcomes. 

      Weaknesses: 

      One potential limitation of the study is that the mechanistic details of how IL-4 induces the observed shift in microglial populations are not fully explored. While the study demonstrates a correlation between IL-4 and CD11c+ microglial cells, a deeper investigation into the specific signaling pathways and molecular processes driving this population shift would greatly strengthen the conclusions. Additionally, the paper does not clearly integrate the findings into the broader context of microglial reactive state regulation in neuropathic pain.  

      We thank the reviewer for these insightful comments on our paper.  As the reviewer suggested, further investigation of the specific signaling pathways and molecular processes by which IL-4 induces a transition of spinal microglia to the CD11c+ state would strengthen our conclusion and also provide important clues to discovering new therapeutic targets.  In revising the manuscript, we have included this in the Discussion section (lines 264–267), and we hope that future studies will clarify these points.  As for the additional comment, we have added a brief summary of existing research on microglial function in neuropathic pain at the beginning of the Discussion section (lines 188–196).

      Reviewer #1 (Recommendations for the authors):

      The conclusions of this paper are supported by data, but the interpretation of some data requires clarification. 

      (1) In Figure 1D and Figure 7 C, CD11c+ microglia numbers are higher in contralateral dorsal horns after IL-4 administration despite IL-4 having no effect on pain thresholds. The authors should discuss these findings.  

      As the reviewer pointed out, IL-4 increased the number of CD11c+ microglia in the contralateral spinal dorsal horn (SDH) but did not affect pain thresholds in the contralateral hindpaw.  The data seem to be related to the selective effect of CD11c+ microglia and their factors (especially IGF1) on nerve injury-induced pain hypersensitivity.  In fact, depletion of CD11c+ spinal microglia and intrathecal administration of IGF1 do not elevate the pain threshold of the contralateral hindpaw (Science 376: 86–90, 2022).  We have added the above statement in the Discussion section (lines 208–213).

      (2)  Do monocytes infiltrate the dorsal horn and DRG after intrathecal injections?

      To address this reviewer's comment, we performed new immunohistochemical experiments to analyze monocytes in the SDH using an antibody for CD169 (a marker for bone marrow-derived monocytes/macrophages, but not for resident microglia) (J Clin Invest 122: 3063–3087, 2012; Cell Rep 3: 605–614, 2016) and found no CD169+ monocytes in the SDH parenchyma after SpNT.  Consistent with these data, we have previously demonstrated that few bone marrow-derived monocytes/macrophages are recruited to the SDH after SpNT (Sci Rep 6: 23701, 2016).  Similarly, no CD169+ monocytes in the SDH parenchyma were observed in SpNT mice treated intrathecally with PBS or IL-4 (Figure 1—figure supplement 1A).

      In the DRG, CD169 is constitutively expressed in macrophages.  Thus, in accordance with a recent report showing that monocytes infiltrating the DRG are positive for chemokine (C-C motif) receptor 2 (CCR2) (J Exp Med 221: e20230675, 2024), we analyzed CCR2+ cells and found that CCR2+ IBA1dim monocytes were observed in the capsule and parenchyma of the DRG of naive mice (Figure 1—figure supplement 1B).  After SpNT, CCR2+ IBA1dim monocytes in the DRG parenchyma increased.  Intrathecal treatment with IL-4 increased CCR2+ IBA1dim cells in the DRG capsule.  However, the involvement of these monocytes in the DRG in IL-4-induced alleviation of neuropathic pain is unclear and warrants further investigation.  In revising the manuscript, we have included additional data (Figure 1—figure supplement 1) and corresponding text in the Results (lines 112–114) and Discussion sections (lines 218–222).

      (3) In Figure 4, depletion of CD11c+ cells in dorsal root ganglia (DRG) ameliorates neuropathic thresholds but does not alter the anti-nociceptive effect of IL-4 injected intrathecal. It appears that CD11c+ macrophages in DRG have an opposite role to CD11c+ microglia in the spinal cord. Please discuss this result. 

      We apologize for the confusion.  The aim of the experiments in Figure 4 was to examine the contribution of CD11c+ cells in the DRG to the pain-alleviating effect of intrathecal IL-4.  For this aim, we depleted CD11c+ cells in the DRG (but not in the SDH) by intraperitoneal injection of diphtheria toxin (DTX) immediately after the behavioral measurements performed on day 17 (Fig. 4A, B).  On day 18, the paw withdrawal threshold of DTX-treated mice was similar to that of PBS-treated mice, indicating that the depletion of CD11c+ cells in the DRG does not affect the pain-alleviating effect of IL-4.  These data are in stark contrast to those obtained from mice with depletion of CD11c+ cells in the SDH by intrathecal DTX (the depletion canceled the effect of IL-4) (Figure 2A).  Thus, it is conceivable that CD11c+ cells in the DRG are not involved in the IL-4-induced alleviating effect on neuropathic pain.  Because the confusion might be related to the statement in this paragraph of the initial version, we have modified our statements to state this point more clearly (lines 133–139).

      Reviewer #2 (Recommendations for the authors):

      A discussion addressing how these results fit into existing research on microglial function in pain would enhance the study's impact.

      A brief summary of existing research on microglial function in neuropathic pain has been included at the beginning of the Discussion section (lines 188–196).

      It would be helpful for the authors to elaborate on the implications of their findings within the larger landscape of immune regulation in neuropathic pain.

      Our present findings show that IL-4, known as a T-cell-derived factor, can increase CD11c+ microglia and control neuropathic pain. Furthermore, recent studies have indicated that immune cells such as CD8+ T cells infiltrating the spinal cord (Neuron 113: 896-911.e9, 2025), as well as regulatory T cells (eLife 10: e69056, 2021; Science 388: 96–104, 2025) and MRC1+ macrophages in the spinal meninges (Neuron 109: 1274–1282, 2021), have important roles in regulating microglial states and neuropathic pain. Thus, these findings provide new insights into the mechanisms of the neuro-immune interactions that regulate neuropathic pain. In revising the manuscript, we have added the above statement to the Discussion section (lines 254–260).

      Furthermore, a discussion on how these findings could inform the development of targeted therapies that modulate microglial populations in a controlled, disease-specific manner would be valuable. Exploring how these insights could lead to novel treatment strategies for neuropathic pain could provide important future directions for the research and broader clinical applications.

      We appreciate the reviewer's valuable suggestion. Our current data, demonstrating that IL-4 increases CD11c+ microglia without affecting the total number of microglia, could open a new avenue for developing strategies to modulate microglial subpopulations through molecular targeting, which may lead to new analgesics. However, given IL-4's association with allergic responses, targeting microglia-selective molecules involved in shifting microglia toward the CD11c+ state, such as intracellular signaling molecules downstream of IL-4 receptors, may offer a more selective and safer therapeutic approach. Moreover, since CD11c+ microglia have been implicated in other CNS diseases [e.g., Alzheimer disease (Cell 169: 1276–1290, 2017), amyotrophic lateral sclerosis (Nat Neurosci 25: 26–38, 2022), and multiple sclerosis (Front Cell Neurosci 12: 523, 2019)], further investigations into the mechanisms driving CD11c+ microglia induction could provide insights into novel therapeutic strategies not only for neuropathic pain but also for other CNS diseases. In revising the manuscript, we have added the above statement to the Discussion section (lines 260–271).

    1. Author Response:

      Reviewer #1 (Public review):

      The study by Lotonin et al. investigates correlates of protection against African swine fever virus (ASFV) infection. The study is based on comprehensive work, including the measurement of immune parameters using complementary methodologies. An important aspect of the work is the temporal analysis of the immune events, which captures the dynamics of the immune responses induced after infection. The work also compares responses induced in farm and SPF pigs, showing that the latter have an enhanced capacity to induce protective immunity. Overall, the results obtained are interesting and relevant for the field. The findings described in the study further validate work from previous studies (critical role of virus-specific T cell responses) and provide new evidence on the importance of a balanced innate immune response during the immunization process. This information increases our knowledge of basic ASF immunology, one of the important gaps in ASF research that needs to be addressed for a more rational design of effective vaccines. Further studies will be required to corroborate that the results obtained by immunizing pigs with a not completely attenuated virus strain are also valid in other models, such as immunization using live attenuated vaccines.

      While overall the conclusions of the work are well supported by the results, I consider that the following issues should be addressed to improve the interpretation of the results:

      We thank Reviewer #1 for their thoughtful and constructive feedback, which will significantly contribute to improving the clarity and quality of our manuscript. Below, we respond to each of the reviewer’s comments and outline the revisions we plan to incorporate.

      (1) An important issue in the study is the characterization of the infection outcome observed upon Estonia 2014 inoculation. Infected pigs show a long period of viremia, which is not linked to clinical signs. Indeed, animals are recovered by 20 days post-infection (dpi), but virus levels in blood remain high until 141 dpi. This is uncommon for ASF acute infections and rather indicates a potential induction of a chronic infection. Have the authors analysed this possibility deeply? Are there lesions indicative of chronic ASF in infected pigs at 17 dpi (when they have sacrificed some animals) or, more importantly, at later time points? Does the virus persist in some tissues at late time points, once clinical signs are not observed? Has all this been tested in previous studies?

      Tissue samples were tested for viral loads only at 17 dpi during the immunization phase, and long-term persistence of the virus in tissues has not been assessed in our previous studies. At 17 dpi, lesions were most prominently observed in the lymph nodes of both farm and SPF pigs. In a previous study using the Estonia 2014 strain (doi: 10.1371/journal.ppat.1010522), organs were analyzed at 28 dpi, and no pathological signs were detected. This finding calls into question the likelihood of chronic infection being induced by this strain.

      (2) Virus loads post-Estonia infection significantly differ from whole blood and serum (Figure 1C), while they are very similar in the same samples post-challenge. Have the authors validated these results using methods to quantify infectious particles, such as Hemadsorption or Immunoperoxidase assays? This is important, since it would determine the duration of virus replication post-Estonia inoculation, which is a very relevant parameter of the model.

      We did not perform virus titration but instead used qPCR as a sensitive and standardized method to assess viral genome loads. Although qPCR does not distinguish between infectious and non-infectious virus, it provides a reliable proxy for relative viral replication and clearance dynamics in this model. Unfortunately, no sample material remains from this experiment, but we agree that subsequent studies employing infectious virus quantification would be valuable for further refining our understanding of viral persistence and replication following Estonia 2014 infection.

      (3) Related to the previous points, do the authors expect immunosuppressive mechanisms to be induced during such prolonged virus persistence, as described in humans and mouse models? Have the authors analysed the presence of immunosuppressive mechanisms during the virus persistence phase (IL10, myeloid-derived suppressor cells)? Have the authors used T cell exhaustion markers to immunophenotype ASFV Estonia-induced T cells?

      We agree with the reviewer that the lack of long-term protection can be linked to immunosuppressive mechanisms, as demonstrated for genotype I strains (doi: 10.1128/JVI.00350-20). The proposed markers were not analyzed in this study but represent important targets for future investigation. We will address this point in the discussion.

      (4) A broader analysis of inflammatory mediators during the persistence phase would also be very informative. Is the presence of high VLs at late time points linked to a systemic inflammatory response? For instance, levels of IFNa are still higher at 11 dpi than at baseline, but they are not analysed at later time points.

      While IFN-α levels remain elevated at 11 dpi, this response is typically transient in ASFV infection and likely not linked to persistent viremia. We agree that analyzing additional inflammatory markers at later time points would be valuable, and future studies should be designed to further understand viral persistence.

      (5) The authors observed a correlation between IL1b in serum before challenge and protection. The authors also nicely discuss the potential role of this cytokine in promoting memory CD4 T cell functionality, as demonstrated in mice previously. However, the cells producing IL1b before ASFV challenge are not identified. Might it be linked to virus persistence in some organs? This important issue should be discussed in the manuscript.

      We agree that identifying the cellular source of IL-1β prior to challenge is important, and this should be addressed in subsequent studies. We will include a discussion on the potential link between elevated IL-1β levels and virus persistence in certain organs.

      (6) The lack of non-immunized controls during the challenge makes the interpretation of the results difficult. Has this challenge dose been previously tested in pigs of the age to demonstrate its 100% lethality? Can the low percentage of protected farm pigs be due to a modulation of memory T and B cell development by the persistence of the virus, or might it be related to the duration of the immunity, which in this model is tested at a very late time point? Related to this, how has the challenge day been selected? Have the authors analysed ASFV Estonia-induced immune responses over time to select it?

      In our previous study, intramuscular infection with ~3–6 × 10² TCID₅₀/mL led to 100% lethality (doi: 10.1371/journal.ppat.1010522), which is notably lower than the dose used in the present study, although the route here was oronasal. The modulation of memory responses could be more thoroughly assessed in future studies using exhaustion markers. The challenge time point was selected based on the clearance of the virus from blood and serum. We agree that the lack of protection in some animals is puzzling and warrants further investigation, particularly to assess the role of immune duration, potential T cell exhaustion caused by viral persistence, or other immunological factors that may influence protection. Based on our experience, vaccine virus persistence alone does not sufficiently explain the lack-of-protection phenomenon. We will incorporate these important aspects into the revised discussion.

      (7) Also, non-immunized controls at 0 dpc would help in the interpretation of the results from Figure 2C. Do the authors consider that the pig's age might influence the immune status (cytokine levels) at the time of challenge and thus the infection outcome?

      We support the view that including non-immunized controls at 0 dpc would strengthen the interpretation of cytokine dynamics and will consider this in future experimental designs. Regarding age, while all animals were within a similar age range at the time of challenge, we acknowledge that age-related differences in immune status could influence baseline cytokine levels and infection outcomes, and this is an important factor to consider.

      (8) Besides anti-CD2v antibodies, anti-C-type lectin antibodies can also inhibit hemadsorption (DOI: 10.1099/jgv.0.000024). Please correct the corresponding text in the results and discussion sections related to humoral responses as correlates of protection. Also, a more extended discussion on the controversial role of neutralizing antibodies (which have not been analysed in this study), or other functional mechanisms such as ADCC against ASFV would improve the discussion.

      The relevant text in the Results and Discussion sections will be revised accordingly, and the discussion will be extended to more thoroughly address the roles of antibodies.

      Reviewer #2 (Public review):

      Summary:

      In the current study, the authors attempt to identify correlates of protection for improved outcomes following re-challenge with ASFV. An advantage is the study design, which compares the responses to a vaccine-like mild challenge and during a virulent challenge months later. It is a fairly thorough description of the immune status of animals in terms of T cell responses, antibody responses, cytokines, and transcriptional responses, and the methods appear largely standard. The comparison between SPF and farm animals is interesting and probably useful for the field in that it suggests that SPF conditions might not fully recapitulate immune protection in the real world. I thought some of the conclusions were over-stated, and there are several locations where the data could be presented more clearly.

      Strengths:

      The study is fairly comprehensive in the depth of immune read-outs interrogated. The potential pathways are systematically explored. Comparison of farm animals and SPF animals gives insights into how baseline immune function can differ based on hygiene, which would also likely inform interpretation of vaccination studies going forward.

      Weaknesses:

      Some of the conclusions are over-interpreted and should be more robustly shown or toned down. There are also some issues with data presentation that need to be resolved and data that aren't provided that should be, like flow cytometry plots.

      We appreciate the feedback from Reviewer #2 and acknowledge the concerns raised regarding data presentation. In the revised manuscript, we will clarify our conclusions where needed and ensure that interpretations are better aligned with the data shown.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      Summary:

      This paper introduces a new class of machine learning models for capturing how likely a specific nucleotide in a rearranged IG gene is to undergo somatic hypermutation. These models modestly outperform existing state-of-the-art efforts, despite having fewer free parameters. A surprising finding is that models trained on all mutations from non-functional rearrangements give divergent results from those trained on only silent mutations from functional rearrangements.

      Strengths:

      (1) The new model structure is quite clever and will provide a powerful way to explore larger models.

      (2) Careful attention is paid to curating and processing large existing data sets.

      (3) The authors are to be commended for their efforts to communicate with the developers of previous models and use the strongest possible versions of those in their current evaluation.

      Thank you very much for your comments. We especially appreciate the last comment, as we have indeed tried hard to do so.

      Weaknesses:

      (1) 10x/single cell data has a fairly different error profile compared to bulk data. A synonymous model should be built from the same briney dataset as the base model to validate the difference between the two types of training data.

      Thank you for pointing this out.

      We have repeated the same analysis with synonymous mutations derived from the bulk-sequenced tang dataset for Figure 4 and the supplementary figure. The conclusion remains the same. We used tang because only the out-of-frame sequences were available to us for the briney dataset, as we were using preprocessing from the Spisak paper. The fact that both the 10x and the tang data give the same results bolsters our claim.

      (2) The decision to test only kernels of 7, 9, and 11 is not described. The selection/optimization of embedding size is not explained. The filters listed in Table 1 are not defined.

      We have added the following to the Models subsection to further explain these decisions:

      “The hyperparameters for the models (Table 1) were selected with a run of Optuna (Akiba et al., 2019) early in the project and then fixed. Further optimization was not pursued because of the limited performance differences between the existing models.”

      Reviewer #2 (Public Review):

      Summary:

      This work offers an insightful contribution for researchers in computational biology, immunology, and machine learning. By employing a 3-mer embedding and CNN architecture, the authors demonstrate that it is possible to extend sequence context without exponentially increasing the model's complexity.

      Key findings:

      (1) Efficiency and Performance: Thrifty CNNs outperform traditional 5-mer models and match the performance of significantly larger models like DeepSHM.

      (2) Neutral Mutation Data: A distinction is made between using synonymous mutations and out-of-frame sequences for model training, with evidence suggesting these methods capture different aspects of SHM or different biases.

      (3) Open Source Contributions: The release of a Python package and pre-trained models adds practical value for the community.

      Thank you for your positive comments. We believe that we have been clear about the modest improvements (e.g., the abstract says “slight improvement”), and we discuss the data limitations extensively. If there are ways we can do this more effectively, we are happy to hear them.
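
      The scaling argument in the summary above (wider sequence context without an exponential increase in parameters) can be illustrated with a rough parameter count. This is a minimal sketch, not the authors' implementation: it assumes a lookup table with one rate per k-mer for the traditional models, and a hypothetical architecture made of a 3-mer embedding, a single 1D convolution, and a linear readout for the CNN. The function names and the example hyperparameters are invented for illustration and are not taken from Table 1 of the manuscript.

```python
def kmer_table_params(k: int) -> int:
    """Parameters in a lookup table with one rate per DNA k-mer."""
    return 4 ** k


def embedding_cnn_params(embed_dim: int, n_filters: int, kernel_size: int) -> int:
    """Parameter count for a hypothetical 3-mer embedding + one conv layer + linear readout."""
    embedding = 4 ** 3 * embed_dim                          # one vector per 3-mer
    conv = embed_dim * n_filters * kernel_size + n_filters  # weights + biases
    readout = n_filters + 1                                 # linear layer to one rate per site
    return embedding + conv + readout


def receptive_field(kernel_size: int) -> int:
    """Bases seen per site: the kernel spans kernel_size 3-mers, each overlapping its neighbours."""
    return kernel_size + 2


if __name__ == "__main__":
    for kernel in (7, 9, 11):  # kernel sizes mentioned in the review
        width = receptive_field(kernel)
        print(
            f"kernel={kernel}: context={width} bases, "
            f"cnn_params={embedding_cnn_params(7, 16, kernel)}, "
            f"table_params={kmer_table_params(width)}"
        )
```

      Under these assumptions the CNN parameter count grows linearly with kernel size, while a lookup table over the same context grows as 4 to the power of the context width.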

      Reviewer #3 (Public Review):

      Summary:

      Sung et al. introduce new statistical models that capture a wider sequence context of somatic hypermutation with a comparatively small number of additional parameters. They demonstrate their model’s performance with rigorous testing across multiple subjects and datasets.

      Strengths:

      Well-motivated and defined problem. Clever solution to expand nucleotide context. Complete separation of training and test data by using different subjects for training vs testing. Release of open-source tools and scripts for reproducibility.

      Thank you for your positive comments.

      Weaknesses:

      This study could be improved with better descriptions of dataset sequencing technology, sequencing depth, etc.

      We have added columns to Table 3 that report sequencing technology and depth for each dataset.

      Reviewer #1 (Recommendations for the Authors):

      (1) There seems to be a contradiction between Tables 2 and 3 as to whether the Tang et al. dataset was used to train models or only to test them.

      Thank you for catching this. The "purpose" column in Table 3 was for the main analysis, while Table 2 is describing only models trained to compare with DeepSHM. Explaining this seems more work than it's worth, so we simply removed that column from Table 2. The dataset purposes are clear from the text.

      (2) In Figure 4, I assume the two rows correspond to the Briney and Tang datasets, as in Figure 2, but this is not explicitly described.

      Yes, you are correct. We added an explanation in the caption of Figure 4.

      (3) Figure 2, supplement 1 should include a table like Table 1 that describes these additional models.

      We have added an explanation in the caption to Table 1 that "Medium" and "Large" refer to specific hyperparameter choices. The caption to Figure 2, supplement 1 now describes the corresponding hyperparameter choices for "Small" thrifty models.

      (4) On line 378 "Therefore in either case" seems extraneous.

      Indeed. We have dropped those words.

      (5) In the last paragraph of the Discussion, only the attempt to curate the Ford dataset is described. I am not sure if you intended to discuss the Rodriguez dataset here or not.

      Thank you for pointing this out. We have updated the Materials and Methods section to include our attempts to recover data from Rodriguez et al., 2023.

      (6) Have you looked to see if Soto et al. (Nature 2019) provides usable data for your purposes?

      Thank you for making us aware of this dataset!

      We assessed it but found that the recovery of usable out-of-frame sequences was too low to be useful for our analysis. We now describe this evaluation in the paper.

      (7) Cui et al. note a high similarity between S5F and S5NF (r=0.93). Does that constrain the possible explanations for the divergence you see?

      This is an excellent point.

      We don't believe the correlation observed in Cui and our results are incompatible. Our point is not that the two sources of neutral data are completely different but that they differ enough to limit generalization. Also, the Spearman correlation in Cui is 0.86, which aligns with our observed drop in R-precision.
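
      For readers unfamiliar with the metric mentioned above, R-precision can be sketched as follows. This assumes the standard information-retrieval definition (with R truly mutated sites in a sequence, the fraction of the R highest-scoring sites that are actually mutated); the manuscript's exact definition and averaging scheme are not restated in this response, and the function name is hypothetical.

```python
from typing import Sequence


def r_precision(scores: Sequence[float], mutated: Sequence[bool]) -> float:
    """Fraction of the top-R scored sites that are truly mutated, where R = number of mutated sites."""
    if len(scores) != len(mutated):
        raise ValueError("scores and mutated must have the same length")
    r = sum(1 for m in mutated if m)
    if r == 0:
        raise ValueError("R-precision is undefined when no site is mutated")
    # Rank site indices from highest to lowest predicted mutability.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:r] if mutated[i])
    return hits / r
```

      Because the metric depends only on the ranking of sites, two models whose rates are highly correlated overall can still differ in R-precision if they disagree on which sites rank highest.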

      (8) Are you able to test the effects of branch length or background SHM on the model?

      We're unsure what is meant by “background SHM.”

      We did try joint optimization of branch length and model parameters, but it did not improve performance. Differences in clone size thresholds do exist between datasets, but Figure 3 suggests that tang is better sequence data.

      (9) Would the model be expected to scale up to a kernel of, say, 50? Would that help yield biological insight?

      We did not test such large models because larger kernels did not improve performance.

      While your suggestion is intriguing, distinguishing biological effects from overfitting would be difficult. We explore biological insights more directly in our recent mechanistic model paper (Fisher et al., 2025), which is now cited in a new paragraph on biological conclusions.

      Reviewer #2 (Recommendations for the Authors):

      (1) Consider applying a stricter filtration approach to the Briney dataset to make it more comparable to the Tang dataset.

      Thank you. We agree that differences in datasets are interesting, though model rankings remain consistent. We now include supplementary figures comparing synonymous and out-of-frame models from the tang dataset.

      (2) You omit mutations between the unmutated germline and the MRCA of each tree. Why?

      The inferred germline may be incorrect due to germline variation or CDR3 indels, which could introduce spurious mutations. Following Spisak et al. (2020), we exclude this branch.

      Yes, singletons are discarded: ~28k in tang and ~1.1M in jaffe.

      (3) Could a unified model trained on both data types offer further insights?

      We agree and present such an analysis in Figure 4.

      (4) Tree inference biases from parent-child distances may impact the results.

      While this is an important issue, all models are trained on the same trees, so we expect any noise or bias to be consistent. Different datasets help confirm the robustness of our findings.

      (5) Simulations would strengthen validation.

      We focused on real datasets, which we view as a strength. While simulations could help, designing a meaningful simulation model would be nontrivial. We have clarified this point in the manuscript.

      Reviewer #3 (Recommendations for the Authors):

      There are typos in lines 109, 110, 301, 307, and 418.

      Thank you, we have corrected them.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors revisit the specific domains/signals required for the redirection of an inner nuclear membrane protein, emerin, to the secretory pathway. They find that epitope tagging influences protein fate, serving as a cautionary tale for how different visualisation methods are used. Multiple tags and lines of evidence are used, providing solid evidence for the altered fate of different constructs.

      Strengths:

      This is a thorough dissection of domains and properties that confer INM retention vs secretion to the PM/lysosome, and will serve the community well as a caution regarding the placement of tags and how this influences protein fate.

      Weaknesses:

      Biogenesis pathways are not explored experimentally: it would be interesting to know if the lysosomal pool arrives there via the secretory pathway (eg by engineering a glycosylation site into the lumenal domain) or by autophagy, where failed insertion products may accumulate in the cytoplasm and be degraded directly from cytoplasmic inclusions.

      This manuscript is a Research Advance that follows previous work that we published in eLife on this topic (Buchwalter et al., eLife 2019; PMID 31599721). In that prior publication, we showed that emerin-GFP arrives at the lysosome by secretion and exposure at the PM, followed by internalization. While we state these previous findings in this manuscript, we did not explicitly restate here how we came to that conclusion. In the 2019 study, we (i) engineered in a glycosylation site, which demonstrated that emerin-GFP receives complex, Endo H-resistant N-glycans, indicating passage through the Golgi; (ii) performed cell surface labeling, which confirmed that emerin accesses the PM; (iii) interfered with the early secretory pathway using brefeldin A; and (iv) interfered with lysosomal function using bafilomycin A1. Further, we ruled out autophagy as a major contributor to emerin trafficking by treating cells with the PI3K inhibitor KU55933, which had no effect on emerin’s lysosomal delivery.

      It would be helpful if the topology of constructs could be directly demonstrated by pulse-labelling and protease protection. It's possible that there are mixed pools of both topologies that might complicate interpretation.

      We demonstrate that emerin’s TMD inserts in a tail-anchored orientation (C terminus in ER lumen) by appending a GFP tag to either the N or C terminus, followed by anti-GFP antibody labeling of unpermeabilized cells (Fig. 1G). This shows the preferred topology of emerin’s wild type TMD.

      As the reviewer points out, it is possible that our manipulations of the TMD sequence (Fig. 2D-E) alter its preferred topology of membrane insertion. We addressed this question by performing anti-GFP and anti-emerin antibody labeling of the less hydrophobic TMD mutant (EMD-TMDm-GFP) after selective permeabilization of the plasma membrane (Figure 2 supplement, panel F). If emerin biogenesis is normal, the GFP tag should face the ER lumen while the emerin antibody epitope should be cytosolic. If the fidelity of emerin’s membrane insertion is impaired, the GFP tag could be exposed to the cytosol (flipped orientation), which would be detected by anti-GFP labeling upon plasma membrane permeabilization. We find that the C-terminal GFP tag is completely inaccessible to antibody when the PM is selectively permeabilized with digitonin, but is readily detected when all intracellular membranes are permeabilized with Triton X-100. These data confirm that mutating emerin’s TMD does not disrupt the protein’s membrane topology.

      Reviewer #2 (Public review):

      In this manuscript, Mella et al. investigate the effect of GFP tagging on the localization and stability of the nuclear-localized tail-anchored (TA) protein Emerin. A previous study from this group showed that C-terminally GFP-tagged Emerin protein traffics to the plasma membrane and reaches lysosomes for degradation. It is suggested that the C-terminal tagging of tail-anchored proteins shifts their insertion from the post-translational TRC/GET pathway to the co-translational SRP-mediated pathway. The authors of this paper found that C-terminal GFP tagging causes Emerin to localize to the plasma membrane and eventually reach lysosomes. They investigated the mechanism by which Emerin-GFP moves to the secretory pathway. By manipulating the cytosolic domain and the hydrophobicity of the transmembrane domain (TMD), the authors identify that an ER retention sequence and strong TMD hydrophobicity contribute to Emerin trafficking to the secretory pathway. Overall, the data are solid, and the knowledge will be useful to the field. However, the authors do not fully answer the question of why C-terminally GFP-tagged Emerin moves to the secretory pathway. Importantly, the authors did not consider the possible roles of GFP in the ER lumen influencing Emerin trafficking to the secretory pathway.

      Reviewer #2 (Recommendations for the authors):

      Major concerns:

      (1) The authors suggest that an ER retention sequence and high hydrophobicity of Emerin TMD contribute to its trafficking to the secretory pathway. However, these two features are also present in WT Emerin, which correctly localizes to the inner nuclear membrane. Additionally, the authors show that the ER retention sequence is normally obscured by the LEM domain. The key difference between WT Emerin and Emerin-GFP is the presence of GFP in the ER lumen. The authors missed investigating the role of GFP in the ER lumen in influencing Emerin trafficking to the secretory pathway. It is likely that COPII carrier vesicles capture GFP protein in the lumen as part of the bulk flow mechanism for transport to the Golgi compartment. The authors could easily test this by appending a KDEL sequence to the C-terminus of GFP; this should now redirect the protein to the nucleus.

      We agree with the reviewer’s point that the presence of lumenal GFP somehow promotes secretion of emerin from the ER, likely at the stage of enhancing its packaging into COPII vesicles. We struggle to think about how to interpret the KDEL tagging experiment that the reviewer proposes, as the KDEL receptor predominantly recycles soluble proteins from the Golgi to the ER, while emerin is a membrane protein; and we have shown that emerin already contains a putative COPI-interacting RRR recycling motif in its cytosolic domain.

      Nevertheless, we agree with the reviewer that it is worthwhile to test the possibility that addition of GFP to emerin’s C-terminus promotes capture by COPII vesicles. We have evaluated this question by performing temperature block experiments to cause cargo accumulation within stalled COPII-coated ER exit sites, then comparing the propensity of various untagged and tagged emerin variants to enrich in ER exit sites as judged by colocalization with the COPII subunit Sec31a. These data now appear in Figure 4 supplement 1. These experiments indicate that emerin-GFP samples ER exit sites significantly more than does untagged emerin. Further, the ER exit site enrichment of emerin-GFP is dampened by shortening emerin’s TMD. We do not see further enrichment of any emerin variant in ER exit sites when COPII vesicle budding is stalled by low temperature incubation, implying that emerin lacks any positive sorting signals that direct its selective enrichment in COPII vesicles. Altogether, these data indicate that both emerin’s long and hydrophobic TMD and the addition of a lumenal GFP tag increase emerin’s propensity to sample ER exit sites and undergo non-selective, “bulk flow” ER export.

      (2) The authors nicely demonstrate that the hydrophobicity of Emerin TMD plays a role in its secretory trafficking. I wonder if this feature may be beneficial for cells to degrade newly synthesized Emerin via the lysosomal pathway during mitosis, as the nuclear envelope breakdown may prevent the correct localization of newly synthesized Emerin. The authors could test Emerin localization during mitosis. Such findings could add to the physiological significance of their findings. At the minimum, they should discuss this possibility.

      We thank the reviewer for this insightful suggestion. It is attractive to speculate that secretory trafficking might enable lysosomal degradation of emerin during mitosis, when its lamin anchor has been depolymerized. However, we think it is unlikely that mitotic trafficking contributes significantly to the turnover flux of untagged emerin; if it did, we would expect to see higher steady state levels and/or slowed turnover of emerin mutants that cannot traffic to the lysosome. We did not observe this outcome. Instead, mutations that enhance (RA) or impair (TMDm) emerin trafficking had no effect on the untagged protein’s steady-state levels (Fig. 4G).

      Minor concerns:

      (1) On page 7, the authors note that "FLAG-RA construct was not poorly expressed relative to WR, in contrast with RA-GFP (Figures S3C, 2I)." The expression levels of these proteins cannot be compared across two different blots.

      We apologize for this confusion; we were implying two distinct comparisons to internal controls present on each blot. We have adjusted the text to read “FLAG-RA construct was not poorly expressed relative to FLAG-WT (Fig. S3C) in contrast to RA-GFP compared to WT-GFP (Fig. 2I).”

      (2) In the first paragraph of the discussion, the authors suggest that aromatic amino acids facilitate trafficking to lysosomes. However, they only replaced aromatic amino acids with alanine residues. If they want to make this claim, they should test other amino acids, particularly hydrophobic amino acids such as leucine.

      The reviewer may be inferring more import from our statement than we intended. We focused on these aromatic residues within the TMD because they contribute strongly to its overall hydrophobicity. Experimentally, we determined that nonconservative alanine substitutions of these aromatic residues inhibited trafficking. We do not state and do not intend to imply that the aromatic character of these residues specifically influences trafficking propensity, and we agree with the reviewer that to test such a question would require additional substitutions with non-aromatic hydrophobic amino acids.

      We realize that our phrasing may have been misleading by opening with discussion of the aromatic amino acids; in the revised discussion paragraph, we instead lead with discussion of TMD hydrophobicity, and then state how the specific substitutions we made affect trafficking.

      Reviewing Editor comments:

      While reviewer 1 did not provide any recommendations to the authors, I agree with this reviewer that the authors should validate the topology of their tagged proteins (at least for the one used to draw key conclusions). Given that Emerin is a tail-anchored protein, having a big GFP tag at the C-terminus could mess up ER insertion, causing the protein to take a wrong topology or even be mislocalized in the cytosol, particularly under overexpression conditions. In either case, it can be subject to quality control-dependent clearance via either autophagy, ERphagy, or ER-to-lysosome trafficking. I think that the authors should try a few straightforward experiments such as brefeldin A treatment or dominant negative Sar1 expression to test whether blocking conventional ER-to-Golgi trafficking affects lysosomal delivery of Emerin. I also think that the authors should discuss their findings in the context of the RESET pathway reported previously (PMID: 25083867). The ER stress-dependent trafficking of tagged Emerin to the PM and lysosomes appears to follow a similar trafficking pattern as RESET, although the authors did not demonstrate that Emerin traffic to lysosomes via the PM. In this regard, they should tone down their conclusion and discuss their findings in the context of the RESET pathway, which could serve as a model for their substrate.

      We agree that validating the topology of TMD mutants is important, and now include these experiments in the revised manuscript (please see our response to Reviewer 1 above).

      Please see our response to Reviewer 1’s public review; we previously determined that emerin-GFP undergoes ER-to-Golgi trafficking (see our 2019 study).

      We recognize the major parallels between our findings and the RESET pathway. In our 2019 study, we found that, similarly to other RESET cargoes, emerin-GFP travels through the secretory pathway, is exposed at the PM, and is then internalized and delivered to lysosomes. We discussed these strong parallels to RESET in our 2019 study. In this revised manuscript, we now also point out the parallels between emerin trafficking and RESET and cite the 2014 study by Satpute-Krishnan and colleagues (PMID 25083867).

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The authors report four cryoEM structures (2.99 to 3.65 Å resolution) of the 180 kDa, full-length, glycosylated, soluble Angiotensin-I converting enzyme (sACE) dimer, with two homologous catalytic domains at the N- and C-terminal ends (ACE-N and ACE-C). ACE is a protease capable of effectively degrading Aβ. The four structures are C2 pseudo-symmetric homodimers and provide insight into sACE dimerization. These structures were obtained using discrete classification in cryoSPARC and show different combinations of open, intermediate, and closed states of the catalytic domains, resulting in varying degrees of solvent accessibility to the active sites. 

      To deepen the understanding of the gradient of heterogeneity (from closed to open states) observed with discrete classification, the authors performed all-atom MD simulations and continuous conformational analysis of cryo-EM data using cryoSPARC 3DVA, cryoDRGN, and RECOVAR. cryoDRGN and cryoSPARC 3DVA revealed coordinated open-closed transitions across four catalytic domains, whereas RECOVAR revealed independent motion of two ACE-N domains, also observed with cryoSPARC-focused classification. The authors suggest that the discrepancy in the results of the different methods for continuous conformational analysis in cryo-EM could result from different approaches used for dimensionality reduction and trajectory generation in these methods. 

      Strengths: 

      This is an important study that shows, for the first time, the structure and the snapshots of the dynamics of the full-length sACE dimer. Moreover, the study highlights the importance of combining insights from different cryo-EM methods that address questions difficult or impossible to tackle experimentally while lacking ground truth for validation. 

      Weaknesses: 

      The open, closed, and intermediate states of ACE-N and ACE-C in the four cryo-EM structures from discrete classification were designated quantitatively (based on measured atomic distances on the models fitted into cryo-EM maps, Figure 2D). Unfortunately, atomic models were not fitted into cryo-EM maps obtained with cryoSPARC 3DVA, cryoDRGN, and RECOVAR, and the open/closed states in these cases were designated based on qualitative analysis. As the authors clearly pointed out, there are many other methods for continuous conformational heterogeneity analysis in cryo-EM. Among these methods, some allow analyzing particle images in terms of atomic models, like MDSPACE (Vuillemot et al., J. Mol. Biol. 2023, 435:167951), which result in one atomic model per particle image and can help in analyzing cooperativity of domain motions through measuring atomic distances or angular differences between different domains (Valimehr et al., Int. J. Mol. Sci. 2024, 25: 3371). This could be discussed in the article. 

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript presents a valuable contribution to the field of ACE structural biology and dynamics by providing the first complete full-length dimeric ACE structure in four distinct states. The study integrates cryo-EM and molecular dynamics simulations to offer important insights into ACE dynamics. The depth of analysis is commendable, and the combination of structural and computational approaches enhances our understanding of the protein's conformational landscape. However, the strength of evidence supporting the conclusions needs refinement, particularly in defining key terms, improving structural validation, and ensuring consistency in data analysis. Addressing these points through major revisions will significantly improve the clarity, rigor, and accessibility of the study to a broader audience, allowing it to make a stronger impact in the field. 

      Strengths: 

      The integration of cryo-EM and MD simulations provides valuable insights into ACE dynamics, showcasing the authors' commitment to exploring complex aspects of protein structure and function. This is a commendable effort, and the depth of analysis is appreciated. 

      Weaknesses: 

      Several aspects of the manuscript require further refinement to improve clarity and scientific rigor as detailed in my recommendations for the authors. 

      Reviewer #3 (Public review): 

      Summary: 

      Mancl et al. report four Cryo-EM structures of glycosylated and soluble Angiotensin-I converting enzyme (sACE) dimer. This moves forward the structural understanding of ACE, as previous analysis yielded partially denatured or individual ACE domains. By performing a heterogeneity analysis, the authors identify three structural conformations (open, intermediate open, and closed) that define the openness of the catalytic chamber and structural features governing the dimerization interface. They show that the dimer interface of soluble ACE consists of an N-terminal glycan and protein-protein interaction region, as well as C-terminal protein-protein interactions. Further heterogeneity mining and all-atom molecular dynamic simulations show structural rearrangements that lead to the opening and closing of the catalytic pocket, which could explain how ACE binds its substrate. These studies could contribute to future drug design targeting the active site or dimerization interface of ACE. 

      Strengths: 

      The authors make significant efforts to address ACE denaturation on cryo-EM grids, testing various buffers and grid preparation techniques. These strategies successfully reduce denaturation and greatly enhance the quality of the structural analysis. The integration of cryoDRGN, 3DVA, RECOVAR, and all-atom simulations for heterogeneity analysis proves to be a powerful approach, further strengthening the overall experimental methodology. 

      Weaknesses: 

      In general, the findings are supported by experimental data, but some experimental details and approaches could be improved. For example, CryoDRGN analysis is limited to the top 5 PCA components for ease of comparison with cryoSPARC 3DVA, but wouldn't an expansion to more components with CryoDRGN potentially identify further conformational states? The authors also say that they performed heterogeneity analysis on both datasets but only show data for one. The results for the first dataset should be shown and can be included in supplementary figures. In addition, the authors mention that they were not successful in performing cryoSPARC 3DFLex analysis, but they do not show their data or describe the conditions they used in the methods section. These data should be added and clearly described in the experimental section. 

      Some cryo-EM data processing details are missing. Please add local resolution maps, box sizes, and Euler angle distributions and reference the initial PDB model used for model building. 

      Reviewer #1 (Recommendations for the authors):

      Major point:

      The authors could discuss the use of continuous conformational heterogeneity analysis methods that analyze particle images in terms of atomic models, based on MD simulations, like MDSPACE (Vuillemot et al., J. Mol. Biol. 2023, 435:167951). MDSPACE can be used on a dataset preprocessed with cryoSPARC or Relion by discrete classification to reduce compositional heterogeneity and obtain initial particle poses. It results in one atomic model per particle image and can help in analyzing the cooperativity of domain motions by measuring atomic distances or angular differences between different domains (Valimehr et al., Int. J. Mol. Sci. 2024, 25: 3371). 

      We agree that MDSPACE is a promising and useful tool for analysis, and are excited to implement such a method. Prior to manuscript submission, we discussed with the primary author, Slavica Jonic, how we might employ her software in our analysis. Unfortunately, we were unable to overcome significant computational issues, notably MDSPACE’s lack of GPU functionality, which prevented us from employing MDSPACE in a reasonable manner for our dataset. We hope to employ MDSPACE in future work, once the computational issues have been addressed, and have added a section on MDSPACE to the discussion to increase its visibility, as we feel it is an exciting approach that deserves wider attention. The added text reads as follows:

      line 565-574

      Similarly, MDSPACE holds tremendous promise as a method for investigating conformational dynamics from cryo-EM data (61). MDSPACE integrates cryo-EM particle data with short MD simulations to fit atomic models into each particle image through an iterative process which extracts dynamic information. However, the lack of GPU-enabled processing for MDSPACE requires either a dedicated computational setup that diverges from most other cryo-EM software, or access to a CPU-based supercomputer, which severely limits the accessibility of such software. Despite these challenges, both 3DFlex and MDSPACE use promising approaches to study protein conformational dynamics. We look forward to exploring effective methods to incorporate these strategies into our future research.

      Minor points: 

      (1) Lines 348-350: "The discrepancy in population size between these clusters is likely due to bias in the initial particle poses, rather than a subunit-specific preference for the open state." Which bias? The cluster size is related to conformations, not to poses. 

      We hope to emphasize that the assignment of particles to either the OC or CO cluster is likely due to the particle orientation within the complete dimer refinement, and the discrepancy in size between OC and CO clusters does not necessarily indicate a domain specific preference for one state or another, which would carry allosteric implications. This remains a possibility, but we hope to avoid over-interpretation of our results with the statement above.

      The statement was altered to now read:

      Line 418-423

      “The discrepancy in population size between these clusters is likely due to bias in the initial particle orientation, rather than a subunit-specific preference for the open state. As the O/C state and the C/O state are 180 degree rotations of each other, particle assignment to either cluster is likely influenced by the initial particle orientation of the complete dimer, and we currently lack the data to discern any allosteric implication to the orientation assignment.”

      (2) Line 519: "Micrographs with a max CTF value worse than 4Å were removed from the dataset,..." (also, lines 822-823 in supplementary material). Do you want to say that micrographs with a resolution worse than 4 Å were removed?

      Max CTF value was replaced with CTF fit resolution to properly match the parameter used in cryoSPARC.

      (3) Figure 2C: The black lines are barely visible. Can you make them thicker and in red color? 

      The figure has been amended.

      (4) Figure 2D: The values for Chain A and Chain B in the second row (ACE-C) of sACE-3.05 columns are 17.9 (I) (Chain A) and 13.9 (C) (Chain B). Shouldn't they be reversed (13.9 (C) (Chain A) and 17.9 (I) (Chain B))? 

      The values are now correct. sACE-3.65 chains were flipped in the table, and the updated color scheme should make it easier to map the values from the table to their corresponding structure.

      Reviewer #2 (Recommendations for the authors): 

      The manuscript presents the first complete full-length dimeric ACE structure. The integration of cryo-EM and MD simulations provides valuable insights into ACE dynamics, showcasing the authors' commitment to exploring complex aspects of protein structure and function. This is a commendable effort, and the depth of analysis is appreciated. However, several aspects of the manuscript require further refinement to improve clarity and scientific rigor. In the view of this reviewer, a major revision is necessary. Please see the detailed comments below: 

      (1) Definition of "Conformational Heterogeneity": The term "conformational heterogeneity" should be clearly defined when citing references 27-29. References 27 and 29 use MD simulations, which reveal "conformational flexibility" rather than "conformational heterogeneity" as observed in cryo-EM data. A more precise distinction should be made.

      We have changed the term “conformational heterogeneity” to the broader “conformational dynamics”.

      (2) Figure Adjustments for Clarity: Figure 1B: A scale bar is needed for accurate representation.

      A 100 Angstrom scale bar was added to figure 1B.

      Figure 2A, B: Using a Cα trace representation would improve clarity and make structural differences more apparent. 

      We found using a Cα trace representation makes the figure too confusing and impossible to determine individual structural elements. Everything just becomes a jumble of lines.

      Additionally, a Cα displacement vs. residue index plot (with Figure 1A placed along the x-axis) should be included alongside Figures 2A and B to provide quantitative insight into structural variations. 

      This analysis has been combined with several other suggestions and now comprises a new figure 4.

      (3) Structural Resolution and Validation: Euler angle distribution and 3D-FSC analysis should be provided to help the audience assess how these factors influence the resolution of each structure. Local resolution analysis in Relion should be included to determine if there are dynamic differences among the four structures. To enhance structural interpretation, the manuscript would benefit from showcasing examples of bulky side-chain densities (e.g., Trp, Phe, Tyr) for each of the four structures.

      Information is included in Figure S3 and S5.

      (4) Glycan Modeling Considerations: Since the resolution of cryo-EM does not allow for precise glycan composition determination, additional experimental validation (e.g., Glyco-MS) would strengthen the modeling. If experimental support is unavailable, appropriate references should be cited to justify the modeled glycans.

      Minimal glycan modeling was performed with the goal of demonstrating that the protein is glycosylated. We now state in the manuscript that we chose 12 N-linked glycosylation sites showing extra density, which indicates that a glycan is present, and modeled them with complex glycans.

      (5) Advanced Cryo-EM and MD Analyses: 3DFlex Analysis: It is recommended that the authors explore 3DFlex to better capture conformational variability. CryoSPARC's community support can assist in proper implementation.

      We have incorporated our 3DFlex analysis in our discussion as follows:

      Line 553-565

      Surprisingly, we did not observe such motion using cryoSPARC 3DFlex, a neural network-based method analyzing our cryo-EM data of sACE (54). Central to the working of cryoSPARC 3DFlex is the generation of a tetrahedral mesh used to calculate deformations within the particle population. Proper generation of the mesh is critical for obtaining useful results and must often be determined empirically. Despite several attempts, we were unable to obtain results from 3DFlex comparable to what we observed with our other methods. Even using the results from our 3DVA as prior input to 3DFlex, the largest conformational change we observed was a slight wiggling at the bottom of the D3a subdomain (Movie S12). The authors of 3DFlex note that 3DFlex struggles to model intricate motions, and the implementation of custom tetrahedral meshes currently requires a non-cyclical fusion strategy between mesh segments. Given these limitations, and the complexity of sACE conformational dynamics, it appears that sACE, as a system, is not well-suited to analysis via 3DFlex in its current implementation.

      (6) Movie Consistency: The MD simulation movies should use the same color coding as the first four movies for consistency. Similarly, the 3DVar analysis map should be color-coded to enhance interpretability.

      MD simulation movies are re-colored.

      (7) MD Simulations - Data Extraction and Validation: The manuscript includes several long-timescale MD simulations, but further analysis is needed to extract meaningful dynamic information. Suggested analyses include:

      a. RMSF (Root Mean Square Fluctuation) Analysis: Calculate RMSF from MD trajectories and compare it with local resolution variations in cryo-EM maps.

      RMSF values were included in the new figure 4 along with structural depictions colored by RMSF value to localize variation to the structure.

      b. Assess whether regions exhibiting lower dynamics correspond to higher resolution in cryo-EM. 

      Information is added to Figure 4, Figure S3, S5, S6.

      c. Compare RMSF between simulations with and without glycans to identify potential effects. 

      This has been done in Figure 4.

      d. Clustering Analysis: Use the four solved structures as reference states to cluster MD simulation trajectories. Determine if the population states observed in MD simulations align with cryo-EM findings. 

      This has been done in supplementary figure S10.

      e. Principal Component Analysis (PCA): Perform PCA on MD trajectories and compare with dynamics inferred from cryo-EM analyses (3DVar, cryoDRGN, and RECOVAR) to ensure consistency. 

      This has been done in supplementary figure S11.

      f. Correction of RMSF Analysis or the y-axis label in Figure S9: The RMSF values cannot be negative by definition. The authors should carefully review the code used for this calculation or explicitly define the metric being measured. 

      The Y-axis label has been corrected to clarify that the plot depicts the change in RMSF values when comparing the glycosylated and non-glycosylated MD simulations.
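
      The distinction drawn in this reply can be illustrated with a short sketch. It is not the authors' analysis code: it assumes each trajectory is already superposed on a common reference and stored as a NumPy array of shape (frames, atoms, 3), and the arrays `glyco` and `apo` are synthetic stand-ins rather than simulation data. RMSF itself is a square root of a mean squared deviation and so cannot be negative, whereas the difference between two RMSF profiles can take either sign.

```python
import numpy as np

def rmsf(traj):
    """Per-atom RMSF for a trajectory of shape (n_frames, n_atoms, 3).

    Assumes the frames were superposed on a common reference beforehand.
    """
    mean_pos = traj.mean(axis=0)                    # average position, (n_atoms, 3)
    sq_dev = ((traj - mean_pos) ** 2).sum(axis=2)   # squared deviation, (n_frames, n_atoms)
    return np.sqrt(sq_dev.mean(axis=0))             # time-averaged, (n_atoms,)

# Synthetic placeholders for two simulations (hypothetical, not real data)
rng = np.random.default_rng(0)
glyco = rng.normal(scale=0.5, size=(200, 10, 3))    # less mobile
apo = rng.normal(scale=1.0, size=(200, 10, 3))      # more mobile

rmsf_glyco = rmsf(glyco)
rmsf_apo = rmsf(apo)

# Change in RMSF between conditions; negative where the first is less mobile
delta_rmsf = rmsf_glyco - rmsf_apo
```

      Plotting `delta_rmsf` against residue index gives a profile that can dip below zero, which is why an axis labelled as a change in RMSF is consistent with negative values.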

      (8) Discussion on Coordinated Motion and Allostery: The discussion of coordinated motion and allosteric regulation between sACE-N domains should be explicitly connected to experimental evidence mentioned in the introduction: "Enzyme kinetics analysis suggests negative cooperativity between two catalytic domains (31-33). However, ACE also exhibits positive synergy toward Ab cleavage and allostery to enhance the activity of its binding partner, the bradykinin receptor (11, 34)."

      (9) The authors should elaborate on how their new insights provide a mechanistic explanation for these experimental observations. 

      (10) Connection to Therapeutic Implications: The discussion section should more explicitly connect the structural findings to potential therapeutic applications, which would significantly enhance the impact of the study.

      These three points (8-10) were addressed in a significant overhaul to the discussion section.

      In summary, this study makes a valuable contribution to the field of ACE structural biology and dynamics. The combination of cryo-EM and MD simulations is particularly powerful, and with major revisions, this manuscript has the potential to make a strong impact. Addressing the points outlined above will significantly improve clarity, strengthen the scientific claims, and enhance the manuscript's accessibility to a broader audience. I appreciate the authors' rigorous approach to this complex topic and encourage them to refine their work to fully highlight the significance of their findings. 

      Reviewer #3 (Recommendations for the authors): 

      (1) The authors incorrectly refer to their ACE construct as full-length throughout the manuscript. Given that they are purifying the soluble region (aa 1-1231), saying full-length ACE is not the correct nomenclature. I suggest removing full-length and using soluble ACE (sACE) throughout the text. 

      We utilize the term full-length to highlight the fact that our structures contain both the N and C domains for both subunits in the dimer, in contrast to the previously published ACE cryo-EM structure. We have clarified in the text that we refer to the full-length soluble region of ACE (sACE), and sACE is used to specifically refer to our construct throughout the text, except when referring to ACE in a more generalized biological context in the introduction and discussion.

      (2) The authors could show differences between the different structural states by measuring and displaying the alpha carbon distances. For example, in Figures 2A, B, 3A, and 4B and C. 

      Alpha carbon displacements for each residue have been added to the new figure 4.
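
      For readers unfamiliar with the measure, a per-residue alpha carbon displacement can be sketched as below. This is an illustration under stated assumptions, not the procedure used for the manuscript figure: the two structures are taken to be superposed already, their Cα atoms are listed in the same residue order, and the coordinates in `state_1` and `state_2` are made-up values.

```python
import numpy as np

def ca_displacement(coords_a, coords_b):
    """Distance between matched Cα atoms of two structures, shape (n_residues,).

    Both inputs are (n_residues, 3) arrays in the same residue order and
    are assumed to be superposed on a common reference.
    """
    coords_a = np.asarray(coords_a, dtype=float)
    coords_b = np.asarray(coords_b, dtype=float)
    if coords_a.shape != coords_b.shape:
        raise ValueError("coordinate arrays must have the same shape")
    return np.linalg.norm(coords_a - coords_b, axis=1)

# Hypothetical three-residue example (coordinates in Å)
state_1 = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]])
state_2 = np.array([[0.0, 0.0, 0.0], [3.8, 3.0, 4.0], [7.6, 0.0, 1.0]])

disp = ca_displacement(state_1, state_2)   # displacements of 0, 5 and 1 Å
```

      The result depends on which region is used for the superposition, so the choice of reference should be reported alongside any such plot.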

      (3) Most figures, with a few exceptions (Figures 2 and S11), are of low quality. Perhaps they are not saved in the same format. In addition, the color schemes used throughout the figures and movies are not consistent. For example, in Figure 1 D2 domains are in green, while they appear yellow in Figure 2 and later. Please double-check all coloring schemes and keep them consistent throughout the manuscript. In addition, it would be good to keep the labeling of the domains in the subsequent figures, as it is difficult to remember which domain is which throughout the manuscript. 

      We are unsure how to address the low-quality issue; our files and the online versions appear to be of suitably high quality. We will work with editorial staff to ensure all files are of suitable quality. The color scheme has been revised throughout the manuscript to ensure consistency and better differentiate between domains and chains.

      (4) Figure 1. Indicate exactly where in panel A ACE-N ends and ACE-C starts. Also, the pink and magenta, as well as aqua vs. light blue, are hard to distinguish. 

      We have updated the coloring scheme.

      (5) Figure 2. In the figure legend, the use of brackets for defining closed, intermediate, and open states is confusing, given that the panels are also described with brackets, and some letters match between them. Using a hyphen or bolding the abbreviations could help. Also, define chains A and B, make the black lines that I assume indicate distances in C bold or thicker as they are very hard to see in the figure, and add to the legend what those lines mean. 

      The abbreviations have been changed from parentheses to quotes, and suggestions have been implemented.

      (6) Figure 4 is confusing as shown. Since the authors mention the general range of motion in sACE-N first in the text, wouldn't it make more sense to show panel B first and then panel A? Also, can you point and label the "tip connecting the two long helices of the D1a subdomain" in the figure? It is not clear to me where this region is in B. In addition, add a description of the arrows in B and C to the figure legend. 

      Most changes incorporated. The order should make more sense now in light of other changes.

      (7) Figure 5. Can the authors add a description to the legend as to what the arrows indicate and their thickness? 

      Done

      (8) Add a scale bar to the micrograph images in the supplementary figures. 

      Figure S2 and S4 need the scale bar.

      (9) Provide a more comprehensive description of buffers used in the DF analysis, as this information could be useful to others. 

      We have included the data in Table S1.

      (10) Line 51: Reference format not consistent with other references: (Wu et al., 2023).

      Fixed

      (11) Line 66: Define "ADAM". 

      The definition has been added.

      (12) Line 90: The authors say: Recent open state structures of sACE-N, sACE monomer, and a sACE-N dimer, along with molecular dynamics (MD) simulations of sACE-C, have begun to reveal the conformational heterogeneity, though it remains under-studied (27-29)." Can the authors clarify what "it" refers to? The full-length ACE, sACE, or its specific domains? 

      The sentence now reads: Recent open state structures of sACE-N, sACE monomer, and a sACE-N dimer, along with molecular dynamics (MD) simulations of sACE-C, have begun to reveal ACE conformational dynamics, though they remain under-studied (29-31).

      (13) Line 204: "The comparison of our dimeric sACE cryoEM structures of reveals the conformational dynamics of sACE catalytic domains." The second "of" should be removed. 

      Fixed

      (14) Line 268: "From room mean square fluctuation (RMSF) analysis..." "room" should be replaced with "root."

      Fixed

    1. Author Response:

      We would like to thank the reviewers and editors for your consideration of our manuscript, your kind comments about the value of our study, and for providing constructive feedback. We intend to submit a revised version of the manuscript and address the concerns and recommendations. This will include improvements to the statistical analyses, text content, and text format. 

      Specifically, we will:

      1. Revise the text to better explain the experimental methods, interpretation of results and how our findings are situated in the literature. Although we still believe that there is sufficient evidence to suggest that temperate tree species other than Fagus sylvatica may show similar patterns, we understand the reviewers’ concerns regarding these statements and will revise them.

      2. Add a supplemental analysis of leaf chlorophyll content data to use leaf discolouration as an alternative marker of the end of the growing season. On this we would like to make two important points. Firstly, we agree with the reviewers that bud set often occurs before leaf discolouration. In experiment 1, bud set occurred on average on day-of-year (DOY) 262, onset of leaf senescence (last day when leaf chlorophyll content fell below 90% of its measured maximum) occurred on average at the same time – DOY 261, and mid-senescence (50% leaf discolouration) occurred on DOY 320. We do not agree that this excludes the combined discussion of bud set and leaf senescence timing. Whilst environmental drivers can affect parts of plants differently, often responses from different end-of-season indicators (e.g. bud set and leaf discolouration) are similar, even if only directionally. Secondly, shifts in bud set timing will remain the key focus of the manuscript as we believe it has greater physiological relevance to plant development, whereas leaf discolouration may simply follow bud set as a symptom of the completion of growth (reduced sink activity).

      3. Address points raised about potential additional drivers of our observed phenological shifts. For example, photoperiod effects and the Solstice-as-Phenology-Switch hypothesis are not mutually exclusive; the annual progression of photoperiod is fundamental to how we suggest the switch is regulated (please see L66-68 in the original manuscript). The reviewers also comment on the significant differences in soil water content between the treatment groups in Fig. S1. However, all pots were watered sufficiently to avoid water deficit and all efforts were made to minimise differences in water availability. A provisional analysis shows only one treatment pair (6 - Late_July_Extreme vs. 7 - Early_August_Moderate) had significantly different soil water content, a pair whose differences are not discussed.
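
      The senescence-onset marker described in point 2 can be made concrete with a small sketch. This is one possible reading of the definition (the last measurement date on which chlorophyll content crosses from at or above 90% of its seasonal maximum to below it), not the authors' analysis script; the function name `senescence_onset`, the `fraction` parameter and the example measurements are hypothetical.

```python
import numpy as np

def senescence_onset(doy, chlorophyll, fraction=0.9):
    """Day-of-year of the last downward crossing of fraction * max chlorophyll.

    Returns None if the series never drops below the threshold.
    """
    doy = np.asarray(doy)
    chl = np.asarray(chlorophyll, dtype=float)
    threshold = fraction * chl.max()
    below = chl < threshold
    # A crossing is a date below the threshold whose previous date was not below it
    crossings = np.flatnonzero(below[1:] & ~below[:-1]) + 1
    if crossings.size == 0:
        return None
    return int(doy[crossings[-1]])

# Hypothetical measurements for a single tree (arbitrary chlorophyll units)
doy = [200, 220, 240, 260, 280, 300, 320]
chl = [38.0, 40.0, 39.0, 35.0, 30.0, 20.0, 10.0]

onset = senescence_onset(doy, chl)
```

      With sparse sampling the estimate is only as precise as the measurement interval, so interpolating between dates may be preferable in practice.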

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):  

      Summary: 

      The paper describes the cryoEM structure of RAD51 filament on the recombination intermediate. In the RAD51 filament, the insertion of a DNA-binding loop called the L2 loop stabilizes the separation of the complementary strand for the base-pairing with an incoming ssDNA and the non-complementary strand, which is captured by the second DNA-binding channel called the site II. The molecular structure of the RAD51 filament with a recombination intermediate provides a new insight into the mechanism of homology search and strand exchange between ssDNA and dsDNA. 

      Strengths: 

      This is the first human RAD51 filament structure with a recombination intermediate called the D-loop. The work has been done with great care, and the results shown in the paper are compelling based on cryo-EM and biochemical analyses. The paper is really nice and important for researchers in the field of homologous recombination, which gives a new view on the molecular mechanism of RAD51-mediated homology search and strand exchange. 

      Weaknesses: 

      The authors need more careful text writing. Without page and line numbers, it is hard to give comments. 

      We would like to thank the reviewer for their kind words of appreciation of our work.

      Reviewer #2 (Public review):  

      Summary: 

      Homologous recombination (HR) is a critical pathway for repairing double-strand DNA breaks and ensuring genomic stability. At the core of HR is the RAD51-mediated strand-exchange process, in which the RAD51-ssDNA filament binds to homologous double-stranded DNA (dsDNA) to form a characteristic D-loop structure. While decades of biochemical, genetic, and single-molecule studies have elucidated many aspects of this mechanism, the atomic-level details of the strand-exchange process remained unresolved due to a lack of atomic-resolution structure of RAD51 D-loop complex. 

      In this study, the authors achieved this by reconstituting a RAD51 mini-filament, allowing them to solve the RAD51 D-loop complex at 2.64 Å resolution using a single particle approach. The atomic resolution structure reveals how specific residues of RAD51 facilitate the strand exchange reaction. Ultimately, this work provides unprecedented structural insight into the eukaryotic HR process and deepens the understanding of RAD51 function at the atomic level, advancing the broader knowledge of DNA repair mechanisms. 

      Strengths: 

      The authors overcame the challenge of RAD51's helical symmetry by designing a minifilament system suitable for single-particle cryo-EM, enabling them to resolve the RAD51 D-loop structure at 2.64 Å without imposed symmetry. This high resolution revealed precise roles of key residues, including F279 in Loop 2, which facilitates strand separation, and basic residues on site II that capture the displaced strand. Their findings were supported by mutagenesis, strand exchange assays, and single-molecule analysis, providing strong validation of the structural insights. 

      Weaknesses: 

      Despite the detailed structural data, some structure-based mutagenesis data interpretation lacks clarity. Additionally, the proposed 3′-to-5′ polarity of strand exchange relies on assumptions from static structural features, such as stronger binding of the 5′-arm, which are not directly supported by other experiments. This makes the directional model compelling but contradicts several well-established biochemical studies that support a 5'-to-3' polarity relative to the complementary strand (e.g., Cell 1995, PMID: 7634335; JBC 1996, PMID: 8910403; Nature 2008, PMID: 18256600). 

      Overall: 

      The 2.6 Å resolution cryoEM structure of the RAD51 D-loop complex provides remarkably detailed insights into the residues involved in D-loop formation. The high-quality cryoEM density enables precise placement of each nucleotide, which is essential for interpreting the molecular interactions between RAD51 and DNA. Particularly, the structural analysis highlights specific roles for key domains, such as the N-terminal domain (NTD), in engaging the donor DNA duplex. 

      This structural interpretation is further substantiated by single-molecule fluorescence experiments using the KK39,40AA NTD mutant. The data clearly show a significant reduction in D-loop formation by the mutant compared to wild-type, supporting the proposed functional role of the NTD observed in the cryoEM model. 

      However, the strand exchange activity interpretation presented in Figure 5B could benefit from a more rigorous experimental design. The current assay measures an increase in fluorescence intensity, which depends heavily on the formation of RAD51-ssDNA filaments. As shown in Figure S6A, several mutants exhibit reduced ability to form such filaments, which could confound the interpretation of strand exchange efficiency. To address this, the assay should either: (1) normalize for equivalent levels of RAD51-ssDNA filaments across samples, or (2) compare the initial rates of fluorescence increase (i.e., the slope of the reaction curve), rather than endpoint fluorescence, to better isolate the strand exchange activity itself. 

      We agree with the reviewer that the reduced filament-forming ability of some of the RAD51 mutants complicates a straightforward interpretation of their strand-exchange assay. Interestingly, the RAD51 mutants that appear most impaired are the esDNA-capture mutants that do not contact the ssDNA in the structure of the pre-synaptic filament. However, the RAD51 NTD mutants, that display the most severe defect in strand-exchange, have a near-WT filament forming ability.
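
      The reviewer's second suggestion, comparing initial rates rather than endpoint fluorescence, can be illustrated with a minimal sketch. This is not the analysis used in the paper: the function name, the choice of five early time points, and the example readings are all assumptions made for illustration.

```python
def initial_rate(times, signal, n_points=5):
    """Least-squares slope over the first n_points of a time course.

    times and signal are equal-length sequences (e.g. minutes and
    fluorescence units). n_points is an illustrative assumption; in
    practice it should cover only the near-linear early phase.
    """
    t = list(times[:n_points])
    y = list(signal[:n_points])
    if len(t) < 2:
        raise ValueError("need at least two time points to fit a slope")
    t_mean = sum(t) / len(t)
    y_mean = sum(y) / len(y)
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    return sxy / sxx


# Hypothetical readings, not data from the study.
minutes = [0, 1, 2, 3, 4, 10, 20, 30]
wild_type = [0.0, 9.5, 18.1, 25.9, 33.0, 63.2, 86.5, 95.0]
mutant = [0.0, 4.9, 9.5, 13.9, 18.1, 39.3, 63.2, 77.7]

print("wild-type initial rate:", initial_rate(minutes, wild_type))
print("mutant initial rate:", initial_rate(minutes, mutant))
```

      Fitting only the early points reduces the influence of the plateau, which depends on how much RAD51-ssDNA filament was formed, on the comparison between mutants.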

      Based on the structural features of the D-loop, the authors propose that strand pairing and exchange initiate at the 3'-end of the complementary strand in the donor DNA and proceed with a 3'-to-5' polarity. This conclusion, drawn from static structural observations, contrasts with several well-established biochemical studies that support a 5'-to-3' polarity relative to the complementary strand (e.g., Cell 1995, PMID: 7634335; JBC 1996, PMID: 8910403; Nature 2008, PMID: 18256600). While the structural model is compelling and methodologically robust, this discrepancy underscores the need for further experiments. 

      We would like to thank the reviewer for highlighting the importance of our findings to our understanding of the mechanism of homologous recombination.

      The reviewer correctly points out that the polarity of strand exchange by RecA and RAD51 is an extensively researched topic that has been characterised in several authoritative studies. In our paper, we simply describe the mechanistic insights obtained from the structural D-loop models of RAD51 (our work) and RecA (Yang et al, PMID: 33057191). The structures illustrate a very similar mechanism of D-loop formation that proceeds with opposite polarity of strand exchange for RAD51 and RecA. Comparison of the D-loop structures for RecA and RAD51 provides an attractive explanation for the opposite polarity, as caused by the different positions of their dsDNA-binding domains in the filament structure. 

      We agree with the reviewer that further investigation will be needed for an adequate rationalisation of the available evidence. We will mention the relevant literature in the revised version of the manuscript.

      Reviewer #3 (Public review):  

      Summary: 

      Built on their previous pioneer expertise in studying RAD51 biology, in this paper, the authors aim to capture and investigate the structural mechanism of human RAD51 filament bound with a displacement loop (D-loop), which occurs during the dynamic synaptic state of the homologous recombination (HR) strand-exchange step. As the structures of both pre- and post-synaptic RAD51 filaments were previously determined, a complex structure of RAD51 filaments during strand exchange is one of the key missing pieces of information for a complete understanding of how RAD51 functions in the HR pathway. This paper aims to determine the high-resolution cryo-EM structure of RAD51 filament bound with the D-loop. Combined with mutagenesis analysis and biophysical assays, the authors aim to investigate the D-loop DNA structure, RAD51-mediated strand separation and polarity, and a working model of RAD51 during HR strand invasion in comparison with RecA. 

      Strengths: 

      (1) The structural work and associated biophysical assays in this paper are solid, elegantly designed, and interpreted.  These results provide novel insights into RAD51's function in HR. 

      (2) The DNA substrate used was well designed, taking into consideration the nucleotide number requirement of RAD51 for stable capture of donor DNA. This DNA substrate choice lays the foundation for successfully determining the structure of the RAD51 filament on D-loop DNA using single-particle cryo-EM. 

      (3) The authors utilised their previous expertise in capping DNA ends using monomeric streptavidin and combined their careful data collection and processing to determine the cryo-EM structure of full-length human RAD51 bound at the D-loop in high resolution. This interesting structure forms the core part of this work and allows detailed mapping of DNA-DNA and DNA-protein interaction among RAD51, invading strands, and donor DNA arms (Figures 1, 2, 3, 4). The geometric analysis of D-loop DNA bound with RAD51 and EM density for homologous DNA pairing is also impressive (Figure S5). The previously disordered RAD51's L2-loop is now ordered and traceable in the density map and functions as a physical spacer when bound with D-loop DNA. Interestingly, the authors identified that the side chain position of F279 in the L2_loop of RAD51_H differs from other F279 residues in L2-loops of E, F, and G protomers. This asymmetric binding of L2 loops and RAD51_NTD binding with donor DNA arms forms the basis of the proposed working model about the polarity of csDNA during RAD51-mediated strand exchange. 

      (4) This work also includes mutagenesis analysis and biophysical experiments, especially EMSA, single-molecule fluorescence imaging using an optical tweezer, and DNA strand exchange assay, which are all suitable methods to study the key residues of RAD51 for strand exchange and D-loop formation (Figure 5). 

      Weaknesses: 

      (1) The proposed model for the 3'-5' polarity of RAD51-mediated strand invasion is based on the structural observations in the cryo-EM structure. This study lacks follow-up biochemical/biophysical experiments to validate the proposed model compared to RecA or developing methods to capture structures of any intermediate states with different polarity models. 

      (2) The functional impact of key mutants designed based on structure has not been tested in cells to evaluate how these mutants impact the HR pathway. 

      The significance of the work for the DNA repair field and beyond: 

      Homologous recombination (HR) is a key pathway for repairing DNA double-strand breaks and involves multiple steps. RAD51 forms nucleoprotein filaments first with 3' overhang single-strand DNA (ssDNA), followed by a search and exchange with a homologous strand. This function serves as the basis of an accurate template-based DNA repair during HR. This research addressed a long-standing challenge of capturing RAD51 bound with the dynamic synaptic DNA and provided the first structural insight into how RAD51 performs this function. The significance of this work extends beyond the discovery of biology for the DNA repair field, into its medical relevance. RAD51 is a potential drug target for inhibiting DNA repair in cancer cells to overcome drug resistance. This work offers a structural understanding of RAD51's function with the D-loop and provides new strategies for targeting RAD51 to improve cancer therapies. 

      We thank the reviewer for their positive comments on the significance of our work. Concerning the proposed polarity of strand exchange based on our structural finding, please see our reply to the previous reviewer; we agree with the reviewer that further experimentation will be needed to reach a settled view on this.

      Testing the functional effects of the RAD51 mutants on HR in cells was not an aim of the current work but we agree that it would be a very interesting experiment, which would likely provide further important insights into the mechanism of strand exchange at the core of the HR reaction.

      Reviewer #1 (Recommendations for the authors):

      Major points:

      (1) Structural analysis showed a critical role of F279 in the L2 loop. However, the biochemical study showed that the F279A substitution did not provide a strong defect in the in vitro strand exchange, as shown in Figure 5B. Moreover, a previous study (Matsuo et al., FEBS J, 2006; ref 43) showed human RAD51-F279A is proficient in the in vitro strand exchange. These suggest that human RAD51 F279 is not critical for the strand exchange. The authors need more discussion of the role of F279 or the L2 loop in RAD51-mediated reactions in the Discussion.

      In the strand-exchange assay of Figure 5B, the F279A mutant shows the mildest phenotype, in agreement with the findings of Matsuo et al. Accordingly, in the text we describe the F279A mutant as having a “modest impact” on strand-exchange.

      We have now added a brief comment to the relevant text, pointing out that the results of the strand exchange assay for F279A are in agreement with the previous findings by Matsuo et al., and adding the reference.

      (2) In some parts, the authors cited the newest references rather than the paper describing the original findings. For RAD51 paralogs, why are these three (refs 21,22, 23) selected here? For FIGNL1, why is only one (ref 24) chosen?

      The cited publications were chosen to acquaint the reader with the latest structural and mechanistic advances about the function of some of the most important and well-studied recombination mediator proteins. For completeness, we have now added a further reference for FIGNL1 - Ito, Masaru et al, Nat Comm, 2023 – in the Introduction, to provide the reader with an additional pointer to our current knowledge about the mechanism of FIGNL1 in Homologous Recombination.

      Minor points:

      (1) Page 3, line 1 in the second paragraph, the reaction of "HR": HR should be homology search and strand exchange. HR is used incorrectly throughout the text, please check them. Remove "strand-exchange" from ATPases in line 2.

      We believe that HR is used correctly in this context, as we refer to the biochemical reactions of HR, which includes the search for homology and strand exchange.

      We have removed “strand-exchange” from ATPases in line 2, as requested by the reviewer.

      (2) Supplementary Figure 1B, C, "EMSA" experiment: Please indicate the experimental conditions in the legend: how ssDNA and dsDNA were mixed with RAD51. In (B), this is not an actual EMSA result, but rather a native gel analysis of reaction products with the D-loop. In (C), was the binding of RAD51 to the pre-formed D-loop examined? Which is correct here? Moreover, why do the authors need streptavidin in this experiment? Please explain why this is necessary for the EMSA assay. Please show where the Cy3 or Cy5 labels on the DNAs are in the schematic drawing.

      The conditions for the experiments of Supplementary figure 1B, C are reported in the Methods section.

      Panel B shows the mobility shifts of the ssDNA and dsDNA sequences in panel A, so it is appropriate to describe it as an EMSA.

      We did not examine the binding of RAD51 to a pre-formed D-loop.

      We used streptavidin in the experiment of Supplementary Figure 1C to show that streptavidin binding did not interfere with D-loop reconstitution.

      The position of the Cy3, Cy5 labels in the DNAs is reported in Table S1.

      (3) Figure S4B, page 6, line 6 from the top, 5'-arm and 3'-arm: please add them to the figure. And also, please explain what 5'-arm and 3'-arm are here in the text, as shown in lines 3-5 in the second paragraph of the same page.

      We thank the reviewer for spotting this slight incongruity. We have removed the reference to 5’- and 3’-arms of the donor DNA in the initial description of the D-loop (first paragraph of the “D-loop structure” section, 6 lines from the top), as the nomenclature for the arms of the donor DNA is introduced more appropriately in the following paragraph. Thus, there is no need to re-label Figure S4B; we note that the 5’- and 3’-labels are added to the arms of the donor DNA in Figure S4D.

      (4) Page 7, line 4, and Figure 2E, "C24": C24 should be C26 here (Figure 2D shows that position 24 in esDNA is "T").

      We thank the reviewer for spotting this typo, that is now corrected in the revised version of Figure 2 and in the text.

      (5) Page 8, line 1, K284: It would be nice to show "K284" in Figure 3F.

      We have added the side chain of K284 to Figure 3F, as suggested by the reviewer.

      (6) Page 8, second paragraph, line 3 from the bottom, "5'-arm" should be "3'-arm" for the binding of RAD51A NTD to ds DNA (Figure 4D).

      We thank the reviewer for spotting this typo, that is now corrected in the revised version of the text.

      Reviewer #2 (Recommendations for the authors):

      I understand that the strand exchange polarity of RAD51 should be opposite to that of RecA. But in the RecA manuscript (Nature 2020), it states (in the extended figure 1) " Because the mini-filament consists of fused RecA protomers, it does not reflect the effects a preferential polarity of RecA polymerization might have on the directionality of strand exchange. Also, our strand exchange reactions do not include the single-stranded DNA binding protein SSB that is involved in strand exchange in vivo and may sequester released DNA strands."

      We are aware that the findings by Yang et al, 2020 were obtained with a multi-protomeric RecA chimera and that their construct might not therefore recapitulate a potential effect of RecA polymerisation on the directionality of strand-exchange. 

      Comparison of the RecA and RAD51 D-loop structures shows that RecA and RAD51 adopt the same asymmetric mechanism of D-loop formation, which begins at one arm of the donor DNA and proceeds with donor unwinding and strand invasion until the second arm is captured, completing D-loop formation. However, the cryoEM structures provide compelling evidence that, after engagement with the donor DNA, RecA and RAD51 proceed to unwind the donor with opposite polarity; the structures provide a clear rationale for this, because of the different position of their dsDNA-binding domains relative to the ATPase domain.

      We acknowledge that there exists an extensive body of literature that has investigated the polarity of strand exchange by RecA and RAD51 under a variety of experimental conditions, and we have added a brief comment to the text to reflect this, as well as some of the key citations. Undoubtedly, and as we also mention in our reply to the public reviews, further experimental work will be needed for a full reconciliation of the available evidence.

      Reviewer #3 (Recommendations for the authors):

      (1) I have a minor comment regarding the DNA shown in the structural figures in this work. The authors have used different colours to differentiate between isDNA, esDNA, and csDNA for easier interpretation. However, these colour codes are inconsistent across Figures 1, 2, 3, and 5. This inconsistency makes it difficult to interpret which strand is which, particularly for readers unfamiliar with D-loops and strand invasion. A consistent colour scheme for the DNA strands would enhance the quality of the structural figures.

      We appreciate the reviewer’s comment about the colour scheme of the strands in the D-loop. We chose a unique colour scheme for each figure, to help the reader focus on the particular structural features that we wanted to highlight in the figure. So for instance, in figure 1D we chose to highlight the relationship (complementary vs identical) of the donor DNA strands with the invading strand; in figure 2, the emphasis is on distinguishing the homologously paired dsDNA (pink) from the exchanged strand (magenta), as a consequence of L2 loop binding; etc.

      (2) I have another comment regarding the rationale behind naming the RAD51 protomers (A to H) within the structure, which could confuse general readers if not clearly explained. In this paper, the RAD51 protomer is RAD51_A when closest to the 3' end of the isDNA. I assume the authors chose this order because HR generates a 3' ssDNA overhang before strand invasion. It would be beneficial for the introduction and results sections to mention this property of the 3' ssDNA overhang and the reasoning behind this naming strategy. This explanation will help readers understand how it differs from other naming orders used in RecA/RAD51 with ssDNA, where protomer A is closer to the 5' ssDNA.

      We thank the reviewer for their insightful comment. We chose to name as chain A the RAD51 protomer nearest to the 3’-end of the isDNA to be consistent with the naming scheme that we use for all our published RAD51 filament structures.

      (3) I have highlighted some text within this paper that has contradicting parts for authors to clarify and correct:

      "Overall, the structural features of the RAD51 D-loop provide a strong indication that strand pairing and exchange begins at the 3'-end of the complementary strand in the donor DNA and progresses with 3'-to5' polarity (Fig. 5F)"

      "The observed 5'-to-3' polarity of strand-exchange by RAD51 is opposite to the 3'-to-5' polarity of bacterial RecA (Fig. S8), that was determined based on cryoEM structures of RecA D-loops".

      We thank the reviewer for alerting us to this inconsistency that has now been corrected in the revised manuscript.

      (4) Figure S8 last model: NTD should be CTD in the title; Figure 2B: resolution scale bar needs A unit.

      We thank the reviewer for spotting this typo that has now been corrected in the revised version of figure S8.

      We couldn’t find a missing resolution scale bar in Figure 2B; however, we have added a missing resolution bar with A unit to Fig. S3B.

    1. Author Response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      The authors extended a previous study of selective response to herbivory in Arabidopsis, in order to look specifically for selection on induced epigenetic variation ("Lamarckian evolution"). They found no evidence. In addition, they re-examined results from a previously published study arguing that environmentally induced epigenetic variation was common, and found that these findings were almost certainly artifactual.

      Strengths:

      The paper is very clearly written, there is no hype, and the methods used are state-of-the-art.

      Weaknesses:

      The result is negative, so the best you can do is put an upper bound on any effects.

      Significance:

      Claims about epigenetic inheritance and Lamarckian evolution continue to be made based on very shaky evidence. Convincing negative results are therefore important. In addition, the study presents results that, to this reviewer, suggest that the 2024 paper by Lin et al. [26] should probably be retracted.

      Reviewer #2 (Public Review):

      In this paper, the authors examine the extent to which epigenetic variation acquired during a selection treatment (as opposed to standing epigenetic variation) can contribute to adaptation in Arabidopsis. They find weak evidence for such adaptation and few differences in DNA methylation between experimental groups, which contrasts with another recent study (reference 26) that reported extensive heritable variation in response to the environment. The authors convincingly demonstrate that the conclusions of the previous study were caused by experimental error, so that standing genetic variation was mistaken for acquired (epigenetic) variation. Given the controversy surrounding the possible role of epigenetic variation in mediating phenotypic variation and adaptation, this is an important, clarifying contribution.

      I have a few specific comments about the analysis of DNA methylation:

      (1) The authors group their methylation analysis by sequence context (CG, CHG, CHH). I feel this is insufficient, because CG methylation can appear in two distinct forms: gene body methylation (gbM), which is CG-only methylation within genes, and transposable element (TE) and TE-like methylation (teM), which typically involves all sequence contexts and generally affects TEs, but can also be found within genes. GbM and teM have distinct epigenetic dynamics, and it is hard to know how methylation patterns are changing during the experiment if gbM and teM are mixed. This can also have downstream consequences (see point below).

      We thank Reviewer 2 for this suggestion. We usually separate the three contexts because they are set by different enzymes and not because of the general process or specific function. It would indeed be informative to group DMCs into gbM and teM, but as there are many regions with overlaps between genes and transposons, this also adds some complexity. Given that there were very few DMCs, we wanted to keep it simple. Therefore, we wrote that 87.3% of the DMCs were close to or within genes and that 98.1% were close to and within genes or transposons. Together with the clear overrepresentation of the CG context, this indicates that most of the DMCs were related to gbM. We updated the paragraph and specifically referred to gbM to make this point clearer.

      (2) For GO analysis, the authors use all annotated genes as a control. However, most of the methylation differences they observe are likely gbM, and gbM genes are not representative of all genes. The authors' results might therefore be explained purely as a consequence of analyzing gbM genes, and not an enrichment of methylation changes in any particular GO group.

      We are grateful to Reviewer #2 for this suggestion. We updated the GO analysis and defined the background as genes with cytosines that we tested for differences in methylation and which also exhibited overall at least 10% methylation (i.e., one cytosine per gene was sufficient). This resulted in a decrease of the background gene set from 34'615 to 18'315 genes. We still detect enrichment of terms related to epigenetic regulation, transport and growth processes. We have updated the corresponding paragraph accordingly.
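
      The effect of the background choice can be illustrated with a one-sided hypergeometric test, a common basis for GO over-representation analysis. The sketch below is not the pipeline used for the manuscript: only the two background sizes (34615 and 18315 genes) come from the response above, while the term size, the number of genes with DMCs, and the overlap are invented for illustration.

```python
from math import comb


def enrichment_p(background, annotated, selected, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    background: number of genes in the background set
    annotated:  background genes carrying the GO term
    selected:   genes of interest (e.g. genes with DMCs)
    overlap:    genes of interest that carry the GO term
    """
    total = comb(background, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: the same 15 overlapping genes are scored against
# all annotated genes and against the methylation-filtered background.
p_all = enrichment_p(34615, 400, 300, 15)
p_methylated = enrichment_p(18315, 350, 300, 15)

print(f"all annotated genes as background: p = {p_all:.2e}")
print(f"methylated genes as background:    p = {p_methylated:.2e}")
```

      With these invented counts the same overlap is less surprising against the smaller background, which is why an enrichment that persists after the filtering is the more informative result.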

      Reviewer #1 (Recommendations for The Authors):

      This paper is very clearly written and could be published as-is. The writing could be improved in a few places, for example:

      "We realized that in this recent study (26), potential errors may have confounded treatments with genetic variation. This is because in that study, Lin and colleagues kept lineages 1-to-1 throughout the experiment by single-seed descent."

      “This” in the second sentence seems to refer to the confounding, not your realization thereof.

      I am sure there are more: just give the manuscript a good read-through.

      We thank the Reviewer for pointing out that some sentences may not be clear. We have edited the manuscript and focused on avoiding misleading or unclear wording.

      Reviewer #2 (Recommendations for The Authors):

      (1) The authors should distinguish gbM from teM and repeat the GO term analysis with an appropriate set of control genes.

      See our response to the public reviews above.

      (2) The authors' experimental design should allow them to directly assess whether the rates of epigenetic change are affected by the selective environment. This would require comparison of methylation patterns of individual plants prior to treatment with their progeny (the progeny is what the authors have currently analyzed). This would entail gathering new data, and I don't feel that this analysis is essential, but given the question the authors are addressing (the extent to which a selective environment can induce heritable epigenetic variation), it seems important to test whether the rates of epigenetic change are at all affected by the selection treatment.

      While this is a very valuable recommendation, we can currently not address it because the person who gathered the data works at a different university now. However, we keep this in mind for future projects.

      Again, we would like to thank the reviewers for the constructive suggestions that help us to improve the manuscript.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this study, the authors developed three case studies:

      (1) transcriptome profiling of two human cell cultures (HEK293 and HeLa)

      (2) identification of experimentally enriched transcripts in cell culture (RiboMinus and RiboPlus treatments)

      (3) identification of experimentally manipulated genes in yeast strains (gene knockouts or strains transformed with plasmids containing the deleted gene for overexpression).

      Sequencing was performed using Oxford Nanopore Technologies (ONT), the only technology that allows for real-time analysis. The real-time transcriptomic analysis was performed using NanopoReaTA, a recent toolbox for comparative transcriptional analyses of Nanopore-seq data, developed by the group (Wierczeiko and Pastore et al. 2023). The authors aimed to show the use of their tool on data generated by ONT, evidencing the versatility of the tool and the possibility of cost reduction, because ONT sequencing can be stopped as soon as enough data have been collected.

      Strengths: 

      Given that Oxford Nanopore Technologies offers real-time sequencing, it is extremely useful to develop tools that allow real-time data analysis in parallel with data generation. The authors demonstrated that this strategy is possible for both human cell lines and yeasts in the case studies presented. It is a useful strategy for the scientific community, and it has the potential to be integrated into clinical applications for rapid and cost-effective quality checks in specific experiments such as overexpression of genes.

      Weaknesses:

      In relation to the RNA-Seq analyses, for a proper statistical analysis, a greater number of replicates should have been performed. The experiments were conducted with a minimal number of replicates (2 replicates for case study 1 and 2 and 3 replicates for case study 3).

      We have addressed this issue by performing two new sets of experiments: a similar HEK293 vs HeLa comparison with 10 replicates per condition, and a heat-shocked vs non-heat-shocked comparison with 6 replicates per condition. In the case of the HEK293 vs HeLa comparison, we kept the comparison with 2 replicates per condition to demonstrate the effect of a limited number of replicates, simulating an early-stage evaluation of the experimental approach to obtain valuable quality control metrics. Nevertheless, we show that relevant and reproducible data can be obtained even with a lower number of replicates (2 per condition), compared to a higher number (10 per condition), across both PromethION and MinION sequencing platforms.

      Regarding the experimental part, some problems were observed in the conversion to double-stranded and loading for Nanopore-Seq, which were detailed in Supplementary Material 2. This fact is probably reflected in the results where a reduction in the overall sequencing throughput and detected gene number for HEK293 compared to HeLa were observed (data presented in Supplementary Figure 2). It is necessary to use similar quantities of RNA/cDNA since the sequencing occurs in real-time. The authors should have standardized the experimental conditions to proceed with the sequencing and perform the analyses.

      We completely agree with the reviewer. In the 10-replicate HEK vs HeLa experiment, we collected similar data to what was presented in Supplementary Material 2. We chose to include this information to highlight the experimental variability that can arise during Nanopore-seq library preparation, particularly with cDNA synthesis. This type of information is not often highlighted in Nanopore-based studies, yet it is crucial to be aware of such differences. Despite these variations, we identified a consistent set of DEGs across comparisons of low versus high replicate numbers. Importantly, NanopoReaTA successfully provided real-time monitoring (e.g. detected number of genes per replicate/condition), which allows for informed decision-making regarding the next steps in sequencing-based experiments.
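
      To illustrate how the number of detected genes could inform a decision to stop sequencing, the sketch below checks whether that number has plateaued over recent updates. It does not reproduce NanopoReaTA's own logic or interface: the function name, the three-update window, and the 1% tolerance are hypothetical choices, and the counts are invented.

```python
def has_plateaued(gene_counts, window=3, tolerance=0.01):
    """Return True if each of the last `window` updates increased the
    number of detected genes by at most `tolerance` (as a fraction).

    gene_counts: cumulative detected-gene counts, one per update.
    window and tolerance are illustrative assumptions, not values
    taken from NanopoReaTA.
    """
    if len(gene_counts) < window + 1:
        return False  # not enough updates to judge
    recent = gene_counts[-(window + 1):]
    for previous, current in zip(recent, recent[1:]):
        if previous == 0:
            return False
        if (current - previous) / previous > tolerance:
            return False
    return True


# Hypothetical cumulative counts for one replicate.
counts = [1000, 5000, 9000, 9050, 9080, 9100]
print("plateau reached:", has_plateaued(counts))
```

      A rule of this kind would need to be applied per replicate, since the response above notes that throughput can differ substantially between libraries.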

      Reviewer #2 (Public Review):

      Transcriptomics technologies play important roles in biological studies. Technologies based on second-generation sequencing, such as mRNA-seq, face some serious obstacles, including isoform analysis, due to short read length. Third-generation sequencing technologies perfectly solve these problems by having long reads, but they are much more expensive. The authors presented a useful real-time strategy to minimize the cost of sequencing with Oxford Nanopore Technologies (ONT). The authors performed three sets of experiments to illustrate the utility of the real-time strategy. However, due to the problems in experimental design and analysis, their aims are not completely achieved. If the authors can significantly improve the experiments and analysis, the strategy they proposed will guide biologists to conduct transcriptomics studies with ONT in a fast and cost-effective way and help studies in both basic research and clinical applications.

      Strengths:

      The authors have recently developed a computational tool called NanopoReaTA to perform real-time analysis when cDNA/RNA samples are sequenced with ONT (Wierczeiko et al., 2023). The advantage of real-time analysis is that the sequencing can be stopped once enough data is collected to save cost. Here, they described three sets of experiments: a comparison between two human cell lines, a comparison among RNA preparation procedures, and a comparison between genetically modified yeasts. Their results show that the real-time strategy works for different species and different RNA preparation methods.

      Weaknesses:

      However, especially considering that the computational tool NanopoReaTA is their previous work, the authors should present more helpful guidelines to perform real-time ONT analysis and more advanced analysis methods. There are four major weaknesses:

      (1) For all three sets of experiments, the authors focused on sample clustering and gene-level differential expression analysis (DEA), and only did little analysis on isoform level and even nothing in any figures in the main text. Sample clustering and gene-level DEA can be easily and well done using mRNA-seq at a much cheaper cost. Even for initial data quality checking, mRNA-seq can be first done in Illumina MiSeq/NextSeq which is quick, before deep sequencing in HiSeq/NovaSeq. The real power of third-generation RNA sequencing is the isoform analysis due to the long read length. At least for now, PacBio Iso-seq is very expensive and one cannot analyze the data in real-time. Thus, the authors should focus on the real-time isoform analysis of ONT to show the advantages.

      We are aware that isoform analysis is one of the strengths of real-time monitoring of long-read data, especially with Nanopore-seq. That is why we have included pipelines such as DRIMSeq and DEXSeq, which can provide valuable information about differential transcript usage (i.e. isoforms). However, interpreting the results in a biologically meaningful context, particularly regarding the role of specific isoforms, remains challenging. This is especially relevant as our main goal is to demonstrate NanopoReaTA's utility as a real-time transcriptomic tool that offers valuable quality control and meaningful insights. Nevertheless, in the heat-shock experiments, we identified one isoform that was differentially expressed and included it in the main figure. We hope that, with the right experimental setup, users can use the incorporated tools for meaningful isoform identification analyses.

      (2) The sample sizes are too small in all three sets of experiments: only two for sets 1 and 2, and three for set 3. For DEA, three is the minimal number for proper statistics. But a sample size of three always leads to very poor power. Nowadays, a proper transcriptomics study usually has a larger sample size. Besides the power issue, biological samples always contain many outliers due to many reasons. It is crucial to show whether the real-time analysis also works for larger sample sizes, such as 10, i.e., 20 samples in total. Will the performance still hold when the sample number is increasing? What is the maximum sample number for an ONT run? If the samples need to be split into multiple runs, how the real-time analysis will be adjusted? These questions are quite useful for researchers who plan to use ONT.

      We thank the reviewer for their suggestion. We performed the suggested experiment in HEK293 vs HeLa, taking 10 replicates per condition, and acquired the data during sequencing. As can be seen in the results (Figure 2), the performance held very well, from the first hour up to the 24-hour mark. In theory, the maximum number of barcodes that can be integrated in a sequencing run can be used for the pairwise comparison. We are using a 24-barcode kit (provided by ONT); therefore, we can include up to 12 replicates per condition. We are aware that a 96-barcode kit could be used as well. However, it is important to note that the more samples are integrated in a sequencing run, the fewer reads will be generated per sample. Therefore, it is important to plan the number of replicates per sequencing run carefully.
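
      The trade-off between the number of barcoded samples and the reads obtained per sample can be made concrete with a little arithmetic. The Python sketch below assumes that reads are split evenly across barcoded samples and uses a hypothetical total read yield; neither assumption comes from the manuscript, and real runs will deviate from an even split.

```python
def replicates_per_condition(n_barcodes: int, n_conditions: int = 2) -> int:
    """Largest equal number of replicates per condition that fits on one kit."""
    if n_barcodes < 1 or n_conditions < 1:
        raise ValueError("n_barcodes and n_conditions must be positive")
    return n_barcodes // n_conditions


def expected_reads_per_sample(total_reads: int, n_samples: int) -> float:
    """Expected reads per sample under an idealized even split across samples."""
    if n_samples < 1:
        raise ValueError("n_samples must be positive")
    return total_reads / n_samples


# Hypothetical run yield, chosen only for illustration.
HYPOTHETICAL_TOTAL_READS = 12_000_000

for kit_size in (24, 96):
    reps = replicates_per_condition(kit_size)
    per_sample = expected_reads_per_sample(HYPOTHETICAL_TOTAL_READS, kit_size)
    print(f"{kit_size} barcodes: up to {reps} replicates per condition, "
          f"about {per_sample:,.0f} reads per sample")
```

      Under these assumptions, filling a 96-barcode kit instead of a 24-barcode kit quarters the expected reads per sample for the same run yield.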

      (3) According to the manuscript, real-time analysis checks the sequencing data in a few time points, this is usually called sequential analysis or interim analysis in statistics which is usually performed in clinical trials to save cost. Care must be taken while performing these analyses, as repeated checks on the data can inflate the type I error rate. Thus, the authors should develop a sequential analysis procedure for real-time RNA sequencing.

      We would like to respond to this comment by addressing two points: 1) Quality control: During the analysis we offer two main statistics, which enable scientists to assess the experimental development. For each iteration the change in relative gene counts per sample is computed to assess the convergence towards 0. Moreover, for each iteration the number of detected genes per sample is computed to assess whether the number of detected reads is saturated. These metrics allow the user to independently assess whether samples within the experimental development reach a stable state, to reveal a meaningful timepoint of data evaluation. 

      2) Sequential analysis: One way to limit the type I error rate during sequential analysis is the Pocock boundary, which systematically lowers the p-value threshold depending on the number of interim analyses. NanopoReaTA offers a custom choice of the p-value threshold during the analysis, which allows researchers to set their parameters as needed.
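
      The two points in this response can be sketched in a few lines of Python. This is a minimal illustration, not NanopoReaTA's implementation: the function names are hypothetical, the convergence metric is one plausible reading of "change in relative gene counts", and the per-look thresholds are the nominal two-sided values tabulated by Pocock (1977) for an overall alpha of 0.05 with equally spaced looks. Interim looks during a sequencing run are not necessarily equally spaced in information, so the thresholds should be read as a rough guide only.

```python
# Nominal two-sided per-look p-value thresholds for an overall alpha of 0.05
# with equally spaced looks, as tabulated by Pocock (1977).
POCOCK_NOMINAL_P = {1: 0.05, 2: 0.0294, 3: 0.0221, 4: 0.0182, 5: 0.0158}


def relative_counts(counts):
    """Convert raw gene counts of one sample into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: n / total for gene, n in counts.items()}


def composition_change(previous, current):
    """Sum of absolute changes in relative gene counts between two iterations.

    Values approaching 0 suggest that the sample's expression profile has
    stabilized (hypothetical metric, chosen for illustration).
    """
    prev_rel = relative_counts(previous)
    curr_rel = relative_counts(current)
    genes = set(prev_rel) | set(curr_rel)
    return sum(abs(curr_rel.get(g, 0.0) - prev_rel.get(g, 0.0)) for g in genes)


def detected_genes(counts, min_reads=1):
    """Number of genes supported by at least min_reads reads."""
    return sum(1 for n in counts.values() if n >= min_reads)


def per_look_threshold(n_looks, alpha=0.05):
    """Per-look p-value threshold for a planned number of interim analyses.

    Uses the tabulated Pocock value when available and otherwise falls back
    to the more conservative Bonferroni split alpha / n_looks.
    """
    if n_looks < 1:
        raise ValueError("n_looks must be at least 1")
    if alpha == 0.05 and n_looks in POCOCK_NOMINAL_P:
        return POCOCK_NOMINAL_P[n_looks]
    return alpha / n_looks
```

      A user planning three interim evaluations could, for example, pass `per_look_threshold(3)` as the custom p-value threshold at each look instead of 0.05.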

      (4) The experimental set 1 (comparison between two completely different human cell lines) and experimental set 2 (comparison among RNA preparation procedures) are not quite biologically meaningful. If it is possible, it is better for the authors to perform an experiment more similar to a real situation for biological discovery. Then the manuscript can attract more researchers to follow its guidelines.

      We took the suggestion of Reviewer #2 (from the recommendations for the authors) and performed a heat-shock experiment comparing heat-shocked and non-heat-shocked cells from the same cell line (HEK293). We sequenced the samples (6 replicates per condition) and, one hour post-sequencing initiation, had already identified three DEGs (including HSPA1A, DNAJB1, and HSP90AA1) known to be upregulated under heat-shock conditions (Yonezawa and Bono 2023, Sanchez-Briñas et al. 2023). This illustrates how NanopoReaTA can capture biologically relevant insights in real time.

      Reviewer #1 (Recommendations for The Authors):

      (1) The comparison between two different human cell lines doesn't have much biological relevance. It would be more interesting and useful to evaluate the genes and transcripts expressed from the same cell in different conditions.

      As mentioned previously, we conducted a heat-shock experiment comparing heat-shocked and non-heat-shocked cells within the same cell line (HEK293). We observed reliable results within one hour of initiating the sequencing.

      (2) Increase the number of replicates to give greater confidence in the results.

      We have addressed the replicate issue by performing two new sets of experiments: HEK293 vs HeLa with 10 replicates per condition and heat-shocked vs non-heat-shocked with 6 replicates per condition. In both cases, we obtained reliable and reproducible results (even when compared with a lower number of replicates).

      (3) One of the advantages of performing Nanopore sequencing is the possibility of sequencing RNA molecules directly. It would be interesting to test the real-time analysis strategy in parallel using direct RNA sequencing if it is possible.

      That is a great point. In theory, it would be possible to perform real-time differential gene expression analysis on direct RNA data (since the pipeline for such analysis is already integrated in NanopoReaTA); however, the limiting factor is the lack of multiplexing. To perform real-time transcriptomic analysis with direct RNA-seq data, one would need to sequence at least 4 flow cells (MinION or PromethION), each containing one sample (2 flow cells per condition for pairwise transcriptomic analyses). Although such an analysis is possible, it would not be cost-effective, as it would significantly increase the cost for the amount of data gathered. We are aware that ONT is planning to release a multiplexing option for direct RNA-seq in the future. We have integrated the option of direct RNA-seq analysis so that, once such an option is available, users will be able to perform real-time transcriptomic analysis with dRNA-seq data.

      Some minor weaknesses are below:

      (4) With respect to the text as a whole, the authors should be more careful with standardization, such as mL/ml and uL/ul, Ribominus/RiboMinus.

      We have standardized the nomenclature to µL, mL and Ribominus (due to trademark).  

      (5) Set up paragraphs on page 9 and throughout the text when necessary.

      We have set the suggested paragraphs on page 9 and throughout the text.

      (6) Please, check the word form in the sentence: "To isolate the RNA form the RiboMinus™ supernatant.."

      The word has been corrected.

      (7) In order to make clear to the reader at the outset, I suggest including in the methodology how many biological replicates were performed for each cell type studied (cell lines and yeast strains).

      For the cell lines, we have now included the number of replicates used for each cell line. We have also included this for the yeast setups.

      (8) Please, check the Supplementary Tables as the word VERDADEIRO has not been translated (TRUE) in Supplementary Table 1.

      This issue appears to be influenced by the language settings configured on the viewer's computer.

      (9) On page 17, I suggest including the absorbance used to measure RNA concentration in HEK293 and HeLa cell lines. Also, I suggest including how the quality of the RNA extracted from the cell cultures and yeast strains was determined. Was the ratio 260/280 and 260/230 calculated? Given that the material was extracted with Trizol, which has phenol and chloroform in its composition, it would be important to evaluate the quality of the RNA, especially by calculating the 260/230 ratio.

      We have included a statement regarding the concentrations and quality of RNA in the “RNA isolation” section within the material and methods.

      (10) On page 18, the topic of Selective purification of ribosomal-depleted (RiboMinus) and ribosomal-enriched (RiboPlus) transcripts needs to be better detailed, especially in the last two sentences. For example: "The pooled bead samples (containing the rRNA) were further processed with Trizol RNA isolation to complete the purification." This sentence should be detailed to make it clear that this procedure is what you call ribosomal-enriched (RiboPlus).

      Qualitative analysis of the material was performed after rRNA depletion and enrichment.

      We have made these sentences clearer.

      (9) On the topic of Direct cDNA-native barcoding Nanopore library preparation and sequencing, in the following sentences: "Concentration determination (1 μl) and adapter ligation using 5 μL NA, 10 μL NEBNext Quick Ligation Reaction Buffer (5X), and 5 μL Quick T4 DNA Ligase (NEB, cat # E6056) were performed. Pooled library purification with 0.7X AMPure XP Beads resulted in a final elution volume of 33 μl EB. Concentration of the pooled barcoded library was determined using Qubit (1 μl)."

      Two concentration determinations were performed, before and after adapter ligation. I suggest writing one sentence for concentration determination and another for adapter ligation.

      We applied the reviewer’s suggestion. 

      (11) In the section Experimental Design in Results, the first sentences are part of the methodology and are described in materials and methods. I suggest removing it from the results and rewriting the text. Results of the RNA extraction methodology and library preparation were shown in supplementary material. Thus, the authors could mention that the results were presented in supplementary material.

      We have revised this section to remove the details of RNA extraction and library preparation, focusing instead on the pipeline and experimental setups. The methodology is outlined in Figure 1, as well as in the materials and methods and the supplementary figures for each experimental setup.

      Reviewer #2 (Recommendations For The Authors):

      For major weakness 4 described in the Public Review, the authors could try experiments like:

      (1) comparison between females and males of tissues or primary cells; or

      (2) comparison between cell lines before and after heat shock.

      They are easy to perform and much more similar to real experimental designs for discovery, and the authors may actually have some new findings because usually people do not do much investigation on the isoform level using mRNA-seq.

      We thank the reviewer for their suggestions. We performed a heat-shock experiment comparing heat-shocked and non-heat-shocked cells from the same cell line (HEK293). We sequenced the samples (6 replicates per condition) and, one hour post-sequencing initiation, had already identified three DEGs, including HSPA1A, DNAJB1, and HSP90AA1, reported to be upregulated under heat-shock conditions (Yonezawa and Bono 2023, Sanchez-Briñas et al. 2023). We also identified one differentially expressed isoform and included it in the main figure.

      There are two minor weaknesses:

      (1) Many figure numbers in the main text are wrong, including:

      Page 4, "similarity plot and principal component analysis (PCA) (Figure 1B, 1C)";

      Page 7, "same intervals as mentioned earlier (Figure 1A)", and "Next, we inspected the PCA and dissimilarity plots (Figure 2B";

      Page 10, "process (Supplementary Figure 19A) until the 24-hour PSI mark point (Figure 9B", and "NEW1 was the sole differentially expressed gene (Figure 9D)".

      The authors should be more careful about this. It is very confusing for readers.

      We have addressed these points in the text. 

      (2) The texts in the figures are too small to recognize, especially in Figures 4 and 5. The reason is that there are too many sub-figures in one figure. Is that really necessary to put more than 20 sub-figures in one? The authors should better summarize their results. For example, remove sub-figures with little information; do not show figures with the same styles again and again in the main text and just summarize them instead.

      We thank the reviewer for the suggestion. We have updated the figure to focus on the most relevant comparisons (new1Δ-pEV vs. WT-pEV and rkr1Δ-pEV vs. WT-pEV), providing a clearer and more realistic comparison between mutant and wild-type conditions in the main figure. Additionally, a summary and all related comparisons are included in Supplementary Documents S4 and S5. We believe these supplementary figures are essential to demonstrate NanopoReaTA's capabilities as a quality control tool, effectively detecting expected transcriptomic alterations in real-time.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors have developed self-amplifying RNAs (saRNAs) encoding additional genes to suppress dsRNA-related inflammatory responses and cytokine release. Their results demonstrate that saRNA constructs encoding anti-inflammatory genes effectively reduce cytotoxicity and cytokine production, enhancing the potential of saRNAs. This work is significant for advancing saRNA therapeutics by mitigating unintended immune activation.

      Strengths:

      This study successfully demonstrates the concept of enhancing saRNA applications by encoding immune-suppressive genes. A key challenge for saRNA-based therapeutics, particularly for non-vaccine applications, is the innate immune response triggered by dsRNA recognition. By leveraging viral protein properties to suppress immunity, the authors provide a novel strategy to overcome this limitation. The study presents a well-designed approach with potential implications for improving saRNA stability and minimizing inflammatory side effects.

      We thank Reviewer #1 for their thorough review and for recognizing both the significance of our work and the potential of our strategy to expand saRNA applications beyond vaccines.

      Weaknesses:

      (1) Impact on Cellular Translation:

      The authors demonstrate that modified saRNAs with additional components enhance transgene expression by inhibiting dsRNA-sensing pathways. However, it is unclear whether these modifications influence global cellular translation beyond the expression of GFP and mScarlet-3 (which are encoded by the saRNA itself). Conducting a polysome profiling analysis or a puromycin labeling assay would clarify whether the modified saRNAs alter overall translation efficiency. This additional data would strengthen the conclusions regarding the specificity of dsRNA-sensing inhibition.

      We thank the Reviewer for this insightful suggestion. We performed a puromycin labeling assay to assess global translation rates (Figure 3—figure supplement 1c). This experiment revealed that the E3 construct significantly reduces global protein synthesis, despite driving high levels of saRNA-encoded transgene expression (Figure 1d, e). In contrast, the E3-NSs-L* construct mitigated this reduction in global translation while maintaining moderate transgene expression. These findings support our hypothesis that E3 enhances transgene output in part by activating RNase L, which degrades host mRNAs and thereby reduces ribosomal competition. We appreciate the Reviewer’s recommendation of this experiment, which has strengthened the manuscript.

      (2) Stability and Replication Efficiency of Long saRNA Constructs:

      The saRNA constructs used in this study exceed 16 kb, making them more fragile and challenging to handle. Assessing their mRNA integrity and quality would be crucial to ensure their robustness.

      Furthermore, the replicative capacity of the designed saRNAs should be confirmed. Since Figure 4 shows lower inflammatory cytokine production when encoding srIκBα and srIκBα-Smad7-SOCS1, it is important to determine whether this effect is due to reduced immune activation or impaired replication. Providing data on replication efficiency and expression levels of the encoded anti-inflammatory proteins would help rule out the possibility that reduced cytokine production is a consequence of lower replication.

      We thank the Reviewer for these valuable suggestions.

      To assess the integrity of the saRNA constructs, we performed denaturing gel electrophoresis (Supplemental Figure 6c). The native saRNA, E3, and E3-NSs-L* constructs each migrated as a single band. The moxBFP, srIκBα, and srIκBα-Smad7-SOCS1 constructs showed both a full-length transcript and a lower-abundance truncated band (Supplemental Figure 6d), suggestive of a cryptic terminator sequence introduced in a region common to these three constructs.

      To evaluate replicative capacity, we performed qPCR targeting EGFP, which is encoded by all constructs. This analysis revealed that the srIκBα-Smad7-SOCS1 construct exhibited lower replication efficiency than both native saRNA and E3. Several factors may contribute to this difference, including the longer transcript length, reduced molar input when equal mass was used for transfection, prevention of host mRNA degradation due to RNase L inhibition, or the presence of truncated transcripts.
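
      The point about reduced molar input at equal mass follows from simple stoichiometry: for a fixed RNA mass, the number of molecules delivered scales inversely with transcript length. The Python sketch below illustrates this with hypothetical construct lengths (they are not taken from the manuscript) and the commonly used approximation of about 320.5 g/mol per single-stranded RNA nucleotide plus a 159 g/mol end correction.

```python
# Commonly used approximation for single-stranded RNA molecular weight:
# MW ~ (number of nucleotides x 320.5) + 159.0 g/mol.
NT_MASS_G_PER_MOL = 320.5
END_CORRECTION_G_PER_MOL = 159.0


def ssrna_molecular_weight(length_nt: int) -> float:
    """Approximate molecular weight (g/mol) of an ssRNA of the given length."""
    if length_nt < 1:
        raise ValueError("length_nt must be positive")
    return length_nt * NT_MASS_G_PER_MOL + END_CORRECTION_G_PER_MOL


def picomoles(mass_ng: float, length_nt: int) -> float:
    """Picomoles of transcript contained in mass_ng nanograms of RNA."""
    # ng -> g is a factor of 1e-9 and mol -> pmol is a factor of 1e12,
    # so the net conversion factor is 1e3.
    return mass_ng * 1e3 / ssrna_molecular_weight(length_nt)


# Hypothetical transcript lengths, for illustration only.
SHORT_CONSTRUCT_NT = 10_000
LONG_CONSTRUCT_NT = 16_000
MASS_NG = 500.0

short_pmol = picomoles(MASS_NG, SHORT_CONSTRUCT_NT)
long_pmol = picomoles(MASS_NG, LONG_CONSTRUCT_NT)
print(f"Short construct: {short_pmol:.3f} pmol")
print(f"Long construct:  {long_pmol:.3f} pmol")
print(f"Molar ratio (short / long): {short_pmol / long_pmol:.2f}")
```

      With these illustrative lengths, the same mass delivers roughly 1.6 times more copies of the shorter construct, which is one reason equal-mass transfections are not equal-copy transfections.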

      Given these confounding variables, we revised our approach to analyzing cytokine production. Rather than comparing all six constructs together, we split the analysis into two parts: (1) the effects of dsRNA-sensing pathway inhibition (Figure 4a), and (2) the effects of inflammatory signalling inhibition (Figure 4c). For the latter, we compared srIκBα and srIκBα-Smad7-SOCS1 to moxBFP, as these three constructs are more comparable in size, share the same truncated transcript, and all encode L* to inhibit RNase L. This strategy minimizes the likelihood that differences in the cytokine responses are due to variation in replication efficiency.

      (3) Comparative Data with Native saRNA:

      Including native saRNA controls in Figures 5-7 would allow for a clearer assessment of the impact of additional genes on cytokine production. This comparison would help distinguish the effect of the encoded suppressor proteins from other potential factors.

      We thank the Reviewer for this helpful suggestion. We have added the native saRNA condition to Figure 5 as a visual reference. However, due to the presence of truncated transcripts in the constructs designed to inhibit inflammatory signalling pathways, the actual amount of full-length saRNA delivered in these conditions is likely lower than expected, despite using equal total RNA mass for transfection. This complicates direct comparisons with constructs targeting dsRNA-sensing pathways, which do not show transcript truncation. For this reason, native saRNA was included only as a visual reference and was not used in statistical comparisons with the inflammatory signalling inhibitor constructs.

      (4) In vivo Validation and Safety Considerations:

      Have the authors considered evaluating the in vivo potential of these saRNA constructs? Conducting animal studies would provide stronger evidence for their therapeutic applicability. If in vivo experiments have not been performed, discussing potential challenges - such as saRNA persistence, biodistribution, and possible secondary effects - would be valuable.

      (5) Immune Response to Viral Proteins:

      Since the inhibitors of dsRNA-sensing proteins (E3, NSs, and L*) are viral proteins, they would be expected to induce an immune response. Analyzing these effects in vivo would add insight into the applicability of this approach.

      We appreciate the Reviewer’s points regarding in vivo validation and safety considerations. While in vivo studies are beyond the scope of the present investigation, we agree that evaluating therapeutic potential, biodistribution, persistence, and secondary effects will be essential for future translation. We have now included a brief discussion of these considerations at the end of the revised discussion. In ongoing work, we are planning follow-up studies incorporating in vivo imaging and functional assessments of saRNA-driven cargo delivery in preclinical models of inflammatory joint pain.

      Regarding the immune response to viral proteins, we agree that this is an important consideration and have now included a clearer discussion of this limitation in the revised manuscript. Specifically, we highlight that encoding multiple viral inhibitors (E3, NSs, and L*), in combination with the VEEV replicase, may increase the likelihood of adaptive immune recognition via MHC class I presentation. This could lead to cytotoxic T cell–mediated clearance of saRNA-transfected cells, thereby limiting therapeutic durability. We emphasize that addressing both intrinsic cytotoxicity and immune-mediated clearance will be essential for advancing the clinical potential of this platform.

      (6) Streamlining the Discussion Section:

      The discussion is quite lengthy. To improve readability, some content - such as the rationale for gene selection - could be moved to the Results section. Additionally, the descriptions of Figure 3 should be consolidated into a single section under a broader heading for improved coherence.

      Thank you for these helpful suggestions. We have streamlined the Discussion to improve readability and have moved the rationale for gene selection to the results section, as recommended. In addition, we have consolidated the Figure 3 descriptions to improve coherence and to simplify the presentation.

      Reviewer #2 (Public review):

      Summary:

      Lim et al. have developed a self-amplifying RNA (saRNA) design that incorporates immunomodulatory viral proteins, and show that the novel design results in enhanced protein expression in vitro in mouse primary fibroblast-like synoviocytes. They test constructs including saRNA with the vaccinia virus E3 protein and another with E3, Toscana virus NS protein and Theiler's virus L protein (E3 + NS + L), and another with srIκBα-Smad7-SOCS1. They have also tested whether ML336, an antiviral, enables control of transgene expression.

      Strengths:

      The experiments are generally well-designed and offer mechanistic insight into the RNA-sensing pathways that confer enhanced saRNA expression. The experiments are carried out over a long timescale, which shows the enhanced effect of the saRNA E3 design compared to the control. Furthermore, the inhibitors are shown to maintain the cell number, and reduce basal activation factor-α levels.

      We thank Reviewer #2 for their thoughtful and detailed assessment of our manuscript, and for recognizing the mechanistic insights provided by our study. We also appreciate their positive comments on the experimental design, the extended timescale, and the observed effects on transgene expression, cell viability, and basal fibroblast activation factor-α levels.

      Weaknesses:

      One limitation of this manuscript is that the RNA is not well characterized; some of the constructs are quite long and the RNA integrity has not been analyzed. Furthermore, for constructs with multiple proteins, it's imperative to confirm the expression of each protein to confirm that any therapeutic effect is from the effector protein (e.g. E3, NS, L). The ML336 was only tested at one concentration; it is standard in the field to do a dose-response curve. These experiments were all done in vitro in mouse cells, thus limiting the conclusion we can make about mechanisms in a human system.

      Thank you for your detailed feedback. We have added new experiments and clarified limitations in the revised manuscript to address these concerns:

      RNA integrity: We performed denaturing gel electrophoresis on the in vitro transcribed saRNA constructs (Supplemental Figure 7c). Constructs targeting dsRNA-sensing pathways migrated as a single band, while those targeting inflammatory signalling pathways showed both a full-length product and a common, lower-abundance truncated transcript. This suggests that the actual amount of full-length RNA delivered for the constructs inhibiting inflammatory signalling was overestimated. To account for this, we avoided direct comparisons between the two types of constructs and instead focused on comparisons within each type to ensure more meaningful interpretation.

      Confirmation of protein expression: While we acknowledge that direct measurement of each protein would provide additional insight, we believe the functional assays presented offer strong evidence that the encoded proteins are expressed and exert their intended biological effects. Additionally, IRES functionality was confirmed visually using fluorescent protein reporters, supporting the successful expression of downstream genes.

      ML336 concentration–response: We have now performed a concentration–response analysis for ML336 (Figure 8a and b), which demonstrates its ability to modulate transgene expression in a concentration-dependent manner.

      Use of human cells: We agree that testing these constructs in human cells is essential for future translational applications and are actively exploring opportunities to evaluate them in patient-derived FLS. However, previous studies have shown that Theiler’s virus L* does not inhibit human RNase L (Sorgeloos et al., PLoS Pathog 2013). As a result, it is highly likely that the E3-NSs-L* construct will not function as intended in human systems. Addressing this limitation will be a priority in our future work, where we aim to develop constructs incorporating inhibitors specific to human RNase L to ensure efficacy in human cells.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Figure 2c is not indicated.

      Thank you for pointing out this error. It has now been corrected in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) The Graphical Abstract is a bit confusing; suggest modifying it to represent the study and findings more accurately.

      We have revised the graphical abstract to improve clarity and better reflect the study’s design and main findings. Thank you for the suggestion.

      (2) The impact of this paper would be greatly improved if these experiments were repeated, at least partially, in human cells. The rationale for mouse cells in vitro is unclear.

      The rationale for developing constructs targeting mouse cells is based on our intention to utilize these constructs in mouse models of inflammatory joint pain in future studies.

      We recognize that incorporating data from human cells would significantly enhance the translational relevance of our work, and we are actively pursuing collaborations to test these constructs in patient-derived FLS. However, a key component of our saRNA constructs—Theiler’s virus L*—has been shown to inhibit mouse, but not human, RNase L (Sorgeloos et al., PLoS Pathog 2013). Consequently, the E3-NSs-L* polyprotein may not function as intended in human cells. To address this limitation, future work will focus on developing constructs that incorporate inhibitors specific to human RNase L, thereby facilitating more effective translation of our findings to human systems.

      (3) The ML336 was only tested at one concentration and works mildly well, but would be more impactful if tested in a dose-response curve.

      We have now performed a concentration–response analysis for ML336 (Figure 8a and b), which demonstrates its concentration-dependent effects on transgene expression and saRNA elimination. Thank you for the suggestion.

      (4) Overall, there is not a cohesive narrative to the story, instead it comes off as we tried these three different approaches, and they worked in different contexts.

      We have revised the graphical abstract, results, and discussion to improve the cohesiveness of the manuscript’s narrative and to better integrate the mechanistic rationale linking the different approaches. We appreciate the feedback.

      (5) The title is not supported by the data; the saRNA is still somewhat cytotoxic, immunostimulatory and the antiviral minimally controls transgene expression; suggest making this reflect the data.

      We have revised the title to better reflect the scope of the data and the mechanistic focus of the study. The updated title emphasizes the pathways targeted and the outcomes demonstrated, while avoiding overstatement. Thank you for this helpful recommendation.

    1. Author response:

      We would like to thank the reviewers and the editorial team for all their thoughtful and constructive feedback. The reviewers provided many helpful comments which we will work to incorporate in our resubmission as we believe they will significantly enhance the quality of our manuscript.

      An overarching critique shared among reviewers was regarding limitations in our datasets. Namely, lower N-values for certain groups make some conclusions less reliable. We acknowledge this limitation and will add more experiments to address this concern. Additionally, attention was drawn to our reliance on the generalized linear model (GLM) for making claims about rebalancing and learning-related changes. To address this, we will work to include additional analyses such as ACC spike-triggered average CA1sup responses, cross-covariances between ACC and CA1sup cells in post-task sleep, and ripple-triggered cross-correlations, among others as per reviewer recommendations. We will also provide a deeper analysis of the CA1 neuron weights in our GLM analysis and their specific features during learning. In addition, we will provide a clearer description of our learning paradigm, including performance data for each animal and how performance relates to our analyses. Overall, we will include more analyses of our datasets across various task events such as recall, to make more efficient use of the full repertoire of our recordings.

      Concerns were also raised regarding some aspects of our statistical analyses. During revision, we will ensure we select the most appropriate statistical measure for each of our tests. Our paper implements the use of tetrode recordings to assess sublayer identification. This approach comes with limitations, and in our resubmission, we will provide a more detailed explanation of those limitations along with a more thorough description of our measures to mitigate them.

      Lastly, in our follow-up submission we will work to improve the written clarity of findings. Specifically, we will simplify and better explain our findings and provide clearer justification for our interpretations and choice of analyses.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      A) The presentation of the paper must be strengthened. Inconsistencies, mislabelling, duplicated text, typos, and inappropriate colour code should be changed.

      We spotted and corrected several inconsistencies and mislabelling issues throughout the text and figures. Thanks!  

      B) Some claims are not supported by the data. For example, the sentence that says that "adolescent mice showed lower discrimination performance than adults" (l.22) should be rewritten, as the data does not show that for the easy task (Figure 1F and Figure 1H).

      We carefully reviewed the specific claims and fixed some of the wording so it adheres to the data shown.

      C) In Figure 7 for example, are the quantified properties not distinct across primary and secondary areas?

      We have now carried out additional analyses to test this. We found that while AUDp and AUDv exhibit distinct tuning properties, they show similar differences between adolescent and adult neurons (see Supplementary Table 6, Fig. S7-1a-h). Note that TEa and AUDd could not be evaluated due to low numbers of modulated neurons in this protocol.

      D) Some analysis interpretations should be more cautious. (..) A lower lick rate in general could reflect a weaker ability to withhold licking- as indicated on l.164, but also so many other things, like a lower frustration threshold, lower satiation, more energy, etc).

      That is a fair comment, and we refined our interpretations. Moreover, we also addressed whether impulsiveness impacted lick rates. In the Educage, we found that adolescent mice had shorter ITIs only after FAs (Fig. S2-1). In the head-fixed setup, we examined (1) the proportion of ITIs where licks occurred (Fig. S3-1c) and (2) the number of licks in these ITIs (Fig. S3-1d). We found no differences between adolescents and adults, indicating that the differences observed in the main task are not due to general differences in impulsiveness (Fig. S2-1, Fig. S3-1c, d). Finally, we note that potential differences in satiation were already addressed in the original manuscript by carefully examining the number of trials completed across the session. See also Reviewer 3, comment #1 below.

      Reviewer #2 (Public review):

      A) For some of the analyses that the authors conducted it is unclear what the rationale behind them is and, consequently, what conclusion we can draw from them.

      We reviewed the manuscript carefully and revised the relevant sections to clarify the rationale behind the analyses. See detailed responses to all the reviewer’s specific comments.

      B) The results of optogenetic manipulation, while very interesting, warrant a more in-depth discussion.

      We expanded our discussion on these experiments (L495-511) and also added an additional analysis to strengthen our findings (Fig. S3-2e).

      Reviewer #3 (Public review):

      (1) The authors report that "adolescent mice showed lower auditory discrimination performance compared to adults" and that this performance deficit was due to (among other things) "weaker cognitive control". I'm not fully convinced of this interpretation, for a few reasons. First, the adolescents may simply have been thirstier, and therefore more willing to lick indiscriminately. The high false alarm rates in that case would not reflect a "weaker cognitive control" but rather, an elevated homeostatic drive to obtain water. Second, even the adult animals had relatively high (~40%) false alarm rates on the freely moving version of the task, suggesting that their behavior was not particularly well controlled either. One fact that could help shed light on this would be to know how often the animals licked the spout in between trials. Finally, for the head-fixed version of the task, only d' values are reported. Without the corresponding hit and false alarm rates (and frequency of licking in the intertrial interval), it's hard to know what exactly the animals were doing.

      First, as requested, we added the Hit rates and FA rates for the head-fixed task (Fig. S3-1a). Second, as requested by the reviewer, we performed additional analyses in both the Educage and head-fixed versions of the task. Specifically, we analyzed the ITI duration following each trial outcome. We found that adolescent mice had shorter ITIs only after FAs (Fig. S2-1). In the head-fixed setup, we examined (1) the proportion of ITIs during which licks occurred (Fig. S3-1c) and (2) the number of licks in these ITIs (Fig. S3-1d). We found no differences between adolescents and adults, indicating that the differences observed in the main task are not due to general differences in impulsiveness (Fig. S2-1, Fig. S3-1c, d). See also comment #D of reviewer #1 above.

      B) There are some instances where the citations provided do not support the preceding claim. For example, in lines 64-66, the authors highlight the fact that the critical period for pure tone processing in the auditory cortex closes relatively early (by ~P15). However, one of the references cited (ref 14) used FM sweeps, not pure tones, and even provided evidence that the critical period for this more complex stimulus occurred later in development (P31-38). Similarly, on lines 72-74, the authors state that "ACx neurons in adolescents exhibit high neuronal variability and lower tone sensitivity as compared to adults." The reference cited here (ref 4) used AM noise with a broadband carrier, not tones.

      We carefully checked the text to ensure that each claim is accurately supported by the corresponding reference.

      C) Given that the authors report that neuronal firing properties differ across auditory cortical subregions (as many others have previously reported), why did the authors choose to pool neurons indiscriminately across so many different brain regions?

      We appreciate the reviewer’s concern. While we acknowledge that pooling neurons across auditory cortical subregions may obscure region-specific effects, our primary focus in this study is on developmental differences between adolescents and adults, which were far more pronounced than subregional differences.

      To address this potential limitation: (1) We analyzed firing differences across subregions during task engagement (see Fig. S4-1, S4-2, S4-3; Supplementary Tables 2 and 3). (2) We have now added new analyses for the passive listening condition in AUDp and AUDv (Fig. S7-1; Supplementary Table 6).

      These analyses support our conclusion that developmental stage has a greater impact on auditory cortical activity than subregional location in the contexts examined. For clarity and cohesion, the main text emphasizes developmental differences, while subregional analyses are presented in the Supplement.

      D) And why did they focus on layers 5/6? (Is there some reason to think that age-related differences would be more pronounced in the output layers of the auditory cortex than in other layers?)

      We agree that other cortical layers, particularly supragranular layers, are important for auditory processing and plasticity. Our focus on layers 5/6 was driven by both methodological and biological considerations. Methodologically, our electrode penetrations were optimized to span multiple auditory cortical areas, and deeper layers provided greater mechanical stability for chronic recordings. Biologically, layers 5/6 contain the principal output neurons of the auditory cortex and are well-positioned to influence downstream decision-making circuits. We acknowledge the limitation of our recordings to these layers in the manuscript (L268; L464-8).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The presentation of the paper must be strengthened. As it is now, it makes it difficult to appreciate the strengths of the results. Here are some points that should be addressed:

      a) The manuscript is full of inconsistencies that should be fixed to improve the reader's understanding. For example, the description on l.217 and the Figure. S3-1b, the D' value of 0 rounded to 0.01 on l. 735 (isn't it rather the z-scored value that is rounded? A D' of 0 is not a problem), the definition of lick bias on l. 750 and the values in Fig.2, the legend of Figure 7F and what is displayed on the graph (is it population sparseness or responsiveness?), etc.

      We adjusted the legend and description of former Fig. S3-1b (now Fig. S3-2b).

      We now clarify that the rounded values refer to z-scored hit and false alarm rates that we used in the d’ calculation. We adjusted the definition of the lick bias in Fig. 2 and Fig. S3-1b (L804).

      We replaced ‘population responsiveness’ with ‘population sparseness’ throughout the figures, legend and the text.

      b) References to figures are sometimes wrong (for example on l. 737,739).

      c) Some text is duplicated (for example l. 814 and l. 837).

      d) Typos should be corrected (for example l. 127, 'the', l. 787, 'upto').

      We deleted the incorrect references of this section, removed the duplicated text, and corrected the typos.

      e) Color code should be changed (for example the shades of blue for easy and hard tasks - they are extremely difficult to differentiate).

      After consideration, we decided to retain the blue color code (i.e., Fig. 1d, Fig. 3d, Fig. 4e-g, Fig. 5c, Fig. 6d–g), where the distinction between the shades of blue appears sufficiently clear and maintains visual consistency and aesthetic appeal. We did, however, make changes to the other color codes (Fig. 4, Fig. 5, Fig. 6, Fig. 7).

      f) Figure design should be improved. For example, why is a different logic used for displaying Figure 5A or B and Figure 1E?

      We adjusted the color scheme in Fig. 5. We chose to represent the data in Fig. 5 according to task difficulty, as this arrangement best illustrates the more pronounced deficits in population decoding in adolescents during the hard task.

      f) Why use a 3D representation in Figure 4G? (2)

      The 3D representation in Fig. 4g was chosen to illustrate the 3-way interactions between onset-latency, maximal discriminability, and duration of discrimination.

      g) Figure 1A, lower right panel- should "response" not be completed by "lick", "no lick"?

      We changed the labels to “Lick” and “No Lick” in Fig. 1a.

      h) l.18 the age mentioned is misleading, because the learning itself actually started 20 days earlier than what is cited here.

      Corrected.

      i) Explain what AAV5-... is on l.212.

      We added an explanation of virus components (see L216-220).

      (2) The comparison of CV in Figure 2 H-J is interesting. I am curious to know whether the differences in the easy and hard tasks could be due to a decrease in CV in adults, rather than an increase in CV in adolescents? Also, could the difference in J be due to 3 outliers?

      We agree that the observed CV differences may reflect a reduction in variability in adults rather than an increase in adolescents. We have revised the Results section accordingly to acknowledge this interpretation.

      Regarding the concern about potential outliers in Fig. 2J, we tested the data for outliers using the isoutlier function in MATLAB (defining outliers as values exceeding three standard deviations from the mean) and found no such cases.
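
      The three-standard-deviation criterion described above can be illustrated with a minimal sketch. It is written in Python rather than MATLAB and is not the code used for the analysis; the function name and the example values are hypothetical. To our understanding this rule corresponds to the "mean" method of isoutlier, not its default median-based rule.

```python
from statistics import mean, stdev

def three_sd_outliers(values, n_sd=3.0):
    """Return the values lying more than n_sd sample standard
    deviations away from the mean of `values`."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (N - 1 in the denominator)
    return [v for v in values if abs(v - m) > n_sd * s]

# Hypothetical example values, not data from the study.
evenly_spread = [1, 2, 3, 4, 5]
one_extreme = [0] * 19 + [100]

print(three_sd_outliers(evenly_spread))
print(three_sd_outliers(one_extreme))
```

      Note that with small samples a single extreme value inflates the standard deviation, so this rule is conservative; that is a property of the criterion itself, not of any particular implementation.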

      (3) Figure 2c shows that there is no difference in perceptual sensitivity between adolescents and adults, whereas the conclusion from Figure 4 is that adolescents exhibit lower discriminability in stimulus-related activity. Aren't these results contradictory?

      This is a nuanced point. The similar slopes of the psychometric functions (Fig. 2c), indicating comparable perceptual sensitivity, and the lower AUC observed in the ACx of adolescents (Fig. 4) do not necessarily contradict each other. These two measures capture related but distinct issues: psychometric slopes reflect behavioral output, which integrates both sensory encoding and processing downstream of ACx, while the AUC analysis reflects stimulus-related neural activity in ACx, which may still include decision-related components.

      Note that stimulus-related neural discriminability outside the context of the task is not different between adolescent and adult experts (Fig. 7h; p = 0.9374, Kruskal-Wallis test after Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript). This suggests that there are differences that emerge when we measure during behavior. Also note that behavior may rely on processing beyond ACx, and it is possible that downstream areas compensate for weaker cortical discriminability in adolescents, but this issue merits further investigation.

      (4) Why do you think that the discrimination in hard tasks decreases with learning (Figure 6D vs Figure 6F)?

      This is another nuanced point, and we can only speculate at this stage. While it may appear counterintuitive that single-neuron discriminability (AUC) for the hard task is reduced after learning (Fig. 6D vs. 6F), we believe this may reflect a shift in sensory coding in expert animals. In a recent study (Haimson et al., 2024; Science Advances), we found that learning alters single-neuron responses in the easy versus hard task in complex and distinct ways, which may account for this result. It is also possible that, in expert mice, top-down mechanisms such as feedback from higher-order areas act to suppress or stabilize sensory responses in auditory cortex, reducing the apparent stimulus selectivity of single neurons (e.g., AUC), even as behaviorally relevant information is preserved or enhanced at the population level.

      Reviewer #2 (Recommendations for the authors):

      This is very interesting work and I enjoyed reading the manuscript. See below for my comments, queries and suggestions, which I hope will help you improve an already very good paper.

      We thank the reviewer for the meticulous and thoughtful review.

      (1) Line 107: x-axis of panel 1e says 'pre-adolescent'.

      (2) Line 130: replace 'less' with 'fewer'.

      (3) Line 153: 'both learned and catch trials': I find the terminology here a bit confusing. I would typically understand a catch trial to be a trial without a stimulus but these 'catch' trials here have a stimulus. It's just that they are not rewarded/punished. What about calling them probe trials instead?

      We corrected the labelling (1), reworded to ‘fewer’ and ‘probe trials’ (2,3).

      (4) Line 210: The results of the optogenetics experiments are very interesting. In particular, because the effect is so dramatic and much bigger than what has been reported in the literature previously, I believe. Lick rates are dramatically reduced suggesting that the mice have pretty much stopped engaging in the task and the authors very rightly state that the 'execution' of the behavior is affected. I think it would be worth discussing the implications of these results more thoroughly, perhaps also with respect to some of the lesion work. Useful discussions on the topic can be found, for instance, in Otchy et al., 2015; Hong et al., 2018; O'Sullivan et al., 2019; Ceballo et al., 2019 and Lee et al., 2024. Are the mice unable to hear anything in laser trials and that is why they stopped licking? If they merely had trouble distinguishing them then we would perhaps expect the psychometric curves to approach chance level, i.e. to be flat near the line indicating a lick rate of 0.5. Could the dramatic decrease in lick rate be a motor issue? Can we rule out spillover of the virus to relevant motor areas? (I understand all of the 200nL of the virus were injected at a single location) Or are the effects much more dramatic than what has been reported previously simply because the GtACR2 is much more effective at silencing the auditory cortex? Could the effect be down to off-target effects, e.g. by removing excitation from a target area of the auditory cortex, rather than the disruption of cortical processing?

      We have now expanded the discussion in the manuscript to more thoroughly consider alternative interpretations of the strong behavioral effect observed during ACx silencing (L495–511). In particular, we acknowledge that the suppression of licking may reflect not only impaired sensory discrimination but also broader disruptions to arousal, motivation, or motor readiness. We also discuss the potential impact of viral spread, circuit-level off-target effects, and the potency of GtACR2 as possible contributors. We highlight the need for future work using more graded or temporally precise manipulations to resolve these issues.

      (5) Line 226: Reference 19 (Talwar and Gerstein 2001) is not particularly relevant as it is mostly concerned with microstimulation-induced A1 plasticity. There are, however, several other papers that should be cited (and potentially discussed) in this context. In particular, O'Sullivan et al., 2019 and Ceballo et al., 2019 as these papers investigate the effects of optogenetic silencing on frequency discrimination in head-fixed mice and find relatively modest impairments. Also relevant may be Kato et al., 2015 and Lee et al., 2024, although they look at sound detection rather than discrimination.

      We changed the references and pointed the reader to the new section of the Discussion.

      (6) Line 253: 'engaged [in] the task.

      (7) Figure 4: It appears that panel S4-1d is not referred to anywhere in the main text.

      Fixed.

      (8) Line 260: Might be useful to explain a bit more about the motivation behind focusing on L5/L6. Are there mostly theoretical considerations, i.e. would we expect the infragranular layers to be more relevant for understanding the difference in task performance? Or were there also practical considerations, e. g. did the data set contain mostly L5/L6 neurons because those were easier to record from given the angle at which the probe was inserted? If those kinds of practical considerations played a role, then there is nothing wrong with that but it would be helpful to explain them for the benefit of others who might try a similar recording approach.

      There were no deep theoretical considerations for targeting L5/6.  Our focus on layers 5/6 was driven by both methodological and biological considerations. Methodologically, our electrode penetrations were optimized to span multiple auditory cortical areas, and deeper layers provided greater mechanical stability for chronic recordings. Biologically, layers 5/6 contain the principal output neurons of the auditory cortex and are well-positioned to influence downstream decision-making circuits. We acknowledge the limitation of our recordings to these layers in the manuscript (L268; L463–467). See also comment D of reviewer 3.

      (9) Supplementary Table 2: The numbers in brackets indicate fractions rather than percentages.

      Fixed.

      (10) Figure S4-3: The figure legend implies that the number of neurons with significant discriminability for the hard stimulus and significant discriminability for choice was identical. (adolescent neurons = 368, mice = 5, recordings = 10; adult n = 544, mice = 6, recordings = 12 in both cases). Presumably, that is not actually the case and rather the result of a copy/paste operation gone wrong. Furthermore, I think it would be helpful to state the fractions of neurons that can discriminate between the stimuli and between the choices that the animal made in the main text.

      Thank you for spotting the mistake. We corrected the n’s and added the percentage of neurons that discriminate stimulus and choice in the main text and the figure legend.

      (11) Line 301: 'We used a ... decoder to quantify hit versus correct reject trial outcomes': I'm not sure I understand the rationale here. For the single unit analysis hit and false alarm trials were compared to assess their ability to discriminate the stimuli. FA and CR trials were compared to assess whether neurons can encode the choice of the mice. But the hit and CR trials which are contrasted here differ in terms of both stimulus and behavior/choice so what is supposed to be decoded here, what is supposed to be achieved with this analysis?

      Thank you for this important point. You're correct that comparing hit and CR trials captures differences in both stimulus and choice, that is, task-related differences. We chose this contrast for the population decoding analysis to achieve higher trial counts per session and similar numbers of trials per condition, both of which are necessary for the reliability of the analysis. While this approach does not isolate stimulus from choice encoding, it provides an overall measure of how well population activity distinguishes task-relevant outcomes. We explicitly acknowledge this issue in L313-314.

      (12) Line 332: What do you mean when you say the novice mice were 'otherwise fully engaged' in the task when they were not trained to do the task and are not doing the task?

      By "otherwise fully engaged," we mean that novice mice were actively participating in the task environment, similar to expert mice — they were motivated by thirst and licked the spout to obtain water. The key distinction is that novice mice had not yet learned the task rules and likely relied on trial-and-error strategies, rather than performing the task proficiently.

      (13) Line 334: 'regardless of trial outcome': Why is the trial outcome not taken into account? What is the rationale for this analysis? Furthermore, in novice mice a substantial proportion of the 'go' trials are misses. In expert mice, however, the proportion of 'miss trials' (and presumably false alarms) will by definition be much smaller. Given this, I find it difficult to interpret the results of this section.

      This approach was chosen to reliably decode a sufficient number of trials for each task difficulty (i.e. expert mice predominantly performed CRs on No-Go trials and novice mice often showed FAs). Utilizing all trial outcomes ensured that we had enough trials for each stimulus type to accurately estimate the AUCs. This approach avoids introducing biases due to uneven trial numbers across learning stages.

      (14) Line 378: 'differences between adolescents and adults arise primarily from age': Are there differences in any of the metrics shown in 7e-h between adolescents and adults?

      We confirm that differences between adolescents and adults are indeed present in some metrics but not others in Figure 7e–h. Specifically, while tuning bandwidth was similar in novice animals, it was significantly lower in adult experts (Fig. 7e; novice: p = 0.0882; expert: p = 0.0001, Kruskal-Wallis test after Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript). The population sparseness was similar in both novice and expert adolescent and adult neurons (Fig. 7f; novice: p = 0.2873; expert: p = 0.1017, Kruskal-Wallis test after Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript). The distance to the easy go stimulus was similar in novice animals, but lower in adult experts (Fig. 7g; novice: p = 0.7727; expert: p = 0.0001, Kruskal-Wallis test after Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript). The neuronal d-prime was similar in both novice and expert adolescent and adult neurons (Fig. 7h; novice: p = 0.7727; expert: p = 0.0001, Kruskal-Wallis test after Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript).

      (15) Line 475: '...well and beyond...': something seems to be missing in this statement.

      (16) Line 487: 'onto' should be 'into', I think.

      (17) Line 610 and 613: '3 seconds' ... '2.5 seconds': Was the response window 3s or 2.5s?

      (18) Line 638: 'set' should be 'setup', I believe.

      All the mistakes mentioned above were fixed. Thanks.

      (19) Line 643: 'Reward-reinforcement was delayed to 0.5 seconds after the tone offset': Presumably, if they completed their fifth lick later than 0.5 seconds after the tone, the reward delivery was also delayed?

      Apologies for the lack of clarity. In the head-fixed version, there was no lick threshold. Mice were reinforced after a single lick. If that lick occurred after the 0.5-second reinforcement delay following tone offset, the reward or punishment was delivered immediately upon licking.

      (20) Line 661: 'effect [of] ACx'.

      (21) Line 680: 'a base-station connected to chassis'. The sentence sounds incomplete.

      (22) Line 746: 'infliction', I believe, should say 'inflection'.

      (23) Line 769: 'non-auditory responsive units': Shouldn't that simply say 'non-responsive units'? The way it is currently written I understand it to mean that these units were responsive (to some other modality perhaps) but not to auditory stimulation.

      (24) Line 791: 'bins [of] 50ms'.

      (25) Line 811: 'all of' > 'of all'.

      (26) Line 814: Looks like the previous paragraph on single unit analysis was accidentally repeated under the wrong heading.

      (27) Line 817: 'encoded' should say 'calculated', I believe.

      All the mistakes mentioned above were fixed. Thanks.

      (28) Line 869: 'bandwidth of excited units': Not sure I understand how exactly the bandwidth, i.e. tuning width was measured.

      We acknowledge that our previous answer was unclear and expanded the Methods section. To calculate bandwidth, we identified significant tone-evoked responses by comparing activity during the tone window to baseline firing rates at 62 dB SPL (p < 0.05). For each neuron, we counted the number of contiguous frequencies with significant excitatory responses, subtracting isolated false positives to correct for chance. We then converted this count into an octave-based bandwidth by multiplying the number of frequency bins by the octave spacing between them (0.1661 octaves per step).
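
      As an illustration of the conversion from a count of frequency steps to octaves, a minimal Python sketch follows. It assumes that the per-frequency significance tests have already been run and that the longest contiguous run of significant frequencies is the quantity of interest; the function names and the example vector are hypothetical, and the chance correction for isolated false positives is not reproduced here.

```python
OCTAVES_PER_STEP = 0.1661  # octave spacing between adjacent tone frequencies (from the text)

def longest_run(significant):
    """Length of the longest contiguous run of True values."""
    best = current = 0
    for flag in significant:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best

def bandwidth_octaves(significant, octaves_per_step=OCTAVES_PER_STEP):
    """Convert the longest contiguous run of significant
    frequencies into a bandwidth expressed in octaves."""
    return longest_run(significant) * octaves_per_step

# Hypothetical significance vector across tone frequencies at one intensity:
# three contiguous significant bins plus one isolated bin.
example = [False, True, True, True, False, True]
print(bandwidth_octaves(example))  # 3 steps * 0.1661 octaves per step
```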

      (29) Line 871: 'population sparseness': Is that the fraction of tone frequencies that produced a significant response? I would have thought that this measure is very highly correlated to your measure of bandwidth, to the point of being redundant, but I may have misunderstood how one or the other is calculated. Furthermore, the Y label of Figure 7f says 'responsiveness' rather than sparseness and that would seem to be the more appropriate term because, unless I am misunderstanding this, a larger value here implies that the neuron responded to more frequencies, i.e. in a less sparse manner.

      We have clarified the use of the term "population sparseness" and updated the Y-axis label in Figure 7f to better reflect this measure. This metric reflects the fraction of tone–attenuation combinations that elicited a significant excitatory response across the entire population of neurons, not within individual units.

      While this measure is related to bandwidth, it captures a distinct property of the data. Bandwidth quantifies how broadly or narrowly a single neuron responds across frequencies at a fixed intensity, whereas population sparseness reflects how distributed responsiveness is across the population as a whole. Although the two measures are related, since broadly tuned neurons often contribute to lower population sparseness, they capture distinct aspects of neural coding and are not redundant.

      (30) Line 881: I think this line should refer to Figure 7h rather than 7g.

      Fixed.

      Reviewer #3 (Recommendations for the authors):

      (1) In the Educage, water was only available when animals engaged in the task; however, there is no mention of whether/how animal weight was monitored.

      In the Educage, mice had continuous access to water by voluntarily engaging in the task, which they could perform at any time. Although body weight was not directly monitored, water access was essentially ad libitum, and mice performed hundreds of trials per day, thereby ensuring sufficient daily intake. This approach allowed us to monitor hydration (ad libitum food is supplied in the home cage). The 24/7 setup, including automated monitoring of trial counts and water consumption, was reviewed and approved by our institutional animal care and use committee (IACUC).

      (2) In Figure 2B-C and Figure 2E, the y-axis reads "lick rate". At first glance, I took this to mean "the frequency of licking" (i.e. an animal typically licks at a rate of 5 Hz). However, what the authors actually are plotting here is the proportion of trials on which an animal elicited >= 5 licks during the response window (i.e. the proportion of "yes" responses). I recommend editing the y-axis and the text for clarity.

      We replaced the y-label and adjusted the figure legend (Fig. 2).

      (3) I didn't see any examples of raw (filtered) voltage traces. It would be worth including some to demonstrate the quality of the data.

      We have added an example of a filtered voltage trace aligned to tone onset in Fig. S4-1a to illustrate data quality. In addition, all raw and processed voltage traces, along with relevant analysis code, are available through our GitHub repository and the corresponding dataset on Zenodo.

      (4) The description of the calculation of bias (C) in the methods section (lines 749-750) is incorrect. The correct formula is C = -0.5 * [z(hit rate) + z(fa rate)]. I believe this is the formula that the authors used, as they report negative C values. Please clarify or correct.

      Thanks for spotting this. It is now corrected.
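      The formula quoted by the reviewer can be checked numerically. The following is a minimal Python sketch, not the authors' analysis code: the function name `criterion` and the example rates are illustrative assumptions, and rates of exactly 0 or 1 would need a correction before the z-transform is applied.

```python
from statistics import NormalDist


def criterion(hit_rate: float, fa_rate: float) -> float:
    """Signal-detection bias: C = -0.5 * [z(hit rate) + z(false-alarm rate)]."""
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal distribution
    return -0.5 * (z(hit_rate) + z(fa_rate))


# Hypothetical example: an animal that licks readily (many hits, many false alarms).
c_liberal = criterion(hit_rate=0.9, fa_rate=0.3)
print(f"C = {c_liberal:.3f}")  # a negative C indicates a bias toward "yes" responses
```

      With this sign convention, a negative C corresponds to a liberal response bias, which is consistent with the reviewer's remark that the reported C values are negative.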

      (5) The authors use the terms 'naïve' and 'novice' interchangeably. I suggest sticking with one term to avoid potential confusion.

      (6) Multiple instances: "less trials/day" should be "fewer trials/day"

      (7) Supplementary Table 2: The values reported are proportions, not percentages. Please correct.

      (8) Line 270: Table 2 does not show the number of neurons in the dataset categorized by region. Perhaps the authors meant Supplementary Table 2?

      Fixed. Thank you for pointing these mistakes out.

      (9) Figure 5C: the data from the hard task are entirely obscured by the data from the easy task. I recommend splitting it into two different plots.

      We agree and split the decoding of the easy and the hard task into two graphs (left: easy task; right: hard task). Thank you!

      (10) How many mice contributed to each analyzed data set? Could the authors provide a breakdown in a table somewhere of how many neurons were recorded in each mouse and which ones were included in which analyses?

      We added an overview of the analyzed datasets in supplementary Table 7. Please note that the number of mice and neurons used in each analysis is also reported in the main text and legends. Importantly, all primary analyses were conducted using LME models, which explicitly account for hierarchical data structure and inter-mouse variability, thereby addressing potential concerns about data imbalance or bias.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors): 

      Overall, the manuscript could be clearer and more beneficial to the readers with the following suggested revisions:  

      (1) The abstract should include information on the comparative performance of 89Zr, 64Cu and 18F labeled nanobodies, especially noting the challenges with DFO-89Zr and NOTA-64Cu.

      (2) The abstract should explicitly note the types of transplants assessed and the specific PET findings.

      (3) The abstract should note the negative results in terms of brain PET findings. 

      We thank reviewer 1 for these three suggestions. We have now included this information in the abstract.

      (4)  Based on the data shown in Fig. 1 and Table 1, it seems that the nanobodies bind to quite a few proteins other than TfR. This should be discussed frankly as a limitation. 

      The presence of multiple other bands and proteins identified by LC/MS in Figure 1 is typical for immunoprecipitation experiments, as performed under the conditions used: all proteins other than TfR that are identified in Table 1 are abundant cytoplasmic (cytoskeletal) and/or nuclear proteins.  More rigorous washing would perhaps have removed some of these contaminants at the risk of losing some of the specific signal as well. We have added a comment to this effect.  In an in vivo setting, this would be of minor concern, as these proteins would be inaccessible to our nanobodies. In fact, when VHH123 radioconjugates are injected in huTfr+/+ mice (or VHH188 in C57BL/6), we observe no specific signal – which supports this conclusion. 

      We therefore state: “We show that both V<sub>H</sub>Hs bind only to the appropriate TfR, with no obvious cross-reactivity to other surface-expressed proteins by immunoblot, LC/MSMS analysis of immunoprecipitates, SDS-PAGE of <sup>35</sup>S-labelled proteins and flow cytometry (Fig 1; Table 1).” We have added some clarification to this statement, and the full LC/MSMS data tables are now included in the supplemental materials as supplementary Table 1. We have also included subcellular localization information for each protein identified through LC/MSMS in Table 1.

      (5)  Why did the authors use DFO, which is well known to leak Zr, rather than the current standard for 89Zr PET, DFO* (DFO-star)? 

      We used DFO rather than DFO-star for two reasons: 1) we had already conducted and published numerous other studies using DFO-conjugated nanobodies and had not observed any release of <sup>89</sup>Zr, and 2) commercially sourced, click-chemistry-enabled DFO-star (such as DFO*-DBCO) was not available at the time of the study.

      (6) Figure 2B appears to show complex structures, more complex than just GGG-DFO-azide and GGG-NOTA-azide. This should be explained in detail.

      We have added two supplemental figures and methods that recapitulate how we generated what we have termed as GGG-DFO-Azide and GGG-NOTA-Azide. We have updated the legend of Figure 2B. 

      (7) Why is there a double band in Suppl. Fig 9 for VHH123-NOTA-Azide? 

      Under optimal conditions, sortase A-mediated transpeptidation is efficient, resulting in the formation of a peptide bond between the C-terminally LPETG-tagged protein and the GGG-probe. However, extended reaction times or suboptimal concentrations of modified GGG-probes (which are often in limited supply) in the reaction mixture allow hydrolysis of the sortase A-LPET-protein intermediate. The hydrolysis product can no longer participate in a sortase A reaction. This explains the doublet in the reaction used to generate VHH123-NOTA-N<sub>3</sub>: the upper band is VHH123-NOTA-N<sub>3</sub> and the lower band is the hydrolysis product, VHH123-LPET. The hydrolysis product is unable to react with PEG<sub>20kDa</sub>-DBCO (hence the lower band that appears at the same position of migration in the next lane on the gel). We also noticed that an adjacent lane was mislabelled as ‘VHH188-NOTA-PEG<sub>20kDa</sub>’ when in fact it was ‘VHH123-NOTA-PEG<sub>20kDa</sub>’. This has been corrected.

      The hydrolysis product, VHH123-LPET, has a short circulatory half-life and lacks both the PEG moiety and the chelator. It therefore cannot chelate <sup>64</sup>Cu, and its presence should not interfere with PET imaging. Since all animals were injected with the same measured dose of <sup>64</sup>Cu-labeled conjugate, the presence of an unlabeled TfR-binding competitor in the form of VHH123-LPET, at a molar ratio well below 1:1 relative to the labelled nanobody, would be of no consequence.

      (8) More details should be provided about the tetrazine-TCO click chemistry for 18F labeling. 

      We have added supplementary methods and figures that detail how <sup>18</sup>F-TCO was generated. For the principle of TCO-tetrazine click-chemistry, a brief description was added in the text, as well as a reference to a review on the subject.

      (9) For the data shown in Figure 3H, the authors should state whether the brain tissues were capillary depleted, and if so, how this was performed and how complete the procedure was. 

      No capillary depletion of the brain tissues was performed, as this was challenging to perform in compliance with the radiosafety protocols in place at our institution. We have updated the legend of figure 3H and methods to include this important detail. Whole blood gamma-counting did not show any obvious difference in activity across the 4 groups in figure 3G (same mice as in figure 3H), which argues against the interpretation that activity differences in the brain (figure 3H) are solely attributable to residual activity from blood in the capillaries.

      (10) The authors should experimentally test the hypotheses that the PEG adduct reduced BBB transcytosis. 

      Reviewer 1 is correct to point out that we have not tested un-PEGylated conjugates of <sup>64</sup>Cu and <sup>89</sup>Zr with the anti-TfR nanobodies and we currently do not have the means to perform additional experiments. However, the <sup>18</sup>F conjugates were not PEGylated, and these also fail to show any detectable signal in the CNS by PET/CT (see figure 4A). PEGylation alone cannot be the sole factor that limits transcytosis across the BBB.

      (11) It was interesting to note that the Cu appears to dissociate from the NOTA chelator. The authors should provide more information about the kinetics of this process.  

      We have not tested the kinetics of dissociation between <sup>64</sup>Cu and the NOTA conjugates in vitro, like we have done for <sup>89</sup>Zr and DFO (supplemental figure 2), because previous work (see references 35 and 36 by Dearling JL and Mirick GR and colleagues) has shown that NOTA and other copper chelators tend to release free copper radioisotopes in the liver, a commonly reported artifact. We have also included a new set of images that show the biodistribution of VHH123-NOTA-<sup>64</sup>Cu in huTfR+/+ mice, where we still observe a substantial signal in the liver, indicating release of <sup>64</sup>Cu from NOTA, in the absence of the anti-TfR VHH binding to its target. This was clearly not seen using the DFO-<sup>89</sup>Zr conjugates.  Binding of the VHH to TfR, followed by internalization, appears to be required for the release of <sup>89</sup>Zr from DFO, prompting us to investigate this phenomenon further.

      (12) The authors should increase the sample size, and test two different radiolabels for the transplant imaging results (Figs. 5 and 6), since these seem to be the ones they feel are the most important, based on the title and abstract. 

      We agree with reviewer 1 that more repeats would increase the significance of our findings, but we unfortunately do not have the means of performing additional experiments at this time (the lab at Boston Children’s Hospital has closed as Dr. Ploegh has retired). We believe that the results are compelling and will be of use to the in vivo imaging community.

      (13) Fig. 6G appears to show a false positive result for the kidney imaging. Is this real, or an artifact of small sample size?

      We agree with reviewer 1 that the kidney signals in figure 6 are somewhat puzzling. The difference between the tumor-bearing mice that received VHH123 and VHHEnh conjugates is not significant – with the obvious caveat that the VHHEnh group is comprised of only 2 mice, so sample size may well be a factor here. If we compare the signals of the VHH123 conjugate in tumor-bearing mice vs. tumor-free mice, the VHH123 conjugates would have cleared much faster in the tumor-free mice over 24 hours (since no epitope is present for VHH123 to bind to), thus weakening the kidney signal observed after 24 hours. The same would be true for all the other tissues – except for the liver (where free <sup>64</sup>Cu that leaks from NOTA accumulates). VHHEnh conjugates in tumor-bearing mice show a significant kidney signal – although no VHH123 target epitope is present in these mice. B16.F10 tumors at 4 weeks of growth tend to be necrotic and can passively retain any radiotracer – this generates the weak lung signal visible in Fig 6D – thus the radiotracer would clear at a slower rate than VHH123 conjugates in tumor-free mice giving a higher kidney signal at 24 hours. 

      No tumors were found in the kidneys post-necropsy. We attribute the differences in kidney signals to different kinetics of clearance of the radioconjugates. We have added this explanation to the results and discussion.

      (14) Are the results shown in Fig. 7 generalizable? The authors should test the constructs with 18F labeling and without the PEG adduct.

      We agree with reviewer 1 that it would be very interesting to confirm these observations using 18F radioconjugates. The results should be generalizable, as the difference between signals can only be attributed to the presence of the recognized epitope in the placenta, which is in fact the only variable that differs between the two groups. At the time of conducting the study, we had not planned to perform the same experiments with 18F radioconjugates, partly because synthesis of 18F radioconjugates is more challenging (and costly) than the production of 89Zr-labeled nanobodies.

      (15) The authors should discuss the relative safety of 89Zr and 64Cu. It is likely to be quite a bit worse than for 18F, since the 89Zr and 64Cu have longer half-lives, dissociate from their chelators, and lodge in off-target tissues. An alternative interpretation of the authors' data could be that 89Zr and 64Cu labeling in this context are unsuitable for the stated purposes of PET imaging. In this case, the key experiments shown in Figs. 5-7 should be repeated with the 18F labeled nanobody constructs. 

      Our vision was to offer a tool to the scientific community interested in in vivo tracking of cells in different preclinical disease models. The question of safety regarding 89Zr and 64Cu for clinical use was therefore not a factor we considered at the time. However, we have now included a section in the discussion about the potential safety issue of <sup>89</sup>Zr release and bone accumulation in clinical settings, especially for radioconjugates that target an internalizing surface protein.

      (16) The authors should remark on the somewhat surprisingly modest amount of BBB transcytosis in the discussion. What were the affinities of the nanobodies?

      The affinities and binding kinetics of both nanobodies were described in a separate work that is referenced in the introduction (references 21 and 22 by Wouters Y and colleagues). Through other methods that rely on a highly sensitive bio-assay, it was shown that both VHH123 and VHH188 are capable of transcytosis: both nanobodies coupled to a neurotensin peptide induced a drop in temperature after i.v. injection in matching mouse strains (VHH123 in C57BL/6 and VHH188 in huTfr +/+). The lack of any compelling CNS signal by PET/CT is discussed in the manuscript.

      (17) More details of the methods should be provided in the supplement. 

      a.  What was the source of the penta-mutant Sortase A-His6? 

      Sortase A pentamutant is produced in-house, by cytoplasmic expression in E.coli (BL21 strain), using a plasmid vector encoding a truncated and mutated version of Sortase A. References were added, as well as the Addgene repository number (51140).

      b.  What was the yield of the sortase reactions? 

      For small proteins, such as nanobodies/ V<sub>H</sub>Hs, we find that the yield of a sortase A reaction typically is > 75%. This is what we observed for all our conjugations. The methods section was updated to include this information.

      c.  What was the source of the GGG-Azide-DFO and GGG-Azide NOTA? Based on the structures shown in Fig. 2, these appear to be more complex than was noted in the text.

      We have now detailed the synthesis of GGG-DFO-Azide and GGG-NOTA-Azide in the supplementary methods.

      d.  More details about the source and purity of the tetrazine and TCO labeling reagents should be provided. 

      We have included information on the synthesis of GGG-tetrazine in the supplementary methods. Concerning the synthesis of <sup>18</sup>F-TCO, we have also included a detailed description of the compound in supplementary methods. The reaction between GGG-tetrazine and <sup>18</sup>F-TCO is now further detailed in the manuscript. 

      e.  The TCO-agarose slurry purification should be explained in more detail, and the results should be shown. 

      We have included a detailed procedure of how the TCO-agarose slurry purification was performed in the methods section. We had already included the Radio-Thin Layer Chromatography QC data of the final VHH123-18F and VHH188-18F purifications in the supplementary figures; these data were obtained immediately after TCO-agarose slurry purification. The yields of the TCO-agarose slurry purification, in terms of the activity of each collected fraction, are now reported in the methods section.

      f.   The CT parameters should be provided.  

      We have now added more information about the PET/CT imaging procedure in the methods section of the manuscript.

      Reviewer #2 (Recommendations for the authors): 

      Authors should discuss the possibility of the TfR as a rejection antigen. Murine TfR is foreign for hTfR+/+ mice and vice versa. 

      We had not discussed this possibility, as we believe the risk of rejection of huTfR+ cells in moTfR+ mice (or vice versa) is negligible. The cells and mice are of the same genetic background, save for the coding region of the ectodomain of the TfR (spanning amino acids ~194 to 390 of the full-length TfR, which is 763 AA). The pairwise identity of the human and mouse TfR ectodomains is 73% after alignment of both AA sequences using Clustal Omega. We agree that we cannot formally exclude the possibility of an immune rejection, and have now mentioned this possibility in the discussion.

      Is there any clinical use of the anti-human TfR receptor PET tracer? 

      We do not currently envision an application for the anti-human TfR VHH in PET/CT in a clinical setting.  

      Why is the in vivo anti-mouse TfR uptake level in C57BL/6 mice consistently higher than the anti-human TfR receptor PET tracer in hTfR+/+ mice? Is this due to differences in characteristics of the VHH's (e.g. affinity, internalization properties), or rather due to a different biological behavior of the hTfR-transgene (e.g. reduced internalization properties)?

      We indeed observed that VHH123 uptake and binding appear to be more robust than those of VHH188 to their respective targets. Moreover, at later times post-injection (> 48h), VHH188 appears to display a very low reactivity to C57BL/6 (moTfR+) cells (see Figure 3B). We attribute this to the respective affinities and specificities of both VHHs. We have not investigated the VHH binding kinetics of the mouse versus human ectodomain TfR proteins in vitro. Internalization should be mildly different at best, as <sup>89</sup>Zr release from DFO occurs with both VHHs in both C57BL/6 and huTfR +/+ mouse models (when injected in a matched configuration). The huTfR +/+ mice rely exclusively on the huTfr for their iron supply. They are healthy with no obvious pathological features. The behavior of the huTfr is therefore presumably similar, if not identical, to that of the mouse Tfr, bearing in mind that the huTfr and the mouse Tfr are both reliant on mouse Tf as their ligand.

      The anti-TfR VHHs were initially developed as a carrier for BBB-transport of VHH-based drug conjugates (previous publications). The data shown here reduces enthusiasm towards this application. Uptake in the brain is several log-factors lower than physiological uptake elsewhere. Potential consequences of off-brain uptake on potential toxicity of VHH-based drug-conjugates could be better emphasized in the discussion. 

      We did not observe a significant presence of the anti-TfR VHHs in the CNS by PET/CT. We have addressed several possibilities: longer circulation times post-injection may favor transcytosis of the VHHs through the BBB. However, because transcytosis requires endocytosis, <sup>89</sup>Zr may be released from its chelating moiety at this step. The only radiotracers with a covalent bond between the radioisotope and the VHHs in our work are the <sup>18</sup>F VHHs, but the signal acquisition window may have been too short to observe transcytosis and accumulation in the CNS. Another possible caveat is that PEGylation of the radiotracers may be an obstacle to transcytosis. The circulatory half-life of unPEGylated VHHs is too short to allow adequate visualization 24 hours post-injection, as the conjugates rapidly clear from the circulation (t ½ = 30 minutes or less). We have updated the discussion to address these points.
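      The half-life argument above can be made concrete with a short calculation. The sketch below assumes simple first-order (single-exponential) clearance, which is a simplification introduced here and not a model stated in the response; the function name is hypothetical.

```python
def fraction_remaining(elapsed_min: float, half_life_min: float) -> float:
    """Fraction of tracer left in circulation after first-order clearance."""
    return 0.5 ** (elapsed_min / half_life_min)


# Upper-bound half-life quoted in the response (30 min), evaluated 24 h after injection.
frac_24h = fraction_remaining(elapsed_min=24 * 60, half_life_min=30)
print(f"fraction remaining at 24 h: {frac_24h:.2e}")  # 48 half-lives have elapsed
```

      Under this assumption, 24 hours corresponds to 48 half-lives, so essentially no unconjugated tracer would remain in circulation at that time point.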

      In several locations (I have counted 5) a space is missing between words, please double-check. 

      We carefully checked the manuscript to remove any remaining typos.

      It is unclear to me why for the melanoma-tracking experiment the tracer is switched from the 89Zr-labeled variant to the 64Cu-labeled variant. 

      The decision to switch to the <sup>64</sup>Cu labeled VHHs for the melanoma experiment stemmed from a wish to 1) evaluate the performance of the <sup>64</sup>Cu-radioconjugates in detecting transplanted cells as we had done with the <sup>89</sup>Zr conjugates and 2) assess how the (non-specific) liver signal seen with <sup>64</sup>Cu contrasts with a specific signal.  

      typo in discussion: C57BL/6 instead of C57B/6         

      We have corrected the typo.

      It is unclear to me why in FIG1B cells are labeled with 35S. Is it correct that the signals seen are due to staining membranes with anti-TfR mAbs? Or is this an autoradiography of the gel? 

      In Figure 1B, cells were labeled with 35S-Met/Cys, while the images shown are indeed those of Western Blots, using an anti-TfR monoclonal antibody as the primary antibody to detect human and mouse TfR retrieved by the anti-TfR VHHs. Autoradiography using the same lysates showed the presence of contaminants in the VHH eluates, as commonly seen in immunoprecipitates from metabolically labeled cells (as distinct from IP/Westerns). For this reason, we performed a Western Blot on the same samples to confirm TfR pull-down. As written in the results section, we also performed LC/MSMS analysis of the immunoprecipitated proteins to better characterize contaminating proteins (Table 1). To clarify this, we have now added the autoradiographs in supplementary data (supplementary figure 15) and added a reference to these observations in the results.

      ROI quantifications in all figures: these should be expressed as %ID/cc instead of %ID/g. Ex vivo tissue counts should be in %ID/g instead of cpm. 

      We have converted all ROI quantification figures to %ID/cc based on the assumption that 1 mL (1 cc) = 1 g. For ex vivo tissue counts, %ID/g has been calculated based on injected dose (except for figure 3G, where comparisons in %ID/g are not possible due to the uncertain nature of bone marrow and whole blood). All figures have now been updated.
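      The two units discussed here can be illustrated as follows. This is a minimal sketch with hypothetical function names and made-up example values; it assumes that tissue and injected activities are expressed in the same units and decay-corrected to the same time point, and it uses the 1 g/mL density assumption stated in the response.

```python
def percent_id_per_gram(tissue_activity: float, injected_activity: float,
                        tissue_mass_g: float) -> float:
    """%ID/g for an ex vivo tissue sample (activities in the same units)."""
    return 100.0 * tissue_activity / injected_activity / tissue_mass_g


def percent_id_per_cc(percent_id_g: float, density_g_per_ml: float = 1.0) -> float:
    """Convert %ID/g to %ID/cc; the default assumes 1 mL (1 cc) of tissue weighs 1 g."""
    return percent_id_g * density_g_per_ml


# Made-up example: 5 units of activity in a 0.5 g sample after injecting 100 units.
value_g = percent_id_per_gram(tissue_activity=5.0, injected_activity=100.0, tissue_mass_g=0.5)
print(value_g, percent_id_per_cc(value_g))
```

      With a density of 1 g/mL the two quantities are numerically identical, so the conversion changes only the unit label.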

      Fig4: it would be good to also see respective mouse controls (C57BL6 vs hTfR+/+) for the 64Cu- and 18F-labeled VHH123 tracers. Each radiolabeling methodology changes in vivo biodistribution and specificity, which can be better assessed by using appropriate controls. 

      We had performed these controls, but they were not included in the manuscript as they were deemed redundant with the results of Figure 3. We have now separated Figure 4 into two panels (Figure 4A and 4B), with Figure 4A showing the 1h timepoint post-injection of VHH123 radiotracers in C57BL/6 vs huTfr<sup>+/+</sup> and Figure 4B showing the 24h timepoint in the same configuration. ROI analyses were also done on the huTfR<sup>+/+</sup> controls and were included in Figure 4C as well.

      Fig7: is it correct that mouse imaging is performed at 24h p.i. and dissected embryo's at 72h p.i.? Why are there 2 days between each procedure of the same animals? 

      We acquired images at different timepoints, specifically at 1h, 24h, 48h and 72 hours after radio-tracer injection. As 72 h was the last timepoint, the mice were sacrificed the same day and embryo dissection performed thereafter, at 72 hours post radiotracer injection. We decided to show the 24h timepoint images as they were the most representative of the series, offering the best signal-to-noise ratio. The signal pattern did not change over the course from 24h to 72h. We have now added those timepoints in the supplementary data.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary:

      The authors use analysis of existing data, mathematical modelling, and new experiments, to explore the relationship between protein expression noise, translation efficiency, and transcriptional bursting.

      Strengths:

      The analysis of the old data and the new data presented is interesting and mostly convincing.

      Thank you for the constructive suggestions and comments. We address the individual comments below. 

      Weaknesses:

      (1) My main concern is the analysis presented in Figure 4. This is the core of mechanistic analysis that suggests ribosomal demand can explain the observed phenomenon. I am both confused by the assumptions used here and the details of the mathematical modelling used in this section. Firstly, the authors' assumption that the fluctuations of a single gene mRNA levels will significantly affect ribosome demand is puzzling. On average the total level of mRNA across all genes would stay very constant and therefore there are no big fluctuations in the ribosome demand due to the burstiness of transcription of individual genes. Secondly, the analysis uses 19 mathematical functions that are in Table S1, but there are not really enough details for me to understand how this is used, are these included in a TASEP simulation? In what way are mRNA-prev and mRNA-curr used? What is the mechanistic meaning of different terms and exponents? As the authors use this analysis to argue ribosomal demand is at play, I would like this section to be very much clarified.

      Thank you for raising two important points. Regarding the first point, we agree that the overall ribosome demand in a cell will remain mostly the same even with fluctuations in mRNA levels of a few genes. However, what we refer to in the manuscript is the demand for ribosomes for translating mRNA molecules of a single gene. This demand will vary with the changes in the number of mRNA molecules of that gene. When the mRNA copy number of the gene is low, the number of ribosomes required for translation is low. At a subsequent timepoint when the mRNA number of the same gene goes up rapidly due to transcriptional bursting, the number of ribosomes required would also increase rapidly. This would increase ribosome demand. The process of allocation of ribosomes for translation of these mRNA molecules will vary between cells, and this process can lead to increased expression variation of that gene among cells. We have now rephrased the section between the lines 321 and 331 to clarify this point.

      Regarding the second point, each of the 19 mathematical functions was individually tested in the TASEP model and stochastic simulation. The parameters ‘mRNA-curr’ and ‘mRNA-prev’ are the mRNA copy numbers at the present time point and the previous time point in the stochastic simulations, respectively. These numbers were calculated from the rate of production of mRNA, which is influenced by the transcriptional burst frequency and the burst size, as well as the rate of mRNA removal. We have now incorporated more details about the modelling part, along with an explanation of the parameters and terms, in the revised manuscript (lines 390 to 411; lines 795 to 807).

      (2) Overall, the paper is very long and as there are analytical expressions for protein noise (e.g. see Paulsson Nature 2004), some of these results do not need to rely on Gillespie simulations. Protein CV (noise) can be written as three terms representing protein noise contribution, mRNA expression contribution, and bursty transcription contribution. For example, the results in panel 1 are fully consistent with the parameter regime in which protein noise is negligible compared to transcriptional noise.

      Thank you for referring to the paper on analytical expressions for protein noise. We introduced translational bursting and ribosome demand in our model, and these are linked to stochastic fluctuations in mRNA and ribosome numbers. In addition, our model couples transcriptional bursting with translational bursting and ribosome demand. Since these processes are all stochastic in nature, we felt that the stochastic simulation would be able to better capture the fluctuations in mRNA and protein expression levels originating from these processes. For consistency, we used stochastic simulations throughout, even when the coupling between transcription and translation was not considered.

      Reviewer #1 (Recommendations for the authors):  

      (1) Figure 1B shows noise as Distance to Median (DM) that can be positive or negative. It is therefore misleading that the authors say there is a 10-fold increase in noise (this would be relevant if the quantity was strictly positive). How is the 10-fold estimated? Similar comments apply to Figure 1F and the estimated 37-fold. I also wonder if the datasets combined from different studies are necessarily compatible.

      We have now changed the statements and mentioned the actual noise values for different classes of genes rather than the fold-changes (lines 111-113 and 143-145). We agree that the measurements for mRNA expression levels, protein synthesis rates and protein noise were obtained from experiments done by different research labs, and this could introduce more variation in the data. However, the experimental variations are likely to be random and are unlikely to bias any specific class of genes (in Fig. 1B and Fig. 1F) more than others.

      (2)   How Figure 1D has been generated seems confusing, the authors state this is based on the Gillespie algorithm, but in panel 1C and also in the methods, they are writing ODEs and Equations 3 and 4 stating the Euler method for the solution of ODEs. Also, I am concerned if this has been done at steady-state. The protein noise for the two-state model can be analytically obtained, and instead of simulations, the authors could have just used the expression. Also, Figure 1D shows CV while the corresponding data Figure 1B is showing mean adjusted DM. So, I am not sure if the comparison is valid. I am also very confused about the fact that the authors show CV does not depend on the mean expression of proteins and mRNA. Analytical solutions suggest that an inverse relationship always exists between CV and mean, and this has also been experimentally observed (see for example Newman et al 2006).

      We used the Gillespie algorithm for stochastic simulations and identified the time points when an event (for example, switching to ON or OFF states during transcriptional bursting) occurred. If an event occurred at a time point, the rates of the reactions were guided by equations 3 and 4, as the rates of reactions were dependent on the number of mRNA (or protein) molecules present, production rates and removal rates.

      For all published datasets where we had measurements from many genes/promoters, we used the measures of adjusted noise (for mRNA noise) and Distance-to-median (DM, for protein noise). These measures of noise are corrected for the mean-dependence of expression noise (Newman et al., 2006). For simulations, which we performed for a single gene, and for experiments that we performed on a limited number of promoters, we used the coefficient of variation (CV) to quantify noise, as calculation of adjusted noise or DM was not possible for a single gene.
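      To illustrate the kind of simulation described here, the following is a minimal Python sketch of a Gillespie simulation for a generic two-state (ON/OFF) promoter with transcription and translation. It is not the authors' code: it omits the TASEP and ribosome-demand components, the mass-action propensities are an assumption rather than a reproduction of equations 3 and 4, and all names and parameter values are hypothetical.

```python
import random


def simulate_protein_cv(k_on, k_off, k_tx, k_deg_m, k_tl, k_deg_p,
                        t_end, t_burn=0.0, seed=0):
    """Return (time-averaged protein mean, protein CV) for a two-state gene."""
    rng = random.Random(seed)
    t, on, m, p = 0.0, 0, 0, 0          # time, promoter state, mRNA and protein counts
    w = s1 = s2 = 0.0                   # observed time, time-weighted sums of p and p^2
    while True:
        rates = [
            k_on * (1 - on),            # promoter OFF -> ON
            k_off * on,                 # promoter ON -> OFF
            k_tx * on,                  # transcription (ON state only)
            k_deg_m * m,                # mRNA removal
            k_tl * m,                   # translation
            k_deg_p * p,                # protein removal
        ]
        tau = rng.expovariate(sum(rates))          # waiting time to the next event
        t_next = min(t + tau, t_end)
        dwell = max(0.0, t_next - max(t, t_burn))  # time in this state after burn-in
        w += dwell
        s1 += p * dwell
        s2 += p * p * dwell
        if t + tau >= t_end:
            break
        t = t_next
        event = rng.choices(range(6), weights=rates)[0]  # pick event by propensity
        if event == 0:
            on = 1
        elif event == 1:
            on = 0
        elif event == 2:
            m += 1
        elif event == 3:
            m -= 1
        elif event == 4:
            p += 1
        else:
            p -= 1
    mean = s1 / w
    var = max(s2 / w - mean ** 2, 0.0)
    return mean, var ** 0.5 / mean


# Hypothetical parameters in arbitrary time units; burn-in discards the initial transient.
mean_p, cv_p = simulate_protein_cv(k_on=0.1, k_off=0.5, k_tx=5.0, k_deg_m=0.1,
                                   k_tl=1.0, k_deg_p=0.01, t_end=2000.0, t_burn=500.0)
print(f"mean protein = {mean_p:.1f}, CV = {cv_p:.3f}")
```

      The statistics are time-weighted over a single trajectory after a burn-in period, which is one simple way to approximate steady-state noise; averaging over many independent trajectories would be an equally reasonable design choice.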

      The work of Newman et al. (2006) measures noise values of different genes with different transcriptional burst characteristics and different mRNA and protein removal rates. We also see similar results in our simulations (Fig. 1E), where as we increase the mean expression by changing the transcriptional burst frequency, the protein noise goes down.     

      (3) Estimating parameters of gene expression using reference 44 ignores the effect of variability in capture efficiency and cell size. In a recent paper, Tang et al Bioinformatics 39 (7), btad395 2023 addressed this issue.

      Thank you for referring to the work of Tang et al. (2023). We note that cell size and capture efficiency have a small effect on the burst frequency (Kon) but a more pronounced effect on burst size (Tang et al., 2023). In our analysis, we considered only burst frequency, and even with likely small inaccuracies in our estimation of Kon, we can capture an interesting association of burst frequency with noise trends.

      (4) In the methods "αp = 0.007 per mRNA molecule per unit time", I believe it should be per protein molecule per unit time.

      Corrected.

      (5)  Figure 3 uses TASEP modelling but the details of this modelling are not described well.

      We have now expanded the description of the modelling approach in the revised manuscript (lines 391-412; lines 693-776 and lines 797-809). In addition, we have also added more details in the figure captions. 

      (6) Another overall issue is that when the authors talk about changes in burst frequency or changes in translation efficiency, it is not always clear, is this done while keeping all the other parameters constant therefore changing mean expressions, or is this done by keeping the mean expressions constant?

      To test for the association between mean protein expression and protein noise, we have varied the mean expression by changing the translation initiation rate (TLinit) for the most part of the manuscript while keeping other parameters constant. In figure 5, where we decoupled TLinit from ribosome traversal rate (V), we changed the mean protein expression by changing the ribosome traversal rate while keeping other parameters constant. We have now mentioned this in the manuscript. 

      (7)   I believe Figures 5 and 6 present the same data in different ways, I wonder if these can be combined or if some aspect of the data in Figure 5 could go to supplementary. Also, the statistical tests in Figure 5E and F are not clear what they are testing.

      We have now moved figures 5E and 5F to the supplement (Fig. S20). We have also added details of the statistical test in the figure caption. 

      Reviewer #2 (Public review): 

      This work by Pal et al. studied the relationship between protein expression noise and translational efficiency. They proposed a model based on ribosome demand to explain the positive correlation between them, which is new as far as I realize. Nevertheless, I found the evidence of the main idea that it is the ribosome demand generating this correlation is weak. Below are my major and minor comments.

      Thank you for your helpful suggestions and comments. We note that the direct experimental support required for the ribosome demand model would need experimental setups that are beyond the currently available methodologies. We address the individual comments below. 

      Major comments: 

      (1) Besides a hypothetical numerical model, I did not find any direct experimental evidence supporting the ribosome demand model. Therefore, I think the main conclusions of this work are a bit overstated.

      Direct experimental evidence for the hypothesis would require generating ribosome occupancy maps of mRNA molecules of specific genes at the level of single cells and at time intervals that closely match the burst frequency of the genes. This is beyond the currently available methodologies. However, there is other evidence that supports our model. For example, earlier work in cell-free systems has shown that constraining cellular resources required for transcription or translation can increase expression heterogeneity (Caveney et al., 2017). In addition, the ribosome demand model made two predictions, both of which could be validated through modelling as well as by our experiments. 

      To further investigate whether removing ribosome demand from our model could eliminate the positive mean-noise correlation for a gene, we have now tested two additional sets of models where we decoupled the translation initiation rate (TLinit) from the ribosome traversal speed (V). In the first model, we changed the mean protein expression by changing the translation initiation rate while keeping the ribosome traversal speed constant. Thus, in this scenario, ribosome demand varied according to the variation in the translation initiation rate. As expected, the positive correlation between mean expression and protein noise was maintained in this condition (Fig. 5B). In the second model, we changed the mean expression by changing the ribosome traversal speed while keeping the translation initiation rate (and therefore, the ribosome demand) constant. In this situation, the relationship between mean expression and protein noise turned negative (Fig. 5B and fig. S16). These results further indicated that ribosome demand was indeed driving the positive relationship between mean expression and protein noise. 

      (2) I found that the enhancement of protein noise due to high translational efficiency is quite mild, as shown in Figure 6A-B, which makes the biological significance of this effect unclear.

      We agree with the reviewer’s comment that the effect of translational efficiency on protein noise may not be as substantial as the effect of transcriptional bursting, but it has been observed in studies across bacteria, yeast, and Arabidopsis (Ozbudak et al., 2003; Blake et al., 2003; Wu et al., 2022). In addition, the relationship between translational efficiency and protein noise is in contrast with the inverse relationship observed between mean expression and noise (Newman et al., 2006; Silander et al., 2012). We also note that the goal of the manuscript was not to evaluate the relative strength of these associations, but to understand the molecular basis of the influence of translational efficiency on protein noise. 

      (3) The captions for most of the figures are short and do not provide much explanation, making the figures difficult to read.

      We have revised the figure captions to include more details as per the reviewer’s suggestion. 

      (4)  It would be helpful if the authors could define the meanings of noise (e.g., coefficient of variation?) and translational efficiency in the very beginning to avoid any confusion. It is also unclear to me whether the noise from the experimental data is defined according to protein numbers or concentrations, which is presumably important since budding yeasts are growing cells. 

      For all published datasets where we had measurements from many genes/promoters, we used the measures of adjusted noise (for mRNA noise) and Distance-to-median (DM, for protein noise). These measures of noise are corrected for the mean-dependence of expression noise. For simulations, which we performed for a single gene, and for experiments that we performed on a limited number of promoters, we used the coefficient of variation (CV) to quantify noise, as calculation of adjusted noise or DM was not possible for a single gene. We now mention this in lines 123-124. We used the protein synthesis rate per mRNA as the measure of translational efficiency (Riba et al., 2019; line 100). Alternatively, we also used the tRNA adaptation index (tAI) as a measure of translational efficiency, as codon choice could also influence the translation rate per mRNA molecule (Tuller et al., 2010) (line 193). 

      The protein noise was quantified from the signal intensity of GFP tagged proteins (Newman et al., 2006; and our data), which was proportional to protein numbers without considering cell volume. For quantification of noise at the mRNA level, single-cell RNA-seq data was used, which provided mRNA numbers in individual cells.  

      (5) The conclusions from Figures 1D and 1E are not new. For example, the constant protein noise as a function of mean protein expression is a known result of the two-state model of gene expression, e.g., see Equation (4) in Paulsson, Physics of Life Reviews 2005.

      Yes, they may not be new, but we included these results for setting the baseline for comparison with simulation results that appear in the later part of the manuscript where we included translational bursting and ribosome demand in our models. 

      (6) In Figure 4C-D, it is unclear to me how the authors changed the mean protein expression if the translation initiation rate is a function of variation in mRNA number and other random variables.

      The translation initiation rate varied from a basal translation initiation rate depending on the mRNA numbers and other variables. We changed the basal translation initiation rate to alter the mean protein expression levels. We have now elaborated the modelling section to incorporate these details in the revised manuscript (lines 404 to 412). 

      (7) If I understand correctly, the authors somehow changed the translation initiation rate to change the mean protein expression in Figures 4C-D. However, the authors changed the protein sequences in the experimental data of Figure 6. I am not sure if the comparison between simulations and experimental data is appropriate.

      It is an important observation. Even though we changed the basal translation initiation rate to change the mean expression (Fig. 4C-D), we noted in the description of the model that the changes in the translation initiation rate were also linked to changes in the translation elongation rate (Fig. 3D). Thus, an increase in the translation initiation rate was associated with faster ribosome traversal through an mRNA molecule. This has also been observed in an experimental study by Barrington et al. (2023). Therefore, the models can also be expressed in terms of the translation elongation rate or ribosome traversal speed, instead of the translation initiation rate, and this modification will not change the results of the simulations due to interconnectedness of the initiation rate and the elongation rate.  

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      (1)  The discussion from lines 180 to 182 appears consistent with Figure 1E. It seems that the two-state model can already explain why the genes with high burst frequency and high protein synthesis rate showed a small protein noise. It is unclear to me the purpose of this discussion.

      Yes, the results from Fig. 1E were from stochastic simulations, whereas the results discussed in the lines 191 to 193 (in the revised manuscript) were based on our analysis of experimental data that is shown in Fig. 2D.

      (2)  If I understand correctly, "translational efficiency" is the same as "protein synthesis rate" in this work. It would be helpful if the authors could keep the same notation throughout the paper to avoid confusion.

      The protein synthesis rate per mRNA molecule is the best measure of translational efficiency, and we used the experimental data from Riba et al. (2019) for this purpose (line 99-100). Alternatively, we also used tRNA Adaptation Index (tAI) as a measure of translational efficiency, as the codon choice also influences the rate at which an mRNA molecule is translated (Tuller et al., 2010) (line 192). 

      (3) On line 227, does "higher translation rate" mean "higher translation initiation rate"? The same issues happen in a few places in this paper.

      Corrected now (line 243 in the revised manuscript and throughout the manuscript). 

      (4) The discussion from lines 296 to 301 is unclear. It is not obvious to me how the authors obtained the conclusion that lowering translational efficiency would decrease the protein expression noise.

      High translational efficiency will require more ribosomes and hence, will increase ribosome demand. If ribosome demand is the molecular basis of high expression noise for genes with bursty transcription and high translational efficiency, then we can expect a reduction in ribosome demand and a reduction in noise if we lower the translational efficiency. We have rephrased this section for clarity between the lines 334 and 339 in the revised manuscript.   

      (5)  On line 324, should slower translation mean a shorter distance between neighboring ribosomes? One can imagine the extreme limit in which ribosomes move very slowly so that the mRNA is fully packed with ribosomes. 

      Slower translation or ribosome traversal rate would also lower the translation initiation rate (Barrington et al., 2023). Slower traversal of ribosomes reduces the chances of collision in case of transient slow-down of ribosomes due to occurrence of one or more non-preferred codons. We have now clarified this part in the lines 360 to 369 in the revised manuscript.

      (6) The text from lines 423 to 433 can be put in Methods.

      We have already added this part to the methods section (lines 900 to 910) and now minimize this discussion in the results section. 

      (7)  The discussion from lines 128 to 130 is unclear, and the statement appears to be consistent with the two-state model (see Figure 1E). The meaning of "initial mRNA numbers" is also unclear.

      An earlier study proposed that essential genes in yeast employ a high-transcription, low-translation strategy for expression, likely to maintain low expression noise in these genes and to prevent the detrimental effects of high expression noise (Fraser et al., 2004). However, there has been no direct supporting evidence. Therefore, we used stochastic simulations to test whether differences in mRNA levels and translational efficiency of genes can lead to differences in protein noise. The discussion between lines 130 and 132 in the revised manuscript summarises the results of the simulations. 

      Initial mRNA numbers - mRNA copy numbers that are present in the cell at the start of stochastic simulations. However, we have now changed it to ‘mRNA levels’ in the revised manuscript for clarity (line 131 in the revised manuscript).

      (8)  On line 212, is the translation initiation rate TL_init the same thing as beta_p in Figure 3A?

      βp refers to the rate of protein synthesis, which is influenced by the translational burst kinetics as well as the translation initiation rate, whereas TLinit refers to the translation initiation rate. So, these parameters are related, but are not the same.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, Floeder et al. report that dopamine ramps in both Pavlovian and instrumental conditions are shaped by reward interval statistics. Dopamine ramps are an interesting phenomenon because at first glance they do not represent the classical reward prediction errors associated with dopamine signaling. Instead, they seem somewhat to bridge the gap between tonic and phasic dopamine, with an intense discussion still being held in the field about their actual behavioral role. Here, in tests with head-fixed mice, with dopamine recorded using a genetically encoded fluorescent sensor in the nucleus accumbens, the authors find that dopamine ramps were only present when intertrial intervals were relatively short and the structure of the task (Pavlovian cue or progression in a VR corridor) contained elements that indicated progression towards the reward (e.g., a dynamic cue). The authors show that these findings are well explained by their previously published model of Adjusted Net Contingency of Causal Relation (ANCCR).

      Strengths:

      This descriptive study delineates some fundamental parameters that define dopamine ramps in the studied conditions. The short, objective, and to-the-point format of the manuscript is great and really does a service to potential readers. The authors are very careful with the scope of their conclusions, which is appreciated by this reviewer.

      We thank the reviewer for their overall support of the formatting and scope of the manuscript. 

      Weaknesses:

      The discussion of the results is very limited to the conceptual framework of the authors' preferred model (which the authors do recognize, but it still is a limitation). The correlation analysis presented in panel l of Figure 3 seems unnecessary at best and could be misleading, as it is really driven by the categorical differences between the two conditions that were grouped for this analysis. There are some key aspects of the data and their relationship with each other, the previous literature, and the methods used to collect them, that could have been better discussed and explored.

      We agree with the reviewer that a weakness of the discussion was the limited framing of the results within the ANCCR model. To address this, we have expanded our introduction and discussion sections to provide a more thorough explanation of our model and possible leading alternatives.

      We thank the reviewer for pointing out that Figure 3l may be misleading for readers; we removed this panel from the revised Figure 4.

      We have further addressed the specific concerns raised by the reviewer in their comments to the authors. Indeed, we agree with the reviewer that the original manuscript was narrow in its focus regarding relationships between different aspects of the data. To more thoroughly explore how key variables – including dopamine ramp slope and onset response as well as licking behavior slope – could relate to each other, we have added Extended Data Figure 8. In this figure, we show that no correlations exist between any of these key variables in either dynamic tone condition; it is our hope that this additional analysis highlights the significance of the clear relationship between dopamine ramp slope and ITI duration. 

      Reviewer #2 (Public Review):

      In this manuscript by Floeder et al., the authors report a correlation between ITI duration and the strength of a dopamine ramp occurring in the time between a predictive conditioned stimulus and a subsequent reward. They found this relationship occurring within two different tasks with mice, during both a Pavlovian task as well as an instrumental virtual visual navigation task. Additionally, they observed this relationship only in conditions when using a dynamic predictive stimulus. The authors relate this finding to their previously published model ANCCR in which the time constant of the eligibility trace is proportionate to the reward rate within the task.

      The relationship between ITI duration and the extent of a dopamine ramp which the authors have reported is very intriguing and certainly provides an important constraint for models for dopamine function. As such, these findings are potentially highly impactful to the field. I do have a few questions for the authors which are written below.

      We thank the reviewer for their interest in our findings and belief in their potential to be impactful in the field. 

      (1) I was surprised to see a lack of counterbalance within the Pavlovian design for the order of the long vs short ITI. Ramping of the lick rate does increase from the long-duration ITIs to the short-duration ITI sessions. Although of course, this increase in ramping of the licking across the two conditions is not necessarily a function of learning, it doesn't lend support to the opposite possibility that the timing of the dynamic CS hasn't reached asymptotic learning by the end of the long-duration ITI. The authors do reference papers in which overtraining tends to result in a reduction of ramping, which would argue against this possibility, yet differential learning of the dynamic CS would presumably be required to observe this effect. Do the authors have any evidence that the effect is not due to heightened learning of the timing of the dynamic CS across the experiment?

      We appreciate the reviewer expressing their surprise regarding the lack of counterbalance in our Pavlovian experimental design. We previously did not explicitly do this because the ramps disappeared in the short ITI/fixed tone condition, indicating that their presence is not just a matter of total experience in the task. However, we agree that this is incidental, but not direct evidence. To address this drawback, we repeated the Pavlovian experiment in a new cohort of animals with a revised training order, switching conditions such that the short ITI/dynamic tone (SD) condition preceded the long ITI/dynamic tone (LD) condition (see revised Figure 2a). Despite this change in the training order, the main findings remain consistent: positive dLight slopes (i.e., dopamine ramps) are only observed in the SD condition (Figure 2b-d). 

      We thank the reviewer for raising these questions regarding licking behavior and learning and their relationship with dopamine ramps. Indeed, a closer look at the average licking behavior reveals subtle differences across conditions (Figure 1f and Extended Data Figure 5a). While the average lick rate during the ramp window does not differ across conditions (Extended Data Figure 5c), the ramping of the lick rate during this window is higher for dynamic tone conditions compared to fixed tone conditions (Extended Data Figure 5d). Despite these differences, we still believe that the main comparison between the dopamine slope in the SD vs LD condition remains valid given their similar lick ramping slopes. Furthermore, our primary measure of learning is not lick slope, but anticipatory lick rate during the 1 s trace preceding reward delivery, which is robustly nonzero across cohorts and conditions (Figure 1g and Extended Data Figure 5b). 

      Taken together, we hope that the results from our counterbalanced Pavlovian training and more rigorous analysis of lick behavior across conditions provide sufficient evidence to assuage concerns that the differences in ramping dopamine simply reflect differences in learning. 

      (2) The dopamine response, as measured by dLight, seems to drop after the reward is delivered. This reduction in responding also tends to be observed with electrophysiological recordings of dopamine neurons. It seems possible that during the short ITI sessions, particularly on the shorter ITI duration trials, that dopamine levels may still be reduced from the previous trial at the onset of the CS on the subsequent trial. Perhaps the authors can observe the dynamics of the recovery of the dopamine response following a reward delivery on longer-duration ITIs in order to determine how quickly dopamine is recovering following a reward delivery. Are the trials with very short ITIs occurring within this period that dopamine is recovering from the previous trial? If so, how much of the effect may be due to this effect? It should be noted that the lack of observance of a ramp on the condition of short-duration ITIs with fixed CSs provides a potential control for this effect, yet the extent to which a natural ramp might occur following sucrose deliveries should be investigated.

      We thank the reviewer for highlighting the possibility that ramps may be due to the dopamine response recovery following reward delivery. Given that peak reward dopamine responses tend to be larger in long ITI conditions, however, we felt that it was inappropriate to compare post-reward dopamine recovery times across conditions. Instead, we decided to directly compare the dLight slope during the 2 s before cue onset (“pre-cue window,” a proxy for recovery from the previous trial) with the dLight slope during our ramp window from 3 to 8 s after cue onset (Extended Data Figure 6a). There were no significant differences in pre-cue dLight slope across conditions (Extended Data Figure 6b); this suggests that the ramping slopes seen in the SD condition, but not in other conditions, are not simply due to the natural dopamine recovery response following reward delivery. Furthermore, if the dopamine ramps observed in the SD condition were a continuation of the post-reward dopamine recovery from the previous trial, we would expect to see a positive correlation between the dLight slope before and during the cue. However, there is no such correlation between the dLight slopes in the ramp window vs. pre-cue window in the SD condition (Extended Data Figure 6c-d). We believe that this observation, along with the built-in control of the SF condition mentioned by the reviewer, serves as evidence against the possibility of our ramp results being due to a natural ramp after reward delivery.
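
      The window comparison described above can be sketched as a least-squares slope computed over two time windows of the same trace. This is an illustrative sketch only: the function name `window_slope`, the sampling rate and the synthetic trace are assumptions, and the actual dLight preprocessing is not represented.

```python
def window_slope(times, signal, start, stop):
    """Least-squares slope of `signal` against `times` within [start, stop]."""
    points = [(t, y) for t, y in zip(times, signal) if start <= t <= stop]
    if len(points) < 2:
        raise ValueError("need at least two samples inside the window")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    return sxy / sxx


# Synthetic trace (assumed): flat before the cue, linear ramp from 3 s onward.
times = [i / 10 for i in range(-20, 101)]  # -2 s to 10 s relative to cue onset
trace = [0.0 if t < 3 else 0.2 * (t - 3) for t in times]

pre_cue_slope = window_slope(times, trace, -2.0, 0.0)  # pre-cue window
ramp_slope = window_slope(times, trace, 3.0, 8.0)      # ramp window
```

      Computing both slopes per trial yields the paired values needed to test whether the ramp-window slope is correlated with the pre-cue slope.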

      (3) The authors primarily relate the finding of the correlation between the ITI and the slope of the ramp to their ANCCR model by suggesting that shorter time constants of the eligibility trace will result in more precisely timed predictors of reward across discrete periods of the dynamic cue. Based on this prediction, would the change in slope be more gradual, and perhaps be more correlated with a broader cumulative estimate of reward rate than just a single trial?

      To clarify, we do not propose that a smaller eligibility trace time constant results in more precise timing per se. Instead, we believe that the rapid eligibility trace decay from smaller time constants gives greater causal predictive power for later periods in the dynamic cue (see Extended Data Figure 1) since the memory of the earlier periods of the cue is weaker. 
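
      A small numerical sketch may help illustrate this point. It assumes, for illustration only, that the eligibility trace decays exponentially with time constant T; the function name and the example values of T and of the elapsed times are hypothetical and are not taken from the manuscript.

```python
import math


def trace_remaining(elapsed, time_constant):
    """Fraction of an exponentially decaying trace left after `elapsed` seconds."""
    return math.exp(-elapsed / time_constant)


def early_to_late_ratio(time_constant, early_elapsed=8.0, late_elapsed=1.0):
    """Memory of an early cue period relative to a late one at reward time."""
    return (trace_remaining(early_elapsed, time_constant)
            / trace_remaining(late_elapsed, time_constant))


short_T = early_to_late_ratio(2.0)   # small time constant (assumed value)
long_T = early_to_late_ratio(20.0)   # large time constant (assumed value)
```

      With the smaller time constant, the ratio is much lower, meaning that the memory of the early cue period has largely faded by the time of reward while the late period is still well represented.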

      We appreciate the reviewer’s curiosity regarding the influence of a broader cumulative estimate of reward vs. only the immediately preceding ITI on dopamine ramp slopes. Indeed, in several instrumental tasks (e.g., Krausz et al., Neuron, 2023), recent reward rate modulates the magnitude of dopamine ramps, making this an important variable to investigate. We chose to use linear regression for each mouse separately to analyze the relationship between the trial dopamine slope and the average previous ITI for the past 1 through 10 most recent trials. In the SD condition, as reported in our earlier manuscript, there was a significantly negative dependence of trial dopamine slope with the single previous ITI (i.e., if the previous ITI was long, the next trial tends to have a weaker ramp). This negative dependence, however, only held for a single previous trial; there was no clear relationship between the per-trial dopamine slope and the average of the past 2 through 10 ITIs (Extended Data Figure 7a). For the LD condition, on the other hand, there is no clear relationship between the per-trial dopamine slope and the average previous ITI for any of the past 1 through 10 trials, with one exception: there is a significantly negative dependence of trial dopamine slope with the average ITI of the previous 2 trials (Extended Data Figure 7b). This longer timescale relationship in the LD condition suggests that the adaptation of the eligibility trace time constant is nuanced and depends on the general ITI length. 
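
      The per-mouse regression described above can be sketched as follows. This is a minimal illustration, not our analysis code: the function name, the convention that `itis[i]` is the ITI immediately preceding trial `i`, and the toy data are assumptions.

```python
def slope_vs_previous_itis(ramp_slopes, itis, k):
    """Regress per-trial ramp slope on the mean of the k ITIs preceding each trial.

    `itis[i]` is assumed to be the ITI immediately before trial i, so the k most
    recent ITIs for trial i are itis[i - k + 1 : i + 1].
    Returns the least-squares regression coefficient.
    """
    xs, ys = [], []
    for i in range(k - 1, len(ramp_slopes)):
        xs.append(sum(itis[i - k + 1:i + 1]) / k)
        ys.append(ramp_slopes[i])
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy data (assumed): ramp slope depends only on the single previous ITI.
itis = [2.0, 5.0, 3.0, 8.0, 4.0, 6.0]
ramp_slopes = [1.0 - 0.5 * iti for iti in itis]
coefficients = {k: slope_vs_previous_itis(ramp_slopes, itis, k) for k in (1, 2, 3)}
```

      Repeating the regression for k = 1 through 10 in each mouse gives one coefficient per history length, which can then be tested against zero across animals.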

      In general, though we reason that the eligibility trace time constant should depend on overall event rates, we do not currently propose a real-time update rule for the eligibility trace time constant depending on recent event rates. Accordingly, we are currently agnostic about the actual time scale of history of recent event rate calculation that mediates the eligibility trace time constant. Our experimental results suggest that when the ITI is generally short for Pavlovian conditioning, the eligibility trace time constant adapts to ITI on a rapid timescale. However, only a small fraction of the variability of this rapid fluctuation is captured by recent ITI history. A more thorough investigation of this real-time update rule would need to be done in the future.

      Reviewer #3 (Public Review):

      Summary:

      Floeder and colleagues measure dopamine signaling in the nucleus accumbens core using fiber photometry of the dLight sensor, in Pavlovian and instrumental tasks in mice. They test some predictions from a recently proposed model (ANCCR) regarding the existence of "ramps" in dopamine that have been seen in some previous research, the characteristics of which remain poorly understood.

      They find that cues signaling a progression toward rewards (akin to a countdown) specifically promote ramping dopamine signaling in the nucleus accumbens core, but only when the intertrial interval just experienced was short. This work is discussed in the context of ongoing theoretical conceptions of dopamine's role in learning.

      Strengths:

      This work is the clearest demonstration to date of concrete training factors that seem to directly impact whether or not dopamine ramps occur. The existence of ramping signals has long been a feature of debates in the dopamine literature and this work adds important context to that. Further, as a practical assessment of the impact of a relatively simple trial structure manipulation on dopamine patterns, this work will be important for guiding future studies. These studies are well done and thoughtfully presented.

      We thank the reviewer for recognizing the context that our study adds to the dopamine literature and the potential for our experiments to guide future work. 

      Weaknesses:

      It remains somewhat unclear what limits are in place on the extent to which an eligibility trace is reflected in dopamine signals. In the current study, a specific set of ITIs was used, and one wonders if the relative comparison of ITI/history variables ("shorter" or "longer") is a factor in how the dopamine signal emerges, in addition to the explicit length ("short" or "long") of the ITI. Another experimental condition, where variable ITIs were intermingled, could perhaps help clarify some remaining questions.

      Though we used ITIs of fixed means, due to the exponential nature of their distribution, we did intermingle ITIs of various durations in both our long and short ITI conditions. The distribution of ITI durations is visualized in Figure 1c for Pavlovian conditioning and Extended Data Figure 9b for VR navigation. 

      The relative comparison between consecutive ITIs was not something we originally explored, so we thank the reviewer for wondering how it impacts the dopamine signal. To investigate this, we quantified both the change in ITI (+ or - Δ ITI for relatively longer or shorter, respectively) and the change in dopamine ramp slope between consecutive trials in the SD condition (Figure 3d). Across each mouse separately, we found a significantly negative relationship between Δ slope and Δ ITI (Figure 3e-f). Also, the average Δ slope was significantly greater for consecutive trials with a Δ ITI below -1 s compared to trials with a Δ ITI above +1 s (Figure 3g). Altogether, these findings suggest that relative comparison of ITIs does correlate with changes in the dopamine signal; a relatively longer ITI tends to have a weaker ramp, which fits in nicely with the expected inverse relationship between ITI and dopamine ramp slope from our ANCCR model.
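
      The consecutive-trial comparison described above can be sketched as follows. The function names, the toy data and the handling of empty groups are assumptions made for illustration; only the ±1 s threshold is taken from the analysis described above.

```python
def consecutive_deltas(values):
    """Differences between each trial and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]


def mean_delta_slope_by_iti_change(ramp_slopes, itis, threshold=1.0):
    """Mean change in ramp slope for relatively shorter vs. longer ITIs.

    Returns (mean delta slope when delta ITI < -threshold,
             mean delta slope when delta ITI > +threshold).
    """
    d_slope = consecutive_deltas(ramp_slopes)
    d_iti = consecutive_deltas(itis)
    shorter = [ds for ds, di in zip(d_slope, d_iti) if di < -threshold]
    longer = [ds for ds, di in zip(d_slope, d_iti) if di > threshold]

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(shorter), mean(longer)


# Toy data (assumed): slope decreases linearly with the preceding ITI.
itis = [2.0, 5.0, 3.0, 8.0, 4.0]
ramp_slopes = [1.0 - 0.5 * iti for iti in itis]
after_shorter, after_longer = mean_delta_slope_by_iti_change(ramp_slopes, itis)
```

      In this toy example the slope increases after a relatively shorter ITI and decreases after a relatively longer one, which is the direction of the relationship reported in Figure 3e-g.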

      In both tasks, cue onset responses are larger, and longer on long ITI trials. One concern is that this larger signal makes seeing a ramp during the cue-reward interval harder, especially with a fluorescence method like photometry. Examining the traces in Figure 1i - in the long, dynamic cue condition the dopamine trace has not returned to baseline at the time of the "ramp" window onset, but the short dynamic trace has. So one wonders if it's possible the overall return to baseline trend in the long dynamic conditions might wash out a ramp.

      This is a good point, and we thank the reviewer for raising it. Certainly, the cue onset response is significantly larger in long ITI conditions (see Figure 1i-j and Figure 4h-j). To avoid any bleed over effect, we intentionally chose ramp window periods during later portions of the trial (in line with work from others e.g., Kim et al., Cell, 2020). While the cue onset dopamine pulse seems to have flatlined by the start of the ramp window period, the dopamine levels clearly remain elevated relative to pre-cue baseline. This type of signal has been observed with fiber photometry in other Pavlovian conditioning paradigms with long cue durations (e.g., Jeong et al., Science, 2022). Because of the persistently elevated dopamine levels, it is certainly possible that a ramping signal during the cue is getting washed out; with the bulk fluorescence photometry technique we employed in this study, this possibility is unfortunately difficult to completely rule out. However, the long ITI/fixed tone (LF) condition could serve as a potential control given the overall similarity in the dopamine signal between the LF and LD conditions: both conditions have large cue onset responses with elevated dopamine throughout the duration of the cue (see Extended Data Figures 2c and 3c). Critically, the LD condition lacks a noticeable ramp despite the dynamic tone providing information on temporal proximity to reward, which is thought to be necessary for dopamine ramps to occur. Importantly, regardless of whether a ramp is masked in the long ITI dynamic condition, most studies investigate such a condition in isolation and would report the absence of dopamine ramps. Thus, at a descriptive level, we believe it remains true that observable dopamine ramps are only present when the ITI is short. 

      Not a weakness of this study, but the current results certainly make one ponder the potential function of cue-reward interval ramps in dopamine (assuming there is a determinable function). In the current data, licking behavior was similar on different trial types, and that is described as specifically not explaining ramp activity.

      We agree that this work naturally raises the question of the function of dopamine ramps. However, selective and precise manipulation of only the dopamine ramps without altering other features such as phasic responses, or inducing dopamine dips, is highly technically challenging at this moment; due to this challenge, we intentionally focused on the conditions that determine the presence or absence of dopamine ramps rather than their function. We agree with the reviewer that studying the specific function of dopamine ramps is an interesting future question. 

      Reviewing Editor:

      The reviewers felt the results are of considerable and broad interest to the neuroscience community, but that the framing in terms of ANCCR undermined the scope of the findings as did the brief nature of the formatting of the manuscript. In addition, the reviewers felt that the relationship between ramp dynamics, behavior, and ITI conditions requires more in-depth analyses. Relatedly, the lack of counterbalancing of the ITI durations was considered to be a drawback and needs to be addressed as it may affect the baseline. Addressing these issues in a satisfactory manner would improve the assessment of the manuscript to important/convincing.

      We truly appreciate the valuable feedback provided on this manuscript by all three reviewers and the reviewing editor. Based on this input, we have significantly revised the manuscript to address the issues brought up by the reviewers. Firstly, we have conducted additional experiments to counterbalance the ITI conditions for Pavlovian conditioning; this strengthened our results by confirming our original findings that ITI duration, rather than training order, is the key variable controlling the presence or absence of dopamine ramps. Secondly, we completed more rigorous analyses to further explore the relationship between dopamine dynamics, animal behavior, and ITI duration; we generally found no significant correlations between these variables, with a notable exception being our main finding between ITI duration and dopamine ramp slope. Finally, we revised and expanded our writing to both explain predictions from our ANCCR model in less technical language and explore how alternative theoretical frameworks could potentially explain our findings. In doing so, we hope that our manuscript is now more accessible and of interest to a broad audience of neuroscience readers.

      Reviewer #1 (Recommendations For The Authors):

      The study could be improved if the authors performed a more detailed comparison of how other theoretical frameworks, beyond ANCCR could account for the observed findings. Also, the correlation analysis presented in the panel I of Figure 3 seems unnecessary and potentially spurious, as the slope of the correlation is clearly mostly driven by the categorical differences between the two ITI conditions, which were combined for the analysis - it's not clear what is the value of this analysis beyond the group comparison presented in the following panel.

      Again, we thank the reviewer for elaborating on their concern regarding Figure 3l – we have removed it from the revised Figure 4. 

      The relationship between ramp dynamics with the behavior and the large differences in cue onset responses between short and long ITI conditions could have been better explored. If I understand correctly the overarching proposal of this and other publications by this group, then the differences in cue responses is determined by the spacing of rewards in a somewhat similar way that the ramps are. So, is there a trial-by-trial correlation between the amplitude of the cue responses and the slope of the ramps? Is there a correlation between any of these two measures with the licking behavior, and if so, does it change with the ITI condition? A more thorough exploration of these relationships would help support the proposal of the primacy of inter-event spacing in determining the different types of dopamine responses in learning.

      There are certainly interesting relationships between dopamine dynamics, behavior, and ITI that we failed to explore in our original manuscript – we appreciate the reviewer bringing them up. We found no correlation between dopamine ramp slope and cue onset response in either the SD or LD condition (Extended Data Fig 8a-b). Moreover, we found no correlation between either of these variables and the trial-by-trial licking behavior (Extended Data Fig 8c-f). Finally, there is no relationship between licking behavior and previous ITI duration (Extended Data Fig 8g-h), suggesting that behavioral differences do not account for differences in the dopamine ramp slope. Together, the lack of significant relationships between these other variables highlights the specific, clear relationship between ITI duration and dopamine ramp slope. 
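
      As an illustration only, the kind of trial-by-trial analysis described above can be sketched as follows. This is a minimal sketch on synthetic data, not the authors' analysis code: the helper names `ramp_slope` and `pearson_r`, the 8 s cue, the 5-8 s ramp window, and all simulated values are assumptions introduced for the example.

```python
import numpy as np


def ramp_slope(trace, times, window):
    """Least-squares slope of one trial's trace inside a (start, end) window."""
    start, end = window
    mask = (times >= start) & (times <= end)
    slope, _intercept = np.polyfit(times[mask], trace[mask], 1)
    return slope


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


# Synthetic stand-in for photometry data (all values hypothetical).
rng = np.random.default_rng(0)
times = np.linspace(0.0, 8.0, 81)            # assumed 8 s cue sampled at 10 Hz
n_trials = 50
true_slopes = rng.normal(0.05, 0.02, n_trials)
noise = rng.normal(0.0, 0.01, (n_trials, times.size))
traces = true_slopes[:, None] * times[None, :] + noise
cue_peaks = rng.normal(1.0, 0.2, n_trials)   # drawn independently of the slopes

# Per-trial ramp slope in an assumed late-cue window, then its correlation
# with the per-trial cue onset response.
slopes = np.array([ramp_slope(tr, times, (5.0, 8.0)) for tr in traces])
r = pearson_r(slopes, cue_peaks)
```

      The same two helpers could be applied to per-trial lick rates in place of `cue_peaks`; a formal test of significance would need to be added separately.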

      Finally, another issue I feel could have been better discussed is how the particular settings of both tasks might be biasing the results. For example, there is an issue to be considered about how the dopamine ramp dynamics reported here, especially the requirement of a dynamic cue for ramps to be present, square with the previous published results by one of the authors - Mohebi et al, Nature, 2019. In that manuscript, rats were executing a bandit task where, to this reviewer's understanding, there was no explicit dynamic cue aside from the standard sensory feedback of the rats moving around in the behavior boxes to approach a nose poke port. Is the idea that this sensory feedback could function as a dynamic cue? If that's the case, then this short-scale, movement-related feedback should also function as a dynamic cue in a freely moving Pavlovian condition, when the animals must also move towards a reward delivery port, right? Therefore, could it be that the experimental "requirement" of a dynamic cue is only present in a head-fixed condition? One could phrase this in a different way to Steelman and potentially further the authors' proposal: perhaps in any slightly more naturalistic setting, the interaction of the animals with their environment always functions as a dynamic cue indicating proximity to reward, and this relationship was experimentally isolated by the use of head fixation (but not explicitly compared with a freely moving condition) in the present study. I think that would be an interesting alternative to consider and discuss, and perhaps explore experimentally at some point.

      We thank the reviewer for raising this important point regarding the influence of our experimental settings on our results. At first glance, it could appear that our results demonstrating the necessity of a dynamic cue for ramps in a head-fixed setting do not fit neatly with other results in a freely moving setup (e.g., Collins et al., Scientific Reports, 2016; Mohebi et al., Nature, 2019). Exactly as the reviewer states though, we believe that sensory feedback from the environment in freely moving preparations serves the same function as a dynamic progression of cues. We have considered the implications of methodological differences between head-fixed and freely moving preparations in the discussion section. 

      Reviewer #2 (Recommendations For The Authors):

      This comment relates indirectly to comment 3, in that the authors intermix theory throughout the manuscript. I think this would be fine if the experiment was framed directly in terms of ANCCR, but the authors specifically mention that this experiment wasn't developed to distinguish between different theories. As such, it seems difficult to assess the scope of the comments regarding theory within the paper because they tend to be specifically related to ANCCR. For instance, the last comment has broad implications of how the ramp might be related to the overall reward rate, an interesting finding that constrains classes of dopamine models rather than evidence just for ANCCR. Perhaps adding a discussion section that allows the authors to focus more on theory would be beneficial for this manuscript.

      We appreciate this suggestion by the reviewer. We have updated both our introduction and discussion sections to elaborate more thoroughly on theory.

      Reviewer #3 (Recommendations For The Authors):

      The paper could potentially benefit from the use of more accessible language to describe the conceptual basis of the work, and the predictions, and a bit of reformatting away from the brief structure with lots of supplemental discussion.

      For example, in the introduction, the line - "Varying the ITI was critical because our theory predicts that the ITI is a variable controlling the eligibility trace time constant, such that a short ITI would produce a small time constant relative to the cue-reward interval (Supplementary Note 1)". As far as I can tell, this is meant to get across the notion that dopamine represents some aspect of the time between rewards - dopamine signals will differ for cues following short vs long intervals between rewards.

      As written, the language of the paper takes a fair bit of parsing, but the notions are actually pretty simple. This is partly due to the brief format the paper is written in, where familiarity with the previous papers describing ANCCR is assumed.

      From a readability standpoint, and the potential impact of the paper on a broad audience, perhaps this could be considered as a point for revision.

      We thank the reviewer for pointing out the drawbacks of our technical language and brief formatting. To address this, we have removed the majority of the supplementary notes and expanded our introduction and discussion sections. In doing so, we hope that the conceptual foundations of this work, and potential alternative theoretical explanations, are accessible and impactful for a broad audience of readers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Early and accurate diagnosis is critical to treating N. fowleri infections, which often lead to death within 2 weeks of exposure. Current methods, which rely on sampling cerebrospinal fluid, are invasive, slow, and sometimes unreliable. Therefore, there is a need for a new diagnostic method. Russell et al. address this need by identifying small RNAs secreted by Naegleria fowleri (Figure 1) that are detectable by RT-qPCR in multiple biological fluids including blood and urine. SmallRNA-1 and smallRNA-2 were detectable in plasma samples of mice experimentally infected with 6 different N. fowleri strains, and were not detected in uninfected mouse or human samples (Figure 4). Further, smallRNA-1 is detectable in the urine of experimentally infected mice as early as 24 hours post-infection (Figure 5). The study culminates with testing human samples (obtained from the CDC) from patients with confirmed N. fowleri infections; smallRNA-1 was detectable in cerebrospinal fluid in 6 out of 6 samples (Figure 6B), and in whole blood from 2 out of 2 samples (Figure 6C). These results suggest that smallRNA-1 could be a valuable diagnostic marker for N. fowleri infection, detectable in cerebrospinal fluid, blood, or potentially urine.

      Strengths: 

      This study investigates an important problem, and comes to a potential solution with a new diagnostic test for N. fowleri infection that is fast, less invasive than current methods, and seems robust to multiple N. fowleri strains. The work in mice is convincing that smallRNA-1 is detectable in blood and urine early in infection. Analysis of patient blood samples suggests that whole blood (but not plasma) could be tested for smallRNA-1 to diagnose N. fowleri infections.

      Thank you for comments regarding the strengths of this study. We agree that our data for detecting the biomarker in biofluids from mice is convincing. In addition, our spike-in studies with human cerebrospinal fluid, plasma, and urine (Figure 6) suggest these biofluids from humans could be used for diagnosis.

      We appreciate the comment regarding plasma and recognize this was not fully explained in the manuscript. We do believe that plasma can be used to assess the biomarker. Firstly, we demonstrated equivalent sensitivity of the method to detect smallRNA-1 in plasma and urine in mice with end-stage PAM (Figure 5). In addition, spike-in samples of human plasma, cerebrospinal fluid, and urine demonstrated equivalent sensitivity of detecting the biomarker (Figure 6).

      The negative result for human plasma in Figure 6C requires clarification; this sample was convalescent plasma from a survivor. The patient presented to the hospital on August 7, 2016, was treated, made a remarkable recovery, and was released from the hospital later that month. The plasma sample in Figure 6C was collected September 7, 2016, which is a month after treatment was initiated and weeks after the patient was symptom-free. Our interpretation of the convalescent plasma result is that the patient had cleared the active amoeba infection, which is why we did not detect the biomarker. We have added text in the discussion and in the legend for Figure 6 to clarify the convalescent plasma result.

      One additional caveat for consideration is that many of the samples we received from amoebae-infected humans were stored at room temperature for undefined periods of time before being moved to <-20°C (see details in Table S9). We can’t rule out possible sample degradation, but this is an unfortunate reality of obtaining human samples from individuals later confirmed to be infected with pathogenic free-living amoebae.

      Weaknesses: 

      (1) There are not many N. fowleri cases, so the authors were limited in the human samples available for testing. It is difficult to know how robust this biomarker is in whole blood (only 2 samples were tested, both had detectable smallRNA-1), serum (1 out of 1 sample tested negative), or human urine (presumably there is no material available for testing). This limitation is openly discussed in the last paragraph of the discussion section. 

      We agree the extremely limited availability of human samples is a limitation of this study. Given the rarity of these infections in the United States, even prospective studies to systematically collect samples would be very challenging. We hope that by publishing the details of this biomarker detection, the method can be used by diagnostic reference centers, especially in areas where outbreaks of multiple cases per year have been reported.

      (2) There seems to be some noise in the data for uninfected samples (Figures 4B-C, 5B, and 6C), especially for those with serum (2E). While this is often orders of magnitude lower than the positive results, it does raise questions about false positives, especially early in infection when diagnosis would be the most useful. A few additional uninfected human samples may be helpful. 

      We agree; however, we would like to point out that the progression of disease in humans and mice is similar. Typically, patients survive between 10-14 days after presumed exposure, and mice have similar survival times following instillation of N. fowleri amoebae into a nare of the mouse. Therefore, detection of this biomarker as early as 72 h in mice is seemingly equivalent to the onset of initial symptoms in humans.

      Reviewer #2 (Public review): 

      Summary: 

      The authors sought to develop a rapid and non-invasive diagnostic method for primary amoebic meningoencephalitis (PAM), a highly fatal disease caused by Naegleria fowleri. Due to the challenges of early diagnosis, they investigated extracellular vesicles (EVs) from N. fowleri, identifying small RNA biomarkers. They developed an RT-qPCR assay to detect these biomarkers in various biofluids. 

      Strengths: 

      (1)  This study has a clear methodological approach, which allows for the reproducibility of the experiments. 

      (2) Early and Non-Invasive Diagnosis - The identification of a small RNA biomarker that can be detected in urine, plasma, and cerebrospinal fluid (CSF) provides a non-invasive diagnostic approach, which is crucial for improving early detection of PAM. 

      (3) High Sensitivity and Rapid Detection - The RT-qPCR assay developed in the study is highly sensitive, detecting the biomarker in 100% of CSF samples from human PAM cases and in mouse urine as early as 24 hours post-infection. Additionally, the test can be completed in ~3 hours, making it feasible for clinical use. 

      (4)  Potential for Disease Monitoring - Since the biomarker is detectable throughout the course of infection, it could be used not only for early diagnosis but also for tracking disease progression and monitoring treatment efficacy. 

      (5)  Strong Experimental Validation - The study demonstrates biomarker detection across multiple sample types (CSF, urine, whole blood, plasma) in both animal models and human cases, providing robust evidence for its clinical relevance. 

      (6) Addresses a Critical Unmet Need - With a >97% case fatality rate, PAM urgently requires improved diagnostics. This study provides one of the first viable liquid biopsy-based diagnostic approaches, potentially transforming how PAM is detected and managed. 

      Thank you for summarizing the strengths of the study.

      Weaknesses: 

      (1) Limited Human Sample Size - While the biomarker was detected in 100% of CSF samples from human PAM cases, the number of human samples analyzed (n=6 for CSF) is relatively small. A larger cohort is needed to validate its diagnostic reliability across diverse populations. 

      As noted in response to Reviewer #1 above, we agree this is a limitation of the study; however, we were fortunate to obtain even 15 µL samples of cerebrospinal fluid, plasma, serum, or whole blood from as many patients as we did. There is an urgent need for more systematic collection and storage of samples for rare diseases like primary amoebic meningoencephalitis so that advancements in diagnostics and biomarker discovery can be made. It is our sincere hope that by publishing our detailed methods and experimental results in this manuscript, additional hospitals and research centers can replicate our studies and help advance this or other techniques for early diagnosis of PAM.

      (2) Lack of Pre-Symptomatic or Early-Stage Human Data - Although the biomarker was detected in mouse urine as early as 24 hours post-infection, there is no data on whether it can be reliably detected before symptoms appear in humans, which is crucial for early diagnosis and treatment initiation. 

      It is difficult to envision a method to obtain these biofluids from infected humans prior to onset of symptoms. More likely the best we can hope for is that physicians include primary amoebic meningoencephalitis in their assessment of patients that present with prodromal symptoms of meningitis.

      (3)  Plasma Detection Challenges - While the biomarker was detected in whole blood, it was not detected in human plasma, which could limit the ease of clinical implementation since plasma-based diagnostics are more common. Further investigation is needed to understand why it is absent in plasma and whether alternative blood-based approaches (e.g., whole blood assays) could be optimized. 

      See response to Reviewer #1 above.

      Reviewer #1 (Recommendations for the authors): 

      (1) What is the evidence that these small RNAs are secreted specifically in EVs? I believe that they are, and ultimately it doesn't impact the conclusions, but I think the evidence here could be either stronger or presented in a more obvious way. 

      Our data demonstrate that smallRNA-1 is present in N. fowleri-derived EVs (Figure 2 and Supplemental Figure 7) and in the intact amoebae (Figure 3B). Initial sequencing data to identify these smallRNA biomarkers came from PEG-precipitated EVs (Figure S1), by using methods we previously published (22). The PEG-precipitated EVs were extracted specifically for spike-in studies. Finally, the smallRNAs in EVs were confirmed after extraction of EVs from 7 N. fowleri strains (Figure 2). We do not have evidence that they are secreted outside of EVs.

      (2) The figure legends would be more useful with some additional information. For example: why are there two points for Nf69 in Fig 2B? In Figure 3A-B, please add more detail as to what the graphs are showing (are they histograms binned by a number of amoebae? This does not seem obvious to me). 

      We agree the Figure legends should be edited for clarity and to add additional information. Both Figure legends have been updated.

      In Figure 2B, each point represents the mean of three technical replicates of EV preps for each N. fowleri strain.

      In Figure 3 the points indicate the Copy#/µL of a well from a 96-well plate. The histograms show the mean of these observations for each condition. 

      (3)  In Figure 2E, the FBS seems like it has near detectable levels of smallRNA-1 compared to Ac and Bm (albeit N. fowleri has 4 orders of magnitude higher levels than the FBS). Because cows are likely exposed to N. fowleri and have documented infections (e.g. doi: 10.1016/j.rvsc.2012.01.002), is it possible this signal is real? 

      Thank you for making this interesting observation. We agree that cows are likely to have significant exposure to N. fowleri, yet documented infections are rare. In this case we do not believe the near detectable levels of smallRNA-1 in FBS were due to an infected donor animal. This noise was likely due to extracting RNA from concentrated FBS rather than FBS diluted in cell culture media. In addition, as shown in Supplemental Figure 4, the qPCR product from EVs extracted from FBS was not the same as that from the N. fowleri-derived EVs. Please note we used a PEG extraction reagent that separates lipid particles, so this is additional evidence the smallRNAs are present in EVs.

      (4)  In Figure 6A, why was the sample size greater for water and unspiked urine? Similarly, why is the number of infected mice so variable in Figure 4B? 

      In Figure 6A we assayed de-identified biofluids provided by Advent Hospital in Orlando, Florida. The plasma and serum samples were pooled from multiple individuals, whereas individual urine samples (n=8) were provided for this experiment. We have updated the legend for Figure 6A to include these details.

      For Figure 4B we used plasma collected at the end-stage of disease following infections with five different strains of N. fowleri. The sample sizes varied for two reasons. First, Nf69 was the strain used most by our lab and we had plasma from several in vivo experiments. Second, the lower sample sizes for the other strains came from an experiment with 8 mice per group. Some of these strains were less virulent, and mice infected with them did not succumb to disease at the number of amoebae inoculated in this experiment. Thus, plasma was only collected from animals that were euthanized due to severe N. fowleri infections. In follow-up studies (e.g., Figure 5B), plasma was collected every 24 hr for analysis.

      Very minor points: 

      (1)  The number of acronyms (FLA, PAM, EVs, CNS, CSF, LOD) could be reduced to make this paper more reader-friendly. 

      Acronyms that were used infrequently in the manuscript (FLA, CNS, LOD, mNGS, UC) have been edited to spell out the complete names. We kept the acronyms EVs and CSF because they are each used more than twenty times in the manuscript.

      (2)  The decimal point in the Cq values is formatted strangely. 

      The decimal points have been edited to normal format in both the manuscript and supplementary material.

      (3)  Figure 3C is not intuitive. I do not understand the logic for the placement of the different samples (was row A only amoebae, B only Veros, C blank, D a mix, and F more Veros?). 

      Thank you for this comment; we agree the microtiter plate schematic (Fig 3C) was misleading. We have revised Figure 3C to make the point that we tested amoebae alone, Vero cells alone, and we combined supernatants from Vero cells (alone) plus amoebae (alone) to confirm that 1) smallRNA-1 was only detected in amoeba-conditioned media, and 2) that Vero-conditioned media does not affect detection of smallRNA-1.

      Reviewer #2 (Recommendations for the authors): 

      Minor corrections: 

      The abbreviation 'Nf' for Naegleria fowleri is not appropriate in a scientific publication. According to taxonomic conventions, the correct way to abbreviate a scientific name is as follows: 

      The first mention should be written in full: Naegleria fowleri. 

      In subsequent mentions, the genus name should be abbreviated to its initial in uppercase, followed by a period, while the species name remains in lowercase: N. fowleri. 

      The same rule applies to Balamuthia mandrillaris and Acanthamoeba species, which should be abbreviated as B. mandrillaris and Acanthamoeba spp. after their first mention. 

      We agree, and each of the scientific names has been updated to the proper format. Please note Nf69 is the accepted nomenclature for this N. fowleri strain, so no changes were made when referring to this specific strain.

      Temperatures should be expressed in international units (°C). Please update the temperatures reported in Fahrenheit (°F) in the 'Materials and Methods' section, specifically in the 'Animal Studies' subsection. 

      These changes were made in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      This paper summarises responses from a survey completed by around 5,000 academics on their manuscript submission behaviours. The authors find several interesting stylised facts, including (but not limited to):

      - Women are less likely to submit their papers to highly influential journals (*e.g.*, Nature, Science and PNAS).

      - Women are more likely to cite the demands of co-authors as a reason why they didn't submit to highly influential journals.

      - Women are also more likely to say that they were advised not to submit to highly influential journals.

      Recommendation

      This paper highlights an important point, namely that the submission behaviours of men and women scientists may not be the same (either due to preferences that vary by gender, selection effects that arise earlier in scientists' careers or social factors that affect men and women differently and also influence submission patterns). As a result, simply observing gender differences in acceptance rates---or a lack thereof---should not be automatically interpreted as evidence for or against discrimination (broadly defined) in the peer review process. I do, however, make a few suggestions below that the authors may (or may not) wish to address.

      We thank the author for this comment and for the following suggestions, which we take into account in our revision of the manuscript.

      Major comments

      What do you mean by bias?

      In the second paragraph of the introduction, it is claimed that "if no biases were present in the case of peer review, then 'we should expect the rate with which members of less powerful social groups enjoy successful peer review outcomes to be proportionate to their representation in submission rates.'" There are a couple of issues with this statement.

      - First, the authors are implicitly making a normative assumption that manuscript submission and acceptance rates *should* be equalised across groups. This may very well be the case, but there can also be important reasons why not -- e.g., if men are more likely to submit their less ground-breaking work, then one might reasonably expect that they experience higher rejection rates compared to women, conditional on submission.

      We do assume that normative statement: unless we believe that men’s papers are intrinsically better than women’s papers, the acceptance rate should be the same. But the referee is right: we have no way of controlling for the intrinsic quality of the work of men and women. That said, our manuscript does not show that there is a different acceptance rate for men and women; it shows that women are less likely to submit papers to a subset of journals that are of a lower Journal Impact Factor, controlling for their most cited paper, in an attempt to control for intrinsic quality of the manuscripts.

      - Second, I assume by "bias", the authors are taking a broad definition, i.e., they are not only including factors that specifically relate to gender but also factors that are themselves independent of gender but nevertheless disproportionately are associated with one gender or another (e.g., perhaps women are more likely to write on certain topics and those topics are rated more poorly by (more prevalent) male referees; alternatively, referees may be more likely to accept articles by authors they've met before, most referees are men and men are more likely to have met a given author if he's male instead of female). If that is the case, I would define more clearly what you mean by bias. (And if that isn't the case, then I would encourage the authors to consider a broader definition of "bias"!)

      Yes, the referee is right that we are taking a broad definition of bias. We provide a definition of bias on page 3, line 92. This definition is focused on differential evaluation which leads to differential outcomes. We also hedge our conversation (e.g., page 3, line 104) to acknowledge that observations of disparities may only be an indicator of potential bias, as many other things could explain the disparity. In short, disparities are a necessary but insufficient indicator of bias. We add a line in the introduction to reinforce this. The only other reference to the term bias comes on page 10, line 276. We add a reference to Lee here to contextualize.

      Identifying policy interventions is not a major contribution of this paper

      In my opinion, the survey evidence reported here isn't really strong enough to support definitive policy interventions to address the issue and, indeed, providing policy advice is not a major -- or even minor -- contribution of your paper, so I would not mention policy interventions in the abstract. (Basically, I would hope that someone interested in policy interventions would consult another paper that much more thoughtfully and comprehensively discusses the costs and benefits of various interventions!)

      We thank the referee for this comment. While we agree that our results do not lead to definitive policy interventions, we believe that our findings point to a phenomenon that should be addressed through policy interventions. Given that some interventions are proposed in our conclusion, we feel that stating this in the abstract is coherent.

      Minor comments

      - What is the rationale for conditioning on academic rank and does this have explanatory power on its own---i.e., does it at least superficially potentially explain part of the gender gap in intention to submit?

      The referee is right: academic rank was added to control for career age of researchers, with the assumption that this variable would influence submission behavior. However, the rank information we collected was for the time that the individual respondent took the survey, which could differ from the rank they held at the time of the submission behaviors reported in the survey. That is why we did not consider rank as an independent variable of interest. We do agree with the reviewer, however, that it could be related to submission behaviors in some cases. Our initial analysis shows that academic rank is not a significant predictor of whether researchers submitted to SNP, but does contribute significantly to the SNP acceptance rates and desk rejection rates of individuals in Medical Sciences.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Basson et al. study the representation of women in "high-impact" journals through the lens of gendered submission behavior. This work is clear and thorough, and it provides new insights into gender disparities in submissions, such as that women were more likely to avoid submitting to one of these journals based on advice from a colleague/mentor. The results have broad implications for all academic communities and may help toward reducing gender disparities in "high-impact" journal submissions. I enjoyed reading this article, and I have several recommendations regarding the methodology/reporting details that could help to enhance this work.

      We thank the referee for their comments.

      Strengths:

      This is an important area of investigation that is often overlooked in the study of gender bias in publishing. Several strengths of the paper include:

      (1) A comprehensive survey of thousands of academics. It is admirable that the authors retroactively reached out to other researchers and collected an extensive amount of data.

      (2) Overall, the modeling procedures appear thorough, and many different questions are modeled.

      (3) There are interesting new results, as well as a thoughtful discussion. This work will likely spark further investigation into gender bias in submission behavior, particularly regarding the possible gendered effect of mentorship on article submission.

      Thank you for those comments.

      Weaknesses:

      (1) The GitHub page should be further clarified. A detailed description of how to run the analysis and the location of the data would be helpful. For example, although the paper says that "Aggregated and de-identified data by gender, discipline, and rank for analyses are available on GitHub," I was unable to find such data.

      We added the link to the GitHub page, as well as more details on how to run the statistical analysis. Unfortunately, our IRB approval does not allow for the sharing of the raw data.

      (2) Why is desk rejection rate defined as "the number of manuscripts that did not go out for peer review divided by the number of manuscripts rejected for each survey respondent"? For example, in your Grossman 2020 reference, it appears that manuscripts are categorized as "reviewed" or "desk-rejected" (Grossman Figure 2). If there are gender differences in the denominator, then this could affect the results.

      We thank the referee for pointing this out. What the referee proposes is in fact how we calculated the rate in our analysis; the definition given in the manuscript was a mistake. We corrected the manuscript.
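
      The difference between the two definitions can be illustrated with a minimal sketch. It assumes that the corrected denominator is the number of manuscripts a respondent submitted (the reviewed versus desk-rejected split the referee describes); the function name and the counts are hypothetical and are not taken from the survey data.

```python
from fractions import Fraction


def desk_rejection_rate(desk_rejected: int, denominator: int) -> Fraction:
    """Share of manuscripts that never went out for peer review.

    `denominator` is whichever manuscript count the definition divides by.
    """
    if denominator <= 0:
        raise ValueError("denominator must be a positive manuscript count")
    return Fraction(desk_rejected, denominator)


# Hypothetical respondent: 10 submissions, of which 3 were desk-rejected,
# 2 were rejected after review, and 5 were accepted.
desk_rejected, rejected_after_review, accepted = 3, 2, 5
submitted = desk_rejected + rejected_after_review + accepted
rejected = desk_rejected + rejected_after_review

# Definition as originally (mis)stated: divide by rejected manuscripts.
rate_over_rejected = desk_rejection_rate(desk_rejected, rejected)

# Assumed corrected definition: divide by submitted manuscripts.
rate_over_submitted = desk_rejection_rate(desk_rejected, submitted)

print(rate_over_rejected, rate_over_submitted)
```

      Under the first definition the rate depends on how many manuscripts were rejected after review, which is why a gender difference in that denominator could shift the result.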

      (3) Have you considered correcting for multiple comparisons? Alternatively, you could consider reporting P-values and effect sizes in the main text. Otherwise, sometimes the conclusions can be misleading. For example, in Figure 3 (and Table S28), the effect is described as significant in Social Sciences (p=0.04) but not in Medical Sciences (p=0.07).

      We highly appreciate the suggestion. We’ve added Odds Ratio values and p-values to the main manuscript.
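
      For readers who want to see what a multiple-comparison correction would do to the two p-values quoted by the referee, the following is a minimal sketch of the Holm step-down procedure. It is an illustration only and not part of the reported analysis; treating the two disciplines as a family of exactly two tests, and the significance level of 0.05, are assumptions.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure.

    Returns one boolean per p-value, in the original order, that is True
    when the corresponding null hypothesis is rejected.
    """
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value is compared against alpha / (m - k).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# p-values quoted by the referee for Social Sciences and Medical Sciences.
quoted = [0.04, 0.07]
print(holm_reject(quoted))
```

      With these two values the smallest p-value (0.04) is compared against 0.05 / 2 = 0.025, so neither test would be rejected under this assumed family of two.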

      (4) More detail about the models could be included. It may be helpful to include this in each table caption so that it is clear what all the terms of the model were. For instance, I was wondering if journal or discipline are included in the models.

      We appreciate the suggestion. We’ve added model details to the figure and table captions in the manuscript and the supplemental materials.

      Reviewer #3 (Public Review):

      Summary:

      This is a strong manuscript by Basson and colleagues which contributes to our understanding of gender disparities in scientific publishing. The authors examine attitudes and behaviors related to manuscript submission in influential journals (specifically, Science, Nature and PNAS). The authors rightly note that much attention has been paid to gender disparities in work that is already published, but this fails to capture the unseen hurdles that occur prior to publication (which include decisions about where to publish, desk rejections, revisions and resubmissions, etc.). They conducted a survey study to address some of these components and their results are interesting:

      They find that women are less likely to submit their manuscript to Science, Nature or PNAS. While both men and women feel their work would be better suited for more specialized journals, women were more likely to think their work was 'less novel or groundbreaking.'

      A smaller proportion of respondents indicated that they were actively discouraged from submitting their manuscripts to these journals. In this instance, women were more likely to receive this advice than men.

      Lastly, the authors also looked at self-reported acceptance and rejection rates and found that there were no gender differences in acceptance or rejection rates.

      These data are helpful in developing strategies to mitigate gender disparities in influential journals.

      We thank the referee for their comments.

      Comments:

      The methods the authors used are appropriate for this study. The low response rate is common for this type of recruitment strategy. The authors provide a thoughtful interpretation of their data in the Discussion.

      We thank the referee for their comments.

      Reviewer #4 (Public Review):

      This manuscript covers an important topic of gender biases in the authorship of scientific publications. Specifically, it investigates potential mechanisms behind these biases, using a solid approach, based on a survey of researchers.

      Main strengths

      The topic of the MS is very relevant given that across sciences/academia representation of genders is uneven, and identified as concerning. To change this, we need to have evidence on what mechanisms cause this pattern. Given that promotion and merit in academia are still largely based on the number of publications and impact factor, one part of the gap likely originates from differences in publication rates of women compared to men.

      Women are underrepresented compared to men in journals with high impact factor. While previous work has detected this gap, as well as some potential mechanisms, the current MS provides strong evidence, based on a survey of close to 5000 authors, that this gap might be due to lower submission rates of women compared to men, rather than the rejection rates. The data analysis is appropriate to address the main research aims. The results interestingly show that there is no gender bias in rejection rates (desk rejection or overall) in three high-impact journals (Science, Nature, PNAS). However, submission rates are lower for women compared to men, indicating that gender biases might act through this pathway. The survey also showed that women are more likely to rate their work as not groundbreaking, and to be advised not to submit to prestigious journals.

      With these results, the MS has the potential to inform actions to reduce gender bias in publishing, and actions to include other forms of measuring scientific impact and merit.

      We thank the referee for their comments.

      Main weakness and suggestions for improvement

      (1) The main message/further actions: I feel that the MS fails to sufficiently emphasise the need for a different evaluation system for researchers (and their research). While we might act to support women to submit more to high-impact journals, we could also (and several initiatives do this) consider a broader spectrum of merits (e.g. see https://coara.eu/ ). Thus, I suggest more space to discuss this route in the Discussion. Also, I would suggest changing the terms that imply that prestigious journals have a better quality of research or the highest scientific impact (line 40: journals of the highest scientific impact) with terms that actually state what we definitely know (i.e. that they have the highest impact factor). I think this could broaden the impact of the MS.

      We agree with the referee. We changed the wording on impact and added a few lines on this in the discussion.

      (2) Methods: while methods are all sound, in places it is difficult to understand what has been done or measured. For example, only quite late (as far as I can find, it's in the supplement) we learn the type of authorship considered in the MS is the corresponding authorship. This information should be clear from the very start (including the Abstract).

      We performed the suggested edits.

      Second, I am unclear about the question on the perceived quality of research work. Was this quality defined for researchers, as quality can mean different things (e.g. how robust their set-up was, how important their research question was)? If researchers have different definitions of what quality means, this can cause additional heterogeneity in responses. Given that the survey cannot be repeated now, maybe this can be discussed as a limitation.

      We agree that quality can mean different things to different researchers; it probably varies by discipline, and also by gender. But that was precisely the point: whether men and women considered their “best work” to be published in a higher-impact venue. While there may be heterogeneity in those perceptions, the fact that (1) men and women rate their research at the same level and (2) we control for disciplinary differences should mitigate some of it.

      I was surprised to see that discipline was considered as a moderator for some of the analyses but not for the main analysis on the acceptance and rejection rates.

      We appreciate the attention to detail. In our analysis of acceptance and rejection rates, we conducted separate regression analyses for each discipline to capture any field-specific patterns that might otherwise be obscured.

      We added more details on this to clarify.

      I was also surprised not to see publication charges as one of the reasons asked for not submitting to selected journals. Low and middle-income countries often have more women in science but are also less likely to support high publication charges.

      That is a good point. However, both Science and Nature have subscription options, which do not require any APCs.

      Finally, academic rank was asked of respondents but was not taken as a moderator.

      Academic rank is included in the regression as a control variable (Figure 1).

      Reviewer #2 (Recommendations For The Authors):

      In addition to the points in the "Weaknesses" section of my Public Review above, I have several suggestions to improve this work.

      (1) Can you please indicate what the error bars mean in each plot? I am assuming that they are 95% confidence intervals.

      We appreciate the attention to detail. Yes, they are 95% confidence intervals. We’ve clarified this in the captions of the corresponding figures. 

      (2) Can you provide a more detailed explanation for why the 7 journals were separated? I see that on page 3 of the supporting information you write that "Due to limited responses, analysis per journal was not always viable. The results pertaining to the journals were aggregated, with new categories based on the shared similarities in disciplinary foci of the journals and their prestige." Specifically, why did you divide the data into (somewhat arbitrary) categories as opposed to using all the data and including a journal term in your model?

      The survey covered 7 journals:

      • Science, Nature, and PNAS (S.N.P.)

      • Nature Communications and Science Advances (NC.SA.)

      • NEJM and Cell (NEJM.C.)

      We believe that the first three are a class of their own: they cover all fields (while NEJM and Cell are limited to (bio)medical sciences), and have a much higher symbolic capital than both Nature Comms and Science Advances (which are receiving cascading papers from Nature and Science, respectively). We believe that factors leading to submission to S.N.P. are much different than those leading to submission to the other groups of journals, which is why we separated the analysis in that manner.

      (3) You included random effects for linear regression but not for logistic regression. Please justify this choice or include additional logistic regression models with random effects.

      We used mixed-effect models for linear regressions (where number of submissions, acceptance rate, or rejection rate is the dependent variable). As mentioned in the previous comment, we tested using rank as the control variable and found it had a potential impact on the variables we analyzed using linear regressions in some disciplines. Therefore, we introduced it as a random effect for all the linear regression models.

      Reviewer #3 (Recommendations For The Authors):

      The limitations of this work are currently described in the Supplement. It may be helpful to bring several of these items into the Discussion so that they can be addressed more prominently.

      Added content

      Reviewer #4 (Recommendations For The Authors):

      (1) Line 40: add 'as leading authors of papers published in' before ' 'journals'

      Done

      (2) Explain what the direction in the ' relationship between' line 62 is

      Added

      (3) Lines 101-102 - this is a bit unclear. Please, provide some more info, also including what did these studies find.

      Added

      (4) Is 'sociodemographic' the best term in line 120

      Yes, we believe so.

      (5) Results would benefit from a short intro with the info on the number of respondents, also by gender.

      Those are present at the end of the intro (and in the methods, at the end). We nonetheless added gender.

      (6) Line 134: add how many women and men submitted to Science, Nature, and PNAS

      Added. In all disciplines combined, 552 women and 1,583 men ever submitted to these three elite journals. More details can be found in SI Table 9.

      (7) Add 'Self-' before reported, line 141

      Added

      (8) Add sample sizes to Figs 1 and 2

      Those are in the appendix

      (9) Line 168 - unclear if this is ever or as their first choice

      We do not distinguish between the two; the question is whether they considered it at all.

      (10) Add sample size in line 177

      Added. 480 women and 1404 men across all disciplines reported desk rejections by S.N.P. journals.

      (11) I would like to see some discussion on the fact that the highest citation paper will also be a paper that the authors have submitted earlier in their careers given that citations will pile up over time.

      Those are actually quite evenly distributed. We modified the supplementary materials.

      (12) Data availability - be clear that supporting info contains only summary data. Also, while the Data availability statement refers to de-identified data on Github, the Github page only contains the code, and the note that 'The STAT code used for our analyses is shared. We are unable to share the survey response details publicly per IRB protocols.' Why were de-identified data not shared? This is extremely important to allow for the reproducibility of MS results. I would also suggest sharing data in a trusted repository (e.g. Dryad, ZENODO...) rather than on Github, as per current recommendations on the best practices for data sharing.

      Thank you for your careful reading and for highlighting the importance of clear data availability. We will revise our Data Availability Statement to explicitly state that the supporting information contains only summary data and that the complete analysis code is available on GitHub.

      We understand the importance of sharing de-identified data for reproducibility. However, our IRB strictly prohibits the sharing of any individual-level data, including de-identified files, to protect participant confidentiality. Consequently, the summary data included in the supporting information, together with the provided code, is intended to facilitate the verification of our core findings. Our previous statement regarding “de-identified” data sharing was inaccurate and thus has been removed. We apologize for the confusion.

      In light of your suggestion, we are also exploring depositing the summary data and code in a trusted repository (e.g., Dryad or Zenodo) to further align with current best practices for data sharing.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank the editor and reviewers for their thoughtful evaluations. We would like to clarify that the revised manuscript does not make a general claim about the absence of ripple-associated synchronous population activity. Rather, we report only that the synchronous ensembles observed in our data were not associated with contralateral ripple oscillations. This distinction is clearly reflected in the revised Title, Abstract, Introduction, Results, and Discussion. We also explicitly acknowledged the methodological limitation of recording LFP from the contralateral side of the hippocampus.

      To further improve clarity and prevent potential misinterpretation, we are submitting a revised version (R4) in which we:

      (1) Replace the word "surprisingly" with the more neutral "Moreover";

      (2) Refer to ripple events consistently as "contralateral ripples (c-ripples)";

      (3) Expand the discussion of limitations inherent to contralateral LFP recordings.

      Additionally, while Buzsaki et al. (2003) wrote that "These findings suggest ripples emerge locally and independently in the two hemispheres", the same study also presents data and reports that "Ripple episodes occurred simultaneously in the left and right CA1 regions" (p. 206). Our original citation was intended to reflect this nuance. Nevertheless, to avoid any potential misinterpretation, we have removed the co-occurrence statement with its associated citations in the revised (R4) manuscript.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      For many years, there has been extensive electrophysiological research investigating the relationship between local field potential patterns and individual cell spike patterns in the hippocampus. In this study, using state-of-the-art imaging techniques, they examined spike synchrony of hippocampal cells during locomotion and immobility states. In contrast to conventional understanding of the hippocampus, the authors demonstrated that hippocampal place cells exhibit prominent synchronous spikes locked to theta oscillations.

      Strengths:

      The voltage imaging used in this study is a highly novel method that allows recording not only suprathreshold-level spikes but also subthreshold-level activity. With its high frame rate, it offers time resolution comparable to electrophysiological recordings.

      Comments on revisions: I have no further comments.

      We thank the reviewer for constructive reviews and for recognizing the strength of our study.

      Reviewer #2 (Public review):

      Summary:

      This study employed voltage imaging in the CA1 region of the mouse hippocampus during the exploration of a novel environment. The authors report that synchronous activity, involving almost half of the imaged neurons, occurred during periods of immobility. These events did not correlate with SWRs, but instead occurred during theta oscillations and were phase-locked to the trough of theta. Moreover, pairs of neurons with high synchronization tended to display non-overlapping place fields, leading the authors to suggest these events may play a role in binding a distributed representation of the context.

      Strengths:

      Technically this is an impressive study, using an emerging approach that allows single cell resolution voltage imaging in animals, that while head-fixed, can move through a real environment. The paper is written clearly and suggests novel observations about population level activity in CA1.

      Comments on revisions:

      I have no further major requests and thank the authors for the additional data and analyses.

      We thank the reviewer for recognizing the strength of our study and for appreciating the additional data and analyses we provided during the revision process.

      Reviewer #3 (Public review):

      Summary:

      In the present manuscript, the authors use a few minutes of voltage imaging of CA1 pyramidal cells in head-fixed mice running on a track while local field potentials (LFPs) are recorded. The authors suggest that synchronous ensembles of neurons are differentially associated with different types of LFP patterns, theta and ripples. The experiments are flawed in that the LFP is not "local" but rather collected from the other side of the brain.

      Strengths:

      The authors use a cutting-edge technique.

      Weaknesses:

      Although the authors have toned down their claims, the statement in the title ("Synchronous Ensembles of Hippocampal CA1 Pyramidal Neurons Associated with Theta but not Ripple Oscillations During Novel Exploration") is still unsupported.

      One could write the same title while voltage imaging one mouse and recording LFP from another mouse.

      To properly convey the results, the title should be modified to read

      "Synchronous Ensembles of Hippocampal CA1 Pyramidal Neurons Associated with Contralateral Theta but not with Contralateral Ripple Oscillations During Novel Exploration"

      Without making this change, the title - and therefore the entire work - is misleading at best.

      We thank the reviewer for the thoughtful and constructive suggestion regarding the title. We fully understand the concern that our original title may have overstated the specificity of the contralateral LFP recordings, potentially allowing for misinterpretation.

      In our results, synchronous ensembles are associated with intracellular theta oscillations recorded from the ipsilateral hippocampus and with extracellular theta, but not ripple, oscillations recorded from the contralateral hippocampus. To clarify this distinction and minimize the potential for misinterpretation, we have revised the abstract accordingly.

      Abstract (line 18):

      “… Notably, these synchronous ensembles were not associated with contralateral ripple oscillations but were instead phase-locked to theta waves recorded in the contralateral CA1 region. Moreover, the subthreshold membrane potentials of neurons exhibited coherent intracellular theta oscillations with a depolarizing peak at the moment of synchrony.”

      Based on this, we propose the following revised title, which we believe more effectively communicates the central finding of our study: 

      “Synchronous Ensembles of Hippocampal CA1 Pyramidal Neurons During Novel Exploration”. 

      Compared to the reviewer’s suggested title, this version offers a clearer and more concise summary of our findings while allowing important methodological details to be fully conveyed in the abstract and main text. While the suggested title accurately reflects the source of the LFP signals, it does not mention the intracellular theta oscillations recorded from the ipsilateral hippocampus, which are a critical part of our results. Including both the intracellular and extracellular recording contexts in the title would make it overly long and potentially less accessible to readers. In contrast, the revised title succinctly captures the core phenomenon, and the updated abstract now explicitly clarifies the relationship between the synchronous ensembles and both types of oscillatory signals. 

      We sincerely appreciate the reviewer’s input, which helped us refine both the language and the presentation of our findings. We hope these changes address the concern and clarify the scope of our work. 

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      (1) Change the title. Although the authors have toned down their claims, the statement in the title ("Synchronous Ensembles of Hippocampal CA1 Pyramidal Neurons Associated with Theta but not Ripple Oscillations During Novel Exploration") is still unsupported. One could write the same title while voltage imaging one mouse and recording LFP from another mouse. To properly convey the results, the title should be modified to read

      "Synchronous Ensembles of Hippocampal CA1 Pyramidal Neurons Associated with Contralateral Theta but not with Contralateral Ripple Oscillations During Novel Exploration"

      Without making this change, the title - and therefore the entire work - is misleading at best. But if you can manage that (and attend to comment #2 below), then the manuscript would not be making any false statements.

      Please see our reply in the public review above.

      (2) Report the exact locations of the contralateral recording electrodes. In their rebuttal, the authors supply a figure ("Author response image 1") in which they show damage to the neocortex and fluorescence signal in the CA1 pyramidal cell layer. This is useful, but it is unclear from which animal this histology was generated.

      Please include this (or another similar) photograph in Figure 1B, right next to the voltage imaging photograph. Indicate from which animal each photograph was obtained - ideally, provide the two photographs from the same animal. Second, please include such paired photographs - along with paired signals - for every animal that you are able to.

      If you can manage that, it will add credibility to the statement that the recordings are indeed from the contralateral CA1 pyramidal cell layer (as opposed to from the contralateral hemisphere).

      We thank the reviewer for this important point. We have followed the suggestion and now provide paired photographs showing LFP electrode tracks and voltage images from the same animal (see revised Figure 1B).

      In addition, we have included similar paired photographs for additional animals used in this study (see Figure 1-figure supplement 1).

      These updates directly support the claim that LFP recordings were obtained from the contralateral CA1 pyramidal layer, rather than from the contralateral hemisphere. We sincerely thank the reviewer for the valuable suggestion, which has substantially strengthened our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Functional lateralization between the right and left hemispheres is reported widely in animal taxa, including humans. However, it remains largely speculative as to whether the lateralized brains have a cognitive gain or a sort of fitness advantage. In the present study, by making use of the advantages of domestic chicks as a model, the authors are successful in revealing that the lateralized brain is advantageous in the number sense, in which numerosity is associated with spatial arrangements of items. Behavioral evidence is strong enough to support their arguments. Brain lateralization was manipulated by light exposure during the terminal phase of incubation, and the left-to-right numerical representation appeared when the distance between items gave a reliable spatial cue. The light-exposure induced lateralization, though quite unique in avian species, together with the lack of intense inter-hemispheric direct connections (such as the corpus callosum in the mammalian cerebrum), was critical for the successful analysis in this study. Specification of the responsible neural substrates in the presumed right hemisphere is expected in future research. Comparable experimental manipulations in the mammalian brain must also be developed to address this general question (the functional significance of brain laterality).

      We sincerely appreciate the Reviewer's insightful feedback and his/her recognition of the key contributions of our study.

      Reviewer #2 (Public review):

      Summary:

      This is the first study to show how an L-R bias in the relationship between numerical magnitude and space depends on brain lateralisation, and moreover, how it is modulated by in ovo conditions.

      Strengths:

      Novel methodology for investigating the innateness and neural basis of an L-R bias in the relationship between number and space.

      We would like to thank the Reviewer for their valuable feedback and for highlighting the key contributions of our study.

      Weaknesses:

      I would query the way the experiment was contextualised. They ask whether culture or innate pre-wiring determines the 'left-to-right orientation of the MNL [mental number line]'.

      We thank the Reviewer for raising this point, which has allowed us to provide a more detailed explanation of this aspect. Rather than framing the left-to-right orientation of the mental number line (MNL) as exclusively determined by either cultural influences or innate pre-wiring, our study highlights the role of environmental stimulation. Specifically, prenatal light exposure can shape hemispheric specialization, which in turn contributes to spatial biases in numerical processing. Please see lines 115-118.

      The term, 'Mental Number Line' is an inference from experimental tasks. One of the first experimental demonstrations of a preference or bias for small numbers in the left of space and larger numbers in the right of space, was more carefully described as the spatial-numerical association of response codes - the SNARC effect (Dehaene, S., Bossini, S., & Giraux, P. (1993). The mental representation of parity and numerical magnitude. Journal of Experimental Psychology: General, 122, 371-396).

      We have refined our description of the MNL and SNARC effect to ensure conceptual accuracy in the revised manuscript; please see lines 53-59.

      This has meant that the background to the study is confusing. First, the authors note, correctly, that many other creatures, including insects, can show this bias, though in none of these has neural lateralisation been shown to be a cause. Second, their clever experiment shows that an experimental manipulation creates the bias. If it were innate and common to other species, the experimental manipulation shouldn't matter. There would always be an L-R bias. Third, they seem to be asserting that humans have a left-to-right (L-R) MNL. This is highly contentious, and in some studies, reading direction affects it, as the original study by Dehaene et al showed; and in others, task affects direction (e.g. Bachtold, D., Baumüller, M., & Brugger, P. (1998). Stimulus-response compatibility in representational space. Neuropsychologia, 36, 731-735, not cited). Moreover, a very careful study of adult humans found no L-R bias (Karolis, V., Iuculano, T., & Butterworth, B. (2011), not cited, Mapping numerical magnitudes along the right lines: Differentiating between scale and bias. Journal of Experimental Psychology: General, 140(4), 693-706). Indeed, Rugani et al claim, incorrectly, that the L-R bias was first reported by Galton in 1880. There are two errors here: first, Galton was reporting what he called 'visualised numerals', which are typically referred to now as 'number forms' - spontaneous and habitual conscious visual representations - not an inference from a number line task. Second, Galton reported right-to-left, circular, and vertical visualised numerals, and no simple left-to-right examples (Galton, F. (1880). Visualised numerals. Nature, 21, 252-256.). So in fact did Bertillon, J. (1880). De la vision des nombres. La Nature, 378, 196-198, and more recently Seron, X., Pesenti, M., Noël, M.-P., Deloche, G., & Cornet, J.-A. (1992). Images of numbers, or "When 98 is upper left and 6 sky blue". Cognition, 44, 159-196, and Tang, J., Ward, J., & Butterworth, B. (2008). Number forms in the brain. Journal of Cognitive Neuroscience, 20(9), 1547-1556.

      We sincerely appreciate the opportunity to discuss numerical spatialization in greater detail. We have clarified that an innate predisposition to spatialize numerosity does not necessarily exclude the influence of environmental stimulation and experience. We have proposed an integrative perspective, incorporating both cultural and innate factors, suggesting that numerical spatialization originates from neural foundations while remaining flexible and modifiable by experience and contextual influences. Please see lines 69–75.

      We have incorporated the Reviewer’s suggestions and cited all the recommended papers; please see lines 47–75.

      If the authors are committed to chicks' MN Line they should test a series of numbers showing that the bias to the left is greater for 2 and 3 than for 4, etc.

      What does all this mean? I think that the paper should be shorn of its misleading contextualisation, including the term 'Mental Number Line'. The authors also speculate, usefully, on why chicks and other species might have a L-R bias. I don't think the speculations are convincing, but at least if there is an evolutionary basis for the bias, it should at least be discussed.

      In the revised version of the manuscript, we have adopted the term Spatial Numerical Association (SNA) in place of 'Mental Number Line'. We thank the Reviewer for this valuable comment.

      We appreciated the Reviewer’s suggestion regarding the evolutionary basis of lateralization and have included considerations of its relevance in chicks and other species; please see lines 143-151 and 381-386.

      This paper is very interesting with its focus on why the L-R bias exists, and where and why it does not.

      We wish to thank the Reviewer again for his/her work.

      Reviewer #1 (Public review)

      (1) Introduction needs to be edited to make it much more concise and shorter. Hypotheses (from line 67 to 81) and predictions (from line 107 to 124) must be thoroughly rephrased, because (a) general readers are not familiar with the hypotheses (emotional valence and BAFT), (b) the hypotheses may or may not be mutually exclusive, and therefore (c) the logical linkage between the hypotheses and the predicted results are not necessarily clear. Most general readers may be embarrassed by the apparently complicated logical constructs of this study. Instead, it is recommended that focal spotlight should be given to the issue of functional contributions of brain lateralization to the cognitive development of number sense.

      We thank the Reviewer for these comments, which allowed us to improve the clarity of our hypotheses and predictions. We thoroughly rephrased them to ensure they are accessible to general readers and specified that the models may or may not be mutually exclusive. Additionally, we highlighted the functional contributions of brain lateralization to the cognitive development of number sense, addressing the suggested focal point. While we have shortened the introduction, we opted to retain essential background information to ensure readers are well-informed about the relevant scientific literature. Please review the entire introduction, particularly lines 84–118 and 218.

      (2) In relation to the above (a), abbreviations need to be reexamined. MNL (mental number line) appears early on lines 27 and 49, whereas the possibly related conceptual term SNA appeared first on line 213, without specification to "spatial numerical association".

      We thank the Reviewer for bringing this to our attention. We have addressed the suggestions, and the term SNA has been used specifically to refer to numerical spatialization in non-human animals. Please see lines 27-30.

      (3) By the way, what difference is there between MNL and SNA? Please specify the difference if it is important. If not important, is it possible that one of these two is consistently used in this report, at least in the Introduction?

      We clarified the distinction between MNL and SNA and have consistently used SNA in this report; please see lines 47-75.

      (4) In relation to the above (a and b), clarification of the hypotheses and their abbreviations in the form of a table or a graphical representation will strongly reinforce the general readers' understanding. It is also possible that some of these hypotheses are discussed later in the Discussion, rather than in Introduction.

      We appreciated this suggestion and have now clarified the hypotheses, also providing a table/graphical representation, aiming to enhance accessibility for general readers; please see lines 110-118, and 218.

      (5) Figures 1 and 2 are transparent and easily understandable; however, the statistical details in the Results may bother the readers as the main points are doubly represented in Figures 1, 2, and Table 1. These (statistics and Table 1) may go to the supplementary file, if the editor agrees.

      We would prefer to keep Table 1 and the statistical details as part of the main article to provide readers with a comprehensive overview of the experimental results. However, if the editors also suggest moving them to the supplementary file, we are open to making this adjustment.

      (6) In Figure 1D and E, and text lines 139-140. Figure 1D shows that the chick is looking monocularly by the right eye, but the text (line 139) says "left eye in use". Is it correct?

      We thank the reviewer for pointing out this incongruity. We have corrected the text to align with Figure 1D and E; please see lines 180-181.

      (7) Methods. The behavioral experiment was initiated on Wednesday (8 a.m.; line 479), but at what age? At what post-hatch day was the experiment terminated? A simple graphical illustration of the schedule will be quite helpful.

      We have added the requested details, specifying that experiments began on the third post-hatch day and ended on the fifth day; please see lines 533-539.

      Additionally, we have included a graphical illustration of the schedule to enhance clarity; please see line 666.  

      (8) Methods. How many chicks were excluded from the study in the course of Pre-training (line 525) and Training (line 535-536)? Was the exclusion rate high, or just negligible?

      We appreciate the reviewer's suggestion. We have now included the number of subjects excluded during the training phase; please see lines 593-597.

      We wish to thank the Reviewer again for his/her work.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This work integrates two timepoints from the Adolescent Brain Cognitive Development (ABCD) Study to understand how neuroimaging, genetic, and environmental data contribute to the predictive power of mental health variables in predicting cognition in a large early adolescent sample. Their multimodal and multivariate prediction framework involves a novel opportunistic stacking model to handle complex types of information to predict variables that are important in understanding mental health-cognitive performance associations. 

      Strengths: 

      The authors are commended for incorporating and directly comparing the contribution of multiple imaging modalities (task fMRI, resting state fMRI, diffusion MRI, structural MRI), neurodevelopmental markers, environmental factors, and polygenic risk scores in a novel multivariate framework (via opportunistic stacking), as well as interpreting mental health-cognition associations with latent factors derived from partial least squares. The authors also use a large well-characterized and diverse cohort of adolescents from the ABCD Study. The paper is also strengthened by commonality analyses to understand the shared and unique contribution of different categories of factors (e.g., neuroimaging vs mental health vs polygenic scores vs sociodemographic and adverse developmental events) in explaining variance in cognitive performance 

      Weaknesses: 

      The paper is framed with an over-reliance on the RDoC framework in the introduction, despite deviations from the RDoC framework in the methods. The field is also learning more about RDoC's limitations when mapping cognitive performance to biology. The authors also focus on a single general factor of cognition as the core outcome of interest as opposed to different domains of cognition. The authors could consider predicting mental health rather than cognition. Using mental health as a predictor could be limited by the included 9-11 year age range at baseline (where many mental health concerns are likely to be low or not well captured), as well as the nature of how the data was collected, i.e., either by self-report or from parent/caregiver report. 

      Thank you so much for your encouragement.

      We appreciate your comments on the strengths of our manuscript.

      Regarding the weaknesses, the reliance on the RDoC framework is by design. Even with its limitations, following RDoC allows us to investigate mental health holistically. In our case, RDoC enabled us to focus on a) a functional domain (i.e., cognitive ability), b) the biological units of analysis of this functional domain (i.e., neuroimaging and polygenic scores), c) potential contribution of environments, and d) the continuous individual deviation in this domain (as opposed to distinct categories). We are unaware of any framework with all these four features.

      Focusing on modelling biological units of analysis of a functional domain, as opposed to mental health per se, has some empirical support from the literature. For instance, in Marek and colleagues’ (2022) study, as mentioned by a previous reviewer, fMRI is shown to have a more robust prediction for cognitive ability than mental health. Accordingly, our reasons for predicting cognitive ability instead of mental health in this study are motivated theoretically (i.e., through RDoC) and empirically (i.e., through fMRI findings). We have clarified this reason in the introduction of the manuscript.

      We are aware of the debates surrounding the actual structure of functional domains where the originally proposed RDoC’s specific constructs might not fit the data as well as the data-driven approach (Beam et al., 2021; Quah et al., 2025). However, we consider this debate as an attempt to improve the characterisation of functional domains of RDoC, not an effort to invalidate its holistic, neurobiological and basic-functioning approach. Our use of a latent-variable modelling approach through factor analyses moves towards a data-driven direction. We made the changes to the second-to-last paragraph in the introduction to make this point clear:

      “In this study, inspired by RDoC, we a) focused on cognitive abilities as a functional domain, b) created predictive models to capture the continuous individual variation (as opposed to distinct categories) in cognitive abilities, c) computed two neurobiological units of analysis of cognitive abilities: multimodal neuroimaging and PGS, and d) investigated the potential contributions of environmental factors. To operationalise cognitive abilities, we estimated a latent variable representing behavioural performance across various cognitive tasks, commonly referred to as general cognitive ability or the g-factor (Deary, 2012). The g-factor was computed from various cognitive tasks pertinent to RDoC constructs, including attention, working memory, declarative memory, language, and cognitive control. However, using the g-factor to operationalise cognitive abilities caused this study to diverge from the original conceptualisation of RDoC, which emphasises studying separate constructs within cognitive abilities (Morris et al., 2022; Morris & Cuthbert, 2012). Recent studies suggest an improvement to the structure of functional domains by including a general factor, such as the g-factor, in the model, rather than treating each construct separately (Beam et al., 2021; Quah et al., 2025). The g-factor in children is also longitudinally stable and can forecast future health outcomes (Calvin et al., 2017; Deary et al., 2013). Notably, our previous research found that neuroimaging predicts the g-factor more accurately than predicting performance from separate individual cognitive tasks (Pat et al., 2023). Accordingly, we decided to conduct predictive models on the g-factor while keeping the RDoC’s holistic, neurobiological, and basic-functioning characteristics.”

      Reviewer #2 (Public review):

      Summary: 

      This paper by Wang et al. uses rich brain, behaviour, and genetics data from the ABCD cohort to ask how well cognitive abilities can be predicted from mental-health-related measures, and how brain and genetics influence that prediction. They obtain an out-of-sample correlation of 0.4, with neuroimaging (in particular task fMRI) proving the key mediator. Polygenic scores contributed less. 

      Strengths: 

      This paper is characterized by the intelligent use of a superb sample (ABCD) alongside strong statistical learning methods and a clear set of questions. The outcome - the moderate level of prediction between the brain, cognition, genetics, and mental health - is interesting. Particularly important is the dissection of which features best mediate that prediction and how developmental and lifestyle factors play a role. 

      Thank you so much for the encouragement. 

      Weaknesses: 

      There are relatively few weaknesses to this paper. It has already undergone review at a different journal, and the authors clearly took the original set of comments into account in revising their paper. Overall, while the ABCD sample is superb for the questions asked, it would have been highly informative to extend the analyses to datasets containing more participants with neurological/psychiatric diagnoses (e.g. HBN, POND) or extend it into adolescent/early adult onset psychopathology cohorts. But it is fair enough that the authors want to leave that for future work. 

      Thank you very much for providing this valuable comment and for your flexibility.

      For the current manuscript, we have drawn inspiration from the RDoC framework, which emphasises the variation from normal to abnormal in normative samples (Morris et al., 2022). The ABCD samples align well with this framework.

      We hope to extend this framework to include participants with neurological and psychiatric diagnoses in the future. We have begun applying neurobiological units of analysis for cognitive abilities, assessed through multimodal neuroimaging and polygenic scores (PGS), to other datasets containing more participants with neurological and psychiatric diagnoses. However, this is beyond the scope of the current manuscript. We have listed this as one of the limitations in the discussion section:

      “Similarly, our ABCD samples were young and community-based, likely limiting the severity of their psychopathological issues (Kessler et al., 2007). Future work needs to test if the results found here are generalisable to adults and participants with stronger severity.”

      In terms of more practical concerns, much of the paper relies on comparing r or R2 measures between different tests. These are always presented as point estimates without uncertainty. There would be some value, I think, in incorporating uncertainty from repeated sampling to better understand the improvements/differences between the reported correlations. 

      This is a good suggestion. We have now included bootstrapped 95% confidence intervals in all of our scatter plots, showing the uncertainty of predictive performance.
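      The kind of interval described above can be illustrated with a percentile bootstrap over participants. The following is only a minimal sketch under stated assumptions, not the authors' pipeline: the data are simulated, and the function names (`pearson_r`, `bootstrap_ci`), the number of resamples and the seed are all hypothetical choices made for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def bootstrap_ci(observed, predicted, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for r, resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(observed)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # one resampled set of participants
        stats.append(pearson_r([observed[i] for i in idx],
                               [predicted[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Simulated (not real) observed and predicted scores for 300 held-out participants.
sim = random.Random(1)
observed = [sim.gauss(0, 1) for _ in range(300)]
predicted = [0.5 * o + sim.gauss(0, 1) for o in observed]

r = pearson_r(observed, predicted)
low, high = bootstrap_ci(observed, predicted)
print(f"r = {r:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

      Resampling whole participants (pairs of observed and predicted values) keeps each pair intact, so the width of the interval reflects sampling variability in the test set rather than any refitting of the model.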

      The focus on mental health in a largely normative sample leads to the predictions being largely based on the normal range. It would be interesting to subsample the data and ask how well the extremes are predicted. 

      We appreciate this comment. Similar to our response to Reviewer 2’s Weakness #1, our approach has drawn inspiration from the RDoC framework, which emphasises the variation from normal to abnormal in normative samples (Morris et al., 2022). Subsampling the data would make us deviate from our original motivation. 

      Moreover, we used 17 mental health variables in our predictive models: 8 CBCL subscales, 4 BIS/BAS subscales and 5 UPPS subscales. It is difficult to subsample them. Perhaps a better approach is to test the applicability of our neurobiological units of analysis for cognitive abilities (multimodal neuroimaging and PGS) in other datasets that include more extreme samples. We are working on this line of studies at the moment, and hope to show that in our future work. 

      Reviewer 2’s Weakness #4

      A minor query - why are only cortical features shown in Figure 3? 

      We presented both cortical and subcortical features in Figure 3. The cortical features are shown on the surface space, while the subcortical features are displayed on the coronal plane. Below is an example of these cortical and subcortical features from the ENBack contrast. The subcortical features are presented in the far-right coronal image.

      We separated the presentation of cortical and subcortical features because the ABCD uses the CIFTI format (https://www.humanconnectome.org/software/workbench-command/-cifti-help). CIFTI-format images combine cortical surface (in vertices) with subcortical volume (in voxels). For task fMRI, the ABCD parcellated cortical vertices using Freesurfer’s Destrieux atlas and subcortical voxels using Freesurfer’s automatically segmented brain volume (ASEG).

      Due to the size of the images in Figure 3, it may have been difficult for Reviewer 2 to see the subcortical features clearly. We have now added zoomed-in versions of this figure as Supplementary Figures 4–13.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the abstract, could the authors mention which imaging modalities contribute most to the prediction of cognitive abilities (e.g., working memory-related task fMRI)? 

      Thank you for the suggestion. Following this advice, we now mention which imaging modalities led to the highest predictive performance. Please see the abstract below.

      “Cognitive abilities are often linked to mental health across various disorders, a pattern observed even in childhood. However, the extent to which this relationship is represented by different neurobiological units of analysis, such as multimodal neuroimaging and polygenic scores (PGS), remains unclear. 

      Using large-scale data from the Adolescent Brain Cognitive Development (ABCD) Study, we first quantified the relationship between cognitive abilities and mental health by applying multivariate models to predict cognitive abilities from mental health in children aged 9-10, finding an out-of-sample r = .36. We then applied similar multivariate models to predict cognitive abilities from multimodal neuroimaging, polygenic scores (PGS) and environmental factors. Multimodal neuroimaging was based on 45 types of brain MRI (e.g., task fMRI contrasts, resting-state fMRI, structural MRI, and diffusion tensor imaging). Among these MRI types, the fMRI contrast, 2-Back vs. 0-Back, from the ENBack task provided the highest predictive performance (r = .4). Combining information across all 45 types of brain MRI led to the predictive performance of r = .54. The PGS, based on previous genome-wide association studies on cognitive abilities, achieved a predictive performance of r = .25. Environmental factors, including socio-demographics (e.g., parent’s income and education), lifestyles (e.g., extracurricular activities, sleep) and developmental adverse events (e.g., parental use of alcohol/tobacco, pregnancy complications), led to a predictive performance of r = .49. 

      In a series of separate commonality analyses, we found that the relationship between cognitive abilities and mental health was primarily represented by multimodal neuroimaging (66%) and, to a lesser extent, by PGS (21%). Additionally, environmental factors accounted for 63% of the variance in the relationship between cognitive abilities and mental health. The multimodal neuroimaging and PGS then explained 58% and 21% of the variance due to environmental factors, respectively. Notably, these patterns remained stable over two years. 

      Our findings underscore the significance of neurobiological units of analysis for cognitive abilities, as measured by multimodal neuroimaging and PGS, in understanding both a) the relationship between cognitive abilities and mental health and b) the variance in this relationship shared with environmental factors.”
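      For readers unfamiliar with commonality analysis, the two-predictor case can be sketched as follows. This is an illustrative sketch only: the R² values are made up, the function name `commonality_two_sets` is hypothetical, and it assumes that the reported percentages express the shared variance as a fraction of the variance explained by mental health alone, which this response does not spell out.

```python
def commonality_two_sets(r2_a, r2_b, r2_ab):
    """Partition explained variance for two predictor sets A and B.

    r2_a  : R-squared of the model with set A only
    r2_b  : R-squared of the model with set B only
    r2_ab : R-squared of the model with both sets
    """
    unique_a = r2_ab - r2_b         # variance only A explains
    unique_b = r2_ab - r2_a         # variance only B explains
    common = r2_a + r2_b - r2_ab    # variance A and B explain jointly
    return {"unique_a": unique_a, "unique_b": unique_b, "common": common}


# Made-up values: A = mental health proxy, B = neuroimaging proxy.
r2_mental_health = 0.13
r2_neuroimaging = 0.29
r2_both = 0.33

parts = commonality_two_sets(r2_mental_health, r2_neuroimaging, r2_both)

# Share of the mental health-cognition relationship also captured by neuroimaging.
shared_fraction = parts["common"] / r2_mental_health
print(parts, round(shared_fraction, 2))
```

      The three components always sum to the R² of the combined model, which is a quick sanity check on any reported partition.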

      (2) Could the authors clarify what they mean by "completing the transdiagnostic aetiology of mental health" in the introduction? (Second paragraph). 

      Thank you. 

      We intended to convey that understanding the transdiagnostic aetiology of mental health would be enhanced by knowing how neurobiological units of cognitive abilities, from the brain to genes, capture variations due to environmental factors. We realise this sentence might be confusing. Removing it does not alter the intended meaning of the paragraph, as we clarified this point later. The paragraph now reads:

      “According to the National Institute of Mental Health’s Research Domain Criteria (RDoC) framework (Insel et al., 2010), cognitive abilities should be investigated not only behaviourally but also neurobiologically, from the brain to genes. It remains unclear to what extent the relationship between cognitive abilities and mental health is represented in part by different neurobiological units of analysis -- such as neural and genetic levels measured by multimodal neuroimaging and polygenic scores (PGS). To fully comprehend the role of neurobiology in the relationship between cognitive abilities and mental health, we must also consider how these neurobiological units capture variations due to environmental factors, such as sociodemographics, lifestyles, and childhood developmental adverse events (Morris et al., 2022). Our study investigated the extent to which a) environmental factors explain the relationship between cognitive abilities and mental health, and b) cognitive abilities at the neural and genetic levels capture these associations due to environmental factors. Specifically, we conducted these investigations in a large normative group of children from the ABCD study (Casey et al., 2018). We chose to examine children because, while their emotional and behavioural problems might not meet full diagnostic criteria (Kessler et al., 2007), issues at a young age often forecast adult psychopathology (Reef et al., 2010; Roza et al., 2003). Moreover, the associations among different emotional and behavioural problems in children reflect transdiagnostic dimensions of psychopathology (Michelini et al., 2019; Pat et al., 2022), making children an appropriate population to study the transdiagnostic aetiology of mental health, especially within a framework that emphasises normative variation from normal to abnormal, such as the RDoC (Morris et al., 2022).”

      (3) It is unclear to me what the authors mean by this statement in the introduction: "Note that using the word 'proxy measure' does not necessarily mean that the predictive model for a particular measure has a high predictive performance - some proxy measures have better predictive performance than others". 

      We added this sentence to address a previous reviewer’s comment: “The authors use the phrasing throughout 'proxy measures of cognitive abilities' when they discuss PRS, neuroimaging, sociodemographics/lifestyle, and developmental factors. Indeed, the authors are able to explain a large proportion of variance with different combinations of these measures, but I think it may be a leap to call all of these proxy measures of cognition. I would suggest keeping the language more objective and stating these measures are associated with cognition.” 

      Because of this comment, we assumed that the reviewers wanted us to avoid the misinterpretation that a proxy measure implies high predictive performance. This term is used in machine learning literature (for instance, Dadi et al., 2021). We added the aforementioned sentence to ensure readers that using the term 'proxy measure' does not necessarily mean that the predictive model for a particular measure has high predictive performance. However, it seems that our intention led to an even more confusing message. Therefore, we decided to delete that sentence but keep an earlier sentence that explains the meaning of a proxy measure (see below).

      “With opportunistic stacking, we created a ‘proxy’ measure of cognitive abilities (i.e., predicted value from the model) at the neural unit of analysis using multimodal neuroimaging.”

      (4) Overall, despite comments from reviewers at another journal, I think the authors still refer to RDoC more than needed in the intro given the restructuring of the manuscript. For instance, at the end of page 4 and top of page 5, it becomes a bit confusing when the authors mention how they deviated from the RDoC framework, but their choice of cognitive domains is still motivated by RDoC. I think the chosen cognitive constructs are consistent with what is in ABCD and what other studies have incorporated into the g factor and do not require the authors to further justify their choice through RDoC. Also, there is emerging work showing that RDoC is limited in its ability to parse apart meaningful neuroimaging-based patterns; see for instance, Quah et al., Nature Communications 2025 (https://doi.org/10.1038/s41467-025-55831-z). 

      Thank you very much for your comment. We have addressed it in our Response to Reviewer 1’s summary, strengths, and weaknesses above. We have rewritten the paragraph to clarify the relevance of our work to the RDoC framework and to recent studies aiming to improve RDoC constructs (including that from Quah and colleagues).

      (5) I am still on the fence about the use of 'proxy measures of cognitive abilities' given that it is defined as the predictive performance of mental health measures in predicting cognition - what about just calling these mental health predictors? Also, it would be easier to follow this train of thought throughout the manuscript. But I leave it to the authors if they decide to keep their current language of 'proxy measure of cognition'. 

      Thank you so much for your flexibility. As we explained previously, the term ‘proxy measures’ is used in the machine learning literature (for instance, Dadi et al., 2021). We considered other terms, such as “score”, which is used in genetics, i.e., polygenic scores (Choi et al., 2020), and has recently been used in neuroimaging, i.e., neuroscore (Rodrigue et al., 2024). However, using ‘score’ is a bit awkward for mental health, socio-demographics, lifestyle and developmental adverse events. Accordingly, we decided to keep the term ‘proxy measures’.

      (6) It is unclear which cognitive abilities are being predicted in Figure 1, given the various domains that authors describe in their intro. Is it the g-factor from CFA? This should be clarified in all figure captions. 

      Yes, cognitive abilities are operationalised using a second-order latent variable, the g-factor from a CFA. We have now added the following sentence to the captions of Figures 1, 2 and 4 to make this point clearer. Thank you for the suggestion:

      “Cognitive abilities are based on the second-order latent variable, the g-factor, based on a confirmatory factor analysis of six cognitive tasks.”

      (7) I think it may also be worthwhile to showcase the explanatory power cognitive abilities have in predicting mental health or at least comment on this in the discussion. Certainly, there may be a bidirectional relationship here. The prediction direction from cognition to mental health may be an altogether different objective than what the paper currently presents, but many researchers working in psychiatry may take the stance (with support from the literature) that cognitive performance may serve as premorbid markers for later mental health concerns, particularly given the age range that the authors are working with in ABCD. 

      Thank you for this comment. 

      It is important to note that we do not make a directional claim in these cross-sectional analyses. The term "prediction" is used in a machine learning sense, implying only that we made an out-of-sample prediction (Yarkoni & Westfall, 2017). Specifically, we built predictive models on some samples (i.e., training participants) and applied our models to test participants who were not part of the model-building process. Accordingly, our predictive models cannot determine whether mental health “causes” cognitive abilities or vice versa, regardless of whether we treat mental health or cognitive abilities as feature/explanatory/independent variables or as target/response/outcome variables in the models. To demonstrate directionality, we would need to conduct a longitudinal analysis with many more repeated samples and use appropriate techniques, such as a cross-lagged panel model. It is beyond the scope of this manuscript and will need future releases of the ABCD data.
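      The machine-learning sense of "prediction" described above can be sketched in a few lines. The example below is a hedged illustration on simulated data, not the authors' model: the ridge penalty, learning rate, split sizes and the helper names (`fit_ridge`, `predict`, `pearson_r`) are all assumptions introduced for this sketch.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def predict(weights, rows):
    """Linear prediction without intercept (data are simulated around zero)."""
    return [sum(w * v for w, v in zip(weights, row)) for row in rows]


def fit_ridge(rows, target, penalty=1.0, lr=0.05, n_iter=500):
    """Fit ridge regression weights by plain gradient descent."""
    n, p = len(rows), len(rows[0])
    weights = [0.0] * p
    for _ in range(n_iter):
        residuals = [pred - t for pred, t in zip(predict(weights, rows), target)]
        grad = [
            (sum(res * row[j] for res, row in zip(residuals, rows)) + penalty * weights[j]) / n
            for j in range(p)
        ]
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Simulated data: 400 participants, 5 features, only the first 3 carry signal.
sim = random.Random(42)
true_w = [0.5, -0.3, 0.2, 0.0, 0.0]
features = [[sim.gauss(0, 1) for _ in range(5)] for _ in range(400)]
target = [sum(w * v for w, v in zip(true_w, row)) + sim.gauss(0, 1) for row in features]

# Split participants: the model never sees the test participants during fitting.
train_X, test_X = features[:300], features[300:]
train_y, test_y = target[:300], target[300:]

weights = fit_ridge(train_X, train_y)
r_out_of_sample = pearson_r(test_y, predict(weights, test_X))
print(f"out-of-sample r = {r_out_of_sample:.2f}")
```

      Swapping which variable is the target and which are the features changes what is being predicted, but neither arrangement says anything about causal direction; the held-out correlation only measures how well the fitted model generalises to unseen participants.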

      We decided to use cognitive abilities as a target variable here, rather than a feature variable, mainly for theoretical reasons. This work was inspired by the RDoC framework, which emphasises functional domains. Cognitive abilities is the functional domain in the current study. We created predictive models to predict cognitive abilities based on a) mental health, b) multimodal neuroimaging, c) polygenic scores, and d) environmental factors. We could not treat cognitive abilities as a functional domain if we used them as a feature variable. For instance, if we predicted mental health (instead of cognitive abilities) from multimodal neuroimaging and polygenic scores, we would no longer capture the neurobiological units of analysis for cognitive abilities.

      We have now made it clearer in the discussion that our predictive models cannot establish the directionality of the effects:

      “Our predictive modelling revealed a medium-sized predictive relationship between cognitive abilities and mental health. This finding aligns with recent meta-analyses of case-control studies that link cognitive abilities and mental disorders across various psychiatric conditions (Abramovitch et al., 2021; East-Richard et al., 2020). Unlike previous studies, we estimated the predictive, out-of-sample relationship between cognitive abilities and mental disorders in a large normative sample of children. Although our predictive models, like other cross-sectional models, cannot determine the directionality of the effects, the strength of the relationship between cognitive abilities and mental health estimated here should be more robust than when calculated using the same sample as the model itself, known as in-sample prediction/association (Marek et al., 2022; Yarkoni & Westfall, 2017). Examining the PLS loadings of our predictive models revealed that the relationship was driven by various aspects of mental health, including thought and externalising symptoms, as well as motivation. This suggests that there are multiple pathways—encompassing a broad range of emotional and behavioural problems and temperaments—through which cognitive abilities and mental health are linked.”

      (8) There is a lot of information packed into Figure 3 in the brain maps; I understand the authors wanted to fit this onto one page, and perhaps a higher resolution figure would resolve this, but the brain maps are very hard to read and/or compare, particularly the coronal sections. 

      Thank you for this suggestion. We agree with Reviewer 1 that the feature-importance brain maps need better visualisation. To ensure that readers can clearly see the feature importance, we added zoomed-in versions of the feature-importance brain maps as Supplementary Figures 4–13.

      (9) It would be helpful for authors to cluster features in the resting state functional connectivity correlation matrices, and perhaps use shorter names/acronyms for the labels. 

      Thank you for this suggestion. 

      We have now added a zoomed-in version of the feature importance for rs-fmri as Supplementary Figure 7 (for baseline) and 12 (for follow-up).

      (10) Figures 4a) and 4b): please elaborate on "developmental adverse" in the title. I am assuming this is referring to childhood adverse events, or "developmental adversities". 

      Thank you so much for pointing this out. We meant ‘developmental adverse events’. We have made changes to this figure in the current manuscript.

      (11) For the "follow-up" analyses, I would recommend the authors present this using only the features that are indeed available at follow-up, even if the list of features is lower, otherwise it becomes a bit confusing with the mix of baseline and follow-up features. Or perhaps the authors could make this more clear in the figures by perhaps having a different color for baseline vs follow-up features along the y-axis labels. 

      Thank you for this advice. We have now added an indicator in the plot to show whether the features were collected at baseline or at follow-up. We also added colours to indicate which type of environmental factors they were. It is now clear that the majority of the features that were collected at baseline, but were used for the follow-up predictive model, were developmental adverse events.

      (12) Minor: Makowski et al 2023 reference can be updated to Makowski et al 2024, published in Cerebral Cortex. 

      Thank you for pointing this out. We have updated the citation accordingly. 

      References

      Abramovitch, A., Short, T., & Schweiger, A. (2021). The C Factor: Cognitive dysfunction as a transdiagnostic dimension in psychopathology. Clinical Psychology Review, 86, 102007. https://doi.org/10.1016/j.cpr.2021.102007

      Beam, E., Potts, C., Poldrack, R. A., & Etkin, A. (2021). A data-driven framework for mapping domains of human neurobiology. Nature Neuroscience, 24(12), 1733–1744. https://doi.org/10.1038/s41593-021-00948-9

      Calvin, C. M., Batty, G. D., Der, G., Brett, C. E., Taylor, A., Pattie, A., Čukić, I., & Deary, I. J. (2017). Childhood intelligence in relation to major causes of death in 68 year follow-up: Prospective population study. BMJ, j2708. https://doi.org/10.1136/bmj.j2708

      Casey, B. J., Cannonier, T., Conley, M. I., Cohen, A. O., Barch, D. M., Heitzeg, M. M., Soules, M. E., Teslovich, T., Dellarco, D. V., Garavan, H., Orr, C. A., Wager, T. D., Banich, M. T., Speer, N. K., Sutherland, M. T., Riedel, M. C., Dick, A. S., Bjork, J. M., Thomas, K. M., … ABCD Imaging Acquisition Workgroup. (2018). The Adolescent Brain Cognitive Development (ABCD) study: Imaging acquisition across 21 sites. Developmental Cognitive Neuroscience, 32, 43–54. https://doi.org/10.1016/j.dcn.2018.03.001

      Choi, S. W., Mak, T. S.-H., & O’Reilly, P. F. (2020). Tutorial: A guide to performing polygenic risk score analyses. Nature Protocols, 15(9), Article 9. https://doi.org/10.1038/s41596-020-0353-1

      Dadi, K., Varoquaux, G., Houenou, J., Bzdok, D., Thirion, B., & Engemann, D. (2021). Population modeling with machine learning can enhance measures of mental health. GigaScience, 10(10), giab071. https://doi.org/10.1093/gigascience/giab071

      Deary, I. J. (2012). Intelligence. Annual Review of Psychology, 63(1), 453–482. https://doi.org/10.1146/annurev-psych-120710-100353

      Deary, I. J., Pattie, A., & Starr, J. M. (2013). The Stability of Intelligence From Age 11 to Age 90 Years: The Lothian Birth Cohort of 1921. Psychological Science, 24(12), 2361–2368. https://doi.org/10.1177/0956797613486487

      East-Richard, C., R.-Mercier, A., Nadeau, D., & Cellard, C. (2020). Transdiagnostic neurocognitive deficits in psychiatry: A review of meta-analyses. Canadian Psychology / Psychologie Canadienne, 61(3), 190–214. https://doi.org/10.1037/cap0000196

      Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., Sanislow, C., & Wang, P. (2010). Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. American Journal of Psychiatry, 167(7), 748–751. https://doi.org/10.1176/appi.ajp.2010.09091379

      Kessler, R. C., Amminger, G. P., Aguilar-Gaxiola, S., Alonso, J., Lee, S., & Üstün, T. B. (2007). Age of onset of mental disorders: A review of recent literature. Current Opinion in Psychiatry, 20(4). https://journals.lww.com/co-psychiatry/fulltext/2007/07000/age_of_onset_of_mental_disorders_a_review_of .10.aspx

      Marek, S., Tervo-Clemmens, B., Calabro, F. J., Montez, D. F., Kay, B. P., Hatoum, A. S., Donohue, M. R., Foran, W., Miller, R. L., Hendrickson, T. J., Malone, S. M., Kandala, S., Feczko, E., Miranda-Dominguez, O., Graham, A. M., Earl, E. A., Perrone, A. J., Cordova, M., Doyle, O., … Dosenbach, N. U. F. (2022). Reproducible brain-wide association studies require thousands of individuals. Nature, 603(7902), 654–660. https://doi.org/10.1038/s41586-022-04492-9

      Michelini, G., Barch, D. M., Tian, Y., Watson, D., Klein, D. N., & Kotov, R. (2019). Delineating and validating higher-order dimensions of psychopathology in the Adolescent Brain Cognitive Development (ABCD) study. Translational Psychiatry, 9(1), 261. https://doi.org/10.1038/s41398-019-0593-4

      Morris, S. E., & Cuthbert, B. N. (2012). Research Domain Criteria: Cognitive systems, neural circuits, and dimensions of behavior. Dialogues in Clinical Neuroscience, 14(1), 29–37.

      Morris, S. E., Sanislow, C. A., Pacheco, J., Vaidyanathan, U., Gordon, J. A., & Cuthbert, B. N. (2022). Revisiting the seven pillars of RDoC. BMC Medicine, 20(1), 220. https://doi.org/10.1186/s12916-022-02414-0

      Pat, N., Riglin, L., Anney, R., Wang, Y., Barch, D. M., Thapar, A., & Stringaris, A. (2022). Motivation and Cognitive Abilities as Mediators Between Polygenic Scores and Psychopathology in Children. Journal of the American Academy of Child and Adolescent Psychiatry, 61(6), 782-795.e3. https://doi.org/10.1016/j.jaac.2021.08.019

      Pat, N., Wang, Y., Bartonicek, A., Candia, J., & Stringaris, A. (2023). Explainable machine learning approach to predict and explain the relationship between task-based fMRI and individual differences in cognition. Cerebral Cortex, 33(6), 2682–2703. https://doi.org/10.1093/cercor/bhac235

      Quah, S. K. L., Jo, B., Geniesse, C., Uddin, L. Q., Mumford, J. A., Barch, D. M., Fair, D. A., Gotlib, I. H., Poldrack, R. A., & Saggar, M. (2025). A data-driven latent variable approach to validating the research domain criteria framework. Nature Communications, 16(1), 830. https://doi.org/10.1038/s41467-025-55831-z

      Reef, J., Diamantopoulou, S., van Meurs, I., Verhulst, F., & van der Ende, J. (2010). Predicting adult emotional and behavioral problems from externalizing problem trajectories in a 24-year longitudinal study. European Child & Adolescent Psychiatry, 19(7), 577–585. https://doi.org/10.1007/s00787-010-0088-6

      Rodrigue, A. L., Hayes, R. A., Waite, E., Corcoran, M., Glahn, D. C., & Jalbrzikowski, M. (2024). Multimodal Neuroimaging Summary Scores as Neurobiological Markers of Psychosis. Schizophrenia Bulletin, 50(4), 792–803. https://doi.org/10.1093/schbul/sbad149

      Roza, S. J., Hofstra, M. B., Van Der Ende, J., & Verhulst, F. C. (2003). Stable Prediction of Mood and Anxiety Disorders Based on Behavioral and Emotional Problems in Childhood: A 14-Year Follow-Up During Childhood, Adolescence, and Young Adulthood. American Journal of Psychiatry, 160(12), 2116–2121. https://doi.org/10.1176/appi.ajp.160.12.2116

      Yarkoni, T., & Westfall, J. (2017). Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspectives on Psychological Science, 12(6), 1100–1122. https://doi.org/10.1177/1745691617693393

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Perlee et al. sought to generate a zebrafish line where CRISPR-based gene editing is exclusively limited to the melanocyte lineage, allowing assessment of cell-type restricted gene knockouts. To achieve this, they knocked in Cas9 to the endogenous mitfa locus, as mitfa is a master regulator of melanocyte development. The authors use multiple candidate genes - albino, sox10, tuba1a, ptena/ptenb, tp53 - to demonstrate their system induces lineage-restricted gene editing. This method allows researchers to bypass embryonic lethal and non-cell autonomous phenotypes emerging from whole body knockout (sox10, tuba1a), drive directed phenotypes, such as depigmentation (albino), and induce lineage-specific tumors, such as melanomas (ptena/ptenb, tp53, when accompanied with expression of BRAFV600E). While the genetic approaches are solid, the argued increase in efficiency of this model compared to current tools was untested, and therefore unable to be assessed. Furthermore, the mechanistic explanations proposed to underlie their phenotypes are mostly unfounded, as discussed further in the Weaknesses section. Despite these concerns, there is still a clear use for this genetic methodology and its implementation will be of value to many in vivo researchers.

      Strengths:

      The strongest component of this manuscript is the genetic control offered by the mitfa:Cas9 system and the ability to make stable, lineage-specific knockouts in zebrafish. This is exemplified by the studies of tuba1a, where the authors nicely show non-cell autonomous mechanisms have obfuscated the role of this gene in melanocyte development. In addition, the mitfa:Cas9 system is elegantly straightforward and can be easily implemented in many labs. Mostly, the figures are clean, controls are appropriate, and phenotypes are reproducible. The invented method is a welcomed addition to the arsenal of genetic tools used in zebrafish.

      Weaknesses:

      The major weaknesses of the manuscript include the overly bold descriptions of the value of the model and the superficial mechanistic explanations for each biological vignette.

      The authors argue that a major advantage of this system is its high efficiency. However, no direct comparison is made with other tools that achieve the same genetic control, such as MAZERATI. This is a missed opportunity to provide researchers the ability to evaluate these two similar genetic approaches. In addition, Fig.1 shows that not all melanocytes express Cas9. This is a major caveat that goes unaddressed. It is of paramount importance to understand the percentage of mitfa+ cells that express Cas9. The histology shown is unclear and too zoomed out of a scale to make any insightful conclusions, especially in Fig.S1. It would also be beneficial to see data regarding Cas9 expression in adult melanocytes, which are distinct from embryonic melanocytes in zebrafish. Moreover, this system still requires the injection of a plasmid encoding gRNAs of interest, which will yield mosaicism. A prime example of this discrepancy is in Fig.6, where sox10 is clearly still present in "sox10 KO" tumors.

      We agree with these points. While our method has the advantage of endogenous knockin (thus keeping all regulatory elements), you are correct that we did not make a direct comparison with existing technologies like MAZERATI, and therefore we cannot make comparative claims about efficiency. Based on this, we have revised the manuscript to remove these points, reduce the strength/boldness of the claims, and make it more clear what our system achieves in comparison to existing systems. In reference to the other specific points you raise above about mosaicism and extent of Cas9 expression:

      - We have added a paragraph to address the advantages and disadvantages of mitfaCas9 compared to expression of Cas9 with lineage-specific promoters including MAZERATI in the discussion.  

      - Figure 1C has been revised to more clearly show the overlap of mitfa and Cas9 in melanocytes. 

      - We then quantified the percentage of mitfa+ cells expressing Cas9 from the in situ hybridizations (Supplemental Figure S1D). We did attempt to look at Cas9 protein expression in both embryonic and adult melanocytes by immunofluorescence. Unfortunately, the Cas9 antibodies commercially available did not work on the zebrafish embryos or adult tailfins, so we are limited in proper quantification to the in situs in the embryos.

      The authors argue that their model allows rapid manipulation of melanocyte gene expression. Enthusiasm for the speed of this model is diminished by minimal phenotypes in the F0, as exemplified in Fig.2. Although the authors say >90% of fish have loss of pigmentation, this is misleading as the phenotype is a very weak, partial loss. Only in the F1 generation do robust phenotypes emerge, which takes >6 months to generate. How this is more efficient than other tools that currently exist is unclear and should be discussed in more detail.

      This needed clarification, and we have now modified the Discussion to reflect this more accurately. What we were trying to show is that both F0 and F1 fish can be useful in screening for the effect of any given gene. In the F0, while you are correct that the phenotype is indeed weak/partial, it is also quantifiable and therefore can be used as a rapid screen for potential effects of knockout, so it can help with speed. The major advantage of the F1 generation is that we can generate fully penetrant phenotypes for recessive genes since the fish just needs to have 1 copy of the Cas9/sgRNA instead of 2. This means we do not have to go to F2 or F3 generations, which really does save time. But we agree this could be achieved using MAZERATI, and so we have added these considerations to the manuscript, as we feel these are important.

      In Figure 3, the authors find that melanocyte-specific knockout of sox10 leads to only a 25% reduction in melanocytes in the F1 generation. This is in contradiction to prior literature cited describing sox10 as indispensable for melanocyte development. In addition, the authors argue that sox10 is required for melanocyte regeneration. This claim is not accurate, as >50% of melanocytes killed upon neocuproine treatment can regenerate. This data would indicate that sox10 is required for only a subset of melanocytes to develop (Fig.3C) and for only a subset to regenerate (Fig.3G). This is an interesting finding that is not discussed or interrogated further.

      We too were initially very puzzled by this result. We do not completely understand it, but we have two thoughts about it. First could be timing. sox10 usually starts to be expressed around the 1-somite stage, and so in the original sox10/colourless mutant (which truly has no melanocytes), sox10 will be lost during those early stages. In contrast, mitf comes on later (around 18hpf) so this might indicate that there is a subset of melanocytes that are dependent upon this early expression of sox10. This may indicate that there could be different functions of sox10 early in melanocyte development versus later timepoints after melanocytes have already been specified. This might also help explain our findings during regeneration.  Second could be genetic compensation. Since in the other parts of the paper we seem to see a somewhat reciprocal relationship between sox10 and sox9, it is conceivable that loss of sox10 in the melanocytes could be compensated for by sox9 (or even other genes) in our CRISPR approach (as opposed to the ENU allele in colourless). Since we really do not fully understand this, we have added a section to the Discussion about this issue, mentioning these possibilities but leaving open other yet to be defined mechanisms.

      Tumor induction by this model is weak, as indicated by the tumor curves in Figs.5,6. This might be because these fish are mitfa heterozygous. Whereas the avoidance of mitfa overexpression driven by other models including MAZERATI is a benefit of this system, the effect of mitfa heterozygosity on tumor incidence was untested. This is an essential question unaddressed in the manuscript.

      We agree that in the BRAF;p53 group especially tumor incidence is very low, although PTEN loss does accelerate it. One possibility is exactly as you stated, and that mitfa heterozygosity is the etiology. The other possibility is that in the MAZERATI approach (https://pubmed.ncbi.nlm.nih.gov/30385465/) the authors used the casper background as opposed to the wild-type T5D as we did in our study. In unpublished observations, we have found that casper (with miniCoopR rescue) is markedly more sensitive to melanoma induction compared to WT fish in this setting. In fact, in looking at our BRAF;p53 curves compared to the original Patton paper curves (https://pubmed.ncbi.nlm.nih.gov/15694309/) which were also done in a WT background with no miniCoopR, they are fairly similar. This might indicate that casper + miniCoopR particularly sensitizes the fish to melanoma. However, because we do not fully know the reasons for this, we have now included both of these possible reasons in the Discussion.

      In Fig.6, the authors recapitulate previous findings with their model, showing sox10 KO inhibits tumor onset. The tumors that do develop are argued to be highly invasive, have mesenchymal morphology, and undergo phenotypic switching from sox10 to sox9 expression. The data presented do not sufficiently support these claims. The histology is not readily suggestive of invasive, mesenchymal melanomas. Sox10 is still present in many cells and sox9 expression is only found in a small subset (<20%). Whether sox10-null cells are the ones expressing sox9 is untested. If sox9-mediated phenotypic switching is the major driver of these tumors, the authors would need to knockout sox9 and sox10 simultaneously and test whether these "rare" types of tumors still emerge. Additional histological and genetic evaluation is required to make the conclusions presented in Fig.6. It feels like a missed opportunity that the authors did not attempt to study genes of unknown contribution to melanoma with their system.

      We did not mean to overstate the admittedly early observations from these fish. Invasiveness in the fish models can be difficult to precisely quantify, and therefore is somewhat qualitative. While we did not mean to imply that every cell that loses sox10 will become sox9 positive (which is clearly not the case), the human single-cell RNA-seq data does suggest these are somewhat mutually exclusive populations (https://pubmed.ncbi.nlm.nih.gov/32753671/). This phenomenon has also long been observed even prior to single-cell approaches (https://pubmed.ncbi.nlm.nih.gov/25629959/). So while we agree our data is not definitive in this regard, it is consistent with the literature and was presented mainly to provide areas for future exploration with the model. 

      Overall, this manuscript introduces a solid method to the arsenal of zebrafish genetic tools but falls short of justifying itself as a more efficient and robust approach than what currently exists. The mechanisms provided to explain observed phenotypes are tenuous. Nonetheless, the mitfa:Cas9 approach will certainly be of value to many in vivo biologists and lays the foundation to generate similar methods using other tissue-specific regulators and other Cas proteins.

      We hope that, by toning down the language around what we have observed and providing as honest an assessment as possible of what might be occurring, the manuscript will be helpful for future studies aiming to knock out genes in the melanocyte lineage.

      Reviewer #2 (Public review):

      Summary:

      This manuscript describes a genetic tool utilizing mutant mitfa-Cas9 expressing zebrafish to knockout genes to analyze their function in melanocytes in a range of assays from developmental biology to tumorigenesis. Overall, the data are convincing and the authors cover potential caveats from their model that might impact its utility for future work.

      Strengths:

      The authors do an excellent job of characterizing several gene deletions that show the specificity and applicability of the genetic mitfa-Cas9 zebrafish to studying melanocytes.

      Weaknesses:

      Variability across animals not fully analyzed.

      To more clearly show variability across animals, we calculated the percentage of mitfa+ cells that express Cas9 across n=7 mitfaCas9 embryos. We also expanded Supplemental Figure 2 to show loss of pigmentation across n=7 individual adult MG-albino F2 fish instead of one representative image.
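
      As a sketch of how such a per-animal summary can be computed, the snippet below derives the percentage of mitfa+ cells that are also Cas9+ for each embryo and then the mean and standard deviation across embryos. The cell counts are hypothetical placeholders, not our measurements, and the variable names are illustrative only.

```python
from statistics import mean, stdev

# HYPOTHETICAL counts for n=7 embryos: (mitfa+ cells scored, of which Cas9+).
# These numbers are placeholders for illustration, not data from the study.
counts_per_embryo = [
    (52, 41),
    (47, 38),
    (60, 49),
    (55, 43),
    (49, 40),
    (58, 46),
    (51, 42),
]

# Percentage of mitfa+ cells that also express Cas9, one value per embryo.
percentages = [100 * cas9 / mitfa for mitfa, cas9 in counts_per_embryo]

# Summarising per animal (rather than pooling all cells) keeps the
# animal-to-animal variability visible.
summary = {
    "n": len(percentages),
    "mean": mean(percentages),
    "sd": stdev(percentages),  # sample standard deviation across embryos
}

for i, pct in enumerate(percentages, start=1):
    print(f"embryo {i}: {pct:.1f}% of mitfa+ cells are Cas9+")
print(f"mean = {summary['mean']:.1f}%, SD = {summary['sd']:.1f}% (n = {summary['n']})")
```

      Reporting the spread across animals, in addition to the mean, is what allows a reader to judge how variable Cas9 expression is from fish to fish.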

      Reviewer #3 (Public review):

      Summary:

      Perlee et al. present a method for generating cell-type restricted knockouts in zebrafish, focusing on melanocytes. For this method, the authors knock-in a Cas9 encoding sequence into the mitfa locus. This mitfaCas9 line has restricted Cas9 expression, allowing the authors to generate melanocyte-specific knockouts rapidly by follow-up injection of sgRNA expressing transposon vectors.

      The paper presents some interesting vignettes to illustrate the utility of their approach. These include 1) a derivation of albino mutant fish as a demonstration of the method's efficiency, 2) an interrogation and novel description of tuba1a as a potential non-autonomous contributor to melanocyte dispersion, and 3) the generation of sox10 deficient melanoma tumors that show "escape" of sox10 loss through upregulation of sox9. The latter two examples highlight the usefulness of cell-type targeted knockouts (Body-wide sox10 and tuba1a loss elicit developmental defects). Additionally, the tumor models involve highly multiplexed sgRNAs for tumor initiation which is nicely facilitated by the stable Cas9.

      Strengths:

      The approach is clever and could prove very useful for studying melanocytes and other cell types. As the authors hint at in their discussion, this approach would become even more powerful with the generation of other Cas9-restricted lineages so a single sgRNA construct can be screened across many lineages rapidly (or many sgRNA and fish lines screened combinatorially).

      The biological findings used to demonstrate the power of the approach are interesting in their own right. If it proves true, tuba1a's non-autonomous effects on melanosome dispersion are striking, and this example demonstrates very nicely how one could use Perlee et al.'s approach to search for other non-autonomous mechanisms systematically. Similarly, the observation of the sox9 escape mechanism with sox10 loss is a beautiful demonstration of the relevance of SOX10/SOX9's reciprocal regulation in vivo. This system would be a very nice model for further interrogating mechanisms/interventions surrounding Sox10 in melanoma.

      Finally, the figure presentation is very nice. This work involves complex genetic approaches including multiple fish generations and multiplexed construct injections. The vector diagrams and breeding schemes in the paper make everything very clear/"grok-able," and the paper was enjoyable to read.

      Weaknesses:

      The mitfa-driven GFP on their sgRNA-expressing cassette is elegant, but it makes one wonder why the endogenous knock-in is necessary. It would strengthen the motivation of the work if the authors could detail the potential advantages and disadvantages of their system compared to expressing Cas9 with a lineage-specific promoter from a transposon in their introduction or discussion.

      We agree this needed a better and more clear explanation. There are many excellent examples of promoter driven Cas9 approaches. Within melanocytes, Ablain and others have developed the MAZERATI system (https://pubmed.ncbi.nlm.nih.gov/30385465/) which is very powerful, especially for melanoma development. In our minds, the major advantage of endogenous knockin is that we retain all of the natural regulatory elements (many of which are not known) and so small promoter fragments always run the risk of missing certain types of regulation. While these regulatory elements may not matter under homeostatic conditions, they may become very important under perturbation, stress or disease states. This is why it is common, for example, in the mouse field, to knock in things like Cre into endogenous loci. We have now added a clarification of this to the manuscript.

      Related to the above - is mitfa haplosufficient? If the mitfaCas9/+ fish have any notable phenotypes, it would be worth noting for others interested in using this approach to study melanoma and pigmentation.

      In normal melanocytes, mitfa is haplosufficient. There are no visible differences between mitfaCas9/+ and wild-type fish at any stages of development (Figure S1C). Although we did not directly compare tumor growth in mitfa-/+ and mitfa+/+ fish in this study, it is possible that the disruption of mitfa in mitfaCas9/+ fish affects melanoma development. Most zebrafish melanoma models involve the overexpression of mitfa with MiniCoopR vectors and it would be interesting in future studies to determine how mitfa heterozygosity affects melanoma initiation or progression. 

      A core weakness (and also potential strength) of the system is that introduced edits will always be non-clonal (Fig 2H/I). The activity of individual sgRNAs should always be validated in the absence of any noticeable phenotype to interpret a negative result. Additionally, caution should be taken when interpreting results from rare events involving positive outgrowth (like tumorigenesis) to account for the fact that many cells in the population might not have biallelic null alleles (i.e., 100% of the gene product removed).

      Along those lines: in my opinion, the tuba1a results are the most provocative finding in the paper, but they lack key validation. With respect to cutting activity, the Alt-R and transgenic sgRNA expression approaches are not directly comparable. Since there is no phenotype in the melanocyte specific tuba1a knockouts, the authors must confirm high knockout efficiency with this set of reagents before making the claim there is a non-autonomous phenotype. This can be achieved with GFP+ sorting and NGS like they performed with their albino melanocytes.

      The whole-body tuba1a knockout phenotype is expected to be pleiotropic, and this expectation might mask off-target effects. Controls for knockout specificity should be included. For instance, confidence in the claims would greatly increase if the dispersed melanosome phenotype could be recovered with guide-resistant tuba1a re-expression and if melanocyte-restricted tuba1a re-expression failed to rescue. As a less definitive but adequate alternative, the authors could also test if another guide or a morpholino against tuba1a phenocopies the described Alt-R edited fish.

      Thank you for your thoughtful suggestions, which led us to an important discovery. While validating the original tuba1a guide RNA, we found that tuba1a sg1 also targets tuba1c, a gene that shares 99.78% homology with tuba1a in zebrafish. To determine which gene was responsible for the melanocyte phenotype, we designed multiple new guide RNAs specifically targeting either tuba1a or tuba1c and used Alt-R to globally knock them out in zebrafish embryos. However, none of these guides successfully replicated the phenotype (Sanger sequencing validation for the most efficient tuba1a and tuba1c guides is provided below).

      Ultimately, we identified a new guide RNA (5’-GGTCTACAAAGACAGCCCTA-3’) that successfully phenocopied the original tuba1a sg1 melanocyte phenotype. Tuba1c—but not tuba1a—was predicted to have a mismatch at the 3’ end of the guide sequence, which is typically expected to inhibit target cleavage. Surprisingly, despite this mismatch, we observed robust cleavage in both tuba1a and tuba1c. Since the melanocyte phenotype was only reproducible when both tuba1a and tuba1c were targeted, this suggests potential compensatory interactions between these highly similar genes. We have updated the text and figures to reflect this finding and have included validation of this second guide RNA (tuba1a/c sg2) in Supplemental Figure 3.
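
      A guide-to-target comparison of the kind described above can be sketched as follows. The guide sequence is the one quoted in this response; the tuba1c target sequence is hypothetical (we simply substitute an arbitrary base at the final, 3'-most position to stand in for the predicted mismatch), and the helper name is illustrative.

```python
def find_mismatches(guide, site):
    """Return (1-based position from the 5' end, guide base, site base) per mismatch."""
    if len(guide) != len(site):
        raise ValueError("guide and site must be the same length")
    return [
        (index + 1, g, s)
        for index, (g, s) in enumerate(zip(guide, site))
        if g != s
    ]


# Guide sequence as given in the response (5' to 3').
GUIDE = "GGTCTACAAAGACAGCCCTA"

# HYPOTHETICAL tuba1c protospacer: identical to the guide except for an
# arbitrary substitution at the last (3'-most) base. The real tuba1c
# sequence is not given here, so this is for illustration only.
HYPOTHETICAL_TUBA1C_SITE = "GGTCTACAAAGACAGCCCTG"

mismatches = find_mismatches(GUIDE, HYPOTHETICAL_TUBA1C_SITE)
percent_identity = 100 * (len(GUIDE) - len(mismatches)) / len(GUIDE)

for position, guide_base, site_base in mismatches:
    distance_from_3prime = len(GUIDE) - position
    print(
        f"mismatch at position {position}/{len(GUIDE)} "
        f"({guide_base}>{site_base}), {distance_from_3prime} nt from the 3' end"
    )
print(f"guide/site identity: {percent_identity:.1f}%")
```

      A check like this only flags where a guide and a candidate site differ; as our results show, a predicted 3' mismatch does not guarantee that cleavage is prevented, so empirical validation by sequencing remains necessary.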

      As you suggested, we also conducted GFP+ sorting and NGS to confirm knockout of both tuba1a and tuba1c in melanocytes of mitfaCas9 fish (Figure S3G). The knockout percentages were comparable to those observed in our previous experiment with MG-albino fish. This also confirms that this method can be used to sort and sequence GFP+ cells even when pigmentation is retained, which was not the case for albino fish.

      I have similar questions about the sox10 escapers, but these suggestions are less critical for supporting the authors' claims (especially given the nice staining). Are the sox10 tumors relatively clonal with respect to sox10 mutations? And are the sox10 tumor mutations mostly biallelic frameshifts or potential missense mutations/single mutations that might not completely remove activity? I am particularly curious as SOX10 doesn't seem to be completely absent (and is still very high in some nuclei) in the immunohistochemistry.

      We attempted to address this question by performing DNA sequencing on the FFPE blocks that we had retained from the original study. While our sequencing facility said this should be possible, we could not consistently generate high enough quality DNA to make a definitive statement either way. While we are very curious to know what the nature of the mutations are in these “escapers”, the student who performed these studies has now graduated, and it would take us several additional months to a year to fully address it. Given this, we would prefer to leave this open question to a future paper, but have addressed this limitation in the Discussion.

      Recommendations for the authors:

      Reviewing Editor:

      Overall, the reviewers felt and eLife concurs that your manuscript is insightful and appropriate for publication. Reviewers were impressed by your generating a zebrafish line where CRISPR-based gene editing is exclusively limited to the melanocyte lineage, allowing assessment of cell-type restricted gene knockouts. Your use of multiple candidate genes to demonstrate that your system induces lineage-restricted gene editing is compelling and will be of interest to the broad readership of eLife. This method will allow researchers to bypass embryonic lethal and non-cell autonomous phenotypes emerging from whole body knockout, drive directed phenotypes, such as depigmentation, and induce lineage-specific tumors, such as melanomas. This said, the argued increase in efficiency of this model compared to current tools was untested, and therefore it remains difficult for a reader to assess the extent to which your new model represents a major advance over prior ones. Of additional concern are the mechanistic explanations proposed to underlie the phenotypes, as these are largely unfounded. Thus, in preparing your final publication version of the paper, eLife strongly encourages you to fully address the reviewers' thoughtful comments. In particular, the boldness of the claims made in the manuscript should be reduced. Terms like "highly efficient" and "rapid" are unsupported due to the lack of comparison with other well-established methods, like MAZERATI.

      As discussed in our responses to each of the reviewer points above, we agree with both of these points. We have reduced the boldness of the claims and added a fuller discussion of the different approaches. We also address the potential mechanisms behind our observations, and where and why we still lack an understanding of what gives rise to those phenotypes.

      There are also some minor discrepancies that should be edited in the manuscript: Fig.2A plasmid description is written oppositely in text; Fig.3 labels G-H are swapped in the legend description; Fig.5A MTdT is unexplained. This is a non-exhaustive list, and the authors are encouraged to carefully read through their manuscript to revise other minor mistakes and formatting errors.

      Figure 2A was revised to show the correct orientation of mitfa:GFP and the guide RNA cassette as described in the text. Figure 3 legend was fixed. We have gone through the manuscript again to make sure we have not made any other errors, to the best of our knowledge.

      The biggest concern is the expression of cas9 and the weak histological support shown in Fig.1 and Fig.S1. It would be a benefit to all readers and potential future users to know how robust cas9 expression is in the melanocyte lineage. It would be helpful if there is a way to analyze the percentage of cells that are mutated in each animal to understand the variability that can exist across animals with the method.

      We have revised Figure 1C to show additional melanocytes and added a new quantification of Cas9 RNA expression in melanocytes (S1D). 

      The analysis of the scRNA sequencing could also be described more fully.

      More details have been added to the scRNA sequencing analysis including the functions that were used. 

      The final major concern is whether this model is genuinely more valuable than MAZERATI. A more elaborate discussion would benefit potential future users to guide their decisions regarding which tool best suits their experimental goals.

      As noted above, we agree with this statement. The reviewers are correct in that we did not directly compare our system to MAZERATI, and therefore cannot make any claims about efficiency in a comparative regard. Therefore, in our revised Discussion, we talk about the relative strengths and weaknesses of each approach, and emphasize that our approach mainly has the advantage of retaining endogenous regulatory elements for mitfa, but that each user should decide which is the best approach for their problem.

      There are also some minor concerns that should be addressed.

      Are the mitfaCas9 fish used as homozygotes before the first cross? If so, might be nice to include their nacre-like phenotype in diagrams like Fig 2A.

      For these studies, heterozygous mitfaCas9 fish were used for all breedings and progeny were sorted for BFP+ eyes. This enabled the comparison to sibling controls without Cas9 expression. 

      BFP+ eye screening for mitfaCas9 is elegant and included nicely in the diagrams. Are germline sgRNA integrants identified in F1 with melanocyte GFP? Or present at a high enough efficiency that this is not relevant? This would be good to include in the diagrams.

      Germline sgRNA integrants are identified with melanocyte GFP in embryos. Figure 2A has been edited to show GFP expression. 

      Most cells are GFP positive in S3C (the F0 "mosaic"). It might be nice to show a single GFP stripe like in the other panels for direct comparison of edited/non-edited in the same fish.

      This figure (now S3E) has been edited to show a clear comparison between GFP+ and GFP- cells in the same fish. 

      177 - CRISPR-Seq is basically amplicon sequencing. This would measure efficiency but not "specificity" as described. Off-target activity would have to be measured at other loci etc. Not necessary to do, but I don't think measured.

      In this case, “specificity” refers to cell type specificity, not genomic specificity. We are measuring cell type specificity by comparing on-target cutting in GFP+ cells (melanocytes) versus GFP- cells (non-mitfa expressing cells). We did not look at off-target activity of Cas9 in this study and have edited the text to make this clearer. 

      219 -"several gaps were visible"

      Fixed

      286 - TUBA1A should be italicized

      Fixed

      399 - SOX9's most enriched dependency in DepMap is cutaneous melanoma and its top coessential gene is SOX10. I'm not sure the SOX9/SOX10 interaction couldn't be parsed from DepMap alone.

      This is true, and the DepMap was actually somewhat of an inspiration for our own studies. We have modified the line to acknowledge this and explain the main advantage of our system is in vivo confirmation of what the DepMap had alluded to.

      433 - "fewer animals since all F1 animals (even those for recessive alleles) are informative."

      The fact that this approach is faster and more efficient per animal is important to highlight (and very believable), but is this technically true given not all F1 fish will have Cas9 or a germline sgRNA integration?

      In considering this statement, we agree with you and decided to remove it from the text.

      We hope the comments in both the public and private reviews will help improve the manuscript.

      Reviewer #1 (Recommendations for the authors):

      Overall, the boldness of the claims made in the manuscript should be reduced. Terms like "highly efficient" and "rapid" are unsupported due to the lack of comparison with other well-established methods, like MAZERATI.

      As discussed above, we agree with this and have now modified the manuscript to better reflect what our system achieves in comparison to well-developed systems such as MAZERATI. Because we have not done a direct comparison, we are not able to make any claims about comparative efficiency, and instead focus on the potential benefit of a knockin approach, which is the maintenance of endogenous regulatory elements.

      There are some minor discrepancies that should be edited in the manuscript: Fig.2A plasmid description is written oppositely in text; Fig.3 labels G-H are swapped in the legend description; Fig.5A MTdT is unexplained. This is a non-exhaustive list, and the authors are encouraged to carefully read through their manuscript to revise other minor mistakes and formatting errors.

      Figure 2A was revised to show the correct orientation of mitfa:GFP and the guide RNA cassette as described in the text. Figure 3 legend was fixed. We have gone through the manuscript again to make sure we have not made any other errors, to the best of our knowledge.

      The biggest concern is the expression of cas9 and the weak histological support shown in Fig.1 and Fig.S1. It would be a benefit to all readers and potential future users to know how robust cas9 expression is in the melanocyte lineage.

      We have revised Figure 1C to show additional melanocytes and added a new quantification of Cas9 RNA expression in melanocytes (S1D). 

      The second major concern is whether this model is genuinely more valuable than MAZERATI. A more elaborate discussion would benefit potential future users to guide their decision regarding which tool best suits their experimental goals.

      As noted above, we agree with this statement. The reviewers are correct in that we did not directly compare our system to MAZERATI, and therefore cannot make any claims about efficiency in a comparative regard. Therefore, in our revised Discussion, we talk about the relative strengths and weaknesses of each approach, and emphasize that our approach mainly has the advantage of retaining endogenous regulatory elements for mitfa, but that each user should decide which is the best approach for their problem.

      We hope the comments in both the public and private reviews will help improve the manuscript.

      Reviewer #2 (Recommendations for the authors):

      While the authors show the indel charts for the CRISPR mutations generated in the supplement, I wonder if there is a way to analyze the percentage of cells that are mutated in each animal to understand the variability that can exist across animals with the method.

      We have revised Figure 1C to show additional melanocytes and added a new quantification of Cas9 RNA expression in melanocytes (S1D). 

      The analysis of the scRNA sequencing could be described more fully.

      More details have been added to the scRNA sequencing analysis including the functions that were used. 

      Reviewer #3 (Recommendations for the authors):

      This was an excellent read, and I'm very interested in seeing it in its final form. Congratulations! My larger critiques are outlined in the public reviews. A few smaller points:

      Are the mitfaCas9 fish used as homozygotes before the first cross? If so, might be nice to include their nacre-like phenotype in diagrams like Fig 2A.

      For these studies, heterozygous mitfaCas9 fish were used for all breedings and progeny were sorted for BFP+ eyes. This enabled the comparison to sibling controls without Cas9 expression. 

      BFP+ eye screening for mitfaCas9 is elegant and included nicely in the diagrams. Are germline sgRNA integrants identified in F1 with melanocyte GFP? Or present at a high enough efficiency that this is not relevant? This would be good to include in the diagrams.

      Germline sgRNA integrants are identified with melanocyte GFP in embryos. Figure 2A has been edited to show GFP expression. 

      Most cells are GFP positive in S3C (the F0 "mosaic"). It might be nice to show a single GFP stripe like in the other panels for direct comparison of edited/non-edited in the same fish.

      This figure (now S3E) has been edited to show a clear comparison between GFP+ and GFP- cells in the same fish. 

      177 - My understanding is that CRISPR-Seq is basically amplicon sequencing. This would measure efficiency but not "specificity" as described. Off-target activity would have to be measured at other loci etc. Not necessary to do in my opinion, but I don't think measured.

      In this case, “specificity” refers to cell type specificity, not genomic specificity. We are measuring cell type specificity by comparing on-target cutting in GFP+ cells (melanocytes) versus GFP- cells (non-mitfa expressing cells). We did not look at off-target activity of Cas9 in this study and have edited the text to make this clearer. 

      219 -"several gaps were visible"

      Fixed

      286 - TUBA1A should be italicized

      Fixed

      399 - I think I understand the logic of the DepMap argument, and the importance of studying tumor initiation in vivo stands for itself. But here is maybe not the best example (or might need clarification)? - SOX9's most enriched dependency in DepMap is cutaneous melanoma and its top co-essential gene is SOX10. I'm not sure the SOX9/SOX10 interaction couldn't be parsed from DepMap alone.

      This is true, and the DepMap was actually somewhat of an inspiration for our own studies. We have modified the line to acknowledge this and explain the main advantage of our system is in vivo confirmation of what the DepMap had alluded to.

      433 - "fewer animals since all F1 animals (even those for recessive alleles) are informative."

      The fact that this approach is faster and more efficient per animal is important to highlight (and very believable), but is this technically true given not all F1 fish will have Cas9 or a germline sgRNA integration?

      In considering this statement, we agree with you and decided to remove it from the text.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Garbelli et al. investigates the roles of excitatory amino acid transporters (EAATs) in retinal bipolar cells. The group previously identified that EAAT5b and EAAT7 are expressed at the dendritic tips of bipolar cells, where they connect with photoreceptor terminals. The previous study found that the light responses of bipolar cells, measured by electroretinogram (ERG) in response to white light, were reduced in double mutants, though there was little to no reduction in light responses in single mutants of either EAAT5b or EAAT7.

      The current study further explores the roles of EAAT5b and EAAT7 in bipolar cells' chromatic responses. The authors found that bipolar cell responses to red light, but not to green or UV-blue light, were reduced in single mutants of both EAAT5b and EAAT7. In contrast, UV-blue light responses were reduced in double mutants. Additionally, the authors observed that EAAT5b, but not EAAT7, is strongly localized in the UV cone-enriched area of the eye, known as the "Strike Zone (SZ)." This led them to investigate the impact of the EAAT5b mutation on prey detection performance, which is mediated by UV cones in the SZ. Surprisingly, contrary to the predicted role of EAAT5b in prey detection, EAAT5b mutants did not show any changes in prey detection performance compared to wild-type fish. Interestingly, EAAT7 mutants exhibited enhanced prey detection performance, though the underlying mechanisms remain unclear.

      The distribution of EAAT7 protein in the outer plexiform layer across the eye correlates with the distribution of red cones. Based on this, the authors tested the behavioral performance driven by red light in EAAT5b and EAAT7 mutants. The results here were again somewhat contrary to predictions based on ERG findings and protein localization: the optomotor response was reduced in EAAT5b mutants, but not in EAAT7 mutants.

      Strengths:

      Although the paper lacks cohesive conclusions, as many results contradict initial predictions as mentioned above, the authors discuss possible mechanisms for these contradictions and suggest future avenues for study. Nevertheless, this paper demonstrates a novel mechanism underlying chromatic information processing.

      The manuscript is well-written, the data are well-presented, and the analysis is thorough.

      We are happy about the perceived strengths of our manuscript.

      Weaknesses:

      I have only a minor comment. The authors present preliminary data on mGluR6b distribution across the eye. Since this result is based on a single fish, I recommend either adding more samples or removing this data, as it does not significantly impact the paper's main conclusions.

      We agree that the mGluR6 result is statistically underpowered (we would never claim otherwise). The data are based on only one clutch of fish, comprising 11 eyes. Since the data are in the supplement anyway and are not part of the main story, we would like to keep them to spur further investigation into the anisotropic distribution of synaptic proteins.

      Reviewer #2 (Public review):

      Garbelli et al. set out to elucidate the function of two glutamate transporters, EAAT5b and EAAT7, in the functional and behavioral responses to different wavelengths of light. The question is an interesting one, because these transporters are well positioned to affect responses to light, and their distribution in the retina suggests that they could play differential roles in visual behaviors. However, the low resolution of both the functional and behavioral data presented here means that the conclusions are necessarily a bit vague.

      In Figure 1, the authors show that the double KO has a decreased ERG response to UV/blue and red wavelengths. However, the individual mutations only affect the response to red light, suggesting that they might affect behaviors such as OMR which typically rely on this part of the visual spectrum. However, there was no significant change in the response to UV/blue light of any intensity, making it unclear whether the mutations could individually play roles in the detection of UV prey. Based on the later behavioral data, it seems likely that at least the EAAT7 KO should affect retinal responses to UV light, but it may be that the ERG does not have the spatial or temporal resolution to detect the difference, or that the presence of blue light overwhelmed any effect of the individual knockouts on the response to UV light.

      In Figures 5 and 6, the authors compare the two knockouts to wild-type fish in terms of their sensitivity to UV prey in a hunting assay. The EAAT5b KO showed no significant impairment in UV sensitivity, while the EAAT7 KO fish actually had an increased hunting response to UV prey. However, there is no comparison of the KO and WT responses to different UV intensities, only in bulk, so we cannot conclude that the EAAT7 KO is allowing the fish to detect weaker prey-like stimuli.

      We now report in both the Results and the Methods sections that comparisons of intensity-specific responses were non-significant in all analyses (Chi-square test, p>0.05). We decided not to add this information to the figure, as it does not add to the data and risks cluttering an already complex graph.
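
      For readers who wish to reproduce this kind of per-intensity comparison, the sketch below shows one way a Chi-square test on responder counts could be run. It is an illustration only and not the authors' analysis script: the function name `chi_square_2x2`, the absence of a continuity correction, and all counts are assumptions made for the example.

```python
import math


def chi_square_2x2(resp_a, n_a, resp_b, n_b):
    """Pearson chi-square test (no continuity correction) on a 2x2 table.

    resp_a / n_a: responding trials and total trials in group A (e.g. WT).
    resp_b / n_b: responding trials and total trials in group B (e.g. KO).
    Assumes every expected cell count is non-zero.
    Returns (chi2, p) for one degree of freedom.
    """
    observed = [[resp_a, n_a - resp_a], [resp_b, n_b - resp_b]]
    row_totals = [n_a, n_b]
    col_totals = [resp_a + resp_b, (n_a - resp_a) + (n_b - resp_b)]
    total = n_a + n_b

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 d.o.f.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts per stimulus intensity: (WT responses, WT trials, KO responses, KO trials)
hypothetical_counts = {"low": (12, 40, 18, 40), "high": (20, 40, 22, 40)}
for intensity, counts in hypothetical_counts.items():
    chi2, p = chi_square_2x2(*counts)
    print(f"{intensity}: chi2={chi2:.2f}, p={p:.3f}")
```

      Testing each intensity separately in this way involves multiple comparisons, so a correction such as Bonferroni would normally be considered; whether one was applied in the study is not stated in the response.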

      As reviewer #2 rightly states, we cannot conclude that the EAAT7 KO allows the fish to detect weaker prey-like stimuli. We only intend to suggest that a lack of EAAT7 might facilitate prey detection events, as the total number of hunting events is increased compared to WT.

      In Figure 7, the EAAT5b KO seems to cause a decrease in OMR behavior to red grating stimuli, but only one stimulus is tested, so it is unclear whether this is due to a change in visual sensitivity or resolution.

      We fully agree that further experiments presenting different stimuli in the setup may very well reveal more details on the nature of the observed defect, and we thank reviewer #2 for the suggestion. We feel that identifying the reason for the defect lies outside the scope of this paper, but it should definitely be investigated in future studies.

      The conclusions made in the manuscript are appropriately conservative; the abstract states that these transporters somehow influence prey detection and motion sensing, and this is probably true. However, it is unclear to what extent and how they might be acting on these processes, so the conclusions are a bit unsatisfying.

      In terms of impact on the field, this work highlights the potential importance of these two transporters to visual processing, but further studies will be required to say how important they are and what they are doing. The methods presented here are not novel, as UV prey and red OMR stimuli and behaviors have previously been described.

      We agree that this study is not fully conclusive but a first step towards a clarification of the role of glutamate transporters in shaping visual behavior.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Suggestions for improved or additional experiments, data, or analyses:

      Figure 3:

      (a) What is the intensity of the light emitted by the UV and yellow LEDs and experienced by the larva, e.g. in nW? This is necessary in order to be able to compare and replicate the results.

      Stimulus intensities in microwatts are now reported in the Materials and Methods section.

      (b) In Figure 3D, are all the example eye movement events hunting initiations? Does right eye/left eye positive or negative angle change denote convergence?

      As indicated in the figure legend, hunting initiations are indicated by black dots on the graph. In Stytra’s eye tracking system, eye convergence is indicated by an increase in the left eye angle and a decrease in the right eye angle. Both these points have now been clarified in the figure legend.

      (c) Also in 3D, the tail angle plot and x-axis are too small to read.

      Figure 3D has been reformatted to be more legible.

      (d) How much eye convergence constitutes a response? In order to compare the findings to previous studies of prey capture, it would be best to use a bimodal distribution of eye angles to set a convergence threshold for each fish (e.g. Paride et al., eLife 2019), but there should at least be a clear threshold mentioned.

      We have expanded the explanation of how the response detection paradigm was calculated. We acknowledge that this analysis has limitations in terms of comparability with previous studies, as it was developed de novo, based on the format of eye coordinate data provided by Stytra and refined through iterative comparison with experimental video recordings. Since the threshold was defined relative to the average noise level of the trace, it is difficult to specify an exact value. However, we are happy to share the Python scripts used for the analysis to facilitate further investigation.
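
      To make the idea of a noise-relative threshold concrete, a minimal sketch is given below. It is not the authors' script, which is not reproduced here: the function name `detect_convergence_onsets`, the baseline window length, and the multiplier `k` are all hypothetical choices. It only assumes the sign convention described above for Stytra, in which convergence increases the left eye angle and decreases the right eye angle, so that their difference rises during a hunting initiation.

```python
from statistics import mean, stdev


def detect_convergence_onsets(left_angles, right_angles, baseline_n=50, k=3.0):
    """Flag the onset of eye-convergence events in paired eye-angle traces.

    The vergence signal is left minus right, which increases during
    convergence under the assumed sign convention. The threshold is set
    relative to the noise of the first `baseline_n` samples (mean + k * SD);
    both `baseline_n` and `k` are illustrative values, not published ones.
    Returns (threshold, list of sample indices where the signal first
    crosses above the threshold).
    """
    vergence = [l - r for l, r in zip(left_angles, right_angles)]
    baseline = vergence[:baseline_n]
    threshold = mean(baseline) + k * stdev(baseline)

    onsets = []
    was_above = False
    for i, value in enumerate(vergence):
        is_above = value > threshold
        if is_above and not was_above:
            onsets.append(i)  # upward crossing marks a candidate event
        was_above = is_above
    return threshold, onsets


# Synthetic example: noisy baseline, one convergence episode at samples 60-79.
left = [0.5 if i % 2 else -0.5 for i in range(100)]
for i in range(60, 80):
    left[i] = 10.0
right = [0.0] * 100
threshold, onsets = detect_convergence_onsets(left, right)
print(threshold, onsets)
```

      Because the threshold scales with each trace's own baseline noise, its absolute value differs between fish, which is consistent with the difficulty of quoting a single number noted in the response.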

      (e) The previous study using artificial UV prey stimuli to trigger hunting (Khan et al., Current Biology 2023) should be acknowledged.

      This is indeed an embarrassing omission, not excused by the first version of this section being drafted before the Khan publication. We have now cited this important study.

      Figure 5:

      Was the response at any individual intensity significantly lower in the mutant? If not, this should be clearly stated.

      Yes, and this is now clearly stated in the main text

      Figure 6:

      Again, it would be more informative to know for which intensities the KO response was significantly greater than WT.

      This is now also clearly stated in the main text

      Figure 7:

      (a) What are the intensity units?

      We have now clarified in the figure that the intensity shown in the graph is the digital intensity.

      (b) Similar to Figures 5 and 6, it would be more informative to know at which intensities the KO response was significantly different from WT.

      We now report the measured optical powers relative to the digital intensities in the Materials and Methods sections.

      Suggestion for writing:

      The discussion was a bit discursive. A more structured discussion, sequentially explaining each of the key results, would be easier for the reader to follow. And, it would be helpful to have hypotheses for how these transporter mutants could cause each of the changes in visual behaviors that were observed.

      We agree that the discussion needed improvement. We have completely rewritten the discussion and hope that it now puts our results into context more concisely.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors present a new protocol to assess social dominance in pairs and triads of C57BL/6j mice, based on a competition to access a hidden food pellet. Using this new protocol, the authors have been able to identify stable ranking among male and female pairs, while reporting more fluctuant hierarchies among triads of males. Ranking readouts identified with this new apparatus were compared to the outcomes obtained with the same animals competing in the tube and in the warm spot tests, which have been both commonly used during the last decade to identify social ranks in rodents under laboratory conditions.

      Strengths:

      FPCT allows for easy and fast identification of a winner and a loser in the context of food competition. The apparatus and the protocol are relatively easy and quick to implement in the lab and free from any complex post-processing/analysis, which qualifies it for wide distribution, particularly within laboratories that do not have the resources to implement more sophisticated protocols. Hierarchical readouts identified through the FPCT correlate with social ranks identified with the tube and the warm spot tests, which have been widely adopted during the last decade and allow for study comparison.

      Weaknesses:

      While the FPCT is validated by the tube and the warm spot test, this paper would have gained strength by providing a more ethologically based validation. Tube and warm spot tests have been shown to provide conflicting results and might not be a sufficient measurement for social ranking (see Varholik et al, Scientific reports, 2019; Battivelli et al, Biological psychiatry, 2024). Instead, a general consensus pushing toward more ethological approaches for neuroscience studies is emerging.

      We appreciate all the reviewers for recognizing the strength of the FPCT setup and the data. We also appreciate the reviewers for pointing out weakness and giving us valuable suggestions that help us to improve the quality of our manuscript through revision.

      In this manuscript, we found that the ranking results of the FPCT were largely consistent with those of the tube and warm spot tests. We did not expect this finding, as we had assumed that the different competitive targets of the different paradigms would appeal to the mice in distinct ways and allow them to exert their specific advantages. However, the consistency between the FPCT and tube test was observed in pairs of female mice, pairs of male mice and triads of male mice. The consistency between the FPCT, tube test and warm spot test was observed in pairs of male mice and triads of male mice. Thus, we concluded that the social rank order of mice is stable.

      We acknowledge that it would be better if this conclusion could be validated by more ethological approaches such as urine-marking analysis and a water competition test. However, we did not rule out inconsistency of ranking results between two or more paradigms. In fact, there were inconsistent cases in our experiments. The inconsistency of ranking results between paradigms, even between the FPCT and tube test, could be amplified if the tests were run with different experimental protocols and conditions. This is because many factors can affect the readouts, such as colony formation, tasks, test protocols, habituation and training. Using the tube test itself, both stable 1,2 and unstable 3 ranking results have been reported.

      Other papers have already successfully identified social ranks in dyadic food competition, using a relatively simple scoring protocol (see for example Merlot et al., 2006), within a more naturalistic set-up that allows the 2 opponents to directly interact while competing for the food. A potential issue with the FPCT is that, because the opponents are isolated from each other, the normal inhibition of food access expected in subordinates in the presence of a dominant could be diminished, and usually avoidant subordinates could be more motivated to push for access to the food pellet.

      The hierarchical structure of a mouse colony could be established on the basis of physical aspects (such as muscular strength and vigor in fighting) and psychological aspects (such as boldness, focused motivation and active self-awareness of status). In currently available food contest paradigms, where the mice compete through bodily interaction, the physical and psychological aspects are intermingled in the interpretation of winning and losing. In the FPCT, the opponents are isolated from each other so that the importance of direct bodily interaction in a competition is minimized, which helps expose the psychological factors contributing to the establishment and/or expression of social status in the mice. In this study, the overall stable ranking results across the FPCT, tube test and warm spot test indicate that the status sense of animals is part of a comprehensive identity of self-recognition of individuals in an established mouse social colony.

      There are issues with use of the English language throughout the text. Some sentences are difficult to understand and should be clarified and/or synthesized.

      We thank the reviewer for pointing out language issues. We have carefully corrected the grammar errors.

      Open question:

      Is food restriction mandatory? Palatable food pellet is not sufficient to trigger competition? Food restriction has numerous behavioral and physiological consequences that would be better to prevent to be able to clearly interpret behavioral outcomes in FPCT (see for example Tucci et al., 2006).

      We thank the reviewer for raising this question. In preliminary experiments, we found that food restriction was mandatory and that a palatable food pellet alone was not sufficient to trigger competition. In order to limit the potential influence of food restriction on competitive behavior, the mice underwent only a 24-hour food deprivation period at the beginning of training, followed by mild restriction of the food supply to meet basic energy requirements.

      Conclusive remarks:

      Although this protocol attempts to provide a novel approach to evaluate social ranks in mice, it is not clear how it really brings a significant advance in neuroscience research. The FPCT dynamic is very similar to the one observed in the tube test, where mice compete to navigate forward in a narrow space, constraining the opponent to go backward. The main difference between the FPCT and the tube test is the presence of food between the opponents. In the tube test, a food reward was initially used to increase motivation to cross the tube and push the opponent upon the testing day. This component has been progressively abandoned, precisely because it was not necessary for the mice to compete in the tube.

      This paper would really bring a significant contribution to the field by providing a neuronal imaging or manipulation correlate to the behavioral outcome obtained by the application of the FPCT.

      We thank the reviewer for this comment on the significance of the FPCT paradigm. In this manuscript, we think it is interesting to report that the ranking results were consistent across the FPCT, tube test and warm spot test. This finding indicates that the status sense of animals might be part of a comprehensive identity of self-recognition of individuals in an established social colony.

      Moreover, we are conducting research on the biological consequences and mechanisms of social competition. We hope that the results of this ongoing project will be published in the near future.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors have devised a novel assay to measure relative social rank in mice that is aimed at incorporating multiple aspects of social competition while minimizing direct contact between animals. Forming a hierarchy often involves complex social dynamics related to competitive drives for different fundamental resources including access to food, water, territory, and sexual mates. This makes the study of social dominance and its neural underpinnings hard, warranting the development of new tools and methods that can help understand both social functions as well as dysfunction.

      Strengths:

      This study showcases an assay called the Food Pellet Competition Test where cagemate mice compete for food, without direct contact, by pushing a block in a tube from opposite directions. The authors have attempted to quantify motivation to obtain the food independent of other factors such as age, weight, sex, etc. by running the assay under two conditions: one where the food is accessible and one where it isn't. This assay results in an impressive outcome consistency across days for females and males paired housed and for male groups of three. Further, the determined social ranks correlate strongly with two common assays: the tube test and the warm spot test.

      Weaknesses:

      This new assay has limited ethological validity since mice do not compete for food without touching each other with a block in the middle. In addition, the assay may only be valid for a single trial per day making its utility for recording neural recordings and manipulations limited to a single sample per mouse. Although the authors attempt to measure motivation as a factor driving who wins the social competition, the data is limited. This novel assay requires training across days with some mice reaching criteria before others. From the data reported, it is unclear what effects training can have on the outcome of social competition. Beyond the data shown, the language used throughout the manuscript and the rationale for the design of this novel assay is difficult to understand.

      We thank the reviewer for the valuable comments on the strengths and weaknesses of our manuscript.

      The design rationale of the FPCT was to (1) provide researchers with a new choice of food competition paradigm and (2) expose psychological factors influencing the establishment and/or expression of social status in mice by avoiding direct physical competition between contenders (see revised Abstract and the last paragraph in the Introduction).

      As a result, the consistent ranking across the FPCT, tube test and warm spot test might indicate that an animal's sense of status is part of each individual's comprehensive self-identity within an established social colony.

      We suggest performing one FPCT trial per mouse per day, because the mice might lose interest in the food pellet if they are tested repeatedly within a day. It is, however, practical to run the FPCT assay over several days.

      Regarding training, we suggest 4-5 days, as in our experiments. In this revision, we have added training data showing how the latency to obtain the food changed across training days (Figure 1). On the last day of training, the mice went directly to push the block and eat the food after entering the arena.

      We thank the reviewer for pointing out language issues. We have carefully corrected the errors.

      Reviewer #3 (Public review):

      Summary:

      The laboratory mouse is an ideal animal to study the neural and psychological underpinnings of social dominance behavior because of its economic cost and the animals' readiness to display dominant and subordinate behaviors in simple and testable environments. Here, a new and novel method for measuring dominance and the individual social status of mice is presented using a food competition assay. Historically, food competition assays have been avoided because they occur in an open arena or the home cage, and it can be difficult to assess who gets priority access to the resource and to avoid aggressive interactions such as bite wounding. Now, the authors have designed a narrow rectangular arena separated in half by a sliding floor-to-ceiling obstacle, where the mice placed at opposite sides of the obstacle compete by pushing the obstacle to gain priority access to a food pellet resting on the arena floor under the obstacle. One can also place the food pellet within the obstacle to restrict priority access to the food and measure the time or effort spent pushing the obstacle back and forth. As hypothesized, the outcomes in the food competition test were significantly consistent with those of the more common tube test (space competition) and warm spot competition test. This suggests that these animals have a stereotypic dominance organization that exists across multiple resource domains (i.e., food, space, and temperature). Only male and female C57 mice in same-sex pairs or triads were tested.

      Strengths:

      The design of the apparatus and the inclusion of females are significant strengths within the study.

      Weaknesses:

      There are at least two major weaknesses of the study: neglecting the value of test inconsistency and not providing the mice time to recognize who they are competing with.

      Several studies have demonstrated that although inbred mice in laboratory housing share similar genetics and environment, they can form diverse types of hierarchical organizations (e.g., loose, stable, despotic, linear, etc.) and there are multiple resource domains in the home cage that mice compete over (e.g., space, food, water, temperature, etc.). The advantage of using multiple dominance assays is to understand the nuances of hierarchical organizations better. For example, some groups may have clear dominant and subordinate individuals when competing for food, but the individuals may "change or switch" social status when competing for space. Indeed, social relationships are dynamic, not static. Here, the authors have provided another test to measure another dimension of dominance: food competition. Rather than highlight this advantage, the authors highlight that the test is in agreement with the standard tube test and warm spot test and that C57 mice have stereotypic dominance across multiple domains. While some may find this great, it will leave many to continue using the tube test only (which measures the dimension of space competition) and avoid measuring food competition. If the reader looks at Figures 6E, F, and G they will see examples of inconsistency across the food competition test, tube test, and warm spot test in triads of mice. These groups are quite interesting and demonstrate the diversity of social dynamics in groups of inbred mice in highly standardized environmental conditions. Scientists interested in dominance should study groups that are consistent and inconsistent across multiple dimensions of dominance (e.g., space, food, mates, etc.).

      Unlike the tube test and warm spot test, the food competition test presented here provides no opportunity for the animals to identify their opponent. That is, they cannot sniff their opponent's fur or anogenital region, which would allow them an opportunity to identify them individually. Thus, as the authors state, the test only measures psychological motivation to get a food reward. Notably, the outcome in the direct and indirect testing of food competition is in agreement, leaving many to wonder whether they are measuring the social relationship or the effort an individual puts forth in attaining a food reward regardless of the social opponent. Specifically, in the direct test, an individual can retrieve the food reward by pushing the obstacle out of the way first. In the indirect test, the animals cannot retrieve the reward and can only push the obstacle back and forth, which contains the reward inside. In Figure 4E, you can see that winners spent more time pushing the block in the indirect test. Thus, whether the test measures a social relationship or just the likelihood of gaining priority access to food is unclear. To rectify this issue, the authors could provide an opportunity for the animals to interact before lowering the obstacle and raising(?) a food reward. They may also create a very long one-sided apparatus to measure the amount of effort an individual mouse puts forth in the indirect test with only one individual - or any situation with just one mouse where the moving obstacle is not pushed back, and the animal can just keep pushing until they stop. This would require another experiment. It also may not tell us much more since it remains unclear whether inbred mice can individually identify one another (see https://doi.org/10.1098/rspb.2000.1057 for more details).

      A minor issue is that the write-up of the history of food competition assays and female dominance research is inaccurate. Food competition assays have a long history since at least the 1950s and many people study female dominance now.

      Food competition: https://doi.org/10.1080/00223980.1950.9712776, https://psycnet.apa.org/fulltext/1953-03267-001.pdf, https://doi.org/10.1016/j.bbi.2003.11.007, https://doi.org/10.1038/s41586-022-04507-5

      Female dominance: history https://doi.org/10.1016/j.cub.2023.03.020, https://doi.org/10.1016/S0031-9384(01)00494-2, https://doi.org/10.1037/0735-7036.99.4.411

      We thank the reviewer very much for the many helpful comments and suggestions.

      In this manuscript, we present the overall, average consistency of ranking results across the FPCT, tube test and warm spot test as an unexpected finding. We agree that the inconsistency of social ranking that occurred between trials and between paradigms should not be ignored. In the revision, we have added a description and discussion of the inconsistent results across the different test paradigms (paragraph 2 in section 3 of the Results; last 2 sentences of paragraph 4 in the Discussion).

      Although the two opponents were separated from each other, they were able to see and sniff each other because the block is transparent, there are holes in the lower portion of the block, and there is a gap between the block and the chamber (Supplementary figures 1 and 2). In the female but not the male groups, the presence of a cagemate opponent during test 1 could significantly disturb the mice and increase their latency to get the food, compared with the last day of training when there was no opponent (Figure 3A). This indicates that a mouse, at least a female mouse, could detect the presence of the opponent on the opposite side of the chamber. To further examine whether social relationships influence the readouts of the FPCT, we performed additional experiments in which two groups of non-cagemate mice competed. We did not detect clearly different ranks between the two groups (Figure 1H-1J), suggesting that an established social colony is necessary for the FPCT to distinguish the social ranks of mice.

      We thank the reviewer for reminding us to recognize the history of food competition assays. We have added citations and discussion of the related literature, both for male (paragraph 2 in the Introduction; paragraph 3 in the Discussion) and female (paragraph 1 of section 3 in the Results; paragraph 4 in the Discussion) mice.

      Reviewer #1 (Recommendations for the authors):

      There are issues with use of the English language throughout the text. Some sentences are difficult to understand and should be clarified and/or synthesized.

      We thank the reviewer for the constructive comments and helpful corrections.

      “Despite that 6 in 9 groups of mice display some extent of flipped ranking (Figures 6B-6G) and only 3 in 9 groups displayed continuously unaltered ranking (Figure 6H) during a total of 9 trials consisting of 3 trials of FPCT, 3 trials of tube test and 1 trial of WST, an obvious stable linear intragroup hierarchy was observed throughout all the trials and tasks"

      The above sentence has been re-written as: The ranking result showed that 6 in 9 groups of mice displayed some extent of flipped ranking (Figures 4B-4G), and only 3 in 9 groups displayed continuously unaltered ranking (Figure 4H). Averagely, in the totally 27 trials consisting of 12 trials of FPCT, 12 trials of tube test and 3 trials of WST, an obvious stable linear intragroup hierarchy was observed across all the trials and tasks (paragraph 1 of section 4 in the Results).

      "it is hard to attribute winning a competition in a shared space to stronger motivation rather than muscular superiority".

      The above sentence has been deleted and re-written in paragraph 1 of section 4 in the Results and paragraph 3 in the Discussion.

      "Unexpectedly, in most of the trials the mice preserved the winner or loser identity acquired in FPCT into tube test and WST (Figures 5L-5O)".

      Why this is unexpected? Instead, it looks like this result is expected (tube test has been successfully applied to identify ranks in females, see Leclair et al, eLife, 2021).

      We thank the reviewer for raising this point. The FPCT differs from the tube test and warm spot test in at least two aspects: competition for food vs space, and absence vs presence of direct bodily interaction during competition. Some mice might be active in food competition but not in space competition, while others might show the opposite pattern. Some mice might be good at physical contests, while others might be good at playing tricks. These factors led us to expect task-specific ranking results.

      Vocabulary issues:

      "Stereotypic", to talk about rank stability in a different context does not look appropriate. In behavioral neuroscience, stereotypy is more commonly accepted to mean abnormal repetitive behaviors. The stability that the authors seem to indicate with the word "stereotype" refers rather to the concept of "consistency" or "stability".

      We thank the reviewer for this detailed explanation. We have chosen to use "stability" to describe the data.

      "Society", to talk about groups or colonies of animals sounds a bit odd. Society evokes more abstract concepts more likely to fit with human organization. I suggest the use of "group" or "colony".

      "Hide" to qualify the block preventing access to the food pellet. It is said that the block is transparent. We suggest the use of "inaccessible" instead of hidden.

      We strongly encourage the authors to further edit the entire script to improve language.

      We thank the reviewer for the kind corrections. We have corrected the vocabulary issues listed above.

      Technical issues / typos:

      Figure 1. The picture does not seem optimal to visualize the apparatus.

      Missing unit legend in Figure 4E.

      Supplementary videos 2 and 4 are missing.

      We have added a frontal view of the apparatus (Supplementary Figure 1) and a unit to Figure 2F (previously Figure 4E), and we will make sure to upload the missing videos.

      Reviewer #2 (Recommendations for the authors):

      While the assay shows promise as a tool for studying social dominance, the study suffers from some limitations such as lack of ethological relevance. In addition, there is a lack of rationale and methodological clarity in the manuscript that can impact the ability of other scientists to be able to perform this novel assay.

      (1) Related to lack of scientific rigor:

      a. In the first paragraph of the introduction, the authors mention that "disability in social recognition and unsatisfied social status are associated with brain diseases such as autism, depression and schizophrenia". Both papers that they cited refer to mouse models, not humans (which is the species that is attributed these diagnoses clinically). In addition, neither citation discusses schizophrenia. While social dysfunctions can indeed be related to these diseases, to my knowledge this is not caused by a change in "social status" and there is no human data with patient populations and social status. Therefore, this sentence is inaccurate and there is no research that demonstrates that.

      We thank the reviewer for raising this point. To express our view and cite the literature more accurately, we revised the sentence in the 1st paragraph of the Introduction as follows: “Impaired awareness of social competition has been documented in individuals with autism spectrum disorder (ASD)4,5, and reduced social interaction has been characterized in corresponding animal models6. Similarly, maladaptive responses to social status loss has been associated with patient depressive disorders7,8 and animal models of depression1,9”. The reviewer is right that no human disease has been shown to be causally related to social status; only depression has been proposed to be associated with a change of social status7,8.

      b. In the second paragraph of the introduction, the authors mention a scarcity of research papers with designs for food competition-based social hierarchy assays for mice. At least two such papers have been published in the past few years (DOIs https://doi.org/10.1038/s41586-021-04000-5 and https://doi.org/10.1038/s41586-022-04507-5). The authors should acknowledge the existence of these and other assays and discuss how their work would be related. In the same paragraph, they also mention that existing assays suffer from "hierarchy instability" and "complex calculations" without showing any citations or details for these claims.

      We thank the reviewer for raising this point. We acknowledge that some food competition assays are available for measuring social hierarchy in mice. Relative to space competition, however, food competition tests have not been used as commonly or widely, and no food competition paradigm has been accepted as generally as space competition paradigms such as the tube test and warm spot test. To improve the language and scientific expression, we revised the sentences as follows: “Relative to space competition, food competition tests for mice have been designated and applied less commonly in animal studies despite its long history 28-30. Several issues could be thought to be the underlying limitations for the application of food competition paradigms. First, there are methodological issues in some of these approaches, such as long video recording duration and difficulty in analyzing animal’s behaviors during competitive physical interaction in videos, hindering their application by laboratories that cannot afford sophisticated equipment and analysis”. Corresponding citations have been updated (see paragraph 3 in the Introduction).

      c. The authors say that their study is the first to demonstrate that female mice follow social ranks. This is not the first study to do so and the authors should acknowledge existing publications that have done the same (eg DOI https://doi.org/10.7554/eLife.71401).

      We have followed the reviewer’s suggestion to increase citations regarding social ranking of female mice tested by competition paradigms, especially food competition paradigms (see paragraph 1 of section 3 in the Results; paragraph 4 in the Discussion).

      (2) Related to problems with interpretation of data:

      a. The authors showed the assay works for females and males in pairwise housing, but two mice don't make a hierarchy, as hierarchies require a minimum of three individuals. Therefore, whether the assay works for females caged in three is an important question that is unaddressed in this study and is a caveat. The authors extended the competition assay to male mice that are housed in cages of three. It would be important to show whether the assay generalizes well for female mice with this three-animal housing as well as discuss the effect of using even bigger groups of mice on the results of the assay.

      We thank the reviewer for raising questions about the interpretation of the data and for the insightful suggestions. We agree that it is interesting and important to probe whether the FPCT works for groups of three female mice. Although the social rankings of pairs of male and female mice were not significantly different (new Figure 2D-2F and 3F-3H), those of triads of male and female mice could differ. We have tested triads of male mice and found that they displayed an overall linear hierarchical ranking. We would like to use the FPCT to investigate the rankings of triads of female mice, and of even larger groups of mice, in the future. In the present manuscript we focus on the feasibility of applying the FPCT to smaller groups. In the Discussion, we have added text on the effect of group size on social competition tests (see paragraph 4 in the Discussion).

      b. The authors claim that "test 2" of their assay helps assert the motivation of mice for social competition as in Figure 4E. This could simply be a readout of how strong the mice are (muscle mass). To claim that this is indeed related to motivation during the FPCT assay, the authors should show the correlation of this readout with the latency to push the block during the social competition task.

      We thank the reviewer for raising this question. The dimensions that establish social structures include physical and psychological factors. In the FPCT paradigm, the two contenders are separated, so physical factors are minimized and psychological factors should play a more important role in the competition than in previously reported food competition paradigms. Therefore, in the revised manuscript we attribute the ranking results mainly to psychological factors in general, rather than to motivation alone, which is just one of numerous psychological factors (paragraph 3 of Discussion). Moreover, in the Discussion we point out that we cannot exclude the possibility that physical factors still contribute to the competitive outcomes, since some pairs of mice pushed the block simultaneously (paragraph 3 of Discussion).

      c.The authors mention that they are interested to understand which factors lead to the outcome of the competition such as age, sex, physical strength, training level, and intensity of psychological motivation. However, in all their runs of the assay, they always matched these variables between the competitors. They should clarify that they were instead controlling for these variables. Another thing to note here is that while they controlled the body mass of the animals, that isn't the same as physical strength, as a lighter mouse can have more muscle mass than a heavier mouse. They should either specify this limitation or quantify the additional metric of "muscle mass" which is a much better proxy for physical strength. Thus, the claim that the outcome of the competition is solely affected by motivation is not convincing since they didn't rule out the others such as quantifying the rate of learning during training and strength.

      We thank the reviewer for raising this question. In response to question (c), we acknowledge that it is not accurate to ascribe the outcomes of the FPCT solely to psychological motivation. In the revised manuscript, the factors contributing to the outcomes of the FPCT have been simplified into physical and psychological factors. We consider that psychological factors could be the main driver of mice participating in the FPCT (see paragraph 3 of Discussion).

      d. In the discussion, the authors mention that their task only requires a single day of food deprivation (the day before the first trial) while other assays suffer from a continued food deprivation protocol. However, the authors also use 10g per cage as the amount of food instead of giving them ad libitum access. Limited food is a food deprivation method. Thus, this is an inaccurate claim.

      We thank the reviewer for raising this point. We have clarified the food restriction requirement for the FPCT in the revision. The mice were deprived of food for 24 hours, with normal access to water, to enhance the appeal of the food pellet. After these 24 hours of food deprivation, each cage of mice was given 10 g of food every morning to meet their daily food requirements until the end of the test (see FPCT procedure section in Methods and materials).

      e.In the second section of the results, the authors run their assay with female mice that are housed in cages of two. This section suffers from the same limitations as the first and can be improved by showing the training data, correlations of competition outcome with "motivation" and ruling out the other factors that could contribute to the outcome. Further, the authors saying that their FPCT assay is enough to show that female mice follow a social hierarchy by itself is a weak claim. They should instead include their cross-validation with the others to strengthen it.

      We thank the reviewer for raising this question. We have taken the reviewer’s suggestion and now show the training data (Figures 1E, 2A and 3A). Because the factors contributing to the outcomes of the FPCT are diverse, we prefer not to attempt to control for and identify the exact factor in the current manuscript. We agree with the reviewer that cross-validation with different paradigms is advisable in studies that rank social hierarchy, as the ranking results can vary with tasks, procedures and operations.

      f.  In the last paragraph of the introduction, the authors mention how their assay involves "peaceful competition" since the mice are not in direct contact and hence cannot exhibit aggression. The authors do not address the limitation that a lack of physical contact actually makes the assay less ethological. Further, since the mice are housed in groups of two and three, it is not guaranteed that the mice will not be aggressive during their time in the home cage, which could affect their behavior during the competition assay. Whether the assay causes more aggression in the cage due to the lack of physical contact during the competition is not addressed in this study.

      We thank the reviewer for raising this point. Diverse factors affect the outcomes of a food competition test, some psychological and others physical. We agree that the lack of physical contact makes the assay less ethologically natural. However, once social statuses have been established by housing a group of mice together for a sufficient time, the win/lose outcomes in the FPCT could be a readout of the expression of those statuses, since the mice cannot exhibit aggression in the test. We have revised the Introduction and Discussion (paragraph 3 of Discussion). Thank you.

      (3) Related to lack of methodological rigor and rationale clarity:

      a. In the first section of the results, the authors run their assay with male mice that are housed in cages of two. While the data that they display is promising, we do not see how mice change behavior across days of training and how that relates to the outcome of the competition. It would be valuable to also show the training data for the mice, answering questions related to competency and any inter-animal variabilities prior to rank assessment. Plotting the training data across all days would be helpful for the other parts of the results as well. This is especially important because the methods mention that mice are trained until they get to the criterium, so this means that different individuals get different amounts of training.

      We thank the reviewer for emphasizing the importance of showing the training data. We have taken the reviewer’s suggestion and now show the training data (Figures 1E, 2A and 3A).

      b.  It is unclear why the assay was run only once per mouse pair per day since most protocols for the tube test involve multiple repetitions each day while alternating the side from which the mice enter. The authors should address whether a single trial per day is enough to show consistent results and that it wouldn't vary with more.

      We suggest running the FPCT once or twice per mouse per day under the mild food restriction, training and test procedures described in this manuscript. More frequent testing might gradually diminish the mice’s interest in the food pellet, because they were not fully deprived of food. According to our data, the outcomes of the FPCT over 4 consecutive days were stable overall.

      c.  In the results the authors say that they "raised 3 male mice" which may be incorrect because they report in the methods buying the mice, and they housed all their mice for only three days before running the assay which might be too little for the hierarchy to stabilize. The authors should comment on what was the range of the cohabitation across different cages and whether it had an impact on the results.

      According to our experiments, housing the mice together for 3 days is enough to establish a social colony with a relatively stable status structure. Prolonged housing may produce a similar, a more stable or a more dynamic social colony.

      d. There are also some formatting and/or convention issues in the results. The first figure callout in the results is for Figure 4 instead of Figure 1 (which is the standard). This is because the authors do not explain how the mice are trained for the task in the results section and show limited data about the training of the task. Not showing comprehensive training data would make replication of this study very difficult.

      We thank the reviewer for raising this question. We have rearranged the figures. The new arrangement starts with a schematic drawing of the FPCT procedure and the training data (Figure 1).

      e. The authors don't report the exact p-values in the figures

      We now report the significance levels in the figures in the revised manuscript. Thank you.

      4. The writing of the manuscript suffers from a lack of clarity in most sections of the manuscript.

      Here are several examples that are critical:

      a. In the title and abstract, it isn't clear what the authors mean by "stereotype". It could be a behavior during the competition, or that the social ranks across assays are correlated or that the rank for the new assay is consistent across days.

      b. There are several instances where the authors anthropomorphize mice using human features such as "urbanization" and "society" which are not established factors affecting mouse hierarchy. This further extends to anthropomorphizing mice in ways that are not standard such as an animal being "timid" or "bold" which would be hard to measure in mice, if not impossible.

      c. Across the social dominance literature, relative social rank is described using more general "dominant" and "subordinate" titles instead of "superior" and "inferior" that are sometimes used in the manuscript. The authors should follow the standard language so that readers understand.

      d.  In the third paragraph of the introduction, the authors say "Thus, it is more likely expected that different paradigms to weigh the social competency and status may lead to diverse readouts, given that competitive factors are included in competition paradigms." This sentence suffers from multiple syntax errors thereby reducing clarity

      e. There are several typos in the manuscript such as using "dominate" instead of "dominant", "grades" instead of "outcomes" and "forth" instead of "fourth", to give a few examples.

      We thank the reviewer for the careful reading of the manuscript and the very helpful comments. We have taken the above suggestions and improved the writing of the manuscript. For example, "stereotype" was replaced by “stability”, mouse "society" was replaced by "colony", and the sentence “Thus, it is more.... in competition paradigms” has been deleted.

      Reviewer #3 (Recommendations for the authors):

      (1) The justification for the design of this new test paradigm is unclear. In the abstract, you state that the field needs a reliable, valid, and easily executable test. Your test provides this, as you state, but how is it better than the tube test? Does the tube test suffer from task-specific win-or-lose outcomes? Can you provide evidence for this? The nature methods protocol for the tube test (https://doi.org/10.1038/s41596-018-0116-4) "strongly suggest using more than two dominance measures, for example, by also carrying out the warm spot test, or territory urine marking or ultrasonic courtship vocalization assays." This would suggest that results from the tube test can be task-specific, but I am not convinced that you have demonstrated that results from your food competition test are not task-specific. Indeed, by your title, one must run multiple tests.

      This same problem is apparent in the introduction. In the second paragraph, there is a discussion of the tube test, warm spot test, and food competition tests. What is the problem with these tests?

      I believe that social dominance relationships are complex and dynamic social relationships indicating who has priority access to a resource between multiple animals that live together. In these living situations, several resources can often be competed over, for example, space, food, mates, temperature, etc. Currently, we have tests to measure space via the tube test or urine marking, mates via ultrasonic vocalization, temperature via warm spot test, and food via food competition assays. The tube test, urine marking assay, and ultrasonic vocalization test have been demonstrated to be reliable, valid, and easily executable. However, the food competition assays are often difficult to execute because it is difficult to interpret the dominant behaviors and aggressive behaviors like bite wounding can occur during the test. Here, you present a new food competition assay to address these issues and show that it can be used in conjunction with other assays to measure social dominance across multiple resources easily. In doing so, you revealed that many same-sex groups of C57 mice have a stereotypic pattern of dominance behavior when competing across multiple types of resources: space, temperature, and food.

      I ask that you please rebut if you disagree with me, and adjust your abstract, introduction, and discussion accordingly.

      We thank the reviewer for all the constructive comments. We have adjusted the Abstract, Introduction and Discussion of the manuscript.

      We recognize and appreciate the value of the tube test, the warm spot test and many other competition tests, including food competition tests. The tube test and the warm spot test are space competition tasks. Relative to space competition, food competition tests for mice have been designed and applied less commonly in animal studies. Several issues (such as methodological issues, aggressive behaviors occurring during competition, and prolonged food deprivation) may underlie the limited application of food competition paradigms (paragraph 3 in the Introduction). Therefore, we clarify that the justification for the design of FPCT was “to have a new choice of food competition paradigm for mice, and to facilitate the exposure of psychological aspects contributing to the winning/losing outcomes in competitions” (last paragraph in the Introduction).

      FPCT differs from the tube test and the warm spot test (WST) in at least two ways. FPCT is a food competition task in which the mice need no physical contact during competition, whereas the tube test and WST are space competition tasks in which the mice need direct physical contact during competition. Therefore, we expected inconsistent evaluation results for competitiveness and rankings when comparing FPCT with the typically available competition paradigms, namely the tube test and WST (last paragraph in the Introduction).

      (2)  The design of the test needs to be described before the results. You can either move the methods section before the results or add a paragraph in the introduction to better describe the test. Here, you can also reference Figures 1 through 3 so that the figures are presented in the order of which they are mentioned in the paper. (It is very confusing that the first reference to a figure is Figure 4, when it should be Figure 1).

      We thank the reviewer for raising this point and for the suggestions. We have added a new section (section 1) to the Results. In the revised manuscript, the figures in the Results start with Figure 1, which shows a schematic drawing of the FPCT procedure, training data and some test results.

      (3)  The sentence describing Figure 4H. You argue that this shows that the mice are well and equally trained. It also shows that they have the same motivation or preference for the food.

      We thank the reviewer for this helpful comment. The data in the previous Figures 4H and 5I are now presented as Figures 2A and 3A, respectively, in the revised manuscript. These retrospective analyses of the training data showed similar levels of training in food-getting and a similar craving state for food (Sections 2 and 3 in the Results).

      (4)  "Social ranking of multiple cagemate mice using FPCT, tube test and WST"

      Here, you claim that "comparison of inter-task consistency revealed that the ranks evaluated by FPCT, tube test and WST did not differ from each other...Figure 6K." Okay, however, it is important to discuss the three cases when there wasn't consistency between the tests! Figure 6E-G.

      We thank the reviewer for raising this point. In the revised manuscript, we have added a description and discussion of the inconsistencies between the different test paradigms (paragraph 2 in section 3 of the Results, and the last 2 sentences of paragraph 4 in the Discussion).
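
      As an illustration of how agreement between two ranking tests could be quantified, the following is a minimal sketch using Spearman's rank correlation for rankings without ties. It is not necessarily the analysis used in the manuscript; the function name and the example ranks for four cagemates are hypothetical.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for two rankings of the same animals.

    Assumes both inputs are complete rankings (1..n) with no ties, so the
    shortcut formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) applies.
    """
    if len(ranks_a) != len(ranks_b):
        raise ValueError("both rankings must cover the same animals")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two animals to compare rankings")
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_sq_diff / (n * (n ** 2 - 1))


# Hypothetical ranks of four cagemates (mouse 1..4) in two tests.
fpct_ranks = [1, 2, 3, 4]
tube_ranks = [1, 3, 2, 4]  # mice 2 and 3 swap positions in the tube test

print(spearman_rho(fpct_ranks, tube_ranks))
```

      A value of 1 indicates identical rankings and -1 a fully reversed ranking; cages with tied ranks would need a tie-aware method instead.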

      (5)  Replace all instances of "gender" with "sex". Animals do not have a gender.

      (6)  Adjust the strain of the mice to C57BL/6JNifdc.

      We have replaced "gender" with "sex" and “C57BL/6J” with “C57BL/6JNifdc”. Thank you for your careful correction.

      (7)  What is the justification for running the warm spot test for one day and the other tests for four days?

      From the consecutive FPCT and tube tests, we already knew that the ranking results were stable overall. This stability was still observed on the day of the warm spot test. A drawback of running the warm spot test frequently is that the mice experience considerable stress from exposure to the ice-cold environment. Therefore, we ended the competition testing after only one trial of the warm spot test.

      (8)  Grammar

      The second sentence of the abstract: ...recognized as a valuable...

      Results, sentence after "...was observed (Figure 4G)." it should be "Fourth"

      We have corrected these and other grammar errors. We appreciate the reviewers for very careful review and all helpful comments.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      The authors survey the ultrastructural organization of glutamatergic synapses by cryo-ET and image processing tools using two complementary experimental approaches. The first approach employs so-called "ultra-fresh" preparations of brain homogenates from a knock-in mouse expressing a GFP-tagged version of PSD-95, allowing Peukes and colleagues to specifically target excitatory glutamatergic synapses. In the second approach, direct in-tissue (using cortical and hippocampal regions) targeting of the glutamatergic synapses employing the same mouse model is presented. In order to ascertain whether the isolation procedure causes any significant changes in the ultrastructural organization (and possibly synaptic macromolecular organization) the authors compare their findings using both of these approaches. The quantitation of the synaptic cleft height reveals an unexpected variability, while the STA analysis of the ionotropic receptors provides insights into their distribution with respect to the synaptic cleft.

      The main novelty of this study lies in the continuous claims by the authors that the sample preservation methods developed here are superior to any others previously used. This leads them as well to systematically downplay or directly ignore a substantial body of previous cryo-ET studies of synaptic structure. Without comparisons with the cryo-ET literature, it is very hard to judge the impact of this work in the field. Furthermore, the data does not show any better preservation in the so-called "ultra-fresh" preparation than in the literature, perhaps to the contrary as synapses with strangely elongated vesicles are often seen. Such synapses have been regularly discarded for further analysis in previous synaptosome studies (e.g. Martinez-Sanchez 2021). Whilst the targeting approach using a fluorescent PSD95 marker is novel and seems sufficiently precise, the authors use a somewhat outdated approach (cryo-sectioning) to generate in-tissue tomograms of poor quality. To what extent such tomograms can be interpreted in molecular terms is highly questionable. The authors also don't discuss the physiological influence of 20% dextran used for high-pressure freezing of these "very native" specimens.

      Lastly, a large part of the paper is devoted to image analysis of the PSD which is not convincing (including a somewhat forced comparison with the fixed and heavy-metal staining room temperature approach). Despite being a technically challenging study, the results fall short of expectations. 

      Our manuscript contains a discussion of both conventional EM and cryoET of synapses. We apologise if we have omitted referencing or discussing any earlier cryoET work. This was certainly not our intention, and we include a more complete discussion of published cryoET work on synapses in our revised manuscript.

      The reviewer is concerned that the synaptic vesicles in some synapse tomograms are “stretched” and that this may reflect poor preservation.  We would like to point out that such non-spherical synaptic vesicles have also been previously reported in cryoET of primary neurons grown on EM grids (Tao et al., J. Neuro, 2018). Indeed, there is no reason per se to suppose synaptic vesicles are always spherical and there are many diverse families of proteins expressed at the synapse that shape membrane curvature (BAR domain proteins, synaptotagmin, epsins, endophilins and others). We will add further discussion of this issue in the revised manuscript.

      The reviewer regards ‘cryo-sectioning’ as outdated and cryoET data from these preparations as “poor quality”. We respectfully disagree. Preparing brain tissues for cryoET is generally considered to be challenging. The first successful demonstration of preparing such samples predates the cryoEM resolution revolution (with electron counting detectors): Zuber et al. (Proc. Natl. Acad. Sci., 2005) prepared cryo-sections/CEMOVIS of in vitro brain cultures. We followed this technique to prepare tissue cryo-sections for cryoET in our manuscript. Recently, cryoFIB-SEM liftout has been developed as an alternative method to prepare tissue samples for cryoET (Mahamid et al., J. Struct. Biol., 2015), and only more recently has this method become available to more laboratories. Both techniques introduce damage, as has been described (Han et al., J. Microsc., 2008; Lucas et al., Proc. Natl. Acad. Sci., 2023). Importantly, no like-for-like, quantitative comparison of these two methodologies has yet been performed. We have recently demonstrated that the molecular structure of amyloid fibrils within human brain is preserved down to the protein fold level in samples prepared by cryo-sectioning (Gilbert et al., Nature, 2024). We will add further detail on the process by which we excluded poor quality tomograms from our analysis, which we described in detail in our methods section.

      The reviewer asks what the physiological effect is of adding 20% w/v ~40,000 Da dextran. This is a reasonable concern since this could in principle exert osmotic pressure on the tissue sample. While we did not investigate this ourselves, earlier studies have (Zuber et al., 2005), showing that this concentration of dextran did not damage cell membranes and had no detectable effect on cell structure.

      The reviewer is not convinced by our analysis of the apparent molecular density of macromolecules in the postsynaptic compartment that in conventional EM is called the postsynaptic density. However, the reviewer provides no reasoning for this assessment nor alternative approaches that could be attempted. We would like to add that we have tested multiple different approaches to objectively measure molecular crowding in cryoET data, that give comparable results. We believe that our conclusion – that we do not observe an increased molecular density conserved at the postsynaptic membrane, and that the PSD that we and others observed by conventional EM does not correspond to a region of increased molecular density - is well supported by our data.  We and the other reviewers consider this an important and novel observation.

      Reviewer #2 (Public review)

      Summary: 

      The authors set out to visualize the molecular architecture of the adult forebrain glutamatergic synapses in a near-native state. To this end, they use a rapid workflow to extract and plunge-freeze mouse synapses for cryo-electron tomography. In addition, the authors use knock-in mice expressing PSD95-GFP in order to perform correlated light and electron microscopy to clearly identify pre- and postsynaptic membranes. By thorough quantification of tomograms from plunge- and high-pressure frozen samples, the authors show that the previously reported 'post-synaptic density' does not occur at high frequency and is therefore not a defining feature of a glutamatergic synapse.

      Subsequently, the authors are able to reproduce the frequency of post-synaptic density when preparing conventional electron microscopy samples, thus indicating that density prevalence is an artifact of sample preparation. The authors go on to describe the arrangement of cytoskeletal components, membraneous compartments, and ionotropic receptor clusters across synapses.

      Demonstrating that the frequency of the post-synaptic density in prior work is likely an artifact and not a defining feature of glutamatergic synapses is significant. The descriptions of distributions and morphologies of proteins and membranes in this work may serve as a basis for the future of investigation for readers interested in these features.

      Strengths: 

      The authors perform a rigorous quantification of the molecular density profiles across synapses to determine the frequency of the post-synaptic density. They prepare samples using two cryogenic electron microscopy sample preparation methods, as well as one set of samples using conventional electron microscopy methods. The authors can reproduce previous reports of the frequency of the post-synaptic density by conventional sample preparation, but not by either of the cryogenic methods, thus strongly supporting their claim. 

      We thank the reviewer for their generous assessment of our manuscript.

      Reviewer #3 (Public review): 

      Summary: 

      The authors use cryo-electron tomography to thoroughly investigate the complexity of purified, excitatory synapses. They make several major interesting discoveries: polyhedral vesicles that have not been observed before in neurons; analysis of the intermembrane distance, and a link to potentiation, essentially updating distances reported from plastic-embedded specimen; and find that the postsynaptic density does not appear as a dense accumulation of proteins in all vitrified samples (less than half), a feature which served as a hallmark feature to identify excitatory plastic-embedded synapses. 

      Strengths: 

      (1) The presented work is thorough: the authors compare purified, endogenously labeled synapses to wild-type synapses to exclude artifacts that could arise through the homogenization step, and, in addition, analyse plastic embedded, stained synapses prepared using the same quick workflow, to ensure their findings have not been caused by way of purification of the synapses. Interestingly, the 'thick lines of PSD' are evident in most of their stained synapses.

      (2) I commend the authors on the exceptional technical achievement of preparing frozen specimens from a mouse within two minutes.

      (3) The approaches highlighted here can be used in other fields studying cell-cell junctions.

      (4) The tomograms will be deposited upon publication, which will enable neurobiologists and researchers from other fields to carry on data evaluation in their field of expertise, since tomography is still a specialized skill. The authors collected and reconstructed over 100 excellent tomograms of synapses, which generates a wealth of information that can also be used in future studies.

      (5) The authors have identified ionotropic receptor positions and that they are linked to actin filaments, and appear to be associated with membrane and other cytosolic scaffolds, which is highly exciting.

      (6) The authors achieved their aims to study neuronal excitatory synapses in great detail, were thorough in their experiments, and made multiple fascinating discoveries. They challenge dogmas that have been in place for decades and highlight the benefit of implementing and developing new methods to carefully understand the underlying molecular machines of synapses.

      Weaknesses: 

      The authors show informative segmentations in their figures but none have been overlaid with any of the tomograms in the submitted videos. It would be helpful for data evaluation by a broad audience to be able to view these together as videos to study these tomograms and extract more information. Deposition of segmentations associated with the tomograms would be tremendously helpful to neurobiologists, cryo-ET method developers, and others to push the boundaries.

      Impact on community: 

      The findings presented by Peukes et al. pertaining to synapse biology change dogmas about the fundamental understanding of synaptic ultrastructure. The work presented by the authors, particularly the associated change of intermembrane distance with potentiation and the distinct appearance of the PSD as an irregular amorphous 'cloud' will provide food for thought and an incentive for more analysis and additional studies, as will the discovery of large membranous and cytosolic protein complexes linked to ionotropic receptors within and outside of the synaptic cleft, which are ripe for investigation. The findings and tomograms available will carry far in the synapse fields and the approach and methods will move other fields outside of neurobiology forward. The method and impactful results of preparing cryogenic, unlabelled, unstained, near-native synapses may enable the study of how synapses function at high resolution in the future.

      We thank the reviewer for their supportive assessment of our manuscript.  We thank the reviewer for suggesting overlaying segmentations with videos of the raw tomographic volumes. We will include this in our revised manuscript.

      Reviewer #1 (Recommendations for the authors): 

      Major comments: 

      (1) The previous literature on synaptic cryo-ET studies is systematically ignored. The results presented here (and their novelty) must be compared directly with this body of work, rather than with classical EM.

      Our submitted manuscript included a 3-paragraph discussion of earlier synaptic cryoET studies, albeit we apologize that a seminal citation was missing, which we have corrected in our revised manuscript. We have now also included an additional brief discussion related to several more recent cryoET studies (see citations below) that were published after our pre-print was first deposited in 2021.

      (1) Held, R.G., Liang, J., and Brunger, A.T. (2024). Nanoscale architecture of synaptic vesicles and scaffolding complexes revealed by cryo-electron tomography. Proc. Natl. Acad. Sci. 121, e2403136121. https://doi.org/10.1073/pnas.2403136121.

      (2) Held, R.G., Liang, J., Esquivies, L., Khan, Y.A., Wang, C., Azubel, M., and Brunger, A.T. (2024). In-Situ Structure and Topography of AMPA Receptor Scaffolding Complexes Visualized by CryoET. bioRxiv, 2024.10.19.619226. https://doi.org/10.1101/2024.10.19.619226.

      (3) Matsui, A., Spangler, C., Elferich, J., Shiozaki, M., Jean, N., Zhao, X., Qin, M., Zhong, H., Yu, Z., and Gouaux, E. (2024). Cryo-electron tomographic investigation of native hippocampal glutamatergic synapses. eLife 13, RP98458. https://doi.org/10.7554/elife.98458.

      (4) Glynn, C., Smith, J.L.R., Case, M., Csöndör, R., Katsini, A., Sanita, M.E., Glen, T.S., Pennington, A., and Grange, M. (2024). Charting the molecular landscape of neuronal organisation within the hippocampus using cryo electron tomography. bioRxiv, 2024.10.14.617844. https://doi.org/10.1101/2024.10.14.617844.

      We discuss the above papers in our revised manuscript with the following:

      “Since submission of our manuscript, several cryoET studies of synapses have been reported from within cultured primary neurons (Held et al., 2024a, 2024b) and mouse brain (Glynn et al., 2024; Matsui et al., 2024), with samples prepared by cryoFIB-milling. These new datasets are largely consistent with the data reported here. CryoFIB-SEM has the advantage of overcoming the local knife damage caused by cryo-sectioning but introduces amorphization across the whole sample that diminishes the information content (Al-Amoudi et al., 2005; Lovatt et al., 2022; Lucas and Grigorieff, 2023). We have recently shown that cryoET data are capable of revealing subnanometer resolution in-tissue protein structure from vitreous cryo-sections (Gilbert et al., 2024), and near-atomic structures within cryo-sections have recently been demonstrated (Elferich et al., 2025).”

      Although there is variation between individual synapses, PSDs are clearly visible in several previous cryo-ET studies (even if it's not as striking as in heavy-metal stained samples). In fact, although the contrast of the images is generally poor, PSDs are also visible in several examples shown in Figure 1 - Supplement 3. Not being able to detect them seems more of a problem of the workflow used here than of missing features. The authors should also discuss why heavy-metal stains would accumulate on a non-existing structure (PSD) in conventional EM.

      We agree that apparent higher molecular density can be observed in example tomographic data of earlier cryoET studies. We also report individual examples of similar synapses in our dataset. A key strength of our approach is that we have assessed the molecular architecture of large numbers of adult brain synapses acquired by an unbiased approach (solely guided by PSD95 cryoCLEM), which indicate that a higher molecular density proximal to the postsynaptic membrane is not a conserved feature of glutamatergic synapses in the adult brain. There is no rationale for our cryoCLEM approach being a ‘problem of the workflow’.

      The reviewer misunderstands the weaknesses of conventional/room temperature EM workflows (including resin-embedding and freeze substitution). It is unavoidable that most proteins are damaged by denaturation and/or washed away when samples are washed in organic solvents (methanol/acetone, which directly denature most proteins) during tissue preparation for conventional EM. It is therefore conceivable that in such preparations a relative increase in contrast proximal to the postsynaptic membrane (‘PSD’) would appear if cytoplasmic proteins were washed away during these harsh organic solvent washing steps, leaving only those denatured proteins that are tethered to the postsynaptic membrane. It is not that the PSD is absent in cryoEM; rather, this difference in molecular crowding is not evident when tissues are imaged directly by cryoEM and have not undergone the harsh sample preparation required for conventional/room temperature EM.

      (2) Whether the synapses examined here are in a more physiological state than those analyzed in other papers remains absolutely unclear. For example, the quality of the tomographic slice shown in Figure 1C is poor, with the majority of synaptic vesicles looking suspiciously elongated. 

      We addressed this in our public reviews.

      (3) How were actin filaments segmented and quantified (e.g. for Fig 1E)? Apart from actin, can the authors show some examples of other macromolecular complexes (e.g. ribosomes) that they are able to identify in synapses (based on the info in supplementary tables)? Also, the mapping of glutamatergic receptors is not convincing, as the molecules were picked manually. To analyze their distribution, they should be mapped as comprehensively as possible by e.g. template matching.

      Actin filaments, identified by their ~7 nm diameter and ~70° branch points, were manually segmented in IMOD. The number of filaments was counted per postsynaptic compartment. We have amended the methods section to include this description.

      “In the PoSM, F-actin formed a network with ~70° branch points (Figure 1–figure supplement 1C) likely formed by Arp2/3, as expected (Pizarro-Cerdá 2017, Fäßler 2020). Putative filament copy number in the PoSM was estimated by manual segmentation in IMOD.” Manual picking was validated by the quality of the subtomogram average, which, although it reached only modest resolution (25 Å), is consistent with the identification of ionotropic glutamate receptors.

      (4) In the section "Synaptic organelles" the authors should provide some general information on the average number and size of synaptic vesicles (for the in-tissue tomograms).

      We have provided this information in the methods section:

      “The average diameter of synaptic vesicles was 40.2 nm and the minimum and maximum dimensions ranged from 20 to 57.8 nm, measured from the outside of the vesicle that included ellipsoidal synaptic vesicles similar to those previously reported (Tao et al., 2018).” A detailed survey of the presynaptic compartment, including the number of presynaptic vesicles was not the focus of our manuscript. We have deposited all tomograms from our dataset for any further data mining.

      Can the "flat tubular membranes compartments" be attributed to ER? The angular vesicles certainly have a typical ER appearance, as such morphology has been seen in several cryo-ET studies of neuronal and non-neuronal cells.

      In neuronal cells we regard it as unsafe to describe an intracellular organelle as being endoplasmic reticulum on the basis of morphology alone (e.g., smooth ER, described widely in conventional EM) because of the apparent diversity of distinct organelles. As described in our methods section, we could be confident that a membrane compartment is ER when we observed ribosomes tethered to the membrane. In instances where flat/tubular membranes did not have associated ribosomes, we take the cautious view that there is not sufficient evidence to define these as ER.

      Importantly, polyhedral vesicles were distinct from the flat/tubular membranes that resembled ER and are at present organelles of unknown identity. It will be important in future experiments to determine what are the protein constituents of these distinct organelle types to understand both their functions and how these distinct membrane architectures are assembled.

      Therefore, the sentences in lines 198-199 are simply wrong. Additionally, features of even higher membrane curvature are common in the ER (e.g. Collado et al., Dev Cell 2019). 

      We thank the reviewer for bringing our attention to this excellent paper (Collado et al.). We agree that the sentence describing the curvature being higher than all other membranes except mitochondrial cristae is wrong. We have removed this sentence in the revised manuscript.

      (5)The quality of the tomographic data for the in-tissue sample is low, likely due to cryo-sectioning-induced artifacts, as extensively documented in the literature. Additionally, the authors used 20% dextran as cryo-protectant for high-pressure freezing, which contrasts with statements like those in lines 342-344. Given that several publications describing the in-tissue targeting of synapses (e.g. from Eric Gouaux's lab) are available, the quality of the tomographic data presented in this work is underwhelming and limits the conclusions that can be drawn, not providing a solid basis for future studies of in-tissue synapse targeting. However, the complete workflow (excluding the sectioning part) can be adapted for a cryo-FIB approach. The authors should discuss the limitations of their approach. 

      Our manuscript preprint was deposited in bioRxiv several years before Matsui/Gouaux’s recent eLife paper that reported a novel workflow for in-tissue cryoET. It is difficult to directly compare data from our and Matsui/Gouaux’s approach because the latter reported a dataset of only 3 tomograms. Note also that Matsui/Gouaux followed our approach of using 20% dextran 40,000 as a cryo-preservative. The use of 20% dextran 40,000 as a cryo-protectant was first established by Zuber et al., 2005 (PMID: 16354833) and shown to avoid hyper-osmotic pressure and cell membrane rupture. However, Matsui/Gouaux additionally included 5% sucrose in their cryoprotectant. We did not include sucrose as a cryo-preservative because it exerts osmotic pressure and was not necessary to achieve vitreous tissues in our workflow.

      Before high-pressure freezing, Matsui/Gouaux also incubated tissue slices in a HEPES-buffered artificial cerebrospinal fluid (that included 2 mM CaCl2 but did not include glucose as an energy source) for 1 h at room temperature to label AMPA receptors with Fab fragment-Au conjugates. Under these conditions, neurons can elicit both physiological and excitotoxic action potentials (even though AMPARs were themselves antagonised with ZK-200775). The absence of glucose is a concern, and it is unclear to what extent tissue viability is affected by this incubation step. In contrast, we chose to use an NMDG-based artificial cerebrospinal fluid for slice preparation and high-pressure freezing that is a well-established method for preserving neuronal viability (Ting et al., 2018).

      We addressed the supposed limitations of cryo-sectioning versus cryoFIB-SEM in our public response. In particular, we have recently shown that cryo-sectioning produced a subnanometer resolution in-tissue structure of a protein, which has so far only been achieved for ribosomes within cryoFIB-SEM sample preparations. A discussion of cryo-sectioning versus cryoFIB-SEM must be informed by new data that directly compare these methods, which is not the subject of our eLife paper. We also cite a recent preprint directly comparing cryoFIB-milled lamellae with cryo-sections and showing that near-atomic resolution structures can also be obtained from the latter sample preparations (Elferich et al., 2025).

      (6) The authors show (in Supplementary) putative tethers connecting SV and the plasma membrane. Is it possible to improve the image quality (e.g. some sort of filtering or denoising) so that the tethers appear more obvious? Can the authors observe connectors linking synaptic vesicles? 

      We have tested multiple iterative reconstruction and denoising approaches, including SIRT and noise2noise filtering in Isonet. We observed instances of macromolecular complexes linking one synaptic vesicle with another. However, there was no question we sought to answer by performing a quantitative analysis of these linkers.

      (7) Figure 4F is missing. 

      Thank you for spotting this omission. We have corrected this in the revised manuscript.

      (8) Most quantifications lack statistical analyses. These need to be included, and only statistically significant findings should be discussed. Terms like "significantly" (e.g. Line 144) should only be used in these cases.

      We used the term ‘significantly’ in the results section (lines 143 and 166 in the revised text), where we cite Figures 1H and 2F. These show analyses in which we did in fact perform statistical tests (t-tests with Bonferroni correction) comparing the voxel intensities in regions of the cytoplasm that are proximal versus distal to the postsynaptic membrane. We have amended the main text to include the details of the statistical test that we performed. We also neglected to include a description of the statistical test in line 241, which cites Figure 3G. We have corrected this in the revised text.
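
      For readers unfamiliar with the correction mentioned above, the following is a minimal sketch of a Bonferroni adjustment applied to a set of p-values. The raw p-values, the significance threshold and the function name are hypothetical illustrations and are not taken from the manuscript.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values.

    Each raw p-value is multiplied by the number of tests performed and
    capped at 1.0, which controls the family-wise error rate.
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values, one per proximal-versus-distal t-test.
raw = [0.001, 0.02, 0.04]
adjusted = bonferroni(raw)

alpha = 0.05  # assumed significance threshold
for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.3f}, adjusted p = {q:.3f}, significant: {q < alpha}")
```

      With three tests, only a raw p-value below roughly 0.0167 remains significant at the assumed threshold of 0.05 after adjustment.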

      Minor comments: 

      (1) Can the authors comment on why only 1-2 grids are prepared per mouse brain (in M&M -section)?

      We prepared only two grids so that sample preparation was completed within 2 minutes, limiting deterioration of the sample.

      (2) Figure 1 Supplement 2 and its legend are confusing (averaging of non-aligned versus aligned post-synaptic membrane). Can the authors describe more clearly their molecular density profile analysis?

      We apologise that this figure legend was insufficient. We have included a detailed description of our molecular density profile analysis in the methods section entitled ‘Molecular density profile analysis’. In the revised manuscript we have now also included a citation to this methods section in the legend of Figure 1–figure supplement 2.

      (3) Please clarify with higher precision the areas that were recorded in relation to the fluorescent spots (e.g. Figures 3A-C).

      We have included a white rectangular annotation in the cryoCLEM inset panels of Figures 3A-C to indicate the field of view of each corresponding tomographic slice. This shows that PSD95-GFP puncta localise to the postsynaptic compartments in each tomogram.

      (4) Figure 4 Supplement 2D is not clear: the connection between receptors and actin should be shown in a segmentation.

      We agree with the reviewer. A ‘connection’ is not clear, which is expected because the cytoplasmic domain of ionotropic glutamate receptor subunits is composed of a non-globular/intrinsically disordered sequence. We have amended our description of the proximity of the actin cytoskeleton to ionotropic glutamate receptor clusters in the main text, replacing “associated with” with “adjacent to”.

      (5) Line 341: the reference is referred to by a number (56) at the end of the sentence, rather than by name.

      Good spot. We have corrected this in the revised manuscript.

      (6) Line 968: tomograms is misspelled. 

      Good spot. We have corrected this error (line 1018 in our revised manuscript).

      Reviewer #2 (Recommendations for the authors): 

      (1) On page 11: "The position of (i)onotropic receptor...". 

      Good spot. We have corrected this.

      (2) On page 13: "Slightly higher relative molecular density..." this line ends with a citation to reference '56', but the works cited are not numbered.

      Good spot. We have corrected this in the revised manuscript.

      (3) On page 46: "as described in (69)..." the works cited are not numbered. 

      Good spot. We have corrected this in the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) The title does not do the work justice. The authors make many exciting discoveries, e.g. PSD appearance, new polyhedral vesicles, ionotropic receptor positions, and intermembrane distance changes even within the synaptic cleft, but title their manuscript "The molecular infrastructure of glutamatergic synapses in the mammalian forebrain". It is also a bit misleading, since one would have expected more molecular detail and molecular maps as part of the work, so the authors may think about updating the title to reflect their exciting work. 

      We thank the reviewer for recognising the exciting discoveries in our manuscript. Summarising all these in a title is challenging. We intend ‘molecular infrastructure’ to mean a structure composed of many molecules including proteins (by analogy ‘transport infrastructure’ is composed of many roads, ports and train lines).

      (2) It would be in the spirit of eLife and open science if the authors could submit their segmentations alongside the tomographic data to either EMPIAR or pdb-dev (if they accept it) or the new CZII cryoET data portal for neurobiologists, method developers, and others to use. 

      We agree with the reviewer. We have deposited the subtomogram-averaged map of the AMPA receptor in EMDB, and all tilt series and 4x binned tomographic reconstructions described in our manuscript (Figure 1 – table 1 and Figure 2 – table 2), together with segmentations, in EMPIAR.

      (3) Methods: the authors establish an exciting new workflow to get from living mice to frozen specimens within 2 minutes and perform many unique analyses that would be useful to different fields. Their methods section overall is well described and contains criteria and details that should allow others to apply experiments to their scientific problems. However, it would be very helpful to expand on the methods in the 'annotation and analysis [...]' and "Subtomogram averaging" sections, to at least in short describe the steps without having to embark on a reference journey for each method and generally provide more detail. For the annotation section, the software used for annotation is not listed. Table 1 only contains the list of the counts of organelles etc. identified in each tomogram, no processing details. 

      We have revised the methods section ‘annotation and analysis’ including software used (IMOD). We have also included a slightly more detailed description of subtomogram averaging. We did not include ‘processing details’ because there are none - identification of constituents in each tomogram was carried out manually, as described in the methods section.

      (4) Some of the tomograms submitted as videos may have slipped through as an early version since they appear to be originating from not perfectly aligned tiltseries; vesicles and membranes can be observed 'rubberbanding'. The authors should go through and check their videos. 

      We thank the referee for suggesting we double check our tomogram videos. All movies are representative tomographic reconstructions from ultra-fresh synapse preparations (Figure 1 – videos 1-7) and synapses in tissue cryo-sections (Figure 2 – videos 1-2). We have double checked that the videos correspond to tomograms that were aligned as well as possible. In general, tissue cryo-section tomograms reconstructed less well than ultra-fresh synapse tomograms, which limits the information content of these data, as expected. The reconstructions shown in these videos were all reconstructed as best we could (testing multiple approaches in IMOD and more recent software packages, e.g. AreTomo). While we think it is important to share all tomograms, regardless of quality, we were careful to exclude from analysis any tomograms that did not contain sufficient information (as described in the methods section).

      Minor suggestions: 

      (1) Page 13, line 341, reference 56, but references are not numbered. Please update.

      Good spot. We have corrected this in the revised manuscript.

      (2) Page 33, line 746, the figure legend is not referencing the correct figure panels G-K should be I-K;

      We have amended the Figure 3 legend to “(G-K) Snapshots and quantification of membrane remodeling within glutamatergic synapses”.

      (3) Page 33, line 750; reads 'same as E', but should be 'same as G'. 

      Good spot. We have corrected this in the revised manuscript.

      (4) Page 35, Figure 4: Please use more labels: Figure 4B: it would be helpful to use different colors for each view and match to the tomogram - then non-experts could easily relate the projections and real data; Figure 4C: please label domains; Figure 4F: the figure panel got lost. 

      This is an interesting idea. While our subtomogram average of 2522 subvolumes provided decent evidence that these are ionotropic receptors, we are reluctant to label specific putative domains of individual subvolumes in the raw tomographic slice because the resolution of the raw tomogram (particularly in the Z-direction) is worse and may not be sufficient to definitively resolve each domain layer. We hope the reviewer appreciates our cautious approach.

      (5) Page 42, line 933: incomplete sentence. 

      Good spot. We have corrected this in the revised manuscript.

      (6) Page 46, line 1038; Reference 69 is in brackets, but references are not numbered. Please update.

      Good spot. We have corrected this in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewing Editor Comments:

      Focus and Scope:

      The paper attempts to address too many topics simultaneously, resulting in a lack of focus and insufficient depth in the treatment of individual components.

      We have moved the selective clinical review section, previously Part I of the paper, to Part II, given the importance of leading with the meta-analysis and resource, which are now Part I. In the lead-in to Part II, we now indicate that the review is not intended to be comprehensive, because there are other recent comprehensive reviews, which we cite. This part of the paper aims only to generate testable hypotheses on the directionality of effects, that is, on how TUS could be used to excite or suppress function, illustrated with specific clinical examples. Although not comprehensive, this section is important because it provides the reader with examples of how the directionality of TUS could be used in a range of clinical applications. The reader will find that the same hypotheses do not apply to different clinical disorders. Patient-specific hypotheses therefore need to be motivated, as Part II does, and then tested with empirical application of TUS.

      Part II, “Selective TUS clinical applications review and TUS directionality hypotheses”, starts at line 458. Part I, the meta-analysis and resource section, starts at line 199, after the Introduction on TUS and on the importance of better understanding the directionality of TUS effects.

      Strengthening the Meta-Analysis:

      The meta-analysis is the strongest aspect of the paper and should be expanded to include the relevant statistics. However, it currently omits several key concepts, studies, and discussion points, particularly related to replication and the dominance of results from specific groups. These omissions should be addressed even with a focus on meta-analysis.

      We thank the reviewer for their enthusiasm about the meta-analysis, which we have now promoted to Part I in the revised paper. We have substantially updated the latest database (inTUS_DATABASE_1-2025.csv) and ensured that the R markdown script can re-generate all of the results and statistical values. We have inserted additional statistical values in the main manuscript, as requested. The inTUS Resource is located here (https://osf.io/arqp8/ under Cafferatti_et_al_inTUS_Resource), and we have aimed to make it as easy to use and contribute to as possible. For instance, the reader can find all statistical values in the HTML summary of the R markdown output, which is part of the inTUS resource: https://rpubs.com/BenSlaterNeuro/1268823.

      Since the last submission, there has been a tremendous increase in the number of TUS studies in healthy participants. We have curated and included all of the relevant studies we could find in the 1-2025 database, as the next large expansion of the database (now including 52 experiments in healthy participants). We then reran the statistical tests via the R markdown script and report the results (starting at line 336). In addition, the online database (inTUS_DATABASE_1-2025.csv) has additional columns, suggested by the reviewers, including one that identifies studies conducted by the same groups, based on a social network analysis. The manuscript tables (Table 1 and Table 2) did not have the space to include these columns, but they are available in the database online. Finally, we have ensured that the resource is as easy to use as possible (line 862 has the Introduction to the inTUS Resource, which is also the online READ ME file), and we have been in contact with the ITRUSST consortium leads, who are interested in discussing hosting the resource and helping it to become self-sustaining.

      Conceptual Development:

      The more conceptual part of the paper is underdeveloped. It lacks sufficient supporting data, a well-articulated argument, and a clear derivation or development of a concrete model.

      To ensure that the conceptual sections are well developed, we have revised the introduction, including the background on TUS and the bases for the interest in the directionality of effects. We have also revised the TUS mechanisms background as suggested by the reviewers. For Part I, we have ensured that the rationale for the meta-analysis and its hypotheses is clearer. The hypotheses are based on several lines of research in the animal model and human literature, as cited (starting at line 211). For Part II, the selective clinical review, we have revised each section to focus on low-intensity TUS and to end in a hypothesis on the directionality of TUS effects. Starting at line 199, we have clarified the scope of the review and ensured that all the relevant experiments in healthy participants (n = 52 experiments) are now included in the latest update of the resource and meta-analysis.

      Database Curation:

      The authors should provide more detailed information about how the database will be curated and made accessible. They may consider collaborating with ITRUSST.

      We have expanded the information on the Resource documents (starting at line 862) to make the resource as user friendly as possible. At the beginning of the resource development stage we had contacted the ITRUSST consortium but had not heard back. Encouraged by this comment, we reached out again and are now in contact with the ITRUSST consortium leads, who are interested in discussing sustaining the resource. It would be wonderful to have the resource linked to other ITRUSST tools, since it was inspired by the organization. In practice, this means that the resource, rather than being hosted on the Open Science Framework, would potentially be hosted on the ITRUSST web site (https://itrusst.com/). These discussions are in progress, but the next key update to the database (1-2025) is already available and reported in this key update to our original paper.

      Reviewer #1: (Public Review)

      Summary:

      This paper is a relevant overview of the currently published literature on low-intensity focussed ultrasound stimulation (TUS) in humans, with a meta-analysis of this literature that explores which stimulation parameters might predict the directionality of the physiological stimulation effects.

      The pool of papers to draw from is small, which is not surprising given the nascent technology. It seems nevertheless relevant to summarize the current field in the way done here, not least to mitigate and prevent some of the mistakes that other non-invasive brain stimulation techniques have suffered from, most notably the theory- and data-free permutation of the parameter space.

      The meta-analysis concludes that there are, at best, weak trends toward specific parameters predicting the direction of the stimulation effects. The data have been incorporated into an open database, that will ideally continue to be populated by the community and thereby become a helpful resource as the field moves forward.

      Strengths:

      The current state of human TUS is concisely and well summarized. The methods of the meta-analysis are appropriate. The database is a valuable resource.

      Weaknesses:

      These are not so much weaknesses but rather comments and suggestions that the authors may want to consider.

      We thank the reviewer for their support of the resource and meta-analysis. We have implemented the suggestions as follows.

      I may have missed this, but how will the database be curated going forward? The resource will only be as useful as the quality of data entry, which, given the complexity of TUS can easily be done incorrectly.

      We have added a paragraph on how authors could use the Qualtrics form to submit their data and on the curation process involved (from line 891). Currently, this process cannot be automated because we continue to find that published papers do not report the TUS parameters that ITRUSST has encouraged the community to report (Martin et al., 2024). We can dedicate a TUS expert to ensure that the database is curated and expanded every 6 or 12 months. The current version is the latest 1-2025 update to the database. Longer term, we are in discussion with ITRUSST on whether the resource could become self-sustaining once TUS papers regularly report all the relevant parameters, such that database expansion becomes trivial. The Resource R markdown script and other tools can then be used to re-evaluate the statistical tests, and users can conduct secondary hypothesis testing on the data.

      It would be helpful to report the full statistics and effect sizes for all analyses. At times, only p-values are given. The meta-analysis only provides weak evidence (judged by the p-values) for two parameters having a predictive effect on the direction of neuromodulation. This reviewer thinks a stronger statement is warranted that there is currently no good evidence for duty cycle or sonication direction predicting outcome (though I caveat this given the full stats aren't reported). The concern here is that some readers may gallop away with the impression that the evidence is compelling because the p-value is on the correct side of 0.05.

      We have ensured that the R script can generate the full statistics from the tests and the effect sizes for all the analyses, and now also report more of the key statistical values in the revised paper (starting at line 336). As suggested, we have also ensured that the interpretation is sufficiently nuanced given the small sample sizes and the p-values below 0.1 but above 0.05 are interpreted as a statistical trend.

      This reviewer thinks the issue of (independent) replication should be more forcefully discussed and highlighted. The overall motivation for the present paper is clearly and thoughtfully articulated, but perhaps the authors agree that the role that replication has to play in a nascent field such as TUS is worth considering.

      We completely agree and have added additional columns to the online database to identify unique groups, using a social network analysis, and independent replications. These expanded tables did not fit in the manuscript versions of Tables 1 and 2 but are fully available in the Resource data tables ready for further analysis by interested resource users.

      A related point is that many of the results come from the same groups (the so-called theta-TUS protocol being a clear example). The analysis could factor this in, but it may be helpful to either signpost independent replications, which studies come from the same groups, or both.

      In the expanded database tables (inTUS_DATABASE_1-2025.csv: https://osf.io/arqp8/ under Cafferatti_et_al_inTUS_Resource) we have added a column to identify independent replication.

      The recent study by Bao et al 2024 J Phys might be worth including, not least because it fails to replicate the results on theta TUS that had been limited to the same group so far (by reporting, in essence, the opposite result).

      Thank you. We have added this study and over a dozen recent TUS studies in healthy participants to the database and redone the analyses.

      The summary of TUS effects is useful and concise. Two aspects may warrant highlighting, if anything to safeguard against overly simplistic heuristics for the application of TUS from less experienced users. First, could the effects of sonication (enhancing vs suppressing) depend on the targeted structure? Across the cortex, this may be similar, but for subcortical structures such as the basal ganglia, thalamus, etc, the idiosyncratic anatomy, connectivity, and composition of neurons may well lead to different net outcomes. Do the models mentioned in this paper account for that or allow for exploring this? And is it worth highlighting that simple heuristics that assume the effects of a given TUS protocol are uniform across the entire brain risk oversimplification or could be plain wrong? Second, and related, there seems to be the implicit assumption (not necessarily made by the authors) that the effects of a given protocol in a healthy population transfer like for like to a patient population (if TUS protocol X is enhancing in healthy subjects, I can use it for enhancement in patient group Y). This reviewer does not know to which degree this is valid or not, but it seems simplistic or risky. Many neurological and psychiatric disorders alter neurotransmission, and/or lead to morphological and structural changes that would seem capable of influencing the impact of TUS. If the authors agree, this issue might be worth highlighting.

      We agree that, given the divergence in circuits and cellular constituents between cortical and subcortical areas, it is important to distinguish studies that have focused on cortical or subcortical brain areas. The online data tables identify the target region. Analyses can therefore be restricted to cortical or subcortical sites, although the current version of the database contains too few subcortical sites to analyse them separately. On the second point, that pathology may affect the results, we completely agree and have clarified that the current database only includes healthy participant experiments for this reason. We are considering whether future updates to the resource may include clinical patient results (Line 247).

      Reviewer #1 (Recommendations for the authors):

      Minor edits (I wouldn't call them "corrections").

      We sincerely appreciate the constructive comments and have aimed to address them all as suggested.

      Perhaps the most relevant edit pertains to the statistics.

      We now report the more complete statistical results (line 336) and the R markdown script can re-generate all the statistical values for the tests.

      The issue of replication also seems relevant and ought to be raised. This reviewer does not want to prescribe what to do or impose the view the authors ought to adopt.

      In the online version of the data tables for the latest dataset, we have added a column in the data table as suggested that identifies independent groups and replications.

      The other points are left to the authors' discretion.

      We have aimed to address all of the reviewer’s points. Thank you for the constructive input which has helped to improve the expanded database and resource.

      Reviewer #2: (Public Review)

      Summary:

      This paper describes a number of aspects of transcranial ultrasound stimulation (TUS) including a generic review of what TUS might be used for; a meta-analysis of human studies to identify ultrasound parameters that affect directionality; a comparison between one postulated mechanistic model and results in humans; and a description of a database for collecting information on studies.

      Strengths:

      The main strength was a meta-analysis of human studies to identify which ultrasonic parameters might result in enhancement or suppression of modulation effects. The meta-analysis suggests that none of the US parameters correlate significantly with effects. This is a useful result for researchers in the field in trying to determine how the parameter space should be further investigated to identify whether it is possible to indeed enhance or suppress brain activity with ultrasound.

      The database is a good idea in principle but would be best done in collaboration with ITRUSST, an international consortium, and perhaps should be its own paper.

      Weaknesses:

      The paper tries to cover too many topics and some of the technical descriptions are a bit loose. The review section does not add to the current literature. The comparison with a mechanistic model is limited to comparing data with a single model at a time when there is no general agreement in the field as to how ultrasound might produce a neuromodulation effect. The comparison is therefore of limited value.

      We appreciate the reviewer’s assessment and interest in the meta-analysis and database to guide the development of TUS for more systematic control of the directionality of neuromodulation. With this next key expansion of the database (inTUS_DATABASE_1-2025.csv) we have added over a dozen new studies that have been published since our original submission (n = 52 experiments). We have also moved the ‘review’ part of the paper below the meta-analysis and resource description. We have clarified that the clinical review section (now Part II in the revised manuscript) is not intended as a comprehensive review but as a selective review showing how hypotheses on the directionality of TUS effects need to be carefully developed for specific patient groups that require different effects to be induced at specific brain areas. Finally, we have gotten in contact with the ITRUSST consortium leads, as suggested, and are in discussion on whether the inTUS resource could be hosted by ITRUSST. Since these discussions are ongoing, in practice this might mean moving the resource from the Open Science Framework to the ITRUSST webpages, which would require only a trivial update of the link to the resource in OSF.

      We also sincerely appreciate the time and care the reviewer has given to provide us with the below guidance, all of which we have aimed to take on board in the revised paper.

      Reviewer #2 (Recommendations for the authors):

      Line 24/25 - I suggest avoiding using the term "deep brain stimulation" in reference to TUS as the term is normally used to describe electrically implanted electrodes.

      We have removed the term “deep” brain stimulation in reference to TUS to avoid confusion with electrical DBS for patient treatment [Line 24].

      Line 25 - I don't think "computational modelling" has changed how TUS can be done. There is still much to be understood about mechanisms. I think the modelling aspects of the paper should be toned down. Indeed the NICE data that is presented later appears to have a weak, if any, correlation to the outcomes.

      We have revised the manuscript text throughout to ensure that the computational modeling contributions are not overstated, as noted, given the lack of strong correlation to the NICE model outcomes by the meta-analysis including in the latest results with the more extensive database (n = 52).

      Line 32 - "exponentially increasing" is a well-defined technical term and the increase in studies should be quantified to ensure it is indeed exponential. I agree that TUS studies in humans are increasing but a quick tally of the data by year in the meta-analysis reported here doesn't suggest that it follows an "exponential" growth.

      We have changed “exponential” to “to increase”. [Line 32]

      Line 50 - I would suggest using the term sub-MHz rather than 100-1,000 kHz as it is challenging to deliver ultrasound at 1 MHz through the skull. The highest frequency in the meta-analysis is 850 kHz; but the majority are in the 200-500 kHz range.

      We have made this correction to sub-MHz. [Line 54]

      Line 58/59 - Is the FDA publication on diagnostic imaging relevant for saying that 50 W/cm2 is a low-intensity TUS? I think it's perhaps reasonable to say that intensities below diagnostic thresholds are "low intensities" but that is not clear in the text. I would refer to ITRUSST on what is appropriate for defining what is low, medium, or high.

      We have cut the reference to the FDA here since it is, as noted, not as relevant as pointing to the ITRUSST definition.

      Line 65/66 - I agree that ultrasound for neuromodulation is gaining traction and there is an increase in activity, but it also has a long history with the work of the Fry brothers published in the 1950s; and extensive work of Gavrilov in humans starting in the 1970s.

      We have added citations to the Fry brothers and Gavrilov to the text in this section. [Line 69/70]

      Line 75 - I think the intermembrane cavitation mechanism is unlikely to be due to "microbubbles" in a lipid membrane. The predicted displacements are on the order of nanometres, so they are unlikely to generate microbubbles. The work on comparing with NICE is limited. Note there are a number of experimental papers that have reported an absence of intra-membrane cavitation, including the Yoo et al 2022 which is referenced later in the paragraph. Also, there are other models, such as Liao et al 2021 (https://www.nature.com/articles/s41598-020-78553-2).

      As suggested, we have removed this phrase on microbubble formation as a likely mechanism. We have also added the Liao paper to this paragraph as it is relevant.

      Line 83 - "At the lower intensities..." it is not clear whether this means all TUS intensities or the lower end of intensities used in TUS.

      We now use the following wording here: “low intensities”. [Line 86]  

      Line 85/86 - "more continuous stimulation" the modulation paradigms haven't been described yet and so pulse vs continuous hasn't been made clear to the reader. Also "more continuous" is very loose terminology. Something is either continuous or it isn't.

      We agree and have removed “more” to be clear that the stimulation is continuous. [Line 88]

      Line 87/88 - "TUS does not .. cavitation ..when ..ISPTA...<14 W/cm2". You can't use ISPTA to determine cavitation. It is the peak negative pressure which is the key driver for cavitation and the MI which is the generally accepted (although grudgingly by some) metric for assessing cavitation risk. You can link the negative pressure to ISPPA but not really to ISPTA. In histotripsy for example the ISPTA is low due to the low duty cycles to avoid heating but the cavitation is a huge effect. Technical terminology is loose.

      We have corrected this to “TUS does not appear to cause significant heating or cavitation of brain tissue when the intensity remains low, based on Mechanical and Thermal Index values and recommendations of use”. [Line 90/91]
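
      As background to the exchange above, the Mechanical Index is conventionally defined as the peak negative pressure (in MPa) divided by the square root of the centre frequency (in MHz). The sketch below only illustrates that standard definition; it is not taken from the manuscript, the function name and example values are hypothetical, and for transcranial use the pressure would need to be an in situ estimate that accounts for attenuation, which is assumed here rather than computed.

```python
import math


def mechanical_index(peak_negative_pressure_mpa, center_frequency_mhz):
    """Mechanical Index: peak negative pressure (MPa) / sqrt(centre frequency (MHz)).

    The pressure is assumed to already be the in situ (derated) estimate.
    """
    if center_frequency_mhz <= 0:
        raise ValueError("center_frequency_mhz must be positive")
    return peak_negative_pressure_mpa / math.sqrt(center_frequency_mhz)


# Hypothetical example: 0.5 MPa peak negative pressure at 250 kHz (0.25 MHz).
print(mechanical_index(0.5, 0.25))
```

      Because the index depends on peak negative pressure rather than on a time-averaged intensity, a low duty cycle can give a low ISPTA while leaving the index unchanged, which is the reviewer's point.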

      Line 89 - What is meant by "low intensity TUS"? I think all TUS used in the literature counts as low intensity - in that it is below the level allowed for diagnostic imaging.

      We have ensured that the text is focused on TUS being low-intensity and only in the introduction do we distinguish low intensity TUS from moderate and high intensity TUS, such as used for thermal ablation [Lines 62-66].

      Line 88/89 - Most temperature rises in brain tissue in TUS are well below 1 C - will this really change membrane capacitance significantly? If so it would have been good to consider a model for it.

      We have revised this statement as “thermal effects could at least minimally alter cell membrane capacitance…”. [Line 93]

      Line 111 - The text refers to "recent studies" but then the next two references are from 1990 and 2005 which I would argue don't count as "recent".

      We have corrected this wording to “previous studies”. [Line 114]

      Lines 122/129 - This paragraph on TMS pulsing should be linked to the TUS paragraph on pulsing (lines 109/116). The intervening paragraph on anaesthesia is relevant but breaks the flow.

      We have merged the paragraph on anesthesia with the prior one on TUS so that the TMS paragraph is linked more closely to it [starting on line 112].

      Line 130/131 - It is not clear to me that current studies are being guided by computational models. I think there is still no generally accepted theory for mechanisms. If the authors want to do a mechanisms paper then they should compare a few.

      We have revised this as suggested to not overstate the contribution of the limited computational modeling studies throughout the manuscript.

      Line 132 on - There are a number of studies that suggest that NICE is likely not the mechanism by which TUS produces neuromodulation.

      We have revised this sentence as follows: “Although it remains questionable whether intramembrane cavitation is a key mechanism for TUS, the NICE model simulations explored a broad set of TUS parameters, including TUS intensity and the continuity of stimulation (duty cycle) on modelled neuronal responses.” [Lines 139/142]

      Lines 137-140 - Terms are defined after their use. Things like ISPPTA, PRF, TI, and MI have been discussed already and so the terms should have been defined earlier. The authors should think carefully about how the material is presented to make it more logical for the reader.

      We have ensured that the definitions precede the use of abbreviations and have added abbreviations to the tables.

      Part I Line 180-437 - The review of potential applications for TUS reads like an introductory chapter of a thesis. It is entirely proper for a thesis to have a chapter like this, but it is not really relevant for a peer-reviewed research article. There are also numerous applications, e.g. mapping areas associated with decisions, or treating patients with addiction, which are not included, so it is not exhaustive. I would suggest this part be removed.

      We have moved the ‘review’ part of the paper to Part II, given that the meta-analysis and resource should be more prominent as Part I. In the review, now Part II of the paper, we also make it clear that there are recent comprehensive reviews of the clinical literature (lines 465/467). The purpose of our selective review is to demonstrate how the directionality of TUS effects needs to be specific to the intended clinical application, given the great variability in the clinical effects that might be desired, the brain areas targeted and the pathology being treated. We have also aimed to ensure that each section summary is scholarly and academically written to a high level. All the co-authors contributed to these sections, so we have also edited them for consistency, with each section ending in hypotheses on the directionality of TUS that could be developed for empirical testing.

      Line 453 - It is stated that "ISPTA, which mathematically integrates ISSPA by the sonication DC" It sounds rather grand to mathematically integrate but you can't integrate with respect to DC, you can integrate with respect to time. If you integrate intensity with respect to time over pulse and over the sonication time then one finds that ISPTA = DC x ISPPA, multiplication is also an important mathematical function and should be given its due. Lastly, I think there is a typo and ISSPA should read ISPPA

      We have corrected the typo and the statement to “mathematically multiplies ISPPA by the continuity of sonication”. [Line 221/222]
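
      The relation discussed in this exchange (ISPTA = DC x ISPPA) can be illustrated with a short numerical sketch. It assumes a rectangular pulse envelope, so that the duty cycle is simply pulse duration multiplied by PRF; the function names and the example values are hypothetical and are not taken from the manuscript or its data tables.

```python
def duty_cycle(pulse_duration_s: float, prf_hz: float) -> float:
    """Duty cycle (DC): fraction of time the ultrasound is on in a pulsed train.

    Assumes rectangular pulses, so DC = pulse duration x pulse repetition frequency.
    """
    dc = pulse_duration_s * prf_hz
    if not 0.0 < dc <= 1.0:
        raise ValueError("pulse duration and PRF must give a duty cycle in (0, 1]")
    return dc


def isppa_to_ispta(isppa_w_cm2: float, dc: float) -> float:
    """Spatial-peak temporal-average intensity from the pulse-average intensity.

    Integrating intensity over time and dividing by the averaging period
    reduces to a multiplication: ISPTA = DC x ISPPA.
    """
    if not 0.0 < dc <= 1.0:
        raise ValueError("duty cycle must be in (0, 1]")
    return isppa_w_cm2 * dc


if __name__ == "__main__":
    # Illustrative values only: 0.5 ms pulses repeated at 1 kHz, ISPPA of 10 W/cm^2.
    dc = duty_cycle(pulse_duration_s=0.0005, prf_hz=1000.0)
    ispta = isppa_to_ispta(isppa_w_cm2=10.0, dc=dc)
    print(f"DC = {dc:.2f}, ISPTA = {ispta:.2f} W/cm^2")
```

      With these illustrative inputs the duty cycle is one half, so the temporal-average intensity is half of the pulse-average intensity. The sketch does not account for ramped pulse envelopes or for pauses between pulse trains, which would lower the time average further.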

      Line 454 - I don't think ISPTA is a good measure of "dose." In radiation physics dose is well defined in terms of absorbed energy. The equivalent has yet to be defined for TUS so I would avoid using dose. The ISPTA does relate to TI - although it depends not just on the spatial peak but also on the spatial distribution and the frequency-dependent absorption coefficient of the tissue. I would just avoid the use of "dose" until the field has a better idea of what is going on.

      We have cut this phrase on dose as suggested.

      Page 16 Box 1 - TI is defined as diagnostic ultrasound imaging it is based on. Also, I think TI is dimensionless; it is referenced to a 1-degree temperature rise and so it can be interpreted in terms of celsius or kelvin; but to be technically accurate it is dimensionless.

      We have made TI dimensionless in Box 1

      Page 17 Box 2 - Here you have no units for TI - which is correct but inconsistent with Box 1. But the legend suggests a 2 K temperature rise whereas your Box allows for 6 K. The value of 6 is consistent with FDA but my understanding of the BMUS guidelines is the TI must be less than or equal to 0.7 for unlimited time or less than 3 if the duration is less than 1 minute. I accept that the table is labelled FDA limits, but the bold table caption is "Recommendations for TUS parameters" I think you should give the ITRUSST values rather than FDA.

      We have revised this Box legend to better distinguish the FDA and ITRUSST recommendations where they differ (e.g., the importance of ISPTA and the TI values). See revised legend for Box 2.

      Page 18 Box 3 - Not sure what this is trying to show? Also, what is "higher intensity" and "lower intensity"?

      Why not just give a range of values in each box?

      We agree that the higher and lower intensities likely to lead to enhancement or suppression are poorly defined and have noted this in the legend: “Note that the threshold for ISPPA qualifying as ‘higher’ or ‘lower’ intensity is currently poorly understood, or may non-linearly interact with other factors” [Line 751/754, Box 3].

      Line 444 - The hypotheses should be stated more clearly. Maybe I am just dense, but it is not obvious to me from box 3.

      We provide the basis for the hypotheses in the manuscript text on the paragraph [Lines 106-179].

      Line 481/482 - The intensity of a diagnostic ultrasound system is very well characterised. It just might be that the authors didn't report it. It is not clear what is meant by the "continuity." I guess it's to do with pulsing - which is also well defined but perhaps also not reported.

      We agree and have revised this as follows: “For the meta-analysis, we only included studies that either reported a basic set of TUS stimulation parameters or those sufficient for estimating the required parameters necessary for the meta-analysis” [Lines 256/258]

      Figure 2 - What is the purpose of this figure? Did you carry out simulations for all the studies? It doesn't seem to be relevant to the data here.

      This figure illustrates the TUS targeting approach and simulations, in this case conducted in k-Plan. These were conducted to evaluate approximations to ISPPA in brain values from the studies that did not report these values [Lines 264/268].

      Figure 4 - The data in these figures is nice (and therefore doesn't need to have a NICE curve) To me it clearly shows that the data in the literature does not obviously segment into enhancement vs suppression with DC. I suspect it is the same with PRF. I think it would have been better if C and D had PRF on the horizontal axis for on-line and off-line so that effect could be seen more clearly.

      We have kept the NICE curve only for a reference that some readers familiar with the NICE model might want to see overlaid in the figure, but have ensured that the text throughout makes clear that the NICE model predictions are not as statistically robust as initially anecdotally thought. PRF results are not significant but we do show a panel with the PRF measures on one axis (Fig. 4D). Figure 5 also shows box plot results with PRF as well as the other key TUS parameters. Moreover, in the inTUS resource we have provided an app for users to explore the data (https://benslaterneuro.shinyapps.io/Caffaratti_inTUS_Resource/).

      Figure 5 - The text on the axes is too small to read. Was the DC significant for both on-line and offline? What about ISPPA for off-line. At least by eye, it looks as different as DC. Figure 5C doesn't add anything.

      We have boosted the font for Figure 5 and have cut panel 5C since it was not adding much. We have also checked whether the DC parameter was significant separately for on-line and off-line effects, but the sample sizes were too small for significance, and the statistical test was not significantly different for Online and Offline effects even in the 12025 database. Therefore, the effects might look stronger for Offline in some of the plots in Figure 5, but they are currently statistically indistinguishable [Lines 347/348].

      Table 1 - There is a typo in the 3rd column. FF should have units of kHz, not KHz. In addition, SD should have units of s as that is the SI symbol for seconds. I would swap columns 9 and 10 so that ISPPA in water and ISPPA in the brain are next to each other.

      We have corrected the typo in the 3rd column and ensured that the units are kHz. SD in the tables now has units of ‘s’ for seconds, and we have put ISPPA in water and ISPPA in brain next to each other in the data tables.

      Line 767 - "M.K. was supported..." There are TWO MKs in the author list.

      We have changed this to M.Ka. for Marcus Kaiser.

    1. Author response:

      We thank the Reviewers for their thoughtful and helpful critiques. Below we provide a point-by-point response to the comments raised.

      Reviewer #1:

      (1) Labels should be added in the Figures and should be uniform across all Figures (some are distorted).

      We thank the Reviewer for pointing out this issue. As requested, labels have been edited to ensure they are legible and are consistent in font, size, and style.  

      Reviewer #2:

      (1) As for Figure 2F, Setd2-SET activity on WT rNuc (H3) appears to be significantly lower compared to what is extensively reported in the literature. This is particularly puzzling given that Figure 2B suggests that using 3H-SAM, H3-nuc are much better substrates than K36me1, whereas in Figure 3F, rH3 is weaker than K36me1. It is recommended for the authors to perform additional experimental repeats and include a quantitative analysis to ensure the consistency and reliability of these findings.  

      We appreciate the Reviewer’s points. We respectfully suggest that these comments may reflect potential confusion around interpreting how different assays detect in vitro methylation, what data can and cannot be compared, and the nature of the different substrates used. 

      With respect to point 1 (Western signal significantly lower compared to extensive literature): To the best of our knowledge, it would be extremely challenging to make a quantitative argument comparing the strength of the Western signal in Figure 2F with results reported in the literature. Specifically, comparing our results with previous studies would require (1) all the studies to have used the exact same antibodies, as antibody signal intensities vary depending on the specific activity and selectivity of a particular antibody and even its lot number, (2) similar in vitro methylation reaction conditions, (3) the same type of recombinant nucleosomes, and so on. Further, given that these are Western blots, we do not understand how one could interpret an absolute activity level. In the figure, all we can conclude is that in in vitro methylation reactions, our recombinant SETD2 protein methylates rNucs to generate mono-, di-, and tri-methylation at K36 (using vetted antibodies (see Fig. 2e)). If there is a specific paper within the extensive literature that the Reviewer highlights, we could look more into the details of why the signals are different (our guess is that any difference would largely be due to the use of different antibodies). We add that it might be challenging to find a similar experiment performed in the literature; we are not aware of one.

      With respect to comparing Figure 2B and 2F: We do not understand how one can meaningfully compare incorporation of radiolabeled SAM to antibody-based detection on film using an antibody against specific methyl states. In particular, on the question of comparing rH3 vs H3K36me1 nucleosomes, we point out that when using recombinant nucleosomes installed with native modifications (e.g. H3K36me1), in which the entire population of the starting material is mono-methylated, the Western signal with an anti-H3K36me1 antibody will naturally be strong. In Fig. 2b, the assay measures incorporation of radiolabeled methyl, which is added to the pre-existing mono-methylated substrate. In other words, the results are entirely consistent if one understands how the methylation reactions were performed, how methylation was detected, and the nature of the reagents.

      (2) The additional bands observed in Figure 4B, which appear to be H4, should be accompanied by quantification of the intensity of the H3 bands to better assess K36me3 activity. Additionally, the quantification presented in Figure 4C for SAH does not seem accurate as it potentially includes non-specific methylation activity, likely from H4. This needs to be addressed for clarity and accuracy. 

      We thank the reviewer for this comment. The additional bands observed in Figure 4B represent degradation products of histone H3, not H4 methylation. This is commonly seen in in vitro reactions using recombinant nucleosomes, where partial proteolysis of H3 can occur under the assay conditions.  

      (3) In Figure 4E, the differences between bound and unbound substrates are not sufficiently pronounced. Given the modest differences observed, authors might want to consider repeating the assay with sufficient replicates to ensure the results are statistically robust.

      In Figure 4E, we observe a clear difference between the bound and unbound substrate. To aid interpretation, we have clarified in the figure where the bound complex migrates on the gel, while the unbound nucleosomes migrate at the bottom of the gel. The differences are indeed subtle, which we highlight in the text.  

      (4) Regarding labeling, there are multiple issues that need correction: In the depiction of Epicypher's dNuc, it is crucial to clearly mark H2B as the upper band, rather than ambiguously labeling H2A/H2B together when two distinct bands are evident. In Figure 3B and D, the histones appear to be mislabeled, and the band corresponding to H4 has been cut off. It would be beneficial to refer to Figure 3E for correct labeling to maintain consistency and accuracy across figures. 

      Thank you for pointing this out. To avoid any confusion, we have delineated the H2B and H2A markers and indicate the band corresponding to H4.

      (5) There are issues with the image quality in some blots; for instance, Figure 2EF and Figure 2D exhibit excessive contrast and pixelation, respectively. These issues could potentially obscure or misrepresent the data, and thus, adjustments in image processing are recommended to provide clearer, more accurate representations. 

      Contrast adjustments were applied uniformly across each entire image and were not used to modify any specific region of the blot. We have corrected the issue of increased pixelation in Figure 2D. 

      (6) The authors are recommended to provide detailed descriptions of the materials used, including catalog numbers and specific products, to allow for reproducibility and verification of experimental conditions. 

      We have added the missing product specifications and catalog numbers to ensure clarity and reproducibility of the experiments.

      (7) The identification of Setd2 as a tumor suppressor in KrasG12C-driven LUAD is a significant finding. However, the discussion on how this discovery could inspire future therapeutic approaches needs to be more balanced. The current discussion (Page 10) around the potential use of inhibitors is somewhat confusing and could benefit from a clearer explanation of how Setd2's role could be targeted therapeutically. It would be beneficial for the authors to explore both current and potential future strategies in a more structured manner, perhaps by delineating between direct inhibitors, pathway modulators, and other therapeutic modalities. 

      SETD2 is a tumor suppressor in lung cancer (as we show here and many others have clearly established in the literature) and thus we would recommend avoiding a SETD2 inhibitor to treat solid tumors, as it could have a very much unwanted effect. Our discussion addresses a different point regarding the relative importance of the enzymatic activity versus other, nonenzymatic functions of SETD2. We believe that a detailed exploration of the therapeutic potential of inhibiting SETD2 would be better suited to a review or a more therapy-focused manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers and editors for their careful consideration of our work and pointing out areas where the current version lacked clarity or necessary experiments. Based on the reviews we have made the following significant changes to the revised version:

      (1) Revised the text to focus on the distinct pathogen responses to indole in isolation versus fecal material.

      We believe the key takeaway from this work is that the native context of a given effector, in this case indole, can elicit markedly different bacterial responses compared to the pure compound in isolation. This is because natural environments contain multiple, often conflicting, stimuli that complicate predictions of overall chemotactic behavior. For example, while indole has been proposed to mediate chemorepulsion and contribute to colonization resistance against enteric pathogens, our findings challenge this model. We provide evidence that feces, the intestinal source of indole, actually induces attraction, and that indole taxis may in fact benefit the pathogen through prioritizing niches with low microbial competition. Put another way, the biological reservoir of indole, fecal material, generates an attraction response, but indole regulates the degree of attraction.

      Most current understanding of chemotaxis is based on responses to individual, purified effectors. Our study highlights the need to investigate chemotactic responses in the presence of native mixtures, which better reflect the complexity of natural environments and may reveal new functional insights relevant for disease.

      Reviewer comments indicated that these core points above were not clearly conveyed in the previous version, and that the manuscript's logical flow needed improvement. In this revised version, we have substantially rewritten the text and removed extraneous content to sharpen the focus on these central findings. We have also aligned our discussion more closely with the experimental data. While we appreciated the reviewers’ thoughtful suggestions, we chose not to expand on topics that fall outside the scope of our current experiments.

      (2) Provide new chemotaxis data with mixtures of fecal effectors (Fig. 5).

      Related to the above, the reviewers and editors brought up concerns that our discovery of pathogen fecal attraction was underexplored. Although we showed Tsr to be important for mediating fecal attraction, even the tsr mutant showed attraction to a lesser degree, and the reviewers noted that we did not identify what other fecal attractants could be involved.

      Fecal material is a complex biological material (as noted by Reviewer 3) and contains effectors already characterized as chemoattractants and chemorepellents. It would be ideal to perform an experiment in which individual effectors are removed from fecal material and chemotaxis is then quantified. We considered methods to do this but ultimately found this approach unfeasible. Instead, we employed a reductionist approach and developed a synthetic approximation of fecal material containing a mixture of known chemoeffectors at fecal-relevant concentrations (Fig. 5). We used this defined system as a way to test the specific roles of the Tsr effectors L-Ser (attractant) and indole (repellent) in relation to glucose, galactose, and ribose (sensed through the chemoreceptor Trg), and L-Asp (sensed through the chemoreceptor Tar). We chose these effectors as they have reasonable structure-function relationships established in prior work, and had information available about their concentrations in fecal material. We present these data as a new Figure 5, and also provide videos clearly showing the responses to each treatment (Movies 7-10).

      This defined system provided several new insights that help understand and model indole taxis amidst other fecal effectors. First, the complete effector mixture, like fecal treatment, elicits attraction. Second, L-Ser is able to negate indole chemorepulsion in cotreatments of the two effectors, and other chemoattractants also negate this repulsion in the absence of L-Ser, albeit to a lesser degree, helping to explain why the tsr mutant still shows attraction to fecal material. Lastly, we also show that the degree of attraction in this system is controlled by indole, with mixtures containing greater indole showing less attraction. We feel this is an important addition to the study because it provides a new view on how indole taxis functions in pathogen colonization: rather than causing the pathogen to swim away (like pure indole does), indole helps the pathogen rank and prioritize its attraction to fecal effector mixtures, biasing navigation toward lower indole-containing niches.

      We also acknowledge that this defined system does not capture all possible interactions. Indeed, there are even a few chemoreceptors in Salmonella for which the sensing functions remain poorly understood. Nonetheless, we believe the data offer mechanistic context for understanding fecal attraction and suggest that factors beyond Tsr, L-Ser, and indole also contribute to the observed behaviors, aligning with other data we present.

      (3) Provide new data that show that E. coli MG1655, and disease-causing clinical isolate strains of the Enterobacteriaceae Tsr-possessing species E. coli, Citrobacter koseri, and Enterobacter cloacae exhibit fecal attraction (Fig. 4).

      An important new finding from this study is our direct test of whether indole-rich fecal material elicits repulsion. Contrary to expectations, given that for E. coli indole is a well-characterized strong chemorepellent, we show that fecal material instead elicits attraction in non-typhoidal Salmonella.

      Reviewers raised the question of whether our observations regarding indole taxis and attraction to indole-rich feces in Salmonella are similar or relevant to E. coli. While a full dissection of indole taxis in E. coli is beyond the scope of this study and has been the focus of extensive prior research, we sought to address this point by examining whether other enteric pathogens respond similarly to the native indole reservoir, fecal material. To this end, we present new data demonstrating that, like S. Typhimurium, E. coli and other representative enteric pathogens and pathobionts possessing Tsr are also attracted to indole-rich feces (Fig. 4, Movies 4–6, Fig. S4).

      Notably, these new results represent some of the first characterizations of chemotactic behavior in the clinical isolates we examined, including E. coli NCTC 9001 (a urinary tract infection isolate), Citrobacter koseri, and Enterobacter cloacae, adding another element of novelty to this work.

      (4) Repeated all of the explant Salmonella Typhimurium infection studies and added a new experimental control competition between WT and an invasion-deficient mutant (invA).

      Although our new colonic explant system was noted as a novelty and strength of this work, it was also seen as a weakness in that some of the results were surprising and difficult to link to chemotactic behavior. Reviewer 3 also brought up the need to be clear about our usage of the term ‘invasion’ in reference to S. Typhimurium entering nonphagocytic host cells, and requested we test an invasion-inhibited mutant (which we do in new experiments, now Fig. S1). We also note that some of the interpretations of these data were made challenging by result variability.

      To help address these issues we performed additional replicates for all of our explant experiments (contained within Figure 1, Fig. S1-S2, and Data S1), to provide greater power for our analyses. These new data provide a clearer view of this system and revise our interpretations from the prior version of this study. While treatment with indole alone does suppress the WT advantage over chemotactic mutants for both total colonization and cellular invasion, essentially all other treatments have a similar result, with a time-dependent increase in both colonization and invasion, dependent on chemotaxis and Tsr. A remaining unique feature of fecal treatment is an increase in the cellular invaded population of the cells at 3 h post-infection. As requested by Reviewer 3, we provide new experimental data showing that in competitions between WT and an invasion-deficient mutant (invA), with fecal material pretreatment, the WT has an advantage only for the gentamicin-treated quantifications, providing some support that our model selects for the invaded sub-population. We note, however, that the invA mutant can still invade through alternative mechanisms (as discussed in earlier work such as here: https://doi.org/10.1111/1574-6968.12614), so the relative amount of presumed cellular invasion is less than WT, and not zero, in our experiments (Fig. S1).

      One point of confusion in the previous version of the text was the assay design for the explant experiments, which is important to understand in order to interpret the results. During the explant infection bacteria are not immersed in the effector treatment solution, rather the tissue is soaked in the effector solution beforehand and then exposed to a 300 µl buffer solution containing the bacteria. This means that the bacteria experience only the residue of that treatment at concentrations far lower. We have added clarity about this through revising Fig. 1 to include a conceptual diagram of the assay (Fig. 1C), and added a new supplementary Fig. S5 that summarizes the explant data in this same conceptual model. We provide detail on the method in the text in lines 115-137. In describing the results, and synthesizing them in the discussion, we now state:

      Line 112: “This establishes a chemical gradient which we can use to quantify the degree to which different effector treatments are permissive of pathogen association with, and cellular invasion of, the intestinal mucosa (Fig. 1C).”

      And, a new section in the discussion devoted to describing the explant infections:

      Line 366: “Our explant experiments can be thought of as testing whether a layer of effector solution is permissive to pathogen entry to the intestinal mucosa, and whether chemotaxis provides an advantage in transiting this chemical gradient to associate with, and invade, the tissue (Fig. 1C, Fig. S5).”

      As mentioned above, we have honed the text to focus on the disparity between the effects of indole alone versus treatments with indole-rich feces to help clarify how these data advance our understanding of the indole taxis in directing pathogenesis. While our explant studies still confirm the role of factors other than L-Ser, indole, and Tsr in directing Salmonella infection and cellular invasion, we now include further analyses of other fecal effectors (described above) that provide some insights into how fecal effectors have some redundancy in their impact.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study shows, perhaps surprisingly, that human fecal homogenates enhance the invasiveness of Salmonella typhimurium into cells of a swine colonic explant. This effect is only seen with chemotactic cells that express the chemoreceptor Tsr. However, two molecules sensed by Tsr that are present at significant concentrations in the fecal homogenates, the repellent indole and the attractant serine, do not, either by themselves or together at the concentrations in which they are present in the fecal homogenates, show this same effect. The authors then go on to study the conflicting repellent response to indole and attractant response to serine in a number of different in vitro assays.

      Strengths:

      The demonstration that homogenates of human feces enhance the invasiveness of chemotactic Salmonella Typhimurium in a colonic explant is unexpected and interesting. The authors then go on to document the conflicting responses to the repellent indole and the attractant serine, both sensed by the Tsr chemoreceptor, as a function of their relative concentration and the spatial distribution of gradients.

      Thank you for your summary and acknowledgement of the strengths of this work. We hope the revised text and additional data we provide further improve your view of the study.

      Weaknesses:

      The authors do not identify what is the critical compound or combination of compounds in the fecal homogenate that gives the reported response of increased invasiveness. They show it is not indole alone, serine alone, or both in combination that have this effect, although both are sensed by Tsr and both are present in the fecal homogenates. Some of the responses to conflicting stimuli by indole and serine in the in vitro experiments yield interesting results, but they do little to explain the initial interesting observation that fecal homogenates enhance invasiveness.

      Thank you for noting these weaknesses. We have provided new data using a defined mixture of fecal effectors to further investigate the roles of L-Ser, indole, and other effectors present in feces that we did not initially study. We have refined our discussion of these results to hopefully improve the clarity of our conclusions. We show now both in explant studies (Fig. 1I) and chemotaxis responses to a defined fecal effector system (Fig. 5) that L-Ser is able to abolish both the suppression of indole-mediated WT advantage and also indole chemorepulsion, respectively. We also show the latter can be accomplished by other fecal chemoattractants (Fig. 5). This is in line with our earlier finding that Tsr, the sensor of indole and L-Ser, is an important mediator of fecal attraction but not the sole mediator.

      As this reviewer points out, there are indeed other factors mediating invasion that we do not elucidate here, but we do note these possibilities in the text (lines: 125-127):

      “This benefit may arise from a combination of factors, including sensing of host-emitted effectors, redox or energy taxis, and/or swimming behaviors that enhance infection [5,30,31,35].”

      Reviewer #2 (Public review):

      Summary:

      The manuscript presents experiments using an ex vivo colonic tissue assay, clearly showing that fecal material promotes Salmonella cell invasion into the tissue. It also shows that serine and indole can modulate the invasion, although their effects are much smaller. In addition, the authors characterized the direct chemotactic responses of these cells to serine and indole using a capillary assay, demonstrating repellent and attractant responses elicited by indole and serine, respectively, and that serine can dominate when both are present. These behaviors are generally consistent with those observed in E. coli, as well as with the observed effects on cell invasion.

      Strengths:

      The most compelling finding reported here is the strong influence of fecal material on cell invasion. Also, the local and time-resolved capillary assay provides a new perspective on the cell's responses.

      Thank you for acknowledging these aspects of the study.

      Weaknesses:

      The weakness is that indole and serine chemotaxis does not seem to control the fecal-mediated cell invasion and thus the underlying cause of this effect remains unclear.

      In addition, the fact that serine alone, which clearly acts as a strong attractant, did not affect cell invasion (compared to buffer) is somewhat puzzling. Additionally, wild-type cells showed nearly a tenfold advantage even without any ligand (in buffer), suggesting that factors other than chemotaxis might control cell invasion in this assay, particularly in the serine and indole conditions. These observations should probably be discussed.

      Addressed above.

      Final comment. As shown in reference 12, Tar mediates attractant responses to indole, which appear to be absent here (Figure 3J). Is it clear why? Could it be related to receptor expression?

      Thank you for noting this. We now mention this in the discussion. In the course of this work, we encountered a number of apparent inconsistencies, or differences, between what we were observing with S. Typhimurium and what had been reported previously in studies of Tsr function in E. coli. We indeed noted that some studies had investigated a role of Tar in indole taxis (in E. coli), which is why we tested, and confirmed, that Tsr is required for indole taxis in S. Typhimurium (Fig. 6).

      We do not know the reason for this apparent difference between the two bacteria, but we have previously shown with our same strain of S. Typhimurium IR715, under the same growth assay, and preparation protocol, that L-Asp is a strong chemoattractant for both WT and the tsr mutant (see Glenn et al. 2024, eLife, Fig. 5G: https://iiif.elifesciences.org/lax:93178%2Felife-93178-fig5-v1.tif/full/1500,/0/default.jpg).

      This supports that this strain of Salmonella indeed has a functional Tar present and is expressed at a level sufficient for sensing L-Asp. So, if Tar generally mediates indole sensing we do not know why we would not see that in Salmonella. Hence, we do not see any role for Tar in indole chemorepulsion in our strain of study, which is different than reported for E. coli, but we cannot confirm the reason.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Franco and colleagues describe careful analyses of Salmonella chemotactic behavior in the presence of conflicting environmental stimuli. By doing so, the authors describe that this human pathogen integrates signals from a chemoattractant and a chemorepellent into an intermediate "chemohalation" phenotype.

      Strengths:

      The study was clearly well-designed and well-executed. The methods used are appropriate and powerful. The manuscript is very well written and the analyses are sound. This is an interesting area of research and this work is a positive contribution to the field.

      Thank you for your comments.

      Weaknesses:

      Although the authors do a great job in discussing their data and the observed bacterial behavior through the lens of chemoattraction and chemorepulsion to serine and indole specifically, the manuscript lacks, to some extent, a deeper discussion on how other effectors may play a role in this phenomenon. Specifically, many other compounds in the mammalian gut are known to exhibit bioactivity against Salmonella. This includes compounds with antibacterial activity, chemoattractants, chemorepellers, and chemical cues that control the expression of invasion genes. Therefore, authors should be careful when making conclusions regarding the effect of these 2 compounds on invasive behavior.

      Thank you for this comment, and we agree with your point. We hope we have revised the text and provided new data to address your concern. We have also chosen for clarity to keep our text close to our experimental data and so have refrained from speculating about some topics, even though you are absolutely correct about the immense complexity of these systems.

      It is important that the word invasion is used in the manuscript only in its strictest sense, the ability displayed by Salmonella to enter non-phagocytic host cells. With that in mind, authors should discuss how other signals that feed into the control of Salmonella invasion can be at play here.

      Thank you for your recommendation. We have revised the text to hopefully be clearer on our meaning of invasion in regard to Salmonella entering non-phagocytic host cells, essentially changing our usage to ‘cellular invasion’ throughout.

      The term is also commonly used, in reference to enteric infections and the colonization resistance conferred by the microbiome, to describe ‘invading pathogens’ (i.e. invasion in the sense of a new microbe colonizing the intestines). For instance, this recent review on Salmonella makes use of the term invading pathogen (https://www.nature.com/articles/s41579-021-00561-4). We acknowledge the confusion caused by this dual use of the term. We have mostly removed our statements using invasion in this context. We hope our language is clearer in this revised version.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      It was difficult to understand the true intent or importance of the study described in this manuscript. The first figure in the paper showed that a Salmonella Typhimurium strain lacking either CheY, and thus incapable of any chemotaxis, or the Tsr chemoreceptor, and thus incapable of sensing serine or indole, was modestly inferior to the wild-type version of that strain in invading the cells of a swine colonic explant. It then showed that, in the presence of a human fecal homogenate, the wild-type strain had a much greater advantage in invading the colonic cells. Thus, the presence of the fecal homogenate significantly increased invasiveness in a way that depends on chemotaxis and the Tsr chemoreceptor.

      As human feces were determined to contain 882 micromolar indole and 338 micromolar serine, the effects of those concentrations of either indole or serine alone or in combination were tested. The somewhat surprising finding was that neither indole nor serine alone nor in combination changed the result from the experiment done with just buffer in the colonic explant.

      The clear conclusion of this initial study is that both chemotaxis in general and chemotaxis mediated by Tsr improve the invasiveness of S. Typhimurium. They provide a much bigger advantage in the presence of human feces. However, two molecules present in the feces that are sensed by Tsr, serine, and indole, seem to have no effect on invasiveness either alone or in combination.

      At this point, the parsimonious interpretation is that there is something else in human feces that is responsible for the increased invasiveness, and the authors acknowledge this possibility. However, they do not take what appears to be the obvious approach: to look for additional factors in human feces that might be responsible, either by themselves or in combination with indole and/or serine, for the increased invasiveness. Instead, they carry out a detailed examination of the counteracting effects of indole as a repellent and of serine as an attractant as a function of their relative concentrations and their spatial distributions.

      Thank you for your comments. In our revised version, we have undertaken some additional studies of other fecal effectors that help better understand the relationship between L-Ser and indole, but also the roles of other chemoattractants (glucose, galactose, ribose, L-Asp) in mediating fecal attraction (Fig. 5). We agree with the reviewer and conclude that fecal attraction and the cell invasion phenotype mediated by fecal treatment are influenced by factors other than only Tsr, indole, and L-Ser. Our new data do show that L-Ser is sufficient to block both the invasion suppression effects of indole (negating the WT advantage) and also indole chemorepulsion, therefore making our detailed examination of the counteracting effects more relevant for understanding this system.

      What they find is what other studies have shown, primarily with S. Typhimurium's relative, the gamma-proteobacterium Escherichia coli.

      At high indole and low serine concentrations, the repulsion by indole wins out. At low indole and high serine concentrations, attraction by serine wins out. What is perhaps novel is what happens at an intermediate ratio of concentrations. Repulsion by indole dominates at short distances from the source, so there is a zone of clearing. At longer distances, attraction by serine dominates, so there is an accumulation of cells in a "halo" around the zone of clearing. Thus, assuming that serine and indole diffuse equally, the repulsive effect of indole dominates until its concentration falls below some critical level at which the concentration of serine is still high enough to exert an attractive effect.
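
      The three regimes described above can be illustrated with a toy threshold model. This is only a sketch under stated assumptions, not an analysis from the manuscript: it assumes that both effectors share the same exponentially decaying profile away from the source, that cells are repelled wherever indole exceeds a response threshold, and that they are attracted wherever serine exceeds its own threshold. The function names, the thresholds, and the length scale are all hypothetical.

```python
import math


def threshold_radius(c0, threshold, length_scale):
    """Radius at which c0 * exp(-r / length_scale) drops to `threshold`.

    Returns 0.0 when the source concentration never exceeds the threshold.
    The exponential profile is an illustrative assumption.
    """
    if c0 <= threshold:
        return 0.0
    return length_scale * math.log(c0 / threshold)


def halo_bounds(indole0, serine0, indole_thr, serine_thr, length_scale=1.0):
    """Return (inner, outer) radii of the zone where attraction dominates.

    Inside `inner`, indole is above its (hypothetical) repulsion threshold,
    giving a zone of clearing. Between `inner` and `outer`, only serine is
    above its threshold, so cells accumulate. Returns None when repulsion
    reaches at least as far as attraction, so no halo forms.
    """
    r_clear = threshold_radius(indole0, indole_thr, length_scale)
    r_attract = threshold_radius(serine0, serine_thr, length_scale)
    if r_attract <= r_clear:
        return None
    return (r_clear, r_attract)
```

      In this sketch a halo appears only when the ratio of serine to its threshold exceeds the ratio of indole to its threshold. When indole never reaches its threshold the inner radius is zero (plain attraction), and when repulsion extends at least as far as attraction the function returns None (plain repulsion).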

      They go on to show, using ITC, that serine binds to the periplasmic ligand-binding domain (LBD) of Tsr, something that has been studied extensively with very similar E. coli Tsr.

      They also show that indole does not bind to the Tsr LBD, which also is known for E. coli Tsr.

      This would be newsworthy only if the results were different for S. Typhimurium than for E. coli. As it is, it is merely confirmatory of something that was already known about Tsr of enteric bacteria.

      An idea that the authors introduce, if I understand it correctly, is that a repellent response to something in feces, perhaps indole, drives S. Typhimurium chemotactically competent cells out of the colonic lumen and promotes invasion of the bacteria into the cells of the colonic lining. If the feces contain both an attractant and a repellent, bacteria might be attracted by the feces to the lining of the intestine and then enter the colonic cells to escape a repellent, perhaps indole. That is an interesting proposition.

      In summary, I think that the initial experimental approach is fine. I do not understand the failure to follow up on the effect of the fecal homogenates in promoting invasion by chemotactic bacteria possessing Tsr. It seems there must be something else in the homogenates that is sensed by Tsr. Other amino acids and related compounds are also sensed by Tsr. Perhaps it is energy or oxygen taxis, which is partially mediated by Tsr, as the authors acknowledge.

      Much of the work reported here is quasi-repetitive with work done with E. coli Tsr. Minimally, previous work on E. coli Tsr should be explained more thoroughly rather than dealt with only as a citation.

      Thank you for your comments.

      We would like to confirm our agreement that E. coli and S. enterica possess similarities: both are Gammaproteobacteria and both inhabit or infect the gut. However, we note that they diverged evolutionarily during the Jurassic period (ca. 140 million years ago, see: PMC94677). In the context of colonizing humans, the former is a pathobiont, an indole producer, and a native member of the microbiome, whereas the latter is a frank pathogen and does not produce indole. Hence, there are many reasons to believe one is not an approximation of the other, especially when it comes to causing disease.

      We agree that much of what is known about indole taxis has come from excellent studies in well-behaved laboratory strains of E. coli, a powerful model. We believe that expanding this work to include clinically relevant pathogens is important for understanding its role in human disease. In this study, we contribute to that broader understanding by providing new mechanistic insights into Tsr-mediated indole taxis in S. Typhimurium, along with data demonstrating fecal attraction in other enteric pathogens and pathobionts. These findings help define a more general role for Tsr in enteric colonization and disease. While some of our results indeed confirm and extend prior findings, we respectfully believe that such confirmation in relevant pathogenic strains adds value to the field.

      Regarding our ITC studies, to our knowledge no other study has used ITC to investigate whether indole binds the LBD (which we show it does not) or whether it interferes with L-Ser sensing (which we show it does not). Hence, these are not duplicate findings, although we do acknowledge this leaves the mechanism of indole sensing undiscovered. If we are incorrect in this regard, please provide us a citation and we will be happy to include it and revise our comments.

      We now clarify in the text on lines 378-381: “While these leave the molecular mechanism of indole-sensing unresolved, it does eliminate two possibilities that have not, to our knowledge, been tested previously. Overall, our data add support to the hypothesis that a non-canonical sensing mechanism is employed by Tsr to respond to indole [8,18,69].”

      Lastly, as noted by the reviewer and as we mention in the text, essentially all prior studies of indole taxis were conducted in E. coli. Indole taxis in E. coli is not what is novel about the work we present, which is focused on S. Typhimurium and on testing the prediction that fecal indole protects against pathogen invasion. We have added a few additional points of comparison between our results and prior studies. While we appreciate that much understanding has come from E. coli as a model for indole taxis, we feel that discussing prior work in extensive detail would be more suitable for a review and would obscure our new findings about Salmonella and other enterics.

      In an earlier version of the manuscript, we included more background on E. coli indole taxis. However, we found that the historical literature in this area was somewhat inconsistent, with different assays using varying time points and indole concentrations, often leading to results that were difficult to reconcile. Providing sufficient context to explain these discrepancies required considerable space and, ultimately, detracted from the focus of our current study. Hence, we have only brought in comparisons with E. coli where most relevant to the present work. Also, we provide new data that E. coli also exhibits fecal attraction, and so there is reason to believe the mechanisms we study here are also relevant to that system.

      Some minor points

      (1) Hyphens are not needed with constructs like "naturally occurring" or "commonly used".

      Thank you. Revisions made throughout.

      (2) The word "frank" as in "frank pathogen" seems odd. It seems "potent" would be better.

      Thank you for this comment. Per your recommendation, we have removed this term.

      The term ‘frank pathogen’ is standard usage in the field of bacterial pathogenesis in reference to a microbe that always causes disease in its host (in this case humans) and causes disease in otherwise healthy hosts (example: https://www.sciencedirect.com/science/article/pii/S1369527420300345). We actually used this specific term to distinguish an aspect of novelty of our study because E. coli can, sometimes, be a pathogen (i.e. a pathobiont) and of course E. coli indole taxis has been previously studied. Ours is the first study of indole taxis in a frank pathogen.

      (3) It is unnecessary to coin a new word, chemohalation, to describe a phenomenon that is a simple consequence of repulsion by higher concentrations of a repellent and attraction by lower concentrations of attractant to generate a halo pattern of cell distribution.

      Thank you for your opinion on this. We have softened our statements on this point, and in the newly revised version of the text less space is devoted to this idea. We now state in lines 304-307:

      “There exists no consensus descriptor for taxis of this nature, and so we suggest expanding the lexicon with the term “chemohalation,” in reference to the halo formed by the cell population, and which is congruent with the commonly-used terms chemoattraction and chemorepulsion.”

      We appreciate the reviewer’s perspective and agree that the behavior we describe can be viewed as the result of competing attractant and repellent cues. However, we find that the traditional framework of “chemoattraction” and “chemorepulsion” is often insufficient to describe the spatial positioning behaviors we observe in our system. In our experience presenting and discussing this work, especially with audiences outside the chemotaxis field, it has been challenging to convey these dynamics clearly using only those two terms.

      For this reason, we introduced the term chemohalation to describe this more nuanced behavior, which appears to reflect a balance of signals rather than a simple unidirectional response. More bacteria enter the field of view, but they are clearly positioned differently than regular ‘chemoattraction.’ We also note that Reviewers 2 and 3 did not raise concerns about the term, and after careful consideration, we have opted to retain it in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Lines 143-156 seem somewhat overcomplicated and may be confusing. For example: in line 143: "However, when colonic tissue was treated with purified indole at the same concentration, the competitive advantage of WT over the chemotactic mutants was abolished compared to fecal-treated tissue...". But indole was tested alone, so it did not abolish the response; rather the absence of fecal material did.

      We appreciate your point. We have made revisions throughout to help improve the clarity of how we discuss the explant infection data and provide new visuals to help explain the experiment and data (Fig. 1C, Fig. S5).

      Reviewer #3 (Recommendations for the authors):

      (1) Line 46 - Are references 9-11 really about topography?

      Thank you. You are correct. Revised and eliminated this statement.

      (2) Lines 87-89 - It seems to me that a bit more information on this would be helpful to the reader.

      In our revision of the text, to make it more centered on our primary findings of the differences between indole taxis when indole is the sole effector versus amidst other effectors, we have removed this section.

      (3) Line 112 - When mentioning the infection of the cecum and colon, authors should specify that this is in mice.

      Thank you for this comment. In our revised version we provide references both for animal model infections and work in human patients (ex: https://www.sciencedirect.com/science/article/abs/pii/S0140673676921000).

      We have revised our statement to read (lines 99-100): “Salmonella Typhimurium preferentially invades tissue of the distal ileum but also infects the cecum and colon in humans and animal models [42–46].”

      (4) Lines 122-123 - Authors state that "This experimental setup simulates a biological gradient in which the effector concentration is initially highest near the tissue and diffuses outward into the buffer solution.". Was this experimentally demonstrated? If not, authors should tone this down.

      We have removed this comment and instead present a conceptual diagram illustrating this idea (Fig. 1C). This point is also addressed in our response above.

      (5) When looking at the results in Figure 1, I wonder what the results of this experiment would be if the authors tested an invasion mutant of Salmonella. In a strain that is able to perform chemotaxis (attraction and repulsion) but unable to actively invade, would there be a phenotype here? Is it possible that the fecal material affects cellular uptake of Salmonella, independently of active invasion? I don't think the authors necessarily need to perform this experiment, but I think it could be informative and this possibility should at least be discussed.

      Thank you for your comments and suggestions. We have included new data of an explant co-infection experiment with WT and an invasion-deficient mutant invA (Fig. S1). Under these conditions, WT exhibits an advantage in the gentamicin-treated homogenate, but not the untreated homogenate, suggestive of an advantage in cellular invasion.

      However, we did not repeat all experiments with this genetic background. We felt that would be outside the scope of this work, and would probably require dual chemotaxis/invA deletions to assess the impact of each, which also could be difficult to interpret. The hypothesis mentioned by the Reviewer is possible, but we were not able to devise a way to test this idea, as it seems we would need to deactivate all other mechanisms of Salmonella invasion.

      (6) Lines 137-140 - Because this is a competition experiment and results are plotted as CI, the reader can't readily assess the impact of human feces on invasion by WT Salmonella.

      Thank you for pointing this out. We want to mention that the data are plotted as CI in the main text, but the supplemental contains the disaggregated CFU data (Fig. S1-2) and the numerical values (Data S1).
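
      For readers who wish to relate the disaggregated CFU data to the plotted CI values, the calculation can be sketched as follows. This is a minimal illustration that assumes the common definition of a competitive index as the WT-to-mutant ratio recovered from tissue divided by the WT-to-mutant ratio in the inoculum; the function name and the default 1:1 inoculum are hypothetical and may not match the exact convention used in the manuscript.

```python
def competitive_index(wt_out, mut_out, wt_in=1.0, mut_in=1.0):
    """Competitive index of WT over a mutant in a co-infection.

    wt_out, mut_out: CFU recovered for each strain after infection.
    wt_in, mut_in:   CFU (or relative amounts) of each strain in the inoculum;
                     the 1:1 default is an assumption for illustration.
    A value above 1 indicates a WT advantage, below 1 a mutant advantage.
    """
    if min(wt_out, mut_out, wt_in, mut_in) <= 0:
        raise ValueError("CFU counts must be positive")
    return (wt_out / mut_out) / (wt_in / mut_in)
```

      Because both strains are recovered from the same piece of tissue, tissue-to-tissue variation that affects both strains equally cancels in this ratio, which is one reason competition experiments are used to reduce variability.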

      Please include the magnitude of induction in this sentence, compared to the buffer control.

      The text of this section has been changed to account for new data.

      Additionally, although unlikely, the presence of the chemotaxis mutants in the same infection may be a confounding factor. In order to irrefutably ascertain that feces induces invasion, I suggest authors perform this experiment with the wildtype strain (and mutant) alone in different conditions.

      Thank you for this suggestion, although after careful consideration we have decided not to repeat these explant studies with monoinfections. Coinfections are a common tool in Salmonella pathogenesis studies, including prior chemotaxis studies which our work builds upon (ex: https://pmc.ncbi.nlm.nih.gov/articles/PMC3630101/). The explant experiments, even controlling as many aspects as we did, still show lots of variability and one way to mitigate this is through competition experiments so that each strain experiences the same environment.

      We agree that a cost of this approach is that one strain may affect the other, or may alter the environment in a way that impacts the other. Thus, the resulting data must also be understood through this lens. We have revised the text to stay closer to the competitive advantage phenotype.

      (7) Line 150 - Authors state that bacterial loads are similar. However, authors should perform and report statistical analyses of these comparisons, at least in the supplementary data.

      We have removed this statement as requested. We do note, however, that the mean CFU values across treatments at identical time points appear qualitatively similar, which is an observation that does not require statistical testing.

      (8) Lines 154-154 - This seems incorrect, as the effect observed with the mixture of indole and serine is very similar to the addition of serine alone. Therefore, there was no "neutralization" of their individual effects.

      We have revised this statement.

      (9) Line 159-161 - I strongly suggest authors reword this sentence. I don't think this is the best way to describe these results. The stronger phenotype observed was with the fecal material. Therefore, it is the indole (alone) condition that does not "elicit a response". Focusing on indole too much here ignores everything else that is present in feces and also the fact that there was a drastic phenotype when feces were used.

      Thank you for your opinion on this. We believe this is one of the ways in which our earlier draft was unclear. It was actually a primary motivation of this work to test whether there were differences in pathogen infection, mediated by chemotaxis, in the presence of indole as a singular effector or in its near-native context in fecal material, and our revised text centers our study around this question. We believe this distinction is important for the reasons mentioned earlier.

      Relative to buffer treatment, indole changes the behavior of the system, eliminating the WT advantage, and this is the effect we refer to. We have made many revisions to the text of these sections and hope it better conveys this idea. We expect we may still have differences regarding the interpretation of these results, but regardless, thank you for your suggestions and we have tried to implement them to improve the clarity of the text.

      (10) Line 162 - Again, I disagree with this. Indole does not have an effect to be cancelled out by serine.

      Addressed above, and this text has been changed. Also, we provide new chemotaxis data that at fecal-relevant concentrations of indole and L-Ser, indole chemorepulsion is overridden (Fig. 5).

      (11) Lines 166-168 - Again, this is a skewed analysis. Indole and serine could not possibly provide an "additive effect" since they do not provide an effect alone. There is nothing to be added.

      This text has been deleted.

      (12) Lines 168-170 - Most of the citations provided to this sentence are inadequate. Our group has previously shown that the mammalian gut harbors thousands of small molecules (Antunes LC et al. Antimicrob Agents Chemother 2011). You obviously do not have to cite our work, but there is significant literature out there about the complexity of the gut metabolome.

      Thank you for this comment. We have revised this particular text, but do make mention of potential other effectors driving these effects, which was also requested by the other reviewers.

      Your work and others indeed support there being thousands of molecules in the gut, but our work centers on chemotaxis, and bacteria have a small number of chemoreceptors and only sense a very tiny fraction of these molecules as effectors. Since the impacts of infection of the explants depends on chemotaxis, we keep our comments restricted to those, but agree that there are likely many interactions involved, such as those impacting gene expression.

      Please note our more detailed description of the explant infection assay (and shown in Fig. 1C) that may change your view on the significance of non-chemotaxis effects. The bacteria only experience the effectors at low concentration, not the high concentration that is used to soak and prepare the tissue prior to infection.

      (13) Figure 2 - The letter 'B' from panel B is missing.

      Thank you very much for bringing this oversight to our attention. We have fixed this.

      (14) Legend of Figure 3 - Panel J is missing a proper description. Figure legends need improvement in general, to increase clarity.

      Thank you for noting this. This is now Fig. 6E. We have provided an additional description of what this panel shows. We have edited the legend text to read: “E. Shows a quantification of the relative number of cells in the field of view over time following treatment with 5 mM indole for a competition experiment with WT and tsr (representative image shown in F).”

      We also have made other edits to figure legends to improve their clarity and add additional experimental details and context. By breaking up larger figures into smaller figures, we also hope to have improved the clarity of our data presentation.

      (15) Lines 264-265 - Maybe I am missing something, but I do not see the ITC data for serine alone.

      We have clarified in the text that this was measured in our previous study (https://elifesciences.org/articles/93178). The present study is a ‘Research Advance’ article format, and so builds on our prior observation.

      We have revised the text to read: “To address these possibilities, we performed ITC of 50 μM Tsr LBD with L-Ser in the presence of 500 μM indole and observed a robust exothermic binding curve and KD of 5 µM, identical to the binding of L-Ser alone, which we reported previously (Fig. 6H) [36].”

      (16) Lines 296-297 - What is the effect of these combinations of treatments on bacterial cells? I commend the authors for performing the careful growth assays, but I wonder if bacterial lysis could be a factor here. I am not doubting the effect of chemotaxis, but I am wondering if toxic effects could be a confounding factor. For instance, could it be that the "avoidance" close to the compound source and subsequent formation of a halo suggest bacterial death and lysis? I suggest the authors perform a very simple experiment, where bacteria are exposed to the compounds at various concentrations and combinations, and cells are observed over time to ensure that no bacterial lysis occurs.

      Thank you for mentioning this possibility. If we understand correctly, the Reviewer is asking if the chemohalation effect we report could be from the bacteria lysing near the source. Our data actually argue against this possibility through a few lines of evidence.

      First, if this were the case in experiments with the cheY mutant, we would also see an effect near the source. But actually, in experiments with either the cheY mutant or the tsr mutant, neither of which can sense indole, the bacteria just ignore the stimulus and show an even distribution (see current Fig. 6F).

      Second, our calculations suggest that in the chemotaxis assay (CIRA), the bacteria only experience a rather low local concentration of indole, mostly in the nM range, because as soon as the effector treatment is injected into the greater volume, it is immediately diluted. This means the local concentration is far below the level that we see inhibit growth of the cells in the long run, and it may not be toxic (Fig. 7, Fig. S3).

      Lastly, in the representative video presented we can observe individual cells approach and exit the treatment (Movie 11). Due to the above we have not performed additional experiments to test for lysis.

      (17) Lines 310-311 - Isn't this the opposite of the model you propose in Figure 5? The higher the concentration of indole in the lumen the more likely Salmonella is to swim away from it and towards the epithelium, favoring invasion, no?

      We appreciate the opportunity to clarify this point and apologize for any confusion caused. In response, we have revised the text to place less emphasis on chemohalation, and the specific statement and model in question have now been removed. Instead, we provide a summary of our explant data in light of the other analyses in the study (Fig. S5).

      What we meant here was in relation to the microscopic level, not whether or not a host/intestine is colonized. To put it another way, we think our data support that the pathogen colonizes and infects the host regardless of indole presence, but uses indole as a means to prioritize which tissues are optimal for colonization at the microscopic level. The prediction made by others was that bacteria swim away from the indole source and that this could therefore prevent or inhibit pathogen colonization of the intestines, which our data do not support.

      (18) Lines 325-326 - Maybe, but feces also contain several compounds with antibacterial activity, as well as other compounds that could elicit chemorepulsion. This should be stated and discussed.

      We have removed this statement since we did not explicitly test the growth of the bacteria with fecal treatments. We have refrained from speculating further in the text since we do not have direct knowledge of how that relationship with differing effectors could play out.

      We agree with the reviewer that the growth assays are reductionist and give insight only into the two effectors studied. We provide evidence from several different types of enterics that they all exhibit fecal attraction, and it seems unlikely the bacteria would be attracted to something deleterious, but we have not confirmed.

      (19) Lines 371-374 - How preserved (or not) is the mucus layer in this model? The presence of an inhibitory molecule in the lumen does not necessarily mean that it will protect against invasion. It is possible that by sensing indole in the lumen Salmonella preferentially swims towards the epithelium, thus resulting in enhanced invasion.

      The text in question has been removed. However, we acknowledge the reviewer’s point, and that these explant tissues do not fully model an in vivo intestinal environment. Other than a gentle washing with PBS to remove debris prior to the experiment, the tissue is not otherwise manipulated, and the mucus layer is feasibly similar to its in vivo state.

      In mentioning this hypothesis about indole, which our data do not support, we were echoing a prediction from the field, proposed in the studies we cite. We agree with the reviewer that there were other potential outcomes of indole impacting chemotaxis and invasion, and indeed our data supports that.

      (20) Lines 394-395 - The authors need to remember that the ability to invade the intestinal epithelium is not only a product of chemoattraction and repulsion forces. Several compounds in the gut are used by Salmonella as cues to alter invasion gene expression. See PMID: 25073640, 28754707, 31847278, and many others.

      Thank you for this point, and we now include these citations. We have revised the text in question, stating:

      “In addition to the factors we have investigated, it is already well-established in the literature that the vast metabolome in the gut contains a complex repertoire of chemicals that modulate Salmonella cellular invasion, virulence, growth, and pathogenicity [79–81].”

      Our intent is not to diminish the role of other intestinal chemicals but rather to put our new findings into the context of bacterial pathogenesis. We do provide evidence that specific chemoeffectors present in fecal material alter where bacteria localize through chemotaxis, which is one method of control over colonization.

      (21) Line 408 - I think it could be hard to observe this using your experimental approach.

      Because you need to observe individual cells, the number of cells you observe is relatively small. If, in a bet-hedging strategy, the proportion of cells that were chemoattracted to indole was relatively low you likely would not be able to distinguish it from an occasional distribution close to the repellent source. You may or may not want to discuss this.

      Thank you for this observation. It is indeed challenging to both observe large scale population behaviors and also the behaviors of individual cells in the same experiment. Our ability to make this distinction is similar to the approach used in the study we cite, so that is our comparison.

      But if there were a subpopulation that was attracted, we would predict a ‘bull’s-eye’ population structure, with some cells attracted and others avoiding the source, which we do not see; we see the halo. So, we find no evidence of the bet-hedging response seen in a different study that used E. coli and different time scales than ours.

      (22) Lines 410-411 - What could the other attractants be? Would it be possible/desirable to speculate on this?

      We have changed the text here, and we now present new data that examine some of these other attractants (Fig. 5).

      (23) Line 431 - What exactly do you mean by "running phenotype"? Please, provide a brief explanation.

      We have removed this text, but a running phenotype means the swimming bacteria rarely make direction changes (i.e. tumbles), which has been associated with promoting contact with the epithelium, described in the references we cite. Hence, this type of swimming behavior could contribute to the effects we observe in the explant studies, potentially explaining some of the Tsr-mediated advantage that was not dependent on L-Ser/indole.

      (24) Line 441 - Other work has shown that feces contain inhibitors of invasion gene expression. The authors should integrate this knowledge into their model. In fact, indole has been shown to repress host cell invasion by Salmonella, so it is important that authors understand and discuss the fact that the impact of indole is multifaceted and not only a reflection of its action as a chemorepellent. PMID: 29342189, 22632036.

      We agree with the reviewer about this point, and mention this in the text (lines 55-57): “Indole is amphipathic and can transit bacterial membranes to regulate biofilm formation and motility, suppress virulence programs, and exert bacteriostatic and bactericidal effects at high concentrations [16–18,20–22].”

      We have added in the references suggested.

      What we test here is the specific hypothesis made by others in the field about indole chemorepulsion serving to dissuade pathogens from colonizing.

      For instance, the statement from: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0190613

      “Since indole is also a chemorepellent for EHEC [23], it is intriguing to speculate that in addition to attenuating Salmonella virulence, indole also attenuates the recruitment and directed migration of Salmonella to its infection niche in the GI tract.”

      And from: https://doi.org/10.1073/pnas.1916974117

      “We propose that indole spatially segregates cells based on their state of adaptation to repel invaders while recruiting beneficial resident bacteria to growing microbial communities within the GI tract.”

      And

      “Thus, foreign ingested bacteria, including invading pathogens such as E. coli O157:H7 and S. enterica, are likely to be prevented by indole from gaining a foothold in the mucosa.”

      As shown by others, indole certainly does have many roles in controlling pathogenesis, and there are other chemicals we do not investigate that control invasion and bacterial growth, but we keep our statements here restricted to chemotaxis since that is what our experiments and data show.

      (25) Line 472 - "until fully motile". How long did this take, how variable was it, and how was it determined?

      Thank you for asking for this clarification. We have added that this took 1-2 h and was confirmed visually. Our methods are similar to those described in earlier chemotaxis studies (e.g., 10.1128/jb.182.15.4337-4342.2000).

      (26) Line 487 - I worry that the fact fecal samples were obtained commercially means that compound stability/degradation may be a factor to consider here. How long had the sample been in storage? Is this information available?

      Thank you for this question. We agree that the fecal sample we used serves as a model system and we cannot rule out that handling by the supplier could potentially alter its contents in some way that would impact bacterial chemosensing. However, we note that the measurements of L-Ser and indole we obtained are in the appropriate range for what other studies have shown.

      The fecal sample used for all work in the study was from a single healthy human donor, obtained from Lee Biosolutions (https://www.leebio.com/product/395/fecal-stool-samplehuman-donor-991-18). The supplier did not state the explicit date of collection, nor indicate any specific handling or storage methods that would obviously degrade its native metabolites, but we cannot rule that out. In our hands, the fecal sample was kept frozen at -20 °C. For research purposes, portions were extracted and thawed as needed, maintaining the frozen state of the original sample to limit degradation from freeze-thaws.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Li and colleagues overcome solubility problems to determine the structure of FtsEX bound to EnvC from E. coli.

      Strengths:

      The structural work is well done and the work is consistent with previous work on the structure of this complex from P. aeruginosa.

      Weaknesses:

      The model does not take into account all information that the authors obtained as well as known in vivo data.

      The work lacks a clear comparison to the Pseudomonas structure highlighting new information that was obtained so that it is readily available to the reader.

      The authors set out to obtain the structure of FtsEX-EnvC complex from E. coli. Previously, they were unable to do so but were able to determine the structure of the complex from P. aeruginosa. Here they persisted in attacking the E. coli complex since more is known about its involvement in cell division and there is a wealth of mutants in E. coli. The structural work is well done and recapitulates the results this lab obtained with this complex from P. aeruginosa. It would be helpful to compare more directly the results obtained here with the E. coli complex with the previously reported P. aeruginosa complex - are they largely the same or has some insight been obtained from the work that was not present in the previous complex from P. aeruginosa. This is particularly the case in discussing the symmetrical FtsX dimer binding to the asymmetrical EnvC, since this is emphasized in the paper. However, Figures 3C & D of this paper appear similar to Figures 2D & E of the P. aeruginosa structure. Presumably, the additional information obtained and presented in Figure 4 is due to the higher resolution, but this needs to be highlighted and discussed to make it clear to a general audience.

      The main issue is the model (Figure 6). In the model ATP is shown to bind to FtsEX before EnvC, however, in Figure 1c it is shown that ADP is sufficient to promote binding of FtsEX to EnvC.

      The work here is all done in vitro, however, information from in vivo needs to be considered. In vivo results reveal that the ATP-binding mutant FtsE(D162N)X promotes the recruitment of EnvC (Proc Natl Acad Sci U S A 2011 108:E1052-60). Thus, even FtsEX in vivo can bind EnvC without ATP (not sure if this mutant can bind ADP).

      Perhaps the FtsE protein from E. coli has to have bound nucleotides to maintain its 3D structure.

      Thank you for your thoughtful feedback and valuable suggestions. We have carefully revised the manuscript to address these concerns, incorporating additional analysis and discussion to enhance clarity and improve the accuracy of our interpretation.

      Regarding the relationship between EnvC binding and nucleotide binding to FtsEX, our previous study on P. aeruginosa FtsEX demonstrated that FtsEX can bind EnvC even in the absence of nucleotide (PMID: 37186861, Fig. 3C). However, for E. coli FtsEX (Fig. S1 in this study), ATP is required to stabilize the complex in vitro, preventing us from directly testing whether EnvC binding is ATP-dependent. The reviewer raised an important point about the FtsE(D162N) mutant study; previous studies suggest that this mutant may still retain ATP binding, as observed in its homolog MacB (PMID: 29109272, PMID: 32636250). Additionally, previous work (PMID: 22006325) has shown that the PLD domain of FtsX can bind EnvC directly, even in the absence of the NBD domain, a finding further supported by Crow’s lab (PMID: 33097670). Taken together, these studies indicate that EnvC binding to FtsEX is likely nucleotide-independent, while ATP binding primarily stabilizes FtsE dimerization, reinforcing FtsEX complex formation.

      In line with these findings, our results suggest a stabilizing role of ATP in FtsEX assembly, whereas EnvC binding does not appear to be nucleotide-dependent. However, we acknowledge that the precise sequence of ATP binding and EnvC recruitment within the cell remains unresolved. To reflect this, we have revised the manuscript to incorporate these insights (L190-201, L445-451), clearly stated the limitations (L450-451, L887-890), and updated our model (Fig. 6) to avoid assigning a definitive sequence to EnvC and ATP binding.

      Additionally, we have strengthened the structural comparison between E. coli and P. aeruginosa FtsEX, as the reviewer suggested. We have now included a detailed comparative analysis (L282-306, Fig. S9), which reveals that the transmembrane and nucleotide-binding domains are highly superimposable. The primary structural distinction lies in a slight tilting difference in the bound EnvC, which appears to stem from the conformation of the X-lobes within the PLD domains. Highlighting these differences helps clarify how our new structural data provide additional insights beyond what was previously observed in P. aeruginosa.

      Reviewer #2 (Public Review):

      Summary:

      Peptidoglycan remodeling, particularly that carried out by enzymes known as amidases, is essential for the later stages of cell division including cell separation. In E. coli, amidases are generally activated by the periplasmic proteins EnvC (AmiA and AmiB) and NlpD (AmiC). The ABC family member, FtsEX, in turn, has been implicated as a modulator of amidase activity through interactions with EnvC. Specifically how FtsEX regulates EnvC activity in the context of cell division remains unclear.

      Strengths:

      Li et al. make two primary contributions to the study of FtsEX. The first, the finding that ATP binding stabilizes FtsEX in vitro, enables the second, structural resolution of full-length FtsEX both alone (Figure 2) and in combination with EnvC (Figure 3). Leveraging these findings, the authors demonstrate that EnvC binding stimulates FtsEX-mediated ATP hydrolysis approximately two-fold. The authors present structural data suggesting EnvC binding leads to a conformational change in the complex. Biochemical reconstitution experiments (Figure 5) provide compelling support for this idea.

      Weaknesses:

      The potential impact of the study is curtailed by the lack of experiments testing the biochemical or physiological relevance of the model which is derived almost entirely from structural data.

      Altogether, the data support a model in which interaction with EnvC results in a conformational change stimulating ATP hydrolysis by FtsEX and EnvC-mediated activation of the amidases, AmiA and AmiB. However, the study is limited in both approach and scope. The importance of interactions revealed in the structures to the function of FtsEX and its role in EnvC activation are not tested. Adding biochemical and/or in vivo experiments to fill in this gap would allow the authors to test the veracity of the model and increase the appeal of the study beyond the small number of researchers specifically interested in FtsEX.

      Thank you for your thoughtful review and constructive feedback. We appreciate your recognition of our study’s contributions, particularly the structural resolution of full-length E. coli FtsEX, its interaction with EnvC, and our biochemical characterization of EnvC-stimulated ATP hydrolysis.

      We understand the importance of further biochemical and in vivo validation to support our model. While our study primarily provides a structural framework for understanding FtsEX function, many key residues identified in our E. coli structures have already been tested in prior cell physiological studies. For example, residues critical for the FtsEX-EnvC interaction were examined in our collaborator David Roper’s lab in collaboration with Crow’s lab (PMID: 33097670, L319-321).

      With the structural blueprint provided by our full-length E. coli FtsEX-EnvC complex, we now have a foundation to explore several key functional aspects of this system. Future mutagenesis studies will help dissect the roles of specific residues in ATP binding/hydrolysis, coupling between the TMD and NBD domains, interactions between the PLD and TMD domains of FtsX, and signal transduction from the NBD, through the TMD and PLD to EnvC. Additionally, we aim to investigate how the symmetrical PLD domain recruits asymmetrical EnvC and how the dynamics of PLD of FtsX and CCD domains of EnvC contribute to the complex’s function.

      As these experiments require specialized expertise in cell physiology and PG degradation assays, we are actively collaborating with experts in these areas to pursue them. We are committed to furthering this work and providing deeper biochemical and in vivo insights into the function of the FtsEX complex in cell division.

      Reviewer #1 (Recommendations For The Authors):

      (1) As mentioned, two things could strengthen the paper. One is to take into account that ADP or possibly nucleotide-free FtsEX can bind EnvC. The second is to highlight any differences between the structures from E. coli and P. aeruginosa.

      Thank you for these insightful suggestions. In our revision, we have (1) carefully considered the possibility of EnvC binding independently of nucleotide and (2) have incorporated a detailed comparison between the newly obtained E. coli FtsEX/EnvC structure and that of P. aeruginosa.

      Regarding the relationship between EnvC binding and ATP binding to FtsEX, our previous study on P. aeruginosa FtsEX demonstrated that FtsEX can bind EnvC in the absence of nucleotide (PMID: 37186861, Fig 3C). However, for the E. coli FtsEX system (Fig S1 in this study), ATP is necessary for FtsEX stabilization in vitro, which prevented us from directly testing whether EnvC binding is ATP-dependent.

      We appreciate the reviewer’s reference to the FtsE(D162N) mutant study. Previous studies suggest that the D162N mutant may still retain ATP binding, similar to its homolog MacB (PMID: 29109272; PMID: 32636250). Additionally, findings from Winkler’s lab (PMID: 22006325) indicate that the PLD domain of FtsX can bind EnvC directly, even in the absence of the NBD domain, a result further supported by a study from Crow’s lab (PMID: 33097670). Collectively, these studies suggest that EnvC binding to FtsEX is nucleotide-independent, while ATP binding likely stabilizes FtsE dimerization, thereby reinforcing FtsEX complex formation, as the reviewer suggested.

      Thus, consistent with previous studies, our results so far support a stabilizing role of ATP in FtsEX assembly, while EnvC binding itself does not appear to be nucleotide-dependent. However, the available evidence remains inconclusive, and the precise sequence of ATP binding and EnvC recruitment within the cell is still unclear. In our revision, we have now incorporated these analyses in L190-201 and L445-451, stated the limitations (L450-451 and L887-890) and updated our model (Fig. 6) to avoid assigning a definitive sequence to EnvC and ATP binding.

      For the structural comparison between E. coli and P. aeruginosa FtsEX, we have added a detailed analysis in L282-306 and Supplementary figure 9. In summary, we found that the transmembrane domain and nucleotide-binding domain are highly superimposable, with only minor differences observed. The primary distinction lies in a slight tilting difference in the bound EnvC, which appears to come from the conformation of the X-lobes within the PLD domains.

      (2) Line 129. Concerning the role of ATP in stabilizing the complex. It is clear that ADP can do it as well (Figure 1c). This is mentioned in line 131 but not considered in the model.

      Thank you for pointing this out. We have now revised the relevant sections in the manuscript (L190-201 and L445-451) and updated the model (Fig 6) accordingly. In the revised manuscript, we acknowledge the reviewer’s point that ATP may primarily serve to stabilize the FtsEX complex. Additionally, we have explicitly clarified that EnvC binding appears to be nucleotide-independent. Regarding the model, we state that the current study does not provide sufficient evidence to determine the precise sequence of EnvC and ATP binding to FtsEX in the cell. We believe these revisions, incorporating the reviewer’s suggestions, improve the accuracy of our interpretation.

      Reviewer #2 (Recommendations For The Authors):

      (1) The introduction is written for an audience with significant expertise in bacterial PG synthesis and is thus difficult for those outside the field to follow.

      Thank you for your feedback. We have revised the introduction, particularly the first passage (L51–63), to improve readability and make it more accessible to a broader audience.

      (1) Figure 1: Please express ATP hydrolysis data in ATP/FtsEX/minute. (It is currently nmol/mg/min).

      Changed accordingly, thank you!
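
      The unit change requested here (nmol/mg/min to ATP per FtsEX per minute) is a molar-mass conversion. The sketch below illustrates the arithmetic only; it is not the authors' calculation, and the specific activity and molar mass used are hypothetical placeholders, not values from the study.

```python
def turnover_per_min(specific_activity_nmol_mg_min: float,
                     molar_mass_g_per_mol: float) -> float:
    """Convert nmol ATP / mg protein / min into ATP / complex / min."""
    # 1 mg of a complex with molar mass M (g/mol) contains 1e6 / M nmol of complex.
    nmol_complex_per_mg = 1e6 / molar_mass_g_per_mol
    return specific_activity_nmol_mg_min / nmol_complex_per_mg


# Hypothetical placeholder inputs (NOT data from the study):
# 1000 nmol/mg/min for a complex assumed to weigh 100,000 g/mol.
example = turnover_per_min(1000.0, 100_000.0)
print(f"{example:.1f} ATP per complex per min")
```

      The result depends on the assumed stoichiometry of the complex, since that sets the molar mass used in the conversion.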

      (2) Figure 4: Please clarify in the legend and in the figure itself which structures correspond to full-length data from cryoEM data or truncated (FtsEX-PLD domain) protein data from previous crystallographic studies.

      Both the FtsEX and FtsEX/EnvC complex structures shown in Figure 4 were obtained from our cryo-EM data using full-length proteins. To avoid any confusion, we have now further clarified this in the figure legend (L857).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Recommendations for the authors:

      Reviewing Editor Comments:

      The resubmitted version of the manuscript adequately addressed several initial comments made by reviewing editors, including a more detailed analysis of the results (such as those of bilayer thickness). This version was seen by 2 reviewers. Both reviewers recognize this work as being an important contribution to the field of BK and voltage-dependent ion channels in general. The long trajectories and the rigorous/novel analyses have revealed important insights into the mechanisms of voltage-sensing and electromechanical coupling in the context of a truncated variant of the BK channel. Many of these observations are consistent with structural and functional measurements of the channel, available thus far. The authors also identify a novel partially expanded state of the channel pore that is accessed after gating-charge displacement, which informs the sequence of structural events accompanying voltage-dependent opening of BK.

      However, there are key concerns regarding the use of the truncated channel in the simulations. While many gating features of BK are preserved in the truncated variant, studies have suggested that opening of the channel pore to voltage-sensing domain rearrangement is impaired upon gating-ring deletion. So the inferences made here might only represent a partial view of the mechanism of electromechanical coupling.

      It is also not entirely clear whether the partially expanded pore represents a functionally open, sub-conductance, or another closed state. Although the authors provide evidence that the inner pore is hydrated in this partially open state, in the absence of additional structural/functional restraints, a confident assignment of a functional state to this structure state is difficult. Functional measurements of the truncated channel seem to suggest that not only is their single channel conductance lower than full-length channels, but they also appear to have a voltage-independent step that causes the gates to open. It is unclear whether it is this voltage-independent step that remains to be captured in these MD trajectories. A clean cut resolution of this conundrum might not be feasible at this time, but it could help present the various possibilities to the readers.

      We appreciate the positive comments and agree that there will likely be important differences between the mechanistic details of voltage activation between the Core-MT and full-length constructs of BK channels. We also agree that the dilated pore observed in the simulation may not be the fully open state of Core-MT.

      Nonetheless, the notion that the simulation may not have captured the full pore opening transition or the contribution of the CTD should not render the current work “incomplete”, because a complete understanding of BK activation would be an unrealistic goal beyond the scope of this work. We respectfully emphasize that the main insights of the current simulations are the mechanisms of voltage sensing (e.g., the nature of VSD movements, contributions of various charged residues, how small charge movements allow voltage sensing, etc.) as well as the role of the S4-S5-S6 interface in VSD-pore coupling. As noted by the Editor and reviewers, these insights represent important steps towards establishing a more complete understanding of BK activation.

      Below are the specific comments of the two experts who have assessed the work and made specific suggestions to improve the manuscript.

      Reviewer #1 (Recommendations for the authors):

      (1) Although the successful simulation of V-dependent K+ conduction through the BK channel pore and analysis of associated state dependent VSD/pore interactions and coupling analysis is significant, there are two related questions that are relevant to the conclusions and of interest to the BK channel community which I think should be addressed or discussed.

      One key feature of BK channels is their extraordinarily large conductance compared to other K+ selective channels. Do the simulations of K+ conductance provide any insight into this difference? Is the predicted conductance of BK larger than that of other K+ channels studied by similar methods? Is there any difference in the conductance mechanism (e.g., the hard and soft knock-on effects mentioned for BK)?

      The molecular basis of the large conductance of BK channels is indeed an interesting and fundamental question. Unfortunately, this is beyond the scope of this work and the current simulation does not appear to provide any insight into the basis of large conductance. It is interesting to note, though, that the conductance is apparently related to the level of pore dilation and the pore hydration level, as increasing the hydration level from ~30 to ~40 waters in the pore increases the simulated conductance from ~1.5 to 6 pS (page 8). This is consistent with previous atomistic simulations (Gu and de Groot, Nature Communications 2023; ref. 33) showing that the pore hydration level is strongly correlated with observed conductance. As noted in the manuscript, the conductance mechanism through the filter appears highly similar to previous simulations of other K+ channels (Page 8). Given the limited number of conductance events observed in the current simulations, we will refrain from discussing the possible basis of the large conductance in BK channels except commenting on the role of pore hydration (page 8; also see below in response to #5).
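
      For readers unfamiliar with how a conductance is commonly estimated from permeation events in a simulation under applied voltage, the following is a minimal sketch: count the ions that cross, convert to a current, and divide by the voltage. It is an illustration under stated assumptions, not the authors' analysis pipeline; the event count and duration are hypothetical placeholders.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs (exact SI value)


def conductance_pS(n_events: int, duration_us: float, voltage_mV: float) -> float:
    """Estimate conductance (pS) from monovalent-ion permeation events."""
    # Current = total charge transferred / elapsed time.
    current_A = n_events * ELEMENTARY_CHARGE_C / (duration_us * 1e-6)
    # Conductance = current / applied voltage (ohmic approximation).
    conductance_S = current_A / (voltage_mV * 1e-3)
    return conductance_S * 1e12  # siemens -> picosiemens


# Hypothetical placeholder: 10 K+ crossings in 1 microsecond at 750 mV.
print(f"{conductance_pS(10, 1.0, 750.0):.2f} pS")
```

      With only a handful of events, such an estimate carries a large statistical uncertainty, which is one reason to interpret simulated conductances cautiously.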

      The pore in the MD simulations does not open as wide as the Ca-bound open structure, which (as the authors note) may mean that full opening requires longer than 10 us. I think that is highly likely given that the two 750 mV simulations yielded different degrees of opening and that in BK channels opening is generally much slower than charge movement. Therefore, a question is - do any of the conclusions illustrated in Figures 6, S5, S6 differ if the Ca-bound structure is used as the open state? For example, I expect the interactions between S5 and S6 might at least change to some extent as S6 moves to its final position. In this case, would conclusions about which residues interact, and get stronger or weaker, be the same as in Figures S6 b,c? Providing a comparison may help indicate to what extent the conclusions are dependent on achieving a fully open conformation.

      We appreciate the reviewer’s suggestion and have further analyzed the information flow and coupling pathways using the simulation trajectory initiated from the Ca2+-bound cryo-EM structure (sim 7, Table S1). The new results are shown in two new SI Figures, S7 and S8, and new discussion has been added to pages 14-15. Comparing Figures 5 and S7, we find that dynamic community, coupling pathways, and information flow are highly similar between simulations of the open and closed states, even though there are significant differences in S5 contacts in the simulated open state vs the Ca2+-bound open state (Figure S8). Interestingly, there are significant differences in S4-S5 packing in the simulated and Ca2+-bound open states (Figure S8 top panel), which likely reflect important differences in VSD/pore interactions during voltage vs Ca2+ activation.

      (2) P4 Significance -"first, successful direct simulation of voltage-activation"

      This statement may need rewording. As noted above Carrasquel-Ursulaez et al.,2022 (reference 39) simulated voltage sensor activation under comparable conditions to the current manuscript (3.9 us simulation at +400 mV), and made some similar conclusions regarding R210, R213 movement, and electric field focusing within the VSD. However, they did not report what happens to the pore or simulate K+ movement. So do the authors here mean something like "first, successful direct simulation of voltage-dependent channel opening"?

      We agree with the reviewer and have revised the statement to “ … the first successful direct simulation of voltage-dependent activation of the big potassium (BK) channel, ..”

      (3) P5 "We compare the membrane thickness at 300 and 750 mV and the results reveal no significant difference in the membrane thickness (Figure S2)"

      The figure also shows membrane thickness at 0 mV and indicates it is 1.4 Angstroms less than that at 300 or 750 mV. Whether or not this difference is significant should be stated, as the question being addressed is whether the structure is perturbed owing to the use of non-physiological voltages (which would include both 300 and 750 mV).

      We have revised the Figure S2 caption to clarify that one-way ANOVA suggests the difference is not significant.
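
      As a reminder of what a one-way ANOVA computes when comparing a quantity such as membrane thickness across several voltage conditions, the sketch below calculates the F statistic from first principles. It is illustrative only and does not reproduce the authors' analysis; the input numbers are toy placeholders, not measurements from the study.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of samples around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Toy placeholder samples for three conditions (NOT data from the study).
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
print(one_way_anova_f(toy_groups))
```

      The F statistic is then compared against the F distribution with (k - 1, N - k) degrees of freedom to obtain a p-value; a library routine such as `scipy.stats.f_oneway` returns both values directly.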

      (4) P7 "It should be noted that the full-length BK channel in the Ca2+ bound state has an even larger intracellular opening (Figure 2f, green trace), suggesting that additional dilation of the pore may occur at longer timescales."

      As noted above, I agree it is likely that additional pore dilation may occur at longer timescales. However, for completeness, I suppose an alternative hypothesis should be noted, e.g. "...suggesting that additional dilation of the pore may occur at longer timescales, or in response to Ca-binding to the full length channel."

      This is a great suggestion. Revised as suggested.

      (5) Since the authors raise the possibility that they are simulating a subconductance state, some more discussion on this point would be helpful, especially in relation to the hydrophobic gate concept. Although the Magleby group concluded that the cytoplasmic mouth of the (fully open) pore has little impact on single channel conductance, that doesn't rule out that it becomes limiting in a partially open conformation. The simulation in Figure 3A shows an initial hydration of the pore with ~15 waters with little conductance events, suggesting that hydration per se may not suffice to define a fully open state. Indeed, the authors indicate that the simulated open state (w/ ~30-40 waters) has 1/4th the simulated conductance of the open structure (w/ ~60 waters). So is it the degree of hydration that limits conductance? Or is there a threshold of hydration that permits conductance and then other factors that limit conductance until the pore widens further? Addressing these issues might also be relevant to understanding the extraordinarily large conductance of fully open BK compared to other K channels.

      We agree with the reviewer’s proposal that pore hydration seems to be a major factor that can affect conductance. This is also well in line with the previous computational study by Gu and de Groot (2023). We have now added a brief discussion on page 8, stating “Besides the limitation of the current fixed charge force fields in quantitatively predicting channel conductance, we note that the molecular basis for the large conductance of BK channels is actually poorly understood (78). It is noteworthy that the pore hydration level appears to be an important factor in determining the apparent conductance in the simulation, which has also been proposed in a previous atomistic simulation study of the Aplysia BK channel (33).”

      Minor points

      (1) P5 "the fully relaxed pore profile (red trace in Figure S1d, top row) shows substantial differences compared to that of the Ca2+-free Cryo-EM structure of the full-length channel."

      For clarity, I suggest indicating which is the Ca-free profile - "... Ca2+-free Cryo-EM structure of the full-length channel (black trace)."

      We greatly appreciate the thoughtful suggestion. Revised as suggested.

      (2) P8 "Consistent with previous simulations (78-80), the conductance follows a multi-ion mechanism, where there are at least two K+ ions inside the filter"

      For clarity, I suggest indicating these are not previous simulations of BK channels (e.g., "previous simulations of other K+ channels ...").

      Revised as suggested. Thank you.

      (3) Figure 2, S1 - grey traces representing individual subunits are very difficult to see (especially if printed). I wonder if they should be made slightly darker. Similar traces in Figure 3 are easier to see.

      The traces in Figure S1 are actually the same thickness as in Figure 3; they appear lighter due to the size of the figure. Figure 2 panels a-c have been updated to improve the resolution.

      (4) Figure 2 - suggest labeling S6 as "S6 313-324" (similar to S4 notation) to indicate it is not the entire segment.

      Figure 2 panel d) has been updated as suggested.

      (5) Figure 2 legend - "Voltage activation of Core-MT BK channels. a-d)..."

      It would be easier to find details corresponding to individual panels if they were referenced individually. For example:

      "a-d) results from a 10-μs simulation under 750 mV (sim2b in Table S1). Each data point represents the average of four subunits for a given snapshot (thin grey lines), and the colored thick lines plot the running average. a) z-displacement of key side chain charged groups from initial positions. The locations of charged groups were taken as those of guanidinium CZ atoms (for Arg) and sidechain carboxyl carbons (for Asp/Glu) b) z-displacement of centers-of-mass of VSD helices from initial positions, c) backbone RMSD of the pore-lining S6 (F307-L325) to the open state, and d) tilt angles of all TM helices. Only residues 313-324 of S6 were included in the tilt angle calculation, and the values in the open and closed Cryo-EM structures are marked using purple dashed lines."

      We appreciate the thoughtful suggestion and have revised the caption as suggested.

      (6) Figure S1 - column labels a,b,c, and d should be referenced in the legend.

      The references to column labels have been added to Figure S1 caption.

      (7) References need to be double-checked for duplicates and formatting.

      a) I noticed several duplicate references, but did not do a complete search: Budelli et al 2013 (#68, 100), Horrigan Aldrich 2002 (#22,97), Sun Horrigan 2022 (#40, 86), Jensen et al 2012 (#56,81).

      b) Reference #38 is incorrectly cited with the first name spelled out and the last name abbreviated.

      We appreciate the careful proofreading of the reviewer. The duplicated references were introduced by mistake due to the use of multiple reference libraries. We have gone through the manuscript and removed a total of 5 duplicated references.

      Response to additional reviewer comments

      My only new comment is that the numbering of residues in Fig. S8 does not match the standard convention for hSlo and needs to be double-checked. For the residues I checked, the numbers appear to be shifted by 3 compared with hSlo (e.g. Y315, P317, E318, G324 should be Y318, P320, E321, G327).

      We greatly appreciate the reviewer for catching the errors in residue labels. Figure S8 has now been updated to include correct residue labels. Thanks!

      Reviewer #2 (Recommendations for the authors):

      This manuscript has been through a previous level of review. The authors have provided their responses to the previous reviewers, which appear to be satisfactory, and I have no additional comments, beyond the caveats concerning interpretations based on the truncated channel, which are noted above.

      We greatly appreciate the constructive comments and insightful advice. Please see our response to the Reviewing Editor’s comments above for the changes regarding the caveats concerning interpretations of the current simulations.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study provides comprehensive instructions for using the chromatophore tracking software, Chromas, to track and analyse the dynamics of large numbers of cephalopod chromatophores across various spatiotemporal scales. This software addresses a long-standing challenge faced by many researchers who study these soft-bodied creatures, known for their remarkable ability to change colour rapidly. The updated software features a user-friendly interface that can be applied to a wide range of applications, making it an essential tool for biologists focused on animal dynamic signalling. It will also be of interest to professionals in the fields of computer vision and image analysis.

      Strengths:

      This work provides detailed instructions for this toolkit along with examples for potential users to try. The GitLab repository hosts the software package, installation documentation, and tutorials, which should flatten the learning curve for potential users.

      Weaknesses:

      The evidence supporting the authors' claims is solid, particularly demonstrated through the use of cuttlefish and squid. However, it may not be applicable to all coleoid cephalopods yet, such as octopuses, which have an incredibly versatile ability to change their body forms.

      The reviewer is right to highlight this limitation. We clarified, in the revised manuscript, that CHROMAS relies on the assumption that chromatophore activity occurs primarily in a plane — a condition that is valid most of the time in squid and cuttlefish, where the majority of skin deformations are in-plane (with small occasional papillae). In cephalopods such as octopuses, however, in which the skin may undergo large 3-dimensional deformations through the action of papillary musculature, this assumption may not always hold. Although octopods’ bodies are more spherical (less flat) than those of squid and cuttlefish, CHROMAS should still be usable and useful if applied to smaller skin areas, especially because chromatophore density is often even higher in octopoda than in sepiidae.

      We added the following paragraph in the discussion:

      Another known limitation concerns the biological assumptions underlying the current version of CHROMAS. The pipeline is designed for surfaces that remain reasonably planar and undergo deformations primarily in two dimensions. In cephalopods such as octopuses, in which the skin can undergo substantial three-dimensional morphological changes, analysing chromatophore dynamics may require complementary three-dimensional tracking of the skin surface to correct for out-of-plane deformations and maintain accurate measurement of chromatophore activity.

      Reviewer #2 (Public review):

      Summary:

      The authors developed a computational pipeline named CHROMAS to track and analyse chromatophore dynamics, which provides a wide range of biological analysis tools without requiring the user to write code.

      Strengths:

      (1) CHROMAS is an integrated toolbox that provides tools for different biological tasks such as: segment, classify, track and measure individual chromatophores, cluster small groups of chromatophores, analyse full-body patterns, etc.

      (2) It could be used to investigate different species. The authors have already applied it to analyse the skin of the bobtail squid Euprymna berryi and the European cuttlefish Sepia officinalis.

      (3) The tool is open-source and easy to install. The paper describes in detail the command format to complete each task and provides relevant sample figures.

      Weaknesses:

      (1) The generality and robustness of the proposed pipeline need to be verified through more experimental evaluations. For example, the implementation algorithm depends on relatively specific or obvious image features, clean backgrounds, and objects that do not move too fast.

      (2) The pipeline lacks some kind of self-correction mechanism. If at one moment there is a conflicting match with the previous frames, how does the system automatically handle it to ensure that the tracking results are accurate over a long period of time?

      We thank the reviewer for raising this important point. CHROMAS does rely on relatively clean imaging conditions for optimal performance. However, the computational features of the pipeline — segmentation, tracking, and downstream analysis — have been designed to perform reliably as long as the segmentation models are trained on frames that reflect the diversity of the dataset (e.g., variations in lighting or minor background noise). It is correct, however, that acquiring the necessary quality of input data is both important and non-trivial. The pipeline is designed to work best with high-resolution footage of chromatophores under clear imaging conditions — specifically, with minimal water surface distortion, minimal particulate matter in the water column, and stable focus.

      To mitigate issues arising from motion blur or focus loss, CHROMAS includes an automatic frame quality control step that detects and discards frames that are out of focus, including those where the animal moves too fast for reliable tracking.
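
      The response does not say which focus measure the quality control step uses. As a rough illustration only, the sketch below scores each frame by the variance of a discrete Laplacian, a common sharpness proxy, and keeps frames above a threshold. The function names, the pure-Python image representation (a list of rows of grey values) and the threshold are assumptions made for this example, not part of CHROMAS.

```python
from statistics import pvariance


def laplacian_variance(image):
    """Sharpness proxy: variance of a 4-neighbour Laplacian over interior pixels.

    `image` is a 2D grayscale frame given as a list of equal-length rows and
    must be at least 3x3. Blurred frames have weak edges, so the score is low.
    """
    rows, cols = len(image), len(image[0])
    responses = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            responses.append(
                image[r - 1][c] + image[r + 1][c]
                + image[r][c - 1] + image[r][c + 1]
                - 4 * image[r][c]
            )
    return pvariance(responses)


def keep_sharp_frames(frames, min_sharpness):
    """Return the indices of frames whose sharpness score reaches the threshold.

    `min_sharpness` is a hypothetical, dataset-specific value that would have
    to be tuned on footage known to be in focus.
    """
    return [i for i, frame in enumerate(frames)
            if laplacian_variance(frame) >= min_sharpness]
```

      In practice the threshold would be calibrated per recording setup, since the score depends on magnification, lighting and sensor noise.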

      To assist future users, we have now added a section under Discussion detailing the recommended recording conditions and video characteristics for effective analysis with CHROMAS. It reads:

      Recommended Video Parameters for Optimal Use of CHROMAS

      The performance of CHROMAS depends on the quality of the input videos. Although the pipeline analyses each frame independently and has no frame rate requirement, we recommend recording at a minimum of 20 frames per second to capture chromatophore dynamics accurately. Sharp, in-focus frames are critical, particularly for moving subjects, where higher shutter speeds help minimize motion blur. For reliable segmentation, each chromatophore should cover at least 10 pixels across its fully expanded diameter. Higher spatial resolution, with chromatophores covering around 50 pixels in diameter, is recommended if sub-chromatophore dynamics are of interest. Recording conditions should minimize background noise, and the water column should be as clear as possible, free of particles or debris. The water surface should be kept as calm and planar as possible to avoid optical artifacts. If wide-angle lenses or other optics that may introduce distortion are used, lens correction algorithms should be applied during preprocessing to compensate for the optical distortions. For long-term tracking applications (e.g., developmental studies), frequent imaging sessions are recommended. Newly differentiated chromatophores are initially light colored (e.g., yellow) and thus visually distinct from mature chromatophores (which are dark); over days to weeks, however, the light chromatophores darken and become increasingly difficult to differentiate from older ones. Recording at appropriate and regular intervals thus helps track individual chromatophores across developmental stages and improves the reliability of long-term analyses. Following these recommendations will help segmentation, tracking, and analysis with CHROMAS.

      CHROMAS does not implement an active self-correction mechanism in the sense of real-time error recovery. Yet, several steps are in place to ensure the reliability of registration and tracking over time. During registration, a set of points is tracked across frames using optical flow. If the displacement of a point between two frames exceeds a biologically plausible threshold, that point is automatically discarded from the registration calculation to prevent error propagation. If too many points are discarded, the registration step fails, preventing the acceptance of a poor alignment.
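
      The rejection logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CHROMAS implementation: the point correspondences are taken as already computed by some optical-flow tracker, and the names `filter_tracked_points`, `RegistrationError`, `max_displacement` and `min_fraction_kept` are hypothetical.

```python
import math


class RegistrationError(RuntimeError):
    """Raised when too few tracked points survive the displacement check."""


def filter_tracked_points(prev_pts, next_pts, max_displacement,
                          min_fraction_kept=0.5):
    """Drop point pairs that moved implausibly far between two frames.

    prev_pts / next_pts: equal-length lists of (x, y) positions of the same
    tracked points in consecutive frames.
    max_displacement: largest plausible per-frame movement, in pixels
    (an assumed, dataset-specific value).
    min_fraction_kept: registration is rejected if fewer than this fraction
    of the points survive (also an assumed value).
    """
    if len(prev_pts) != len(next_pts):
        raise ValueError("point lists must have the same length")

    kept = []
    for (x0, y0), (x1, y1) in zip(prev_pts, next_pts):
        if math.hypot(x1 - x0, y1 - y0) <= max_displacement:
            kept.append(((x0, y0), (x1, y1)))

    # Fail loudly rather than accept an alignment built on too few points.
    if not prev_pts or len(kept) < min_fraction_kept * len(prev_pts):
        raise RegistrationError(
            f"only {len(kept)} of {len(prev_pts)} points passed the check"
        )
    return kept
```

      Raising an exception instead of returning a degraded result mirrors the behaviour described in the response, where a failed registration is preferred over a poor alignment.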

      In addition, masterframes (the averages of all aligned frames in a chunk) are generated at the end of the registration process to enable the visual verification of the quality of the mapping.

      During stitching, CHROMAS calculates reprojection errors between chunks, providing a quantitative measure of stitching validity and allowing users to detect and correct potential mismatches.
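
      For readers unfamiliar with the term, a reprojection error can be computed as in the sketch below: points from one chunk are mapped into the other chunk's coordinates, and the distance to their matched positions is averaged. The response does not specify the transformation model used for stitching, so the 2x3 affine matrix and the function names here are assumptions made purely for illustration.

```python
import math


def apply_affine(matrix, point):
    """Map an (x, y) point with a 2x3 affine matrix [[a, b, c], [d, e, f]]."""
    (a, b, c), (d, e, f) = matrix
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


def mean_reprojection_error(matrix, src_pts, dst_pts):
    """Mean Euclidean distance between mapped source points and their matches.

    src_pts: (x, y) positions in the first chunk.
    dst_pts: positions of the same chromatophores in the second chunk.
    A small value (in pixels) indicates that the mapping explains the matches.
    """
    if not src_pts or len(src_pts) != len(dst_pts):
        raise ValueError("need two non-empty point lists of equal length")
    errors = [math.dist(apply_affine(matrix, s), d)
              for s, d in zip(src_pts, dst_pts)]
    return sum(errors) / len(errors)
```

      A user could compare this value against the typical chromatophore spacing: an error approaching that spacing would suggest that identities may have been swapped between chunks.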

      We have revised the Results section to explicitly highlight the error-checking mechanisms implemented during registration and stitching to maintain tracking accuracy over time.

      Reviewer #1 (Recommendations for the authors):

      (1) Figures 2, 3, 5, 6, 8 showed the bobtail squid, however, all command lines for these figures were referred to "sepia_example.dataset".

      We thank the reviewer for noticing this inconsistency. We have corrected the labeling of the dataset name in the command line examples from "sepia_example.dataset" to the neutral term "example.dataset" to avoid any confusion regarding the species used in the figures.

      (2) It's excellent that Chromas includes a manual pre-alignment function. However, it's unclear how the authors determined the registration of selected chromatophores across different ages in the long-term tracking session. Given the rapid growth of cephalopods and presumably skin expansion with increased chromatophores, it would be helpful to provide more details or examples on this process.

      The manual pre-alignment function provides an interactive interface allowing the user to select a set of matching chromatophores across frames from different developmental stages. The accuracy of this process depends on the user's ability to recognize individual chromatophores reliably over time. Critically, it is not necessary to identify all those chromatophores; a representative subset is sufficient to interpolate the spatial mapping and align the surrounding chromatophores.
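
      The idea that a representative subset of matched chromatophores suffices can be illustrated with the simplest possible mapping, a least-squares affine transform fitted to the hand-picked pairs and then applied to every other position. This is a simplified stand-in: skin growth is unlikely to be exactly affine, the actual interpolation used by CHROMAS is not described here, and `fit_affine` and `map_point` are hypothetical names.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, rhs):
    """Solve the 3x3 system m @ x = rhs with Cramer's rule."""
    det = _det3(m)
    if abs(det) < 1e-12:  # crude singularity check, adequate for a sketch
        raise ValueError("need at least 3 non-collinear landmark pairs")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(_det3(replaced) / det)
    return solution


def fit_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine transform mapping src_pts onto dst_pts."""
    if len(src_pts) != len(dst_pts):
        raise ValueError("landmark lists must have the same length")
    sx = sum(x for x, _ in src_pts)
    sy = sum(y for _, y in src_pts)
    sxx = sum(x * x for x, _ in src_pts)
    syy = sum(y * y for _, y in src_pts)
    sxy = sum(x * y for x, y in src_pts)
    normal = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, len(src_pts)]]
    rows = []
    for k in (0, 1):  # k = 0 fits the x' row, k = 1 fits the y' row
        rhs = [sum(s[0] * d[k] for s, d in zip(src_pts, dst_pts)),
               sum(s[1] * d[k] for s, d in zip(src_pts, dst_pts)),
               sum(d[k] for d in dst_pts)]
        rows.append(_solve3(normal, rhs))
    return rows


def map_point(affine, point):
    """Apply the fitted transform to any (x, y) position."""
    (a, b, c), (d, e, f) = affine
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)
```

      Once fitted on the selected landmarks, `map_point` predicts where every other chromatophore from the earlier session should appear in the later one, which is what makes a subset sufficient.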

      To limit the potential challenges associated with chromatophore development, frequent imaging sessions (every few days) are recommended initially. Excessive intervals between recordings can result in relative displacements among existing chromatophores and the sudden appearance of newly matured chromatophores, both of which complicate manual matching.

      It should be noted that these challenges are not limitations of the CHROMAS pipeline itself, but rather relate to experimental design choices that affect the quality and traceability of the dataset. The exact parameters (e.g., size/duration of the datasets, spatial resolution, frame rate and intervals between recording sessions) to be used must be adapted to each experimental animal, each age, and ultimately, each question.

      Recommended video acquisition parameters, including guidance on recording frequency for long-term chromatophore tracking, have been added to the Discussion section.

      Reviewer #2 (Recommendations for the authors):

      (1) More detailed information should be given, such as operating system requirements, camera frame rate requirements, target size and speed limitations, when chunking videos into usable segments, the minimum length of each segment, etc.

      CHROMAS is platform-independent and requires only a functioning Python 3.9+ environment, regardless of the operating system or OS version, as described in “Methods – Implementation details”.

      Because CHROMAS analyses each frame independently, it does not require a specific frame rate. The quality of each image, and thus the choice of imaging parameters, is nonetheless critical to enable reliable chromatophore segmentation. If an animal remains relatively calm during recording, low shutter speeds will be adequate for image sharpness. Conversely, if the animal moves frequently or rapidly, it will be preferable to use a higher frame rate and a higher shutter speed to minimize motion blur. Recording parameters should therefore be adjusted accordingly, primarily to optimize image clarity and maintain frames in sharp focus.

      The frame rate should also be sufficiently high to capture the fast dynamics of chromatophore expansions and contractions. Although the pipeline has no specific frame rate requirement, we recommend, based on biological considerations, image rates of at least 20 frames per second to sample the temporal patterns of chromatophore activity adequately.

      Each chromatophore should be represented by a sufficiently large number of pixels in each recorded image to enable the reliable estimation of its size, shape, and dynamics. If the spatial resolution is too low, individual chromatophores may appear as small pixel clusters, reducing the accuracy of area and shape measurements and introducing quantization artifacts. Based on our experience, we recommend recording conditions that result in each chromatophore covering at least 10 pixels across its diameter when fully expanded to ensure accurate segmentation and quantitative whole-chromatophore analysis. For sub-chromatophore motion analysis, we recommend a minimum of 50 pixels across the fully expanded diameter.

      These considerations relate to optimizing biological sampling and image quality for analysis, and are not technical requirements imposed by CHROMAS itself.

      We added a Discussion section outlining the recommended recording conditions and video parameters to facilitate effective use of CHROMAS.
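
      The recommendations above (at least 10 pixels across an expanded chromatophore, about 50 pixels for sub-chromatophore analysis, and at least 20 frames per second) can be checked before a recording session with simple arithmetic. The helper below is a sketch; its names and the example numbers in the usage note are hypothetical and would need to be replaced with values for the actual animal and optics.

```python
def pixels_across_diameter(diameter_mm, field_of_view_mm, image_width_px):
    """Number of pixels spanning one fully expanded chromatophore.

    diameter_mm: expanded chromatophore diameter.
    field_of_view_mm: width of the imaged area along the same axis.
    image_width_px: sensor resolution along that axis.
    """
    return diameter_mm * image_width_px / field_of_view_mm


def check_recording(diameter_mm, field_of_view_mm, image_width_px, fps,
                    sub_chromatophore=False):
    """Return a list of warnings; an empty list means the setup looks adequate."""
    required_px = 50 if sub_chromatophore else 10  # thresholds from the text
    px = pixels_across_diameter(diameter_mm, field_of_view_mm, image_width_px)
    problems = []
    if px < required_px:
        problems.append(
            f"{px:.1f} px across diameter, recommended >= {required_px}"
        )
    if fps < 20:
        problems.append(f"{fps} fps, recommended >= 20")
    return problems
```

      For instance, a hypothetical 0.5 mm chromatophore imaged over a 40 mm field of view on a 4000-pixel-wide sensor spans 50 pixels, which meets both thresholds; halving the sensor width would still satisfy the whole-chromatophore recommendation but not the sub-chromatophore one.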

      (2) This pipeline does not include functionality to correct for lens distortion, which may affect the results when accurate measurement of single chromatophore morphology is required.

      We thank the reviewer for this observation. We agree that lens distortion can affect the accurate measurement of chromatophore morphology if present. However, the current datasets analysed with CHROMAS were recorded using a long macro lens with minimal distortion, and visual inspections as well as quantitative assessments of chromatophore geometry did not indicate measurable optical deformation. We acknowledge that for other imaging setups —particularly those relying on the use of wide-angle lenses— lens distortion could introduce artifacts. In such cases, we recommend applying standard lens distortion correction during preprocessing, prior to analysis with CHROMAS.

      We have also addressed this point in the newly added section under the Discussion.

      (3) How to perform expansion for single chromatophores shown in Figure 6, and how to keep the expansion area consistent?

      The graph in Figure 6 illustrates the expansion of a single chromatophore over time and was generated entirely using the "areas" command and visualization tools available within CHROMAS.

      Spatial consistency is maintained because CHROMAS, through its registration and area extraction steps, tracks the identity of each chromatophore across the video, allowing the same individual to be followed reliably over time.

      (4) Tables 1 and 2: it's better to add the units of the values in each column.

      We thank the reviewer for the suggestion. We have added the appropriate units to each column in Tables 1 and 2 to improve clarity.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors aimed to enhance the effectiveness of PARP inhibitors (PARPi) in treating high-grade serous ovarian cancer (HGSOC) and triple-negative breast cancer (TNBC) by inhibiting PRMT1/5 enzymes. They conducted a drug screen combining PARPi with 74 epigenetic modulators to identify promising combinations.

      Zhang et al. reported that protein arginine methyltransferase (PRMT) 1/5 inhibition acts synergistically to enhance the sensitivity of Poly (ADP-ribose) polymerase inhibitors (PARPi) in high-grade serous ovarian cancer (HGSOC) and triple-negative breast cancer (TNBC) cells. The authors are the first to perform a drug screen by combining PARPi with 74 well-characterized epigenetic modulators that target five major classes of epigenetic enzymes. Their drug screen identified both PRMT1/5 inhibitors with high combination and clinical priority scores in PARPi treatment. Notably, PRMT1/5 inhibitors significantly enhance PARPi treatment-induced DNA damage in HR-proficient HGSOC and TNBC cells through enhanced maintenance of gene expression associated with DNA damage repair, BRCAness, and intrinsic innate immune pathways in cancer cells. Additionally, bioinformatic analysis of large-scale genomic and functional profiles from TCGA and DepMap further supports that PRMT1/5 are potential therapeutic targets in oncology, including HGSOC and TNBC. These results provide a strong rationale for the clinical application of a combination of PRMT and PARP inhibitors in patients with HR-proficient ovarian and breast cancer. Thus, this discovery has a high impact on developing novel therapeutic approaches to overcome resistance to PARPi in clinical cancer therapy. The data and presentation in this manuscript are straightforward and reliable.

      Strengths:

      (1) Innovative Approach: First to screen PARPi with a large panel of epigenetic modulators.

      (2) Significant Results: Found that PRMT1/5 inhibitors significantly boost PARPi effectiveness in HR-proficient HGSOC and TNBC cells.

      (3) Mechanistic Insights: Showed how PRMT1/5 inhibitors enhance DNA damage repair and immune pathways.

      (4) Robust Data: Supported by extensive bioinformatic analysis from large genomic databases.

      Weaknesses:

      (1) Novelty Clarification: Needs clearer comparison to existing studies showing similar effects.

      (2) Unclear Mechanisms: More investigation is needed on how MYC targets correlate with PRMT1/5.

      (3) Inconsistent Data: ERCC1 expression results varied across cell lines.

      (4) Limited Immune Study: Using immunodeficient mice does not fully explore immune responses.

      (5) Statistical Methods: Should use one-way ANOVA instead of a two-tailed Student's t-test for multiple comparisons.

      We sincerely thank Reviewer #1 for the insightful and constructive feedback, as well as for the kind acknowledgment of the significance of our work: “These results provide a strong rationale for the clinical application of a combination of PRMT and PARP inhibitors in patients with HR-proficient ovarian and breast cancer. Thus, this discovery has a high impact on developing novel therapeutic approaches to overcome resistance to PARPi in clinical cancer therapy. The data and presentation in this manuscript are straightforward and reliable.” We greatly appreciate Reviewer #1’s thoughtful comments, which have significantly improved the quality of our manuscript. In response, we conducted additional experiments and analyses, and made comprehensive revisions to the text, figures, and supplementary materials. In the “Recommendations for the authors” sections, we have provided point-by-point responses to each of the reviewer’s comments, which were immensely helpful in guiding our revisions. We believe these updates have substantially strengthened the manuscript and have fully addressed all reviewer concerns.

      Reviewer #2 (Public Review):

      Summary:

      The authors show that a combination of arginine methyltransferase inhibitors synergize with PARP inhibitors to kill ovarian and triple-negative cancer cell lines in vitro and in vivo using preclinical mouse models.

      PARP inhibitors have been the common targeted-therapy options to treat high-grade serous ovarian cancer (HGSOC) and triple-negative breast cancer (TNBC). PRMTs are oncological therapeutic targets and specific inhibitors have been developed. However, due to the insufficiency of PRMTi or PARPi single treatment for HGSOC and TNBC, designing novel combinations of existing inhibitors is necessary. In previous studies, the authors and others developed an "induced PARPi sensitivity by epigenetic modulation" strategy to target resistant tumors. In this study, the authors presented a triple combination of PRMT1i, PRMT5i and PARPi that synergistically kills TNBC cells. A drug screen and RNA-seq analysis were performed to indicate cancer cell growth dependency of PRMT1 and PRMT5, and their CRISPR/Cas9 knockout sensitizes cancer cells to PARPi treatment. It was shown that the cells accumulate DNA damage and have increased caspase 3/7 activity. RNA-seq analysis identified BRCAness genes, and the authors closely studied a top hit ERCC1 as a downregulated DNA damage protein in PRMT inhibitor treatments. ERCC1 is known to be synthetic lethal with PARP inhibitors. Thus, the authors add back ERCC1 and reduce the effects of PRMT inhibitors, suggesting PRMT inhibitors mediate, in part, their effect via ERCC1 downregulation. The combination therapy (PRMT/PARP) is validated in 2D cultures of cell lines (OVCAR3, 8 and MDA-MB-231) and has been shown to be effective in nude mice with MDA-MB-231 xenograft models.

      Strengths and weaknesses:

      Overall, the data is well-presented. The experiments are well-performed, convincing, and have the appropriate controls (using inhibitors and genetic deletions) and statistics.

      They identify the DNA damage protein ERCC1 to be reduced in expression with PRMT inhibitors. As ERCC1 is known to be synthetic lethal with PARPi, this provides a mechanism for the synergy. They use cell lines only for their study in 2D as well as xenograft models.

      We sincerely thank Reviewer #2 for the insightful and constructive feedback, as well as for the kind acknowledgment of the significance of our work: “Overall, the data are well-presented. The experiments are well-performed, convincing, and supported by appropriate controls (using inhibitors and genetic deletions) and statistics.” We greatly appreciate Reviewer #2’s thoughtful comments, which have significantly improved the quality of our manuscript. In response, we conducted additional experiments and analyses, and made comprehensive revisions to the text, figures, and supplementary materials. In the “Recommendations for the authors” sections, we have provided point-by-point responses to each of the reviewer’s comments, which were immensely helpful in guiding our revisions. We believe these updates have substantially strengthened the manuscript and have fully addressed all reviewer concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Recent studies have revealed promising synergistic effects between PRMT inhibitors and chemotherapy, as well as DDR-targeting drugs (ref. 89-92). In the discussion, the authors should highlight what is novel in this study compared to the reported studies.

      We thank the reviewer for this important comment and fully agree that prior studies have demonstrated the potential of PRMT inhibitors to enhance the efficacy of DNA damage-targeting agents and certain chemotherapies[1-4]. In response to the reviewer’s constructive suggestion, we have now revised the discussion to highlight the novel aspects of our study compared to previously reported findings. Specifically, our work presents several key advances that go beyond prior studies. Below, we would like to emphasize the novelty of our current study as follows:

      In the clinic, a strategy termed “induced PARP inhibitor (PARPi) sensitivity by epigenetic modulation” is being evaluated to sensitize homologous recombination (HR)-proficient tumors to PARPi treatments. Together with other groups, we reported that repression of BET activity significantly reduces the expression levels of essential HR genes by inhibiting their super-enhancers[5]. This preclinical discovery is now being assessed in a Phase 1b/2 clinical trial combining the BET inhibitor ZEN-3694 with the PARPi talazoparib for the treatment of patients with metastatic triple-negative breast cancer (TNBC) who do not carry germline BRCA1/2 mutations. Promising anti-tumor activity has been observed in this ongoing trial[6]. Importantly, gene expression profiles from paired tumor biopsies demonstrated robust target engagement, evidenced by repression of BRCA1 and RAD51 mRNA expression, consistent with our preclinical findings in xenograft models. Based on these encouraging results, the trial is being expanded to a Phase 2b stage to enroll additional TNBC patients. Moreover, other combination strategies[7-13] based on this “induced PARPi sensitivity by epigenetic modulation” approach have also shown promising clinical responses in both intrinsic and acquired HR-proficient settings. Notably, these clinical studies indicate that the strategy is well-tolerated, likely due to cancer cells being particularly sensitive to epigenetic repression of DNA damage response (DDR) genes, compared with normal cells.

      However, two key clinical challenges remain for broader application of this strategy in oncology: 1) which clinically actionable epigenetic drugs can produce the strongest synergistic effects with PARPi? and 2) can a BRCA-independent approach be developed? To address these questions, we performed a drug screen combining the FDA-approved PARPi olaparib with a panel of clinically relevant epigenetic drugs. This panel includes 74 well-characterized epigenetic modulators targeting five major classes of epigenetic enzymes, comprising 7 FDA-approved drugs, 14 agents in clinical trials, and 54 in preclinical development. Notably, both type I PRMT inhibitors (PRMTi) and PRMT5 inhibitors (PRMT5i) achieved high combination and clinical prioritization scores in the screen. Functional assays demonstrated that PRMT inhibition markedly enhances PARPi-induced DNA damage in HR-proficient cancer cell lines. In line with a strong positive correlation between PRMT and DDR gene expression across primary tumors, we observed that PRMT activity supports the transcription of DDR genes and maintains a BRCAness-like phenotype in cancer cells. These findings provide strong rationale for clinical development of PRMT/PARPi combinations in patients with HR-proficient ovarian or breast cancers. Mechanistic characterization from our study further supports PRMTi clinical development by elucidating mechanisms of action, identifying rational combinations, defining predictive biomarkers, and guiding dosing strategies.

      We believe our studies will be of significant interest to the cancer research community for several reasons. First, they address major clinical challenges in women’s cancers, specifically, high-grade serous ovarian cancer (HGSOC) and TNBC, both of which are aggressive malignancies with limited therapeutic options. Second, they offer a novel solution to overcome PARPi resistance. Our earlier discovery of “induced PARPi sensitivity by epigenetic modulation” has already shown promising clinical results and represents a new path to overcome both primary and acquired resistance to PARPi and platinum therapies. Third, they focus on a clinically translatable drug class. Selective and potent PRMT inhibitors have been developed by leading pharmaceutical companies, with more than ten currently in advanced clinical trials. Fourth, they support mechanism-driven combination strategies. Preclinical evaluation of PRMTi-based combinations with other therapeutic agents is urgently needed for future clinical success. Finally, our work highlights understudied but therapeutically relevant mechanisms in cancer biology. In-depth mechanistic analysis of the PRMT regulome is essential, and our studies provide important new insights into how PRMTs regulate transcription, RNA splicing, DNA damage repair, and anti-tumor immune responses in the context of HGSOC and TNBC.

      In summary, our study identifies PRMT1 and PRMT5 as key epigenetic regulators of DNA damage repair and shows that their inhibition sensitizes HR-proficient tumors to PARP inhibitors by repressing transcription and altering splicing of BRCAness genes. Distinct from prior strategies, dual inhibition of type I PRMT and PRMT5 exhibits strong synergy, allowing for lower-dose combination treatments that may reduce toxicity. Our findings also nominate ERCC1 as a potential predictive biomarker and suggest that MYC-driven tumors may be particularly responsive to this approach. Collectively, these results offer a mechanistic rationale and translational framework to broaden the clinical application of PARP inhibitors.

      (2) In Figures 3H-J, MYC targets were likely to correlate with the expression levels of PRMT1/PRMT5 in various public datasets, supporting previous reports that the Myc-PRMT loop plays critical roles during tumorigenesis (ref. 45). "Myc-targets" signatures were also the most significant signatures correlated with the expression of PRMT1 and PRMT5. The authors suggest that under MYC-hyperactivated conditions, tumors may be extremely sensitive to PRMT inhibitors or PRMTi/PARPi combination. However, the underlying mechanism remains unclear.

      We sincerely thank the reviewer for the critical and insightful comments. We fully agree that more direct evidence is needed to establish the regulatory relationship between MYC and PRMT1/5. To investigate the effect of c-Myc on PRMT1 and PRMT5 expression, we analyzed RNA-seq data from P493-6 Burkitt lymphoma cells, which harbor a tetracycline (Tet)-repressible MYC transgene. In this system, MYC expression can be suppressed to very low levels and then reactivated, enabling a gradual increase in c-Myc protein levels[14]. Upon Tet removal to induce MYC expression, we observed a robust upregulation of both PRMT1 (4.3-fold) and PRMT5 (3.6-fold) RNA levels within 24 hours, as measured by RNA-seq. These findings indicate that MYC activation can transcriptionally upregulate PRMT1 and PRMT5. To determine whether this regulation is directly driven by MYC, we further analyzed MYC ChIP-seq profiles from the same cell line following 24 hours of MYC induction. Consistently, we observed markedly increased MYC binding at the promoter regions of both PRMT1 and PRMT5 genes. Interestingly, MYC’s regulatory influence was not limited to PRMT1 and PRMT5; we also observed transcriptional upregulation of other PRMT family members, including PRMT3, PRMT4, and PRMT6, in response to MYC activation. Together with the data presented in Figure 3H, these new results strongly suggest that MYC directly upregulates the expression of PRMT family genes by binding to their promoter regions. Consequently, increased PRMT expression may facilitate MYC’s regulation of target gene expression and splicing in cancer cells. In cancers with MYC hyperactivation, this feed-forward loop may be amplified, creating a potential therapeutic vulnerability. In response to the reviewer’s insightful suggestion, we have further explored how MYC regulates PRMT1/5 and whether this regulation modulates the efficacy of PRMT inhibitors in oncology. These unpublished observations are currently being prepared for a separate manuscript, and we have now incorporated a discussion of them into the revised version of this manuscript. We thank the reviewer again for the thoughtful and constructive comments regarding the MYC–PRMT regulatory axis.

      (3) In Figure 5F, ERCC1 expression was unlikely to be reduced in cells treated with GSK025, especially in OVCAR8 cells, although other cells, including TNBC cells, are dramatically changed after treatment.

      We sincerely thank the reviewer for the critical and insightful comments. We agree with the reviewer that in Figure 5F, although GSK025 treatment reduced ERCC1 expression, the loading control Tubulin also showed a notable decrease in the OVCAR8 cell line. This may be because Tubulin expression is nonspecifically affected by the chemical inhibitor GSK025 in this particular cell line, or because it is secondarily reduced as a consequence of PRMT inhibitor-induced cell death. As the reviewer pointed out, this phenomenon was not observed in other cell lines, suggesting that the effect on Tubulin is not specific to PRMT inhibition. To further investigate, we employed CRISPR/Cas9-mediated knockout of PRMT1 or PRMT5 in OVCAR8 cells, a more specific genetic approach to inhibit PRMT activity. In both cases, ERCC1 expression was significantly reduced, whereas Tubulin levels remained stable (Figure 5G). These results support the conclusion that PRMT1 and PRMT5 specifically regulate ERCC1 expression in OVCAR8 cells. The inconsistent effect on Tubulin is likely due to nonspecific cellular responses to chemical inhibition, which are generally more variable and less precise than those induced by genetic perturbation.

      (4) In Figure 7H-L, MDA-MB-231 cells were implanted subcutaneously in nude immunodeficient mice to confirm the synergistic therapeutic action of the PRMTi/PARPi combination in vivo. Although PRMT inhibition activates intrinsic innate immune pathways in cancer cells, suggesting that PRMTi treatments may enhance intrinsic immune reactions in tumor cells, the use of nude immune deficient mice means that changes in the tumor immune microenvironment remain unknown.

      We sincerely thank the reviewer for the critical and insightful comments. We fully agree with the reviewer that our in vivo experiments using the human cancer cell line MDA-MB-231 in immunodeficient nude mice limit our ability to assess changes in the tumor immune microenvironment. We thank the reviewer for highlighting this important limitation. While the primary goal of the current study was to investigate the therapeutic synergy between PRMT inhibition and PARP inhibition in cancer cells, we would like to take this opportunity to share additional unpublished data that further support and extend the reviewer’s point regarding the immunomodulatory effects of PRMT inhibitors. In syngeneic mouse tumor models, we have observed that the combination of PRMT inhibition and PARP inhibition leads to a more robust anti-tumor immune response compared to either treatment alone. Specifically, we found increased infiltration of CD8⁺ cytotoxic T cells within the tumor microenvironment, suggesting enhanced immune activation and tumor immunogenicity. Furthermore, we have also obtained preliminary evidence that PRMT inhibition can potentiate immune checkpoint blockade therapy. Mechanistically, this may be mediated through the activation of the STING1 pathway and the upregulation of splicing-derived neoantigens, both of which have been implicated in promoting tumor immune visibility. These findings indicate that beyond enhancing DNA damage response, PRMT inhibition may have a broader impact on tumor-immune interactions and could serve as a promising strategy to sensitize tumors to immunotherapy. A separate manuscript detailing these results is currently in preparation and will be submitted for publication as an independent research article. In light of the reviewer’s thoughtful suggestions and in consideration of feedback from Reviewer #2, who recommended removing Figure 6 from the manuscript, we have carefully reevaluated the overall organization of the manuscript. 
      Given the scope and focus of the current work, as well as the desire to maintain a concise and coherent narrative, we decided to move the content originally presented in Figure 6 to the supplementary materials. This figure is now included as Supplementary Figure S5 in the revised version of the manuscript. We believe this change helps streamline the main text while still making the additional data available for interested readers.

      (5) In Figures 6-7, a two-tailed Student's t-test was used to determine the statistical differences among multiple comparisons, which should be performed by one-way ANOVA followed by a post hoc test.

      We thank the reviewer for this thoughtful and important comment regarding the choice of statistical method. We fully agree with the reviewer that one-way ANOVA followed by a post hoc test is one of the standard approaches for multiple group comparisons. In response to the suggestion, we have performed one-way ANOVA on our data and found that the statistical conclusions are consistent with those obtained from the two-tailed Student’s t-tests. For example, in the first panel of Figure 6A (OVCAR8 treated with GSK715), one-way ANOVA (p = 1.1 × 10⁻⁶), followed by Tukey’s HSD test, confirmed significant differences between control and Olaparib (p = 0.000165), control and GSK715 (p = 0.000145), control and combination (p = 6.067 × 10⁻⁷), Olaparib and combination (p = 0.0003523), and GSK715 and combination (p = 0.0004015), consistent with the conclusions from the two-tailed t-test shown in Figure 6H. Additionally, we would like to explain why two-tailed Student’s t-tests were used in our current study. When comparisons are predefined and conducted pairwise (i.e., two groups at a time), a two-tailed Student’s t-test is statistically equivalent to one-way ANOVA for those comparisons. In our study, each comparison involved only two groups, and we therefore chose t-tests for hypothesis-driven, specific comparisons rather than exploratory multiple testing. This approach aligns with valid statistical principles. All statistical analyses presented in Figures 6-7 were designed to evaluate specific, biologically meaningful comparisons (e.g., treatment vs. control or treatment A vs. treatment B). The study was hypothesis-driven, not exploratory, and did not involve simultaneous comparisons across multiple groups. In such cases, the t-test provides a more direct and interpretable result for targeted comparisons. The use of Student’s t-tests reflects the focused nature of the analysis, where each test directly addresses a specific biological question rather than a global group comparison. We sincerely appreciate the reviewer’s thoughtful comments on the statistical methods.
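
      The two-group equivalence mentioned in the response above can be checked numerically. The following is a minimal Python sketch, not the authors' analysis code: the function names and the readout values are hypothetical and chosen only for illustration. It computes a pooled-variance Student's t statistic and a one-way ANOVA F statistic for the same two groups and shows that F equals t squared.

```python
from statistics import mean


def pooled_t_statistic(a, b):
    """Student's t statistic for two independent groups (pooled variance)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


def one_way_f_statistic(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical readouts for illustration only; these are NOT data from the study.
control = [1.00, 0.95, 1.08, 1.02]
treated = [0.61, 0.70, 0.58, 0.66]

t = pooled_t_statistic(control, treated)
f = one_way_f_statistic(control, treated)
print(f"t^2 = {t ** 2:.6f}, F = {f:.6f}")
```

      The identity holds only when exactly two groups enter the ANOVA. With three or more groups, the F statistic pools variance across all groups, and a post hoc procedure such as Tukey's HSD additionally adjusts for the number of pairwise comparisons.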

      Reviewer #2 (Recommendations for the authors):

      (1) If the authors kept the tumors of various sizes in Figure 7I, it would be important to assess the protein and/or mRNA level of ERCC1 to further support their mechanism.

      We sincerely thank the reviewer for the insightful comments. We fully agree that evaluating ERCC1 expression in drug-treated tumor samples is critical to support the proposed mechanism. Due to the limited volume of tumor specimens and extensive necrosis observed after three weeks of treatment in the condition used for Figure 7I, we were unable to obtain sufficient material for expression analysis in the original cohort. To address this, we conducted an additional experiment using xenograft-bearing mice (MDA-MB-231 model), initiating treatment when tumors reached approximately 200 mm³ to ensure adequate tissue collection. We also shortened the treatment duration to 7 days to assess early molecular responses to therapy, rather than downstream effects. Consistent with our in vitro results, both GSK715 and GSK025 significantly reduced ERCC1 RNA expression (0.79 ± 0.17, p = 0.03; 0.82 ± 0.11, p = 0.02, respectively), and the combination treatment further decreased ERCC1 expression (0.49 ± 0.20, p = 0.0003), as determined by qRT-PCR. A two-tailed Student’s t-test was used for statistical analysis. In this experiment, we used the same dosing regimen as in the three-week treatment shown in Figure 7I. Importantly, the shorter treatment period and moderate tumor size at treatment initiation minimized necrosis and did not significantly affect tumor growth, allowing for reliable molecular evaluation. We sincerely thank the reviewer for highlighting this important point.

      (2) Figure 2G: please explain why two bands remain for sgPRMT1.

      We greatly appreciate the reviewer for raising this insightful and important question. As the reviewer pointed out, an additional band appeared after PRMT1 knockdown in OVCAR8 cells using two sequence-independent gRNAs. Notably, this band was not observed in MDA-MB-231 cells. The antibody used to detect PRMT1 (clone A33, #2449, Cell Signaling Technology) is widely adopted in PRMT1 research, with over 65 citations supporting its specificity. Interestingly, previous studies[15] have identified seven PRMT1 isoforms (v1–v7), generated through alternative splicing and exhibiting tissue-specific expression patterns. Of these, three isoforms are detectable using the A33 antibody. We believe the additional band observed upon sgRNA treatment likely represents a PRMT1 isoform that is normally expressed at low levels in OVCAR8 cells. Upon knockdown of the major isoforms by CRISPR/Cas9, expression of this minor isoform may have increased as part of a compensatory feedback mechanism, rendering it detectable by immunoblotting. Because PRMT1 isoform expression is largely tissue-type specific, it is not surprising that the same band was absent in MDA-MB-231 cells, which are derived from a different lineage than OVCAR8 cells. The reviewer raised an important question regarding the role of PRMT1 isoforms in regulating DNA damage response in cancer. We agree this is an intriguing direction and will investigate it further in future studies.

      (3) Figure 4D: Please correct the figure legend so the description matches the color in the figure. Red and blue are absent.

      We sincerely thank the reviewer for the critical and insightful comments. The figure legend for Figure 4D has been corrected in the revised version of the manuscript to accurately match the colors shown in the figure. We thank the reviewer for pointing out this issue.

      (4) Figure 7A and B: please indicate the cell lines used.

      We sincerely thank the reviewer for the critical and insightful comments. In Figure 7A and 7B, human embryonic kidney 293T (HEK293T) cells were used due to their high transfection efficiency and widespread application in reporter assays. This information has been incorporated into the figure legend for Figures 7A and 7B.

      (5) What is the link with ERCC1 splicing because reduced overall ERCC1 expression is clear?

      We sincerely thank the reviewer for the critical and insightful comments. As the reviewer pointed out, although the direct impact of ERCC1 alternative splicing on its protein expression remains to be fully elucidated, it is likely that PRMT inhibition induces aberrant splicing events that result in the production of alternative ERCC1 isoforms with impaired or altered function. These splicing changes may compromise ERCC1’s role in DNA repair pathways. Furthermore, as shown in Figure 4G, we observed a reduction in the total ERCC1 mRNA reads following PRMTi treatment. This decrease may be attributed, at least in part, to the instability of the alternatively spliced ERCC1 transcripts, which could be more prone to degradation. In combination with the transcriptional downregulation of ERCC1 induced by PRMT inhibition, these alternative splicing events may lead to a further reduction in functional ERCC1 protein levels. This dual impact on ERCC1 expression, through both decreased transcription and the generation of unstable or non-functional isoforms, likely contributes to the enhanced cellular sensitivity to PARP inhibitors observed in our study. We believe this represents an important mechanistic insight into how PRMT inhibition modulates the DNA damage response in cancer cells, and further studies are warranted to investigate the precise role of ERCC1 splicing regulation in this context. We thank the reviewer for pointing out this interesting future research direction.

      (6) Figure 7J: From the graph, it seems like Olaparib+G715 and G715+G025 have a similar effect on tumor volume (two curves overlap). Please discuss.

      We sincerely thank the reviewer for the critical and insightful comments. In the current study, the doses used for single-agent treatments were selected based on prior publications. For example, the dose of GSK715 was guided by a recent study from the GSK group[16]. Our in vitro and in vivo findings, together with previously published data, consistently demonstrate that GSK715 is more potent than both GSK025 and Olaparib. Notably, treatment with GSK715 alone led to significantly greater inhibition of tumor growth compared to either GSK025 or Olaparib administered individually. This higher potency of GSK715 also explains the comparable levels of tumor suppression observed in the combination groups, including GSK715 plus Olaparib and GSK715 plus GSK025. These results suggest that GSK715 is likely the primary driver of efficacy in the two drug combination settings. Importantly, this observation provides a valuable opportunity to further refine and optimize the dosing strategy for GSK715. Specifically, because GSK715 is highly potent, its dose may be reduced when used in combination regimens without compromising therapeutic efficacy. This approach could significantly improve the safety profile of GSK715 by minimizing potential dose-related toxicities, thereby enhancing its suitability for future clinical development in combination therapy contexts.

      (7) Discussion: "PRMT5i increased global sDMA levels"-> "... aDMA levels.".

      We sincerely thank the reviewer for the critical and insightful comments. In response, we have corrected the sentence in the discussion from “PRMT5i increased global sDMA levels, which suggested that type I PRMT and PRMT5 share a substrate (i.e., MMA) and/or their functions are compensatory” to “PRMT1i increased global sDMA levels, which suggested that type I PRMT and PRMT5 share a substrate (i.e., MMA) and/or their functions are compensatory.” We apologize for the misstatement and have corrected this error in the revised version of the manuscript.

      (8) In addition to the methods, add that nude mice were used in the body of the results and the figure legend for Figure 7J.

      We sincerely thank the reviewer for the critical and insightful comments. In the revised version of the manuscript, we have added that immunodeficient nude mice were used in both the body of the Results section and the figure legend for Figure 7J, in addition to the Methods section. We thank the reviewer for this helpful suggestion.

      (9) Figure 6 can be deleted to focus the manuscript. It does not add to the PARP inhibition story, but only suggests a link to immunotherapy where this has been reported previously PMID: 35578032 and 32641491.

      We sincerely thank the reviewer for the critical and insightful comments. Reviewer #1 also raised a related concern regarding the relevance of this section to the main focus of the manuscript. In consideration of both reviewers’ comments, we have decided to move the data previously shown in Figure 6 to the supplementary section as Supplementary Figure S5. This revision allows us to streamline the main text and maintain a clear focus on the core findings related to PARP inhibition. At the same time, we believe the immunotherapy-related observation may still be of interest to some readers. By presenting these results in the supplementary materials, we ensure that this potentially relevant link remains accessible without distracting from the primary narrative of the manuscript. We greatly appreciate the reviewers’ guidance in helping us improve the clarity and focus of our work. We thank the reviewer for the thoughtful suggestion.

      References

      (1) Dominici, C., et al. Synergistic effects of type I PRMT and PARP inhibitors against non-small cell lung cancer cells. Clin Epigenetics 13, 54 (2021).

      (2) O'Brien, S., et al. Inhibiting PRMT5 induces DNA damage and increases anti-proliferative activity of Niraparib, a PARP inhibitor, in models of breast and ovarian cancer. BMC Cancer 23, 775 (2023).

      (3) Carter, J., et al. PRMT5 Inhibitors Regulate DNA Damage Repair Pathways in Cancer Cells and Improve Response to PARP Inhibition and Chemotherapies. Cancer Res Commun 3, 2233-2243 (2023).

      (4) Li, Y., et al. PRMT blockade induces defective DNA replication stress response and synergizes with PARP inhibition. Cell Rep Med 4, 101326 (2023).

      (5) Yang, L., et al. Repression of BET activity sensitizes homologous recombination-proficient cancers to PARP inhibition. Sci Transl Med 9 (2017).

      (6) Aftimos, P.G., et al. A phase 1b/2 study of the BET inhibitor ZEN-3694 in combination with talazoparib for treatment of patients with TNBC without gBRCA1/2 mutations. Journal of Clinical Oncology 40, 1023-1023 (2022).

      (7) Karakashev, S., et al. BET Bromodomain Inhibition Synergizes with PARP Inhibitor in Epithelial Ovarian Cancer. Cell Rep 21, 3398-3405 (2017).

      (8) Sun, C., et al. BRD4 Inhibition Is Synthetic Lethal with PARP Inhibitors through the Induction of Homologous Recombination Deficiency. Cancer Cell 33, 401-416 e408 (2018).

      (9) Johnson, S.F., et al. CDK12 Inhibition Reverses De Novo and Acquired PARP Inhibitor Resistance in BRCA Wild-Type and Mutated Models of Triple-Negative Breast Cancer. Cell Rep 17, 2367-2381 (2016).

      (10) Iniguez, A.B., et al. EWS/FLI Confers Tumor Cell Synthetic Lethality to CDK12 Inhibition in Ewing Sarcoma. Cancer Cell 33, 202-216 e206 (2018).

      (11) Shan, W., et al. Systematic Characterization of Recurrent Genomic Alterations in Cyclin-Dependent Kinases Reveals Potential Therapeutic Strategies for Cancer Treatment. Cell Rep 32, 107884 (2020).

      (12) Muvarak, N.E., et al. Enhancing the Cytotoxic Effects of PARP Inhibitors with DNA Demethylating Agents - A Potential Therapy for Cancer. Cancer Cell 30, 637-650 (2016).

      (13) Abbotts, R., et al. DNA methyltransferase inhibitors induce a BRCAness phenotype that sensitizes NSCLC to PARP inhibitor and ionizing radiation. Proc Natl Acad Sci U S A 116, 22609-22618 (2019).

      (14) Lin, C.Y., et al. Transcriptional amplification in tumor cells with elevated c-Myc. Cell 151, 56-67 (2012).

      (15) Goulet, I., Gauvin, G., Boisvenue, S. & Cote, J. Alternative splicing yields protein arginine methyltransferase 1 isoforms with distinct activity, substrate specificity, and subcellular localization. J Biol Chem 282, 33009-33021 (2007).

      (16) Fedoriw, A., et al. Anti-tumor Activity of the Type I PRMT Inhibitor, GSK3368715, Synergizes with PRMT5 Inhibition through MTAP Loss. Cancer Cell 36, 100-114 e125 (2019).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study uses a cell-based computational model to simulate and study T cell development in the thymus. They initially applied this model to assess the effect of the thymic epithelial cells (TECs) network on thymocyte proliferation and demonstrated that increasing TEC size, density, or protrusions increased the number of thymocytes. They postulated and confirmed that this was due to changes in IL7 signalling and then expanded this work to encompass various environmental and cell-based parameters, including Notch signalling, cell cycle duration, and cell motility. Critical outcomes from the computational model were tested in vivo using medaka fish, such as the role of IL-7 signalling and minimal effect of Notch signalling.

      Strengths:

      The strength of the paper is the use of computational modelling to obtain unique insights into the niche parameters that control T cell development, such as the role of TEC architecture, while anchoring those findings with in vivo experiments. I can't comment on the model itself, as I am not an expert in modelling; however, the conclusions of the paper seem to be well supported by the model.

      Weaknesses:

      One potential issue is that many of the conclusions are drawn from the number of thymocytes, or related parameters such as the thymic size or proliferation of the thymocytes. The study only touches briefly on the influence of the thymic niche on other aspects of thymocyte behaviour, such as their differentiation and death.

      We thank the reviewer for this constructive feedback. Indeed, the strength of our approach lies in the close cooperation between modellers and experimentalists. One advantage of the model is its ability to manipulate challenging or even impossible variables, such as TEC dimensions, which cannot be varied experimentally with current tools. 

      The reviewer rightly pointed out that our validation focuses on comparing cell numbers or organ size as a proxy for cell numbers.

      In our previous study (Aghaallaei et al., Science Advances, 2021), we focused more on differentiation and used the computational model to predict how proportions of T-cell sublineages would vary according to different parameter values, including the IL-7 availability. One of the initial inspirations for the focus on proliferation in this manuscript was the observation in this previous work that overexpression of IL-7 in the niche resulted in overproliferation. We also focused on proliferation and organ size because these are more easily measured in experimental conditions with the tools that we have available in medaka, allowing better comparisons to the computational results.

      Regarding cell death, our experimental observations do not suggest that it plays a role before the final stages of T cell maturation. Hence, the model does not include apoptosis before this stage either.

      However, we do agree that taking a closer look at the regulation of differentiation and cell death would be an exciting avenue for future study!

      Please see our response to author recommendations below for more information on these points. Moreover, to make the model more accessible to non-experts, we have created new schematic figures, which can be found in the Appendix of the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors have worked up a "virtual thymus" using EPISIM, which has already been published. Attractive features of the computational model are stochasticity, cell-to-cell variability, and spatial heterogeneity. They seek to explore the role of TECs, which release IL-7, a factor important in the process of thymocyte division.

      In the model, ordinary clones have IL7R levels chosen from a distribution, while 'lesioned' clones have an IL7R value set to the maximum. The observation is that the lesioned clones are larger families, but the difference is not dramatic. This might be called a cell-intrinsic mechanism. One promising cell-extrinsic mechanism is mentioned: if a lesioned clone happens to be near a source of IL-7 and begins to proliferate, the progeny can crowd out cells of other clones and monopolise the IL-7 source. The effect will be more noticeable if sources are rare, so is seen when the TEC network is sparse.
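
      The clone set-up described in this summary can be sketched in a few lines of Python. This is an illustrative toy, not the EPISIM implementation: the name `assign_il7r`, the constant `IL7R_MAX`, and the choice of a uniform distribution are all assumptions made for the example, since the summary does not state which distribution the model uses.

```python
import random

# Assumed upper bound of the receptor-level scale (hypothetical units).
IL7R_MAX = 1.0


def assign_il7r(lesioned, rng):
    """Return an IL7R level for a newly seeded clone.

    Lesioned clones are pinned at the maximum. Ordinary clones draw their
    level from a distribution; a uniform one is assumed here for illustration.
    """
    if lesioned:
        return IL7R_MAX
    return rng.uniform(0.0, IL7R_MAX)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
ordinary = [assign_il7r(False, rng) for _ in range(5)]
lesioned = assign_il7r(True, rng)
print("ordinary clones:", [round(v, 3) for v in ordinary])
print("lesioned clone:", lesioned)
```

      The sketch covers only the cell-intrinsic difference between clones. The cell-extrinsic crowding effect depends on the spatial arrangement of cells and IL-7 sources, which this toy does not represent.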

      Strengths:

      Thymic dysfunctions are of interest, not least because of T-ALL. New cells are added, one at a time, to simulate the conveyor belt of thymocytes on a background of stationary cells. They are thus able to follow cell lineages, which is interesting because one progenitor can give rise to many progeny.

      There are some experimental results in Figures 4, 5 and 6. For example, il7 crispant embryos have fewer thymocytes and smaller thymi, whereas increasing IL-7 availability produces large thymi.

      Weaknesses:

      On the negative side, like most agent-based models, there are dozens of parameters and assumptions whose values and validity are hard to ascertain.

      The stated aim is to mimic a 2.5-to-11 day-old medaka thymus, but the constructed model is a geometrical subset that holds about 100 cells at a time in a steady state. The manuscript contains very many figures and lengthy descriptions of simulations run with different parameter values and assumptions. The abstract and conclusion did not help me understand what exactly has been done and learned. No attempt to synthesise observations in any mathematical formula is made.

      The reviewer raises several important points to consider when working with mathematical or computational models.

      As in many other agent-based models, we agree that our model makes use of many parameters. Many of these parameters summarize multiple steps and are treated as phenomenological, i.e. they do not represent a microscopic event such as the rate of an individual chemical reaction, but more high-level processes such as "rate of differentiation". Realistically, this process should consist of cascades of pathway components that regulate transcription factors.

      In the supplementary material of our previous work (Aghaallaei et al., Science Advances, 2021) we provided an in-depth explanation of the mathematical formulation and rationale behind our choices in relation to the available biological data to select assumptions and restrict parameter value ranges. Four parameters that could not be characterized with pre-existing data, but which were crucial to the model's predictions, were studied in detail in that publication. Hence, the submitted manuscript starts with a well-calibrated model that has been tailored for the medaka thymus. The submitted manuscript explores the robustness of the system to lesions, which we conceptualize as alterations in parameter values. We were surprised by how well the model recapitulated the time scales of overproliferation in the thymus of medaka embryos, which further supports the notion that our previous model calibration was successful.

      Another important point raised by the reviewer is that the "validity [of parameters and assumptions is] hard to ascertain". We agree, which is precisely the reason why we aim to test the model's predictions through experimentation. Importantly, a model does not need to be perfect to be useful. For example, in the submitted manuscript we observed a discrepancy between model predictions and experimental results that led us to hypothesize negative feedback regulation from the proliferative state to differentiation. 

      Thus, a major strength of modelling approaches is that they allow us to identify erroneous or missing assumptions about the structure of the regulatory interaction network and its parametrization, which can advance our scientific understanding of the underlying biology. Using models as an investigative tool is fundamental to the philosophy of systems biology (Kitano, Science, 2002), and is what we strive for.

      The reviewer rightfully points out that we only represent a geometric subset of the organ. In our preliminary work, we considered representing the full three-dimensional thymus; however, we later simplified our approach, as the organ is a symmetric ellipsoid at this developmental stage. This decision vastly reduced our computational costs, enabling us to explore parameter space more effectively.

      Nevertheless, we apologize if the submitted manuscript did not sufficiently emphasize the main insights of the paper, model limitations, and model construction. In the revised manuscript, we have improved the abstract and discussion sections to explicitly highlight the main results and limitations. We have also provided further details of the model's structure and underlying logic in the appendix.

      Reviewer #3 (Public review):

      Summary:

      Tsingos et al. seek to advance beyond the current paradigm that proliferation of malignant cells in T-cell acute lymphoblastic leukemia occurs in a cell-autonomous fashion. Using a computational agent-based model and experimental validation, they show instead that cell proliferation also depends on interaction with thymic epithelial cells (TEC) in the thymic niche. One key finding is that a dense TEC network inhibits the proliferation of malignant cells and favors the proliferation of normal cells, whereas a sparse TEC network leads to rapid expansion of malignant thymocytes.

      Strengths:

      A key strength of this study is that it combines computational modeling using an agent-based model with experimental work. The original modeling and novel experimental work strengthen each other well. In the agent-based model, the authors also tested the effects of varying a few key parameters of cell proliferation.

      Weaknesses:

      A minor weakness is that the authors did not conduct a global sensitivity analysis of all parameters in their agent-based model to show that the model is robust to variation, which would demonstrate that their results would still hold under a reasonable level of variation in the model and model parameters. This is a minor point, and such a supporting study would end in an appendix or supplement.

      The reviewer highlights the lack of a global sensitivity analysis as a minor weakness. 

      In our previous work (Aghaallaei et al., Science Advances, 2021), we studied parameter sensitivity for some parameters, while in the submitted manuscript, we extended this exploration to parameters that we expected to be the most meaningful for cell proliferation.

      In the revised version of the manuscript, we have included an additional supplementary figure alongside Figure 4 to show the effect of changing parameters in "control" simulations lacking a lesioned clone. These data are also provided in the source data to Figure 4. While this does not constitute an exhaustive exploration of all parameter space, it provides a useful overview of the effect of the studied parameters on thymocyte population size in the absence of lesioned clones.

      Response to reviewer recommendations

      In the revision, we have improved the manuscript to address the reviewers’ points. The following is an overview of the changes to the manuscript:

      • We wrote an extensive Appendix to better explain the model implementation.

      • The Abstract was rewritten to improve clarity on what was done and to highlight the main findings.

      • Subheadings to paragraphs were rewritten to better emphasize the main findings.

      • Font sizes in Figure 2J and Figure 4E were increased to improve readability.

      • The spacing of graphical elements in the legend of Figure 4E was improved.

      • An error in Figure 5B was corrected (the legend labels had been accidentally swapped).

      • A new supplementary figure to Figure 4 shows the sensitivity of clone size in control simulations for a subset of the tested parameter combinations.

      • The Conclusion section was rewritten to better highlight limitations of the study and improve the summary of the main findings.

      • Minor wording improvements were done throughout the text to improve readability.

      In the following we respond to the reviewers’ individual recommendations.

      Reviewer #1 (Recommendations for the authors):

      I am not an expert in modelling, so I apologise if I missed these points in the manuscript. I am slightly confused about how differentiation and death are included in the model. At the beginning of the results you mention that you model a 5 µm slice; is it known which stages of development occur in that section of the thymus?

      We thank the reviewer for this question and appreciate the opportunity to clarify. Our virtual thymus is based on the medaka embryonic thymus, which we have extensively characterized using functional analyses and noninvasive in toto imaging (Bajoghli et al., Cell, 2009; Bajoghli et al., J Immunology, 2015; Aghaallaei et al., Science Advances, 2021; Aghaallaei, Eur J Immunology, 2022). These studies allowed us to map thymocyte developmental stages and migratory trajectories within the spatial context of a fully functional medaka thymus (see Figure 7 in Bajoghli et al., J Immunology, 2015).

      To simplify the biological system without compromising model fidelity, we chose to simulate a representative 5 µm slice from the ventral half of the thymus. Importantly, the medaka thymus is a symmetric organ (Bajoghli et al., J Immunology 2015), hence this slice captures all key events of T-cell development, including thymus homing, differentiation, proliferation, selection, and egress akin to our in vivo observations (see Figure 7 in Bajoghli et al., 2015 and Figure 7a in Aghaallaei et al., Science Advances, 2021).

      Furthermore, our model incorporates the spatial organization of the thymic cortex and medulla by including two types of thymic epithelial cells (TECs): cortical TECs positioned on the outer side, and medullary TECs on the inner side (see Figure Supplement 7 in Aghaallaei et al., Science Advances, 2021). Differentiation and cell death are modeled as discrete steps along the developmental trajectory, informed by our in vivo observations.

      We apologize to the reviewer if the workings of the model were not sufficiently clear in the original manuscript. To address this, and as also requested by reviewer 2, we provided an extensive Appendix in the revised version of the manuscript that also includes visual summaries of the model logic in the form of intuitive flowcharts.

      And is it known, or do you factor in, whether there are changes in the responsiveness of the thymocytes to signals, such as notch and IL7, depending on their state of differentiation?

      We have previously examined the roles of IL-7 (Aghaallaei et al., Science Advances, 2021) and Notch1 (Aghaallaei et al., Europ J Immunology, 2022) signaling in the medaka thymus. These studies demonstrated that T cell progenitors are responsive to both IL7 and Notch signaling, whereas more differentiated, non-proliferative thymocytes are unresponsive to IL-7. Our in vivo observations further suggest that mature thymocytes require Notch signaling during the thymic selection process. This appears to be a species-specific phenomenon (Aghaallaei et al., Europ J Immunology, 2022). 

      In the computational model, we include this state-specific responsiveness by incorporating a dependence on IL-7 and Notch signaling in the cellular decision to commit to the cell cycle (see Appendix Figure 6, and Appendix section X.) and in the decision of differentiating into αβ<sup>+</sup> or γδ<sup>+</sup> T cell subtypes (see Appendix Figure 5, and Appendix section IX.). Although the model still calculates pathway signaling activity for thymocytes in the differentiated stage belonging to the αβ<sup>+</sup> or γδ<sup>+</sup> subtype, this signaling activity has no downstream consequences for the cells’ behavior in the model.

      Note that in the computational model we do not incorporate feedback loops that regulate pathway activity (for example, it could be that thymocytes upregulate the IL7R receptor at some point in their differentiation trajectory – in the absence of species-specific knowledge of such regulatory feedbacks, we have chosen not to include any in our model).

      And you mention the stages of development are incorporated into the model but the main output that you discuss is thymocyte number or proliferation. It would be interesting to use the model to explore how parameters related to differentiation are changed by, for example, the level of IL7 signalling.

      We agree that examining how factors like IL-7 signaling influence thymocyte differentiation is a promising direction for future work. Based on our previous modelling work (Aghaallaei et al., Science Advances, 2021), we expect that increased IL7 availability or sensitivity should result in an increase of cells differentiating into the γδ<sup>+</sup> T cell subtype. As molecular tools for medaka continue to advance, we anticipate being able to refine and expand the model accordingly.

      Moreover, we see strong potential for adapting the current computational framework to model thymopoiesis in other species, such as mouse or human, where stage-specific markers are well characterized. We have now explicitly mentioned this opportunity for future development in the conclusion section of the revised manuscript (see page #26).

      It is also mentioned in the description of the model that the cells can die at the end of the development process. However, is death incorporated into the earlier stages of development? For instance, it is possible that when signals, such as a notch, are at low levels the thymocytes at certain stages of development will die.

      We thank the reviewer for this comment. In a previous study, we mapped the spatial distribution of apoptotic cells within the medaka thymus and did not observe cell death in the region where ETPs enter the cortical thymus (Bajoghli et al., J Immunology, 2015) and where Notch1 signaling becomes activated (Aghaallaei et al., Europ J Immunology, 2021). Notch mutants exhibit a markedly reduced number of thymocytes; this reduction could be attributed either to impaired thymus homing or to increased cell death within the thymus. However, our unpublished data show that the total number of apoptotic cells in the Notch1b-deficient thymus is comparable to that in wild-type siblings. In fact, our in vivo observations revealed that the frequency of thymus colonization by progenitors is significantly reduced in the notch1b mutant (Aghaallaei et al., J E Immunol., 2021). Based on these in vivo observations, our computational model incorporates cell death only at the end of the thymocyte developmental trajectory. The current model does not consider cell death at earlier stages.

      Overall, the manuscript was well-written and the figures were clear and well-presented. A minor point would be that the writing in some of the figures was too small and difficult to read, such as in Figure 4. I also sometimes struggled to find the definition of the acronyms in the figures, for example in Figure 3 it would be helpful if the definitions for D, SD, and SA were given in the figure legend as well as in the figure itself.

      We thank the reviewer for the kind words. We have reworked the figures to have larger, more readable font sizes and have improved the figure legends as suggested.

      Reviewer #2 (Recommendations for the authors):

      Suppose the computational results did throw up an important new phenomenon. How might researchers seek to replicate it? If no mathematical relations can be given, can at least the code be made publicly available?

      We apologize to the reviewer if the workings of the model were not sufficiently clear in the submitted manuscript. However, we believe there may have been a misunderstanding, and we would like to clarify that both the mathematical formulations and the code used in this study were publicly available in the scientific record at the time of submission.

      Specifically, the full source code for the virtual thymus model is hosted in a permanent Zenodo repository (accessible here: https://zenodo.org/records/11656320), which includes:

      - Model files and links to source codes for the simulation environment;

      - Pre-compiled binary versions of the simulation environment (EPISIM) for both Windows and Linux platforms;

      - Detailed documentation, including step-by-step instructions on how to install and use the provided files.

      The repository link is cited in the manuscript (see page 38) and in the section “Data and materials availability”.  

      In addition, the mathematical framework that underpins the computational model has already been published and described in detail in our previous work (Aghaallaei, et al. Science Advances, 2021). In the supplementary material of this publication, we provide extensive documentation of the model, including:

      - A 13-page textual explanation of the design rationale;

      - 44 equations describing model implementation;

      - Parameter choices, partial sensitivity analysis, additional simulations, and supporting data presented in two figures and four tables.

      Nonetheless, to improve transparency, we have added an extensive Appendix in the revised version of the manuscript that also includes visual summaries of the model logic in the form of intuitive flowcharts. We hope this clarification and the newly provided Appendix assure the reviewer that both reproducibility and transparency have been central to our approach.

      What about the growth of the animal and its thymus over weeks 2-11?

      We thank the reviewer for this insightful question. Indeed, our current computational model does not incorporate thymus growth over time. We decided not to model the dynamic increase in TEC numbers or organ size because we wanted to maintain simplicity and computational tractability, and we therefore assumed a steady-state thymic environment. As a result, the model is limited to representing thymopoiesis under homeostatic conditions, as it appears to stabilize by day 11. This is a recognized limitation of the current model. Looking ahead, we plan to develop a more advanced computational framework that incorporates thymic growth and dynamic changes in cellular composition over time. We have now included a brief note on this limitation in the conclusion of the revised manuscript (see page #26).

    1. Author response:

      Reviewer #1 (Public review):

      The usefulness of the proposed new metric of "variant consistency" and how it can guide users in selecting demultiplexing methods seems a little unclear. It correlates with the level of ambient RNA/DNA contamination, which makes it look like a metric on data quality. However, it does depend on the exact demultiplexing method, yet it's not clear how it directly connects to the "accuracy" of each demultiplexing method, which is the most important property that users of these methods care about. Since the simulated data has ground truth of donor identities available, I would suggest using the simulated data to show whether "variant consistency" directly indicates the accuracy of each method, especially the accuracy within those "C2" reads.

      I also think the tool and analyses presented in this paper need some further clarification and documentation on the details, such as how the cell-type gene and peak probabilities are determined in the simulation, and how doublets from different cell types are handled in the simulation and analysis. A few analyses and figures also need a more detailed description of the exact methods used. 

      We thank the reviewer for their suggestions. We plan on revising the manuscript to reflect their suggestions, which will include clarification of the variant consistency metric and its relationship with demultiplexing accuracy based on the simulations and additional detail regarding ambisim’s generation of multiplexed snRNA/snATAC.

      Reviewer #2 (Public review):

      (1) Throughout the manuscript, the figure legends are difficult to understand, and this makes it difficult to interpret the graphs.

      (2) Since this is both a new tool and a benchmark, it would be worthwhile in the Discussion to comment on which demultiplexing tools one may want to choose for their dataset, especially given the warning against ensemble methods. From this extensive benchmarking, one may want to choose a tool based on the number of donors one has pooled, the modalities present, and perhaps even the ambient RNA (if it has been estimated previously).

      (3) What are the minimal computational requirements for running ambisim? What is the time cost? 

      We thank the reviewer for their suggestions. We plan on updating the manuscript to better clarify figure legends. We will also outline a set of concrete recommendations in our discussion section based on different multiplexed experimental designs. Finally, we will also include extra computational benchmarks for ambisim.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors): 

      (1) The use of the term "language network" throughout is unclear. Does this refer to work by Ev Fedorenko (i.e., does it distinguish language from other cognitive and sensorimotor domains)? There does not seem to be much in the behavior presented here that aligns with an interpretation about language per se. 

      We understand the reviewer’s point regarding the work by Evelina Fedorenko and the distinction it draws. It is important to clarify that in the present study we did not refer to her work when using the term “language network”.

      (2) Fig 4A: the "B" is missing on the figure panel to denote which Brodmann areas are shown. 

      We updated the figure panel by adding the “B” for more clarity.

      Reviewer #2 (Recommendations for the authors): 

      I think it would be worth mentioning the relatively sparse coverage of the right hemisphere in your abstract. 

      We agree with this suggestion and have updated the abstract as follows:

      “Our use of language, which is profoundly social in nature, essentially takes place in interactive contexts and is shaped by precise coordination dynamics that interlocutors must observe. Thus, language interaction is highly demanding on fast adjustment of speech production. Here, we developed a real-time coupled-oscillators virtual partner that allows - by changing the coupling strength parameters - to modulate the ability to synchronise speech with a virtual speaker. Then, we recorded the intracranial brain activity of 16 patients with drug-resistant epilepsy while they performed a verbal coordination task with the virtual partner (VP). More precisely, patients had to repeat short sentences synchronously with the VP. This synchronous speech task is efficient to highlight both the dorsal and ventral language pathways. Importantly, combining time-resolved verbal coordination and neural activity shows more spatially differentiated patterns and different types of neural sensitivity along the dorsal pathway. More precisely, high-frequency activity in left secondary auditory regions is highly sensitive to verbal coordinative dynamics, while primary regions are not. Finally, while bilateral engagement was observed in the high-frequency activity of the IFG BA44— which seems to index online coordinative adjustments that are continuously required to compensate deviation from synchronisation—interpretation of right hemisphere involvement should be approached cautiously due to relatively sparse electrode coverage. These findings illustrate the possibility and value of using a fully dynamic, adaptive and interactive language task to gather deeper understanding of the subtending neural dynamics involved in speech perception, production as well as their interaction.”

      There are a few places in your results section which haven't been updated to reflect the fact that some sections refer only to the left hemisphere e.g. 

      Page 11 line 347: "Overall, neural responses are present in all six canonical frequency bands" I think this should be "In the left hemisphere, neural responses are present...". 

      Page 12 line 355: "As expected, the whole language network is strongly involved..." I think this should be "As expected, the whole left hemisphere language network is strongly involved".  Page 17 (third paragraph of the discussion): "The observed negative correlation between verbal coordination and high-frequency activity (HFa) in STG BA22" I think this should be "in left STG BA22". 

      We thank the reviewer for highlighting these important points. The updated lines are as follows:

      Page 11 line 348: “In the left hemisphere, neural responses are present in all six canonical frequency bands…”

      Page 12 line 356: “As expected, the whole left hemisphere language network is strongly involved...”

      Page 17 lines 502-503: “The observed negative correlation between verbal coordination and high-frequency activity (HFa) in left STG BA22 suggests a suppression of neural responses as the degree of behavioural synchrony increases.”

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Shin et al. conduct extensive electrophysiological and behavioral experiments to study the mechanisms of short-term synaptic plasticity at excitatory synapses in layer 2/3 of the rat medial prefrontal cortex. The authors interestingly find that short-term facilitation is driven by progressive overfilling of the readily releasable pool, and that this process is mediated by phospholipase C/diacylglycerol signaling and synaptotagmin-7 (Syt7). Specifically, knockdown of Syt7 not only abolishes the refilling rate of vesicles with high fusion probability, but it also impairs the acquisition of trace fear memory.

      Overall, the authors offer novel insight to the field of synaptic plasticity through well-designed experiments that incorporate a range of techniques.

      Comments on revisions:

      The authors have adequately addressed my earlier comments and questions.

      Reviewer #2 (Public review):

      All the comments from Reviewer #2 are the same as her/his comments on our original manuscript. Therefore, we have already responded to all the following comments in the first revision. Here we describe our additional responses to the same comments.

      Summary:

      Shin et al aim to identify in a very extensive piece of work a mechanism that contributes to dynamic regulation of synaptic output in the rat cortex at the second time scale. This mechanism is related to a new powerful model and is well versed to test if the pool of SV ready for fusion is dynamically scaled to adjust supply demand aspects. The methods applied are state-of-the-art and both address quantitative aspects with high signal to noise. In addition, the authors examine both excitatory output onto glutamatergic and GABAergic neurons, which provides important information on how general the observed signals are in neural networks. The results are compellingly clear and show that pool regulation may be predominantly responsible. Their results suggest that a regulation of release probability, the alternative contender for regulation, is unlikely to be involved in the observed short term plasticity behavior (but see below). Besides providing a clear analysis of the underlying physiology, they test two molecular contenders for the observed mechanism by showing that loss of Synaptotagmin7 function and the role of the Ca dependent phospholipase activity seem critical for the short term plasticity behavior. The authors go on to test the in vivo role of the mechanism by modulating Syt7 function and examining working memory tasks as well as overall changes in network activity using immediate early gene activity. Finally, they model their data, providing strong support for their interpretation of TS pool occupancy regulation.

      Strengths:

      This is a very thorough study, addressing the research question from many different angles and the experimental execution is superb. The impact of the work is high, as it applies recent models of short term plasticity behavior to in vivo circuits further providing insights how synapses provide dynamic control to enable working memory related behavior through non-permanent changes in synaptic output.

      Weaknesses:

      While this work is carefully examined and the results are presented and discussed in a detailed manner, the reviewer is still not fully convinced that regulation of release probability is not a putative contributor to the observed behavior. No additional work is needed, but at the moment, I am not convinced that changes in release probability are not in play. One solution may be to extend the discussion of changes in release probability as an alternative.

      As Reviewer #3 suggested, we examined the dependence of EPSC amplitude on extracellular [Ca<sup>2+</sup>] ([Ca<sup>2+</sup>]<sub>o</sub>) in order to test our assertion that vesicular release probability (p<sub>v</sub>) is already saturated in resting conditions at L2/3 recurrent synapses. According to Dodge and Rahamimoff (1967), a three-fold increase is expected when [Ca<sup>2+</sup>]<sub>o</sub> is elevated from 1.3 to 2.5 mM, if resting p<sub>v</sub> has enough room to increase. We found that the baseline EPSC amplitude increased by only 23%, and this change was not statistically significant, supporting our assertion.

      Fig 3. I am confused about the interpretation of the Mean Variance analysis outcome. Since the data points follow the curve during induction of short term plasticity, doesn't this suggest that release probability and not the pool size increases?

      We separated the conventional release probability into the product of p<sub>v</sub> and p<sub>occ</sub>, where p<sub>v</sub> is the release probability of a TS vesicle and p<sub>occ</sub> is the occupancy of release sites by TS vesicles. In this regard, the abscissa of the V-M plot represents the conventional release probability. Because p<sub>v</sub> is close to unity, we interpreted a change along the abscissa as a change in p<sub>occ</sub>.
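
      The decomposition above can be illustrated with a minimal binomial sketch. It assumes N independent release sites and a uniform quantal size q, which is the textbook variance-mean model and not necessarily the exact analysis used in the manuscript; the function name and all parameter values are hypothetical.

```python
def mean_variance(n_sites, q, p_v, p_occ):
    """Binomial mean and variance of the EPSC amplitude.

    Assumes the conventional release probability is p = p_v * p_occ
    and that all sites share the same quantal size q (illustrative only).
    """
    p = p_v * p_occ
    mean = n_sites * q * p
    variance = n_sites * q**2 * p * (1.0 - p)
    return mean, variance


# Hypothetical values: with p_v fixed near 1, raising p_occ moves the
# point along the same parabola, variance = q * mean - mean**2 / n_sites.
for p_occ in (0.3, 0.5, 0.8):
    m, v = mean_variance(n_sites=10, q=1.0, p_v=0.95, p_occ=p_occ)
    print(f"p_occ={p_occ:.1f}  mean={m:.3f}  variance={v:.3f}")
```

      Because only the product of p<sub>v</sub> and p<sub>occ</sub> enters this model, the parabola by itself cannot separate the two factors; attributing a shift along the abscissa to p<sub>occ</sub> relies on the separate argument that p<sub>v</sub> is already close to unity.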

      Related, to measure the absolute release probability and failure rate using the optogenetic stimulation technique is not trivial as the experimental paradigm biases the experiment to a given output strength, and therefore a change in release probability cannot be excluded.

      We agree with this concern. Because the EPSC data were obtained by optogenetic stimulation, the possibility that optogenetic stimulation biased the release probability cannot be ruled out. Although we found that STP measured in dual-patch experiments did not differ from that measured with optogenetic stimulation, our conclusion needs to be confirmed using dual-patch recordings or other methods.

      Fig. 4B interprets the phorbol ester stimulation to be the result of pool overfilling, however, phorbol ester stimulation has also been shown to increase release probability without changing the size of the readily releasable pool. The high frequency of stimulation may occlude an increased paired pulse depression in presence of OAG, that others have interpreted in mammalian synapses as an increase in release probability.

      Provided that p<sub>v</sub> of TS vesicles is very high, the OAG-induced increase in EPSC1 and the low STF and PTA are consistent with a higher baseline p<sub>occ</sub> in PDBu conditions, while the number of docking sites is limited. It should be noted that the previously reported invariance of the RRP size under PDBu is based on measuring the RRP size using hypertonic solution (Basu et al., 2007). Given that this sucrose method releases not only TS but also LS vesicles, the sucrose-based RRP size may not be affected by PDBu or OAG at L2/3 synapses either. Therefore, a PDBu- or OAG-induced increase in p<sub>occ</sub> (the proportion of TS vesicles over LS+TS vesicles) would result in an increase in release probability without a change in the RRP size.
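
      The bookkeeping behind this argument can be sketched as follows, under the assumption that hypertonic sucrose releases LS and TS vesicles alike while an action potential releases only TS vesicles, each with probability p<sub>v</sub>. The vesicle counts and the function name are hypothetical and chosen only for illustration.

```python
def pool_readout(n_ls, n_ts, p_v):
    """Return (sucrose-defined RRP size, p_occ, first EPSC in quantal units)."""
    rrp = n_ls + n_ts      # assumption: sucrose releases both LS and TS vesicles
    p_occ = n_ts / rrp     # proportion of TS vesicles over LS+TS vesicles
    epsc1 = p_v * n_ts     # assumption: only TS vesicles fuse on an action potential
    return rrp, p_occ, epsc1


# Hypothetical numbers: the treatment converts LS into TS vesicles
# without changing the total number of docked vesicles.
control = pool_readout(n_ls=6, n_ts=4, p_v=0.9)
treated = pool_readout(n_ls=2, n_ts=8, p_v=0.9)
print("control:", control)
print("treated:", treated)
```

      In this toy case the sucrose-defined RRP size is identical in both conditions, whereas p<sub>occ</sub> and the first EPSC both increase, which is the pattern described above.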

      The literature on Syt7 function is still quite controversial. One observation in the literature is that loss of Syt7 function in the fly synapse leads to an increase of release probability. Thus the observed changes in short term plasticity characteristics in the Syt7 KD experiments may contain a release probability component. Can the authors really exclude this possibility? Figure 5 shows for the Syt7 KD group a very prominent depression of the EPSC/IPSC with the second stimulus, particularly for the short interpulse intervals, usually a strong sign of increased release probability, as lack of pool refilling can unlikely explain the strong drop in synaptic output.

      Comments on revisions:

      I am satisfied with the reply of the authors and I do not have any further points of concern.

      Reviewer #3 (Public review):

      The results are consistent with the main claim that facilitation is caused by overfilling a readily releasable pool, but alternative interpretations continue to seem more likely, especially when the current results are taken together with previous studies. Key doubts could be resolved with a single straightforward experiment (see below).

      The central issue is the interpretation of paired pulse depression that occurs when the interval between action potentials is 25 ms, but not when 50. To summarize: a similar phenomenon was observed at Schaffer collateral synapses (Dobrunz and Stevens, 1997), but was interpreted as evidence for a decrease in pv. Ca2+-channel inactivation was proposed as the mechanism, but this was not proven. The key point for evaluating the current study is that Dobrunz and Stevens specifically ruled out the kind of decrease in pocc that is the keystone premise of the current study because the depression occurred independently of whether or not the first action potential elicited exocytosis. Of course, the mechanism might be different at layer 2/3 cortical synapses. But, it seems reasonable to hope that the older hypothesis would be ruled out for the cortical synapses before concluding that the new hypothesis must be correct.

      The old and new hypotheses could be distinguished from each other cleanly with a straightforward experiment. Most/maybe all central synapses strengthen a great amount when extracellular Ca2+ is increased from 1.3 to 2 mM, even when intracellular Ca2+ is buffered with EGTA. According to the authors' model, this is only possible when pv is low, and so could not occur at synapses between layer 2/3 neurons. Because of this, confirmation that increasing extracellular Ca2+ does not change synaptic strength would support the hypothesis that baseline pv is high, as the authors claim, and the support would be impressive because large changes have been seen at every other type of synapse where this has been studied (to my knowledge at least). In contrast, the Ca2+ imaging experiment that has been added to the new version of the manuscript does not address the central issue because a wide range of mechanisms could, in principle, decrease release without involving prior exocytosis or altering bulk Ca2+ signals, including: a small decrease in nano-domain Ca2+, which wouldn't be detected because nano-domains contribute a minuscule amount to the bulk signal during Ca2+-imaging; or even very fast activity-dependent undocking of synaptic vesicles, which was reported in the same Kusick et al, 2020 study that is central to the LS/TS terminology adopted by the authors.

      Additional points:

      (1) A new section in the Discussion (lines 458-475) suggests that previous techniques employed to show that augmentation and facilitation are caused by increases in pv did not have the resolution to distinguish between pv and pocc, but this is misleading. The confusion might be because the terminology has changed, but this is all the more reason to clarify this section. The previous evidence for increases in pv - and against increases in pocc - is as follows: The residual Ca2+ that drives augmentation decreases the latency between the onset of hypertonic solution and onset of the postsynaptic response by about 150 ms, which is large compared to the rise time of the response. The decrease indicates that the residual Ca2+ drives a decrease in the energy barrier that must be overcome before readily releasable vesicles can undergo exocytosis, which is precisely the type of mechanism that would enhance pv. In contrast, an increase in pocc could change the rise time, but not the latency. There is a small change in the rise time, but this could be caused by changes in either pv or pocc, and one of the studies (Garcia-Perez and Wesseling, 2008) showed that augmentation occluded facilitation, even at times when pocc was reduced by a factor of 3, which would seem to argue against parallel increases in both pv and pocc.

      We greatly appreciate the reviewer pointing out our misunderstanding. We acknowledge that the post-tetanic acceleration of the latency in hypertonicity-induced vesicle release may reflect a decrease in the activation energy barrier (ΔEa) for vesicle fusion, resulting in an increase in the fusion probability of TS vesicles (Stevens and Wesseling, 1999; Garcia-Perez and Wesseling, 2008). We agree that such latency changes are not easily explained by increases in p<sub>occ</sub> alone. Indeed, Taschenberger et al (2016) concluded that PTP is similar to the PDBu-induced increase in baseline EPSCs. Subsequently, Lin et al (2025) estimated PDBu-induced changes in the TS vesicle pool size and in p<sub>fusion</sub> of TS vesicles (these correspond to p<sub>occ</sub> and p<sub>v</sub> in this study, respectively), and found that PDBu mainly increases the former (2-fold) and, to a lesser extent, the latter (1.3-fold). Although it has not been directly tested, it is possible that PTP increases p<sub>v</sub>. Accordingly, we corrected the first statement of the paragraph and mentioned the possibility of a post-tetanic increase in p<sub>v</sub> of TS vesicles.

      It should be noted, however, that it remains unclear what the acceleration of the latency in hypertonicity-induced vesicle release represents. Schotten et al (2015) simulated how the vesicle release rate is affected by reducing ΔEa for vesicle fusion. They found that a reduction of ΔEa resulted in an increased peak amplitude and a shorter time-to-peak of vesicle fusion, but did not accelerate the latency. Therefore, it remains to be clarified whether a shorter latency can be regarded as a lower activation barrier. Moreover, the sucrose-induced release rate is comparable to the vesicle recruitment rate (1-2/s; Neher, Neuron, 2008). This slowness of sucrose-induced vesicle release makes it difficult to distinguish the vesicle fusion rate from the priming rate.

      (2) Similar evidence from hypertonic stimulation indicates that Phorbol esters increase pv, but I am not aware of evidence ruling out a parallel increase in pocc.

      As noted above, none of the known mechanisms can clearly explain the PDBu-induced shortening of the latency to hypertonicity-induced vesicle fusion (Schotten et al, 2015). Even if a shorter latency reflects a higher p<sub>v</sub>, it does not rule out a concurrent change in p<sub>occ</sub>. Supporting this notion, Lin et al. (2025) showed, in the framework of the two-state vesicle fusion model, that PDBu application leads to a substantial increase in the number of TS vesicles (vesicles having high fusion propensity), with a moderate change in fusion probability (p<sub>fusion</sub>). In light of the previous observation that high tonicity (500 or 1000 mOsm) did not alter the RRP size (Basu et al., 2007), the results of Lin et al. (2025) can be interpreted as an increase in ‘p<sub>occ</sub>’ in terms of the present study.

      Reference:

      Schotten et al. (2015). Additive effects on the energy barrier for synaptic vesicle fusion cause supralinear effects on the vesicle fusion rate. eLife 4:e05531.

      Lin, K.-H., Ranjan, M., Lipstein, N., Brose, N., Neher, E., & Taschenberger, H. (2025). Number and relative abundance of synaptic vesicles in functionally distinct priming states determine synaptic strength and short-term plasticity. J. Physiology.

      Comments on revisions:

      There are at least two straightforward ways to address the main concern.

      The first would be experiments analogous to those in Dobrunz and Stevens that show that - unlike at Schaffer collateral synapses - paired pulse depression at L2/3 synapses requires neurotransmitter release. I proposed this in the first round, but realized since that a simpler and more powerful strategy would be to test directly that pv is/is-not near 1.0 in 1.2 mM Ca2+ simply by increasing to 2 mM Ca2+ (and showing that synaptic strength does-not/does change). This would be powerful because the increase in Ca2+ greatly increases synaptic strength at Schaffer collaterals by about 2.5-fold. Concerns about a confounding elevation in the basal intracellular Ca2+ concentration could be easily neutralized by pre-treating with EGTA-AM, which the authors have already done for other experiments.

      We thank Reviewer #3 for suggesting an experiment to test our assertion that the vesicular release probability (p<sub>v</sub>) is very high at layer 2/3 recurrent excitatory synapses. As the Reviewer recommended, we assessed the EPSC changes induced by an increase in extracellular calcium concentration ([Ca<sup>2+</sup>]<sub>o</sub>). The results have been added as Figure 3—figure supplement 3 in the revised manuscript.

      Dodge and Rahamimoff (1967) discovered a fourth-power relationship between end-plate potential (EPP) and [Ca<sup>2+</sup>]<sub>o</sub> at a neuromuscular junction. More specifically, they found

      EPP amplitude ∝ ([Ca<sup>2+</sup>]<sub>o</sub> / (1 + [Ca<sup>2+</sup>]<sub>o</sub> / 1.1 mM + [Mg<sup>2+</sup>]<sub>o</sub> / 2.97 mM))<sup>4</sup>.

      This equation nicely predicts the effects of high external calcium on EPSC amplitudes observed at the calyx synapses: a 2.6-fold increase of EPSC by changing [Ca<sup>2+</sup>]<sub>o</sub> from 1.25 to 2 mM (Thanawala and Regehr, 2013; predicted as 2.57); a 2.36-fold increase by changing [Ca<sup>2+</sup>]<sub>o</sub> from 1.5 to 2 mM (Lin and Taschenberger, 2025; predicted as 2.16). In the framework of the two-step priming model, Lin et al. (2025) estimated a 1.9-fold increase (from 0.22 to 0.42) in p<sub>v</sub> of TS vesicles and a 1.23-fold increase in the number of TS vesicles. It is clear that the increase in p<sub>v</sub> would be possible only if p<sub>v</sub> is not saturated, while the increase in the number of TS vesicles is still possible regardless of the baseline p<sub>v</sub> of TS vesicles.

      The Dodge and Rahamimoff equation predicts a 3.24-fold increase in baseline EPSC amplitude when [Ca<sup>2+</sup>]<sub>o</sub> is elevated from 1.3 mM to 2.5 mM at L2/3 synapses. Contrary to this prediction, our recordings revealed a 1.23-fold increase in baseline EPSC amplitude, and this change was not statistically significant.
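      The predicted fold changes quoted above can be reproduced with a few lines of code. The following is a minimal sketch, not the authors' analysis script: the function names are hypothetical, and the extracellular Mg<sup>2+</sup> concentration is not stated in the response, so the default of 1 mM is an assumption. Under that assumption the sketch gives a value of about 3.24 for the 1.3 to 2.5 mM step; a different Mg<sup>2+</sup> value would shift the prediction.

```python
def dodge_rahamimoff_drive(ca_mM, mg_mM, k_ca_mM=1.1, k_mg_mM=2.97):
    """Relative release drive from the Dodge-Rahamimoff relation.

    Implements (Ca / (1 + Ca/K_Ca + Mg/K_Mg))**4 with the constants quoted
    in the text (1.1 mM for Ca, 2.97 mM for Mg). Units cancel in a ratio.
    """
    return (ca_mM / (1.0 + ca_mM / k_ca_mM + mg_mM / k_mg_mM)) ** 4


def predicted_fold_change(ca_from_mM, ca_to_mM, mg_mM=1.0):
    """Predicted fold change in EPSC amplitude for a step in external Ca.

    mg_mM=1.0 is an ASSUMED value (not given in the response); pass the
    actual recording-solution concentration when it is known.
    """
    baseline = dodge_rahamimoff_drive(ca_from_mM, mg_mM)
    elevated = dodge_rahamimoff_drive(ca_to_mM, mg_mM)
    return elevated / baseline


if __name__ == "__main__":
    # Step used for the L2/3 recordings discussed above.
    print(f"1.3 -> 2.5 mM: {predicted_fold_change(1.3, 2.5):.2f}-fold")
```

      Comparing this steep predicted dependence with the measured 1.23-fold change is what motivates the saturation argument in the next paragraph.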

      Given the steep dependence of vesicle release on [Ca<sup>2+</sup>]<sub>o</sub>, this minimal increase strongly suggests that p<sub>v</sub> at L2/3 recurrent synapses is already near maximal at rest, limiting the dynamic range for further enhancement through increased calcium influx. We also observed a small but statistically significant decrease in the paired-pulse ratio (PPR) at higher [Ca<sup>2+</sup>]<sub>o</sub>. Although this reduction in PPR might be indicative of increased p<sub>v</sub>, in the context of a very high p<sub>v</sub> it is more consistent with a slight increase in p<sub>occ</sub> than with a substantive increase in p<sub>v</sub>. Consistent with this interpretation, Lin et al. (2025) recently estimated a 1.23-fold increase in the TS vesicle subpool size upon elevating [Ca<sup>2+</sup>]<sub>o</sub> within the framework of the two-step vesicle priming model. Taken together, these findings suggest that an increase in the number of TS vesicles or in p<sub>occ</sub> may contribute to both an increase in baseline EPSC amplitudes and a decrease in PPR.

      Overall, our central claim that baseline p<sub>v</sub> is near maximal at L2/3 recurrent synapses is supported by 1) high baseline PPR; 2) insensitivity to EGTA-AM; 3) high double failure rate; 4) insensitivity to elevating [Ca<sup>2+</sup>]<sub>o</sub>. These data are difficult to reconcile with a model in which facilitation is mediated by Ca<sup>2+</sup>-dependent increases in p<sub>v</sub>. Instead, our results support a mechanism in which facilitation arises from changes in release site occupancy.

      References

      Dodge, F.A., & Rahamimoff, R. (1967). Co-operative action of calcium ions in transmitter release at the neuromuscular junction. J Physiol, 193(2), 419–432. 

      Thanawala, M.S., & Regehr, W.G. (2013). Presynaptic calcium influx controls neurotransmitter release in part by regulating the effective size of the readily releasable pool. J Neurosci, 33(11), 4625–4633.

      Lin, K.-H., Ranjan, M., Lipstein, N., Brose, N., Neher, E., & Taschenberger, H. (2025). Number and relative abundance of synaptic vesicles in functionally distinct priming states determine synaptic strength and short-term plasticity. J. Physiology.

      Neher E, Sakaba T (2008) Multiple Roles of Calcium Ions in the Regulation of Neurotransmitter Release. Neuron 59:861-872.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Recommendations for the authors):

      The authors have taken into consideration and addressed all my previous comments.

      This referee has one major concern remaining: although the authors have refined their analysis of mitochondrial morphology, my concern regarding the characterization of mitochondria in Drp1-depleted zygotes as "elongated" persists.

      Taking into account the reviewer's comment, the following description has been changed. Lines 256-257: “Quantification of the aspect ratio (major axis/minor axis) suggests that mitochondria are significantly elongated in Drp1-depleted embryos” to “The mean aspect ratio (major axis/minor axis) increased slightly from 1.36 in control to 1.66 in Drp1-depleted embryos.”

      (1) The morphological analysis of mitochondria reveals that both axes increase in length. Yet, the aspect ratio is virtually unchanged, at least in biologically relevant terms, if not statistically.

      - Please calculate and represent mitochondrial aspect ratio as major axis/minor axis in fig 2M.

      - Could the authors also display individual data points in the graphs of Figure 2 K, L and M?

      We have revised the graph display format in accordance with the reviewer's suggestions.

      (2) The authors provide PMID: 25264261 as an example, yet mitochondria in PMID: 35704569 are apparently elongated. Judging by the authors' discussion of the differences between these two studies, it would be enriching to comment, in the discussion of the manuscript, on the differences in morphology and on the reasons why these might arise.

      This referee believes that the unconventional mitochondrial morphology upon fission inhibition, reported here, enhances the relevance of the study and raises questions that could promote novel research lines, if thoroughly discussed in the manuscript.

      Thank you for your insightful suggestion. However, since the latter paper (PMID: 35704569) lacks EM images, it would be difficult to accurately assess the elongation. Thus, we would like to reconsider the mitochondrial morphological changes in zygotes caused by Drp1 deletion levels based on the results of future research.

      Minor

      (1) Labels for the staining used are missing in figure 1-figure supplement 1

      (2) Line 218. Could the intended sentence be:

      "Live imaging of mitochondria (mt-GFP) and chromosomes (H2B-mCherry) in Myo19 depleted zygotes shows symmetric distribution and partitioning of mitochondria during the first embryonic cleavage (Figure 1-figure supplement 2A, 2B; Figure 1-Video 2)."

      (3) Figure 2M: Please calculate and represent mitochondrial aspect ratio as major axis/minor axis.

      (4) Include a label with the experimental condition in figure 1 fig supp 2.

      (5) Line 592: missing reference.

      Thank you for your careful correction. We have corrected all the points the reviewer pointed out in the revised version.

      Reviewer #2 (Recommendations for the authors):

      The authors have sufficiently revised the manuscript to accommodate the majority of suggestions provided by myself and the other reviewers. While it would have been useful to see further clarity around mitochondrial transport, the data presented provide valuable insight into the role of a mitochondrial dynamics regulator in mediating the first mitosis event in embryo development.

      We again thank Reviewer 2 for the helpful comment. We would like to address the issue of (aggregated) mitochondrial transport, including analysis methods, as a future challenge.

      Reviewer #3 (Recommendations for the authors):

      After reading through the comments of the other reviewers, the ways in which the authors could improve their manuscript can be largely summarized in the following three points.

      (1) The authors should clarify whether loss of Drp1 contributes to the chromosome segregation defects directly (e.g., by checking SAC-like activity) or indirectly (aggregated mitochondria becoming a physical obstacle, possibly with partial involvement of the cytoskeleton).

      (2) Although the level of Myo19 may not be very high (given the low level of TRAK2 in oocytes: Lee et al. PNAS 2024, PMID 38917013), the authors should further clarify the effect of Myo19-Trim with time-lapse imaging (e.g., EB3-GFP/Mt-DsRed) and EM analysis (detailed mitochondrial architecture).

      (3) The authors should clarify the phenotypic heterogeneity in the degree of alteration of mitochondrial morphology/architecture as a function of the level of Drp1 loss, with detailed quantification of EM images, to address why the aggregation of mitochondria in Drp1-/- parthenotes (possibly, or more likely, Drp1 protein-free) looks different from, or weaker than, that in Trim-Away-treated ones. Using parthenotes of Trim-Away-treated MII oocytes might also complement the discussion.

      The revised preprint has addressed all the points described above. The authors have also adequately indicated the limitations at each of the specific points. The revisions the authors made have consolidated their conclusions, so this remains an excellent study. The only remaining weakness is that the authors have not undertaken additional experiments to clarify any role for mitochondrial transport following Drp1 depletion.

      We again thank Reviewer 3 for the insightful comments. We would like to address the comments you have raised (points that were unclear in this study) as issues for future study.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chuah, Daugherty, and Smith analyze a new set of archaeal 20S proteasomes obtained by cryo-EM that illustrate how the occupancy of the HbYX binding pocket induces gate opening. They do so primarily through a V24Y mutation in the α subunit. These results are supported by a limited set of mutations in K66 in the α subunit, bringing new emphasis to this unit.

      Strengths:

      The new structure's analysis is comprehensive, occupying the entire manuscript. As such, the scope of this manuscript is very narrow, but the strength of the data is solid, and they offer an interesting and important new piece to the gate-opening literature.

      Weaknesses:

      Major Concerns

      (1) This manuscript rests on one new cryo-EM structure, leading to a single (albeit convincing) experiment demonstrating the importance of occupying the pocket and moving K66. Could a corresponding bulky mutation at K66 not activate the 20S proteasome?

      Thank you for this insightful question. We believe such a mutation would likely not activate the proteasome, and would likely be detrimental to gate opening. Our previous work (Smith et al., Molecular Cell, 2007), and data presented in this manuscript, demonstrate that a K66A mutation, which removes the side chain, blocks 20S gate opening. Furthermore, our new αV24Y T20S structure reveals that Lys66 forms specific hydrogen bonds with surrounding residues that are crucial for stabilizing the open gate conformation (Fig. 5). An aromatic or bulky hydrophobic mutation at this position would be unable to form these essential hydrogen bonds and would likely disrupt the necessary stabilizing interactions.

      (2) To emphasize the importance of this work, the authors highlight the importance of gate-opening to human 20S proteasomes. However, the key distinctions between these proteasomes are not given sufficient weight.

      (a) As the authors note, the six distinct Rpt C-termini can occupy seven different pockets. However, how these differences would impact activation is not thoroughly discussed.

      We appreciate the reviewer's point regarding the complexities of eukaryotic 26S proteasome activation. While our manuscript discusses some aspects of this, we agree that a detailed mechanistic extrapolation from our archaeal T20S model to the diverse interactions within the human 26S proteasome is challenging. As we elaborate in our response to Reviewer #2 (Recommendation #3), the significant differences in α-ring composition (homoheptameric vs. heteroheptameric) and the multifactorial nature of Rpt C-termini binding make direct, wide-reaching speculations about specific pocket contributions in the eukaryotic system difficult at this stage. Our aim was to focus on the conserved fundamental role of the HbYX hydrophobic pocket itself. 

      (b) With those other sites, the relative importance of various pockets, such as the one controlling the α3 N-terminus, should be discussed more thoroughly as a potential critical difference.

      The reviewer raises an excellent point about the regulation of specific α-subunits, like the α3 N-terminus, which acts as a lynchpin in gating. Understanding its precise regulation in the eukaryotic 26S proteasome is indeed a key goal in the field. However, determining which specific HbYX binding events (e.g., in the α2-α3 pocket, the α3-α4 pocket, or cooperative binding across multiple pockets) control the α3 subunit's conformation is beyond the scope of what our current T20S structural data can definitively inform. The cooperative nature of HbYX binding and its precise allosteric consequences across the heteroheptameric α-ring are complex questions that remain to be fully elucidated in the eukaryotic system. Our study focuses on demonstrating the sufficiency of hydrophobic pocket occupancy for activation in a conserved manner, which we propose is a fundamental aspect of HbYX action. Identifying which of the seven distinct eukaryotic hydrophobic pockets must be engaged for full activation remains an important area for future research.

      (c) These differences can lead to eukaryote 20S gates shifting between closed and open and having a partially opened state. This becomes relevant if the goal is to lead to an activated 20S. It would have been interesting to have archaea 20S with a mix of WT and V24Y α-subunits. However, one might imagine the subclassification problem would be challenging and require an extraordinary number of particles.

      We agree with the reviewer that exploring mixed subunit populations is an interesting idea, particularly given the dynamic and potentially partially open states of eukaryotic proteasomes. We have previously considered co-expressing WT and V24Y α-subunits. However, the interpretation of such experiments would be challenging. With 14 potential sites for mutant incorporation across the two homoheptameric α-rings, a heterogeneous population of proteasomes with varying numbers and arrangements of V24Y subunits would be generated. Correlating any observed changes in activity or structure (e.g., via cryo-EM subclassification, which would itself be exceedingly difficult) with specific stoichiometries or arrangements of mutant subunits would be highly complex and likely inconclusive for deriving clear mechanistic insights.
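      The scale of this heterogeneity can be illustrated with a short calculation. The following is a minimal sketch under a simplifying assumption that is not tested in this study: each of the 14 α-subunit positions incorporates a V24Y subunit independently, with a probability equal to the mutant fraction of the expressed pool. The function name and the 50% mutant fraction are hypothetical.

```python
from math import comb

N_SITES = 14  # two homoheptameric alpha-rings, seven subunits each


def mutant_count_distribution(mutant_fraction, n_sites=N_SITES):
    """Probability of k mutant subunits per particle, for k = 0..n_sites.

    ASSUMPTION: independent, unbiased incorporation at every site
    (binomial model). Real assembly may deviate from this.
    """
    wt_fraction = 1.0 - mutant_fraction
    return [
        comb(n_sites, k) * mutant_fraction**k * wt_fraction ** (n_sites - k)
        for k in range(n_sites + 1)
    ]


if __name__ == "__main__":
    # Hypothetical 1:1 co-expression of WT and V24Y subunits.
    dist = mutant_count_distribution(0.5)
    for k, p in enumerate(dist):
        print(f"{k:2d} mutant subunits: {p:.4f}")
    # Ordered placements before accounting for ring symmetry: 2**14.
    print("ordered arrangements:", 2**N_SITES)
```

      Even the most common stoichiometry in this model accounts for only about a fifth of the particles, and each stoichiometry is further split across many spatial arrangements, which is why subclassification would demand very large particle numbers.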

      (d) Furthermore, the conservation of the amino acids around the binding pocket was not addressed. This seems particularly important in the relative contribution of a residue analogous to K66 or V24.

      We apologize for the mislabeled figure title in the previous submission, which may have made this information less accessible. We have now corrected the title for Supplemental Figure S10 (previously S9). This figure presents the sequence alignment showing the conservation of residues in and around the HbYX hydrophobic pocket, including those analogous to T20S αV24, αL21, and αA154. As discussed in the manuscript, key residues that form this pocket, such as those corresponding to and surrounding T20S L21 and A154, are indeed well conserved in human α-subunits. This conservation supports the relevance of our findings to eukaryotic proteasomes.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Chuah et al. reports the experimental results that suggest the occupancy of the HbYX pockets suffices for proteasome gate opening. The authors conducted cryo-EM reconstructions of two mutant archaeal proteasomes. The work is technically sound and may be of special interest in the field of structural biology of the proteasomes.

      Strengths:

      Overall, the work incrementally deepens our understanding of the proteasome activation and expands the structural foundation for therapeutic intervention of proteasome function. The evidence presented appears to be well aligned with the existing literature, which adds confidence in the presentation.

      Weaknesses:

      The paper may benefit from some minor revision by making improvements on the figures and necessary quantitative comparative studies.

      We appreciate the reviewer's thoughtful critique of our manuscript and have made the requested changes and provided the further perspectives mentioned below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 467: Mammalian should be replaced with eukaryotic.

      Done.  

      (2) Figure 1 Caption: The descriptions of the blue and green boxes should be described in panel A's caption rather than waiting until panel C.

      Done.

      (3) Figure 2 A: For greater clarity, the asterisks should be replaced with the numbers H4, H5, and H6.

      Done.

      (4) Figure 7 caption: The panels are misannotated. What is listed as E should become D, and what is listed as F should become E.

      Done.

      (5) The title for Figure S9, "αV24Y T20S validation," is inappropriate. A better title should discuss the sequence conservation of those amino acids. Why is the arrow drawing attention to L21 when the paper is about V24? There should be a corresponding alignment that includes K66.

      Thank you for pointing out the title issue for Figure S10 (previously S9); this has now been corrected to reflect its focus on sequence conservation. The arrow highlighting L21 (and its eukaryotic analogues) is intended to draw attention to a key residue that, along with A154, forms part of the hydrophobic pocket occupied by V24Y. As detailed in the main text and shown in Figures 3C, 3D, and 4G, measurements involving L21 were used to demonstrate the widening of this pocket upon V24Y mutation or ZYA binding.

      Reviewer #2 (Recommendations for the authors):

      The authors might consider improving the manuscript by addressing the following minor issues:

      (1) Figure 1: it might be easier for readers to understand what the authors meant to show by superimposing the atomic model of the mutated sidechain with the density map. In this case, the density map could be rendered half-transparent, or it could be represented by mesh.

      We appreciate this suggestion for enhancing Figure 1. While we agree that showing the model fit within the density is valuable, we found that incorporating this directly into the comparative overlay panels of Figure 1 (which already depict multiple aligned density maps) made the figure overly complex and visually detracted from its primary message of comparing overall conformational states. However, we do provide a clear illustration of the model-to-map fit for the αV24Y T20S structure in Supplemental Figure S3, where the atomic model is shown within the transparent map surface. Furthermore, all our maps and models are publicly available, and we encourage interested readers to perform detailed comparisons. We believe this approach balances clarity in the main figure with the provision of detailed validation data.  

      (2) What is the solvent-inaccessible surface area of the mutated side-chain buried by its hydrophobic interaction with the HbYX pockets? How is this buried surface area compared to the solvent-accessible surface area of the HbYX pocket without the mutation?

      We appreciate the idea of another visual to answer the question and provide the reader with a better perception of this pocket in the WT versus V24Y T20S. To address this we added a new Supplemental Figure 7 with surfaces showing this comparison including each separate pocket and an overlay with solid and mesh surfaces. We also added this line to the text: “Moreover, molecular surface representations of the hydrophobic pocket clearly show occupancy by the mutant tyrosine’s side chain (Fig. S7)”.

      (3) Based on the data of the buried surface area of the mutated side-chain (requested above), can the authors make some quantitative comparison with the activated eukaryotic proteasome (either human or yeast 26S) with the alpha-pocket occupied with HbYX motifs from Rpt subunits? How similar are they?

      This is a thoughtful suggestion, and we understand the interest in directly comparing pocket occupancy across systems. While we draw general parallels regarding HbYX-dependent activation in the discussion, we believe a direct quantitative extrapolation of specific surface area occupancies from our T20S V24Y mutant to the eukaryotic system would be overly speculative and unlikely to yield further definitive insights into the eukaryotic gate-opening mechanism at this time. The primary reason for this is the significant disparity in complexity between the archaeal T20S and eukaryotic 26S proteasomes. The eukaryotic α-ring is a heteroheptamer, composed of seven distinct α-subunits, which creates seven non-identical inter-subunit pockets. In contrast, our study utilizes the homoheptameric archaeal T20S. Furthermore, eukaryotic 26S proteasome activation involves the intricate binding of multiple C-terminal tails from the six different Rpt ATPase subunits of the 19S regulatory particle. These C-termini include various HbYX motifs as well as non-HbYX tails, and they interact with the diverse α-subunit pockets in a highly complex, multifactorial manner that drives what appears to be an allosteric mechanism for gate regulation.

      Crucially, the precise number of C-termini required for 20S gate-opening in the eukaryotic system, the specific combination of these Rpt C-termini, and even the exact inter-subunit pockets that must be occupied to induce robust gate opening are still areas of active investigation and are not resolved (as discussed in our manuscript). Therefore, attempting to extrapolate nuances, such as the precise degree of hydrophobic pocket occupancy from our single, engineered αV24Y side-chain (which models one specific type of Hb-pocket interaction in a simplified system) to each of the potentially five or more different Rpt C-termini interactions within the various 20S inter-subunit pockets in the eukaryotic 26S proteasome, would involve too many assumptions and would not provide reliable predictive power for understanding the mechanism.

      However, regarding the fundamental question of how a hydrophobic group occupies the HbYX pocket in our archaeal model system, we believe Figure 4D provides relevant insight that may address the reviewer's underlying curiosity. This figure carefully illustrates the spatial overlap, showing that the engineered αV24Y side-chain and the hydrophobic 'Z' group of the ZYA HbYX-mimetic occupy the same region within the T20S inter-subunit hydrophobic pocket. This provides a clear visual comparison of this key 'Hb' interaction in our defined and structurally characterized system.

      (4) It may be helpful that at the end of the discussion, the authors make some comments on how the current results might offer insights into the eukaryotic proteasome activation, and on what the limitations of the current study are.

      We thank the reviewer for this suggestion. We agree that discussing the implications for eukaryotic proteasome activation and the study's limitations is important.

      Insights into Eukaryotic Proteasome Activation:

      We have indeed discussed how our current findings with the αV24Y T20S mutant offer insights into eukaryotic proteasome activation in the Discussion section. To briefly summarize:

      (1) Conservation of the Target Site: Our study highlights that the key residues forming the hydrophobic pocket targeted by the αV24Y mutation (αL21 and αA154 in T20S) are well-conserved in the human 20S α-subunits (as shown in Fig. S10, previously S9). This suggests that the mechanism of inducing gate opening through occupancy of this specific hydrophobic 'Hb' pocket by an aromatic residue is a plausible strategy for activating eukaryotic proteasomes.

      (2) Relevance of the IT Switch: The αV24Y mutation, by occupying the Hb-pocket, allosterically affects the conserved IT switch, promoting an open-gate conformation. As detailed in our previous work (Chuah et al., Commun. Biol. 2023; Ref. 31 in the current manuscript), this IT switch mechanism is also functionally conserved in most human α-subunits. The current study reinforces that direct manipulation of the Hb-pocket is sufficient to trigger this conserved downstream gating machinery.

      (3) Therapeutic Implications: These findings further pinpoint the HbYX hydrophobic pocket as a specific and promising target for the design of small molecule proteasome activators aimed at human proteasomes.

      While these parallels are informative, we reiterate our caution (as also mentioned in response to comment #3 and in the manuscript) regarding direct quantitative extrapolation, given the increased complexity of the heteroheptameric eukaryotic α-ring and the multifactorial nature of Rpt C-termini interactions.

      We also agree that we should add to our manuscript a statement regarding the key limitation raised by the reviewer. Below is the limitations text that has been added to the penultimate paragraph of the discussion:

      “While this study provides significant insights, it is important to acknowledge certain limitations. A key limitation stems from using the homoheptameric archaeal T20S as our model. Although this simpler system allows for more reliable dissection of fundamental mechanisms, and core elements like HbYX-induced gate opening are conserved at the intersubunit pocket level, the overall T20S and eukaryotic 20S/26S proteasomes differ significantly in their complexity. Specifically, our engineered αV24Y mutation results in a tyrosine constitutively occupying all seven identical hydrophobic pockets. This contrasts with the eukaryotic proteasome, which possesses seven distinct α-subunit pockets that interact with various Rpt C-termini through dynamic binding. Moreover, the specific Rpt C-termini interactions (whether acting individually or cooperatively) that are essential to drive gate opening in the eukaryotic system remain incompletely understood. Therefore, while insights from our archaeal system are valuable for understanding general principles, direct comparisons and extrapolations to the intricate allostery and interaction complexities of the eukaryotic 26S proteasome must be made with caution.”

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2:

      Minor reviews:

      The caveats are (1) the particular point will perhaps only be interesting to a small slice of the eQTL research community; (2) the authors provide no statistical controls/error estimate or independent validation of the variance partitioning analysis in Figure 3, and (3) the authors don't seem to use the single-cell growth/fitness estimates for anything else, as Figure 4 uses loci mapped to growth from a previously published, standard culture-by-culture approach. It would be appropriate for the manuscript to mention these caveats.

      We have added two brief mentions of these caveats: mainly that the study may not generalize, and that the study does not attempt the variance partitioning on other traits or in other systems where the values of the partitions are better established.

      I also think it is not appropriate for the manuscript to avoid a comparison between the current work and Boocock et al., which reports single-cell eQTL mapping in the same yeast system. I recommend a citation and statement of the similarities and differences between the papers.

      We have added this reference and a clear statement of similarities between the two studies. It was not our intention to avoid this; we had simply not seen that study in the initial submission.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This is an interesting follow-up to a paper published in Human Molecular Genetics reporting novel roles in corticogenesis of the Kif7 motor protein that can regulate the activator as well as the repressor functions of the Gli transcription factors in Shh signalling. This new work investigates how a null mutation in the Kif7 gene affects the formation of corticofugal and thalamocortical axon tracts and the migration of cortical interneurons. It demonstrates that the Kif7 null mutant embryos present with ventriculomegaly and heterotopias as observed in patients carrying KIF7 mutations. The Kif7 mutation also disrupts the connectivity between the cortex and thalamus and leads to an abnormal projection of thalamocortical axons. Moreover, cortical interneurons show migratory defects that are mirrored in cortical slices treated with the Shh inhibitor cyclopamine suggesting that the Kif7 mutation results in a down-regulation of Shh signalling. Interestingly, these defects are much less severe at later stages of corticogenesis.

      Strengths/weaknesses:

      The findings of this manuscript are clearly presented and are based on detailed analyses. Using a compelling set of experiments, especially the live imaging to monitor interneuron migration, the authors convincingly investigate Kif7's roles and their results support their major claims. The migratory defects in interneurons and the potential role of Shh signalling present novel findings and provide some mechanistic insights but rescue experiments would further support Kif7's role in interneuron migration. Similarly, the mechanism underlying the misprojection which has previously been reported in other cilia mutants remains unexplored. Taken together, this manuscript makes novel contributions to our understanding of the role of primary cilia in forebrain development and to the aetiology of neural symptoms in ciliopathy patients.

      We again thank Reviewer 1 for her/his positive assessment of our article. We have addressed several weaknesses identified by the reviewer, supplementing the initial results with new data, and correcting or clarifying the text where necessary. Our detailed responses to the reviewer’s recommendations appear at the end of each comment.

      Reviewer #1 (Recommendations for the authors):

      (1) The authors report remarkable phenotypic changes in E14.5 embryos in the projection patterns of corticofugal/thalamocortical axons and in interneuron migration, but some of those phenotypes appear much less severe at E16.5. This might be indicative of a delay in development. Does the migration of interneurons to more dorsal regions correspond to an extended Cxcl12 expression? Do interneurons still show migratory defects at E16.5? To address a potential delay, the authors could, if feasible, repeat Tbr2/Tomato and L1 or neurofilament stainings in E18.5 embryos?

      The question of a possible developmental delay in Kif7 -/- embryos is important. To address it, we have extended our study, initially focused on embryonic stage E14.5, to earlier (E12.5) and later (E16.5, E18.5/P0) developmental stages. We added new data on E12.5 (Fig. 1, Fig. 3, Fig. S4) and E18.5 (Fig. 3, Fig. 4) embryos in the main figures, and considerably extended the data on E16.5 embryos (Fig. 1, Fig. 3). The figure legends and the text of the results section (p5-p6) have been modified accordingly. We now describe developmental defects in Kif7 -/- embryos that are not simple developmental delays. The sequences of thalamic axon development and cIN migration illustrate this complexity.

      Thalamic axons: the pioneer projection is misrouted to the amygdala at E14.5 (Fig. 4B) whereas most Kif7 -/- thalamic axons extend to the cortex at E16.5, with a slight delay compared to WT axons (Fig. 4D). At E18.5, the Kif7 -/- thalamo-cortical projection appears rather normal in the rostral forebrain but is drastically reduced in the median and caudal forebrain (Fig. 4E). This strong decrease is confirmed by neurofilament staining performed at E18.5 which identifies a major loss of corticofugal and thalamo-cortical projections in Kif7 -/- brains (Fig. 4F). 

      Migrating cIN: During normal development, CXCL12 maintains cIN in their tangential pathways as they start to colonize the cortical wall (E13.5/E14.5). Then CXCL12 drops in the SVZ (Tiveron et al., 2006; Caronia-Brown and Grove, 2011) allowing wild type cIN to invade the cortical plate (Stumm et al., 2003; Li et al., 2008; Atkins et al., 2023). In Kif7 -/- embryos, CXCL12 is never expressed in the SVZ of the dorsal cortex. Therefore Kif7 -/- cIN migrate radially in the dorsal cortex instead of tangentially. We have improved our text in the result section to clarify this transient defect (p8-9).

      (2) Figure 1D: The overview of the Gsh2 and Tbr2 stainings does not allow us to see details of the PSPB. The lines indicating the position of the PSPB are not helpful either. Higher magnifications are required to see whether there are subtle differences at these boundaries as observed for other cilia mutants.

      We thank the reviewer for her/his question, which allowed us to identify a mild patterning defect at the PSB, illustrated by high-magnification pictures in Fig. 1D and described in the results section (p5). This subtle defect of PSB patterning is consistent with previous observations in Kif7 -/- embryos (Putoux et al, 2019) and appears milder than the PSB defect in hypomorphic Gli3 Pdn mutants (PSB shifted dorsally and less well defined, as illustrated in Kuschel et al, 2003 and Magnani et al., 2010).

      (3) Figure 3: The authors report an interesting mis-projection of thalamocortical axons towards the amygdala. A very similar pattern has been described in Gli3 hypomorphic Pdn mutants (Magnani et al., 2010), in Rfx3, and in Inpp5e null mutant embryos (Magnani et al., 2015). These papers lend further support that this Kif7 phenotype is Gli3 dependent and should be cited in the manuscript. Moreover, the mechanism(s) underlying this mis-projection remain unexplored. Is this phenotype rescued in the previously reported Kif7/ Gli3D699 double mutants? Is there an abnormal expression of axon guidance molecules?

      We deeply thank the reviewer for drawing our attention to the abnormal projection of thalamic axons to the amygdala described in the Gli3 Pdn mutant and in two ciliary mutants, Rfx3 -/- and Inpp5e -/-. We cite these two papers (Magnani et al., 2010, 2015) in the revised manuscript (p7). In the Gli3 Pdn mutant, transplantation experiments show that a patterning defect of the ventral telencephalon (VT) underlies the mis-projection of the thalamus to the amygdala (Magnani et al, 2010). In the Rfx3 ciliary mutant, two possible mechanisms are proposed: a pre-thalamus patterning defect and ectopic Netrin and Slit1 expression in the VT (Magnani et al, 2015). We do agree that understanding the mechanism of the thalamic misprojection in the Kif7 mutant would be of great interest. However, given the complexity of the putative mechanisms described in the Gli3 Pdn and Rfx3 mutants, we believe that this question deserves further investigation in a future study. Finally, it is very unlikely that the thalamic projection defect observed in Kif7 -/- embryos would be rescued in Kif7/Gli3D699 double mutants, in which Gli3R is overexpressed in the dorsal and ventral forebrain. Our two main arguments are:

      (1) Magnani et al (2015) did not rescue the TCA pathfinding defect in the Rfx3 -/- ciliary mutant when they overexpressed GLI3-R (see TCA description in the Rfx3/Gli3D699 double mutant, last paragraph of the result section). The authors concluded “This finding could be explained by a requirement for Gli activator and not Gli repressor function in VT {ventral telencephalon} patterning and indeed, Gli3 western blots showed that the levels of Gli3R are not altered in the VT of Rfx3 -/- embryos”.

      (2) The GLI3-R/Gli3-FL ratio is decreased in the cortex of the Kif7 -/- embryos (dorsal telencephalon) as expected, whereas it is very low in the MGE of WT embryos (ventral telencephalon) and remains unaltered in the Kif7 -/- embryos (Fig. 2B).  

      Similarly, the analysis of Kif7 -/- cIN migratory defects leads us to conclude that Kif7 ablation impairs Gli activation function rather than Gli repressor function in the VT where cIN are generated.

      (4) Figure 4: The authors should discuss the difference between Tbr2 and Cxcl12 expression which does not extend into the dorsal-most cortical SVZ.

      We observed that the transient CXCL12 expression is lacking in the SVZ of the dorsal cortex of Kif7 -/- embryos at E14.5, in a region where TBR2 cells abnormally reach the cortical surface and intermingle with post-mitotic cells. A sentence in our previous version (lines 233-234) could suggest a link between the abnormal location of TBR2 expressing cells and the lack of CXCL12 expression. Having found no data in the literature to explain the absence of CXCL12 expression in the brain by an abnormal cellular environment or by a defect in transcription factor expression, we do not want to further elaborate on differences and similarities between TBR2 and CXCL12 expression patterns in the Kif7 -/- brain. We have modified our text accordingly in the result section of the revised manuscript (p8-9). 

      (5) Figure 5: The authors convincingly describe migratory defects of interneurons. The treatment with Shh agonist and antagonist provides some mechanistic insights but genetic or pharmacological rescue experiments would lend further support. For example, they could treat Kif7 mutant sections with Shh agonists or analyse Kif7/Gli3D699 double mutants.

      We thank the reviewer for her/his positive assessment of our analysis of the cIN migration. Unfortunately, the rescue experiments proposed by the reviewer should not help to further support our conclusions. First, Kif7 ablation in cIN prevents the processing of any SHH signal in the transcriptional pathway. Second, increasing GLI3R by crossing Kif7 -/- animals with Gli3D699 mice could possibly rescue the alterations of layering in the dorsal cortex where the GLI3R/GLI-FL ratio is strongly decreased and the SHH pathway activated. Such a rescue had been previously described for corpus callosum defects (Putoux et al., 2019). However, because cIN are generated in the ventral forebrain where SHH signaling predominantly activates the formation of GLI-A and where Kif7 ablation does not alter the GLI3 ratio, GLI3R re-introduction in the basal forebrain should rather increase the migratory defects of Kif7 -/- cIN instead of producing a rescue. To further support our conclusion, we analyzed the migratory behavior of Kif7 -/- cIN in a WT cortical environment. The results illustrated in the Fig. 6A and described in page 9 of the result section confirm that the migration defects of Kif7 -/-  cIN are reminiscent of an inhibition but not an activation of the  transcriptional SHH pathway (same phenotype as in Kif3a ciliary mutants described in Baudoin et al, 2012).

      (6) Figure 6: The authors describe the Shh mRNA and protein expression with relevance to interneuron migration. In contrast to the in situ hybridisation, the immunofluorescence analysis is not very convincing and requires further controls. The authors should at least show a no primary antibody control and, if available, could include a staining on Shh mutants. These additional controls are important as Shh protein expression in the developing cortex is highly controversial and a recent paper describes a different pattern (Manuel et al., 2022: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001563#). Moreover, it remains unclear whether the Shh protein expression is uniform within the cortex or follows lateral to medial or ventricular to pial gradients. A more thorough description and corresponding figures would be helpful. 

      Manuel et al. (2022) used the SHH KO (generated by Chiang et al., 1996) that develops a long proboscis to validate the rabbit anti-SHH antibodies (from Genetech) used in their study. They show a lack of SHH signal in the SHH KO. However, it is difficult to identify the cortex in this mouse line and the authors did not specify which part of the SHH protein was used to generate antibodies. We wished to use the SHH KO generated by Chiang and backcrossed on a C57BL/6 line (Rash and Grove, 2007) that develops a layered neocortex at E17.5. However,

      (1) the SHH KO was obtained by replacing exon2 with a PGK-neo cassette and could express a 101 aa truncated protein comprising the N-ter part of the protein, and

      (2) the antibody we used is a polyclonal N-ter antibody that targets the active SHH protein (the Cys25-Gly198 part of the SHH protein was used as immunogen to produce the antibody). We thus thought that this labeling experiment would not give information on the specificity of the antibody, since the antibody could recognize epitopes retained in the truncated protein produced in the SHH KO.

      To overcome the lack of a suitable mutant mouse to validate the SHH N-ter antibodies, we analyzed the SHH immunostaining pattern at E12.5 and compared the expression profile with previously published SHH mRNA expression patterns. The border of the third ventricle and the ZLI were strongly immunostained by SHH-Nter antibodies, and these regions were shown to express SHH mRNA at E12.5-E13.5 (Kicker et al. 2004, Loulier et al., 2005, Sahara et al., 2007 and Fig. 7B1). In brain sections at E14.5, only the choroid plexus was strongly labeled and some structures showed diffuse labeling. We analyzed the distribution of SHH mRNAs in the cortex at E14.5 using a highly sensitive technique (RNAscope) and showed that very few cortical cells expressed SHH mRNA, and at a very low level. Anti-SHH-Nter antibodies immunostained numerous bright dots throughout the cortical neuropil, which is not surprising for a diffusible factor like SHH. However, the labeling was not homogeneous and showed a ventricle-to-pial gradient at E12.5 and aligned distributions in the different cortical layers at E14.5. We have described the expression pattern in more detail and modified Fig. S4 by adding an image of immunostaining performed without SHH N-ter antibody.

      (7) Figure S1: The Gli3 Western blot needs to be quantified. As the authors only show one control and one mutant sample, it remains unclear how representative this blot is. In addition to Gli3R and Gli3FL, the authors should also determine the ratio of both isoforms. Are there also differences in the MGE?

      We now present results of Gli3 western blots in the cortex and MGE of several E14.5 Kif7 KO (n=4) and WT (n=4) embryos. The GLI3R/GLI3FL ratio has been determined in the cortex and in the MGE of WT and mutant embryos. Results are illustrated in Fig. 2.

      Minor points:

      The authors should carefully amend the literature on Gli genes and forebrain development. For example:

      (1) Line 85: Add Hasenpusch-Theil et al., 2018.

      We added this reference.

      (2) Line 141: Remove Magnani et al., 2010 (they characterized hypomorphic Gli3 Pdn mutants) and replace with Kuschel et al., 2003.

      Since our revised figure 2 illustrates GLI3 western blots and compares GLI3R/GLI3FL ratios in the cortex and MGE of WT and Kif7 -/- embryos, we no longer cite these papers in the result section.

      (3) Line 380: Replace reference with Theil, 2005.

      We have replaced Magnani et al, 2014 by Theil 2005 in the sentence.

      (4) Line 414: Rallu et al is not an appropriate reference for this as this manuscript does not investigate the expression of a single cortical marker in Shh/Gli3 double mutants.

      We removed the reference Rallu et al. in the sentence.

      (5) Reference in line 355: do not use Vancouver style.

      We apologize for the mistake, which has been corrected.

      (6) Spelling: Line 447 it should read "choroid plexus"

      We again apologize for the mistake, which has been corrected.

      Reviewer #2 (Public review):

      Summary:

      This study investigates the role of KIF7, a ciliary kinesin involved in the Sonic Hedgehog (SHH) signaling pathway, in cortical development using Kif7 knockout mice. The researchers examined embryonic cortex development (mainly at E14.5), focusing on structural changes and neuronal migration abnormalities.

      Strengths:

      (1) The phenotype observed is interesting, and the findings provide neurodevelopmental insight into some of the symptoms and malformations seen in patients with KIF7 mutations.

      (2) The authors assess several features of cortical development, including structural changes in layers of the developing cortex, connectivity of the cortex with the thalamus, as well as migration of cINs from CGE and MGE to the cortex.

      We greatly thank Reviewer 2 for her/his positive assessment of our work, which characterizes the neurodevelopmental defects induced by KIF7 ablation. We have extensively reorganized the figures and added data to show changes occurring in different cortical cell types and at different stages. We have moreover corrected and clarified the text where necessary. Our detailed responses to the reviewer’s recommendations appear at the end of each comment.

      Weaknesses:

      (1) The Kif7 null does have phenotype differences from individual mutations seen in patients. It would be interesting to add more thoughts about how the null differs from these mutants in ciliary structure and SHH signaling via the cilium.

      We are grateful to the Reviewer for recalling that Kif7 ablation alters SHH signaling within primary cilium and has a strong effect on ciliary structure. In the revised version of the manuscript, we discuss data from the literature that describe these alterations in human (Putoux et al, 2011) and in murine KIF7 depleted cells (He et al, 2015; Cheung et al., 2009; Lai et al., 2021) (discussion p13).

      (2) The description of altered cortex development at E14.5 is perhaps rather descriptive. It would be useful to assess more closely the changes occurring in different cell types and stages. For this it seems very important to have a time course of cortical development and how the structural organization changes over time. This would be easy to assess with the addition of serial sections from the same. It might also be interesting to see how SHH signaling is altered in different cortical cell types over time with a SHH signaling reporter mouse.

      We thank the Reviewer for her/his request, which helped us to improve our description of developmental defects in the Kif7 -/- cortex. In the revised manuscript, we have expanded our study, initially focused on embryonic stage E14.5, to earlier (E12.5) and later (E16.5, E18.5/P0) developmental stages. Instead of focusing on median forebrain sections, we have expanded our observations to rostral and caudal sections. Altogether, these new observations allow us to describe more precisely the complex developmental defects in the Kif7 -/- cortex over time, in specific cortical regions (dorsal versus lateral cortex, and rostral versus caudal levels). Figures 1, 3, 4, and S4 have been extensively revised to present new data on E12.5 (Fig. 1, Fig. 3, Fig. S4), E16.5 (Fig. 1, Fig. 3) and E18.5 (Fig. 3, Fig. 4) embryos. We have modified the legends and text in the result section (p5-6) accordingly. We agree with the Reviewer that deciphering how SHH signaling is altered in the different cortical cells over time would be highly interesting and relevant. Nevertheless, we anticipate complex analyses and consider that they should be reserved for future studies.

      (3) Abnormal neurodevelopmental phenotypes have been widely reported in the absence of other key genes affecting primary cilia function (Willaredt et al., J Neurosci 2008; Guo et al., Nat Commun 2015). It would be interesting to have more discussion of how the Kif7 null phenotype compares to some of these other mutants.

      We agree with the Reviewer's concern. In the revised manuscript, we discuss our results with regard to previous observations in other ciliary mutants. The murine cobblestone mutant described in Willaredt et al. (2008) indeed shows defects similar to those we describe in the Kif7 -/- mouse. We thank the Reviewer again for her/his helpful comment, which allowed us to strengthen and better interpret our results. Guo et al (2015) did not conduct a study of ciliary mutants. Nevertheless, their characterization of cortical developmental defects following invalidation of genes involved in human ciliopathies identified cell autonomous defects in cortical progenitors and in differentiating cortical neurons, which corroborate our observations (p.15).

      (4) The authors see alterations in cIN migration to the cortex and observe distinct differences in the pattern of expression of Cxcl12 as well as suggest cell-intrinsic differences within cIN in their ability to migrate. The slice culture experiments though make it a little difficult to interpret the cell intrinsic effects on cIN of loss of Kif7, as the differences in Cxcl12 patterns still exist presumably in the slice cultures. It would be useful to assess their motility in an assay where they were isolated, as well as assess transcriptional changes in cINs in vivo lacking KIF7 for expression patterns that may affect motility or other aspects of migration.

      To circumvent the difference in the expression profile of CXCL12 in the dorsal cortex of WT and Kif7 -/- embryos on the migratory behavior of cIN, we compared the trajectories and dynamics of WT and Kif7 -/- cIN imaged in the lateral cortex where CXCL12 expression appears similar in WT and Kif7 -/- brains.

      We moreover followed the reviewer's recommendation and analyzed the migratory behavior of Kif7 -/- cIN that migrate as isolated cells on a dissociated substrate of WT cortical cells. We sincerely thank the reviewer for her/his suggestion, as the results revealed an interesting and relevant ciliary phenotype in migrating Kif7 -/- cIN. This additional experiment confirms that Kif7 -/- cIN exhibit the same migratory defects as those initially characterized in the Kif3a -/- ciliary mutant. The new results are illustrated in Fig. 6A and described in the result section (p9). We agree with the reviewer that the analysis of transcriptional changes that could affect Kif7 -/- cIN motility and migration would be very interesting, but this study is beyond the scope of the present article.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Review:

      Review #1 (Public review):

      Also, they observed no difference in the binding free energy of phosphatidyl-serine with wild TREM2-Ig and mutant TREM2-Ig, which is a bit inconsistent with the previous report with experiment studies by Journal of Biological Chemistry 293, (2018), Alzheimer's and Dementia 17, 475-488 (2021), Cell 160, 1061-1071 (2015).

      We agree with the reviewer that our results do not fully recapitulate experimental findings and directly note this in the body of our work, particularly given the known limitations of free energy calculations in MD simulations, as outlined in the Limitations section. Our claim is that the loss-of-function effects of the R47H variant extend beyond decreased binding affinities which are likely due to variable binding patterns. We have also re-analyzed and highlighted statistically significant differences in interaction entropies. Ultimately, our claim is that mutational effects extend beyond experimentally confirmed differences in binding affinities.

      Perhaps the authors made significant efforts to run a number of simulations for multiple models, which is nearly 17 microseconds in total; none of the simulations has been repeated independently at least a couple of times, which makes me uncomfortable to consider this finding technically true. Most of the important conclusions that authors claimed, including the opposite results from previous research, have been made on the single run, which raises the question of whether this observation can be reproduced if the simulation has been repeated independently. Although the authors stated the sampling number and length of MD simulations in the current manuscript as a limitation of this study, it must be carefully considered before concluding rather than based on a single run.

      To address this comment, we have added numerous replicates to our simulations of WT and R47H (s)TREM2 without lipids and substantially increased the total simulation time. Each pure protein system now has six total microsecond-long technical replicates. The addition of replicates strengthens the validity of the work and allows us to make stronger novel conclusions than with one simulation alone, particularly for claims regarding the CDR2 loop and sTREM2 stalk.  In our models with phospholipids, running multiple independent biological replicates of the same system offers a more rigorous methodology than simply repeating simulations of the same docked model. This strategy allows us to sample several distinct starting configurations, thereby minimizing biases introduced by docking algorithms and single-model reliance.

      sTREM2 shows a neuroprotective effect in AD, even with the mutations with R47H, as evidenced by authors based on their simulation. sTREM2 is known to bind Aβ within the AD and reduce Aβ aggregation, whereas R47H mutant increases Aβ aggregation. I wonder why the authors did not consider Aβ as a ligand for their simulation studies. As a reader in this field, I would prefer to know the protective mechanism of sTREM2 in Aβ aggregation influenced by the stalk domain.

      Our initial approach for this study used Aβ as a ligand rather than phospholipids. However, we noted the difficulties in simulating Aβ, particularly in choosing relevant Aβ structures and oligomeric states (n-mers). We believe that phospholipids represent an equally pertinent ligand for TREM2, given its critical role in lipid sensing and metabolism. Furthermore, there is growing recognition in the AD research community of the need to move beyond Aβ and focus on other understudied pathological mechanisms.

      In a similar manner, why only one mutation is considered "R47H" for the study? There are more severe mutations reported to disrupt tethering between these CDRs, such as T66M. Although this "T66M" is not associated with AD, I guess the stalk domain protective mechanism would not be biased among different diseases. Therefore, it would be interesting to see whether the findings are true for this T66M.

      In most previous studies, the mechanism for CDR destabilization by mutant was explored, like the change of secondary structures and residue-wise interloop interaction pattern. While this is not considered in this manuscript, neither detailed residue-wise interaction that changed by mutant or important for "ligand binding" or "stalk domain".

      These are both excellent points that deserve extensive investigation, although we note that our paper does include significant protein-protein and protein-ligand interaction mapping that encompasses both the CDR2 loop and stalk, analyses which were not performed in any previous papers. In a separate paper, we explored more detailed residue-wise interactions for the CDR2 loop (Lietzke et al., Alzheimer’s and Dementia, 2025). While R47H is the most common and prolific mutation in literature, an extensive catalog of other mutations is important to explore. To this end, we are currently preparing a separate publication that will explore a larger mutational library and include more detailed sTREM2 analyses. 

      The comparison between the wild and mutant and other different complex structures must be determined by particular statistical calculations to state the observed difference between different structures is significant. Since autocorrelation is one of the major concerns for MD simulation data for predicting statistical differences, authors can consider bootstrap calculations for predicting statistical significance.

      The addition of numerous replicates across systems negates potential effects from autocorrelation and allows us to include standard deviations to critically assess the validity of our claims.

      Review #2 (Public review):

      The authors state that reported differences in ligand binding between the TREM2 and sTREM2 remain unexplained, and the authors cite two lines of evidence. The first line of evidence, which is true, is that there are differences between lipid binding assays and lipid signaling assays. However, signaling assays do not directly measure binding. Secondly, the authors cite Kober et al 2021 as evidence that sTREM2 and TREM2 showed different affinities for Abeta1-42 in a direct binding assay. Unfortunately, when Kober et al measured the binding of sTREM2 and Ig-TREM2 to Abeta they reported statistically identical affinities (Kd = 3.8 ± 2.9 µM vs 5.1 ± 3.7 µM) and concluded that the stalk did not contribute measurably to Abeta binding.

      We appreciate the reviewer’s insight and acknowledge the need to clarify our interpretation of Kober et al. (2021). We have adjusted how we cite Kober et al. and reframed the first paragraph in the second results section.

      In line with these findings, our energy calculations reveal that sTREM2 exhibits weaker—but still not statistically significant—binding affinities for phospholipids compared to TREM2. These results suggest that while overall binding affinity might be similar, differences in binding patterns or specific lipid interactions could still contribute to functional differences observed between TREM2 and sTREM2.

      The authors appear to take simulations of the Ig domain (without any stalk) as a surrogate for the full-length, membrane-bound TREM2. They compare the Ig domain to a sTREM2 model that includes the stalk. While it is fully plausible that the stalk could interact with and stabilize the Ig domain, the authors need to demonstrate why the full-length TREM2 could not interact with its own stalk and why the isolated Ig domain is a suitable surrogate for this state.

      We believe that this is a major limitation of all computational work of TREM2 to-date, and of experimental work which only presents the Ig-like domain. This is extensively discussed in the limitations section of our paper and treated carefully throughout the text. We are currently working toward a separate manuscript that will represent the first biologically relevant model of full-length TREM2 in a membrane and will rigorously assess the current paradigm of using the Ig-like domain as an experimental surrogate for TREM2.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Perhaps the authors made significant efforts to run a number of simulations for multiple models, which is nearly 17 microseconds in total; none of the simulations has been repeated independently at least a couple of times, which makes me uncomfortable to consider this finding technically true. Most of the important conclusions that authors claimed, including the opposite results from previous research, have been made on the single run, which raises the question of whether this observation can be reproduced if the simulation has been repeated independently. Although the authors stated the sampling number and length of MD simulations in the current manuscript as a limitation of this study, it must be carefully considered before concluding rather than based on a single run.

      To address this comment, we have added numerous replicates to our simulations of WT and R47H (s)TREM2 without lipids and substantially increased the total simulation time. Each pure protein system now has six total microsecond-long technical replicates. The addition of replicates strengthens the validity of the work and allows us to make stronger novel conclusions than with one simulation alone, particularly for claims regarding the CDR2 loop and sTREM2 stalk.  In our models with phospholipids, running multiple independent biological replicates of the same system offers a more rigorous methodology than simply repeating simulations of the same docked model. This strategy allows us to sample several distinct starting configurations, thereby minimizing biases introduced by docking algorithms and single-model reliance. 

      (2) sTREM2 shows a neuroprotective effect in AD, even with the mutations with R47H, as evidenced by authors based on their simulation. sTREM2 is known to bind Aβ within the AD and reduce Aβ aggregation, whereas R47H mutant increases Aβ aggregation. I wonder why the authors did not consider Aβ as a ligand for their simulation studies. As a reader in this field, I would prefer to know the protective mechanism of sTREM2 in Aβ aggregation influenced by the stalk domain.

      Our initial approach for this study used Aβ as a ligand rather than phospholipids. However, we noted the difficulties in simulating Aβ, particularly in choosing relevant Aβ structures and oligomeric states (n-mers). We believe that phospholipids represent an equally pertinent ligand for TREM2, given its critical role in lipid sensing and metabolism. Furthermore, there is growing recognition in the AD research community of the need to move beyond Aβ and focus on other understudied pathological mechanisms.

      (3) In a similar manner, why only one mutation is considered "R47H" for the study? There are more severe mutations reported to disrupt tethering between these CDRs, such as T66M. Although this "T66M" is not associated with AD, I guess the stalk domain protective mechanism would not be biased among different diseases. Therefore, it would be interesting to see whether the findings are true for this T66M.

      (4) In most previous studies, the mechanism for CDR destabilization by mutant was explored, like the change of secondary structures and residue-wise interloop interaction pattern. While this is not considered in this manuscript, neither detailed residue-wise interaction that changed by mutant or important for "ligand binding" or "stalk domain".

      These are both excellent points that deserve extensive investigation, although we note that our paper does include significant protein-protein and protein-ligand interaction mapping that encompasses both the CDR2 loop and stalk, analyses which were not performed in any previous papers. In a separate paper, we explored more detailed residue-wise interactions for the CDR2 loop (Lietzke et al., Alzheimer’s and Dementia, 2025). While R47H is the most common and prolific mutation in the literature, an extensive catalog of other mutations is important to explore. To this end, we are currently preparing a separate publication that will explore a larger mutational library and include more detailed sTREM2 analyses.

      (5) Comparisons between the wild type, the mutant, and the other complex structures must be supported by appropriate statistical calculations in order to state that the observed differences between structures are significant. Since autocorrelation is a major concern when assessing statistical differences in MD simulation data, the authors could consider bootstrap calculations to assess statistical significance.

      The addition of numerous replicates across systems negates potential effects from autocorrelation and allows us to include standard deviations to critically assess the validity of our claims.
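
      As an illustration of the bootstrap calculation the reviewer suggests, and not the analysis performed in the study, a replicate-level bootstrap could be sketched as follows. The sketch assumes each independent replicate has already been reduced to a single summary value, so that the resampling unit is the replicate and not a correlated trajectory frame. The function name, the placeholder numbers, and the choice of a percentile confidence interval are all assumptions made for this example.

```python
import random
import statistics


def bootstrap_mean_difference(group_a, group_b, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(group_a) - mean(group_b).

    Each input holds ONE summary value per independent replicate, so the
    resampling unit is the replicate, not a correlated trajectory frame.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample replicates with replacement within each group.
        resampled_a = rng.choices(group_a, k=len(group_a))
        resampled_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.fmean(resampled_a) - statistics.fmean(resampled_b))
    diffs.sort()
    # Percentile interval; indices are clamped to stay inside the list.
    lower = diffs[max(0, int((alpha / 2) * n_resamples))]
    upper = diffs[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    return observed, lower, upper


# Placeholder values only (NOT results from the study): six per-replicate
# averages of a hypothetical observable for each system.
wt_replicate_means = [1.0, 1.1, 0.9, 1.2, 1.0, 1.1]
r47h_replicate_means = [1.6, 1.5, 1.7, 1.4, 1.6, 1.8]

observed, lower, upper = bootstrap_mean_difference(r47h_replicate_means, wt_replicate_means)
print(f"difference = {observed:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

      If the resulting interval excludes zero, the difference between the two systems would be judged significant at the chosen level. With only six replicates per system the interval is coarse, so it is best read alongside the reported standard deviations.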

      Reviewer #2 (Recommendations for the authors):

      Major points:

      (1) I encourage the authors to review Figure 5D and the text of section 2.7 from Kober et al 2021, which argued that "(t)he identical (within error) binding affinities indicated that the TREM2 Ig domain composes the majority (if not entirety) of the mAβ42 binding surface."

      We appreciate the reviewer’s insight and acknowledge the need to clarify our interpretation of Kober et al. (2021). We have adjusted how we cite Kober et al. and reframed the first paragraph of the second results section.

      (2) The abstract and text need extensive revision to address the major concerns, which jeopardize the biological premise and significance of the work.

      We have revised the abstract and text to address these concerns.

      (3) The title and abstract should change to reflect the contents of the paper. The authors do not directly measure lipid binding, nor are any of the computations done in a membrane environment. The authors do not measure anything in the brain.

      We have modified the title to better reflect the content of the paper. The paper measures lipid binding in the form of free energy calculations and interaction maps.

      Minor points:

      (1) How does the conservation of the TREM2 stalk compare to the Ig domain as they relate to the TREM2 family?

      While this study may inspire further exploration of other TREM receptors, we do not believe that our results extend to other TREM family members because of relatively low homology.

      (2) Please show the locations of the glycosylation sites on a model in Figure 1 and discuss their potential contribution to the ligand binding surfaces.

      N-linked glycosylation sites are now marked on the sequence map in Figure 1 and described in the text.

      (3) There is an isoform of TREM2 that produces a secreted product that is similar to the sTREM2 produced by proteolysis. The authors should comment as to whether their findings would apply to secreted TREM2.

      We have addressed this with a new line in the ‘Ideas and Speculation’ section.

      (4) This sentence on p. 2, line 73 references a review, not a study:

      This has been corrected.

      (5) "Yet, one study suggested effective TREM2 stimulation by PLs may require co-presentation with other molecules, potentially reflecting the nature of lipoprotein endocytosis30"

      This has been corrected.

      (6) Is "inclusive" on line 88 a typo for inconclusive?

      This has been corrected.

      (7) "Further, there is a strong correlation between the levels of sTREM2 in the cerebrospinal fluid and that of Tau, however correlation with Aβ is inclusive"

      This has been corrected.