7,306 Matching Annotations
  1. Oct 2025
    1. Suzanne Briet: Physical evidence as document

      In part, I appreciate the pragmatism of Briet's approach. It would certainly make a cataloger's life easier to view documents in this way and, on its surface, it makes a tremendous amount of "sense".

      However, I can't help but feel this view is a little too limited. Certainly, it seems to me, the antelope itself would be a source of information. In one way it is an example of what an "antelope" is, but it is also an individual and, beyond that, an individual at a certain snapshot in time.

      In a very broad view, nothing is truly permanent, since all things are constantly changing. I think so much depends on how we observe and on the time scale we choose.

      Human beings are not even exactly what we were in the past. We grow (both physically and in other ways), we change (we age, we change our minds, we change our clothes, we get tattoos, we erase tattoos) and eventually each of us, as an individual, will cease to exist by any observable means (depending on your belief system) other than by the "things" we leave behind.

      We also continue to exist, in a sense, in the minds of those who knew us, but their memories cannot be a whole picture of who we were, and certainly no one can truly know how we are inside our own heads. Others will also bring their own biases or preferences to their memories of us, which makes that picture less complete still.

    1. Author response:

      Reviewer #1 (Public review):

      In this important study, the authors develop a suite of machine vision tools to identify and align fluorescent neuronal recording images in space and time according to neuron identity and position. The authors provide compelling evidence for the speed and utility of these tools. While such tools have been developed in the past (including by the authors), the key advancement here is the speed and broad utility of these new tools. While prior approaches based on steepest descent worked, they required hundreds of hours of computational time, while the new approaches outlined here are >600-fold faster. The machine vision tools here should be immediately useful to readers specifically interested in whole-brain C. elegans data, but also for more general readers who may be interested in using BrainAlignNet for tracking fluorescent neuronal recordings from other systems.

      I really enjoyed reading this paper. The authors had several ground truth examples to quantify the accuracy of their algorithms and identified several small caveats users should consider when using these tools. These tools were primarily developed for C. elegans, an animal with stereotyped development, but whose neurons can be variably located due to internal motion of the body. The authors provide several examples of how BrainAlignNet reliably tracked these neurons over space and time. Neuron identity is also important to track, and the authors showed how AutoCellLabeler can reliably identify neurons based on their fluorescence in the NeuroPAL background. A challenge with NeuroPAL, though, is the high expression of several fluorophores, which compromises behavioral fidelity. The authors provide some possible avenues where this problem can be addressed by expressing fewer fluorophores. While using all four channels provided the best performance, only using the tagRFP and CyOFP channels was sufficient for performance that was close to full performance using all 4 NeuroPAL channels. This result indicates that the development of future lines with less fluorophore expression could be sufficient for reliable neuronal identification, which would decrease the genetic load on the animal, but also open other fluorescent channels that could be used for tracking other fluorescent tools/markers. Even though these tools were developed for C. elegans specifically, the authors showed BrainAlignNet can be applied to other organisms as well (in their case, the cnidarian C. hemisphaerica), which broadens the utility of their tools.

      Strengths:

      (1) The authors have a wealth of ground-truth training data to compare their algorithms against, and provide a variety of metrics to assess how well their new tools perform against hand annotation and/or prior algorithms.

      (2) For BrainAlignNet, the authors show how this tool can be applied to other organisms besides C. elegans.

      (3) The tools are publicly available on GitHub, which includes useful README files and installation guidance.

      We thank the reviewer for noting these strengths of our study.

      Weaknesses:

      (1) Most of the utility of these algorithms is for C. elegans specifically. Testing their algorithms (specifically BrainAlignNet) on more challenging problems, such as whole-brain zebrafish, would have been interesting. This is a very, very minor weakness, though.

      We appreciate the reviewer’s point that expanding to additional animal models would be valuable. In the study, we have so far tested our approaches on C. elegans and jellyfish. Given that this is considered a ‘very, very minor weakness’ and that it does not directly affect the results or analyses in the paper, we think this might be better to address in future work.

      (2) The tools are benchmarked against their own prior pipeline, but not against other algorithms written for the same purpose.

      We agree that it would be valuable to benchmark other labs’ software pipelines on our datasets. We note that most papers in this area, which describe those pipelines, provide the same performance metrics that we do (accuracy of neuron identification, tracking accuracy, etc), so a crude, first-order comparison can be obtained by comparing the numbers in the papers. But, we agree that a rigorous head-to-head comparison would require applying these different pipelines to a common dataset. We considered performing these analyses, but we were concerned that using other labs’ software ‘off the shelf’ on our data might not represent those pipelines in their best light when compared to our pipeline that was developed with our data in mind. Data from different microscopy platforms can be surprisingly different and we wouldn’t want to perform an analysis that had this bias. Therefore, we feel that this comparison would be best pursued by all of these labs collaboratively (so that they can each provide input on how to run their software optimally). Indeed, this is an important area for future study. In this spirit, we have been sharing our eat-4::GFP datasets (that permit quantification of tracking accuracy) with other labs looking for additional ways to benchmark their tracking software.

      We also note that there are not really any pipelines to directly compare against CellDiscoveryNet, as we are not aware of any other fully unsupervised approach for neuron identification in C. elegans.

      (3) Considerable pre-processing was done before implementation. Expanding upon this would improve accessibility of these tools to a wider audience.

      Indeed, some pre-processing was performed on images before registration and neuron identification -- understanding these nuances can be important. The pre-processing steps are described in the Results section and detailed in the Methods. They are also all available in our open-source software. For BrainAlignNet, the key steps were: (1) selecting image registration problems, (2) cropping, and (3) Euler alignment. Steps (1) and (3) were critically important and are extensively discussed in the Results and Discussion sections of our study (lines 142-144, 218-234, 318-323, 704-712). Step (2) is standard in image processing. For AutoCellLabeler and CellDiscoveryNet, the pre-processing was primarily to align the 4 NeuroPAL color channels to each other (i.e. make sure the blue/red/orange/etc channels for an animal are perfectly aligned). This is also just a standard image processing step to ensure channel alignment. Thus, the more “custom” pre-processing steps were extensively discussed in the study and the more “common” steps are still described in the Methods. The implementation of all steps is available in our open-source software.

      Reviewer #2 (Public review):

      Summary:

      The paper introduces a pipeline for analyzing brain imaging data from freely moving animals, which requires registering deforming tissues and maintaining consistent cell identities over time. The pipeline consists of three neural networks that are built upon existing models: BrainAlignNet for non-rigid registration, AutoCellLabeler for supervised annotation of over 100 neuronal types, and CellDiscoveryNet for unsupervised discovery of cell identities. The ambition of the work is to enable high-throughput and largely automated pipelines for neuron tracking and labeling in deforming nervous systems.

      Strengths:

      (1) The paper tackles a timely and difficult problem, offering an end-to-end system rather than isolated modules.

      (2) The authors report high performance within their dataset, including single-pixel registration accuracy, nearly complete neuron linking over time, and annotation accuracy that exceeds individual human labelers.

      (3) Demonstrations across two organisms suggest the methods could be transferable, and the integration of supervised and unsupervised modules is of practical utility.

      We thank the reviewer for noting these strengths of our study.

      Weaknesses:

      (1) Lack of solid evaluation. Despite strong results on their own data, the work is not benchmarked against existing methods on community datasets, making it hard to evaluate relative performance or generality.

      We agree that it would be valuable to benchmark many labs’ software pipelines on some common datasets, ideally from several different research labs. We note that most papers in this area, which describe the other pipelines that have been developed, provide the same performance metrics that we do (accuracy of neuron identification, tracking accuracy, etc), so a crude, first-order comparison can be obtained by comparing the numbers in the papers. But, we agree that a rigorous head-to-head comparison would require applying these different pipelines to a common dataset. We considered performing these analyses, but we were concerned that using other labs’ software ‘off the shelf’ and comparing the results to our pipeline (where we have extensive expertise) might bias the performance metrics in favor of our software. Therefore, we feel that this comparison would be best pursued by all of these labs collaboratively (so that they can each provide input on how to run their software optimally). Indeed, this is an important area for future study. In this spirit, we have been sharing our eat-4::GFP datasets (that permit quantification of tracking accuracy) with other labs looking for additional ways to benchmark their tracking software.

      We also note that there are not really any pipelines to directly compare against CellDiscoveryNet, as we are not aware of any other fully unsupervised approach for neuron identification in C. elegans.

      (2) Lack of novelty. All three models do not incorporate state-of-the-art advances from the respective fields. BrainAlignNet does not learn from the latest optical flow literature, relying instead on relatively conventional architectures. AutoCellLabeler does not utilize the advanced medNeXt3D architectures for supervised semantic segmentation. CellDiscoveryNet is presented as unsupervised discovery but relies on standard clustering approaches, with limited evaluation on only a small test set.

      We appreciate that the machine learning field moves fast. Our goal was not to invent entirely novel machine learning tools, but rather to apply and optimize tools for a set of challenging, unsolved biological problems. We began with the somewhat simpler architectures described in our study and were largely satisfied with their performance. It is conceivable that newer approaches would perhaps lead to even greater accuracy, flexibility, and/or speed. But, oftentimes, simple or classical solutions can adequately resolve specific challenges in biological image processing.

      Regarding CellDiscoveryNet, our claim of unsupervised training is precise: CellDiscoveryNet is trained end-to-end only on raw images, with no human annotations, pseudo-labels, external classifiers, or metadata used for training, model selection, or early stopping. The loss is defined entirely from the input data (no label signal). By standard usage in machine learning, this constitutes unsupervised (often termed “self-supervised”) representation learning. Downstream clustering is likewise unsupervised, consuming only image pairs registered by CellDiscoveryNet and neuron segmentations produced by our previously-trained SegmentationNet (which provides no label information).

      (3) Lack of robustness. BrainAlignNet requires dataset-specific training and pre-alignment strategies, limiting its plug-and-play use. AutoCellLabeler depends heavily on raw intensity patterns of neurons, making it brittle to pose changes. By contrast, current state-of-the-art methods incorporate spatial deformation atlases or relative spatial relationships, which provide robustness across poses and imaging conditions. More broadly, the ANTSUN 2.0 system depends on numerous manually tuned weights and thresholds, which reduces reproducibility and generalizability beyond curated conditions.

      Regarding BrainAlignNet: we agree that we trained on each species’ own data (worm, jellyfish) and we would suggest that other labs working on new organisms do the same based on our current state of knowledge. It would be fantastic if there were an alignment approach that generalized to all possible cases of non-rigid registration in all animals – an important area for future study. We also agree that pre-alignment was critical in worms and jellyfish, which we discuss extensively in our study (lines 142-144, 318-321, 704-712).

      Regarding AutoCellLabeler: the animals were not recorded in any standardized pose and were not aligned to each other beforehand – they were basically in a haphazard mix of poses and we used image augmentation to allow the network to generalize to other poses, as described in our study. It is still possible that AutoCellLabeler is somehow brittle to pose changes (e.g. perhaps extremely curved worms) – while we did not detect this in our analyses, we did not systematically evaluate performance across all possible poses. However, we do note that this network was able to label images taken from freely-moving worms, which by definition exhibit many poses (Figure 5D, lines 500-525); aggregating the network’s performance across freely-moving data points allowed it to nearly match its performance on high-SNR immobilized data. This suggests a degree of robustness of the AutoCellLabeler network to pose changes.

      Regarding ANTSUN 2.0: we agree that there are some hyperparameters (described in our study) that affect ANTSUN performance. We agree that it would be worthwhile to fully automate setting these in future iterations of the software.

      Evaluation:

      To make the evaluation more solid, it would be great for the authors to (1) apply the new method on existing datasets and (2) apply baseline methods on their own datasets. Otherwise, without comparison, it is unclear if the proposed method is better or not. The following papers have public challenging tracking data: https://elifesciences.org/articles/66410, https://elifesciences.org/articles/59187, https://www.nature.com/articles/s41592-023-02096-3.

      Please see our response to your point (1) under Weaknesses above.

      Methodology:

      (1) The model innovations appear incrementally novel relative to existing work. The authors should articulate what is fundamentally different (architectural choices, training objectives, inductive biases) and why those differences matter empirically. Ablations isolating each design choice would help.

      There are other efforts in the literature to solve the neuron tracking and neuron identification problems in C. elegans (please see paragraphs 4 and 5 of our Introduction, which are devoted to describing these). However, they are quite different in the approaches that they use, compared to our study. For example, for neuron tracking they use t->t+1 methods, or model neurons as point clouds, etc (a variety of approaches have been tried). For neuron identification, they work on extracted features from images, or use statistical approaches rather than deep neural networks, etc (a variety of approaches have been tried). Our assessment is that each of these diverse approaches has strengths and drawbacks; we agree that a meta-analysis of the design choices used across studies could be valuable.

      We also note that there are not really any pipelines to directly compare against CellDiscoveryNet, as we are not aware of any other fully unsupervised approach for neuron identification in C. elegans.

      (2) The pipeline currently depends on numerous manually set hyperparameters and dataset-specific preprocessing. Please provide principled guidelines (e.g., ranges, default settings, heuristics) and a robustness analysis (sweeps, sensitivity curves) to show how performance varies with these choices across datasets; wherever possible, learn weights from data or replace fixed thresholds with data-driven criteria.

      We agree that there are some ANTSUN 2.0 hyperparameters (described in our Methods section) that could affect the quality of neuron tracking. It would be worthwhile to fully automate setting these in future iterations of the software, ensuring that the hyperparameter settings are robust to variation in data/experiments.

      Appraisal:

      The authors partially achieve their aims. Within the scope of their dataset, the pipeline demonstrates impressive performance and clear practical value. However, the absence of comparisons with state-of-the-art algorithms such as ZephIR, fDNC, or WormID, combined with small-scale evaluation (e.g., ten test volumes), makes the strength of evidence incomplete. The results support the conclusion that the approach is useful for their lab's workflow, but they do not establish broader robustness or superiority over existing methods.

      We wish to remind the reviewer that we developed BrainAlignNet for use in worms and jellyfish. These two animals have different distributions of neurons and radically different anatomy and movement patterns. Data from the two organisms was collected in different labs (Flavell lab, Weissbourd lab) on different types of microscopes (spinning disk, epifluorescence). We believe that this is a good initial demonstration that the approach has robustness across different settings.

      Regarding comparisons to other labs’ C. elegans data processing pipelines, we agree that it will be extremely valuable to compare performance on common datasets, ideally collected in multiple different research labs. But we believe this should be performed collaboratively, with input from each lab, so that each pipeline is run in its best light, as described above.

      Impact:

      Even though the authors have released code, the pipeline requires heavy pre- and post-processing with numerous manually tuned hyperparameters, which limits its practical applicability to new datasets. Indeed, even within the paper, BrainAlignNet had to be adapted with additional preprocessing to handle the jellyfish data. The broader impact of the work will depend on systematic benchmarking against community datasets and comparison with established methods. As such, readers should view the results as a promising proof of concept rather than a definitive standard for imaging in deformable nervous systems.

      Regarding worms vs jellyfish pre-processing: we actually had the exact opposite reaction to that of the reviewer. We were surprised at how similar the pre-processing was for these two very different organisms. In both cases, it was essential to (1) select appropriate registration problems to be solved; and (2) perform initialization with Euler alignment. Provided that these two challenges were solved, BrainAlignNet mostly took care of the rest. This suggests a clear path for researchers who wish to use this approach in another animal. Nevertheless, we also agree with the reviewer’s caution that a totally different use case could require some re-thinking or re-strategizing. For example, the strategy of how to select good registration problems could depend on the form of the animal’s movement.

      Reviewer #3 (Public review):

      Context:

      Tracking cell trajectories in deformable organs, such as the head neurons of freely moving C. elegans, is a challenging task due to rapid, non-rigid cellular motion. Similarly, identifying neuron types in the worm brain is difficult because of high inter-individual variability in cell positions.

      Summary:

      In this study, the authors developed a deep learning-based approach for cell tracking and identification in deformable neuronal images. Several different CNN models were trained to: (1) register image pairs without severe deformation, and then track cells across continuous image sequences using multiple registration results combined with clustering strategies; (2) predict neuron IDs from multicolor-labeled images; and (3) perform clustering across multiple multicolor images to automatically generate neuron IDs.

      Strengths:

      Directly using raw images for registration and identification simplifies the analysis pipeline, but it is also a challenging task since CNN architectures often struggle to capture spatial relationships between distant cells. Surprisingly, the authors report very high accuracy across all tasks. For example, the tracking of head neurons in freely moving worms reportedly reached 99.6% accuracy, neuron identification achieved 98%, and automatic classification achieved 93% compared to human annotations.

      We thank the reviewer for noting these strengths of our study.

      Weaknesses:

      (1) The deep networks proposed in this study for registration and neuron identification require dataset-specific training, due to variations in imaging conditions across different laboratories. This, in turn, demands a large amount of manually or semi-manually annotated training data, including cell centroid correspondences and cell identity labels, which reduces the overall practicality and scalability of the method.

      We performed dataset-specific training for image registration and neuron identification, and we would encourage new users to do the same based on our current state of knowledge. This highlights how standardization of whole-brain imaging data across labs is an important issue for our field to address and that, without it, variations in imaging conditions could impact software utility. We refer the reviewer to an excellent study by Sprague et al. (2025) on this topic, which is cited in our study.

      However, at the same time, we wish to note that it was actually reasonably straightforward to take the BrainAlignNet approach that we initially developed in C. elegans and apply it to jellyfish. Some of the key lessons that we learned in C. elegans generalized: in both cases, it was critical to select the right registration problems to solve and to preprocess with Euler registration for good initialization. Provided that those problems were solved, BrainAlignNet could be applied to obtain high-quality registration and trace extraction. Thus, our study provides clear suggestions on how to use these tools across multiple contexts.

      (2) The cell tracking accuracy was not rigorously validated, but rather estimated using a biased and coarse approach. Specifically, the accuracy was assessed based on the stability of GFP signals in the eat-4-labeled channel. A tracking error was assumed to occur when the GFP signal switched between eat-4-negative and eat-4-positive at a given time point. However, this estimation is imprecise and only captures a small subset of all potential errors. Although the authors introduced a correction factor to approximate the true error rate, the validity of this correction relies on the assumption that eat-4 neurons are uniformly distributed across the brain - a condition that is unlikely to hold.

      We respectfully disagree with this critique. We considered the alternative suggested by the reviewer (in their private comments to the authors) of comparing against a manually annotated dataset. But this annotation would require manually linking ~150 neurons across ~1600 timepoints, which would require humans to manually link neurons across timepoints >200,000 times for a single dataset. These datasets consist of densely packed neurons rapidly deforming over time in all 3 dimensions. Moreover, a single error in linking would propagate across timepoints, so the error tolerance of such annotation would be extremely low. Any such manually labeled dataset would be fraught with errors and should not be trusted. Instead, our approach relies on a simple, accurate assumption: GFP expression in a neuron should be roughly constant over a 16 min recording (after bleach correction) and the levels will be different in different neurons when it is sparsely expressed. Because all image alignment is done in the red channel, the pipeline never “peeks” at the GFP until it is finished with neuron alignment and tracking. The eat-4 promoter was chosen for GFP expression because (a) the nuclei labeled by it are scattered across the neuropil in a roughly salt-and-pepper fashion – a mixture of eat-4-positive and eat-4-negative neurons is found throughout the head; and (b) it is in roughly 40% of the neurons, giving very good overall coverage. Our view is that this approach of labeling subsets of neurons with GFP should become the standard in the field for assessing tracking accuracy – it has a simple, accurate premise; is not susceptible to human labeling error; is straightforward to implement; and, since it does not require manual labeling, is easy to scale to multiple datasets. We do note that it could be further strengthened by using multiple strains each with different ‘salt-and-pepper’ GFP expression patterns.
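
      As a rough illustration of the premise described above, the following Python sketch counts GFP class switches along a tracked neuron's trace and rescales the observed rate by the chance that a mislink crosses GFP classes. It is not the study's implementation: the function names, the fixed intensity threshold, and the correction `2p(1-p)` are assumptions made for illustration. The correction assumes that a mislinked neuron is GFP-positive with probability `p` regardless of the original neuron's class, which is the mixing assumption Reviewer #3 questions.

```python
from typing import List


def count_class_switches(gfp_trace: List[float], threshold: float) -> int:
    """Count consecutive-timepoint flips between GFP-positive and GFP-negative.

    Assumption: a single fixed threshold separates the two classes.
    """
    labels = [value > threshold for value in gfp_trace]
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


def detectable_fraction(positive_fraction: float) -> float:
    """Probability that a swap between two independently drawn neurons
    crosses GFP classes (hypothetical correction, not from the study)."""
    p = positive_fraction
    return 2.0 * p * (1.0 - p)


def estimated_error_rate(observed_switches: int, n_links: int,
                         positive_fraction: float) -> float:
    """Rescale the observed switch rate to account for same-class swaps,
    which leave the GFP class unchanged and so go undetected."""
    fraction = detectable_fraction(positive_fraction)
    if fraction == 0.0:
        raise ValueError("no swaps are detectable when all neurons share a class")
    return (observed_switches / n_links) / fraction


if __name__ == "__main__":
    trace = [10.0, 10.0, 1.0, 1.0, 10.0]  # toy GFP trace for one tracked neuron
    switches = count_class_switches(trace, threshold=5.0)
    print(switches, estimated_error_rate(switches, len(trace) - 1, 0.4))
```

      With `p = 0.4`, about 48% of random swaps would change GFP class under this assumption, so the observed switch rate would be roughly doubled to estimate the total error rate.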

      (3) Figure S1F demonstrates that the registration network, BrainAlignNet, alone is insufficient to accurately align arbitrary pairs of C. elegans head images. The high tracking accuracy reported is largely due to the use of a carefully designed registration sequence, matching only images with similar postures, and an effective clustering algorithm. Although the authors address this point in the Discussion section, the abstract may give the misleading impression that the network itself is solely responsible for the observed accuracy.

      Our tracking accuracy requires (a) a careful selection of registration problems, (b) highly accurate registration of the selected registration problems, and (c) effective clustering. We extensively discussed the importance of the choosing of the registration problems in the Results section (lines 218-234 and 318-321), Discussion section (lines 704-708), and Methods section (955-970 and 1246-1250) of our paper. We also discussed the clustering aspect in the Results section (lines 247-259), Discussion section (lines 708-712), and Methods section (lines 1162-1206). In addition, our abstract states that the BrainAlignNet needs to be “incorporated into an image analysis pipeline,” to inform readers that other aspects of image analysis need to occur (beyond BrainAlignNet) to perform tracking.

      (4) The reported accuracy for neuron identification and automatic classification may be misleading, as it was assessed only on a subset of neurons labeled as "high-confidence" by human annotators. Although the authors did not disclose the exact proportion, various descriptions (such as Figure 4F) imply that this subset comprises approximately 60% of all neurons. While excluding uncertain labels is justifiable, the authors highlight the high accuracy achieved on this subset without clearly stating that the reported performance pertains only to neurons that are relatively easy to identify. Furthermore, they do not report what fraction of the total neuron population can be accurately identified using their methods, an omission of critical importance for prospective users.

      The reviewer raises two points here: (1) whether AutoCellLabeler accuracy is impacted by ease of human labeling; and (2) what fraction of total neurons are identified. We address them one at a time.

      Regarding (1), we believe that the reviewer overlooked an important analysis in our study. Indeed, to assess its performance, one can only compare AutoCellLabeler’s output against accurate human labels – there is simply no way around it. However, we noted that AutoCellLabeler was identifying some neurons with high confidence even when humans had low confidence or had not even tried to label the neurons (Fig. 4F). To test whether these were in fact accurate labels, we asked additional human labelers to spend extra time trying to label a random subset of these neurons (they were of course blinded to the AutoCellLabeler label). We then assessed the accuracy of AutoCellLabeler against these new human labels and found that the AutoCellLabeler labels were highly accurate (Fig. 4H). This suggests that AutoCellLabeler has strong performance even when some human labelers find it challenging to label a neuron. However, we agree that we have not yet been able to quantify AutoCellLabeler performance on the small set of neuron classes that humans are unable to identify across datasets.

      Regarding (2), we agree that knowing how many neurons are labeled by AutoCellLabeler is critical. For example, labeling only 3 neurons per animal with 100% accuracy isn’t very helpful. We wish to emphasize that we did not omit this information: we reported the number of neurons labeled for every network that we characterized in the study, alongside the accuracy of those labels (please see Figures 4I, 5A, and 6G; Figure 4I also shows the number of human labels per dataset, which the reviewer requested). We also showed curves depicting the tradeoff between accuracy and number of neurons labeled, which fully captures how we balanced accuracy and number of neurons labeled (Figures 5D and S4A). It sounds like the reviewer also wanted to know the total number of recorded neurons. The typical number of recorded neurons per dataset can also be found in the paper in Fig. 2E.
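
      The tradeoff between accuracy and the number of neurons labeled can be pictured as a sweep over a confidence threshold. The sketch below is hypothetical: it assumes each predicted label comes with a confidence score and a correctness flag obtained by comparison with a human label, and the names `accuracy_coverage_curve`, `confidences`, and `is_correct` are illustrative rather than taken from the study's software.

```python
from typing import List, Sequence, Tuple


def accuracy_coverage_curve(
    confidences: Sequence[float],
    is_correct: Sequence[bool],
    thresholds: Sequence[float],
) -> List[Tuple[float, int, float]]:
    """For each threshold, return (threshold, number of labels kept, accuracy).

    A label is kept when its confidence is at or above the threshold.
    Accuracy is NaN when no label survives the threshold.
    """
    curve = []
    for threshold in thresholds:
        kept = [ok for conf, ok in zip(confidences, is_correct) if conf >= threshold]
        n_kept = len(kept)
        accuracy = sum(kept) / n_kept if n_kept else float("nan")
        curve.append((threshold, n_kept, accuracy))
    return curve


if __name__ == "__main__":
    # Toy values, not data from the study.
    confidences = [0.99, 0.95, 0.80, 0.60, 0.40]
    is_correct = [True, True, True, False, False]
    for row in accuracy_coverage_curve(confidences, is_correct, [0.5, 0.9]):
        print(row)
```

      Raising the threshold keeps fewer labels but typically raises their accuracy, which is why reporting both quantities together matters.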

    1. Reviewer #1 (Public review):

      Summary:

      This study focuses on characterizing the EEG correlates of item-specific proportion congruency effects. In particular, two types of learned associations are characterized: associations between stimulus features and control states (SC), and associations between stimulus features and responses (SR). Decoding methods are used to identify SC and SR correlates and to determine whether they have similar topographies and dynamics.

      The results suggest SC and SR associations are simultaneously coactivated and have shared topographies, with the inference being that these associations may share a common generator.

      Strengths:

      Fearless, creative use of EEG decoding to test tricky hypotheses regarding latent associations.

      Nice idea to orthogonalize the ISPC condition (MC/MI) from stimulus features.

      Weaknesses:

      (1) I'm relatively concerned that these results may be spurious. I hope to be proven wrong, but I would suggest taking another look at a few things.

      While a nice idea in principle, the ISPC manipulation seems to be quite confounded with the trial number. E.g., color-red is MI only during phase 2, and is MC primarily only during Phase 3 (since phase 1 is so sparsely represented). In my experience, EEG noise is highly structured across a session and easily exploited by decoders. Plus, behavior seems quite different between Phase 2 and Phase 3. So, it seems likely that the classes you are asking the decoder to separate are highly confounded with temporally structured noise.
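
      To make this concern concrete, here is a toy Python simulation (not the study's data or analysis) in which the only "signal" is a slow drift across the session. When class labels are blocked in time, a simple leave-one-out nearest-centroid decoder separates them well; when the same labels are interleaved, it does not. The drift size, noise level, single feature, and decoder are all assumptions chosen for illustration.

```python
import random
from statistics import mean
from typing import List, Tuple


def simulate_session(n_trials: int = 100, drift_range: float = 10.0,
                     noise_sd: float = 0.5, interleaved: bool = False,
                     seed: int = 0) -> Tuple[List[float], List[int]]:
    """One feature per trial: linear drift plus Gaussian noise, no class signal.

    Labels are either blocked (first half vs second half of the session)
    or interleaved (alternating trials).
    """
    rng = random.Random(seed)
    features, labels = [], []
    for t in range(n_trials):
        drift = drift_range * t / (n_trials - 1)
        features.append(drift + rng.gauss(0.0, noise_sd))
        labels.append(t % 2 if interleaved else int(t >= n_trials // 2))
    return features, labels


def loo_nearest_centroid_accuracy(features: List[float], labels: List[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    hits = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        centroids = {}
        for cls in set(labels):
            # Class mean computed without the held-out trial.
            train = [f for j, (f, lab) in enumerate(zip(features, labels))
                     if j != i and lab == cls]
            centroids[cls] = mean(train)
        predicted = min(centroids, key=lambda c: abs(x - centroids[c]))
        hits += int(predicted == y)
    return hits / len(features)


if __name__ == "__main__":
    blocked = loo_nearest_centroid_accuracy(*simulate_session(interleaved=False))
    mixed = loo_nearest_centroid_accuracy(*simulate_session(interleaved=True))
    print(f"blocked labels: {blocked:.2f}, interleaved labels: {mixed:.2f}")
```

      High decoding accuracy in the blocked case here reflects only the drift, which is the kind of outcome the suggested sanity checks are meant to rule out.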

      I suggest thinking of how to handle this concern in a rigorous way. A compelling way to address this would be to perform "cross-phase" decoding, however I am not sure if that is possible given the design.
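
      The confound described above can be illustrated with a small simulation. The sketch below is not the authors' pipeline: the single noise-only "feature", the linear drift, the trial counts, and the nearest-class-mean classifier are all assumptions chosen for illustration. It shows how a decoder with shuffled cross-validation can separate classes that differ only in when they occurred in the session.

```python
import random
import statistics


def simulate_session(n_trials=400, drift=4.0, seed=0):
    """One noise-only feature per trial: slow linear drift plus Gaussian noise.

    No class information is ever added; any decoding comes from the drift alone.
    """
    rng = random.Random(seed)
    return [drift * t / n_trials + rng.gauss(0.0, 1.0) for t in range(n_trials)]


def cv_accuracy(x, y, n_folds=5, seed=1):
    """Shuffled k-fold accuracy of a 1-D nearest-class-mean classifier."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    rng.shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in idx if i not in held_out]
        mean0 = statistics.fmean(x[i] for i in train if y[i] == 0)
        mean1 = statistics.fmean(x[i] for i in train if y[i] == 1)
        for i in test_idx:
            pred = 0 if abs(x[i] - mean0) <= abs(x[i] - mean1) else 1
            correct += int(pred == y[i])
    return correct / len(x)


x = simulate_session()
n = len(x)

# Hypothetical label schemes: "blocked" mimics a condition that changes
# between phases; "interleaved" spreads both classes across the session.
blocked_labels = [0 if t < n // 2 else 1 for t in range(n)]
interleaved_labels = [t % 2 for t in range(n)]

acc_blocked = cv_accuracy(x, blocked_labels)
acc_interleaved = cv_accuracy(x, interleaved_labels)
print(f"blocked labels:     {acc_blocked:.2f}")
print(f"interleaved labels: {acc_interleaved:.2f}")
```

      Under these assumptions, the blocked labels should decode well above chance while the interleaved labels stay near chance, even though the data contain no condition-specific signal. Training on one phase and testing on another, where the design allows it, is one way to break this dependence.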

      The time courses also seem concerning. What are we to make of the SR and SC timecourses, which have aggregate decoding dynamics that look to be <1Hz?

      Some sanity checks would be one place to start. Time courses were baselined, but this is often not necessary with decoding; it can cause bias (10.1016/j.jneumeth.2021.109080), and can mask deeper issues. What do things look like when not baselined? Can variables be decoded when they should not be decoded? What does cross-temporal decoding look like - everything stable across all times, etc.?

      (2) The nature of the shared features between SR and SC subspaces is unclear.

      The simulation is framed in terms of the amount of overlap, revealing the number of shared dimensions between subspaces. In reality, it seems like it's closer to 'proportion of volume shared', i.e., a small number of dominant dimensions could drive a large degree of alignment between subspaces.

      What features drive the similarity? What features drive the distinctions between SR and SC? Aside from the temporal confounds I mentioned above, is it possible that some low-dimensional feature, like EEG congruency effect (e.g., low-D ERPs associated with conflict), or RT dynamics, drives discriminability among these classes? It seems plausible to me - all one would need is non-homogeneity in the size of the congruency effect across different items (subject-level idiosyncrasies could contribute: 10.1016/j.neuroimage.2013.03.039).

      (3) The time-resolved within-trial correlation of RSA betas is a cool idea, but I am concerned it is biased. Estimating correlations among different coefficients from the same GLM design matrix is, in general, biased, i.e., when the regressors are non-orthogonal. This bias comes from the expected covariance of the betas and is discussed in detail here (10.1371/journal.pcbi.1006299). In short, correlations could be inflated due to a combination of the design matrix and the structure of the noise. The most established solution, to cross-validate across different GLM estimations, is unfortunately not available here. I would suggest that the authors think of ways to handle this issue.
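
      The source of this bias can be seen in a toy ordinary-least-squares example. For a fixed design matrix X and independent noise with variance σ², the estimated coefficients have covariance σ²(XᵀX)⁻¹, so two correlated regressors yield correlated estimates even when both true effects are zero. The sketch below uses an assumed two-regressor design with regressor correlation near 0.6 and pure-noise data; it is not the RSA model from the manuscript.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(0)
n_obs, n_sims, r = 100, 2000, 0.6  # illustrative values, not from the paper

# Fixed design with two non-orthogonal regressors (no intercept, for brevity).
x1 = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
x2 = [r * a + math.sqrt(1 - r * r) * rng.gauss(0.0, 1.0) for a in x1]

# Entries of X'X and its determinant, used for the closed-form 2x2 OLS solution.
s11 = sum(a * a for a in x1)
s22 = sum(b * b for b in x2)
s12 = sum(a * b for a, b in zip(x1, x2))
det = s11 * s22 - s12 * s12

beta1, beta2 = [], []
for _ in range(n_sims):
    # Pure noise: both true coefficients are exactly zero.
    y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    b1 = sum(a * v for a, v in zip(x1, y))
    b2 = sum(b * v for b, v in zip(x2, y))
    beta1.append((s22 * b1 - s12 * b2) / det)
    beta2.append((s11 * b2 - s12 * b1) / det)

empirical_corr = pearson(beta1, beta2)
# Correlation implied by the off-diagonal of (X'X)^-1.
expected_corr = -s12 / math.sqrt(s11 * s22)
print(f"empirical: {empirical_corr:.2f}, implied by design: {expected_corr:.2f}")
```

      With positively correlated regressors the estimates come out negatively correlated across noise realizations, and the empirical value should track the one implied by the design. In a real analysis, structured (non-white) noise changes the size and possibly the sign of this effect, which is why the direction of the bias cannot be assumed in advance.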

      (4) Are results robust to running response-locked analyses? Especially the EEG-behavior correlation. Could this be driven by different RTs across trials & trial-types? I.e., at 400 ms post-stim onset, some trials would be near or at RT/action execution, while others may not be nearly as close, and so EEG features would differ & "predict" RT.
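
      A minimal simulation makes the concern concrete. It assumes a hypothetical signal that builds up identically toward the response on every trial (an exponential ramp with an arbitrary time constant) and uniformly distributed RTs; none of these values come from the study. Sampled at a fixed post-stimulus latency, the feature "predicts" RT only because trials differ in how close they are to the response. Sampled at a fixed pre-response latency, the relationship disappears.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(0)
n_trials = 500
tau = 0.2  # assumed ramp time constant in seconds (illustrative)

# Hypothetical reaction times in seconds.
rts = [rng.uniform(0.45, 0.90) for _ in range(n_trials)]


def feature(time_to_response):
    """Response-locked ramp: depends only on time remaining until the response."""
    return math.exp(-time_to_response / tau) + rng.gauss(0.0, 0.05)


# Stimulus-locked: sample every trial at 400 ms after stimulus onset.
stim_locked = [feature(rt - 0.40) for rt in rts]
# Response-locked: sample every trial at 100 ms before its own response.
resp_locked = [feature(0.10) for _ in rts]

r_stim = pearson(stim_locked, rts)
r_resp = pearson(resp_locked, rts)
print(f"stimulus-locked r = {r_stim:.2f}, response-locked r = {r_resp:.2f}")
```

      In this sketch the stimulus-locked correlation is strongly negative and the response-locked one is near zero, although the underlying process is the same on every trial. Comparing the two alignments is therefore a useful check on whether an EEG-behavior correlation reflects more than RT variability.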

      (5) I suggest providing more explanation about the logic of the subspace decoding method - what trialtypes exactly constitute the different classes, why we would expect this method to capture something useful regarding ISPC, & what this something might be. I felt that the first paragraph of the results breezes by a lot of important logic.

      In general, this paper does not seem to be written for readers who are unfamiliar with this particular topic area. If authors think this is undesirable, I would suggest altering the text.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      1. General Statements

      We thank the reviewers for providing thoughtful and constructive feedback, which will help us improve the clarity and rigor of the paper. On balance, the reviews were positive. Reviewer 1 mentioned that “This is a strong manuscript with few problems and all important findings well justified, indeed this is a nicely polished…..high-quality manuscript,” and that “this paper makes a major breakthrough, showing that cell autonomous defects in hTSCs are very likely at the heart of the pathology observed in GIN-prone murine mutants.” Reviewer 3 stated that “The study is well designed, and the manuscript is very well written. The conclusions are supported by the evidence presented.” Reviewer 2 was less enthusiastic, with the main concern being that “The paper is mostly descriptive and often quite confusing leaving one not much closer to understanding the mechanistic basis for the interesting sex-biased semi-lethal phenotype.” Reviewer 2 also felt that figure titles and section headers overstated the results, and recommended improving some technical aspects and tempering conclusions. We think the proposed edits address most issues raised by the reviewers, either through rewriting or by adding data as described below.

      In response to reviewer #1 comments:

      Major comments:

      • I am confused as to the basis of the sex-skewing phenomenon? Is the problem that lack of maternally loaded WT Mcm4 worsens the phenotype, or is the issue that Mcm4C3/C3 dams are less able to retain pregnancies, perhaps being a more inflammatory environment? Also, while there is quite consistent evidence for reduced viability of Mcm4C3/C3McmGt/+ progeny, especially for female progeny, how confident can we be that the genotype of the dam vs. sire is important? Notably on a Ddx58 background, the progeny of the Mcm4C3/C3 sire included seven live male Mcm4C3/C3McmGt/+ but no female.

      Regarding the first point (sex skewing only when female is C3/C3), we also suspected either: 1) the maternal uterine environment, or 2) reduced oocyte quality. Although not reported in this manuscript, we tested #1 by performing embryo transfer experiments. Transferring 2-cell stage embryos from sex-skewing mating to WT females did not rescue the sex-bias. We then examined oocytes from C3/C3 females. We found evidence for compromised mitochondria and transcriptome disruption. However, we are not sure why this happens (poor follicle support? Oocyte intrinsic phenomenon?). We are reserving these results and additional experiments for another paper, especially since this one mainly deals with GIN and placenta development. If the reviewers feel strongly that the embryo transfer data is crucial, we can include it.

      Regarding how confident we are that the genotype of the dam vs. sire is important, this stems from our previous paper by McNairn et al 2019 (the percentage of female C3/C3 M2/+ from sex-skewing mating is 20% compared to 60% from the reciprocal mating), which was quite dramatic. Consistent with this, MCM levels were significantly reduced in the placentae only when the dam was C3/C3 and the sire C3/+ M2/+, but not in the reciprocal cross. The reviewer makes a good observation about the Ddx58 cross; we can only hypothesize that the mutation somehow sensitizes females in this scenario and will make mention of it in the revision. We also realize that we neglected to write in Methods that the Ddx58 allele was coisogenic in the C3H background.

      • I'm not sure what Supplementary Figure 6 is showing (faster differentiation of C3 but less TGC?). Regardless, it's hard to draw too much conclusion from one not-very-pretty Western blot. This figure requires both additional replicates and a better explanation of how it fits with the other conclusions of the paper.

      We hypothesized that the JZ defect observed in the semi-lethal genotype placentas could arise either from impaired maintenance of the progenitor pool or from reduced capacity of mutant trophoblast progenitors to differentiate into the JZ lineage. The blot in Supplementary Figure 6 was intended as a qualitative demonstration that mutant trophoblast stem cells can differentiate into JZ lineages. We recognize that the figure is not definitive and will revise the text to clarify its purpose. A replicate(s) of the Western will be performed as suggested.

      • Supplementary Figure 7F-G is puzzling. Half of the mESCs have gamma-H2AX at all times, including most in S or G2 phase? In Figure S7E, do the quadrants correspond to being negative or positive for gamma-H2AX? At very least, IF images showing clear gamma-H2AX foci would be much more convincing.

      The gates for γH2AX FACS analysis were established using negative controls lacking primary antibody. As reported previously, embryonic stem cells display high basal levels of γH2AX staining (Chuykin et al., Cell Cycle 2008; Turinetto et al., Stem Cells 2012; Ahuja et al., Nat Comm 2016), which likely explains the broad signal observed across cell cycle phases. Regardless, we will provide immunofluorescence staining of γH2AX and foci counts in our revision.

      • The methods section is well detailed, but it would be ideal to clarify how many replicates each Western Blot or flow cytometry experiment is representative of.

      Thanks for the suggestion. We will update this for Fig. 4 and Fig. 5.

      Minor comments:

      • Is it possible that cGAS-STING and RIG pathways act redundantly to cause inflammation and lethality, or that other innate immune components are involved? I don't expect the authors to make compound mutants to test this but at least this possibility should be discussed textually.

      We appreciate the reviewer’s point, and had the same suspicion. Supporting this, new RNA-seq analysis of Tmem173 KO placentas, which we will add, revealed elevated inflammatory gene expression compared to C3/C3 M2/+ controls, consistent with potential redundancy or feedback regulation. We will update the supplementary figures to reflect this.

      In response to reviewer #2 comments:

      Major comments:

      A major concern throughout the paper is that conclusions are often overstating their data. The title of figure 2 is "placentae with replication stress have smaller junctional and labyrinth zones". However, there is no measure of replication stress in this figure, just a histological evaluation of the placentae from the different mutants. The title of figure 3 is "Impact of GIN on LZ is less than JZ," but there is no measure of GIN, but instead measurement of number of cells in cell cycle and some bulk RNA-seq analysis. Title of figure 4 is "TSCs with increased genomic instability exhibit abnormal phenotypes." Again there is no measure of GIN, but instead staining of derived TSCs for proliferation, cell death, and a TSC marker. Title of figure 5 is "DNA damage responses and G2/M checkpoint activation drive premature TSC differentiation." However, there does not appear to be a difference in gH2AX between the two mutant genotypes. Checkpoint proteins might be up, but need quantification and reproduction. > 4C is the only marker of differentiation. Importantly, all the analyses here are associations, not connections, so cannot use the word "drive". Similar issues can be raised with a number of the supplementary figures.

      The Chaos3 (chromosome aberrations occurring spontaneously 3) model is a well-established system of intrinsic chronic replication stress and GIN. It is characterized by ~20-fold elevation of blood micronuclei (Shima et al., Nature 2007), a hallmark of GIN (Soxena et al., Mol Cell 2022); a destabilized MCM2-7 helicase prone to replication fork collapse (Bai et al., PLoS Genet 2016); and increased mitotic chromosome abnormalities and decreased dormant origins (Kawabata et al., Mol Cell 2011; Chuang et al., Nucleic Acids Res 2012) that are known to cause GIN and replication stress (Ibarra et al., PNAS 2008). Also, in our previous work (McNairn et al., Nature 2019), we showed that placentae from C3/C3 dams exhibit significantly elevated γH2AX as well as reduced MCM2 and MCM4 protein levels. In our current study, we also observe elevated γH2AX in mutant TSCs (C3/C3 and C3/C3 M2/+), consistent with genomic instability. Nevertheless, we acknowledge that in TSCs, we did not formally demonstrate replication stress (RS), so where appropriate, we will revise figure titles, for example to say “cells/placentae with a GIN or RS genotype.”

      We acknowledge the reviewer's concern regarding western blots. We will provide quantification and statistics in our revision.

      1) A deeper analysis of the cell lines is likely to be the most fruitful path to reveal interesting mechanisms. It is very surprising that there is no phenotype in ESCs. Authors should check for increased apoptosis. Maybe the phenotypic cells are lost. Or do ESCs use different MCMs/mechanisms of DNA replication or are they better able to handle replication stress and GIN? How many passages were the TSCs and ESCs cultured for? Does GIN (i.e. aneuploidy, CNVs) develop in TSCs and ESCs with passaging? How do the MCM mutations impact the molecular identity of the ESC and TSC cells including their heterogeneity in the population?

      We assessed apoptosis using cleaved caspase 3 flow cytometry in mutant ESCs and observed no difference compared to controls (we will add this data as Supplementary Fig. 7).

      We believe there are intrinsic differences between TSCs and ESCs in their ability to respond to and counteract replication stress and DNA damage. ESCs are known to license more replication origins than somatic cells at a higher rate, which protects them from short G1-induced replication stress (Ahuja et al., Nat Comm 2016; Ge et al., Stem Cell Rep 2015; Matson et al., eLife 2017). Human placental cells physiologically exhibit high mutation rates and chromosomal instability in vivo (Coorens et al., Nature 2021). Supporting this, Wang, D., et al. (Nat Comm 2025) reported that several cell cycle and DDR regulators are differentially expressed in human TSCs vs human pluripotent stem cells. Whether such transcriptional differences directly contribute to functional outcomes remains to be determined.

      All experiments in this study were conducted using early-passage ESCs and TSCs (i.e. Finally, we showed that close to 90% of mutant ESCs are KLF4+ (a naive pluripotency marker) whereas EOMES+ cells were significantly reduced in TSCs carrying the GIN genotype (Fig. 4E–F and Supplementary Fig. 7), highlighting lineage-specific differences.

      Minor Comments:

      1) There is a lack of quantification and repeats for all Westerns. At minimum there should be three repeats for each experiment, quantification including normalization to a reference protein, and stats confirming any proposed differences between conditions.

      We will update our revision with quantification and statistics for western blots.

      2) I would recommend moving the results in supp table 1 to figure 1. While negative, they are the newer results. The results shown in current figure 1 are essentially a reproduction of their previous work.

      The placental observations presented in Fig. 1 are new. In particular, the placental and embryonic weight measurements graphed in Fig. 1B and C have not been published by our group. Fig. 1A reproduces our previous observation on embryo viability in GIN mutants (McNairn et al., Nature 2019), while the schematic was provided for better flow and readability given the complex mating schemes. We are agnostic on Supplementary Table 1. It could be changed to a new Table 1 in the main section depending on the journal.

      In response to reviewer #3 comments:

      Major Comments

      While the inclusion of bulk RNAseq data of whole placental tissue is appreciated, the interpretation of the results is somewhat problematic, as it is acknowledged that the cell type composition of the placentas is drastically different between groups. Making conclusions based upon GSEA analysis of two different groups with drastically different cell type composition is somewhat misleading, as based on the results, it is a direct reflection of the cell types present. It would be more helpful to perform cell type deconvolution of the RNAseq data to estimate the proportion of each cell type within the bulk samples and compare that to what is seen histologically and not dive too deeply into the pathways since the results could just be a reflection of the cell types e.g. angiogenesis pathways from more endothelial cells. Additionally, the RNAseq data can be leveraged to look at expression of inflammatory genes by sex, which may show interesting patterns based on the other results.

      We agree that the representation of cell types in the placenta is problematic especially for underrepresented genes. We propose to use the BayesPrism tool (Chu et al., Nat Cancer 2022) to deconvolute bulk RNA-seq for better representation of transcriptional changes in the placenta.

      Section: GIN impairs trophoblast stem cell establishment and maintenance. To support the assertion in the first paragraph, beyond measuring apoptosis, it would be helpful at this stage to look at RNA expression levels indicative of the activation of DNA damage checkpoint genes.

      We have performed RNA-seq on mutant ESC and TSCs and are in the process of data analysis. We will update these results in the revision.

      Please include additional methodological details in the methods section on the statistical analysis done for differential expression analysis. Specifically, what type of normalization was used, if lowly expressed genes were filtered out and at what cutoff, what statistical model was used (did you include covariates?), what comparisons were made? Did you stratify by sex? What cutoff was used for statistical significance? Did you perform multiple testing correction?

      We will update RNA-Seq data analysis methods in our full revision.
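
      As background to the reviewer's question about multiple testing correction, the widely used Benjamini-Hochberg procedure can be sketched in a few lines. This is a generic illustration only: the function name and the four p-values are made up for the example, and the correction actually applied in the authors' differential expression workflow may differ.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (illustrative numbers only).
example_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(example_p)
print(adjusted_p)
```

      Genes are then called significant when the adjusted value falls below the chosen false discovery rate, which is the cutoff the reviewer asks the authors to report.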

      2. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1 comments:

      • Supplementary Table 1. would be enhanced greatly showing comparable tables for Mcm4C3/C3 x Mcm4C3/+McmGt/+ in mice without the Tmem173 or Ddx58 mutations. It is fine to recycle data from McNairn 2019 here, as long as the source is indicated, but a comparison is needed.

      Thanks for pointing this out. We have incorporated this suggestion in Supplementary Table 1.

      • In Figure S3E-F, is the box above each graph supposed to show the genotype of the dam?

      Yes. Thanks for pointing this out. We have added a description in the figure legend to make it clear.

      • "Indeed, the placenta and embryo weights of E13.5 Mcm4C3/C3 Mcm2Gt/+ Mcm3Gt/+ animals were significantly improved vs. Mcm4C3/C3 Mcm2Gt/+ animals, rendering them similar to Mcm4C3/C3 littermates (Fig. 6A-C). The JZ (but not LZ) area in Mcm4C3/C3 Mcm2Gt/+ Mcm3Gt/+ placentae also increased to the level of Mcm4C3/C3 littermates (Fig. 6D-H)." There are two problems here. First, the figure calls are wrong. Second, the description of the data is not quite right, it looks like the C3/C3 and C3/C3 M2/+ M3/+ LZs are a similar size to each and are statistically indistinguishable.

      Thanks for catching this. We have updated these in the main text.

      Reviewer #2 comments:

      Minor comment

      • Need to review citations to figures. For example, no citations are made to figure 4a and 4c.

      Thanks for catching this. We have updated the text.

      Reviewer #3 comments:

      Define the first use of >4C DNA content to help readers understand this potentially unfamiliar term.

      We have edited this part to indicate cells with more than 4C DNA content for better clarity.

      iDEP tool - please include citation to manuscript instead of link

      We have updated this citation.

      Check citations. Some citations to BioRxiv that are now published e.g. 13.

      We have updated this citation.

      3. Description of analyses that authors prefer not to carry out

      Reviewer 2

      2) Along similar lines, most of the in vivo phenotypic analyses are performed at E13.5, long after defects are likely beginning to express themselves especially given that they see phenotypes in the TSCs, which represent the polar TE of a E4.5. To understand the primary defects of the in vivo phenotype, they should be looking much earlier. Supplemental figure 5 is a start but represents a rather superficial analysis.

      The peri-implantation period, namely E4.5, represents a “black box” of embryonic development given that this is a critical stage for implantation. Aside from being an extremely difficult stage to analyze technically, we don’t think it is essential to the conclusions (or doable in a timely manner), especially given the use of TSCs. If we complete EdU studies on E6.5 embryos, we will include them.

      3) Fig. 6 would benefit from evidence that MCM3 mutant is rescuing MCM4 levels in the chromatin fraction of cells and the DNA damage phenotype.

      The genetic evidence presented is strong, and although we didn’t do the suggested experiment, we feel that our previous studies (McNairn et al., Nature 2019 and Chuang et al., PLoS Genet 2010) on the effects of MCM3 as a nuclear export factor (as it is in yeast (Liku et al., Mol Biol Cell 2005)) are a reasonable basis for not repeating such experiments. Furthermore, we are no longer maintaining the Mcm3 line and it would take over a year to reconstitute and rebreed triple mutants.

    1. Reviewer #3 (Public review):

      Summary:

      Lmx1a is an orthologue of apterous in flies, which is important for dorsal-ventral border formation in the wing disc. Previously, this research group has described the importance of the chicken Lmx1b in establishing the boundary between sensory and non-sensory domains in the chicken inner ear. Here, the authors described a series of cellular changes during border formation in the chicken inner ear, including alignment of cells at the apical border and concomitant constriction basally. The authors extended these observations to the mouse inner ear and showed that these morphological changes occurred at the border of Lmx1a positive and negative regions, and these changes failed to develop in Lmx1a mutants. Furthermore, the authors demonstrated that the ROCK-dependent actomyosin contractility is important for this border formation and blocking ROCK function affected epithelial basal constriction and border formation in both in vitro and in vivo systems.

      Strengths:

      The morphological changes described during border formation in the developing inner ear are interesting. Linking these changes to the function of Lmx1a and ROCK dependent actomyosin contractile function are provocative.

      Weaknesses:

      There are several outstanding issues that need to be clarified before one can pin the morphological changes observed being causal to border formation and that Lmx1a and ROCK are involved.

      Comments on the latest version:

      The revised manuscript has provided clarity of their results on some levels, but unfortunately, the basal restriction during border formation remains unclear and the study did not advance the understanding of role of Lmx1a in boundary formation. Overall comments are indicated below:

      (1) The authors state in the rebuttal, "we do not think that ROCK activity is required for the formation or maintenance of the basal constriction at the interface of Lmx1a-expressing and non-expressing cells". If the above is the sentiment of the authors, then the manuscript is not written to support this sentiment clearly, starting with this misleading sentence in the Abstract, "The boundary domain is absent in Lmx1a-deficient mice, which exhibit defects in sensory organ segregation, and is disrupted by the inhibition of ROCK-dependent actomyosin contractility."

      (2) As acknowledged by the authors, the data as they currently stand could be explained by Lmx1a functioning in specifying the non-sensory fate and may not function directly in boundary formation. With this caveat in mind, the role of Lmx1a in boundary formation remains unclear.

      (3) I feel like the word "orchestrate" in the title is an overstatement.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Raices et al., provides novel insights into the role and interactions between SPO-11 accessory proteins in C. elegans. The authors propose a model of meiotic DSBs regulation, critical to our understanding of DSB formation and ultimately crossover regulation and accurate chromosome segregation. The work also emphasizes the commonalities and species-specific aspects of DSB regulation.

      Strengths:

      This study capitalizes on the strengths of the C. elegans system to uncover genetic interactions between a large number of SPO-11 accessory proteins. In combination with physical interactions, the authors synthesize their findings into a model, which will serve as the basis for future work, to determine mechanisms of DSB regulation.

      Weaknesses:

      The methodology, although standard, lacks quantification. This includes the mass spectrometry data, along with the cytology. The work would also benefit from clarifying the role of the DSB machinery on the X chromosome versus the autosomes.

      • We have uploaded the MS data and added a summary table with the number of peptides and coverage.

      • We have added statistics to the comparisons of DAPI body counts.

      • We have provided additional images of the change in HIM-5 localization.

      • We have quantified the overlap (or lack thereof) between XND-1 and HIM-17 and the DNA axis.

      Reviewer #2 (Public Review):

      Summary:

      Meiotic recombination initiates with the formation of DNA double-strand break (DSB) formation, catalyzed by the conserved topoisomerase-like enzyme Spo11. Spo11 requires accessory factors that are poorly conserved across eukaryotes. Previous genetic studies have identified several proteins required for DSB formation in C. elegans to varying degrees; however, how these proteins interact with each other to recruit the DSB-forming machinery to chromosome axes remains unclear.

      In this study, Raices et al. characterized the biochemical and genetic interactions among proteins that are known to promote DSB formation during C. elegans meiosis. The authors examined pairwise interactions using yeast two-hybrid (Y2H) and co-immunoprecipitation and revealed an interaction between a chromatin-associated protein HIM-17 and a transcription factor XND-1. They further confirmed the previously known interaction between DSB-1 and SPO-11 and showed that DSB-1 also interacts with a nematode-specific HIM-5, which is essential for DSB formation on the X chromosome. They also assessed genetic interactions among these proteins, categorizing them into four epistasis groups by comparing phenotypes in double vs. single mutants. Combining these results, the authors proposed a model of how these proteins interact with chromatin loops and are recruited to chromosome axes, offering insights into the process in C. elegans compared to other organisms.

      Weaknesses:

      This work relies heavily on Y2H, which is notorious for having high rates of false positives and false negatives. Although the interactions between HIM-17 and XND-1 and between DSB-1 and HIM-5 were validated by co-IP, the significance of these interactions was not tested, and cataloging Y2H interactions does not yield much more insight.

      We appreciate that the reviewer recognized the value of our IP data, but we beg to differ that we rely too heavily on the Y2H. We also provide genetic analysis on bivalent formation to support the physical interaction data. We do acknowledge that there are caveats with Y2H, however, including that a subset of the interactions can only be examined with proteins in one orientation due to auto-activation. While we acknowledge that it would be nice to have IP data for all of the proteins using CRISPR-tagged, functional alleles, these strains are not all feasible (e.g. no functional rec-1 tag has been made) and are beyond the scope of the current work.

      Moreover, most experiments lack rigor, which raises serious concerns about whether the data convincingly supports the conclusions of this paper. For instance, the XND-1 antibody appears to detect a band in the control IP; however, there was no mention of the specificity of this antibody.

      We previously showed the specificity of this antibody in its original publication showing lack of staining in the xnd-1 mutant by IF (Wagner et al., 2010). To further address this, however, we have now included a new supplementary figure (Figure S1) demonstrating the specificity of the XND-1 antibody by Western blot. The antibody detects a distinct band in extracts from wild-type (N2) worms, but this band is absent in two independent xnd-1 mutant strains. This confirms that the antibody specifically recognizes XND-1, supporting the validity of the IP results shown in the main figures.

      Additionally, epistasis analysis of various genetic mutants is based on the quantification of DAPI bodies in diakinesis oocytes, but the comparisons were made without statistical analyses.

      We have added statistical analysis to all datasets where quantification was possible, strengthening the rigor and interpretation of our findings.

      For cytological data, a single representative nucleus was shown without quantification and rigorous analysis. The rationale for some experiments is also questionable (e.g. the rescue by dsb-2 mutants by him-5 transgenes in Figure 2), making the interpretation of the data unclear. Overall, while this paper claims to present "the first comprehensive model of DSB regulation in a metazoan", cataloging Y2H and genetic interactions did not yield any new insights into DSB formation without rigorous testing of their significance in vivo. The model proposed in Figure 4 is also highly speculative.

      Regarding the cytology, we provide new images and quantification of HIM-17 and XND-1 overlap with the DNA axes. We also added full germ line images showing HIM-5 localization in wild type and dsb-1 mutants, to provide a more complete and representative view of the observed phenotype. To further support our findings, we’ve also included images demonstrating that this phenotype is consistently observed both in live worms with the him-5::GFP transgene and in fixed worms with an endogenously tagged version of HIM-5.

      Reviewer #3 (Public Review):

      During meiosis in sexually reproducing organisms, double-strand breaks are induced by a topoisomerase-related enzyme, Spo11, which is essential for homologous recombination, which in turn is required for accurate chromosome segregation. Additional factors control the number and genome-wide distribution of breaks, but the mechanisms that determine both the frequency and preferred location of meiotic DSBs remain only partially understood in any organism.

      The manuscript presents a variety of different analyses that include variable subsets of putative DSB factors. It would be much easier to follow if the analyses had been more systematically applied. It is perplexing that several factors known to be essential for DSB formation (e.g., cohesins, HORMA proteins) are excluded from this analysis, while it includes several others that probably do not directly contribute to DSB formation (XND-1, HIM-17, CEP-1, and PARG-1).

      We respectfully disagree with the reviewer’s statement regarding the selection of factors included in our analysis. In this work, our focus was specifically on SPO-11 accessory factors — proteins that directly interact with or regulate SPO-11 activity during double-strand break formation. Cohesins and chromosome axis proteins (such as the HORMA domain proteins) are essential for establishing the correct chromosome architecture that supports DSB formation, but there is no evidence that they are direct accessory factors of SPO-11. Therefore, they were intentionally excluded from this study to maintain a clear and focused scope on proteins that more directly modulate SPO-11 function.

      Conversely, XND-1, HIM-17, CEP-1, and PARG-1 have all been implicated in regulating aspects of SPO-11-mediated DSB formation or its immediate environment. Although their contributions may involve broader chromatin or DNA damage response regulation, prior literature supports their inclusion as relevant modulators of SPO-11 activity, justifying their analysis within the context of this work.

      The strongest claims seem to be that "HIM-5 is the determinant of X-chromosome-specific crossovers" and "HIM-5 coordinates the actions of the different accessory factors subgroups." Prior work had already shown that mutations in him-5 preferentially reduce meiotic DSBs on the X chromosome. While it is possible that HIM-5 plays a direct role in DSB induction on the X chromosome, the evidence presented here does not strongly support this conclusion. It is also difficult to reconcile this idea with evidence from prior studies that him-5 mutations predominantly prevent DSB formation on the sex chromosomes, while the protein localizes to autosomes.

      HIM-5 is not the only protein that is autosomally enriched but preferentially affects the X chromosome: MES-4 and MRG-1 are both autosomally-enriched but influence silencing of the X chromosome. While HIM-5 appears autosomally-enriched, it does not appear to be autosomal-exclusive. While we would ideally perform ChIP to determine its localization on chromatin, this method for assaying DSB sites is likely insufficient to identify DSB sites which differ in each nucleus and for which there are no known hotspots in the worm.

      him-5 mutants confer an ~50% reduction in total number of breaks and a very profound change in break dynamics (seen by RAD-51 foci (Meneely et al., 2012)). Since the autosomes receive sufficient breaks in this context to attain a crossover in >98% of nuclei, this indicates that the autosomes are much less profoundly impacted by loss of DSB functions than is the X chromosome. Indeed, prior data from co-author Monica Colaiacovo showed that fewer breaks occur on the X (Gao, 2015), likely resulting from differences in the chromatin composition of the X and autosomes that arise from X chromosome silencing.

      The conclusion that HIM-5 must be required for breaks on the X comes from the examination of DSB levels and their localization in different mutants that impair but do not completely abrogate breaks. In any situation where HIM-5 protein expression is affected (xnd-1, him-17, and him-5 null alleles), breaks on the X are reduced/eliminated. By contrast, in dsb-2 mutants, where HIM-5 expression is unaffected, both X and autosomal breaks are impacted equally. As discussed above, in the absence of HIM-5 function, there are ~15 breaks/nucleus. The Ppie-1::him-5 transgene is expressed to lower levels than Phim-5::him-5, but in the best case, the ectopic expression of this protein should give a maximum of ~15 breaks (the total # of breaks is thought to be ~30/nucleus). By these estimates, Ppie-1::him-5; him-17 and him-5 null mutants have the same number of breaks. Yet, in the former case, breaks occur on the X, whereas in the latter they do not. The best explanation for this discrepancy is that HIM-5 is sufficient to recruit the DSB machinery to the X chromosome.

      The one experiment that seems to elicit the conclusion that HIM-5 expression is sufficient for breaks on the X chromosome is flawed (see below). The conclusion that HIM-5 "coordinates the activities of the different accessory sub-groups" is not supported by data presented here or elsewhere.

      We have reorganized the discussion to more directly address the reviewers’ concerns. We raise the possibility that HIM-5 has an important role in bringing together SPO-11 and its interacting components (DSB-1/2/3) with the other DSB-inducing factors, including those factors that regulate DSB timing (XND-1), coordination with the cell cycle (REC-1), association with the chromosome axis (PARG-1, MRE-11), and coupling to downstream resection and repair (MRE-11, CEP-1).

      This raises a natural question: if HIM-5 has such a central role, why are the phenotypes of HIM-5 so mild? We propose that while the loss of DSBs on the X appears mild, more profound effects are seen in the total number, timing, and placement of the DSBs across the genome, all of which are diminished or altered in the absence of HIM-5. The phenotypes of him-5 loss are reminiscent of those observed in Prdm9-/- mice, where breaks are relocated to transcriptional start sites and show a significant delay in formation. As with PRDM9 in most mammals, the comparatively subtle phenotypes of HIM-5 loss do not diminish its critical role in promoting proper DSB formation.

      Like most other studies that have examined DSB formation in C. elegans, this work relies on indirect assays, here limited to the cytological appearance of RAD-51 foci and bivalent chromosomes, as evidence of break formation or lack thereof. Unfortunately, neither of these assays has the power to reveal the genome-wide distribution or number of breaks. These assays have additional caveats, due to the fact that RAD-51 association with recombination intermediates and successful crossover formation both require multiple steps downstream of DSB induction, some of which are likely impaired in some of the mutants analyzed here. This severely limits the conclusions that can be drawn. Given that the goal of the work is to understand the effects of individual factors on DSB induction, direct physical assays for DSBs should be applied; many such assays have been developed and used successfully in other organisms.

      We appreciate the reviewer’s thoughtful comments. We agree that RAD-51 foci are an indirect readout of DSB formation and that their dynamics can be influenced by defects in downstream repair processes. However, in C. elegans, the available methods for directly detecting DSBs are limited. Unlike other organisms, C. elegans lacks γH2AX, eliminating the possibility of using γH2AX as a DSB marker. TUNEL assays, while conceptually appealing, have proven unreliable and poorly reproducible in the germline context. Similarly, RPA foci do not consistently correlate with the number of DSBs and are influenced by additional processing steps.

      Given these limitations, RAD-51 foci remain the most widely accepted surrogate for monitoring DSB formation in C. elegans. While we fully acknowledge the caveats associated with this approach — particularly the potential effects of downstream repair defects — RAD-51 analysis continues to provide valuable insight into DSB dynamics and regulation, especially when interpreted in combination with other phenotypic assessments.

      Throughout the manuscript, the writing conflates the roles played by different factors that affect DSB formation in very different ways. XND-1 and HIM-17 have previously been shown to be transcription factors that promote the expression of many germline genes, including genes encoding proteins that directly promote DSBs. Mutations in either xnd-1 or him-17 result in dysregulation of germline gene expression and pleiotropic defects in meiosis and fertility, including changes in chromatin structure, dysregulation of meiotic progression, and (for xnd-1) progressive loss of germline immortality. It is thus misleading to refer to HIM-17 and XND-1 as DSB "accessory factors" or to lump their activities with those of other proteins that are likely to play more direct roles in DSB induction.

      It is clear that we will not reach agreement about the direct vs indirect roles here of chromatin remodelers/transcription factors in break formation. There is precedent in yeast for SPP1 and in mouse for Prdm9, both of which could also be described as transcription factors, having roles in break formation by creating an open chromatin environment for the break machinery. We envision that these proteins function in the same fashion. The changes in histone acetylation in the xnd-1 mutants support such a claim.

      We do not know what the reviewer is referring to in the statement that “XND-1 and HIM-17 have previously been shown to be transcription factors that promote the expression of many germline genes.” While the Carelli et al. paper indeed shows a role for HIM-17 in expression of many germline genes, there is only one reference to XND-1 in this manuscript (Figure S3A), which shows that half of XND-1 binding sites overlap with the co-opted germline promoters. There is no transcriptional data at all on xnd-1 mutants, save our studies (referenced herein) showing that XND-1 regulates him-5 expression.

      For example, statements such as the following sentence in the Introduction should be omitted or explained more clearly: "xnd-1 is also unique among the accessory factors in influencing the timing of DSBs; in the absence of xnd-1, there is precocious and rapid accumulation of DSBs as monitored by the accumulation of the HR strand-exchange protein RAD-51."

      We are not sure what is confusing here. The distribution of RAD-51 foci is significantly altered in xnd-1 mutants, and peak levels of breaks are achieved as nuclei leave the transition zone (Wagner et al., 2010; McClendon et al., 2016). There is no other mutation that causes this type of change in RAD-51 distribution.

      The evidence that HIM-17 promotes the expression of him-5 presented here corroborates data from other publications, notably the recent work of Carelli et al. (2022), but this conclusion should not be presented as novel here.

      We have clarified this in the text. We note that this paper showed alterations in him-5 levels by RNA-Seq but they did not validate these results with quantitative RT-PCR. Thus, our studies do provide an important validation of their prior results.

      The other factors also fall into several different functional classes, some of which are relatively well understood, based largely on studies in other organisms. The roles of RAD50 and MRE-11 in DSB induction have been investigated in yeast and other organisms as well as in several prior studies in C. elegans. DSB-1, DSB-2, and DSB-3 are homologs of relatively well-studied meiotic proteins in other organisms (Rec114 and Mei4) that directly promote the activity of Spo11, although the mechanism by which they do so is still unclear.

      Whilst we agree that we understand some of the functions of the homologs, there are clearly examples in other processes of conserved proteins adopting unique regulatory functions. We should not presume evolutionary conservation until proven. Indeed, the comparison between the Mer2 proteins becomes particularly relevant here. For example, the RMM complex in plants does not contain PRD3, although this protein is thought to function in DSB formation and repair (Lambing et al., 2022; Vrielynck et al., 2021; Thangavel et al., 2023). In Sordaria, as well, the Mer2 homolog has distinct functions (Tesse et al., 2017).

      Mutations in PARG-1 (a Poly-ADP ribose glycohydrolase) likely affect the regulation of polyADP-ribose addition and removal at sites of DSBs, which in turn are thought to regulate chromatin structure and recruitment of repair factors; however, there is no convincing evidence that PARG-1 directly affects break formation.

      Our prior collaborative studies on PARG-1 showed that it has a non-catalytic function that promotes DSBs and is independent of the accumulation of PAR (Janisiw et al., 2020; Trivedi et al., 2022).

      CEP-1 is a homolog of p53 and is involved in the DNA damage response in the germline, but again is unlikely to directly contribute to DSB induction.

      We respectfully disagree with the reviewer’s statement. While CEP-1 is indeed a homolog of p53 and plays a major role in the DNA damage response, prior work from Brent Derry’s lab and from our group (Mateo et al., 2016) demonstrated that specific cep-1 separation-of-function alleles affect DSB induction and/or repair pathway choice independently of canonical DNA damage checkpoint activation. In particular, defects in DSB formation observed in certain cep-1 mutants can be rescued by exogenous irradiation, supporting a direct or closely linked role in promoting DSB formation rather than merely responding to damage. Thus, based on these functional data, we considered CEP-1 a relevant factor to include in our analysis. We have now clarified this rationale in the revised manuscript.

      HIM-5 and REC-1 do not have apparent homologs in other organisms and play poorly understood roles in promoting DSB induction. A mechanistic understanding of their functions would be of value to the field, but the current work does not shed light on this. A previous paper (Chung et al. G&D 2015) concluded that HIM-5 and REC-1 are paralogs arising from a recent gene duplication, based on genetic evidence for a partially overlapping role in DSB induction, as well as an argument based on the genomic location of these genes in different species; however, these proteins lack any detectable sequence homology and their predicted structures are also dissimilar (both are largely unstructured but REC-1 contains a predicted helical bundle lacking in HIM-5). Moreover, the data presented here do not reveal overlapping sets of genetic or physical interactions for the two genes/proteins. Thus, this earlier conclusion was likely incorrect, and this idea should not be restated uncritically here or used as a basis to interpret phenotypes.

      Actually, there is quite good bioinformatic analysis showing that the rec-1 and him-5 loci evolved from a gene duplication and that each shares features of the ancestral protein (Chung et al., 2015). We are sorry if the reviewer casts aspersions on the prior literature and analyses. The homology of these genes to the ancestral protein is of nearly the same degree as that of dsb-1, dsb-2, or dsb-3 to their ancestral homologs (<17%).

      DSB-1 was previously reported to be strictly required for all DSB and CO formation in C. elegans. Here the authors test whether the expression of HIM-5 from the pie-1 promoter can rescue DSB formation in dsb-1 mutants, and claim to see some rescue, based on an increase in the number of nuclei with one apparent bivalent (Figure 2C). This result seems to be the basis for the claim that HIM-5 coordinates the activities of other DSB proteins. However, this assay is not informative, and the conclusion is almost certainly incorrect. Notably, a substantial number of nuclei in the dsb-1 mutant (without Ppie-1::him-5) are reported as displaying a single bivalent (11 DAPI staining bodies) despite prior evidence that DSBs are absent in dsb-1 mutants; this suggests that the way the assay was performed resulted in false positives (bivalents that are not actually bivalents), likely due to inclusion of nuclei in which univalents could not be unambiguously resolved in the microscope. A slightly higher level of nuclei with a single unresolved pair of chromosomes in the dsb-1; Ppie-1::him-5 strain is thus not convincing evidence for rescue of DSBs/CO formation, and no evidence is presented that these putative COs are X-specific. The authors should provide additional experimental evidence - e.g., detection of RAD-51 and/or COSA-1 foci or genetic evidence of recombination - or remove this claim. The evidence that expression of Ppie-1::him-5 may partially rescue DSB abundance in dsb-2 mutants is hard to interpret since it is currently unknown why C. elegans expresses 2 paralogs of Rec114 (DSB-1 and DSB-2), and the age-dependent reduction of DSBs in dsb-2 mutants is not understood.

      We have removed this claim, in part because we have been unable to create the triple mutant strains to analyze COSA-1 foci.

      To the point about 11 vs 12 DAPI bodies: the literature is actually replete with examples of 11 DAPI bodies vs 12 in mutants with no breaks:

      Hinman et al., 2021: null allele of dsb-3 has an average of 11.6 +/- 0.6 DAPI bodies;

      Stamper et al., 2013, show just over 60% of dsb-1 nuclei with 12 DAPI bodies and 5-10% with 10 DAPI bodies (Figure 1);

      In addition, we also previously showed (Machovina et al., 2016) that a subset of meiotic nuclei have a single RAD-51 focus and can achieve a crossover. RAD-51 foci in spo-11 were also reported in Colaiacovo et al., 2003.

      Several of the factors analyzed here, including XND-1, HIM-17, HIM-5, DSB-1, DSB-2, and DSB-3, have been shown to localize broadly to chromatin in meiotic cells. Coimmunoprecipitation of pairs of these factors, even following benzonase digestion, is not strong evidence to support a direct physical interaction between proteins.

      Similarly, the super-resolution analysis of XND-1 and HIM-17 (Figure 1EF) does not reveal whether these proteins physically interact with each other, and does not add to our understanding of these proteins' functions, since they are already known to bind to many of the same promoters. Promoters are also likely to be located in chromatin loops away from the chromosome axis, so in this respect, the localization data are also confirmatory rather than novel.

      While the binding to promoters would be expected to be on DNA loops, that has not been definitively shown in the worm germ line. The supplemental data of the Carelli paper suggest that there are ~250 binding sites for each protein at these co-opted promoters. This could not account for the crossover map seen in C. elegans.

      The reviewer correctly states that we do not reveal that these proteins interact, but we have shown that the two proteins co-IP and have a Y2H interaction. This interaction is supported by a recent publication (Blazickova et al., 2025) that corroborates this conclusion and identifies XND-1 in HIM-17 co-IPs, also in the presence of benzonase. We do now show, however, by immuno-localization that the two proteins appear to be adjacent but non-overlapping. As now described in the text, AlphaFold 3 modeling and structural analysis suggest that the two proteins do interact directly and that the tagged 5’ end of HIM-17 used in our studies is likely to be at least 200 nm from the putative XND-1 binding interface, a distance that is consistent with our confocal images showing frequent juxtaposition of the two proteins.

      The phenotypic analysis of double mutant combinations does not seem informative. A major problem is that these different strains were only assayed for bivalent formation, which (as mentioned above) requires several steps downstream of DSB induction. Additionally, the basis for many of the single mutant phenotypes is not well understood, making it particularly challenging to interpret the effects of double mutants. Further, some of the interactions described as "synergistic" appear to be additive, not synergistic. While additive effects can be used as evidence that two genes work in different pathways, this can also be very misleading, especially when the function of individual proteins is unknown. I find that the classification of genes into "epistasis groups" based on this analysis does not shed light on their functions and indeed seems in some cases to contradict what is known about their functions.

      As described above, each of the proteins analyzed is thought to have a direct role in regulating meiotic DSB formation, and single mutant phenotypes are consistent with this interpretation. In almost all, if not all, of these cases, IR-induced breaks suppress univalent phenotypes (or uncover a downstream repair defect, e.g., in mre-11), supporting this conclusion. We have changed the terminology from “epistasis groups”, since this is not strict epistasis, to “functional groups”.

      The yeast two-hybrid (Y2H) data are only presented as a single colony. While it is understandable to use a 'representative' colony, it is ideal to include a dilution series for the various interactions, which is how Y2H data are typically shown.

      The Y2H data are presented as spots on a plate and are from three to four individual transformants per interaction tested, and are not individual colonies. The experiment was repeated in triplicate from different transformations. We have now made this clearer in the materials and methods section. This approach has been successfully used to examine protein interactions in our prior manuscripts on yeast and human proteins [Gaines et al (2015) Nat. Comms 6:7834; Kondrashova et al (2017) Cancer Discovery 7:984; Garcin et al (2019) PLoS Genetics 15:e1008355; Bonilla et al (2021) eLife 1: e68080; Prakash et al (2022) PNAS 119: e2202727119, etc].

      Additional (relatively minor) concerns about these data:

      (1) Several interactions reported here seem to be detected in only one direction - e.g., MRE-11-AD/HIM-5-BD, REC-1-AD/XND-1-BD, and XND-1-AD/HIM-17-BD - while no interactions are seen with the reciprocal pairs of fusion proteins. I'm not sure if some of this is due to pasting "positive" colony images into the wrong position in the grid, but this should be addressed.

      The asymmetry in the interactions observed is due to the well-known phenomenon in yeast two-hybrid (Y2H) assays where certain plasmids exhibit self-activation when fused in one orientation, making interpretation of reciprocal interactions challenging. In our experiment, some of the plasmids indeed showed self-activation in one direction, which likely accounts for the lack of interaction seen with the reciprocal pairs of fusion proteins. We have clarified this point in the Methods.

      (2) DSB-3 was only assayed in pairwise combinations with a subset of other proteins; this should be explained; it is also unclear why the interaction grids are not symmetrical about the diagonal.

      We have now completed the analysis by adding the interactions of DSB-3 with the remaining proteins that were missing from the initial set.

      (3) I don't understand why the graphic summaries of Y2H data are split among 3 different figures (1, 2, and 3).

      We chose to split the graphic summaries of the Y2H data across Figures 1, 2, and 3 because we felt this organization better aligns with the flow of the results presented in each figure. Each set of interactions is shown in the context of the specific experiments and findings discussed in those sections, which we believe helps provide a clearer and more logical presentation of the data.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Figure 1: B) The IP is difficult to interpret - there is a band of the corresponding size to XND-1 in the control lane calling into question the specificity of the IP/Western.

      We added a supplemental figure with the specificity of the antibody showing that there is a background non-specific band.

      C) More information about the mass spectrometry should be included. No indication of the number of times a peptide was identified, or the overall coverage of the identified proteins.

      Done

      This is important as in the results section (line 114) the authors indicate that there was "strong" interaction yet there is no way to assess this.

      D) Why wasn't hatching measured in the him-5p::him-5; him-17(ok424) strain?

      Great question. We overlooked this point but do have the strain, so we will need to do this experiment while the manuscript is back out for review.

      E) Quantification of the cytology should be included.

      We have now quantified the overlap between XND-1 and HIM-17.

      Figure 2: C) Statistics should be included.

      Done

      E) Quantification should be included for the cytology. I recommend changing the eaIs15 to HIM-5.

      We included better images showing whole gonads instead of one or two nuclei. We were not sure what the reviewers want us to quantify here since the relocalization of the protein to the cytoplasm is very clear.

      I have a general issue with the use of the term epistasis - this is used to order gene function based on different mutant phenotypes, usually with null alleles. While I think the authors have valid points with how they group the different SPO-11 accessory proteins, I do not think they should use the word epistasis, but rather genetic interactions.

      We appreciate the reviewer's thoughts on this matter and have removed the term epistasis; we now use functional groups or genetic interactions throughout the text.

      Figure 4 and the nature of the X chromosome: First, I think it would help the non-C. elegans reader to include a little more information on the X chromosome with respect to its differences compared to the autosomes. I also think that, if possible, it would be beneficial to include a model of the X in Figure 4.

      We have added more about X/autosome differences in the intro and during the discussion of HIM-5 function and have added a figure showing differences in the behavior of the X and autosomes during DSB/crossover formation.

      Minor points:

      Abstract: Given the findings of Silva and Smolikove on SPO-11 breaks, I recommend removing "early" from line 28 in the Abstract.

      Done

      Introduction (line 93): I think "biochemical studies" is a stretch here - I recommend "interaction studies".

      Done

      Results: (lines 160-161): mutations are not required for breaks. Line 172, there is a problem with the sentence.

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      (1) Figure 1B - The signal for XND-1 seems to appear both in the control and him-17::HA IP. Have the authors tested the specificity of the XND-1 antibody?

      We included a supplementary figure demonstrating the specificity of the XND-1 antibody by Western blot. This was also previously published (Wagner et al., 2010).

      (2) Figure 1D - can the authors provide an explanation why the him-5p::him-5 transgene that drives a higher expression than pie-1p::him-5 fails to suppress the Him phenotype seen in him-17? What are the HIM-5 levels like in these two strains compared to N2 and him-17 null mutants? Can this information provide explanation for the differential effect of the him-5 transgenes?

      We previously reported that him-5p::him-5 drives higher expression than pie-1p::him-5 (McClendon et al, 2016).

      The reason that him-5p::him-5 does not rescue, despite its higher expression in wild type, is that HIM-17 directly regulates expression of him-5. Since HIM-17 does not regulate the pie-1 promoter, the pie-1p::him-5 construct can at least partially suppress the him-17 mutation.

      We have (hopefully) explained this better in the text.  

      (3) Line 102 - the subheading "HIM-5 is the essential factor for meiotic breaks in the X chromosome" may not be appropriate for this section. This is what has previously been known. However, the results in Figure 1 demonstrate that a him-5 transgene can partially rescue the him-17 and xnd-1 phenotype, but not that it is essential for meiotic DSB formation on X chromosomes.

      We think some of the concern here is semantic and have changed the phraseology to say that HIM-5 is SUFFICIENT for DSBs on the X, which had not previously been shown.

      Vis-à-vis the X chromosome, in all genetic backgrounds examined, the absence of HIM-5 consistently results in a complete lack of DSBs on the X. For instance, in dsb-2 mutants, where HIM-5 is still expressed, DSBs are reduced genome-wide, but the X chromosome occasionally retains breaks. In contrast, even a weak allele of him-17 results specifically in the loss of X chromosome breaks, underscoring a unique requirement for HIM-5 in promoting DSBs on the X. While Figure 1 shows that a him-5 transgene can partially rescue him-17 and xnd-1 phenotypes, the consistent observation that X breaks are absent without HIM-5 supports its classification as sufficient for DSB formation on the X chromosome.

      (4) Figure 1E - please consider enlarging the images and showing multiple examples.

      Done.

      I also suggest that the authors perform a more rigorous analysis to support the conclusion that XND-1 and HIM-17 localize away from the axis by quantifying multiple images and doing line-scan analysis.

      Provided. New images are included in both the main and supplemental figures, along with quantification. There is no detectable overlap of the two proteins with one another or with the DNA axes (see quantification of overlap in Fig. 1).

      (5) Line 162 - This is the first mention of DSB-1, DSB-2, and DSB-3 in the paper. DSB-1 and DSB-2 are Rec114 homologs in C. elegans (Tesse et al., 2017), while DSB-3 is a homolog of Mei4 (Hinman et al., 2021). These proteins should be properly introduced in the introduction with appropriate citations.

      Done. We appreciate the reviewer pointing out that this was the first reference to these genes.

      (6) Line 169 - the rationale for this experiment is unclear. Why did the Y2H interaction between HIM-5 and DSB-1 prompt the authors to test the rescue of dsb-1 or dsb-2 phenotypes by the ectopic expression of him-5? Do the authors have evidence that HIM-5 level is reduced in dsb-1 or dsb-2 mutants?

      We have reorganized this section to better explain the motivation for looking at these interactions. We did see a difference in the localization of HIM-5 in the dsb-1 mutant animals, and we did have a sense that HIM-5 was critical for breaks on the X. We reasoned that it could have independent functions in promoting breaks that were not yet appreciated, so we wanted to do this experiment.

      (7) Line 172 - "very slightly reduced". This claim requires statistical analysis.

      We added statistical analysis, but we also removed this claim.

      (8) Figures 2C and 2D - Can the authors provide an explanation why the pie-1p::him-5 transgene fails to suppress the phenotypes in dsb-1, while the him-5p::him-5 transgene can? Again, the rationale for these experiments is unclear. Because of this, the interpretation is also unclear.

      The difference in rescue between the pie-1p::him-5 and him-5p::him-5 transgenes likely reflects differences in expression levels. As previously shown (McClendon et al., 2016), the him-5p::him-5 construct results in significantly higher expression of HIM-5 protein compared to pie-1p::him-5. This elevated expression likely explains its ability to partially rescue the dsb-1 phenotype. In contrast, the lower expression driven by the pie-1 promoter is insufficient to compensate for the absence of dsb-1 function. We have clarified the rationale and interpretation of these experiments in the revised manuscript to better reflect this point.

      (9) Lines 184-185 - the data for endogenously tagged HIM-5::3xHA are not shown anywhere in the paper. This must be shown.

      We have added this in the supplemental figures.

      (10) Figure 2D and 2E - what does the localization of pie-1p::him-5::GFP (eaIs15) and him-5p::him-5::GFP (eaIs4) look like in wild-type and dsb-1 mutants? Are the cytoplasmic aggregates caused by increased levels of HIM-5 expression? Can the differential behavior of him-5 transgenes provide explanation for differential rescues?

      We now show both live and fixed images of Phim-5::him-5::gfp transgenes, as well as the localization of the endogenously HA-tagged HIM-5 locus (Figure 2 and S3). In all cases, the protein is initially nuclear and then absent from meiotic nuclei with similar timing. The Ppie-1::him-5 transgene was very difficult to image due to low expression (even in wild type), so it is not shown here. We presume it is the slightly elevated level of expression of the Phim-5::him-5::gfp that can explain the differential rescue.

      (11) Lines 221-222, where are the results shown? Please refer to Figure S3.

      Done

      (12) Figure S3 - these need statistical analyses.

      Done

      (13) Lines 230-231 - what about the rec-1; parg-1; cep-1 triple mutant?

      This is an excellent suggestion and one we have not yet pursued. Given the lack of strong phenotypes in all combinations of double mutants, we prioritized other experiments. However, we agree that examining the rec-1; parg-1; cep-1 triple mutant would provide a valuable test of whether these factors act in the same pathway, and we appreciate the reviewer highlighting this potential future direction.

      (14) Line 298 - I suggest the authors take a look at the Alphafold prediction of DSB-1/DSB-2/DSB-3 and the comparison to human and budding yeast Rec114/Mei4 complex in Guo et al., 2022 eLife, which could provide insights into the Y2H results.

      We thank the reviewer for these comments and have indeed used these interactions and predicted homologies to zero in on a region of interaction between these proteins that resembles what is seen in humans and yeast, in which a dimer of REC114-like proteins wraps around and stabilizes a central Mei4 helix. This is now shown in Figure 3H, I. Satisfyingly, this modeling predicts that a trimer comprised of 2 DSB-1 proteins with DSB-3 is more stable than a DSB-1-DSB-2-DSB-3 trimer. This might explain why DSB-2 is not required in young adults and only becomes essential as DSB-1 levels drop in older animals (Rosu et al., 2013).

      (15) Can the authors introduce mutations within the DSB-1 interfaces that disrupt the interaction to either SPO-11 or DSB-2?

      We have begun to address this question by introducing targeted mutations within DSB-1. As shown in Figure 3E and 3F, mutations in the C-terminal region of DSB-1, which includes a core of four α-helices, disrupt its interaction with DSB-2 and DSB-3, but not with SPO-11. These findings suggest that the C-terminus mediates interactions specifically with DSB-2 and DSB-3.

      (16) Line 323 - The him-5 phenotypes are too weak to support the idea that it serves as the linchpin for the whole DSB complex. Do the authors have an explanation for why him-5 mutants exhibit X-chromosome-specific DSB defects?

      In response to the reviewer, above, and in the text, we have included a more detailed explanation of why we think HIM-5 has a key role in coordinating meiotic break formation. Although it was identified for its role on the X, the phenotypes associated with DSB formation in the mutant are quite pleiotropic and severe.

      (17) Line 436 - C. elegans lacks DSB hotspots.

      Removed

      Minor comments:

      (1) Figure 1A - please show the raw data for the yeast two-hybrid.

      We show representative yeast colonies in Figure S3.

      (2) It looks like the labeling for Figure 1B and 1C are switched.

      Fixed.

      (3) Figure 1B - what does the red box indicate? Please explain it in the legend.

      It indicates the XND-1 band. We added that information in the legend.

      (4) Figure 1C - in the legend, it was noted that the results are from GFP pulldowns of HIM-17::GFP. However, the method for Figure 1B and the method section noted that HIM-17 was tagged with 3xHA, and the pull-down was performed using anti-HA affinity matrix. Please reconcile this discrepancy.

      That’s because they were done in two different sets of experiments. For the IPs we used a HIM-17::HA strain and for the MS, a HIM-17::GFP strain.

      (5) Also in Figure 1C - please call Table S2 in the main text when discussing the mass spec results. Also, it is not clear what HIM-17 and GFP indicate in the table. What makes CKU80 different from the other proteins listed under GFP? Please explain more clearly in the legend.

      We have moved the table to the supplemental data, where we have included all of the peptide counts and gene coverage. In the revised Methods, we have included the rationale for inclusion in this table, which explains why CKU-80 differs.

      (6) Line 527 - it is unclear what experiment was done for HIM-17. Please revise it to indicate that this is for "HIM-17 immunoprecipitation". Also please indicate the strain used for HIM-17 pull-down (AV280?).

      (7) Line 113- please be specific about how the HIM-17 IP was performed. Which epitope and strains are used for pull-downs?

      This indeed was AV280. This has been added to the text and methods.

      (8) Figure 1D- What does ND mean? In the text, it was stated that there was only a minor suppression of hatching rates. The hatching rate for him-5p::him-5; him-17 must have been measured, and the data must be presented.

      ND does mean not determined. We have removed the statement about “minor suppression”. We only tested the overall population dynamics in the Phim-5::him-5;him-17(ok424) strain and the DAPI body counts. The failure to suppress the latter suggests there would be no effect on hatching rates, although we did not test this directly. Since we had done this for the Ppie-1::him-5;him-17 strain, we provided this information to further support the claims of genetic rescue by ectopic expression.

      (9) Line 151 - please specify that STED was used.

      We have removed the STED images, and just show the confocal images with Lightning Processing.

      (10) Figure 1E- the authors suggested that HIM-17 and XND-1 mainly localize to autosomes but not the X chromosome. However, there is not enough evidence that the chromosome excluded from HIM-17 staining is indeed an X chromosome.

      (11) Figure 1E (Line 154) - what are the active chromatin markers examined? Where are the data?

      We have previously shown that the chromosome lacking XND-1 staining is the X (Wagner et al., 2010). The X is heterochromatic and chromatin marks associated with active transcription, including H3K4me3 and HTZ-1 (a variant H2A), preferentially localize to autosomes, effectively anti-marking the X chromosome. As shown in the new Figure 1E, a single chromosome has very little XND-1 and HIM-17 associated proteins. This is the X chromosome.

      (12) Line 172 - It should be a comma instead of the period after "In dsb-1 mutants".

      Fixed

      (13) Figure S3H-K - I suggest the authors indicate the alleles of mre-11 (null vs. iow1) on the graph, similarly to him-5(e1490) to avoid confusion.

      Done

      (14) Lines 294 and 600 - Guo et al. 2022 is now published in eLife. The authors must cite the published paper, not the preprint.

      Fixed

      (15) Line 407 - the reference Carelli et al., 2022 is missing.

      Added

      (16) Line 766 - please remove "is" before nuclear.

      Done

      Reviewer #3 (Recommendations For The Authors):

      Major issues:

      In my view, the most interesting mechanistic finding in the paper is the evidence that HIM-5 may not bind to chromatin in the absence of DSB-1. If validated, this would suggest that HIM-5 is likely to be directly involved in a process that promotes break formation, in contrast to factors such as HIM-17 and XND-1. It does not, however, support the idea that HIM-5 is at the top of a hierarchy of DSB factors, as it is interpreted here. More importantly, the data supporting this claim are unconvincing; only a single image of an unfixed gonad from an animal expressing HIM-5::GFP is shown. Immunofluorescence should be performed and the results must be quantified.

      We have provided additional images of the HIM-5 relocalization to show that we observed this in both fixed and live worms with two different tagged strains. The exclusion from the nucleus is seen in all scenarios. Whether the protein now accumulates exclusively in the cytoplasm or is destabilized is challenging to address with the fixed images due to the arbitrariness of defining “background” staining.

      More generally, this type of analysis, looking at the interdependence of different factors for their association with chromosomes, is much more informative than the genetic interaction data presented in the paper, which does not seem to provide any mechanistic insights into the functions of the factors analyzed. The paper could potentially be greatly improved through a more extensive, systematic analysis of the interdependence of DSB-promoting factors for their localization to chromosomes.

      We have at least added this for XND-1 and HIM-17 and show they are not interdependent for chromosome association. We also provide for the first time data on the localization of HIM-5 in the dsb-1 mutant. Many of the other interactions have already been shown in the literature and/or were not warranted based on the lack of genetic interaction we present here.

      Minor issues:

      The title is vague and inconclusive. A more concrete title summarizing the major findings would help readers to assess whether the work is of interest.

      We have discussed the title extensively with all authors and all would like to keep the current title.

      The authors claim that the expression of HIM-5 from a different promoter (Ppie-1::him-5) but not its endogenous promoter (Phim-5::him-5) can partially rescue the DSB defect in him-17 mutants. To support this claim, they should really quantify the germline expression of HIM-5 in wild-type, him-17, him-17; Ppie-1::him-5, and Phim-5::him-5; him-17.

      We had previously reported the expression of both transgenes in the N2 background (McClendon et al., 2016).

      Panel O appears to be missing from Figure S3.

      Fixed

      The evidence for chromosome fusions in cep-1; mre-11 mutants shown in S4D is not convincing and the claim should be removed unless stronger evidence can be obtained.

      A clearer image has been added

      The basis of the following statement is unclear: "Furthermore, rec-1;him-5 double mutants give an age-dependent severe loss of DSBs (like dsb-2 mutants) suggesting that the ancestral function of the protein may have a more profound effect on break formation." The manuscript does not seem to include data regarding age-dependent loss of DSBs and no other publication is cited to support this claim. The interpretation is also perplexing; I think that it may be predicated on the idea that REC-1 and HIM-5 are paralogs, but as stated above, this claim is not well supported and is likely specious.

      We have added the reference. This was shown in Chung et al., 2013 – the paper that presented the cloning of the rec-1 locus.

  2. Sep 2025
    1. Author response:

      Joint Public Review

      This manuscript puts forward the provocative idea that a posttranslational feedback loop regulates daily and ultradian rhythms in neuronal excitability. The authors used in vivo long-term tip recordings of the long trichoid sensilla of male hawkmoths to analyze spontaneous spiking activity indicative of the ORNs' endogenous membrane potential oscillations. This firing pattern was disrupted by pharmacological blockade of the Orco receptor. They then use these recordings together with computational modeling to predict that Orco receptor neuron (ORN) activity is required for circadian, not ultradian, firing patterns. Orco did not show a circadian expression pattern in a qPCR experiment, and its conductance was proposed to be regulated by cyclic nucleotide levels. This evidence led the authors to conclude that a post-translational feedback loop (PTFL) clockwork, associated with the ORN plasma membrane, allows for temporal control of pheromone detection via the generation of multi-scale endogenous membrane potential oscillations. The findings will interest researchers in neurophysiology, circadian rhythms, and sensory biology. However, the manuscript has limited experimental evidence to support its central hypothesis and is undermined by several questionable assumptions that underlie their data analysis and model builds, as well as insufficient biological data, including critical controls to validate and/or fully justify the model the authors are proposing.

      We thank the reviewers for their thorough and thoughtful comments and believe that the manuscript will be much stronger once we incorporate the requested changes.

      Please note that we used ORN as the acronym for “olfactory receptor neuron” throughout the manuscript. ORNs contain odorant receptors (ORs), and in insects these ORs have to associate with the olfactory receptor co-receptor (Orco) in the cilium of the neuron to form functional OR-Orco complexes for odorant detection. Besides this chaperone function, Orco can form homomers with the potential to act as ionic pacemaker channels, a role which we explore in this study.

      Strengths:

      The study is notable for its combination of long-term in vivo tip recordings with computational modeling, which is technically challenging and adds weight to the authors' claims. The link between Orco, cyclic nucleotides, and circadian regulation is potentially important for sensory neuroscience, and the modeling framework itself - a stochastic Hodgkin-Huxley formulation that explicitly incorporates channel noise - is a solid and forward-looking contribution. Together, these elements make the study conceptually bold and of clear interest to circadian and olfactory biologists.

      Major weaknesses:

      At the same time, several limitations temper the conclusions. The pharmacological evidence relies on a single antagonist and concentration, without key controls. The circadian analysis is based on relatively small numbers of neurons, with rhythms detected only in subsets, and the alignment procedure used in constant darkness raises concerns of bias. The molecular evidence is sparse, with only three qPCR timepoints, and the model, while creative, rests on assumptions that are not yet fully supported by in vivo data.

      Please see our responses to the detailed comments.

      Detailed comments are provided below:

      (1) The role for Orco proposed in the authors' model largely stems from the effects seen following the administration of (a single dose) of the Orco antagonist, OLC15. However, this hypothesis is undercut by the lack of adequate pharmacological controls, including a basic multipoint OLC15 dose-response series in addition to the administration of blockers for the other channels that are embedded in their model, but which were ruled out as being involved in the modulation of biological rhythms. In addition, these studies would (ideally) also benefit from the inclusion of the same concentration (series) of an inactive OLC15 analog to better control for off-target effects.

      The Orco agonist VUAA1 (Jones et al., 2011) binds directly to Orco and increases the channel open time probability. In M. sexta hawkmoths, we have already published that VUAA1 increases the low spontaneous activity of ORNs in a dose-dependent fashion (Nolte et al., 2016). Chen and Luetje (2012) systematically varied the chemical structure of VUAA1 to identify new Orco ligands and discovered 22 Orco Ligand Candidates (OLC) that either activated or inhibited Orco. In their heterologous expression system, Orco was most sensitive to inhibition by OLC15. Based on these results, we published a dose-response curve of OLC15 inhibition (1-100 µM) using in vivo tip recordings of pheromone-sensitive long trichoid sensilla of M. sexta (Nolte et al., 2016). In that study, we could also demonstrate that OLC15 antagonizes the VUAA1 activation of Orco.

      Furthermore, we tested other published Orco antagonists in in vivo assays in intact hawkmoths, focusing on amiloride-derived antagonists, because we previously identified an amiloride-sensitive cation channel in hawkmoth ORNs. We found that, in contrast to OLC15, the amilorides HMA and MIA were not Orco-specific but instead affected different targets depending on time of day (Nolte et al., 2016). Based on those experiments and the dose-response curves, we determined that the Orco agonist VUAA1 (Jones et al., 2011) and the Orco antagonist OLC15 (Chen and Luetje, 2012) worked best in hawkmoth ORNs to target Orco pharmacologically. Based on comparative tests with other published Orco antagonists, we have since used a dose of 50 µM OLC15 in all further experiments.

      We will clarify the Methods section accordingly.

      (2) The expression pattern of Orco was assessed using qPCR at only three timepoints. Rhythmic transcripts can easily be missed with such sparse sampling (Hughes et al., 2017). A minimum of six evenly spaced timepoints across a 24-hour cycle would be required to confidently rule out circadian transcriptional regulation. In addition, the use of the timeless mRNA control from another study is not acceptable. Furthermore, qPCR analysis measures transcript abundance, not transcription, as the authors repeatedly state. Transcriptional studies would require nuclear run-off or, more recently, can be done with snRNAseq analysis. Taken together, these concerns undermine the authors' desire to rule out TTFL-based control that directly led them to implicate a PTTF-based model.

      We agree with the referees that more time points and a direct comparison between timeless and Orco mRNA levels should be included in this manuscript. We will include these additional qPCR experiments and edit the manuscript to make clear that we measure transcript abundance, but we will not perform snRNAseq analysis due to time and financial constraints. We are currently working on the transcriptional control of Orco, both during ontogeny and throughout the day, but this work in progress is beyond the scope of this manuscript.

      (3) The modelling presented is based on Orco as a ZT-dependent conductance tied to the cAMP oscillations that were reported by this group in the cockroach and from the presence and functionality in Manduca of homomeric Orco complexes that are devoid of tuning ORs. While these complexes have been generated in cell culture and other heterologous expression systems, as well as presumably exist in vivo in the Drosophila empty neuron and other tuning OR mutants, there is no evidence that these complexes exist in wild-type Manduca ORNs. While this doesn't necessarily undermine every aspect of their models, the authors should note the presence of Orco/OR complexes rather than Orco homomeric complexes.

      Our ELISAs found circadian oscillations in cAMP levels not only in antennae of the Madeira cockroach (Schendzielorz et al., 2014, 2012), but also in hawkmoth antennae (Schendzielorz et al., 2015). We will add the 2015 citation to the Modeling chapter in the Methods section to clarify this.

      We agree with the referees that we cannot distinguish between Orco homo- and heteromers in the different compartments of our hawkmoth ORNs. Thus, as the referee suggests, we will add text regarding the presence and localization of OR-Orco heteromers. However, we have indications that Orco homomers could indeed be present in the hawkmoth ORNs. In a heterologous expression system, MsexOrco expression alone was sufficient to increase intracellular Ca<sup>2+</sup> levels in response to VUAA1 application (Nolte et al., 2013). In differentiating primary cell cultures of hawkmoth antennae, Orco expression started during a developmental time window where ORNs did not yet express pheromone receptors, and Orco affected spontaneous activity (Nolte et al., 2016). Thus, Orco homomers are present in developing hawkmoth ORNs during a time window where ORNs already express spontaneous activity but cannot heteromerize with pheromone receptors. However, we do not know whether and in what ratio homo- and heteromers of Orco and ORs are present in the respective sensillum compartments of adult hawkmoths (Nolte et al., 2013; Stengl, 1994; Stengl and Hildebrand, 1990).

      We will clarify our manuscript accordingly.

      (4) Some aspects of the authors' models, most notably the decision to phase align/optimize their DD and OLC15 recordings, are likely to bias their interpretations.

      There is consensus that insects display daily and circadian rhythms in pheromone-dependent mating, odor-gated feeding, and egg-laying behavior that phase-lock to environmental rhythms, corresponding with daily/circadian rhythms of sensory neuron physiology (e.g., Merlin et al., 2007; Rymer et al., 2007; Schendzielorz et al., 2015, 2012). However, circadian rhythms can be easily masked by stress, like the disturbances during a very challenging long-term recording experiment over several days. In addition, we observed in our animal raising facility that in LD 17:7 light-dark cycles the originally nocturnal hawkmoths M. sexta distribute their activity patterns over the course of the day, so that we find nocturnal as well as diurnal hawkmoths. Thus, light-dark cycles were not enough to ensure phase-synchronized behavioral rhythms, and it is very likely that the nocturnal hawkmoths rely heavily on pheromone/odor-dependent synchronization, as also found in other moth species (Ghosh et al., 2024). Here, we used isolated males that were never exposed to the female pheromones, so that their circadian activity patterns readily disperse. Therefore, it became necessary in free-running conditions to first determine the respective behavioral rhythm for each animal, and then to phase-align their activity patterns to allow for statistical analysis. Otherwise, circadian differences would average out in a free-running population. As requested by the referees in point (7), we will use additional tests for rhythmicity in each of our recordings and revise the manuscript accordingly.

      Assuming that hawkmoths need pheromone presence as an additional Zeitgeber, we are currently working on a new set of experiments where we attempt to improve synchronization by exposure to LD cycles and pheromone before DD and OLC15 recordings. We will add these experiments to the manuscript.
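
      The averaging argument in the response to point (4) can be made concrete with a small numerical sketch. It is not the authors' analysis pipeline: the function names, the 24-bin profiles, and the three hypothetical animals with peaks 8 h apart are assumptions chosen only for illustration. Each profile's peak phase is estimated from its first harmonic, and the profile is then rotated so that the peak lands in a common bin.

```python
import math

def acrophase_bin(profile):
    """Peak phase (in bins) of the first harmonic of a one-cycle profile."""
    n = len(profile)
    c = sum(v * math.cos(2 * math.pi * k / n) for k, v in enumerate(profile))
    s = sum(v * math.sin(2 * math.pi * k / n) for k, v in enumerate(profile))
    return (math.atan2(s, c) * n / (2 * math.pi)) % n

def align(profile, target_bin=0):
    """Rotate a profile so that its estimated peak falls on target_bin."""
    n = len(profile)
    shift = round(acrophase_bin(profile) - target_bin) % n
    return profile[shift:] + profile[:shift]

def mean_profile(profiles):
    """Bin-wise average across animals."""
    return [sum(col) / len(col) for col in zip(*profiles)]

def peak_to_trough(profile):
    return max(profile) - min(profile)

# Hypothetical animals: identical rhythms, peaks at 0 h, 8 h and 16 h.
N_BINS = 24
animals = [
    [5 + 2 * math.cos(2 * math.pi * (k - peak) / N_BINS) for k in range(N_BINS)]
    for peak in (0, 8, 16)
]

unaligned = mean_profile(animals)
aligned = mean_profile([align(p) for p in animals])

print(round(peak_to_trough(unaligned), 6))  # rhythm cancels in the raw average
print(round(peak_to_trough(aligned), 6))    # rhythm is recovered after alignment
```

      The same sketch also shows why the referees worry about bias: rotating every trace to its own estimated peak will produce a peak in the average even when the traces contain only noise, so alignment needs to be paired with a per-recording test for rhythmicity.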

      (5) The tip recordings from long trichoid sensilla are critical aspects of this study. These recordings were carried out on upper sensillar tips located on the distal-most second annulus. Since there are approximately 80 annuli on the Manduca antennae, it is unclear whether the recordings are representative of the antennal response.

      We think the reviewers might have misinterpreted our description of the recording site. In the Methods, we state that we clip off the 20 most distal annuli (leaving a stump of about 60 annuli) and insert the reference electrode into the flagellum up to the second annulus from the cut end, i.e., the recording site is located at 2/3 – 3/4 of the antenna length as seen from the head of the animal. We will make this clearer in the Methods section.

      In addition, our lab did show with antibody stainings against Orco that apparently all ORNs that innervate long and short trichoid sensilla along the whole flagellum express the same staining pattern (Nolte et al., 2016). Furthermore, our patch clamp recordings of primary cell cultures of whole male antennae found largely overlapping ion channel populations across ORNs. This would indicate that all ORNs, whether they express pheromone or general odorant receptors, could potentially share the same Orco-dependent spontaneous activity rhythms. In our lab, different experimenters who recorded in different years from long trichoid sensilla on different annuli did not detect obvious differences in either the spontaneous activity or the pheromone responses (c.f., Dolzer et al., 2003; Gawalek and Stengl, 2018; Schneider et al., 2025). Thus, it is very likely that we are reporting a general encoding mechanism that is not locally restricted along the antennal flagellum.

      (5.1) The authors do not provide any data in support of their cAMP/cGMP-based Orco gating…

      There are publications supporting cyclic nucleotide gating of Orco in Drosophila, but only after previous phosphorylation via protein kinase C (PKC; review: Wicher and Miazzi, 2021). Since Orco is highly conserved among insect species, it is likely that these PKC- and cGMP/cAMP-dependent regulations are present in other insect species. We are currently running thorough tip-recording experiments on the regulation of Orco gating, which are beyond the scope of this manuscript. However, we will add a set of experiments to this manuscript that demonstrates cAMP gating of Orco.

      (5.2)… and the PTTF model proposed is somewhat disappointing.

      For a detailed introduction of our PTFL membrane clock hypothesis please see our opinion paper (Stengl and Schneider, 2024).

      (5.3) The model seems to be influenced by their long-held proposal that insect olfactory signaling has a critical metabotropic component involving cyclic nucleotides, PKC, etc, a view that may be influenced by the use of Orco homomeric complexes generated in HEK cells.

      Indeed, we propose a metabotropic pheromone-transduction cascade, which in moths and cockroaches is based on G-protein-mediated activation of phospholipase C but not on adenylyl cyclase activation. Our hypothesis is not influenced by HEK cell heterologous expression studies of Orco but is supported by our own work comparing in vivo tip recordings of intact hawkmoths with patch clamp experiments on hawkmoth primary cell cultures of olfactory receptor neurons, which are able to respond to their species-specific pheromones in vitro (Schneider et al., 2025; Stengl, 2010; Stengl and Funk, 2013; Wicher and Miazzi, 2021). In addition, a multitude of publications by other laboratories with in vivo and in vitro studies using physiological, genetic, and immunocytochemical assays all support a metabotropic signal transduction cascade in insect olfaction (reviews: Stengl, 2010; Stengl and Funk, 2013; Wicher and Miazzi, 2021). In contrast, the hypothesis suggesting a solely ionotropic pheromone- and general odor-dependent transduction cascade for all insect species is based on very sparse experimental evidence, primarily from heterologous expression systems such as HEK cells that lack the insect’s WT molecular surroundings and thus cannot predict OR-Orco function in vivo. Furthermore, the ionotropic hypothesis rests heavily on the argument that an inverse 7TM receptor cannot couple to G-proteins, which lacks careful backup via biochemical and structural studies. In addition, the ionotropic hypothesis lacks support from carefully performed physiological in vivo studies in different insect species that pay attention to the distinct kinetic components of the ORNs' odor/pheromone responses and that employ physiological concentrations and durations of odor/pheromone stimuli (please see our most recent publication by Schneider et al. (2025)).

      (5.4) Nevertheless, structural studies on Orco do not support a cyclic nucleotide binding site, although PKC-based phosphorylation has been implicated in the fine-tuning/adaptation of olfactory signaling.

      While structural studies did not find evidence for conserved known cyclic nucleotide binding sites on Orco, this does not exclude the presence of so far unknown binding sites, or of sites that fold out only after a specific sequence of previous phosphorylations at the many phosphorylation sites on Orco. Indeed, physiological studies in Drosophila presented evidence for cyclic nucleotide dependence of Orco after previous PKC-dependent phosphorylation (Getahun et al., 2013). Our ongoing in vivo experiments in hawkmoths further corroborate a PKC- and cAMP-dependent modulation of Orco. These studies will be published in a follow-up publication.

      (6) Because only 5/11 LD and 7/10 DD animals showed daily rhythms, with averages lacking clear daily modulation, the methods are not sufficiently reliable to reveal novel underlying mechanisms of circadian rhythm generation. The reported results are therefore not yet reliable or quantifiable. To quantify their results, the authors should apply tests for circadian rhythmicity using methods such as RAIN, JTK_CYCLE, MetaCycle, or Echo. The use of FFT and Wavelet is applauded, but these methods do not have tests of significance for rhythms and can be biased when analyzing data in which there could only be 1-3 circadian cycles. Because the conclusions appear to be based on 11-12 neurons that were recorded for 2-4 days, the reader is concerned that the methods are not yet perfected to provide strong evidence for circadian regulation of spontaneous firing of ORNs. The average data (e.g., Figure 3Bii and 3Cii) highlight the apparent lack of daily rhythms. In summary, the results would be more compelling if more than 50% of the recordings had significant circadian amplitudes and with similar periods and phases.

      The long-term tip-recordings of intact hawkmoths are very challenging and take a very long time to accomplish; thus, we are very happy that we succeeded in obtaining so many of them (N=34). Since 5/11 LD recordings and 7/10 DD recordings revealed daily/circadian rhythmicity, and since many other physiological recordings at different ZTs by different members of our laboratory all revealed ZT-dependent pheromone transduction, we can be certain that the physiology of hawkmoth antennae is under strict circadian control. Please see also our response to (4) above, which comments on the phase dispersal of activity rhythms observed in our experiments, as well as in the behavior of hawkmoth males in the mating cage.

      Nevertheless, we will follow the advice of the referees to apply additional tests for significance of rhythms in spontaneous activity, and we are thankful for the tests suggested that we were not aware of.
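
      To illustrate what a per-recording test for rhythmicity involves, the sketch below fits a classic single-component cosinor by least squares. This is a deliberately simpler stand-in, not RAIN, JTK_CYCLE, MetaCycle, or Echo, and not the authors' method; the function names, the fixed 24 h period, and the synthetic firing-rate values are all assumptions. The fit returns mesor, amplitude, and acrophase, plus an F statistic comparing the rhythmic model with a flat mean (converting F to a p-value needs the F distribution with 2 and n-3 degrees of freedom, which is left out here).

```python
import math

def solve_linear(a, b):
    """Solve a small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x

def cosinor(times_h, values, period_h=24.0):
    """Least-squares fit of y = mesor + a*cos(w t) + b*sin(w t), w = 2*pi/period."""
    w = 2.0 * math.pi / period_h
    rows = [[1.0, math.cos(w * t), math.sin(w * t)] for t in times_h]
    # Normal equations (X^T X) beta = X^T y for the three coefficients.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mesor, a, b = solve_linear(xtx, xty)
    fitted = [mesor + a * r[1] + b * r[2] for r in rows]
    n = len(values)
    mean_y = sum(values) / n
    ss_total = sum((y - mean_y) ** 2 for y in values)
    ss_resid = sum((y - f) ** 2 for y, f in zip(values, fitted))
    df_resid = n - 3  # three fitted parameters
    f_stat = None     # undefined without residual degrees of freedom
    if df_resid > 0 and ss_resid > 0:
        f_stat = ((ss_total - ss_resid) / 2.0) / (ss_resid / df_resid)
    return {
        "mesor": mesor,
        "amplitude": math.hypot(a, b),
        "acrophase_h": (math.atan2(b, a) / w) % period_h,
        "f_stat": f_stat,
        "df_resid": df_resid,
    }

# Synthetic example: 12 samples over 48 h, peak at 6 h, alternating +/-0.5 noise.
times = [4.0 * k for k in range(12)]
rates = [10 + 3 * math.cos(2 * math.pi * (t - 6) / 24) + 0.5 * (-1) ** k
         for k, t in enumerate(times)]
dense = cosinor(times, rates)

# Three samples per cycle leave no residual degrees of freedom to test against.
sparse = cosinor([0.0, 8.0, 16.0], [12.0, 9.0, 10.0])
print(dense["amplitude"], dense["acrophase_h"], dense["f_stat"], sparse["f_stat"])
```

      The last case also illustrates the sampling concern raised in point (2): with only three timepoints the three-parameter curve passes through every point, so no test of significance is possible regardless of the method used.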

      (7) The statement that circadian patterns of ORN firing are lost with the Orco antagonist (OLC15) is not strongly supported. The manuscript should be revised to quantify how Orco changed circadian amplitude in the 12 recorded neurons. Measures of circadian amplitude can avoid confusing/vague statements like Line 394 “low and high frequency bands appeared to merge during the activity phase around ZT 0 in the animals that showed clear circadian rhythms (N = 5 of 11 in LD)”. The conclusion that Orco blocks circadian firing appears to be contradicted by Figure 6, which indicates that ~6 of these neurons had circadian periods detected by wavelet. The manuscript would be strengthened with details about the specificity and reproducibility of the Orco antagonist. The authors quantify the gradual decrease in firing with the slope of a linear fit to estimate how the “effectiveness [of OLC15] increased over time.” They conclude that the drug “obliterated circadian rhythms and attenuated the spontaneous activity in several, but not all experiments (N = 8 of 12).” The report would be greatly strengthened with corroborating data from additional Orco antagonists and additional doses of OLC15 (the authors use only 50 µM OLC15).

      We will revise our data analysis, according to the valuable suggestions of the referees.

      However, based upon our previous studies with other Orco antagonists and different doses of OLC15 (Nolte et al., 2016) we found that 50 µM OLC15 is the best Orco antagonist dose in M. sexta to target Orco-dependent modulation of spontaneous action potential activity of hawkmoth olfactory receptor neurons. Please see also our response to (1).

      (8) The manuscript includes several statements that are more speculation than conclusion. For example, there is no evidence for tuning or plasticity in this report. Statements like the following should be removed or addressed with experiments that show changes in odor response specificity or sensitivity: "ORN signalosomes are highly plastic endogenous PTFL clocks comprising receptors for circadian and ultradian Zeitgebers that allow to tune into internal physiological and external environmental rhythms as basis for active sensing." (Discussion Line 622). The paper concludes that (line 380) "mean frequency of spontaneous spiking and the frequency of bursting expressed daily modulation, and are both most likely controlled via a circadian clock that targets the leak channel Orco." This is too bold given the available results.

      We will revise the discussion accordingly and clarify which statements are supported via published evidence and which are predictions based upon our novel hypothesis published in our opinion paper (Stengl and Schneider, 2024).

      (9.1) Because Orco conductance is modulated by cyclic nucleotides, it remains highly plausible that circadian regulation occurs upstream at the level of signaling pathways (e.g., calcium, calcium-binding proteins, GPCRs, cyclases, phosphodiesterases).

      We agree with the referees that it is very likely that there are multiple layers of interconnected feedback cycles that control Orco localization and activity. Our novel hypothesis suggests interlocked TTFL and PTFL control of physiological circadian rhythms, not strictly hierarchical TTFL control, which would require a daily turnover of membrane proteins and transcriptional control via the established TTFL clock in insect ORNs. We are currently searching for TTFL control at all levels of odor/pheromone transduction using ZT-dependent transcriptomics in combination with qPCR and single-nucleus transcriptomics, which also cover all the molecules suggested by the referees. These studies are ongoing, are very time-consuming and costly, and are beyond the scope of this manuscript.

      (9.2) The possibility that circadian oscillations of cyclic nucleotides are generated by the canonical TTFL mechanism has not been excluded. In fact, extensive work in Drosophila has demonstrated that the TTFL-based molecular clock proteins are required for circadian rhythms in olfaction.

      Our experiments that test circadian TTFL control at different levels of the cAMP transduction cascade in hawkmoth antennae are under way and are part of another publication. We will revise our discussion accordingly.

      The published experiments on TTFL-dependent control of Drosophila olfaction that we are aware of (Krishnan et al., 1999; Tanoue et al., 2004) do not exclude interlinked PTFL and TTFL clocks. Krishnan et al. (1999) demonstrate that the TTFL clock in antennal olfactory receptor neurons correlates with circadian rhythms in odor responses measured in electroantennogram (EAG) recordings, not in single sensillum recordings as in our experiments. EAG recordings comprise not only voltage responses of the olfactory sensory neurons but also voltage changes generated in non-neuronal antennal cells such as trichogen and tormogen cells, which build the transepithelial potential gradient via V-ATPases that generate the high K⁺ concentration in the sensillum lymph (Jain et al., 2024; Klein, 1992; Thurm and Küppers, 1980). In addition, EAG recordings most likely contain responses of afferent neurons originating from somata in the brain that maintain central control of the antennae. Thus, EAG recordings are difficult to interpret.

      (11) A defining feature of circadian oscillators is the feedback mechanism that generates a time delay (e.g., PERIOD/TIMELESS repressing their own transcription). While the authors describe how cyclic nucleotides can regulate Orco conductance, they do not provide a convincing explanation of how Orco activity could, in turn, feed back into the proposed PTFL to sustain oscillations. For these reasons, the authors should consider:

      a) Providing a broader discussion of non-TTFL models of circadian rhythms (e.g., redox cycles, post-translational modifications).

      We will revise the discussion accordingly.

      b) Reassessing Orco expression using a higher-resolution temporal sampling (≥6 timepoints per 24 h).

      We will add those experiments to the revised version of the manuscript (see our response to (2)).

      c) Clarifying or revising the PTFL model to explicitly address how feedback would be achieved. Alternatively, the data may be more consistent with Orco conductance rhythms being regulated by post-translational mechanisms downstream of the canonical TTFL oscillator, as suggested by the Drosophila olfactory system literature.

      We will revise the manuscript accordingly.

      Minor weaknesses:

      (1) The authors should compare the firing patterns of ORN neurons to the bursts, clusters, and packets of retinal efferent spikes reported in Liu JS and Passaglia CL (2011; JBR). By comparing measures in moths to measures in Limulus, the authors might be able to address the question: Is the daily firing pattern of ORN neurons likely a conserved feature of circadian control of sensory sensitivity?

      We will revise the discussion accordingly.

      (2) The methods need further details. For example, it is unclear if or how single neuron activity was discriminated and whether the results were compromised by the relatively large environmental fluctuations in temperature (21-27 °C), humidity (35-60%), or other cues known to modulate spontaneous firing.

      We will clarify the Methods section.

      References

      Chen S, Luetje CW. 2012. Identification of New Agonists and Antagonists of the Insect Odorant Receptor Co-Receptor Subunit. PLOS ONE 7:e36784. doi:10.1371/journal.pone.0036784

      Dolzer J, Fischer K, Stengl M. 2003. Adaptation in pheromone-sensitive trichoid sensilla of the hawkmoth Manduca sexta. J Exp Biol 206:1575–1588. doi:10.1242/jeb.00302

      Gawalek P, Stengl M. 2018. The Diacylglycerol Analogs OAG and DOG Differentially Affect Primary Events of Pheromone Transduction in the Hawkmoth Manduca sexta in a Zeitgebertime-Dependent Manner Apparently Targeting TRP Channels. Front Cell Neurosci 12:218. doi:10.3389/fncel.2018.00218

      Getahun MN, Olsson SB, Lavista-Llanos S, Hansson BS, Wicher D. 2013. Insect Odorant Response Sensitivity Is Tuned by Metabotropically Autoregulated Olfactory Receptors. PLOS ONE 8:e58889. doi:10.1371/journal.pone.0058889

      Ghosh S, Suray C, Bozzolan F, Palazzo A, Monsempès C, Lecouvreur F, Chatterjee A. 2024. Pheromone-mediated command from the female to male clock induces and synchronizes circadian rhythms of the moth Spodoptera littoralis. Curr Biol 34:1414-1425.e5. doi:10.1016/j.cub.2024.02.042

      Jain K, Prelic S, Hansson BS, Wicher D. 2024. Expression of Drosophila melanogaster V-ATPases in Olfactory Sensillum Support Cells. Insects 15:1016. doi:10.3390/insects15121016

      Jones PL, Pask GM, Rinker DC, Zwiebel LJ. 2011. Functional agonism of insect odorant receptor ion channels. Proc Natl Acad Sci 108:8821–8825. doi:10.1073/pnas.1102425108

      Klein U. 1992. The insect V-ATPase, a plasma membrane proton pump energizing secondary active transport: immunological evidence for the occurrence of a V-ATPase in insect ion-transporting epithelia. J Exp Biol 172:345–354. doi:10.1242/jeb.172.1.345

      Krishnan B, Dryer SE, Hardin PE. 1999. Circadian rhythms in olfactory responses of Drosophila melanogaster. Nature 400:375–378. doi:10.1038/22566

      Merlin C, Lucas P, Rochat D, François M-C, Maïbèche-Coisne M, Jacquin-Joly E. 2007. An Antennal Circadian Clock and Circadian Rhythms in Peripheral Pheromone Reception in the Moth Spodoptera littoralis. J Biol Rhythms 22:502–514. doi:10.1177/0748730407307737

      Nolte A, Funk NW, Mukunda L, Gawalek P, Werckenthin A, Hansson BS, Wicher D, Stengl M. 2013. In situ Tip-Recordings Found No Evidence for an Orco-Based Ionotropic Mechanism of Pheromone-Transduction in Manduca sexta. PLOS ONE 8:e62648. doi:10.1371/journal.pone.0062648

      Nolte A, Gawalek P, Koerte S, Wei H, Schumann R, Werckenthin A, Krieger J, Stengl M. 2016. No Evidence for Ionotropic Pheromone Transduction in the Hawkmoth Manduca sexta. PLOS ONE 11:e0166060. doi:10.1371/journal.pone.0166060

      Rymer J, Bauernfeind AL, Brown S, Page TL. 2007. Circadian rhythms in the mating behavior of the cockroach, Leucophaea maderae. J Biol Rhythms 22:43–57. doi:10.1177/0748730406295462

      Schendzielorz J, Schendzielorz T, Arendt A, Stengl M. 2014. Bimodal Oscillations of Cyclic Nucleotide Concentrations in the Circadian System of the Madeira Cockroach Rhyparobia maderae. J Biol Rhythms 29:318–331. doi:10.1177/0748730414546133

      Schendzielorz T, Peters W, Boekhoff I, Stengl M. 2012. Time of Day Changes in Cyclic Nucleotides Are Modified via Octopamine and Pheromone in Antennae of the Madeira Cockroach. J Biol Rhythms 27:388–397. doi:10.1177/0748730412456265

      Schendzielorz T, Schirmer K, Stolte P, Stengl M. 2015. Octopamine Regulates Antennal Sensory Neurons via Daytime-Dependent Changes in cAMP and IP3 Levels in the Hawkmoth Manduca sexta. PLOS ONE 10:e0121230. doi:10.1371/journal.pone.0121230

      Schneider AC, Schröder K, Chang Y, Nolte A, Gawalek P, Stengl M. 2025. Hawkmoth Pheromone Transduction Involves G-Protein–Dependent Phospholipase Cβ Signaling. eNeuro 12:ENEURO.0376-24.2024. doi:10.1523/ENEURO.0376-24.2024

      Stengl M. 2010. Pheromone Transduction in Moths. Front Cell Neurosci 4:133. doi:10.3389/fncel.2010.00133

      Stengl M. 1994. Inositol-trisphosphate-dependent calcium currents precede cation currents in insect olfactory receptor neurons in vitro. J Comp Physiol A 174:187–194. doi:10.1007/BF00193785

      Stengl M, Funk NW. 2013. The role of the coreceptor Orco in insect olfactory transduction. J Comp Physiol A 199:897–909. doi:10.1007/s00359-013-0837-3

      Stengl M, Hildebrand JG. 1990. Insect olfactory neurons in vitro: morphological and immunocytochemical characterization of male-specific antennal receptor cells from developing antennae of male Manduca sexta. J Neurosci 10:837–847. doi:10.1523/JNEUROSCI.10-03-00837.1990

      Stengl M, Schneider AC. 2024. Contribution of membrane-associated oscillators to biological timing at different timescales. Front Physiol 14:1243455. doi:10.3389/fphys.2023.1243455

      Tanoue S, Krishnan P, Krishnan B, Dryer SE, Hardin PE. 2004. Circadian Clocks in Antennal Neurons Are Necessary and Sufficient for Olfaction Rhythms in Drosophila. Curr Biol 14:638–649. doi:10.1016/j.cub.2004.04.009

      Thurm U, Küppers J. 1980. Epithelial physiology of insect sensilla. In: Locke M, Smith DS, editors. Insect Biology in the Future. Academic Press. pp. 735–763. doi:10.1016/B978-0-12-454340-9.50039-2

      Wicher D, Miazzi F. 2021. Functional properties of insect olfactory receptors: ionotropic receptors and odorant receptors. Cell Tissue Res 383:7–19. doi:10.1007/s00441-020-03363-x

    1. Botryllus schlosseri (Tunicata) is a colonial chordate that has long been studied for its multiple developmental pathways and regenerative abilities and its genetically determined allorecognition system based on a polymorphic locus that controls chimerism and cell parasitism. We present the first chromosome-level genome assembly from an isogenic colony of B. schlosseri clade A1 using a mix of long and short reads scaffolded using Hi-C. This haploid assembly spans 533 Mb, of which 96% are found in 16 chromosome-scale scaffolds. With a BUSCO completeness of 91.2%, this complete and contiguous B. schlosseri genome assembly provides a valuable genomic resource for the scientific community and lays the foundation for future investigations into the molecular mechanisms underlying coloniality, regeneration, histocompatibility, and the immune system in tunicates.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf097), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 3: Cristian Canestro

      TO THE AUTHORS

      In this MS entitled 'First chromosome-level genome assembly of the colonial chordate model Botryllus schlosseri (Tunicata)', Olivier De Thier and colleagues report the first chromosome-scale assembly of this colonial ascidian species, paying special attention to differences from previously published assemblies and, importantly, between haplotypes. The MS is very well written and very easy and pleasant to read. It provides data of great quality that are very relevant not only for the ascidian/tunicate community but also for the field of genome structural evolution. I firmly recommend it for publication, although I think that the authors could discuss it in deeper detail. In particular, I miss a more elaborate discussion of what the results add to our understanding of the similarities and differences between clades that have been published in recent years (I have not been able to find some relevant articles in this regard cited in the bibliography). I also feel that a deeper analysis of the differences between haplotypes could be very interesting, unless they are artifactual effects of the assemblies. As mentioned below, unless this is part of a longer story for a different MS beyond the scope of this one, I encourage the authors to validate some of the differences they find between haplotypes, to try to correlate the structural variations with differences in gene counts between haplotypes, and to explore whether these differences could be correlated with aspects of biological relevance. I miss, for instance, Venn diagrams comparing gene contents between previous assemblies and the haplotypes/haploid genome reported here. In any case, I firmly recommend this MS for publication, since most of my suggestions are not intended to question the results of the MS but to improve it, and I understand that some may go beyond the scope of this MS.

      Minor points: Introduction Page 1: "the basic body plan of adult tunicates is highly conserved across the entire subphylum [3]". This sentence, which could be OK for ascidians, probably provides a highly simplified vision of Tunicate adult morphologies, especially when comparing the divergent morphologies of Thaliaceans and Appendicularians. Please elaborate on this sentence.

      To understand the comparisons between the data of this MS and previously reported genomes, it seems crucial to understand well the meaning of the "clades and subclades". Please include in the introduction (or where needed) how those clades are defined, what their origins and biological/geographical differences are, and all the critical information that will especially help non-tunicate readers to understand the results.

      Results: The authors refer to the presence of large-scale genomic palindromes in Bs1 and Bs3, but it is unclear what these structures are. Please provide a more detailed explanation of the palindromic nature of these regions.

      The data from the haplotype-resolved assemblies are very interesting. I wonder if it is possible to somehow measure the amount of heterozygosity between haplotypes 1 and 2, and between those and the previous versions of the genome, to better understand intra- and inter-subclade variation. The differences in the size of some regions between Colombera and this study, and even between haplotypes 1 and 2, are very interesting. I would find it more informative to merge the three graphs of Figure S9 into one single graph, so we can also easily compare the differences in size between the haplotypes and the haploid assembly. If some of those differences are actually due to deletions, that would deserve further analysis. If this analysis is not part of another ongoing project that will be published somewhere else, I suggest identifying some of those differences with a dot plot, especially between haplotypes, and validating with long reads crossing those regions whether some of the deletions are real or artifactual. Please include the dot plot together with the two haplotypes in Figure S10. For those cases that could be real, it would be very interesting to know which genes are gone, and whether those genes have been placed somewhere else in the genome as a result of translocations or are actually gone and could explain some of the differences reported in the gene count between haplotypes.

      The authors mention the presence of multiple structural variations, although some of them could be artifacts of misassemblies. Interestingly, the plot of the synteny blocks between the two haplotypes in Figure S11 shows some of those structural variations, including cases of: - deletions: for instance, there are "blank" regions in Bs1A and Bs3A with no lines, which may reflect areas that are not present in haplotype B. - duplications and translocations within chromosomes or between chromosomes of different haplotypes. Just looking at this plot, I wonder how the distribution of chromosomes between haplotypes is done. For instance, I see that Bs7B shares a duplicated synteny block with chromosomes Bs10B and Bs14B, but not with Bs10A and Bs10B, which means that the duplications are intra-haplotype, present in B but not in A. But I wonder if it is possible that Bs10B and Bs14B could in fact be switched to haplotype A, in which case there would be neither a duplication nor a deletion in one of the haplotypes, just a simple translocation. I may be wrong in the interpretation, but I'm curious to understand the graph. In any case, as mentioned above, it would be worthwhile to validate some of those variations with long reads, which could illuminate the biological relevance of the differences between the haplotypes and rule out potential assembly artifacts.

      I notice that in Figures 7 and S13 some lines are thicker than others. Is this because many "thin" lines overlap and so look like a "thick" line? Otherwise, the visual effect of different thicknesses could be misleading. Please clarify.

      In the analysis of the Hox cluster the authors say "[…] our new assembly revealed that B. schlosseri's Hox genes are not scattered. Instead, eight of them were clustered on the second largest scaffold (Bs2), whereas two other ones are found on the 15th largest scaffold (Bs15)." Generally, describing Hox genes as a cluster implies that they lie in close vicinity, with few or no other genes in between. Therefore, I would not describe eight Hox genes as clustered simply because they are on the same chromosome (maybe even on different arms).

    1. Abstract. Background: Reference genomes for the entire sea turtle clade have the potential to reveal the genetic basis of traits driving the ecological and phenotypic diversity in these ancient and iconic marine species. Furthermore, these genomic resources can support conservation efforts and deepen our understanding of their unique evolution. Results: We present haplotype-resolved, chromosome-level reference genomes and high-quality gene annotations for five sea turtle species. This completes the catalog of reference genomes of the entire sea turtle clade when combined with our previously published reference genomes. Our analysis reveals remarkable genome synteny and collinearity across all species, despite the clade’s origin dating back more than 60 million years. Regions of high interspecific genetic distance and intraspecific genetic diversity are consistently clustered in genomic hotspots, which are enriched with genes coding for immune response proteins, olfactory receptors, zinc fingers, and G-protein-coupled receptors. These hotspot regions may offer insights into the genetic mechanisms driving phenotypic divergence among species, and represent areas of significant adaptive potential. Ancient demographic analysis revealed a synchronous population expansion among sea turtle species during the Pleistocene, with varying magnitudes of demographic change, likely shaped by their diverse ecological adaptations and biogeographic contexts. Conclusions: Our work provides genomic resources for exploring genetic diversity, evolutionary adaptations, and demographic histories of sea turtles. We outline genomic regions with increased diversity, linked to immune response, sensory evolution, and adaptation to varying environments, that have historically been subject to strong diversifying selection and will likely underpin sea turtles’ responses to future environmental change. These reference genomes can assist conservation by providing insights into the demographic and evolutionary processes that sustain and threaten these iconic species.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf105), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 2: Brendan Reid

      The authors of this work provide a fantastic addition to the genomic resources currently available for marine turtles with five new, apparently high-quality reference genomes. These new resources enable a number of interesting cross-species analyses in this group, including phylogenetic reconstruction, inference of demographic history, and identification of hotspots of diversity and divergence. I thought this paper was quite clearly written and easy to read overall, and I have one major and a few more minor comments/suggestions.

      Major comment: there is an extensive literature on hybridization among marine turtle lineages (see Vilaca et al. 2021, https://doi.org/10.1111/mec.16113, for a recent genomic example), with lots of evidence for ancient gene flow after initial lineage divergence as well as recent hybridization. The authors do not really mention this phenomenon at all, and since I think it has a lot of bearing on all of the results it would make sense to re-think your findings in light of the fact that some level of gene flow has occurred. Would extensive synteny/lack of genomic rearrangements potentially enable hybridization? Is overall low divergence among lineages potentially a function of gene flow? Are regions of high divergence the result of selection (as you suggest), or could these regions potentially be resistant to gene flow? I believe that IQtree assumes a strictly bifurcating tree, and gene flow can influence PSMC inferences (see Mazet et al. 2016, https://doi.org/10.1038/hdy.2015.104) - how would gene flow among lineages affect your inference of divergence dates and demographic histories?

      Minor comments [note - line numbers would have been helpful for providing comments on specific items! I will refer to the lower-left page numbers and paragraph instead]:

      page 3, paragraph 2: Some of the applications you refer to here don't seem terribly germane to the relevance of "genomic resources" in management and conservation per se, and several are just methods using some kind of genetic data ... e.g., "abundance"/close-kin mark recapture doesn't require full genomes (and the reference you cite used microsat data), and the "community"/eDNA applications don't generally rely on genomes but instead on databases of a few (usually mitochondrial) genes. Either include methods that truly benefit from the development of high-quality reference genomes or broaden this to something like "growth in molecular ecology techniques".

      page 4, paragraph 2: last sentence is a bit of a run-on, could break this up a bit.

      page 10, paragraph 3: for me, the ROH methods need some additional explanation and interpretation. The more detailed methods indicate that the ROH were identified on the basis of lower-than-average heterozygosity rather than true homozygosity - I can understand why this might have been done (since the baseline level of heterozygosity varies across species) but it still seems a bit arbitrary and could risk mistaking stretches with simply low variation for IBD tracts. I wonder if a ROH-detection method like ROHan that explicitly incorporates baseline genomic heterozygosity into its model would be more appropriate for comparing results across species and could give different results. I also question a bit the interpretation of these low-diversity tracts as evidence of inbreeding per se. The authors do not comment much on the length distributions of these ROH - given that many of them are quite short I would expect that if there was mating between close kin it probably happened far back in the past and the IBD tracts have been broken up by recombination.

      page 11, paragraph 2: for PSMC analyses it is important to note the method assumes that differences in coalescence time/Ne across the genome result from demography alone. If portions of the genome are under balancing/diversifying selection (such as the areas of high diversity that you detect in this study), the local Ne inferred for these regions would be expected to be larger than for the rest of the genome, which could lead to the spurious detection of population expansion or contraction (more likely a contraction for balancing selection). See Boitard et al. 2022 (https://doi.org/10.1093/genetics/iyac008) for a more detailed treatment. I would try excluding the regions putatively under diversifying selection and re-running PSMC to see if your inferences change.

    1. Abstract. The vast majority of cancers exhibit Somatic Copy Number Alterations (SCNAs): gains and losses of variable regions of DNA. SCNAs can shape the phenotype of cancer cells, e.g. by increasing their proliferation rates, removing tumor suppressor genes, or immortalizing cells. While many SCNAs are unique to a patient, certain recurring patterns emerge as a result of shared selectional constraints or common mutational processes. To discover such patterns in a robust way, the size of the dataset is essential, which necessitates combining SCNA profiles from different cohorts, a non-trivial task. To achieve this, we developed CNSistent, a Python package for imputation, filtering, consistent segmentation, feature extraction, and visualization of cancer copy number profiles from heterogeneous datasets. We demonstrate the utility of CNSistent by applying it to the publicly available TCGA, PCAWG, and TRACERx cohorts. We compare different segmentation and aggregation strategies on cancer type and subtype classification tasks using deep convolutional neural networks. We demonstrate an increase in accuracy over training on individual cohorts and efficient transfer learning between cohorts. Using integrated gradients we investigate lung cancer classification results, highlighting SOX2 amplifications as the dominant copy number alteration in lung squamous cell carcinoma.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf104), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 2: Ellen Visscher

      The paper introduces a python package for imputation, filtering, segmentation, feature extraction and visualisation of CNA profiles. It explains some of the elements of the package, and then demonstrates how data from multiple cohorts can be processed and combined using the package preprocessing pipeline. The authors then use processed data from 3 different cohorts to perform cancer type prediction using a CNN. From this, they get an interesting result to find a biomarker that differentiates two different lung cancers. Throughout, they show visualisations using their package. The package itself seems well documented and designed to be used. There is some clarification required in the methods section specifically around the CNN training and the models therein. There is also one major question of whether all the preprocessing steps are actually required for the downstream CNN analysis. Overall, however, this is a well written manuscript, providing a useful software tool for further analysis of CNA data.

      Major comments: - CNN section: how are the segments decided? Is it based on all the training data, or just the data in a batch? - Throughout the results pertaining to Figure 3A-C, you call it test accuracy. To be clear, is this based on your CV hold-outs? This should be reworded everywhere to reflect this. As cross-validation indicates, this is not a test set but a validation set, which is also the way you use it. - Regarding the above, you have a comment saying: "the best test accuracy without cross-validation was 92.34%". Could you please clarify what you mean by this? Only in the CNN section do you describe your training approach, which does not mention a test or separate validation set. - It reads slightly unclearly: you have a section called "model transfer", but are you training 3 different models, one per dataset? You only have one figure for training results, which suggests one dataset, but then you have this section called model transfer? - Regarding all the above, please dedicate a small subsection in the methods to making this clearer. Are there dedicated test sets? If your main results are for aggregated data, then what are you testing on to ensure generalisability? What is the point of training the 3 different models on 3 different datasets? Perhaps it would make more sense to hold one dataset out as your test set. In some ways, that is what the model transfer is showing, but it would be less confusing to clarify that aim instead of suddenly introducing 3 models. - If the CNN architecture is essentially the same as in Attique et al., the performance is basically the same, and they use only CNs at gene locations, how does this demonstrate that the preprocessing from CNSistent is necessary or advantageous for this task? Maybe a result that combines CN calls naively over gene locations, compared against CNSistent across the aggregate datasets, would be a good way of showing that preprocessing does offer an advantage when combining different datasets, which is also what you argue in your abstract. For this analysis you would have to make sure you compare across the same samples, to separate the effect of filtering from that of the other preprocessing steps. - In Figure 3I, you say "notice the similarity of chromosome 3 pattern for the correctly classified LUSC samples (red) and the misclassified ones (orange)". This is confusing because the orange and red are not similar. In fact, for this whole section, it seems that Figure 3I does not align with what you are saying?
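
      The hold-one-dataset-out evaluation suggested above can be sketched as follows. This is a minimal illustration in plain Python, not part of CNSistent or of the manuscript's pipeline. The function name and the sample identifiers are hypothetical, and only the cohort names are taken from the abstract.

```python
from collections import defaultdict

def leave_one_cohort_out(samples):
    """Yield (held_out_cohort, train_ids, test_ids), one split per cohort.

    `samples` is an iterable of (sample_id, cohort) pairs. Each cohort is held
    out once as the test set while all remaining cohorts form the training set.
    """
    by_cohort = defaultdict(list)
    for sample_id, cohort in samples:
        by_cohort[cohort].append(sample_id)
    cohorts = sorted(by_cohort)
    for held_out in cohorts:
        test_ids = list(by_cohort[held_out])
        train_ids = [s for c in cohorts if c != held_out for s in by_cohort[c]]
        yield held_out, train_ids, test_ids

# Hypothetical sample identifiers; cohort names as in the abstract.
samples = [("s1", "TCGA"), ("s2", "PCAWG"), ("s3", "TCGA"), ("s4", "TRACERx")]
splits = list(leave_one_cohort_out(samples))
```

      Reporting accuracy on each held-out cohort separately would keep the test data fully disjoint from training and from any cross-validation folds used for model selection.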

      Minor comments/errors: - Please clarify why CNSistent needs a reference genome if it is dealing with segments. How is this information used? Is it just for the known gaps? - Your caption of Supplementary Figure 1 has a typo about a breakpoint at 16 instead of 14. - You do not explain how you use the knee point to filter (i.e., is it samples above or below the knee point?). - Your CNN graphic is difficult to interpret and non-standard. - The CNN section should clarify at the beginning what the input is and what the output is (i.e., a prediction that a sample belongs to a particular cancer type) before explaining the architectural details. - Even though you control for class imbalance, some cancer types are so poorly represented that it is unlikely a CNN could learn them. You do mention this in the discussion, but maybe some sort of minimum threshold for inclusion would make sense. - For Fig. 2D you refer to it as GND, but the axes/title say hemizygosity. Are these things equivalent? E.g., a segment could be 3-3: low hemizygosity but not diploid. Or, if it is aggregated across the whole genome, is it assumed to be equivalent? - There is a grammatical error: "Runtimes decreased in a near-linearly with the number of compute cores". - You make a comment that "We therefore suspect some TCGA lung cancers might be cases of co-occurring adeno and squamous carcinomas." This is a possibility, but given the pleiotropy of many phenotypes, it may also be that the biomarker is not always unique to squamous carcinomas.

      Suggestions/Nice to haves: - Maybe make it clearer inside the paper which visualisations come with CNSistent. Looking at the software documentation, there are obviously a lot of useful visualisations that come with it, and you have used some of them in Figure 3, for example. - Given that there are more total CN callers, it may be good to mention somewhere how CNSistent would work for total CNs only. - You remove profiles that you say are uninformative; could you not include them and then show how accuracy correlates with the number of breakpoints, for example? In some ways one might think that there could be useful information in profiles with few alterations, because those alterations might be more upstream/causal. - The aggregation step could affect downstream analysis. Taking the average could introduce CNs that were never called. Even using min/max implies a constant copy number in that region, which may lose information: e.g., in a functional region, having two different CNs across a gene might imply non-functionality. Did you explore the effect of the aggregation step? Perhaps taking a small enough resolution of segment types would account for this anyway.
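
      The concern about averaging can be made concrete with a small worked example. The sketch below is a generic length-weighted aggregation written for illustration and is not CNSistent's implementation. The function name, the half-open interval convention, and the segment coordinates are assumptions.

```python
def aggregate_bin(segments, bin_start, bin_end):
    """Summarise copy numbers of segments overlapping [bin_start, bin_end).

    `segments` is a list of (start, end, copy_number) half-open intervals.
    Returns (length-weighted mean, min, max) over the covered part of the bin,
    or None if no segment overlaps the bin.
    """
    covered = 0
    weighted = 0
    values = []
    for start, end, copy_number in segments:
        overlap = min(end, bin_end) - max(start, bin_start)
        if overlap > 0:
            covered += overlap
            weighted += overlap * copy_number
            values.append(copy_number)
    if covered == 0:
        return None
    return weighted / covered, min(values), max(values)

# Hypothetical bin spanning a breakpoint: half at CN 2, half at CN 4.
segments = [(0, 50, 2), (50, 100, 4)]
mean_cn, min_cn, max_cn = aggregate_bin(segments, 0, 100)
```

      Here the mean is 3.0, a copy number that was never called in either segment, while min and max each report only one of the two states and hide the breakpoint inside the bin.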

    1. Abstract Polyadenylation is a dynamic process which is important in cellular physiology. Oxford Nanopore Technologies direct RNA-sequencing provides a strategy for sequencing the full-length RNA molecule and analysis of the transcriptome and epi-transcriptome. There are currently several tools available for poly(A) tail-length estimation, including well-established tools such as tailfindr and nanopolish, as well as two more recent deep learning models: Dorado and BoostNano. However, there has been limited benchmarking of the accuracy of these tools against gold-standard datasets. In this paper we evaluate four poly(A) estimation tools using synthetic RNA standards (Sequins), which have known poly(A) tail-lengths and provide a valuable approach to measuring the accuracy of poly(A) tail-length estimation. All four tools generate mean tail-length estimates which lie within 12% of the correct value. Overall, Dorado is recommended as the preferred approach due to its relatively fast run times, low coefficient of variation and ease of use with integration with base-calling.
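
      The two summary statistics named in the abstract (deviation of the mean estimate from the known value, and coefficient of variation) can be sketched as follows. This is only an illustration: the per-read estimates and the true length of 30 nt are invented, and `tail_length_summary` is a hypothetical helper, not part of any of the benchmarked tools.

```python
from statistics import mean, stdev


def tail_length_summary(estimates, true_length):
    """Return (mean estimate, relative error of the mean, coefficient of variation)."""
    m = mean(estimates)
    rel_error = (m - true_length) / true_length  # negative means underestimation
    cv = stdev(estimates) / m                    # sample standard deviation over mean
    return m, rel_error, cv


# Invented per-read poly(A) estimates for a standard with an assumed 30 nt tail.
estimates = [26, 30, 28, 32, 29]
m, rel_error, cv = tail_length_summary(estimates, true_length=30)
print(f"mean={m}, relative error={rel_error:.1%}, CV={cv:.1%}")
```

      A tool would meet the "within 12%" criterion for this standard when the absolute relative error is at most 0.12.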

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf098), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1: Christoph Dieterich

      In this manuscript, the authors present a benchmark to assess the performance of different tools designed for estimation of polyA tail length from Nanopore direct RNA-sequencing data. These tools include tailfindr, nanopolish, Dorado and Boost Nano. Benchmarks on tools and algorithms to analyze Nanopore data, both third party tools and official ONT releases, are of utmost importance for the field. The use of synthetic constructs with known ground truth is recommended as well. Consequently, this study has the potential to provide a significant contribution to the field.

      In its current form, however, I cannot recommend it for publication in GigaScience. My major concerns are: a) Use of only RNA002 data. This chemistry is outdated and thus the benchmark is only relevant for old, possibly already published data. A comprehensive benchmark should also include RNA004 and the tools available for it (at least Dorado). b) The current data set only contains two polyA tail lengths, which are relatively short and do not cover the longer polyA tails that are common, e.g., in mammalian cells. A proper benchmark should show the performance of the analyzed tools over a range of polyA tail lengths.

      Minor comments:
      1) Abstract: "All four tools generate mean tail-length estimates which lie within 13% of the correct value." The value of 13% is given in the abstract from the submission system, whereas the abstract in the main text says 12%. Which value is correct?
      2) Background, first paragraph: the role of the polyA tail in RNA circularization, which is required for efficient translation of cellular mRNAs, is not mentioned. A reference is missing for "is increasingly recognised as a dynamic process which influences timing and degree of protein production."
      3) Background, second paragraph: Chiron seems to be a relatively old basecaller (no models for new chemistries). It should be mentioned here that it is required for BoostNano.
      4) Mis-priming of internal polyA sites may be an important confounding (and currently overlooked) source of errors in Nanopore sequencing. This should be quantified properly and analyzed in more detail (length of these stretches, influence of other nucleotides within the A-rich stretch, etc.). This should be done as well on whole transcriptome data with more possible mis-priming sites.
      5) Why do the authors think that the poly(T) stretch of the RTA might be truncated? This is composed of DNA oligos, which should be quite stable.
      6) What are the parameters for filtering used by Dorado and BoostNano? Can the authors explain why the filtered reads differ?
      7) Dorado seems to systematically underestimate polyA tail length. Is this also true for data generated with RNA004 chemistry and longer polyA tails?

    1. Abstract The ability to differentiate between viable and dead microorganisms in metagenomic data is crucial for various microbial inferences, ranging from assessing ecosystem functions of environmental microbiomes to inferring the virulence of potential pathogens from metagenomic analysis. While established viability-resolved genomic approaches are labor-intensive as well as biased and lacking in sensitivity, we here introduce a new fully computational framework that leverages nanopore sequencing technology to assess microbial viability directly from freely available nanopore signal data. Our approach utilizes deep neural networks to learn features from such raw nanopore signal data that can distinguish DNA from viable and dead microorganisms in a controlled experimental setting of UV-induced Escherichia cell death. The application of explainable AI tools then allows us to pinpoint the signal patterns in the nanopore raw data that allow the model to make viability predictions at high accuracy. Using the model predictions as well as explainable AI, we show that our framework can be leveraged in a real-world application to estimate the viability of obligate intracellular Chlamydia, where traditional culture-based methods suffer from inherently high false negative rates. This application shows that our viability model captures predictive patterns in the nanopore signal that can be utilized to predict viability across taxonomic boundaries. We finally show the limits of our model’s generalizability through antibiotic exposure of a simple mock microbial community, where a new model specific to the killing method had to be trained to obtain accurate viability predictions.
While the potential of our computational framework’s generalizability and applicability to metagenomic studies needs to be assessed in more detail, we here demonstrate for the first time the analysis of freely available nanopore signal data to infer the viability of microorganisms, with many potential applications in environmental, veterinary, and clinical settings. Author summary: Metagenomics investigates the entirety of DNA isolated from an environment or a sample to holistically understand microbial diversity in terms of known and newly discovered microorganisms and their ecosystem functions. Unlike traditional culturing of microorganisms, genomic approaches are not able to differentiate between viable and dead microorganisms since DNA might persist under different environmental circumstances. The viability of microorganisms is, however, of importance when making inferences about a microorganism’s metabolic potential, a pathogen’s virulence, or an entire microbiome’s impact on its environment. As existing viability-resolved genomic approaches are labor-intensive, expensive, and lack sensitivity, we here investigate our hypothesis that freely available nanopore sequencing signal data, which captures DNA molecule information beyond the DNA sequence, might be leveraged to infer such viability. This hypothesis assumes that DNA from dead microorganisms accumulates certain damage signatures that reflect microbial viability and can be read from nanopore signal data using fully computational frameworks. We here show first evidence that such a computational framework might be feasible by training a deep model on controlled experimental data to predict viability at high accuracy, exploring what the model has learned, and using it in a real-world application by applying it to a bacterial species of veterinary relevance. We finally show that a specific model has to be trained to accurately predict viability after antibiotic exposure of a mock microbial community.
While the generalizability of our computational framework therefore needs to be assessed in much more detail, we here demonstrate that freely available data might be usable for relevant viability inferences in environmental, veterinary, and clinical settings.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf100), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1: Finlay Maguire

      In this paper the authors train a ResNet-based model to predict whether individual 10,000 sample chunks of nanopore signal data originate from live or killed bacterial isolate cultures. From live and UV-killed (at exponential phase) E. coli K-12 cultures DNA was extracted and sequenced using separate R10.4.1 flowcells on a MinION. Signal data from each read in the live and dead extractions were then processed by discarding the first 1,500 samples and dividing the remaining signals into 10,000 sample chunks. These were then split into a balanced 60:20:20 train, test, and validation datasets with the constraint that no two chunks from the same read would end up in the same dataset (e.g., chunk 1 and chunk 2 of 1st read in the killed culture would hypothetically be separated into train and test). During this they also explored/compared the impact of chunk size, model architecture, and performance of a sequence based model using the E. coli data. With a nicely performed class-activation map and masking approach they then identified the signal regions most strongly associated with dead-predictions (such as twisting/kinking/pore blockage of DNA around pyrimidine dimers). Finally, they applied their trained model to a live and heat-killed Chlamydia abortus culture and compared their results to stained microscopy and propidium monoazide PCR measures of viability. They found equivalent performance on the C. abortus data to their E. coli data (despite a different killing-method and taxa).
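
      The chunking step summarised above can be sketched in a few lines. This is a minimal illustration, not the authors' code: `chunk_signal` is a hypothetical helper, the signal is a stand-in list of integers, and dropping the trailing partial chunk is an assumption that the review does not confirm.

```python
def chunk_signal(signal, skip=1500, chunk_size=10000):
    """Discard the first `skip` samples, then cut the rest into full chunks.

    Assumption: a trailing chunk shorter than `chunk_size` is dropped.
    """
    trimmed = signal[skip:]
    n_chunks = len(trimmed) // chunk_size
    return [trimmed[i * chunk_size:(i + 1) * chunk_size] for i in range(n_chunks)]


# Stand-in for one read's raw signal: 34,000 samples.
signal = list(range(34000))
chunks = chunk_signal(signal)
print(len(chunks), [len(c) for c in chunks], chunks[0][0])
```

      With these numbers, 32,500 samples remain after trimming, so three full chunks are produced and 2,500 samples are left over.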

      The manuscript is well written and the methods are clearly described (including well documented code and deposited data). The authors' explainability methodology is excellent, although it would have been nice to see a bit more in-depth interpretation of those results. The authors have also presented a convincing case that nanopore signal data does contain information that can be used to distinguish signal chunks from live and dead bacterial monocultures. This method has the potential to be useful in clinical and environmental genomics if it can be extended to more heterogeneous metagenomic samples. However, despite the title and framing of this manuscript (i.e., "metagenomics"), their analyses do not involve any metagenomic data and their results so far do not demonstrate whether this is feasible. Currently, the overall framing (and title) of the manuscript is not appropriate given the work performed at this point. Similarly, given that both E. coli and C. abortus "dead" cultures resulted in a median read length less than half that of the live cultures, the authors do not fully make the case that the signal and ResNet approach is actually required relative to simpler baseline models. Finally, although they did evaluate performance on a completely separate dataset, the authors should at least explore/quantify the correlation of live/dead prediction across chunks of the same read given the default expectation of non-independence of signal chunks from the same read.

      Major - Although the title and framing of the paper suggest that the authors are classifying live and dead bacteria in metagenomic datasets, the actual experiments and method developed are entirely based around sequencing of cultured clonal bacterial isolates. Metagenomic datasets are going to have considerably more heterogeneity in viability, species composition, and DNA signal characteristics. Given this, the paper's title, introduction, and parts of the discussion are a bit of an oversell and inappropriate. This manuscript should be revised to more clearly reflect the work actually performed.

      • This paper doesn't establish whether a ResNet + Signal approach actually outperforms a much simpler baseline. For example, given that there are clear extraction and median read-length differences between live and dead samples, it is possible that a much simpler logistic model using basic features such as read length and/or translocation could perform equivalently.
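
      A baseline of the kind suggested here could look like the following sketch: a one-feature logistic model on log10 read length, fitted by plain gradient descent. Everything in it is an assumption for illustration (the function name `fit_logistic_1d`, the toy read lengths, the labels); it is not taken from the manuscript and says nothing about how such a baseline would perform on the real data.

```python
import math


def fit_logistic_1d(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * z + b) by batch gradient descent,
    where z is x standardised to zero mean and unit variance."""
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / n) or 1.0
    zs = [(x - mu) / sd for x in xs]
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for z, y in zip(zs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * z + b)))
            grad_w += (p - y) * z
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n

    def predict(x):
        z = (x - mu) / sd
        return 1.0 / (1.0 + math.exp(-(w * z + b)))

    return predict


# Invented toy data: read lengths in bases; label 1 = dead culture, 0 = live.
lengths = [1200, 1500, 900, 2000, 1100, 5200, 4800, 6100, 3900, 7000]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
predict = fit_logistic_1d([math.log10(n) for n in lengths], labels)
accuracy = sum(
    (predict(math.log10(n)) >= 0.5) == bool(y) for n, y in zip(lengths, labels)
) / len(labels)
print(accuracy)
```

      Comparing the chunk-level ResNet against a model like this on the same train/test split would show how much of the signal model's accuracy is explained by fragment length alone.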

      • Although the C. abortus analysis demonstrates limited impact of leakage, I'm still a bit concerned about the potential non-independence of chunks from the same read (i.e., chunk 1 and chunk 3 of the same read are more likely to share similar live/dead signal characteristics than chunk 1 and chunk 3 of different reads). By not having multiple chunks of the same read in the training, validation, or test datasets, the authors may have avoided issues with longer reads being more represented in their datasets. However, this has the potential to introduce data leakage between the train and test sets (which may impact generalisability when they attempt to extend this method to metagenomics). I think this paper would be improved by some exploration of the correlation of live/dead prediction across chunks of the same read. How often do different chunks of the same read disagree? How does this impact the overall performance of the model? Does taking the average prediction across chunks of the same read improve or degrade performance? Would this problem be better suited to a multiple instance learning approach (i.e., a live/dead label applied to all chunks from a single read), especially in more heterogeneous datasets? To what degree do longer reads with more chunks contribute disproportionately to the overall performance in the C. abortus dataset?
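
      The read-level questions above (how often chunks of one read disagree, and what averaging across chunks gives) can be sketched as follows. The helper `read_level_summary`, the read identifiers, and the probabilities are all invented for illustration; the 0.5 threshold is an assumption.

```python
from collections import defaultdict


def read_level_summary(chunk_preds, threshold=0.5):
    """Aggregate chunk-level predictions per read.

    chunk_preds: iterable of (read_id, p_dead), one entry per chunk.
    Returns {read_id: (mean_p_dead, read_label, chunks_disagree)}.
    """
    by_read = defaultdict(list)
    for read_id, p_dead in chunk_preds:
        by_read[read_id].append(p_dead)
    summary = {}
    for read_id, probs in by_read.items():
        mean_p = sum(probs) / len(probs)
        calls = {p >= threshold for p in probs}  # distinct chunk-level calls
        label = "dead" if mean_p >= threshold else "live"
        summary[read_id] = (mean_p, label, len(calls) > 1)
    return summary


# Invented chunk-level outputs for two reads.
chunk_preds = [
    ("read_a", 0.9), ("read_a", 0.8), ("read_a", 0.3),
    ("read_b", 0.1), ("read_b", 0.2),
]
summary = read_level_summary(chunk_preds)
disagreement_rate = sum(d for _, _, d in summary.values()) / len(summary)
print(summary, disagreement_rate)
```

      Reporting accuracy on the per-read labels next to the chunk-level accuracy, together with the disagreement rate, would address most of the questions raised in this comment.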

      Minor

      • SRA records don't seem to be live yet (https://www.ncbi.nlm.nih.gov/sra?linkname=bioproject_sra_all&from_uid=1123127)

      • Are the actual pod5 files available?

      • Read-level performance should be analysed and reported.

      • Figure 1B: the test subplot numbers are almost too small to read - they may benefit from being its own panel.

      • Plot axes labels are not always clear (e.g., Figure 3) percentage of what? Chunks? or Reads? It would be nice to see consistent capitalisation of labels and legends.

      • Predictions on viable E. coli and viable C. abortus seem surprisingly similar (91.44% vs 91.34% viable and 8.56% vs 8.66% dead) despite different taxa, potentially different underlying viable cell proportions, and output probability densities. This would benefit from further discussion/analysis: do misclassified chunks have any common characteristics? Would you expect the E. coli to have a similar microscopy/PCR-measured viability percentage to the C. abortus?

      • Would be good to see a bit more discussion/exploration of the impact of mixed live/dead cells given the ~37.6% viability measure in the C. abortus sample (e.g., how well do models perform with different ratios of live/dead reads); this could potentially be achieved using in-silico spike-ins.

    1. There is a third kind of answer that, without competing with the previous two, demonstrates the value of philosophy, even (perhaps, especially) for students like our imagined protagonist: philosophy is the antidote to the uncritical acceptance of the world and ourselves as we are.

      I like the phrase "antidote to the uncritical acceptance" quite a lot. At first, you may think that an "uncritical acceptance" isn't necessarily a bad thing. However, thinking about it more, do you really want to just blindly accept the world around you? Looking critically at yourself and the world allows you to make changes and work to improve the lives of yourself and others, among many other things, simply because you dared to question.

    2. The deep underlying idea is that if we have to choose a social and political arrangement without knowing the position that we may occupy in society, we will choose fair principles to govern our social and political institutions. My teacher had our class re-enact a scenario very much like this one in class. We discussed the principles that would govern our imagined society before we picked our fate out of a hat. Until that point in my young life, I had never thought about justice in that way

      This is a very interesting way to think about justice. The author introduces this method to imagine a fair society with no bias. This works so well because not knowing what your position in society will be allows you to genuinely try your best to make society as fair and enjoyable as possible for every individual.

    3. Therefore, the first step in this kind of philosophical education is to shake students out of a complacent and uncritical acceptance of the world as it is.

      I think this is one of the most important reasons why we need to study philosophy. When we repeat our daily routines and become accustomed to them, we tend to overlook the injustices within them or we may not even recognize them as injustices. Philosophy enables us to think more critically about the society we live in, its institutions, and the impact they have on us.

    4. When students take this imaginative exercise seriously, they start to feel as discomfited as Descartes himself must have. The ground starts shaking under them. It is at this moment that philosophy starts its work.

      Asking so many bizarre questions that one does not normally consider on a day-to-day basis pushes us outside of our comfort zone and forces us to take a step into the unknown. This encourages our brains to work in ways they normally would not, to ask questions beyond our general scope of thinking, and to create new connections and ideas that we may not normally have considered. I think this emphasizes the importance of philosophy because it teaches us how to react when we are pushed outside of our comfort zone and how to think beyond our normal flow of consciousness.

    5. Many philosophers have persuasively criticized Rawls’ use of the original position as an argumentative tool. But we often forget, I think, how successfully it harnesses the power of the imagination to construct an alternative vision of what society could be like.

      We are so used to the life we live that, in some ways, we become comfortable in it. When imagining a different reality, one in which they may be less high up or wealthy, it becomes difficult for some to acknowledge just how much privilege they once had. "A Theory of Justice" gives people a different perspective on life and on how different each and every person's life is from one another.

    6. The deep underlying idea is that if we have to choose a social and political arrangement without knowing the position that we may occupy in society, we will choose fair principles to govern our social and political institutions. My teacher had our class re-enact a scenario very much like this one in class. We discussed the principles that would govern our imagined society before we picked our fate out of a hat. Until that point in my young life, I had never thought about justice in that way. The power of this exercise contributed in no small way to my becoming a philosopher. I have recreated a similar activity in various classes I have taught. The discussion it generates among students is reliably superb, but the best moment is when students discover their fate – whether they end up being a doctor or a garbage truck driver or a poor young mother – and have to reckon (at least for that class period) with their principles. Many philosophers have persuasively criticized Rawls’ use of the original position as an argumentative tool. But we often forget, I think, how successfully it harnesses the power of the imagination to construct an alternative vision of what society could be like.

      Though it was a little difficult for me to picture this in real life, as it is not realistic that society is completely unaware of one's capabilities before choosing their position in the social hierarchy, I think that this is fascinating to imagine. We often forget that we may not be as secure in our social status or career as we think we are, so it is important to be aware of those of lower status around you and not take your position for granted.

    7. Now, ask yourself: what could philosophy do for you?

      I think this is a very interesting start to this article! It puts us into the shoes of someone in a difficult position, in which they must tirelessly work away to simply have a shot at a decent, livable lifestyle. I feel that this scenario they painted for us so vividly is really powerful when leading into this question, because I think people in the current climate of the world tend to underestimate the importance of philosophy, or don't really think about it at all. While maybe a lot of us don't completely relate to the situation of the young mother, a lot of us DO have our own struggles and might find ourselves lost in the grueling work that may come with everyday life. And when simply going through with our daily lives is hard enough, why should we bother with philosophy? Personally, I don't really think about the idea of philosophy at all, and I never really thought it would be relevant to me based on what I want to do in life. And when people don't think something is relevant, why bother with it, right? Life is busy enough as it is. But really, it probably has a lot more relevancy in my life than I think, and I believe that this idea is somewhat being conveyed in this part. That's just how I saw this paragraph, but I thought it was a strong opening!

    8. The deep underlying idea is that if we have to choose a social and political arrangement without knowing the position that we may occupy in society, we will choose fair principles to govern our social and political institutions. My teacher had our class re-enact a scenario very much like this one in class. We discussed the principles that would govern our imagined society before we picked our fate out of a hat. Until that point in my young life, I had never thought about justice in that way. The power of this exercise contributed in no small way to my becoming a philosopher. I have recreated a similar activity in various classes I have taught. The discussion it generates among students is reliably superb, but the best moment is when students discover their fate – whether they end up being a doctor or a garbage truck driver or a poor young mother – and have to reckon (at least for that class period) with their principles. Many philosophers have persuasively criticized Rawls’ use of the original position as an argumentative tool. But we often forget, I think, how successfully it harnesses the power of the imagination to construct an alternative vision of what society could be like.

      This is a brilliant way to describe others' lived experiences and how what might not affect you could affect someone else. Using philosophical teachings can reveal the privileges of some and the shortcomings of others and hopefully create a better understanding of everyone's blind spots in day-to-day life. Truly a very powerful and humbling exercise that can help create common ground and allow others to empathize with each other and hopefully create a more just society.

    9. The deep underlying idea is that if we have to choose a social and political arrangement without knowing the position that we may occupy in society, we will choose fair principles to govern our social and political institutions. My teacher had our class re-enact a scenario very much like this one in class. We discussed the principles that would govern our imagined society before we picked our fate out of a hat. Until that point in my young life, I had never thought about justice in that way. The power of this exercise contributed in no small way to my becoming a philosopher. I have recreated a similar activity in various classes I have taught. The discussion it generates among students is reliably superb, but the best moment is when students discover their fate – whether they end up being a doctor or a garbage truck driver or a poor young mother – and have to reckon (at least for that class period) with their principles. Many philosophers have persuasively criticized Rawls’ use of the original position as an argumentative tool. But we often forget, I think, how successfully it harnesses the power of the imagination to construct an alternative vision of what society could be like.

      This idea that we must get rid of the idea of "safety" within our lives and experiences can be imagined as a vision of the future that we, as people, don't want to imagine. Being a "poor mother" or a "garbage truck driver" can be thought of as a disappointing fate by many who attend college. It can even be a fate so poor in the minds of students that it serves as motivation in their eyes: to not be like "them" is a phrase that sticks with many who hold themselves to a high idea of success. But I believe in and resonate with this idea of harnessing imagination, as it broadens our perspective on education and life, because no matter how safe we feel behind a wall of education or wealth, there can always be a force of society that challenges our goals.

    1. Abstract Water buffalo is a cornerstone livestock species in many low- and middle-income countries, yet major gaps persist in its genomic characterization, complicated by the divergent karyotypes of its two sub-species (swamp and river). Such genomic complexity makes water buffalo a particularly good candidate for the use of graph genomics, which can capture variation missed by linear reference approaches. However, the utility of this approach to improve water buffalo has been largely unexplored. We present a comprehensive pangenome that integrates four newly generated, highly contiguous assemblies of Pakistani river buffalo with available assemblies from both sub-species. This doubles the number of accessible high-quality river buffalo genomes and provides the most contiguous assemblies for the sub-species to date. Using the pangenome to assay variation across 711 global samples, we uncovered extensive genomic diversity, including thousands of large structural variants absent from the reference genome, spanning over 140 Mb of additional sequence. We demonstrate the utility of these data by identifying putative functional indels and structural variants linked to selective sweeps in key genes involved in productivity and immune response across 26 populations. This study represents one of the first successful applications of graph genomics in water buffalo and offers valuable insights into how integrating assemblies can transform analyses of water buffalo and other species with complex evolutionary histories. We anticipate that these assemblies, and the pangenome and putative functional structural variants we have released, will accelerate efforts to unlock water buffalo’s genetic potential, improving productivity and resilience in this economically important species.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf099), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 4: Wai Yee Low

      Review of "A comprehensive water buffalo pangenome reveals extensive structural variation linked to population specific signatures of selection". This is an impressive work at the frontier of buffalo genomics. I truly enjoy reading the work and my questions/comments are aimed at improving it further. My detailed comments are below: Line 30: I think it is better you include the actual number of publicly available assemblies used to create the pangenome graph. Line 71: There is now a swamp buffalo reference genome with annotation too (NCBI accession: PCC_UOA_SB_1v2). Perhaps consider to cite the swamp buffalo ref https://academic.oup.com/gigascience/article/doi/10.1093/gigascience/giae053/7753516 and rewrite the sentence to say a pangenome can be used for both swamp and river, but a single linear ref from either subspecies for read mapping is not good enough. Line 79: "highlighted" Line 82: What do you mean by "higher quality"? The assemblies have been discussed in this review: https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2021.629861/full Line 105: Technically, the graph method for bovine species, which includes water buffalo, is being investigated by the Bovine Pangenome Consortium (BPC). However, nothing useful has been published on the buffalo graph but perhaps consider citing the BPC since your paper overlaps with it (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02975-0). Line 165: It will be good if you add a bit more context of the PanGenie method here as the researchers in buffalo community are not used to this. Additionally, it will be great if all code is made available on GitHub or as Supplementary Info. Line 170: To produce phase pangenome graph, don't you need all input assemblies to be phased? All are input assemblies phased? The UOA_WB_1 is locally phased, not phased throughout the genome. Line 235: "a list of 403 unrelated individuals." 
What does this translate to in terms that geneticists can understand? Do you mean siblings have been removed? Or individuals sharing the same grandparents were removed? Line 246: Can you please explain how did you get the coordinates to match between the GATK and PanGenie method? You'll need matching coordinates for concordance analysis. As I understand it, the GATK was based on UOA_WB_1? Line 254: Why these 3 chromosomes? Line 257: If you had not filtered for relatedness, how will it impact the selective sweep work? I think including some context will help the readers. Line 259: do you mean at least six samples per group? If yes, is 6 samples enough? Line 261: genotype quality less than 25 according to bcftools? Since you only used biallelic variants, please provide the breakdown between biallelic and multiallelic. Line 281: "… we first PacBio HiFi sequenced one female" Please rewrite this. Line 282: How common are these two breeds in percentage? Line 291: Is this already known? Perhaps cite the literature to show the agreement with previous studies? Fig 1D: This is a bit too small to see especially the SV distribution at the bottom. I can hardly see the median? Line 310: Why did you choose UOA_WB_1 as the reference? Line 311: the ~32.8 mil variants are comprised of SNPs as well? Fig 2: This is probably a panel of a figure but should not be the entire figure. The size of the circle indicates sample size but there should be a legend on the plot for this to say the sizes, right? Darker colour should be used to highlight the countries with samples instead of white? Maybe this could be a Supp figure too. Line 356: S Figure 4 and 5 should be main figures? You will need to annotate the abbreviation of sample-country in the legend of S Figure 5. Line 360: "To enable reuse we have made this dataset available …" The dataset should be made available to reviewers? Line 368: "76% of SNVs were called by both callers" 76% seem low. Also, called does not mean concordant. 
What is the concordance among called SNVs in both? Did the pangenome approach called most of the variants found in GATK? If not, what might be the reasons? Fig 3B: It is not immediately clear what the difference is, between non repetitive and repetitive regions. The overlapping text in the x-axes makes it hard to read. Line 390: "Analyses such as the study of selective sweeps or genome-wide association studies where low frequency variants are often filtered out will benefit less from the advantages of GATK, particularly given its longer run time." From here on, in this paragraph, it's Discussion, not Results. Line 418: Why human? Could you use cattle? Line 427: I tried the browser and not sure what I can learn from it. It will be helpful if there is a README with some examples on what can be explored. Line 450: How large before you considered it as larger variant? Is this ability to study larger variants still hold despite using only ~10 assemblies in the graph? The use of short reads for selective sweep study will still benefit from being able to incorporate these larger variants? As I understand it, the larger variants were found only from graph, not from the short reads. As such, the selective sweep may not be associated with any larger variants? Line 470: Fig S8 should be a main figure? Line 513: Instead of uniprot link, perhaps consider including this as Supplementary info or text. The info in the link may change in the future. Line 551: However, without scaffolding, the assemblies of Pakistani river buffalo may not be good enough to function as reference genomes for river buffalo? Line 552: When considering new bases, did you do this for each assembly independently or the new bases were discovered cumulatively? Line 581: Some of my questions at Line 450 can be discussed here. Line 586: Perhaps consider discussing the limitations of the small number of assemblies used to create the graph. 
As such, many SVs are likely still missing and we are still unable to properly assess allele frequency of these larger SVs. Additionally, while some SVs may not be considered as large in this work, it does not mean they have no impact.
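
The distinction raised at Line 368, between a variant being called by both callers and the two callers agreeing on its genotype, can be illustrated with a minimal sketch. Everything below is invented for illustration: `caller_a` and `caller_b` stand in for any two call sets, the site keys and genotypes are toy values rather than data from the manuscript, and the function names are hypothetical. It also assumes coordinates already match between the call sets and that genotypes are unphased and written the same way.

```python
def call_overlap(calls_a, calls_b):
    """Fraction of all sites (union of both callers) that both callers report."""
    shared = calls_a.keys() & calls_b.keys()
    union = calls_a.keys() | calls_b.keys()
    return len(shared) / len(union) if union else float("nan")


def genotype_concordance(calls_a, calls_b):
    """Among sites reported by both callers, fraction with identical genotypes."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return float("nan")
    agree = sum(1 for site in shared if calls_a[site] == calls_b[site])
    return agree / len(shared)


# Toy call sets (invented): (chrom, pos, ref, alt) -> unphased genotype string.
caller_a = {
    ("1", 100, "A", "G"): "0/1",
    ("1", 200, "C", "T"): "1/1",
    ("1", 300, "G", "A"): "0/1",
    ("2", 50, "T", "C"): "0/1",
}
caller_b = {
    ("1", 100, "A", "G"): "0/1",
    ("1", 200, "C", "T"): "0/1",  # same site, different genotype
    ("1", 300, "G", "A"): "0/1",
    ("2", 75, "G", "T"): "1/1",
}

overlap = call_overlap(caller_a, caller_b)
concordance = genotype_concordance(caller_a, caller_b)
print(f"called by both: {overlap:.2f}, genotype concordance: {concordance:.2f}")
```

In this toy case three of five sites are shared, but only two of those three carry the same genotype, so the two numbers answer different questions and would need to be reported separately.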

    1. Abstract. Background: Influenza A virus (IAV) poses a significant threat to animal health globally, with its ability to overcome species barriers and cause pandemics. Rapid and accurate IAV subtypes and host source prediction is crucial for effective surveillance and pandemic preparedness. Deep learning has emerged as a powerful tool for analyzing viral genomic sequences, offering new ways to uncover hidden patterns associated with viral characteristics and host adaptation. Findings: We introduce WaveSeekerNet, a novel deep learning model for accurate and rapid prediction of IAV subtypes and host source. The model leverages attention-based mechanisms and efficient token mixing schemes, including the Fourier Transform and the Wavelet Transform, to capture intricate patterns within viral RNA and protein sequences. Extensive experiments on diverse datasets demonstrate WaveSeekerNet’s superior performance to existing models that use the traditional self-attention mechanism. Notably, WaveSeekerNet rivals VADR (Viral Annotation DefineR) in subtype prediction using the high-quality RNA sequences, achieving the maximum score of 1.0 on metrics including the Balanced Accuracy, F1-score (Macro Average), and Matthews Correlation Coefficient (MCC). Our approach to subtype and host source prediction also exceeds the pre-trained ESM-2 (Evolutionary Scale Modeling) models with respect to generalization performance and computational cost. Furthermore, WaveSeekerNet exhibits remarkable accuracy in distinguishing between human, avian, and other mammalian hosts. The ability of WaveSeekerNet to flag potential cross-species transmission events underscores its significant value for real-time surveillance and proactive pandemic preparedness efforts. Conclusions: WaveSeekerNet’s superior performance, efficiency, and ability to flag potential cross-species transmission events highlight its potential for real-time surveillance and pandemic preparedness.
This model represents a significant advancement in applying deep learning for IAV classification and holds promise for future epidemiological, veterinary studies, and public health interventions.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf089), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1:Will Dampier

      The manuscript presented by Nguyen et al. is well written, well researched, and well executed. The use of this new "wavelet style" neural network shows both an increased training efficiency and improved accuracy at detecting influenza subtypes for surveillance. However, I think their comparison to a 'plain' Transformer model does not take advantage of the improvements in pre-training and transfer-learning that have become standard practice in deep-learning. I have also included some stylistic suggestions to improve the figures as presented. After addressing these comments, I believe that this will become a very strong manuscript.

      Major Comments:

      The authors present a comparison between their new wavelet architecture and a standard transformer architecture using a one-hot encoded vector of amino-acids. I believe that this is the correct 'null model' to compare your wavelet architecture to, however, it does not represent the 'state of the art' in utilizing transformers for sequence analysis. As I'm sure the authors are aware, the disadvantage of transformers is that they take an extensive amount of training (they note the transformer only models take 2-4X more training epochs to converge). However, the advantage they bring is that they can be extensively trained for one task and then transfer that learning to another related task. A number of models have been pre-trained on giant collections of proteins Asgari et al, https://doi.org/10.1371/journal.pone.0141287 and Rives et al https://doi.org/10.1073/pnas.2016239118 which then allow one to transfer that knowledge to different domains with fewer examples such as demonstrated in Dampier et al https://doi.org/10.3389/fviro.2022.880618. It would be interesting to see whether your wavelet model defeats these pre-trained models with transfer learning. If you showed that, you could argue that there is no need for the extensive expense of 'foundational models'.

      The authors discuss that there is a significant imbalance in the training set and they used up-sampling and limiting to balance out the class representation. Since the classes are not equally represented, the model may not be equally able to predict each class. And the high metrics may only be a representation of its ability to predict the popular classes correctly. The authors should include an additional set of figures (supplemental is fine) that show the metrics broken out by Subtype. It would also be interesting to see a graph of the class-size (before up-sampling) vs F1-score (or another metric) on that class. This could provide lower-bounds for how many samples are needed to train the model.
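
      The per-subtype breakdown requested here can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the labels and predictions are invented, and `per_class_f1` is a hypothetical helper written in plain Python so that the arithmetic is visible.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} computed one-vs-rest for every label that appears."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Invented, imbalanced toy labels: one common subtype, one rare subtype.
y_true = ["H1N1"] * 6 + ["H3N2"] * 3 + ["H5N1"] * 1
y_pred = ["H1N1"] * 6 + ["H3N2"] * 2 + ["H1N1"] + ["H3N2"]

scores = per_class_f1(y_true, y_pred)
class_sizes = Counter(y_true)  # sizes before any up-sampling
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Table of class size against F1, largest class first.
for label, size in class_sizes.most_common():
    print(f"{label}: n={size}, F1={scores[label]:.3f}")
print(f"overall accuracy: {accuracy:.2f}")
```

      In this toy example the overall accuracy is 0.8 even though the rarest class has an F1 of 0, which is the kind of effect an aggregate metric can hide. Plotting `class_sizes` against `scores` would give the class-size versus F1 graph the comment describes.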

      Minor Comments:

      Figures 3, 4, and 5: These would benefit from a linked y-axis. It is hard to compare across A/B/C/D when the axes have different y-limits.

    1. Author response:

      We thank both reviewers for their valuable comments. We have prepared a point-by-point response below.

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The conclusions regarding the links between neural and behavioral mechanisms are mostly well supported by the data. However, what is less convincing is the authors' argument that their study offers evidence of 'priming'. An important hallmark of priming, at least as is commonly understood by cognitive scientists, is that it is stimulus specific: i.e., a repeated stimulus facilitates response times (repetition priming), or a repeated but previously ignored stimulus increases response times (negative priming). That is, it is an effect on a subsequent repeated stimulus, not ANY subsequent stimulus. Because (prime or target) stimuli are not repeated in the current experiments, the conditions necessary for demonstrating priming effects are not present. Instead, a different phenomenon seems to be demonstrated here, and one that might be more akin to approach/avoidance behavior to a novel or salient stimulus following an appetitive/aversive stimulus, respectively.

      (2) On a similar note, the authors' claim that 'priming' per se has not been well studied in non-human animals is not quite correct and would need to be revised. Priming effects have been demonstrated in several animal types, although perhaps not always described as such. For example, the neural underpinnings of priming effects on behavior have been very well characterized in human and non-human primates, in studies more commonly described as investigations of 'response suppression'.

      We thank the reviewer for these critical comments. After careful consideration of both reviews, we agree that “priming” may not be the most accurate term to describe the behavioral phenomenon. We plan to revise our terminology throughout the manuscript accordingly to better capture the generalized nature of the effect we observe.

      (3) The outcome measure - i.e., difference scores between the two odors or odor and non-odor (i.e., the number of flies choosing to approach the novel odor versus the number approaching the non-odor (air)) - appears to be reasonable to account for a natural preference for odors in the mock-trained group. However, it does not provide sufficient clarification of the results. The findings would be more convincing if these relative scores were unpacked - that is, instead of analyzing difference scores, the results of the interaction between group and odor preference (e.g., novel or air) (or even within the pre- and post-training conditions with the same animals) would provide greater clarity. This more detailed account may also better support the argument that the results are not due to conditioning of the US with pure air.

      We use the PI score as a standard metric to quantify odor preferences in all behavioral assays because it allows for robust comparison across different genetic or treatment groups under the same experimental setting. In the T-maze, real-time tracking of fly trajectories is technically difficult. With olfactory arenas, we showed some examples of fly distribution in quadrants over the entire odor choice test period (Figure 2—figure supplement 2) for both pre-trained and post-trained groups and discussed the trajectories in the Discussion. We will ensure this point is clarified in the revised text.

      Reviewer #2 (Public review):

      […] They finally recorded from different mushroom body output neurons, including the one (MBON-γ4γ5) likely affected by the increased activity of the corresponding γ4 reward dopaminergic neurons after shock preexposure. They recorded odour-evoked responses from these neurons before and after shock preexposure, but did not find any plasticity, while they found a logical effect during spaced cycles of aversive training.

      We thank the reviewer for the summary. We would like to clarify that we did, in fact, observe plasticity in MBON-γ4γ5 following shock exposure, as shown in Figure 4B.

      Overall, the study is very interesting with a substantial amount of behavioural analysis and in vivo 2-photon calcium imaging data, but some major (and some minor) issues have to be resolved to strengthen their conclusions.

      (1) According to neuropsychological work (Henson, Encyclopedia of Neuroscience (2009), vol. 7, pp. 1055-1063), "Priming refers to a change in behavioral response to a stimulus, following prior exposure to the same, or a related, stimulus. Examples include faster reaction times to make a decision about the stimulus, a bias to produce that stimulus when generating responses, or the more accurate identification of a degraded version of the stimulus". Or "Repetition priming refers to a change in behavioural response to a stimulus following re-exposure" (PMID: 18328508). I therefore do not think that the effects observed by the authors are really the investigation of the neural mechanisms of priming. To me, the effect they observed seems more related to sensitisation, especially for the activation of sweet-sensing neurons. For the shock effect, it could be a safety phenomenon, as in Jacob and Waddell, 2020, involving (as for sugar reward) different subsets for short-term and long-term safety.

      As noted in our response to Reviewer #1, we plan to revise our use of the term “priming” in the manuscript to more accurately interpret the behavioral phenomenon.

      (2) The author missed the paper from Thomas Preat, The Journal of Neuroscience, October 15, 1998, 18(20):8534-8538 (Decreased Odor Avoidance after Electric Shock in Drosophila Mutants Biases Learning and Memory Tests). In this paper, one of the effects observed by the authors has already been described, and the molecular requirement of memory-related genes is investigated. This paper should be mentioned and discussed.

      We thank the reviewer for bringing this important reference to our attention. We will cite the Preat (1998) paper and discuss its relevant findings in relation to our own in the revised manuscript.

      (3) Overall, the bidirectional effect they observed is interesting; however, their results are not always clear, and the use of a delta PI is sometimes misleading. The authors have mentioned that shocks induced attraction to the novel odour, while they should stick to the increase or decrease in preference/avoidance.

      The ΔPI is calculated either as (trained PI – mock PI) for different animals or as (post PI – pre PI) for the same animals, with the specific calculation clarified in each figure legend. A positive ΔPI signifies an increase in preference for the odor, which is equivalent to a relative attraction or a decrease in avoidance.
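
      The two ΔPI calculations described above can be written out as a short sketch. The response does not state the PI formula itself, so the code assumes the common convention PI = (flies in odor − flies in air) / (total flies choosing); the fly counts are invented and the function names are hypothetical.

```python
def preference_index(n_odor, n_air):
    """PI in [-1, 1]; assumes PI = (odor - air) / (odor + air)."""
    total = n_odor + n_air
    if total == 0:
        raise ValueError("no flies made a choice")
    return (n_odor - n_air) / total


def delta_pi(pi_treated, pi_reference):
    """Change in preference: trained minus mock, or post minus pre."""
    return pi_treated - pi_reference


# Different animals: a trained group compared with a mock-trained group.
mock_pi = preference_index(n_odor=30, n_air=70)
trained_pi = preference_index(n_odor=45, n_air=55)
change = delta_pi(trained_pi, mock_pi)

print(f"mock PI={mock_pi:+.2f}, trained PI={trained_pi:+.2f}, delta PI={change:+.2f}")
```

      With these invented counts both groups still avoid the odor (negative PI), yet ΔPI is positive. That is the case the response describes as a decrease in avoidance rather than outright attraction, and it is why reporting the underlying PI values alongside ΔPI helps interpretation.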

      As not all experiments are done in parallel logic, it is not always easy to understand which protocol the authors are using. For example, only optogenetics is used in the appetitive preexposure. Does exposing flies to sugar or activating reward dopaminergic neurons also increase odour avoidance? Does the observed increase in odour avoidance after optogenetic activation of sweet-sensing neurons involve reward (e.g., decreased response) and/or punishment (e.g., increased response)?

      We used different behavioral assays (T-maze or arena), stimuli (real shock or optogenetics), and protocols (different or same animal groups) to robustly demonstrate the phenomenon across platforms. We explained each protocol in the figures or text, and we will make them easier to follow in the revised version. We focused on activating a clean set of sugar-sensing neurons because this optogenetic stimulus is an effective and efficient substitute for real sugar. We agree that testing reward dopaminergic neuron activation is a logical extension and will consider adding these experiments in the revised work.

      The author should always statistically test the fly behavioural performances against 0 to have an idea of random choice or a clear preference toward an odour.

      Our primary focus is on the change in preference induced by training, rather than the innate odor preference itself, which can be highly variable due to physiological and environmental factors. Statistical testing against 0 for innate preference scores is not standard practice in this specific paradigm, as the critical question is whether a treatment alters behavior relative to a control.

      On the appetitive side, the internal hunger state would play an important role. The author should test it or at least discuss it.

      For appetitive experiments, we always starve the flies on 1% agar for two days prior to behavioral tests to standardize their hunger state. We will consider adding fed flies as control groups in the revised work.

      (4) The authors found a discrepancy between genetic backgrounds; sometimes the same odour can be attractive or aversive.

      We observed minor discrepancies in innate odor preferences across genetic backgrounds, which is a known and common occurrence. Different genotypes and temperatures can result in different baseline PI scores. However, the key finding is that the relative change in odor preference following an aversive stimulus is consistent: it increases the relative preference for an odor compared to air. This sometimes reverses valence (aversion to attraction) and other times simply reduces aversion. Our analysis focuses on this consistent, relative change.

      Different effects between the T-maze and the olfactory arena are found. The authors proposed that: "Punishment priming effect was still not detected, probably due to the insensitivity of the optogenetic arena". This is unclear to me, considering all prior work using this arena. The author should discuss it more clearly.

      The punishment effect with CS+ present was reliably detected in the T-maze (Figure 1A) but was not significant in the olfactory arena (Figure 2—figure supplement 1B-C). We hypothesize that the olfactory arena assay is less sensitive than the T-maze for detecting such subtle behavioral changes. This is evidenced by the fact that even classical odor-shock conditioning yields lower PI in the arena (typically ~0.4) than in the T-maze (~0.8), likely due to the greater distance flies must explore and travel. The higher variance in the arena may therefore mask more modest effects. Here the effect under investigation was induced by optogenetically activating only a small subset of aversive dopaminergic neurons, a stimulus that is likely weaker than full electric shock. This reduced stimulus strength may have contributed to the challenge of detecting a significant effect in the less sensitive arena paradigm.

      They mentioned that flies could not be conditioned with air and electric shock. However, flies could be conditioned with the context + shock, which is changing in the T-maze and not in the optogenetic area.

      While flies can be conditioned to context, during the optogenetic stimulation period in the arena, the light is delivered uniformly across all four quadrants. Therefore, any potential context conditioning would be equivalent across the entire chamber and should not bias the final distribution of flies between the odor and air quadrants during the test, nor affect the calculated PI score.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The authors revealed the cellular heterogeneity of companion cells (CCs) and demonstrated that the florigen gene FT is highly expressed in a specific subpopulation of these CCs in Arabidopsis. Through a thorough characterization of this subpopulation, they further identified NITRATE-INDUCIBLE GARP-TYPE TRANSCRIPTIONAL REPRESSOR 1 (NIGT1)-like transcription factors as potential new regulators of FT. Overall, these findings are intriguing and valuable, contributing significantly to our understanding of florigen and the photoperiodic flowering pathway. However, there is still room for improvement in the quality of the data and the depth of the analysis. I have several comments that may be beneficial for the authors. 

      Strengths: 

      The usage of snRNA-seq to characterize the FT-expressing companion cells (CCs) is very interesting and important. Two findings are novel: 1) Expression of FT in CCs is not uniform. Only a subcluster of CCs exhibits high expression level of FT. 2) Based on consensus binding motifs enriched in this subcluster, they further identify NITRATE-INDUCIBLE GARP-TYPE TRANSCRIPTIONAL REPRESSOR 1 (NIGT1)-like transcription factors as potential new regulators of FT. 

      We are pleased to hear that reviewer 1 noted the novelty and importance of our work. As reviewer 1 mentioned, we are also excited about the identification of a subcluster of companion cells with very high FT expression. We believe that this work is an initial step to describe the molecular characteristics of these FT-expressing cells. We are also excited to share our new findings on NIGT1s as potential FT regulators. We believe this finding will attract a broader audience, as the molecular factor coordinating plant nutrition status with flowering time remains largely unknown even though the phenomenon itself is well known.

      Weaknesses: 

      (1) Title: "A florigen-expressing subpopulation of companion cells". It is a bit misleading. The conclusion here is that only a subset of companion cells exhibit high expression of FT, but this does not imply that other companion cells do not express it at all. 

      We agree with this comment; it was not our intention to imply that FT is not produced in companion cells outside the subpopulation we identified. We revised the title to reflect this point more accurately. The new title is “Companion cells with high florigen production express other small proteins and reveal a nitrogen-sensitive FT repressor.”

      (2) Data quality: Authors opted for fluorescence-activated nuclei sorting (FANS) instead of traditional cell sorting method. What is the rationale behind this decision? Readers may wonder, especially given that RNA abundance in single nuclei is generally lower than that in single cells. This concern also applies to snRNA-seq data. Specifically, the number of genes captured was quite low, with a median of only 149 genes per nucleus. Additionally, the total number of nuclei analyzed was limited (1,173 for the pFT:NTF and 3,650 for the pSUC2:NTF). These factors suggest that the quality of the snRNA-seq data presented in this study is quite low. In this context, it becomes challenging for the reviewer to accurately assess whether this will impact the subsequent conclusions of the paper. Would it be possible to repeat this experiment and get more nuclei?

      We appreciate this comment; we noticed that we did not clearly explain the rationale for using single-nucleus RNA sequencing (snRNA-seq) instead of single-cell RNA-seq (scRNA-seq). As reviewer 1 mentioned, RNA abundance in scRNA-seq is higher than in snRNA-seq. To conduct scRNA-seq using plant cells, protoplasting is a necessary step. However, in our study, protoplasting has many drawbacks for isolating our target cells from the phloem. First, it is technically challenging to efficiently isolate protoplasts from phloem companion cells, which are deeply embedded in plant tissues. Typically, at least several hours of enzymatic incubation are required to obtain protoplasts from companion cells (often using semi-isolated vasculatures), and the efficiency of protoplasting vasculature cells remains low. Second, for our analysis, preserving the time-of-day information is also crucial. Therefore, we employed a more rapid isolation method. In the revised manuscript, we added four new sentences to the Introduction to explain our rationale for choosing snRNA-seq given these technical limitations.

      Reviewer 1 also raised a concern about the quality of our snRNA-seq data, referring to the relatively low read counts per nucleus. Although we believe that shallow reads do not necessarily indicate low quality and are confident in the accuracy of our snRNA-seq data, as supported by the detailed follow-up experiments (e.g., imaging analysis in Fig. 4B), we agree that it is important to address this point in the revision and alleviate readers’ concerns regarding the data quality.

      We believe the primary reason for the low read counts per cell is the small amount of RNA present in each Arabidopsis vascular cell nucleus that we isolated. For bulk nuclei RNA-seq, we collected 15,000 nuclei, yet the total RNA amount was approximately 3 ng. This indicates that each isolated nucleus contains a very limited amount of RNA (by simple calculation, 3,000 pg / 15,000 nuclei = 0.2 pg/nucleus). Cells and nuclei appear to be still small in 2-week-old seedlings; thus, each nucleus may contain lower levels of RNA. During the optimization process, we also tried fixing the tissues in the hope of retaining nuclear RNA, but, in our hands, fixation caused nuclei aggregation that hindered sorting and made the samples unsuitable for single-nucleus RNA-seq.
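
      The per-nucleus estimate quoted above is a unit conversion followed by a division, shown here as a short check. The two input values come from the response; the variable names are ours, and the result is only an average over sorted nuclei.

```python
PG_PER_NG = 1_000  # 1 ng = 1,000 pg

total_rna_ng = 3.0     # approximate total RNA recovered (from the response)
sorted_nuclei = 15_000  # nuclei collected for bulk RNA-seq (from the response)

# Average RNA mass per nucleus, in picograms.
rna_per_nucleus_pg = total_rna_ng * PG_PER_NG / sorted_nuclei
print(f"{rna_per_nucleus_pg:.1f} pg RNA per nucleus")
```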

      Reviewer 1 suggested that we repeat the same snRNA-seq experiment. We agree that having more cells increases the reliability of data. However, to our knowledge, higher cell numbers enhance the confidence of clustering, but not read counts per cell. In our snRNA-seq data, our target, FT-expressing cells, were observed in cluster 7, which projected at an obvious distance from other cell clusters. Therefore, we think that having more nuclei would not significantly help in separating the high FT-expressing cluster 7 cells from other cell types, although we might obtain more DEGs from the cluster 7 cells. Considering the costs and time required for additional snRNA-seq experiments, we think that adding more follow-up molecular biology data would be more practical. We clearly stated the limitations of our approach in the Discussion section: “A drawback of our snRNA-seq analysis was shallow reads per nucleus. It appears mainly due to the low abundance of mRNA in nuclei from 2-week-old leaves. Based on our calculation, the average mRNA level per nucleus is approximately 0.2 pg (3,000 pg mRNA from 15,000 sorted nuclei). Future technological advance is needed to improve the data quality.”

      In this revised version of the manuscript, we silenced FT gene expression using an amiRNA against FT driven by tissue-specific promoters [pROXY10, cluster 7; pSUC2, companion cells; pPIP2.6, cluster 4 (for the spatial expression pattern of PIP2.6, please see the new data shown in Fig. S8F); pGC1, guard cells]. Given that both FT and ROXY10 were highly expressed in cluster 7 of our snRNA-seq dataset, we anticipated the late flowering phenotype of pROXY10:amiRNA-ft. As we expected, pROXY10:amiR-ft but not pPIP2.6:amiR-ft lines showed delayed flowering phenotypes (Fig. S14A), supporting the validity of our snRNA-seq approach. We are also now more confident in the resolution of our snRNA-seq analysis, since cluster 4-specific PIP2.6 did not cause late flowering despite its higher basal expression than ROXY10 (Fig. S14B).

      (3) Another disappointment is that the authors did not utilize reporter genes to identify the specific locations of the FT-high expressing cells (cluster 7 cells) within the CC population in vivo. Are there any discernible patterns that can be observed? 

      In the original manuscript, as we showed only limited spatial images of overlap between FT and other cluster 7 genes in Fig. 4B, this comment is totally understandable. To respond to it, we added whole leaf images showing the spatial expression of FT and other cluster 7 genes (Fig. S12). These data indicate that cluster 7 genes including FT are expressed highly in minor veins in the distal part of the leaf but weakly in the main vein. We also added enlarged images of spatial expression of FT and cluster 7 genes (FLP1 and ROXY10) to note that those genes do not overlap completely (Fig. S13).

      In contrast to cluster 7 genes, genes highly expressed in cluster 4, such as LTP1 and MLP28, are reportedly highly expressed in the main leaf vein. To further confirm it, we established a transgenic line that expresses a GFP-fusion protein controlled by the promoter of a cluster 4-specific gene PIP2.6 (Fig. S8F). It also showed strong GFP signals in the main vein, consistent with previous observations of LTP1 and MLP28. In summary, FT-expressing cells (cluster 7 cells) are enriched in companion cells in the minor vein, and their expression patterns show a clear distinction from genes expressed in the main vein (e.g., cluster 4-specific genes).

      (4) The final disappointment is that the authors only compared FT expression between the nigtQ mutants and the wild type. Does this imply that the mutant does not have a flowering time defect particularly under high nitrogen conditions? 

      We agree with reviewer 1 that more experiments are required to conclude the role of NIGT1 on FT regulation, in addition to our Y1H data, flowering time data of NIGT1 overexpressors, and FT expression in NIGT1 overexpressors and nigtQ mutant.

      First, to test the direct regulation of NIGT1s on FT transcription, we conducted a transient luciferase (LUC) assay in tobacco leaves using effectors (p35S:NIGT1.2, p35S:NIGT1.4, and p35S:GFP) and reporters [pFT:LUC (FT promoter fused with LUC) and pFTm:LUC (the same FT promoter with mutations in NIGT1-binding sites fused with LUC)]. Our result showed that NIGT1.2 and NIGT1.4, but not GFP, decreased the activity of pFT:LUC but not pFTm:LUC (Fig. 5C). This indicates that NIGT1s directly repress the FT gene.

      Second, to address reviewer 1’s suggestion about the effect of the nigtQ mutation on flowering time, we grew WT and nigtQ plants on 20 mM and 2 mM NH<sub>4</sub>NO<sub>3</sub>. Under 20 mM NH<sub>4</sub>NO<sub>3</sub>, the nigtQ line bolted earlier than WT; under 2 mM NH<sub>4</sub>NO<sub>3</sub>, nigtQ and WT bolted at almost the same time (Fig. S17D and E). This result suggests that the nigtQ mutation affects flowering timing depending on nitrogen nutrient status. However, leaf numbers of bolted plants were not different between WT and nigtQ lines (Fig. S17E). Therefore, it appears that the nigtQ mutation accelerated overall plant growth rather than specifically promoting flowering. We also measured flowering time by counting leaf numbers of the nigtQ and WT plants at bolting on nitrogen-rich soil. The mutant generated slightly more leaves than WT when they flowered (Fig. S17G). These results suggest that the NIGT-derived fine-tuning of FT regulation is conditional on higher nitrogen conditions.

      Minor: 

      (1) Abstract: "Our bulk nuclei RNA-seq demonstrated that FT-expressing cells in cotyledons and in true leaves differed transcriptionally.". This sentence is not informative. What exactly is the difference in FT-expressing cells between cotyledons and true leaves? 

      We modified the sentence to clarify the differences between cotyledons and true leaves. “Our bulk nuclei RNA-seq demonstrated that FT-expressing cells in cotyledons and true leaves showed differences especially in FT repressor genes.”

      (2) As a standard practice, to support the direct regulation of FT by NIGT1, the authors should provide EMSA and ChIP-seq data. Ideally, they should also generate promoter constructs with deletions or mutations in the NIGT1 binding sites. 

      To test direct interaction of NIGT1 with the FT promoter sequences, we performed the transient reporter assay using an FT promoter-driven luciferase reporter (Fig. 5C). NIGT1.2 and NIGT1.4 repressed the FT promoter activity; however, with NIGT1 binding site mutations, this repression was not observed, indicating that NIGT1 binds to the cis-elements in the FT promoter to repress its transcription.

      (3) Sorting: Did the authors fix the samples before preparing the nuclei suspension? If not, could this be the reason the authors observed the JA-responsive clusters (Fig. 2J)? Please provide more details related to nuclei sorting in the Methods section. 

      We added a new subsection in the Materials and Methods section to explain the details of the nuclei sorting procedure. We did not include a sample fixation step. We tried formaldehyde fixation; however, it clumped nuclei, which made the samples unsuitable for snRNA-seq. Moreover, fixation steps generally reduce read counts of single-cell RNA-seq according to the 10X Genomics’ guideline.

      We agree that JA responses were triggered during the FANS nuclei isolation. Therefore, we added the following sentence: “Since our FANS protocol did not include a sample fixation step to avoid clumping, these cells likely triggered wounding responses during the chopping and sorting process (Fig. S1B).”

      Reviewer #2 (Public review): 

      This manuscript submitted by Takagi et al. details the molecular characterization of the FT-expressing cell at a single-cell level. The authors examined what genes are expressed specifically in FT-expressing cells and other phloem companion cells by exploiting bulk nuclei and single-nuclei RNA-seq and transgenic analysis. The authors found the unique expression profile of FT-expressing cells at a single-cell level and identified new transcriptional repressors of FT such as NIGT1.2 and NIGT1.4.

      Although previous researchers have known that FT is expressed in phloem companion cells, they have tended to neglect the molecular characterization of the FT-expressing phloem companion cells. To understand how FT, which is expressed in tiny amounts in phloem companion cells that make up a very small portion of the leaf, can be a key molecule in the regulation of the critical developmental step of floral transition, it is important to understand the molecular features of FT-expressing cells in detail. In this regard, this manuscript provides insight into the understanding of detailed molecular characteristics of the FT-expressing cell. This endeavor will contribute to the research field of flowering time. 

      We are grateful that reviewer 2 recognizes the importance of transcriptome profiling of FT-expressing cells at the single-cell level.

      Here are my comments on how to improve this manuscript. 

      (1) The most novel finding of this manuscript is the identification of NIGT1.2 as the upstream regulator of FT-expressing cluster 7 gene expression. The flowering phenotypes of the nigtQ mutant and the transgenic plants in which NIGT1.2 was expressed under the SUC2 gene promoter support that NIGT1.2 functions as a floral repressor upstream of the FT gene. Nevertheless, the expression patterns of NIGT1.2 genes do not appear to have much overlap with those of NIGT1.2-downstream genes in cluster 7 (Figs S14 and F3). An explanation for this should be provided in the discussion section. 

      We agree with reviewer 2 that the spatial expression patterns of NIGT1.2 and cluster 7 genes do not overlap much, and that some discussion should be provided in the manuscript. Although we do not have a concrete answer for this phenomenon, we obtained new data showing that NIGT1.2 and NIGT1.4 directly repress the FT gene in planta (Fig. 5C). As NIGT1.2/1.4 are negative regulators of FT, it is plausible that NIGT1.2/1.4 suppress FT gene expression in non-cluster 7 cells to prevent the misexpression of FT. We added this point in the Results section.

      (2) To investigate gene expression in the nuclei of specific cell populations, the authors generated transgenic plants expressing a fusion gene encoding a Nuclear Targeting Fusion protein (NTF) under the control of various cell type-specific promoters. Since the public audience would not know about NTF without reading reference 16, some explanation of NTF is necessary in the manuscript. Please provide a schematic of constructs the authors used to make the transformants.

      As reviewer 2 pointed out, we lacked a clear explanation of why we used NTF in this study. NTF is the fusion protein that consists of a nuclear envelope targeting WPP domain, GFP, and a biotin acceptor peptide. It was initially designed for the INTACT (isolation of nuclei tagged in specific cell types) method, which enables us to isolate bulk nuclei from specific tissues. Although our original intention was to profile the bulk transcriptome of mRNAs that exist in nuclei of the FT-expressing cells using INTACT, we utilized our NTF transgenic lines for snRNA-seq analysis. To explain what NTF is to readers, we included a schematic diagram of NTF (Fig. S1A) and more explanation about NTF in the Results section.

      Again, we appreciate all reviewers’ careful and constructive comments. With these changes, we hope our revised manuscript is now satisfactory.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      Summary: 

      The study by Klug et al. investigated the pathway specificity of corticostriatal projections, focusing on two cortical regions. Using a G-deleted rabies system in D1-Cre and A2a-Cre mice to retrogradely deliver channelrhodopsin to cortical inputs, the authors found that M1 and MCC inputs to direct and indirect pathway spiny projection neurons (SPNs) are both partially segregated and asymmetrically overlapping. In general, corticostriatal inputs that target indirect pathway SPNs are likely to also target direct pathway SPNs, while inputs targeting direct pathway SPNs are less likely to also target indirect pathway SPNs. Such asymmetric overlap of corticostriatal inputs has important implications for how the cortex itself may determine striatal output. Indeed, the authors provide behavioral evidence that optogenetic activation of M1 or MCC cortical neurons that send axons to either direct or indirect pathway SPNs can have opposite effects on locomotion and different effects on action sequence execution. The conclusions of this study add to our understanding of how cortical activity may influence striatal output and offer important new clues about basal ganglia function. 

      The conceptual conclusions of the manuscript are supported by the data, but the details of the magnitude of afferent overlap and causal role of asymmetric corticostriatal inputs on some behavioral outcomes may be a bit overstated given technical limitations of the experiments. 

      For example, after virally labeling either direct pathway (D1) or indirect pathway (D2) SPNs to optogenetically tag pathway-specific cortical inputs, the authors report that a much larger number of "non-starter" D2-SPNs from D2-SPN labeled mice responded to optogenetic stimulation in slices than "non-starter" D1 SPNs from D1-SPN labeled mice did. Without knowing the relative number of D1 or D2 SPN starters used to label cortical inputs, it is difficult to interpret the exact meaning of the lower number of responsive D2-SPNs in D1 labeled mice (where only ~63% of D1-SPNs themselves respond) compared to the relatively higher number of responsive D1-SPNs (and D2-SPNs) in D2 labeled mice. While relative differences in connectivity certainly suggest that some amount of asymmetric overlap of inputs exists, differences in infection efficiency and ensuing differences in detection sensitivity in slice experiments make determining the degree of asymmetry problematic. 

      It is also unclear if retrograde labeling of D1-SPN- vs D2-SPN- targeting afferents labels the same densities of cortical neurons. This gets to the point of specificity in some of the behavioral experiments. If the target-based labeling strategies used to introduce channelrhodopsin into specific SPN afferents label significantly different numbers of cortical neurons, might the difference in the relative numbers of optogenetically activated cortical neurons itself lead to behavioral differences? 

      We thank the reviewer for the comments and for raising additional interpretations of our results. We agree that determining the relative number of D1- versus D2-SPN starter cells would allow a more accurate estimate of connectivity. However, due to current technical limitations, achieving this level of precision remains challenging. As the reviewer also noted, differences in the number of cortical neurons targeting D1- versus D2-SPNs could introduce additional complexity to the functional effects observed in the behavioral experiments. Moreover, functional heterogeneity is likely to exist not only among cortical neurons projecting to striatal D1- or D2-SPNs, but also within the striatal D1- and D2-SPN populations themselves. Addressing these questions at the single-neuron level will require more refined viral tools in combination with improved recording and manipulation techniques. Despite these limitations, our results suggest that a subpopulation of cortical neurons selectively targets striatal D1-SPNs, supporting a functional dichotomy of pathway-specific corticostriatal subcircuits in the control of behavior.   

      Reviewer #2 (Public review): 

      Summary: 

      Klug et al. use monosynaptic rabies tracing of inputs to D1- vs D2-SPNs in the striatum to study how separate populations of cortical neurons project to D1- and D2-SPNs. They use rabies to express ChR2, then patch D1-or D2-SPNs to measure synaptic input. They report that cortical neurons labeled as D1-SPN-projecting preferentially project to D1-SPNs over D2-SPNs. In contrast, cortical neurons labeled as D2-SPN-projecting project equally to D1- and D2-SPNs. They go on to conduct pathway-specific behavioral stimulation experiments. They compare direct optogenetic stimulation of D1- or D2-SPNs to stimulation of MCC inputs to DMS and M1 inputs to DLS. In three different behavioral assays (open field, intra-cranial self-stimulation, and a fixed ratio 8 task), they show that stimulating MCC or M1 cortical inputs to D1-SPNs is similar to D1-SPN stimulation, but that stimulating MCC or M1 cortical inputs to D2-SPNs does not recapitulate the effects of D2-SPN stimulation (presumably because both D1- and D2-SPNs are being activated by these cortical inputs). 

      Strengths: 

      Showing these same effects in three distinct behaviors is strong. Overall, the functional verification of the consequences of the anatomy is very nice to see. It is a good choice to patch only from mCherry-negative non-starter cells in the striatum. This study adds to our understanding of the logic of corticostriatal connections, suggesting a previously unappreciated structure. 

      Weaknesses: 

      One limitation is that all inputs to SPNs are expressing ChR2, so they cannot distinguish between different cortical subregions during patching experiments. Their results could arise because the same innervation patterns are repeated in many cortical subregions or because some subregions have preferential D1-SPN input while others do not. 

      Thank you for raising this thoughtful concern. It is indeed not feasible to restrict ChR2 expression to a specific cortical region using the first-generation rabies-ChR2 system alone. A more refined approach would involve injecting Cre-dependent TVA and RG into the striatum of D1- or A2A-Cre mice, followed by rabies-Flp infection. Subsequently, a Flp-dependent ChR2 virus could be injected into the MCC or M1 to selectively label D1- or D2-projecting cortical neurons. This strategy would allow for more precise targeting and address many of the current limitations.

      However, a significant challenge lies in the cytotoxicity associated with rabies virus infection. Neuronal health begins to deteriorate substantially around 10 days post-infection, which provides an insufficient window for robust Flp-dependent ChR2 expression. We have tested several new rabies virus variants with extended survival times (Chatterjee et al., 2018; Jin et al., 2024), but unfortunately, they did not perform effectively or suitably in the corticostriatal systems we examined.

      In our experimental design, the aim was to delineate the connectivity probabilities from cortical neurons to D1- or D2-SPNs. The hypotheses we considered include the possibility that similar innervation patterns occur across multiple cortical subregions, that some subregions show preferential input to D1-SPNs while others do not, or a combination of both scenarios. This led us to perform a series of behavioral tests using optogenetic activation of the D1- or D2-projecting cortical populations to see which might be the case.

      For the cortical areas we examined, MCC and M1, the behavioral results are consistent with our electrophysiological results. Specifically, when we stimulated D1-projecting cortical neurons in either MCC or M1, mice exhibited facilitated locomotion in the open field test, the same effect as activating D1-SPNs in the striatum alone (MCC: Fig 3C & D vs. I; M1: Fig 3F & G vs. L). Conversely, stimulation of D2-projecting MCC or M1 cortical neurons resulted in behavioral effects that appeared to combine characteristics of both D1- and D2-SPN activation in the striatum (MCC: Fig 3C & D vs. J; M1: Fig 3F & G vs. M). Similar results were observed in the ICSS test. Our interpretation of these results is that activation of D1-projecting neurons in the cortex induces behavioral changes akin to D1 neuron activation, while activation of D2-projecting neurons in the cortex leads to a combined effect of both D1 and D2 neuron activation. This suggests that at least the cortical regions we tested follow the hypothesis we proposed.

      There are also some caveats with respect to the efficacy of rabies tracing. Although they only patch non-starter cells in the striatum, only 63% of D1-SPNs receive input from D1-SPN-projecting cortical neurons. It's hard to say whether this is "high" or "low," but one question is how far from the starter cell region they are patching. Without this spatial indication of where the cells that are being patched are relative to the starter population, it is difficult to interpret if the cells being patched are receiving cortical inputs from the same neurons that are projecting to the starter population. The authors indicate they are patching from mCherry-negative neurons within the region of the mCherry-positive neurons, but since the mCherry population will include both true starter cells and monosynaptically connected cells, this is not perfectly precise. Convergence of cortical inputs onto SPNs may vary with distance from the starter cell region quite dramatically, as other mapping studies of corticostriatal inputs have shown specialized local input regions can be defined based on cortical input patterns (Hintiryan et al., Nat Neurosci, 2016, Hunnicutt et al., eLife 2016, Peters et al., Nature, 2021). 

      This is a valid concern regarding anatomical studies. Investigating cortico-striatal connectivity at the single-cell level remains technically challenging due to current methodological limitations. At present, we rely on rabies virus-mediated trans-synaptic retrograde tracing to identify D1- or D2-projecting cortical populations. This anatomical approach is coupled with ex vivo slice electrophysiology to assess the functional connectivity between these projection-defined cortical neurons and striatal SPNs. This enables us to quantify connection ratios, for example, the proportion of D1-projecting cortical neurons that functionally synapse onto non-starter D1-SPNs.

      To ensure the robustness of our conclusions, it is essential that both the starter cells and the recorded non-starter SPNs receive comparable topographical input from the cortex and other brain regions. Therefore, we carefully designed our experiments so that all recorded cells were located within the injection site, were mCherry-negative (i.e., non-starter cells), and were surrounded by ChR2-mCherry-positive neurons. This configuration ensured that the distance between recorded and starter cells did not exceed 100 µm, maintaining close anatomical proximity and thereby preserving the likelihood of shared cortical innervation within the examined circuitry.

      These methodological details are also described in the section on ex vivo brain slice electrophysiology, specifically in the Methods section, lines 453–459:

      “D1-SPNs (eGFP-positive in D1-eGFP mice, or eGFP-negative in D2-eGFP mice) or D2-SPNs (eGFP-positive in D2-eGFP mice, or eGFP-negative in D1-eGFP mice) that were ChR2-mCherry-negative, but in the injection site and surrounded by cells expressing ChR2-mCherry were targeted for recording. This configuration ensured that the distance between recorded and starter cells did not exceed 100 µm, maintaining close anatomical proximity and thereby preserving the likelihood of shared cortical innervation within the examined circuitry.”

      This experimental strategy was implemented to control for potential spatial biases and to enhance the interpretability of our connectivity measurements.

      A caveat for the optogenetic behavioral experiments is that these optogenetic experiments did not include fluorophore-only controls, although a different control (with light delivered in M1) is provided in Supplementary Figure 3. Another point of confusion is that other studies (Cui et al, J Neurosci, 2021) have reported that stimulation of D1-SPNs in DLS inhibits rather than promotes movement. This study may have given different results due to subtly different experimental parameters, including fiber optic placement and NA.

      We appreciate the reviewer’s thoughtful evaluation and comments. We have added a short discussion of Cui et al.’s study on optogenetic stimulation of D1-SPNs in the DLS (lines 341-343), which reports findings that contrast with ours and those of other studies.

      Reviewer #3 (Public review): 

      Review of resubmission: The authors provided a response to the reviews from myself and other reviewers. While some points were made satisfactorily, particularly in clarification of the innervation of cortex to striatum and the effects of input stimulation, many of my points remain unaddressed. In several cases, the authors chose to explain their rationale rather than address the issues at hand. A number of these issues (in fact, the majority) could be addressed simply by toning down the confidence in conclusions, so it was disappointing to see that the authors by and large did not do this. I repeat my concerns below and note whether I find them to have been satisfactorily addressed or not. 

      In the manuscript by Klug and colleagues, the investigators use a rabies virus-based methodology to explore potential differences in connectivity from cortical inputs to the dorsal striatum. They report that the connectivity from cortical inputs onto D1 and D2 MSNs differs in terms of their projections onto the opposing cell type, and use these data to infer that there are differences in cross-talk between cortical cells that project to D1 vs. D2 MSNs. Overall, this manuscript adds to the overall body of work indicating that there are differential functions of different striatal pathways which likely arise at least in part by differences in connectivity that have been difficult to resolve due to difficulty in isolating pathways within striatal connectivity, and several interesting and provocative observations were reported. Several different methodologies are used, with partially convergent results, to support their main points. 

      However, I have significant technical concerns about the manuscript as presented that make it difficult for me to interpret the results of the experiments. My comments are below. 

      Major: 

      There is generally a large caveat to the rabies studies performed here, which is that both TVA and the ChR2-expressing rabies virus have the same fluorophore. It is thus essentially impossible to determine how many starter cells there are, what the efficiency of tracing is, and which part of the striatum is being sampled in any given experiment. This is a major caveat given the spatial topography of the cortico-striatal projections. Furthermore, the authors make a point in the introduction about previous studies not having explored absolute numbers of inputs, yet this is not at all controlled in this study. It could be that their rabies virus simply replicates better in D1-MSNs than D2-MSNs. No quantifications are done, and these possibilities do not appear to have been considered. Without a greater standardization of the rabies experiments across conditions, it is difficult to interpret the results. 

      This is still an issue. The authors point out why they chose various vectors. I can understand why the authors chose the fluorophores etc. that they did, yet the issues I raised previously are still valid. The discussion should mention that this is a potential issue. It does not necessarily invalidate results, but it is an issue. Furthermore, it is possible (in all systems) that rabies replicates better/more efficiently in some cells than others. This is one possible interpretation that has not really been explored in any study. I don't suggest the authors attempt to do that, but it should be raised as a potential interpretation. If the rabies results could mean several different things, the authors owe it to the readership to state all possible interpretations of data.

      We thank the reviewer for the comments and suggestions. Because the same fluorophore (mCherry) was used in both TVA- and ChR2-expressing viruses, it was not possible to distinguish true starter SPNs from TVA-only SPNs or monosynaptically labeled SPNs. This limitation makes it difficult to precisely assess the efficiency of rabies labeling and retrograde tracing in our experimental setup. Moreover, differences in rabies replication efficiency between D1- and D2-SPNs could potentially lead to an apparent lower connection probability from D1-projecting cortical neurons to D2-SPNs than from D2-projecting cortical neurons to D1-SPNs. We have added this clarification to the Discussion (lines 280-297).

      The authors claim using a few current clamp optical stimulation experiments that the cortical cells are healthy, but this result was far from comprehensive. For example, membrane resistance, capacitance, general excitability curves, etc are not reported. In Figure S2, some of the conditions look quite different (e.g., S2B, input D2-record D2, the method used yields quite different results that the authors write off as not different). Furthermore, these experiments do not consider the likely sickness and death that occurs in starter cells, as has been reported elsewhere. Health of cells in the circuit is overall a substantial concern that alone could invalidate a large portion, if not all, of the behavioral results. This is a major confound given those neurons are thought to play critical roles in the behaviors being studied. This is a major reason why first-generation rabies viruses have not been used in combination with behavior, but this significant caveat does not appear to have been considered, and controls e.g., uninfected animals, infected with AAV helpers, etc, were not included. 

      This issue remains unaddressed. I did not request clarity about experimental design, but rather, raised issues about the potential effects of toxicity. I believe this to be a valid concern that needs to be discussed in the manuscript, especially given what look visually like potential differences in S2. 

      We understand and appreciate the reviewer’s concern regarding the potential cytotoxicity of rabies virus infection. Although we performed the in vivo optogenetic behavioral experiments during a period when rabies-infected cells are generally considered relatively healthy, some deficits in starter cells may still occur and could contribute to the observed effects of optogenetic cortical stimulation. We have added this clarification to the Discussion (lines 298-306).

      The overall purity (e.g., EnvA pseudotyping efficiency) of the RABV prep is not shown. If there was a virus that was not well EnvA-pseudotyped and thus could directly infect cortical (or other) inputs, it would degrade specificity. This issue has not been addressed. Viral strain is irrelevant. The quality of the specific preparations used is what matters.

      While most of the study focuses on the cortical inputs, in slice recordings, inputs from the thalamus are not considered, yet likely contribute to the observed results. Related to this, in in vivo optogenetic experiments, technically, if the thalamic or other inputs to the dorsal striatum project to the cortex, their method will not only target cortical neurons but also terminals of other excitatory inputs. If this cannot be ruled out, the statement that the authors are able to selectively activate the cortical inputs to one or the other population should be toned down. 

      The authors added text to the discussion to address this point. While it largely does what is intended, based on the one study cited, I disagree with the authors' conclusions that it is "clear" that potential contamination from other sites does not play a role. The simplest interpretation is the one the authors state, and there is some supporting evidence to back up that assertion, but to me that falls short of making the point "clear" that there are no other interpretations. 

      The statements about specificity of connectivity are not well founded. It may be that in the specific case where they are assessing outside of the area of injections, their conclusions may hold (e.g., excitatory inputs onto D2s have more inputs onto D1s than vice versa). However, how this relates to the actual site of injection is not clear. At face value, if such a connectivity exists, it would suggest that D1-MSNs receive substantially more overall excitatory inputs than D2s. It is thus possible that this observation would not hold over other spatial intervals. This was not explored and thus the conclusions are over-generalized. e.g., the distance from the area of red cells in the striatum to recordings was not quantified, what constituted a high level of cortical labeling was not quantified, etc. Without more rigorous quantification of what was being done, it is difficult to interpret the results. 

      Again, the goal here would be to make a statement about this in the discussion to clarify limitations of the study. I don't expect the authors to re-do all of these experiments, but since they are discussing the corticostriatal circuits, which have multiple subdomains, this remains a relevant point. It has not been addressed. 

      The results in Figure 3 are not well controlled. The authors show contrasting effects of optogenetic stimulation of D1-MSNs and D2-MSNs in the DMS and DLS, results which are largely consistent with the canon of basal ganglia function. However, when stimulating cortical inputs, stimulating the inputs from D1-MSNs gives the expected results (increased locomotion) while stimulating putative inputs to D2-MSNs had no effect. This is not the same as showing a decrease in locomotion - showing no effect here is not possible to interpret. 

      I think that the caveat of showing no clear effects of inputs to D2 stimulation should be pointed out. Yes, I understand that the viruses appeared to express etc., but again it remains possible that the results are driven by a lack of e.g., sufficient ChR2 expression. Aside from a full quantification of the number of cells expressing ChR2 and of the overlap between fiber placement and ChR2 expression (which I don't suggest), this possibility cannot be excluded and should be pointed out. 

      In the light of their circuit model, the result showing that inputs to D2-MSNs drive ICSS is confusing. How can the authors account for the fact that these cells are not locomotor-activating, stimulation of their putative downstream cells (D2-MSNs) does not drive ICSS, yet the cortical inputs drive ICSS? Is the idea that these inputs somehow also drive D1s? If this is the case, how do D2s get activated, if all of the cortical inputs tested net activate D1s and not D2s? Same with the results in Figure 4 - the inputs and putative downstream cells do not have the same effects. Given potential caveats of differences in viral efficiency, spatial location of injections, and cellular toxicity, I cannot interpret these experiments. 

      The explanation the authors provide in their rebuttal makes sense, however this should be included in the discussion of the manuscript, as it is interesting and relevant. 

      We thank the reviewer for the valuable comments and suggestions. In line with the reviewer’s recommendation, we have incorporated these explanations into the Discussion (lines 242–279) to help interpret the complex behavioral outcomes of optogenetic stimulation of cortical neurons projecting to D1- or D2-SPNs.

      Reviewer #2 (Recommendations for the authors): 

      I appreciate the authors' responses, which helped clarify some experimental choices. I appreciate that the experiment in Fig S3 serves as a reasonable light control for optogenetics experiments. The careful comparison with methods in Cui et al. (2021) is useful, although not added to the main manuscript. Some of the other citations here don't really address the controversy, e.g. Kravitz et al. is in DMS, but perhaps fully addressing this issue is outside the scope of the current manuscript and awaits further experiments. I also appreciate the clarification for recording locations that "This configuration ensured that the distance between recorded and starter cells did not exceed 100 µm, maintaining close anatomical proximity and thereby preserving the likelihood of shared cortical innervation within the examined circuitry." However, the statement in the reviewer response does not seem to be added to the manuscript's methods, which I think would be helpful. The criteria for choosing recorded cells are still a bit fuzzy without a map of recording locations and histology. There is also a problem that mCherry-positive cells could be starter cells or could be monosynaptically traced cells, so it is hard to know the area of the starter cell population in these experiments for sure. My evaluation of the manuscript remains largely the same as the original. However, I have adjusted my public review a bit to incorporate the authors' responses. I still think this paper has valuable information, suggesting an interesting and previously unappreciated structure of corticostriatal inputs that I hope this group and others will continue to investigate and incorporate into models of basal ganglia function.

      We thank the reviewer for the valuable suggestions. We have now included a comparison with Cui et al. in the Discussion. In addition, we have added the criteria for selecting recorded cells to the Methods section: ‘This configuration ensured that the distance between recorded and starter cells did not exceed 100 µm, maintaining close anatomical proximity and thereby preserving the likelihood of shared cortical innervation within the examined circuitry.’

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: 

      This paper applies methods for segmentation, annotation, and visualization of acoustic analysis to zebra finch song. The paper shows that these methods can be used to predict the stage of song development and to quantify acoustic similarity. The methods are solid and are likely to provide a useful tool for scientists aiming to label large datasets of zebra finch vocalizations. The paper has two main parts: 1) establishing a pipeline/ package for analyzing zebra finch birdsong and 2) a method for measuring song imitation. 

      Strengths: 

      It is useful to see existing methods for syllable segmentation compared to new datasets.

      It is useful, but not surprising, that these methods can be used to predict developmental stage, which is strongly associated with syllable temporal structure.

      It is useful to confirm that these methods can identify abnormalities in deafened and isolated songs. 

      Weaknesses: 

      For the first part, the implementation seems to be a wrapper on existing techniques. For instance, the first section talks about syllable segmentation; they made a comparison between whisperseg (Gu et al, 2024), tweetynet (Cohen et al, 2022), and amplitude thresholding. They found that whisperseg performed the best, and they included it in the pipeline. They then used whisperseg to analyze syllable duration distributions and rhythm of birds of different ages and confirmed past findings on this developmental process (e.g. Aronov et al, 2011). Next, based on the segmentation, they assign labels by performing UMAP and HDBScan on the spectrogram (nothing new; that's what people have been doing). Then, based on the labels, they claimed they developed a 'new' visualization - syntax raster ( line 180 ). That was done by Sainburg et. al. 2020 in Figure 12E and also in Cohen et al, 2020 - so the claim to have developed 'a new song syntax visualization' is confusing. The rest of the paper is about analyzing the finch data based on AVN features (which are essentially acoustic features already in the classic literature). 

      First, we would like to thank this reviewer for their kind comments and feedback on this manuscript. It is true that many of the components of this song analysis pipeline are not entirely novel in isolation. Our real contribution here is bringing them together in a way that allows other researchers to seamlessly apply automated syllable segmentation, clustering, and downstream analyses to their data. That said, our approach to training TweetyNet for syllable segmentation is novel. We trained TweetyNet to recognize vocalizations vs. silence across multiple birds, such that it can generalize to new individual birds, whereas Tweetynet had only ever been used to annotate song syllables from birds included in its training set previously. Our validation of TweetyNet and WhisperSeg in combination with UMAP and HDBSCAN clustering is also novel, providing valuable information about how these systems interact, and how reliable the completely automatically generated labels are for downstream analysis. We have added a couple sentences to the introduction to emphasize the novelty of this approach and validation.

      Our syntax raster visualization does resemble Figure 12E in Sainburg et al. 2020; however, it differs in a few important ways, which we believe warrant its consideration as a novel visualization method. First, Sainburg et al. represent the labels across bouts in real time; their position along the x axis reflects the time at which each syllable is produced relative to the start of the bout. By contrast, our visualization considers only the index of syllables within a bout (i.e., first syllable vs. second syllable, etc.) without consideration of the true durations of each syllable or the silent gaps between them. This makes it much easier to detect syntax patterns across bouts, as the added variability of syllable timing is removed. Considering only the sequence of syllables rather than their timing also allows us to more easily align bouts according to the first syllable of a motif, further emphasizing the presence or absence of repeating syllable sequences without interference from the more variable introductory notes at the start of a motif. Finally, instead of plotting all bouts in the order in which they were produced, our visualization orders bouts such that bouts with the same sequence of syllables will be plotted together, which again serves to emphasize the most common syllable sequences that the bird produces. These additional processing steps mean that our syntax raster plot has much starker contrast between birds with stereotyped syntax and birds with more variable syntax, as compared to the more minimally processed visualization in Sainburg et al. 2020. There do not appear to be any similar visualizations in Cohen et al. 2020.
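
      The three processing steps described above (index-based positions, alignment to the first motif syllable, and grouping of identical sequences) can be sketched as follows. This is an illustrative sketch only, not code from the AVN package; the function name `build_syntax_raster`, its `align_label` parameter, and the toy bouts are hypothetical.

```python
from collections import Counter


def build_syntax_raster(bouts, align_label=None):
    """Return one row of syllable labels per bout, ordered for a raster plot.

    bouts: list of bouts, each a list of syllable labels in order of production.
    align_label: hypothetical parameter; if given, each bout is trimmed so that
        it starts at the first occurrence of this label (e.g. the first motif
        syllable), which drops the variable introductory notes.
    """
    rows = []
    for bout in bouts:
        if align_label is not None:
            if align_label not in bout:
                continue  # skip bouts that never reach the alignment syllable
            bout = bout[bout.index(align_label):]
        # Only the syllable index within the bout is kept; timing is ignored.
        rows.append(tuple(bout))

    # Plot identical sequences together, most frequent sequence first.
    counts = Counter(rows)
    rows.sort(key=lambda row: (-counts[row], row))
    return rows


# Toy example: "i" stands for an introductory note.
bouts = [
    ["i", "i", "a", "b", "c"],
    ["i", "a", "b", "d"],
    ["i", "i", "i", "a", "b", "c"],
    ["a", "b", "c"],
]
raster = build_syntax_raster(bouts, align_label="a")
for row in raster:
    print(" ".join(row))
```

      Each returned row could then be drawn as one line of colored cells, with column position given by syllable index rather than by time.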

      The second part may be something new, but there are opportunities to improve the benchmarking. It is about the pupil-tutor imitation analysis. They introduce a convolutional neural network that takes triplets as an input. Each triplet is essentially 3 images stacked together as (anchor, positive, negative): the anchor is a reference spectrogram from, say, finch A; the positive is a different spectrogram with the same label as the anchor from finch A; and the negative is a spectrogram not related to A or with a different syllable label from A. The network is then trained to produce a low-dimensional embedding by ensuring the embedding distance between anchor and positive is less than that between anchor and negative by a certain margin. Based on the embedding, they then made use of earth mover distance to quantify the similarity in the syllable distribution among finches. They then compared their approach performance with that of sound analysis pro (SAP) and a variant of SAP. A more natural comparison, which they didn't include, is with the VAE approach by Goffinet et al. In this paper (https://doi.org/10.7554/eLife.67855, Fig 7), they also attempted to perform an analysis on the tutor pupil song.

      We thank the reviewer for this suggestion. We have included a comparison of our triplet loss embedding model to the VAE model proposed in Goffinet et al. 2021. We also included comparisons of similarity scoring using each of these embedding models combined with either earth mover’s distance (EMD) or maximum mean discrepancy (MMD) to calculate the similarity of the embeddings, as was done in Goffinet et al. 2021. As discussed in the updated results section of the paper and shown in the new Figure 6–figure supplement 1, the Triplet loss model with MMD performs best for evaluating song learning on new birds, not included in model training. We’ve updated the main text of the paper to reflect this switch from EMD to MMD for the primary similarity scoring approach.
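
      For readers less familiar with MMD, the following minimal sketch computes a biased estimate of squared MMD with a Gaussian (RBF) kernel between two sets of embedding vectors. The kernel choice, the bandwidth, and the toy two-dimensional embeddings are assumptions made for illustration; they are not the settings used in AVN or in Goffinet et al. 2021.

```python
import math


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel between two embedding vectors of equal length."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples of embeddings."""

    def mean_kernel(a_set, b_set):
        total = sum(rbf_kernel(a, b, bandwidth) for a in a_set for b in b_set)
        return total / (len(a_set) * len(b_set))

    return (
        mean_kernel(xs, xs)
        + mean_kernel(ys, ys)
        - 2.0 * mean_kernel(xs, ys)
    )


# Hypothetical syllable embeddings (2-D here for readability).
tutor = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0)]
good_pupil = [(0.0, 0.1), (0.1, 0.1), (1.0, 0.9)]
poor_pupil = [(3.0, 3.0), (3.1, 2.9), (4.0, 4.0)]

print("tutor vs good pupil:", mmd_squared(tutor, good_pupil))
print("tutor vs poor pupil:", mmd_squared(tutor, poor_pupil))
```

      A smaller value indicates that the pupil's syllable distribution lies closer to the tutor's in the embedding space; how this dissimilarity is converted into a similarity score is a separate design choice not shown here.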

      Reviewer #2 (Public Review):

      Summary: 

      In this work, the authors present a new Python software package, Avian Vocalization Network (AVN) aimed at facilitating the analysis of birdsong, especially the song of the zebra finch, the most common songbird model in neuroscience. The package handles some of the most common (and some more advanced) song analyses, including segmentation, syllable classification, featurization of song, calculation of tutor-pupil similarity, and age prediction, with a view toward making the entire process friendlier to experimentalists working in the field.

      For many years, Sound Analysis Pro has served as a standard in the songbird field, the first package to extensively automate songbird analysis and facilitate the computation of acoustic features that have helped define the field. More recently, the increasing popularity of Python as a language, along with the emergence of new machine learning methods, has resulted in a number of new software tools, including the vocalpy ecosystem for audio processing, TweetyNet (for segmentation), t-SNE and UMAP (for visualization), and autoencoder-based approaches for embedding.

      Strengths: 

      The AVN package overlaps several of these earlier efforts, albeit with a focus on more traditional featurization that many experimentalists may find more interpretable than deep learning-based approaches. Among the strengths of the paper are its clarity in explaining the several analyses it facilitates, along with high-quality experiments across multiple public datasets collected from different research groups. As a software package, it is open source, installable via the pip Python package manager, and features high-quality documentation, as well as tutorials. For experimentalists who wish to replicate any of the analyses from the paper, the package is likely to be a useful time saver.

      Weaknesses: 

      I think the potential limitations of the work are predominantly on the software end, with one or two quibbles about the methods.

      First, the software: it's important to note that the package is trying to do many things, of which it is likely to do several well and few comprehensively. Rather than a package that presents a number of new analyses or a new analysis framework, it is more a codification of recipes, some of which are reimplementations of existing work (SAP features), some of which are essentially wrappers around other work (interfacing with WhisperSeg segmentations), and some of which are new (similarity scoring). All of this has value, but in my estimation, it has less value as part of a standalone package and potentially much more as part of an ecosystem like vocalpy that is undergoing continuous development and has long-term support. 

      We appreciate this reviewer’s comments and concerns about the structure of the AVN package and its long-term maintenance. We have considered incorporating AVN into the VocalPy ecosystem but have chosen not to for a few key reasons. (1) AVN was designed with ease of use for experimenters with limited coding experience top of mind. VocalPy provides excellent resources for researchers with some familiarity with object-oriented programming to manage and analyze their datasets; however, we believe it may be challenging for users without such experience to adopt VocalPy quickly. AVN’s ‘recipe’ approach, as you put it, is very easily accessible to new users, and allows users with intermediate coding experience to easily navigate the source code to gain a deeper understanding of the methodology. AVN also consistently outputs processed data in familiar formats (tables in .csv files which can be opened in Excel), in an effort to make it more accessible to new users, something which would be challenging to reconcile with VocalPy’s emphasis on their `dataset` classes. (2) AVN and VocalPy differ in their underlying goals and philosophies when it comes to flexibility vs. standardization of analysis pipelines. VocalPy is designed to facilitate mixing-and-matching of different approaches to spectrogram generation, segmentation, annotation, etc., so that researchers can design and implement their own custom analysis pipelines. This flexibility is useful in many cases. For instance, it could allow researchers who have very different noise filtering and annotation needs, like those working with field recordings versus acoustic chamber recordings, to analyze their data using this platform. However, when it comes to comparisons across zebra finch research labs, this flexibility comes at the expense of direct comparison and integration of song features across research groups. This is the context in which AVN is most useful. It presents a single approach to song segmentation, labeling, and featurization that has been shown to generalize well across research groups, and which allows direct comparisons of the resulting features. AVN’s single, extensively validated, standard pipeline approach is fundamentally incompatible with VocalPy’s emphasis on flexibility. We are excited to see how VocalPy continues to evolve in the future, and recognize the value that both AVN and VocalPy bring to the songbird research community, each with their own distinct strengths, weaknesses, and ideal use cases.

      While the code is well-documented, including web-based documentation for both the core package and the GUI, the latter is available only on Windows, which might limit the scope of adoption. 

      We thank the reviewer for their kind words about AVN’s documentation. We recognize that the GUI’s exclusive availability on Windows is a limitation, and we would be happy to collaborate with other researchers and developers in the future to build a Mac compatible version, should the demand present itself. That said, the python package works on all operating systems, so non-Windows users still have the ability to use AVN that way.

      That is to say, whether AVN is adopted by the field in the medium term will have much more to do with the quality of its maintenance and responsiveness to users than any particular feature, but I believe that many of the analysis recipes that the authors have carefully worked out may find their way into other code and workflows. 

      Second, two notes about new analysis approaches:

      (1) The authors propose a new means of measuring tutor-pupil similarity based on first learning a latent space of syllables via a self-supervised learning (SSL) scheme and then using the earth mover's distance (EMD) to calculate transport costs between the distributions of tutors' and pupils' syllables. While to my knowledge this exact method has not previously been proposed in birdsong, I suspect it is unlikely to differ substantially from the approach of autoencoding followed by MMD used in the Goffinet et al. paper. That is, SSL, like the autoencoder, is a latent space learning approach, and EMD, like MMD, is an integral probability metric that measures discrepancies between two distributions. (Indeed, the two are very closely related: https://stats.stackexchange.com/questions/400180/earth-movers-distance-andmaximum-mean-discrepency.) Without further experiments, it is hard to tell whether these two approaches differ meaningfully. Likewise, while the authors have trained on a large corpus of syllables to define their latent space in a way that generalizes to new birds, it is unclear why such an approach would not work with other latent space learning methods.  

      We recognize the similarities between these approaches and have included comparisons of the VAE and MMD as in the Goffinet paper to our triplet loss model and EMD.  As discussed in the updated results section of the paper and shown in the new Figure 6–figure supplement 1, the Triplet loss model with MMD performs best for evaluating song learning on new birds, not included in model training. We’ve updated the main text of the paper to reflect this switch from EMD to MMD for the primary similarity scoring approach. 

      (2) The authors propose a new method for maturity scoring by training a model (a generalized additive model) to predict the age of the bird based on a selected subset of acoustic features. This is distinct from the "predicted age" approach of Brudner, Pearson, and Mooney, which predicts based on a latent representation rather than specific features, and the GAM nicely segregates the contribution of each. As such, this approach may be preferred by many users who appreciate its interpretability.  

      In summary, my view is that this is a nice paper detailing a well-executed piece of software whose future impact will be determined by the degree of support and maintenance it receives from others over the near and medium term.

      Reviewer #3 (Public Review):

      Summary: 

      The authors invent song and syllable discrimination tasks they use to train deep networks. These networks they then use as a basis for routine song analysis and song evaluation tasks. For the analysis, they consider both data from their own colony and from another colony the network has not seen during training. They validate the analysis scores of the network against expert human annotators, achieving a correlation of 80-90%. 

      Strengths: 

      (1) Robust Validation and Generalizability: The authors demonstrate a good performance of the AVN across various datasets, including individuals exhibiting deviant behavior. This extensive validation underscores the system's usefulness and broad applicability to zebra finch song analysis, establishing it as a potentially valuable tool for researchers in the field.

      (2) Comprehensive and Standardized Feature Analysis: AVN integrates a comprehensive set of interpretable features commonly used in the study of bird songs. By standardizing the feature extraction method, the AVN facilitates comparative research, allowing for consistent interpretation and comparison of vocal behavior across studies.

      (3) Automation and Ease of Use. By being fully automated, the method is straightforward to apply and should introduce barely an adoption threshold to other labs.

      (4) Human experts were recruited to perform extensive annotations (of vocal segments and of song similarity scores). These annotations released as public datasets are potentially very valuable. 

      Weaknesses: 

      (1) Poorly motivated tasks. The approach is poorly motivated and many assumptions come across as arbitrary. For example, the authors implicitly assume that the task of birdsong comparison is best achieved by a system that optimally discriminates between typical, deaf, and isolated songs. Similarly, the authors assume that song development is best tracked using a system that optimally estimates the age of a bird given its song. My issue is that these are fake tasks since clearly, researchers will know whether a bird is an isolated or a deaf bird, and they will also know the age of a bird, so no machine learning is needed to solve these tasks. Yet, the authors imagine that solving these placeholder tasks will somehow help with measuring important aspects of vocal behavior.  

      We appreciate this reviewer’s concerns and apologize for not providing sufficiently clear rationale for the inclusion of our phenotype classifier and age regression models in the original manuscript. These tasks are not intended to be taken as a final, ultimate culmination of the AVN pipeline. Rather, we consider the carefully engineered set of 55 interpretable features to be AVN’s final output, and these analyses serve merely as examples of how that feature set can be applied. That said, each of these models does have valid experimental use cases that we believe are important and would like to bring to the attention of the reviewer.

      For one, we showed how the LDA model that can discriminate between typical, deaf, and isolate birds’ songs not only allows us to evaluate which features are most important for discriminating between these groups, but also allows comparison of the FoxP1 knock-down (FP1 KD) birds to each of these phenotypes. Based on previous work (Garcia-Oscos et al. 2021), we hypothesized that FP1 KD in these birds specifically impaired tutor song memory formation while sparing a bird’s ability to refine their own vocalizations through auditory feedback. Thus, we would expect their songs to resemble those of isolate birds, who lack a tutor song memory, but not to resemble deaf birds who lack a tutor song memory and auditory feedback of their own vocalizations to guide learning. The LDA model allowed us to make this comparison quantitatively for the first time and confirm our hypothesis that FP1 KD birds’ songs are indeed most like isolates’. In the future, as more research groups publish their birds’ AVN feature sets, we hope to be able to make even more fine-grained comparisons between different groups of birds, either using LDA or other similar interpretable classifiers. 

      The age prediction model also has valid real-world use cases. For instance, one might imagine an experimental manipulation that is hypothesized to accelerate or slow song maturation in juvenile birds. This age prediction model could be applied to the AVN feature sets of birds having undergone such a manipulation to determine whether their predicted ages systematically lead or lag their true biological ages, and which song features are most responsible for this difference. We didn’t have access to data for any such birds for inclusion in this paper, but we hope that others in the future will be able to take inspiration from our methodology and use this or a similar age regression model with AVN features in their research. We have added a couple lines to the ‘Comparing Song Disruptions with AVN Features’ and ‘Tracking Song Development with AVN Features’ sections of the results to make this more clear. 
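
      One simple way to summarize such a lead or lag is the mean difference between predicted and true age within each experimental group. The sketch below assumes that predicted ages have already been obtained from some age regression model; the function name and all ages are hypothetical and are not data from the manuscript.

```python
from statistics import mean


def mean_age_offset(true_ages, predicted_ages):
    """Mean of (predicted - true) age in days.

    Positive values mean the songs are scored as older than the birds are
    (predicted age leads); negative values mean predicted age lags.
    """
    return mean(p - t for t, p in zip(true_ages, predicted_ages))


# Hypothetical ages in days post hatch.
true_ages = [50, 60, 70]
control_predicted = [51, 59, 70]
treated_predicted = [56, 66, 75]

control_offset = mean_age_offset(true_ages, control_predicted)
treated_offset = mean_age_offset(true_ages, treated_predicted)
print("control offset (days):", control_offset)
print("treated offset (days):", treated_offset)
```

      In practice one would compare the two groups with an appropriate statistical test rather than by inspecting the means alone.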

      Along similar lines, authors assume that a good measure of similarity is one that optimally performs repeated syllable detection (i.e. to discriminate same syllable pairs from different pairs). The authors need to explain why they think these placeholder tasks are good and why no better task can be defined that more closely captures what researchers want to measure. Note: the standard tasks for self-supervised learning are next word or masked word prediction, why are these not used here? 

      This reviewer appears to have misunderstood our similarity scoring embedding model and our rationale for using it. We will explain it in more depth here and have added a paragraph to the ‘Measuring Song Imitation’ section of the results explaining this rationale more briefly.

      First, nowhere are we training a model to discriminate between same and different syllable pairs. The triplet loss network is trained to embed syllables in an 8-dimensional space such that syllables with the same label are closer together than syllables with different labels. The loss function is related to the relative distance between embeddings of syllables with the same or different labels, not the classification of syllables as same or different. This approach was chosen because it has repeatedly been shown to be a useful data compression step (Schroff et al. 2015, Thakur et al. 2019) before further downstream tasks are applied on its output, particularly in contexts where there is little data per class (syllable label). For example, Schroff et al. 2015 trained a deep convolutional neural network with triplet loss to embed images of human faces from the same individual closer together than images of different individuals in a 128-dimensional space. They then used this model to compute 128-dimensional representations of additional face images, not included in training, which were used for individual facial recognition (this is a same vs. different category classifier), and facial clustering, achieving better performance than the previous state of the art. The triplet loss function results in a model that can generate useful embeddings of previously unseen categories, like new individuals’ faces, or new zebra finches’ syllables, which can then be used in downstream analyses. This meaningful, lower dimensional space allows comparisons of distributions of syllables across birds, as in Brainard and Mets 2008, and Goffinet et al. 2021.
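
      To make the distinction between a distance-based loss and a classification loss concrete, here is a minimal sketch of a margin-based triplet loss for a single triplet of embeddings. It uses plain Euclidean distance and an assumed margin of 1.0; the distance function, margin, and network used in AVN are not specified in this response, so these are illustrative choices only.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss for one (anchor, positive, negative) triplet.

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`; no same/different class label is predicted.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings (the response describes an 8-dimensional space).
anchor = (0.0, 0.0)
positive = (0.1, 0.0)       # same syllable label as the anchor
far_negative = (3.0, 0.0)   # different label, already well separated
near_negative = (0.5, 0.0)  # different label, still too close

print(triplet_loss(anchor, positive, far_negative))
print(triplet_loss(anchor, positive, near_negative))
```

      Only the second triplet contributes to training, because its negative example is not yet separated from the anchor by the margin.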

      Next word and masked word prediction are indeed common self-supervised learning tasks for models working with text data, or other data with meaningful sequential organization. That is not the case for our zebra finch syllables, where every bird’s syllable sequence depends only on its tutor’s sequence, and there is no evidence for strong universal syllable sequencing rules (James et al. 2020). Rather, our embedding model is an example of a computer vision task, as it deals with sets of two-dimensional images (spectrograms), not sequences of categorical variables (like text). It is also not, strictly speaking, a self-supervised learning task, as it does require syllable labels to generate the triplets. A common self-supervised approach for dimensionality reduction in a computer vision task such as this one would be to train an autoencoder to compress images to a lower dimensional space, then faithfully reconstruct them from the compressed representation. This has been done using a variational autoencoder trained on zebra finch syllables in Goffinet et al. 2021. In keeping with the suggestions from reviewers #1 and #2, we have included a comparison of our triplet loss model with the Goffinet et al. VAE approach in the revised manuscript.

      (2) The machine learning methodology lacks rigor. The aims of the machine learning pipeline are extremely vague and keep changing like a moving target. Mainly, the deep networks are trained on some tasks but then authors evaluate their performance on different, disconnected tasks. For example, they train both the birdsong comparison method (L263+) and the song similarity method (L318+) on classification tasks. However, they evaluate the former method (LDA) on classification accuracy, but the latter (8-dim embeddings) using a contrast index. In machine learning, usually, a useful task is first defined, then the system is trained on it and then tested on a held-out dataset. If the sensitivity index is important, why does it not serve as a cost function for training?

      Again, this reviewer seems not to understand our similarity scoring methodology. Our similarity scoring model is not trained on a classification task, but rather on an embedding task. It learns to embed spectrograms of syllables in an 8-dimensional space such that syllables with the same label are closer together than syllables with different labels. We could report the loss values for this embedding task on our training and validation datasets, but these wouldn’t have any clear relevance to the downstream task of syllable distribution comparison where we are using the model’s embeddings. We report the contrast index as this has direct relevance to the actual application of the model and allows comparisons to other similarity scoring methods, something that the triplet loss values wouldn’t allow. 

      The triplet loss method was chosen because it has been shown to yield useful low-dimensional representations of data, even in cases where there is limited labeled training data (Thakur et al. 2019). While we have one of the largest manually annotated datasets of zebra finch songs, it is still quite small by industry deep learning standards, which is why we chose a method that would perform well given the size of our dataset. Training a model on a contrast index directly would be extremely computationally intensive and require many more pairs of birds with known relationships than we currently have access to. It could be an interesting approach to take in the future, but one that would be unlikely to perform well with a dataset size typical to songbird research. 

      Also, usually, in solid machine learning work, diverse methods are compared against each other to identify their relative strengths. The paper contains almost none of this, e.g. authors examined only one clustering method (HDBSCAN).  

      We did compare multiple methods for syllable segmentation (WhisperSeg, TweetyNet, and Amplitude thresholding) as this hadn’t been done previously. We chose not to perform extensive comparison of different clustering methods as Sainburg et al. 2020 already did so and we felt no need to reduplicate this effort. We encourage this reviewer to refer to Sainburg et al.’s excellent work for comparisons of multiple clustering methods applied to zebra finch song syllables.

      (3) Performance issues. The authors want to 'simplify large-scale behavioral analysis' but it seems they want to do that at a high cost. (Gu et al 2023) achieved syllable scores above 0.99 for adults, which is much larger than the average score of 0.88 achieved here (L121). Similarly, the syllable scores in (Cohen et al 2022) are above 94% (their error rates are below 6%, albeit in Bengalese finches, not zebra finches), which is also better than here. Why is the performance of AVN so low? The low scores of AVN argue in favor of some human labeling and training on each bird.  

      Firstly, the syllable error rate scores reported in Cohen et al. 2022 are calculated very differently than the F1 scores we report here and are based on a model trained with data from the same bird as was used in testing, unlike our more general segmentation approach where the model was tested on different birds than were used in training. Thus, the scores reported in Cohen et al. and the F1 scores that we report cannot be compared. 

      The discrepancy between the F1_seg scores reported in Gu et al. 2023 and the segmentation F1 scores that we report is likely due to differences in the underlying datasets. Our UTSW recordings tend to have higher levels of both stationary and non-stationary background noise, which make segmentation more challenging. The recordings from Rockefeller were less contaminated by background noise, and they resulted in slightly higher F1 scores. That said, we believe that the primary factor accounting for this difference in scores with Gu et al. 2023 is the granularity of our ‘ground truth’ syllable segments. In our case, if there was ever any ambiguity as to whether vocal elements should be segmented into two short syllables with a very short gap between them or merged into a single longer syllable, we chose to split them. WhisperSeg had a strong tendency to merge the vocal elements in ambiguous cases such as these. This results in a higher rate of false negative syllable onset detections, reflected in the low recall scores achieved by WhisperSeg (see Figure 2–figure supplement 1b), but still very high precision scores (Figure 2–figure supplement 1a). While WhisperSeg did frequently merge these syllables in a way that differed from our ground truth segmentation, it did so consistently, meaning it had little impact on downstream measures of syntax entropy (Figure 3c) or syllable duration entropy (Figure 3–figure supplement 2a). It is for that reason that, despite a lower F1 score, we still consider AVN’s automatically generated annotations to be sufficiently accurate for downstream analyses.
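
      The effect of merging on precision and recall can be illustrated with a small sketch of onset-based scoring. The greedy matching rule and the 10 ms tolerance below are assumptions for illustration; the exact matching criterion used to compute the reported F1 scores is not described in this response.

```python
def onset_scores(true_onsets, pred_onsets, tolerance=0.01):
    """Precision, recall, and F1 for syllable onset detection (times in s).

    A predicted onset counts as a true positive if it lies within `tolerance`
    of a ground-truth onset that has not already been matched.
    """
    unmatched = sorted(true_onsets)
    true_positives = 0
    for onset in sorted(pred_onsets):
        # Greedily pick the nearest ground-truth onset that is still unused.
        best = min(unmatched, key=lambda t: abs(t - onset), default=None)
        if best is not None and abs(best - onset) <= tolerance:
            true_positives += 1
            unmatched.remove(best)

    precision = true_positives / len(pred_onsets) if pred_onsets else 0.0
    recall = true_positives / len(true_onsets) if true_onsets else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical bout: the annotator split two ambiguous elements
# (0.25/0.30 and 0.50/0.55), while the segmenter merged each pair.
true_onsets = [0.10, 0.25, 0.30, 0.50, 0.55]
merged_pred = [0.10, 0.25, 0.50]

precision, recall, f1 = onset_scores(true_onsets, merged_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

      Every predicted onset matches a true onset, so precision stays high, while the two onsets hidden by merging count as false negatives and pull recall, and therefore F1, down.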

      Should researchers require a higher degree of accuracy and precision with their annotations (for example, to detect very subtle changes in song before and after an acute manipulation) we suggest they turn toward one of the existing tools for supervised song annotation, such as TweetyNet.

      (4) Texas bias. It is true that comparability across datasets is enhanced when everyone uses the same code. However, the authors' proposal essentially is to replace the bias between labs with a bias towards birds in Texas. The comparison with Rockefeller birds is nice, but it amounts to merely N=1. If birds in Japanese or European labs have evolved different song repertoires, the AVN might not capture the associated song features in these labs well.  

      We appreciate the reviewer’s concern about a bias toward birds from the UTSW colony. However, this paper shows that despite training (for the similarity scoring) and hyperparameter fitting (for the HDBSCAN clustering) on the UTSW birds, AVN performs as well, if not better, on birds from Rockefeller than on birds from UTSW. To our knowledge, there are no publicly available datasets of annotated zebra finch songs from labs in Europe or in Asia, but we would be happy to validate AVN on such datasets, should they become available. Furthermore, there is no evidence to suggest that there is dramatic drift in zebra finch vocal repertoire between continents which would necessitate such additional validation. We did apply AVN to recordings shared with us by the Wada lab in Japan. While we didn’t have manual annotations for these recordings (which would allow validation of our segmentation and labeling methods), visual inspection of the resulting annotations suggested comparable accuracy to the UTSW and Rockefeller datasets.

      (5) The paper lacks an analysis of the balance between labor requirement, generalizability, and optimal performance. For tasks such as segmentation and labeling, fine-tuning for each new dataset could potentially enhance the model's accuracy and performance without compromising comparability. E.g. How many hours does it take to annotate a hundred song motifs? How much would the performance of AVN increase if the network were to be retrained on these? The paper should be written in more neutral terms, letting researchers reach their own conclusions about how much manual labor they want to put into their data.

      With standardization and ease of use in mind, we designed AVN specifically to perform fully automated syllable annotation and downstream feature calculations. We believe that we have demonstrated in this manuscript that our fully automated approach is sufficiently reliable for downstream analyses across multiple zebra finch colonies. That said, if researchers require an even higher degree of annotation precision and accuracy, they can turn toward one of the existing methods for supervised song annotation, such as TweetyNet. Incorporating human annotations for each bird processed by AVN is likely to improve its performance, but this would require significant changes to AVN’s methodology, and is outside the scope of our current efforts.

      (6) Full automation may not be everyone's wish. For example, given the highly stereotyped zebra finch songs, it is conceivable that some syllables are consistently mis-segmented or misclassified. Researchers may want to be able to correct such errors, which essentially amounts to fine-tuning AVN. Conceivably, researchers may want to retrain a network like the AVN on their own birds, to obtain a more fine-grained discriminative method.  

      Other methods exist for supervised or human-in-the-loop annotation of zebra finch songs, such as TweetyNet and DAN (Alam et al. 2023). We invite researchers who require a higher degree of accuracy than AVN can provide to explore these alternative approaches for song annotation. Incorporating human feedback into AVN was never the goal of our pipeline, would require significant changes to AVN’s design and is outside the scope of this manuscript.

      (7) The analysis is restricted to song syllables and fails to include calls. No rationale is given for the omission of calls. Also, it is not clear how the analysis deals with repeated syllables in a motif, whether they are treated as two-syllable types or one.  

      It is true that we don’t currently have any dedicated features to describe calls. This could be a useful addition to AVN in the future. 

      What a human expert inspecting a spectrogram would typically call ‘repeated syllables’ in a bout are almost always assigned the same syllable label by the UMAP+HDBSCAN clustering. The syntax analysis module includes features examining the rate of syllable repetitions across syllable types, as mentioned in lines 222-226 of the revised manuscript. See https://avn.readthedocs.io/en/latest/syntax_analysis_demo.html#Syllable-Repetitions for further details.

      (8) It seems not all human annotations have been released and the instruction sets given to experts (how to segment syllables and score songs) are not disclosed. It may well be that the differences in performance between (Gu et al 2023) and (Cohen et al 2022) are due to differences in segmentation tasks, which is why these tasks given to experts need to be clearly spelled out. Also, the downloadable files contain merely labels but no identifier of the expert. The data should be released in such a way that lets other labs adopt their labeling method and cross-check their own labeling accuracy.  

      All human annotations used in this manuscript have indeed been released as part of the accompanying dataset. Syllable annotations are not provided for all pupils and tutors used to validate the similarity scoring, as annotations are not necessary for similarity comparisons. We have expanded our description of our annotation guidelines in the methods section of the revised manuscript. All the annotations were generated by one of two annotators. The second annotator always consulted with the first annotator in cases of ambiguous syllable segmentation or labeling, to ensure that they had consistent annotation styles. Unfortunately, we haven’t retained records about which birds were annotated by which of the two annotators, so we cannot share this information along with the dataset. The data is currently available in a format that should allow other research groups to use our annotations either to train their own annotation systems or check the performance of their existing systems on our annotations.  

      (9) The failure modes are not described. What segmentation errors did they encounter, and what syllable classification errors? It is important to describe the errors to be expected when using the method. 

      As we discussed in our response to this reviewer’s point (3), WhisperSeg has a tendency to merge syllables when the gap between them is very short, which explains its lower recall score compared to its precision on our dataset (Figure 2–figure supplement 1). In rare cases, WhisperSeg also fails to recognize syllables entirely, again impacting its recall score. TweetyNet hardly ever completely ignores syllables, but it does tend to occasionally merge syllables together or over-segment them. Whereas WhisperSeg does this very consistently for the same syllable types within the same bird, TweetyNet merges or splits syllables more inconsistently. This inconsistent merging and splitting has a larger effect on syllable labeling, as manifested in the lower clustering v-measure scores we obtain with TweetyNet compared to WhisperSeg segmentations. TweetyNet also has much lower precision than WhisperSeg, largely because TweetyNet often recognizes background noises (like wing flaps or hopping) as syllables whereas WhisperSeg hardly ever segments non-vocal sounds.

      Many errors in syllable labeling stem from differences in syllable segmentation. For example, if two syllables with labels ‘a’ and ‘b’ in the manual annotation are sometimes segmented as two syllables, but sometimes merged into a single syllable, the clustering is likely to find 3 different syllable types; one corresponding to ‘a’, one corresponding to ‘b’ and one corresponding to ‘ab’ merged. Because of how we align syllables across segmentation schemes for the v-measure calculation, this will look like syllable ‘b’ always has a consistent cluster label (or is missing a label entirely), but syllable ‘a’ can carry two different cluster labels, depending on the segmentation. In certain cases, even in the absence of segmentation errors, a group of syllables bearing the same manual annotation label may be split into 2 or 3 clusters (it is extremely rare for a single manual annotation group to be split into more than 3 clusters). In these cases, it is difficult to conclusively say whether the clustering represents an error, or if it actually captured some meaningful systematic difference between syllables that was missed by the annotator. Finally, sometimes rare syllable types with their own distinct labels in the manual annotation are merged into a single cluster. Most labeling errors can be explained by this kind of merging or splitting of groups relative to the manual annotation, not to occasional mis-classifications of one manual label type as another.
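      The effect of this kind of merging on the v-measure can be illustrated with a small, self-contained sketch. It uses the standard definition of homogeneity, completeness and v-measure; the toy labels are hypothetical, and the alignment is simplified by assumption (a merged ‘ab’ segment is credited to ‘a’, and the corresponding ‘b’ is dropped as unlabeled). It is not AVN’s validation code.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two equal-length label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(true_labels, cluster_labels):
    """Return (homogeneity, completeness, v-measure) with equal weighting."""
    h_true = entropy(true_labels)
    h_clust = entropy(cluster_labels)
    homogeneity = 1.0 if h_true == 0 else 1 - conditional_entropy(true_labels, cluster_labels) / h_true
    completeness = 1.0 if h_clust == 0 else 1 - conditional_entropy(cluster_labels, true_labels) / h_clust
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v


# Hypothetical toy data: four correctly segmented 'a','b' pairs, then two
# renditions where 'a' and 'b' were merged. After alignment, the merged
# segment is credited to 'a' (cluster 'AB') and the 'b' is left unlabeled.
manual = ["a", "b"] * 4 + ["a", "a"]
clusters = ["A", "B"] * 4 + ["AB", "AB"]

homogeneity, completeness, v = v_measure(manual, clusters)
```

      In this toy case every cluster contains a single manual label, so homogeneity stays at 1, while completeness (and therefore the v-measure) drops because ‘a’ is spread over two clusters.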

      For examples of these types of errors, we encourage this reviewer and readers to refer to the example confusion matrices in figure 2f and Figure 2–figure supplement 3b&e. We also added two paragraphs to the end of the ‘Accurate, fully unsupervised syllable labeling’ section of the Results in the revised manuscript. 

      (10) Usage of Different Dimensionality Reduction Methods: The pipeline uses two different dimensionality reduction techniques for labeling and similarity comparison - both based on the understanding of the distribution of data in lower-dimensional spaces. However, the reasons for choosing different methods for different tasks are not articulated, nor is there a comparison of their efficacy.  

      We apologize for not making this distinction sufficiently clear in the manuscript and have added a paragraph to the ‘Measuring Song Imitation’ section of the Results explaining the rationale for using an embedding model for similarity scoring.

      We chose to use UMAP for syllable labeling because it is a common embedding methodology to precede hierarchical clustering and has been shown to result in reliable syllable labels for birdsong in the past (Sainburg et al. 2020). However, it is not appropriate for similarity scoring, because comparing EMD or MMD scores between birds requires that all the birds’ syllable distributions exist within the same shared embedding space. This can be achieved by using the same triplet loss-trained neural network model to embed syllables from all birds. This cannot be achieved with UMAP because all birds whose scores are being compared would need to be embedded in the same UMAP space, as distances between points cannot be compared across UMAPs. In practice, this would mean that every time a new tutor-pupil pair needs to be scored, their syllables would need to be added to a matrix with all previously compared birds’ syllables, a new UMAP would need to be computed, and new EMD or MMD scores between all bird pairs would need to be calculated using their new UMAP embeddings. This is very computationally expensive and quickly becomes unfeasible without dedicated high power computing infrastructure. It also means that similarity scores couldn’t be compared across papers without recomputing everything each time, whereas EMD and MMD scores obtained with triplet loss embeddings can be compared, provided they use the same trained model (which we provide as part of AVN) to embed their syllables in a common latent space. 

      (11) Reproducibility: are the measurements reproducible? Systems like UMAP always find a new embedding given some fixed input, so the output tends to fluctuate.

      There is indeed a stochastic element to UMAP embeddings, which will result in different embeddings and therefore different syllable labels across repeated runs with the same input. We observed that v-measure scores were quite consistent within birds across repeated runs of the UMAP, and have added an additional supplementary figure to the revised manuscript showing this (Figure 2–figure supplement 4).

      Reviewer #1 (Recommendations For The Authors):

      (1) Benchmark their similarity score to the method used by Goffinet et al, 2021 from the Pearson group. Such a comparison would be really interesting and useful.  

      This has been added to the paper. 

      (2) Please clarify exactly what is new and what is applied from existing methods to help the reader see the novelty of the paper.  

      We have added more emphasis on the novel aspects of our pipeline to the paper’s introduction. 

      Minor:

      It's unclear if AVN is appropriate as the paper deals only with zebra finch song - the scope is more limited than advertised.

      We assume this is in reference to ‘Birdsong’ in the paper’s title and ‘Avian’ in Avian Vocalization Network. There is a brief discussion of how these methods are likely to perform on other commonly studied songbird species at the end of the discussion section.

      Reviewer #2 (Recommendations For The Authors):

      A few points for the authors to consider that might strengthen or inform the paper:

      (1) In the public review, I detailed some ways in which the SSL+EMD approach is unlikely to be appreciably distinct from the VAE+MMD approach -- in fact, one could mix and match here. It would strengthen the authors' claim if they showed via experiments that their method outperforms VAE+MMD, but in the absence of that, a discussion of the relation between the two is probably warranted.  

      This comparison has been added to the paper.

      (2) ll. 305-310: This loss of accuracy near the edge is expected on general Bayesian grounds. Any regression approach should learn to estimate the conditional mean of the age distribution given the data, so ages estimated from data will be pulled inward toward the location of most training data. This bias is somewhat mitigated in the Brudner paper by a more flexible model, but it's a general (and expected) feature of the approach.

      (3) While the online AVA documentation looks good, it might benefit from a page on design philosophy that lays out how the various modules fit together - something between the tutorials and the nitty-gritty API. That way, users would be able to get a sense of where they should look if they want to harness pieces of functionality beyond the tutorials.

      Thank you for this suggestion. We will add a page on AVN’s design philosophy to the online documentation. 

      (4) While the manuscript does compare AVN to packages like TweetyNet and AVA that share some functionality, it doesn't really mention what's been going on with the vocalpy ecosystem, where the maintainers have been doing a lot to standardize data processing, integrate tools, etc. I would suggest a few words about how AVN might integrate with these efforts.

      We thank the reviewer for this suggestion.

      (5) ll. 333-336: It would be helpful to provide a citation to some of the self-supervised learning literature this procedure is based on. Some citations are provided in methods, but the general approach is worth citing, in my opinion. 

      We have added a paragraph to the results section with more background on self-supervised learning for dimensionality reduction, particularly in the context of similarity scoring.

      (6) One software concern for medium-term maintenance: AVN docs say to use Python 3.8, and GitHub says the package is 3.9 compatible. I also saw in the toml file that 3.10 and above are not supported. It's worth noting that Python 3.9 reaches its end of life in October 2025, so some dependencies may have to be altered or changed for the package to be viable going forward.  

      Thank you for this comment. We will continue to maintain AVN and update its dependencies as needed.

      Minor points:

      (1) It might be good to note that WhisperSeg is a different install from AVN. May be hard for novice users, though there's a web interface that's available. 

      We’ve added a line to the methods section making this clear. 

      (2) Figure 6b: Some text in the y-axis labels is overlapping here. 

      This has been fixed. Thank you for bringing it to our attention. 

      (3) The name of the Python language is always capitalized.  

      We’ve fixed this capitalization error throughout the manuscript. Thank you.

      Reviewer #3 (Recommendations For The Authors):

      (1) I recommend that the authors improve the motivation of the chosen tasks and data or choose new tasks that more clearly speak to the optimizations they want to perform. 

      We have included more details about the motivation for our LDA classification analysis, age prediction model and embedding model for similarity scoring in the results of the revised manuscript, as discussed in more detail in the above responses to this reviewer. Thank you for these suggestions. 

      (2) They need to rigorously report the (classification) scores on the test datasets: these are the scores associated with the cost function used during training.  

      Based on this reviewer’s ‘Weaknesses: 3’ comment in the public reviews, we believe that they are referring to a classification score for the triplet loss model. As we explained in response to that comment, this is not a classification task; therefore, there is no classification score to report. The loss function used to train the model was a triplet loss function. While we could report these values, they are not informative for how well this approach would perform in a similarity scoring context, as explained above. As such, we prefer to include contrast index and tutor contrast index scores to compare the models’ performance for similarity scoring, as these are directly relevant to the task and are established in the field for said task.

      (3) They need to explain the reasons for the poor performance (or report on the inconsistencies with previous work) and why they prefer a fully automated system rather than one that needs some fine-tuning on bird-specific data.

      We’ve addressed this comment in the public response to this reviewer’s weakness points 3, 5, and 6. 

      (4) They should consider applying their method to data from Japanese and European labs.  

      We’ve addressed this comment in the public response to this reviewer’s weakness point 4.

      (5) They need to document the failure modes and report all details about the human annotations.

      We’ve added additional description of the failure modes for our segmentation and labeling approaches in the results section of the revised manuscript.

      Details: 

      The introduction is very vague, it fails to make a clear case of what the problem is and what the approach is. It reads a bit like an advertisement for machine learning: we are given a hammer and are looking for a nail.  

      We thank the reviewer for this viewpoint; however, we disagree and have decided to keep our Introduction largely unchanged. 

      L46 That interpretability is needed to maximize the benefits of machine learning is wrong, see self-driving cars and chat GPT.  

      This line states that ‘To truly maximize the benefits of machine learning and deep learning methods for behavior analysis, their power must be balanced with interpretability and generalizability’. We firmly believe that interpretability is critically important when using machine learning tools to gain a deeper scientific understanding of data, including animal behavior data in a neuroscience context. We believe that the introduction and discussion of this paper already provide strong evidence for this claim. 

      L64 What about zebra finches that repeat a syllable in the motif, how are repetitions dealt with by AVN?  

      This is already described in the results section in lines 222-226, and in the methods in the ‘Syntax Features: Repetition Bouts’ section.

      L107 Say a bit more here, what exactly has been annotated?  

      We’ve added a sentence in the introduction to clarify this. Line 113-115. 

      L112 Define spectrogram frames. Do these always fully or sometimes partially contain a vocalization? 

      Spectrogram frames are individual time bins used to compute the spectrogram using a short-term Fourier transform. As described in the ‘Methods: Labeling: UMAP Dimensionality Reduction’ section, our spectrograms are computed using ‘The short term Fourier transform of the normalized audio for each syllable […] with a window length of 512 samples and a hop length of 128 samples’. Given that the song files have a standard sampling rate of 44.1 kHz, this means each time bin represents 11.6 ms of song data, with successive frames advancing in time by 2.9 ms. Each frame therefore contains only a small fraction of a vocalization.
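      The frame durations quoted above follow directly from the window length, hop length and sampling rate. A minimal sketch of the arithmetic, with variable and function names chosen here for illustration (they are not part of AVN):

```python
SAMPLE_RATE_HZ = 44_100  # sampling rate stated in the response
WINDOW_SAMPLES = 512     # STFT window length in samples
HOP_SAMPLES = 128        # STFT hop length in samples


def samples_to_ms(n_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a number of audio samples to a duration in milliseconds."""
    return 1000.0 * n_samples / sample_rate_hz


frame_ms = samples_to_ms(WINDOW_SAMPLES)      # duration covered by one frame
hop_ms = samples_to_ms(HOP_SAMPLES)           # time step between frame starts
overlap = 1 - HOP_SAMPLES / WINDOW_SAMPLES    # fraction shared by adjacent frames

print(f"frame: {frame_ms:.1f} ms, hop: {hop_ms:.1f} ms, overlap: {overlap:.0%}")
```

      With these settings, adjacent frames share three quarters of their samples, so a single syllable spans many overlapping frames.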

      L122 The reported TweetyNet score of 0.824 is lower than the one reported in Figure 2a.  

      The center line in the box plot in Figure 2a represents the median of the distribution of TweetyNet v-measure scores. Given that there are a couple of outlying birds with very low scores, the mean (0.824 as reported in the text of the results section) is lower than the median. This is not an error.
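      The gap between mean and median is easy to reproduce with made-up numbers. The scores below are hypothetical and are not the per-bird values from the paper; they only show how two low outliers pull the mean below the median.

```python
from statistics import mean, median

# Hypothetical per-bird v-measure scores (NOT the paper's data):
# six birds score well, two outlying birds score poorly.
scores = [0.90, 0.91, 0.89, 0.92, 0.88, 0.90, 0.45, 0.50]

mean_score = mean(scores)      # dragged down by the two outliers
median_score = median(scores)  # largely insensitive to the outliers

print(f"mean = {mean_score:.3f}, median = {median_score:.3f}")
```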

      L155 Some of the differences in performance are very small, reporting of the P value might be necessary. 

      These methods are unlikely to statistically significantly differ in their validation scores. This doesn’t mean that we cannot use the mean/median values reported to justify favoring one method over another. This is why we’ve chosen not to report p-values here.

      L161 The authors have not really tested more than a single clustering method, failing to show a serious attempt to achieve good performance.  

      We’ve addressed this comment in the public response to this reviewer’s weakness point 2.

      L186 Did isolate birds produce stereotyped syllables that can be clustered? 

      Yes, they did. The validation for clustering of isolate bird songs can be found in Figure 2–figure supplement 4. 

      Fig. 3e: How were the multiple bouts aligned?

      This is described in lines 857-876 in the ‘Methods: Song Timing Features: Rhythm Spectrograms’ section of the paper.

      L199 There is a space missing in front of (n=8).  

      Thank you for bringing this to our attention. It’s been corrected in the updated manuscript. 

      L268 Define classification accuracy.  

      We’ve added a sentence in lines 953-954 of the methods section defining classification accuracy. 

      L325 How many motifs need to be identified, why does this need to be done manually? There are semiautomated methods that can allow scaling; these should be cited here. Also, the mention of bias here should be removed in favor of a more extensive discussion on the experimenter bias (traditionally) vs Texas bias (in this paper).

      All of the methods cited in this line have graphical user interfaces that require users to select a file containing song and manually highlight the start and end of each motif to be compared. The exact number of motifs required varies depending on the specific context (e.g. more examples are needed to detect more subtle differences or changes in song similarity) but it is fairly standard for reviewers to score 30 – 100 pairs of motifs.

      We’ve discussed the tradeoffs between full automation and supervised or human-in-the-loop methods in response to this reviewer’s public comment ‘weakness #5 and 6’. Briefly, AVN’s aim is to standardize song analysis, to allow direct comparisons between song features and similarity scores across research groups. We believe, as explained in the paper, that this can be best achieved by having different research groups use the same deep learning models, which perform consistently well across those groups. Introducing semi-automated methods would defeat this benefit of AVN.

      We’ve also addressed the question of ‘Texas bias’ in response to this reviewer’s public comment ‘Weakness #4’.

      L340 How is EMD applied? Syllables are points in 8-dim space, but now suddenly authors talk about distributions without explaining how they got from points to distributions. Same in L925.  

      We apologize for the confusion here. The syllable points in the 8-d space are collectively an empirical distribution, not a probability distribution. We referred to them simply as ‘distributions’ to limit technical jargon in the results of the paper, but have changed this to more precise language in the revised manuscript.
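      To make the step from points to an empirical distribution concrete, here is a minimal sketch of the EMD between two small point sets, assuming both sets have the same number of points and every point carries equal weight. Under that assumption the optimal transport plan is a one-to-one matching, which the sketch finds by brute force. The function name, the 2-D toy points (the paper uses 8 dimensions) and the brute-force search are illustrative choices, not AVN’s implementation, which would need a general optimal transport solver for realistically sized, unequal sets.

```python
from itertools import permutations
from math import dist


def emd_equal_size(points_a, points_b):
    """EMD between two equal-size point sets with uniform weights.

    Each set is treated as an empirical distribution placing mass 1/n on
    every point. In this special case the optimal transport plan is a
    one-to-one matching, found here by exhaustive search (tiny sets only).
    """
    if len(points_a) != len(points_b):
        raise ValueError("this sketch assumes equally sized point sets")
    n = len(points_a)
    best_total = min(
        sum(dist(points_a[i], points_b[j]) for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    return best_total / n


# Hypothetical 2-D syllable embeddings for two birds.
pupil = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
tutor = [(3.0, 4.0), (4.0, 4.0), (3.0, 5.0)]  # pupil shifted by (3, 4)

emd_same = emd_equal_size(pupil, pupil)
emd_shifted = emd_equal_size(pupil, tutor)
```

      Because the second set is the first one translated by a vector of length 5, the cheapest matching moves every point by exactly that distance.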

      L351 Why do authors now use 'contrast index' to measure performance and no longer 'classification accuracy'?  

      We’ve addressed this comment in the public response to this reviewer’s weakness points 1 and 2.

      Figure 6 What is the confusion matrix, i.e. how well can the model identify pupil-pupil pairings from pupil-tutor and from pupil-unrelated pairings? I guess that would amount to something like classification accuracy.

      There is no model classifying comparisons as pupil-pupil vs. pupil-tutor etc. These comparisons exist only to show the behavior of the similarity scoring approach, which consists of a dissimilarity measure (MMD or EMD) applied to low-dimensional representations of syllables generated by the triplet loss model or VAE. This was clarified further in our public response to this reviewer’s weakness points 1 and 2.
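      For readers unfamiliar with MMD as a dissimilarity measure between two sets of embedded syllables, the following sketch computes the biased estimate of squared MMD. The Gaussian (RBF) kernel, the bandwidth `sigma=1.0`, the function names and the toy 2-D points are all assumptions made for illustration; the kernel and settings used in the paper may differ.

```python
from math import exp


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-sq_dist / (2 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between two point sets."""
    k_xx = sum(rbf_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2 * k_xy


# Hypothetical low-dimensional syllable embeddings (2-D for readability).
pupil = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
similar_bird = [(0.1, 0.0), (1.1, 0.1), (0.0, 0.9)]
unrelated_bird = [(3.0, 3.0), (4.0, 3.0), (3.0, 4.0)]

mmd_self = mmd_squared(pupil, pupil)
mmd_similar = mmd_squared(pupil, similar_bird)
mmd_unrelated = mmd_squared(pupil, unrelated_bird)
```

      The output is a dissimilarity score, not a class label: a larger value indicates that the two sets of syllables occupy more different regions of the shared embedding space.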

      L487 What are 'song files', and what do they contain?   

      ‘Song files’ are .wav files containing recordings of zebra finch song. They typically contain a single song bout, but they can include multiple song bouts if they are produced close together, or incomplete song bouts if the introductory notes were very soft or the bouts were very long (>30s from the start of the file). Details of these recordings are provided in the ‘Methods: Data Acquisition: UTSW Dataset’ section of the manuscript.

      L497 Calls were only labelled for tweetynet but not for other tasks.  

      That is correct. The rationale for this is provided in the ‘Methods: Manual Song Annotation’ section of the manuscript. 

      L637 There is a contradiction (can something be assigned to the 'own manual annotation category' when the same sentence states that this is done 'without manual annotation'?) 

      We believe there is confusion here between automated annotation and validation. Any bird can be automatically annotated without the need for any existing manual annotations for that individual bird. However, manual labels are required to compare automatically generated annotations against for validation of the method.

      L970 Spectrograms of what? (what is the beginning of a song bout, L972).

      The beginning of a song bout is the first introductory note produced by a bird after a period without vocalizations. This is standard.

    1. Reviewer #3 (Public review):

      Summary:

      The aim of this study was to investigate the temporal progression of the neural response to event boundaries in relation to uncertainty and error. Specifically, the authors asked (1) how neural activity changes before and after event boundaries, (2) if uncertainty and error both contribute to explaining the occurrence of event boundaries, and (3) if uncertainty and error have unique contributions to explaining the temporal progression of neural activity.

      Strengths:

      One strength of this paper is that it builds on an already validated computational model. It relies on straightforward and interpretable analysis techniques to answer the main question, with a smart combination of pattern similarity metrics and FIR. This combination of methods may also be an inspiration to other researchers in the field working on similar questions. The paper is well written and easy to follow. The paper convincingly shows that (1) there is a temporal progression of neural activity change before and after an event boundary, and (2) event boundaries are predicted best by the combination of uncertainty and error signals.

      Weaknesses:

      Regarding question 3, I am less convinced by the results. They show that overlapping but somewhat distinct sets of brain regions relate to uncertainty and error boundaries over time. And that some regions show distinct patterns of temporal progressions in pattern change with both types of boundaries. However, most of the effects they observe in this analysis may still be driven by shared variance, as suggested by the results in Figure 6 and the high correlation between the two boundary time series. More specific comments are provided below.

      Impact:

      If these comments can be addressed sufficiently, I expect that this work will impact the field in its thinking on what drives event boundaries and spur interest in understanding the mechanisms behind the temporal progression of neural activity around these boundaries.

      Comments

      (1) The current analysis of the neural data does not convincingly show that uncertainty and prediction error both contribute to the neural responses. As both terms are modelled in separate FIR models, it may be that the responses we see for both are mostly driven by shared variance. Given that the correlation between the two is very high (r=0.49), this seems likely. The strong overlap in the neural responses elicited by both, as shown in Figure 6, also suggests that what we see may mainly be shared variance. To improve the interpretability of these effects, I think it is essential to know whether uncertainty and error explain similar or unique parts of the variance. The observation that they have distinct temporal profiles is suggestive of some dissociation, but not as convincing as adding them both to a single model.

      (2) The results for uncertainty and error show that uncertainty has strong effects before or at boundary onset, while error is related to more stabilization after boundary onset. This makes me wonder about the temporal contribution of each of these. Could it be the case that increases in uncertainty are early indicators of a boundary, and errors tend to occur later?

      (3) Given that there is a 24-second period during which the neural responses are shaped by event boundaries, it would be important to know more about the average distance between boundaries and the variability of this distance. This will help establish whether the FIR model can properly capture a return to baseline.

      (4) Given that there is an early onset and long-lasting response of the brain to these event boundaries, I wonder what causes this. Is it the case that uncertainty or errors already increase at 12 seconds before the boundaries occur? Or are there other markers in the movie that the brain can use to foreshadow an event boundary? And if uncertainty or errors do increase already 12 seconds before an event boundary, do you see a similar neural response at moments with similar levels of error or uncertainty, which are not followed by a boundary? This would reveal whether the neural activity patterns are specific to event boundaries or whether these are general markers of error and uncertainty.

      (5) It is known that different brain regions have different delays of their BOLD response. Could these delays contribute to the propagation of the neural activity across different brain areas in this study?

      (6) In the FIR plots, timepoints -12, 0, and 12 are shown. These long intervals preclude an understanding of the full temporal progression of these effects.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This study investigates how collective navigation improvements arise in homing pigeons. Building on the Sasaki & Biro (2017) experiment on homing pigeons, the authors use simulations to test seven candidate social learning strategies of varying cognitive complexity, ranging from simple route averaging to potentially cognitively demanding selective propagation of superior routes. They show that only the simplest strategy, equal route averaging, quantitatively matches the experimental data in both route efficiency and social weighting. More complex strategies, while potentially more effective, fail to align with the observed data. The authors also introduce the concept of "effective group size," showing that the chaining design leads to a strong dilution of earlier individuals' contributions. Overall, they conclude that cognitive simplicity rather than cumulative cultural evolution explains collective route improvements in pigeons.

      Strengths:

      The manuscript addresses an important question and provides a compelling argument that a simpler hypothesis is necessary and sufficient to explain findings of a recent influential study on pigeon route improvements, via a rigorous systematic comparison of seven alternative hypotheses. The authors should be commended for their willingness to critically re-examine established interpretations. The introduction and discussion are broad and link pigeon navigation to general debates on social learning, wisdom of crowds, and CCE.

      We thank the reviewer for their positive comments.

      Weaknesses:

      The lack of availability of codes and data for this manuscript, especially given that it critically examines and proposes alternative hypotheses for an important published work.

      We thank the reviewer for their comment. The code and data for our manuscript are an important aspect of the study, and we had intended to make them publicly available upon publication. The link to our code and data on figshare can be found here: (https://doi.org/10.6084/m9.figshare.28950032.v1). We will further add this link to the Data Availability Statement of our revised version.  

      Reviewer #2 (Public review):

      Summary:

      The manuscript investigates which social navigation mechanisms, with different cognitive demands, can explain experimental data collected from homing pigeons. Interestingly, the results indicate that the simplest strategy - route averaging - aligns best with the experimental data, while the most demanding strategy - selectively propagating the best route - offers no advantage. Further, the results suggest that a mixed strategy of weighted averaging may provide significant improvements.

      The manuscript addresses the important problem of identifying possible mechanisms that could explain observed animal behavior by systematically comparing different candidate models. A core aspect of the study is the calculation of collective routes from individual bird routes using different models that were hypothesized to be employed by the animals, but which differ in their cognitive demands.

      The manuscript is well-written, with high-quality figures supporting both the description of the approach taken and the presentation of results. The results should be of interest to a broad community of researchers investigating (collective) animal behavior, ranging from experiment to theory. The general approach and mathematical methods appear reasonable and show no obvious flaws. The statistical methods also appear.

      Strengths:

      The main strength of the manuscript is the systematic comparison of different meta-mechanisms for social navigation by modeling social trajectories from solitary trajectories and directly comparing them with experimental results on social navigation. The results show that the experimentally observed behavior could, in principle, arise from simple route averaging without the need to identify "knowledgeable" individuals. Another strength of the work is the establishment of a connection between social navigation behavior and the broader literature on the wisdom of crowds through the concept of effective group size.

      We thank the reviewer for their positive comments.

      Weaknesses:

      However, there are two main weaknesses that should be addressed:

      (1) The first concerns the definition of "mechanism" as used by the authors, for example, when writing "navigation mechanism." Intuitively, one might assume that what is meant is a behavioral mechanism in the sense of how behavior is generated as a dynamic process. However, here it is used at a more abstract (meta) level, referring to high-level categories such as "averaging" versus "leader-follower" dynamics. It is not used in the sense of how an individual makes decisions while moving, where the actual route followed in a social context emerges from individuals navigating while simultaneously interacting with conspecifics in space and time. In the presented work, the approach is to directly combine (global) route data of solitary birds according to the considered "meta-mechanisms" to generate social trajectories. Of course, this is not how pigeon social navigation actually works: they do not sit together before the flight and say, "This is my route, this is your route, let's combine them in this way." A mechanistic modeling approach would instead be some form of agent-based model that describes how agents move and interact in space and time. Such a "bottom-up" approach, however, has its drawbacks, including many unknown parameters and often strongly simplifying (implicit) assumptions. I do not expect the authors to conduct agent-based modeling, but at the very least, they should clearly discuss what they mean by "mechanism" and clarify that while their approach has advantages, such as naturally accounting for the statistical features of solitary routes and allowing a direct comparison of different meta-mechanisms, it is also limited, as it does not address how behavior is actually generated. For example, the approach lacks any explicit modeling of errors, uncertainty, or stochasticity more broadly (e.g., due to environmental influences). Thus, while the presented study yields some interesting results, it can only be considered an intermediate step toward understanding actual behavioral mechanisms.

      We thank the reviewer for their comment and thoughtful suggestions. We agree that the inherent behavioral mechanisms and the biological basis of these mechanisms cannot be determined from the navigational data alone. For instance, it remains unexplored whether pigeons adapt their behavior based only on social cues from their partners or also use other navigational features such as landmarks or roads, the location of the sun, geomagnetic cues or previously learnt routes. However, we do agree (as also pointed out by the reviewer) that these behavioral rules generate an emergent ‘meta-mechanism’ where the bird pairs behave as if their preferred routes are averaged during a flight. It will be important in future work to explore the biological basis of these mechanisms, but our current approach allows us to describe the mechanisms with confidence only in a meta sense. Considering this, we believe that our analysis is a more top-down approach towards describing the outcomes of these underlying mechanisms in an abstract sense. We would also like to point the reviewer to Dalmaijer, 2024 [1], who used a bottom-up approach with naive agents and showed that cumulative route improvements emerged in the absence of any sophisticated communication in the same dataset, in agreement with our approach. Considering these points, we will make changes in our revised version to clearly elaborate on what the definition of ‘mechanism’ should include, in line with the reviewer’s feedback.

      (2) While the presented study raises important questions about the applicability and viability of cumulative cultural evolution (CCE) in explaining certain animal behaviors such as social navigation, I find that it falls short in discussing them. What are the implications regarding the applicability of CCE to animal data and to previously claimed experimental evidence for CCE? Should these experiments be re-analyzed or critically reassessed? If not, why? What are good examples from animal behavior where CCE should not be doubted? Furthermore, what about the cited definitions and criteria of CCE? Are they potentially too restrictive? Should they be revised-and if so, how? Conversely, if the definitions become too general, is CCE still a useful concept for studying certain classes of animal behavior? I think these are some of the very important questions that could be addressed or at least raised in the discussion to initiate a broader debate within the community.

      We thank the reviewer for their comments and interesting questions regarding our study. We agree with the reviewer that our study opens up new avenues for critically analysing the criteria previous studies have used for providing evidence of CCE in non-human animals. In our literature review, we found that the field has usually thought about CCE in a ‘process’-focused manner (Reindl et al. [2]), in which individuals are able to compare strategies and select the ones resulting in higher individual fitness. This preferential selection of strategies (termed innovations) allows for the stereotypical ratcheting effect seen in CCE. In our study, we propose that in the case of homing pigeons, the ratcheting effect is more of a statistical outcome than a result of deliberate individual judgement. We believe that this strategy is also amenable to certain task types (which in our study was homing route choice) and may change for others (for example, solving a puzzle box). The task also needs to be sufficiently complex for animals to benefit from the use of social information (Caldwell and Millen 2008 [3]). Thus, we recommend that future work address which classes of problems fit well within the definition of “emergent” CCE and which ones do not. Keeping this framework in mind, studies should clearly state what definition of CCE they are using, and their underlying task type and cognitive mechanisms should be critically evaluated before they are deemed CCE. Considering these points, we will expand our discussion to highlight these key questions, which could be critical for future research.

      References:

      (1) Dalmaijer ES (2024) Cumulative route improvements spontaneously emerge in artificial navigators even in the absence of sophisticated communication or thought. PLoS Biol. 22:e3002644.

      (2) Reindl, E., Gwilliams, A.L., Dean, L.G. et al. (2020) Skills and motivations underlying children’s cumulative cultural learning: case not closed. Palgrave Commun 6, 106.

      (3) Caldwell CA, Millen AE (2008) Studying cumulative cultural evolution in the laboratory. Phil. Trans. R. Soc. B 363:3529-3539.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript by Lopez-Blanch and colleagues, 21 microexons are selected for a deep analysis of their impacts on behavior, development, and gene expression. The authors begin with a systematic analysis of microexon inclusion and conservation in zebrafish and use these data to select 21 microexons for further study. The behavioral, transcriptomic, and morphological data presented are for the most part convincing. Furthermore, the discussion of the potential explanations for the subtle impacts of individual microexon deletions versus loss-of-function in srrm3 and/or srrm4 is quite comprehensive and thoughtful. One major weakness: data presentation, methods, and jargon at times affect readability / might lead to overstated conclusions. However, overall this manuscript is well-written, easy to follow, and the results are of broad interest.

      We thank the Reviewer for their positive comments on our manuscript. In the revised version, we will try to improve readability, reduce jargon and avoid overstatements.  

      Strengths:

      (1) The study uses a wide variety of techniques to assess the impacts of microexon deletion, ranging from assays of protein function to regulation of behavior and development.

      (2) The authors provide comprehensive analyses of the molecular impact of their microexon deletions, including examining how host-gene and paralog expression is affected.

      Weaknesses:

      Major Points:

      (1) According to the methods, it seems that srrm3 social behavior is tested by pairing a 3mpf srrm3 mutant with a 30dpf srrm3 het. Is this correct? The methods seem to indicate that this decision was made to account for a slower growth rate of homozygous srrm3 mutant fish. However, the difference in age is potentially a major confound that could impact the way that srrm3 mutants interact with hets and the way that srrm3 mutants interact with one another (lower spread for the ratio of neighbour in front value, higher distance to neighbour value). This reviewer suggests testing het-het behavior at 3 months to provide age-matched comparisons for del-del, testing age-matched rather than size-matched het-del behavior, and also suggests mentioning this in the main text / within the figure itself so that readers are aware of the potential confound.

      Thank you for bringing up this point. For the tests shown in Figure 5, we indeed decided to match the pairs involving srrm3 mutant fish by fish size since we reasoned this would be more comparable to the other lines, both biologically and methodologically (in terms of video tracking, etc.). However, we are confident the results would be very similar if matched by age, since the differences in social interactions between the srrm3 homozygous mutants and their control siblings are very dramatic at any age. As an example, this can be appreciated, in line with the Reviewer's suggestion, in Videos S2 and S3, which show groups of five 5 mpf fish that are either srrm3 mutant or wild type. It can be observed that the behavior of 5 mpf WT fish (Video S3) is very similar to that of 1 mpf WT fish pairs, with very small interindividual distances, while the difference with respect to the srrm3 mutant group (Video S2) is dramatic. We nonetheless agree that this decision on the experimental design should be clearly stated in the main text and figure legend and we have done so in the revised version.

      (2) Referring to srrm3+/+; srrm4-/- controls for double mutant behavior as "WT for simplicity" is somewhat misleading. Why do the authors not refer to these as srrm4 single mutants?

      This comment applies to Figure 4 as well as the associated figure supplements. We reasoned that this made the understanding of plots easier, but the Reviewer is correct that it can be misleading. As a middle ground, we have now changed Figure 4 to follow the nomenclature of Figure 3D (WD, HD, DD), which is further explained in the legend, but kept the original format in the figure supplements for consistency with the (many) other plots in those figures.

      (3) It's not completely clear how "neurally regulated" microexons are defined / how they are different from "neural microexons"? Are these terms interchangeable?

      Yes, they are interchangeable. We have now double checked the wording to avoid confusion and for consistency.

      (4) Overexpression experiments driving srrm3 / srrm4 in HEK293 cells are not described in the methods.

      We apologize for this omission. We now describe the data and associated methods in more detail in the revised version; however, please note that the data were obtained from a previous publication (Torres-Mendez et al., 2019), where the detailed methodology is reported.

      (5) Suggest including more information on how neurite length was calculated. In representative images, it appears difficult to determine which neurites arise from which soma, as they cross extensively. How was this addressed in the quantification?

      We have added further details to the revised version. With regard to the specific question, we would like to mention that this has not been a very common issue for the time points used in the manuscript (10 hap and 24 hap). At those stages, it was nearly always evident how to track each individual neurite. Dubious cases were simply ignored and not measured, as we aimed for 100 neurites per well. Of course, such complex cases become much more common at later time points (48 and 72 hap), which were not used in this study.

      Reviewer #2 (Public review):

      Summary:

      This manuscript explores in zebrafish the impact of genetic manipulation of individual microexons and two regulators of microexon inclusion (Srrm3 and Srrm4). The authors compare molecular, anatomical, and behavioral phenotypes in larvae and juvenile fish. The authors test the hypothesis that phenotypes resulting from Srrm3 and 4 mutations might in part be attributable to individual microexon deletions in target genes.

      The authors uncover substantial alterations in in vitro neurite growth, locomotion, and social behavior in Srrm mutants but not any of the individual microexon deletion mutants. The individual mutations are accompanied by broader transcript level changes which may resemble compensatory changes. Ultimately, the authors conclude that the severe Srrm3/4 phenotypes result from additive and/or synergistic effects due to the de-regulation of multiple microexons.

      Strengths:

      The work is carefully planned, well-described, and beautifully displayed in clear, intuitive figures. The overall scope is extensive with a large number of individual mutant strains examined. The analysis bridges from molecular to anatomical and behavioral read-outs. Analysis appears rigorous and most conclusions are well-supported by the data.

      Overall, addressing the function of microexons in an in vivo system is an important and timely question.

      Weaknesses:

      The main weakness of the work is the interpretation of the social behavior phenotypes in the Srrm mutants. It is difficult to conclude that the mutations indeed impact social behavior rather than sensory processing and/or vision which precipitates apparent social alterations as a secondary consequence. Interpreting the phenotypes as "autism-like" is not supported by the data presented.

      The Reviewer is absolutely right. It was not our intention to imply that these social defects should be interpreted simply as autistic-like. It is indeed very likely that the main reason for the social alterations displayed by the srrm3 mutants is their impaired vision. We have now added this discussion point explicitly in the revised version. 

      Reviewer #3 (Public review):

      Summary:

      Microexons are highly conserved alternative splice variants, the individual functions of which have thus far remained mostly elusive. The inclusion of microexons in mature mRNAs increases during development, specifically in neural tissues, and is regulated by SRRM proteins. Investigation of individual microexon function is a vital avenue of research since microexon inclusion is disrupted in diseases like autism. This study provides one of the first rigorous screens (using zebrafish larvae) of the functions of individual microexons in neurodevelopment and behavioural control. The authors precisely excise 21 microexons from the genome of zebrafish using CRISPR-Cas9 and assay the downstream impacts on neurite outgrowth, larvae motility, and sociality. A small number of mild phenotypes were observed, which contrasts with the more dramatic phenotypes observed when microexon master regulators SRRM3/4 are disrupted. Importantly, this study attempts to address the reasons why mild/few phenotypes are observed and identify transcriptomic changes in microexon mutants that suggest potential compensatory gene regulatory mechanisms.

      Strengths:

      (1) The manuscript is well written with excellent presentation of the data in the figures.

      (2) The experimental design is rigorous and explained in sufficient detail.

      (3) The identification of a potential microexon compensatory mechanism by transcriptional alterations represents a valued attempt to begin to explain complex genetic interactions.

      (4) Overall this is a study with a robust experimental design that addresses a gap in knowledge of the role of microexons in neurodevelopment.

      Thank you very much for your positive comments on our manuscript.

      Reviewer #1 (Recommendations for the authors):

      Minor Suggestions

      (1) Axes are often scaled differently even between panels in the same figure. For example in Figure 5 - supplement 10, the srrm3_17 y axis scales from 0-20, while the neighboring panels scale from ~1-2.5. This somewhat underrepresents the finding that srrm3 mutants have much larger inter-individual distances. Similarly, in the panel above (src_1), the y-axis is scaled to include a single point around 17cm. As a result, it appears at first glance that the src_1 trials resulted in much lower inter-individual distance. Suggest scaling all of these the same to improve readability.

      While the Reviewer is certainly correct, after careful consideration we decided to keep autoscaled axes, prioritizing within-plot visualization (i.e. among genotypes within an experiment) over across-plot comparison (i.e. among experiments and lines).

      (2) Attention to italicizing gene names.

      Thanks.

      (3) In many points in the methods, we are instructed to "see below." Suggest directing the reader to a particular section heading.

      We found only one such instance, and we directed the reader to the specific section, as suggested.

      (4) In Methods, remove "in the corpus callosum." This is not an accurate descriptor for the site at which Mauthner axons cross.

      This is absolutely correct, apologies for this mistake.

      Clarify:

      (1) In the results section, "tissue-specific regulation was validated..." - suggest mentioning that this was performed in adult tissues / describe dissection in the methods.

      Added.

      (2) In the results section, the meaning of "no event ortholog" is not clear. Does this mean that a microexon does not have a human homolog? If so, suggest stating more clearly.

      Correct. We have added additional information.

      (3) In the results, the authors state that 78% of microexons are affected by srrm3/4 loss-of-function. Suggest stating the method used here (e.g. RNA-seq in mutants as compared to siblings).

      Added.

      (4) It is not clear what "siblings for the main founders" means, for example in 3D. Is this effectively the analysis of microexon knockouts across multiple independent lines? Are the lines pooled for stats, for example in 3C?

      The main founder corresponds to the line listed as _1 and is the default for experiments in which only one founder is used. We now explicitly state this.

      For 3C, the lines are not pooled for stats; the stats correspond only to the main founder for each line. However, for each main founder line, multiple experiments are usually analyzed together and the stats are done taking their data structure into account (i.e. not simply pooling the values).

      (5) The purpose and a general description of NanoBRET assays should be included in the results.

      We added the main purpose of the NanoBRET assays (testing protein-protein interactions).

      (6) Specify that baseline behavior is analyzed in the light.

      Added.

      (7) In Figure 4A, adult fish are schematized being placed into a 96-well plate. Suggest using the larval diagram as in Figure 6 for accuracy.

      Done.

      (8) In Figure 4, plot titles could be made more accessible, especially in 4 F. Suggest removing extraneous information / italicizing gene names, etc. In G, suggest writing out Baseline, Dark, and Light to make it more accessible. Same in 4B.

      We have implemented some of the suggestions. In particular, italics were not used, since we are referring to the founder line, not the gene.

      (9) Figure 6 legend B - after (barplots), suggest inserting the word "and", to make clear that barplots indicate host gene *and* closely related paralogs are indicated by dots.

      Done.

      (10) In methods: "To better capture all microexons..." This sentence is difficult to understand. Suggested edit: "we excluded *from our calculation?* tissues with known or expected partial overlap... from comparison (for example, ...).

      Done.

      (11) In the methods, "which were defined with similar parameters but -min_rep 2." Suggest spelling this out, e.g. "with similar parameters, but requiring sufficient read coverage in at least n=2 samples per valid tissue group, whereas we only required one.".

      Done.

      (12) RNA was extracted for event and knockout validations. What does event mean here?

      Event refers to the validation of the exon regulatory pattern in WT tissues. We added this information.

      Provide definitions for abbreviations:

      (1) (Figure 6) Delta corrected VST Expression.

      Done.

      (2) "Mic-hosting genes" paralogs.

      Done.

      (3) In Figure 1F, "emic" is not defined.

      Done.

      Misspellings:

      All corrected.

      (1) Figure 6B (percentile is spelled percentil).

      (2) Figure 6B legend (bottom or top decile*).

      (3) Figure 6D - Schizophrenia* genes.

      (4) In Zebrafish husbandry and genotyping: suggest "srrm3 mutants grew more slowly.".

      (5) In results, "reduced body size at 90pdf" > 90dpf.

      Reviewer #2 (Recommendations for the authors):

      (1) Characterization of microexon mutants (Figure 2): The semi-quantitative PCR with flanking primers (Figure 2, supplement 1) is well-suited to assess successful deletion of the exon and enables detection of potential mis-splicing around the alternative segment. However, it does not quantify the impact on total transcript levels. The authors should complement those experiments with qPCR measures of the transcript levels - otherwise, it is difficult to link mutant phenotypes to isoforms (as opposed to alterations in the level of gene expression). This point is somewhat addressed in Figure 6 by the RNA-seq analysis but it might help to add data specifically in Figure 2.

      As the Reviewer says, this point is explicitly addressed in Figure 6, where we show the change in the host gene's expression that follows the removal of some microexons. We prefer to keep this in Figure 6, for consistency, as we believe this is not a direct (regulatory) consequence of the removal, but more likely a compensation effect.

      (2) Social behavior alterations in juvenile fish: The authors report "increased leadership" in Srrm3 mutant fish. However, these fish have impaired vision. Thus, "increased leadership" may simply reflect the fact that they do not perceive their conspecifics and, thus, do not follow them. The heterozygous conspecific will then mostly follow the Srrm3 mutant which appears as the mutant exhibiting an increase in leadership. Figure 5D suggests that Srrm3 del and het fish have the same ratio of "neighbor in front" which would be consistent with the hypothesis that the change in this metric is a consequence of a loss of following behavior due to a loss of vision. The authors should either adjust the discussion of this point or assess with additional experiments whether this is indeed a "social phenotype" or rather a secondary consequence of a loss of vision.

      The Reviewer is absolutely correct, and we have thus modified the short discussion directly related to these patterns.

      (3) The discussion centers on potential reasons why only mild phenotypes are observed in the single microexon mutants. One caveat of the phenotypic analysis provided in the manuscript is that it does not very deeply explore the phenotypic space of neuronal morphologies or circuit function. The behavioral and anatomical read-outs are rather coarse. There are no experiments exploring fine-structure of neuronal projections in vivo or synapse number, morphology, or function. Moreover, no attempts are made to explore which cell types normally express the microexons to potentially focus the loss-of-function analysis to these specific cell types. Of course, such analysis would substantially expand the scope of a study that already covers a large number of mutant alleles. However, the authors may want to add a discussion of these limitations in the manuscript.

      The Reviewer is correct. We aimed at covering this when referring to "(i) we may not be assessing the traits that these microexons are impacting, (ii) we may not have the sensitivity to robustly measure the magnitude of the changes caused by microexon removal". We have now added some of the specific points raised by the Reviewer as examples.

      (4) Note typos in Figure 6D: "schizoFrenia", "WNT signIalling"

      Done.

      Reviewer #3 (Recommendations for the authors):

      I only have a few minor suggestions for the authors.

      (1) It is interesting that a not insignificant number of microexon deletions (3/21) result in cryptic inclusions of intron fragments, and perhaps alludes to an as yet unreported molecular function of microexons in the regulation of host gene expression. Is it possible that microexon inclusion in these 3 genes could be important for expression? I think this requires some further discussion, as (if I'm not mistaken) microexons have thus far only been hypothesised to act as modulators of protein function, not as gene regulatory units.

      While we see that microexon removal can impact expression of the host gene (Figure 6), this is likely a compensatory mechanism (or so we suggest). We do not think these three cases are related to a putative physiological regulation, since the cryptic exons appear only in the deletion line. On the contrary, we think these are "regulatory artifacts" that originate in the non-WT mutated context. That is, we removed the exon but some splicing signals remained in the intron, and these are then recognized by the spliceosome, which incorrectly includes a different piece of the intron.

      (2) The flow of the text accompanying the molecular investigation of microexon function for evi5b and vav in Figure 3 could be improved. The text currently fades out with a speculative explanation for the lack of evi5b interaction phenotype. This final sentence could be moved to the discussion and replaced with a more general summary of the data.

      We have now swapped the order in which these results are described and left out the discussion about evi5b's microexon function.

      (3) Is this a co-submission with Calhoun et al? If so, both papers should reference each other in the discussion and discuss the relative contributions of each.

      Done

      (4) "1 × 104 cells" in methods Nanobret paragraph should be superscript.

      Done

    1. Cyrus conquered Babylon bloodlessly and became a sort of patron of the Jews. This relationship may have enhanced the influence of Cyrus' religion, Zoroastrianism, on the development of Jewish monotheism, as we will discuss shortly. Cyrus also planned and began building infrastructure like the Royal Road.

      Cyrus is such a fascinating leader! He conquered Babylon without bloodshed, supported the Jews, and even started building amazing projects like the Royal Road. It’s wild to think how his actions might have even influenced the development of Jewish monotheism!

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      SUMMARY

      In this study, Fernandes and colleagues addressed the question of the role of micro-RNAs in regulating the coupling between organ growth and developmental timing. Using Drosophila, they identified the conserved micro-RNA miR-184 as a regulator of the developmental transition between juvenile larval stages and metamorphosis. This transition is under the control of the steroid hormone Ecdysone, and has been shown to be modulated in case of abnormal tissue growth to adjust the duration of larval growth in response to developmental perturbations. The relaxin-like hormone Dilp8 has been identified as a key secreted factor involved in this coupling. Here, the authors show that miR-184 is involved in the regulation of Dilp8 expression both in physiological conditions and upon growth perturbation. They propose that this function is carried out in imaginal tissues, where miR-184 levels are modulated by tissue stress. While several factors have already been involved in triggering sharp dilp8 induction at the transcriptional level, this study adds another level of complexity to the regulation of Dilp8 by proposing that its expression is fine-tuned post-transcriptionally through repression by miR-184.

      MAJOR COMMENTS

      Overall, the manuscript is well organized, and the logic of the experimental plan well presented. The results are clear, and I appreciate the quality of the pupariation curves. However, I believe that two main conclusions of the paper are not fully supported by the results presented in the figures: the direct regulation of dilp8 3'UTR by miR-184, and the specificity of this regulation in imaginal discs. Here I develop these two aspects in more detail.

      Comment 1) The strategy of the 3'UTR sensor is not fully optimized. Indeed, in most experiments, qRT-PCR is used to assess dilp8 expression levels, although it reflects both transcriptional and post-transcriptional regulation. Importantly, to show that post-transcriptional regulation is involved in the response to tissue damage, the levels of the 3'UTR sensor should be analyzed in discs expressing RAcs (showing at the same time that the response is cell-autonomous in the discs). The expected upregulation of the sensor should be prevented by simultaneous expression of miR-184. This approach would shed light on the relative contribution of transcriptional versus post-transcriptional regulation of dilp8 in response to growth perturbation.

      Response: We thank the reviewer for this comment. We agree that qRT-PCR does not distinguish between transcriptional and post-transcriptional changes in dilp8 levels in response to changes in miR-184 levels and tissue damage. In addition to the qRT-PCR data, we have looked at the dilp8-3’UTR-GFP reporter in response to overexpression of miR-184 in the wing disc using the patched-Gal4 driver, which shows downregulation of the GFP reporter in the ptc domain (Fig 4C-D’). This suggests that dilp8 mRNA is a direct target of miR-184 by post-transcriptional regulation through its 3’UTR. Further, to confirm the specificity of the effect of miR-184 on dilp8-3’UTR, we generated a dilp8-3’UTR mutant in which the single target site for miR-184 was mutated. We show that the mutated dilp8-3’UTR reporter does not show any regulation in response to miR-184 overexpression in the ptc domain of the wing disc (Fig. 4E, E’, F, F’). This experiment confirms the specificity of the dilp8-3’UTR regulation by miR-184.

      As suggested by the reviewer, we analysed dilp8-3’UTR-GFP reporter expression by overexpressing RicinA using the ptcGAL4 driver in the wing imaginal disc (Fig. S6F-G’). We observed a slight but consistent increase in the dilp8-3’UTR-GFP reporter expression, indicating post-transcriptional regulation of dilp8 expression in response to tissue damage. However, the increase of reporter GFP levels observed in this experiment in response to tissue damage is milder (Fig. S6F-G’) than expected based on the qRT-PCR results (Fig S6A and B). We have added this new data to the manuscript (Fig. S6F-G’).

      We propose the following reasons to explain this result:

      a) dilp8 mRNA is regulated both transcriptionally and post-transcriptionally in response to developmental perturbations

      b) the data on 3’UTR reporter GFP is specifically from the ptc domain expression of RicinA, whereas for dilp8 transcript levels we have expressed RicinA in all larval imaginal tissues, or in the entire wing imaginal disc, which could be one of the reasons for the stronger effect seen on dilp8 mRNA levels

      c) we are not certain if the tubulin-promoter driven dilp8-3’UTR GFP reporter reflects post-transcriptional regulation of dilp8 by miR-184 as efficiently as qRT-PCR. In particular, because the reporter-GFP-3’UTR will be expressed at very high levels due to the tubulin promoter, a majority of this reporter-GFP mRNA may not be relieved from degradation given the moderate suppression of miR-184 in response to RicinA overexpression.

      Thus, our experiments suggest that dilp8 levels are regulated post-transcriptionally by miR-184, which contributes to pupariation delays in response to tissue damage. In support of this, we could rescue pupariation delays and dilp8 induction caused by RicinA expression using overexpression of miR-184 (Figs 5B, C). Thus, we confirm that the effect of post-transcriptional regulation by miR-184 during developmental perturbations also contributes to dilp8 induction and pupariation delays. Unfortunately, due to experimental limitations we could not perform simultaneous expression of RicinA and miR-184 to evaluate the rescue of dilp8-3’UTR-GFP sensor expression. The levels of dilp8-3’UTR sensor GFP are reduced efficiently by miR-184 overexpression (Fig 4D), which prevented us from attempting the rescue of the moderate increase of dilp8-3’UTR GFP levels in response to RicinA.

      Comment 2) In my opinion, the use of a 3'UTR sensor is not sufficient to conclude that the regulation by miR-184 is direct, as miR-184 could also regulate an intermediate factor that acts on dilp8 post-transcriptional regulation. To solve this issue, a common strategy is to generate a 3'UTR sensor with mutated binding sites that should abolish the regulation by miR-184. This mutated 3'UTR might also respond differently to tissue damage, which would strongly support the conclusions of the study.

      Response: We fully agree with the reviewer; this comment is also addressed in our response to comment 1. We have confirmed the specificity of the regulation of the dilp8-3’UTR by miR-184 using a dilp8-3’UTR with mutated target sites (new figures added to the manuscript: Fig. 4E, E’, F, F’). We also tested whether the changes in dilp8 mRNA levels in response to tissue damage are mediated post-transcriptionally by miR-184. We observe a slight but consistent increase in dilp8-3’UTR GFP reporter levels in the ptc domain of the wing disc in response to RicinA expression, suggesting a role for miR-184-mediated post-transcriptional regulation of dilp8. However, we have not yet tested the mutated dilp8-3’UTR GFP reporter in response to tissue damage.

      Comment 3) Concerning the tissue-specific regulation of Dilp8 by miR-184, these results need to be strengthened. Indeed, this comes mostly from phenotypes observed with rn-GAL4. Although this is a classical tool for driving expression in imaginal discs, rn-GAL4 also drives strong expression in other tissues that could contribute to triggering a delay, such as the CNS and part of the gut (proventriculus). In our hands, some growth phenotypes in the wing obtained with rn-GAL4 could be fully reverted by blocking GAL4 in the CNS indicating that the phenotype was not wing-specific. Importantly, miR-184 seems to be highly expressed in the CNS according to FlyBase, reinforcing the possibility that it plays a role in this organ. Here I propose approaches to confirm that miR-184 mediated regulation of dilp8 and developmental timing indeed occur in the discs:

      - Another driver with less secondary expression sites could be used (pdmR11F02-GAL4), or rn-GAL4 could be combined with an elav-GAL80 to prevent expression in most neurons.
      - The authors could identify the source of Dilp8 upregulation in miR-184 mutants using tissue-specific qRT-PCR instead of whole larvae expression like in Fig 4A-B.
      - This tissue-specific upregulation could be functionally tested using a rescue experiment, in which the delay observed in miR-184 mutants could be rescued by disc-specific downregulation of Dilp8 (using pdm2-GAL4 for instance).

      Response: We thank the reviewer and agree that it is important to show that the effects we see using rn-GAL4 are specific to the imaginal discs and not due to an effect in the CNS. We tested this by expressing the miR-184 sponge in the CNS. Although miR-184 is highly expressed in the larval CNS, pan-neuronal downregulation of miR-184 using elav-GAL4 had no effect on pupariation timing. We have added this as supplementary data (Figure S4). Therefore, we believe that the miR-184 downregulation phenotype in the rn-GAL4 background can be attributed mainly to its role in the imaginal discs. In addition, as suggested by the reviewer, we have demonstrated that downregulation of miR-184 in the imaginal discs using the rn-GAL4 driver leads to an increase in dilp8 expression (Fig S5B). This confirms that dilp8 mRNA levels increase in the imaginal discs when miR-184 is blocked.

      OPTIONAL: Because it is known that dilp8 is strongly regulated at the transcriptional level, the relative input from post-transcriptional upregulation is an important question arising from this study. Although it might be a more long-term approach, I believe that generating a Dilp8 mutant lacking its 3'UTR or, even better, with mutated miR-184 binding sites, would shed light on the role of this regulation for the response to growth perturbation and/or developmental stability (fluctuating asymmetry).

      Response: We thank the reviewer for the suggestion. This would have been an interesting experiment to carry out especially in the context of fluctuating asymmetry.

      MINOR COMMENTS

      1. I think that a number of results could be moved to SI as they are either controls, or reproduce published data without bringing novelty. For instance, results in Fig 5A-D are similar to data published by Sanchez et al, as stated in the text. Fig6A as well.

      Response: We thank the reviewer for this suggestion. Fig. 5A-D and F have been moved to Fig. S6A-E. We have also moved the data from Fig. 6 to Fig. 5; as a result, Fig. 6A-D has become Fig. 5B-D.

      Fig 6D is quite mysterious, as it suggests that basal JNK activation regulates miR-184, which is different from a context of tissue damage. I think that this result could be removed. Alternatively, if the authors want to dig in that direction, more experiments should be provided, such as bskDN expression in an RAcs context and the effects on miR-184 levels and the 3'UTR sensor (since transcript levels are already published).

      Response: We would like to clarify that our experiments suggest that endogenous JNK signalling negatively regulates miR-184, as blocking basal JNK signalling using bskDN increased the levels of miR-184 (changed to Fig 5D). Enhanced JNK signalling has been reported to be involved in tissue damage responses, and we propose that the RicinA-mediated increase in JNK signalling leads to the reduction of miR-184 (changed to Fig 5A, S6D-E). However, we do not make this claim strongly, as we did not co-express RicinA and bskDN to show that JNK signalling is responsible for the drop in miR-184 levels in response to tissue damage. We thank the reviewer for seeking this explanation and have rewritten the results section to improve clarity.

      The references related to Dilp8 should be checked more in detail in the intro and discussion. About Dilp8 and developmental stability: remove the ref to Colombani et al 2012, instead put Boone et al 2016 and add Blanco-Obregon et al 2022 (in addition to Garelli et al 2012 who initially identified this phenotype). About Lgr3 as the receptor for Dilp8: add Colombani et al, Current Biology 2015, and cite here Vallejo et al 2015, Garelli et al 2015. Among the important transcriptional regulators of Dilp8, Xrp1 could be mentioned (Boulan et al 2019, Destefanis et al 2022) as it plays a complementary function to JNK depending on the type of tissue stress.

      Response: We are really sorry for the glaring errors in citing appropriate references. We thank the reviewer for correcting this for us. We have made the necessary changes to the text.

      Significance

      GENERAL ASSESSMENT This study provides convincing data showing that the conserved microRNA miR-184 plays a role in regulating developmental timing in Drosophila through modulating the levels of Dilp8, a key factor in the coupling between tissue growth and developmental transitions. The results are convincing, but the general conclusions of the paper need to be strengthened regarding the direct regulation of dilp8 by miR-184 and the tissue-specificity of this interaction.

      ADVANCE Dilp8 is a key factor that modulates growth and timing in response to developmental perturbations and contributes to developmental precision in physiological conditions. As such, its regulation has been studied by different groups in the last decade, leading to the identification of several inputs for its transcriptional regulation. Here, the authors uncover a post-transcriptional regulation by miR-184, adding another level of regulation of Dilp8 that contributes to ensuring proper regulation of developmental timing, and opening the possibility that miR-184 might play similar roles in other species.

      AUDIENCE This study is of interest for researchers in the field of basic science, with a focus on developmental timing, tissue damage and biological function of microRNAs.

      REVIEWER EXPERTISE Drosophila, growth control, developmental timing, Dilp8.

      Reviewer #2

      Evidence, reproducibility and clarity

      Drosophila has helped to characterize the mechanisms that coordinate tissue growth with developmental timing. The insulin/relaxin-like peptide Dilp8 has been identified as a key factor that communicates the abnormal growth status of larval imaginal discs to neuroendocrine neurons responsible for regulating the timing of metamorphosis. Dilp8, derived from imaginal discs, targets four Lgr3-positive neurons in the central nervous system, activating cyclic-AMP signaling in an Lgr3-dependent manner. This signaling pathway reduces the production of the molting hormone, ecdysone, delaying the onset of metamorphosis. Simultaneously, the growth rates of healthy imaginal tissues slow down, enabling the development of proportionate individuals.

      In this manuscript "miR-184 modulates dilp8 to control developmental timing during normal growth conditions and in response to developmental perturbations" by Dr. Varghese and colleagues, the authors identify a new post transcriptional regulator of Dilp8. The authors show that miR-184 plays a pivotal role in tissue damage responses by inducing dilp8 expression, which in turn delays pupariation to allow sufficient time for damage repair mechanisms to take effect.

      Major points:

      Comment 1) In most of the experiments for percentage of pupariation, the 50% pupariation in control is around 110 hours AED in figures 1, 2 and 3. In figures 5 and 6 using the UAS Ricin, the controls are more around 90 hours AED. Why this discrepancy?

      Response: We thank the reviewer for asking for this clarification. The experiments for Figs 1-3 were carried out at 25°C, while the experiments with a cold-sensitive version of RicinA (UAS-RAcs), Figs 5 and 6 (now changed to Figs. 5 and S6 as suggested by reviewer #1), were carried out at 29°C (permissive temperature). This difference in temperature led to the difference in pupariation timing. We apologise for not having mentioned this in the text; we have now corrected the methods section to indicate this clearly.

      Comment 2) What is the mechanism behind the expression of miR-184 in stress conditions? Is miR-184 also implicated in other conditions giving rise to a developmental delay (X-rays irradiation or animal bearing rasV12, scrib-/- tumors)?

      Response: We thank the reviewer for these questions.

      a) In response to developmental perturbations by RicinA, we believe that activation of JNK signalling controls miR-184 expression. We propose this because our experiments show that imaginal disc damage leads to enhanced JNK signalling and an increase in dilp8 mRNA levels (as reported earlier by Colombani et al 2012; Sánchez et al 2019), together with a simultaneous reduction of miR-184 (Figs. S6A, D, E). We have also performed new experiments showing that RicinA expression in the wing disc causes a moderate increase in dilp8-3’UTR-GFP sensor expression (Figs. S6F-G’), indicating post-transcriptional regulation of dilp8 expression in response to tissue stress. We also show that RicinA-induced dilp8 expression and pupariation delay can be rescued by increasing miR-184 levels (Fig 5B and C), suggesting that the reduction of miR-184 in response to tissue damage contributes to the damage responses. In a separate experiment, we show that blocking the endogenous JNK pathway by expressing bskDN enhances miR-184 levels, suggesting that miR-184 is under the regulation of JNK signalling (Fig 5D). Hence, we speculate that during tissue stress, activation of JNK signalling leads to a reduction of miR-184 levels, which contributes to regulating dilp8 levels post-transcriptionally and results in pupariation delays. The text has been modified to explain this better.

      b) In a previous paper by Shu et al., 2017 (https://doi.org/10.18632/oncotarget.22226), decreased expression of miR-184 was observed in a lglRNAi; RasV12 tumor background. Apart from this, various studies have shown that dilp8 levels increase in response to tumour, radiation stress, apoptosis, and tissue damage (Yeom et al 2021, Ray et al 2019, Demay et al 2014, Katsuyama et al 2015, Colombani et al 2012, Garelli et al 2012). Whether the regulation of dilp8 by miR-184 occurs in these backgrounds is yet to be tested. We have now discussed this possibility in the manuscript.

      Comment 3) dilp8 mutant animals have also been shown to be more resistant to starvation or desiccation (https://doi.org/10.3389/fendo.2020.00461). Is miR-184 implicated in this answer?

      Response: We thank the reviewer for this question. In our earlier experiments, miR-184 was shown to be regulated by nutrition in the larval stages, and lack of miR-184 led to enhanced larval death in response to diet restriction (Fernandes et al., 2022). miR-184 was also shown to act in the insulin-producing cells (IPCs) to regulate lifespan (Fernandes & Varghese, 2022). In the current work, we propose that miR-184 acts upstream of dilp8 in response to stress stimuli. Hence, it is possible that miR-184 is involved in responses to starvation and desiccation stress in adult female flies by regulating dilp8 levels post-transcriptionally. However, whether the miR-184 regulation of dilp8 plays a role in resistance to starvation or desiccation in adult females has not yet been tested, as this was not within the scope of the current study. We have now added this reference to the discussion section.

      Comment 4) dilp8 expression has been also shown to be regulated by Xrp1 in response to ribosome stress (https://doi.org/10.1016/j.devcel.2019.03.016). This paper should be included in the manuscript. Is it possible that the expression levels of miR184 are regulated by Xrp1?

      Response: We thank the reviewer for the suggestion and have incorporated the reference into the paper. During ribosome stress in the larval imaginal discs, the stress-response transcription factor Xrp1 acts through dilp8 in regulating systemic growth. We agree with the reviewer that expression of miR-184 could be regulated by Xrp1. We have not yet explored this possibility, and have now added it to the discussion section.

      Minor points:

      1. Does the overexpression of miR184 induce an increased fluctuating asymmetry?

      Response: We thank the reviewer for asking this question. The role of dilp8 in fluctuating asymmetry is only observed in the dilp8 hypomorphic mutant background. To replicate this, we would have to overexpress miR-184 either in the whole larva or in the wing discs. Unfortunately, overexpression of miR-184 in the wing discs (using rn-GAL4) leads to pupal lethality, whereas overexpression of miR-184 in the whole larva leads to embryonic lethality. We were therefore not able to conclude from our experiments whether miR-184 overexpression induces increased fluctuating asymmetry.

      2. There are 2 references Colombani et al. (2012 for Dilp8 and 2015 for Lgr3). Can you double check that they are used accordingly?

      Response: We thank the reviewer for pointing these errors out and we have incorporated these changes into the paper.

      Significance

      Altogether, the paper presents compelling lines of evidence supporting the proposed model. The experiments are well designed and are convincing. The paper is interesting and relevant for a broad audience.

      Reviewer #3

      Evidence, reproducibility and clarity (Required):

      This is an interesting study demonstrating an interaction between miR-184 and the Drosophila insulin-like peptide 8 (dilp8) in the tissue damage response. The authors show that Dilp8 activity is negatively regulated by miR-184, apparently through direct interaction between miR-184 and the dilp8-3'UTR, which leads to lower dilp8 mRNA transcript levels, via an undetermined mechanism, supposedly its degradation? Furthermore, the authors show that during aberrant tissue growth, miR-184 levels are very slightly downregulated (see comment below), and based on other experiments, imply causation of this with the increased dilp8 mRNA levels that occur in these tissues, again via an unclear mechanism: upregulation or stabilization of dilp8 mRNA. The authors present evidence that the JNK pathway, which had been known to be critical for dilp8 mRNA upregulation upon tissue damage, does so via miR-184.

      Major Comments:

      Comment 1: The data showing the direct regulation of dilp8-3'UTR by miR-184 are not very strong and would require more controls to strengthen the claim, as described below.

      Response: We have performed new experiments to validate that dilp8-3’UTR is regulated by miR-184. Please see the detailed responses to comments 10-12 below.

      Comment 2: The miR-184 effects are also very small (less than 2-fold reduction with tissue damage; or less than 2-fold induction with JNK-pathway inhibition via bskDN). These two points are the weakest part of the manuscript and model.

      Response: We agree with the reviewers on this point. The reduction in miR-184 levels in response to RicinA expression is modest (25–30%), and the induction of miR-184 in response to bskDN expression is less than two-fold (Figs. 5A and D). In contrast, dilp8 transcript levels increase several-fold in response to RicinA expression (Fig. 5C, S6A and B). Since we measure dilp8 transcript levels by qPCR, we detect both transcriptional and post-transcriptional contributions to dilp8 regulation. In addition, we have performed a new experiment to check the post-transcriptional regulation of dilp8 in response to tissue damage. Although the change in the dilp8-3′UTR GFP reporter upon RicinA expression in the ptc domain of the wing disc is mild (Figs. S6F-G’), it strongly suggests that the reduction of miR-184 levels has a post-transcriptional effect on dilp8. Hence, we propose that tissue damage induces strong transcriptional activation of dilp8, while the reduction of miR-184, despite its smaller magnitude, contributes to dilp8 upregulation via post-transcriptional regulation. In support of this, our experiments demonstrate direct regulation of the dilp8-3′UTR by miR-184 (Figs. 4C-F’) and show strong dilp8 mRNA upregulation in miR-184-deficient conditions (Fig. 4A and B), suggesting a role for miR-184 in maintaining dilp8 levels. We also show that the RicinA-induced effects on dilp8 and pupariation delay are reversed by co-expression of miR-184 (Fig. 5C). We do not claim that regulation by miR-184 is the sole mechanism driving dilp8 induction during tissue damage, but suggest that miR-184-mediated post-transcriptional regulation acts in a complementary manner to transcriptional responses. Furthermore, we believe that the mild effect of JNK signalling on miR-184 (as shown by the bskDN experiment) is sufficient to account for the moderate reduction of miR-184 in response to tissue damage.

      Comment 3: Regarding the expression levels, it does not help that the authors show bar graphs with standard errors of the mean instead of the actual data points to allow reliable appreciation of the data dispersion.

      Response: We have modified our figures and have performed statistical analysis according to the suggestions of the reviewers, please see responses to comments 1-9, and 13-19.

      Comment 4: It is difficult to understand how minute changes in miR-184 levels can lead to over an order of magnitude differences (in some cases) in dilp8 mRNA levels considering that it is a stoichiometric relationship. Maybe ?miR-184-Dicer1? complexes are highly stable and re-used for multiple dilp8 transcripts - the authors could discuss how they understand this occurring in their manuscript.

      On the same line, discussion is also rather weak on what regards the mechanism of control of dilp8 mRNA levels by miR-184. Please discuss eg, the evidence for mRNA degradation induction by microRNAs with this UTR binding profile (imperfect UTR binding Fig S4) and-if appropriate-how other possible regulatory models (direct and indirect) could explain the findings.

      Response: We accept the reviewers' comment that a 25-30% reduction of miR-184 is low in comparison to the many-fold increase in dilp8 levels. We believe that both post-transcriptional and transcriptional changes are responsible for the induction of dilp8 in response to tissue damage. However, our experiments suggest a role for post-transcriptional regulation by miR-184, as the pupariation delay is rescued by miR-184 overexpression (please also see the response to the previous comment). We are not ruling out transcriptional regulation of dilp8 mRNA; rather, we suggest that both transcriptional and post-transcriptional mechanisms are responsible for the changes in dilp8. Moreover, we have not performed absolute measurements of miR-184 in the imaginal discs (what we show is a comparison between control and RicinA expression). Hence, we do not have an exact estimate of how many miR-184 molecules are lost, or of how that number compares with the number of dilp8 mRNA molecules that are gained; our dilp8 mRNA measurements are likewise relative and do not count molecules. As the reviewer suggests, it is possible that miR-184-RISC is stable enough to act on multiple dilp8 molecules one after the other, in which case the relationship between miR-184 and dilp8 is not 1:1. We have included this in the manuscript. It is also known that imperfect 3’UTR binding, as seen for most animal microRNAs, leads to translational repression and mRNA deadenylation, which eventually results in mRNA degradation.

      Comment 5: We suggest the authors carefully revise their citations to cite appropriate work that supports the claims, and also to avoid missing the seminal studies that report the claims they cite.

      Response: We sincerely apologise for the errors in citing the key references. We are grateful to the reviewers for correcting this for us. We have made changes to the text to include and correct the references.

      We have the suggestions below which we hope will help the authors improve their manuscript. If the authors address these points raised above, we believe the manuscript should be a valuable contribution to the field, and help in the understanding of how tissues respond to growth aberrations and the regulation of transcript levels by microRNAs.

      Detailed Comments:

      Comment 1. Results 1st paragraph: please describe the screen in more detail. As written, one only discovers it was a miRNA loss-of-function screen when reading the legend of Table S1. Please show the original data of the screen - with dispersion if possible.

      Response: We thank the reviewers for these suggestions, we have now included the data from the screen with SEM, and p-values.

      Comment 2. Results 1st paragraph, Fourth line, "While several miRNAs caused delays in pupariation by 12 hours or more..". Please correct, as actually loss of miRNAs caused delays.

      Response: We thank the reviewer for pointing out this error, we have corrected the text accordingly.

      Comment 3. Results (Figure 1) - It says that data from three independent experiments are shown. However there is no dispersion in the data. Could the authors please explain this? Are the results of the three experiments summed and presented as one? or is this one of the three?

      Response: We thank the reviewers for these suggestions and have plotted data with the SEM values.

      Comment 4. It is reported in the legend of Figure S2 that LogRank test was performed to determine statistical significance. However, no statistical data is presented. Please show the results.

      Response: We thank the reviewers for these suggestions to improve the data presentation, we have incorporated the p-value as suggested.

      Comment 5. Fig2A and B. Please show the data points in the bar graphs (as in Figure 2C), or choose another data representation. Please consider redoing statistical analysis with a simple t-test. It is not clear to me why ANOVA was used to compare two samples. Please state that data are normalized also to control (tub-GAL4>UAS-scramble). Please state the h post-hatching from which the RNA samples were collected (as in Fig 2C for 20HE quantification).

      Response: We thank the reviewers for these suggestions to improve the data presentation, we have incorporated all changes as suggested. Similar changes have been incorporated to the rest of the figures of the manuscript as well. Hours post-hatching information for each figure is now added to the figure legends.
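
      As a side note on the ANOVA versus t-test point in Comment 5, the following is a minimal sketch, not the authors' analysis. It assumes two groups and the equal-variance (Student) form of the t-test; the expression values are invented purely for illustration and the function names are hypothetical. Under those assumptions, a one-way ANOVA on two samples yields F equal to the square of the t statistic, so both tests lead to the same p-value. Welch's unequal-variance t-test would not share this identity.

```python
import math
from statistics import mean

# Hypothetical relative expression values, invented purely for illustration.
control = [1.00, 0.92, 1.08, 1.05, 0.95]
treated = [1.62, 1.48, 1.75, 1.58, 1.69]


def sum_sq(values):
    """Sum of squared deviations from the group mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def student_t(a, b):
    """Student's two-sample t statistic with pooled (equal) variance."""
    df = len(a) + len(b) - 2
    pooled_var = (sum_sq(a) + sum_sq(b)) / df
    se = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se, df


def anova_f(*groups):
    """One-way ANOVA F statistic for any number of groups."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum_sq(g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


t_stat, df = student_t(control, treated)
f_stat = anova_f(control, treated)
# For exactly two groups, t squared and F agree up to floating-point error.
print(f"t = {t_stat:.3f} (df = {df}), t^2 = {t_stat ** 2:.3f}, F = {f_stat:.3f}")
```

      With more than two groups the F statistic no longer reduces to a single t statistic, which is the setting where ANOVA followed by post-hoc comparisons is the usual choice.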

      Comment 6. Fig2C. Fig legend states the bar graphs are "absolute values". Please specify if the bar represents the average, median or something else.

      Response: We thank the reviewer for pointing this out, we have made the suggested changes.

      Comment 7. Throughout the manuscript: please use GAL4 in capital letters or at least standardize it throughout the ms. Currently there are GAL4s and Gal4s.. eg compare Fig 2 and 3 legends.

      Response: We thank the reviewer for pointing this out, we have incorporated all changes as recommended.

      Comment 8. FigS3A and B. Please revise as Fig2A and B above. and apply the same criteria in the respective figure legend.

      Response: We thank the reviewer for pointing this out, we have made the changes as recommended.

      Comment 9. Fig. 4 - please indicate on the figures what is whole larvae and what is wing imaginal discs. This will facilitate understanding of the figure.

      Response: We thank the reviewers for these suggestions and have included this information in all the figures.

      Comment 10. Fig 4 - Data - Authors do not show that rn-GAL4>miR-184-sponge causes up regulation of dilp8 mRNA levels, hence the model is weakened. Doing this experiment would significantly strengthen the study whatever the result is.

      Response: We thank the reviewer for pointing this out and we have included this in the manuscript (Fig S5B).

      Comment 11. The dilp8-3'UTR experiment is weak especially because its generation is not sufficiently well described in the manuscript. "The dilp8 3'UTR-GFP reporter line was created as described in (Vargheese & Cohen, 2007)" is not sufficient. Please describe the construct generation in sufficient detail so that the experiments can be reproduced by others.

      Response: We thank the reviewer for pointing this out and we have elaborated in the methods section on how we generated the dilp8 3'UTR-GFP reporter and dilp8 3'UTR mutant GFP reporter lines. The plasmid was originally created in Steve Cohen’s lab at EMBL by modifying the pCasper4 plasmid to introduce a tubulin promoter, EGFP and a multiple cloning site, which allows one to clone the 3’UTRs of target genes into this plasmid. NotI and XhoI sites were used to clone the dilp8-3’UTR and mut-3’UTR. We hope this explains our strategy sufficiently.

      Comment 12. Making assumptions, if the construct is as described in Vargheese & Cohen, 2007 and contains all of the dilp8 3'UTR - it should be a Tubulin-driven GFP gene with a dilp8-3'UTR "Tub-GFP-(dilp8 3'UTR)". In this case the authors need to rule out the alternative interpretation of the result in Fig. 4D by showing that the expression of miR-184 does not down regulate Tub-GFP expression itself. The best scenario would be to have a mutated dilp8 3'UTR for the miR-184 recognition site. This experiment would significantly strengthen the study and model.

      Response: We thank the reviewer for pointing this out. We agree with the reviewers that this experiment is needed to prove direct regulation of the dilp8-3’UTR by miR-184. We have mutated the sequences complementary to the seed region of miR-184 in the dilp8-3’UTR, and demonstrated that overexpression of miR-184 does not regulate the mutated tub-GFP-(dilp8 3'UTR) expression. This confirms that the dilp8 gene is a direct target of miR-184. This data is added to the manuscript as Figs 4E-F’.

      Comment 13. Figure 4C-D please separate dilp8 from 3'UTR with a space or hyphen.

      Response: We thank the reviewer for pointing this out and have separated dilp8 from 3’UTR with a hyphen.

      Comment 14. Figure 4E. Please name the dilp8 allele as MI00727 as it is not a KO, but rather a hypomorphic mutation (fully WT dilp8 transcripts are still generated, albeit at a much lower level).

      Response: We thank the reviewer for pointing this out and we have made the necessary changes.

      Comment 15. Figure 6D: please add UAS to bskDN/+. All figures have rn-GAL4 alone or with UAS-GFP as control. This finding would be strengthened with this other control, especially because the size effect is small. This being said a general comment for all experiments is that hemi-controls are generally missing for all figures. eg, in Fig 3. One would typically include controls such as A. Phm>+ and +>miR.184; B. aug21>+ and +>miR.184; C. ptth>+ and +>miR.184; D. rn>+ and +>miR.184

      Response: We thank the reviewer for pointing this out. We have added UAS to bskDN (now Fig 5D) and have also added the rn-GAL4/+ control. We have also performed various hemi-control experiments as suggested by the reviewer, to the best of our capabilities. We have added a separate graph with the hemi-controls as Reviewer Response Figure 1.

      Comment 16. Figure 7: Are IPCs necessary for the model? If not, I suggest removing them and placing the Lgr3 neuron cell bodies much more anterior in this scheme. Their cell bodies are as anterior and rostral as it gets, approximately where the IPCs are depicted in this type of view of the CNS.

      Response: We thank the reviewer for pointing this out and have removed IPCs from the figure, this figure is now labelled as Fig. 6.

      Comment 17. Table S1- It would be preferable to see the data of these experiments, but if the authors prefer to show this data in a table, please at least add the dispersion analyses (eg standard deviation.. OR median+-quartiles OR Confidence intervals..), N of animals analysed, and statistics against controls.

      Response: We thank the reviewer for pointing this out, we have added the number of larvae analysed, SEM values and statistics against the control condition.

      Comment 18. In all figures with pupariation time: please also indicate significant findings in the graphs (with an asterisk, for instance) and adjust figure legends accordingly. This could facilitate understanding the data.

      Response: Thanks for the suggestion. We have incorporated this information into figure legends.

      Comment 19. Please revise Figure legends for punctuation.

      Response: We have rectified all the errors in punctuation. We thank the reviewers for suggesting this.

      Comment 20.

      a) Abstract:

      Line 10: What is the evidence to call Dilp8 a "paracrine" factor?

      Response: We thank the reviewer for pointing this out, we have changed the text to ‘secreted factor’.

      b) Introduction:

      4th paragraph, 3rd sentence " Dilp8... buffers developmental noise and delays pupariation..." Buffering of developmental noise was first shown in Garelli et al., Science 2012, so this publication should be cited. 4th paragraph, 5th sentence: please include Jaszczak et al., Genetics 2016. This paper was published together with the 2015 papers, just a matter of timing that it got a 2016 date. Moreover, I do not think Katsuyama et al., 2015 is well cited to back up the statement in this sentence, hence I recommend removing that citation in this sentence.

      Response: We thank the reviewer for pointing this out and have made necessary changes.

      c) 6th paragraph: 5th line "targeting dilp8" : please specify if you mean the gene or the mRNA, or both. Same for line 7.

      Response: We thank the reviewer for pointing this out and have made necessary changes.

      d) Results Page 10, 1st paragraph, 1st sentence: the works cited are not the appropriate studies that demonstrated what is being stated. This was shown in Garelli et al., Science 2012 and Colombani et al., Science 2012. Results Page 10, 1st paragraph, line 11: Please also cite Colombani et al., Science 2012, who first showed that JNK is required for dilp8 regulation.

      Response: We thank the reviewer for pointing this out and are extremely apologetic for this oversight. We have made necessary changes to the manuscript.

      e) Discussion, 2nd paragraph, line 4: again, please indicate the rationale for using "paracrine" to describe Dilp8's activities. The current widely accepted model is that Dilp8 acts on interneurons in the brain (eg, reviewed in Juarez-Carreno et al., Cell Stress, 2018; Gontijo and Garelli, Mech Dev, 2018; Mirth and Shingleton, Front Cell Dev Biol, 2019; Texada et al., Genetics 2020; Boulan and Leopold, 2021). In order to reach the brain, Dilp8 has to be secreted from the discs and travel to the brain. This is as endocrine a mechanism as it gets for a small larva, considering that some discs can be on the opposite side of the larva (eg, genital discs). While this does not exclude that Dilp8 could also act paracrinally, the only evidence that I am aware of comes from other contexts such as during transdetermination, where Dilp8 has been proposed to work in an autocrine or paracrine fashion via Drl in imaginal discs (Nemoto et al., Genes to Cells, 2023). However, this is not cited appropriately in this manuscript and is less related to the Lgr3-dependent pathway being studied here.

      Response: We totally agree with the reviewer and appreciate clarifying this for us. We have made necessary changes to the text.

      f) Discussion Page 13, 1st paragraph: This claim is supported by data presented in Garelli et al., Science 2012, not the other two papers. Garelli et al., 2015 shows that the Lgr3 receptor also participates in buffering developmental noise. Other studies have corroborated the Garelli et al., 2012 finding (eg, Colombani et al., Curr Biol 2015; Boone et al., Nat Commun 2016; Blanco-Obregon et al., Nat Commun 2022). Many other studies have shown that Dilp8 promotes developmental stability under tissue stress and challenges.

      Discussion Page 12, 3rd paragraph, 2nd sentence: "The Lgr3 neurons directly interact with ... PTTH ...and insulin-producing neurons." Please cite Colombani et al., 2015 and Vallejo et al., Science 2015. Vallejo et al. propose that circuit with insulin-producing neurons. In the 3rd sentence, only Jaszczak et al., 2016 is cited, whereas this claim/model comes from many studies, such as Halme et al., Curr Biol, 2010; Hackney et al., PLoS One 2012; Garelli et al. Science 2012; Colombani et al., Science, 2012; and the Lgr3 papers from 2015. Jaszczak et al. actually propose that Lgr3 is also required in the ring gland in addition to neurons.

      Discussion page 14, last paragraph, line 10: "In Aedes aegypti ....regulates ilp8 (Ling et al., 2017)". As far as I understand, mosquitoes do not have a dilp8 orthologue (see for instance Gontijo and Gontijo, Mech Dev 2018; and Jan Veenstra's work). The ilp nomenclature (numbering) does not follow that of Drosophila, so ilp8 is probably a typical Insulin/IGF-like peptide and is NOT an orthologue of Dilp8, a relaxin, so this citation needs to be removed or placed into the broader context of microRNA regulation of ilps.

      Response: We are really sorry for the numerous glaring errors in the references. We thank the reviewers for correcting this for us. We have made necessary changes to the text.

      Thank you for the opportunity to review your interesting work,

      Alisson Gontijo and Rebeca Zanini

      Reviewer #3 (Significance (Required)):

      If the authors address these points raised above, we believe the manuscript should be a valuable contribution to the field, and help in the understanding of how tissues respond to growth aberrations and the regulation of transcript levels by microRNAs.

      Author’s concluding response:

      We thank all the reviewers for the overall positive comments and suggestions that we believe have helped us to improve our manuscript. We have incorporated all the changes suggested, especially regarding errors in citing key references. We have performed most of the experimental suggestions. Also, we have modified the way in which graphs are presented, including statistical tests as suggested by the reviewers. Several controls have been performed to strengthen the manuscript further. We believe that this review process aided in significantly improving this manuscript.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      1. General Statements

      We thank the reviewers for their positive comments regarding the research article titled "The Ketogenic Diet Metabolite β-Hydroxybutyrate Promotes Mitochondrial Elongation via Deacetylation and Improves Autism-like Behaviour in Zebrafish" by Uddin GM and colleagues. We appreciate your input, and we address these comments below with specific responses to each point raised by the reviewers.

      The main changes in the updated manuscript are as follows:

      We have revised the introduction to now incorporate additional background information on mitochondria, NAD, and mitochondrial dynamics and function. This addition aims to provide readers with a broader understanding of the mitochondrial context in relation to our study.

      Furthermore, we recognize that previous studies have explored mitochondrial function in the context of the ketogenic diet. While our specific investigation centered on mitochondrial morphology, we acknowledge the importance of comprehensively investigating mitochondrial function. To this end, we have added new data showing how BHB impacts mitochondrial oxidative phosphorylation in HeLa cells (Sup Fig 2), and how both BHB and NMN impact oxygen consumption/glycolysis in zebrafish (Fig 7).

      We have also added new behaviour analysis of the zebrafish (Fig 6), and have re-framed the discussion around neurodevelopment generally, rather than ASD specifically.

      Finally, we have now included a section in our manuscript that discusses the limitations of our study. These limitations can be further investigated to explore and characterize the full mechanistic potential behind the effects of the ketogenic diet and/or NMN on mitochondrial dynamics.

      2. Point-by-point description of the revisions

      This section is mandatory. Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Uddin GM and colleagues presented a research article entitled 'The Ketogenic Diet Metabolite β-Hydroxybutyrate Promotes Mitochondrial Elongation via Deacetylation and Improves Autism-like Behaviour in Zebrafish'. Roles of ketogenic diet (KD) and NAD+ precursors in health promotion and longevity, as well as in the alleviation of a broad range of diseases, are evident. However, their roles in autism are not well studied, which is the novelty of the current study. Addressing the questions below will improve the quality of the paper.

      Major concerns

      1. In the introduction section, a broad overview of the roles of ketogenic diet (KD) in neurodegenerative disease (and ageing, if possible) should be provided. E.g., the authors should summarize exciting progress on the use of KD to treat Alzheimer's disease in animal models (PMID: 23276384).

      Response: Thank you for your valuable suggestion. While it is true that the KD appears to be beneficial in neurodegenerative (and other disease) models, our focus in this paper is looking at neurodevelopment, rather than all potential benefits of the KD. Nonetheless, we have addressed this comment by incorporating a brief overview of the roles of the KD in neurodegenerative diseases, including Alzheimer's disease (AD), in the introduction section of the manuscript. Specifically, we have summarized the exciting progress made in utilizing KD to treat AD in animal models, as highlighted in the suggested study. This addition helps to provide a better overview of the potential therapeutic effects of KD in neurodegenerative diseases and strengthens the introduction section of the manuscript.

      • Roles of high fat diet to treat diseases could be extended to rare premature ageing diseases. In such a scenario, high fat and NAD+ boosting share some joint mechanisms (PMID: 25440059).

      Response: This information and the reference are now added to the discussion.

      In the introduction, a more detailed introduction of NAD+ and its roles in mitochondrial homeostasis (especially mitophagy and the mitochondrial fusion-fission balance) should be included (PMID: 24813611; PMID: 30742114; PMID: 31577933).

      Response: Although our paper focused primarily on mitochondrial fission and fusion, we have incorporated a new paragraph in the introduction to provide a more detailed introduction detailing NAD+ and its roles in mitochondrial homeostasis, specifically highlighting mitophagy. We have included the suggested references.

      • Regarding the statement that KD increases NAD+: was it due to increased generation (check protein levels and activities of different NAD+ synthetic enzymes, such as iNAMPT, NMNAT1-3, and NRK) and/or reduced consumption (in addition to reduced glycolysis, does KD inhibit the activities of CD38 and PARPs? In this paper, sirtuin activity is increased)? Detailed exploration of the activities of these proteins will unveil a clear molecular mechanism of how KD affects/regulates NAD+.

      Response: Thank you for the comment. We agree that exploring the detailed mechanism of how the ketogenic diet (KD) affects NAD+ is an interesting question that will have important implications once answered. However, fully elucidating the mechanism of action would require a more comprehensive investigation, which is beyond the scope of this current project. We have now added this as a future direction in the manuscript.

      Fig. 1: in the NAD+ field, the NR/NMN concentrations used are normally high, e.g. 500 µM to 2-5 mM (as the NAD+ levels in cells are high). In addition to using 50 µM, the authors are strongly encouraged to perform a dose-dependent study (50 µM, 500 µM, 1, 2, 5 mM) and examine changes in mitochondrial function and parameters. In this condition, NAD+ levels should also be checked.

      Response: We have added new supplemental data showing the initial dose response of the effects of BHB and NMN on mitochondrial morphology, which led us to choosing the relevant doses for the remainder of the paper. Our objective was not to investigate the broad impacts of different NMN concentrations on mitochondrial function and parameters, or NAD+ levels. As such, we have only focused on doses where we see effects on mitochondrial morphology.

      Fig. 2: a comprehensive characterization of mitochondrial fusion-fission should be performed. In addition to the proteins evaluated, changes in other key fusion-fission proteins, like Bax, Bak, Mfn-1, Mfn-2, etc. should be assessed (PMID: 17035996; PMID: 24813611).

      Response: We agree that looking at other key proteins involved in mediating mitochondrial fission and fusion could provide additional insight. Indeed, given the changes in global acetylation that we see, it is expected that some other proteins may also be regulated in this way. However, there are at least a dozen proteins involved in mediating mitochondrial fusion and fission, not to mention many more proteins that regulate these proteins. Unfortunately, it is not feasible to analyze all the proteins involved in mitochondrial fusion-fission. Moreover, looking only at protein levels does not necessarily inform about the activity of any protein. Instead, we concentrated in this paper on investigating known links between protein acetylation and mitochondrial dynamics, particularly focusing on the proteins that have known links to acetylation (i.e., DRP1, OPA1, MFNs). We have added a note in the discussion acknowledging that other means of regulation could also be occurring in parallel.

      Figs. 1-5 were focused on mitochondrial morphology; whether KD and NMN changed mitochondrial function should be explored, such as by using Seahorse to check ECAR and OCR.

      Response: Although our question was focused on morphology, we agree that mitochondrial function is important. We have added new data showing that BHB increases basal oxygen consumption in HeLa cells (Sup Fig 2), as well as new data showing that BHB and NMN influence oxygen consumption and glycolysis in our zebrafish model (Fig 7).

      • Fig. 6: NR/NMN used in animal studies (via gavage or in drinking water in mice, and on plate for worms and flies) are normally high (e.g., in drinking water for mice could be 4-12 mM; for worms and flies are normally 1-5 mM); for zebrafish, while they are swimming in water, this reviewer is concerned whether it was true that 50 µM of NMN was sufficient to show the benefit presented.

      Response: Our data show that these doses are indeed sufficient. We did look at some higher doses for NMN, but these were toxic, leading to poor survival and were not studied further.

      Minor concerns

      1. Line 26: For 'a growing list of neurological disorders, including autism spectrum disorder (ASD)', please add AD in.

      Response: Line 26 is part of the abstract, which we feel should be focused more on the main message of the paper, which does not involve AD. As addressed above, we have added AD as an example in the introduction.

      Line 57: For 'with side effects such as gastrointestinal disturbances, nausea/vomiting, diarrhea, constipation, and hypertriglyceridemia being reported', rate of frequency shall be provided if any.

      Response: We have modified the statement to indicate the relative percent of patients suffering the various side effects.

      Reviewer #1 (Significance (Required)):

      The novelty of the current study was to investigate effects of KD and NAD+ on autism. This investigation was not performed before and thus is the novelty.

      Weaknesses: the effects of KD and NAD+/NMN on mitochondrial function were not well investigated and should be examined. The introduction was not well done; much key information from the field was not provided, which may mislead readers into over-evaluating the novelty of the current study.

      Response: As outlined above, we have edited the introduction to include additional information requested by the reviewer. Moreover, our focus in this manuscript was to look at the mechanisms underlying changes in mitochondrial morphology, not mitochondrial function per se, though this is clearly important and related. Nonetheless, as discussed above, we have also added new data showing how BHB impacts mitochondrial function.

      My expertise lies in NAD+, mitochondria, and brain health.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The study examined the effect of beta-hydroxybutyrate and nicotinamide mononucleotide on mitochondrial morphology and the molecular pathways which mediate this effect, as well as the effect of these treatments on behavior in zebrafish. The study is well done and well written. The only thing I think could be improved is the bars in the graphs marking the significant comparisons. It is sometimes difficult to see which groups are being compared.

      Response: We're happy to adjust how the data is displayed in the relevant bar graphs, but it is not clear exactly what changes the reviewer would like. To some degree this will depend on the specific guideline of the final journal where we hope the manuscript will be published. As such, we have not made changes at this point.

      Referees cross-commenting

      The other reviewers do have some fair comments. Multiple doses would be helpful and showing bioenergetic data would complement the morphological measurements. Additionally, behavioral assays showing changes in social behavior in the zebrafish would provide a stronger link to ASD.

      Response: As discussed above, we have added new information on doses and mitochondrial bioenergetics. With respect to behaviour, we have added thigmotaxis data and reworked the discussion around behaviour and neurodevelopment so that it is less specific to ASD.

      Reviewer #2 (Significance (Required)):

      As beta-hydroxybutyrate is an important substrate for the ketogenic diet, this study helps explain the potential mechanisms by which the ketogenic diet may enhance mitochondrial function.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this paper, Uddin and colleagues have investigated components of the ketogenic diet to understand changes in both mitochondrial morphology and protein expression, and zebrafish locomotor behaviour. They investigate whether beta-hydroxybutyrate (BHB) or nicotinamide mononucleotide (NMN) application can alter human mitochondria in HeLa cell lines, and also rescue a locomotion defect in shank3b+/- zebrafish larvae that have previously been proposed as a model for autism. This study is strengthened by showing data from two species; however, the link between the HeLa cell line data and larval zebrafish is not strong. The study would be improved by assessing zebrafish mitochondrial changes after drug application, and testing more than one concentration of BHB and NMN in the behavioural assay. This is an interesting study, and it is nicely written and presented. I have made some comments to strengthen the study below.

      Major comments

      My expertise is in modelling some aspects of autism in zebrafish. To this end I have focussed on the zebrafish part of this manuscript more fully. I have several comments related to the zebrafish experiments.

      1. The changes in mitochondrial morphology, peroxisome number and mitochondrial protein levels were measured in HeLa cells and no comparable data are shown for zebrafish. The same experiments should be repeated using larval zebrafish or a zebrafish cell line.

      Response: We chose to use HeLa cells for the mechanistic studies due to practical reasons. Cell lines offer a controlled and well-established system for investigating cellular processes and molecular mechanisms. Measuring these parameters in tissues is significantly more challenging and requires different reagents (e.g., antibodies) and methodology (electron microscopy) that are not feasible in the current study.

      On the other hand, zebrafish larvae were employed for the behavior studies, which cannot be conducted using cell lines. By utilizing zebrafish, we were able to examine the effects of beta-hydroxybutyrate (BHB) and nicotinamide mononucleotide (NMN) on locomotor behavior, providing valuable insights into potential therapeutic implications for autism.

      While we acknowledge the limitations of not directly measuring mitochondrial morphology, peroxisome number, and mitochondrial protein levels in zebrafish, we believe that our study provides significant contributions to understanding the effects of BHB and NMN in zebrafish behavior. Future studies could certainly consider incorporating zebrafish-specific experiments to complement the findings in HeLa cells.

      • How did you choose the concentration of BHB and NMN to use in behavioural experiments? And the timing of application - I don't really understand why you waited 3 days after drug application to measure locomotion.

      Response: These doses were chosen initially as they were similar to the doses that induced mitochondrial elongation in HeLa cells and were tolerated by the fish larvae. As we saw promising effects at these initial doses, we decided to explore them in more detail. While we agree that it would be worth comparing the effects of additional doses, as well as looking at their effects at other timepoints, such work would be a major endeavour and is beyond the scope of our initial investigations, which we feel are worth reporting in their current state.

      With respect to the treatment paradigm, fish larvae were treated 10-48 hours post fertilization, as this is a critical neurogenic developmental timepoint that is often used for exposure studies. Fish do not fully hatch until 3-4 days post fertilization, and display only minimal movement before 5 days, which is why we waited until 5 days to look at movement.

      • Do the shank3b+/- larvae show any morphological deficits? Their decrease in locomotion is striking. Is the morphology also rescued by drug application? Can you tie this to the mitochondrial changes that you observed in HeLa cells?

      Response: We do not observe any gross changes in fish morphology that might explain a decrease in locomotion. Unfortunately, it is not feasible to look at mitochondrial morphology in the fish at this time. However, based on previous published work showing that the ketogenic diet promotes mitochondrial elongation in mouse brains (PMID:32380723), we would expect mitochondrial morphology also to be changed in the fish. Nonetheless, as we have not examined this directly in fish, we are not making this specific claim in this manuscript.

      • In figure 6A you use time spent swimming as a readout of distance. This doesn't really make sense, because without also showing speed of swimming it is not possible to know whether time and distance correlate in the same way across genotypes. This figure could be improved by showing more detail - speed of swimming, time spent immobile etc. This can easily be extracted from the films that you have already made using the ViewPoint software.

      Response: As requested, we have reanalyzed the zebrafish movement data for a more refined analysis. In the revised version (Fig 6), we include analysis of both speed and distance travelled within a defined time. Importantly, these findings still support differences between WT and shank3b+/- fish that are restored by BHB and NMN to varying degrees.
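      The distinction the reviewer draws between time spent swimming and distance travelled can be illustrated with a minimal sketch. It assumes a hypothetical track of (x, y) positions sampled at a fixed interval; the function name, the `immobile_threshold` cut-off, and the example coordinates are illustrative assumptions and do not reflect the ViewPoint export format or the analysis actually used in the manuscript.

```python
import math


def locomotion_summary(track, frame_interval_s, immobile_threshold=0.5):
    """Summarise one larva's track (hypothetical helper, not the authors' pipeline).

    track: list of (x, y) positions sampled every frame_interval_s seconds.
    immobile_threshold: assumed cut-off; a step no longer than this counts as immobile.
    """
    if len(track) < 2:
        raise ValueError("need at least two positions")
    # Straight-line displacement between consecutive samples.
    steps = [math.dist(a, b) for a, b in zip(track, track[1:])]
    moving = [s for s in steps if s > immobile_threshold]
    time_moving = len(moving) * frame_interval_s
    time_immobile = (len(steps) - len(moving)) * frame_interval_s
    return {
        "distance": sum(steps),
        "time_moving_s": time_moving,
        "time_immobile_s": time_immobile,
        "mean_speed_moving": sum(moving) / time_moving if time_moving else 0.0,
    }


# Two made-up larvae that swim for the same time but at different speeds.
slow = locomotion_summary([(0, 0), (1, 0), (2, 0), (2, 0), (3, 0)], frame_interval_s=1.0)
fast = locomotion_summary([(0, 0), (3, 0), (6, 0), (6, 0), (9, 0)], frame_interval_s=1.0)
```

      In this toy example both tracks have identical time spent moving, yet the distances differ threefold, which is why reporting speed and distance alongside time is informative.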

      • Showing a change in locomotion is not enough to claim that a model is autism-like. At a minimum I think that you need to show changes in social behaviour - likely using older fish (more than three weeks) that interact with each other. Changes in locomotion can be caused by so many factors, many of which are not indicative of autism. It is important that as a field we do not simply claim that locomotion can be used as a proxy for more complex disease phenotypes. This recent review may help you with this point: https://www.frontiersin.org/articles/10.3389/fnmol.2020.575575/full.

      Response: The reviewer makes an important point that the movement behaviour phenotypes that we see do not necessarily represent classic ASD phenotypes (i.e., repetitive behaviour, reduced sociability, and reduced communication). To begin to address this issue, we analyzed thigmotaxis, which can be a measure of anxiety. Notably, we also see differences that are reversed by BHB and NMN. However, we cannot model all ASD behaviours in a fish model, and we are not set up to look at social behaviour, especially in the young fish that we were studying. As such, even though Shank3 is a recognized ASD gene, and the shank3b+/- model we are studying is a validated ASD model (PMID: 29619162), we have re-phrased the manuscript in the context of neurodevelopment generally, rather than with respect to ASD specifically. As such, we ascribe the movement and thigmotaxis phenotypes as neurodevelopmental phenotypes that are improved by BHB and NMN.
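      For readers unfamiliar with the measure, thigmotaxis is commonly summarised as the fraction of time an animal spends near the wall of its arena. The sketch below is one such summary under stated assumptions: a circular well, a hypothetical track of (x, y) positions, and an arbitrary inner/outer boundary at half the well radius. The response does not state how thigmotaxis was quantified, so none of these choices should be read as the authors' method.

```python
import math


def thigmotaxis_fraction(track, centre, well_radius, inner_fraction=0.5):
    """Fraction of samples in the outer zone of a circular well (hypothetical helper).

    inner_fraction: assumed boundary; positions farther from the centre than
    inner_fraction * well_radius are counted as being in the outer zone.
    """
    if not track:
        raise ValueError("track is empty")
    inner_radius = well_radius * inner_fraction
    in_outer = sum(1 for p in track if math.dist(p, centre) > inner_radius)
    return in_outer / len(track)


# Made-up track: two samples near the centre, two near the wall of a radius-5 well.
example = thigmotaxis_fraction(
    [(0, 0), (1, 0), (4, 0), (0, -4.5)], centre=(0, 0), well_radius=5.0
)
```

      A higher fraction indicates more wall-hugging, which is often interpreted as anxiety-like behaviour; the value depends on where the zone boundary is drawn, so the boundary should always be reported.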

      For the statistics, as far as I can tell, all of the data should be analysed by ANOVA or the non-parametric equivalent followed by a post-hoc test. Please check this and add information about normality in.

      Response: As requested, we have clarified our statistical methodology throughout the manuscript.

      For the mechanistic data, we used t-tests for direct comparisons between two groups (e.g., vehicle vs. treatment). While multiple conditions such as vehicle, NMN, BHB, or etomoxir were tested, statistical comparisons were only conducted between the vehicle and each treatment group individually. As we are not also making comparisons between treatments, this is not a multiple comparison, and ANOVA is not applicable in this context. We have clarified this rationale in the manuscript to avoid any confusion.

      For the zebrafish study, where multiple factors were involved (e.g., treatments across different time points or conditions), we performed a two-way ANOVA followed by Tukey's post-hoc test to identify specific group differences. This approach was appropriate for analyzing these datasets and ensures robust conclusions.

      With respect to normality testing, all datasets were assessed for normality using the Shapiro-Wilk test, and no violations of normality were observed. The updated text now includes these details.

      Minor comments

      1. Make sure that you refer to the fish line as shank3b+/- throughout - see abstract.

      This has been corrected.

      • Please add a space between all numbers and units (e.g. 5 mM).

      This has been corrected.

      • There is a spelling error on line 340 page 16: finings instead of findings.

      This has been corrected.

      • In figure 1, if each dot represents a different sample, then there appear to be many fewer samples analysed in 1D compared to 1B. Can you comment upon this please?

      Response: A total of 80-150 cells were counted per condition, and the analyses were performed on 3 independent replicates with 2 independent technical replicates for each treatment condition. The quantification of mean mitochondrial branch length in Figure 1B was measured using ImageJ and the MiNA plugin. The measurements were taken from three independent replicates using a standard region of interest (ROI) and randomly selected areas from each image.

      In Figure 1D, NAD+ levels were measured 24 hours after treatment of vehicle, βHB, NMN, or Eto+βHB in HeLa cells (n=3-6/group). Each sample lysate represents an independent experimental dish from which coverslips were collected for image analysis.

      The difference in sample numbers between Figure 1B and 1D arises because image analysis involves individual cells fixed and stained on coverslips, whereas the NAD assay requires the whole lysate from the entire cell culture dish. Therefore, the higher cell count in Figure 1B represents the number of cells analyzed on coverslips, while Figure 1D represents NAD levels from the lysate normalized to the protein concentration.

      Reviewer #3 (Significance (Required)):

      I think that this will be interesting to autism researchers and it could lead to more investigation of the ketogenic diet. Some more work is needed, likely in other model organisms, before this research can be translated to human patients.

      Response: We agree that the findings of our study could be of interest to autism researchers and have implications for further investigation of the ketogenic diet (KD). It is important to note that further work, including studies in other model organisms, would be beneficial before translating this research to human patients.

      Our study aimed to provide mechanistic insights into the effects of the KD on mitochondrial morphology and behavior. We recognize that the translation of research findings to human patients requires rigorous investigation, including preclinical and clinical studies. Our study contributes to the understanding of the underlying mechanisms involved in the KD's effects, laying the groundwork for future research and potential therapeutic avenues.

      We appreciate your perspective and emphasize that our intention is to provide valuable insights into the mechanisms underlying the KD's effects rather than suggesting immediate translation to human patients. Further investigation and validation in diverse models and clinical settings will be necessary before considering clinical applications.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      In this paper, Uddin and colleagues have investigated components of the ketogenic diet to understand changes in both mitochondrial morphology and protein expression, and zebrafish locomotor behaviour. They investigate whether beta-hydroxybutyrate (BHB) or nicotinamide mononucleotide (NMN) application can alter human mitochondria in HeLa cell lines, and also rescue a locomotion defect in shank3b+/- zebrafish larvae that have previously been proposed as a model for autism. This study is strengthened by showing data from two species; however, the link between the HeLa cell line data and larval zebrafish is not strong. The study would be improved by assessing zebrafish mitochondrial changes after drug application, and testing more than one concentration of BHB and NMN in the behavioural assay.

      This is an interesting study, and it is nicely written and presented. I have made some comments to strengthen the study below.

      Major comments

      My expertise is in modelling some aspects of autism in zebrafish. To this end I have focussed on the zebrafish part of this manuscript more fully. I have several comments related to the zebrafish experiments.

      1. The changes in mitochondrial morphology, peroxisome number and mitochondrial protein levels were measured in HeLa cells and no comparable data are shown for zebrafish. The same experiments should be repeated using larval zebrafish or a zebrafish cell line.
      2. How did you choose the concentration of BHB and NMN to use in behavioural experiments? And the timing of application - I don't really understand why you waited 3 days after drug application to measure locomotion.
      3. Do the shank3b+/- larvae show any morphological deficits? Their decrease in locomotion is striking. Is the morphology also rescued by drug application? Can you tie this to the mitochondrial changes that you observed in HeLa cells?
      4. In figure 6A you use time spent swimming as a readout of distance. This doesn't really make sense, because without also showing speed of swimming it is not possible to know whether time and distance correlate in the same way across genotypes. This figure could be improved by showing more detail - speed of swimming, time spent immobile etc. This can easily be extracted from the films that you have already made using the ViewPoint software.
      5. Showing a change in locomotion is not enough to claim that a model is autism-like. At a minimum I think that you need to show changes in social behaviour - likely using older fish (more than three weeks) that interact with each other. Changes in locomotion can be caused by so many factors, many of which are not indicative of autism. It is important that as a field we do not simply claim that locomotion can be used as a proxy for more complex disease phenotypes. This recent review may help you with this point: https://www.frontiersin.org/articles/10.3389/fnmol.2020.575575/full.
      6. For the statistics, as far as I can tell, all of the data should be analysed by ANOVA or the non-parametric equivalent followed by a post-hoc test. Please check this and add information about normality in.

      Minor comments

      1. Make sure that you refer to the fish line as shank3b+/- throughout - see abstract.
      2. Please add a space between all numbers and units (e.g. 5 mM).
      3. There is a spelling error on line 340 page 16: finings instead of findings.
      4. In figure 1, if each dot represents a different sample, then there appear to be many fewer samples analysed in 1D compared to 1B. Can you comment upon this please?

      Significance

      I think that this will be interesting to autism researchers and it could lead to more investigation of the ketogenic diet. Some more work is needed, likely in other model organisms, before this research can be translated to human patients.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review)

      The weaknesses are in the clarity and resolution of the data that forms the basis of the model. In addition to whole embryo morphology that is used as evidence for convergent extension (CE) defects, two forms of data are presented, co-expression and IP, as well as a strong reliance on IF of exogenously expressed proteins. Thus, it is critical that both forms of evidence be very strong and clear, and this is where there are deficiencies: 1) For the vast majority of experiments, general morphology and LWR were used as evidence of effects on convergent extension movements rather than Keller explants or actual cell movements in the embryo. 2) The study would benefit from high or super resolution microscopy, since in many cases the differences in protein localization are not very pronounced. 3) The IP and Western analysis data often show subtle differences, which are not apparent in some cases. 4) It is not clear how many biological repeats were performed or how and whether statistical analyses were performed.

      (1) To more objectively assess the convergent extension phenotypes, we developed a Fiji macro to automatically quantify the LWR in various injected Xenopus embryos, as detailed in the Methods section. We acknowledge that a limitation in the current manuscript is how to link our mechanistic model at the molecular level with the actual cellular behavior during convergent extension, and we plan to perform cell biological studies in the future to elucidate the link;

      (2) We have repeated some of the imaging experiments in DMZ explants using a Zeiss LSM 900 confocal equipped with an Airyscan2 detector that can increase the resolution to ~100 nm. The new data are in Suppl. Fig. 4, 9, 11, 16;

      (3) We have repeated all IP and western blots at least three times and provided quantification and statistical analyses;

      (4) We have added the information on biological repeats and statistical analyses in all figures and figure legends.
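
      For readers unfamiliar with this kind of measurement, the following is a minimal sketch of how a length-to-width ratio could be computed from a segmented embryo outline. It is not the authors' Fiji macro, whose details are given only in their Methods; it assumes that LWR stands for length-to-width ratio, that the embryo has already been segmented into a set of pixel coordinates, and that the long axis can be approximated by the principal axis of the pixel cloud. All names are hypothetical.

```python
import math


def length_to_width_ratio(pixels):
    """Approximate length-to-width ratio of a segmented shape.

    pixels: list of (x, y) coordinates belonging to the embryo mask.
    The long axis is taken as the principal axis of the pixel cloud.
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n

    # Second central moments of the pixel cloud.
    sxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    syy = sum((y - cy) ** 2 for _, y in pixels) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pixels) / n

    # Orientation of the major axis of the covariance ellipse.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)

    # Project every pixel onto the major and minor axes.
    along = [(x - cx) * ux + (y - cy) * uy for x, y in pixels]
    across = [-(x - cx) * uy + (y - cy) * ux for x, y in pixels]

    # Add one pixel so that extents count pixels rather than gaps between
    # pixel centres (an approximation for shapes not aligned with the grid).
    length = max(along) - min(along) + 1
    width = max(across) - min(across) + 1
    return length / width


# A 40 x 10 pixel rectangle and the same rectangle turned by 90 degrees.
horizontal = [(x, y) for x in range(40) for y in range(10)]
vertical = [(x, y) for x in range(10) for y in range(40)]
lwr_horizontal = length_to_width_ratio(horizontal)
lwr_vertical = length_to_width_ratio(vertical)
```

      Using the principal axis rather than the image axes means the result does not depend on how the embryo happens to be oriented in the dish, which is one reason an automated measurement can be more objective than manual scoring.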

      Reviewer #2 (Public Review):

      The protein localization experiments in animal cap assays are for the most part convincing, but with the caveat that the authors assume that the proteins are acting within the same cell. As Fzd and Vangl2 are thought to localize to opposite cell ends in many contexts, can the authors be sure that the effects they observe are not due to trans interactions? 

      In our previous publication, we provided evidence that Vangl is necessary and sufficient to recruit Dvl to the plasma membrane within the same cell (Figure 3 in 10.1093/hmg/ddx095). In a more recent publication (10.1038/s41467-025-57658-0), we further elucidated a mechanism through which Dvl oligomerization switches its binding from Vangl to Fz, and determined that Dvl binding to Vangl and Fz is differentially mediated by its PDZ and DEP domains, respectively. In the current manuscript, we also performed co-IP experiments under various conditions to demonstrate binding between Dvl and Vangl. We feel that this evidence together provides a strong argument for our model, in which Vangl2 acts within the same cell to sequester Dvl from Fz.

      With regard to the Dvl patches induced by Wnt11 (Fig. 3 and Suppl. Fig. 9), we injected EGFP- and mSc-tagged Dvl separately into adjacent blastomeres and demonstrated that the Wnt11-induced patches arise from symmetrical accumulation of Dvl at the contact between two neighboring cells (Suppl. Fig. 9a-c’). This scenario differs from epithelial PCP, where Fz/Dvl and Vangl/Pk accumulate asymmetrically at the contact between two adjacent cells.

      The authors propose a model whereby Vangl2 acts as an adaptor between Dvl and Ror, to first prevent ectopic activation of signaling, and then to relay Dvl to Fzd upon Wnt stimulation. This is based on the observation that Ror2 can be co-IPed with Vangl2 but not Dvl; and secondly that the distribution of Ror2 in membrane patches after Wnt11 stimulation is broader than that of Fzd7/Dvl, while Vangl2 localizes to the edges of these patches. The data for both these points is not wholly convincing. The co-IP of Ror2 and Vangl2 is very weak, and the input of Dvl into the same experiment is very low, so any direct interaction could have been missed. Secondly, the broader distribution of Ror2 in membrane patches is very subtle, and further analysis would be needed to firm up this conclusion. 

      (1) We repeated the co-IP experiment with Myc-tagged Vangl or Dvl. Using the same anti-Myc antibody and experimental conditions (including the expression levels of Vangl, Dvl and Ror2), we still found that Ror2 could be pulled down by Vangl but not Dvl (Suppl. Fig. 15b). Whereas this data confirms our previous conclusion, we acknowledge that a negative result does not fully exclude the possibility of direct binding between Ror and Dvl.

      (2) We re-analyzed the signal intensity of Dvl and Ror in Wnt11-induced patches. By quantifying the intensity ratio between Ror and Dvl along the patches, we found a more than two-fold increase at the border of the patches (Fig. 7j, bottom panel). We interpret this data to suggest that Ror accumulates to a higher level than Dvl at the patch borders.

      A final caveat to these experiments is that in the animal cap assays, loss of function and gain of function both cause convergence and extension defects, so any genetic interactions need to be treated with caution i.e. two injected factors enhancing a phenotype does not imply they act in the same direction in a pathway, in particular as there are both cis/trans and positive/negative feedbacks between the PCP proteins. 

      We agree with the reviewer that a difficulty in studying PCP/non-canonical signaling is that both loss and gain of function of any of its components can cause convergence and extension defects. Genetic interactions, especially synergistic interactions, should be interpreted with caution. But we do want to point out that, in a number of cases, we were also able to demonstrate epistasis. For instance, we found that Dvl2 over-expression induced CE defects can be rescued by Pk over-expression (Fig. 1e and f), whereas Vangl/ Pk co-injection induced severe CE defects can be reciprocally rescued by Dvl2 over-expression (Fig. 1g). Likewise, we showed that Fz2/ Dvl2 co-injection induced CE defects can be rescued by wild-type Vangl2 but not the Vangl2 RH mutant (Suppl. Fig. 6b), and Ror2 can rescue Vangl2 overexpression induced CE defects (Suppl. Fig. 14). Collectively, these functional interaction data consistently demonstrate an antagonism between Dvl/ Fz/ Ror2 and Vangl2/ Pk, which is correlated with our imaging and biochemical studies.

      As you can see from the reviews, the referees generally agree that your paper is a potentially valuable contribution to the field. Your observations are important because of the novel model based on the inhibitory feedback regulation between planar cell polarity (PCP) protein complexes. However, the reviewers also stated that the model is only partly supported by data because of insufficient clarity and missing controls in several experiments supporting the proposed model. The paper would be significantly improved if your conclusions are backed up by additional experimentation. Specifically, the referees wanted to see the reproducibility of the results shown in Figures 3, 4, 8, S3, S7, S12. 

      We hope that you are able to revise the paper along the lines suggested by the referees to increase the impact of your study on the current understanding of PCP signaling mechanisms. 

      We thank the reviewers for their careful reading of our manuscript and for their constructive critiques and suggestions. We have repeated the animal cap studies in original Figures 3, 4, 8 and S3 with DMZ explants, and the new data are in Supplementary Fig. 9, 11, 16 and 4, respectively. We also repeated the biochemical studies in original Figures S7 and S12, and the new data are in Supplementary Fig. 8 and 15.

      Reviewer #1 (Recommendations For The Authors):

      Major points: (1) The author conducted an analysis of the subcellular localization of PCP core proteins, including Vangl2, Pk, Fz, and Dvl, within animal cap explants (ectodermal explants). To validate the model proposing that 'non-canonical Wnt induces Dvl to transition from Vangl to Fz, while PK inhibits this transition, and they function synergistically with Vangl to suppress Dvl during Convergent Extension (CE),' it is crucial to assess the subcellular localization of PCP core proteins in dorsal marginal zone (DMZ) cells, which are known to undergo CE. Notably, the overexpression of Wnt11 alone, as employed by the author, does not induce animal cap elongation. Therefore, the use of animal cap explants may not be sufficient to substantiate the model during Convergent Extension (CE). Indeed, previous knowledge indicates that Vangl2 and Pk localize to the anterior region in DMZ explants. However, the results presented in this manuscript appear to differ from this established understanding. Consequently, to provide more robust support for the proposed model, it is advisable to replicate the key experiments (Figures 3, 4, 8, and Figure S3) using DMZ explants.

      We repeated the experiments in Figures 3, 4, 8 and Figure S3 with DMZ explants, and the new data are in new Supplementary Fig. 9, 11, 16 and 4, respectively. With regard to “previous knowledge indicates that Vangl2 and Pk localize to the anterior region in DMZ explants”, we are aware of Vangl/ Pk localization to the anterior cell cortex in neural epithelium from the studies by the Sokol and Wallingford labs, but are not aware of similar reports in DMZ explants. When we examined the localization of a small amount of injected EGFP-mPk2 (0.1 ng mRNA) in DMZ explants, we saw a somewhat uniform distribution on the plasma membrane (Suppl. Fig. 4). In addition, in a related recent publication, we examined endogenous XVangl2 protein localization in activin-induced animal cap explants that do undergo CE. What we observed was that whereas low-level injected Dvl2 and Fz form clusters on the plasma membrane, endogenous XVangl2 remains uniformly distributed on the plasma membrane (Suppl. Fig. 3S-Z in 10.1038/s41467-025-57658-0). These observations may suggest potential differences in PCP protein localization during neural vs. mesodermal convergence and extension.

      (2) The author suggests that 'Vangl2 and Pk together synergistically disrupt Fz7-Dvl2 patches.' As shown in Figure 4 (panels J' to I'), it is evident that the co-expression of Pk and Vangl2 increases Fz7 endocytosis. Nevertheless, a significant amount of Fz7 still co-localizes with Dvl2. To strengthen the author's hypothesis, additional clear assay is required such as Fluorescence resonance energy transfer (FRET) assay. 

      We appreciate this valuable advice. Since none of the tagged Fz/ Dvl/ Vangl proteins we had were suitable for FRET, we made proteins tagged with mClover and mRuby2, which were reported as an optimized FRET pair. But in our hands mRuby2 seems to require a very long time (~2 days) to mature and become detectable at room temperature, and is not suitable for our Xenopus experiments. We are in the process of establishing a luciferase-based NanoBiT system to detect Fz-Dvl and Dvl-Vangl interactions in live cells and cell lysates, and will use it in future studies to investigate their interaction dynamics.

      For the current manuscript, we reason that a substantial reduction of Fz7-Dvl2 clusters with Vangl2/ Pk co-injection would still support our idea that Vangl2 and Pk act synergistically to sequester Dvl from Fz to prevent their clustering in response to non-canonical Wnt ligands.

      (3) The IP data is less clear and evident. A couple of examples are: a) Fig 2g where the authors report that the Vangl2 R177H variant reduced Vangl2 interaction with Pk and recruitment of Pk to the plasma membrane, but it appears that the variant interacts slightly better than WT Vangl2 with Pk. In Fig. S7a, the authors state that Pk overexpression can indeed significantly reduce Wnt11-induced dissociation of EGFP-Vangl2 and Flag-Dvl2 in the DMZ. However, there is a minimal impact when compared to the Wnt11 absent control. Based on the results presented in Fig S12a the authors indicate that Wnt11 reduces the association between Vangl2 and Dvl2, which can be discerned, but loss of Ror2 does not change this in any obvious way - but the authors indicate it does. In S12b, the authors have suggested that Ror and Dvl do not form a direct binding interaction. However, the interpretation of Figure S12b is not entirely convincing due to several issues. Notably, the expression levels of each protein appear inconsistent, the bands are not sufficiently clear, and there is the detection of three different tag proteins on a single blot. To strengthen the validity of these findings, it is advisable to repeat this experiment with improved quality. 

      We repeated all the co-IP and western blot analyses pointed out by the reviewer, and performed quantification and statistical analyses.

      Fig 2g had a mistake in the labeling and is replaced with new Figure 2g;

      Fig. S7a is replaced by new data in Supplementary Figure 8a and b;

      Fig. S12a and 12b are replaced by new data in Supplementary Figure 15a, a’ and b, respectively. In 15a and a’, we noticed a consistent decrease of Dvl2-Vangl2 co-IP in Xror2 morphant. The reason for this is not yet clear and will need further study in the future.

      Minor points: (1) In all the whole embryo injection assays examining morphology, no Western analysis is performed to show roughly equivalent and appropriate levels of the various proteins are being expressed. Differences will affect the data. 

      Although we did not do western analyses to examine the protein levels in the various functional interaction assays, we did examine how co-expression of Vangl2, mPk2 or Dvl2 may impact each other’s protein levels in Supplementary Fig. 2, which did not reveal any significant change when they were co-injected in different combinations.

      (2) The author's prior publication (Bimodal regulation of Dishevelled function by Vangl2 during morphogenesis, Hum Mol Genet. 2017) presented clear evidence of Vangl2 overexpression inducing Dvl2 membrane localization. However, Figure S4 in the current manuscript did not provide clear evidence of membrane localization. To strengthen the hypothesis that Vangl2-RH mutant also induces Dvl2 membrane localization, further comprehensive imaging analysis is needed. 

      We re-analyzed the imaging data and replaced old Figure S4 with a new Supplementary Fig. 5.

      (3) In Supplementary Figure 9, the authors propose that the overexpression of Vangl2/Pk induces Fz7 endocytosis, as indicated by its co-localization with FM4-64. However, it raises a question: how does the Fz7-GFP protein internalize into the cells without endocytosis, as seen in Figures S9a-c'? To enhance readers' understanding, a discussion addressing this point should be included. 

      We think that this might be a technical issue. As detailed in the Methods section, we only incubated the embryos transiently with FM4-64 for 30 minutes, and the embryos were subsequently washed and dissected in 0.1X MMR without the dye. Therefore, only the Fz7-GFP protein endocytosed during the 30-minute incubation would be labeled by FM4-64, whereas that endocytosed before or after the incubation would not. Alternatively, the very few Fz7-GFP puncta occasionally observed in the absence of Vangl2/Pk overexpression could be vesicles trafficking to the plasma membrane.

      (4) Statistical analyses are absent for several results, including those in Figure 2f, Figure S4d, and Figure S7b. 

      We repeated these experiments and included statistical analyses. The new data are in Figure 2f, Supplementary Fig. 5d and Supplementary Fig. 8b.

      (5) This manuscript lacks any results regarding Ck1. Therefore, it is advisable to consider removing the discussion or mention of CK1. 

      We agree, and have toned down the discussion of CK1 and removed CK1 from our model in Fig. 9.

      Reviewer #2 (Recommendations For The Authors):

      (1) In all the convergence and extension assays, the authors should report n numbers (i.e. number of animals), what statistical test is used, and what the error bars show. Ideally dot-plots would be used instead of bar charts as they give a better insight into the data distribution. It might be useful to give a section on the statistical analyses used in the M&M, including e.g. any power calculations carried out, as now required by many journals. 

      We have followed the advice to use dot-plots for all the quantification analyses in the manuscript. We include in the figure legends the statistical test used and what the error bars show. The number of embryos analyzed is included in each panel in the figures. We also provide more details in the Methods section on how the LWR quantification was carried out.

      (2) I think Figure 2g is wrongly labelled? FLAG bands are in all three lanes in the western blot, but not labelled as such in the schematic. 

      We corrected the schematic labeling in Figure 2g, and thank the reviewer for catching this mistake.

      (3) In Figure S7, the authors show that co-IP of Dvl and Vangl2 is reduced by Wnt11 and the effects of Wnt are blocked by Pk. Does Pk have any effect in the absence of Wnt? 

      We examined the effect of Pk over-expression on Dvl2-Vangl2 co-IP as advised, and did not see a significant impact in the absence of Wnt11 co-injection. The data is included in the new Supplementary Figure 8a. We interpret the data to suggest that “at least under the condition of our co-IP experiment, Pk may not directly impact the steady-state binding between Vangl and Dvl”.

      (4) In Figure 3, the authors show (as published previously) that Wnt11 induces patches of Dvl at the plasma membrane. It would be useful to see Dvl in the absence of Wnt and Vangl2/Dvl in the absence of Wnt. 

      Dvl is widely known as a cytoplasmic protein and its localization has been published by many labs over the past 20-30 years. In our recent publication (10.1038/s41467-025-57658-0 ), we also re-examined Dvl localization when injected at various dosages. So we did not feel it was necessary to show its localization in the absence of Wnt11 again, but included a reference to our prior publication. In regards to Vangl/Dvl distribution in the absence of Wnt11, the readers can see Suppl. Fig. 5b as an example, in addition to our previous publications referenced in the manuscript.

      (5) In the review figures, the difference in Fz7-GFP patch formation in d' and e' (vs e.g. a') is not very clear. Could the images be improved or (better) quantified in some way? 

      We assume that “review figures” refer to Figure 3 or 4? If so, we felt that Fz7-GFP patch formation was clear in Fig. 3d’, e’ or Fig. 4d’, e’. Nevertheless, we repeated these experiments in DMZ explants as advised by Reviewer 1, and additional examples of Fz7-EGFP patch formation can be seen in the new Suppl. Fig. 9d-f’ and Suppl. Fig. 11d-f’.

      (6) In Figure 6d, I'm concerned that the loss of flag-Dvl2 might occur via dephosphorylation in the IP reaction. Also the M&M don't include methodological details about buffers and whether phosphatase inhibitors were used. A compelling control would be anti-FLAG pulldown showing retention of phosphorylation. Also Figure 6f shows a reduced ratio of fast-to-slow migrating bands of Dvl with Vangl2/Pk - unless I have misunderstood, is this ratio the wrong way round? 

      We added co-IP buffer and protease inhibitor information in Methods.

      We agree that the concern about dephosphorylation during the IP reaction is valid, and that direct pull down of Dvl to show the phosphorylated form is a compelling control. We therefore note that in Suppl. Fig. 8a and 15b, direct pull down of Flag-Dvl or Myc-Dvl (with anti-Flag or anti-Myc) did show the slower migrating, phosphorylated form. Additional examples in which Vangl co-IPs only the faster migrating, unphosphorylated Dvl include Suppl. Fig. 15a and a related paper we published recently (Fig. 3R and R’ in 10.1038/s41467-025-57658-0).

      Finally, we did wrongly label Figure 6f in the last submission, and the ratio should have been “slow/fast”. We have made the correction, and thank the reviewer for the meticulous reading of our manuscript.

      (7) In Figure 7, what does Ror2 look like in the absence of Wnt11? 

      We included new Figure 7a-c to show that without Wnt11 co-injection, Ror2 is uniformly distributed on the plasma membrane.

      (8) Also in Figure 7, Ror2 patches are said to be slightly wider than Dvl2 patches "reminiscent of Vangl2" - I wouldn't describe them as being similar. Vangl2 shows a distinct dip in the center of the Dvl patches, Ror2 does not show a dip, and is only (at best) in a slightly wider patch, and I would want to see further examples to be convinced that the localization domain is reproducibly wider. The merge of many samples in 7d may actually be making the distribution harder to see and if the Xror2 and Dvl2 intensities were normalized I'm not sure how different the curves would appear. (i.e. the Xror2 curve looks like a flattened version of the Dvl2 curve). 

      We have added an additional panel in the new Figure 7j to compare the intensity ratio of Ror/ Dvl2 along the patches, and this analysis reveals a more than two-fold increase of the ratio at the border region. This quantification may make a more convincing argument that at the patch border region, Dvl is diminished whereas Ror2 accumulates with Vangl2.
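
      A ratio profile of this kind can be sketched as follows. This is a minimal illustration and not the authors' analysis pipeline: it assumes two intensity line scans of equal length taken along the same patch, an optional constant background, and a fixed number of positions counted as "border" at each end. The function names and all numbers are made up for illustration.

```python
def ratio_profile(ror, dvl, background=0.0):
    """Point-by-point Ror/Dvl intensity ratio along a line scan.

    ror, dvl: intensity values sampled at the same positions along a patch.
    background: constant subtracted from both channels (hypothetical value).
    """
    ratios = []
    for r, d in zip(ror, dvl):
        d_corr = d - background
        if d_corr <= 0:
            raise ValueError("Dvl signal must exceed background at every position")
        ratios.append((r - background) / d_corr)
    return ratios


def border_fold_change(ratios, border_width):
    """Mean ratio in the two border regions divided by the mean ratio in the centre."""
    border = ratios[:border_width] + ratios[-border_width:]
    centre = ratios[border_width:-border_width]
    return (sum(border) / len(border)) / (sum(centre) / len(centre))


# Invented line scans: Dvl falls off faster than Ror towards the patch edges.
ror_scan = [30.0, 40.0, 50.0, 50.0, 40.0, 30.0]
dvl_scan = [10.0, 20.0, 50.0, 50.0, 20.0, 10.0]
ratios = ratio_profile(ror_scan, dvl_scan)
fold = border_fold_change(ratios, border_width=2)
```

      Because the ratio is taken at each position, it does not depend on how the two channels are scaled relative to each other, which addresses the concern that one curve might simply be a flattened version of the other.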

      (9) In Figure S12a, the authors suggest Wnt11 induced dissociation of Dvl from Vangl2 (by co-IP), and this is reduced after Ror2 MO. This would be more convincing with replicates and quantitation. 

      We have repeated this experiment with Vangl2 pull down and added quantification. The data is in the new Suppl. Fig. 15a.

      (10) In Figure S12b, the authors suggest Ror2 can co-IP Vangl2 but not Dvl. This is not very convincing, as the Dvl input band is very weak, and the Vangl2 co-IP band is very weak. 

      We repeated the co-IP experiment with Myc-tagged Vangl or Dvl. Using the same anti-Myc antibody and experimental condition (including the expression level of Vangl, Dvl and Ror2), we still found that Ror2 could be pulled down by Vangl but not Dvl (Suppl. Fig. 15b).

      (11) "Prickle" spelled "Prickel" in the abstract (and abbreviated to "PK" not "Pk" at one place in the abstract and several places in text) 

      We have corrected these typos.

      (12) Quite a lot of interesting observations are in supplemental figures. Normally it might be expected that extra data supporting a conclusion would be in supplemental, but here some of the supplemental data feels like it is more than simply additional evidence. For instance supplemental Figures 2 and 3 feel more than just supplemental (and Supplemental Figure 3 if merged with Figure 2 would make it easier for the reader). Moreover, for example, the description of the results in Figure 2 is punctuated by references to supplemental Figures 4 and 5 that contain key data to support the conclusions, which means the reader has to flick backwards and forwards from place to place in the manuscript to follow the argument. It is of course up to the authors, but in some cases putting supplemental data back into the main figures (for which there is no size or number limit) would increase clarity. 

      These are excellent points; in the resubmitted manuscript we have a total of 24 data figures, and we used 8 as main figures since we felt that they provide the most relevant and conclusive evidence to our model. We will consult the copy editors at eLife on how to arrange the rest as main vs. supporting figures when requesting publication as version of record.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      We thank the reviewers for their thoughtful comments and overall very supportive feedback.

      Reviewer #1 writes: "The study is very thorough and the experiments contain the appropriate controls. (...) The findings of the study can have relevance for human conditions involving disrupted mitochondrial dynamics, caused for example by mutations in mitofusins." Reviewer #2 writes: "The dataset is rich and the time-resolved approach strong." Reviewer #3 writes: "I admire the philosophy of the research, acknowledging an attempt to control for the many possible confounding influences. (...) This is a powerful and thoughtful study that provides a collection of new mechanistic insights into the link between physical and genetic properties of mitochondria in yeast."

      We address all points below. We have not yet updated our text and figures since we expect substantial additions from new experiments. But we have included Figure R1 with some additional analyses of existing data at the bottom of the manuscript.

      Reviewer1

      1.1 Statistical comparisons are missing throughout the manuscript (with the exception of Fig. 2c). Appropriate statistical tests, along with p-values, should be used and reported where different groups are compared, for example (but not limited to) Fig. 3d and most panels of Fig. 4.

      We initially decided not to add too many extra labels to the already very busy plots, given that the magnitude of change mostly speaks for itself. However, we will try to find meaningful statistical tests together with a sensible graphical representation for all of the figures. For one example see Figure R1A.

      1.2. I do not agree with the use of Atp6 protein as a direct read-out of mtDNA content. While Atp6 protein levels will decrease with decreasing mtDNA content, the inverse is not necessarily true: decreased Atp6 protein levels do not necessarily indicate decreased mtDNA levels, because they could alternatively or additionally be caused by decreased transcription and/or translation. Therefore, please do not equate Atp6 protein levels to mtDNA levels, and instead rephrase the text referencing the Atp6 experiments in the Results and Discussion sections to measure "mtDNA expression" or "mt-encoded protein" or similar. For example, on p. 14 line 431 should read "mtDNA expression" rather than "decreased synthesis of mtDNA", and line 440 on the same page "mean mtDNA levels" should be "mtDNA expression" or similar.

      All three reviewers agree that using Atp6-NG as a direct proxy for mtDNA requires more validation, or at least rephrasing of the text. We agree that this is the most important point to address. We had previously tried using the mtDNA LacO array (Osman et al. 2015) to directly assess the number of nucleoids per cell. However, the altered mitochondrial morphology of the Fzo1-depleted cells, combined with the LacI-GFP, which is still in mitochondria even when mtDNA is gone, increases the noise level to a point where we cannot interpret the signal. However, as this manuscript was in the submission process, the Schmoller lab (co-authors #2 and #7) adapted the HI-NESS system to label mtDNA in live yeast cells (Deng et al. 2025). This system promises a much better signal-to-noise ratio, and we expect we can address all concerns regarding the actual count of nucleoids per cell. Should this unexpectedly fail for technical reasons, we will try to calibrate the Atp6 levels with DAPI staining at defined time points and will rephrase the text as the reviewer suggests.

      1.3. In Fig. 3, the authors use the fluorescence intensity of a mitochondrially-targeted mCardinal as a read-out of mitochondrial mass. Please provide evidence that this is not affected by MMP, either with relevant references or by control experiments (e.g. comparing it to N-acridine orange or other MMP-independent dyes or methods).

      Whether or not the import of any mitochondrial protein is dependent on the MMP depends largely on the signal sequence. The preSu9 signal sequence was previously characterized as largely independent of the MMP compared to other presequences (Martin, Mahlke, and Pfanner 1991), which is why Vowinckel (Vowinckel et al. 2015) and others (Di Bartolomeo et al. 2020; Perić et al. 2016; Ebert et al. 2025) have previously used this as a neutral reference to the strongly MMP-dependent pre-Cox4 signal to estimate MMP. As one control in our own data, we consider that the population-averaged mitochondrial fluorescent signal (Figure S3C) stays constant in the first few hours, in agreement with the total averaged mitochondrial proteome (Fig R1E). As additional controls, we plan to compare the signal to an MMP-independent dye as the reviewer suggests.

      1.4. In Fig. 2e-f, the authors use a promoter reporter with Neongreen to answer whether the reduced levels of the nuclear-encoded mitochondrial proteins Mrps5 and Qcr7 are due to decreased expression or to protein degradation, and find no evidence of degradation of the Neongreen reporter protein. However, subcellular localization might affect the availability of the protein to proteases. Although not absolutely required, it would be relevant to know if the Neongreen fusion protein is found in the same subcellular compartment as Mrps5 and Qcr7 at 0h and 9h after Fzo1 depletion.

      Here, it seems we need to explain the set-up and interpretation of the data better. The key point we are trying to make with the promoter-Neongreen construct is that the regulation is not mainly at the level of transcription. We are showing that the reduction in the levels of the actual protein (orange bars) is not (mainly) explained by a reduction in expression, since the promoter is similarly active at 0 and at 9 hours (grey bars). If expression from the promoter were strongly reduced, the Neongreen would be diluted with growth and would also decrease, but this is not the case. The fluorophore itself is just floating around in the cytosol and is not subject to the same post-translational regulation as Mrps5 and Qcr7, so there is no reason to expect degradation.
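
      The dilution argument can be made concrete with a minimal model. The sketch below assumes a stable reporter that is lost only through dilution by growth, a constant growth rate, and a step change in promoter output at t = 0; the doubling time used in the example is a hypothetical value, not one taken from the manuscript. Under these assumptions, dC/dt = f·k − μ·C with C(0) = k/μ, which gives C(t)/C(0) = f + (1 − f)·exp(−μ·t), where f is the fraction of promoter activity that remains.

```python
import math


def reporter_level(t_hours, doubling_time_h, synthesis_fraction):
    """Reporter concentration relative to its level at t = 0.

    t_hours: time since promoter activity changed.
    doubling_time_h: cell doubling time in hours (hypothetical input).
    synthesis_fraction: promoter activity after t = 0 as a fraction of
        the activity before (1.0 = unchanged, 0.0 = switched off).
    """
    mu = math.log(2) / doubling_time_h  # dilution rate from growth
    return synthesis_fraction + (1 - synthesis_fraction) * math.exp(-mu * t_hours)


# After 9 h with an assumed 1.5 h doubling time (six doublings):
unchanged = reporter_level(9, 1.5, 1.0)     # promoter equally active
switched_off = reporter_level(9, 1.5, 0.0)  # promoter fully off
```

      With these assumed numbers, a promoter that shuts off completely would leave under 2% of the reporter after 9 hours, whereas an unchanged promoter keeps it constant. If the depleted cells grow more slowly, the dilution would be correspondingly weaker, so the actual growth rate matters for how sensitive this readout is.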

      1.5. Fzo1 depletion leads to a very rapid drop in MMP during the first hour of depletion. In the Discussion, can the authors speculate on the possible mechanism of this rapid MMP drop that occurs well before mtDNA or mt-encoded proteins are decreased in level?

      This is indeed an interesting point. We think there are likely three reasons causing this initial drop: Firstly, due to the fragmentation the mixing of mitochondrial content is disturbed and smaller fragments may have suboptimal stoichiometry of components (see also (Khan et al. 2024) who look at this in detail including the Fzo1 deletion); secondly, already fairly early, some mitochondrial fragments may not contain any mtDNA and therefore will be unable to synthesize ETC proteins; thirdly, altered morphological features like changes in the surface-to-volume ratios may play a role. Sadly, mechanistically following up on this is not possible with the tools in our hands and therefore outside of the scope of this manuscript. But we are happy to include these speculations in our discussion.

      1.6. In Fig. 2a, the mtDNA copy number of Fzo1-depleted cells is ca 1.3-fold of the control cells at the 0h timepoint. Why might this be? Is it an impact of one of the inducers? If so, we might be looking at the combination of two different processes when measuring copy number: one that is an induction caused by the inducer(s), and the other a consequence of Fzo1 depletion itself.

      We believe that this 30% increase is within the noise of the experiment rather than an effect of the induction. Since we normalize to t=0 uninduced, the first black data point does not have error bars, emphasizing this difference. None of the protein data suggests that there is an increase in mtDNA encoded proteins (see e.g. 2B, or Atp6 fluorescence data). In the planned HI-NESS experiment, we will see in our single cell data whether there is an actual increase in mtDNA upon TIR induction. Additionally, we will run a qPCR to carefully determine mtDNA levels of untreated wild-type cells, tetracycline treated wild-type cells and tetracycline induced TIR expressing cells to exclude effects of tetracycline as well as the expression of TIR on mtDNA.

      Minor comments:

      1.7. p. 3, line 71: "ten thousands of dividing cells.." should be "tens of thousands of dividing cells".

      Thank you, will correct.

      1.8. p. 4, line 116: please be even more clear with what the "depleted" cells and controls are treated with: are depleted cells treated with both inducers, and controls with neither?

      We will make this more clear. Depleted cells are treated with both inducers; the control cells are treated with neither. In addition, Figure 1A and Figure S1 include controls showing that inducing TIR per se or adding aTC per se does not change growth rate or mitochondrial morphology.

      1.9. p. 5, lines 147-148: the authors write "the rate with which the abundance of Cox2 and Var1 proteins decreases was similar to the rate of mtDNA loss" though the actual rate is not shown. Please calculate and show rates for these processes side by side to make comparison possible, or alternatively rephrase the statement.

      Indeed this was not phrased well. We will call it dynamics rather than rates.

      1.10. Fig. 2d: changing the y-axis numbering to match those in panels a and b would facilitate comparisons.

      Makes sense, we will change this.

      1.11. Fig. 2e: it is recommended to label the western blot panels to indicate what protein is being imaged in each (Neongreen, Mrps5, Qcr7).

      We will adapt the labelling to make it more clear.

      1.12. p. 9, line 262: I suggest referencing Fig. 4e at the end of the first sentence for clarity.

      We will modify the sentence as suggested.

      1.13. In the sections related to Fig. 3a and Fig. 5a as well as the connected supplemental data, the authors discuss both the median and the mean of mitochondrial mass and Atp6 protein, respectively. For purposes of clarity, I suggest decreasing the focus on the mean (that is provided only in the supplemental data) and focusing the text mainly on the median. The two show differing trends and it is very good that both are shown, but the clarity of the text can be improved by focusing more on the median where possible.

      We will check the phrasing and simplify.

      1.14. p. 14, line 435: the statement that mt mass is maintained over the first 9h of depletion is only true for the mean mt mass, not for the median. Please make this clear or rephrase.

      We will check the phrasing, make it more clear, and also point to the extended proteomics data (see Fig R1), which correspond to the mean of the populations.

      1.15. p. 14, line 452: "mitofusions" should be "mitofusins".

      Thanks for catching this.

      Reviewer 2:

      2.1. While inducible TIR is used to reduce background, the manuscript should rigorously exclude auxin/TIR off-targets (growth, mitochondrial phenotypes, gene expression). Please include full matched controls: ±auxin, ±TIR, epitope tag alone, and a degron control on an unrelated mitochondrial membrane protein.

      We agree that rigorous controls are crucial for the interpretation of the results. However, we think we have already included most of the controls the reviewer is asking for, but we might not have pointed this out clearly enough. For example, in Fig 1A, we could add labels indicating to which samples we added aTC, which is currently only described in the figure legend.

      Here is a list of all the controls:

      • Each depletion experiment is always matched with an experiment of the same strain without induction. So the genetic background as well as effects such as light exposure, time spent in the microfluidics systems, etc. are controlled for.
      • Figure S1D shows that the growth rate is wild-type-like in a strain containing either the AID tag or the TIR protein AND upon addition of both chemicals. It also shows that the final genetic background (AID-tag and TIR) also grows like wild type if the inducers are not added. This conclusively shows that neither the tags/constructs nor the chemicals per se affect growth rate.
      • In Figure S1C we show the mitochondrial morphology of the same controls. We will make sure to label them more consistently to match panel D, and include an actual wild type and a FLAG-AID-Fzo1 strain without TIR treated with both aTC and 5-Ph-IAA as direct comparison.
      • In Figure 1A we compare the Fzo1 protein levels of a strain with and without TIR. We show that in the absence of TIR, adding either aTC or auxin does not change Fzo1 levels and that the levels are comparable in the strain that is able to deplete Fzo1 directly before addition of 5-Ph-IAA (after 2 h of induction of TIR through addition of tetracycline).
      • Additionally, in Figure S2C we show that two hours after adding aTC, the entire proteome does not change significantly apart from a strong induction of TIR. We can also make this more clear in the figure legend.
      • Additionally, we will run a qPCR to carefully determine mtDNA levels of untreated wild-type cells, tetracycline treated wild-type cells and tetracycline induced TIR expressing cells to exclude effects of tetracycline as well as the expression of TIR on mtDNA. (also in response to 1.6.)

      In summary, we think we have controlled sufficiently for all confounding parameters and most importantly showed that addition of either aTC or Auxin as well as the FLAG-AID tag per se does not disturb mitochondria or cell growth. We do not see what a degron control on an unrelated protein will tell us. Depending on the nature of the protein, it may or may not have a phenotype that may or may not be related to morphology changes etc.

      2.2. The Mitoloc preSu9 vs Cox4 import ratio is only a proxy of mitochondrial membrane potential (ΔΨm) and itself depends on mitochondrial mass, protein expression, matrix ATP, and import saturation. The authors need to calibrate ΔΨm with orthogonal dyes (TMRE/TMRM) and pharmacologic titrations (FCCP/antimycin/oligomycin) to generate a response curve; show that Mitoloc tracks dye-based ΔΨm across the relevant range and corrects for mass/photobleaching. Report single-cell ΔΨm vs mass residuals.

      We completely agree that the MitoLoc system is only a rough proxy for the actual membrane potential. That is why we make no quantitative claims on the absolute value or absolute difference between groups of cells. We also make very clear in Fig 3B what we are actually measuring and can emphasize again in the text that this is only a proxy. We agree that it is a good idea to compare MitoLoc values to TMRE staining as the reviewer suggests; we will do these experiments in depleted and control cells at different timepoints. Please note, though, that dye staining also has its caveats, especially in dynamic live cell experiments. TMRM, for example, is not compatible with the acidic pH 5 medium that is typically used for yeast, and subjecting cells to washing steps and higher pH may change both the morphology of mitochondria and the MMP, especially in cells that are already “stressed”. We prefer not to carry out elaborate pharmacological titration experiments because, firstly, this was done extensively in the original MitoLoc paper by the Ralser lab (Vowinckel et al. 2015; cited 120 times) and, secondly, the value of the MMP is not the most critical claim of the manuscript. See also 3.12. Please note that in Figure S4D we had already plotted MMP vs mitochondrial concentration.

      2.3. To use Atp6-mNeon as a proxy for mtDNA is an assumption. Interpreting Atp6 intensity as "functional mtDNA" could be confounded by translation, turnover, or assembly. Please (i) report mtDNA copy number time courses (you have qPCR), nucleoid counts (DAPI/PicoGreen or TFAM/Abf2 tagging), and (ii) assess translation (e.g., 35S-labeling or puromycin proxies) and turnover (proteasome/AAA protease inhibition, mitophagy mutants -some data are alluded to- plus mRNA levels for mtDNA-encoded genes). This will support the "reduced synthesis" versus "increased degradation" conclusion.

      We agree with all three reviewers that Atp6 is only a proxy for mtDNA (Jakubke et al. 2021; Roussou et al. 2024) and the correlation should be checked more carefully. We will use the very recently established HI-NESS system to follow nucleoids/mtDNA during depletion experiments. See detailed reply to 1.2.

      (ii) In Figure 2C we inhibit mitochondrial translation and show that in this case control and depleted cells have the same level of Cox2, at least suggesting that degradation is not the key mechanism controlling the levels of mtDNA encoded proteins. We cannot do proteasome inhibitor assays since the nature of the AID-TIR system requires an active proteasome. In Figure S5C we show that the Atp6 depletion is similar in an atg32 deletion. This does not completely exclude a contribution of mitophagy to the observed phenotype, but does confirm that mitophagy is not the primary reason for cells becoming petite.

      2.4. The promoter-NeonGreen reporters argue against transcriptional down-regulation of nuclear OXPHOS. Please add mRNA (RT-qPCR/RNA-seq) for representative genes and a pulse-chase or degradation-pathway dependency (e.g., proteasome/mitophagy/autophagy mutants) to firmly assign active degradation. The authors need to normalize proteomics to mitochondrial mass (e.g., citrate synthase/porin) to separate organelle abundance from protein turnover.

      While we are happy to perform qPCR experiments for selected genes, a full RNA-seq experiment seems outside the scope of this study. As explained above, a proteasome inhibitor experiment is not possible in this set-up. Bulk mitophagy/autophagy seems unlikely to be the cause of the decrease of the nuclear-encoded OXPHOS proteins, since most other mitochondrial proteins do not decrease on average at the population level in the first hours. These data are now plotted as an additional figure (see below) and will be included in the supplementary material of the revised manuscript (Fig R1E).

      2.5. Using preSu9-mCardinal intensity as "mitochondrial concentration" is sensitive to expression, import competence, and morphology/segmentation. The authors should provide validation that this metric tracks 3D volume across fragmentation states (e.g., correlation with mito-GFP volumetrics; detergent-free CS activity; TOMM20/Por1 immunoblot per cell).

      We agree that this is an important point, and the co-authors discussed it quite intensively. In Figure S3A and B we show (using confocal data) that there is a very strong correlation between the total fluorescence signal and the 3D volume reconstruction. However, the slope of the correlation is different between tubular and fragmented mitochondria (compare panels A and B; see also the figure legend). Since we are dealing with diffraction-limited objects, it is likely that the 3D reconstruction is sensitive to morphology, especially if mitochondria are “clumping”. We therefore think that the total fluorescence signal is actually a better estimate of mitochondrial mass per cell than the 3D volume reconstruction (especially for our data obtained with a conventional epifluorescence microscope). The mean of the total mitochondrial fluorescence also better matches the population average mitochondrial proteome (Fig R1E). To consolidate this assumption, we will additionally compare our data to a strain with Tom70-Neongreen and to MMP-independent dyes.

      Notably, since the morphology is similarly altered in mothers and buds this is of minor impact for our main point – the unequal distribution between mother and buds.

      2.6. The unequal mother-daughter distribution is compelling, but causality remains inferred. Test whether modulating inheritance machinery (actin cables/Myo2, Num1, Mmr1) or altering fission (Dnm1 inhibition) modifies segregation defects and rescues mtDNA/Atp6 decline. Complementation with Fzo1 re-expression at defined times would help order the phenotype cascade.

      We agree that rescue experiments would be very useful. We have some preliminary data for tether experiments, for example with Num1. The general problem is that the fragmented mitochondria clump together. We have not found a method to restore an equal distribution between mother and daughter cells. We will try to optimize the assay, but are not overly confident it will work. Mmr1 deletion aggravates the Fzo1 phenotype, likely also because the distribution becomes even more heterogeneous, but we have not rigorously analyzed this.

      We like the idea of the Fzo1 re-expression and will run such experiments. This will be especially powerful in combination with the new HI-NESS mtDNA reporter. We may be able to track exactly when cells reach the point of no return and become petite. This will also help connect our mathematical model more directly to the data.

      2.7. The model is useful but should include parameter sensitivity (segregation variance, synthesis slopes, initial nucleoid number) and prospective validation (e.g., predict rescue upon partial restoration of synthesis or inheritance, then test experimentally).

      We will refine our model to include the to-be-measured nucleoids/mtDNA values. We will include a parameter sensitivity analysis with the updated model.

      Reviewer 3:

      3.1. About the use of Atp6 as a good proxy for mtDNA content. This is assumed from l285 onwards, based on a previous publication. As the link is fairly central to part of the paper's arguments, and the system in this study is being perturbed in several different ways, a stronger argument or demonstration that this link remains intact (and unchanged, as it is used in comparisons) would seem important.

      We agree, see 1.2.

      3.2. About confounding variables and processes. The study does an admirable job of being transparent and attempting to control for the many different influences involved in the physical-genetic link. But some remain less clearly unpacked, including some I think could be quite important. For example, there is a lot of focus on mito concentration -- but given the phenotypes are changing the sizes of cells, do concentration changes come from volume changes, mito changes, or both? In "ruling out" mitophagy -- a potentially important (and intuitive) influence, the argument is not presented as directly as it could be and it's not completely clear that it can in fact be ruled out in this way. There are a couple of other instances which I've put in the smaller points below.

      Thank you for acknowledging our efforts to show transparent and well-controlled experiments! We address each of the specific points below.

      3.3. full genus name when it first appears

      We will add the full name.

      3.4. I may be wrong here, but I thought the petite phenotype more classically arises from mtDNA deletion mutations, not loss? The way this is phrased implies that mtDNA loss is [always] the cause. Whether I'm wrong on that point or not, the petite phenotype should be described and referenced.

      We can expand the text and cite additional relevant papers. The term “petite” refers to any strain that is respiratory incompetent and leads to small colonies (not necessarily small cells!) (Seel et al. 2023). This can be mutations or gene loss (fragments) on the mtDNA (these are called cytoplasmic petite), or chemically induced loss of mtDNA (e.g. EtBr), or mutations of nuclear genes required for respiration (these are termed nuclear petite; some nuclear petites show loss of mtDNA in addition to the mutation in the nuclear genome) (Contamine and Picard 2000).

      3.5. para starting l59 -- should mention for context that mitochondria in (healthy, wildtype) yeast are generally much more fused than in other organisms

      ok.

      3.6. Fig 1C -- very odd choice of y-axis range! either start at zero or ensure that the data fill as much vertical space of the plot as possible

      True, this was probably some formatting relic. We will adapt the axis to fill the full space. Most of our axes start at 0, but that doesn’t make so much sense here, since we consider the solidity in the control as “baseline”.

      3.7. "wild-type like more tubular mitochondria" reads rather awkwardly. "more tubular mitochondria (as in the wild-type)"?

      Thank you, sounds better.

      3.8. l106 -- imaging artefacts? are mitos fragmenting because of photo stress? -- this is mentioned in l577-8 in the Methods, but the data from the growth rate and MMP comparison isn't given -- an SI figure would be helpful here. It would be reassuring to know that mito morphology wasn't changing in response to phototoxicity too.

      In the methods we just briefly point out that we have done all our “due diligence” controls to check that we do not generate phototoxicity, something that we highlight in the cited review. We do not explicitly have a figure for this, but figure S1A shows that the solidity of the mitochondrial network in control cells stays the same over 9 hours, even though these cells are exposed to the same cultivation and imaging regime as the depleted cells. We will also add a picture of control cells after 9 h. In S1B we show that control cells containing TIR but no AID tag treated with both chemicals imaged over 9 hours also show the same solidity (~mitochondrial morphology) as untreated control. Also, the doubling times of cells grown in our imaging system (Fig R1B) are very similar to the shake flask (Fig R1A). All in all, we are very confident that our imaging settings did not impact our reported phenotypes.

      3.9. para l146 -- so this suggests mtDNA-encoded proteins have a very rapid turnover, O(hours) -- is this known/reasonable?

      Christiano et al. (2014) suggest that respiratory chain proteins are shorter lived than the average yeast protein. However, based on Figure 2C we think the dynamics mostly speak for a dilution by growth.

      3.10. section l189 -- it's hard to reason fully about these statistics of mitochondrial concentration given that the petite phenotype is fundamentally affecting overall cell volume. can we have details on the cell size distribution in parallel with these results? to put it another way -- how does mitochondrial *amount* per cell change?

      This is a good point. We report mostly on mitochondrial “concentrations” because we think this is what the cell actually cares about (mitochondrial activity in relationship to cytosolic activity). But we will include additional graphs on mitochondrial amount as well as size distributions (Fig R1C, related to Fig 4F). We can already point out that the size distribution of the population does not change much in the first hours. The “petite” phenotype refers to small colonies on growth medium with limited supply of a fermentable carbon source, not to smaller size of single cells.

      3.11. l199 the mean in Fig S3C certainly does change -- it increases, clearly relative both to control and to its initial value. rather than sweeping this under the carpet we should look in more detail to understand it (a consequence of the increased skew of the distribution)?

      This relates somewhat to the previous point. The increase in average concentration is not due to an increased amount in the population, but due to the fact that it is the small buds that get a very high amount of the mitochondria, which “exaggerates” the asymmetric/heterogeneous distribution. This will be clarified by the figures we mention in the point above.
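      The arithmetic behind this can be illustrated with a toy example. The sketch below assumes two hypothetical cells (a large mother and a small bud) with made-up amounts and volumes in arbitrary units; none of these numbers are taken from our data. It shows that the mean of per-cell concentrations can rise while the pooled amount per pooled volume stays constant.

```python
from statistics import mean


def mean_concentration(cells):
    """Average of per-cell concentrations (amount / volume for each cell)."""
    return mean(amount / volume for amount, volume in cells)


def pooled_concentration(cells):
    """Total amount divided by total volume across all cells."""
    total_amount = sum(amount for amount, _ in cells)
    total_volume = sum(volume for _, volume in cells)
    return total_amount / total_volume


# (amount, volume) pairs in arbitrary units; values are hypothetical.
even_split = [(6.0, 60.0), (2.0, 20.0)]    # mother and bud at equal concentration
skewed_split = [(2.0, 60.0), (6.0, 20.0)]  # small bud receives most of the content

for name, cells in (("even", even_split), ("skewed", skewed_split)):
    print(name, round(mean_concentration(cells), 3), round(pooled_concentration(cells), 3))
```

      In both cases the population contains the same total amount in the same total volume; only the mean of the single-cell concentrations changes, because the small bud contributes a very high concentration value.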

      3.12. para line 206 -- this doesn't make it clear whether your MMP signal is integrated over all mitochondria in the cell, or normalised by mitochondrial content? this matters quite a lot for the interpretation if the distributions of mitochondrial content are changing. reading on, this is even more important for para line 222. Reading further on, there is an equation on l612 that gives a definition, but it doesn't really clarify (apologies if I'm misunderstanding).

      For each cell, we basically calculate the relative mitochondrial enrichment of the MMP sensitive vs the MMP insensitive pre-sequence.

      So, MMP = (total mitochondrial pre-Cox4 Neongreen intensity / total mitochondrial pre-Su9 Cardinal intensity) / (total cytosolic pre-Cox4 Neongreen intensity / total cytosolic pre-Su9 Cardinal intensity).

      We calculate this value for each cell, but we do not have the optical resolution to calculate it for individual mitochondrial fragments.

      Both constructs are driven by the same strong promoter, so transcription of the fluorophore should never limit the uptake. Also, in Figure 3D we compare control and depleted cells with similar total mitochondrial concentration, so the difference must be due to a different import of the two fluorophores, see also Fig S4D. The calculated “MMP” value is of course only a crude proxy for the actual membrane potential in millivolts and we do not want to make any claims on absolute values or quantitative differences. But essentially what we are interested in is “mitochondrial health/activity” and we think the system is good at reporting this. See also 2.2.
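      As an illustration of the per-cell ratio defined above, here is a minimal sketch. The function and argument names are hypothetical and this is not our analysis pipeline; it assumes the four background-corrected total intensities per cell have already been extracted.

```python
def mmp_proxy(mito_cox4: float, mito_su9: float,
              cyto_cox4: float, cyto_su9: float) -> float:
    """Relative mitochondrial enrichment of the MMP-sensitive reporter
    (pre-Cox4 Neongreen) over the MMP-insensitive reporter
    (pre-Su9 Cardinal), normalized to the same ratio in the cytosol.
    Inputs are total intensities for one cell, in arbitrary units."""
    if min(mito_su9, cyto_cox4, cyto_su9) <= 0:
        raise ValueError("denominator intensities must be positive")
    mitochondrial_ratio = mito_cox4 / mito_su9
    cytosolic_ratio = cyto_cox4 / cyto_su9
    return mitochondrial_ratio / cytosolic_ratio


# Made-up intensities for a single cell (illustration only).
print(mmp_proxy(mito_cox4=200.0, mito_su9=100.0, cyto_cox4=50.0, cyto_su9=100.0))
```

      Because the value is a ratio of ratios, it is dimensionless and should only be compared between cells measured under the same imaging conditions; it is not a membrane potential in millivolts.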

      3.13. l230 -- a point of personal interest -- low mito concentrations are connected to low "function" (MMP) and give extended division times -- this is interestingly exactly the model needed to reproduce observations in HeLa cells (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002416). That model went on to predict several aspects of downstream cellular behaviour -- it would be very interesting to see how compatible that picture (parameterised using HeLa observations) is with yeast!

      Thank you for pointing out your interesting paper, which we will include in our discussion. Another recent preprint about fission yeast (Chacko et al. 2025) also fits into this picture. Since you were kind enough to disclose your identity, we would be happy to follow up and discuss this further with you in person.

      3.14. l239 "less mitochondria" -- a bit tricky but I'd say "fewer mitochondria" or "less mitochondrial content"

      Thanks, we will think about how to best rephrase this, probably less mitochondrial content.

      3.15. Section l234 So here (and in Fig 4) the focus is on overall distributions of mitochondrial concentration in different cells (mother-to-be, mother, bud; gen 1, gen >1). But we've just seen that one effect of fzo1 is to broaden the distribution of mitochondrial concentration across cells. Can't we look in more depth at the implications of this heterogeneity? For example in Fig 4F (which is cool) we look at the distribution of all fzo1 mothers-to-be, mothers, and buds. But this loses information about the provenance. For example, do mothers-to-be with extremely low mito concentrations just push everything to the bud, while mothers-to-be with high mito concentrations distribute things more evenly? It would seem very easy and very interesting to somehow subset the distribution of mothers-to-be by concentration and see how different subsets behave.

      This is a good point. When analyzing the data, we pretty much plotted everything against everything and then chose the graphs that we think will best guide the reader through the story-line. We can make additional supplementary plots where we show the starting concentrations/amounts of the mother in relationship to the resulting split ratio at the end of the cycle (Fig R1D).

      3.16. l285 -- experimental design -- do we know that Atp6 will continue to be a good proxy for functional mtDNA in the face of the perturbations provided by Fzo1 depletion? Especially if there is impact on the expression of mitoribosomes, the relationship between mtDNA and Atp6 may look rather different in the mutant?

      This is actually our top-priority experiment now. We will use the HI-NESS system and possibly DAPI staining to make a more direct link to mtDNA/nucleoid numbers, see 1.2.

      3.17. l290 -- ruled out mitophagy. This message could be much clearer. Comparing Fig S5C and Fig 3A side-by-side is a needlessly difficult task -- put Fig 3A into Fig S5. Then we see that when mitophagy is compromised, the distribution of mitochondrial concentration has a lower median and much lower upper quartile than in the mitophagy-equipped Fzo1 mutant? What is going on here? For a paper motivated by disentangling coupled mechanisms, this should be made clearer!

      Thanks for pointing this out. We can of course easily include the control in the corresponding figure. Compromising mitophagy is likely to generally affect mitochondrial health and turnover a little bit, independent of what is going on with Fzo1. The second line of evidence that speaks against large-scale mitophagy is the proteomics data: at the population level, the dynamics of the respiratory chain proteins are very different from those of other (nuclear encoded) mitochondrial proteins. We will add additional supplementary figures to make this more clear, see Fig R1E. Most mitochondrial proteins in the proteomics experiment stay constant in the first few hours, consistent with the imaging data showing that the mean mitochondrial content of the population does not change initially. This again highlights that it is the unequal distribution which is the problem and not massive degradation of mitochondria.

      3.18. With the Atp6 signal, how do we know that fluorescence from different cells is comparable? Buds will be smaller than mother cells for example, potentially leading to less occlusion of the fluorescent signal by other content in the cytoplasm

      This is of course a general problem that anyone faces doing quantitative fluorescence microscopy. From the technical side, we have done the best we could by taking a reasonable amount of z-slices and by choosing fluorophores that are in a range with little cellular background fluorescence (e.g. Neongreen is much better than GFP). From a practical standpoint, we are always comparing to the control, which is subject to the same technical limitations as the depleted cells and the cell sizes are very similar. So, even if we are systematically overestimating the Atp6 concentration in the bud by a few %, the difference to the control would still be qualitatively true. We therefore do not think that any of our conclusions are affected by this.

      3.19. l343 -- maintenance of mtDNA -- here the point about l285 (is the Atp6-mtDNA relationship the same in the Fzo1 mutant) is particularly important, as we're directly tying findings about the protein product to implications about the mtDNA

      We will carefully address this, see above.

      3.20. l367 -- on a first read this description of the model feels like lots of choices have been made without being fully justified. Why a log-normal distribution (when the fit to the data looks rather flawed); why the choice of 5 groups for nucleoid number (why not 3? or 8?); the process used for parameter fitting is very unclear (after reading the methods I think some of these values are read directly from the data, but the shapes of the distributions remain unexplained). l705 -- presumably the ratio was drawn from a log-normal distribution and then the corresponding nucleoid numbers were rounded to integers? the ratio itself wasn't rounded? (also l367) How were the log-normal distributions fitted to experiments (Figs. S7A,B)? Just by eye?

      We will update our model based on measured nucleoid counts and then explain more stringently the choices we make/ parameters we select.

      3.21. l711 by random selection -- just at random? ("selection" could be confusing) Overall, it feels like the model may be too complicated for what it needs to show. Either (a) the model should show qualitatively that unequal inheritance and reduced production leads to rapid loss -- which a much simpler model, probably just involving a couple of lines of algebra, could show. Or (b) the model should quantitatively reproduce the particular numerical observations from the experiments -- it's not totally clear that it does this (do the cell-cycle-based decay timescales in Fig 7 correspond to the hour-based decay timescales in other plots, for example). At the moment the model is at a (b) level of detail but it's only clear that it's reporting the (a) level of results.

      If the HI-NESS and Fzo1 re-addition experiments work as explained above, all parameters will have direct experimental data, and we should get much closer to (a).

      3.22. A lot of the discussion repeats the results; depending on editorial preferences some of this text could probably be pared back to focus on the literature connections and context.

      We will think about streamlining the discussion once some of the additional material alluded to above has been added.

      3.23. Data availability -- it looks like much of the data required to reproduce the results is not going to be made available. Images and proteomic data are promised, but the data associated with mitochondrial concentration and other features are not mentioned. For FAIR purposes all the data (including statistics from analysis of the images) should be published.

      We may not have phrased this clearly. All data will be made available. Where technically feasible, this will be directly accessible in a repository, otherwise by request to the corresponding author.

      On our OMERO server, we have deposited many TB of raw images as well as all the intermediate steps such as segmentation masks, and the CSV files with all the extracted data for each cell (including background corrections etc.). Additionally, we can include CSV files with the data grouped in the way that we used to generate all the box plots etc. As of now, the OMERO data is unfortunately only available by requesting a personal guest login from our bioinformatics facility, but we were promised that with the next technical update there will be a public link available. The proteomics data and the model are already fully accessible. The raw western blot images with corresponding Ponceau staining will be included with the final publication either as additional supplementary material or in whatever format matches the journal requirements.

      3.24. l660 -- can an overview of the EM protocol be given, to avoid having to buy the Mayer 2024 article?

      The cited paper is open access. But we can also include more details in our method section.

      References:

      Chacko, L. A., H. Nakaoka, R. Morris, W. Marshall, and V. Ananthanarayanan. 2025. 'Mitochondrial function regulates cell growth kinetics to actively maintain mitochondrial homeostasis', bioRxiv.

      Christiano, R., N. Nagaraj, F. Frohlich, and T. C. Walther. 2014. 'Global proteome turnover analyses of the Yeasts S. cerevisiae and S. pombe', Cell Rep, 9: 1959-65.

      Contamine, V., and M. Picard. 2000. 'Maintenance and integrity of the mitochondrial genome: a plethora of nuclear genes in the budding yeast', Microbiol Mol Biol Rev, 64: 281-315.

      Deng, Jingti, Lucy Swift, Mashiat Zaman, Fatemeh Shahhosseini, Abhishek Sharma, Daniela Bureik, Francesco Padovani, Alissa Benedikt, Amit Jaiswal, Craig Brideau, Savraj Grewal, Kurt M. Schmoller, Pina Colarusso, and Timothy E. Shutt. 2025. 'A novel genetic fluorescent reporter to visualize mitochondrial nucleoids', bioRxiv: 2023.10.23.563667.

      Di Bartolomeo, F., C. Malina, K. Campbell, M. Mormino, J. Fuchs, E. Vorontsov, C. M. Gustafsson, and J. Nielsen. 2020. 'Absolute yeast mitochondrial proteome quantification reveals trade-off between biosynthesis and energy generation during diauxic shift', Proc Natl Acad Sci U S A, 117: 7524-35.

      Ebert, A. C., N. L. Hepowit, T. A. Martinez, H. Vollmer, H. L. Singkhek, K. D. Frazier, S. A. Kantejeva, M. R. Patel, and J. A. MacGurn. 2025. 'Sphingolipid metabolism drives mitochondria remodeling during aging and oxidative stress', bioRxiv.

      Jakubke, C., R. Roussou, A. Maiser, C. Schug, F. Thoma, R. Bunk, D. Horl, H. Leonhardt, P. Walter, T. Klecker, and C. Osman. 2021. 'Cristae-dependent quality control of the mitochondrial genome', Sci Adv, 7: eabi8886.

      Khan, Abdul Haseeb, Xuefang Gu, Rutvik J. Patel, Prabha Chuphal, Matheus P. Viana, Aidan I. Brown, Brian M. Zid, and Tatsuhisa Tsuboi. 2024. 'Mitochondrial protein heterogeneity stems from the stochastic nature of co-translational protein targeting in cell senescence', Nature Communications, 15: 8274.

      Martin, J., K. Mahlke, and N. Pfanner. 1991. 'Role of an energized inner membrane in mitochondrial protein import. Delta psi drives the movement of presequences', J Biol Chem, 266: 18051-7.

      Osman, C., T. R. Noriega, V. Okreglak, J. C. Fung, and P. Walter. 2015. 'Integrity of the yeast mitochondrial genome, but not its distribution and inheritance, relies on mitochondrial fission and fusion', Proc Natl Acad Sci U S A, 112: E947-56.

      Perić, Matea, Peter Bou Dib, Sven Dennerlein, Marina Musa, Marina Rudan, Anita Lovrić, Andrea Nikolić, Ana Šarić, Sandra Sobočanec, Željka Mačak, Nuno Raimundo, and Anita Kriško. 2016. 'Crosstalk between cellular compartments protects against proteotoxicity and extends lifespan', Scientific Reports, 6: 28751.

      Roussou, Rodaria, Dirk Metzler, Francesco Padovani, Felix Thoma, Rebecca Schwarz, Boris Shraiman, Kurt M. Schmoller, and Christof Osman. 2024. 'Real-time assessment of mitochondrial DNA heteroplasmy dynamics at the single-cell level', The EMBO Journal, 43: 5340-59.

      Seel, A., F. Padovani, M. Mayer, A. Finster, D. Bureik, F. Thoma, C. Osman, T. Klecker, and K. M. Schmoller. 2023. 'Regulation with cell size ensures mitochondrial DNA homeostasis during cell growth', Nat Struct Mol Biol, 30: 1549-60.

      Vowinckel, J., J. Hartl, R. Butler, and M. Ralser. 2015. 'MitoLoc: A method for the simultaneous quantification of mitochondrial network morphology and membrane potential in single cells', Mitochondrion, 24: 77-86.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This article addresses the connection between perturbed mitochondrial structure and genetics in yeast. When mitochondrial fusion is compromised, what is the chain of causality -- the mechanism -- that leads to mtDNA populations becoming depleted? This is a fascinating question, linking physical cell biology to population genetics. I admire the philosophy of the research, acknowledging and attempting to control for the many possible confounding influences. The manuscript describes the context and the research tightly and digestibly; the figures illustrate the results in a clear and natural way.

      For transparency, I am Iain Johnston and I am happy for this review to be treated as public domain. To my eyes my most important shortcoming as a reviewer is my relative lack of familiarity with the yeast fzo1 mutant; while I am familiar with analysis of yeast mito morphology and mtDNA segregation, a reviewer familiar with the nuances of this strain and its culture would be a useful complement.

      I have a few more general points and a collection of smaller points below that I believe might help make the story more robust.

      General points

      1. About the use of Atp6 as a good proxy for mtDNA content. This is assumed from l285 onwards, based on a previous publication. As the link is fairly central to part of the paper's arguments, and the system in this study is being perturbed in several different ways, a stronger argument or demonstration that this link remains intact (and unchanged, as it is used in comparisons) would seem important.
      2. About confounding variables and processes. The study does an admirable job of being transparent and attempting to control for the many different influences involved in the physical-genetic link. But some remain less clearly unpacked, including some I think could be quite important. For example, there is a lot of focus on mito concentration -- but given the phenotypes are changing the sizes of cells, do concentration changes come from volume changes, mito changes, or both? In "ruling out" mitophagy -- a potentially important (and intuitive) influence, the argument is not presented as directly as it could be and it's not completely clear that it can in fact be ruled out in this way. There are a couple of other instances which I've put in the smaller points below.

      Smaller points

      l47 full genus name when it first appears

      l58 I may be wrong here, but I thought the petite phenotype more classically arises from mtDNA deletion mutations, not loss? The way this is phrased implies that mtDNA loss is [always] the cause. Whether I'm wrong on that point or not, the petite phenotype should be described and referenced.

      para starting l59 -- should mention for context that mitochondria in (healthy, wildtype) yeast are generally much more fused than in other organisms

      Fig 1C -- very odd choice of y-axis range! either start at zero or ensure that the data fill as much vertical space of the plot as possible

      l105 "wild-type like more tubular mitochondria" reads rather awkwardly. "more tubular mitochondria (as in the wild-type)"?

      l106 -- imaging artefacts? are mitos fragmenting because of photo stress? -- this is mentioned in l577-8 in the Methods, but the data from the growth rate and MMP comparison isn't given -- an SI figure would be helpful here. It would be reassuring to know that mito morphology wasn't changing in response to phototoxicity too.

      para l146 -- so this suggests mtDNA-encoded proteins have a very rapid turnover, O(hours) -- is this known/reasonable?

      section l189 -- it's hard to reason fully about these statistics of mitochondrial concentration given that the petite phenotype is fundamentally affecting overall cell volume. can we have details on the cell size distribution in parallel with these results? to put it another way -- how does mitochondrial amount per cell change?

      l199 the mean in Fig S3C certainly does change -- it increases, clearly relative both to control and to its initial value. rather than sweeping this under the carpet we should look in more detail to understand it (a consequence of the increased skew of the distribution)?

      para line 206 -- this doesn't make it clear whether your MMP signal is integrated over all mitochondria in the cell, or normalised by mitochondrial content? this matters quite a lot for the interpretation if the distributions of mitochondrial content are changing. reading on, this is even more important for para line 222. Reading further on, there is an equation on l612 that gives a definition, but it doesn't really clarify (apologies if I'm misunderstanding).

      l230 -- a point of personal interest -- low mito concentrations are connected to low "function" (MMP) and give extended division times -- this is interestingly exactly the model needed to reproduce observations in HeLa cells (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002416). That model went on to predict several aspects of downstream cellular behaviour -- it would be very interesting to see how compatible that picture (parameterised using HeLa observations) is with yeast!

      l239 "less mitochondria" -- a bit tricky but I'd say "fewer mitochondria" or "less mitochondrial content"

      Section l234 So here (and in Fig 4) the focus is on overall distributions of mitochondrial concentration in different cells (mother-to-be, mother, bud; gen 1, gen >1). But we've just seen that one effect of fzo1 is to broaden the distribution of mitochondrial concentration across cells. Can't we look in more depth at the implications of this heterogeneity? For example in Fig 4F (which is cool) we look at the distribution of all fzo1 mothers-to-be, mothers, and buds. But this loses information about the provenance. For example, do mothers-to-be with extremely low mito concentrations just push everything to the bud, while mothers-to-be with high mito concentrations distribute things more evenly? It would seem very easy and very interesting to somehow subset the distribution of mothers-to-be by concentration and see how different subsets behave.

      l285 -- experimental design -- do we know that Atp6 will continue to be a good proxy for functional mtDNA in the face of the perturbations provided by Fzo1 depletion? Especially if there is impact on the expression of mitoribosomes, the relationship between mtDNA and Atp6 may look rather different in the mutant?

      l290 -- ruled out mitophagy. This message could be much clearer. Comparing Fig S5C and Fig 3A side-by-side is a needlessly difficult task -- put Fig 3A into Fig S5. Then we see that when mitophagy is compromised, the distribution of mitochondrial concentration has a lower median and much lower upper quartile than in the mitophagy-equipped Fzo1 mutant? What is going on here? For a paper motivated by disentangling coupled mechanisms, this should be made clearer!

      With the Atp6 signal, how do we know that fluorescence from different cells is comparable? Buds will be smaller than mother cells for example, potentially leading to less occlusion of the fluorescent signal by other content in the cytoplasm

      l336 -- similar to the Jajoo et al. mechanism in fission yeast -- but are you talking about feedback control of the mtDNA or the protein (or mRNA) product?

      l343 -- maintenance of mtDNA -- here the point about l285 (is the Atp6-mtDNA relationship the same in the Fzo1 mutant) is particularly important, as we're directly tying findings about the protein product to implications about the mtDNA

      l367 -- on a first read this description of the model feels like lots of choices have been made without being fully justified. Why a log-normal distribution (when the fit to the data looks rather flawed); why the choice of 5 groups for nucleoid number (why not 3? or 8?); the process used for parameter fitting is very unclear (after reading the methods I think some of these values are read directly from the data, but the shapes of the distributions remain unexplained). l705 -- presumably the ratio was drawn from a log-normal distribution and then the corresponding nucleoid numbers were rounded to integers? the ratio itself wasn't rounded? (also l367) How were the log-normal distributions fitted to experiments (Figs. S7A,B)? Just by eye? l711 by random selection -- just at random? ("selection" could be confusing) Overall, it feels like the model may be too complicated for what it needs to show. Either (a) the model should show qualitatively that unequal inheritance and reduced production leads to rapid loss -- which a much simpler model, probably just involving a couple of lines of algebra, could show. Or (b) the model should quantitatively reproduce the particular numerical observations from the experiments -- it's not totally clear that it does this (do the cell-cycle-based decay timescales in Fig 7 correspond to the hour-based decay timescales in other plots, for example). At the moment the model is at a (b) level of detail but it's only clear that it's reporting the (a) level of results.

      A lot of the discussion repeats the results; depending on editorial preferences some of this text could probably be pared back to focus on the literature connections and context.

      Data availability -- it looks like much of the data required to reproduce the results is not going to be made available. Images and proteomic data are promised, but the data associated with mitochondrial concentration and other features are not mentioned. For FAIR purposes all the data (including statistics from analysis of the images) should be published.

      l660 -- can an overview of the EM protocol be given, to avoid having to buy the Mayer 2024 article?

      Significance

      This is a powerful and thoughtful study that provides a collection of new mechanistic insights into the link between physical and genetic properties of mitochondria in yeast. Cell biologists, geneticists, and the mitochondrial field will find this of potentially deep interest. Because of the mode and dynamics of inheritance in budding yeast, findings here may not be directly transferrable to other eukaryotes, but these insights are still of interest for researchers outside of yeast for their insight into how this well-studied system manages its mitochondrial populations.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This is a manuscript describing outbreaks of Pseudomonas aeruginosa ST 621 in a facility in the US using genomic data. The authors identified and analysed 254 P. aeruginosa ST 621 isolates collected from a facility from 2011 to 2020. The authors described the relatedness of the isolates across different locations, specimen types (sources), and sampling years. Two concurrently emerged subclones were identified from the 254 isolates. The authors predicted that the most recent common ancestor for the isolates can be dated back to approximately 1999 after the opening of the main building of the facility in 1996. Then the authors grouped the 254 isolates into two categories: 1) patient-to-patient; or 2) environment-to-patient using SNP thresholds and known epidemiological links. Finally, the authors described the changes in resistance gene profiles, virulence genes, cell wall biogenesis, and signaling pathway genes of the isolates over the sampling years.

      Strengths:

      The major strength of this study is the utilisation of genomic data to comprehensively describe the characteristics of a long-term Pseudomonas aeruginosa ST 621 outbreak in a facility. This fills the data gap of a clone that could be clinically important but easily missed from microbiology data alone.

      Weaknesses:

      The work would further benefit from a more detailed discussion on the limitations due to the lack of data on patient clinical information, ward movement, and swabs collected from healthcare workers to verify the transmission of Pseudomonas aeruginosa ST 621, including potential healthcare worker to patient transmission, patient-to-patient transmission, patient-to-environment transmission, and environment-to-patient transmission. For instance, the definition given in the manuscript for patient-to-patient transmission could not rule out the possibility of the existence of a shared contaminated environment. Equally, as patients were not routinely swabbed, unobserved carriers of Pseudomonas aeruginosa ST 621 could not be identified and the possibility of misclassifying the environment-to-patient transmissions could not be ruled out. Moreover, reporting of changes in rates of resistance to imipenem and cefepime could be improved by showing the exact p-values (perhaps with three decimal places) rather than dichotomising the value at 0.05. By doing so, readers could interpret the strength of the evidence of changes.

      Impact of the work:

      First, the work adds to the growing evidence implicating sinks as long-term reservoirs for important MDR pathogens, with direct infection control implications. Moreover, the work could potentially motivate investments in generating and integrating genomic data into routine surveillance. The comprehensive descriptions of the Pseudomonas aeruginosa ST 621 clones outbreak is a great example to demonstrate how genomic data can provide additional information about long-term outbreaks that otherwise could not be detected using microbiology data alone. Moreover, identifying the changes in resistance genes and virulence genes over time would not be possible without genomic data. Finally, this work provided additional evidence for the existence of long-term persistence of Pseudomonas aeruginosa ST 621 clones, which likely occur in other similar settings.

      We thank the reviewer for their thorough evaluation of our work, and for the suggested improvements. A main goal of this study was to show that integrating routine WGS in the clinic was a game changer for infection control efforts. We appreciate that this aspect was highlighted as a strength by this reviewer. While some of the weaknesses identified are inherent to the data (or lack thereof) available for this study, we have revised the manuscript to include a detailed discussion on limitations (sampling, thresholds of genetic relatedness, definition and categories etc.) that could influence the genomic inferences. We also provided exact p-values for the changes in rates of resistance, as requested. Finally, we have positively answered all the specific recommendations suggested by the reviewer and modified the manuscript accordingly.

      Reviewer #2 (Public Review):

      Summary:

      The authors present a report of a large Pseudomonas aeruginosa hospital outbreak affecting more than 80 patients with first sampling dates in 2011 that stretched over more than 10 years and was only identified through genomic surveillance in 2020. The outbreak strain was assigned to the sequence type 621, an ST that has been associated with carbapenem resistance across the globe. Ongoing transmission coincided with both increasing resistance without acquisition of carbapenemase genes as well as the convergence of mutations towards a host-adapted lifestyle.

      Strengths:

      The convincing genomic analyses indicate spread throughout the hospital since the beginning of the century and provide important benchmark findings for future comparison.

      The sampling was based on all organisms sent to the Multidrug-resistant Organism Repository and Surveillance Network across the U.S. Military Health System.

      Using sequencing data from patient and environmental samples for phylogenetic and transmission analyses as well as determining recurring mutations in outbreak isolates allows for insights into the evolution of potentially harmful pathogens with the ultimate aim of reducing their spread in hospitals.

      Weaknesses:

      The epidemiological information was limited and the sampling methodology was inconsistent, thus complicating the inference of exact transmission routes. Epidemiological data relevant to this analysis include information on the reason for sampling, patient admission and discharge data, and underlying frequency of sampling and sampling results in relation to patient turnover.

      We thank the reviewer for their thoughtful feedback on our manuscript and for highlighting the quality of the genomic analyses. We agree that the lack of patient epi data (e.g. date of admission and discharge) and the inconsistent sampling through the years are limitations of this study. We have revised the manuscript to acknowledge these limitations and discuss how not having this data complicates the inference of exact transmission routes. Finally, we have positively answered all the specific recommendations suggested by the reviewer and modified the manuscript accordingly.

      Reviewer #3 (Public Review):

      Summary:

      This paper by Stribling and colleagues sheds light on a decade-long P. aeruginosa outbreak of the high-risk lineage ST-621 in a US Military hospital. The origins of the outbreak date back to the late 90s and it was mainly caused by two distinct subclones SC1 and SC2. The data of this outbreak showed the emergence of antibiotic resistance to cephalosporin, carbapenems, and colistin over time highlighting the emerging risk of extensively resistant infections due to P. aeruginosa and the need for ongoing surveillance.

      Strengths:

      This study overall is well constructed and clearly written. Since detailed information on floor plans of the building and transfers between facilities was available, the authors were able to show that these two subclones emerged in two separate buildings of the hospital. The authors support their conclusions with prospective environmental sampling in 2021 and 2022 and link the role of persistent environmental contamination to sustaining nosocomial transmission. Information on resistance genes in repeat isolates for the same patients allowed the authors to detect the emergence of resistance within patients. The conclusions have broader implications for infection control at other facilities. In particular, the paper highlights the value of real-time surveillance and environmental sampling in slowing nosocomial transmission of P. aeruginosa.

      Weaknesses:

      My major concern is that the authors used fixed thresholds and definitions to classify the origin of an infection. As such, they were not able to give uncertainty measures around transmission routes nor quantify the relative contribution of persistent environmental contamination vs patient-to-patient transmission. The latter would allow the authors to quantify the impact of certain interventions. In addition, these results represent a specific US military facility and the transmission patterns might be specific to that facility. The study also lacked any data on antibiotic use that could have been used to relate to and discuss the temporal trends of antimicrobial resistance.

      We thank the reviewer for their evaluation of our work and for highlighting the broad implications of our findings regarding the application of real-time surveillance to suppress nosocomial transmission. We agree with the reviewer that fixed thresholds and definitions are imperfect to classify the origin of an infection. The design of this study (e.g. inconsistent sampling through time) was not conducive to provide a comprehensive/quantitative measurement of transmission routes. Thus, we decided to apply conservative thresholds of genetic relatedness and strict conditions (e.g. time between isolate collection, shared hospital location etc.) to favor specificity as our goal was simply to establish that cases of environment-to-patient transmission did happen. In the absence of a truth set, we have not performed sensitivity analysis, but we are conducting a follow-up study to compare inferences from MCMC models to our original fixed-thresholds predictions. This limitation is now discussed in the revised manuscript. Finally, we have positively answered all the specific recommendations suggested by the reviewer and modified the manuscript accordingly including the addition of Figure S3.

      Reviewer #1 (Recommendations For The Authors):

      The definitions used on lines 391-396 are necessarily somewhat arbitrary, but it would be helpful to have a little bit more justification for the choices made, particularly for the definition of environmental involving the "3x the number of years they were separated". It seems a little hard to square this with the more relaxed 10 SNP cutoff for a patient-to-patient designation. Are there reasons for thinking SNP differences associated with environmental transmission should be smaller than for patient-to-patient, or is the aim here just to set the bar higher for assuming an environmental source? Because these definitions are quite arbitrary, there could also be some value in exploring the sensitivity of the results to these assumptions.

      Thank you. We agree with the reviewers that SNP thresholds, although necessary, are arbitrary and that more discussion/justification was needed to put the genomic inferences in context. We have revised the manuscript to indicate that: 1/ the 10 SNP cutoff for a patient-to-patient designation was set to account for the known evolution rate of P. aeruginosa (inferred by BEAST at 2.987E-7 subs/site/year in this study and similar to previous estimates PMID: 24039595) and the observed within-host variability (now displayed in revised Fig. 1E). We note that this SNP distance alone was not sufficient and that an epi link (patients on the same ward at the same time) needed to be established. 2/ the environment-to-patient definition was indeed set to be most conservative (nearly identical isolates in two patients from the same ward with no known temporal overlap for > 365 days). This was indeed done to favor high specificity as this inference relied solely on clinical isolates (i.e. the identical environmental strain in the patient-environment-patient chain was not sampled). For these clinical isolates to have acquired no or very few mutations in that much time, no/low replication is expected and, although unsampled, we propose this most likely happened on hospital surfaces.
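
      The two definitions discussed here can be sketched as a small rule-based classifier. This is an illustrative sketch only and not the pipeline used in the study: the function names and arguments are hypothetical, the genome size of 6.5 Mb is an assumed round figure for P. aeruginosa, and the "3x the number of years separated" bound is taken from the reviewer's paraphrase of lines 391-396 (whether the bound is strict is also an assumption).

```python
# Illustrative sketch only; names, genome size and the strictness of the
# 3x bound are assumptions, not details confirmed by the manuscript.
SUBS_PER_SITE_PER_YEAR = 2.987e-7    # BEAST estimate quoted in the response
ASSUMED_GENOME_SIZE_BP = 6_500_000   # assumed approximate genome size


def expected_snps(years, genome_size=ASSUMED_GENOME_SIZE_BP):
    """Expected substitutions accumulated along one lineage over `years`."""
    return SUBS_PER_SITE_PER_YEAR * genome_size * years


def classify_pair(snp_distance, same_ward, overlap_in_time, days_apart):
    """Classify a pair of patient isolates with fixed thresholds."""
    # Patient-to-patient: at most 10 SNPs AND an epidemiological link
    # (same ward at the same time).
    if snp_distance <= 10 and same_ward and overlap_in_time:
        return "patient-to-patient"
    # Environment-to-patient: same ward, no temporal overlap for more than
    # 365 days, and fewer SNPs than 3x the number of years separated.
    years_apart = days_apart / 365
    if (same_ward and not overlap_in_time and days_apart > 365
            and snp_distance < 3 * years_apart):
        return "environment-to-patient"
    return "unclassified"
```

      Under the assumed genome size, the quoted rate corresponds to roughly two substitutions per lineage per year, which is why a pair of nearly identical isolates separated by more than a year is treated as unusual in this sketch.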

      While the term "core genome" should be familiar to most readers, "shell genome" and "cloud genome" are less widely known, and an explanation of what these terms mean here would be helpful.

      Thank you. We have revised the manuscript to define the core, shell, and cloud genomes as gene sets found in ≥ 99%, ≥ 95% and ≥ 15% of isolates, respectively.
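
      As a minimal sketch of how such prevalence cutoffs partition a gene presence/absence table, the thresholds stated above can be applied in descending order. The function names and the input layout are hypothetical, and the label for genes found in fewer than 15% of isolates is an assumption, since the response does not define that case.

```python
def classify_gene(prevalence):
    """Assign a pangenome category from the fraction of isolates carrying a gene."""
    if prevalence >= 0.99:
        return "core"
    if prevalence >= 0.95:
        return "shell"
    if prevalence >= 0.15:
        return "cloud"
    # Assumption: the response does not say how rarer genes are labelled.
    return "below cloud threshold"


def prevalence_table(presence):
    """`presence` maps gene name -> list of 0/1 flags, one flag per isolate."""
    return {
        gene: classify_gene(sum(flags) / len(flags))
        for gene, flags in presence.items()
    }
```

      For example, a gene present in all isolates is labelled "core", while one present in a quarter of them falls into the "cloud" bin under these cutoffs.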

      In the first paragraph of the discussion, it could be added that in many cases for clinically important Gram negatives short read sequencing alone will fail to detect transmission events as outbreaks can be driven by plasmid spread with only very limited clonal spread (see, for example, https://www.nature.com/articles/s41564-021-00879-y )

      Thank you. We agree this is an important/emerging aspect of surveillance. However, the goal of this discussion point was to explain why such a large outbreak was missed prior to implementing WGS (short read) surveillance. We feel that discussing “plasmid outbreaks” (which is not at play here, and relatively rare in P. aeruginosa compared to the Enterobacteriaceae) and the need for long read will distract from the narrative. 

      line 599 What does "Mock" mean here? Would it be more accurate to say it is a simplified floor plan?

      Thank you. “Mock” was changed to “simplified”

      IPAC abbreviation is only used once - spelling it out in full would increase readability.

      Revised manuscript was edited as suggested.

      MHS is only used twice.

      Revised manuscript was edited to spell out Military Health System

      Line 364: full stop missing.

      Revised manuscript was edited as suggested.

      Line 401: Bayesian rather than bayesian.

      Revised manuscript was edited as suggested.

      Reviewer #2 (Recommendations For The Authors):

      Thank you for giving me the opportunity to review this interesting manuscript.

      The conclusions of this paper are mostly well supported by the data presented, but epidemiological information was limited and the sampling methodology was inconsistent, thus complicating inference of exact transmission routes.

      Major issues:

      What was the baseline frequency of clinical and/or screening samples of Pseudomonas aeruginosa at the hospital? Neither Figure 1D nor Table S1 allows for differentiating between clinical and screening samples. Most isolates were cultured from clinical materials, and there is no information about the patients' length of stay and their respective sampling dates. Is there any possibility of finding out whether the samples were collected for clinical or screening purposes? Would it be possible to include the patients' admission data to determine whether the strains were imported into the hospital or related to a previous stay, e.g. among known carriers? Also, the issue of sampling dates vs. patient stay on the ward should be addressed, as there may be an overlap in patients' stay on the ward but no overlap in terms of sampling dates or even missing samples (missing links).

      We have revised the manuscript to address this important point: i) 16 isolates were from surveillance swabs and are labelled “Surveillance” in Table S1. The remaining 237 were clinical isolates; ii) unfortunately, because the sampling was done under a public health surveillance framework, we do not have access to historical patient data (admission/discharge date, wards, rooms, etc.) and we cannot calculate length of stay or better identify patient overlap. These limitations are now acknowledged in the discussion of the revised manuscript.

      In order to evaluate the extent of the outbreak, more epidemiological data would be useful. What is the size of the hospital, what is the average patient turnover, and what is the average length of stay in ICU and non-ICU? Is there any specialization besides the military label?

      We have revised the manuscript to indicate that Facility A is a 425-bed medical center and is the only Level 1 trauma center in the Military Health System. Unfortunately, the data to calculate length of stay, throughout the years, in ICU and non-ICU, was not available to us. This limitation is now also acknowledged in the discussion.

      Perhaps the authors could attempt to discuss the extent to which large outbreaks like these may be considered as part of unavoidable evolutionary processes within the hospital microbiome as opposed to accumulation and transmission of potentially harmful genes/clones, and differentiate between the putative community spread without any epidemiological links on the one hand, and hospital outbreaks that could be targeted by local infection prevention activities on the other hand.

      We respectfully disagree with the suggestion that this large outbreak “may be considered as part of unavoidable evolutionary processes within the hospital microbiome” and should be opposed to “transmission of potentially harmful genes/clones”. As a matter of fact, our data showed that infection control staff at Facility A responded with multiple interventions, including closing sinks, replacing tubing, and using foaming detergents. This resulted in slowing the spread of the ST621 outbreak with just 3 cases identified in 2022, 0 cases in 2023 and 1 case in 2024. This is now discussed in the revised manuscript.

      Page 5, lines 88-92 lines 101-104. It seems as if the outbreak was identified only by the means of genomic surveillance. This raises questions as to the rationale for sampling and sequencing, especially prior to 2020. Considering 11 cases per year between 2011 and 2016, one could assume such an outbreak would have been noticed without sequencing data.

      The MRSN was created in 2010, in response to the outbreak of MDR Acinetobacter baumannii in US military personnel returning from Iraq and Afghanistan. Between 2011 and 2017, the MRSN collected MDR isolates (mandate for all MDR ESKAPE but compliance varied between years and facilities) from across the Military Health System and, for select isolates (e.g. high-risk isolates carrying ESBLs or carbapenemases) performed molecular typing by PFGE. In 2017 the MRSN started to perform whole genome sequencing of its entire repository. In 2020, a routine prospective sequencing service was started and first detected the ST621 outbreak. A retrospective analysis of historical isolate genomes (2011-2019) identified additional cases. The first paragraph of the discussion lists possible factors to explain why the ST621 escaped detection by traditional approaches. We believe 11 cases per year is not a strong signal when stratified by month, wards, or both, especially for a clone lacking a carbapenemase and without a remarkable antibiotic susceptibility profile. 

      Did the infection control personnel suspect transmission? If yes, was the sampling and submission of samples to the MRSN adapted based on the epidemiologic findings?

      The ST621 outbreak was unsuspected before the initial genomic detection in 2020. Until that point, MDR isolates only (Magiorakos et al PMID: 21793988) were collected but compliance was variable through time. Quickly thereafter (starting in 2021), complete sampling of all clinical P. aeruginosa (MDR or not) from Facility A was started. The manuscript was revised to clarify those details of the sampling strategy.

      Is there any information about how many environmental sites were sampled without evidence of ST621 / screening samples were cultured without evidence of Pseudomonas aeruginosa?

      For patient isolates, only 16 isolates were from surveillance swabs. The remaining 237 were clinical isolates. No denominator data was available to calculate the P. aeruginosa and ST-621 positivity rates in surveillance swabs throughout the time period. For environmental isolates, a total of 159 swabs were taken from 55 distinct locations in 8 wards/units including the ER. This data is now included in the revised manuscript. However, a complete analysis of these swabs (positivity rates for ESKAPE pathogens and P. aeruginosa, broken down per ward/floor/room and per swab type such as sink drain or bed rail) is beyond the scope of this study and is being performed as a follow-up investigation.

      Page 5 lines 89 and 39 Figure S1B. Please describe how the allelic distance for the cluster threshold was selected.

      As indicated in the legend of Figure S1B, no thresholds were applied. All ST621 isolates ever sequenced by the MRSN were included. All except 3 isolates shared between 0 and 23 cgMLST allelic differences. The remaining 3 were distant by 88-89 allelic differences. The text was revised to clarify this point.

      Page 5 lines 99-100. Could the authors please provide some distribution measures (e.g. IQR).

      Done as requested. The revised manuscript now reads “…of just 38 single nucleotide polymorphisms (SNPs), and an IQR of 19 (Fig. 1A, Table S1).”

      Page 5 line 102. Could the authors please provide some distribution measures (e.g. IQR).

      Please see above. A chart was created and is now included as Fig. S2.

      Page 6 line 107 and page 34 figure 1c. In the text it is stated that isolates were collected in 27 wards, the figure 1C depicts 26 wards and n/a.

      Thank you for spotting this inconsistency. This has been fixed in the revised manuscript.

      Page 6 lines 117-118. Samples collected in the emergency room would imply samples collected on admission, already addressed previously. Did the authors investigate a potential import into the hospital from community reservoirs or were all these isolates collected among patients who had been previously admitted to the hospital and/or tested positive for the outbreak strain?

      We agree that samples collected in the ER imply samples collected on admission. Of the 29 ER isolates only 9 (31%) were primary isolates (first detection in a new patient) which suggests a majority were from returning patients at Facility A. Because the sampling was done under a public health surveillance framework, we do not have access to historical patient data (admission/discharge date, wards, rooms, etc.) to investigate/confirm that these 9 patients had previous visits at Facility A. This point is now discussed in the revised manuscript.

      Page 6 line 128. This could also represent increased selective pressure. However, according to Table S1, the 28 isolates collected in 2011 (the number does not match with Figure 1D) were from many different wards, thus indicating earlier spread throughout the hospital.

      Yes, we agree. Please note that Table S1 lists all isolates for 2011 whereas Figure 1D focuses on primary isolates (the first isolate from each patient) only.

      Page 7 line 133. Both Figure 2 and the discussion section, page 13 line 296 suggest the year 2005 instead of 2004?

      Thank you for catching this typographical error. This was corrected to 2004 in the revised manuscript.

      Figure 1E. The figure should also depict intra-patient diversity for comparison.

      Thank you for this great suggestion. We have revised Figure 1E accordingly.

      Page 7, lines 146-147 Could the authors attempt explaining the upper part of the bimodal peaks?

      This is an all-vs-all SNP analysis for all inter-patient isolates. For each isolate, all distances to other isolates are reported, not only the smallest. The upper peaks represent comparisons to isolates from a different outbreak subclone (SC1 vs SC2).

      Page 7, line 150 This is a very small number considering the extent of the outbreak and suggests a large number of missing links. Or does this rather imply continuous import and evolution over time that does not necessarily represent transmission within the hospital?

      We believe all cases were due to transmission happening within the hospital. Based on conservative thresholds (genetic relatedness and epi link, or lack thereof) the precise origin from another patient (n=10) or a contaminated surface (n=12) can be inferred. For the remaining 60 patients, with the available sampling, the conditions we chose are not met and we simply do not conclude whether a direct patient-to-patient or an environmental origin was more likely.

      Page 8 line 155. What does the temporal overlap refer to - sampling date versus patient's stay on the ward? Please specify.

      The temporal overlap was investigated from sampling dates, as dates of patient admission/discharge were not available.

      Page 8, line 157: What does primary/serial isolate mean - first and follow-up samples of ST621 per patient?

      Yes. Primary isolate is used to designate the first isolate from a patient. Serial isolates designate follow-up samples of ST621.

      Page 8 line 165: Table S3 and Figure 3 only refer to environmental samples from three wards. Ward 20 rooms 2 and 18 as well as ward 1 rooms 1 and 6 were hotspots - is there any information on the specific infection control/disinfection measures? Addressed in discussion page 12, lines 273-275, but no information on what was actually done.

      The manuscript was revised to indicate the precise disinfection measures that were taken. A follow-up study is ongoing to assess long-term efficacy and monitor possible retrograde growth from previously contaminated sinks.

      Page 8 line 175: Evaluation of change in resistance fraction over time - There may have been a selection bias with an inconsistent number of strains sequenced per year.

      Yes, incomplete sampling and possible selection bias are now listed with other limitations of this study in the discussion of the revised manuscript.

      Page 9 line 183: The referral to Table S1 is unclear, I could not find the number and the specific isolates selected for long-read sequencing.

      Thank you. This has been added to the revised Table S1.

      Page 10 lines 217-225 and Figure 4C: Perhaps it is possible to better align what is written in the text and the caption of the figure. The caption does not clarify that only one patient develops colistin resistance (what was the reason to include the other patients?).

      Thank you. We have revised the text and the caption of the figure to clarify that only isolates from one patient developed colistin resistance. The isolates from the other patients on Fig. 4C are shown to provide context and accurately map the emergence of the PhoQE77fs mutation.  

      Page 10, lines 228-229 and Table S5: How is it possible to identify those 64 genes in Table S5?

      We have revised Table S5 to facilitate the identification of the 64 genes with ≥ 2 independently acquired mutations (excluding SYN). Specifically, we have added column E labeled “Counts independent mutations per locus (excluding SYN)”. A total of 205 rows (in this table each row is a variant) have a value ≥ 2 and these represent 64 genes (upon deduplication of locus tags).  
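
      The filter-and-deduplicate step described above can be sketched as follows. This is a minimal illustration only: the row layout, the field names (`locus_tag`, `independent_count`) and the toy rows are hypothetical and are not taken from Table S5.

```python
# Each row is one variant, as in the table described above.
# HYPOTHETICAL field names and toy values, for illustration only.
rows = [
    {"locus_tag": "locus_A", "variant": "v1", "independent_count": 3},
    {"locus_tag": "locus_A", "variant": "v2", "independent_count": 3},
    {"locus_tag": "locus_A", "variant": "v3", "independent_count": 3},
    {"locus_tag": "locus_B", "variant": "v4", "independent_count": 1},
    {"locus_tag": "locus_C", "variant": "v5", "independent_count": 2},
    {"locus_tag": "locus_C", "variant": "v6", "independent_count": 2},
]


def rows_at_or_above(rows, min_count=2):
    """Keep variant rows whose per-locus count meets the cutoff."""
    return [row for row in rows if row["independent_count"] >= min_count]


def recurrent_genes(rows, min_count=2):
    """Deduplicate locus tags among the retained rows."""
    return {row["locus_tag"] for row in rows_at_or_above(rows, min_count)}


kept = rows_at_or_above(rows)
genes = recurrent_genes(rows)
print(len(kept), "variant rows map to", len(genes), "genes")
```

      Applied to the real table, the same two steps would be expected to give the 205 rows and 64 genes quoted in the response.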

      Page 13, lines 280-281: Where is the information on chronic infection presented? Serial cultures would not necessarily mean chronic infection.

      Yes, we agree this was not the appropriate characterization and this was revised to ‘long-term’ infections.

      Page 14 line 306: Emergence of colistin resistance in a single patient, correct?

      Yes. This was further clarified in the text.

      Page 14 lines 315-320: This should go to the results section. In particular disinfection, closing, and replacing of tubing should be mentioned in the results section in reference to the results presented in Table S3.

      Thank you. We have considered this suggestion and have decided to leave this discussion as the closing paragraph of this publication. A follow-up study is ongoing to assess the long-term efficacy of these interventions on ST-621 but also on other outbreak clones at Facility A.

      Methods

      Page 15 lines 330-333: Perhaps it is possible to avoid redundancy.

      Thank you. We have revised the text accordingly.

      Page 15 lines 341: Information on which isolates were subjected to long-read sequencing is missing.

      Thank you. This has been added to the revised Table S1.

      Page 16 line 345: Was there a particular reason why Newbler was chosen?

      No. At the time Newbler was the default assembler built in the MRSN bacterial genome analysis pipeline and QC processes.

      Page 16, line 357-358: What was the rationale for selecting this isolate as reference genome?

      This isolate was chosen because it was collected early in the outbreak and phylogenetic analysis revealed it had low root to tip divergence.

      Page 16 line 361: Why 310 isolates, if only 253 were assigned to the outbreak clone and only a subset of those were collected in facility A?

      This was a typographical error that has been corrected (it now reads “…set of 253 isolates.”) in the revised manuscript.

      Page 17 lines 387-395: What is the reason that intra-patient diversity was not included in the set of criteria for SNP distances?

      The observed within host variability (now displayed in revised Fig. 1E) was taken into consideration when setting SNP thresholds for categorizing patient-to-patient transmission or environment-to-patient event. This is now clarified in the revised manuscript.

      Page 17 line 392: How was the threshold of <=10 SNPs determined?

      The 10 SNP cutoff to infer a patient-to-patient transmission event was set to account for the known evolution rate of P. aeruginosa (inferred by BEAST at 2.987E-7 subs/site/year in this study, and similar to previous estimates PMID: 24039595) and the observed within host variability (now displayed in revised Fig. 1E). We note that this SNP distance was not sufficient and that an epi link (patients on the same ward within the same month) needed to be established.

      Page 17 line 395 and Figure 2: What was the assumed average mutation rate per genome per year?

      Thank you. The mean substitution rate inferred by BEAST was 2.987E-7 substitutions/site/year, similar to estimates from previous studies on P. aeruginosa outbreaks (e.g. PMID: 24039595).
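
      Since the reviewer asked for a per-genome figure, the per-site rate can be converted with simple arithmetic. The sketch below assumes a genome length of about 6.5 Mb, a typical size for P. aeruginosa that is not taken from the manuscript; strictly, the relevant length is that of the alignment given to BEAST, which is not stated here.

```python
# Per-site rate quoted in the response (substitutions/site/year).
RATE_PER_SITE_PER_YEAR = 2.987e-7

# ASSUMPTION: approximate P. aeruginosa genome length in bp, not from the manuscript.
GENOME_LENGTH_BP = 6_500_000


def substitutions_per_genome_per_year(rate, length_bp):
    """Expected substitutions accumulating along one lineage in one year."""
    return rate * length_bp


def expected_pairwise_snps(rate, length_bp, years_since_common_ancestor):
    """Expected SNPs between two isolates; both lineages accumulate changes."""
    return 2 * rate * length_bp * years_since_common_ancestor


per_genome = substitutions_per_genome_per_year(RATE_PER_SITE_PER_YEAR, GENOME_LENGTH_BP)
print(f"{per_genome:.2f} substitutions per genome per year")
```

      Under this assumed genome length the rate works out to roughly 1.9 substitutions per genome per year, which gives a sense of scale for the 10 SNP cutoff discussed above.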

      Reviewer #3 (Recommendations For The Authors):

      Please find (line-by-line comments) on each section of the manuscript below:

      Introduction

      Line 86: I am wondering why the authors state ">28 facilities" instead of the exact number of facilities from which these lineages were recovered.

      Thank you. Manuscript was revised to provide the exact number of facilities. It now reads “…recovered from 37 and 28 facilities, respectively.”

      Methods

      It's not clear to me which criteria were used for collecting these isolates (both prospective and retrospective). I understand that some of the data are described in more detail in Lebreton et al but I did not find the specific criteria for the collection of the isolates and I imagine that these might differ if different facilities. Would it be possible to comment on that and add a short paragraph in the Methods section?

      Thank you. This lack of clarity was also raised by other reviewers, and we have revised the manuscript to indicate that: (1) MDR isolates only (Magiorakos et al PMID: 21793988) were collected from 2011-2020 with the same criteria for all facilities, although compliance was variable through time and between facilities; and (2) starting in 2021, all P. aeruginosa isolates, irrespective of their susceptibility profile, were collected from Facility A.

      The data comes from a US Military hospital. Is this related to the US Veterans Affairs Healthcare system? Is there more detailed information about the demographics of the patient population?

      Facility A is part of the Military Health System (MHS) which provides care for active service members and their families. This is distinct from the US Veterans Affairs Healthcare system. Only limited patient data was accessible to us as this study was done as part of our public health surveillance activities. Patient age (avg. 57.2 +/- 21.0) and gender (ratio male/female 1.7) are provided in the revised manuscript. 

      Line 384ff: The origin of infection was inferred based on the SNP threshold and epidemiological links. However, recombination events can complicate the interpretation of SNP data. Have the authors attempted to account for this?

      Thank you. We agree that recombination events can complicate the interpretation of SNP data. We used Gubbins v2.3.1 to filter out recombination from the core SNP alignment, as indicated in the revised manuscript.

      The authors' definition of environment-to-patient transmission seems conservative (nearly identical strain and no known temporal overlap for > 365 days). Have the authors changed the threshold, performed sensitivity analyses, and tested how this would affect their results?

      Indeed, acknowledging that fixed thresholds have limitations in their ability to accurately predict the origin of infections, we took a conservative approach to favor specificity as our goal was simply to establish that cases of environment-to-patient transmission did happen. In the absence of a truth set, we have not performed sensitivity analysis, but we are conducting a follow-up study to compare inferences from MCMC models to our original predictions. This limitation is now discussed in the revised manuscript.
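
      A fixed-threshold rule of the kind discussed in this exchange could look like the sketch below. It is not the authors' implementation. The 10 SNP cutoff, the same-ward and same-month epi link, and the gap of more than 365 days come from the text; treating "same month" as 31 days, reusing the 10 SNP cutoff for the environmental rule, and requiring the same ward for it are assumptions made only for illustration.

```python
def classify_origin(snp_distance, same_ward, days_between_samples,
                    max_snps=10, epi_window_days=31, env_gap_days=365):
    """Label the likely origin of a case relative to its closest earlier isolate.

    ASSUMPTIONS: epi_window_days approximates "within the same month";
    max_snps is reused for the environmental rule; same_ward is required
    for both rules. None of these details are confirmed by the source.
    """
    if snp_distance > max_snps:
        return "undetermined"      # not closely related enough
    if not same_ward:
        return "undetermined"      # no spatial link established
    if days_between_samples <= epi_window_days:
        return "patient_to_patient"
    if days_between_samples > env_gap_days:
        return "environment_to_patient"
    return "undetermined"          # related, but timing fits neither rule


print(classify_origin(snp_distance=3, same_ward=True, days_between_samples=10))
print(classify_origin(snp_distance=3, same_ward=True, days_between_samples=400))
```

      Because every branch that fails a condition falls through to "undetermined", the rule favors specificity over sensitivity, in line with the conservative stance described in the response.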

      The authors don't seem to incorporate the role of healthcare workers in the transmission process. Could they comment on this? I am assuming that environment-to-patient transmission could either be directly from the environment to the patient or via a healthcare worker. I think it's fine to make simplifying assumptions here but it would be great if this was explicitly described.

      Thank you for this suggestion. We have not sampled the hands of healthcare workers in this study. As a result, the reviewer is correct to say that we made the simplifying assumption that healthcare workers would be possible intermediates in either environment-to-patient or patient-to-patient transmissions, as previously described by others (PMID: 8452949). This limitation is now discussed in the revised manuscript.

      Page 5, line 100: What does "all vs all" mean? Based on the supplement, I assume it's the pairwise distance and then averaged across all of those. It would improve the readability of the manuscript if the authors could briefly define this term and then maybe refer to Table S1.

      Thank you. We have created Fig. S2 and revised the manuscript to state that ST-621 isolates from Facility A belonged to the same outbreak clone with a distance (averaged all vs all pairwise comparison) of just 38 single nucleotide polymorphisms (SNPs), and an IQR of 19 (Fig. S2, Table S1).
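
      For readers unfamiliar with the term, an "all vs all" summary can be sketched as below: every pair of isolates is compared once, and the mean and IQR are taken over that list of pairwise distances. The toy sequences are hypothetical stand-ins for a core SNP alignment, and the Hamming distance is a simplification of whatever SNP-calling pipeline was actually used.

```python
from itertools import combinations
from statistics import mean, quantiles


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def all_vs_all(sequences):
    """Distance for every unordered pair of sequences (each pair once)."""
    return [hamming(a, b) for a, b in combinations(sequences, 2)]


def summarize(distances):
    """Mean and interquartile range of the pairwise distances."""
    q1, _, q3 = quantiles(distances, n=4)
    return mean(distances), q3 - q1


# HYPOTHETICAL toy alignment, not real isolate data.
toy_alignment = ["AAAA", "AAAT", "AATT", "TTTT"]
distances = all_vs_all(toy_alignment)
average, iqr = summarize(distances)
print(f"{len(distances)} pairs, mean distance {average:.2f}")
```

      Note that the number of pairs grows as n(n-1)/2, so the distribution is dominated by comparisons between distant isolates when subclones are present, which is one reason to report the IQR alongside the mean.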

      Figure 1D: It would be interesting to see additional figures in the supplement on the percentage of sequenced isolates per year and whether it varies across the different sources/sites. Is there any information on which isolates were chosen for sequencing?

      Lack of clarity in the sampling/sequencing scheme was raised by multiple reviewers and we have provided a thorough response to earlier comments. We also have revised the material and methods section accordingly. Finally, we have created Fig. S3 to show the percentage of sequenced isolates per year across different sources/sites, as suggested by the reviewer. No noticeable patterns were observed. 

      It seems like only a subset of all clinical isolates were sequenced. Would it be possible that SC2 was present already earlier but not picked up until a certain date?

      Although all isolates received by the MRSN were sequenced, compliance varied through time so it is true that not all clinical isolates were sequenced between 2011-2019. As such, we fully agree with this hypothesis and discuss this possibility as BEAST analysis placed the origin of SC2 in 2004 while the first detection of an SC2 isolate was in December 2012. This limitation is now discussed in the revised manuscript.

      Could the authors elaborate on whether the isolates resulted from single-colony picks? Is it possible that the different absence of a subclone is due to the fact that they picked only a colony?

      Yes, the isolates resulted from single-colony picks except when the presence of different colony morphologies was noted. In the latter case, representative isolates for each colony morphology were processed. We have revised the methods to make that clear.

      Figure 2: It is difficult to see which nodes belong to which patient due to the small font size. I wonder if it was possible to color the nodes for each patient, to make it more readable.

      We tried coloring the nodes but with > 60 distinct patients/colors we decided it did not improve clarity. We have revised figure 2 to increase the font size.  

      Page 7-8, lines 154-155: Did the authors check whether there were isolates of the same strain (that were found in the environment) present in other patients elsewhere in the ward?

      Yes. In rare cases, we observed virtually genetically identical isolates from two patients collected in different wards. Because we only have access to clinical isolate data (collected from patient X in ward Y) and do not have access to patient data (admission/discharge date, wards, rooms, etc.), we cannot exclude that these patients overlapped in a room prior to the sampling of their P. aeruginosa isolates. We designed our fixed thresholds to be conservative. As a result, in this analysis, these cases are labelled as “undetermined”.

      Page 8: Do the authors have any information on antibiotic use during this timeframe? From the discussion, it seems like there is no patient-level prescription data. Is there any data on overall trends? How were trends in antibiotic use correlated with trends in antibiotic resistance?

      Unfortunately, patient-level prescription data (or any other data not linked to the bacterial specimens) was not accessible to us as this study was done as part of our public health surveillance activities.

      To infer the origin of infection, the authors used a static method with fixed thresholds and definitions. This study does not provide any uncertainty with their estimates. Maybe the authors could add a sentence in the discussion section that MCMC methods to infer transmission trees incorporating WGS could provide these estimates. These methods have not been applied to PA a lot but two examples where MCMC methods have been used without WGS (though the definition of environmental contamination may differ between these studies and this study).

      https://doi.org/10.1186/s13756-022-01095-x

      https://doi.org/10.1371/journal.pcbi.1006697

      Thank you for this great suggestion. We have revised the manuscript to include a discussion on the limitations of fixed thresholds to infer transmission chains/origins, and to discuss existing alternatives including MCMC methods. 

      Line 322-323: This sentence is a bit vague since not all of these HAI are due to P. aeruginosa. I would suggest citing a number that is specific to PA.

      Thank you. While our paper shows a particular example of protracted P. aeruginosa outbreak, the roll-out of routine WGS surveillance in the clinic will help prevent hospital-associated drug-resistant infections for more than this species. We believe that broadening the scope in the last sentence of the manuscript is important and we decline to revise as suggested.

    1. Good night, ladies, good night, sweet ladies, good night, good night.

      It's interesting to me how Eliot ends this section of The Waste Land with Ophelia's last words before she commits suicide. Lines before, we get references to "Bill," "Lou," and "May," indicating that the speaker is bidding farewell from the pub setting. Ophelia's line, on the other hand, bids farewell on behalf of not just Lil and the woman in the pub, but all the "sweet ladies" of the waste land. This idea of death as a fate is super interesting. The women have their emotional and spiritual deaths connected to Ophelia's physical death. This is yet another instance where we see suicide in a female in The Waste Land. If I think about what Eliot is trying to get at with women x waste land, especially with this Ophelia connection, I'd say the waste land is a world where the modes of expressing experiences like song, symbol, and even madness have been stripped of their meaning and beauty, leaving only bad nerves, dirty gossip, and the last call of the pub. This is obviously not the ideal place for women; hence, modern society is not fit for women to flourish.

    1. One critique of all of these approaches, however, is that no design, no matter how universal, will equally serve everyone. This is the premise of design justice (Costanza-Chock, S. (2020). Design justice: Community-led practices to build the worlds we need. MIT Press.), which observes that design is fundamentally about power, in that designs may not only serve some people less well, but systematically exclude them in surprising, often unintentional ways.

      I agree with this. I am privileged to often forget about the exclusion of certain groups in "universal" designs. An example of this that I thought of was pens. I found out recently that a lot of left-handed people have a hard time with ink pens as their palms tend to smear the wet ink immediately after writing. Another example I could think of was the original Band-Aid colors, and how they did a poor job of representing people of all skin tones. Any design that leaves out a certain group of people should always have a substitute version for those people or should not be designed altogether.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this manuscript entitled "Molecular dynamics of the matrisome across sea anemone life history", Bergheim and colleagues report the prediction, using an established sequence analysis pipeline, of the "matrisome" - that is, the compendium of genes encoding constituents of the extracellular matrix - of the starlet sea anemone Nematostella vectensis. Re-analysis of an existing scRNA-Seq dataset allowed the authors to identify the cell types expressing matrisome components and different developmental stages. Last, the authors apply time-resolved proteomics to provide experimental evidence of the presence of the extracellular matrix proteins at three different stages of the life cycle of the sea anemone (larva, primary polyp, adult) and show that different subsets of matrisome components are present in the ECM at different life stages with, for example, basement membrane components accompanying the transition from larva to primary polyp and elastic fiber components and matricellular proteins accompanying the transition from primary polyp to the adult stage. 

      Strengths: 

      The ECM is a structure that has evolved to support the emergence of multicellularity and different transitions that have accompanied the complexification of multicellular organisms. Understanding the molecular makeup of structures that are conserved throughout evolution is thus of paramount importance. 

      The in-silico predicted matrisome of the sea anemone has the potential to become an essential resource for the scientific community to support big data annotation efforts and understand better the evolution of the matrisome and of ECM proteins, an important endeavor to better understand structure/function relationships. This study is also an excellent example of how integrating datasets generated using different -omic modalities can shed light on various aspects of ECM metabolism, from identifying the cell types of origins of matrisome components using scRNA-Seq to studying ECM dynamics using proteomics. 

      We greatly appreciate the positive feedback regarding the design of our study and the evolutionary significance of our findings.

      Weaknesses: 

      My concerns pertain to the three following areas of the manuscript: 

      (1) In-silico definition of the anemone matrisome using sequence analysis: 

      a) While a similar computational pipeline has been applied to predict the matrisome of several model organisms, the authors fail to provide a comprehensive definition of the anemone matrisome: In the text, the authors state the anemone matrisome is composed of "551 proteins, constituting approximately 3% of its proteome" (see page 6, line 14), but Figure 1 lists 829 entries as part of the "curated" matrisome, Supplementary Table S1 lists the same 829 entries and the authors state that "Here, we identified 829 ECM proteins that comprise the matrisome of the sea anemone Nematostella vectensis" (see page 17, line 10). Is the sea anemone matrisome composed of 551 or 829 genes? If we refer to the text, the additional 278 entries should not be considered as part of the matrisome, but what is confusing is that some are listed as glycoproteins and the "new_manual_annotation" proposed by the authors and that refer to the protein domains found in these additional proteins suggest that in fact, some could or should be classified as matrisome proteins. For example, shouldn't the two lectins encoded by NV2.3951 and NV2.3157 be classified as matrisome-affiliated proteins? Based on what has been done for other model organisms, receptors have typically been excluded from the "matrisome" but included as part of the "adhesome" for consistency with previously published matrisomes; the reviewer is left wondering whether the components classified as "Other" / "Receptor" should not be excluded from the matrisome and moved to a separate "adhesome" list.

      In addition to receptors, the authors identify nearly 70 glycoproteins classified as "Other". Here, does other mean "non-matrisome" or "another matrisome division" that is not core or associated? If the latter, could the authors try to propose a unifying term for these proteins? Unfortunately, since the authors do not provide the reasons for excluding these entries from the bona fide matrisome (list of excluding domains present, localization data), the reader is left wondering how to treat these entries. 

      Overall, the study would gain in strength if the authors could be more definitive and, if needed, even propose novel additional matrisome annotations to include the components for now listed as "Other" (as was done, for example, for the Drosophila or C. elegans matrisomes). 

      The reviewer is correct to point out the confusing terminology used throughout our manuscript, where both the total of 829 proteins constituting the curated list of ECM domain proteins and the actual matrisome (excluding "others") were referred to as "matrisomes". In general, we followed the example set by Naba & Hynes in their 2012 paper (Mol Cell Proteomics. 2012 Apr;11(4):M111.014647. doi: 10.1074/mcp.M111.014647), where they define the "matrisome" as encompassing all components of the extracellular matrix ("core matrisome") and those associated with it ("matrisome-associated" proteins). This corresponds to our group of 551 proteins, comprising both core matrisome and matrisome-associated proteins. The Naba & Hynes paper also contains the inclusive and exclusive domain lists for the matrisome that we applied for our dataset. In the revised manuscript, we have now labelled the group of 829 proteins as "curated ECM domain proteins/genes", which includes all proteins positively selected for containing a bona fide ECM domain. After excluding non-matrisomal proteins such as receptors, we arrive at the 551 proteins that constitute the "Nematostella matrisome". We have maintained this terminology throughout the revised manuscript and have revised Figures 1B and 4B accordingly.

      Regarding the category of "other" proteins, which by definition are not part of the matrisome although containing ECM domains, we have taken the reviewer's advice and classified these in more detail. We categorized all receptors as "adhesome" (202 proteins).  The remaining group of “other” secreted ECM domain proteins were then further subcategorized. Those exhibiting significant matches in the ToxProt database were subclassified as "putative venoms" (15 proteins). This group also includes the two lectins (NV2.3951 and NV2.3157), which had been originally shifted to the “other” category due to their classification as venoms. We categorized as “adhesive proteins” (28 proteins) factors such as coadhesins that due to their domain architecture resemble bioadhesive proteins described in proteomic studies of other invertebrate species, such as corals or sponges (see also https://doi.org/10.1016/j.jprot.2022.104506). Further sub-categories are stress/injury response proteins (9 proteins) and ion channels (6 proteins). The remaining 17 proteins were categorized as “uncharacterized ECM domain proteins”. These include highly diverse proteins possessing either single ECM domains or novel domain combinations. We decided to retain those in our dataset as candidates for future functional characterization.
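
      The subcategory counts quoted above can be tallied against the two totals given earlier in the response. The sketch below uses only numbers stated in the text; the dictionary keys are shortened labels.

```python
# Totals quoted in the response.
CURATED_ECM_DOMAIN_PROTEINS = 829
MATRISOME_PROTEINS = 551

# Subcategories of "other" ECM domain proteins, with counts as quoted.
other_subcategories = {
    "adhesome (receptors)": 202,
    "putative venoms": 15,
    "adhesive proteins": 28,
    "stress/injury response proteins": 9,
    "ion channels": 6,
    "uncharacterized ECM domain proteins": 17,
}


def tally(subcategories):
    """Sum the per-subcategory protein counts."""
    return sum(subcategories.values())


non_matrisome_total = CURATED_ECM_DOMAIN_PROTEINS - MATRISOME_PROTEINS
listed_total = tally(other_subcategories)
print(non_matrisome_total, listed_total)
```

      As quoted, the listed subcategories sum to 277, while 829 minus 551 gives 278; the response does not say where the remaining entry is assigned.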

      b) It is surprising that the authors are not providing the full currently accepted protein names to the entries listed in Supplementary Table S1 and have used instead "new_manual_annotation" that resembles formal protein names. This liberty is misleading. In fact, the "new_manual_annotation" seems biased toward describing the reason the proteins were positively screened for through sequence analysis, but many are misleading because there is, in fact, more known about them, including evidence that they are not ECM proteins. The authors should at least provide the current protein names in addition to their "new_manual_annotations". 

      c) To truly serve as a resource, the Table should provide links to each gene entry in the Stowers Institute for Medical Research genome database used and some sort of versioning (this could be added to columns A, B, or D). Such enhancements would facilitate the assessment of the rigor of the list beyond the manual QC of just a few entries. 

      d) Since UniProt is the reference protein knowledge database, providing the UniProt IDs associated with the predicted matrisome entries would also be helpful, giving easy access to information on protein domains, protein structures, orthology information, etc. 

      e) In conclusion, at present, the study only provides a preliminary draft that should be more rigorously curated and enriched with more comprehensive and authoritative annotations if the authors aspire the list to become the reference anemone matrisome and serve the community. 

      Table S1 has been updated to include links to the respective Stowers Institute IDs (first two columns), as well as SwissProt IDs and current descriptions from both the Stowers Institute (SI) and Swissprot.

      In our manual annotations, we prioritized these over automated ones due to the considerable effort invested in examining each sequence individually. The cnidaria-specific minicollagens and NOWA proteins might serve as an example. According to the SI descriptions, the minicollagens are annotated as “keratin-associated protein, predicted or hypothetical protein, collagen-like protein and pericardin”. We classified these as minicollagens on the basis of overall domain architecture and of signature domains and sequence motifs, such as minicollagen cysteine-rich domains (CRDs) and polyproline stretches (doi: 10.1016/j.tig.2008.07.001). NOWA is a CTLD/CRD-containing protein that is part of nematocyst tubules (doi:10.1016/j.isci.2023.106291). The first two NOWA isoforms, according to the SI descriptions, were annotated as aggrecan and brevican core proteins, which is very misleading. We therefore feel that our manual annotations better serve the cnidarian research community in classifying these proteins.

      Automated annotations of ECM proteins often rely on similarities between individual domains, neglecting overall domain composition. For example, Swissprot descriptions annotate 31 TSP1 domain-containing proteins in our list as "Hemicentin-1", but closer inspection reveals that only one sequence (NV2.24790) qualifies as Hemicentin-1 due to its characteristic vWFA, Ig-like, TSP1, G2 nidogen, and EGF-like domain architecture. Regarding novel protein annotations, NV2.650 might serve as an example. While SI descriptions annotate this protein as "epidermal growth factor" based on the presence of several EGF-like domains, our analysis reveals two integrin alpha N-terminal domains that classify this sequence as integrin-related. We have therefore assigned a description (Secreted integrin-N-related protein) that references this defining domain and avoids misclassification within the EGF family.
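
      The contrast between single-domain matching and whole-architecture matching can be sketched as follows. This is a minimal illustration only, not the annotation pipeline used in the study: the domain names come from the text above, while the protein identifiers, the fallback label, and both function names are hypothetical.

```python
# Characteristic Hemicentin-1 architecture as listed in the text above.
HEMICENTIN_ARCHITECTURE = {"vWFA", "Ig-like", "TSP1", "G2 nidogen", "EGF-like"}


def annotate_by_single_domain(domains):
    """Naive rule (assumed for illustration): one TSP1 domain is enough."""
    return "Hemicentin-1" if "TSP1" in domains else "unassigned"


def annotate_by_architecture(domains):
    """Stricter rule: every characteristic domain must be present."""
    if HEMICENTIN_ARCHITECTURE <= set(domains):
        return "Hemicentin-1"
    if "TSP1" in domains:
        # Hypothetical fallback label for proteins that only share TSP1.
        return "TSP1 domain-containing protein"
    return "unassigned"


# Made-up example proteins; these are not entries from Table S1.
proteins = {
    "example_protein_A": ["vWFA", "Ig-like", "TSP1", "G2 nidogen", "EGF-like"],
    "example_protein_B": ["TSP1", "TSP1", "EGF-like"],
}

for name, domains in proteins.items():
    print(name, annotate_by_single_domain(domains), annotate_by_architecture(domains))
```

      Under the naive rule both example proteins receive the same label, whereas the architecture rule separates them, which is the kind of discrepancy described above.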

      In cases where the automated annotation (including those in GenBank) matched our own findings, we adopted the existing description, as seen with netrin-1 (NV2.7734). We acknowledge that our manual annotations are not flawless and will be refined by future research. Nonetheless, we offer them as an approximation to a more accurate definition of the identified protein list.

      (2) Proteomic analysis of the composition of the mesoglea during the sea anemone life cycle: 

      a) The product of 287 of the 829 genes proposed to encode matrisome components was detected by proteomics. What about the other ~550 matrisome genes? When and where are they expressed? The wording employed by the authors (see line 11, page 13) implies that only these 287 components are "validated" matrisome components. Is that to say that the other ~550 predicted genes do not encode components of the ECM? This should be discussed. 

      Obviously, our wording was not sufficiently accurate here. In the revised Fig. 1B we indicated that 210 of the 551 matrisome (core and associated) proteins were confirmed by mass spectrometry. In total, 287 proteins were identified by mass spectrometry, meaning that 77 of those are non-matrisomal proteins belonging to the “adhesome” (47) and “other” (30) groups. The fact that the remaining 542 proteins of the matrisome predicted by our in silico analysis could not be identified has two major reasons: (1) Our study was focussed on the molecular dynamics of the mesoglea. Therefore, only mesogleas were isolated for the mass spectrometry analysis and nematocysts were mostly excluded by extensive washing steps. As nematocysts contribute significantly to the predicted matrisome, this group of proteins is underrepresented in the mass spectrometry analysis. (2) A significant fraction of the predicted ECM proteins constitutes soluble factors and transmembrane receptors. These might not be necessarily part of the mesoglea isolates. In addition, the isolation and solubilization method we applied might have technical limitations. Although we used harsh conditions for solubilizing the mesoglea samples (90°C and high DTT concentrations), we cannot exclude that we missed proteins which resisted solubilization and thus trypsinization. We confirmed that all genes predicted by the in silico analysis have transcriptomic profiles as demonstrated in supplementary table S4. We have clarified these points in the revised results part (p.6) and also revised the statement in line 16, page 13.
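
      The counts quoted in this exchange can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated above (829 predicted, 551 matrisome, 287 detected, 210 confirmed matrisome, 47 adhesome, 30 other); the variable names are illustrative.

```python
# Numbers taken from the reviewer comment and the response above.
predicted_total = 829        # genes predicted by the in silico analysis
predicted_matrisome = 551    # core and associated matrisome proteins
detected_total = 287         # proteins identified by mass spectrometry
detected_matrisome = 210     # matrisome proteins confirmed by mass spectrometry
detected_adhesome = 47
detected_other = 30

# Detected proteins outside the matrisome: adhesome plus "other".
detected_non_matrisome = detected_adhesome + detected_other
assert detected_non_matrisome == detected_total - detected_matrisome  # 77

# Predicted proteins that were not identified by mass spectrometry.
undetected = predicted_total - detected_total

# Share of the predicted matrisome confirmed by mass spectrometry.
confirmed_fraction = detected_matrisome / predicted_matrisome

print(detected_non_matrisome, undetected, round(confirmed_fraction, 3))
```

      The three figures reported in the response (77 non-matrisomal hits, 542 unidentified proteins, 210 of 551 confirmed) are mutually consistent under this check.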

      b) Can the authors comment on how they have treated zero TMT values or proteins for which a TMT ratio could not be calculated because unique to one life stage, for example? 

      We did not include these proteins in the analysis of the respective statistical comparison. This involved only very few proteins (about 10).  
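
      One way to express such an exclusion rule is sketched below. It is a hedged illustration under stated assumptions, not the authors' analysis code: the protein names and intensities are invented, and the criterion (drop a protein from a comparison if any replicate value is zero or missing) is an assumed reading of the response.

```python
import math


def usable_for_comparison(stage_a, stage_b):
    """True only if every replicate value in both stages is present and > 0."""
    values = list(stage_a) + list(stage_b)
    return all(v is not None and v > 0 for v in values)


def log2_ratio(stage_a, stage_b):
    """log2 of mean(stage_b) / mean(stage_a) for one protein."""
    mean_a = sum(stage_a) / len(stage_a)
    mean_b = sum(stage_b) / len(stage_b)
    return math.log2(mean_b / mean_a)


# Invented reporter intensities: (larva replicates, polyp replicates).
tmt = {
    "protein_1": ([100.0, 200.0], [400.0, 800.0]),
    "protein_2": ([0.0, 0.0], [350.0, 410.0]),    # absent in one stage
    "protein_3": ([120.0, None], [90.0, 95.0]),   # missing replicate
}

ratios = {
    name: log2_ratio(a, b)
    for name, (a, b) in tmt.items()
    if usable_for_comparison(a, b)
}
print(ratios)
```

      Proteins unique to one life stage never reach the ratio calculation, so no division by zero or undefined logarithm can occur.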

      c) Could the authors provide a plot showing the distribution of protein abundances for each matrisome category in the main figure 4? In mammals, the bulk of the ECM is composed of collagens, followed by fibrillar ECM glycoproteins, the other matrisome components being more minor. Is a similar distribution observed in the sea anemone mesoglea? 

      We have included such a plot showing protein abundances across life stages and protein categories (Fig. 4A). Collagens and basement membrane proteoglycans (perlecan) are the most abundant protein categories in the core matrisome while secreted factors dominate in the matrisome-associated group.

      d) Prior proteomic studies on the ECM of vertebrate organisms have shown the importance of allowing certain post-translational modifications during database search to ensure maximizing peptide-to-spectrum matching. Such PTMs include the hydroxylation of lysines and prolines that are collagen-specific PTMs. Multiple reports have shown that omitting these PTMs while analyzing LC-MS/MS data would lead to underestimating the abundance of collagens and the misidentification of certain collagens. The authors may want to reanalyze their dataset and include these PTMs as part of their search criteria to ensure capturing all collagen-derived peptides. 

      Thank you for this suggestion. We have re-analyzed our dataset including lysine and proline hydroxylation as PTM. While we obtained in total 70 more proteins using this approach, this additional group did not contain any large collagen or minicollagen we had not detected before. We only obtained two additional collagen-like proteins with very short triple helical domains (V2t013973001.1, NV2t024002001.1), one being a fragment. We don’t feel this justifies implementing a re-analysis of the proteome in our study.
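
      To illustrate why allowing hydroxylation matters for peptide-to-spectrum matching, the sketch below enumerates the candidate masses of a toy peptide when proline and lysine hydroxylation are treated as variable modifications. It is an illustration only: the peptide "GPK" is invented, the function names are hypothetical, and no particular search engine's behavior is implied. The residue masses and the +15.9949 Da shift for one added oxygen are standard monoisotopic values.

```python
from itertools import product

# Monoisotopic residue masses in Da; only the residues of the toy peptide.
RESIDUE_MASS = {"G": 57.02146, "P": 97.05276, "K": 128.09496}
WATER = 18.01056
HYDROXYLATION = 15.99491      # one added oxygen atom
HYDROXYLATABLE = {"P", "K"}   # residues considered for hydroxylation


def peptide_mass(sequence, modified_positions=()):
    """Neutral monoisotopic mass with one hydroxylation per listed position."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return mass + HYDROXYLATION * len(modified_positions)


def hydroxylation_variants(sequence):
    """Yield (modified positions, mass) for every on/off combination."""
    sites = [i for i, aa in enumerate(sequence) if aa in HYDROXYLATABLE]
    for mask in product([False, True], repeat=len(sites)):
        chosen = tuple(s for s, on in zip(sites, mask) if on)
        yield chosen, peptide_mass(sequence, chosen)


for positions, mass in hydroxylation_variants("GPK"):
    print(positions, round(mass, 4))
```

      A search restricted to the unmodified mass would miss the three modified forms, which is the underestimation risk raised by the reviewer.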

      e) The authors should ensure that reviewers are provided with access to the private PRIDE repository so the data deposited can also be evaluated. They should also ensure that sufficient meta-data is provided using the SDRF format to allow the re-use of their LC-MS/MS datasets. 

      We apologize for not providing the reviewer access in our initial submission and have asked the editorial office to forward the PRIDE repository link to all reviewers immediately after receiving the reviews. We did upload a metadata.csv file with the proteomics dataset. This file maps all TMT labels to the samples, conditions, and replicates used in the manuscript. It contains information similar to that in an SDRF file. In addition, the search output files at the protein and PSM level have been provided. So, from our point of view, we provided all necessary information to reproduce the analysis.

      (3) Supplementary tables: 

      The supplementary tables are very difficult to navigate. They would become more accessible to readers and non-specialists if they were accompanied by brief legends or "README" tabs and if the headers were more detailed (see, for example, Table S2, what does "ctrl.ratio_Larvae_rep2" exactly refer to? Or Table S6 whose column headers using extensive abbreviations are quite obscure). Similarly, what do columns K to BX in Supplementary Table S1 correspond to? Without more substantial explanations, readers have no way of assessing these data points. 

      We have revised the tables and removed any redundant data columns. We also included detailed explanations of the used abbreviations, both in the headers and in a separate README file. Some of the information was apparently lost during the conversion to pdf files. We will therefore upload the original .xls files when submitting the revised manuscript.

      Reviewer #2 (Public review): 

      This work set out to identify all extracellular matrix proteins and associated factors present within the starlet sea anemone Nematostella vectensis at different life stages. Combining existing genomic and transcriptomic datasets, alongside new mass spectrometry data, the authors provide a comprehensive description of the Nematostella matrisome. In addition, immunohistochemistry and electron microscopy were used to image whole mount and decellularized mesoglea from all life stages. This served to validate the de-cellularization methods used for proteomic analyses, but also resulted in a very nice description of mesoglea structure at different life stages. A previously published developmental cell type atlas was used to identify the cell type specificity of the matrisome, indicating that the core matrisome is predominantly expressed in the gastrodermis, as well as cnidocytes. The analyses performed were rigorous and the results were clear, supporting the conclusions made by the authors. 

      Thank you. We greatly appreciate the positive assessment of our study.

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript by Bergheim et al investigates the molecular and developmental dynamics of the matrisome, a set of gene products that comprise the extracellular matrix, in the sea anemone Nematostella vectensis using transcriptomic and proteomic approaches. Previous work has examined the matrisome of the hydra, a medusozoan, but this is the first study to characterize the matrisome in an anthozoan. The major finding of this work is a description of the components of the matrisome in Nematostella, which turns out to be more complex than that previously observed in hydra. The authors also describe the remodeling of the extracellular matrix that occurs in the transition from larva to primary polyp, and from primary polyp to adult. The authors interpret these data to support previously proposed (Steinmetz et al. 2017) homology between the cnidarian endoderm with the bilaterian mesoderm. 

      Strengths: 

      The data described in this work are robust, combining both transcriptome and proteomic interrogation of key stages in the life history of Nematostella, and are of value to the community. 

      Thank you for your positive assessment of our dataset. 

      Weaknesses: 

      The authors offer numerous evolutionary interpretations of their results that I believe are unfounded. The main problem with extending these results, together with previous results from hydra, into an evolutionary synthesis that aims to reconstruct the matrisome of the ancestral cnidarian is that we are considering data from only two species. I agree with the authors' depiction of hydra as "derived" relative to other medusozoans and see it as potentially misleading to consider the hydra matrisome as an exemplar for the medusozoan matrisome. Given the organismal and morphological diversity of the phylum, a more thorough comparative study that compares matrisome components across a selection of anthozoan and medusozoan species using formal comparative methods to examine hypotheses is required. 

      Specifically, I question the authors' interpretation of the evolutionary events depicted in this statement: 

      "The observation that in Hydra both germ layers contribute to the synthesis of core matrisome proteins (Epp et al. 1986; Zhang et al. 2007) might be related to a secondary loss of the anthozoan-specific mesenteries, which represent extensions of the mesoglea into the body cavity sandwiched by two endodermal layers." 

      Anthozoans and medusozoans are evolutionary sisters. Therefore, the secondary loss of "anthozoan-like mesenteries" in hydrozoans is at least as likely as the gain of this character state in anthozoans. By extension, there is no reason to prefer the hypothesis that the state observed in Nematostella, where gastroderm is responsible for the synthesis of the core matrisome components, is the ancestral state of the phylum. Moreover, the fossil evidence provided in support of this hypothesis (Ou et al. 2022) is not relevant here because the material described in that work is of a crown group anthozoan, which diversified well after the origin of Anthozoa. The phylogenetic structure of Cnidaria has been extensively studied using phylogenomic approaches and is generally well supported (Kayal et al. 2018; DeBiasse et al. 2024). Based on these analyses, anthozoans are not on a "basal" branch, as the authors suggest. The structure of cnidarian phylogeny bifurcates with Anthozoa forming one clade and Medusozoa forming the other. From the data reported by Bergheim and coworkers, it is not possible to infer the evolutionary events that gave rise to the different matrisome states observed in Nematostella (an anthozoan) and hydra (a medusozoan). Furthermore, I take the observation in Fig 5 that anthozoan matrisomes generally exhibit a higher complexity than other cnidarian species to be more supportive of a lineage-specific expansion of matrisome components in the Anthozoa, rather than those components being representative of an ancestral state for Cnidaria. Whatever the implication, I take strong issue with the statement that "the acquisition of complex life cycles in medusozoa, that are distinguished by the pelagic medusa stage, led to a secondary reduction in the matrisome repertoire." 
      There is no causal link in any of the data or analyses reported by Bergheim and co-workers to support this statement and, as stated above, while we are dealing with limited data, insufficient to address this question, it seems more likely to me that the matrisome expanded in anthozoans, contrasting with the authors' conclusions. While the discussion raises many interesting evolutionary hypotheses related to the origin of the cnidarian matrisome, which is of vital interest if we are to understand the origin of the bilaterian matrisome, a more thorough comparative analysis, inclusive of a much greater cnidarian species diversity, is required if we are to evaluate these hypotheses. 

      DeBiasse MB, Buckenmeyer A, Macrander J, Babonis LS, Bentlage B, Cartwright P, Prada C, Reitzel AM, Stampar SN, Collins A, et al. 2024. A Cnidarian Phylogenomic Tree Fitted With Hundreds of 18S Leaves. Bulletin of the Society of Systematic Biologists [Internet] 3. Available from: https://ssbbulletin.org/index.php/bssb/article/view/9267

      Epp L, Smid I, Tardent P. 1986. Synthesis of the mesoglea by ectoderm and endoderm in reassembled hydra. J Morphol [Internet] 189:271-279. Available from: https://pubmed.ncbi.nlm.nih.gov/29954165/ 

      Kayal E, Bentlage B, Sabrina Pankey M, Ohdera AH, Medina M, Plachetzki DC, Collins AG, Ryan JF. 2018. Phylogenomics provides a robust topology of the major cnidarian lineages and insights on the origins of key organismal traits. BMC Evol Biol [Internet] 18:1-18. Available from: https://bmcecolevol.biomedcentral.com/articles/10.1186/s12862-018-1142-0

      Ou Q, Shu D, Zhang Z, Han J, Van Iten H, Cheng M, Sun J, Yao X, Wang R, Mayer G. 2022. Dawn of complex animal food webs: A new predatory anthozoan (Cnidaria) from Cambrian. The Innovation 3:100195 

      Steinmetz PRH, Aman A, Kraus JEM, Technau U. 2017. Gut-like ectodermal tissue in a sea anemone challenges germ layer homology. Nature Ecology & Evolution 2017 1:10 [Internet] 1:1535-1542. Available from: https://www.nature.com/articles/s41559-017-0285-5

      Zhang X, Boot-Handford RP, Huxley-Jones J, Forse LN, Mould AP, Robertson DL, Li L, Athiyal M, Sarras MP. 2007. The collagens of hydra provide insight into the evolution of metazoan extracellular matrices. J Biol Chem [Internet] 282:6792-6802. Available from: https://pubmed.ncbi.nlm.nih.gov/17204477/ 

      We agree with the reviewer that only the analysis of several additional anthozoan and medusozoan representatives will yield a valid basis for a reconstruction of the ancestral cnidarian matrisome and allow statements about ancestral or novel features within the phylum. We have therefore revised our statements in the discussion part of the manuscript by implementing the cited literature and also findings from medusozoan genome analysis (e.g. Gold et al., 2018) demonstrating that changes in gene content are as common in the anthozoans as in medusozoans, which questioned the previously stated “basal” state of Nematostella or of anthozoans in general.

      Reviewer #1 (Recommendations for the authors): 

      (1) In Figure 2A, an "o" is missing in the labeling of the "developing cnidcytes" population. 

      Thank you, we have corrected the typo.

      (2) It would be helpful to have the different life stages indicated as headers of the heat maps presented in Figure 4. 

      We have included symbolic representations for the different life stages on top of the heat maps in addition to the respective labels at the bottom.

      Reviewer #2 (Recommendations for the authors): 

      Important changes: 

      (1) Figure 2B The x-axis tissue names should be changed to something more easily readable/understandable - some are clear, but others are not. Perhaps abbreviations could be expanded in the legend. 

      We have expanded the legend in Fig. 2B to render it more easily readable. We have also rotated the maps in A to have them aligned with the ones in Fig.3B.

      (2) Figure 3B This figure would be improved by the inclusion of cluster names, to understand better the mapping. 

      We have added relevant cluster names to Fig. 3B and as stated above aligned the orientation of the maps in Fig. 2B and Fig. 3B.

      (3) Figure 3C As with 2B, I find the y-axis cnidocyte cell state names to be unclear at times. Perhaps abbreviations could be expanded in the legend. 

      All abbreviations were expanded in Fig.3C axis labels.

      (4) Many of the supplementary tables are not well exported or easily readable as is (gene names are truncated, headers truncated, etc), which means that they may not be easily usable by researchers in the field interested in following up on this work in other contexts. Indeed, to be more usable, please consider sharing these supplementary data as .csv files, for example, instead of as .pdfs. 

      We are sorry for this inconvenience, which was obviously caused by the conversion to pdf files. We will upload the original csv files when submitting the revised manuscript.

      Smaller nitpicky comments: 

      (5) Page 2 line 4 & page 3 line 7: Please consider a term other than "pre-bilaterian". The drawing/ordering of a phylogeny of extant species is not meaningful in terms of more or less ancestral. e.g. if the tips are flipped in the drawing of the tree, can we say that bilaterians are pre-cnidarians? What does that mean? 

      We have used that term on the basis that cnidarians existed before the appearance of bilaterians according to the fossil record and molecular phylogenies (McFadden et al., 2021; Adoutte et al., 2000; Cavalier-Smith et al., 1996; Collins, 1998; Kim et al., 1999; Medina et al., 2001; Wainright et al., 1993). To acknowledge remaining uncertainties in the timing of origin of animals, we will use the term “early-diverging metazoans” instead, which is widely accepted in the cnidarian community. 

      (6) Page 3 line 9 I was confused by the use of "gastrula-shaped body" to describe cnidarians, which are on the whole very morphologically diverse and don't all resemble gastrulae (that can also be quite diverse). 

      This term is sometimes used to refer to the diploblastic cnidarian body plan (outer ectoderm, inner endoderm) with a mouth that corresponds to the blastopore. To avoid misunderstandings, we changed it in the revised manuscript to “Cnidarians, the sister group to bilaterians, are characterized by a simple body plan with a central body cavity and a mouth opening surrounded by tentacles.”

      Reviewer #3 (Recommendations for the authors): 

      (1) In general, I felt there was a lot of discussion about protein structure and diversity that is difficult to follow without a figure. I think some of the information in Supplementary Figures S5, S9, and S11 should be in the main figures. 

      Following the reviewer’s suggestion, we have integrated Fig. S5 (collagens) into the main Fig. 2 and Fig. S9 (polydoms) into Fig. 4. As metalloproteases are not extensively discussed in the manuscript (and also due to the large size of the figure) we have kept Fig. S11 as a supplementary figure.

      (2) Page 3, Line 7: The use of the term "pre-bilaterian" is inappropriate. Cnidarians and bilaterians are evolutionary sisters. Therefore, each lineage derives from the same split and is the same age. The cnidarian lineage is not older than the bilaterian lineage. 

      Following a similar request by reviewer 2, we have replaced this term with “early-diverging metazoans”.

      (3) Page 5, Line 10. How were in silico matrisomes from early-branching metazoan species predicted? 

      We applied the same bioinformatic pipeline as for the Nematostella matrisome. We clarified this in the respective methods part.

      (4) Page 16, Line 8: This should be Thus. 

      Obviously, the wording of this sentence was ambiguous. We changed it to “In contrast, the adult mesoglea is significantly enriched in elastic fiber components, such as fibrillins and fibulin. This compositional shift likely adds to the visco-elastic properties (Gosline 1971a, b) of the growing body column (Fig. 4B,D, supplementary table S7).”

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer 1:

      While BAP1 mutant UM cell lines were included for some of the experiments, it seems the in-vivo data mentioned in the response to the reviewers comment is missing? The authors stated that "MP46 (Supplementary Fig. 3a) is BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor." But the CDX model data shown in Figure 4 is from 92.1 cells. If this data is available, then the manuscript would benefit from its addition.

      We thank the reviewer for bringing this to our attention. As the reviewer mentioned, we show the 92-1 CDX model in our manuscript. Additionally, strong tumor growth inhibition in the MP46 CDX model treated with our BAF ATPase inhibitor can be found in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/).

      Reviewer 3:

      Supplementary Figure 2C

      Is the T910M mutation in the parental MP41 cells heterozygous? If so, the authors should indicate this in the figure legend. If this is a homozygous mutation, the authors should explain how the inhibitors suppress SMARCA4 activity in cells that have a LOF mutation.

      We thank the reviewer for bringing this to our attention. We updated the figure legend accordingly to reflect the genotype of the mutations highlighted in the table.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The presented study by Centore and colleagues investigates the inhibition of BAF chromatin remodeling complexes. The study is well-written, and includes comprehensive datasets, including compound screens, gene expression analysis, epigenetics, as well as animal studies. This is an important piece of work for the uveal melanoma research field, and sheds light on a new inhibitor class, as well as a mechanism that might be exploited to target this deadly cancer for which no good treatment options exist.

      Strengths:

      This is a comprehensive and well-written study.

      Weaknesses:

      There are minimal weaknesses.

      We thank the reviewer for the positive comments.

      Reviewer #2 (Public Review):

      Summary:

      The authors generate an optimized small molecule inhibitor of SMARCA2/4 and test it in a panel of cell lines. All uveal melanoma (UM) cell lines in the panel are growth-inhibited by the inhibitor, making them the focus of the paper. This inhibition is correlated with the loss of promoter occupancy of key melanocyte transcription factors, e.g. SOX10. SOX10 overexpression and a point mutation in SMARCA4 can rescue growth inhibition exerted by the SMARCA2/4 inhibitor. Treatment of a UM xenograft model results in growth inhibition and regression, which correlates with reduced expression of SOX10 but no discernible toxicity in the mice. Collectively the data suggest a novel treatment of uveal melanoma.

      Strengths:

      There are many strengths of the study including the strong challenge of the on-target effect, the assays used, and the mechanistic data. The results are compelling as are the effects of the inhibitor. The in vivo data is dose-dependent and doses are low enough to be meaningful and associated with evidence of target engagement.

      Weaknesses:

      The authors introduce the field stating that SMARCA4 inhibitors are more effective in SMARCA2 deficient cancers and the converse. Since the desirable outcome of cancer therapy would be synthetic lethality it is not clear why a dual inhibitor is desirable. Wouldn't this be associated with more side effects? It is not known how the inhibitor developed here impacts normal cells, in particular T cells which are essential for any durable response to cancer therapies in patients. Another weakness is that the UM cell lines used do not molecularly resemble metastatic UM. These UM most frequently have mutations in the BAP1 tumor suppressor gene. It is not clear if the described SMARCA2/4 inhibitor is efficacious in BAP1 mutant UM cell lines in vitro or BAP1 mutant patient-derived xenografts in vivo.

      We thank the reviewer for their insightful and constructive comments. As we demonstrate in Fig. 1d, uveal melanoma cells are selectively and deeply sensitive to BAF ATPase inhibition, which provides a therapeutic window. This is confirmed in Fig. 4a-c, where we demonstrate robust tumor growth inhibition at a dose that was well tolerated in the xenograft study. FHD-286, a dual BRM/BRG1 inhibitor similar to FHT-1015 with optimized physical properties, has been evaluated in a Phase I trial in patients with metastatic uveal melanoma (NCT04879017), and a manuscript describing the results of this clinical trial is currently in preparation.

      As the reviewer mentioned, BAP1 loss is a signature of metastatic uveal melanoma. MP38 is a BAP1 mutant uveal melanoma cell line, and we demonstrated growth inhibition and robust caspase 3/7 activity in response to FHT-1015 (Supplementary Fig. 3a and 3f). MP46 (Supplementary Fig. 3a) is a BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript reports the discovery of new compounds that selectively inhibit SMARCA4/SMARCA2 ATPase activity and that work through a different mode than previously developed SMARCA4/SMARCA2 inhibitors. They also demonstrate the anti-tumor effects of the compounds on uveal melanoma cell proliferation and tumor growth. The findings indicate that the drugs exert their effects by altering chromatin accessibility at binding sites for lineage-specific transcription factors within gene enhancer regions. In uveal melanoma, altered expression of the transcription factor SOX10 and of SOX10 target genes underlies the anti-proliferative effects of the compounds. This study is significant because the discovery of new SMARCA4/SMARCA2 inhibitory compounds that can abrogate uveal melanoma tumorigenicity has therapeutic value. In addition, the findings provide evidence for the therapeutic use of these compounds in other transcription factor-dependent cancers.

      Strengths:

      The strengths of this manuscript include biochemical evidence that the new compounds are selective for SMARCA4/SMARCA2 over other ATPases and that the mode of action is distinct from a previously developed compound, BRM014, which binds the RecA lobe of SMARCA2. There is also strong evidence that FHT1015 suppresses uveal melanoma proliferation by inducing apoptosis. The in vivo suppression of tumor growth without toxicity validates the potential therapeutic utility of one of the new drugs. The conclusion that FHT1015 primarily inhibits SMARCA4 activity and thereby suppresses chromatin accessibility at lineage-specific enhancers is substantiated by ATAC-seq and ChIP-seq studies.

      Weaknesses:

      The weaknesses include a lack of more precise information on which SMARCA4/SMARCA2 residues the drugs bind. Although the I1173M/I1143M mutations are evidence that the critical residues for binding reside outside the RecA lobe, this site is conserved in CHD4, which is not affected by the compounds. Hence, this site may be necessary but not sufficient for drug binding or specifying selectivity. A more precise evaluation of the region specifying the effect of the new compounds would strengthen the evidence that they work through a novel mode and that they are selective. Another concern is that the mechanisms by which FHT1015 promotes apoptosis rather than simply cell cycle arrest are not clear. Does SOX10 or another lineage-specific transcription factor underlie the apoptotic effects of the compounds?

      We thank the reviewer for the valuable comments.

      We believe that our dual ATPase inhibitor is selective, and additional insights into binding specificity and selectivity for earlier stage compounds of this series were recently published in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/).

      The reviewer also poses a great question regarding the mechanism of apoptosis. The mechanism of apoptosis is extremely complex, but we observed a decrease in pro-survival BCL-2 protein expression in response to FHT-1015, in the experiment corresponding to Supplementary Fig. 5e. In the experiment described in Fig. 3k, we also monitored caspase 3/7 activity over time, and SOX10 overexpression rescued 92-1 cells from FHT-1015-induced apoptosis. This suggests that SOX10 is an important mediator of the response to BAF ATPase inhibition, including apoptosis induced by FHT-1015.

      Additional Reviews:

      The referees would like to draw the authors' attention to the following issues that would best benefit from additional revision. 

      The clinical relevance of the study would be strengthened by the use of uveal melanoma cell lines with BAP1 mutations that better represent metastatic uveal melanoma. The use of patient-derived xenografts would also be pertinent and would be a useful addition. Similarly, attention to the effects of the inhibitor on non-cancerous proliferative cells such as blood/T/immune cells would also strengthen the manuscript. As the study reports the administration of one of the inhibitors in mice for the xenograft experiments, it would be important to assess any potential effects on blood cell counts and better discuss the eventual toxicity or lack of toxicity and how it was assessed. 

      The authors should better explain how SOX10 over expression can rescue viability in the presence of the inhibitor. Similarly given the critical roles of BRG1, SOX10, and MITF in cutaneous melanoma some specific discussion on the sensitivity of cutaneous melanoma cells to the inhibitor should be considered, and potential differences with uveal melanoma highlighted. 

      Aside from these issues, the authors are urged to consider the other points mentioned below. 

      Reviewer #1 (Recommendations For The Authors): 

      Figure 1d, as well as the text in the manuscript referring to this figure, would benefit from indicating specific cell lines used for UM. The same for the sentence in line 153. 

      We thank the reviewer for bringing this to our attention. We have added the cell line names and updated the manuscript accordingly.

      For any of the studies conducted, is there any link with the genetics of UM? E.g. BAP1 wildtype/BAP1 mutant? 

      As addressed above in the public review section, MP38 is a BAP1-mutant uveal melanoma cell line, and we demonstrated growth inhibition and robust caspase 3/7 activity in response to FHT-1015 (Supplementary Fig. 3a and 3f). MP46 (Supplementary Fig. 3a) is a BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor.

      Row 191 - How were peaks classified as enhancer-occupied? 

      We used the annotatePeaks function of the HOMER package to annotate genomic locations, together with H3K27ac ChIP-seq to annotate peaks as enhancer-occupied. We thank the reviewer for pointing this out and have updated the manuscript to include this information.

      Row 259, the two cell lines should be named, also in Figure 3i. 

      We have added the cell line names and updated the manuscript accordingly.

      Reviewer #2 (Recommendations For The Authors): 

      As a proof of concept, this study is truly excellent and the authors should be commended. However, it is desirable that new knowledge in cancer is translated to the clinic. To this end there are a few things needed to strengthen the study. 

      I am rephrasing my statements from the public review to say that I would recommend testing the inhibitor in T cells (side effects) and BAP1 mutant cell lines (for clinical relevance). 

      As addressed in the public review section, MP38 is a BAP1-mutant uveal melanoma cell line, and we demonstrated growth inhibition and robust caspase 3/7 activity in response to FHT-1015 (Supplementary Fig. 3a and 3f). MP46 (Supplementary Fig. 3a) is a BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor.

      Regarding concerns about potential side effects on T cells, we observed an increase in both CD4 and CD8 T-cell populations in the peripheral blood and the spleen when naïve, non-tumor-bearing CD-1 mice were dosed with the SMARCA2/4 dual ATPase inhibitor FHD-286 once daily for 14 days. FHD-286 is a compound similar to FHT-1015 and is described in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/). In addition, FHD-286 has been tested in tumor-bearing syngeneic models. When B16F10 tumor-bearing C57BL/6 mice were dosed with FHD-286 for 10 days, we observed an increase in CD69+ activated CD8 T-cell infiltration in the tumor microenvironment (doi:10.1136/jitc-2022-SITC2022.0888).

      Reviewer #3 (Recommendations For The Authors): 

      (1) Determine drug binding by crystal structure or generate additional SMARCA4 or SMARCA2 mutations in the region near I1173/I1143 that are not conserved in CHD4 and test them in an ATPase assay for effects on drug inhibition. For example, Q1166 in SMARCA4 and Q1136 in SMARCA2 could be changed to Alanine as in CHD4. Would this abrogate drug inhibition? 

      We believe that our dual ATPase inhibitor is selective, and additional insights into binding specificity and selectivity for earlier-stage compounds of this series were recently published in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/).

      (2) The finding that SOX10 can rescue the antiproliferative effects of FHT1015 suggests that SMARCA4 is primarily needed for SOX10 expression. However, the co-occupancy of SMARCA4 and SOX10 at enhancers suggests that they cooperate to promote chromatin accessibility. It is unclear how over-expression of SOX10 can promote chromatin accessibility in drug-inhibited cells since SOX10 does not have chromatin remodeling activity. ATAC-seq in cells over-expressing SOX10 and treated with the drug could identify SOX10-dependent targets that do not require SMARCA4 activity and clarify the mechanism. It would also be informative to determine if SOX10 over-expression abrogates the effects of FHT1015 on both cell cycle and apoptosis, helping to resolve whether it is a partial or complete rescue of proliferation. 

      We agree that running ATAC-seq in cells overexpressing SOX10 would clarify this mechanism. However, shifts in corporate strategy deprioritized any further experiments for this project. One potential mechanism by which SOX10 overexpression could partially rescue the BAF inhibition phenotype is localization of the overexpressed SOX10 to open chromatin regions (mostly promoters) across the genome. We know from our ATAC-seq data (Fig. 2) that BAF inhibition leads to loss of chromatin accessibility at SOX10 enhancer sites, while promoter regions are only partially affected. Therefore, we think that overexpression of SOX10 would allow upregulation of its target genes via binding to the promoter regions. In this model, the enhancer-driven SOX10 target genes are likely to remain silenced.

      (3) Although the in vivo studies indicate that the drugs are well-tolerated, additional in vitro studies to determine the effects of the drug on the proliferation/survival of non-cancerous cells would further validate their therapeutic utility.

      Author Response: The reviewer raises a critical question. FHD-286, a dual BRM/BRG1 inhibitor similar to FHT-1015 with optimized physical properties, has been evaluated in a Phase I trial in patients with metastatic uveal melanoma (NCT04879017), and it was well tolerated at continuous daily doses of up to 7.5 mg QD and at intermittent doses of up to 17.5 mg QD. A manuscript describing the results of this clinical trial is currently in preparation.

    1. Author response:

      Reviewer #1 (Public review):

      It appears obvious that with no or a little fitness penalty, it becomes beneficial to have MHC-coding genes specific to each pathogen. A more thorough study that takes into account a realistic (most probably non-linear in gene number) fitness penalty, various numbers of pathogens that could grossly exceed the self-consistent fitness limit on the number of MHC genes, etc, could be more informative.

      The reviewer seems to be referring to the cost of excessively high presentation breadth.  Such a cost is irrelevant to the inferior fitness of a polymorphic population with heterozygote advantage compared to a monomorphic population with merely doubled gene copy number.  It is relevant to the possibility of a fitness valley separating these two states, but this issue is addressed explicitly in the manuscript.

      An addition or removal of one of the pathogens is reported to affect "the maximum condition", a key ecological characteristic of the model, by an enormous factor 10^43, naturally breaking down all the estimates and conclusions made in [RS]. This observation is not substantiated by any formulas, recipes for how to compute this number numerically, or other details, and is presented just as a self-standing number in the text.

      It is encouraging that the reviewer agrees that this observation, if correct, would cast doubt on the conclusions of Siljestam and Rueffler.  I would add that it is not the enormity of this factor per se that invalidates those conclusions, but the fact that the automatic compensatory adjustment of c<sub>max</sub> conceals the true effects of removing a pathogen, which are quite large.

      I am not sure why the reviewer doubts that this observation is correct.  The factor of 2.7∙10<sup>43</sup> was determined in a straightforward manner in the course of simulating the symmetric Gaussian model of Siljestam and Rueffler with the specified parameter values.  A simple way to determine this number is to have the simulation code print the value to which c<sub>max</sub>  is set, or would be set, by the procedure of Siljestam and Rueffler for different parameter values.  In another section of this response I will describe how to do this with the simulation code written and used by Siljestam and Rueffler; doing so confirms the value that I obtained with my own code.  Furthermore, I will now give a theoretical derivation of this factor.

      As specified by Siljestam and Rueffler, the positions of the m pathogens in (m-1)-dimensional antigenic space correspond to the vertices of a regular simplex centered at the origin, with distance between vertices equal to 1.  The squared distance from the origin to each of the m vertices of such a simplex is (m-1)/2m (https://polytope.miraheze.org/wiki/Simplex).  Thus, the sum of the m squared distances is (m-1)/2.  For the (0, 0) homozygote, condition is multiplied by a factor of exp(-(vr)<sup>2</sup>/2) for each pathogen, where r is the distance from the origin.  It follows that, with v=20, all the pathogens together decrease condition by a factor of exp(20<sup>2</sup>∙(m-1)/4) = exp(100∙(m-1)).  Thus, increasing or decreasing m by 1 changes this value by a factor of exp(100) = 2.7∙10<sup>43</sup>.
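      The arithmetic above can be checked numerically. The following is a minimal Python sketch, not the simulation code of Siljestam and Rueffler: the function names are hypothetical, and the simplex is built by an assumed (but standard) construction, namely basis vectors scaled by 1/sqrt(2) and shifted so that their centroid is the origin. It sums (vr)^2/2 over the m pathogens to obtain the natural logarithm of the total factor.

```python
import math


def simplex_vertices(m):
    """Vertices of a regular simplex with m vertices and unit edge length,
    centered at the origin (embedded in R^m for convenience)."""
    scale = 1.0 / math.sqrt(2.0)   # makes every edge length equal to 1
    centroid = scale / m           # each coordinate of the centroid
    return [
        [(scale if i == j else 0.0) - centroid for j in range(m)]
        for i in range(m)
    ]


def log_condition_factor(m, v):
    """Natural log of the factor by which all m pathogens together reduce
    the condition of a homozygote at the origin: sum of (v*r)^2 / 2."""
    total = 0.0
    for vertex in simplex_vertices(m):
        r_squared = sum(x * x for x in vertex)
        total += (v ** 2) * r_squared / 2.0
    return total


if __name__ == "__main__":
    for m in (7, 8, 9):
        print(m, log_condition_factor(m, 20.0))
    # Factor associated with adding or removing one pathogen when v = 20
    print(math.exp(100.0))
```

      With v=20 the sketch gives 100∙(m-1), so m=8 and m=7 differ by exactly 100 in log units, which is the exp(100) factor discussed in the text.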

      This begs the conclusion that the branching remains robust to changes in c_max that span 4 decades as well.

      That shows only that the results are not extremely sensitive to c<sub>max</sub> or K.  They are, nonetheless, exquisitely sensitive to m and v.  This difference in sensitivities is the reason that a relatively small change to m leads to such a large compensatory change in c<sub>max</sub>, a change large enough to have a major effect on the results.

      As I wrote above, there is no explanation behind this number, so I can only guess that such a number is created by the removal or addition of a pathogen that is very far away from the other pathogens. Very far in this context means being separated in the x-space by a much greater distance than 1/\nu, the width of the pathogens' gaussians. Once again, I am not totally sure if this was the case, but if it were, some basic notions of how models are set up were broken. It appears very strange that nothing is said in the manuscript about the spatial distribution of the pathogens, which is crucial to their effects on the condition c.

      I did not explicitly describe the distribution of pathogens in antigenic space because it is exactly the same as in Siljestam and Rueffler, Fig. 4: the vertices of a regular simplex, centered at the origin, with unity edge length.

      The number in question (2.7∙10<sup>43</sup>) pertains to the Gaussian model with v=20.  As specified by Siljestam and Rueffler, each pathogen lies at a distance of 1 from every other pathogen, so the distance of any pathogen from the others is indeed much greater than 1/v.  This condition holds, however, for most of the parameter space explored by Siljestam and Rueffler (their Fig. 4), and for all of the parameter space that seemingly supports their conclusions.  Thus, if this condition indicates that “basic notions of how models are set up were broken”, they must have been broken by Siljestam and Rueffler.

      Overall, I strongly suspect that an unfortunately poor setup of the model reported in the manuscript has led to the conclusions that dispute the much better-substantiated claims made in [SD].

      The reviewer seems to be suggesting that my simulations are somehow flawed and my conclusions unreliable.  I will therefore describe how my conclusions about sensitivity to parameter values can be verified using the simulation code provided by Siljestam and Rueffler themselves, with only small, easily understood modifications.  I will consider adding this description as a supplement when I revise the manuscript.

      The starting point is the Matlab file MHC_sim_Dryad.m, available at https://doi.org/10.5061/dryad.69p8cz98j.  First, we can add a line that prints the value of the variable logcmax, which represents the natural logarithm of cmax determined and used by the code.  Below line 116 (‘prework’), add the line ‘logcmax’ (with no semicolon).

      Now, at the Matlab prompt, execute MHC_sim_Dryad(false, 8, 20, 1) to run the simulation for the Gaussian model with m=8, v=20, and K=1.  The output will indicate that logcmax=700, in accord with the theoretical factor exp(100*(m-1)) derived above.  The allelic diversity, n<sub>e</sub>, will rise to a steady-state level of about 140, as in the red curve of my Fig. 2.

      Now lower m to 7, i.e., run MHC_sim_Dryad(false, 7, 20, 1).  The output will indicate that logcmax=600.  This confirms that lowering m by 1 causes the code to lower the value of c<sub>max</sub> by a factor exp(100)=2.7∙10<sup>43</sup>, which must also be the factor by which the condition of the most fit homozygote would increase without this adjustment.

      With the change of m to 7 and the compensatory change in c<sub>max</sub>, steady-state allelic diversity remains high.  But what if m changes but c<sub>max</sub> remains the same, as it would in reality?

      To find out, we can fix the value of c<sub>max</sub> to the value used with m=8 by adding the following line below the line previously added: ‘logcmax = 700’.  With this additional modification in place, executing MHC_sim_Dryad(false, 7, 20, 1) confirms that without a compensatory change to c<sub>max</sub>, lowering m from 8 to 7 mostly eliminates allelic diversity, in accord with the corresponding curve in my Fig. 2.  Similarly, raising m from 8 to 9, or changing v from 20 to 19.5 or 20.5 (executing MHC_sim_Dryad(false, 8, 19.5, 1) or MHC_sim_Dryad(false, 8, 20.5, 1)), largely eliminates diversity, confirming the other results in my Fig. 2.  Results for the bitstring model can also be confirmed, though this requires additional changes to the code.

      Thus, the extreme sensitivity of the results of Siljestam and Rueffler to parameter values can be verified with the code that they used for their simulations, indicating that my conclusions are not consequences of my having done a “poor setup of the model”.

      Response to Reviewer #2 (Public review):

      (1) The statement that the model outcome of Siljestam and Rueffler is very sensitive to parameter values is, in this form, not correct. The sensitivity is only visible once a strong assumption by Siljestam and Rueffler is removed. This assumption is questionable, and it is well explained in the manuscript by J. Cherry why it should not be used. This may be seen as a subtle difference, but I think it is important to pin down the exact nature of the problem (see, for example, the abstract, where this is presented in a misleading way).

      I appreciate the distinction, and the importance of clearly specifying the nature of the problem.  However, Siljestam and Rueffler do not invoke the implausible assumption that changes to the number of pathogens or their virulence will be accompanied by compensatory changes to c<sub>max</sub>.  Rather, they describe the adjustment of c<sub>max</sub> (Appendix 7) as a “helpful” standardization that applies “without loss of generality”.  Indeed, my low-diversity results could be obtained, despite such adjustment, by combining the small change to m or v with a very large change to K (e.g., a factor of 2.7∙10<sup>43</sup>).  In this sense there is no loss of generality, but the automatic adjustment of c<sub>max</sub> obscures the extreme sensitivity of the results to m and v.

      (2) The title of the study is very catchy, but it needs to be explained better in the text.

      I had hoped that the final paragraph of the Discussion would make the basis for the title clear.  I will consider whether this can be clarified in a revision.

    1. Chapter 4: Common Writing Assignments College writing assignments serve a different purpose than the typical writing assignments you completed in high school. The textbook Successful Writing explains that high school teachers generally focus on teaching you to write in a variety of modes and formats, including personal writing, expository writing, research papers, creative writing, and writing short answers and essays for exams. Over time, these assignments help you build a foundation of writing skills. In college, many instructors will expect you to already have that foundation. Your college composition courses will focus on writing for its own sake, helping you make the transition to college-level writing assignments. However, in most other college courses, writing assignments serve a different purpose. In those courses, you may use writing as one tool among many for learning how to think about a particular academic discipline. Additionally, certain assignments teach you how to meet the expectations for professional writing in a given field. Depending on the class, you might be asked to write a lab report, a case study, a literary analysis, a business plan, or an account of a personal interview. You will need to learn and follow the standard conventions for those types of written products. Finally, personal and creative writing assignments are less common in college than in high school. College courses emphasize expository writing, writing that explains or informs. Often expository writing assignments will incorporate outside research, too. Some classes will also require persuasive writing assignments in which you state and support your position on an issue. College instructors will hold you to a higher standard when it comes to supporting your ideas with reasons and evidence. Common Types of College Writing Assignments Below you will find a list of different types of writing assignments you may write as you pursue your academic goals. 
Review each assignment and think about the writing you’ve done in high school and how these assignments might look different in your college composition classes.   Figure 1   After reviewing Figure 1 and the descriptions of various types of writing assignments, watch the following video about the writing process. No matter what type of assignment you are writing, it will be important for you to follow a writing process: a series of steps a writer takes to complete a writing task. Making use of a writing process ensures that you stay organized and focused while allowing you to break up a larger assignment into several distinct tasks. Not every writer follows the same process, and part of the work you will do in your writing classes is to discover the writing process that works best for you. Even though the writing process is often presented as a linear set of steps that writers follow from beginning to end, composition scholars now recognize the recursive nature of writing. In other words, many writers repeat steps in the process and not all writers invest an equal amount of time in each stage. Instead, writers often loop back to individual stages as needed in order to develop and refine their work. As you watch the video below, consider your current writing process (if you have one) and reflect upon how you might develop your process to support your growth as a writer—and to save yourself time and stress when completing college writing assignments. In the previous chapters, we covered college writing at CNM and reading strateg

      The key here is that there are different types of common writing assignments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This work computationally characterized the threat-reward learning behavior of mice in a  recent study (Akiti et al.), which had prominent individual differences. The authors  constructed a Bayes-adaptive Markov decision process model and fitted the behavioral data  by the model. The model assumed (i) hazard function starting from a prior (with free mean  and SD parameters) and updated in a Bayesian manner through experience (actually no real  threat or reward was given in the experiment), (ii) risk-sensitive evaluation of future  outcomes (calculating lower 𝛼 quantile of outcomes with free 𝛼 parameter), and (iii) heuristic  exploration bonus. The authors found that (i) brave animals had more widespread hazard  priors than timid animals and thereby quickly learned that there was in fact little real threat,  (ii) brave animals may also be less risk-aversive than timid animals in future outcome  evaluation, and (iii) the exploration bonus could explain the observed behavioral features,  including the transition of behavior from the peak to steady-state frequency of bout. Overall,  this work is a novel interesting analysis of threat-reward learning, and provides useful  insights for future experimental and theoretical work. However, there are several issues that I  think need to be addressed.

      Strengths:

      (1) This work provides a normative Bayesian account for individual differences in  braveness/timidity in reward-threat learning behavior, which complements the analysis by  Akiti et al. based on model-free threat reinforcement learning.

      (2) Specifically, the individual differences were characterized by (i) the difference in the  variance of hazard prior and potentially also (ii) the difference in the risk-sensitivity in the  evaluation of future returns.

      Weakness:

      (1) Theoretically the effect of prior is diluted over experience whereas the effect of biased  (risk-aversive) evaluation persists, but these two effects could not be teased apart in the  fitting analysis of the current data.

      (2) It is currently unclear how (whether) the proposed model corresponds to neurobiological (rather than behavioral) findings, different from the analysis by Akiti et al.

      We thank reviewer #1 for their useful feedback which we’ve used to improve the discussion,  formatting and clarity of the paper, and for highlighting important questions for future  extensions of our work.

      Major points:

      (1) Line 219

      It was assumed that the exploration bonus was replenished at a steady rate when the animal  was at the nest. An alternative way would be assuming that the exploration bonus slowly  degraded over time or experience, and if doing so, there appears to be a possibility that the  transition of the bout rate from peak to steady-state could be at least partially explained by  such a decrease in the exploration bonus.

      Section 2.2.3 explains the mechanism of the exploration bonus which motivates approach.  We think that the mechanism suggested by the reviewer is, in essence, what is happening in  the model. The exploration pool is indeed depleted over time or bouts of experience at the  object. In the peak confident phase for brave animals and the peak cautious phase for timid  animals, the rate of depletion exceeds the rate of regeneration, since the agent spends only  a single turn at the nest between bouts. In the steady-state phase, the exploration pool has  depleted so much previously that the agent must wait multiple turns at the nest for the pool  to regenerate to a sufficiently high value to justify approaching the object again.

      We have updated section 2.2.3 to explain that agents spend one turn at the nest during peak  phase but multiple turns during steady-state phase. Hopefully, this makes our mechanism  clear:

      “In simulations, when 𝐺(𝑡) is high, the agent has a high motivation to explore the object, spending only a single turn in the nest state between bouts. In other words, the depletion from 𝐺0 substantially influences the time point at which approach makes a transition from peak to steady-state; the steady-state time then depends on the dynamics of depletion (when at the object) and replenishment (when at the nest). In particular, in the steady-state phases, the agent must wait multiple turns at the nest for 𝐺(𝑡) to regenerate so that informational reward once again exceeds the potential cost of hazard.”
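      The qualitative mechanism in this passage can be illustrated with a deliberately crude toy, which is not the BAMDP planner of the paper. It assumes a fixed hazard cost (a constant threshold), a constant amount of depletion per bout, and a constant replenishment per nest turn; all names and parameter values are hypothetical. The agent approaches whenever the pool exceeds the threshold, and the sketch records how many turns it spent at the nest before each bout.

```python
def nest_turns_between_bouts(g0, deplete, replenish, threshold, n_bouts):
    """Toy exploration-pool dynamics (illustrative assumptions only).

    g0        -- initial (and maximum) size of the exploration pool
    deplete   -- amount removed from the pool by each bout at the object
    replenish -- amount restored by each turn spent at the nest
    threshold -- fixed stand-in for the potential cost of hazard
    Returns a list with the number of nest turns preceding each bout.
    """
    pool = g0
    turns_at_nest = 0
    waits = []
    while len(waits) < n_bouts:
        # One turn at the nest: the pool regenerates, capped at g0.
        pool = min(g0, pool + replenish)
        turns_at_nest += 1
        # Approach only if the informational reward exceeds the hazard cost.
        if pool > threshold:
            pool -= deplete
            waits.append(turns_at_nest)
            turns_at_nest = 0
    return waits


if __name__ == "__main__":
    print(nest_turns_between_bouts(g0=10.0, deplete=1.5, replenish=0.5,
                                   threshold=2.0, n_bouts=12))
```

      Because depletion per bout exceeds replenishment per nest turn, the early bouts in this toy are separated by a single nest turn, and once the pool has run down the agent has to wait several turns before each approach.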

      (2) Line 237- (Section 2.2.6, 2.2.7, Figures 7, 9)

      I was confused by the descriptions about nCVaR. I looked at the cited original literature  Gagne & Dayan 2022, and understood that nCVaR is a risk-sensitive version of expected  future returns (equation 4) with parameter α (α-bar) (ranging from 0 to 1) representing risk  preference. Line 269-271 and Section 4.2 of the present manuscript described (in my  understanding) that α was a parameter of the model. Then, isn't it more natural to report  estimated values of α, rather than nCVaR, for individual animals in Section 2.2.6, 2.2.7,  Figures 7, 9 (even though nCVaR monotonically depends on α)? In Figures 7 and 9, nCVaR  appears to be upper-bounded to 1. The upper limit of α is 1 by definition, but I have no idea why nCVaR was also bounded by 1. So I would like to ask the authors to add more detailed  explanations on nCVaR. Currently, CVaR is explained in Lines 237-243, but actually, there is  no explanation about nCVaR rather than its formal name 'nested conditional value at risk' in  Line 237.

      Thank you for pointing out this error. We have corrected the paper to use nCVaR to refer to  the objective and nCVaR's α, or sometimes just α, to refer to the risk sensitivity parameter  and thus the degree of risk sensitivity.
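      To make the distinction between the objective and its parameter concrete, the following minimal sketch computes a single-step CVaR on a finite sample of equally weighted outcomes, taken here as the mean of the worst α fraction. This is an assumption-laden illustration: the function name and sample values are hypothetical, and the nested (multi-step) construction used in the paper is not reproduced.

```python
import math


def cvar_lower_tail(outcomes, alpha):
    """Sample CVaR at level alpha: the mean of the worst alpha fraction
    of equally weighted outcomes (higher outcome = better).

    alpha = 1 recovers the ordinary mean (risk neutral);
    smaller alpha focuses on progressively worse outcomes (risk averse).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(outcomes)                        # worst outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))       # size of the lower tail
    tail = ordered[:k]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    returns = [-10.0, 0.0, 1.0, 2.0, 3.0, 4.0]
    for alpha in (1.0, 0.5):
        print(alpha, cvar_lower_tail(returns, alpha))
```

      Here α is the quantity an individual animal would be characterized by, while the value returned is the risk-sensitive evaluation that α produces for a given distribution of outcomes.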

      (3) Line 333 (and Abstract)

      Given that animals' behaviors could be equally well fitted by the model having both nCVaR (free α) and hazard prior and the alternative model having only hazard prior (with α = 1), may it be difficult to confidently claim that brave (/timid) animals had risk-neutral (/risk-aversive) preference in addition to widespread (/low-variance) hazard prior? Then, it might be good to somewhat weaken the corresponding expression in the Abstract (e.g., add 'potentially also' to the result for risk sensitivity) or mention the inseparability of risk sensitivity and prior belief pessimism (e.g., "... although risk sensitivity and prior belief pessimism could not be teased apart").

      Thank you for this suggestion; we have duly weakened the wording in the Abstract to say “potentially more risk neutral”:

      “Some animals begin with cautious exploration, and quickly transition to confident approach to maximize exploration for reward; we classify them as potentially more risk neutral, and enjoying a flexible hazard prior. By contrast, other animals only ever approach in a cautious manner and display a form of self-censoring; they are characterized by potential risk aversion and high and inflexible hazard priors.”

      Reviewer #2 (Public Review):

      Shen and Dayan build a Bayes adaptive Markov decision process model with three key components: an adaptive hazard function capturing potential predation, an intrinsic reward function providing the urge to explore, and a conditional value at risk (CVaR, closely related to probability distortion explanations of risk traits). The model itself is very interesting and has many strengths including considering different sources of risk preference in generating behavior under uncertainty. I think this model will be useful to consider for those studying approach/avoid behaviors in dynamic contexts.

      The authors argue that the model explains behavior in a very simple and unconstrained behavioral task in which animals are shown novel objects and retreat from them in various manners (different body postures and patterns of motor chunks/syllables). The model itself does capture lots of the key mouse behavioral variability (at least on average on a mouse-by-mouse basis) which is interesting and potentially useful. However, the variables in the model - and the internal states it implies the mice have during the behavior - are relatively unconstrained given the wide range of explanations one can offer for the mouse behavior in the original study (Akiti et al). This reviewer commends the authors on an original and innovative expansion of existing models of animal behaviour, but recommends that the authors revise their study to reflect the obvious challenges. I would also recommend a reduction in claiming that this exercise gives a normative-like or at least quantitative account of mental disorders.

      We thank reviewer #2 for highlighting some of the strengths of our paper as well as pointing  out important limitations of Akiti et al’s original study which we’ve inherited as well as some  limitations of our own method. We address their concerns below.

      We have added a paragraph to the Discussion on the limitations of the state representation we adopted from Akiti’s study.

      (Reviewer #1 had the same concern, see above) “Motivated by tail-behind versus  tail-exposed in Akiti et al. (2022), we model approach using a dichotomy between cautious  and confident approach states [...]”

      We have reduced the suggestion that our model provides an account of mental disorders in  the abstract.

      Before:

      “On the other hand, “timid” animals, characterized by risk aversion and high and inflexible  hazard priors, display self-censoring that leads to the sort of asymptotic maladaptive  behavior that is often associated with psychiatric illnesses such as anxiety and depression.”

      After:

      “By contrast, other animals only ever approach in a cautious manner and display a form of self-censoring; they are characterized by potential risk aversion and high and inflexible hazard priors.”

      My main comment is that this paper is a very nice model creation that can characterize the heterogeneity of rodent behavior in a very simple approach/avoid context (Akiti et al; when a novel object is placed in an arena) that itself can be interpreted in a multitude of ways. The use of terms like "exploration", "brave", etc in this context is tricky because the task does not allow the original authors (Akiti et al) to quantify these "internal states" or "traits" with the appropriate level of quantitative detail to say whether this model is correct or not in capturing the internal states that result in the rodent behavior. That said, the original behavioral setup is so simple that one could imagine capturing the behavioral variability in multiple ways (potentially without evoking complex computations that the original authors never showed the mouse brain performs). I would recommend reframing the paper as a new model that proposes a set of internal states that could give rise to the behavioral heterogeneity observed in Akiti et al, but nonetheless is at this time only a hypothesis. Furthermore, an explanation of what would be really required to test this would be appreciated to make the point clearer.

      We thought very hard about using terms that might be considered to be anthropomorphic, such as ‘timid’ and ‘brave’. We are, of course, aware of the concerns articulated by investigators such as LeDoux about this. However, we think that, provided that we are clear on the first appearance (using ‘scare’ quotes) that we are indeed using them as labels for latent characteristics that capture correlations in various aspects of behaviour, they are more helpful than harmful in making our descriptions understandable.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript presents computational modelling of the behaviour of mice during encounters with novel and familiar objects, originally reported by Akiti et al. (Neuron 110, 2022). Mice typically perform short bouts of approach followed by a retreat to a safe distance, presumably to balance exploration to discover possible rewards with the potential risk of predation. However, there is considerable heterogeneity in this exploratory behaviour, both across time as an individual subject becomes more confident in approaching the object, and across subjects; with some mice rapidly becoming confident to closely explore the object, while other timid mice never become fully confident that the object is safe. The current work aims to explain both the dynamics of adaptation of individual animals over time, and the quantitative and qualitative differences in behaviour between subjects, by modelling their behaviour as arising from model-based planning in a Bayes adaptive Markov Decision Process (BAMDP) framework, in which the subjects maintain and update probabilistic estimates of the uncertain hazard presented by the object, and rationally balance the potential reward from exploring the object with the potential risk of predation it presents.

      In order to fit these complex models to the behaviour the authors necessarily make  substantial simplifying assumptions, including coarse-graining the exploratory behaviour into  phases quantified by a set of summary statistics related to the approach bouts of the animal.  Inter-individual variation between subjects is modelled both by differences in their prior  beliefs about the possible hazard presented by the object and by differences in their risk  preference, modelled using a conditional value at risk (CVaR) objective, which focuses the  subject's evaluation on different quantiles of the expected distribution of outcomes.  Interestingly these two conceptually different possible sources of inter-subject variation in  brave vs timid exploratory behaviour turn out not to be dissociable in the current dataset as  they can largely compensate for each other in their effects on the measured behaviour.  Nonetheless, the modelling captures a wide range of quantitative and qualitative differences  between subjects in the dynamics of how they explore the object, essentially through  differences in how subject's beliefs about the potential risk and reward presented by the  object evolve over the course of exploration, and are combined to drive behaviour.

      Exploration in the face of risk is a ubiquitous feature of the decision-making problem faced  by organisms, with strong clinical relevance, yet remains poorly understood and  under-studied, making this work a timely and welcome addition to the literature.

      Strengths:

      (1) Individual differences in exploratory behaviour are an interesting, important, and  under-studied topic.

      (2) Application of cutting-edge modelling methods to a rich behavioural dataset, successfully accounting for diverse quantitative and qualitative features of the data in a normative framework.

      (3) Thoughtful discussion of the results in the context of prior literature.

      Limitations:

      (1) The model-fitting approach used of coarse-graining the behaviour into phases and fitting  to their summary statistics may not be applicable to exploratory behaviours in more complex  environments where coarse-graining is less straightforward.

      (2) Some aspects of the work could be more usefully clarified within the manuscript.

      We thank reviewer #3 for their positive feedback and for helping us to improve the clarity of our paper. We have added the discussion they thought was missing.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 25-28

      This part of the Abstract might give an impression that timidity (but not braveness) is  potentially associated with psychiatric illness and even that timidity is thus inferior to  braveness. However, even though extreme timidity might indeed be associated with anxiety  or depression, extreme braveness could also be associated with other psychiatric or  behavioral problems. Moreover, as a population, the existence of both timid and brave  individuals could be advantageous, and it could be a reason why both types of individuals  evolutionarily survived in the case of wild animals (although Akiti et al. used mice, which may  have no or very limited genetic varieties, and so things may be different). So I would like to  encourage the authors to elaborate on the expression of this part of the Abstract and/or  enrich the related discussion in the Discussion.

      This is an important point. We note on line 38 that excessive novelty seeking (potentially  caused by excessive braveness) could also be maladaptive.

      Additionally, we have added a paragraph to the discussion discussing heterogeneity in risk  sensitivity within a population.

      “Our data show that there is substantial variation in the degrees of risk sensitivity across the mice. Previous works have reported substantial interpopulation and intrapopulation differences in risk-sensitivity in humans which depend on gender, age, socioeconomic status, personality characteristics, wealth and culture (Rieger et al., 2015; Frey et al., 2017). Despite the normative appeal of 𝛼 = 1, it is possible that a population may benefit from including individuals with 𝛼 different from 1.0 or highly negative priors. For example, more cautious individuals could learn from merely observing the risky behavior of less cautious individuals. Furthermore, we have only considered risk-sensitivity under epistemic uncertainty in our work. Risk averse individuals, for instance with 𝛼 < 1, may be more successful than risk-neutral agents in environments where there are unexpected dangers (unknown unknowns). Risk-aversion is thus a temperament of ecological and evolutionary significance (Réale et al., 2007).”
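
      To make the role of 𝛼 concrete, the sketch below computes a plain (single-step) CVaR over equally likely outcomes. It is an illustration only, not the nested CVaR (nCVaR) used in the model: the function name `lower_tail_cvar` and the sample values are hypothetical, and the lower-tail convention (𝛼 = 1 is risk-neutral, 𝛼 < 1 is risk-averse) is assumed from the wording of the paragraph above.

```python
import numpy as np

def lower_tail_cvar(outcomes, alpha):
    """Average of the worst alpha-fraction of equally likely outcomes.

    Assumed convention: alpha = 1 recovers the ordinary mean (risk-neutral);
    alpha < 1 averages only the lower tail (risk-averse).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    x = np.sort(np.asarray(outcomes, dtype=float))  # worst outcomes first
    if x.size == 0:
        raise ValueError("need at least one outcome")
    tail_mass = alpha * x.size           # tail size in units of samples
    n_full = int(np.floor(tail_mass))    # samples lying wholly in the tail
    remainder = tail_mass - n_full       # partial weight on the next sample
    total = x[:n_full].sum()
    if remainder > 0.0 and n_full < x.size:
        total += remainder * x[n_full]
    return total / tail_mass
```

      For the made-up outcomes [-10, 0, 1, 2], 𝛼 = 1 gives the mean, -1.75, whereas 𝛼 = 0.5 gives -5 and 𝛼 = 0.25 gives -10: the smaller 𝛼 is, the more the evaluation is dominated by the worst case.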

      (2) Line 149

      Section 2.2 consists of eight subsections. I think this organization may not be very  appealing, because there are a bit too many subsections, and their relations are not  immediately clear to readers. So I would like to encourage the authors to make an  elaboration. For example, since 2.2.1 - 2.2.5 describes a summary of model construction  and model fitting whereas 2.2.6-2.2.8 shows the results, it could be good to divide these into  separate sections (2.2.1 - 2.2.5 and 2.3.1 - 2.3.3).

      Thank you for pointing this out. We’ve renumbered the sections as you’ve suggested.

      (3) Line 347-8

      Theoretically, the effect of prior is diluted over experience whereas the effect of biased  (risk-aversive) evaluation persists, as the authors mentioned in Lines 393-394. Then isn't it  possible to consider environments/conditions in which the two effects can be separated?

      We appreciate this suggestion. Indeed, our original thought in modeling this experiment was  that this would be exactly the case here - with epistemic uncertainty reducing as the object  became more familiar. However, proving to an animal that a single environment is  completely stationary/fixed is hard - reflected in our conclusion here that the exploration  bonus pool replenishes. Thus, we argued in the discussion that a series of environments  would be necessary to separate risk sensitivity from priors.

      (4) Line 407

      It would be nice to add a brief phrase explaining how (in what sense) this model's  assumption was consistent with the reported behavior. Also, should the assumption of  having two discrete approach states (cautious and confident) itself be regarded as a  limitation of the model? If the tail-behind and tail-exposure approaches were not merely  operationally categorized but were indicated to be two qualitatively distinct behaviors in the  experiment by Akiti et al., it is reasonable to model them as two discrete states, but  otherwise, the assumption of two discrete states would need to be mentioned as a  simplification/limitation.

      We have now removed line 407, and now have an additional paragraph in the discussion discussing the limitations of the tail-behind and tail-exposure state representation: “Motivated by tail-behind versus tail-exposed in Akiti et al. (2022), we model approach using a dichotomy between cautious and confident approach states. This is likely a crude approximation to the continuous and multifaceted nature of animal approach behavior. For example, during approach animals likely adjust their levels of vigilance continuously (or discretely; Lloyd and Dayan (2018)) to monitor threat, and choose different velocities for movement, and different attentional strategies for inspecting the novel object. We hope future works will model these additional behavioral complexities, perhaps with additional internal states, and corroborate these states with neurobiological data.”

      (5) Line 418

      The authors contrasted their model-based analyses with the model-free analyses of Akiti et  al. Another aspect of differences between the authors' model and the model of Akiti et al. is  whether it is normative or mechanistic: while how the model of Akiti et al. can be biologically  implemented appears to be clear (TS dopamine represents threat TD error, and TS  dopamine-dependent cortico-striatal plasticity implements TD error-based update of  model-free threat prediction), biological implementation of the authors' model seems more  elusive. Given this, it might be a fruitful direction to explore how these two models can be  integrated in the future.

      We enthusiastically agree that it would be most interesting in the future to explore the integration of the two models - and, in the discussion (Lines 537-548, 454-461), point to some first steps that might be fruitful along these lines. There are two separate considerations here: one is that our account is mostly computational and algorithmic, whereas Akiti’s model is mostly algorithmic and implementational; the second is, as noted by the reviewer, that our account is model-based, whereas Akiti’s model is model-free (in the sense of reinforcement learning; RL). These are related - thanks in no small part to the work from the group including Akiti, we know a lot more about the implementation of model-free than model-based RL. However, our model-based account does reach additional features of behavior not captured in Akiti et al.’s model such as bout duration, frequency, and approach type. Thus, the temptation of unification.

      (6) Line 426

      Related to the previous point, it would be nice to more specifically describe what variable TS  dopamine can represent in the authors' model if possible.

      In the discussion (Lines 454-461), we speculate that TS dopamine could still respond to the physical salience of the novel object and affect choices by determining the potential cost of the encountered threat or the prior on the hazard function. For example, perhaps ablating TS dopamine reduces the hazard priors which leads to faster transition from cautious to confident approach and longer bout durations, consistent with the optogenetics behavioral data reported in Akiti et al.

      Reviewer #2 (Recommendations for the authors):

      My guess is simpler versions of the model would not fit the data well. But this does not mean for example that the mice have probability distortions (CVaR) or that even probabilistic reasoning and the internal models necessary to support them are acting in the behavioral context studied by Akiti. So related to the above, I would ask what other models would fit and would not fit the data? And what does this mean?

      These are good points. Our model provides an approximately normative account of the animals’ behavior in terms of what it achieves relative to a utility function. In practice, the animals could deploy a precompiled model-free policy (which does not rely on probabilistic computations) that is exactly equivalent to our model-based policy. With the current experiment, we cannot conclude whether or not the animals are performing the prospective calculations in an online manner. Of course, the extent to which animals or humans perform probabilistic computations online and have internal models remains an ongoing question of study.

      Model comparison is difficult because currently we do not know of any other risk-sensitive exploration models. We cannot directly compare to the model in Akiti et al. since our model explains additional features of behavior: bout duration, frequency, and approach type. Indeed, our model is as simple as it can be in the sense that, with the exception of nCVaR, removing any of the other parameters makes it difficult to fit some animals in our dataset. In the future, our model could be used to fit other datasets of risk-sensitive exploration and, ideally, be compared to other models.

      Explaining why animals avoid the novel object in what the authors call a benign environment is a very tricky issue. In Akiti et al, the readers are not yet convinced that the mice know that this environment is benign. Being placed in an arena with a novel object presents mice with a great uncertainty and we do not know whether they treat this as benign. Therefore, the alternative explanations in this study need to be carefully discussed in light of the limitations of the initial study.

      It is certainly true that it is unclear if the arena is completely benign to the animals. However, the amount of time the animal spends in the center of the arena decreases significantly from habituation to novelty days. This suggests that the animals avoid the novel object largely because of the object itself, rather than the potential danger associated with the arena. Furthermore, the animals are not reported as exhibiting more extreme behaviours such as freezing. In any case, our account is relative in the sense that we are comparing the time the animal spends at the object versus elsewhere in the environment, driven by the relative novelty and relative risk of the environment versus the object. Trying to get more absolute measures of these quantities would require a richer experimental set-up, for instance with different degrees of habituation or experience of the occurrence of (other) novel objects in general.

      We added a short note to the discussion to explain this:

      “Fourth, we modeled the relative amount of time the animal spends at the object versus  elsewhere in the environment which depends on the differential risk in the two states.  However, it is likely the animals avoid the novel object largely because of the object itself,  rather than the potential danger associated with the arena since they spend much less time  at the center of the arena during novelty than habituation days.”

      Figure 2 - how confident are the authors that each mouse differs from y=1? Related to this,  the behavior in Akiti is very noisy and changes across time. I am not sure if the authors fully  describe at what levels their model captures the behavior vs not in a detailed enough  fashion.

      We have performed a random permutation test on the minute-to-minute data. We have updated Figure 2 so that brave animals for which the test of y>1 passes the Benjamini–Hochberg procedure at level q=0.05 are represented with solid green dots, and animals that do not pass are represented with hollow dots. 8 out of 11 brave animals passed Benjamini–Hochberg.
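
      For readers unfamiliar with the correction, the Benjamini–Hochberg step-up rule can be sketched as follows. This is a generic illustration, not the released analysis code: the function name `benjamini_hochberg` is hypothetical, and the per-animal p-values are assumed to come from whatever permutation test was run upstream.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean array marking hypotheses rejected at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    if m == 0:
        return np.zeros(0, dtype=bool)
    order = np.argsort(p)                      # indices of ascending p-values
    thresholds = q * np.arange(1, m + 1) / m   # k * q / m for rank k = 1..m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k_max = np.nonzero(passed)[0].max()    # largest rank meeting its threshold
        reject[order[: k_max + 1]] = True      # step-up: reject every smaller rank too
    return reject
```

      Because the rule is step-up, a p-value that misses its own rank threshold is still rejected when a larger p-value at a higher rank meets its threshold.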

      Reviewer #3 (Recommendations for the authors):

      (1) I could not find information in the preprint about code availability. Please consider making  the code public to help others apply these modelling methods.

      We have released code and included the url in the paper in the Methods section.

      (2) Though the manuscript was generally clearly written, there were a number of places  where some additional information or clarification would be useful:

      a) Please define and explain the terms 'tail-behind' and 'tail-exposed' (used to describe  approach bout types) when first used.

      We have added definitions when we first mention these terms:

      “[...] 'tail-behind' (bouts where the animal's nose was closer to the object than the tail for the  entire bout) and 'tail-exposed' (bouts where the animal's tail is closer to the object than the  nose at some point during the bout), associated respectively with cautious risk-assessment  and engagement”

      b) At lines 57-58 when contrasting the 'model-free' account of Akiti et al with the 'model-based' account of the current work, it would be worth clarifying that these terms are  being used in the RL sense rather than e.g. a model-based analysis of the data.  

      We have updated the relevant lines to say “model-free/based reinforcement learning”.

      c) Line 61, the phrase 'the significant long-run approach of timid animals despite having  reached the "avoid" state' is unclear as the 'avoid' state has not been defined.

      We updated the terminology to “avoidance behavior” to be consistent with Akiti et al.  Avoidance refers to the animal routinely avoiding the object and therefore being unable to  learn whether it is safe.

      d) It was not completely clear to me how the coarse-graining of the behaviour was  implemented. Specifically, how were animals assigned to the brave, intermediate, or timid  group, and how were the parameters of the resulting behavioural phases fit?

      Sorry that this was not clear. Section 2.1 explains how the minute-to-minute behavioral data  was coarse-grained and how animal groups were assigned. We have added further  explanation of Figure 2 to the main text:

      “Fig 2 summarizes our categorization of the animals into the three groups: brave, intermediate, and timid based on the phases identified in the animal's exploratory trajectories. Timid animals spend no time in confident approach and are plotted in orange at the origin of Fig 2. Brave animals differ from intermediate animals in that their approach time during the first ten minutes of the confident phase is greater than during the last ten minutes (steady-state phase). Brave animals are plotted in green above and intermediate animals are plotted in black below the y=1 line in Fig 2.”
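
      Read literally, the grouping rule quoted above amounts to a two-step decision, sketched below. The function name, argument names, and the handling of an exact tie (treated here as intermediate) are assumptions made for illustration and are not taken from the paper or its released code.

```python
def classify_animal(confident_time, early_approach, late_approach):
    """Assign an animal to 'timid', 'brave', or 'intermediate'.

    confident_time: total time spent in confident approach.
    early_approach: approach time in the first ten minutes of the confident phase.
    late_approach:  approach time in the last ten minutes (steady-state phase).
    """
    if confident_time <= 0:
        return "timid"            # never enters confident approach
    if early_approach > late_approach:
        return "brave"            # early/late ratio above the y = 1 line
    return "intermediate"         # ratio at or below 1 (tie handling is assumed)
```

      Comparing the two approach times directly, rather than forming their ratio, avoids a division by zero when the steady-state approach time is zero.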

      We also added extra information to outline the goal and methodology of coarse-graining and animal grouping:

      “We sought to capture these qualitative differences (cautious versus confident) as well as aspects of the quantitative changes in bout durations and frequencies as the animal learns about their environment. To make this readily possible, we abstracted the data in two ways: averaging bout statistics over time, and clustering the animals into three groups with operationally distinct behaviors.”

      e) What purpose does the 'retreat' state serve in the BAMDP model (as opposed to  transitioning directly from 'object' to 'nest' states), and why do subjects not pass through it  following 'detect' states?

      Thank you for pointing this out. We have updated Figure 3 to note that the two “detected” states also point to the “retreat” state. The reviewer is correct that there could be alternative versions of the state diagram, and the ‘retreat’ state could indeed have been eliminated. However, we thought that it was helpful to structure the animal’s progress through state space.

      f) Why was the hazard function parameterised via the mean and SD at each time step rather  than with a parametric form of the mean and SD as a function of time?

      Since the agent can only spend 2, 3, or 4 turns at the object states, we didn’t see a need to  parameterize the mean and SD as a function of time. Doing so is a good solution to scaling  up the hazard function to more time-steps.

      (3) There were also a couple of points that could potentially be usefully touched on in the  discussion:

      a) What, if any, is the relationship between the CVaR objective and distributional RL? They  seem potentially related due to both focussing on quantiles of the outcome distribution.

      We have added a paragraph to the discussion discussing the connection between  distributional RL and CVaR:

      “CVaR is known to come in different flavors in the case of temporally-extended behavior. Gagne and Dayan (2021) introduce two alternative time-consistent formulations of CVaR: nested CVaR (nCVaR) and precommitted CVaR (pCVaR). nCVaR and pCVaR both enjoy Bellman equations which make it possible to compute approximately optimal policies without directly computing whole distributions of the outcomes. We use nCVaR in this study for its computational efficiency. There is, of course, great current interest in distributional reinforcement learning (Bellemare et al., 2023b) which does acquire such whole distributions, not least because of prominent observations linking non-linearities in the response functions of dopamine neurons to methods for learning distributions of outcomes (Dabney et al., 2020; Masset et al., 2023; Sousa et al., 2023). One functional motivation for considering entire outcome distributions is the possibility of using them to determine risk-sensitive policies (Gagne and Dayan, 2021).

      While it is possible to compute CVaR directly from return distributions, Gagne and Dayan  (2021) showed that this can lead to temporally inconsistent policies where the agent  deviates from its original plans (the authors called this the fixed CVaR or fCVaR measure).

      Rather further removed from our model-based methods is work from Antonov and Dayan  (2023), who consider a model-free exploration strategy which exploits full return distributions  to compute the value of perfect information which is used as a heuristic for trying actions  with uncertain consequences. Future works can examine risk-sensitive versions of Antonov  and Dayan (2023)'s computationally efficient model-free algorithm as one solution to the  burdensome computations in our model-based method.”

      b) Why normatively might subjects have non-neutral risk preference as captured by the CVaR?

      We also added a paragraph to the discussion discussing the advantage of heterogeneity in  risk sensitivity within a population:

      (Reviewer #1 had the same question, see above) “Our data show that there is substantial variation in the degrees of risk sensitivity across the mice. Previous works have reported substantial interpopulation and intrapopulation differences in risk-sensitivity in humans which depend on gender, age, socioeconomic status, personality characteristics, wealth and culture [...]”

      c) Relevance of the current modelling work to clinical conditions characterised by dysregulation of risk assessment (e.g. anxiety or PTSD).

      We’ve added a paragraph to the discussion:

      “Inter-individual differences in risk sensitivity are also of critical importance in psychiatry, reflected in a panoply of anxiety disorders (Butler and Mathews, 1983; Giorgetta et al., 2012; Maner et al., 2007; Charpentier et al., 2017), along with worry and rumination (Gagne and Dayan, 2022). Understanding the spectrum of extreme priors and extreme values of 𝛼 could have therapeutic implications, adding significance to the search for tasks that can more cleanly separate them.”

      d) Is it surprising to see differences in risk preference (nCVaR) between the familiar object  and novel object condition, given that risk preference might be conceptualised as a trait  rather than a state variable?

      Thank you for raising this point. You are right that we expected risk sensitivity (nCVaR alpha)  to be the same between FONC and UONC animals on average. It is difficult to know if alpha  is higher for FONC than UONC animals due to the non-identifiability between alpha and  hazard priors. We have added this discussion to the paper:

      “This is surprising if we interpret 𝛼 as a trait that is stable through time. Unfortunately, due to  the non-identifiability between 𝛼 and hazard priors, we cannot verify whether 𝛼 is actually  higher for FONC animals than UONC animals.”

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The study is methodologically solid and introduces a compelling regulatory model. However, several mechanistic aspects and interpretations require clarification or additional experimental support to strengthen the conclusions.

      Strengths:

      (1) The manuscript presents a compelling structural and biochemical analysis of human glutamine synthetase, offering novel insights into product-induced filamentation.

      (2) The combination of cryo-EM, mutational analysis, and molecular dynamics provides a multifaceted view of filament assembly and enzyme regulation.

      (3) The contrast between human and E. coli GS filamentation mechanisms highlights a potentially unique mode of metabolic feedback in higher organisms.

      Weaknesses:

      (1) The mechanism underlying spontaneous di-decamer formation in the absence of glutamine is insufficiently explored and lacks quantitative biophysical validation.

      (2) Claims of decamer-only behavior in mutants rely solely on negative-stain EM and are not supported by orthogonal solution-based methods.

      We thank the reviewer for the summary and noting of the strengths. We agree that the evolutionary divergence of metabolic feedback in GS homologs is a fruitful avenue for future studies. With regard to the weaknesses, the di-decamer in the absence of glutamine only forms under high (higher than physiological) concentrations of enzyme. Our primary evidence for the mutant behavior was the lack of crosslinking (Figure 1E), with supplementary support from the negative stain. In the revised version we will soften the language to say “reduced” rather than “did not support” filament formation.

      Reviewer #2 (Public review):

      The authors set out to resolve the high-resolution structure of a glutamine synthetase (GS) decamer using cryo-EM, investigate glutamine binding at the decamer interface, and validate structural observations through biochemical assays of ATP hydrolysis linked to enzyme activity. Their work sits at the intersection of structural and functional biology, aiming to bridge atomic-level details with biological mechanisms - a goal with clear relevance to researchers studying enzyme catalysis and metabolic regulation.

      Strengths and weaknesses of methods and results:

      A key strength of the study lies in its use of cryo-EM, a technique well-suited for resolving large, dynamic macromolecular complexes like the GS decamer. The reported resolutions (down to 2.15 Å) initially suggest the potential for detailed structural insights, such as side-chain interactions and ligand density. However, several methodological limitations significantly undermine the reliability of the results:

      (1) Cryo-EM data processing: The absence of critical details about B-factor sharpening - a standard step to enhance map interpretability - is a major concern. For high-resolution maps (<3 Å), sharpening is typically applied to resolve side-chain features, yet the submitted maps (e.g., those in Figures 1D, 2D, and supplementary figures) appear unprocessed, with density quality inconsistent with the claimed resolutions. This makes it difficult to evaluate whether observed features (e.g., glutamine binding) are genuine or artifacts of unsharpened data.

      (2) Modeling and density consistency: The structural models, particularly for glutamine binding at the decamer interface, do not align with the reported resolution. The maps shown in Figure 2D and Supplementary Figure S7 lack sufficient density to confidently place glutamine or even surrounding residues, conflicting with claims of 2.15 Å resolution. Additionally, fitting a non-symmetric ligand (glutamine) into a symmetry-refined map requires justification, as symmetry constraints may distort ligand placement.

      (3) Biochemical assay controls: While the enzyme activity assays aim to link structure to function, they lack essential controls (e.g., blank reactions without GS or substrates, substrate omission tests) to confirm that ATP hydrolysis is GS-dependent. The use of TCEP, a reducing agent, is also not paired with experiments to rule out unintended effects on the PK/LDH system, further limiting confidence in activity measurements.

      Achievement of aims and support for conclusions:

      The study falls short of convincingly achieving its goals. The claimed high-resolution structural details (e.g., side-chain densities, ligand binding) are not supported by the provided maps, which lack sharpening and show inconsistencies in density quality. Similarly, the biochemical data do not robustly validate the structural claims due to missing controls. As a result, the evidence is insufficient to confirm glutamine binding at the decamer interface or the functional relevance of the observed structural features.

      Likely impact and utility:

      If these methodological gaps are addressed, the work could make a meaningful contribution to the field. A well-resolved GS decamer structure would advance understanding of enzyme assembly and ligand recognition, while validated biochemical assays would strengthen the link between structure and function. Improved data processing and clearer reporting of validation steps would also make the structural data more reliable for the community, providing a resource for future studies on GS or related enzymes.

      We disagree with the reviewer’s overall assessment.

      With regard to sharpening and resolution: we examined sharpened maps and in a revised version will present additional supplementary figures showing these maps side by side. We note that the resolutions reported are global and that the most interesting features are, of course, in the periphery and subject to conformational and compositional heterogeneity. In the revision, we will include supplementary figures of core side-chain densities that are closer to what the reviewer expects.

      With regard to modeling: the apo filament and turnover filament datasets were handled nearly identically. The additional density is therefore unlikely to be an artefact of the symmetry operator; however, the lower resolution in this region noted by the reviewer is worthy of further exploration. The maps are public and we think this is the most plausible interpretation of the density, which we based primarily on the biochemical data, and we will include more speculation in the revised version.

      With regard to the biochemical controls: we point the reviewer to Figure S1, which shows that omission of ammonia or glutamate in the wild-type (tagless) system removes any coupling of the reactions. We will perform the additional controls to publication quality in the revised version along with the TCEP control. We note that the reducing agent is present across all experiments, ruling out an effect on any specific result. The inclusion of TCEP is also very standard in other published uses of the coupled ATPase assay (e.g. PMID: 31778111 and PMID: 32483380 by our first author).

      Additional context:

      Cryo-EM has transformed structural biology by enabling high-resolution analysis of large complexes, but its success hinges on rigorous data processing and validation steps that are critical to ensuring reproducibility. The challenges highlighted here are not unique to this study; they reflect broader issues in the field where incomplete reporting of methods can obscure the reliability of results. By addressing these points, the authors would not only strengthen their current work but also set a positive example for transparent and rigorous structural biology research.

      All the data is public and the reviewer or anyone is free to reinterpret the maps and models - and we encourage that rather than just an interpretation of our static figures. In addition, we will upload the raw micrograph data for the apo filament and turnover filament datasets to EMPIAR prior to submitting the revision.

      Reviewer #3 (Public review):

      In this manuscript, the authors propose a product-dependent negative-feedback mechanism of human glutamine synthetase, whereby the product glutamine facilitates filament formation, leading to reduced catalytic specificity for ammonia. Using time-resolved cryo-EM, the authors demonstrate filament formation under product-rich conditions. Multiple high-quality structures, including decameric and di-decameric assemblies, were resolved under different biochemical states and combined with MD simulations, revealing that the conformational space of the active site loop is critical for the GS catalysis. The study also includes extensive steady-state kinetic assays, supporting the view that glutamine regulates GS assembly and its catalytic activity. Overall, this is a detailed and comprehensive study. However, I would advise that a few points be addressed and clarified.

      (1) In Figure 2D and Supplementary Figure 7, the extra density observed between the two decamers does not appear to have the defining features of a glutamine. A less defined density may be expected given the nature of the complex, but even though mutagenesis assays were performed to support this assignment, none of these results constitutes direct and conclusive evidence for glutamine binding at this site. I would thus suggest showing the density maps at multiple contour thresholds to allow readers to also better evaluate the various small molecules under turnover conditions that cannot be well fitted based on this density map, helping to provide a more balanced interpretation of the results.

      (2) On the same point regarding the density for the enzyme under turnover conditions, more details should be provided about the symmetry expansion and classification performed, and also show the approximate ratio of reconstructions that include this density. Did you try symmetry expansion followed by focused classification, especially on the interface region?

      (3) The interface between the two decamers of the model needs to be double-checked and reassigned, especially for the residues surrounding the fitted glutamine. For example, the side chain of the Lys residue shown in the attached figure is most likely modeled incorrectly.

      We thank the reviewer for the feedback. As noted above, we will include supplemental figures that show maps at multiple thresholds and sharpening schemes. We noted in the manuscript and above that our interpretation here is based on integrating biochemical evidence alongside the density and will make that even more clear in the revised manuscript. The filaments +/- the putative glutamine density were processed nearly identically, but we will attempt various schemes of focused classification/symmetry expansion in the revision as well. However, we point out that there is extensive averaging there that makes modeling a bit trickier than expected given the global resolution.

    1. Praising students for merely meeting expectations may reduce student behavior over time as it “cheapens” your praise.

      This is something I agree with wholeheartedly. I think that is because I see this in my job: we have an "Employee of the Quarter" program, and it sounds wonderful on the surface, but the unfortunate reality is that every single person will eventually get this award even if they don't deserve it. This will cause employees to think "Oh I can get this extra special recognition and this award just for being here/doing below the bare minimum/doing the bare minimum."

    1. Group G Ben Braniff, Kim Maynard, Nick Devic, Maria Echeverri Solis, Sam Yalda

      1. Design has a major impact on the world and society. Even the little things can add up to a lot. Sustainability is a revolutionary idea that should be at the core of every design now.

      2. Society is another bottom line meaning all design inherently affects humans and/or is designed for humans. It's important to design for the extremes and the edge cases like people with disabilities.

      3. Corporations output a lot of waste. When they make small changes to be more sustainable, it results in big changes and saving a lot of material. Small changes can include anything from using 2% less plastic per water bottle to using wood buttons instead of plastic ones.

      4. A lot of people don't consider themselves disabled, but it's very common at some point in people's lives to have a certain level of impairment. It's important to keep this in mind when designing as you're designing for the general population--not just a specific individual.

      5. Addressing issues like world hunger may require rethinking the way we design food production. As they stated, for example, choosing kangaroo meat over beef is a more environmentally sustainable option.

      6. Thoughtful design choices, such as the video's example of adding white circles inside letters to reduce ink use, can improve efficiency and conserve resources.

      7. It is interesting how he opens up his discussion to slowly introduce that design isn't just about doing it for marketing or 'profit' as he pointed out. When watching this it helps a person realize that design is so much more powerful than that if you put it towards another cause. Design could end up being the solution to some of the biggest problems in society.

      8. A very important point he made was that improving accessibility is beneficial to many more people than just those who initially needed it, such as people with disabilities. From this I think a good takeaway is that design should always be considerate of any disabilities/needs that the audience might have, because sometimes that design is just better for everyone in general.

      9. My first take is design should go beyond money and aesthetics. By thinking about sustainability and accessibility the designers can create solutions that are socially responsible and environmentally friendly.

      10. My second take is that when you design with people with disabilities, you end up with solutions that are more usable and inclusive.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work aims to elucidate the molecular mechanisms affected in hypoxic conditions, causing reduced cortical interneuron migration. They use human assembloids as a migratory assay of subpallial interneurons into cortical organoids and show substantially reduced migration upon 24 hours of hypoxia. Bulk and scRNA-seq show adrenomedullin (ADM) up-regulation, as well as its receptor RAMP2, confirmed at the protein level. Adding ADM to the culture medium after hypoxic conditions rescues the migration deficits, even though the subtype of interneurons affected is not examined. However, the authors demonstrate very clearly that ineffective ADM does not rescue the phenotype, and blocking RAMP2 also interferes with the rescue. The authors are also applauded for using 4 different cell lines and using human fetal cortex slices as an independent method to explore the DLXi1/2GFP-labelled iPSC-derived interneuron migration in this substrate with and without ADM addition (after confirming that also in this system ADM is up-regulated). Finally, the authors demonstrate PKA-CREB signalling mediating the effect of ADM addition, which also leads to up-regulation of GABA receptors. Taken together, this is a very carefully done study on an important subject - how hypoxia affects cortical interneuron migration. In my view, the study is of great interest.

      Strengths:

      The strengths of the study are the novelty and the thorough work using several culture methods and 4 independent lines.

      Weaknesses:

      The main weakness is that other genes regulated upon hypoxia are not confirmed, such that readers will not know up to which fold-change/statistical cut-off the data are reliable.

      Reviewer #2 (Public review):

      Summary

      The manuscript by Puno and colleagues investigates the impact of hypoxia on cortical interneuron migration and downstream signaling pathways. They establish two models to test hypoxia: cortical forebrain assembloids and primary human fetal brain tissue. Both of these models provide a robust assay for interneuron migration. In addition, they find that ADM signaling mediates the migration deficits and demonstrate rescue using exogenous ADM.

      Strengths:

      The findings are novel and very interesting to the neurodevelopmental field, revealing new insights into how cortical interneurons migrate as well as establishing exciting models for future studies. The authors use sufficient iPSC lines including both XX and XY, so the analysis is robust. In addition, the RNAseq data with re-oxygenation is a nice control to see what genes are changed specifically due to hypoxia. Further, the overall level of validation of the sequencing data and involvement of ADM signaling is convincing, including the validation of ADM at the protein level. Overall, this is a very nice manuscript.

      Weaknesses:

      I have a few comments and suggestions for the authors. See below.

      Reviewer #3 (Public review):

      Summary:

      The authors aimed to test whether hypoxia disrupts the migration of human cortical interneurons, a process long suspected to underlie brain injury in preterm infants but previously inaccessible for direct study. Using human forebrain assembloids and ex vivo developing brain tissue, they visualized and quantified interneuron migration under hypoxic conditions, identified molecular components of the response, and explored the effect of pharmacological intervention (specifically ADM) on restoring the migration deficits.

      Strengths:

      The major strength of this study lies in its use of human forebrain assembloids and ex vivo prenatal brain tissue, which provide a direct system to study interneuron migration under hypoxic conditions. The authors combine multiple approaches: long-term live imaging to directly visualize interneuron migration, bulk and single-cell transcriptomics to identify hypoxia-induced molecular responses, pharmacological rescue experiments with ADM to establish therapeutic potential, and mechanistic assays implicating the cAMP/PKA/pCREB pathway and GABA receptor expression in mediating the effect. Together, this rigorous and multifaceted strategy convincingly demonstrates that hypoxia disrupts interneuron migration and that ADM can restore this defect through defined molecular mechanisms.

      Overall, the authors achieve their stated aims, and the results strongly support their conclusions. The work has a significant impact by providing the first direct evidence of hypoxia-induced interneuron migration deficits in the human context, while also nominating a candidate therapeutic avenue. Beyond the specific findings, the methodological platform - particularly the combination of assembloids and live imaging - will be broadly useful to the community for probing neurodevelopmental processes in health and disease.

      Weaknesses:

      The main weakness of the study lies in the extent to which forebrain assembloids recapitulate in vivo conditions, as the migration of interneurons from hSO to hCO does not fully reflect the native environment or migratory context of these cells. Nevertheless, this limitation is tempered by the fact that the work provides the first direct observation of human interneuron migration under hypoxia, representing a major advance for the field. In addition, while the transcriptomic analyses are valuable and highlight promising candidates, more in-depth exploration will be needed to fully elucidate the molecular mechanisms governing neuronal migration and maturation under hypoxic conditions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should examine if all cortical interneurons are affected by ADM or only subtypes (Parvalbumin/Somatostatin).

      We thank the reviewer for raising this important question. In our study, we utilized the Dlx1/2b::eGFP reporter to broadly label cortical interneurons; however, this system does not distinguish specific interneuron subtypes. To address this, in the revised version of the manuscript we will use the single-cell RNA sequencing data and immunostainings to provide this information. Based on previous analyses from Birey et al. (Cell Stem Cell, 2022), we expect interneurons within assembloids to express mostly calbindin (CALB2) and somatostatin (SST) at this in vitro stage of development; the parvalbumin subtype appears later based on data from Birey et al. (Nature, 2017) and more recently from Varela et al. (bioRxiv, 2025).

      In parallel, we will analyze available scRNA-seq data from developing human primary brain tissue of a similar age to the tissue used in the manuscript, and check whether these subtypes of interneurons are similar to the ones within assembloids.

      (2) The authors should test more candidates from their bulk RNA-seq data with different fold changes for regulation after hypoxia, to allow the reader to judge at which cut-off the DEGs may be reproducible. This would make this database much more valuable for the field of hypoxia research.

      We appreciate the reviewer’s thoughtful suggestion. In addition to the bulk RNA-seq analysis, we did validate several upregulated hypoxia-responsive genes with varying fold changes by qPCR; these include PDK1, PFKP, and VEGFA (Figure S1).

      We do agree that in-depth investigation of specific cut-offs would be interesting; however, this could be the focus of a different manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) Can the authors comment on the possibility of inflammatory response pathways being activated by hypoxia? Has this been shown before? While not the focus of the manuscript, it could be discussed in the Discussion as an interesting finding and potential involvement of other cells in the Hypoxic response.

      We thank the reviewer for this important comment about inflammation. Indeed, hypoxia has been shown to activate the inflammatory response pathways. In various studies, it was found that HIF-1α can interact with NF-κB signaling, leading to the upregulation of pro-inflammatory cytokines such as IL-1β, IL-6, and TNF-α (Rius et al., Cell, 2008; Hagberg et al., Nat Rev Neurol, 2015).

      In our transcriptomics data (Figure 2D), and to the reviewer’s point, we identified enrichment of inflammatory signaling response following the hypoxic exposure. Since hSOs at the time of analysis do contain astrocytes, we think these glia contribute to the observed pro-inflammatory changes. Based on these results and because ADM is known to have strong anti-inflammatory properties, the effects of ADM on hypoxic astrocytes should be investigated in future studies focused on hypoxia-induced inflammation. In the revision, we will address this comment in the discussion section and cite the appropriate papers.

      (2) Could the authors comment on the mechanism at play here with respect to ADM and binding to RAMP2 receptors - is this a potential autocrine loop, or is the source of ADM from other cell types besides inhibitory neurons? Given the scRNA-seq data, what cell-to-cell mechanisms can be at play? Since different cells express ADM, there could be different mechanisms in place in ventral vs dorsal areas.

      Based on our scRNA-seq data in hSOs showing significant upregulation of ADM expression in astrocytes and progenitors, we speculate that the primary mechanism is likely to involve paracrine interactions. However, we cannot exclude autocrine mechanisms with the included experiments. Dissecting these interactions in a cell-type specific manner could be an important focus for future ADM-related studies.

      To address the question about the possible different mechanisms in ventral versus dorsal areas, in the revision we will plot and include in the figures the data about the cell-type expression of ADM and its receptors in hCOs.

      (3) For data from Figure 6 - while the ELISA assays are informative to determine which pathways (PKA, AKT, ERK) are active, there is no positive control to indicate these assays are "working" - therefore, if possible, western blot analysis from assembloid tissue could be used (perhaps using the same lysates from Figure 3) as an alternative to validate changes at the protein level (however, this might prove difficult); further to this, is P-CREB activated at the protein level using WB?

      We thank the reviewer for this comment and the observation. Although we did not include a traditional positive control in these ELISA assays, several lines of evidence indicate that the measurements are reliable. First, the standard curves behaved as expected, and all sample values fell within the assay’s dynamic range. Second, technical replicates showed low variability, and the observed changes across experimental conditions (e.g., hypoxia vs. control) were consistent with the expected biological responses based on previous literature. We agree that including western blot validation would strengthen the findings, and we will note this for our future studies focused on CREB and ADM.

      (4) Could the authors comment further on the mechanism and what biological pathways and potential events are downstream of ADM binding to RAMP2 in inhibitory neurons? What functional impact would this have linked to the CREB pathway proposed? While the link to GABA receptors is proposed, CREB has many targets beyond this.

      We appreciate the reviewer’s insightful question. Currently, not much is known about the molecular pathways and downstream cellular events triggered by ADM binding to RAMP2 in inhibitory neurons, or in brain cells in general. Our study provides the first data on the cell-type specific expression of ADM in baseline and hypoxic conditions, which is one of its key novelties.

      While the signaling landscape of ADM in interneurons is largely unexplored, several studies in other (non-brain) cell types have demonstrated that ADM binding to RAMP2 can activate downstream cascades such as the cAMP/PKA/CREB pathway, PI3K/AKT, and ERK/MAPK, all of which are also known to be critical regulators of neuronal development and survival. These previously published data, along with our CREB-targeted findings in hypoxic interneurons, suggest that ADM–RAMP2 signaling could influence multiple aspects of interneuron biology, but these remain to be evaluated in future studies.

      We agree with the reviewer that CREB has a wide range of transcriptional targets. We decided to focus on GABA as a target of CREB for two main reasons: (i) GABA signaling has been previously shown to play an important role in the migration of cortical interneurons, and (ii) a previous study by Birey et al. (Cell Stem Cell, 2022) demonstrated that CREB pathway activity is essential for regulating interneuron migration in assembloid models of Timothy syndrome, thus providing further evidence that dysregulation of CREB activity disrupts migration dynamics.

      While our study provides a first step toward uncovering the mechanisms of interneuron migration protection by ADM, we fully acknowledge that future work will be needed to delineate the full spectrum of ADM–RAMP2 downstream signaling events in inhibitory neurons and other brain cells.

      (5) Does hypoxia cause any changes to inhibitory neurogenesis (earlier stages than migration)? This might already be known, but was not discussed.

      We appreciate this question from the reviewer; however, this was not something that we focused on in this manuscript due to the already large amount of data included. A separate study focusing on neurogenesis defects and the molecular mechanisms of injury for that specific developmental process would be an important next step.

      (6) In the Discussion section, it might be worth detailing to the readers what the functional impact of delayed/reduced migration of inhibitory neurons into the cortex might result in, in terms of functional consequences for neural circuit development.

      We thank the Reviewer for the suggestion of detailing the functional impact of reduced inhibitory neuron migration. We will revise the manuscript by incorporating a paragraph about this in the Discussion section.

      Reviewer #3 (Recommendations for the authors):

      Most of the evidence presented is convincing in supporting the conclusions, and I have only minor suggestions for improvement:

      (1) The bulk RNA-seq was performed in hSOs only, which may not fully capture the phenotypes of migrating or migrated interneurons. It would be valuable, if feasible, to sort migrated cells from hSO-hCO assembloids and specifically examine their molecular mediators.

      We thank the reviewer for this suggestion. While it is likely that the cellular environment will have some influence on a subset of the molecular changes, based on all the data from the manuscript and our specific target, the RNA-sequencing on hCOs was sufficient to capture essential changes like ADM upregulation. The in-depth exploration of differential responses of migrated versus non-migrated interneurons to hypoxia could be the focus of a different project.

      (2) In Figure 3, it is striking that cell-type heterogeneity dominates over hypoxia vs. control conditions. A joint embedding of hSO and hCO cells could provide further insight into molecular differences between migrated and non-migrated interneurons.

      We thank the reviewer for this observation and opportunity to clarify. Since we manually separated the assembloids before the analyses, we processed these samples separately. That is why they separate like this. In the revision, we will add data about ADM expression and its receptors’ expression in the hCOs.

      (3) It would be helpful to expand the discussion on how closely the migration observed in hSO-hCO assembloids reflects in vivo conditions, and what environmental aspects are absent from this model. This would better frame the interpretation and translational relevance of the findings.

      We thank the Reviewer for bringing up this important point. Although the assembloid model offers the unique advantage of allowing the direct investigation of migration patterns of hypoxic interneurons, we agree it does not fully recapitulate the in vivo environment. While there are multiple aspects that cannot be recapitulated in vitro at this time (e.g. cellular complexity, vasculature, immune response, etc.), we are encouraged by the validation of our main findings in ex vivo developing human brain tissue, which strongly supports the validity of our findings for in vivo conditions.

      We will expand our discussion to include more details and the need to validate these findings using in vivo models, while also acknowledging that different species (e.g. rodents versus non-human primates versus humans) might have different responses to hypoxia.

      (4) The authors suggest that hypoxia is also associated with delayed interneuron maturation, yet the bulk RNA-seq data primarily reveal stress and hypoxia-related genes. A more detailed discussion of why genes linked to interneuron maturation and function were not strongly affected would clarify this point.

      We thank the Reviewer for the opportunity to clarify.

      The RNA-seq was performed during the acute stages of hypoxia/reoxygenation, and we think a maturation phenotype might be difficult to capture at this point and would require analysis at later in vitro assembloid maturation stages.

      Our speculation about a possible maturation defect is based on previous developmental biology studies showing that failure of interneurons to reach their final cortical location within a specified developmental window impairs their integration within the neuronal network, and thus leads to maturation defects and possible elimination by apoptosis.

      Since preterm infants suffer from countless hypoxic events over multiple months, we suggest these repetitive events are likely to induce cumulative delays in migration, inability of interneurons to reach their target in time, followed by abnormal integration within the excitatory network, and eventual elimination of some of these interneurons through apoptosis. However, the direct demonstration of this effect following a hypoxic insult would require prolonged in vivo experiments in rodents to follow the migration, network integration and apoptosis of interneurons; to our knowledge this experimental design is not technically feasible at this time.

      (5) Relatedly, while the focus on interneuron migration is well justified, acknowledging how hypoxia might also impact other aspects of cortical development (e.g., progenitor proliferation, neuronal maturation, or circuit integration) would place the findings in a broader developmental framework and strengthen their relevance.

      We appreciate the Reviewer’s suggestion to discuss the role of hypoxia on other processes during cortical development. In the revised manuscript, we will include citations about the effects of hypoxia on interneuron proliferation, maturation and circuit integration as available, and also expand to other cell types known to be affected.

      (6) Very minor: in Figure S3C and D, it was not stated what the colors mean (grey: control, yellow: hypoxia)

      Thank you for pointing out this error; we will correct it in our revision.

    1. This manuscript examines preprint review services and their role in the scholarly communications ecosystem.  It seems quite thorough to me. In Table 1 they list many peer-review services that I was unaware of e.g. SciRate and Sinai Immunology Review Project.

      To help elicit critical & confirmatory responses for this peer review report I am trialling Elsevier’s suggested “structured peer review” core questions, and treating this manuscript as a research article.

      Introduction

      1. Is the background and literature section up to date and appropriate for the topic?

        Yes.

      2. Are the primary (and secondary) objectives clearly stated at the end of the introduction?

        No. Instead the authors have chosen to put the two research questions on page 6 in the methods section. I wonder if they ought to be moved into the introduction – the research questions are not methods in themselves. Might it be better to state the research questions first and then detail the methods one uses to address those questions afterwards? [as Elsevier’s structured template seems implicitly to prefer.]

      Methods

      1. Are the study methods (including theory/applicability/modelling) reported in sufficient detail to allow for their replicability or reproducibility?

        I note with approval that the version number of the software they used (ATLAS.ti) was given.

        I note with approval that the underlying data is publicly archived under CC BY at figshare.

        The Atlas.ti report data spreadsheet could do with some small improvement – the column headers are a little cryptic e.g. “Nº  ST “ and “ST” which I eventually deduced was Number of Schools of Thought and Schools of Thought (?)

        Is there a rawer form of the data that could be deposited with which to evidence the work done? The Atlas.ti report spreadsheet seemed like it was downstream output data from Atlas.ti. What was the rawer input data entered into Atlas.ti? Can this be archived somewhere in case researchers want to reanalyse it using other tools and methods?

        I note with disapproval that Atlas.ti is proprietary software which may hinder the reproducibility of this work. Nonetheless I acknowledge that Atlas.ti usage is somewhat ‘accepted’ in social sciences despite this issue.

        I think the qualitative text analysis is a little vague and/or under-described: “Using ATLAS.ti Windows (version 23.0.8.0), we carried out a qualitative analysis of text from the relevant sites, assigning codes covering what they do and why they have chosen to do it that way.” That’s not enough detail. Perhaps an example or two could be given? Was inter-rater reliability performed when ‘assigning codes’ ? How do we know the ‘codes’ were assigned accurately?

      2. Are statistical analyses, controls, sampling mechanism, and statistical reporting (e.g., P-values, CIs, effect sizes) appropriate and well described?

        This is a descriptive study (and that’s fine) so there aren’t really any statistics on show here other than simple ‘counts’ (of Schools of Thought) in this manuscript. There are probably some statistical processes going on within the proprietary qualitative analysis of text done in ATLAS.ti but it is under-described and so hard for me to evaluate.
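
      The inter-rater reliability question raised under Methods item 1 can be made concrete with an agreement statistic such as Cohen's kappa. The following is only a minimal sketch, assuming two hypothetical coders who each independently assigned exactly one code per service; the function name, the labels, and the data are invented for illustration and are not taken from the manuscript or from ATLAS.ti.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items once each."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: expected overlap given each coder's label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes assigned to six services by two coders (illustrative only).
coder_a = ["quality", "equity", "quality", "efficiency", "equity", "quality"]
coder_b = ["quality", "equity", "efficiency", "efficiency", "equity", "quality"]
print(round(cohens_kappa(coder_a, coder_b), 3))
```

      A value near 1 indicates agreement well beyond chance, and a value near 0 indicates agreement no better than chance. If services can carry several codes at once, a different statistic would be needed.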

      Results

      1. Is the results presentation, including the number of tables and figures, appropriate to best present the study findings?

        Yes. However, I think a canonical URL to each service should be given.  A URL is very useful for disambiguation, to confirm e.g. that the authors mean this Hypothesis (www.hypothes.is) and NOT this Hypothesis (www.hyp.io). I know exactly which Hypothesis is the one the authors are referring to but we cannot assume all readers are experts 😊

        Optional suggestion: I wonder if the authors couldn’t present the table data in a slightly more visual and/or compact way? It’s not very visually appealing in its current state. Purely as an optional suggestion, to make the table more compact one could recode the answers given in one or more of the columns 2, 3 and 4 in the table e.g. "all disciplines = ⬤ , biomedical and life sciences = ▲, social sciences =  ‡  , engineering and technology = † ". I note this would give more space in the table to print the URLs for each service that both reviewers have requested.

        | Service name | Developed by | Scientific disciplines | Types of outputs |
        |---|---|---|---|
        | Episciences | Other | ⬤ | blah blah blah. |
        | Faculty Opinions | Individual researcher | ▲ | blah blah blah. |
        | Red Team Market | Individual researcher | ‡ | blah blah blah. |

        The "Types of outputs" column might even lend itself to mini-colour-pictograms (?) which could be more concise and more visually appealing? A table just of text might be scientifically 'correct' but it is incredibly dull for readers, in my opinion.

      2. Are additional sub-analyses or statistical measures needed (e.g., reporting of CIs, effect sizes, sensitivity analyses)?

        No / Not applicable. 

      Discussion

      1. Is the interpretation of results and study conclusions supported by the data and the study design?

        Yes.

      2. Have the authors clearly emphasized the limitations of their study/theory/methods/argument?

        No. Perhaps a discussion of the linguistic/comprehension bias of the authors might be appropriate for this manuscript. What if there are ‘local’ or regional Chinese, Japanese, Indonesian or Arabic language preprint review services out there? Would this authorship team really be able to find them?

      Additional points:

      • Perhaps the points made in this manuscript about financial sustainability (p24) are a little too pessimistic. I get it, there is merit to this argument, but there is also some significant investment going on there if you know where to look. Perhaps it might be worth citing some recent investments e.g. Gates -> PREreview (2024) https://content.prereview.org/prereview-welcomes-funding/ and Arcadia’s $4 million USD to COAR for the Notify Project which supports a range of preprint review communities including Peer Community In, Episciences, PREreview and Harvard Library (source: https://coar-repositories.org/news-updates/coar-welcomes-significant-funding-for-the-notify-project/).

      • Although I note they are mentioned, I think more needs to be written about the similarity and overlap between ‘overlay journals’ and preprint review services. Are these arguably not just two different terms for kinda the same thing? If you have Peer Community In, which has its overlay component in the form of the Peer Community Journal, why not mention other overlay journals like Discrete Analysis and The Open Journal of Astrophysics? I think Peer Community In (& its PCJ) is the go-to example of the thin-ness of the line that separates (or doesn’t!) overlay journals and preprint review services. Some more exposition on this would be useful.

    2. Thank you very much for the opportunity to review the preprint titled “Preprint review services: Disrupting the scholarly communication landscape?” (https://doi.org/10.31235/osf.io/8c6xm) The authors review services that facilitate peer review of preprints, primarily in the STEM (science, technology, engineering, and math) disciplines. They examine how these services operate and their role within the scholarly publishing ecosystem. Additionally, the authors discuss the potential benefits of these preprint peer review services, placing them in the context of tensions in the broader peer review reform movement. The discussions are organized according to four “schools of thought” in peer review reform, as outlined by Waltman et al. (2023), which provides a useful framework for analyzing the services. In terms of methodology, I believe the authors were thorough in their search for preprint review services, especially given that a systematic search might be impractical.

      As I see it, the adoption of preprints and reforming peer review are key components of the move towards improving scholarly communication and open research. This article is a useful step along that journey, taking stock of current progress, with a discussion that illuminates possible paths forward. It is also well-structured and easy for me to follow. I believe it is a valuable contribution to the metaresearch literature.

      On a high level, I believe the authors have made a reasonable case that preprint review services might make peer review more transparent and rewarding for all involved. Looking forward, I would like to see metaresearch which gathers further evidence that these benefits are truly being realised.

      In this review, I will present some general points which merit further discussion or clarification to aid an uninitiated reader. Additionally, I raise one issue regarding how the authors framed the article and categorised preprint review services and the disciplines they serve. In my view, this problem does not fundamentally undermine the robust search, analyses, and discussion in this paper, but it risks putting off some researchers and constrains how broadly one should derive conclusions.

      General comments

      Some metaresearchers may be aware of preprints, but not all readers will be familiar with them. I suggest briefly defining what they are, how they work, and which types of research have benefited from preprints, similar to how “preprint review service” is clearly defined in the introduction.

      Regarding Waltman et al.’s (2023) “Equity & Inclusion” school of thought, does it specifically aim for “balanced” representation by different groups as stated in this article? There is an important difference between “balanced” versus “equitable” representation, and I would like to see it addressed in this text.

      Another analysis I would like to see is whether any of the 23 services reviewed present any evidence that their approach has improved research quality. For instance, the discussion on peer review efficiency and incentives states that there is currently “no hard evidence” that journals want to utilise reviews by Rapid Reviews: COVID-19, and that “not all journals are receptive” to partnerships. Are journals skeptical of whether preprint review services could improve research quality? Or might another dynamic be at work?

      The authors cite Nguyen et al. (2015) and Okuzaki et al. (2019), stating that peer review is often “overloaded”. I would like to see a clearer explanation of what “overloaded” means in this context so that a reader does not have to read the two cited papers.

      To the best of my understanding, one of the major sticking points in peer review reform is whether to anonymise reviewers and/or authors. Consequently, I appreciate the comprehensive discussion about this issue by the authors.

      However, I am only partially convinced by the statement that double anonymity is “essentially incompatible” with preprint review. For example, there may be, as yet not fully explored, ways to publish anonymous preprints with (a) a notice that it has been submitted to, or is undergoing, peer review; and (b) that the authors will be revealed once peer review has been performed (e.g. at least one review has been published). This would avoid the issue of publishing only after review is concluded as is the case for Hypothesis and Peer Community In.

      Additionally, the authors describe 13 services which aim to “balance transparency and protect reviewers’ interests”. This is a laudable goal, but I am concerned that framing this as a “balance” implies a binary choice, and that to have more of one, we must lose an equal amount of the other. Thinking only in terms of “balance” prevents creative, win-win solutions. Could a case be made for non-anonymity to be complemented by a reputation system for authors and reviewers? For example, major misconduct (e.g. retribution against a critical review) would be recorded in that system and dissuade bad actors. Something similar can already be seen in the reviewer evaluation system of CrowdPeer, which could plausibly be extended or modified to highlight misconduct.

      I also note that misconduct and abusive behaviour already occur even in fully or partially anonymised peer review, and they are not limited to the review of preprints. While I am not aware of existing literature on this topic, academics’ fears seem reasonable. For example, there are at least anecdotal testimonies that a reviewer would deliberately reject a paper to retard the progress of a rival research group, while taking the ideas of that paper and beating their competitors to winning a grant. Or, a junior researcher might refrain from giving a negative review out of fear that the senior researcher whose work they are reviewing might retaliate. These fears, real or not, seem to play a part in the debates about if and how peer review should (or should not) be anonymised. I would like to see an exploration of whether de-anonymisation will improve or worsen this behaviour and in what contexts. And if such studies exist, it would be good to discuss them in this paper.

      I found it interesting that almost all preprint review services claim to be complementary to, and not compete with, traditional journal-based peer review. The methodology described in this article cannot definitively explain what is going on, but I suspect there may be a connection between this aversion to compete with traditional journals, and (a) the skepticism of journals towards partnering with preprint review services and (b) the dearth of publisher-run options. I hypothesise that there is a power dynamic at play, where traditional publishers have a vested interest in maintaining the power they hold over scholarly communication, and that preprint review services stress their complementarity (instead of competitiveness) as a survival mechanism. This may be an avenue for further metaresearch.

      To understand preprints from which fields of research are actually present on the services categorised under “all disciplines,” I used the Random Integer Set Generator by the Random.org true random number service (https://www.random.org/integer-sets/) to select five services for closer examination: Hypothesis, Peeriodicals, PubPeer, Qeios, and Researchers One. Of those, I observed that Hypothesis is an open source web annotation service that allows commenting on and discussion of any web page on the Internet regardless of whether it is research or preprints. Hypothesis has a sub-project named TRiP (Transparent Review in Preprints), which is their preprint review service in collaboration with Cold Spring Harbor Laboratory. It is unclear to me why the authors listed Hypothesis as the service name in Table 1 (and elsewhere) instead of TRiP (or other similar sub-projects). In addition, Hypothesis seems to be framed as a generic web annotation service that is used by some as a preprint review tool. This seems fundamentally different from others who are explicitly set up as preprint review services. This difference seems noteworthy to me.

      To aid readers, I also suggest including hyperlinks to the 23 services reviewed in this paper. My comments on disciplinary representation in these services are elaborated further below.

      One minor point of curiosity is that several services use an “automated tool” to select reviewers. It would be helpful to describe in this paper exactly what those tools are and how they work, or report situations where services do not explain it.

      Lastly, what did the authors mean by “software heritage” in section 6? Are they referring to the organisation named Software Heritage (https://www.softwareheritage.org/) or something else? It is not clear to me how preprint reviews would be deposited in this context.

      Respecting disciplinary and epistemic diversity

      In the abstract and elsewhere in the article, the authors acknowledge that preprints are gaining momentum “in some fields” as a way to share “scientific” findings. After reading this article, I agree that preprint review services may disrupt publishing for research communities where preprints are in the process of being adopted or already normalised. However, I am less convinced that such disruption is occurring, or could occur, for scholarly publishing more generally.

      I am particularly concerned about the casual conflation of “research” and “scientific research” in this article. Right from the start, it mentions how preprints allow sharing “new scientific findings” in the abstract, stating they “make scientific work available rapidly.” It also notes that preprints enable “scientific work to be accessed in a timely way not only by scientists, but also…” This framing implies that all “scholarly communication,” as mentioned in the title, is synonymous with “scientific communication.” Such language excludes researchers who do not typically identify their work as “scientific” research. Another example of this conflation appears in the caption for Figure 1, which outlines potential benefits of preprint review services. Here, “users” are defined as “scientists, policymakers, journalists, and citizens in general.” But what about researchers and scholars who do not see themselves as “scientists”?

      Similarly, the authors describe the 23 preprint review services using six categories, one of which is “scientific discipline”. One of those disciplines is called “humanities” in the text, and Table 1 lists it as a discipline for Science Open Reviewed. Do the authors consider “humanities” to be a “scientific” discipline? If so, I think that needs to be justified with very strong evidence.

      Additionally, Waltman et al.’s four schools of thought for peer review reform work well with the 23 services analysed. However, at least three out of the four are explicitly described as improving “scientific” research.

      Related to the above is how the five “scientific disciplines” are described as the “usual organisation” of the scholarly communication landscape. On what basis should they be considered “usual”? In this formulation, research in literature, history, music, philosophy, and many other subjects would all be lumped together into the “humanities”, which sit at the same hierarchical level as “biomedical and life sciences”, arguably a much more specific discipline. My point is not to argue for a specific organisation of research disciplines, but to highlight a key epistemic assumption underlying the whole paper that comes across as very STEM-centric (science, technology, engineering, and math).

      How might this part of the methodology affect the categories presented in Table 1? “Biomedical and life sciences” appear to be overrepresented compared to other “disciplines”. I’d like to see a discussion that examines this pattern, and considers why preprint review services (or maybe even preprints more generally) appear to cover mostly the biomedical or physical sciences.

      In addition, there are 12 services described as serving “all disciplines”. I believe this paper can be improved by at least a qualitative assessment of the diversity of disciplines actually represented on those services. Because it is reported that many of these services stress improving the “reproducibility” of research, I suspect most of them serve disciplines which rely on experimental science.

      I randomly selected five services for closer examination, as mentioned above. Of those, only Qeios has demonstrated an attempt to at least split “arts and humanities” into subfields. The others either don’t have such categories altogether, or have a clear focus on a few disciplines (e.g. life sciences for Hypothesis/TRiP). In all cases I studied, there is a heavy focus on STEM subjects, especially biology or medical research. However, they are all categorised by the authors as serving “all disciplines”.

      If preprint review services originate from, or mostly serve, a narrow range of STEM disciplines (especially experiment-based ones), it would be worth examining why that is the case, and whether preprints and reviews of them could (or could not) serve other disciplines and epistemologies.

      It is postulated that preprint review services might “disrupt the scholarly communication landscape in a more radical way”. Considering the problematic language I observed, what about fields of research where peer-reviewed journal publications are not the primary form of communication? Would preprint review services disrupt their scholarly communications?

      To be clear, my concern is not just the conflation of language in a linguistic sense but rather inequitable epistemic power. I worry that this conflation would (a) exclude, minoritise, and alienate researchers of diverse disciplines from engaging with metaresearch; and (b) blind us to a clear pattern in these 23 services, namely their strong focus on the life sciences and medical research, and to a discussion of why that might be the case. Critically, what message are we sending to, for example, a researcher of 18th century French poetry with the language and framing of this paper? I believe the way “disciplines” are currently presented here poses a real risk of devaluing and minoritising certain subject areas and ways of knowing. In its current form, I believe that while this paper is a very valuable contribution, one should not derive from it any conclusions which apply to scholarly publishing as a whole.

      The authors have demonstrated inclusive language elsewhere. For example, they have consciously avoided “peer” when discussing preprint review services, clearly contrasting them to “journal-based peer review”. Therefore, I respectfully suggest that similar sensitivity be adopted to avoid treating “scientific research” and “research” as the same thing. A discussion, or reference to existing works, on the disciplinary skew of preprints (and reviews of them) would also add to the intellectual rigour of this already excellent piece.

      Overall, I believe this paper is a valuable reflection on the state of preprints and services which review them. Addressing the points I raised, especially the use of more inclusive language with regards to disciplinary diversity, would further elevate its usefulness in the metaresearch discourse. Thank you again for the chance to review.

      Signed:

      Dr Pen-Yuan Hsing (ORCID ID: 0000-0002-5394-879X)

      University of Bristol, United Kingdom

      Data availability

      I have checked the associated dataset, but still suggest including hyperlinks to the 23 services analysed in the main text of this paper.

    1. In "Researchers Are Willing to Trade Their Results for Journal Prestige: Results from a Discrete Choice Experiment", the authors investigate researchers’ publication preferences using a discrete choice experiment in a cross-sectional survey of international health and medical researchers. The study examines how researchers negotiate trade-offs among factors such as journal impact factor, review helpfulness, formatting requirements, and usefulness for promotion when deciding where to publish. The research is timely; as the authors point out, reform of research assessment is currently a very active topic. The design and methods of the study are suitable and robust. The use of focus groups and interviews in developing the attributes for study shows care in the design. The survey instrument itself is generally very well-designed, with important tests of survey fatigue, understanding (dominant choice task) and respondent choice consistency (repeat choice task) included. Respondent performance was good or excellent across all these checks. Analysis methods (pMMNL and latent class analysis) are well-suited to the task. Pre-registration and sharing of data and code show commitment to transparency. Limitations are generally well-described.

      In the below, I give suggestions for clarification/improvement. Except for some clarifications on limitations and one narrower point (reporting of qualitative data analysis methods), my suggestions are only that – the preprint could otherwise stand, as is, as a very robust and interesting piece of scientific work.

      1. Respondents come from a broad range of countries (63), with 47 of those countries represented by fewer than 10 respondents. Institutional cultures of evaluation can differ greatly across nations. And we can expect variability in exposure to the messages of DORA (seen, for example, in level of permeation of DORA as measured by signatories in each country, https://sfdora.org/signers/). In addition, some contexts may mandate or incentivise publication in some venues using measures including IF, but also by requiring journals to be in certain databases like WoS or Scopus, or by having preferred journal lists. I would suggest that the authors include in the Sampling section a rationale for taking this international approach, including any potentially confounding factors it may introduce, and then add the latter to the limitations as well.

      2. Reporting of qualitative results: In the introduction and methods, the role of the focus groups and interviews seems to have been just to inform the design of the experiment. But results from that qualitative work then appear as direct quotes within the discussion to contextualise or explain results. In this sense, the qualitative results are being used as new data. Given this, I feel that the methods section should include a description of the methods and tools used for qualitative data analysis (currently it does not). In addition, to my understanding (and this may be a question of disciplinary norms – I’m not a health/medicine researcher), new data should generally not be introduced in the discussion section of a research paper. Rather, the discussion is meant to interpret, analyse, and provide context for the results that have already been presented. I hence feel that the paper would benefit from the qualitative results being reported separately within the results section.

      3. Impact factors – Discussion section: While there is interesting new information on the relative trade-offs amongst other factors, the most emphasised finding, that impact factors still play a prominent role in publication venue decisions, is hardly surprising. More could perhaps be done to compare how the levels of importance reported here differ from previous results from other disciplines or over time (I know a like-for-like comparison is difficult but other studies have investigated these themes, e.g., https://doi.org/10.1177/01655515209585). In addition, beyond the question of whether impact factors are important, a more interesting question in my view is why they still persist. What are they used for and why are they still such important “driver[s] of researchers’ behaviour”? This was not the authors’ question, and they do provide some contextualisation by quoting their participants, but still I think they could do more to contextualise what is known from the literature on that to draw out the implications here. The attribute label in the methods for IF is “ranking”, but ranking according to what and for what? Not just average per-article citations in a journal over a given time frame. Rather, impact factors are used as proxy indicators of less-tangible desirable qualities – certainly prestige (as the title of this article suggests), but also quality, trust (as reported by one quoted focus group member “I would never select a journal without an impact factor as I always publish in journals that I know and can trust that are not predatory”, p.6), journal visibility, importance to the field, or improved chances of downstream citations or uptake in news media/policy/industry etc. Picking apart the interactions of these various factors in researchers’ choices to make use of IFs (which is not in all cases bogus or unjustified) could add valuable context.
      I’d especially recommend engaging at least briefly with more work from Science and Technology Studies - especially Müller and de Rijcke’s excellent Thinking with Indicators study (doi: 10.1093/reseval/rvx023), but also those authors’ other work, as well as work from Ulrike Felt, Alex Rushforth (esp https://doi.org/10.1007/s11024-015-9274-5), Björn Hammarfelt and others.

      4. Disciplinary coverage: (1) A lot of the STS work I talk about above emphasises epistemic diversity and the ways cultures of indicator use differ across disciplinary traditions. For this reason, I think it should be pointed out in the limitations that this is research in Health/Med only, with questions on generalisability to other fields. (2) Also, although the abstract and body of the article do make clear the disciplinary focus, the title does not. Hence, I believe the title should be slightly amended (e.g., “Health and Medical Researchers Are Willing to Trade …”)

    1. when we are immersed in something, surrounded by it the way we are by images from the media, we may come to accept them as just part of the real and natural world.

      We’re constantly surrounded by media images so it’s easy to take them for granted. I think Hall is saying for us to take a step back and think critically about what they show and why.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):  

      Summary:  

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths:  

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: (1) a large set of behavioral attributes, (2) with inter-individual variability, that are (3) stable over time. A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings and extends the experiments from temporal stability to examining correlation of locomotion features between different contexts.  

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      Weaknesses:  

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. Why were five or so parameters selected from the full set? How were these selected? Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset?  

      The correlation analysis is used to establish stability between assays. For temporal re-testing, "stability" is certainly the appropriate word, but between contexts it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency".  

      The parameters are considered one-by-one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability', along with the analyses of single-parameter variability stability.
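
      One way such a summary across parameters could be computed is sketched below. It averages per-parameter correlation coefficients on the Fisher z scale and back-transforms the mean. This is only an illustrative assumption, not the method used by the authors or proposed by the reviewer; the function name and the example values are hypothetical, and because behavioral parameters are unlikely to be independent, the result is descriptive rather than inferential.

```python
from math import atanh, tanh


def mean_correlation(rs):
    """Average Pearson coefficients on the Fisher z scale, then back-transform.

    Assumes every coefficient lies strictly between -1 and 1
    (atanh is undefined at exactly -1 or 1).
    """
    zs = [atanh(r) for r in rs]
    return tanh(sum(zs) / len(zs))


# Hypothetical per-parameter correlations between two contexts.
# These numbers are made up for illustration and are NOT from the study.
per_parameter_r = {
    "parameter_a": 0.30,
    "parameter_b": 0.10,
    "parameter_c": -0.05,
}

summary = mean_correlation(per_parameter_r.values())
print(f"summary correlation across parameters: {summary:.3f}")
```

      A single number of this kind would sit alongside, not replace, the per-parameter analyses, since it hides which parameters drive the overall consistency.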

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23°C and 32°C are correlated by 0.263, which corresponds to an R² of 0.069, i.e. just 7% of the 32°C variance is predictable by the 23°C variance. Is it fair to say that 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?
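
      The arithmetic behind these percentages can be checked directly. The short sketch below squares the two correlation coefficients quoted above to obtain the coefficient of determination; the function name and labels are illustrative and not taken from the manuscript.

```python
def variance_explained(r: float) -> float:
    """Return the coefficient of determination (R^2) for a Pearson r."""
    return r * r


# Correlation coefficients quoted in the review text above.
quoted = {
    "% time walked, 23 °C vs 32 °C": 0.263,
    "vector strength, Buridan vs Y-maze": -0.197,
}

for label, r in quoted.items():
    r2 = variance_explained(r)
    # 1 - R^2 is the share of variance NOT accounted for by the other condition.
    print(f"{label}: r = {r:+.3f}, R^2 = {r2:.3f}, unexplained = {1 - r2:.1%}")
```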

      The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?
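
      The rank transformation suggested here can be sketched as follows. Ranking each condition separately and then correlating the ranks is equivalent to Spearman's rank correlation, which ignores differences in group means and scale. The helper names and the example values are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks within one group; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def rank_correlation(test, retest):
    """Correlate within-group ranks (Spearman's rho)."""
    return pearson(ranks(test), ranks(retest))


# Hypothetical values for five individuals in two contexts; the second
# context has a much larger mean and spread than the first.
context_a = [1.0, 2.0, 3.0, 4.0, 5.0]
context_b = [11.0, 14.0, 12.0, 30.0, 50.0]

print(f"Pearson on raw values: {pearson(context_a, context_b):.3f}")
print(f"Correlation of ranks:  {rank_correlation(context_a, context_b):.3f}")
```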

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general, and with regard to these specific parameters? Is increased walking speed at higher temperature necessarily due to increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      Using the current single-correlation analysis approach, the aims would benefit from re-wording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The study presents a bounty of new technology to study visually guided behaviors. The Github link to the software was not available. To verify successful transfer of open-hardware and open-software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      The study discusses a number of interesting, stimulating ideas about interindividual variability and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of interindividual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms.  

      Comments on revisions:  

      I want to express my appreciation for the authors' responsiveness to the reviewer feedback. They appear to have addressed my previous concerns through various modifications including GLM analysis; however, some areas still require clarification for the benefit of an audience that includes geneticists.

      (1) GLM Analysis Explanation (Figure 9)  

      While the authors state that their new GLM results support their original conclusions, the explanation of these results in the text is insufficient. Specifically:

      The interpretation of coefficients and their statistical significance needs more detailed explanation. The audience includes geneticists and other nonstatistical people, so the GLM should be explained in terms of the criteria or quantities used to assess how well the results conform with the hypothesis, and to what extent they diverge.

      The criteria used to judge how well the GLM results support their hypothesis are not clearly stated.

      The relationship between the GLM findings and their original correlation-based conclusions needs better integration and connection, leading the reader through your reasoning.

      We thank the reviewer for highlighting this important point. We have revised the Results section in the revised manuscript to include a more detailed explanation of the GLM analysis. Specifically, we now clarify the interpretation of the model coefficients, including the direction and statistical significance, in relation to the hypothesized effects. We also outline the criteria we used to assess how well the GLM supports our original correlation-based conclusions, namely, whether the sign and significance of the coefficients align with the expected relationships derived from our prior analysis. Finally, we explicitly describe how the GLM results confirm or extend the patterns observed in the correlation-based analysis, to guide readers through our reasoning and the integration of both approaches.

      (2) Documentation of Changes  

      One struggle with the revised manuscript is that no "tracked changes" version was included, so it is hard to know exactly what was done. Without access to the previous version of the manuscript, it is difficult to fully assess the extent of revisions made. The authors should provide a more comprehensive summary of the specific changes implemented, particularly regarding:

      We thank the reviewer for bringing this to our attention. We were equally confused to learn that the tracked-changes version was not visible, despite having submitted one to eLife as part of our revision. 

      Upon contacting the editorial office, they confirmed that we did submit a tracked-changes version, but clarified that it did not contain embedded figures (as they were added manually to the clean version). The editorial response said in detail: “Regarding the tracked-changes file: it appears the version with markup lacked figures, while the figure-complete PDF had markup removed, which likely caused the confusion mentioned by the reviewers.” We hope this answer from eLife clarifies the reviewers’ concern.

      (3) Statistical Method Selection

      The authors mention using "ridge regression to mitigate collinearity among predictors" but do not adequately justify this choice over other approaches. They should explain:

      Why ridge regression was selected as the optimal method  

      How the regularization parameter (λ) was determined  

      How this choice affects the interpretation of environmental parameters' influence on individuality

      We appreciate the reviewer’s thoughtful question regarding our choice of statistical method. In response, we have expanded the Methods section in the revised manuscript to provide a more detailed justification for the use of a GLM, including ridge regression. Specifically, we explain that ridge regression was selected to address collinearity and to control for overfitting.

      We now also describe how the regularization parameter (λ) was selected: we used 5-fold cross-validation over a log-spaced grid (10⁻⁶ to 10⁶) to identify the optimal value that minimized the mean squared error (MSE).

      Finally, we clarify in both the Methods and Results sections how this modeling choice affects the interpretation of our findings. 
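
      As an illustration only (this is not the authors' code), the λ selection described above can be sketched with NumPy. The sketch assumes synthetic data, four binary predictors, a 13-point grid, and hypothetical helper names (`ridge_fit`, `cv_mse`); none of these come from the manuscript.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution on centered data; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return beta, y_mean - x_mean @ beta

def cv_mse(X, y, lam, k=5, seed=0):
    """Mean squared error averaged over k held-out folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    fold_errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        beta, intercept = ridge_fit(X[train], y[train], lam)
        pred = X[fold] @ beta + intercept
        fold_errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(fold_errors))

# Synthetic stand-in data (NOT the study's data): four binary predictors
# marking whether an environmental parameter was changed (1) or not (0).
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(200, 4)).astype(float)
y = X @ np.array([0.5, 1.0, 2.0, 0.0]) + rng.normal(0.0, 1.0, size=200)

lambdas = np.logspace(-6, 6, 13)  # log-spaced grid from 1e-6 to 1e6 (13 points assumed)
mse = [cv_mse(X, y, lam) for lam in lambdas]
best_lambda = float(lambdas[int(np.argmin(mse))])
```

      Because the same fold assignment is reused for every λ, differences in the cross-validated MSE reflect the penalty rather than the split.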

      Reviewer #2 (Public review):  

      Summary:  

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths:  

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great, and I'm sure other folks will be interested in using and adapting to their own needs.

      Weaknesses/Limitations:  

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting, temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, or a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank-order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context.  

      I think the authors are missing an opportunity to use much more robust statistical methods. It appears as though the authors used Pearson correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The problem is that Pearson correlation coefficients can be anti-conservative and, additionally, the authors have thus had to perform a great many tests to correlate behaviors across the different trials/scenarios. I don't see any evidence that the authors are controlling for multiple testing, which I think would also help. Alternatively, though, the paper would be a lot stronger, and my guess is much more streamlined, if the authors employed hierarchical mixed models to analyse these data, which are the standard analytical tools in the study of individual behavioral variation. In this way, the authors could partition the behavioral variance into its among- and within-individual components and quantify repeatability of different behaviors across trials/scenarios simultaneously. This would remove the need to estimate 3 different correlations for day 1 & day 2, day 1 & 3, day 2 & 3 (or stripe 0 & stripe 1, etc.) and instead just report a single repeatability for, e.g., the time spent walking among the different stripe patterns (e.g. Figure 3). Additionally, the authors could then use multivariate models where the response variables are all the behaviors combined, and the authors could estimate the among-individual covariance in these behaviors. I see that the authors state they include generalized linear mixed models in their updated MS, but I struggled a bit to understand exactly how these models were fit. What exactly was the response? What exactly were the predictors? (I just don't understand what Line 404 means: "a GLM was trained using the environmental parameters as predictors (0 when the parameter was not changed, 1 if it was) and the resulting individual rank differences as the response".) So were different models run for each scenario? For different behaviors? Across scenarios? What exactly? I just harp on this because I'm actually really interested in these data and think that updating these methods can really help clarify the results and make the main messages much clearer!
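
      To make the multiple-testing point concrete, here is a minimal sketch of one common correction, the Holm step-down procedure. The reviewer does not name a specific method, so the choice of Holm, the function name `holm_adjust`, and the p-values are all assumptions for illustration.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted

# Hypothetical p-values for three pairwise correlations
# (day 1 vs 2, day 1 vs 3, day 2 vs 3); not taken from the manuscript.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
significant = [p <= 0.05 for p in adjusted]
```

      With these made-up inputs, only the first comparison stays below 0.05 after adjustment, which shows how a set of individually "significant" correlations can shrink once the number of tests is accounted for.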

      I appreciate that the authors now included their sample sizes in the main body of text (as opposed to the supplement) but I think that it would still help if the authors included a brief overview of their design at the start of the methods. It is still unclear to me how many rigs each individual fly was run through? Were the same individuals measured in multiple different rigs/scenarios? Or just one?

      I really think a variance partitioning modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation. I also genuinely think that this will improve the impact and reach of this paper as they'll be using methods that are standard in the study of individual behavioral variation.

      Reviewer #3 (Public review):  

      This manuscript is a continuation of past work by the last author where they looked at stochasticity in developmental processes leading to inter-individual behavioural differences. In that work, the focus was on a specific behaviour under specific conditions while probing the neural basis of the variability. In this work, the authors set out to describe in detail how stable individuality of animal behaviours is in the context of various external and internal influences. They identify a few behaviours to monitor (read outs of attention, exploration, and 'anxiety'); some external stimuli (temperature, contrast, nature of visual cues, and spatial environment); and two internal states (walking and flying).

      They then use high-throughput behavioural arenas - most of which they have built and made plans available for others to replicate - to quantify and compare combinations of these behaviours, stimuli, and internal states. This detailed analysis reveals that:

      (1) Many individualistic behaviours remain stable over the course of many days.  

      (2) That some of these (walking speed) remain stable over changing visual cues. Others (walking speed and centrophobicity) remain stable at different temperatures.

      (3) All the behaviours they tested fail to remain stable over spatially varying environment (arena shape).

      (4) and only angular velocity (a read out of attention) remains stable across varying internal states (walking and flying)

      Thus, the authors conclude that there is a hierarchy in the influence of external stimuli and internal states on the stability of individual behaviours.

      The manuscript is a technical feat with the authors having built many new high-throughput assays. The number of animals are large and many variables have been tested - different types of behavioural paradigms, flying vs walking, varying visual stimuli, different temperature among others.  

      Comments on revisions:

      The authors have addressed my previous concerns.  

      We thank the reviewer for the positive feedback and are glad our revisions have satisfactorily addressed the previous concerns. We appreciate the thoughtful input that helped us improve the clarity and rigor of the manuscript.

      Reviewer #1 (Recommendations for the authors):  

      Comment on Revised Manuscript  

      Recommendations for Improvement  

      (1) Expand the Results section for Figure 9 with a more detailed interpretation of the GLM coefficients and their biological significance

      (2) Provide explicit criteria (or at least explain in detail) for how the GLM results confirm or undermine their original hypothesis about environmental context hierarchy

      While the claims are interesting, the additional statistical analysis appears promising. However, clearer explanation of these new results would strengthen the paper and ensure that readers from diverse backgrounds can fully understand how the evidence supports the authors' conclusions about individuality across environmental contexts. 

      We thank the reviewer for these constructive suggestions. In response to these suggestions, we have expanded both the Methods and Results sections to provide a more detailed explanation of the GLM coefficients, including their interpretation and how they relate to our original correlation-based findings.

      We now clarify how the direction, magnitude, and statistical significance of specific coefficients reflect the influence of different environmental factors on the persistence of individual behavioral traits. To make this accessible to readers from diverse backgrounds, we explicitly outline the criteria we used to evaluate whether the GLM results support our hypothesis about the hierarchical influence of environmental context, namely, whether the structure and strength of effects align with the patterns predicted from our prior correlation analysis.

      These additions improve clarity and help readers understand how the new statistical results reinforce our conclusions about the context-dependence of behavioral individuality.

      Reviewer #2 (Recommendations for the authors):  

      Thanks for the revision of the paper! I updated my review to try and provide a little more guidance by what I mean about updating your analyses. I really think this is a super cool data set and I genuinely wish this were MY dataset so that way I could really dig into it to partition the variance. These variance partitioning methods are standard in my particular subfield (study of individual behavioral variation in ecology and evolution) and so I think employing them is 1) going to offer a MUCH more elegant and holistic view of the behavioral variation (e.g. you can report a single repeatability estimate for each behavior rather than 3 different correlations) and 2) improve the impact and readership for your paper as now you'll be using methods that a whole community of researchers are very familiar with. It's just a suggestion, but I hope you consider it!

      We sincerely thank the reviewer for the insightful and encouraging feedback and for introducing us to this modeling approach. In response to this suggestion, we have incorporated a hierarchical linear mixed-effects model into our analysis (now presented in Figure 10), accompanied by a new supplementary table (Table T3). We also updated the Methods, Results, and Discussion sections to describe the rationale, implementation, and implications of the mixed-model analysis.

      We agree with the reviewer that this approach provides a more elegant way to quantify behavioral variation and individual consistency across contexts. In particular, the ability to estimate repeatability directly aligns well with the core questions of our study. It facilitates improved communication of our findings to ecology, evolution, and behavior researchers. We greatly appreciate the suggestion; it has significantly strengthened both the analytical framework and the interpretability of the manuscript.
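
      For readers unfamiliar with repeatability, the quantity can be sketched as the share of total variance that lies among individuals. The example below uses a one-way ANOVA estimator for a balanced design as a simplified stand-in for the hierarchical mixed-effects model described above; the function name and the toy scores are assumptions, not the study's data or code.

```python
import numpy as np

def repeatability(scores):
    """ANOVA-based repeatability (ICC) for a balanced individuals x trials array."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape  # n individuals, k repeated measurements each
    individual_means = scores.mean(axis=1)
    grand_mean = scores.mean()
    ms_among = k * np.sum((individual_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((scores - individual_means[:, None]) ** 2) / (n * (k - 1))
    var_among = (ms_among - ms_within) / k  # among-individual variance component
    return float(var_among / (var_among + ms_within))

# Hypothetical scores for 4 flies measured on 3 days (arbitrary units).
toy_scores = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
]
r = repeatability(toy_scores)
```

      A single value of `r` per behavior replaces the three pairwise day-to-day correlations; values near 1 mean that most variation is among individuals rather than within them.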

    1. Today, teachers are continually faced with the challenge of effectively reaching out to their classroom of students who span the spectrum of learning readiness, personal interests, skills, knowledge, and perspective. We know that not all students are alike.

      This is why I think it's important to survey your students at the beginning of the year in order to learn about their interests. Gaining insight into how your students learn best can help you, as the teacher, vary your teaching methods. Yes, habits can be good so students can know what the expectations are, but offering different sources, different instructional strategies, and diversifying your classroom layout can cover a wide range of learners. Keeping in mind that people learn through all of the major senses can truly help students retain information. For example, I am an auditory learner. I have to read aloud or talk things out. That's why I have to read things a few times to really grasp the material when it's a quiet setting. Therefore, timed tests really get my anxiety levels up. Not everyone has this problem or even recognizes it. Being an auditory learner may be great in the college setting during lectures, but it becomes very difficult in the test setting when everything is quiet. How could a teacher make an adjustment in the test setting for my scenario?

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This work addresses an important question in the field of Drosophila aggression and mating. Prior social isolation is known to increase aggression in males, manifesting as increased lunging, which is suppressed by group housing (GH). However, it is also known that single housed (SH) males, despite their higher attempts to court females, are less successful. Here, Gao et al., develop a modified aggression assay to address this issue by recording aggression in Drosophila males for 2 hours, with a virgin female immobilized by burying its head in the food. They found that while SH males frequently lunge in this assay, GH males switch to higher intensity but very low frequency tussling. Constitutive neuronal silencing and activation experiments implicate cVA sensing Or67d neurons in promoting high frequency lunging, similar to earlier studies, whereas Or47b neurons promote low frequency but higher intensity tussling. Optogenetic activation revealed that three pairs of pC1SS2 neurons increase tussling. Cell-type-specific DsxM manipulations combined with morphological analysis of pC1SS2 neurons and side-by-side tussling quantification link the developmental role of DsxM to the functional output of these aggression-promoting cells. In contrast, although optogenetic activation of P1a neurons in the dark did not increase tussling, thermogenetic activation under visible light drove aggressive tussling. Using a further modified aggression assay, GH males exhibit increased tussling and maintain territorial control, which could contribute to a mating advantage over SH males, although direct measures of reproductive success are still needed.

      Strengths:

      Through a series of clever neurogenetic and behavioral approaches, the authors implicate specific subsets of ORNs and pC1 neurons in promoting distinct forms of aggressive behavior, particularly tussling. They have devised a refined territorial control paradigm, which appears more robust than earlier assays using a food cup (Chen et al., 2002). This new setup is relatively clutter-free and could be amenable to future automation using computer vision approaches. The updated Figure 5, which combines cell-type-specific developmental manipulation of pC1SS2 neurons with behavioral output, provides a link between developmental mechanisms and functional aggression circuits. The manuscript is generally well written, and the claims are largely supported by the data.

      Thank you for the precise summary of the manuscript and acknowledgment of the novelty and significance of the study.

      Weakness:

      Although most concerns have been addressed, the manuscript still lacks a rigorous, objective method for quantifying lunging and tussling. Because scoring appears to have been done manually and a single lunge in a 30 fps video spans only 2-3 frames, the 0.2 s cutoff seems arbitrary, and there are no objective criteria distinguishing reciprocal lunging from tussling. Despite this, the study offers valuable insights into the neural and behavioral mechanisms of Drosophila aggression.

      Thank you for this comment. The duration of each lunge was measured by analyzing the videos frame by frame, from the frame before the initiation of the lunge to the frame after its completion, resulting in an average span of 3–5 frames. Given a frame rate of 30 fps, this corresponds to approximately 0.1–0.17 seconds. We acknowledge that manually quantifying the two types of aggressive behavior has certain limitations, which are now stated in the newly added “Limitations of the Study” section of the revised manuscript.
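
      The frame-to-time conversion in this response is simple arithmetic and can be checked with a short sketch. The function name is hypothetical; the 30 fps frame rate and the 3–5 frame span are taken from the text above.

```python
def frames_to_seconds(n_frames, fps=30.0):
    """Convert a frame count to a duration in seconds at the given frame rate."""
    return n_frames / fps

shortest = frames_to_seconds(3)  # lower end of the reported 3-5 frame span
longest = frames_to_seconds(5)   # upper end of the reported span
```

      At 30 fps each frame lasts about 33 ms, so frame-by-frame scoring cannot resolve durations more finely than that.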

      Reviewer #2 (Public review):

      Summary:

      Gao et al. investigated the change of aggression strategies by the social experience and its biological significance by using Drosophila. Two modes of inter-male aggression in Drosophila are known: lunging, high-frequency but weak mode, and tussling, low-frequency but more vigorous mode. Previous studies have mainly focused on the lunging. In this paper, the authors developed a new behavioral experiment system for observing tussling behavior and found that tussling is enhanced by group rearing, while lunging is suppressed. They then searched for neurons involved in the generation of tussling. Although olfactory receptors named Or67d and Or65a have previously been reported to function in the control of lunging, the authors found that these neurons do not function in the execution of tussling and another olfactory receptor, Or47b, is required for tussling, as shown by the inhibition of neuronal activity and the gene knockdown experiments. Further optogenetic experiments identified a small number of central neurons pC1[SS2] that induce the tussling specifically. These neurons express doublesex (dsx), a sex-determination factor, and knockdown of dsx strongly suppresses the induction of tussling. In order to further explore the ecological significance of the aggression mode change in group-rearing, a new behavioral experiment was performed to examine the territorial control and the mating competition. And finally, the authors found that differences in the social experience (group vs. solitary rearing) and the associated change in aggression strategy are important in these biologically significant competitions. These results add a new perspective to the study of aggression behavior in Drosophila. Furthermore, this study proposes an interesting general model in which the social experience modified behavioral changes play a role in reproductive success.

      Strengths:

      A behavioral experiment system that allows stable observation of tussling, which could not be easily analyzed due to its low-frequency, would be very useful. The experimental setup itself is relatively simple, just the addition of a female to the platform, so it should be applicable to future research. The finding about the relationship between the social experience and the aggression mode change is quite novel. Although the intensity of aggression changes with the social experience was already reported in several papers (Liu et al., 2011 etc), the fact that the behavioral mode itself changes significantly has rarely been addressed, and is extremely interesting. The identification of sensory and central neurons required for the tussling makes appropriate use of the genetic tools and the results are clear. A major strength of this study in neurobiology is the finding that another group of neurons (Or47b-expressing olfactory neurons and pC1[SS2] neurons), distinct from the group of neurons previously thought to be involved in low-intensity aggression (i.e. lunging), function in the tussling behavior. Furthermore, the results showing that the regulation of aggression by pC1[SS2] neurons is based on the function of the dsx gene will bring a new perspective to the field. Further investigation of the detailed circuit analysis is expected to elucidate the neural substrate of the conflict between the two aggression modes. The experimental systems examining the territory control and the reproductive competition in Fig. 6 are novel and have advantages in exploring their biological significance. It is important to note that in addition to showing the effects of age and social experience on territorial and mating behaviors, the authors experimentally demonstrated that altered fighting strategy has effects with respect to these behaviors.

      Thank you for your precise summary of our study and for your positive assessment of its novelty and significance.

      Reviewer #3 (Public review):

      In this revised manuscript, Gao et al. presented a series of well-controlled behavioral data showing that tussling, a form of high-intensity fighting among male fruit flies (Drosophila melanogaster) is enhanced specifically among socially experienced and relatively old males. Moreover, results of behavioral assays led authors to suggest that increased tussling among socially experienced males may increase mating success. They also concluded that tussling is controlled by a class of olfactory sensory neurons and sexually dimorphic central neurons that are distinct from pathways known to control lunges, a common male-type attack behavior.

      A major strength of this work is that it is the first attempt to characterize behavioral function and neural circuit associated with Drosophila tussling. Many animal species use both low-intensity and high-intensity tactics to resolve conflicts. High-intensity tactics are mostly reserved for escalated fights, which are relatively rare. Because of this, tussling in the flies, like high-intensity fights in other animal species, have not been systematically investigated. Previous studies on fly aggressive behavior have often used socially isolated, relatively young flies within a short observation duration. Their discovery that 1) older (14-days old) flies tend to tussle more often than younger (2 to 7-days-old) flies, 2) group-reared flies tend to tussle more often than socially isolated flies, and 3) flies tend to tussle at later stage (mostly ~15 minutes after the onset of fighting), are the result of their creativity to look outside of conventional experimental settings. These new findings are key for quantitatively characterizing this interesting yet under-studied behavior.

      Newly presented data have made several conclusions convincing. Detailed descriptions of methods to quantify behaviors help understand the basis of their claims by improving transparency. However, I remain concerned about authors' persistent attempt to link the high intensity aggression to reproductive success. The authors' effort to "tone down" the link between the two phenomena remains insufficient. They are purely correlational. I reiterate this issue because the overall value of the manuscript would not change with or without this claim.

      Thank you for acknowledging the novelty and significance of the study. Regarding the relationship you mentioned between high-intensity aggression and reproductive success, we have further toned down the statements linking them throughout the revised manuscript. We have also modified the title to “Social Experience Shapes Fighting Strategies in Drosophila”. In addition, we have added a ‘Limitations of the Study’ section to state clearly that the relationship between tussling and reproductive success is correlational.

      Reviewer #1 (Recommendations for the authors):

      If possible, mention the EM-connectome data showing the minimal interneuronal path from Or47b ORNs to pC1SS2 neurons (even if derived from the female connectome), which can strengthen the model of parallel sensory-central pathways.

      Thank you for this comment. According to data from the EM connectome, connecting Or47b ORNs to pC1d neurons requires at least two intermediate neurons. An example minimal pathway is: ORN_VA1v (L) → AL-AST1 (L) → PLP245 (L) → pC1d (R). We have added this point in the Discussion section of the revised manuscript.
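
      The notion of a minimal interneuronal path can be sketched as a breadth-first search over a directed graph. The toy graph below contains only the example pathway quoted in the response; it is not connectome data, and the function name is an assumption.

```python
from collections import deque

def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest directed path or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Toy adjacency list holding only the example path quoted in the response;
# a real connectome graph would hold many more neurons and synaptic weights.
toy_graph = {
    "ORN_VA1v (L)": ["AL-AST1 (L)"],
    "AL-AST1 (L)": ["PLP245 (L)"],
    "PLP245 (L)": ["pC1d (R)"],
}
path = shortest_path(toy_graph, "ORN_VA1v (L)", "pC1d (R)")
n_intermediates = len(path) - 2  # exclude the source and target neurons
```

      On this toy graph the search returns the four-neuron chain, so `n_intermediates` is 2, matching the "at least two intermediate neurons" stated in the response.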

      I'm not convinced that labeling lunges as "gentle" combat behavior works, either in the abstract or elsewhere. While lunging is indeed a lower-intensity form of aggression compared to tussling, applying anthropomorphic descriptors risks misleading readers.

      Thank you for this comment. We now use “low-intensity” instead of “gentle” to describe lunging.

      In Materials & Methods, please cross-check all figure-panel references after the recent re-numbering (e.g. "Figure 5A6A" etc.).

      Thank you for this comment. We have thoroughly verified the figure panel references in the Materials & Methods section.

      Ensure that Table S1 is clearly cited in the main text where you first describe fly genotypes.

      Thank you for this comment. We have now cited Table S1 in the main text.

      There are multiple grammatical errors and typos throughout the manuscript. Please correct them. Some examples are below, but this is not an exhaustive list:

      Line 98-102 requires rephrasing as the results are already published and not being observed by the authors.

      Thank you for this comment. We have revised the manuscript to “we occasionally observed the high-intensity boxing and tussling behavior in male flies as previously reported (Chen et al., 2002; Nilsen et al., 2004), which….”

      line 116- lower not 'lowed'.

      Corrected.

      line 942 & 945- knock-down males not 'knocking down males'.

      Corrected. Thank you very much for these comments.

      Reviewer #2 (Recommendations for the authors):

      The authors have almost completely answered the major comments I have noted on the ver.1 manuscript: (1) They clearly show changes in fighting strategy in the territory control behavior experiment in Fig. 6-figure supplements. (2) A detailed description of how aggressive behavior is measured. Thus, I am convinced by this revision.

      Thank you for these comments, which have improved the manuscript.

      Furthermore, the data in Fig. 5, which examine the relationship between pC1[SS2] characteristics and the function of dsx, are novel and very interesting. I look forward to further developments.

      Thank you. We will continue to explore this part in our future study.

      However, one point still concerns me.

      Line 192: Although the authors describe it as "usage-dependent," the trans-Tango technique is essentially a postsynaptic cell-labeling technique. It is possible that the labeling intensity in postsynaptic cells increases from the change in expression levels of the Or47b gene due to GH. However, there is no difference in the expression level of the Or47b gene labeled by GFP between SH and GH. Therefore, we cannot conclude that the expression of the Or47b gene is increased by rearing conditions.

      The original paper on trans-TANGO (Talay et al., 2017) does not discuss the usage-dependency. A review of trans-synaptic labeling techniques (Ni, Front Neural Circuits. 2021) discusses that the increase in trans-TANGO signaling with aging may be related to synaptic strength, but there is no experimental evidence for this. In my opinion, the results in Figure 3-figure supplement 2 only weakly suggest that the increase in trans-TANGO signaling may be explained by an increase in synaptic strength due to group rearing.

      We appreciate the reviewer’s insightful comment regarding the interpretation of the trans-Tango signal. Indeed, the original trans-Tango study (Talay et al., 2017) does not claim that the method is usage-dependent. The observed increase in trans-Tango labeling with age, as reported in their supplemental figures, may reflect accumulation over time, potentially influenced by synaptic maturation or increased component expression. To avoid overstating our results, we have revised the relevant statement in the manuscript to remove the term "usage-dependent" and now describe the change in trans-Tango signal more cautiously.  

      Reviewer #3 (Recommendations for the authors):

      Below are the cases where their professed attempts to "tone down the statement" appear ignored:

      Lines 27-29:

      "Our findings... suggest how social experience shapes fighting strategies to optimize reproductive success".

      We have now revised the manuscript to “Our findings… suggest that social experience may shape fighting strategies to optimize reproductive success.”

      Lines 85-86:

      "... discover that this infrequent yet intense form of combat is... crucial for territory dominance and mating competition".

      We have now revised the manuscript to “…discover that this infrequent yet intense form of combat is enhanced by social enrichment, while the low-intensity lunging is suppressed by social enrichment.” 

      Lines 335-339:

      "Here, we found that... GH males tend to... increase the high-intensity tussling, which enhances their territorial and mating competition."

      We have removed “which enhances their territorial and mating competition” in the revised manuscript.

      Lines 343-344:

      "... presenting a paradox between social experience, aggression and reproductive success. Our result resolved this paradox..."

      We have now revised the manuscript to “...Our results provide an explanation for this paradox…”

      Lines 355-358:

      "Interestingly, we found that the mating advantage gained through social enrichment can even offset the mating disadvantage associated with aging, further supporting the vital role of shifting fighting strategies in experienced, aged males."

      We have removed “further supporting the vital role of shifting fighting strategies in experienced, aged males” in the revised manuscript.

      Lines 361-362:

      "These results separate the function of the two fighting forms and rectify our understanding of how social experiences regulate aggression and reproductive success."

      We have removed this sentence in the revised manuscript.

      Some may say that a speculative statement is harmless, but I think it indeed is harmful unless it is clearly indicated as a speculation. It is regrettable that authors remain reluctant to change their claim without providing any new supporting evidence. All three reviewers raised the same concern in the first round of review.

      We apologize for not making the speculative nature of the statement clearer in the previous version. In the revised manuscript, we have now explicitly rephrased sentences to only suggest a correlation but not a causal link between tussling and reproductive success.

      I have no choice but to keep my evaluation of the manuscript as "Incomplete" unless the authors thoroughly eliminate any attempt to link these two. This must go beyond changing a few words in the lines listed above.

      Thank you for this comment. In addition to the lines listed above, we carefully checked all statements regarding the correlation between fighting strategies and reproductive success throughout the full text. Furthermore, we have also added a “Limitations of the Study” section to address the shortcomings of this study in the revised manuscript.

      I do not have the same level of concern over the interpretation of Fig. 6A-C, because this is directly linked to aggressive interactions. Even if the socially isolated males do not engage in tussling, it is not a leap to assume that a different fighting tactic of socially experienced males can give them an advantage in defending a territory. To me, this is a sufficient ethological link with the observed behavioral change.

      Thank you for this insightful comment.

      The following are relatively minor, although important, concerns.

      I beg to differ over the authors' definition of "tussling". Supplemental movies S1 and S2 appear to include "tussling" bouts in which 2 flies lunge at each other in rapid succession, and supplemental movie S3 appears to include bouts of "holding", in which one fly holds the opponent's wings and shakes vigorously. These cases suggest that the definition of "tussling" as opposed to "lunging" has a subjective element. However, I would not dwell on this matter further because it is impossible to be completely objective over behavioral classification, even by using a computational method. An important point is that the definition is applied consistently within the publication. I have no reason to doubt that this was the case.

      Thank you for this comment. Since the analysis of tussling behavior was conducted manually, it is challenging to achieve complete objectivity. However, we made every effort to apply consistent criteria throughout the analysis. We have added a “Limitations of the Study” section in the revised manuscript to clearly state this caveat. We appreciate your understanding.

      Authors now state that "all tester flies were loaded by cold anesthesia" (lines 432-433). I would like to draw attention to the well-known fact that anesthesia, whether by ice or by CO2, has long been known to affect flies' subsequent behaviors (for aggression, see Trannoy S. et al., Learn. Mem. 2015. 22: 64-68). It would be prudent to acknowledge the possibility that this handling method could have contributed to unusually high levels of spontaneous tussling, which have not been reported elsewhere before.

      Thank you for this comment. The increased tussling behavior observed in our study is unlikely to be due to cold anesthesia: as noted by Trannoy S. et al. (2015), cold anesthesia profoundly reduces locomotion and general aggressiveness in flies. We acknowledge that the use of cold anesthesia in behavioral experiments may have potential effects on aggression. To minimize this influence, we allowed the flies to recover and adapt for at least 30 minutes before behavioral recording. Moreover, both control and experimental groups were treated in exactly the same manner to ensure consistency.

      It is intriguing that pC1SS2 neurons are dsx+ but fru-. Authors convincingly demonstrated that these neurons are clearly distinct from the P1a neurons, a well-characterized hub for male social behaviors. It is possible that pC1SS2 neurons overlap with previously characterized dsx+ neurons that are important for male aggressions (measured by lunges), such as in Koganezawa et al., Curr. Biol. 2016 and Chiu et al., Cell 2020, a point authors could have explicitly raised.

      Thank you for this comment. We have added this point into the Discussion section of the revised manuscript, as follows: “That tussling-promoting… aggression (Koganezawa et al., 2016). Moreover, the anatomical features of pC1<sup>SS2</sup> neurons are highly similar to the male-specific aggression-promoting (MAP) neurons identified by another previous study (Chiu et al., 2021).”

      I acknowledge the authors' courage to initiate an investigation into a less characterized, high-intensity fighting behavior. Tussling requires the simultaneous engagement of two flies. Even if there is confusion over the distinction between lunges and tussling, the authors' conclusion that socially experienced flies and socially isolated flies employ distinct fighting strategies is convincing. The concern I raised above is about the interpretation of the data, not about the quality of data.

      Thank you for your constructive comments to make this manuscript better.

    1. You may receive an assignment prompt that asks you to write from your memory, recapturing the experience of reading a special book or text from your childhood or adolescence. Think of this as a chance to recapture something significant from your past, to explore its importance, and to reconstruct it in writing for others to appreciate. Certain books we’ve read live in our memories. When we first read these books or when they were read to us, they spoke to us in some important way. They may still speak to us. Find a book that played an important role in your life when you were a child or an adolescent. Why was it important? What was it like to read this book? Did you read it on your own or did someone read it to you? If someone read it to you, who was it, and what was the experience like? Is there a connection between this book and learning to read on your own? Re-read the book. (If it is long, like Little Women, for example, it is all right to skim it, although you may find yourself re-reading certain parts.) In your essay, use the book as a springboard for your writing by focusing on an insight (a discovery) you have made about the book. Be sure to cite passages and tell the effect they had on you. As you shape your drafts, give attention to organization, the way you build your story. Decide what the reader needs to know in the beginning, and think about the order the events happened and how much to tell the reader at each point. Give attention also to the pictures you create: try to reconstruct key moments by showing what happened rather than merely telling that it happened. Dialogue and scene descriptions often help to make those moments come alive. Finally, give careful thought to the story’s theme or controlling idea.

      brainstorm on how to write a narrative

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The present study evaluates the role of visual experience in shaping functional correlations between extrastriate visual cortex and frontal regions. The authors used fMRI to assess "resting-state" temporal correlations in three groups: sighted adults, congenitally blind adults, and neonates. Previous research has already demonstrated differences in functional correlations between visual and frontal regions in sighted compared to early blind individuals. The novel contribution of the current study lies in the inclusion of an infant dataset, which allows for an assessment of the developmental origins of these differences.

      The main results of the study reveal that correlations between prefrontal and visual regions are more prominent in the blind and infant groups, with the blind group exhibiting greater lateralization. Conversely, correlations between visual and somato-motor cortices are more prominent in sighted adults. Based on these data, the authors conclude that visual experience plays an instructive role in shaping these cortical networks. This study provides valuable insights into the impact of visual experience on the development of functional connectivity in the brain.

      Strengths:

      The dissociations in functional correlations observed among the sighted adult, congenitally blind, and neonate groups provide strong support for the study's main conclusion regarding experience-driven changes in functional connectivity profiles between visual and frontal regions.

      In general, the findings in sighted adult and congenitally blind groups replicate previous studies and enhance the confidence in the reliability and robustness of the current results.

      Split-half analysis provides a good measure of robustness in the infant data.

      Weaknesses:

      There is some ambiguity in determining which aspects of these networks are shaped by experience.

      This uncertainty is compounded by notable differences in data acquisition and preprocessing methods, which could result in varying signal quality across groups. Variations in signal quality may, in turn, have an impact on the observed correlation patterns.

      The study's findings could benefit from being situated within a broader debate surrounding the instructive versus permissive roles of experience in the development of visual circuits.

      Reviewer #2 (Public Review):

      Summary:

      Tian et al. explore the developmental origins of cortical reorganization in blindness. Previous work has found that a set of regions in the occipital cortex show different functional responses and patterns of functional correlations in blind vs. sighted adults. In this paper, Tian et al. ask: how does this organization arise over development? Is the "starting state" more like the blind pattern, or more like the adult pattern? Their analyses reveal that the answer depends on the particular networks investigated; some functional connections in infants look more like blind than sighted adults; other functional connections look more like sighted than blind adults; and others fall somewhere in the middle, or show an altogether different pattern in infants compared with both sighted and blind adults.

      Strengths:

      The question raised in this paper is extremely important: what is the starting state in development for visual cortical regions, and how is this organization shaped by experience? This paper is among the first to examine this question, particularly by comparing infants not only with sighted adults but also blind adults, which sheds new light on the role of visual (and cross-modal) experience. Another clear strength lies in the unequivocal nature of many results. Many results have very large effect sizes, critical interactions between regions and groups are tested and found, and infant analyses are replicated in split halves of the data. 

      Weaknesses:

      A central claim is that "infant secondary visual cortices functionally resemble those of blind more than sighted adults" (abstract, last paragraph of intro). I see two potential issues with this claim. First, a minor change: given the approaches used here, no claims should be made about the "function" of these regions, but rather their "functional correlations". Second (and more importantly), the claim that the secondary visual cortex in general resembles blind more than sighted adults is still not fully supported by the data. In fact, this claim is only true for one aspect of secondary visual area functional correlations (i.e., their connectivity to A1/M1/S1 vs. PFC). In other analyses, the infant secondary visual cortex looks more like sighted adults than blind adults (i.e., in within vs. across hemisphere correlations), or shows a different pattern from both sighted and blind adults (i.e., in occipito-frontal subregion functional connectivity). It is not clear from the manuscript why the comparison to PFC vs. non-visual sensory cortex is more theoretically important than hemispheric changes or within-PFC correlations (in fact, if anything, the within-PFC correlations strike me as the most important for understanding the development and reorganization of these secondary visual regions). It seems then that a more accurate conclusion is that the secondary visual cortex shows a mix of instructive effects of vision and reorganizing effects of blindness, albeit to a different extent than the primary visual cortex.

      Relatedly, group differences in overall secondary visual cortex connectivity are particularly striking as visualized in the connectivity matrices shown in Figure S1. In the results (lines 105-112), it is noted that while the infant FC matrix is strongly correlated with both adult groups, the infant group is nonetheless more strongly correlated with the blind than sighted adults. I am concerned that these results might be at least partially explained by distance (i.e., local spread of the BOLD signal), since a huge portion of the variance in these FC matrices is driven by stronger correlations between regions within the same system (e.g., secondary-secondary visual cortex, frontal-frontal cortex), which are inherently closer together, relative to those between different systems (e.g., visual to frontal cortex). How do results change if only comparisons between secondary visual regions and non-visual regions are included (i.e., just the pairs of regions within the bold black rectangle on the figure), which limits the analysis to long-range connections only? Indeed, looking at the off-diagonal comparisons, it seems that in fact there are three altogether different patterns here in the three groups. Even if the correlation between the infant pattern and blind adult pattern survives, it might be more accurate to claim that infants are different from both adult groups, suggesting both instructive effects of vision and reorganizing effects of blindness. It might help to show the correlation between each group and itself (across independent sets of subjects) to better contextualize the relative strength of correlations between the groups.
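
The restriction to long-range pairs described above can be illustrated with a minimal sketch. It assumes each group is summarized by a symmetric ROI-by-ROI FC matrix and that the visual and non-visual ROI indices are known; the function names, ROI indices, and toy matrices are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def cross_system_similarity(fc_a, fc_b, visual_idx, nonvisual_idx):
    """Pearson correlation between two group-level FC matrices, computed
    only over visual-to-non-visual ROI pairs (the off-diagonal block)."""
    block_a = fc_a[np.ix_(visual_idx, nonvisual_idx)].ravel()
    block_b = fc_b[np.ix_(visual_idx, nonvisual_idx)].ravel()
    return float(np.corrcoef(block_a, block_b)[0, 1])

def random_fc(rng, n_rois):
    """Toy symmetric matrix with a unit diagonal, standing in for real FC data."""
    m = rng.uniform(-1.0, 1.0, size=(n_rois, n_rois))
    m = (m + m.T) / 2.0
    np.fill_diagonal(m, 1.0)
    return m

# Hypothetical layout: ROIs 0-3 are visual, ROIs 4-7 are non-visual.
visual_idx = [0, 1, 2, 3]
nonvisual_idx = [4, 5, 6, 7]

rng = np.random.default_rng(0)
infant_fc = random_fc(rng, 8)
blind_fc = random_fc(rng, 8)

r = cross_system_similarity(infant_fc, blind_fc, visual_idx, nonvisual_idx)
print(f"cross-system similarity: {r:.3f}")
```

Because within-system pairs never enter the calculation, strong local correlations cannot inflate the similarity between groups.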

      It is not clear that differences between groups should be attributed to visual experience only. For example, despite the title of the paper, the authors note elsewhere that cross-modal experience might also drive changes between groups. Another factor, which I do not see discussed, is possible ongoing experience-independent maturation. The infants scanned are extremely young, only 2 weeks old. Although no effects of age are detected, it is possible that cortex is still undergoing experience-independent maturation at this very early stage of development. For example, consider Figure 2; perhaps V1 connectivity is not established at 2 weeks, but eventually achieves the adult pattern later in infancy or childhood. Further, consider the possibility that this same developmental progression would be found in infants and children born blind. In that case, the blind adult pattern may depend on blindness-related experience only (which may or may not reflect "visual" experience per se). To deal with these issues, the authors should add a discussion of the role of maturation vs. experience and temper claims about the role of visual experience specifically (particularly in the title). 

      The authors measure functional correlations in three very different groups of participants and find three different patterns of functional correlations. Although these three groups differ in critical, theoretically interesting ways (i.e., in age and visual/cross-modal experience), they also differ in many uninteresting ways, including at least the following: sampling rate (TR), scan duration, multi-band acceleration, denoising procedures (CompCor vs. ICA), head motion, ROI registration accuracy, and wakefulness (I assume the infants are asleep).

      Addressing all of these issues is beyond the scope of this paper, but I do feel the authors should acknowledge these confounds and discuss the extent to which they are likely (or not) to explain their results. The authors would strengthen their conclusions with analyses directly comparing data quality between groups (e.g., measures of head motion and split-half reliability would be particularly effective).

      Response #1: We appreciate the reviewer’s comments. In response, we have revised the paper to provide a more balanced summary of the data and clarified in the introduction which signatures the paper focuses on and why. Additionally, we have included several control analyses to account for other plausible explanations for the observed group differences. Specifically, we randomly split the infant dataset into two halves and performed split-half cross-validation. Across all comparisons, the results from the two halves were highly similar, suggesting that the effects are robust (see Supplementary Figures S3 and S4).

      Furthermore, we compared the split-half noise ceiling across the groups (infants, sighted adults, and blind adults) and found no significant differences between them (details in response #6). Finally, we repeated our analysis after excluding infants with a radiology score of 4 or 5, and the results remained consistent, indicating that our findings are not confounded by potential brain anomalies (details in response #2).

      We hope these control analyses help strengthen our conclusions.
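
For readers unfamiliar with split-half noise ceilings, the following is a minimal sketch of one common way to estimate them. It assumes per-subject FC values are stacked in a subjects-by-features array; the function names and the simulated data are hypothetical, and the authors' exact procedure may differ.

```python
import numpy as np

def split_half_reliability(subject_fc, rng):
    """Randomly split subjects into two halves, average each half, and
    return the Pearson correlation between the two half-averages."""
    n_subjects = subject_fc.shape[0]
    order = rng.permutation(n_subjects)
    half_a = subject_fc[order[: n_subjects // 2]].mean(axis=0)
    half_b = subject_fc[order[n_subjects // 2 :]].mean(axis=0)
    return float(np.corrcoef(half_a, half_b)[0, 1])

def noise_ceiling(subject_fc, n_splits=100, seed=0):
    """Mean split-half reliability over repeated random splits."""
    rng = np.random.default_rng(seed)
    values = [split_half_reliability(subject_fc, rng) for _ in range(n_splits)]
    return float(np.mean(values))

# Simulated example: a shared connectivity pattern plus subject-level noise.
rng = np.random.default_rng(42)
shared_pattern = rng.normal(size=50)
low_noise_group = shared_pattern + 0.1 * rng.normal(size=(20, 50))
high_noise_group = shared_pattern + 5.0 * rng.normal(size=(20, 50))

print("low-noise ceiling: ", noise_ceiling(low_noise_group))
print("high-noise ceiling:", noise_ceiling(high_noise_group))
```

Comparable ceilings across groups would indicate that differences in correlation patterns are unlikely to be driven by differences in signal quality alone.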

      Reviewer #3 (Public Review):

      Summary:

      This study aimed to investigate whether the differences observed in the organization of visual brain networks between blind and sighted adults result from a reorganization of an early functional architecture due to blindness, or whether the early architecture is immature at birth and requires visual experience to develop functional connections. This question was investigated through the comparison of 3 groups of subjects with resting-state functional MRI (rs-fMRI). Based on convincing analyses, the study suggests that: 1) secondary visual cortices showed higher connectivity to prefrontal cortical regions (PFC) than to non-visual sensory areas (S1/M1 and A1) in sighted infants like in blind adults, in contrast to sighted adults; 2) the V1 connectivity pattern of sighted infants lies between that of sighted adults (stronger functional connectivity with non-visual sensory areas than with PFC) and that of blind adults (stronger functional connectivity with PFC than with non-visual sensory areas); 3) the laterality of the connectivity patterns of sighted infants resembled those of sighted adults more than those of blind adults, but sighted infants showed a less differentiated fronto-occipital connectivity pattern than adults.

      Strengths:

      The question investigated in this article is important for understanding the mechanisms of plasticity during typical and impaired development, and the approach considered, which compares different groups of subjects including neonates/infants and blind adults, is highly original.

      Overall, the analyses considered are solid and well-detailed. The results are quite convincing, even if the interpretation might need to be revised downwards, as factors other than visual experience may play a role in the development of functional connections with the visual system.

      Weaknesses:

      While it is informative to compare the "initial" state (close to birth) and the "final" states in blind and sighted adults to study the impact of post-natal and visual experience, this study does not analyze the chronology of this development and when the specialization of functional connections is completed. This would require investigating when experience-dependent mechanisms are important for the establishment of multiple functional connections within the visual system. This could be achieved by analyzing different developmental periods in the same way, using open databases such as the Baby Connectome Project. Given the early, "condensed" maturation of the visual system after birth, we might expect sighted infants to show connectivity patterns similar to those of adults a few months after birth.

      The rationale for mixing full-term neonates and preterm infants (scanned at term-equivalent age) from the dHCP 3rd release is not understandable since preterms might have a very different development related to prematurity and to post-natal (including visual) experience. Although the authors show that the difference between the connectivity of visual and other sensory regions, and the one of visual and PFC regions, do not depend on age at birth, they do not show that each connectivity pattern is not influenced by prematurity. Simply not considering the preterm infants would have made the analysis much more robust, and the full-term group in itself is already quite large compared with the two adult groups. The current study setting and the analyses performed do not seem to be an adequate and sufficient model to ascertain that "a few weeks of vision after birth is ... insufficient to influence connectivity".

      In a similar way, excluding the few infants with detected brain anomalies (radiological scores higher or equal to 4) would strengthen the group homogeneity by focusing on infants supposed to have a rather typical neurodevelopment. The authors quote all infants as "sighted" but this is not guaranteed as no follow-up is provided.

      Response #2: We appreciate the reviewer’s suggestion. We re-analyzed the infant cohort after excluding all cases with radiological scores ≥4 (n = 39 infants excluded). The revised analysis confirmed that the connectivity patterns reported in the main text remain statistically unchanged (see Supplementary Fig. S11). This demonstrates the robustness of our findings to confounding effects from potential brain anomalies. We have explicitly clarified this in the revised Methods section (page 14, line 391 in the manuscript).

      In our dataset, newborns (average age at scan = 2.79 weeks) have very limited and immature vision. We agree with the reviewer that long-term visual outcomes cannot be guaranteed without follow-up data. The term "sighted infants" was used operationally to distinguish this cohort from congenitally blind populations.

      The post-menstrual age (PMA) at scan of the infants is also not described. The methods indicate that all were scanned at "term-equivalent age" but does this mean that there is some PMA variability between 37 and 41 weeks? Connectivity measures might be influenced by such inter-individual variability in PMA, and this could be evaluated.

      The rationale for presenting results on the connectivity of secondary visual cortices before one of the primary cortices (V1) was not clear to understand. Also, it might be relevant to better justify why only the connectivity of visual regions to non-visual sensory regions (S1-M1, A1) and prefrontal cortex (PFC) was considered in the analyses, and not the ones to other brain regions.

      In relation to the question explored, it might be informative to reposition the study in relation to what others have shown about the developmental chronology of structural and functional long-distance and short-distance connections during pregnancy and the first postnatal months.

      The authors acknowledge the methodological difficulties in defining regions of interest (ROIs) in infants in a similar way as adults. The reliability and the comparability of the ROIs positioning in infants is definitely an issue. Given that brain development is not homogeneous and synchronous across brain regions (in particular with the frontal and parietal lobes showing delayed growth), the newborn brain is not homothetic to the adult brain, which poses major problems for registration. The functional specialization of cortical regions is incomplete at birth. This raises the question of whether the findings of this study would be stable/robust if slightly larger or displaced regions had been considered, to cover with greater certainty the same areas as those considered in adults. And have other cortical parcellation approaches been considered to assess the ROIs robustness (e.g. MCRIB-S for full-terms)?

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      Further consideration should be given to the underlying changes in network architecture that may account for differences in functional correlations across groups. An increase (or decrease) in correlation between two regions could signify an increase (decrease) in connection or communication between those regions. Alternatively, it might reflect an increase in communication or connection with a third region, while the physical connections/interactions between the two original regions remain unchanged. These possibilities lead to distinct mechanistic interpretations. For example, there are substantial changes in connectivity during early visual (e.g. Burkhalter A. 1993, Cerebral Cortex) and visuo-motor development (e.g., Csibra et al. 2000 Neuroreport). It's not clear whether increases in communication within the visual network and improvements in visuo-motor behavior (e.g., Yizhar et al. 2023 Frontiers in Neuroscience) wouldn't produce a qualitatively similar pattern of results.

      Relatedly, the within-network correlation patterns between visual ROIs and frontal ROIs appear markedly different between sighted adults and infants (Supplementary Figure S1). To what extent do the differences in long-range correlations between visual and frontal regions reflect these within-network differences in functional organization?

      Response #3: The reviewer is raising some interesting questions about possible mechanisms and network changes. Resting state studies are indeed always subject to the possibility that some effects are mediated by a third, unobserved region. Prior whole-cortex connectivity analyses have observed primarily changes in occipito-frontal connectivity in blindness, so there is not a clear cortical ‘third region’ candidate (Deen et al., 2015). However, some thalamic effects have also been observed and could contribute to the phenomenon (Bedny et al., 2011). Resting state changes in correlation between two areas do not imply changes in strength of long-range anatomical connectivity. Indeed, in the current case they may well reflect differential functional coupling, rather than strengthening or weakening of anatomical connections. We now discuss this in the Discussion section on page 12, line 301 as follows:

      “Despite these insights, many questions remain regarding the neurobiological mechanisms underlying experience-based functional connectivity changes and their relationship to anatomical development. Long-range anatomical connections between brain regions are already present in infants—even prenatally—though they remain immature (Huang et al., 2009; Kostović et al., 2019, 2021; Takahashi et al., 2012; Vasung, 2017). Functional connectivity changes may stem from local synaptic modifications within these stable structural pathways, consistent with findings that functional connectivity can vary independently of structural connection strength (Fotiadis et al., 2024). Moreover, functional connectivity has been shown to outperform structural connectivity in predicting individual behavioral differences, suggesting that experience-based functional changes may reflect finer-scale synaptic or network-level modulations not captured by macrostructural measures (Ooi et al., 2022). Prior studies also suggest that, even in adults, coordinated sensory-motor experience can lead to enhancement of functional connectivity across sensory-motor systems, indicating that large-scale changes in functional connectivity do not necessarily require corresponding changes in anatomical connectivity (Guerra-Carrillo et al., 2014; Li et al., 2018).”

      It is not clear how changes in correlation patterns among visual areas would produce the connectivity between visual areas and prefrontal areas reported in the current study. Activity in visual areas drives correlations both among visual areas and between visual and prefrontal areas, and the same is true of prefrontal cortices.

      The findings from this study should be more closely linked to the extensive literature surrounding the debate on whether experience plays an instructive or permissive role in visual development (e.g., Crair 1999 Current Opin Neurobiol; Sur et al. 1999 J Neurobiol; Kiorpes 2016 J Neurosci; Stellwagen & Shatz 2002 Neuron; Roy et al. 2020 Nature Communications).

      Response #4: The instructive role suggests that specific experiences or patterns of neural activity directly shape and organize neural circuitry, while the permissive role indicates that such experiences or activity merely enable other factors, such as molecular signals, to influence neural circuit formation (Crair, 1999; Sur et al., 1999). To distinguish whether experience plays an instructive or permissive role, it is essential to manipulate the pattern or information content of neural activity while maintaining a constant overall activity level (Crair, 1999; Roy et al., 2020; Stellwagen & Shatz, 2002). However, both the sighted and blind adult groups have had extensive experience and neural activity in the visual cortices. For the sighted group, activity in the visual cortex is partly driven by bottom-up input from the external environment, through the retina, LGN, and ultimately to the cortex. In contrast, the blind group’s visual cortex activity is partially driven by top-down input from non-visual networks. The precise role of this activity in shaping the observed connectivity patterns remains unclear. Although our study cannot speak to this issue directly, we now link to the relevant literature on page 12, line 320 of the manuscript in the Discussion section as follows:

      “The current findings reveal both effects of vision and effects of blindness on the functional connectivity patterns of the visual cortex. A further open question is whether visual experience plays an instructive or permissive role in shaping neural connectivity patterns. An instructive role suggests that specific sensory experiences or patterns of neural activity directly shape and organize neural circuitry. In contrast, a permissive role implies that sensory experience or neural activity merely facilitates the influence of other factors—such as molecular signals—on the formation and organization of neural circuits (Crair, 1999; Sur et al., 1999). Studies with animals that manipulate the pattern or informational content of neural activity while keeping overall activity levels constant could distinguish between these hypotheses (Crair, 1999; Roy et al., 2020; Stellwagen & Shatz, 2002).”

      The assertion that a few weeks of vision after birth is insufficient to influence connectivity is provocative. Though supported by the study's results, it would benefit from integration with research in animal models showing considerable malleability of networks from early experience (e.g., Akerman et al. 2002 Neuron; Li et al. 2006 Nature Neuroscience; Stacy et al. 2023 J Neuroscience).

      Response #5: We thank the reviewer for their suggestion. The present study found that several weeks of postnatal visual experience is insufficient to significantly alter the long-term connectivity patterns of the visual cortices. In contrast, animal studies have shown that acute visual experience, or even exposure to visual stimuli through unopened eyelids, can robustly influence visual system development (Akerman et al., 2002; Li et al., 2008; Van Hooser et al., 2012). We think this discrepancy may be attributed to the substantial differences in developmental timelines between species. The human lifespan is much longer, and so is the human critical period, making it unclear how to map duration from one species to another. We briefly touched upon the time course issue on page 11, line 289 in the Discussion section as follows:

      “The present results reveal the effects of experience on development of functional connectivity between infancy and adulthood, but do not speak to the precise time course of these effects. Infants in the current sample had between 0 and 20 weeks of visual experience. Comparisons across these infants suggests that several weeks of postnatal visual experience is insufficient to produce a sighted-adult connectivity profile. The time course of development could be anywhere between a few months and years and could be tested by examining data from children of different ages.”

      Substantial differences between the groups are evident in several key aspects of the study, including the number of subjects, brain sizes, imaging parameters, and data preprocessing, all of which are likely to have an impact on the overall signal quality. To clarify how these differences might have impacted correlation differences between groups, it would be essential to include information on the noise ceilings for each correlation analysis within each group.

      Response #6: We thank the reviewer for their suggestion. We now report the split-half noise ceiling for the adult and infant groups. For each participant, we first split the rs-fMRI time series into two halves, then calculated the ROI-wise rsFC pattern from the two splits. The split-half noise ceiling was estimated according to Lage-Castellanos et al. (2019). The noise ceilings of the three groups (infants: 0.90 ± 0.056, blind adults: 0.88 ± 0.041, sighted adults: 0.90 ± 0.055) showed no significant difference (one-way ANOVA, F(2,552) = 2.348, p = 0.097). Therefore, we believe that overall signal quality is unlikely to impact our results. We have also added the relevant context to the Methods section on page 16, line 447 as follows:

      “Substantial differences between the groups exist in this study, including the number of subjects, brain sizes, imaging parameters, and data preprocessing, all of which are likely to have an impact on the overall signal quality. To address this concern, we compared the split-half noise ceiling across the groups (infants, sighted adults, and blind adults). For each participant, we first split the rs-fMRI time series into two halves, then calculated the ROI-wise rsFC pattern from the two splits. The split-half noise ceiling was estimated according to Lage-Castellanos et al (Lage-Castellanos et al., 2019). The noise ceilings of the three groups (infants: 0.90 ± 0.056, blind adults: 0.88 ± 0.041, sighted adults: 0.90 ± 0.055) showed no significant difference (One-way ANOVA, F (2,552) = 2.348, p = 0.097). Therefore, overall signal quality is unlikely to impact our results.”
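
      The split-half procedure described above can be sketched on synthetic data. This is a minimal illustration only: the function names, the simulated time series, and the Spearman-Brown step are assumptions made here, and the estimator in Lage-Castellanos et al. (2019) may differ in its details.

```python
import numpy as np

def roi_fc_vector(ts):
    """Upper-triangle ROI-by-ROI correlations for a (time, roi) array."""
    fc = np.corrcoef(ts, rowvar=False)
    return fc[np.triu_indices_from(fc, k=1)]

def split_half_reliability(ts, spearman_brown=True):
    """Correlate the FC patterns computed from the two halves of one run."""
    half = ts.shape[0] // 2
    first = roi_fc_vector(ts[:half])
    second = roi_fc_vector(ts[half:2 * half])
    r = np.corrcoef(first, second)[0, 1]
    # The Spearman-Brown correction is an assumption of this sketch,
    # not something stated in the response.
    return 2 * r / (1 + r) if spearman_brown else r

# Synthetic "participant": 12 ROIs driven by 3 shared latent signals.
rng = np.random.default_rng(0)
n_t, n_roi = 400, 12
latent = rng.standard_normal((n_t, 3))
loadings = rng.standard_normal((3, n_roi))
ts = latent @ loadings + 0.5 * rng.standard_normal((n_t, n_roi))

ceiling = split_half_reliability(ts)
print(f"split-half noise ceiling (synthetic): {ceiling:.2f}")
```

      Computing one such value per participant yields the per-group distributions that a one-way ANOVA can then compare.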

      In general, it appears that the infant correlations are stronger compared to the other groups. While this could reflect increased coherence or lack of differentiation, it is also possible that it is simply due to the presence of a non-neuronal global signal. Such a signal has the potential to substantially limit the effective range of functional correlations and comparisons with adults. To address this, it is advisable to conduct control analyses aimed at assessing and potentially removing global signals.

      Response #7: We agree with the reviewer that global signal regression (GSR) may help reduce non-neuronal artifacts, such as motion, cardiac, and respiratory signals, which are known to correlate with the global signal. However, the global signal also contains neural signals from gray matter, and removing it can introduce unwanted artifacts, especially for the current study. First, GSR can reduce the physiological accuracy of functional connectivity (FC); second, GSR may have differential effects across groups, potentially introducing additional artifacts in between-group comparisons, as noted by Murphy and Fox (2017). The CompCor method (Behzadi et al., 2007; Whitfield-Gabrieli & Nieto-Castanon, 2012) can estimate global non-neuronal artifacts much as GSR does. However, because it estimates these artifacts from signals within the white matter (WM) and cerebrospinal fluid (CSF) masks rather than the gray matter (GM), CompCor should introduce minimal unwanted bias to the GM signal.
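
      The CompCor idea can be illustrated with a toy sketch: take principal components of signals from a noise mask (standing in for WM/CSF) and regress them out of the gray-matter signals. Everything below is assumed for illustration (function name, number of components, simulated data); it is not the authors' preprocessing pipeline.

```python
import numpy as np

def compcor_clean(gm, noise_roi, n_components=5):
    """Regress the top principal components of noise-mask signals out of GM.

    gm: (time, n_gm) array; noise_roi: (time, n_noise) array.
    """
    gm = gm - gm.mean(axis=0)
    noise_roi = noise_roi - noise_roi.mean(axis=0)
    # Left singular vectors are the principal-component time courses.
    u, _, _ = np.linalg.svd(noise_roi, full_matrices=False)
    confounds = u[:, :n_components]
    beta, *_ = np.linalg.lstsq(confounds, gm, rcond=None)
    return gm - confounds @ beta

# Toy data: one shared artifact contaminates both GM and the noise mask.
rng = np.random.default_rng(2)
n_t = 300
artifact = rng.standard_normal(n_t)
gm = rng.standard_normal((n_t, 4)) + 2.0 * artifact[:, None]
noise_roi = (artifact[:, None] * rng.uniform(0.5, 1.5, (1, 20))
             + 0.1 * rng.standard_normal((n_t, 20)))

cleaned = compcor_clean(gm, noise_roi)
before = abs(np.corrcoef(gm[:, 0], artifact)[0, 1])
after = abs(np.corrcoef(cleaned[:, 0], artifact)[0, 1])
print(f"|r| with artifact before: {before:.2f}, after: {after:.2f}")
```

      Because the confound regressors come only from the noise mask, gray-matter signal that is not shared with that mask is left in place, which is the property the response relies on.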

      Was there a difference in correlations for preterm vs term neonates? Recent research has suggested that preterm births can have an impact on functional networks, particularly in frontal cortices (e.g., Tokariev et al. 2019; Li et al. 2021 eLife; Zhang et al. 2022 Frontiers in Neuroscience).

      Response #8: We have compared preterm and term neonates for all the main results, including the connectivity from the secondary visual cortex/V1 to non-visual sensory cortices versus prefrontal cortices, the laterality of occipito-frontal connectivity, and the specialization across different fronto-occipital networks. This information is reported in Page 6 line 169 and Supplementary Figure S7. The connectivities of full-term infants are generally higher than those of preterm infants. However, the connectivity patterns of term and preterm infants are very similar.

      The consistency between the current results and prior work (e.g., Burton et al. 2014) is notable, particularly in the observed greater correlations in prefrontal regions and weaker correlations in somato-motor regions for early blind individuals compared to sighted. However, almost all visual-frontal correlations in both groups were negative in that prior study. Some discussion on why positive correlations were found in the current study could help to clarify.

      Response #9: Many other papers have reported positive correlations similar to those found in our study (e.g., Deen et al., 2015; Kanjlia et al., 2021). In contrast, Burton's study identified predominantly negative visual-frontal correlations. We think this is likely because the global signal was regressed out during preprocessing, a methodological choice that can increase the number of negative connections (Murphy & Fox, 2017).
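
      The effect of global signal regression on the sign of correlations can be seen in a small simulation. The sketch below uses assumed synthetic data (ten regions sharing one common signal) and is not a reanalysis of any dataset discussed here.

```python
import numpy as np

def mean_offdiag_corr(x):
    """Mean pairwise correlation between columns of a (time, roi) array."""
    c = np.corrcoef(x, rowvar=False)
    return c[np.triu_indices_from(c, k=1)].mean()

rng = np.random.default_rng(1)
n_t, n_roi = 500, 10
shared = rng.standard_normal((n_t, 1))
ts = shared + rng.standard_normal((n_t, n_roi))  # every pair positively correlated
ts -= ts.mean(axis=0)

# Global signal = mean across regions; regress it out of every region.
g = ts.mean(axis=1, keepdims=True)
beta = (g.T @ ts) / (g.T @ g)
resid = ts - g @ beta

before = mean_offdiag_corr(ts)
after = mean_offdiag_corr(resid)
print(f"mean r before GSR: {before:.2f}, after GSR: {after:.2f}")
```

      After regression the residuals sum to zero across regions at every time point, so the pairwise covariances must sum to a negative number. In this toy case uniformly positive correlations become negative on average.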

      The term "secondary visual areas" used throughout the paper lacks specificity, and its usage in terms of underlying anatomical and functional areas has been inconsistent in the literature. It would be advisable to adopt a more precise characterization based on functional and/or anatomical criteria.

      Response #10: We specified in the article that the occipital ROIs defined in the current study are functional areas in people born blind, identified in prior studies as regions that respond to three non-visual tasks (language, math, and executive function) and that show functional connectivity changes in blind adults (Kanjlia et al., 2016, 2021; Lane et al., 2015; see Figure 1). They are referred to collectively as ‘secondary visual areas’ to distinguish them from V1. Anatomically, these three regions cover the majority of the lateral occipital cortex and part of the ventral occipital cortex, providing a good sample of the connectivity profile of higher-order visual areas. Thus, we are using the term "secondary visual areas" to refer to these regions. In blind individuals, although these regions respond to non-visual tasks, their exact functions are unknown.

      The inclusion of the ventral temporal cortex in the visual ROIs is currently only depicted in Supplementary Figure S7. To enhance the clarity of the areas of interest analyzed, it would be advisable to illustrate the ventral temporal areas in the main text. Were there notable differences in the frontal correlations between the lateral occipital visual areas and ventral temporal areas?

      Response #11: We thank the reviewer for pointing out this issue. We added a statement about the ventral visual cortex when describing the location of the ROIs and added a ventral view of the ROIs to Figure 1. The language-responsive and math-responsive ROIs cover both the lateral and ventral visual cortex, whereas the executive function (response-conflict) regions cover only the lateral visual cortex. We compared the connectivity patterns of these three regions and found no differences (see Supplementary Figure S2).

      The blind group results are characterized as reflecting a reorganization in comparison to sighted adults while the results for sighted adults compared to infants are discussed more as a maturation ("adult pattern isn't default but requires experience to establish"). Both the sighted and blind adult groups showed differences from the infant group, and these differences are attributed to the role of experience. Why use "reorganization" for one result and maturation for another?

      Response #12: We agree with the reviewer that both of the adult groups should be thought of as equal in relation to the infants. In other words, the brain develops under one set of experiential conditions or another. We do not think that the adult sighted pattern reflects maturation. Rather, the sighted adult pattern reflects the combined influence of maturation and visual experience. The adult blind pattern reflects the combined influence of maturation and blindness. We use the term ‘reorganization’ to label differences in the blind adults relative to sighted infants. We do so for the purpose of clarity and to remain consistent with terminology in prior literature. However, we agree with the reviewer that the blind group does not reflect ‘reorganization’ intrinsically any more than the sighted adult group.

      The statement that "visual experience is required to set up long-range functional connectivity" is unclear, especially since the infant and blind groups showed stronger long-range functional correlations with PFC.

      Response #13: We revised this sentence to read, more specifically, “visual experience establishes elements of the sighted-adult long-range connectivity” (Abstract, line 17).

      The statement that the visual ROIs roughly correspond to "the anatomical location of areas such as V5/MT+, LO, V3a, and V4v" appears imprecise. From Supplementary Figure S7, these areas cover anterior portions of ventral temporal cortex (do these span the anatomical location of putative category-selective areas?) and extend into the intraparietal sulcus.

      Response #14: Thanks to the reviewer for the clarification. The ventral ROIs cover the middle and part of the anterior portion of the ventral temporal lobe, including the putative category-selective areas. Additionally, the dorsal ROIs extend beyond the occipital lobe to the intraparietal sulcus and superior parietal lobule. We have added a more detailed description of the anatomical location of the ROI in the Methods section Page 17 line 489 as follows:

      “Each functional ROI spans multiple anatomical regions and together the secondary visual ROIs tile large portions of lateral occipital, occipito-temporal, dorsal occipital and occipito-parietal cortices. In sighted people, the secondary visual occipital ROIs include the anatomical locations of functional regions such as motion area V5/MT+, the lateral occipital complex (LO), category specific ventral occipitotemporal cortices and dorsally, V3a and V4v.  The occipital ROI also covers the middle of the ventral temporal lobe. Dorsally, it extended to the intraparietal sulcus and superior parietal lobule.”

      The motivation for assessing correlations with motor and frontal regions was briefly discussed in the introduction. It would be helpful to reiterate this motivation when first introducing the analyses in the results.

      Response #15: Thank you for the thoughtful suggestion. Upon reflection, we chose to substantially revise the Introduction to more clearly and comprehensively explain the rationale for examining the couplings with motor and frontal regions, rather than reiterating it in the Results section. We believe this revised framing provides a stronger foundation for the analyses that follow, while avoiding redundancy across sections. We hope this addresses the reviewer’s concern.

      Reviewer #2 (Recommendations for the authors):

      Congratulations on a well-written paper and an interesting set of results.

      Reviewer #3 (Recommendations for the authors):

      Abstract:

      Mentioning "sighted infants" does not seem adequate.

      Response #16: In our dataset, newborns (average age at scan = 2.79 weeks) have very limited and immature vision. We agree with the reviewer that long-term visual outcomes cannot be guaranteed without follow-up data. The term "sighted infants" was used operationally to distinguish this cohort from congenitally blind populations.

      In sentences after "Specifically...", it was not clear whether the authors referred to V1 connectivity.

      Response #17: We thank the reviewer for this comment. In the revised abstract, we have removed the original "Specifically..." phrasing and clarified the results.

      Introduction

      Talking about the "instructive effects" of vision might be confusing or misleading. Visual experiences like exposure to oral language are part of the normal/spontaneous environment that allows the infant behavioral acquisitions (contrarily with learnings that occur later during development with instruction like for reading).

      Response #18: We appreciate the reviewer’s concern and would like to clarify that the term “instructive effect” as used here is derived from neurodevelopmental studies (Crair, 1999; Sur et al., 1999). In this context, “instructive” refers to activity-dependent mechanisms where patterns of neural activity actively guide the organization of synaptic connectivity, emphasizing that spontaneous or sensory-driven activity (e.g., retinal waves, visual experience) can directly shape circuit refinement, as seen in ocular dominance column formation. In the context of our study, we emphasize that vision plays an instructive role in setting up the balance of connectivity between occipital cortex and non-visual networks.

      For references on the development of connectivity, I would advise citing MRI studies but also studies based on histological approaches (see for example the detailed review by Kostovic et al, NeuroImage 2019).

      Response #19: We thank the reviewer for this suggestion. We have incorporated a discussion on the long-range anatomical connections that emerge as early as infancy, referencing studies that employed diffusion MR imaging and histological methods, as detailed below.

      “Many long-range anatomical connections between brain regions are already established in infants, even before birth, although they are not yet mature (Huang et al., 2009; Kostović et al., 2019, 2021; Takahashi et al., 2012; Vasung, 2017).” (Page 12, line 303 in the manuscript)

      Results

      P7 l170: It might be helpful to be precise that this is "compared with inter-hemispheric connectivity".

      Response #20: We thank the reviewer for this suggestion. To align with our established terminology, we have revised the statement to explicitly contrast within-hemisphere connectivity with between-hemisphere connectivity. The modified text now reads (page 7, line 183 in the manuscript):

      “Compared to sighted adults, blind adults exhibited a stronger dominance of within-hemisphere connectivity over between-hemisphere connectivity. That is, in people born blind, left visual networks are more strongly connected to left PFC, whereas right visual networks are more strongly connected to right PFC.”

      L176-181: It was not clear to me what was the difference between "across" and "between hemisphere connectivity". Would it be informative to test the difference between blind and sighted adults?

      Response #21: We clarify that there is no distinction between the terms “across” and “between hemisphere connectivity”—they refer to the same concept. To ensure consistency, we have revised the text to exclusively use “between hemisphere connectivity” throughout the manuscript. Regarding the comparison between blind and sighted adults, we conducted statistical comparisons between these groups in our analysis, and the results have been incorporated into the revised version (Page 7, line 187 in the manuscript).

      Adding statistics on Figure 3, but also on Figures 1 and 2 might help the reading.

      Response #22: We have added the statistics to Figures 1-4.

      Adding the third comparison in Figure 4 would be possible in my view.

      Response #23: We explored integrating the response-conflict region into Figure 4, but this would require a 3x3 bar chart with pairwise statistical significance markers, which introduced excessive visual complexity that hindered readers’ ability to grasp our intended message. To ensure clarity, we retained the original Figure 4 while providing the complete three-region analysis (including all statistical comparisons) in Supplementary Figure S8 to ensure completeness.

      Methods

      The authors might have to specify ages at birth, and ages at scan (median + range?).

      Response #24: We have added that information in the Methods section as follows:

      “The average age from birth at scan = 2.79 weeks (SD = 3.77, median = 1.57, range = 0 – 19.71); average gestational age at scan = 41.23 weeks (SD = 1.77, median = 41.29, range = 37 – 45.14); average gestational age at birth = 38.43 weeks (SD = 3.73, median = 39.71, range = 23 – 42.71).” (Page 14, line 379 in the manuscript)

      It might be relevant to comment on the range of available fMRI volumes, and the fact that connectivity measures might then be less robust in infants.

      Response #25: We report the range of fMRI volumes in the Methods section (Page 16, Line 449). Adult participants (blind and sighted) underwent 1–4 scanning sessions, each containing 240 volumes (mean scan duration: 710.4 seconds per participant). For infants, all subjects had 2300 fMRI volumes, and we retained a subset of 1600 continuous volumes per subject with the minimum number of motion outliers. While infant connectivity measures may inherently exhibit lower robustness due to developmental and motion-related factors, our infant cohort’s large sample size (n=475) and stringent motion censoring criteria enhance the reliability of group-level inferences. We have integrated this clarification into the Methods section (Page 16, Line 444) as follows:

      "While infant connectivity estimates may be less robust at the individual level compared to adults due to shorter scan durations and higher motion, our cohort’s large sample size (n=475) and rigorous motion censoring mitigate these limitations for group-level analyses. "

      The mention of dHCP 2nd release should be removed from the paragraph on data availability.

      Response #26: We have removed it.

    1. Reviewer #3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome. In particular, the authors identify one key dimension: the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally argue that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea has the potential to change how we think about several major mental disorders in a substantial way, and can additionally help us better understand how healthy people navigate challenging decision-making problems. More concisely, it is a *very good idea*.

      The more concrete contributions, however, are not as strong. In particular, evidence for the paper's most striking claims is weak. Quoting the abstract, these claims are (1) "the elasticity of control [is] a distinct cognitive construct guiding adaptive behavior" and (2) "overestimation of elasticity is associated with elevated psychopathology involving an impaired sense of control."

      Main issues

      I'll highlight the key points.

      - The task cannot distinguish elasticity inference from general learning processes

      - Participants were explicitly instructed about elasticity, with labeled examples

      - The psychopathology claims rely on an invalid interpretation of CCA, and are contradicted by simple correlations (elasticity bias and the sense of agency scale is r=0.03)

      Distinct construct

      Starting with claim 1, there are three subclaims here. (1A) People's behavior is sensitive to differences in elasticity; (1B) there are mental processes specific to elasticity inference, i.e., not falling out of general learning mechanisms; and, implicitly, (1C) people infer elasticity naturally as they go about their daily lives. The results clearly support 1A. However, 1B and 1C are not well supported.

      (1B) The data cannot support the "distinct cognitive construct" claim because the task is too simple to dissociate elasticity inference from more general learning processes (also raised by Reviewer 1). The key behavioral signature for elasticity inference (vs. generic controllability inference) is the transfer across ticket numbers, illustrated in Fig 4. However, this pattern is also predicted by a standard Bayesian learner equipped with an intuitive causal model of the task. Each ticket gives you another chance to board and the agent infers the probability that each attempt succeeds. Crucially, this logic is not at all specific to elasticity or even control. An identical model could be applied to inferring the bias of a coin from observations of whether any of N tosses were heads, a task that is formally identical to this one (at least, the intuitive model of the task; see first minor comment).
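
      A learner of the kind described here can be sketched in a few lines. The sketch assumes a flat prior, a grid approximation, and independent attempts that each succeed with probability p; the function names and the example observations are hypothetical and do not come from the paper.

```python
import numpy as np

def posterior_over_p(observations, n_grid=201):
    """Grid posterior over the per-attempt success probability p.

    observations: iterable of (n_attempts, succeeded) pairs, where
    succeeded means at least one of the n attempts worked.
    """
    grid = np.linspace(0.0, 1.0, n_grid)
    log_post = np.zeros_like(grid)  # flat prior (an assumption)
    for n, succeeded in observations:
        p_any = 1.0 - (1.0 - grid) ** n
        like = p_any if succeeded else 1.0 - p_any
        log_post += np.log(np.clip(like, 1e-12, None))
    post = np.exp(log_post - log_post.max())
    return grid, post / post.sum()

def predicted_success(grid, post, n):
    """Posterior-predictive probability that n attempts yield a success."""
    return float(np.sum(post * (1.0 - (1.0 - grid) ** n)))

# Evidence comes only from three-attempt trials...
obs = [(3, True)] * 8 + [(3, False)] * 2
grid, post = posterior_over_p(obs)

# ...yet predictions for one and two attempts also move away from the prior.
pred_1 = predicted_success(grid, post, 1)
pred_2 = predicted_success(grid, post, 2)
pred_3 = predicted_success(grid, post, 3)
print(pred_1, pred_2, pred_3)
```

      The same code applies unchanged to the coin example: replace "attempts" with "tosses" and "boarding" with "at least one head".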

      Importantly, this point cannot be addressed by showing that the authors' model fits data better than this or any other specific Bayesian model. It is not a question of whether one particular updating rule explains data better than another. Rather, it is a question of whether the task can distinguish between biases in *elasticity* inference versus biases in probabilistic inference more generally. The present task cannot make this distinction because it does not make separate measurements of the two types of inference. To provide compelling evidence that elasticity inference is a "distinct cognitive construct", one would need to show that there are reliable individual differences in elasticity inference that generalize across contexts but do not generalize to computationally similar types of probabilistic inference (e.g. the coin flipping example).

      (1C) The implicit claim that people infer elasticity outside of the experimental task is undermined by the experimental design. The authors explicitly tell people about the two notions of control as part of the training phase: "To reinforce participants' understanding of how elasticity and controllability were manifested in each planet, [participants] were informed of the planet type they had visited after every 15 trips."

      In the revisions, the authors seem to go back and forth on whether they are claiming that people infer elasticity without instruction (I won't quote it here). I'll just note that the examples they provide in the most recent rebuttal are all cases in which one never receives explicit labels about elasticity. If people only infer elasticity when it is explicitly labeled, I struggle to see its relevance for understanding human cognition and behavior.

      Psychopathology

      Finally, I turn to claim 2, that "overestimation of elasticity is associated with elevated psychopathology involving an impaired sense of control." The CCA analysis is in principle unable to support this claim. As the authors correctly note in their latest rebuttal, the CCA does show that "there is a relationship between psychopathology traits and task parameters". The lesion analysis further shows that "elasticity bias specifically contributes to this relationship" (and similarly for the Sense of Agency scale). Crucially, however, this does *not* imply that there is a relationship between those two variables. The most direct test of that relationship is the simple correlation, which the authors report only in a supplemental figure: there is no relationship (r=0.03). Although it is of course possible that there is a relationship that is obscured by confounding variables, the paper provides no evidence, statistical or otherwise, that such a relationship exists.

      Minor comments

      The statistical structure of the task is inconsistent with the framing. In the framing, participants can make either one or two second boarding attempts (jumps) by purchasing extra tickets. The additional attempt(s) will thus succeed with probability p for one ticket and 2p - p^2 for two tickets; the p^2 captures the fact that you only take the second attempt if you fail on the first. A consequence of this is buying more tickets has diminishing returns. In contrast, in the task, participants always jumped twice after purchasing two tickets, and the probability of success with two tickets was exactly double that with one ticket. Thus, if participants are applying an intuitive causal model to the task, the researcher could infer "biases" in elasticity inference that are probably better characterized as effective use of prior information (encoded in the causal model).
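
      The gap between the two success rules for two tickets can be made concrete. In this small sketch the function names are illustrative; the formulas are the ones stated in the comment above.

```python
def framing_success_two_tickets(p):
    """Two independent attempts, the second taken only after a failure."""
    return 2 * p - p ** 2  # equivalently 1 - (1 - p) ** 2

def task_success_two_tickets(p):
    """Rule described for the actual task: exactly double the one-ticket rate."""
    return 2 * p

for p in (0.1, 0.2, 0.4):
    framed = framing_success_two_tickets(p)
    actual = task_success_two_tickets(p)
    # The difference is p ** 2, so it grows with the one-ticket success rate.
    print(f"p={p:.1f}  framing={framed:.2f}  task={actual:.2f}  gap={actual - framed:.2f}")
```

      A participant reasoning from the framing would therefore expect less benefit from the second ticket than the task actually delivers, and the shortfall is largest when one-ticket success is already likely.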

      The model is heuristically defined and does not reflect Bayesian updating. For example, it over-estimates maximum control by not using losses with fewer than 3 tickets (intuitively, the inference here depends on what your beliefs about elasticity are). Including forced three-ticket trials at the beginning of each round makes this less of an issue; but if you want to remove those trials, you might need to adjust the model. The need to introduce the modified model with kappa is likely another symptom of the heuristic nature of the model updating equations.

    2. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform understanding of control across domains, which is a topic of great importance.

      We thank the Reviewer for their favorable appraisal and valuable suggestions, which have helped clarify and strengthen the study’s conclusion. 

      In its revised form, the manuscript addresses most of my previous concerns. The main remaining weakness pertains to the analyses aimed at addressing my suggestion of Bayesian updating as an alternative to the model proposed by the authors. My suggestion was to assume that people perform a form of function approximation to relate resource expenditure to success probability. The authors performed a version of this where people were weighing evidence for a few canonical functions (flat, step, linear), and found that this model underperformed theirs. However, this Bayesian model is quite constrained in its ability to estimate the function relating resources. A more robust test would be to assume a more flexible form of updating that is able to capture a wide range of distributions (e.g., using basis functions, Gaussian processes, or nonparametric estimators; see, e.g., work by Griffiths on human function learning). The benefit of testing this type of model is that it would make contact with a known form of inference that individuals engage in across various settings and therefore could offer a more parsimonious and generalizable account of function learning, whereby learning of resource elasticity is a special case. I defer to the authors as to whether they'd like to pursue this direction, but if not I think it's still important that they acknowledge that they are unable to rule out a more general process like this as an alternative to their model. This pertains also to inferences about individual differences, which currently hinge on their preferred model being the most parsimonious.

      We thank the Reviewer for this thoughtful suggestion. We acknowledge that more flexible function learning approaches could provide a stronger test in favor of a more general account. Our Bayesian model implemented a basis function approach where the weights of three archetypal functions (flat, step, linear) are learned from experience. Testing models with more flexible basis functions would likely require a task with more than three levels of resource investment (1, 2, or 3 tickets). This would make an interesting direction for future work expanding on our current findings. We now incorporate this suggestion in more detail in our updated manuscript (335-341):

      “Second, future models could enable generalization to levels of resource investment not previously experienced. For example, controllability and its elasticity could be jointly estimated via function approximation that considers control as a function of invested resources. Although our implementation of this model did not fit participants’ choices well (see Methods), other modeling assumptions drawn from human function learning [30] or experimental designs with continuous action spaces may offer a better test of this idea.”

      Reviewer #2 (Public review):

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output and yielded interesting results. Notably, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals important findings about how people consider components of controllability. The authors have gone to great lengths to revise the manuscript to clarify their definitions of "elastic" and "inelastic" and bolster evidence for their computational model, resulting in an overall strong manuscript that is valuable for elucidating controllability dynamics and preferences. 

      We thank the Reviewer for their constructive feedback throughout the review process, which has substantially strengthened our manuscript and clarified our theoretical framework.

      One minor weakness is that the justification for the analysis technique for the relationships between the model parameters and the psychopathology measures remains lacking given the fact that simple correlational analyses did not reveal any significant associations.

      We note that the existence of bivariate relationships is not a prerequisite for the existence of multivariate relationships. Conditioning the latter on the former, therefore, would risk missing out on important relationships existing in the data. Ultimately, correlations between pairs of variables do not offer a sensitive test for the general hypothesis that there is a relationship between two sets of variables. As an illustration, consider that elasticity bias correlated in our data (r = .17, p<.001) with the difference between SOA (sense of agency) and SDS (self-rating depression). Notably, SOA and SDS were positively correlated (r = .47, p<.001), and neither of them was correlated with elasticity bias (SOA: r=.04, p=.43; SDS: r=-.06, p=.16). It was a dimension that ran between them that mapped onto elasticity bias. This specific finding is incidental and uncorrected for multiple comparisons, hence we do not report it in the manuscript, but it illustrates the kinds of relationships that cannot be accounted for by looking at bivariate relationships alone.

      Reviewer #3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome.

      In particular, the authors identify one key dimension: the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally argue that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea has the potential to change how we think about several major mental disorders in a substantial way and can additionally help us better understand how healthy people navigate challenging decision-making problems. More concisely, it is a very good idea.

      We thank the Reviewer for their thoughtful engagement with our manuscript. We appreciate their recognition of elasticity as a key dimension of control that has the potential to advance our understanding of psychopathology and healthy decision-making.

      Starting with theory, the authors do not provide a strong formal characterization of the proposed notion of elasticity. There are existing, highly general models of controllability (e.g., Huys & Dayan, 2009; Ligneul, 2021) and the elasticity idea could naturally be embedded within one of these frameworks. The authors gesture at this in the introduction; however, this formalization is not reflected in the implemented model, which is highly task-specific.

      Our formal definition of elasticity, detailed in Supplementary Note 1, naturally extends the reward-based and information-theoretic definitions of controllability by Huys & Dayan (2009) and Ligneul (2021). We now further clarify how the model implements this formalized definition (lines 156-159).

      “Conversely, in the ‘elastic controllability model’, the beta distributions represent a belief about the maximum achievable level of control (𝑎<sub>Control</sub>, 𝑏<sub>Control</sub>) coupled with two elasticity estimates that specify the degree to which successful boarding requires purchasing at least one (𝑎<sub>elastic≥1</sub>, 𝑏<sub>elastic≥1</sub>) or specifically two (𝑎<sub>elastic2</sub>, 𝑏<sub>elastic2</sub>) extra tickets. As such, these elasticity estimates quantify how resource investment affects control. The higher they are, the more controllability estimates can be made more precise by knowing how much resources the agent is willing and able to invest (Supplementary Note 1).”

      Moreover, the authors present elasticity as if it is somehow "outside of" the more general notion of controllability. However, effort and investment are just specific dimensions of action; and resources like money, strength, and skill (the "highly trained birke") are just specific dimensions of state. Accordingly, the notion of elasticity is necessarily implicitly captured by the standard model. Personally, I am compelled by the idea that effort and resource (and therefore elasticity) are particularly important dimensions, ones that people are uniquely tuned to. However, by framing elasticity as a property that is different in kind from controllability (rather than just a dimension of controllability), the authors only make it more difficult to integrate this exciting idea into generalizable models.

      We respectfully disagree that we present elasticity as outside of, or different in kind from, controllability. Throughout the manuscript, we explicitly describe elasticity as a dimension of controllability (e.g., lines 70-72, along many other examples). This is also expressed in our formal definition of elasticity (Supplementary Note 1). 

      The argument that vehicle/destination choice is not trivial because people occasionally didn't choose the instructed location is not compelling to me-if anything, the exclusion rate is unusually low for online studies. The finding that people learn more from non-random outcomes is helpful, but this could easily be cast as standard model-based learning very much like what one measures with the Daw two-step task (nothing specific to control here). Their final argument is the strongest, that to explain behavior the model must assume "a priori that increased effort could enhance control." However, more literally, the necessary assumption is that each attempt increases the probability of success-e.g. you're more likely to get a heads in two flips than one. I suppose you can call that "elasticity inference", but I would call it basic probabilistic reasoning.

      We appreciate the Reviewer’s concerns but feel that some of the more subjective comments might not benefit from further discussion. We only note that controllability and its elasticity are features of environmental structure, so in principle any controllability-related inference is a form of model-based learning. The interesting question is whether people account in their world model for that particular feature of the environment.   

      The authors try to retreat, saying "our research question was whether people can distinguish between elastic and inelastic controllability." I struggle to reconcile this with the claim in the abstract "These findings establish the elasticity of control as a distinct cognitive construct guiding adaptive behavior". That claim is the interesting one, and the one I am evaluating the evidence in light of.

      In real-world contexts, it is often trivial that sometimes further investment enhances control and sometimes it does not. For example, students know that if they prepare more extensively for their exams they will likely be able to achieve better grades, but they also know that there is uncertainty in this regard – their grades could improve significantly, modestly, or in some cases, they might not improve at all, depending on the type of exams their study program administers and the knowledge or skills being tested. Our research question was whether in such contexts people learn from experience the degree to which controllability is elastic to invested resources and adapt their resource investment accordingly. Our findings show that they do. 

      The authors argue for CCA by appeal to the need to "account for the substantial variance that is typically shared among different forms of psychopathology". I agree. A simple correlation would indeed be fairly weak evidence. Strong evidence would show a significant correlation after *controlling for* other factors (e.g. a regression predicting elasticity bias from all subscales simultaneously). CCA effectively does the opposite, asking whether-with the help of all the parameters and all the surveys-one can find any correlation between the two sets of variables. The results are certainly suggestive, but they provide very little statistical evidence that the elasticity parameter is meaningfully related to any particular dimension of psychopathology.

      We agree with the Reviewer on the relationship between elasticity and any particular dimension of psychopathology. The CCA asks a different question, namely, whether there is a relationship between psychopathology traits and task parameters, and whether elasticity bias specifically contributes to this relationship. 

      I am very concerned to see that the authors removed the discussion of this limitation in response to my first review. I quote the original explanation here:

      - In interpreting the present findings, it needs to be noted that we designed our task to be especially sensitive to overestimation of elasticity. We did so by giving participants free 3 tickets at their initial visits to each planet, which meant that upon success with 3 tickets, people who overestimate elasticity were more likely to continue purchasing extra tickets unnecessarily. Following the same logic, had we first had participants experience 1 ticket trips, this could have increased the sensitivity of our task to underestimation of elasticity in elastic environments. Such underestimation could potentially relate to a distinct psychopathological profile that more heavily loads on depressive symptoms. Thus, by altering the initial exposure, future studies could disambiguate the dissociable contributions of overestimating versus underestimating elasticity to different forms of psychopathology.

      The logic of this paragraph makes perfect sense to me. If you assume low elasticity, you will infer that you could catch the train with just one ticket. However, when elasticity is in fact high, you would find that you don't catch the train, leading you to quickly infer high elasticity eliminating the bias. In contrast, if you assume high elasticity, you will continue purchasing three tickets and will never have the opportunity to learn that you could be purchasing only one-the bias remains.

      The authors attempt to argue that this isn't happening using parameter recovery. However, they only report the *correlation* in the parameter, whereas the critical measure is the *bias*. Furthermore, in parameter recovery, the data-generating and data-fitting models are identical-this will yield the best possible recovery results. Although finding no bias in this setting would support the claims, it cannot outweigh the logical argument for the bias that they originally laid out. Finally, parameter recovery should be performed across the full range of plausible parameter values; using fitted parameters (a detail I could only determine by reading the code) yields biased results because the fitted parameters are themselves subject to the bias (if present). That is, if true low elasticity is inferred as high elasticity, then you will not have any examples of low elasticity in the fitted parameters and will not detect the inability to recover them.

      The logic the Reviewer describes breaks down when one considers the dynamics of participants’ resource investment choices. A low elasticity bias in a participant’s prior belief would make them persist for longer in purchasing a single ticket despite failure, as compared to a person without such a bias. Indeed, the ability of the experimental design to demonstrate low elasticity biases is evidenced by the fact that the majority of participants were fitted with a low elasticity bias (μ = .16 ± .14, where .5 is unbiased). 

      Originally, the Reviewer was concerned that elasticity bias was being confounded with a general deficit in learning. The weak inter-parameter correlations in the parameter recovery test resolved this concern, especially given that, as we now noted, the simulated parameter space encompassed both low and high elasticity biases (range=[.02,.76]). Furthermore, regarding the Reviewer's concern about bias in the parameter recovery, we found no such significant bias with respect to the elasticity bias parameter (Δ(Simulated, Recovered)= -.03, p=.25), showing that our experiment could accurately identify low and high elasticity biases.

      The statistical structure of the task is inconsistent with the framing. In the framing, participants can make either one or two second boarding attempts (jumps) by purchasing extra tickets. The additional attempt(s) will thus succeed with probability p for one ticket and 2p − p<sup>2</sup> for two tickets; the p<sup>2</sup> captures the fact that you only take the second attempt if you fail on the first. A consequence of this is that buying more tickets has diminishing returns. In contrast, in the task, participants always jumped twice after purchasing two tickets, and the probability of success with two tickets was exactly double that with one ticket. Thus, if participants are applying an intuitive causal model to the task, they will appear to "underestimate" the elasticity of control. I don't think this seriously jeopardizes the key results, but any follow-up work should ensure that the task's structure is consistent with the intuitive causal model.
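
      The arithmetic behind this comment can be sketched as follows. This is only an illustration of the two success rules the Reviewer contrasts, not the task code; the function names and the example values of p are assumptions introduced here.

```python
def p_success_sequential(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds.

    For two attempts this equals 2p - p^2, the intuitive causal model
    described in the comment (a second try only matters if the first fails).
    """
    return 1.0 - (1.0 - p) ** attempts


def p_success_linear(p: float, attempts: int) -> float:
    """Success probability that scales linearly with attempts (capped at 1).

    This mirrors the task as described: two tickets give exactly double
    the success probability of one ticket.
    """
    return min(1.0, p * attempts)


if __name__ == "__main__":
    # Hypothetical per-attempt probabilities, chosen only for illustration.
    for p in (0.1, 0.2, 0.4):
        seq = p_success_sequential(p, 2)
        lin = p_success_linear(p, 2)
        print(f"p={p:.1f}  sequential={seq:.2f}  linear={lin:.2f}  gap={lin - seq:.2f}")
```

      The gap between the two rules is p<sup>2</sup>, so it grows with the per-attempt success probability; this is the diminishing return that the linear task structure does not have.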

      We thank the Reviewer for this comment, and agree the participants may have employed the intuitive understanding the Reviewer describes. This is consistent with our model comparison results, which showed that participants did not assume that control increases linearly with resource investment (lines 677-692). Consequently, this is also not assumed by our model, except perhaps by how the prior is implemented (a property that was supported by model comparison). In the text, we acknowledge that this aspect of the model and participants’ behavior deviates from the true task's structure, and it would be worthwhile to address this deviation in future studies. 

      That said, there is no reason that this will make participants appear to be generally underestimating elasticity. Following exposure to outcomes for one and three tickets, any nonlinear understanding of probabilities would only affect the controllability estimate for two tickets. This would have contrasting effects on the elasticity estimated for the second and third tickets, but on average, it would not change the overall elasticity estimated. On the other hand, if such a participant is only exposed to outcomes for two and three tickets, they would come to judge the difference between the first and second tickets too highly, thereby overestimating elasticity.

      The model is heuristically defined and does not reflect Bayesian updating. For example, it overestimates maximum control by not using losses with less than 3 tickets (intuitively, the inference here depends on what your beliefs about elasticity are). Including forced three-ticket trials at the beginning of each round makes this less of an issue; but if you want to remove those trials, you might need to adjust the model. The need to introduce the modified model with kappa is likely another symptom of the heuristic nature of the model updating equations.

      Note that we have tested a fully Bayesian model (lines 676-691), but found that this model fitted participants’ choices worse. 

      You're right; saying these analyses provide "no information" was unfair. I agree that this is a useful way to link model parameters with behavior, and they should remain in the paper. However, my key objection still holds: these analyses do not tell us anything about how *people's* prior assumptions influence behavior. Instead, they tell us about how *fitted model parameters* depend on observed behavior. You can easily avoid this misreading by adding a small parenthetical, e.g.

      Thus, a prior assumption that control is likely available **(operationalized by \gamma_controllability)** was reflected in a futile investment of resources in uncontrollable environments.

      We thank the Reviewer for the suggestion and have added this parenthetical (lines 219, 225).

    1. Reviewer #2 (Public review):

      Summary:

      This paper considers the effects of cognitive load (using an n-back task related to font color), predictability, and age on reading times in two experiments. There were main effects of all predictors, but more interesting effects of load and age on predictability. The effect of load is very interesting, but the manipulation of age is problematic, because we don't know what is predictable for different participants (in relation to their age). There are some theoretical concerns about prediction and predictability, and a need to address literature (reading time, visual world, ERP studies).

      Strengths/weaknesses

      It is important to be clear that predictability is not the same as prediction. A predictable word is processed faster than an unpredictable word (something that has been known since the 1970/80s), e.g., Rayner, Schwanenfluegel, etc. But this could be due to ease of integration. I think this issue can probably be dealt with by careful writing (see point on line 18 below). To be clear, I do not believe that the effects reported here are due to integration alone (i.e., that nothing happens before the target word), but the evidence for this claim must come from actual demonstrations of prediction.

      The effect of load on the effects of predictability is very interesting (and also, I note that the fairly novel way of assessing load is itself valuable). Assuming that the experiments do measure prediction, it suggests that they are not cost-free, as is sometimes assumed. I think the researchers need to look closely at the visual world literature, most particularly the work of Huettig. (There is an isolated reference to Ito et al., but this is one of a large and highly relevant set of papers.)

      There is a major concern about the effects of age. See the Results (161-5): this depends on what is meant by word predictability. It's correct if it means the predictability in the corpus. But it may or may not be correct if it refers to how predictable a word is to an individual participant. The texts are unlikely to be equally predictable to different participants, and in particular to younger vs. older participants, because of their different experiences. To put it informally, the newspaper articles may be more geared to the expectations of younger people. But there is also another problem: the LLM may have learned on the basis of language that has largely been produced by young people, and so its predictions are based on what young people are likely to say. Both of these possibilities strike me as extremely likely. So it may be that older adults are affected more by words that they find surprising, but it is also possible that the texts are not what they expect, or the LLM predictions from the text are not the ones that they would make. In sum, I am not convinced that the authors can say anything about the effects of age unless they can determine what is predictable for different ages of participants. I suspect that this failure to control is an endemic problem in the literature on aging and language processing and needs to be systematically addressed.

      Overall, I think the paper makes enough of a contribution with respect to load to be useful to the literature. But for discussion of age, we would need something like evidence of how younger and older adults would complete these texts (on a word-by-word basis) and that they were equally predictable for different ages. I assume there are ways to get LLMs to emulate different participant groups, but I doubt that we could be confident about their accuracy without a lot of testing. But without something like this, I think making claims about age would be quite misleading.

    1. Reviewer #2 (Public review):

      Summary:

      This study investigates the influence of prior stimuli over multiple time scales in a position discrimination task, using pupillometry data and a reanalysis of EEG data from an existing dataset. The authors report consistent history-dependent effects across task-related, task-unrelated, and stimulus-related dimensions, observed across different time scales. These effects are interpreted as reflecting a unified mechanism operating at multiple temporal levels, framed within predictive coding theory.

      Strengths:

      The authors have done a good job in their revision, clarifying important points and stating the limitations of the study clearly.

      I also think they made a valid effort to address and correct issues arising from the temporal dependency confound, although I still wonder whether the best approach would have been to design an experiment in a way that avoided this confound in the first place.

      Overall, this is a substantially improved version, and I particularly appreciate the clarification and correction regarding the direction of the bias in the EEG data (repulsive rather than attractive).

      Weaknesses:

      These are now relatively minor points.

      I believe this latter aspect, the repulsive bias, may deserve further discussion, especially in relation to their behavioral findings and, in particular, to earlier work proposing multi-stage frameworks of serial dependence, where low-level repulsion interacts with attractive biases at higher-level stages (Fritsche et al., 2020; Pascucci et al., 2019; Sheehan & Serences, 2022). The authors may also consider citing some key reviews on serial dependence that discuss both repulsion and attraction in forced-choice and reproduction tasks (Manassi et al., 2023; Pascucci et al., 2023).

      Related to this, after finding the opposite pattern, is the sentence in line 472-473 ("Further, we found an attractive...") and the related argument still valid?

      Regarding my earlier point about former line 197 and Figure 3b,c: what I noticed-similar to the patterns reported in the studies I referenced-is that the data cannot be simply described as showing faster and more accurate responses for small deltas. Responses also appear faster and more accurate for very large deltas, with performance being worse in between. Indeed, as the authors state: "The peak in precision for large Deltas locations is consistent with alternate events being encoded more precisely, while the peak for small offsets may be explained by the attractive bias towards the previous target." I wonder whether it is necessary, or unequivocally supported by the data, to hypothesize two separate mechanisms here. An alternative could be interference effects between consecutive stimuli that are neither identical nor completely different-making the previous one more likely to interfere with the current stimulus representation.

      Finally, this is definitely a minor point, but I still find the reply to my comment about the prediction of stable retinal input rather speculative. Such a prediction would seem more plausible in world-centered coordinates.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The manuscript is quite dense, with some concepts that may prove difficult for the non-specialist. I recommend spending a few more words (and maybe some pictures) describing the difference between task-relevant and task-irrelevant planes. Nice technique, but not instantly obvious. Then we are hit with "stimulus-related", which definitely needs some words (also because it is orthogonal to neither of the above). 

      We agree that the original description of the planes was too terse and have expanded on this in the revised manuscript.

      Line 85 - To test the influence of attention, trials were sorted according to two spatial reference planes, based on the location of the stimulus: task-related and task-unrelated (Fig. 1b). The task-related plane corresponded to participants’ binary judgement (Fig 1b, light cyan vertical dashed line) and the task-unrelated plane was orthogonal to this (Fig 1b, dark cyan horizontal dashed line). For example, if a participant was tasked with performing a left-or-right of fixation judgement, then their task-related plane was the vertical boundary between the left and right side of fixation, while their task-unrelated plane was the horizontal boundary. The former (left-right) axis is relevant to their task while the latter (top-bottom) axis is orthogonal and task irrelevant. This orthogonality can be leveraged to analyze the same data twice (once according to the task-related plane and again according to the task-unrelated plane) in order to compare performance when the relative location of an event is either task relevant or irrelevant.

      Line 183 - whereas task planes were constant, the stimulus-related plane was defined by the location of the stimulus on the previous trial, and thus varied from trial to trial. That is, on each trial, the target is considered a repeat if it changes location by <|90°| relative to its location on the previous trial, and an alternate if it moves by >|90°|.
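
      The repeat/alternate rule quoted above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' analysis code: locations are taken to be polar angles in degrees, the function names are hypothetical, and a change of exactly 90° is labelled "undefined" because the quoted text does not say how that case was handled.

```python
def signed_angular_change(current_deg: float, previous_deg: float) -> float:
    """Smallest signed rotation from the previous to the current location.

    The result lies in the interval (-180, 180], so 350 -> 10 degrees
    counts as a +20 degree change rather than -340.
    """
    diff = (current_deg - previous_deg) % 360.0
    return diff - 360.0 if diff > 180.0 else diff


def classify_trial(current_deg: float, previous_deg: float) -> str:
    """Label a trial relative to the stimulus-related plane."""
    change = abs(signed_angular_change(current_deg, previous_deg))
    if change < 90.0:
        return "repeat"
    if change > 90.0:
        return "alternate"
    return "undefined"  # boundary case: an assumption, not from the source


if __name__ == "__main__":
    # Hypothetical sequence of target locations (degrees), for illustration.
    locations = [350.0, 10.0, 200.0, 230.0]
    for prev, curr in zip(locations, locations[1:]):
        print(prev, "->", curr, classify_trial(curr, prev))
```

      Because the reference is the previous target rather than a fixed axis, the same location can be a repeat on one trial and an alternate on another, which is what distinguishes this plane from the two task planes.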

      (2) While I understand that the authors want the three classical separations, I actually found it misleading. Firstly, for a perceptual scientist to call intervals in the order of seconds (rather than milliseconds), "micro" is technically coming from the raw prawn. Secondly, the divisions are not actually time, but events: micro means one-back paradigm, one event previously, rather than defined by duration. Thirdly, meso isn't really a category, just a few micros stacked up (and there's not much data on this). And macro is basically patterns, or statistical regularities, rather than being a fixed time. I think it would be better either to talk about short-term and long-term, which do not have the connotations I mentioned. Or simply talk about "serial dependence" and "statistical regularities". Or both. 

      We agree that the temporal scales defined in the current study are not the only way one could categorize perceptual time. We also agree that by using events to define scales, we ignore the influence of duration. In terms of the categories, we selected these for two reasons: 1) they conveniently group previous phenomena, and 2) they loosely correspond to iconic-, short- and long-term memory. We agree that one could also potentially split it up into two categories (e.g., short- and long-term), but in general, we think any form of discretization will have limitations. For example, Reviewer 1 suggests that the meso category is simply a few micros stacked together. However, there is a rich literature on phenomena associated with sequences of an intermediate length that do not appear to be entirely explained by stacking micro effects (e.g., sequence learning and sequential dependency). We also find that when controlling for micro level effects, there are clear meso level effects. Also, by the logic that meso level effects are just stacked micro effects, one could also argue the same for macro effects. We don’t think this argument is incorrect, rather we think it exemplifies the challenge of discretising temporal scales. Ultimately, the current study aimed to test whether seemingly disparate phenomena identified in previous work could be captured by unifying principles. To this end we found that these categories were the most useful. However, we have included a “Limitations and future directions” section in the Discussion of the revised manuscript that acknowledges both the alternative scheme proposed by Reviewer 1, and the value of extending this work to consider the influence of duration (as well as events).

      Line 488 - Limitations and future directions. One potential limitation of the current study is the categorization of temporal scales according to events, independent of the influence of event duration. While this simplification of time supports comparison between different phenomena associated with each scale (e.g., serial dependence, sequential dependencies, statistical learning), future work could investigate the role of duration to provide a more comprehensive understanding of the mechanisms identified in the current study.

      Related to this, while the temporal scales applied here conveniently categorized known sensory phenomena, and partially correspond to iconic-, short-, and long-term memory, they are but one of multiple ways to delineate time. For example, temporal scales could alternatively be defined simply as short- and long-term (e.g., by combining micro and meso scale phenomena). However, this could obscure meaningful differences between phenomena associated with sensory persistence and short-term memory, or qualitative differences in the way that short sequences of events are processed.

      (3) More serious is the issue of precision. Again, this is partially a language problem. When people use the engineering terms "precision" and "accuracy" together, they usually use the same units, such as degrees. Accuracy refers to the distance from the real position (so average accuracy gives bias), and precision is the clustering around the average bias, usually measured as standard deviation. Yet here accuracy is percent correct: also a convention in psychology, but not when contrasting accuracy with precision, in the engineering sense. I suggest you change "accuracy" to "percent correct". On the other hand, I have no idea how precision was defined. All I could find was: "mixture modelling was used to estimate the precision and guess rate of reproduction responses, based on the concentration (k) and height of von Mises and uniform distributions, respectively". I do not know what that means.

      In the case of a binary decision, it seems reasonable to use the term “accuracy” to refer to the correspondence between the target state and the response on a task. However, we agree that while our (main) task is binary, the target is not and nor is the secondary task. We thank the reviewer for bringing this to our attention, as we agree that this will be a likely cause of confusion. To avoid confusion we have specifically referred to “task accuracy” throughout the revised manuscript.

      With regards to precision, our measure of precision is consistent with what Reviewer 1 describes as such, i.e., the clustering of responses. In particular, the von Mises distribution is essentially a Gaussian distribution in circular space, and the kappa parameter defines the width of the distribution, regardless of the mean, with larger values of kappa indicating narrower (more precise) distributions. We could have used standard deviation to assess precision; however, this would incorrectly combine responses on which participants failed to encode the target (e.g., because of a blink) and were simply guessing. To account for these trials, we applied mixture modelling of guess and genuine responses to isolate the precision of genuine responses, as is standard in the visual working memory literature. However, we agree that this was not sufficiently described in the original manuscript and have elaborated on this method in the revised version.

      Line 598 - From the reproduction task, we sought to estimate participants’ recall precision. It is likely that on some trials participants failed to encode the target and were forced to make a response guess. To isolate the recall precision from guess responses, we used mixture modelling to estimate the precision and guess rate of reproduction responses, based on the concentration (k) and height of von Mises and uniform distributions, respectively (Bays et al., 2009). The k parameter of the von Mises distribution reflects its width, which indicates the clustering of responses around a common location.
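
      To make the mixture idea concrete, the following is a minimal sketch of a von Mises plus uniform mixture density and its negative log-likelihood. It is not the authors' fitting code: the function names are hypothetical, the guess component is parameterised here as a mixing weight (`guess_rate`) rather than a distribution height, the Bessel function is approximated by a truncated power series that is adequate only for moderate k, and the optimisation step that would actually estimate the parameters is omitted.

```python
import math


def bessel_i0(x: float, terms: int = 50) -> float:
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        # Ratio of consecutive series terms: (x/2)^2 / (k+1)^2
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def von_mises_pdf(theta: float, mu: float, kappa: float) -> float:
    """Von Mises density on the circle; larger kappa means a narrower peak."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def mixture_pdf(error: float, mu: float, kappa: float, guess_rate: float) -> float:
    """Density of a response error (radians) under the two-component mixture.

    Genuine responses follow a von Mises distribution with mean `mu` (bias)
    and concentration `kappa` (precision); guesses are uniform on the circle.
    """
    uniform = 1.0 / (2.0 * math.pi)
    return (1.0 - guess_rate) * von_mises_pdf(error, mu, kappa) + guess_rate * uniform


def negative_log_likelihood(errors, mu: float, kappa: float, guess_rate: float) -> float:
    """Objective that a fitting routine would minimise over (mu, kappa, guess_rate)."""
    return -sum(math.log(mixture_pdf(e, mu, kappa, guess_rate)) for e in errors)


if __name__ == "__main__":
    # Hypothetical response errors in radians, for illustration only.
    errors = [0.05, -0.10, 0.02, 2.5, -0.07]
    print(negative_log_likelihood(errors, mu=0.0, kappa=8.0, guess_rate=0.2))
```

      Under this parameterisation a single far-off response (such as the 2.5 rad error above) is absorbed mostly by the uniform component, so it inflates the guess rate rather than widening the von Mises component, which is the reason given for preferring this approach to a plain standard deviation.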

      (4) Previous studies show serial dependence can increase bias but decrease scatter (inverse precision) around the biased estimate. The current study claims to be at odds with that. But are the two measures of precision relatable? Was the real (random) position of the target subtracted from each response, leaving residuals from which the inverse precision was calculated? (If so, the authors should say so..) But if serial dependence biases responses in essentially random directions (depending on the previous position), it will increase the average scatter, decreasing the apparent precision. 

      Previous studies have shown that when serial dependence is attractive there is a corresponding increase in precision around small offsets from the previous item (citations). Indeed, attractive biases will lead to reduced scattering (increased precision) around a central attractor. Consistent with previous studies, and this rationale, we also found an attractive bias coupled with increased precision. To clarify, for the serial dependency analysis, we calculated bias and precision by binning reproduction responses according to the offset between the current and previous target and then performing the same mixture modelling described above to estimate the mean (bias) and kappa (precision) parameters of the von Mises distribution fit to the angular errors. This was not explained in the original manuscript, so we thank Reviewer 1 for bringing this to our attention and have clarified the analysis in the revised version.

      Line 604 - For the serial dependency analysis, we calculated bias and precision by binning reproduction responses according to the angular offset between the current and previous target and then performing mixture modelling to estimate the mean (bias) and k (precision) parameters of the von Mises distribution.

      (5) I suspect they are not actually measuring precision, but location accuracy. So the authors could use "percent correct" and "localization accuracy". Or be very clear what they are actually doing. 

      As explained in our response to Reviewer 1’s previous comment, we are indeed measuring precision.

      Reviewer #2 (Public review):

      (1) The abstract should more explicitly mention that conclusions about feedforward mechanisms were derived from a reanalysis of an existing EEG dataset. As it is, it seems to present behavioral data only.

      It is not clear what relevance the fact that the data has been analyzed previously has to the results of the current study. However, we do think that it is important to be clear that the EEG recordings were collected separately from the behavioural and eyetracking data, so we have clarified this in the revised abstract.

      Line 7 - By integrating behavioural and pupillometry recordings with electroencephalographical recordings from a previous study, we identify two distinct mechanisms that operate across all scales.

      (2) The EEG task seems quite different from the others, with location and color changes, if I understand correctly, on streaks of consecutive stimuli shown every 100 ms, with the task involving counting the number of target events. There might be different mechanisms and functions involved, compared to the behavioral experiments reported. 

      As stated above, we agree that it is important that readers are aware that the EEG recordings were collected separately to the behavioural and eyetracking data. We were forthright about this in the original manuscript and have now clarified this in the revised abstract. We agree that collecting both sets of data in the same experiment would be a useful validation of the current results and have acknowledged this in a new Limitations and future directions section of the Discussion of the revised manuscript.

      Line 501 - Another limitation of the current study is that the EEG recordings were collected in a separate experiment to the behavioural and pupillometry data. The stimuli and task were similar between experiments, but not identical. For example, the EEG experiment employed coloured arc stimuli presented at a constant rate of ~3.3 Hz and participants were tasked with counting the number of stimuli presented at a target location. By contrast, in the behavioural experiment, participants viewed white blobs presented at an average rate of ~2.8 Hz and performed a binary spatial task coupled with an infrequent reproduction task. An advantage of this was that the sensory responses to stimuli in the EEG recordings were not conflated with motor responses; however, future work combining these measures in the same experiment would serve as a validation for the current results.

      (3) How is the arbitrary choice of restricting EEG decoding to a small subset of parieto-occipital electrodes justified? Blinks and other artifacts could have been corrected with proper algorithms (e.g., ICA) (Zhang & Luck, 2025) or even left in, as decoders are not necessarily affected by noise. Moreover, trials with blinks occurring at the stimulus time should be better removed, and the arbitrary selection of a subset of electrodes, while reducing the information in input to the decoder, does not account for trials in which a stimulus was missed (e.g., due to blinks).

      Electrode selection was based on several factors: 1) reduction of eye movement/blink artifacts (as noted in the original manuscript), 2) consistency with the previous EEG study (Rideaux, 2024) and other similar decoding studies (Buhmann et al., 2024; Harrison et al., 2023; Rideaux et al., 2023), 3) improved signal-to-noise by including only sensors that carry the most position information (as shown in Supplementary Figure 1a and the previous EEG study). We agree that this was insufficiently explained in the original manuscript and have clarified our sensor selection in the revised version.

      Line 631 - We only included the parietal, parietal-occipital, and occipital sensors in the analyses to i) reduce the influence of signals produced by eye movements, blinks, and non-sensory cortices, ii) for consistency with similar previous decoding studies (Buhmann et al., 2024; Rideaux, 2024; Rideaux et al., 2025), and iii) to improve decoding accuracy by restricting sensors to those that carried spatial position information (Supplementary Fig. 1a).

      (4) The artifact that appears in many of the decoding results is puzzling, and I'm not fully convinced by the speculative explanation involving slow fluctuations. I wonder if a different high-pass filter (e.g., 1 Hz) might have helped. In general, the nature of this artifact requires better clarification and disambiguation.

      We agree that the nature of this artifact requires more clarification and disambiguation. Due to relatively slow changes in the neural signal, which are not stimulus-related, there is a degree of temporal autocorrelation in the recordings. This can be filtered out, for example, by using a stricter high-pass filter; however, we tried a range of filters and found that a cut-off of at least 0.7 Hz is required to remove the artifact, and even a filter of 0.2 Hz introduces other (stimulus-related) artifacts, such as above-chance decoding prior to stimulus onset. These stimulus-related artifacts are due to the temporal smearing of data, introduced by the filtering, and have a more pronounced and complex influence on the results and are more difficult to remove through other means, such as the baseline correction applied in the original manuscript.
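      The temporal smearing described above can be illustrated with a short sketch. It rests on assumptions that the response does not state: a second-order Butterworth high-pass filter applied forwards and backwards (zero-phase), a 250 Hz sampling rate, and an idealised 300 ms stimulus-evoked response. The filter design used in the actual analysis may differ.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 250.0                                  # assumed sampling rate (Hz)
t = np.arange(-5.0, 5.0, 1.0 / fs)          # time relative to stimulus onset (s)
response = ((t >= 0.0) & (t < 0.3)).astype(float)  # idealised evoked response

def highpass(x, cutoff_hz):
    """Zero-phase (forward-backward) Butterworth high-pass filter."""
    sos = butter(2, cutoff_hz, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

# Mean absolute signal in the 200 ms BEFORE onset, where the unfiltered
# response is exactly zero. Anything here was smeared backwards in time.
pre_onset = (t >= -0.2) & (t < 0.0)
leak = {cutoff: np.abs(highpass(response, cutoff)[pre_onset]).mean()
        for cutoff in (0.1, 0.7)}
print(leak)
```

      Under these assumptions the stricter cut-off leaves a visibly larger pre-onset deflection than the lenient one, which is the kind of stimulus-related signal a decoder could pick up before the stimulus appears.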

      The temporal autocorrelation is detected by the decoder during training and biases it to classify/decode targets that are presented nearby in time as similar. That is, it learns the neural pattern for a particular stimulus location based on the activity produced by the stimulus and the temporal autocorrelation (determined by slow stimulus unrelated fluctuations). The latter only accounts for a relatively smaller proportion of the variance in the neural recordings under normal circumstances and would typically go undetected when simply plotting decoding accuracy as a function of position. However, it becomes weakly visible when decoding accuracy is plotted as a function of distance from the previous target, as now the bias (towards temporally adjacent targets) aligns with the abscissa. Further, it becomes highly visible when the stimulus labels are shuffled, as now the decoder can only learn from the variance associated with the temporal autocorrelation (and not from the activity produced by the stimulus).

      In the linear discriminant analysis, this led to temporally proximal items being more likely to be classified as on the same side. This is why there is above-chance performance for repeat trials (Supplementary Figure 2b), and below-chance performance for alternate trials, even when the labels are shuffled – the temporal autocorrelation produces a general bias towards classifying temporally proximate stimuli as on the same side, which selectively improves the classification accuracy of repeat trials. Fortunately, the bias is relatively constant as a function of time within the epoch and is straightforward to estimate by shuffling the labels, which means that it can be removed through a baseline correction. However, to further demonstrate that the autocorrelation confound cannot account for the differences observed between repeat and alternate trials in the micro classification analysis, we now additionally show the results from a more strictly filtered version of the data (0.7 Hz). These results show a similar pattern as the original, with the additional stimulus-related artifacts introduced by the strict filter, e.g., above chance decoding prior to stimulus onset.
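      The shuffled-label baseline can be sketched with a toy simulation. This is an illustration under strong assumptions rather than the analysis reported here: the "recordings" are a random walk (slow drift) plus a weak stimulus signal, and a leave-one-out nearest-neighbour classifier stands in for linear discriminant analysis because it makes the effect of temporal autocorrelation easy to see. All names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_sensors, n_shuffles = 600, 8, 20

labels = rng.integers(0, 2, n_trials)                    # 0 = left, 1 = right
# Slow, stimulus-unrelated drift: a random walk across trials.
drift = np.cumsum(rng.normal(0.0, 1.0, (n_trials, n_sensors)), axis=0)
signal = 0.2 * (2 * labels - 1)[:, None] * np.ones(n_sensors)
data = drift + signal + rng.normal(0.0, 0.2, (n_trials, n_sensors))

# Leave-one-out nearest neighbour: each trial is classified with the
# label of the most similar other trial (usually a temporal neighbour).
dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)
np.fill_diagonal(dist, np.inf)
nearest = np.argmin(dist, axis=1)

def accuracy_by_context(lab):
    """Return [repeat accuracy, alternate accuracy] for a label sequence."""
    correct = (lab[nearest] == lab)[1:]
    repeat = lab[1:] == lab[:-1]
    return np.array([correct[repeat].mean(), correct[~repeat].mean()])

unshuffled = accuracy_by_context(labels)
# Shuffling removes stimulus information but keeps the autocorrelation.
shuffled = np.mean([accuracy_by_context(rng.permutation(labels))
                    for _ in range(n_shuffles)], axis=0)
corrected = unshuffled - shuffled                         # baseline correction
print(unshuffled, shuffled, corrected)
```

      In this toy case the shuffled accuracy sits above chance for repeats and below chance for alternations, even though the shuffled labels carry no stimulus information, and subtracting it from the unshuffled accuracy is the baseline correction described above.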

      In the inverted encoding analysis, the same temporal autocorrelation manifests as temporally proximal trials being decoded as more similar locations. This is why there is increased decoding accuracy for targets with small angular offsets from the previous target, even when the labels are shuffled (Supplementary Figure 3c), because it is on these trials that the bias happens to align with the correct position. This leads to an attractive bias towards the previous item, which is most prominent when the labels are shuffled.

      To demonstrate the phenomenon, we simulated neural recordings from a population of tuning curves and performed the inverted encoding analysis on a clean version of the data and a version in which we introduced temporal autocorrelation. We then repeated this after shuffling the labels. The simulation produced very similar results to those we observed in the empirical data, with a single exception: while precision in the simulated shuffled data was unaffected by autocorrelation, precision in the unshuffled data was clearly affected by this manipulation. This may explain why we did not find a correlation between the shuffled and unshuffled precision in the original manuscript. 

      These results echo those from the classification analysis, albeit in a more continuous space. However, whereas in the classification analysis it was straightforward to perform a baseline correction to remove the influence of general temporal dependency, the more complex nature of the accuracy, precision, and bias parameters over the range of time and delta location makes this approach less appropriate. For example, the bias in the shuffled condition ranged from -180 to 180 degrees, which when subtracted from the bias in the unshuffled condition would produce an equally spurious outcome, i.e., the equal opposite of this extreme bias. Instead, for the inverted encoding analysis, we used the data high-pass filtered at 0.7 Hz. As with the classification analysis, this removed the influence of general temporal dependencies, as indicated by the results of the shuffled data analysis (Supplementary Figure 3f), but it also temporally smeared the stimulus-related signal, resulting in above chance decoding accuracy prior to stimulus onset (Supplementary Figure 3d). However, we were primarily interested in the pattern of accuracy, precision, and bias as a function of delta location, and less concerned with the precise temporal dynamics of these changes, which appeared relatively stable in the filtered data. Thus, this was the more suitable approach to removing the general temporal dependencies in the inverted encoding analysis and the one that is presented in Figure 3.

      We have updated the revised manuscript in light of these changes, including a fuller description of the artifact and the results from the abovementioned control analyses.

      Figure 3 updated.

      Figure 3 caption - e) Decoding accuracy for stimulus location, from reanalysis of previously published EEG data (17). Inset shows the EEG sensors included in the analysis (blue dots), and black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). f) Decoding accuracy for location, as a function of time and Δ location. Bright colours indicate higher decoding accuracy; absolute accuracy values can be inferred from (e). g-i) Average location decoding (g) accuracy, (h) precision, and (i) bias from 50 – 500 ms following stimulus onset. Horizontal bar in (e) indicates cluster corrected periods of significance; note, all time points were significantly above chance due to temporal smear introduced by strict high-pass filtering (see Supplementary Figure 3 for full details). Note, the temporal abscissa is aligned across (e & f). Shaded regions indicate ±SEM.

      Line 218 - To further investigate the influence of serial dependence, we applied inverted encoding modelling to the EEG recordings to decode the angular location of stimuli. We found that decoding accuracy of stimulus location sharply increased from ~60 ms following stimulus onset (Fig. 3e). Note, to reduce the influence of general temporal dependencies, we applied a 0.7 Hz high-pass filter to the data, which temporally smeared the stimulus-related information, resulting in above chance decoding accuracy prior to stimulus presentation (for full details, see Supplementary Figure 3). To understand how serial dependence influences the representation of these features, we inspected decoding accuracy for location as a function of both time and Δ location (Fig. 3f). We found that decoding accuracy varied not only as a function of time, but also as a function of Δ location. To characterise this relationship, we calculated the average decoding accuracy from 50 ms until the end of the epoch (500 ms), as a function of Δ location (Fig. 3g). This revealed higher accuracy for targets with larger Δ location. We found a similar pattern of results for decoding precision (Fig. 3h). These results are consistent with the micro temporal context (behavioural) results, showing that targets that alternated were recalled more precisely. Lastly, we calculated the decoding bias as a function of Δ location and found a clear repulsive bias away from the previous item (Fig. 3i). While this result is inconsistent with the attractive behavioural bias, it is consistent with recent studies of serial dependence suggesting an initial pattern of repulsion followed by an attractive bias during the response period (20–22).

      Line 726 - As shown in Supplementary Figure 3, we found the same general temporal dependencies in the decoding accuracy computed using inverted encoding that were found using linear discriminant classification. However, as a baseline correction would not have been appropriate or effective for the parameters decoded with this approach, we instead used a high-pass filter of 0.7 Hz to remove the confound, while being cautious about interpreting the timing of effects produced by this analysis due to the temporal smear introduced by the filter.

      Supplementary Figure 2 updated.

      Supplementary Figure 2 caption - Removal of general micro temporal dependencies in EEG responses. We found that there were differences in classification accuracy for repeat and alternate stimuli in the EEG data, even when stimulus labels were shuffled. This is likely due to temporal autocorrelation within the EEG data due to low frequency signal changes that are unrelated to the decoded stimulus dimension. This signal trains the decoder to classify temporally proximal stimuli as the same class, leading to a bias towards repeat classification. For example, in general, the EEG signal during trial one is likely to be more similar to that during trial two than during trial ten, because of low frequency trends in the recordings. If the decoder has been trained to classify the signal associated with trial one as a leftward stimulus, then it will be more likely to classify trial two as a leftward stimulus too. These autocorrelations are unrelated to stimulus features; thus, to isolate the influence of stimulus-specific temporal context, we subtracted the classification accuracy produced by shuffling the stimulus labels from the unshuffled accuracy (as presented in Figure 2e, f). We confirmed that using a stricter high-pass filter (0.7 Hz) removes this artifact, as indicated by the equal decoding accuracy between the two shuffled conditions. However, the stricter high-pass filter temporally smears the stimulus-related signal, which introduces other (stimulus-related) artifacts, e.g., above-chance decoding accuracy prior to stimulus presentation, that are larger and more complex, i.e., changing over time. Thus, we opted to use the original high pass filter (0.1 Hz) and apply a baseline correction. a) The uncorrected classification  accuracy along task related and unrelated planes. Note that these results are the same as the corrected version shown in Figure 2e, because the confound is only apparent when accuracy is grouped according to temporal context.

      b) Same as (a), but split into repeat and alternate stimuli, along (left) task-related and (right) unrelated planes. Classification accuracy when labels are shuffled is also shown. Inset in (a) shows the EEG sensors included in the analysis (blue dots). (c, d) Same as (a, b), but on data filtered using a 0.7 Hz high-pass filter. Black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). Shaded regions indicate ±SEM.

      Supplementary Figure 3 updated.

      Supplementary Figure 3 caption - Removal of general temporal dependencies in EEG responses for inverted encoding analyses. As described in Methods - Neural Decoding, we used inverted encoding modelling of EEG recordings to estimate the decoding accuracy, precision, and bias of stimulus location. Just as in the linear discriminant classification analysis, we also found the influence of general temporal dependencies in the results produced by the inverted encoding analysis. In particular, there was increased decoding accuracy for targets with low Δ location. This was weakly evident in the period prior to stimulus presentation, but clearly visible when the labels were shuffled. These results mirror those from the classification analysis, albeit in a more continuous space. However, whereas in the classification analysis it was straightforward to perform a baseline correction to remove the influence of general temporal dependency, the more complex nature of the accuracy, precision, and bias parameters over the range of time and Δ location makes this approach less appropriate. For example, the bias in the shuffled condition ranged from -180° to 180°, which when subtracted from the bias in the unshuffled condition would produce an equally spurious outcome, i.e., the equal opposite of this extreme bias. Instead, for the inverted encoding analysis, we used the data high-pass filtered at 0.7 Hz. As with the classification analysis, this significantly reduced the influence of general temporal dependencies, as indicated by the results of the shuffled data analysis, but it also temporally smeared the stimulus-related signal, resulting in above chance decoding accuracy prior to stimulus onset. However, we were primarily interested in the pattern of accuracy, precision, and bias as a function of Δ location, and less concerned with the precise temporal dynamics of these changes. 
Thus, this was the more suitable approach to removing the general temporal dependencies in the inverted encoding analysis and the one that is presented in Figure 3. (a) Decoding accuracy as a function of time for the EEG data filtered using a 0.1 Hz high-pass filter. Inset shows the EEG sensors included in the analysis (blue dots), and black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). (b, c) The same as (a), but as a function of time and Δ location for (b) the original data and (c) data with shuffled labels. (d-f) Same as (a-c), but for data filtered using a 0.7 Hz high-pass filter. Shaded regions in (a, d) indicate ±SEM. Horizontal bars in (a, d) indicate cluster corrected periods of significance; note, all time points in (d) were significantly above chance. Note, the temporal abscissa is vertically aligned across plots (a-c & d-f).

      In the process of performing these additional analyses and simulations, we became aware that the sign of the decoding bias in the inverted encoding analyses had been interpreted in the wrong direction. That is, where we previously reported an initial attractive bias followed by a repulsive bias relative to the previous target, we have in fact found the opposite, an initial repulsive bias followed by an attractive bias relative to the previous target. Based on the new control analyses and simulations, we think that the latter attractive bias was due to general temporal dependencies. That is, in the filtered data, we only observe a repulsive bias. While the bias associated with serial dependence was not a primary feature of the study, this (somewhat embarrassing) discovery has led to reinterpretation of some results relating to serial dependence. However, it is encouraging to see that our results now align with those of recent studies (Fischer et al., 2024; Luo et al., 2025; Sheehan et al. 2024).

      Line 385 - Our corresponding EEG analyses revealed better decoding accuracy and precision for stimuli preceded by those that were different and a bias away from the previous stimulus. These results are consistent with the finding that alternating stimuli are recalled more precisely. Further, while the repulsive pattern of biases is inconsistent with the observed behavioural attractive biases, it is consistent with recent work on serial dependence indicating an initial period of repulsion, followed by an attractive bias during the response period (20–22). These findings indicate that serial dependence and first-order sequential dependencies can be explained by the same underlying principle.

      (5) Given the relatively early decoding results and surprisingly early differences in decoding peaks, it would be useful to visualize ERPs across conditions to better understand the latencies and ERP components involved in the task.

      A rapid presentation design was used in the EEG experiment, and while this is well suited to decoding analyses, unfortunately we cannot resolve ERPs because the univariate signal is dominated by an oscillation at the stimulus presentation frequency (~3 Hz). We agree that this could be useful to examine in future work.

      (6) It is unclear why the precision derived from IEM results is considered reliable while the accuracy is dismissed due to the artifact, given that both seem to be computed from the same set of decoding error angles (equations 8-9).

      This point has been addressed in our response to point (4).

      (7) What is the rationale for selecting five past events as the meso-scale? Prior history effects have been shown to extend much further back in time (Fritsche et al., 2020). 

      We used five previous items in the meso analyses to be consistent with previous research on sequential dependencies (Bertelson, 1961; Gao et al., 2009; Jentzsch & Sommer, 2002; Kirby, 1976; Remington, 1969). However, we agree that these effects likely extend further and have acknowledged this in the revised version of the manuscript.

      Line 240 - Higher-order sequential dependences are an example of how stimuli (at least) as far back as five events in the past can shape the speed and task accuracy of responses to the current stimulus (9, 10); however, note that these effects have been observed for more than five events (20).

      (8) The decoding bias results, particularly the sequence of attraction and repulsion, appear to run counter to the temporal dynamics reported in recent studies (Fischer et al., 2024; Luo et al., 2025; Sheehan & Serences, 2022). 

      This point has been addressed in our response to point (4).

      (9) The repulsive component in the decoding results (e.g., Figure 3h) seems implausibly large, with orientation differences exceeding what is typically observed in behavior. 

      As noted in our response to point (4), this bias was likely due to the general temporal dependency confound and has been removed in the revised version of the manuscript.

      (10) The pattern of accuracy, response times, and precision reported in Figure 3 (also line 188) resembles results reported in earlier work (Stewart, 2007) and in recent studies suggesting that integration may lead to interference at intermediate stimulus differences rather than improvement for similar stimuli (Ozkirli et al., 2025).

      Thank you for bringing this to our attention, we have acknowledged this in the revised manuscript.

      Line 197 - Consistent with our previous binary analysis, and with previous work (19), we also found that responses were faster and more accurate when Δ location was small (Fig. 3b, c).

      (11) Some figures show larger group-level variability in specific conditions but not others (e.g., Figures 2b-c and 5b-c). I suggest reporting effect sizes for all statistical tests to provide a clearer sense of the strength of the observed effects. 

      Yes, as noted in the original manuscript, we find significant differences in variance between the task-related and -unrelated conditions. We think this is due to opposing forces in the task-related condition: 

      “The increased variability of response time differences across the task-related plane likely reflects individual differences in attention and prioritization of responding either quickly or accurately. On each trial, the correct response (e.g., left or right) was equally probable. So, to perform the task accurately, participants were motivated to respond without bias, i.e., without being influenced by the previous stimulus. We would expect this to reduce the difference in response time for repeat and alternate stimuli across the task-related plane, but not the task-unrelated plane. However, attention may amplify the bias towards making faster responses for repeat stimuli, by increasing awareness of the identity of stimuli as either repeats or alternations (17). These two opposing forces vary with task engagement and strategy and thus would be expected to produce increased variability across the task-related plane.” We agree that providing effect sizes may provide a clearer sense of the observed effects and have done so in the revised version of the manuscript.

      Line 739 - For Wilcoxon signed rank tests, the rank-biserial correlation (r) was calculated as an estimate of effect size, where 0.1, 0.3, and 0.5 indicate small, medium, and large effects, respectively (54). For Friedman’s ANOVA tests, Kendall’s W was calculated as an estimate of effect size, where 0.1, 0.3, and 0.5 indicate small, medium, and large effects, respectively (55).
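      For readers unfamiliar with these effect sizes, the following is a minimal sketch of how they are commonly computed. It uses the standard textbook definitions (matched-pairs rank-biserial correlation, and Kendall's W without a correction for ties), which may differ in detail from the software used for the manuscript. The function names and example values are hypothetical.

```python
import numpy as np

def average_ranks(x):
    """Rank values from 1..n, giving tied values their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def rank_biserial(x, y):
    """Matched-pairs rank-biserial correlation for a Wilcoxon signed rank test."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    diff = diff[diff != 0]                      # zero differences are dropped
    ranks = average_ranks(np.abs(diff))
    r_plus, r_minus = ranks[diff > 0].sum(), ranks[diff < 0].sum()
    return (r_plus - r_minus) / (r_plus + r_minus)

def kendalls_w(data):
    """Kendall's W for an (n subjects x k conditions) array, no tie correction."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    ranks = np.vstack([average_ranks(row) for row in data])
    rank_sums = ranks.sum(axis=0)
    s = np.sum((rank_sums - rank_sums.mean()) ** 2)
    return 12.0 * s / (n ** 2 * (k ** 3 - k))

# Hypothetical example values.
r = rank_biserial([2, 4, 6, 1], [1, 2, 3, 5])
w = kendalls_w([[1, 2, 3], [2, 3, 4], [0, 5, 9]])
print(r, w)
```

      In the first example the positive differences carry a rank sum of 6 and the negative difference a rank sum of 4, giving r = 0.2. In the second, every subject orders the three conditions identically, so W = 1.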

      (12) The statement that "serial dependence is associated with sensory stimuli being perceived as more similar" appears inconsistent with much of the literature suggesting that these effects occur at post-perceptual stages (Barbosa et al., 2020; Bliss et al., 2017; Ceylan et al., 2021; Fischer et al., 2024; Fritsche et al., 2017; Sheehan & Serences, 2022). 

      In light of the revised analyses, this statement has been removed from the manuscript.

      (13) If I understand correctly, the reproduction bias (i.e., serial dependence) is estimated on a small subset of the data (10%). Were the data analyzed by pooling across subjects?

      The dual reproduction task only occurred on 10% of trials. There were approximately 2000 trials, so ~200 reproduction responses. For the micro and macro analyses, this was sufficient to estimate precision within each of the experimental conditions (repeat/alternate, expected/unexpected). However, it is likely that we were not able to reproduce the effect of precision at the meso level across both experiments because we lacked sufficient responses to reliably estimate precision when split across the eight sequence conditions. Despite this, the data was always analysed within subjects.

      (14) I'm also not convinced that biases observed in forced-choice and reproduction tasks should be interpreted as arising from the same process or mechanism. Some of the effects described here could instead be consistent with classic priming. 

      We agree that the results associated with the forced-choice task (response time and task accuracy) were likely due to motor priming, but that a separate (predictive) mechanism may explain the (precision) results associated with the reproduction task. These are two mechanisms we think are operating across the three temporal scales investigated in the current study.

      Reviewing Editor Comments:

      (1) Clarify task design and measurement: The dense presentation makes it difficult to understand key design elements and their implications. Please provide clearer descriptions of all task elements, and how they relate to each other (EEG vs. behaviour, stimulus plane vs. TR and TU plane, reproduction vs. discrimination and role of priming), and clearly explain how key measures were computed for each of these (e.g., precision, accuracy, reproduction bias).

      In the revised manuscript, we have expanded on descriptions of the source and nature of the data (behavioural and EEG), the different planes analyzed in the behavioural task, and how key metrics (e.g., precision) were computed.

      (2) Offer more insight into underlying data, including original ERP waveforms to aid interpretation of decoding results and the timing of effects. In particular, unpack the decoding temporal confound further.

      In the revised manuscript, we have offered considerably more insight into the decoding results, in particular, the nature of the temporal confound. We were unable to assess ERPs due to the rapid presentation design employed in the EEG experiment.

      (3) Justify arbitrary choices such as electrode selection for EEG decoding (e.g., limiting to parieto-occipital sensors), number of trials in meso scale, and the time terminology itself.

      In the revised manuscript, we have clarified the reasons for electrode selection.

      (4) Discuss deviations from literature: Several findings appear to contradict or diverge from previous literature (e.g., effects of serial dependence). These discrepancies could be discussed in more depth. 

      Upon re-analysis of the serial dependence bias and removal of the temporal confound, the results of the revised manuscript now align with those from previous literature, which has been acknowledged.

      Reviewer #1 (Recommendations for the authors):

      (1) I would like to use my reviewer's prerogative to mention a couple of relevant publications. 

      Galluzzi et al (Journal of Vision, 2022) "Visual priming and serial dependence are mediated by separate mechanisms" suggests exactly that, which is relevant to this study.

      Xie et al. (Communications Psychology, 2025) "Recent, but not long-term, priors induce behavioral oscillations in peri-saccadic vision" also seems relevant to the issue of different mechanisms. 

      Thank you for bringing these studies to our attention. We agree that they are both relevant and have referenced both appropriately in the revised version of the manuscript.

      Reviewer #2 (Recommendations for the authors): 

      (1) I find the discussion on attention and awareness (from line 127 onward) somewhat vague and requiring clarification.

      We agree that this statement was vague and referred to “awareness” without operationalization. We have revised this statement to improve clarity.

      Line 135 - However, task-relatedness may amplify the bias towards making faster responses for repeat stimuli, by increasing attention to the identity of stimuli as either repeats or alternations (17).

      (2) Line 140: It's hard to argue that there are expectations that the image of an object on the retina is likely to stay the same, since retinal input is always changing. 

      We agree that retinal input is often changing, e.g., due to saccades, self-motion, and world motion. However, for a prediction to be useful, e.g., to reduce metabolic expenditure or speed up responses, it must be somewhat precise, so a prediction that retinal input will change is not necessarily useful, unless it can specify what it will change to. Given retinal input of x at time t, the range of possible values of x at time t+1 (predicting change) is infinite. By contrast, if we predict that x=x at time t+1 (no change), then we can make a precise prediction. There is, of course, other information that could be used to reduce the parameter space of predicted change from x at time t, e.g., the value of x at time t-1, and we think this drives predictions too. However, across the infinite distribution of changes from x, zero change will occur more frequently than any other value, so we think it’s reasonable to assert that the brain may be sensitive to this pattern.

      (3) Line 564: The gambler's fallacy usually involves sequences longer than just one event.

      Yes, we agree that this phenomenon is associated with longer sequences. This section of the manuscript concerned previous findings that were not directly relevant to the current study and has been removed in the revised version.

      (4) In the shared PDF, the light and dark cyan colors used do not appear clearly distinguishable. 

      I expect this is due to poor document processing or low-quality image embeddings. I will check that they are distinguishable in the final version.

      References: 

      Barbosa, J., Stein, H., Martinez, R. L., Galan-Gadea, A., Li, S., Dalmau, J., Adam, K. C. S., Valls-Solé, J., Constantinidis, C., & Compte, A. (2020). Interplay between persistent activity and activity-silent dynamics in the prefrontal cortex underlies serial biases in working memory. Nature Neuroscience, 23(8), Article 8. https://doi.org/10.1038/s41593-020-0644-4

      Bliss, D. P., Sun, J. J., & D'Esposito, M. (2017). Serial dependence is absent at the time of perception but increases in visual working memory. Scientific reports, 7(1), 14739. 

      Ceylan, G., Herzog, M. H., & Pascucci, D. (2021). Serial dependence does not originate from low-level visual processing. Cognition, 212, 104709. https://doi.org/10.1016/j.cognition.2021.104709

      Fischer, C., Kaiser, J., & Bledowski, C. (2024). A direct neural signature of serial dependence in working memory. eLife, 13. https://doi.org/10.7554/eLife.99478.1

      Fritsche, M., Mostert, P., & de Lange, F. P. (2017). Opposite effects of recent history on perception and decision. Current Biology, 27(4), 590-595. 

      Fritsche, M., Spaak, E., & de Lange, F. P. (2020). A Bayesian and efficient observer model explains concurrent attractive and repulsive history biases in visual perception. eLife, 9, e55389. https://doi.org/10.7554/eLife.55389

      Gekas, N., McDermott, K. C., & Mamassian, P. (2019). Disambiguating serial effects of multiple timescales. Journal of vision, 19(6), 24-24. 

      Luo, M., Zhang, H., Fang, F., & Luo, H. (2025). Reactivation of previous decisions repulsively biases sensory encoding but attractively biases decision-making. PLOS Biology, 23(4), e3003150. https://doi.org/10.1371/journal.pbio.3003150

      Ozkirli, A., Pascucci, D., & Herzog, M. H. (2025). Failure to replicate a superiority effect in crowding. Nature Communications, 16(1), 1637. https://doi.org/10.1038/s41467-025-56762-5

      Sheehan, T. C., & Serences, J. T. (2022). Attractive serial dependence overcomes repulsive neuronal adaptation. PLoS biology, 20(9), e3001711. 

      Stewart, N. (2007). Absolute identification is relative: A reply to Brown, Marley, and Lacouture (2007). Psychological Review, 114, 533-538. https://doi.org/10.1037/0033-295X.114.2.533

      Treisman, M., & Williams, T. C. (1984). A theory of criterion setting with an application to sequential dependencies. Psychological review, 91(1), 68. 

      Zhang, G., & Luck, S. J. (2025). Assessing the impact of artifact correction and artifact rejection on the performance of SVM- and LDA-based decoding of EEG signals. NeuroImage, 316, 121304. https://doi.org/10.1016/j.neuroimage.2025.121304

  3. mathieubcd.github.io mathieubcd.github.io
    1. Scenario analysis is a method in which multiple potential future states (or outcomes) are forecast. It is not constrained by events of the past, which may not capture the impact of changes in the environment; rather it uses both trends (the known) and uncertainties (the unknown) to predict a range of possible future scenarios.

      Healthcare leaders often rely too heavily on past data, even though the future rarely unfolds the same way as the past. Scenario analysis encourages organizations to think in terms of possibilities, not certainties, which is especially relevant in healthcare, where conditions can change quickly. For example, we can plan for best-case, worst-case, and most likely outcomes during a pandemic. This improves resource planning and highlights the risks of making decisions based on outdated assumptions. It’s a reminder that uncertainty should be treated as part of strategy, not just as an obstacle.

    1. Are we to keep the people of India ignorant in order that we may keep them submissive? Or do we think that we can give them knowledge without awakening ambition? Or do we mean to awaken ambition and to provide it with no legitimate vent? Who will answer any of these questions in the affirmative? Yet one of them must be answered in the affirmative, by every person who maintains that we ought permanently to exclude the natives from high office. I have no fears. The path of duty is plain before us: and it is also the path of wisdom, of national prosperity, of national honor.

      Here, Macaulay challenges the logic of permanently excluding Indians from higher office under British rule. He frames the issue as a series of rhetorical questions, pointing out the contradictions in denying education and advancement to Indians while still claiming to rule justly. His language reveals both a moral stance and a pragmatic one: keeping India submissive through ignorance is unjust and also unwise for Britain’s long-term prosperity. By insisting that knowledge will naturally create ambition, he argues that denying Indians political opportunity would lead to instability. Overall, the passage reveals Macaulay’s conviction that the gradual inclusion of Indians into governance was not only a duty but also a means to strengthen Britain’s honor and secure its empire.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The authors aim to explore the effects of the electrogenic sodium-potassium pump (Na⁺/K⁺-ATPase) on the computational properties of highly active spiking neurons, using the weakly-electric fish electrocyte as a model system. Their work highlights how the pump's electrogenicity, while essential for maintaining ionic gradients, introduces challenges in neuronal firing stability and signal processing, especially in cells that fire at high rates. The study identifies compensatory mechanisms that cells might use to counteract these effects, and speculates on the role of voltage dependence in the pump's behavior, suggesting that Na⁺/K⁺-ATPase could be a factor in neuronal dysfunctions and diseases.

      Strengths:

      (1) The study explores a less-examined aspect of neural dynamics: the effects of Na⁺/K⁺-ATPase electrogenicity. It offers a new perspective by highlighting the pump's role not only in ion homeostasis but also in its potential influence on neural computation.

      (2) The mathematical modeling used is a significant strength, providing a clear and controlled framework to explore the effects of the Na⁺/K⁺-ATPase on spiking cells. This approach allows for the systematic testing of different conditions and behaviors that might be difficult to observe directly in biological experiments.

      (3) The study proposes several interesting compensatory mechanisms, such as sodium leak channels and extracellular potassium buffering, which provide useful theoretical frameworks for understanding how neurons maintain firing rate control despite the pump's effects.

      Weaknesses:

      (1) While the modeling approach provides valuable insights, the lack of experimental data to validate the model's predictions weakens the overall conclusions.

      (2) The proposed compensatory mechanisms are discussed primarily in theoretical terms without providing quantitative estimates of their impact on the neuron's metabolic cost or other physiological parameters.

      Comments on revisions:

      The revised manuscript is notably improved.

      We thank the reviewer for their concise and accurate summary and appreciate the constructive feedback on the article’s strengths and weaknesses. Experimental work is beyond the scope of our modeling-based study. However, we would like our work to serve as a framework for future experimental studies into the role of the electrogenic pump current (and its possible compensatory currents) in disease, and its role in evolution of highly specialized excitable cells (such as electrocytes).

      Quantitative estimates of metabolic costs in this study are limited to the ATP that is required to fuel the Na⁺/K⁺ pump. By integrating the net pump current over time and dividing by one elementary charge, one obtains the number of ATP molecules consumed by the Na⁺/K⁺ pump over that interval (and hence the consumption rate) for either compensatory mechanism. The difference in net pump current is thus proportional to the difference in ATP consumption, which allows for a direct comparison of the cost efficiency of the Na⁺/K⁺ pump for each proposed compensatory mechanism. The Na⁺/K⁺ pump is, however, not the only ATP-consuming element in the electrocyte, and some of the compensatory mechanisms induce other costs related to cell ‘housekeeping’ or presynaptic processes. We have now added a section in the appendix titled ‘Considerations on metabolic costs of compensatory mechanisms’ (section 11.4), where we provide rough estimates of the influence of the compensatory mechanisms on the total metabolic costs of the cell and on membrane space occupation. Although we argue that, according to these rough estimates, the impact of the discussed compensatory mechanisms could be significant, a plausible quantitative cost estimate at the whole-cell level remains beyond the scope of this article because more detailed experimental quantification is lacking.
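
      As a rough illustration of this bookkeeping, the sketch below integrates a sampled net pump current with the trapezoidal rule and divides by the elementary charge. It assumes one net elementary charge exported per ATP hydrolysed (three Na⁺ out, two K⁺ in); the function names and the constant 1 nA example current are hypothetical and are not taken from the manuscript.

```python
# Sketch only: ATP bookkeeping from a sampled net pump current.
# Assumption: each pump cycle hydrolyses one ATP and exports one net
# elementary charge (3 Na+ out, 2 K+ in).

E_CHARGE = 1.602176634e-19  # elementary charge in coulombs (exact SI value)


def atp_molecules(pump_current_amps, dt_seconds):
    """Number of ATP molecules implied by a uniformly sampled pump current."""
    samples = list(pump_current_amps)
    # Trapezoidal integration of current over time gives the net charge moved.
    charge = sum(0.5 * (a + b) * dt_seconds
                 for a, b in zip(samples[:-1], samples[1:]))
    return charge / E_CHARGE


def atp_rate(pump_current_amps, dt_seconds):
    """Mean ATP consumption rate (molecules per second) over the trace."""
    samples = list(pump_current_amps)
    duration = (len(samples) - 1) * dt_seconds
    return atp_molecules(samples, dt_seconds) / duration


# Hypothetical example: a constant 1 nA net pump current sampled for 1 s.
example_trace = [1e-9] * 1001
example_count = atp_molecules(example_trace, 1e-3)
```

      Because the conversion is linear, the ratio of ATP consumption between two compensatory mechanisms equals the ratio of their time-integrated net pump currents.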

      Reviewer #1 (Recommendations for the authors):

      I just have a few recommendations on the updated manuscript.

      (1) When exploring the different roles of Na⁺/K⁺-ATPase in the Results section, the authors employed many different models. For instance, the voltage equation on page 15, voltage equation (2) on page 22, voltage equation (12) on page 24, voltage equation (30) on page 32, and voltage equation (38) on page 35 are presented as the master equations for their respective biophysical models. Meanwhile, the phase models are presented on page 29 and page 33. I would recommend that the authors clearly specify which equations correspond to each subsection of the Results section and explicitly state which equations were used to generate the data in each figure. This would help readers more easily follow the connections between the models, the results, and the figures.

      We thank the reviewer for pointing out that the links of the different voltage equations to the results could be expressed more explicitly in the article. All simulations were done using the ‘master equation’  expressed in Eq. 2, and the other voltage equations that are specified in the article (in the new version of the article Eqs. 13, 31, and 39) are reformulations of Eq. 2 to analytically show different properties of the voltage equation (Eq. 2). This has now been mentioned in the article when formulating the voltage equations, and the equation for the total leak current (in the new version Eq. 3) has been added for completeness.

      (2) The authors may want to revisit their description and references concerning Eigenmannia virescens. For example, wave-type weakly electric fish (e.g., Eigenmannia) and pulse-type weakly electric fish (e.g., Gymnotus carapo) exhibit large differences, so references 52-55 may be inappropriate for subsection 4.3.1, as these studies focus on Gymnotus carapo. Additionally, even within wave-type species, chirp patterns vary. For example, Eigenmannia can exhibit short "pauses"-type chirps, whereas Apteronotus leptorhynchus (another wave-type fish) does not (https://pubmed.ncbi.nlm.nih.gov/14692494/).

      We thank the reviewer for pointing this out. The citations and phrasing in sections 4.3.1 and 4.3.2 have been updated to specifically refer to the weakly electric fish E. virescens.

      (3) Table on page 21: Please explain why the parameter value (13.5 mM) of [Na⁺]_in is 10 times larger than its value (1.35 mM) in reference [26]. How does this value (13.5 mM) compare with the range of the variable [Na⁺]_in in equation (6)?

      The intracellular sodium concentration in reference [26] was reported to be 1.35 mM, but the authors also reported an extracellular sodium concentration of 120 mM, and a sodium reversal potential of 55 mV. Upon calculating the sodium reversal potential, we found that an intracellular sodium concentration of 1.35 mM would give a sodium reversal potential of 113 mV. An intracellular sodium concentration of 13.5 mM, on the other hand, leads to the reported and physiological reversal potential of 55 mV. This has now been clarified in the article, and the connection between this value and Eq. 6 (Eq. 7 in the new version) has also been clarified.
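
      The consistency check described above can be reproduced with the Nernst equation, E = (RT/zF) ln([Na⁺]_out/[Na⁺]_in). The sketch below is illustrative only: the temperature of 20 °C is an assumption (the value used in the manuscript is not stated here), and the function name is hypothetical. Under that assumption, 1.35 mM gives a reversal potential of roughly 113 mV and 13.5 mM gives roughly 55 mV, in line with the numbers quoted in the response.

```python
# Sketch only: Nernst reversal potential for sodium.
# Assumption: temperature of 20 degrees C; the manuscript's value may differ.
from math import log

GAS_CONSTANT = 8.314462618    # J / (mol K)
FARADAY = 96485.33212         # C / mol


def nernst_mv(conc_out_mm, conc_in_mm, temp_celsius=20.0, valence=1):
    """Reversal potential in millivolts for the given concentrations (mM)."""
    temp_kelvin = temp_celsius + 273.15
    thermal_voltage = GAS_CONSTANT * temp_kelvin / (valence * FARADAY)  # volts
    return 1000.0 * thermal_voltage * log(conc_out_mm / conc_in_mm)


e_na_reported = nernst_mv(120.0, 1.35)    # intracellular value as printed in ref. [26]
e_na_corrected = nernst_mv(120.0, 13.5)   # value used in the model
```

      A tenfold change in the intracellular concentration shifts the reversal potential by (RT/F) ln 10, about 58 mV near room temperature, which accounts for the gap between the two values.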

      Reviewer #2 (Public review):

      Summary:

      The paper by Weerdmeester, Schleimer, and Schreiber uses computational models to present the biological constraints under which electrocytes (specialized, highly active cells that facilitate electro-sensing in weakly electric fish) may operate. The authors suggest potential solutions that these cells could employ to circumvent these constraints.

      Electrocytes are highly active or spiking (greater than 300 Hz) for sustained periods (for minutes to hours), and such activity is possible due to an influx of sodium ions into, and an efflux of potassium ions from, these cells after each spike. The resulting ion imbalance must be restored, which in electrocytes, as with many other biological cells, is facilitated by the Na-K pumps at the expense of biological energy, i.e., ATP molecules. For each ATP molecule the pump uses, three positively charged sodium ions from the intracellular space are exchanged for two positively charged potassium ions from the extracellular space. This creates a net efflux of positive ions into the extracellular space, resulting in hyperpolarized potentials for the cell over time. For most cells, this does not pose an issue, as their firing rate is much slower, and other compensatory mechanisms and pumps can effectively restore the ion imbalances. However, in the electrocytes of weakly electric fish, which spike at exceptionally high rates, the net efflux of positive ions presents a challenge. Additionally, these cells are involved in critical communication and survival behaviors, underscoring their essential role in reliable functioning.

      In a computational model, the authors test four increasingly complex solutions to the problem of counteracting the hyperpolarized states that occur due to continuous NaK pump action to sustain baseline activity. First, they propose a solution for a well-matched Na leak channel that operates in conjunction with the NaK pump, counteracting the hyperpolarizing states naturally. Their model shows that when such an orchestrated Na leak current is not included, quick changes in the firing rates could have unexpected side effects. Secondly, they study the implications of this cell in the context of chirps, a means of communication between individual fish. Here, an upstream pacemaking neuron entrains the electrocyte to spike, which ceases to produce a so-called chirp, a brief pause in the sustained activity of the electrocytes. In their model, the authors demonstrate that including the extracellular potassium buffer is necessary to obtain a reliable chirp signal. Thirdly, they tested another means of communication in which there was a sudden increase in the firing rate of the electrocyte, followed by a decay to the baseline. For this to occur reliably, the authors emphasize that a strong synaptic connection between the pacemaker neuron and the electrocyte is necessary. Finally, since these cells are energy-intensive, they hypothesize that electrocytes may have energy-efficient action potentials, for which their NaK pumps may be sensitive to the membrane voltages and perform course correction rapidly.

      Strengths:

      The authors extend an existing electrocyte model (Joos et al., 2018) based on the classical Hodgkin and Huxley conductance-based models of sodium and potassium currents to include the dynamics of the sodium-potassium (NaK) pump. The authors estimate the pump's properties based on reasonable assumptions related to the leak potential. Their proposed solutions are valid and may be employed by weakly electric fish. The authors explore theoretical solutions to electrosensing behavior that compound and suggest that all these solutions must be simultaneously active for the survival and behavior of the fish. This work provides a good starting point for conducting in vivo experiments to determine which of these proposed solutions the fish employ and their relative importance. The authors include testable hypotheses for their computational models.

      Weaknesses:

      The model for action potential generation simplifies ion dynamics by considering only sodium and potassium currents, excluding other ions like calcium. The ion channels considered are assumed to be static, without any dynamic regulation such as post-translational modifications. For instance, a sodium-dependent potassium pump could modulate potassium leak and spike amplitude (Markham et al., 2013).

      This work considers only the sodium-potassium (NaK) pumps to restore ion gradients. However, in many cells, several other ion pumps, exchangers, and symporters are simultaneously present and actively participate in restoring ion gradients. When sodium currents dominate action potentials, and thus when NaK pumps play a critical role, such as the case in Eigenmannia virescens, the present study is valid. However, since other biological processes may find different solutions to address the pump's non-electroneutral nature, the generalizability of the results in this work to other fast-spiking cell types is limited. For example, each spike could include a small calcium ion influx that could be buffered or extracted via a sodium-calcium exchanger.

      We thank the reviewer for the detailed summary and the updated identified strengths and weaknesses. The current article indeed focuses on and isolates the interplay between sodium currents, potassium currents, and sodium-potassium pump currents. As discussed in section 5.1, in excitable cells where these currents are the main players in action-potential generation, the results presented in this article are applicable. The contribution of post-translational effects of ion channels, other ionic currents, and other active transporters and pumps, could be exciting avenues for further studies.

      Reviewer #2 (Recommendations for the authors):

      Thank you for addressing my comments.

      All the figures are now consistent. The color scheme used is clear.

      The expansions of the methods and discussion improve the paper.

      Including the model assumptions and simplifications is appreciated.

      Including internal references is helpful.

      The equations are clear, and the references have been fixed.

      I am content with the changes. I have updated my review accordingly.

      We thank the reviewer for their initial constructive comments that led to the significant improvement of the article.

      Page : 3 Line : 113 Author : Unknown Author 07/24/2025 

      Although this is technically correct, the article is about electrocommunication signals and does not focus on sensing.

      Page : 3 Line : 153 Author : Unknown Author 07/24/2025

      electrocommunication

      Page : 4 Line : 164 Author : Unknown Author 07/24/2025 

      Judging from the cited article, I think this should be a sodium-dependent potassium current.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors developed a sequence-based method to predict drug-interacting residues in IDPs, building on their recent method for predicting the transverse relaxation rates (R2) of IDPs, which was trained on 45 IDP sequences and their corresponding R2 values. The main finding is that IDPs interact with drugs mostly through aromatic residues, which is easy to understand, as most drugs contain aromatic rings. They validated the method using several case studies, and the predictions are in accordance with chemical shift perturbations and MD simulations. The location of the predicted residues serves as a starting point for ligand optimization.

      Strengths:

      This work provides the first sequence-based prediction method to identify potential drug-interacting residues in IDPs. The validity of the method is supported by case studies. It is easy to use, and no time-consuming MD simulations and NMR studies are needed.

      Weaknesses:

      The method does not depend on the information of binding compounds, which may give general features of IDP-drug binding. However, due to the size and chemical structures of the compounds (for example, how many aromatic rings), the number of interacting residues varies, which is not considered in this work. Lacking specific information may restrict its application in compound optimization, aiming to derive specific and potent binding compounds.

      We fully recognize that different compounds may have different interaction propensity profiles along the IDP sequence. In future studies, we will investigate compound-specific parameter values. The limiting factor is training data, but such data are beginning to be available.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors introduce DIRseq, a fast, sequence-based method that predicts drug-interacting residues (DIRs) in IDPs without requiring structural or drug information. DIRseq builds on the authors' prior work looking at NMR relaxation rates, and presumes that those residues that show enhanced R2 values are the residues that will interact with drugs, allowing these residues to be nominated from the sequence directly. By making small modifications to their prior tool, DIRseq enables the prediction of residues seen to interact with small molecules in vivo.

      Strengths:

      The preprint is well written and easy to follow

      Weaknesses:

      (1) The DIRseq method is based on SeqDYN, which itself is a simple (which I do not mean as a negative - simple is good!) statistical predictor for R2 relaxation rates. The challenge here is that R2 rates cover a range of timescales, so the physical intuition as to what exactly elevated R2 values mean is not necessarily consistent with "drug interacting". Presumably, the authors are not using the helix boost component of SeqDYN here (it would be good to explicitly state this). This is not necessarily a weakness, but I think it would behove the authors to compare a few alternative models before settling on the DIRseq method, given the somewhat ad hoc modifications to SeqDYN to get DIRseq.

      Actually, the factors that elevate R2 are well-established. These are local interactions and residual secondary structures (if any). The basic assumption of our method is that intra-IDP interactions that elevate R2 convert to IDP-drug interactions. This assumption was supported by our initial observation that the drug interaction propensity profiles predicted using the original SeqDYN parameters already showed good agreement with CSP profiles. We only made relatively small adjustments to the parameters to improve the agreement. Indeed, we did not apply the helix boost portion of SeqDYN to DIRseq, and now state this explicitly (p. 4, second last paragraph). We now also compare DIRseq with several alternative models, as summarized in new Table S2.

      Specifically, the authors previously showed good correlation between the stickiness parameter of Tesei et al. and the inferred "q" parameter for SeqDYN; as such, I am left wondering if comparable accuracy would be obtained simply by taking the stickiness parameters directly and using these to predict "drug interacting residues", at which point I'd argue we're not really predicting "drug interacting residues" as much as we're predicting "sticky" residues, using the stickiness parameters. It would, I think, be worth the authors comparing the predictive power obtained from DIRseq with the predictive power obtained by using the lambda coefficients from Tesei et al. in the model, local density of aromatic residues, local hydrophobicity (note that Tesei et al. have tabulated a large set of hydrophobicity scores!) and the raw SeqDYN predictions. In the absence of lots of data to compare against, this is another way to convince readers that DIRseq offers reasonable predictive power.

      We now compare predictions of these various parameter sets, and report the results in Table S2.  In short, among all the tested parameter sets, DIRseq has the best performance as measured by (1) strong correlations between prediction scores and CSPs and (2) high true positives and low false positives (p. 7-9).
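
      The two criteria named above can be sketched as follows. This is not the authors' evaluation code: the score and CSP values are synthetic placeholders, and the cutoffs used to call a residue "predicted" or "observed" are arbitrary assumptions chosen only to make the example concrete.

```python
# Sketch only: comparing per-residue prediction scores with CSPs.
# All numbers below are synthetic; the cutoffs are arbitrary assumptions.
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def confusion_counts(scores, csps, score_cutoff, csp_cutoff):
    """Return (true positives, false positives, false negatives)."""
    predicted = [s >= score_cutoff for s in scores]
    observed = [c >= csp_cutoff for c in csps]
    tp = sum(p and o for p, o in zip(predicted, observed))
    fp = sum(p and not o for p, o in zip(predicted, observed))
    fn = sum(o and not p for p, o in zip(predicted, observed))
    return tp, fp, fn


# Synthetic six-residue example.
toy_scores = [0.9, 1.0, 1.6, 1.8, 1.1, 0.8]
toy_csps = [0.01, 0.02, 0.08, 0.09, 0.05, 0.01]
toy_r = pearson(toy_scores, toy_csps)
toy_counts = confusion_counts(toy_scores, toy_csps, score_cutoff=1.5, csp_cutoff=0.04)
```

      Reporting both quantities is useful because a high correlation can coexist with many false positives if the score cutoff is poorly chosen, and vice versa.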

      (2) Second, the DIRseq is essentially SeqDYN with some changes to it, but those changes appear somewhat ad hoc. I recognize that there is very limited data, but the tweaking of parameters based on physical intuition feels a bit stochastic in developing a method; presumably (while not explicitly spelt out) those tweaks were chosen to give better agreement with the very limited experimental data (otherwise why make the changes?), which does raise the question of if the DIRseq implementation of SeqDYN is rather over-parameterized to the (very limited) data available now? I want to be clear, the authors should not be critiqued for attempting to develop a model despite a paucity of data, and I'm not necessarily saying this is a problem, but I think it would be really important for the authors to acknowledge to the reader the fact that with such limited data it's possible the model is over-fit to specific sequences studied previously, and generalization will be seen as more data are collected.

      We have explained the rationale for the parameter tweaks, which were limited to q values for four amino-acid types, i.e., to deemphasize hydrophobic interactions and slightly enhance electrostatic interactions (p. 4-5). We now add that these tweaks were motivated by observations from MD simulations of drug interactions with α-syn (ref 13). As already noted in the response to the preceding comment, we now also present results for the original parameter values as well as for when the four q values are changed one at a time.

      (3) Third, perhaps my biggest concern here is that - implicit in the author's assumptions - is that all "drugs" interact with IDPs in the same way and all drugs are "small" (motivating the change in correlation length). Prescribing a specific length scale and chemistry to all drugs seems broadly inconsistent with a world in which we presume drugs offer some degree of specificity. While it is perhaps not unexpected that aromatic-rich small molecules tend to interact with aromatic residues, the logical conclusion from this work, if one assumes DIRseq has utility, is that all IDRs bind drugs with similar chemical biases. This, at the very least, deserves some discussion.

      The reviewer raises a very important point. In Discussion, we now add that it is important to further develop DIRseq to include drug-specific parameters when data for training become available (p. 12-13). To illustrate this point, we use drug size as a simple example, which can be modeled by making the b parameter dependent on drug molecule size.

      (4) Fourth, the authors make some general claims in the introduction regarding the state of the art, which appear to lack sufficient data to be made. I don't necessarily disagree with the author's points, but I'm not sure the claims (as stated) can be made absent strong data to support them. For example, the authors state: "Although an IDP can be locked into a specific conformation by a drug molecule in rare cases, the prevailing scenario is that the protein remains disordered upon drug binding." But is this true? The authors should provide evidence to support this assertion, both examples in which this happens, and evidence to support the idea that it's the "prevailing view" and specific examples where these types of interactions have been biophysically characterized.

      We now cite nine studies showing that IDPs remain disordered upon drug binding.

      Similarly, they go on to say:

      "Consequently, the IDP-drug complex typically samples a vast conformational space, and the drug molecule only exhibits preferences, rather than exclusiveness, for interacting with subsets of residues." But again, where is the data to support this assertion? I don't necessarily disagree, but we need specific empirical studies to justify declarative claims like this; otherwise, we propagate lore into the scientific literature. The use of "typically" here is a strong claim, implying most IDP complexes behave in a certain way, yet how can the authors make such a claim? 

      Here again we add citations to support the statement.

      Finally, they continue to claim:

      "Such drug interacting residues (DIRs), akin to binding pockets in structured proteins, are key to optimizing compounds and elucidating the mechanism of action." But again, is this a fact or a hypothesis? If the latter, it must be stated as such; if the former, we need data and evidence to support the claim.

      We add citations to both compound optimization and mechanism of action.

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should compare the sequences of the IDPs in the case studies with the 45 IDPs in training the SeqDYN model to make sure that they are not included in the training dataset or are highly homologous.

      Please note that the data used for training SeqDYN were R2 rates, which are independent of the property being studied here, i.e., drug interacting residues. Therefore whether the IDPs studied here were in the training set for SeqDYN is immaterial.

      (2) The authors manually tuned four parameters in SeqDYN to develop the model for predicting drug-interacting residues without giving strict testing or explanations. More explanations, testing of more values, and ablation testing should be given.

      As responded above, we now both expand the explanation and present more test results.

      (3) The authors changed the q values of L, I, and M to the value of V. What are the results if these values are not changed?

      These results are shown in Table S2 (entry named SeqDYN_orig).

      (4) Only one b value is chosen based on the assumption that a drug molecule interacts with 3-4 residues at a time. However, the number of interacting residues is related to the size of the drug molecule. Adjusting the b value with the size of the ligand may provide improvement. It is better to test the influence of adjusting b values. At least, this should be discussed.

      Good point! We now state that b potentially can be adjusted according to ligand size (p. 12-13). In addition, we also show the effect of varying b on the prediction results (Table S2; p. 8, last paragraph).

      (5) The authors add 12 Q to eliminate end effects. However, explanations on why 12 Qs are chosen should be given. How about other numbers of Q or using other residues (e.g., the commonly used residues in making links, like GS/PS or A)?

      As we already explained, “Gln was selected because its 𝑞 value is at the middle of the 20 𝑞 values.” (p. 5, second paragraph). Also, 12 Qs are sufficient to remove any end effects; a higher number of Qs does not make any difference.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors make reference to the "C-terminal IDR" in cMyc, but the region they note is found in the bHLH DNA binding domain (which falls from residue ~370-420).

      We now clarify that this region is disordered on its own but forms a helix-loop-helix structure upon heterodimerization with Max (p. 11, last paragraph).

      (2) Given the fact that X-seq names are typically associated with sequencing-based methods, it's perhaps confusing to name this method DIRseq?

      We appreciate the reviewer’s point, but by now the preprint posted in bioRxiv is in wide circulation, and the DIRseq web server has been up for several months, so changing its name would cause a great deal of confusion.

      (3) I'd encourage the authors just to spell out "drug interacting residues" and retain an IDR acronym for IDRs. Acronyms rarely make writing clearer, and asking folks to constantly flip between IDR and DIR is asking a lot of an audience (in this reviewer's opinion, anyway).

      The reviewer makes a good point; we now spell out “drug-interacting residues”.

      (4) The assumption here is that CSPs result from direct drug:IDR interactions. However, CSPs result from a change in the residue chemical environment, which could in principle be an indirect effect (e.g., in the unbound state, residues A and B interact; in the bound state, residue A is now free, such that it experiences a CSP despite not engaging directly). While I recognize such assumptions are commonly made, it behoves the authors to explicitly make this point so the reader understands the relationship between CSPs and binding.

      We have added caveats about CSPs to the Introduction (p. 3, second paragraph).

      (5) On the figures, please label which protein is which figure, as well as provide a legend for the annotations on the figures (red line, blue bar, cyan region, etc.)

      We now label protein names in Fig. 1. The annotations of display items are explained in the captions of Figs. 2 and 3; we now add them to the Fig. 4 caption as well.

      (6) abstract: "These successes augur well for deciphering the sequence code for IDP-drug binding." - This is not grammatically correct, even if augur were changed to agree. Suggest rewriting.

      “Augur well” means to be a good sign (for something). We use this phrase here in this meaning.

      (6) page 5: "we raised the 𝑞 value of Asp to be the same as that of Glu" → suggested "increased" instead of raised.

      We have made the suggested change.

      (7) The authors should consider releasing the source code (it is available via the .js implementation on the server, but this is not very transferable/shareable, so I'd encourage the authors to provide a stand-alone implementation that's explicitly shareable).

      We have now added a link for the user to download the source code.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      The authors examine the effect of cell-free chromatin particles (cfChPs) derived from human serum or from dying human cells on mouse cells in culture and propose that these cfChPs can serve as vehicles for cell-to-cell active transfer of foreign genetic elements. The work presented in this paper is intriguing and potentially important, but it is incomplete. At this stage, the claim that horizontal gene transfer can occur via cfChPs is not well supported because it is only based on evidence from one type of methodological approach (immunofluorescence and fluorescent in situ hybridization (FISH)) and is not validated by whole genome sequencing.

      We disagree with the eLife assessment that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous “satellite genomes” that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate technology. Rather, eLife should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Horizontal gene transfer is the transmission of genetic material between organisms through ways other than reproduction. Frequent in prokaryotes, this mode of genetic exchange is scarcer in eukaryotes, especially in multicellular eukaryotes. Furthermore, the mechanisms involved in eukaryotic HGT are unknown. This article by Banerjee et al. claims that HGT occurs massively between cells of multicellular organisms. According to this study, the cell free chromatin particles (cfChPs) that are massively released by dying cells are incorporated in the nucleus of neighboring cells.

      The reviewer is mistaken. We do not claim that the internalized cfChPs are incorporated into the nucleus. We show throughout the paper that the cfChPs perform their novel functions autonomously outside the genome without being incorporated into the nucleus. This is clearly seen in all our chromatin fibre images, metaphase spreads and our video abstract. Occasionally, when the cfChP fluorescent signals overlie the chromosomes, we have been careful to state that the cfChPs are associated with the chromosomes without implying that they have integrated.

      These cfChPs are frequently rearranged and amplified to form concatemers, they are made of open chromatin, expressed, and capable of producing proteins. Furthermore, the study also suggests that cfChPs transmit transposable elements (TEs) between cells on a regular basis, and that these TEs can transpose, multiply, and invade receiving cells. These conclusions are based on a series of experiments consisting in releasing cfChPs isolated from various human sera into the culture medium of mouse cells, and using FISH and immunofluorescence to monitor the state and fate of cfChPs after several passages of the mouse cell line.

      Strengths:

      The results presented in this study are interesting because they may reveal unsuspected properties of some cell types that may be able to internalize free-circulating chromatin, leading to its chromosomal incorporation, expression, and unleashing of TEs. The authors propose that this phenomenon may have profound impacts in terms of diseases and genome evolution. They even suggest that this could occur in germ cells, leading to within-organism HGT with long-term consequences.

      Again the reviewer makes the same mistake. We do not claim that the internalized cfChPs are incorporated into the chromosomes. We have addressed this issue above.

      We have a feeling that the reviewer has not understood our work – which is the discovery of “satellite genomes” which function autonomously outside the nuclear genome.

      Weaknesses:

      The claims of massive HGT between cells through internalization of cfChPs are not well supported because they are only based on evidence from one type of methodological approach: immunofluorescence and fluorescent in situ hybridization (FISH) using protein antibodies and DNA probes. Yet, such strong claims require validation by at least one, but preferably multiple, additional orthogonal approaches. This includes, for example, whole genome sequencing (to validate concatemerization, integration in receiving cells, transposition in receiving cells), RNA-seq (to validate expression), ChiP-seq (to validate chromatin state).

      We disagree with the reviewer that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous “satellite genomes” that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate approach. Rather, the reviewer should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed on Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism. Yet, telomere-to-telomere genomes have been produced for many eukaryote species, calling into question the conclusions of this study.

      The reviewer has raised a related issue below and we have responded to both of them together.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I thank the authors for taking my comments and those of the other reviewer into account and for adding new material to this new version of the manuscript. Among other modifications/additions, they now mention that they think that NIH3T3 cells treated with cfChPs die out after 250 passages because of genomic instability which might be caused by horizontal transfer of cfChPs DNA into the genome of treated cells (pp. 45-46, lines 725-731). However, no definitive formal proof of genomic instability and horizontal transfer is provided.

      We mention that the NIH3T3 cells treated with cfChPs die out after 250 passages in response to the reviewer’s earlier comment “Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed in Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism”.

      We have agreed with the reviewer and have simply speculated that the cells may die because of extreme genomic instability. We have left it as a speculation without diverting our paper in a different direction to prove genomic instability.

      The authors now refer to an earlier study they conducted in which they Illumina-sequenced NIH3T3 cells treated with cfChPs (pp. 48, lines. 781-792). This study revealed the presence of human DNA in the mouse cell culture. However, it is unclear to me how the authors can conclude that the human DNA was inside mouse cells (rather than persisting in the culture medium as cfChPs) and it is also unclear how this supports horizontal transfer of human DNA into the genome of mouse cells. Horizontal transfer implies integration of human DNA into mouse DNA, through the formation of phosphodiester bonds between human nucleotides and mouse nucleotides. The previous Illumina-sequencing study and the current study do not show that such integration has occurred. I might be wrong but I tend to think that DNA FISH signals showing that human DNA lies next to mouse DNA do not necessarily imply that human DNA has integrated into mouse DNA. Perhaps such signals could result from interactions at the protein level between human cfChPs and mouse chromatin?

      With due respect, our earlier genome sequencing study that the reviewer refers to was done on two single cell clones developed following treatment with cfChPs. So, the question of cfChPs lurking in the culture medium does not arise.

      The authors should be commended for doing so many FISH experiments. But in my opinion, and as already mentioned in my earlier review of this work, horizontal transfer of human DNA into mouse DNA should first be demonstrated by strong DNA sequencing evidence (multiple long and short reads supporting human/mouse breakpoints; discarding technical DNA chimeras) and only then eventually confirmed by FISH.

      As mentioned earlier, we disagree with the reviewer that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous “satellite genomes” that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate approach. Rather, the reviewer should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Regarding my comment on the quantity of human cfChPs that has been used for the experiments, the authors replied that they chose this quantity because it worked in a previous study. Could they perhaps explain why they chose this quantity in the earlier study? Is there any biological reason to choose 10 ng and not more or less? Is 10 ng realistic biologically? Could it be that 10 ng is orders of magnitude higher than the quantity of cfChPs normally circulating in multicellular organisms and that this could explain, at least in part, the results obtained in this study?

      The reviewer again raises the same issue, which we have already addressed in our revised manuscript. To quote: “We chose to use 10ng based on our earlier report in which we had obtained robust biological effects such as activation of DDR and activation of apoptotic pathways using this concentration of cfChPs (Mittra I et al., 2015)”.

      It is also mentioned in the response that RNA-seq has been performed on mouse cells treated with cfChPs, and that this confirms human-mouse fusion (genomic integration). Since these results are not included in the manuscript, I cannot judge how robust they are and whether they reflect a biological process rather than technical issues (technical chimeras formed during the RNA-seq protocol are a well-known artifact). In any case, I do not think that genomic integration can be demonstrated through RNA-seq, as junctions between human and mouse RNA could occur at the RNA level (i.e. after transcription). RNA-seq could however show whether human-mouse chimeras that have been validated by DNA-sequencing are expressed or not.

      We did perform transcriptome sequencing as suggested earlier by the reviewer, but realized that the amount of material required to be incorporated into the manuscript to include “material and methods”, “results”, “discussion”, “figures” and “legends to figures” and “supplementary figures and tables” would be so massive that it will detract from the flow of our work and hijack it in a different direction. We have, therefore, decided to publish the transcriptome results as a separate manuscript.

      Given these comments, I believe that most of the weaknesses I mentioned in my review of the first version of this work still hold true.

      An important modification is that the work has been repeated in other cell lines, hence I removed this criticism from my earlier review.

      Additional changes made

      (1) We have now rewritten the “Abstract” to 250 words to fit in eLife’s instructions. (It was not possible to reduce the word count further.)

      (2) We have provided Video 1 as a separate file instead of a link.

      (3) Some of Figure Supplements (which were stand-alone) are now given as main figures. We have re-arranged Figures and Figure Supplements in accordance with eLife’s instructions.

      (4) We have now provided a list of the various cell lines used in this study, their tissue origin and procurement source in Supplementary File 3.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Horizontal gene transfer is the transmission of genetic material between organisms through ways other than reproduction. Frequent in prokaryotes, this mode of genetic exchange is scarcer in eukaryotes, especially in multicellular eukaryotes. Furthermore, the mechanisms involved in eukaryotic HGT are unknown. This article by Banerjee et al. claims that HGT occurs massively between cells of multicellular organisms. According to this study, the cell free chromatin particles (cfChPs) that are massively released by dying cells are incorporated in the nucleus of neighboring cells. These cfChPs are frequently rearranged and amplified to form concatemers, they are made of open chromatin, expressed, and capable of producing proteins. Furthermore, the study also suggests that cfChPs transmit transposable elements (TEs) between cells on a regular basis, and that these TEs can transpose, multiply, and invade receiving cells. These conclusions are based on a series of experiments consisting in releasing cfChPs isolated from various human sera into the culture medium of mouse cells, and using FISH and immunofluorescence to monitor the state and fate of cfChPs after several passages of the mouse cell line.

      Strengths:

      The results presented in this study are interesting because they may reveal unsuspected properties of some cell types that may be able to internalize free-circulating chromatin, leading to its chromosomal incorporation, expression, and unleashing of TEs. The authors propose that this phenomenon may have profound impacts in terms of diseases and genome evolution. They even suggest that this could occur in germ cells, leading to within-organism HGT with long-term consequences.

      Weaknesses:

      The claims of massive HGT between cells through internalization of cfChPs are not well supported because they are only based on evidence from one type of methodological approach: immunofluorescence and fluorescent in situ hybridization (FISH) using protein antibodies and DNA probes. Yet, such strong claims require validation by at least one, but preferably multiple, additional orthogonal approaches. This includes, for example, whole genome sequencing (to validate concatemerization, integration in receiving cells, transposition in receiving cells), RNA-seq (to validate expression), ChiP-seq (to validate chromatin state).

      We have responded to this criticism under “Reviewer #1 (Recommendations for the authors, item no. 1-4)”.

      Another weakness of this study is that it is performed only in one receiving cell type (NIH3T3 mouse cells). Thus, rather than a general phenomenon occurring on a massive scale in every multicellular organism, it could merely reflect aberrant properties of a cell line that for some reason became permeable to exogenous cfChPs. This begs the question of the relevance of this study for living organisms.

      We have responded to this criticism under “Reviewer #1 (Recommendations for the authors, item no. 6)”.

      Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed in Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism. Yet, telomere-to-telomere genomes have been produced for many eukaryote species, calling into question the conclusions of this study.

      The reviewer is right in expecting that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome. This is indeed the case, and we find that beyond ~ 250 passages the cfChPs treated NIH3T3 cells begin to die out, apparently because their genomes have become too unstable for survival. This point will be highlighted in the revised version (pp. 45-46, lines 725-731).

      Reviewer #2 (Public review):

      I must note that my comments pertain to the evolutionary interpretations rather than the study's technical results. The techniques appear to be appropriately applied and interpreted, but I do not feel sufficiently qualified to assess this aspect of the work in detail.

      I was repeatedly puzzled by the use of the term "function." Part of the issue may stem from slightly different interpretations of this word in different fields. In my understanding, "function" should denote not just what a structure does, but what it has been selected for. In this context, where it is unclear if cfChPs have been selected for in any way, the use of this term seems questionable.

      We agree. We have removed the term “function” wherever we felt we had used it inappropriately.

      Similarly, the term "predatory genome," used in the title and throughout the paper, appears ambiguous and unjustified. At this stage, I am unconvinced that cfChPs provide any evolutionary advantage to the genome. It is entirely possible that these structures have no function whatsoever and could simply be byproducts of other processes. The findings presented in this study do not rule out this neutral hypothesis. Alternatively, some particular components of the genome could be driving the process and may have been selected to do so. This brings us to the hypothesis that cfChPs could serve as vehicles for transposable elements. While speculative, this idea seems to be compatible with the study's findings and merits further exploration.

      We agree with the reviewer’s viewpoint. We have replaced the term “predatory genome” with a more realistic term “satellite genome” in the title and throughout the manuscript. We have also thoroughly revised the discussion section and elaborated on the potential role of LINE-1 and Alu elements carried by the concatemers in mammalian evolution. (pp. 46-47, lines 743-756).

      I also found some elements of the discussion unclear and speculative, particularly the final section on the evolution of mammals. If the intention is simply to highlight the evolutionary impact of horizontal transfer of transposable elements (e.g., as a source of new mutations), this should be explicitly stated. In any case, this part of the discussion requires further clarification and justification.

      As mentioned above, we have revised the “discussion” section taking into account the issues raised by the reviewer and highlighted the potential role of cfChPs in evolution by acting as vehicles of transposable elements.

      In summary, this study presents important new findings on the behavior of cfChPs when introduced into a foreign cellular context. However, it overextends its evolutionary interpretations, often in an unclear and speculative manner. The concept of the "predatory genome" should be better defined and justified or removed altogether. Conversely, the suggestion that cfChPs may function at the level of transposable elements (rather than the entire genome or organism) could be given more emphasis.

      As mentioned above, we have replaced the term “predatory genome” with “satellite genome” and revised the “discussion” section taking into account the issues raised by the reviewer.

      Reviewer #1 (Recommendations for the authors):

      (1) I strongly recommend validating the findings of this study using other approaches. Whole genome sequencing using both short and long reads should be used to validate the presence of human DNA in the mouse cell line, as well as its integration into the mouse genome and concatemerization. Breakpoints between mouse and human DNA can be searched in individual reads. Finding these breakpoints in multiple reads from two or more sequencing technologies would strengthen their biological origin. Illumina and ONT sequencing are now routinely performed by many labs, such that this validation should be straightforward. In addition to validating the findings of the current study, it would allow performance of an in-depth characterization of the rearrangements undergone by both human cfChPs and the mouse genome after internalization of cfChPs, including identification of human TE copies integrated through bona fide transposition events into the mouse genome. New copies of LINE and Alu TEs should be flanked by target site duplications. LINE copies should be frequently 5' truncated, as observed in many studies of somatic transposition in human cells.

      (2) Furthermore, should the high level of cell-to-cell HGT detected in this study occur on a regular basis within multicellular organisms, validating it through a reanalysis of whole genome sequencing data available in public databases should be relatively easy. One would expect to find a high number of structural variants that for some reason have so far gone under the radar.

      (3) Short and long-read RNA-seq should be performed to validate the expression of human cfChPs in mouse cells. I would also recommend performing ChIP-seq on routinely targeted histone marks to validate the chromatin state of human cfChPs in mouse cells.

      (4) The claim that fused human proteins are produced in mouse cells after exposing them to human cfChPs should be validated using mass spectrometry.

      The reviewer has suggested a plethora of techniques to validate our findings. Clearly, it is neither possible to undertake all of them nor to incorporate them into the manuscript. However, as suggested by the reviewer, we did conduct transcriptome sequencing of cfChPs treated NIH3T3 cells and were able to detect the presence of human-human fusion sequences (representing concatemerisation) as well as human-mouse fusion sequences (representing genomic integration). However, we realized that the amount of material required to be incorporated into the manuscript to include “material and methods”, “results”, “discussion”, “figures” and “legends to figures” and “supplementary figures and tables” would be so massive that it will detract from the flow of our work and hijack it in a different direction. We have, therefore, decided to publish the transcriptome results as a separate manuscript. However, to address the reviewer’s concerns we have now referred to results of our earlier whole genome sequencing study of NIH3T3 cells similarly treated with cfChPs wherein we had conclusively detected the presence of human DNA and human Alu sequences in the treated mouse cells. These findings have now been added as an independent paragraph (pp. 48, lines. 781-792).

      (5) It is unclear from what is shown in the paper (increase in FISH signal intensity using Alu and L1 probes) if the increase in TE copy number is due to bona fide transposition or to amplification of cfChPs as a whole, through mechanisms other than transposition. It is also unclear whether human TEs end up being integrated into the neighboring mouse genome. This should be validated by whole genome sequencing.

      Our results suggest that TEs amplify and increase their copy number due to their association with DNA polymerase and their ability to synthesize DNA (Figure 14a and b). Our study design cannot demonstrate transposition, which would require real-time imaging.

      The possibility of incorporation of TEs into the mouse genome is supported by our earlier genome sequencing work, referred to above, wherein we detected multiple human Alu sequences in the mouse genome (pp. 48, lines. 781-792).

      (6) In order to be able to generalize the findings of this study, I strongly encourage the authors to repeat their experiments using other cell types.

      We thank the reviewer for this suggestion. We have now used four different cell lines derived from four different species and demonstrated that horizontal transfer of cfChPs occurs in all of them, suggesting that it is a universal phenomenon (pp. 37, lines 560-572; Supplementary Fig. S14a-d).

      We have also mentioned this in the abstract (pp. 3, lines 52-54).

      (7) Since the results obtained when using cfChPs isolated from healthy individuals are identical to those shown when using cfChPs from cancer sera, I wonder why the authors chose to focus mainly on results from cancer-derived cfChPs and not on those from healthy sera.

      Most of the experiments were conducted using cfChPs isolated from cancer patients because of our especial interest in cancer, and our earlier results (Mittra et al., 2015), which had shown that cfChPs isolated from cancer patients had significantly greater activity in terms of DNA damage and activation of apoptotic pathways than those isolated from healthy individuals. We have now incorporated the above justification (pp. 6, lines. 124-128).

      (8) Line 125: how was the 10-ng quantity (of human cfChPs added to the mouse cell culture) chosen and how does it compare to the quantity of cfChPs normally circulating in multicellular organisms?

      We chose to use 10ng based on our earlier report in which we had obtained robust biological effects such as activation of DDR and apoptotic pathways using this concentration of cfChPs (Mittra I et al., 2015). We have now incorporated the justification of using this dose in our manuscript (pp. 51-52, lines. 867-870).

      (9) Could the authors explain why they repeated several of their experiments in metaphase spreads, in addition to interphase?

      We conducted experiments on metaphase spreads in addition to those on chromatin fibres because of the current heightened interest in extra-chromosomal DNA in cancer, studies of which have largely been based on metaphase spreads. We were interested to see how the cfChP concatemers might relate to the characteristics of cancer extrachromosomal DNA and whether the latter in fact represent cfChP concatemers acquired from surrounding dying cancer cells. We have now mentioned this on pp. 7, lines 150-155.

      (10) Regarding negative controls consisting in checking whether human probes cross-react with mouse DNA or proteins, I suggest that the stringency of washes (temperature, reagents) should be clearly stated in the manuscript, such that the reader can easily see that it was identical for controls and positive experiments.

      We were fully aware of these issues and were careful to ensure that washing steps were conducted meticulously. The careful washing steps have been repeatedly emphasized under the section on “Immunofluorescence and FISH” (pp. 54-55, lines. 922-944).

      (11) I am not an expert in Immuno-FISH and FISH with ribosomal probes but it can be expected that ribosomal RNA and RNA polymerase are quite conserved (and thus highly similar) between humans and mice. A more detailed explanation of how these probes were designed to avoid cross-reactivity would be welcome.

      We were aware of this issue and conducted a negative control experiment to ensure that the human ribosomal RNA probe and RNA polymerase antibody did not cross-react with mouse. Please see Supplementary Fig. S4c.

      (12) Finally, I could not understand why the cfChPs internalized by neighboring cells are called predatory genomes. I could not find any justification for this term in the manuscript.

      We agree, and this criticism has also been made by Reviewer #2. We have now replaced the term “predatory” genomes with “satellite” genomes.

      Reviewer #2 (Recommendations for the authors):

      (1) P2 L34: The term "role" seems to imply "what something is supposed to do" (similar to "function"). Perhaps "impact" would be more neutral. Additionally, "poorly defined" is vague; do you mean "unknown"?

      We thank the reviewer for this suggestion. We have now rephrased the sentence to read “Horizontal gene transfer (HGT) plays an important evolutionary role in prokaryotes, but it is thought to be less frequent in mammals.” (pp. 2, lines. 26-27).

      (2) P2 L35: It seems that the dash should come after "human blood."

      Thank you, we have changed the position of the dash (pp. 2, line. 29).

      (3) P2 L37: Must we assume these structures have a function? Could they not simply be side effects of other processes?

      We think this is a matter of semantics, especially since we show that cfChPs once inside the cell perform many functions such as replication, DNA synthesis, RNA synthesis, protein synthesis etc. We, therefore, think the word “function” is not inappropriate.

      (4) Abstract: After reading the abstract, I am unclear on the concept of a "predatory genome." Based on the summarized results, it seems one cannot conclude that these elements provide any adaptive value to the genome.

      We agree. We have now replaced the term “predatory” genomes with a more realistic term viz. “satellite” genomes.

      (5) Video abstract: The video abstract does not currently stand on its own and needs more context to be self-explanatory.

      Thank you for pointing this out. We have now created a new and much more professional video with more context which we hope will meet with the reviewer’s approval.

      (6) P4 L67: Again, I am uncertain that HGT should be said to have "a role" in mammals, although it clearly has implications and consequences. Perhaps "role" here is intended to mean "consequence"?

      We have now changed the sentence to read as follows “However, defining the occurrence of HGT in mammals has been a challenge” (pp. 4, line. 73).

      (7) P6 L111: The phrase "to obtain a new perspective about the process of evolution" is unclear. What exactly is meant by this statement?

      We have replaced this sentence altogether which now reads “The results of these experiments are presented in this article which may help to throw new light on mammalian evolution, ageing and cancer” (pp. 5-6, lines 116-118).

      (8) P38 L588: The term "predatory genome" has not been defined, making it difficult to assess its relevance.

      This issue has been addressed above.

      (9) P39 L604: The statement "transposable elements are not inherent to the cell" suggests that some TEs could originate externally, but this does not rule out that others are intrinsic. In other words, TEs are still inherent to the cell.

      This part of the discussion section has been rewritten and the above sentence has been deleted.

      (10) P39 L609: The phrase "may have evolutionary functions by acting as transposable elements" is unclear. Perhaps it is meant that these structures may serve as vehicles for TEs?

      This sentence has disappeared altogether in the revised discussion section.

      (11) P41 L643: "Thus, we hypothesize ... extensively modified to act as foreign genetic elements." This sentence is unclear. Are the authors referring to evolutionary changes in mammals in general (which overlooks the role of standard mutational processes)? Or is it being proposed that structural mutations (including TE integrations) could be mediated by cfChPs in addition to other mutational mechanisms?

      We have replaced this sentence which now reads “Thus, “within-self” HGT may occur in mammals on a massive scale via the medium of cfChP concatemers that have undergone extensive and complex modifications resulting in their behaviour as “foreign” genetic elements” (pp. 47, lines 763-766).

      (12) P41 L150: The paragraph beginning with "It has been proposed that extreme environmental..." transitions too abruptly from HGT to adaptation. Is it being proposed that cfChPs are evolutionary processes selected for their adaptive potential? This idea is far too speculative at this stage and requires clarification.

      We agree. This paragraph has been removed.

      (13) P43 L681: This summary appears overly speculative and unclear, particularly as the concept of a "predatory genome" remains undefined and thus cannot be justified. It suggests that cfChPs represent an alternative lifestyle for the entire genome, although alternative explanations seem far more plausible at this point.

      We have now replaced the term “predatory” genome with “satellite” genome. The relevant part of the summary section has also been partially revised (pp. 49-50, lines 817-831).

      Changes independent of reviewers’ comments.

      We have made the following additions / modifications.

      (1) The abstract has been modified and its “conclusion” section has been rewritten.

      (2) Section 1.14 has been newly added together with accompanying Figures 15 a,b and c.

      (3) The “Discussion” section has been greatly modified and parts of it have been rewritten.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The present work studies the coevolution of HIV-1 and the immune response in clinical patient data. Using the Marginal Path Likelihood (MPL) framework, they infer selection coefficients for HIV mutations from time-series data of virus sequences as they evolve in a given patient.

      Strengths:

      The authors analyze data from two human patients, consisting of HIV population sequence samples at various points in time during the infection. They infer selection coefficients from the observed changes in sequence abundance using MPL. Most beneficial mutations appear in viral envelope proteins. The authors also analyze SHIV samples in rhesus macaques, and find selection coefficients that are compatible with those found in the corresponding human samples.

      Weaknesses:

      The MPL method used by the authors considers only additive effects of mutations, thus ignoring epistasis.

      As suggested, we have now addressed this limitation by inferring epistatic fitness landscapes for CH505, CH848, SHIV.CH505, and SHIV.CH848. Indeed, the computational burden of the epistasis inference procedure was one constraint that motivated us to consider only additive fitness in the previous version of our paper. The original approach developed by Sohail et al. (2022) tested only sequences with <50 sites due to this limitation, far smaller than the ones we consider. Beyond this computational constraint, we also believed that 1) an additive fitness model may suffice to capture local fitness landscapes, and practically, 2) epistatic interactions are more challenging to validate than the effects of individual mutations, making the interpretation of the model more complex.

      However, after performing the analyses described in this paper, we developed a new approach for identifying epistatic interactions that can scale to much longer sequences (Shimagaki et al., Genetics, in press). We therefore applied this method to infer an epistatic fitness landscape for the HIV and SHIV data sets that we studied. As in that work, we focused on short-range (<50 bp) interactions which we could more confidently estimate from data. We have added a section in the SI describing the epistatic fitness model and our analysis. 

      Overall, we found substantial agreement between the epistatic and purely additive models in terms of the estimated fitness effects of individual mutations (new Supplementary Fig. 8) and overall fitness (Supplementary Fig. 9). Consistent with our prior work, we did not find substantial evidence for very strong epistatic interactions (Supplementary Fig. 10). This does not necessarily mean that strong epistatic interactions do not exist; rather, this shows that strong interactions don’t substantially improve the fit of the model to data, and thus many are regularized toward zero. While the biological validation of epistatic interactions is challenging, we found that the largest epistatic interactions, which we defined as the top 1% of all short-range interactions, were modestly but significantly enriched in the CD4 binding site, V1 and V5 regions for CH505 and in the CD4 binding site, V4, and V5 for CH848. In addition, mutation pairs N280S/V281A and E275K/V281G, which confer resistance to CH235, ranked in the top 15% of all epistatic interactions in CH505.

      We have now included an additional section in the Results, “Robustness of inferred selection to changes in the fitness model and finite sampling”, which discusses our epistatic analyses (page 6, lines 415-464), along with the above Supplementary Figures and a technical section in the SI summarizing the epistasis inference approach.

      Although the evolution of broadly neutralizing antibodies (bnAbs) is a motivating question in the introduction and discussion sections (and the title), the relevance of the analysis and results to better understanding how bnAbs arise is not clear. The only result presented in direct connection to bnAbs is Figure 6.

      It is true that, while bnAb development is a major motivator of our study, our analysis focuses on HIV-1 and does not directly consider antibody evolution. We have now brought attention to this point as a limitation directly in the Discussion. Following the suggestion below in the “Recommendations for the authors,” we have edited our manuscript to place more emphasis on viral fitness and somewhat reduce the emphasis on bnAbs, though this remains an important motivating factor. Specifically, the Abstract now begins

      Human immunodeficiency virus (HIV)-1 evolves within individual hosts to escape adaptive immune responses while maintaining its capacity for replication. Coevolution between HIV-1 and the immune system generates extraordinary viral genetic diversity. In some individuals, this process also results in the development of broadly neutralizing antibodies (bnAbs) that can neutralize many viral variants, a key focus of HIV-1 vaccine design. However, a general understanding of the forces that shape virus-immune coevolution within and across hosts remains incomplete. Here we performed a quantitative study of HIV-1 evolution in humans and rhesus macaques, including individuals who developed bnAbs.

      We have similarly modified the Discussion to focus first on viral fitness. In response to comments from Reviewer 3, we have also more clearly articulated how our work might contribute to the understanding of bnAb development in the Discussion.

      Questions or suggestions for further discussion:

      I list here a number of points for which I believe the paper would benefit if additional discussion/results were included.

      The MPL method used by the authors considers only additive effects of mutations, thus ignoring epistasis. In Sohail et al (2022) MBE 39(10), p. msac199  (https://doi.org/10.1093/molbev/msac199) an extension of MPL is developed allowing one to infer epistasis. Can the authors comment on why this was not attempted here?

      I presume one possible reason is that epistasis inference requires considerably more computational effort (and more data). However, since the authors find most beneficial mutations occurring in Env, perhaps restricting the analysis to Env genes only (e.g. the trimer shown in Figure 2) can lead to tractable inference of epistasis within this segment (instead of the full genome).

      As described above, we have now addressed this comment by inferring epistatic fitness landscapes for the data sets that we consider. Our overall results using the epistatic fitness model are consistent with the ones that we previously obtained with an additive model.

      Do the authors find correlations in the inferred selection coefficients of the two samples CH505 and CH848? I could not find any discussion of this in the manuscript. Only correlations between Humans and RM are discussed.

      To address this question, we compared the fitness values and individual selection coefficients across CH505 and CH848 data sets. We found little correlation between CH505 and CH848 fitness values (shown in a new Supplementary Fig. 6) or selection coefficients. We found only 199 common mutations between HIV-1 amino acid sequences from CH505 and CH848 out of 868 and 1,406 total mutations, respectively. Thus, we were not surprised to find no strong relationship between fitness estimates from CH505 and CH848 data sets. 

      Reviewer #2 (Public review):

      Summary:

      This paper combines a biological topic of interest with the demonstration of important theoretical/methodological advances. Fitness inference is the foundation of the quantitative analysis of adapting systems. It is a hard and important problem and this paper highlights a compelling approach (MPL) first presented in (1) and refined in (2), roughly summarized in equation 12.

      (1) Sohail, M. S., Louie, R. H., McKay, M. R. & Barton, J. P. Mpl resolves genetic linkage in fitness inference from complex evolutionary histories. Nature biotechnology 39, 472-479 (2021).

      (2) Shimagaki, K. & Barton, J. P. Bézier interpolation improves the inference of dynamical models from data. Physical Review E 107, 024116 (2023).

      The authors find that positive selection shapes the variable regions of env in shared patterns across two patient donors. The patterns of positive selection are interesting in and of themselves: they confirm the intuition that hyper-variation in env is the result of immune evasion rather than a broadly neutral landscape (flatness). They show that the immune evasion patterns due to CD8 T and naive B-cell selection are shared across patients. Furthermore, they suggest that a particular evolutionary history (larger flux to high fitness states) is associated with bNAb emergence. Mimicking this evolutionary pattern in vaccine design may help us elicit bNAbs in patients in the future.

      There is a lot of information to be found in the full fitness landscape of env. The enormous strength of reversion-to-consensus in the patterns is a known pattern of HIV post-infection populations but they are nicely quantified here. Agreement between SHIV and HIV evolution is shown. They find selection is larger for autologous antibodies than the bNAbs themselves (perhaps bNAbs are just too small a component of the host response to drive the bulk of selection?), and that big fitness increases precede antibody breadth in rhesus macaques, suggesting that this fitness increase is the immune challenge required to draw forth a bNAb. This is all of high interest to HIV researchers.

      Strength of evidence:

      One limitation is, of course, that the fitness model is constant in time when the immune challenge is variable and changing. This simplification may complicate some interpretations.

      We agree that this is a limitation of our current approach. In prior work, we have found that the constant fitness effects of mutations that we infer typically reflect the time-averaged fitness effect when the selection changes over time (Gao and Barton, PNAS 2025; Lee et al., Nat Commun 2025). It could be difficult, however, to capture changes in selection that fluctuate rapidly with underlying immune responses. We have added a new paragraph in the Discussion that more clearly sets out some of the limitations of our analysis, including our assumption of constant selection coefficients.

      There are additional methodological and technical limitations that should be considered in the interpretation of our results. Most notably, we assume that the viral fitness landscape is static in time. While we do not expect selection for effective replication (“intrinsic” fitness) to change substantially over time, pressure for immune escape could vary along with the immune responses that drive them. In prior work, we have found that constant selection coefficients typically reflect the average fitness effect of a mutation when its true contribution to fitness is time-varying [42,43]. This may not adequately describe mutational effects that undergo large or rapid shifts in time. Future work should also examine temporal patterns in selection for individual mutations.

      Equation 12 in the methods is really a beautiful tool because it is so simple, but accounts for linkage and can be solved precisely even in the presence of detailed mutational and selection models. However, the reliance on incomplete observations of the frequency leads to complications that must be carefully (re)addressed here.

      For instance, the consistent finding of strong selection in hypervariable regions is biologically intuitive but so striking, that I worry that it might be the result of a bias for selection in high entropy regions. 

      Thank you for this suggestion. We agree that it is important to carefully interrogate these results. To assess the effects of general sequence variability on inferred selection, we computed a position-specific entropy measure, H<sub>i</sub>, for each site i. We first defined the time-dependent entropy H<sub>i</sub>(t) = -∑<sub>a</sub> x<sub>i</sub>(a, t) log x<sub>i</sub>(a, t), where x<sub>i</sub>(a, t) represents the frequency of amino acid/nucleotide a at position i and time t, at each sample time. We then computed H<sub>i</sub> as the average of H<sub>i</sub>(t) across all sample times. A new Supplementary Fig. 1 plots the entropy against the inferred selection coefficients. Although some sequence variation must be observed in order for us to infer that a mutation is beneficial, we did not find a systematic bias toward larger (more beneficial) selection coefficients at more variable sites. Overall, we found only a modest correlation between inferred selection coefficients and entropy (Pearson’s r = 0.33 and 0.29 for CH505 and CH848, respectively), which appears to be partly driven by the tendency for mutations inferred to be significantly deleterious to occur at sites with low entropy. In addition to the new Supplementary Figure, we have added a reference to this analysis in the main text:

      To test whether our results might be biased by overall sequence variability, we examined the relationship between our inferred selection coefficients and entropy, a common measure of sequence variability. Overall, we found only a modest correlation between selection and entropy, suggesting that the signs of selection that we observe are not due to increased sequence variability alone (Supplementary Fig. 1).
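
      As a minimal sketch of the entropy measure described in this response, the following computes the time-averaged Shannon entropy at a single site from the residues observed at each sample time. The function name `site_entropy` and the input layout (one list of residues per sample time) are assumptions made for illustration; this is not the authors' code.

```python
import math
from collections import Counter


def site_entropy(residues_by_time):
    """Time-averaged Shannon entropy at one site (hypothetical helper).

    residues_by_time: list of non-empty lists; each inner list holds the
    residue (amino acid or nucleotide) seen at this site in every sequence
    sampled at one time point.
    """
    entropies = []
    for residues in residues_by_time:
        n = len(residues)
        counts = Counter(residues)
        # H_i(t) = -sum_a x_i(a, t) * log x_i(a, t), with x the frequency;
        # residues that were not observed contribute nothing to the sum.
        h_t = -sum((c / n) * math.log(c / n) for c in counts.values())
        entropies.append(h_t)
    # H_i is the average of H_i(t) over all sample times.
    return sum(entropies) / len(entropies)
```

      Under these assumptions, a site that is fixed at the first sample time and split evenly between two residues at the second has an average entropy of (log 2)/2.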

      Mutational and covariance terms in equation 12 might be underestimated, due to finite sampling effect in highly diverse populations. Sampling effects lead to zeros in x(t) when actual frequency zeros might be rare at the population sizes of HIV viral loads and mutation rates. Both mutational flux and C underestimation will bias selection upward in eq. 12. 

      The prior papers (1) and (2) seem to show robustness to finite sampling effects, but, again, more care needs to be shown that this robustness transfers to the amino acid inference under these conditions. That synonymous sites are rarely selected for at the nucleotide level is a good sign, and it may be a matter of simply fully explaining the amino-acid level model.

      As above, we agree that these tests are important. To assess the robustness of our results to finite sampling, we performed bootstrap sampling on the viral sequences and inferred selection coefficients using the resampled sequences. Specifically, we resampled the same number of sequences as in the original data at each time point and repeated this for all time points across all HIV-1 and SHIV data sets. A new Supplementary Fig. 11 shows a typical comparison of the original selection coefficients vs. those obtained through bootstrap resampling. Overall, we observe a high degree of consistency between the selection coefficients in each case, which is surely aided by the long time series in these data sets. As pointed out by the reviewer, uncertainty in low-frequency mutations is a particular concern, though the effects on inferred selection are mitigated by regularization. 

      We have added a section in the Results, “Robustness of inferred selection to changes in the fitness model and finite sampling”, which includes this analysis:

      Finite sampling of sequence data could also affect our analyses. To further test the robustness of our results, we inferred selection coefficients using bootstrap resampling, where we resample sequences from the original ensemble, maintaining the same number of sequences for each time point and subject. The selection coefficients from the bootstrap samples are consistent with the original data (see Supplementary Fig. 11), with Pearson’s r values of around 0.85 for HIV-1 data sets and 0.95 for SHIV data sets, respectively.
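
      The resampling step described here can be sketched as follows: sequences are drawn with replacement within each time point, keeping the number of sequences per time point unchanged. The function name `bootstrap_resample` and the list-of-lists input are hypothetical, and the selection inference that would be rerun on each resampled data set is not shown.

```python
import random


def bootstrap_resample(sequences_by_time, rng=None):
    """Resample sequences with replacement within each time point.

    sequences_by_time: list of lists of sequences, one inner list per
    sample time. The output has the same number of sequences at every
    time point as the input. (Hypothetical helper for illustration.)
    """
    rng = rng or random.Random()
    return [rng.choices(seqs, k=len(seqs)) for seqs in sequences_by_time]
```

      Passing a seeded `random.Random` instance makes a given bootstrap replicate reproducible.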

      Uncertainty propagates to the later parts of the paper, e.g., HIV and SIV shared patterns might be the result of shared biases in the method application. However, this worry does not extend to the apples-to-apples comparison of fitness trajectories across individuals (Figures 5 and 6), which I think are robust (for these sample sizes).

      One way to address this uncertainty is to compare the fitness values and individual selection coefficients across CH505 and CH848 data sets, which was also requested by Reviewer 1. Overall, we found little correlation between CH505 and CH848 fitness values (shown in a new Supplementary Fig. 6) or selection coefficients. This suggests that similarities between HIV-1 and SHIV landscapes are not solely determined by potential biases in the inference approach. We have now added a reference to this point in the main text:

      In contrast, the inferred fitness landscapes of CH505 and CH848, which share few mutations in common, are poorly correlated (Supplementary Fig. 6). This suggests that the similarities between viral fitness values in humans and RMs are not artifacts of the model, but rather stem from similarities in underlying evolutionary drivers.

      The timing evidence is slightly weakened by the fact that bNAb detection is different from bNAb presence and the possibility that fitness increases occurred after the bNAbs appeared remains. Still, their conclusion is plausible and fits in with the other observations which form a coherent and compelling picture.

      Yes, we agree that this is a limitation of our analysis — bNAbs may have been present at low levels before they were detected, and we cannot definitively reject selection by bNAbs. Nonetheless, in at least one case (RM5695), rapid fitness gains were substantially separated in time from bNAb detection (roughly 2 weeks after infection vs. 16 weeks, respectively). We have now added this point in a new paragraph in the Discussion:

      While we found a strong relationship between viral fitness dynamics and the emergence of bnAbs, it may not be true that the former stimulates the latter. For example, bnAbs may have been present within each host before they were experimentally detected. Rapid viral fitness gains within hosts that developed broad antibody responses could then have been driven by undetected bnAb lineages. However, we did not find strong selection for known bnAb resistance mutations, and in at least one case (RM5695), rapid fitness gains (roughly 2 weeks after infection) substantially preceded bnAb detection (16 weeks). Still, given the limited size of the data set that we studied, it is unclear the extent to which our results will transfer to larger and broader data sets.

      Overall this is a convincing paper, part of a larger admirable project of accurately inferring complete fitness landscapes.

      Reviewer #3 (Public review):

      Summary:

      Shimagaki et al. investigate the virus-antibody coevolutionary processes that drive the development of broadly neutralizing antibodies (bnAbs). The study's primary goal is to characterize the evolutionary dynamics of HIV-1 within hosts that accompany the emergence of bnAbs, with a particular focus on inferring the landscape of selective pressures shaping viral evolution. To assess the generality of these evolutionary patterns, the study extends its analysis to rhesus macaques (RMs) infected with simian-human immunodeficiency viruses (SHIV) incorporating HIV-1 Env proteins derived from two human individuals.

      Strengths:

      A key strength of the study is its rigorous assessment of the similarity in evolutionary trajectories between humans and macaques. This cross-species comparison is particularly compelling, as it quantitatively establishes a shared pattern of viral evolution using a sophisticated inference method. The finding that similar selective pressures operate in both species adds robustness to the study's conclusions and suggests broader biological relevance.

      Weaknesses:

      However, the study has some limitations. The most significant weakness is that the authors do not sufficiently discuss the implications of the observed similarities. While the identification of shared evolutionary patterns (e.g., Figure 5) is intriguing, the study would benefit from a more explicit discussion of what these findings mean, for instance, in the context of HIV vaccine design, immunotherapy, or fundamental viral-host interactions. Even speculative interpretations could provide valuable insights into the broader significance of these results.

      Thank you for this suggestion. We have now clarified the potential implications of our work in several areas. While speculative, one possible application is in vaccine design: it may be beneficial to design sequential immunogens to mimic the patterns of viral evolution associated with rapid fitness gains. This “population-based” design principle is different from typical approaches, which have focused on molecular details of virus surface proteins. 

      We have extended our discussion of our results in the context of viral evolution within and across hosts and related host species. Overall, our work suggests that there may be relatively few paths to significantly higher viral fitness in vivo. Evolutionary “contingencies” such as shifting immune pressure or epistatic interactions could influence the direction of evolution, but not so dramatically that the dynamics that we see in different hosts are not comparable. We have also connected our work more broadly to the literature in evolutionary parallelism in HIV-1 in different contexts.

      A secondary, albeit less critical, limitation is the placement of methodological details in the Supplementary Information. While it is understandable that the authors focus on results in the main text - especially since the methodology is not novel and has been previously described in earlier publications - some readers might benefit from a more thorough presentation of the method within the main paper.

      We have now modified the main text to add a new section, “Model overview,” that lays out the key steps of our approach. While we reserve technical details for the Methods, we believe that this new section provides more intuition about how our results were obtained (including a discussion of the important Eq. 12, now Eq. 3 in the main text) and our underlying assumptions.

      Conclusions:

      Overall, the study presents a compelling analysis of HIV-1 evolution and its parallels in SHIV-infected macaques. While the quantitative comparison between species is a notable contribution, a deeper discussion of its broader implications would strengthen the paper's impact.

      Reviewer #1 (Recommendations for the authors):

      I suggest de-emphasizing bnAbs and focusing on selection landscape inference, which seems to be the actual focus of the paper.

      While we do not directly study antibody development in this work, bnAb development is certainly an important motivating factor. As described in the responses above, we have now modified the Abstract and Discussion to place relatively more emphasis on fitness comparisons and relatively less on bnAb development.

      Reviewer #2 (Recommendations for the authors):

      Please make sure that the MPL method is defined in this paper and its limitations are at least partially repeated.

      As noted in responses above, we have now included more methodological details in the main text of the paper, which we hope will make the intuition and assumptions involved in our analysis clearer.

      I'd like the code to better show or describe the model, I could not figure out the model details by looking at the code. It seems mostly just to be csv exporting for use with preexisting MPL code. A longer code readme would be helpful.

      We have now updated the README on GitHub to include a conceptual overview of our inference approach, which references how each step is implemented in the code.

      Reviewer #3 (Recommendations for the authors):

      Try to give some more details (not necessarily giving the full mathematical derivation) on the statistical method utilized.

      As noted above, we have now expanded our discussion of the statistical methods and assumptions in the main text.

      Figures 3 and 4 are somewhat 'messy'. Although I do not have a constructive suggestion here, I feel that with a little more effort maybe the authors could come up with something more clean.

      It is true that the mutation frequency dynamics are somewhat “choppy” and difficult to follow intuitively. To attempt to make these figures easier to parse visually, we have increased the transparency on the lines and added exponential smoothing to the mutation frequencies, resulting in smoother trajectories. The trajectories without smoothing are retained in Supplementary Fig. 3. Here we also note that this smoothing is for visual purposes only; we use the original frequency trajectories for inference, rather than the smoothed ones.
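
      The response does not state which form of exponential smoothing or which smoothing parameter was used for the figures. As an illustrative sketch only, simple exponential smoothing of a frequency trajectory could look like the following, where the function name `exp_smooth` and the default `alpha` are assumptions.

```python
def exp_smooth(frequencies, alpha=0.5):
    """Simple exponential smoothing of a mutation frequency trajectory.

    s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t - 1].
    alpha is an assumed parameter in (0, 1]; smaller values smooth more.
    Intended for plotting only, not for inference.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in frequencies:
        if not smoothed:
            smoothed.append(x)  # initialize with the first observation
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed
```

      With `alpha=1` the trajectory is returned unchanged, which matches the unsmoothed curves.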

    1. Author response:

      Reviewer #1 (Public review)

      Summary:

      Ever since the surprising discovery of the membrane-associated Periodic Skeleton (MPS) in axons, a significant body of published work has been aimed at trying to understand its assembly mechanism and function. Despite this, we still lack a mechanistic understanding of how this amazing structure is assembled in neuronal cells. In this article, the authors report a "gap-and-patch" pattern of labelled spectrin in iPSC-derived human motor neurons grown in culture. The mid-sections of these axons exhibit patches with reasonably well-organized MPS that are separated by gaps lacking any detectable MPS and having low spectrin content. Further, they report that the intensity modulation of spectrin is correlated with intensity modulations of tubulin as well. However, neurofilament fluorescence does not show any correlation. Using DIC imaging, the authors show that often the axonal diameter remains uniform across segments, showing a patch-gap pattern. Gaps are seen more abundantly in the midsection of the axon, with the proximal section showing continuous MPS and the distal segment showing continuous spectrin fluorescence but no organized MPS. The authors show that spectrin degradation by caspase/calpain is not responsible for gap formation, and the patches are nascent MPS domains. The gap and patch pattern increases with days in culture and can be enhanced by treating the cells using the general kinase inhibitor staurosporine. Treatment with the actin depolymerizing agent Latrunculin A reduces gap formation. The reasons for the last two observations are not well understood/explained.

      We thank the reviewer for the detailed and accurate description of the data shown and its relevance to further our understanding of MPS assembly mechanism and function.

      Strengths:

      The claims made in the paper are supported by extensive imaging work and quantification of MPS. Overall, the paper is well written and the findings are interesting. Although much of the reported data are from axons treated with staurosporine, this may be a convenient system to investigate the dynamics of MPS assembly, which is still an open question.

      We thank the reviewer for the positive comments on the manuscript, the techniques used and the proposed model.

      Weaknesses:

      Much of the analysis is on staurosporine-treated cells, and the effects of this treatment can be broad. The increase in patch-gap pattern with days in culture is intriguing, and the reason for this needs to be checked carefully. It would have been nice to have live cell data on the evolution of the patch and gap pattern using a GFP tag on spectrin. The evolution of individual patches and possible coalescence of patches can be observed even with confocal microscopy if live cell super-resolution observation is difficult.

      We will consider the inclusion of live imaging experiments using the expression of C-terminus-tagged human beta2-spectrin in the revised version of the manuscript. We are familiar with live-imaging and FRAP experiments and we will explore how to develop these experiments to generate data for inclusion in a revised submission.

      Some more comments:

      (1) Axons can undergo transient beading or regularly spaced varicosity formation during media change if changes in osmolarity or chemical composition occur. Such shape modulations can induce cytoskeletal modulations as well (the authors report modulations in microtubule fluorescence). The authors mention axonal enlargements in some instances. Although they present DIC images to argue that the axons showing gaps are often tubular, possible beading artefacts need to be checked. Beading can be transient and can be checked by doing media changes while observing the axons on a microscope.

      We do not rule out the presence of “nano beads” in these axons. It was recently suggested that the normal morphology of axons indeed resembles “pearls-on-a-string” (Griswold et al., 2025), with “nano beads” (also referred to as NSVs, for non-synaptic varicosities) separated by thin tubular "connectors". However, it is unlikely that the gap-patch pattern of beta2-spectrin can be attributed to such a morphology, given that we used formaldehyde as fixative and Griswold and colleagues show that aldehyde-based fixatives do not preserve NSVs. We are able to see scattered axonal enlargements (“micro beads”), as we described in distal portions in Fig. 1C(C2) and E. However, the number, appearance and staining of these are not compatible with the gap-patch pattern in beta2-spectrin. Moreover, we would have expected to see these NSVs in our extensive STED imaging, yet we did not. We will discuss this further in the resubmission.

      (2) Why do microtubules appear patchy? One would imagine the microtubule lengths to be greater than the patch size and hence to be more uniform.

      Our stainings are for tubulin protein isoforms beta-III and alpha-II. That is, they would label microtubules, but free tubulin as well. The slight decrease in intensity for tubulin within gaps is indeed something to investigate, but we do not interpret this as “patchy microtubules”. If the Reviewer refers to Fig. 2C-D, it is actually difficult to perceive the slight decrease in intensity with the naked eye. To further support this, we will consider including stainings and quantitative analyses for microtubules in the resubmission. We are familiar with the use of permeabilizing conditions during fixation (in protocols known as “cytoskeletal fixation”) to label microtubules and not free tubulin.

      (3) Why do axons with gaps increase with days in culture? If patches are nascent MPS that progressively grow, one would have expected fewer gaps with increasing days in culture. Is this indicative of some sort of degeneration of axons?

      We agree that there is an apparent discrepancy. However, one has to take into account that these axons are still elongating even at 2 weeks in culture. Hence, at any time point, there is a recently added axonal compartment with low beta2-spectrin and no MPS. Also, the dynamic evolution of the MPS has to take into account beta2-spectrin supply. If supply falls below a given threshold, more gaps are expected, given that the new, more distant parts of the axons have a lower supply of beta2-spectrin. To explore this formally, we are working on simulations of these multifactorial dynamic systems which, together with key experimental observations, should enhance our understanding of overall MPS assembly in growing axons. However, findings from this project will be the subject of another manuscript.

      (4) It is surprising that Latrunculin A reduces gap formation induced by staurosporine (also seems to increase MPS correlation) while it decreases actin filament content. How can this be understood? If the idea is to block actin dynamics, have the authors tried using Jasplakinolide to stabilize the filaments?

      The results with the co-treatment with Latrunculin A and Staurosporine are indeed intriguing, and provide clear evidence that the gap-and-patch pattern arises from local assembly of the MPS, requiring new actin filaments. However, the fact that F-actin within the pre-formed MPS seems unaffected is not surprising. There are many different populations of F-actin in axons (i.e. MPS rings, longitudinal filaments, actin patches, actin trails). Latrunculin A affects filaments indirectly. The target of Latrunculin A is not actin filaments, but free monomers. It ultimately affects actin filaments because they lose monomers and, deprived of new monomers, get shorter and eventually disappear. The drastic decrease in F-actin in our axons reflects that. The fact that F-actin in the MPS is preserved only speaks to the fact that these filaments are stable: if they are not losing monomers in the time frame of the treatment, the filaments remain unaffected. We will support this with more observations and imaging and with a more extensive discussion summarizing the literature on the matter in the resubmission.

      On the other hand, the use of F-actin stabilizing drugs (like Jasplakinolide) would have a different effect. For the resubmission, we will study how an experiment with these drugs could be informative about the process under investigation.

      (5) The authors speculate that the patches are formed by the condensation of free spectrins, which then leaves the immediate neighborhood depleted of these proteins. This is an interesting hypothesis, and exploring this in live cells using spectrin-GFP constructs will greatly strengthen the article. Will the patch-gap regions evolve into continuous MPS? If so, do these patches expand with time as new spectrin and actin are recruited and merge with neighboring patches, or can the entire patch "diffuse" and coalesce with neighboring patches, thus expanding the MPS region?

      We agree with the reviewer's interpretation. A virtue of our experimental model and our interpretations of the observations in fixed cells is that it gives rise to informative questions such as the ones posed by the reviewer. As stated above, we will consider the inclusion of live imaging experiments using the expression of C-terminus-tagged human beta2-spectrin in the revised version of the manuscript. We are familiar with live-imaging and FRAP experiments and we think we can provide the evidence suggested.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gazal et al. describe the presence of unique gaps and patches of BetaII-spectrin in medial sections of long human motor neuron axons. BII-spectrin, along with Alpha-spectrin, forms horizontal linkers between 180nm spaced F-actin rings in axons. These F-actin rings, along with the spectrin linkers, form membrane periodic structures (MPS) which are critical for the maintenance of the integrity, size, and function of axons. The primary goal of the authors was to address whether long motor axons, particularly those carrying familial mutations associated with the neurodegenerative disorder ALS, show defects in gaps and patches of BetaII-spectrin, ultimately leading to degradation of these neurons.

      We thank the reviewer for the detailed and accurate description of the data shown.

      Strengths:

      The experiments are well-designed, and the authors have used the right methods and cutting-edge techniques to address the questions in this manuscript. The use of human motor neurons and the use of motor neurons with different familial ALS mutations is a strength. The use of isogenic controls is a positive. The induction of gaps and patches by the kinase inhibitor staurosporine and their rescue by Latrunculin A is novel and well-executed. The use of biochemical assays to explore the role of calpains is appropriate and well-designed. The use of STED imaging to define the periodicity of MPS in the gaps and patches of spectrin is a strength.

      We thank the reviewer for the positive comments on the manuscript, the techniques used and the proposed model.

      Weaknesses:

      The primary weakness is the lack of rigorous evaluation to validate the proposed model of spectrin capture from the gaps into adjacent patches by the use of photobleaching and live imaging. Another point is the lack of investigation into how gaps and patches change in axons carrying the familial ALS mutations as they age, since 2 weeks is not a time point when neurodegeneration is expected to start.

      We will consider the inclusion of live imaging experiments using the expression of tagged human beta2-spectrin in the revised version of the manuscript. We are familiar with live-imaging and FRAP experiments and we believe we can provide the evidence suggested. We do not rule out the notion that axons carrying familial ALS mutations will show defects in MPS formation and/or stability when observed at longer culture times, or under culture conditions that promote neuronal aging (Guix et al., 2021). Thus, we will continue to work with these cells, but the goal of that project lies well beyond the primary message of the present manuscript, and we anticipate that the revised version will not include new data on this matter.

      Reviewer #3 (Public review):

      Summary:

      Gazal et al present convincing evidence supporting a new model of MPS formation where a gap-and-patch MPS pattern coalesces laterally to give rise to a lattice covering the entire axon shaft.

      Strengths:

      (1) This is a very interesting study that supports a change in paradigm in the model of MPS lattice formation.

      (2) Knowledge on MPS organization is mainly derived from studies using rat hippocampal neurons. In the current manuscript, Gazal et al use human IPS-derived motor neurons, a highly relevant neuron type, to further the current knowledge on MPS biology.

      (3) The quality of the images provided, specifically of those involving super-resolution, is of a high standard. This adequately supports the conclusions of the authors.

      We thank the reviewer for the positive comments on the manuscript, the techniques used and the proposed model.

      Weaknesses:

      (1) The main concern raised by the manuscript is the assumption that staurosporine-induced gap and patch formation recapitulates the physiological assembly of gaps and patches of betaII-spectrin.

      We will further explore the inclusion of more measurements of other parameters and variables towards establishing whether these gaps-and-patches patterns are equivalent structures in control and staurosporine-treated cells. 

      (2) One technical challenge that limits a more compelling support of the new model of MPS formation is that fixed neurons are imaged, which precludes the observation of patch coalescence.

      As stated before regarding similar comments by other reviewers, we will consider the inclusion of live imaging experiments in the revised version of the manuscript.

      Nicolas Unsain, PhD, and Thomas Durcan, PhD.

      References

      Griswold, J.M., Bonilla-Quintana, M., Pepper, R. et al. Membrane mechanics dictate axonal pearls-on-a-string morphology and function. Nat Neurosci 28, 49–61 (2025). https://doi.org/10.1038/s41593-024-01813-1

      Guix F.X., Marrero Capitán A., Casadomé-Perales A., Palomares-Pérez I., López Del Castillo I., Miguel V., Goedeke L., Martín M.G., Lamas S., Peinado H., Fernández-Hernando C., Dotti C.G. Increased exosome secretion in neurons aging in vitro by NPC1-mediated endosomal cholesterol buildup. Life Sci Alliance. 2021 Jun 28;4(8):e202101055. doi: 10.26508/lsa.202101055. Print 2021 Aug.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Manuscript number: RC-2025-03098

      Corresponding author: Pedro Escoll

      1. General Statements

      Our study investigates the interplay between the metabolism of host cells and the intracellular replication of Salmonella enterica serovar Typhimurium (ST). Type III Secretion Systems (T3SSs) are considered essential for ST to replicate within macrophages. However, we found that restricting macrophages to different bioenergetic contexts, such as supplementing them with glycerol, modulates bacterial replication and remarkably, enables a T3SS-deficient ST mutant (ΔprgHssaV) to replicate intracellularly. This T3SS-independent replication occurs within the Salmonella-containing vacuole (SCV) and is driven by the capacity of the host cell to provide these preferred nutrients, rather than by the host glycolytic activity itself.

      2. Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary:

      In this manuscript, the authors investigate how host cell metabolic heterogeneity influences the intracellular replication of Salmonella enterica serovar Typhimurium. They use live-cell imaging of infected human primary macrophages to reveal that bacterial replication does not occur uniformly across infected cells. They demonstrate that supplementation with specific carbon sources-used by Salmonella during infection-promotes bacterial replication and increases the proportion of macrophages supporting intracellular growth. These effects are seen even in the absence of functional Type III Secretion Systems (T3SS), using a ΔprgHssaV double mutant. The authors further suggest that this replication enhancement is not strictly dependent on host glycolytic activity but rather on the host cell's ability to import nutrients. Their findings imply that intracellular Salmonella can exploit host cell metabolism to grow, even without its canonical virulence secretion systems, under nutrient-favorable conditions.

      Major Concern:

      While the topic is potentially interesting, the novelty is not fully clear. The concept that nutrient availability impacts intracellular Salmonella replication, largely via T3SS2 function, has been addressed previously (e.g., Liss et al., 2017). The finding that added exogenous carbon sources can enhance bacterial growth is thus not unexpected. The key claim-that Salmonella can replicate intracellularly even in the absence of T3SS function-would be significantly strengthened by demonstrating whether this is specific to Salmonella, or whether similar effects are seen with non-intracellular organisms such as E. coli K-12. If the phenomenon is unique to Salmonella, this would suggest a pathogen-specific mechanism beyond general metabolic support.

      As acknowledged by the Reviewer, the novelty and key claim of our work is that Salmonella can replicate intracellularly even in the absence of T3SS. To support that claim experimentally, we showed that providing macrophages with the preferred carbon sources used by Salmonella during infection, such as glycerol, bypasses the requirement for both T3SSs for Salmonella to grow intravacuolarly inside macrophages.

      With respect to the article mentioned by the Reviewer (Liss et al. 2017, ref 36 in the manuscript), there are three important novel insights provided by our work: i) we show that Salmonella can replicate intracellularly in the SCV even in the absence of T3SS if certain carbon sources are provided; ii) we show the preference of Salmonella for certain carbon sources intracellularly such as glycerol and galactose (but not preferentially glucose); and iii) we have extended our observations to primary human macrophages in addition to RAW cells.

      We are not convinced that the experiment suggested by the Reviewer using E. coli K12 (ECK12) is necessary to support our findings for Salmonella, but we propose to add the requested experiment. Briefly, we will infect hMDMs and RAW macrophages with ST-WT-GFP, ST-ΔprgHΔssaV or ECK12-WT-GFP, while culturing macrophages on different carbon sources (glucose, glycerol, galactose, fructose). Then we will monitor intracellular bacterial growth. Comparing the growth of the ST double mutant with that of ECK12-WT-GFP under favorable carbon sources such as glycerol should establish whether this phenomenon is unique to Salmonella or not.

      Specific Comments:

      1. Figure 1H: The effect shown here is not compelling due to inconsistent y-axis scaling. Panels 1B, 1C, and 1D should use a unified axis range with 1H to allow direct visual comparison of growth dynamics.

      Thank you, we will change it as suggested.

      Figures 1B, 1C, 1G, 1H: The current presentation of individual growth traces makes it difficult to appreciate the population-level trend. A smoothed average line overlaid on these plots could better represent the average dynamics of replicative vs. non-replicative infections. Or alternatively the total fraction of cells that proliferate summarized as a segmented bar plot (possibly binned per time point).

      We will plot the results as suggested, the total fraction of infected cells harboring bacteria that proliferate as a segmented bar plot, binned per time point.

      Figure 2G: This panel would benefit from including a comparable condition with the SPI-1/SPI-2 double mutant to aid interpretation. Additionally, the authors should explore whether this nutrient-supported replication is seen in non-phagocytic cells such as HeLa or Caco-2, which would help delineate whether the observed phenomenon is macrophage-specific.

      The graph requested by the Reviewer is Figure S1D. Because we represent ST growth only in macrophages that support Salmonella replication, some conditions, such as lactate, cannot be shown for infections with the double mutant: no cells support the replication of the double mutant under those conditions, so there are no cells to plot.

      As suggested, we are also going to perform the same experiments in HeLa cells to investigate whether the observed phenomenon is macrophage specific.

      Line 117: The sentence stating that the double mutant can undergo "exponential intracellular growth even in the absence of T3SS-dependent secretion" is an overstatement. The data suggest only a modest improvement in growth, restricted to a minority of infected cells. This claim should be revised accordingly, as should similar overstatements in the discussion (e.g., lines 203-204).

      We will remove the term 'exponential' and revise the sentence at line 117 and those in the discussion. Line 203-204 will be: 'we demonstrated that providing macrophages with preferred nutrients allows a subpopulation of ST to replicate intracellularly without the need for a functional T3SS'.

      Line 162: The authors should clarify that glycerol had the strongest effect in primary macrophages, while multiple alternative carbon sources had notable effects primarily in RAW cells.

      We will add this clarification in the text.

      Lines 198-201: This relates to the major concern. The authors should assess whether the observed growth enhancement is unique to Salmonella by testing other bacteria not known for intracellular replication. This would clarify whether the effect is due to general nutrient-driven host cell permissivity or a pathogen-specific adaptation.

      As outlined above, we will perform the suggested experiment with E. coli K12 to answer whether this phenomenon is unique to Salmonella or not.

      RAW 264.7 Observations: The modest intracellular growth of SPI-1/SPI-2 double mutants in RAW cells is consistent with prior observations in the field. The idea that nutrient availability explains this is noteworthy. The authors might consider whether differences in standard culture media (e.g., glucose concentration) influence these outcomes. This could have broader implications for reproducibility in infection models.

      Thank you for the suggestion. We will include a paragraph discussing whether differences in standard culture media might influence bacterial replication. Indeed, to also address a question from Reviewer #2, we will include a new supplementary Figure in which we have already compared "no Glucose" (0 mM), "low Glucose" (2 mM) and standard culture media Glucose levels (10 mM). Our results show that differences in Glucose levels in the culture media influence Salmonella intracellular growth in hMDMs and RAW macrophages (see Figure below).

      Reviewer #1 (Significance):

      This manuscript highlights how host cell metabolism and nutrient availability can influence intracellular Salmonella replication. While the findings are intriguing, the current framing overstates their novelty and impact. Key revisions-such as comparative experiments with non-pathogenic bacteria and non-phagocytic cells, consistent figure scaling, and more measured language-would improve the clarity and significance of the work. If the authors can show Salmonella-specific mechanisms at play, the study could offer important insights into host-pathogen metabolic interactions.

      We believe that performing all experiments suggested by the Reviewers, as well as the requested changes in the text to avoid overstatements, will improve the manuscript and will offer readers new insights and details to better understand the metabolic interactions happening between host and pathogens and how they can shape bacterial virulence.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary: In their study titled "Provision of Preferred Nutrients to Macrophages Enables Salmonella to Replicate Intracellularly Without Relying on Type III Secretion Systems", Dr. Garcia-Rodriguez et al. describe the influence of the host cell metabolism on the intracellular proliferation potential of Salmonella during infection. The authors investigate whether the supplementation of the media with different carbon sources has an impact on the intracellular lifestyle of Salmonella. By using single cell tracking in live-cell microscopy, including the use of different reporter strains, they describe that glycerol benefits Salmonella's ability to grow within its vacuolar niche, in part, interestingly, in a Type-3-Secretion System independent manner.

      They furthermore highlight the dependence on host background for this observation by showing that effects differ between cells of varying metabolic activity. Throughout their study, they use cutting-edge methodologies, as well as Salmonella strains that could be of versatile use in other investigations. This work, while limited to in vitro models for now, has implications for the better understanding of how pathogens and their host are intertwined. This, in turn, has significance for the development of new anti-infective strategies further down the line. I therefore believe that it should be disseminated to the research community. The following comments summarize ideas how the quality of the study could be improved:

      Major comments:

      1. Salmonella, especially when cultured to activate the SPI-1 T3SS, introduce rapid cell death in their host - most commonly through activation of the NLRC4 inflammasome and downstream pyroptotic signaling. The authors don't describe the effect of the infection in differently supplemented media on host cell death, yet it would be important to elucidate whether this cellular response is also altered.

      We have performed these experiments and tracked host cell death by measuring Annexin-V levels in single cells during infection in the conditions using the different supplements. We will include these results in the revised version of the manuscript and main text. Please see the Figure below showing that the different carbon sources did not significantly affect macrophage cell death (future Figure S1E and S1F).

      The aspect of partially T3SS-independent growth enhancement by glycerol (and depending on the host background glucose) is most curious. The authors quantify this by determining the percentage of cells containing proliferating Salmonella and by tracking individual cells over the time course of the infection. I am missing a general statement on whether the initial infection rate (i.e. timepoint 0) is comparable across conditions and mutants, and whether possible discrepancies in the infection rate could have downstream effects on the statements and claims made in the manuscript. This is, to my mind, also important for the quantification of cytosolic and vacuolar bacteria. There, the authors always speak in "percent of infected cells", so it is relevant whether the number of infected cells varies among conditions (see e.g. Figure 3).

      We thank the reviewer for this comment. The initial infection rate at t=0 significantly differs between WT and mutants in RAW 264.7 macrophages, and carbon source supplementation has no effect. However, as we only analyze infected cells, this does not affect the final results. In any case, we are going to add the graphs of % of infected cells at t=0 as supplementary Figures S1G-K.

      The authors use a concentration of 10mM for all supplemented alternative carbon sources. It would be useful to discuss the rationale behind this approach, including whether all chemicals have the same ability to be taken up by the cell. A concentration series (at least for some of the tested compounds) may be beneficial to bolster the conclusions that the authors make.

      We use 10 mM as this is the concentration of Glucose in standard culture media. By using 10 mM for all the different carbon sources, we can compare them at a constant concentration. Indeed, to also address a comment from Reviewer #1, we will include in the manuscript a paragraph discussing whether differences in standard culture media might influence bacterial replication. As this Reviewer suggested, we will include a new supplementary Figure comparing no Glucose (0 mM), low Glucose (2 mM) and standard culture media Glucose levels (10 mM), showing that the concentration of glucose has a gradual effect in supporting the replication of the T3SS-deficient strain in RAW macrophages (see Figure below).

      I think it would strengthen the study, if the authors used host cell mutants in certain metabolite transporters, or alternatively Salmonella mutants that are deficient in uptake or metabolism of some of the compounds used in this study. This point is alluded to in the discussion, and I believe if the authors could show that in certain host mutant backgrounds the impact of supplementation with alternative carbon sources can be reversed, it would immensely bolster the strength of the claims.

      Following the Reviewer's suggestion, we generated ST metabolic mutants unable to metabolize glycerol, galactose or fructose. As seen in the Figures below, during infection, supplementation with glycerol/galactose does not boost Salmonella replication in the metabolic mutants as it does in WT conditions, demonstrating that supplemented carbon sources indeed reach bacteria within the SCV and are used by intracellular Salmonella to grow. These Figures will become future Figure 4J-N.

      I think it would be useful to include the meaning of this work for other intracellular pathogens in the discussion section: Do the authors believe that this phenotype is Salmonella-specific? If the pathogens are at hand, it might be interesting to infect with other intracellular bacteria, such as Shigella or Francisella to investigate if the boosting of growth by glycerol also holds true for these.

      We have performed experiments with Legionella pneumophila and galactose (see figure below), showing that this carbon source is specific to Salmonella (as shown in Figure 4F in the manuscript). We could also perform experiments with L. pneumophila and glycerol to answer the Reviewer's question. However, we think that the results with Legionella might be outside the focus of this article and would constitute a new article in themselves, as the two pathogens have very different, non-comparable intracellular metabolisms. Thus, the experiment suggested by Reviewer #1 using E. coli K12 (ECK12) while culturing macrophages on different carbon sources (glucose, glycerol, galactose, fructose) is in our opinion a better fit. We will monitor intracellular bacterial growth; comparing the growth of the ST-ΔprgHssaV double mutant with that of ECK12-WT-GFP under favorable carbon sources such as glycerol should establish whether this phenomenon is unique to Salmonella or not.

      Minor comments:

      • Line 41: The authors write "are required for", but given their findings, it might be more accurate to phrase this as "have previously been described to be required for" or "have previously been described essential for".

      We will change it.

      • Line 86: Is the referencing of Figure S1C correct or should it be S1A?

      Yes, thank you, it is S1A, we will change it.

      • Lines 119,120: Related to what is displayed in Figure 2G: Are these differences significant?

      Glucose, galactose and lactate curves are significantly different compared to control (p

      • Lines 126,127: What is the change for glycerol, and is the intracellular growth significantly higher compared to the control?

      6.2 ± 1.9% in glycerol vs. 2 ± 1% in control, p

      • Figure 1E&F: Related to one of the major comments: Would it be possible to quantify this at timepoint 0 to ensure that the initial infection rates are the same across conditions?

      As outlined above, we will add the graphs of % of infected cells at t=0 as supplementary Figures S1G-K (Major Comment number 2 from this Reviewer).

      • Figure 3E,F: Why does the sum of the curves not add up to 100% (especially in the beginning)? And related to that, why do both the percentage of cytosolic and vacuolar cells grow over time? Since this infection is performed with gentamycin present, re-infection should not be possible.

      The localization module of the SINA plasmid relies on transcriptional reporters, whose expression requires time for induction and detection. Therefore, at early time points, infected cells are not classified as vacuolar or cytoplasmic because the reporters have not yet been expressed (as described in PLoS Pathog. 2021;17(4):e1009550, PMID: 33930101).

      At later time points, a subset of cells harbors bacteria that do not express any of the reporters. These bacteria are considered dormant, representing about 10% of the population, as detailed in the same article. In addition, a small percentage of infected cells simultaneously contain both STvac and STcyt. Such cells are subclassified as harboring STcyt but also STvac. Consequently, the total proportion of infected cells carrying STvac and STcyt may also exceed 100%.
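
      The bookkeeping behind these percentages can be sketched with a toy calculation. The function name, the labels `"vac"` and `"cyt"`, and the cell counts below are hypothetical, and the sketch assumes that a cell containing both STvac and STcyt contributes to both curves; it is not the analysis pipeline used in the manuscript.

```python
def reporter_percentages(cells):
    """Percent of infected cells scoring positive for each localization reporter.

    `cells` is a list of sets; each set holds the reporter labels detected
    in one infected cell ("vac", "cyt", both, or neither).
    """
    n = len(cells)
    if n == 0:
        raise ValueError("no infected cells to classify")
    return {
        "vac": 100.0 * sum("vac" in c for c in cells) / n,
        "cyt": 100.0 * sum("cyt" in c for c in cells) / n,
        "unclassified": 100.0 * sum(len(c) == 0 for c in cells) / n,
    }


# Early time point (made-up counts): reporters not yet expressed in most cells.
early = [set()] * 8 + [{"vac"}] * 2
# Late time point (made-up counts): some cells contain both populations.
late = [{"vac"}] * 6 + [{"cyt"}] * 2 + [{"vac", "cyt"}] * 2

early_pct = reporter_percentages(early)
late_pct = reporter_percentages(late)
```

      With these invented counts, the vacuolar and cytosolic percentages sum to 20% at the early time point and to 120% at the late one, which illustrates how the two curves can fall short of or exceed 100%.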

      • Figure S1A: While significance testing is described in the legend, there are no indications of significance in the figure panels.

      The Reviewer is right: there are no significant changes between conditions. We will change the significance annotation to ns = non-significant.

      • Figure S1B: Due to the stark discrepancies between hMDMs and RAW264.7, it might make sense to plot them on two different y-axes. Furthermore, I would clarify the y-axis: In the legend, it seems as CFU counts are shown, while CFU/ml/t2 rather describes a change over time.

      We agree. However, we will maintain the scale of the Y-axis as it was required by Reviewer #1 to be consistent with Y-axis. We will change the legend to indicate that we plot CFU/ml/t2.

      • Figure S1C: The prgH-mutant seems to outperform the wildtype in intracellular proliferation, while the double mutant underperforms compared to the ssaV-mutant. Could you please discuss/explain how the prgH-deletion has seemingly opposite effects on intracellular proliferation, depending on whether it is introduced in a wildtype or ssaV-KO background?

      As T3SS-1 plays a role in inducing macrophage cell death via activation of the NLRC4 inflammasome, macrophages infected with bacteria carrying a functional T3SS-1 (such as WT) are more prone to undergo cell death at late time-points, which disrupts bacterial proliferation and reduces the proportion of infected cells. These dead cells were not considered in the analysis. Even if cell death of ST-WT-infected RAW macrophages remains below 5%, more ΔprgH-infected cells are considered in the analyses at late time-points, and ST-ΔprgH continues replicating (and growing in ST area).

      • Figure S2A: As for the comments related to Figure 3, I am unsure how the sum of STvac and STcyt can deviate from 100. This is especially puzzling for the red curve (glycerol) at e.g. 3hpi, when the sum of the two clearly seems to be larger than 100.

      At early time points, no infected cells are classified as vacuolar or cytoplasmic because the reporters have not yet been expressed. At later time points, a subset of cells harbor bacteria that do not express any of the reporters, which are considered dormant (10% of the population). Finally, a small percentage of infected cells simultaneously contain both STvac and STcyt, therefore the total proportion of infected cells carrying STvac and STcyt may also exceed 100%.

      **Cross-commenting** I agree in principle with the comments raised by Reviewer #1 - especially when it comes to the enhancement in significance if the authors assess the species specificity. Elucidating whether the growth enhancement is Salmonella-specific, occurs for other intracellular pathogens (e.g. Shigella, Francisella) or also for extracellular bacteria (e.g. E. coli, Yersinia), would definitely strengthen the study.

      As said before, for the revision we are going to perform the experiments suggested by Reviewer #1 of using E. coli K12 (ECK12) while culturing macrophages on different carbon sources (glucose, glycerol, galactose, fructose). And to satisfy this Reviewer's curiosity, we are going to perform experiments also with L. pneumophila and glycerol.

      Reviewer #2 (Significance):

      General assessment:

      As the authors write in their discussion, the strength of this study is also its limitation: Using single cell tracking in microscopy is a very elegant and powerful approach, yet conversely, it limits the scope of the study to in vitro approaches. While it enables assessment of bacterial pathogenicity and host-dependence on a single-cell level, it remains to be investigated whether the conclusion that the authors draw from their work will hold in more complex or physiologically relevant models.

      During the preparation of this Revision Plan, we discovered the article published in PLoS Pathogens by Andrew Grant and Pietro Mastroni, "Attenuated Salmonella Typhimurium Lacking the Pathogenicity Island-2 Type 3 Secretion System Grow to High Bacterial Numbers inside Phagocytes in Mice" (PLoS Pathog 2012 8(12): e1003070, PMID: 23236281). In this article, the authors showed that our main conclusion is also relevant in vivo (Salmonella Typhimurium can replicate within macrophages in the absence of T3SS). This will be addressed in the Discussion of the revised manuscript. Our study provides a metabolic explanation, at the single-cell level, for those observations.

      A further small shortcoming of the study is the heavy focus on the bacterial aspect in this host-pathogen interaction. While the authors do link the proliferative potential of the intracellular bacteria to the metabolic status of the individual host cell, more could be done with respect to host responses in the varying media compositions, including investigating alterations to the cell cycle, induction of cell death, or the ability to activate inflammatory signaling.

      We agree, and we are actively investigating how restricting macrophages to specific carbon sources impacts other host responses, such as cytokine production. For the revised manuscript, we will add the results on the induction of cell death.

      Nonetheless, this study is of large interest to the field and the systematic approach to addressing their hypotheses speaks to the scientific excellence of the investigators.

      Thank you.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      N/A


      4. Description of analyses that authors prefer not to carry out

      N/A

    1. For example, in the Logic & Communication column, we see many light-orange cells – the AI often thought papers were a bit clearer or better argued (by its judgment) than the human evaluators did.

      I wonder if we should normalize this in a few ways, at least as an alternative measure.

      I suspect the AI's distribution of ratings may differ from the human distribution of ratings overall, and the "bias" may also differ by category.

      Actually, that might be something to do first -- compare the distributions of (middle -- later more sophisticated) ratings for humans and for LLMs in an overall sense.

      One possible normalization would be to state these as percentiles relative to the other stated percentiles within that group (humans, LLMs), or even within categories of paper/field/cause area. (I suspect there's some major difference between the more applied and niche-EA work and the standard academic work; the latter is also probably concentrated in GH&D and environmental econ.) On the other hand, the systematic differences between LLM and human ratings on average might also tell us something interesting. So I wouldn't want to only use normalized measures.

      I think a more sophisticated version of this normalization just becomes a statistical (random effects?) model where you allow components of variation along several margins.

      It's true the ranks thing gets at this issue to some extent, as I guess Spearman also does? But I don't think it fully captures it.
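As a rough illustration of the within-group percentile idea above, here is a minimal sketch in plain Python. The ratings are made-up numbers for five hypothetical papers, and the helper names (`average_ranks`, `within_group_percentiles`, `spearman`) are my own, not from any existing pipeline. It also shows why Spearman only partly addresses the issue: rank correlation is unchanged by this kind of monotone rescaling, so it cannot reveal per-paper gaps.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def within_group_percentiles(values):
    """Percentile (0-100) of each rating relative to its own group only."""
    n = len(values)
    return [100 * (r - 0.5) / n for r in average_ranks(values)]


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ratings of the same five papers (illustrative values only).
human = [55, 70, 40, 85, 60]
llm = [75, 85, 70, 95, 72]

human_pct = within_group_percentiles(human)
llm_pct = within_group_percentiles(llm)

# Per-paper gap after removing each group's overall level and spread.
gaps = [l - h for h, l in zip(human_pct, llm_pct)]
rho_raw = spearman(human, llm)
rho_pct = spearman(human_pct, llm_pct)
```

In this toy data the LLM rates every paper higher in raw terms, but after normalization the gaps are zero for three papers and plus or minus 20 points for the other two. The raw mean difference is the "systematic" part; the normalized gaps are the paper-specific part.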

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this paper, the GFP-GBP system for mistargeting protein localization was used in fission yeast cells to discover new protein interactions involved in vesicular trafficking during cytokinesis. This approach uncovered a new association between the F-BAR protein Rga7 and its binding partner Rng10 with the Munc13 protein Ync13 at the cell division site. Additional associations were observed between Rga7-Rng10, Ync13 and the glucan synthases Ags1 and Bgs4, and the vesicle fusion protein Sec1. These interactions identified by the GFP-GBP system were further supported by co-immunoprecipitation experiments and by defining localization dependencies with live cell imaging in a variety of mutant strains. The imaging data are all of high quality and for the most part support the conclusions. However, in my opinion some of the interpretations are overstated, and the manuscript would benefit from providing additional mechanistic information. Major and minor recommendations are outlined below.

      Major suggestions 1. The co-IP data are interpreted to suggest that all the above-mentioned proteins form a single "big complex." However, as noted in the manuscript and reflected in the model, the multipass integral membrane proteins Bgs4 and Ags1 are embedded in the vesicle membrane and likely only indirectly associate with the scaffold Rga7-Rng10 via Ync13, without forming a 'complex'. One would expect the entirety of these vesicle contents to co-IP if the model is correct. The first paragraph of page 11 should be revised to more clearly reflect this scenario and to align with the proposed model.

      Response: We thank the reviewer for this thoughtful clarification. In the original manuscript, we stated “…indicating these proteins do interact or form big protein complexes… These results suggest that Rga7, Rng10, and Ync13 form a protein complex.” We agree that our initial wording may have unintentionally implied that all proteins detected in co-IP experiments assemble into a single, large physical complex. As the reviewer correctly noticed, the multipass integral membrane proteins Bgs4 and Ags1 are embedded within vesicle membranes and are more likely to associate indirectly with the Rga7-Rng10-Ync13 complex, rather than being part of one unified protein complex. To avoid overinterpretation, we have modified the last sentence of the first paragraph on the original page 11 as below: “These results suggest that Rga7, Rng10, and Ync13 do form a protein complex, although maybe dynamic and not super stable (see Discussion). Our data indicate that Rga7 interacts with both Ync13 and Rng10 to form a module on the plasma membrane for targeting of the vesicles containing cargos such as glucan synthases Bgs4 and Ags1. However, these glucan synthases are multipass integral membrane embedded proteins and likely only indirectly associate with the module Rng10-Rga7-Ync13, without forming a big protein complex.”

      Can Ync13 be artificially directed or tethered to the division site independently of Rga7-Rng10 (e.g., via Imp2)? If so, can this rescue the phenotypes of rga7Δ cells? This experiment could clarify whether Ync13 is the key functional effector of the Rga7-Rng10 complex.

      Response: We thank the reviewer for suggesting this interesting experiment. We agree that testing whether correctly localized Ync13 is sufficient to execute the division-site function of the Rga7–Rng10 complex would clarify its role. To test this, we artificially targeted Ync13 to the division site independently of Rga7 by tethering it to the scaffold protein Pmo25. Pmo25, an MO25 family protein, localizes to both the plasma membrane at the division site and the spindle pole body (mainly one of the SPBs) during mitosis and cytokinesis, enabling us to mislocalize Ync13 to these structures through the GFP–GBP system. We did not use Imp2 because its localization pattern (mainly to the contractile ring [1, 2]) is different from that of Ync13. Microscopy revealed robust localization of Ync13 at the division site and the SPB in rga7Δ cells, and this tethered Ync13 persisted along the cleavage furrow throughout ring constriction. Importantly, enforced division-site localization of Ync13 significantly rescued the cytokinesis defects and cell lysis of rga7Δ. Consistently, growth assays on Phloxin B (PB) plates showed that the elevated lysis/death in rga7Δ cells was rescued by Ync13 tethering to Pmo25-GBP. Together, these findings support that Ync13 is a key functional effector acting downstream of the Rga7–Rng10 scaffold at the division site. We have added these results in the new Figure 6 and associated text in the revised manuscript. We have also updated the model in Figure 8 to reflect this new result.

      1. Demeter J, Sazer S. imp2, a new component of the actin ring in the fission yeast Schizosaccharomyces pombe. J Cell Biol. 1998;143(2):415-27. PubMed PMID: 9786952.
      2. Martin-Garcia R, Coll PM, Perez P. F-BAR domain protein Rga7 collaborates with Cdc15 and Imp2 to ensure proper cytokinesis in fission yeast. J Cell Sci. 2014;127(Pt 19):4146-58. Epub 2014/07/24. doi: 10.1242/jcs.146233. PubMed PMID: 25052092.
      3. The authors should consider structural or computational modeling of the proposed Rga7-Rng10-Ync13 complex. Such analysis could offer insight into how these components interact and strengthen the proposed model.

      Response: We thank the reviewer for this valuable suggestion. Following the recommendation, we performed structural modeling of the Rga7–Rng10–Ync13 complex using AlphaFold3. Our previous work demonstrated that the F-BAR protein Rga7 forms a stable dimer and its F-BAR domain binds the C-terminal (aa751–1038) region of Rng10 [3]. Based on these findings, we constructed an input model consisting of two full-length Rga7 subunits, two Rng10(751–1038) subunits, and one full-length Ync13. The predicted structure revealed a modular organization in which Rng10(751–1038) associated strongly with the F-BAR domain of the Rga7 dimer, consistent with our prior biochemical data [3]. In addition, the model suggested that Ync13 interacted with the GAP domain of Rga7, positioning Ync13 in close proximity to the Rga7–Rng10 interface (Fig. S5, A, B, D and F). Further domain specific predictions confirmed the interactions between Rga7-GAP and Ync13 N-terminus (pTM: 0.63, ipTM: 0.64), two Rga7 F-BARs (pTM: 0.74, ipTM: 0.71), as well as Rga7 F-BAR and Rng10(751–1038) (pTM: 0.56, ipTM: 0.78) (Fig. S5, C-F). Overlay analyses revealed that the interacting domains align well with the structure of whole complex as the root mean square differences (RMSDs) are

      Liu Y, McDonald NA, Naegele SM, Gould KL, Wu J-Q. The F-BAR domain of Rga7 relies on a cooperative mechanism of membrane binding with a partner protein during fission yeast cytokinesis. Cell Rep. 2019;26(10):2540-8.e4. doi: 10.1016/j.celrep.2019.01.112. PubMed PMID: 30840879; PubMed Central PMCID: PMCPMC6425953.

      Minor text edits 1. Define "SIN" in the discussion section for clarity.

      Response: We defined the SIN pathway in the Discussion section as suggested: “At low restrictive temperatures, the lethality of mutant sid2, the most downstream kinase in the Septation Initiation Network, is partially rescued by upregulating Rho1. Thus, it has been suggested that the Septation Initiation Network activates Rho1, which in turn activates the glucan synthases [4].”

      Alcaide-Gavilán M, Lahoz A, Daga RR, Jimenez J. Feedback regulation of SIN by Etd1 and Rho1 in fission yeast. Genetics. 2014;196(2):455-70. Epub 2013/12/18. doi: 10.1534/genetics.113.155218. PubMed PMID: 24336750; PubMed Central PMCID: PMCPMC3914619.

      Figure S3, the protein schematics should start at residue "1" and not "0".

      Response: We apologize for the mistake. The schematics in revised figure (now Figure S4A) have been corrected to start at residue 1.

      Mass spectrometry data referenced in the text are not provided in the manuscript.

      Response: We apologize for the omission. The mass spectrometry data are now shown in Table S1.

      In Figure 4A. The Ags1 rim localization does not appear decreased as the authors claim.

      Response: After examining the data again, we agree with the reviewer’s assessment. So, we reworded the sentence as the following: “We also found that in ync13Δ cells, the Bgs4 intensity at the rim of the septum was much lower than in WT after ring constriction (Fig. 4B).”


      On page 13: "both Rga7 and Rng10 can mistarget Trs120 to mitochondria."

      Response: Thank you. The typo “mistargeting” has been corrected to “mistarget”.

      Minor figure edits 1. Consider inverting single-channel images to display fluorescence on a white background, which would improve visual clarity.

      Response: We appreciate the reviewer’s suggestion. However, we have chosen to retain the original display format, with fluorescence shown on a black background, to be consistent with our (and some others’) previous publications. We believe this format preserves clarity while allowing easier comparison with previously published work.

      The Figure 1 legend should describe the experimental setup rather than restating conclusions.

      Response: We thank the reviewer for this helpful suggestion. The Figure 1 legend has been revised to describe the experimental setup and imaging conditions rather than summarizing conclusions as the following:

      Fig. 1. Physical interactions among the key cytokinetic proteins in plasma membrane deposition and septum formation revealed by ectopic mistargeting to mitochondria by Tom20-GBP. Arrowheads mark examples of colocalization at mitochondria. (A) Ync13 colocalizes with Rga7 and Rng10 at cell tips and the division site. (B-F) Tom20-GBP can ectopically mistarget Rga7/Rng10-mEGFP and their interacting partners tagged with tdTomato/RFP/mCherry to mitochondria. Tom20–GBP was used to recruit mEGFP-tagged Rga7 or Rng10 to mitochondria, and colocalization was assessed with tdTomato/RFP/mCherry-tagged candidate binding partners. Cells were grown at 25ºC in YE5S + 1.2 M sorbitol medium for ~36 to 48 h and then were washed with YE5S without sorbitol and grown in YE5S for 4 h before imaging. (B) Rga7/Rng10-Ync13. (C) Rga7/Rng10-Trs120. (D) Rga7/Rng10-Bgs4. (E) Rga7/Rng10-Ags1. (F) Rga7-Smi1. Bars, 5 μm.

      Reduce the number of arrows indicating co-localization in microscopy images; highlighting 1-2 representative examples is sufficient and less visually cluttered.

      Response: We appreciate the reviewer’s suggestion. We have revised the micrographs to reduce the number of arrowheads, highlighting several representative examples of co-localization per image. This improves clarity and reduces visual clutter while still guiding the reader to the key observations.

      Figure 3F, the scale bar is listed as 5 μm in the legend but it appears to my eye to be 2 μm.

      Response: We thank the reviewer for noticing this error. After rechecking the original imaging data, we have added a new 5 μm scale bar.

      The orientation of Bgs4/Smi1 should be inverted in the schematic within vesicles so that Smi1 is always on the cytoplasmic side.

      Response: We thank the reviewer for pointing out this error. The schematic has been corrected so that Bgs4 and Smi1 are oriented appropriately, with Smi1 consistently placed on the cytoplasmic side of vesicles because it does not have a transmembrane domain. The revised schematic is included in the updated Figure 8.

      6. Also in the schematic, Mid1 is not at the constricting CR and therefore needs to be removed.

      Response: Thank you for the suggestion. Mid1 has been removed from the model figure.

      Reviewer #1 (Significance (Required)): From the data presented in the manuscript, it is proposed that Rga7 and Rng10 form a scaffold at the division site for delivery of exocytic vesicles marked by the TRAPPII complex but not the exocyst complex. Further, it is proposed that these vesicles deliver specifically the glucan synthases necessary for septation. Overall, this study builds on previous work from the Wu lab to clarify how the TRAPPII-decorated vesicles are specifically delivered to the cell division site, adding some new information about vesicle trafficking regulation during cytokinesis. It also provides new insight into the role of a F-BAR scaffold protein.

      This paper will be of interest to those studying cytokinesis and also those studying mechanisms of intracellular trafficking.

      Reviewer expertise: Cell division, signaling, membrane biology

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      This paper provides a comprehensive analysis of the roles of Rng10, Rga7, and Ync13 in cytokinesis using fission yeast as a model system. The authors demonstrate that Ync13/Rga7/Rng10 not only interact with each other but also associate with components of glucan synthases, which are essential for secondary septum formation but not for the primary septum. They further show that Ync13 is involved in exocytosis through its interaction with Sec1 and plays a role in membrane trafficking via interaction with the TRAPP-II complex. Collectively, their findings reveal a coordinated mechanism that ensures the timely formation of the secondary septum during cytokinesis, as deletion of these proteins disrupts septum formation and leads to cell lysis.

      The conclusions drawn in this paper are well-supported by the data, with a clear methodology and robust statistical analyses that enhance reproducibility. However, I have the following major and minor comments:

      Major Comments - 1) The authors propose that Ync13, Rng10, and Rga7 interact to form a protein complex, supported by their mislocalization studies. While these findings are suggestive, additional co-immunoprecipitation (co-IP) data specifically demonstrating a direct interaction between Ync13 and Rng10 would strengthen the claim.

      Response: We thank the reviewer for this suggestion. The direct interaction between Rga7 and Rng10 has already been established and published by our group [3, 5]. Here we found that Rga7 and Ync13 directly interact in an in vitro binding assay (Figure 2, D and E). While our current data do not suggest a direct physical interaction between Ync13 and Rng10, our mislocalization results and other data do provide strong support for their functional association. In particular, ectopic tethering of Ync13 to mitochondria recruits Rng10 to the same sites and vice versa (Figures 1B and S2A). Additionally, division-site tethering of Ync13 by Pmo25-GBP rescues both the growth and cell-lysis phenotype of rga7Δ (Figure 6), consistent with the idea that Ync13 functions downstream of Rga7-Rng10 because Rga7 localization depends on Rng10 (Figure 8). Furthermore, our AlphaFold3 modeling predicts that Rng10 binds the BAR domain of Rga7, whereas Ync13 binds the GAP domain of Rga7, suggesting that Rng10 and Ync13 are positioned within the same complex through Rga7 without direct interaction (Figure S5).

      The predicted lack of direct interaction between Ync13 and Rng10(751–1038) is supported by the experiment described below in answer to the minor question from Reviewer 3. We tested the mistargeting of mECitrine-Rng10(751–1038) in rga7Δ tom20-GBP cells and found that Ync13-tdTomato could not be recruited to mitochondria (Figure S4H). This indicates that Ync13 cannot interact with Rng10(751–1038) independently of Rga7, supporting our proposed model that Rga7 interacts with Rng10 through the BAR domain while with Ync13 through the GAP domain. We have added these clarifications to the revised manuscript (Results and Discussion) to better contextualize the evidence for the Rga7–Rng10–Ync13 assembly.
      

      Liu Y, McDonald NA, Naegele SM, Gould KL, Wu J-Q. The F-BAR Domain of Rga7 Relies on a Cooperative Mechanism of Membrane Binding with a Partner Protein during Fission Yeast Cytokinesis. Cell Rep. 2019;26(10):2540-8.e4. doi: 10.1016/j.celrep.2019.01.112. PubMed PMID: 30840879; PubMed Central PMCID: PMCPMC6425953.

      Liu Y, Lee I-J, Sun M, Lower CA, Runge KW, Ma J, et al. Roles of the novel coiled-coil protein Rng10 in septum formation during fission yeast cytokinesis. Mol Biol Cell. 2016;27(16):2528-41. Epub 2016/07/08. doi: 10.1091/mbc.E16-03-0156. PubMed PMID: 27385337; PubMed Central PMCID: PMCPMC4985255.

      2) It remains unclear whether Ync13 directly interacts with components of the glucan synthase complex (Bgs4/Ags1), or if this association is mediated through other factors (Rng10, Rga7). Clarifying the nature of this interaction would significantly enhance the mechanistic insight.

      Response: We thank the reviewer for this thoughtful clarification. As pointed out by Reviewer 1 in major comment 1, the multipass integral membrane proteins Bgs4 and Ags1 are embedded within vesicle membranes and are more likely to associate indirectly with the Rga7–Rng10-Ync13 complex rather than being part of one unified protein complex, although Rga7 co-IPs with Bgs4 and its binding partner Smi1 (Figure 1, A-C). We would like to make it clear that our manuscript does not claim direct interactions between the Ync13-Rga7-Rng10 module and the glucan synthase complexes; rather, our model suggests that the module aids in the selection of vesicle targeting sites on the plasma membrane. To clarify, we have revised the text to more clearly state that our co-IP and in vitro binding results demonstrate that Rga7 physically associates with Ync13 and Rng10, and that vesicle-associated proteins such as Bgs4 and Ags1 are likely recruited through indirect interactions.

      Minor comments: 1) The manuscript refers to mass spectrometry-based interaction data, but the corresponding dataset is not included. Providing this would enhance transparency and reproducibility.

      Response: We apologize for the omission. The mass spectrometry data are now shown in Table S1.

      2) In Figure 2D, the MBP-6x pull-down lane shows a faint band around 76 kDa. The authors should clarify what this band represents and whether it has any relevance to the study.

      Response: We thank the reviewer for noticing this faint band. The weak ~76 kDa band in the MBP-6x pull-down lane is non-specific background binding of MBP and Rga7. We added a note in the figure legend to clarify this point.


      3) A quantification graph corresponding to the data in Figure 3G would aid in better interpreting the results and assessing their significance.

      Response: We thank the reviewer for this suggestion. We have now added two quantification graphs corresponding to Figure 3G, showing the measured Rng10 signal intensities across the division site. Statistical analysis shows the full width at half maximum (FWHM) is significantly different between WT and ync13Δ cells, and the figure legend and text have been updated accordingly in the revised manuscript.

      4) Figure 4D appears to be missing time legends, which are essential for interpreting the dynamics of the experiment.

      Response: We thank the reviewer for noticing this. We apologize for the confusing statement in the figure legend. We would like to clarify that the full width at half maximum (FWHM) was calculated from line scans using single time point images from cells at the end of contractile-ring constriction. Those line scans were fitted with a Gaussian distribution to calculate the mean and standard deviation of FWHM. We have updated the figure legend to make this clearer in the revised manuscript.
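For readers unfamiliar with the measurement, the relation between a Gaussian profile and its FWHM can be sketched as below. This is not the authors' fitting procedure: it swaps in a simple method-of-moments estimate of the Gaussian width instead of a least-squares fit, assumes a flat baseline and a scan window that covers the whole peak, and uses a synthetic line scan with invented values. The function name `fwhm_from_line_scan` is hypothetical. For a Gaussian with standard deviation σ, FWHM = 2·sqrt(2·ln 2)·σ ≈ 2.355·σ.

```python
import math


def fwhm_from_line_scan(positions, intensities):
    """Estimate FWHM of a single Gaussian-like peak from a line scan.

    Assumptions (not from the manuscript): flat baseline equal to the
    minimum intensity, and a scan wide enough to contain the full peak.
    """
    baseline = min(intensities)
    weights = [v - baseline for v in intensities]
    total = sum(weights)
    # Intensity-weighted mean position (peak centre).
    mu = sum(x * w for x, w in zip(positions, weights)) / total
    # Intensity-weighted variance gives the Gaussian sigma.
    var = sum(w * (x - mu) ** 2 for x, w in zip(positions, weights)) / total
    sigma = math.sqrt(var)
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma


# Synthetic line scan: Gaussian peak (sigma = 0.5 um) on a flat background.
SIGMA_TRUE = 0.5
xs = [i * 0.05 for i in range(-100, 101)]  # -5 to 5 um in 0.05 um steps
ys = [100.0 + 900.0 * math.exp(-x * x / (2 * SIGMA_TRUE ** 2)) for x in xs]

fwhm = fwhm_from_line_scan(xs, ys)
expected = 2.0 * math.sqrt(2.0 * math.log(2.0)) * SIGMA_TRUE
```

Applying the function to each cell's line scan yields one FWHM per cell, from which a mean and standard deviation can be computed across cells. On noisy data, a proper nonlinear fit is usually more robust than this moment estimate.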

      Reviewer #2 (Significance (Required)):

      Nature and Significance of the Advance This study provides a conceptual and mechanistic advance in understanding the spatial and temporal regulation of membrane trafficking during cytokinesis. It identifies a conserved module-Ync13-Rga7-Rng10-that directs the selective tethering and fusion of secretory vesicles at the division site, functioning independently of the exocyst complex. This finding challenges the prevailing model that the exocyst is universally required for vesicle tethering during cytokinesis. While previous work has underscored the roles of TRAPP-II and vesicle trafficking in septum formation (Wang et al., 2016; Arellano et al., 1997; Gerien and Wu, 2018), the precise mechanism targeting vesicles to the division site remained unclear. This study fills that gap by elucidating how Ync13 and Rga7 coordinate vesicle delivery and glucan synthase localization (Liu et al., 2016; Zhu et al., 2018), thereby extending our understanding of septum biogenesis and membrane remodeling beyond actomyosin ring dynamics.

      Relevant Audience: This work is relevant to: • Cell biologists investigating cytokinesis, membrane trafficking, or vesicle fusion. • Yeast geneticists interested in conserved cell division pathways. • Researchers focused on SNARE-mediated membrane dynamics and trafficking regulation. • Biomedical scientists exploring analogous processes in mammalian systems, particularly those studying cell division defects linked to disease. The findings have implications across both basic and translational research in cell biology and membrane dynamics.

      My Expertise: My research focuses on membrane fusion, specifically the SNARE-mediated fusion process. I study the spatio-temporal regulation of fusion events and the coordinated action of regulatory proteins in determining the structural and functional outcomes of membrane fusion. This background provides me with the framework to critically evaluate studies investigating cytokinesis and trafficking mechanisms at the molecular level.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Zhang et al. elucidate key roles of a conserved module the Ync13-Rga7-Rng10 complex in coordinating selective tethering, docking, and fusion of glucan synthases containing vesicles with the plasma membrane, a process crucial for cell wall synthesis and survival of fission yeast at division. Using methods including mistargeting proteins to mitochondria, co-immunoprecipitation, in vitro binding assays, genetic and cellular methods, electron microscopy, and live-cell confocal microscopy, the authors demonstrate that this module controls a vesicle targeting pathway mediated by the TRAPP-II complex and SM protein Sec1, which ensures glucan synthases Bgs4 and Ags1 are deposited at the division site in a spatiotemporal manner.

      Major comments: The authors report aberrant accumulation of Bgs4 and Ags1 in the center of the septum after actomyosin ring constriction in ync13del cells and detect no overall defects in Bgs1 distribution there (Figure 4). When similar experiments were analyzed in this paper ( https://pmc.ncbi.nlm.nih.gov/articles/PMC6249806/), Bgs1 distribution and level did change in cells lacking Ync13, although these phenotypes of Bgs1 appeared later than that of Bgs4. I wonder whether there could exist a second wave of Bgs1 arrival in ync13del cells at later time points after the ring fully constricts. Could this late recruitment of Bgs1 depend on Rng7 and Rng10, since these protein complexes are enriched in the middle of the septum of ync13del cells? Or as the authors mentioned in the Discussion, could Rho GTPase regulated by Rga7 GAP also play a role in Bgs1 accumulation or fusion with the septum in the above scenario, if no obvious accumulation of vesicles is observed in ync13del cells with electron microscopy? How does Bgs1 localize in the ync13-19 rng10del double mutant?

      Response: We thank the reviewer for this insightful observation. We repeated the experiment to observe the localization of Bgs1 in WT and ync13Δ cells. We confirmed our earlier observation reported in this manuscript that the localization of Bgs1 at the rim of the division site and its distribution along the division plane in ync13Δ are not very different from WT, although its intensity is higher and more variable in ync13Δ cells (Figure above). As suggested by the reviewer, we used microscopy to test Bgs1 localization in the ync13-19 temperature-sensitive mutant, rng10Δ, ync13-19 rng10Δ, and WT (Fig. S7). While the line scan curves for Bgs1 localization at the division site are steep for the ync13-19 rng10Δ double mutant, its FWHM shows no statistically significant difference compared with the WT control (Fig. S7). Please note that we used different confocal systems, cameras, and laser powers for Fig. 4, C and E (PerkinElmer UltraVIEW Vox CSUX1) and Fig. S7 (Nikon W1+SoRa), so the FWHMs are not comparable between the two figures.

      To test whether there is a second wave of Bgs1 localization at the division site, we tracked the fluorescence intensity of Bgs1 throughout 2-h-long movies and plotted the Bgs1 intensity profile at the division site over time. The data clearly show only one peak of Bgs1 and no later accumulation at the division site, although Bgs1 intensity is more variable in ync13-19 and ync13-19 rng10Δ cells and is higher in ync13-19 rng10Δ cells. Together, these experiments indicate that the Ync13-Rga7-Rng10 module affects the localization of the glucan synthases essential for the secondary septum (Bgs4 and Ags1) but not that of the synthase for the primary septum (Bgs1).

      Assessments of protein abundance by Western blotting (Figure 3C and 3D) would benefit from quantification.

      Response: We thank the reviewer for this suggestion. We have now quantified the Western blot bands in Figures 3C and 3D, which have been added as supplementary figures along with the Western blot for Rng10 (Fig. S6, A-C) in the revised figures.

      Minor comments: Based on a series of experiments in which mistargeted Rga7 and Rng10 truncations drive Ync13-tdTomato to mitochondria, the authors suggest that Rga7, Rng10, and Ync13 have multivalent interactions with each other. A previous study (https://pmc.ncbi.nlm.nih.gov/articles/PMC6425953/) demonstrated that in cells co-expressing Tom20-GBP and mECitrine-Rng10(751-950), Rga7 was efficiently mistargeted to mitochondria. This raises the possibility that Ync13 mistargeted by mECitrine-Rng10(751-1038) could be recruited through Rga7 that is strongly associated with Rng10(751-1038) on mitochondria. I wonder whether the authors could compare some of their truncation mistargeting experiments in the original manuscript with ones in which either Rga7 or Rng10 is deleted, e.g. Tom20-GBP mECitrine-Rng10(751-1038) experiments in rga7del cells, if cells are still viable in this genetic background.

      Response: We thank the reviewer for this insightful suggestion. We tested the mistargeting of mECitrine-Rng10(751–1038) in rga7Δ tom20-GBP cells and found that Ync13-tdTomato could not be recruited to mitochondria. This indicates that Ync13 cannot interact with the Rng10 C-terminus independently of Rga7, supporting the AlphaFold3 modeling and our proposed model that Rga7 interacts with Rng10 through its BAR domain and with Ync13 through its GAP domain. We have added the new data to the revised manuscript (Fig. S4H and associated text) and included a brief discussion highlighting that Rga7 is required for the Rng10–Ync13 interaction. We removed the mention of multivalent interactions from the manuscript to minimize confusion.

      It is interesting that rga7del rng10del double mutants can survive better in EMM or YES with sorbitol. I wonder whether this would allow the authors to test whether the interaction between Ync13 and Sec1 is modulated by the presence of Rga7 and Rng10 or even the entire vesicle. Is mistargeted Ync13 overexpressed using the 3nmt1 promoter still capable of driving Sec1 to mitochondria in rga7del rng10del cells?

      Response: We thank the reviewer for this suggestion. While we did not succeed in constructing the pentamutant deleting both rga7 and rng10 and mislocalizing Ync13 to mitochondria, we were able to make a quadruple mutant deleting rng10 and mislocalizing Ync13 to mitochondria. We tested whether mistargeted Ync13 overexpressed using the 3nmt1 promoter can recruit Sec1 to mitochondria in rng10Δ cells. Our results show that overexpressed Ync13 is still able to drive Sec1 localization to mitochondria without Rng10 (Fig. S2G). This suggests that Rng10 (together with Rga7) primarily functions to recruit and position Ync13 at the division site rather than being strictly required for the interaction between Ync13 and Sec1. This is also consistent with our Pmo25-GBP mislocalization experiments where we found that rga7Δ 3nmt1-mECitrine-ync13 cells even under the repressed condition for the 3nmt1 promoter can partially rescue the lysis phenotype of rga7Δ cells (Figure 6).

      The endogenous level of Ync13 is not particularly high. Is this low level of Ync13 crucial for its function? Does a mildly elevated level of Ync13 promote vesicle fusion at the closing septum?

      Response: We thank the reviewer for this insightful question. To test whether there is a correlation between Ync13 levels and vesicle fusion at the division site, we mildly overexpressed Ync13 from the 3nmt1 promoter in YE5S rich medium without additional thiamine to obtain cells with different Ync13 levels (the rich medium contains a residual amount of thiamine, which partially represses the nmt1 promoter) and then tracked vesicles labeled with the Rab11 GTPase Ypt3. This resulted in increased levels of Ync13 as well as Ypt3 at the division site (Fig. S8B). We measured the Ync13 intensity at the division site and counted the number of Ypt3 vesicles reaching the division site in 2-minute continuous movies at the middle focal plane. We observed that increasing the Ync13 level promoted the tethering and accumulation of Ypt3 vesicles at the division site until it reached a plateau (Fig. S8B). Thus, the Ync13 level is important for vesicle fusion at the division site. Collectively, Ync13, working with Rga7 and Rng10, plays an important role in vesicle targeting and fusion on the plasma membrane at the division site during cytokinesis. This is consistent with our results that overexpressed Ync13 can mislocalize Sec1 to mitochondria in rng10Δ cells (Fig. S2G) and can rescue rga7Δ cells (Fig. 6).

      Reviewer #3 (Significance (Required)):

      Most of the conclusions are well supported by a combination of methods. Out of curiosity, I wonder how much of the Bgs4 or Smi1 detected in Co-IP experiments exists in the vesicle-bound form. The authors propose a very interesting working model that addresses several key challenges in achieving vesicle targeting specificity when timely delivery of various enzymes to their respective spatial locations along the primary and secondary septum must be orchestrated. I think this manuscript will be of interest to a broad audience.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Weaknesses:

      The lack of pleiotropy is an unconfirmable assumption of MR, and the addition of those models is therefore quite important, as this is a primary weakness of the MR approach. Given that concern, I read the sensitivity analyses using pleiotropy-robust models as the main result, and in that case, they can't test their hypotheses as these models do not show a BMI instrumental variable association. The other weakness, which might be remedied, is that the power of the tests here is not described. When a hypothesis is tested with an under-powered model, the apparent lack of association could be due to inadequate sample size rather than a true null. Typically, when a statistically significant association is reported, power concerns are discounted as long as the study is not so small as to create spurious findings. That is the case with their primary BMI instrumental variable model - they find an association so we can presume it was adequately powered. But the primary models they share are not the pleiotropy-robust methods MR-Egger, weighted median, and weighted mode. The tests for these models are null, and that could mean a couple of things: (1) the original primary significant association between the BMI genetic instrument was due to pleiotropy, and they therefore don't have a robust model to explore the effects of the tobacco genetic instrument. (2) The power for the sensitivity analysis models (the pleiotropy-robust methods) is inadequate, and the authors share no discussion about the relative power of the different MR approaches. If they do have adequate power, then again, there is no need to explore the tobacco instrument.

      Reviewing Editor Comments:

      We suggest that the authors add power estimates to assess whether the sample size is sufficient, given the strength and variability of the genetic instruments. It would also be helpful to present effect estimates for the tobacco instruments alone, to clarify their independent contribution and improve the interpretation of the joint models. In addition, the role of pleiotropy should be addressed more clearly, including which model is considered primary. Stratified analyses by smoking status are encouraged, as prior studies indicate that BMI-HNC associations may differ between smokers and non-smokers. Finally, the comparison with previous studies should be revised, as most reported null findings without accounting for tobacco instruments. If this study finds an association, it should not be framed as a replication.

      We would like to highlight that post-hoc power calculations are often considered redundant since the statistical power estimated for an observed association is directly related to its p-value[1]. In other words, the uncertainty of the association is already reflected in its 95% confidence interval. However, we understand power calculations may still be of interest to the reader, so we have incorporated them in the revised manuscript. We have edited the text as follows (lines 151-155): “Consequently, we used the total R² values to examine the statistical power in our study[42]. However, we acknowledge that the value of post-hoc power calculations is limited, since the statistical power estimated for an observed association is already reflected in the 95% confidence interval presented alongside the point estimate[43].” We have also added supplementary figures 1 and 2.

      We can see that when using the latest HEADSpAcE data we were able to detect BMI-HNC ORs as small as 1.16 with 80% power, while the GAME-ON dataset only permitted the detection of ORs as small as 1.26 using the same BMI instruments (Figure B). We have explained these figures in the results section as follows (lines 257-263): “Using the BMI genetic instruments (total R² = 4.8%) and an α of 0.05, we had 80% statistical power to detect an OR as small as 1.16 for HNC risk (Supplementary Figure 1). For WHR (total R² = 3.1%) and WC (total R² = 4.4%), we could detect odds ratios (ORs) as small as 1.20 and 1.17, respectively. This is an improvement in terms of statistical power compared to the GAME-ON analysis published by Gormley et al.[28], for which there was 80% power to detect an OR as small as 1.26 using the same BMI genetic instruments (Supplementary Figure 2).”
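
      The relationship between instrument strength, sample size and the smallest detectable OR can be illustrated with a minimal sketch. It assumes a commonly used normal approximation for MR with a binary outcome, in which the test statistic is roughly |log OR| × sqrt(N × R² × K × (1 − K)), with K the proportion of cases. Whether the authors used exactly this formula is not stated in the response, and the sketch further assumes that the sample quoted elsewhere in the response (N=31,523, including 12,264 cases) is the one behind the power figures. The function names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def mr_power(odds_ratio, r2, n_total, n_cases, alpha=0.05):
    """Approximate power of a two-sided MR test with a binary outcome.

    Assumption: normal approximation with non-centrality
    |log OR| * sqrt(N * R^2 * K * (1 - K)), K = case proportion.
    """
    case_fraction = n_cases / n_total
    ncp = abs(log(odds_ratio)) * sqrt(
        n_total * r2 * case_fraction * (1 - case_fraction)
    )
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(ncp - z_crit)


def min_detectable_or(r2, n_total, n_cases, power=0.80, alpha=0.05):
    """Smallest OR (> 1) detectable with the requested power."""
    case_fraction = n_cases / n_total
    z_sum = _STD_NORMAL.inv_cdf(1 - alpha / 2) + _STD_NORMAL.inv_cdf(power)
    return exp(z_sum / sqrt(n_total * r2 * case_fraction * (1 - case_fraction)))


if __name__ == "__main__":
    # Assumed sample: N = 31,523 with 12,264 cases (quoted in the response).
    for label, r2 in [("BMI", 0.048), ("WHR", 0.031), ("WC", 0.044)]:
        smallest = min_detectable_or(r2, n_total=31523, n_cases=12264)
        print(f"{label}: R2={r2:.3f}, minimum detectable OR ~ {smallest:.2f}")
```

      Under these assumptions the approximation gives values close to the 1.16, 1.20 and 1.17 quoted above, which suggests the quoted thresholds follow mainly from the total R² of each instrument and the case proportion.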

      The reason we use inverse variance weighted (IVW) Mendelian randomization (MR) to obtain our main results rather than the pleiotropy-robust methods mentioned by the reviewer/editors (i.e., MR-Egger, weighted median and weighted mode) is that the former has greater statistical power than the latter[2]. Hence, instead of focussing on the statistical significance of the pleiotropy-robust analyses, we consider it more valuable to compare the consistency of the size and direction of the effect estimates across methods. Any evidence of such consistency increases our confidence in our main findings, since each method relies on different assumptions. As we cannot be sure about the presence and nature of horizontal pleiotropy, it is useful to compare results across methods even though they are not equally powered. It is true that our results for the genetically predicted effects of body mass index (BMI) on the risk of head and neck cancer (HNC) differ across methods. This is precisely what led us to question the validity of our main finding (suggesting a positive effect of BMI on HNC risk). We have now clarified this in the methods section of the revised manuscript as advised. Lines 165-171:

      “Because the IVW method assumes all genetic variants are valid instruments[44], which is unlikely the case, three pleiotropy-robust two-sample MR methods (i.e., MR-Egger[45], weighted median[46] and weighted mode[47]) were used in sensitivity analyses. When the magnitude and direction of effect estimates are consistent across methods that rely on different assumptions, the main findings are more convincing. As we cannot be sure about the presence and nature of horizontal pleiotropy, it is useful to compare results across methods even if they are not equally powered.”
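
      Why comparing methods is informative can be shown with a small sketch using entirely hypothetical summary statistics (the numbers below are not from the study). It implements the textbook fixed-effect IVW estimator (weighted regression through the origin) and MR-Egger (the same regression with an intercept). In the toy data every variant has a true causal slope of 0.5 plus a constant pleiotropic effect of 0.02 on the outcome, so the IVW estimate is biased while MR-Egger recovers the slope and reports the pleiotropy as a non-zero intercept.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW: weighted regression of outcome on exposure effects
    through the origin, with weights 1 / se_outcome^2."""
    weights = [1 / se ** 2 for se in se_outcome]
    numerator = sum(
        w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome)
    )
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return numerator / denominator


def egger_estimate(beta_exposure, beta_outcome, se_outcome):
    """MR-Egger: the same weighted regression, but with an intercept that
    absorbs average directional pleiotropy. Returns (slope, intercept)."""
    weights = [1 / se ** 2 for se in se_outcome]
    total = sum(weights)
    x_mean = sum(w * bx for w, bx in zip(weights, beta_exposure)) / total
    y_mean = sum(w * by for w, by in zip(weights, beta_outcome)) / total
    sxy = sum(
        w * (bx - x_mean) * (by - y_mean)
        for w, bx, by in zip(weights, beta_exposure, beta_outcome)
    )
    sxx = sum(w * (bx - x_mean) ** 2 for w, bx in zip(weights, beta_exposure))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Hypothetical variant effects: outcome = 0.5 * exposure + 0.02 (pleiotropy).
TOY_BX = [0.10, 0.20, 0.30, 0.40]
TOY_BY = [0.5 * bx + 0.02 for bx in TOY_BX]
TOY_SE = [0.01] * len(TOY_BX)

if __name__ == "__main__":
    print("IVW slope:", ivw_estimate(TOY_BX, TOY_BY, TOY_SE))
    print("Egger slope, intercept:", egger_estimate(TOY_BX, TOY_BY, TOY_SE))
```

      The toy data are noise-free, so the sketch says nothing about precision; in real data MR-Egger estimates are typically much less precise than IVW estimates, which is the power trade-off described above.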

      We understand that the reviewer/editors are concerned that we do not have a robust model to explore the role of tobacco consumption in the link between BMI and HNC. However, we have a different perspective on the matter. If indeed, the main IVW finding for BMI and HNC is due to pleiotropy (since some of the pleiotropy-robust methods suggest conflicting results), then the IVW multivariable MR method is a way to explore the potential source of this bias[3]. We were particularly interested in exploring the role of smoking in the observed association because smoking and adiposity are known to influence each other [4-9] and share a genetic basis[10, 11].

      We agree that it would be useful to present the univariable MR effect estimates for smoking behaviour and HNC risk alongside those obtained using multivariable MR. We have now included the univariable MR estimates for both smoking behaviour variables as a note under Supplementary Table 11 and in the manuscript (lines 316-318): “In univariable IVW MR, both CSI and SI were linked to an increased risk of HNC (CSI OR=4.47 per 1-SD higher CSI, 95%CI 3.31–6.03, p<0.001; SI OR=2.07 per 1-SD higher SI, 95%CI 1.60–2.68, p<0.001) (Additional File 2: note in Supplementary Table 11).”

      We understand the appeal of conducting stratified MR analyses by smoking status. However, we anticipate such analyses would hinder the interpretation of our findings as they can induce collider bias which could spuriously lead to different effect estimates across strata[12, 13].

      We thank the reviewer/editors for their comment regarding the way we frame our findings. We have now edited the discussion section to highlight that our study results differ from those obtained in studies that do not account for smoking behaviour. Lines 398-401: “With a much larger sample (N=31,523, including 12,264 cases), our IVW MR analysis suggested BMI may play a role in HNC risk, in contrast to previous studies. However, our sensitivity analyses implied that causality was uncertain.”

      Reviewer #1 (Recommendations for the authors):

      The authors do share a table of the percent variance explained of the different genetic instruments, which vary widely, and that table is very welcome because we can get some sense of their utility. The problem is that they don't translate that into a power estimate for the case-control study size that they use. They say that it is the biggest to date, which is good, but without some formal power estimate, it is not particularly reassuring. A framework for MR study power estimates was reported in PMID: 19174578, but that was using very simple MR constructs in use in 2009, and it isn't clear to me if that framework can be used here. That power paper suggests that weak genetic instruments need very large sample sizes, far larger than what is used in the current manuscript. I am unable to estimate the true strength of the instruments used here, and so I am unsure of whether power is an issue or not.

      We have now included power calculations in our manuscript to address the reviewer’s concerns. Nevertheless, as mentioned above, post-hoc power calculations are of limited value, as statistical power is already reflected in the uncertainty around the point estimates (the 95% confidence intervals). Hence, it is important to avoid drawing conclusions regarding the likelihood of true effects or false negatives based on these calculations.

      Although the hypothesis here is that smoking accounts for the apparent BMI association previously reported for HNC, it would have been preferable to see the estimates for their 2 genetic instruments for tobacco alone. The current results only show the BMI instruments alone and then with the tobacco instruments. I would like to see what the risk estimates are for the tobacco instrument alone, so that I can judge for myself what happens in the joint models. As presented, one can only do that for the BMI instruments.

      We thank the reviewer for this comment. The univariable IVW MR estimate of smoking initiation was OR=2.07 (95%CI 1.60 to 2.68, p<0.001), while the one for comprehensive smoking index was OR=4.47 (95%CI 3.31 to 6.03, p<0.001). We have included this information in the manuscript as requested (please see response to reviewing editor above).

      On line 319, they write that "We did not find evidence against bias due to correlated pleiotropy..." I find this difficult to parse, but I think it means that they should believe that correlated pleiotropy remains a problem. So again, they seem to see their primary model as compromised, and so do I. This limitation is again stated by the authors on lines 351-352.

      We apologise if the wording of the sentence was not easy to understand. When using the CAUSE method, we did not find evidence to reject the null hypothesis that the sharing (correlated pleiotropy) model fits the data at least as well as the causal model. In other words, our CAUSE finding and the inconsistencies observed across our other sensitivity analyses led us to believe that our main IVW MR estimate for BMI-HNC was likely biased by correlated pleiotropy. We believe it is important to explore the source of this bias, which is why we used multivariable MR to investigate the direct effect of BMI on HNC risk while accounting for smoking behaviour.

      In the following paragraphs (lines 358-369), the authors state that their findings are consistent with prior reports, but that doesn't seem to be the case if we take their primary BMI instrument as representing the outcome of this manuscript. Here, they find an association between the BMI instrument and HNC risk, but in each of the other papers they present the primary finding was null without the extensive model changes or the aim of accounting for tobacco with another instrument. I don't see that as replication.

      This is a good point. We have now edited the discussion of our manuscript to avoid giving the impression that our findings replicate those from studies that do not account for smoking behaviour in their analyses. We have edited lines 384-401 as follows:

      “Previous MR studies suggest adiposity does not influence HNC risk[27-29]. Gormley et al.[28] did not find a genetically predicted effect of adiposity on combined oral and oropharyngeal cancer when investigating either BMI (OR=0.89 per 1-SD, 95% CI 0.72–1.09, p=0.26), WHR (OR=0.98 per 1-SD, 95% CI 0.74–1.29, p=0.88) or waist circumference (OR=0.73 per 1-SD, 95% CI 0.52–1.02, p=0.07) as risk factors. Similarly, a large two-sample MR study by Vithayathil et al.[29] including 367,561 UK Biobank participants (of which 1,983 were HNC cases) found no link between BMI and HNC risk (OR=0.98 per 1-SD higher BMI, 95% CI 0.93–1.02, p=0.35). Larsson et al.[27] meta-analysed Vithayathil et al.’s[29] findings with results obtained using FinnGen data to increase the sample size even further (N=586,353, including 2,109 cases), but still did not find a genetically predicted effect of BMI on HNC risk (OR=0.96 per 1-SD higher BMI, 95% CI 0.77–1.19, p=0.69). With a much larger sample (N=31,523, including 12,264 cases), our IVW MR analysis suggested BMI may play a role in HNC risk, in contrast to previous studies. However, our sensitivity analyses implied that causality was uncertain.”

      We also deleted part of a sentence in the discussion section, so lines 416-418 now look as follows: “An important strength of our study was that the HEADSpAcE consortium GWAS used had a large sample size which conferred more statistical power to detect effects of adiposity on HNC risk compared to previous MR analyses[27-29].”

      On lines 384-386 they note a strength is that this is the largest study to date, but I would reiterate that larger and more powerful does not equate to adequately powered.

      This is true. We have included power calculations in the manuscript as requested.

      It's well known that different HNC subsites have different etiologies, as they mention on lines 391-392, and it is implicit in their use of data on HPV positive and negative oropharyngeal cancer. They say that they did not find evidence for heterogeneity in this study, but that would only be true for the null BMI instrument. The effect sizes for their smoking instruments are strikingly different between the subsites.

      We agree and are sorry for the confusion we may have caused by the way we worded our findings. We have edited the text to clarify that the lack of subsite heterogeneity only applied to our results for BMI/WHR/WC-HNC risk. Lines 418-424 now read as follows:

      “Furthermore, the availability of data on more HNC subsites, including oropharyngeal cancers by HPV status, allowed us to investigate the relationship between adiposity and HNC risk in more detail than previous MR studies which limited their subsite analyses to oral cavity and overall oropharyngeal cancers[28, 68]. This is relevant because distinct HNC subsites are known to have different aetiologies[69], although we did not find evidence of heterogeneity across subsites in our analyses investigating the genetically predicted effects of BMI, WHR and WC on HNC risk.”

      Finally, the literature on mutational patterns gives us strong reason to believe that HNC caused by tobacco are biologically distinct from tumors not caused by tobacco. The authors report in the introduction that traditional observational studies of BMI and HNC have reported different findings in smokers versus never smokers, so I would assume there is a possibility that the BMI instrument could have different associations with tumors of the tobacco-induced phenotype and tumors with a non-tobacco induced phenotype. I would assume that authors have access to the data on self-reported tobacco use behavior, even if they can't separate these tumors by molecular types. Stratifying their analysis by tobacco users or not might reveal different results with the BMI instrument.

      We appreciate the reviewer’s comment. We agree that it would have been interesting to present stratified analyses by smoking status alongside our main findings. However, we decided against this because of the risk of inducing collider bias in our MR analyses, i.e., stratifying on smoking status may induce spurious associations between the adiposity instruments and confounding factors. Multivariable MR is considered a better way of investigating the direct effects of an exposure (adiposity) on an outcome (HNC) accounting for a third variable (smoking)[14], which is why we opted for this method instead.
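
      The collider-bias concern can be illustrated with a toy simulation. Everything in it is an assumption made for illustration: a standardised genetic score for adiposity, an unmeasured factor that is independent of the score in the full population, and smoking status determined by both plus noise. It is not a model of the study data.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate_collider(n=20000, seed=1):
    """Return the score-confounder correlation in everyone and in smokers only."""
    rng = random.Random(seed)
    score_all, conf_all, score_smokers, conf_smokers = [], [], [], []
    for _ in range(n):
        score = rng.gauss(0, 1)       # hypothetical adiposity genetic score
        confounder = rng.gauss(0, 1)  # hypothetical unmeasured risk factor
        # Smoking is a common effect (collider) of the score and the confounder.
        smokes = score + confounder + rng.gauss(0, 1) > 0
        score_all.append(score)
        conf_all.append(confounder)
        if smokes:
            score_smokers.append(score)
            conf_smokers.append(confounder)
    return pearson(score_all, conf_all), pearson(score_smokers, conf_smokers)


if __name__ == "__main__":
    overall, within_smokers = simulate_collider()
    print(f"correlation in full sample:   {overall:+.3f}")
    print(f"correlation among smokers:    {within_smokers:+.3f}")
```

      In the full sample the score and the unmeasured factor are essentially uncorrelated, whereas among smokers they become negatively correlated, so an instrument that is valid overall is no longer independent of confounders within a smoking stratum.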

      References:

      (1) Heinsberg LW, Weeks DE: Post hoc power is not informative. Genet Epidemiol 2022, 46(7):390-394.

      (2) Burgess S, Butterworth A, Thompson SG: Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol 2013, 37(7):658-665.

      (3) Burgess S, Davey Smith G, Davies NM, Dudbridge F, Gill D, Glymour MM, Hartwig FP, Kutalik Z, Holmes MV, Minelli C et al: Guidelines for performing Mendelian randomization investigations: update for summer 2023. Wellcome Open Res 2019, 4:186.

      (4) Morris RW, Taylor AE, Fluharty ME, Bjorngaard JH, Asvold BO, Elvestad Gabrielsen M, Campbell A, Marioni R, Kumari M, Korhonen T et al: Heavier smoking may lead to a relative increase in waist circumference: evidence for a causal relationship from a Mendelian randomisation meta-analysis. The CARTA consortium. BMJ Open 2015, 5(8):e008808.

      (5) Taylor AE, Morris RW, Fluharty ME, Bjorngaard JH, Asvold BO, Gabrielsen ME, Campbell A, Marioni R, Kumari M, Hallfors J et al: Stratification by smoking status reveals an association of CHRNA5-A3-B4 genotype with body mass index in never smokers. PLoS Genet 2014, 10(12):e1004799.

      (6) Taylor AE, Richmond RC, Palviainen T, Loukola A, Wootton RE, Kaprio J, Relton CL, Davey Smith G, Munafo MR: The effect of body mass index on smoking behaviour and nicotine metabolism: a Mendelian randomization study. Hum Mol Genet 2019, 28(8):1322-1330.

      (7) Asvold BO, Bjorngaard JH, Carslake D, Gabrielsen ME, Skorpen F, Smith GD, Romundstad PR: Causal associations of tobacco smoking with cardiovascular risk factors: a Mendelian randomization analysis of the HUNT Study in Norway. Int J Epidemiol 2014, 43(5):1458-1470.

      (8) Carreras-Torres R, Johansson M, Haycock PC, Relton CL, Davey Smith G, Brennan P, Martin RM: Role of obesity in smoking behaviour: Mendelian randomisation study in UK Biobank. BMJ 2018, 361:k1767.

      (9) Freathy RM, Kazeem GR, Morris RW, Johnson PC, Paternoster L, Ebrahim S, Hattersley AT, Hill A, Hingorani AD, Holst C et al: Genetic variation at CHRNA5-CHRNA3-CHRNB4 interacts with smoking status to influence body mass index. Int J Epidemiol 2011, 40(6):1617-1628.

      (10) Thorgeirsson TE, Gudbjartsson DF, Sulem P, Besenbacher S, Styrkarsdottir U, Thorleifsson G, Walters GB, Consortium TAG, Oxford GSKC, consortium E et al: A common biological basis of obesity and nicotine addiction. Transl Psychiatry 2013, 3(10):e308.

      (11) Wills AG, Hopfer C: Phenotypic and genetic relationship between BMI and cigarette smoking in a sample of UK adults. Addict Behav 2019, 89:98-103.

      (12) Coscia C, Gill D, Benitez R, Perez T, Malats N, Burgess S: Avoiding collider bias in Mendelian randomization when performing stratified analyses. Eur J Epidemiol 2022, 37(7):671-682.

      (13) Hamilton FW, Hughes DA, Lu T, Kutalik Z, Gkatzionis A, Tilling K, Hartwig FP, Davey Smith G: Non-linear Mendelian randomization: evaluation of effect modification in the residual and doubly-ranked methods with simulated and empirical examples. Eur J Epidemiol 2025.

      (14) Sanderson E, Davey Smith G, Windmeijer F, Bowden J: An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. Int J Epidemiol 2019, 48(3):713-727.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      This study aims to elucidate the mechanisms by which stress-induced α2A-adrenergic receptor (α2A-AR) internalization leads to cytosolic noradrenaline (NA) accumulation and subsequent neuronal dysfunction in the locus coeruleus (LC). While the manuscript presents an interesting but ambitious model involving calcium dynamics, GIRK channel rundown, and autocrine NA signaling, several key limitations undermine the strength of the conclusions. 

      (1) First, the revision does not include new experiments requested by reviewers to validate core aspects of the mechanism. Specifically, there is no direct measurement of cytosolic NA levels or MAO-A enzymatic activity to support the link between receptor internalization and neurochemical changes. The authors argue that such measurements are either not feasible or beyond the scope of the study, leaving a significant gap in the mechanistic chain of evidence. 

      Although Reviewer #1 commented that “The authors argue that such measurements are either not feasible or beyond the scope of the study, leaving a significant gap in the mechanistic chain of evidence”, we believe that this comment may be unfair.

      It may be unfair for Reviewer #1 to neglect our responses to the original reviewer comments regarding the direct measurement of cytosolic NA levels. It is true that none of the recommended methods for directly measuring cytosolic NA levels is feasible, as described in the original authors’ response (see the original authors’ response to the comment raised by Reviewer #1 as Recommendations for the authors (2)). To measure extracellular NA with GRAB-NE photometry, α2A-ARs must be expressed in the cell membrane. GRAB-NE photometry is not applicable unless α2A-ARs are expressed, whereas increases in cytosolic NA levels are caused by internalization of α2A-ARs in our study.

      In our study, we sought to detect the change in MAO-A protein by Western blot instead of examining MAO-A enzymatic activity. Because the relative quantification of active AEP and Tau N368 proteins by Western blot analysis should accurately reflect the change in MAO-A enzymatic activity, an enzymatic assay may not necessarily be required, although we acknowledge that an enzymatic assay would better demonstrate MAO-A activity, as discussed in the previously revised manuscript (R1, page 10, lines 314-315).

      We used the phrase “beyond the scope of the current study” for “the mechanism how Ca²⁺ activates MAO-A” as described in the original authors’ responses (see the original authors’ response to the comment raised by Reviewer #1 as Weakness (3)). We do not think that this mechanism must be investigated in the present study because the Ca²⁺-dependent nature of MAO-A activity is already known (Cao et al., 2007).

      On the other hand, because it is not possible to measure cytosolic NA levels with currently available methods, the quantification of the connection between α2A-AR internalization and increased cytosolic NA levels must be considered outside the scope of the study. However, our study demonstrated the qualitative relationship between α2A-AR internalization and active-AEP/TauN-368 reflecting increased cytosolic NA levels, leaving “a small gap in the mechanistic chain of evidence.” Therefore, it may be unreasonable to criticize our study as “leaving a significant gap in the mechanistic chain of evidence” with the phrase “beyond the scope of the current study.” 

      (2) Second, the behavioral analysis remains insufficient to support claims of cognitive impairment. The use of a single working memory test following an anxiety test is inadequate to verify memory dysfunction behaviors. Additional cognitive assays, such as the Morris Water Maze or Novel Object Recognition, are recommended but not performed.

      As described in the original authors’ response (see the original authors’ response to the comment raised by Reviewer #1 as Weakness (4)), we had already performed another behavioral test, the elevated plus maze (EPM) test. By combining the two tests, it may be possible to evaluate the results of the Y-maze test more accurately by differentiating memory impairment from anxiety. However, the results obtained from these behavioral tests showed that chronic RS mice displayed both anxiety-like and memory impairment-like behaviors. Accordingly, we have softened the implication of anxiety and memory impairment (page 13, lines 396-399) and revised the abstract (page 2, line 59) in the revised manuscript (R2).

      (3) Third, concerns regarding the lack of rigor in differential MAO-A expression in fluorescence imaging were not addressed experimentally. Instead of clarifying the issue, the authors moved the figure to supplementary data without providing further evidence (e.g., an enzymatic assay or quantitative reanalysis of Western blot, or re-staining of IF for MAO-A) to support their interpretation.

      Because the quantification of MAO-A expression can be performed with greater accuracy by means of Western blot than by immunohistochemistry, we have moved the immunohistochemical results (shown in Figure 5) to the supplemental data (Figure S8) following the suggestion made by the Reviewer #3. As the relative quantification of active AEP and Tau N368 proteins by Western blot analysis may accurately reflect changes in the MAO-A enzymatic activity which is consistent with the result of Western blot analysis of MAO-A, enzymatic assay or re-staining of immunofluorescence for MAO-A may not be necessarily required. We do not think that a new experiment of Western blot analysis is necessary to re-evaluate MAO-A just because of the lack of the less-reliable quantification of immunohistochemical staining.

      (4) Fourth, concerns regarding TH staining remain unresolved. In Figure S7, the α2A-AR signal appears to resemble TH staining, and vice versa, raising the possibility of labeling errors. It is recommended that the authors re-examine this issue by either double-checking the raw data or repeating the immunostaining to validate the staining.

      Reviewer #3 has misunderstood Figure S7. Figure S7 shows two types of α2A-AR-expressing neurons: TH-positive LC neurons and TH-negative neurons in the mesencephalic trigeminal nucleus (MTN). This clearly indicates that TH staining is specific. Furthermore, α2A-AR staining was much more extensive in MTN neurons than in LC neurons. Thus, the α2A-AR signal is not similar to the TH signal and there are no labeling errors, which is also evident in the merged image (Figure S7C).

      (5) Overall, the manuscript offers a potentially interesting framework but falls short in providing the experimental rigor necessary to establish causality. The reliance on indirect reasoning and reorganizing of existing data, rather than generating new evidence, limits the overall impact and interpretability of the study.

      Overall, Reviewer #1 was not satisfied with our revision despite our responses. As detailed above in our responses to comments (1) to (4), we believe that the original authors’ responses, together with the responses above, effectively address the criticisms raised by Reviewer #1.

      Reviewer #2 (Public review): 

      Comments on revisions: 

      The authors have addressed all of the reviewers' comments.

      We appreciate the constructive and helpful comments made by Reviewer #2.

      Reviewer #3 (Public review): 

      Weaknesses:  

      Nevertheless, the manuscript currently reads as a sequence of discrete experiments rather than a single causal chain. Below, I outline the key points that should be addressed to make the model convincing.

      Please see our responses to the recommendations for the authors made by Reviewer #3.

      Reviewer #3 (Recommendations for the authors):

      (1) Causality across the pathway  

      Each step (α2A internalisation, GIRK rundown, Ca<sup>2+</sup> rise, MAO-A/AEP upregulation) is demonstrated separately, but no experiment links them in a single preparation. Consider in vivo Ca<sup>2+</sup> or GRAB NE photometry during restraint stress while probing α2A levels with i.p. clonidine injection or optogenetic overexcitation coupled to biochemical readouts. Such integrated evidence would help move the manuscript from a correlational to a more mechanistic study.

      Authors’ response: It is not possible to measure free cytosolic NA levels with GRAB NE photometry when α2A AR is internalized, as described above (see the response to the comment made by Reviewer #1 in the recommendations for the authors).

      The core idea behind my comment, as well as that of Reviewer 1, was to encourage integrating your individual findings into a more cohesive in vivo experiment. Using GRAB-NE to measure extracellular NA could serve as an indirect readout of NA uptake via NAT, and ultimately, cytosolic NA levels. Connecting these experiments would significantly strengthen the manuscript and enhance its overall impact. 

      It may be true that the measurement of extracellular NA could serve as an indirect readout of NA uptake via NAT, and ultimately of cytosolic NA levels. However, Reviewer #3 still misunderstands the applicability of the GRAB-NE method to detecting NA in our study. As described in the original authors’ response, there appears to be no fluorescent probe for labeling cytosolic NA at present. In particular, the GRAB-NE method recommended by Reviewers #1 and #3 can detect NA only when α2A-AR is expressed in the cell membrane. Therefore, when increases in cytosolic NA levels are caused by internalization of α2A-ARs, NA measurement with GRAB-NE photometry is not applicable.

      (2) Pharmacology and NE concentration  

      The use of 100 µM noradrenaline saturates α and β adrenergic receptors alike. Please provide ramp measurements of GIRK current in dose-response at 1-10 µM NE (blocked by atipamezole) to confirm that the rundown really reflects α2A activity rather than mixed receptor effects. 

      Authors’ response: It is true that 100 µM noradrenaline activates both α and β adrenergic receptors alike. However, it was clearly shown that the enhancement of GIRK-I by 100 µM noradrenaline was completely antagonized by 10 µM atipamezole and that the Ca<sup>2+</sup>-dependent rundown of NA-induced GIRK-I was prevented by 10 µM atipamezole. Considering the Ki values of atipamezole for α2A AR (=1~3 nM) (Vacher et al., 2010, J Med Chem) and β AR (>10 µM) (Virtanen et al., 1989, Arch Int Pharmacodyn Ther), these results reflect α2A AR activity rather than β AR activity (Figure S5). Furthermore, because it is already well established that NA-induced GIRK-I is mediated by α2A AR activity in LC neurons (Arima et al., 1998, J Physiol; Williams et al., 1985, Neuroscience), it is not necessary to re-examine the effect of 1-10 µM NA on GIRK-I.

      While the milestone papers by Williams remain highly influential, they should be re-evaluated in light of more recent findings, given that they date back over 40 years. Advances in our understanding now allow for a more nuanced interpretation of some of their results. For example, see McKinney et al. (eLife, 2023). This study demonstrates that presynaptic β-adrenergic receptors, particularly β2, can enhance neuronal excitability via autocrine mechanisms. This suggests that your post-activation experiments using atipamezole may not fully exclude a contribution of β-adrenergic signaling. Such a role might become apparent when conducting more detailed titration experiments.

      Reviewer #3 may be misunderstanding the report by McKinney et al. (eLife, 2023). This paper did not demonstrate that presynaptic β-adrenergic receptors, particularly β2, can enhance neuronal excitability via autocrine mechanisms. It is impossible for LC neurons to increase their excitability by activating β-adrenergic receptors, as we have clearly shown that the enhancement of GIRK-I by 100 µM noradrenaline was completely antagonized by 10 µM atipamezole. Considering the difference in Ki values of atipamezole for α2-AR (= 2~4 nM) (Vacher et al., 2010, J Med Chem) and β-AR (>10 µM) (Virtanen et al., 1989, Arch Int Pharmacodyn Ther), such complete antagonism of the 100 µM NA-induced GIRK-I by 10 µM atipamezole reflects α2A-AR activity rather than β-AR activity (Figure S5). Furthermore, it is already well established that NA-induced GIRK-I is mediated by α2-AR activity in LC neurons (Arima et al., 1998, J Physiol). McKinney et al. (eLife, 2023) found only an absence of lateral inhibition of adjacent LC neurons by NA released in an autocrine manner through their respective spike activity. This has nothing to do with autoinhibition.

      (4) Age mismatch and disease claims 

      All electrophysiology and biochemical data come from juvenile (< P30) mice, yet the conclusions stress Alzheimer-related degeneration. Key endpoints need to be replicated in adult or aged mice, or the manuscript should soften its neurodegenerative scope. 

      Authors’ response: As described in the Conclusion section, we never stress Alzheimer-related degeneration, but we may have given such an impression. To avoid this misunderstanding, we have added the statement “However, the present mechanism must be proven to be valid in adult or old mice, to validate its involvement in the pathogenesis of AD.” (R1, page 14, lines 448-450).

      It would be great to see this experiment performed in aged mice; you are the one who has everything in place to do it right now!

      In separate future studies, we would like to prove that the present mechanism is valid in aged mice, to validate its involvement in the pathogenesis of AD. This is partly because patch-clamp studies in aged mice are extremely difficult and time-consuming.

      Authors response: In the abstract, you suggest that internalization of α2A-adrenergic receptors could represent a therapeutic target for Alzheimer's disease. "...Thus, it is likely that internalization of α2A-AR increased cytosolic NA, as reflected in AEP increases, by facilitating reuptake of autocrine-released NA. The suppression of α2A-AR internalization may have a translational potential for AD treatment."

      α2A-AR internalization was involved in the degeneration of LC neurons. Because we confirmed that spike-frequency adaptation reflecting α2A-AR-mediated autoinhibition can be induced in adult mice as prominently as in juvenile mice (Figure S10), it is not unreasonable to suggest that the suppression of α2A-AR internalization may have translational potential for anxiety/AD treatment (see Discussion; R2, page 14, lines 445-449).

      (6) Quantitative histology  

      Figure 5 presents attractive images, but no numerical analysis is provided. Please provide ROI-based fluorescence quantification (with n values) or move the images to the supplement and rely on the Western blots. 

      Author response: We have moved the immunohistochemical results in Fig. 5 to the supplement, as we believe the quantification of immunohistochemical staining is not necessarily correct.   

      What do you mean by "...immunohistochemical staining is not necessarily correct"?

      It is evident that in terms of quantification, Western blot analysis is a more accurate method than immunohistochemical staining. In this sense, it is the contention of our study that the ROI-based fluorescence quantification of immunohistochemical staining is not necessarily an accurate or correct procedure, compared to the quantification by Western blot analysis.

    1. Author response:

      Notes to Editors

      We previously received comments from three reviewers at Biological Psychiatry, which we have addressed in detail below. The following is a summary of the reviewers’ comments along with our responses.

      Reviewers 1 and 2 sought clearer justification for studying the cognition-mental health overlap (covariation) and its neuroimaging correlates. In the revised manuscript, we expanded the Introduction and Discussion to explicitly outline the theoretical implications of investigating this overlap with machine learning. We also added nuance to the interpretation of the observed associations.

      Reviewer 1 raised concerns about the accessibility of the machine learning methodology for readers without expertise in this field. We revised the Methods section to provide a clearer, step-by-step explanation of our machine learning approach, particularly the two-level machine learning through stacking. We also enhanced the description of the overall machine learning design, including model training, validation, and testing.

      In response to Reviewer 2’s request for deeper interpretation of our findings and stronger theoretical grounding, we have expanded our discussion by incorporating a thorough interpretation of how mental health indices relate to cognition, material that was previously included only in supplementary materials due to word limit constraints. We have further strengthened the theoretical justification for our study design, with particular emphasis on the importance of examining shared variance between cognition and mental health through the derivation of neural markers of cognition. Additionally, to enhance the biological interpretation of our results, we included new analyses of feature importance across neuroimaging modalities, providing clearer insights into which neural features contribute most to the observed relationships.

      Notably, Reviewer 3 acknowledged the strength of our study, including multimodal design, robust analytical approach, and clear visualization and interpretation of results. Their comments were exclusively methodological, underscoring the manuscript’s quality.

      Reviewer 1:

      The authors try to bridge mental health characteristics, global cognition and various MRI-derived (structural, diffusion and resting state fMRI) measures using the large dataset of UK Biobank. Each MRI modality alone explained max 25% of the cognition-mental health covariance, and when combined together 48% of the variance could be explained. As a peer-reviewer not familiar with the used methods (machine learning, although familiar with imaging), the manuscript is hard to read and I wonder what the message for the field might be. In the end of the discussion the authors state '... we provide potential targets for behavioural and physiological interventions that may affect cognition', the real relevance (and impact) of the findings is unclear to me.

      Thank you for your thorough review and practical recommendations. We appreciate your constructive comments and suggestions and hope our revisions adequately address your concerns.

      Major questions

      (1) The methods are hard to follow for people not in this specific subfield, and therefore, I expect that for readers it is hard to understand how valid and how useful the approach is.

      Thank you for your comment. To enhance accessibility for readers without a machine learning background, we revised the Methods section to clarify our analyses while retaining important technical details needed to understand our approach. Recognizing that some concepts may require prior knowledge, we provide detailed explanations of each analysis step, including the machine learning pipeline in the Supplementary Methods.

      Line 188: “We employed nested cross-validation to predict cognition from mental health indices and 72 neuroimaging phenotypes (Fig. 1). Nested cross-validation is a robust method for evaluating machine-learning models while tuning their hyperparameters, ensuring that performance estimates are both accurate and unbiased. Here, we used a nested cross-validation scheme with five outer folds and ten inner folds.

      We started by dividing the entire dataset into five outer folds. Each fold took a turn being held out as the outer-fold test set (20% of the data), while the remaining four folds (80% of the data) were used as an outer-fold training set. Within each outer-fold training set, we performed a second layer of cross-validation – this time splitting the data into ten inner folds. These inner folds were used exclusively for hyperparameter tuning: models were trained on nine of the inner folds and validated on the remaining one, cycling through all ten combinations.

      We then selected the hyperparameter configuration that performed best across the inner-fold validation sets, as determined by the minimal mean squared error (MSE). The model was then retrained on the full outer-fold training set using this hyperparameter configuration and evaluated on the outer-fold test set, using four performance metrics: Pearson r, the coefficient of determination (R<sup>2</sup>), the mean absolute error (MAE), and the MSE. This entire process was repeated for each of the five outer folds, ensuring that every data point is used for both training and testing, but never at the same time. We opted for five outer folds instead of ten to reduce computational demands, particularly memory and processing time, given the substantial volume of neuroimaging data involved in model training. Five outer folds led to an outer-fold test set of at least n = 4 000, which should be sufficient for model evaluation. In contrast, we retained ten inner folds to ensure robust and stable hyperparameter tuning, maximising the reliability of model selection.

      To model the relationship between mental health and cognition, we employed Partial Least Squares Regression (PLSR) to predict the g-factor from 133 mental health variables. To model the relationship between neuroimaging data and cognition, we used a two-step stacking approach [15–17,61] to integrate information from 72 neuroimaging phenotypes across three MRI modalities. In the first step, we trained 72 base (first-level) PLSR models, each predicting the g-factor from a single neuroimaging phenotype. In the second step, we used the predicted values from these base models as input features for stacked models, which again predicted the g-factor. We constructed four stacked models based on the source of the base predictions: one each for dwMRI, rsMRI, sMRI, and a combined model incorporating all modalities (“dwMRI Stacked”, “rsMRI Stacked”, “sMRI Stacked”, and “All MRI Stacked”, respectively). Each stacked model was trained using one of four machine learning algorithms – ElasticNet, Random Forest, XGBoost, or Support Vector Regression – selected individually for each model (see Supplementary Materials, S6).

      For rsMRI phenotypes, we treated the choice of functional connectivity quantification method – full correlation, partial correlation, or tangent space parametrization – as a hyperparameter. The method yielding the highest performance on the outer-fold training set was selected for predicting the g-factor (see Supplementary Materials, S5).

      To prevent data leakage, we standardized the data using the mean and standard deviation derived from the training set and applied these parameters to the corresponding test set within each outer fold. This standardization was performed at three key stages: before g-factor derivation, before regressing out modality-specific confounds from the MRI data, and before stacking. Similarly, to maintain strict separation between training and testing data, both base and stacked models were trained exclusively on participants from the outer-fold training set and subsequently applied to the corresponding outer-fold test set.

      To evaluate model performance and assess statistical significance, we aggregated the predicted and observed g-factor values from each outer-fold test set. We then computed a bootstrap distribution of Pearson’s correlation coefficient (r) by resampling with replacement 5 000 times, generating 95% confidence intervals (CIs) (Fig. 1). Model performance was considered statistically significant if the 95% CI did not include zero, indicating that the observed associations were unlikely to have occurred by chance.”
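The fold structure described in the quoted Methods passage can be illustrated with a minimal sketch. It covers only the index bookkeeping for five outer and ten inner folds, not the PLSR or stacked models, and it is not the authors' code: the helper names `k_fold_indices` and `nested_cv_splits`, the seed, and the toy sample size are all assumptions made for illustration.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle the given indices and split them into k nearly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=10, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation.

    inner_splits is a list of (inner_train, inner_val) pairs drawn only from
    outer_train, so the outer test set never informs hyperparameter tuning.
    """
    rng = random.Random(seed)
    outer_folds = k_fold_indices(range(n_samples), outer_k, rng)
    for i, outer_test in enumerate(outer_folds):
        # The four remaining outer folds form the outer-fold training set.
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        # Second layer: split the outer training set into inner folds.
        inner_folds = k_fold_indices(outer_train, inner_k, rng)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for j, fold in enumerate(inner_folds)
                           if j != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


# Toy usage with a hypothetical sample of 100 participants.
splits = list(nested_cv_splits(n_samples=100))
for outer_train, outer_test, inner_splits in splits:
    print(len(outer_train), len(outer_test), len(inner_splits))
```

In a full pipeline, each candidate hyperparameter configuration would be scored on the inner validation folds, and the best one refitted on `outer_train` before a single evaluation on `outer_test`. Any standardization parameters would likewise be estimated from `outer_train` only.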

      (2) If only 40% of the cognition-mental health covariation can be explained by the MRI variables, how to explain the other 60% of the variance? And related to this %: why do the authors think that 'this provides us confidence in using MRI to derive quantitative neuromarkers of cognition'?

      Thank you for this insightful observation. Using the MRI modalities available in the UK Biobank, we were able to account for 48% of the covariation between cognition and mental health. The remaining 52% of unexplained variance may arise from several sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research from our group and others has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank.

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the Research Domain Criteria (RDoC) framework, brain circuits represent only one level of neurobiological analysis relevant to cognition. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. We have now incorporated these considerations into the Discussion section.

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Regarding our confidence in using MRI to derive neural markers for cognition, we base this on the predictive performance of MRI-based models. As we note in the Discussion (Line 554: “Consistent with previous studies, we show that MRI data predict individual differences in cognition with a medium-size performance (r ≈ 0.4) [15–17, 28, 61, 67, 68].”), the medium effect size we observed (r ≈ 0.4) agrees with existing literature on brain-cognition relationships, confirming that machine learning leads to replicable results. This effect size represents a moderate yet meaningful association in neuroimaging studies of aging, consistent with reports linking brain to behaviour in adults (Krämer et al., 2024; Tetereva et al., 2022). For example, a recent meta-analysis by Vieira and colleagues (2022) reported a similar effect size (r = 0.42, 95% CI [0.35;0.50]). Our study includes over 15000 participants, comparable to or more than typical meta-analyses, allowing us to characterise our work as a “mega-analysis”. And on top of this predictive performance, we found our neural markers for cognition to capture half of the cognition-mental health covariation, boosting our confidence in our approach.

      Krämer C, Stumme J, da Costa Campos L, Dellani P, Rubbert C, Caspers J, et al. Prediction of cognitive performance differences in older age from multimodal neuroimaging data. GeroScience. 2024;46:283–308.

      Tetereva A, Li J, Deng JD, Stringaris A, Pat N. Capturing brain cognition relationship: Integrating task‐based fMRI across tasks markedly boosts prediction and test‐retest reliability. NeuroImage. 2022;263:119588.

      (3) Imagine that we can increase the explained variance using multimodal MRI measures, why is it useful? What does it learn us? What might be the implications?

      We assume that by variance, Reviewer 1 referred to the cognition-mental health covariation mentioned in point 2) above.

      If we can increase the explained cognition-mental health covariation using multimodal MRI measures, it would mean that we have developed a reasonable neuromarker that is close to RDoC’s neurobiological unit of analysis for cognition. RDoC treats cognition as one of the main basic functional domains that transdiagnostically underlie mental health. According to RDoC, mental health should be studied in relation to cognition, alongside other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. RDoC further emphasizes that each domain, including cognition, should be investigated not only at the behavioural level but also through its neurobiological correlates. This means RDoC aims to discover neural markers of cognition that explain the covariation between cognition and mental health. We approach the development of such neural markers using multimodal neuroimaging. We have now explained the motivation of our study in the first paragraph of the Introduction.

      Line 43: “Cognition and mental health are closely intertwined [1]. Cognitive dysfunction is present in various mental illnesses, including anxiety [2, 3], depression [4–6], and psychotic disorders [7–12]. National Institute of Mental Health’s Research Domain Criteria (RDoC) [13,14] treats cognition as one of the main basic functional domains that transdiagnostically underlie mental health. According to RDoC, mental health should be studied in relation to cognition, alongside other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. RDoC further emphasizes that each domain, including cognition, should be investigated not only at the behavioural level but also through its neurobiological correlates. In this study, we aim to examine how the covariation between cognition and mental health is reflected in neural markers of cognition, as measured through multimodal neuroimaging.”

      More specific issues:

      Introduction

      (4) In the intro the sentence 'in some cases, altered cognitive functioning is directly related to psychiatric symptom severity' is in contrast to the next sentence '... are often stable and persist upon alleviation of psychiatric symptoms'.

      Thank you for pointing this out. The first sentence refers to cases where cognitive deficits fluctuate with symptom severity, while the second emphasizes that core cognitive impairments often remain stable even during symptom remission. To avoid this confusion, we have removed these sentences.

      (5) In the intro the text on the methods (various MRI modalities) is not needed for the Biol Psych readers audience.

      We appreciate your comment. While some members of our target audience may have backgrounds in neuroimaging, machine learning, or psychiatry, we recognize that not all readers will be familiar with all three areas. To ensure accessibility for those who are not familiar with neuroimaging, we included a brief overview of the MRI modalities and quantification methods used in our study to provide context for the specific neuroimaging phenotypes. Additionally, we provided background information on the machine learning techniques employed, so that readers without a strong background in machine learning can still follow our methodology.

      (6) Regarding age of the study sample: I understand that at recruitment the subjects' age ranges from 40 to 69 years. At MRI scanning the age ranges between about 46 to 82. How is that possible? And related to the age of the population: how did the authors deal with age in the analyses, since age is affecting both cognition as the brain measures?

      Thank you for noticing this. In the Methods section, we first outline the characteristics of the UK Biobank cohort, including the age at first recruitment (40-69 years). Table 1 then shows the characteristics of participant subsamples included in each analysis. Since our study used data from Instance 2 (the second in-person visit), participants were approximately 5-13 years older at scanning, resulting in the age range of 46 to 82 years. We clarified the Table 1 caption as follows:

      Line 113: “Table 1. Demographics for each subsample analysed: number, age, and sex of participants who completed all cognitive tests, mental health questionnaires, and MRI scanning”

      We acknowledge that age may influence cognitive and neuroimaging measures. In our analyses, we intentionally preserved age-related variance in brain-cognition relationships across mid and late adulthood, as regressing out age completely would artificially remove biologically meaningful associations. At the same time, we rigorously addressed the effects of age and sex through additional commonality analyses quantifying age and sex contributions to the relationship between cognition and mental health.

      As noted by Reviewer 1 and illustrated in Figure 8, age and sex shared substantial overlapping variance with both mental health and neuroimaging phenotypes in explaining cognitive outcomes. For example, in Figure 8i, age and sex together accounted for 43% of the variance in the cognition-mental health relationship:

      (2.76 + 1.03) / (2.76 + 1.03 + 3.52 + 1.45) ≈ 0.43

      Furthermore, neuromarkers from the all-MRI stacked model explained 72% of this age/sex-related variance:

      2.76 / (2.76 + 1.03) ≈ 0.72

      This indicates that our neuromarkers captured a substantial portion of the cognition-mental health covariation that varied with age and sex, highlighting their relevance in age/sex-sensitive cognitive modeling.
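As a minimal sketch, the two ratios above can be recomputed from the commonality components quoted for Figure 8i. The variable names are hypothetical labels chosen for this illustration and do not come from the manuscript, and the inputs are the rounded values shown in the text.

```python
# Commonality components (percent of variance in cognition) as quoted above
# for Figure 8i. Variable names are hypothetical labels for this sketch.
age_sex_captured_by_mri = 2.76      # age/sex-related part shared with neuromarkers
age_sex_not_captured_by_mri = 1.03  # age/sex-related part not shared with neuromarkers
remaining_components = [3.52, 1.45]  # components not related to age/sex

age_sex_related = age_sex_captured_by_mri + age_sex_not_captured_by_mri
total_shared = age_sex_related + sum(remaining_components)

# Share of the cognition-mental health relationship that is age/sex-related.
age_sex_share = age_sex_related / total_shared

# Share of that age/sex-related variance captured by the MRI neuromarkers.
mri_share_of_age_sex = age_sex_captured_by_mri / age_sex_related

print(f"age/sex share: {age_sex_share:.3f}")
print(f"MRI share of age/sex-related variance: {mri_share_of_age_sex:.3f}")
```

With these rounded inputs, the first ratio is about 0.433 and the second about 0.728. Small differences from the percentages quoted in the text would be expected if those were computed from unrounded components.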

      In the Methods, Results, and Discussion, we say:

      Methods

      Line 263: “To understand how demographic factors, including age and sex, contribute to this relationship, we also conducted a separate set of commonality analyses treating age, sex, age<sup>2</sup>, age×sex, and age<sup>2</sup>×sex as an additional set of explanatory variables (Fig. 1).”

      Results

      Line 445: “Age and sex shared substantial overlapping variance with both mental health and neuroimaging in explaining cognition, accounting for 43% of the variance in the cognition-mental health relationship. Multimodal neural marker of cognition based on three MRI modalities (“All MRI Stacked”) explained 72% of this age and sex-related variance (Fig. 8i–l and Table S21).”

      Discussion

      Line 660: “We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.”

      (7) Regarding the mental health variables: were characteristics with positive value (e.g. happiness and subjective wellbeing) reversely scored (compared to the negative items, such as anxiety, addiction, etc)?

      We appreciate you noting this. These composite scores primarily represent standard clinical measures such as the GAD-7 anxiety scale and N-12 neuroticism scale. We did not reverse the scores, so as to keep their directionality and make their interpretation consistent with the original studies from which the scores were derived (e.g., Davis et al., 2020; Dutt et al., 2022). Complete descriptive statistics for all mental health indices and detailed derivation procedures are provided in the Supplementary Materials (S2). On Page 6, Supplementary Methods, we say:

      Line 92: “Composite mental health scores included the Generalized Anxiety Disorder (GAD-7), the Posttraumatic Stress Disorder (PTSD) Checklist (PCL-6), the Alcohol Use Disorders Identification Test (AUDIT), the Patient Health Questionnaire (PHQ-9) [12], the Eysenck Neuroticism (N-12), Probable Depression Status (PDS), and the Recent Depressive Symptoms (RDS-4) scores [13, 14]. To calculate the GAD-7, PCL-6, AUDIT, and PHQ-9, we used questions introduced at the online follow-up [12]. To obtain the N-12, PDS, and RDS-4 scores [14], we used data collected during the baseline assessment [13, 14].

      We subcategorized depression and GAD based on frequency, current status (ever had depression or anxiety and current status of depression or anxiety), severity, and clinical diagnosis (depression or anxiety confirmed by a healthcare practitioner). Additionally, we differentiated between different depression statuses, such as recurrent depression, depression triggered by loss, etc. Variables related to self-harm were subdivided based on whether a person has ever self-harmed with the intent to die.

      To make response scales more intuitive, we recoded responses within the well-being domain such that the lower score corresponded to a lesser extent of satisfaction (“Extremely unhappy”) and the higher score indicated a higher level of happiness (“Extremely happy”). For all questions, we assigned the median values to “Prefer not to answer” (-818 for in-person assessment and -3 for online questionnaire) and “Do not know” (-121 for in-person assessment and -1 for online questionnaire) responses. We excluded the “Work/job satisfaction” question from the mental health derivatives list because it included a “Not employed” response option, which could not be reasonably coded.

      To calculate the risk of PTSD, we used questions from the PCL-6 questionnaire. Following Davis and colleagues [12], PCL-6 scores ranged from 6 to 29. A PCL-6 score of 12 or below corresponds to a low risk of meeting the Clinician-Administered PTSD Scale diagnostic criteria. PCL-6 scores between 13 and 16 and between 17 and 25 are indicative of an increased risk and high risk of PTSD, respectively. A score of above 26 is interpreted as a very high risk of PTSD [12, 15]. PTSD status was set to positive if the PCL-6 score exceeded or was equal to 14 and encompassed stressful events instead of catastrophic trauma alone [12].

      To assess alcohol consumption, alcohol dependence, and harm associated with drinking, we calculated the sum of the ten questions from the AUDIT questionnaire [16]. We additionally subdivided the AUDIT score into the alcohol consumption score (questions 1-3, AUDIT-C) and the score reflecting problems caused by alcohol (questions 4-10, AUDIT-P) [17]. In questions 2-10 that followed the first trigger question (“Frequency of drinking alcohol”), we replaced missing values with 0 as they would correspond to a “Never” response to the first question.

      An AUDIT score cut-off of 8 suggests moderate or low-risk alcohol consumption, and scores of 8 to 15 and above 15 indicate severe/harmful and hazardous (alcohol dependence or moderate-severe alcohol use disorder) drinking, respectively [16, 18]. Subsequently, hazardous alcohol use and alcohol dependence status correspond to AUDIT scores of ≥ 8 and ≥ 15, respectively. The “Alcohol dependence ever” status was set to positive if a participant had ever been physically dependent on alcohol. To reduce skewness, we log(x+1)-transformed the AUDIT, AUDIT-C, and AUDIT-P scores [17].”
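
      The AUDIT scoring rules quoted above can be sketched as follows. This is an illustrative reading of the quoted text, not the authors' code: the function name and output keys are hypothetical, the natural logarithm is assumed for the log(x+1) transform because the quoted text does not state a base, and a missing answer to item 1 raises an error because the text does not say how that case was handled.

```python
import math
from typing import Optional, Sequence


def score_audit(answers: Sequence[Optional[int]]) -> dict:
    """Score one participant's ten AUDIT items (None marks a missing answer)."""
    if len(answers) != 10:
        raise ValueError("AUDIT has ten items")
    if answers[0] is None:
        # Assumption: the quoted text gives no rule for a missing item 1.
        raise ValueError("item 1 (frequency of drinking) is missing")

    # Items 2-10: a missing answer is treated as 0, as in the quoted text.
    items = [answers[0]] + [0 if a is None else a for a in answers[1:]]

    total = sum(items)        # AUDIT: sum of all ten items
    audit_c = sum(items[:3])  # AUDIT-C: items 1-3 (consumption)
    audit_p = sum(items[3:])  # AUDIT-P: items 4-10 (alcohol-related problems)

    return {
        "audit": total,
        "audit_c": audit_c,
        "audit_p": audit_p,
        "hazardous_use": total >= 8,        # status threshold: score >= 8
        "alcohol_dependence": total >= 15,  # status threshold: score >= 15
        # log(x + 1) transform to reduce skewness (natural log assumed).
        "log_audit": math.log1p(total),
        "log_audit_c": math.log1p(audit_c),
        "log_audit_p": math.log1p(audit_p),
    }


# Hypothetical participant with two unanswered follow-up items.
example = score_audit([3, 2, 1, 1, 0, None, 1, 0, None, 2])
# Hypothetical participant who never drinks and skipped all follow-ups.
never = score_audit([0] + [None] * 9)
print(example)
print(never)
```

      Using `math.log1p` rather than `math.log(x + 1)` keeps the transform well defined at a score of 0, which is where never-drinkers land.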

      Davis KAS, Coleman JRI, Adams M, Allen N, Breen G, Cullen B, et al. Mental health in UK Biobank – development, implementation and results from an online questionnaire completed by 157 366 participants: a reanalysis. BJPsych Open. 2020;6:e18.

      Dutt RK, Hannon K, Easley TO, Griffis JC, Zhang W, Bijsterbosch JD. Mental health in the UK Biobank: A roadmap to self-report measures and neuroimaging correlates. Hum Brain Mapp. 2022;43:816–832.

      (8) In the discussion section (page 23, line 416-421), the authors refer to specific findings that are not described in the results section > I would add these findings to the main manuscript (including the discussion / interpretation).

      We appreciate your careful reading. We agree that our original Results section did not explicitly describe the factor loadings for mental health in the PLSR model, despite discussing their implications later in the paper. We needed to include this part of the discussion in the Supplementary Materials to meet the word limit of the original submission. However, in response to your suggestion, we have now added the results regarding factor loadings to the Results section. We also moved the discussion of the association between mental health features and general cognition from the Supplementary Material to the manuscript’s Discussion.

      Results

      Line 298: “On average, information about mental health predicted the g-factor at R<sup>2</sup><sub>mean</sub> = 0.10 and r<sub>mean</sub> = 0.31 (95% CI [0.291, 0.315]; Fig. 2b and 2c and Supplementary Materials, S9, Table S12). The magnitude and direction of factor loadings for mental health in the PLSR model allowed us to quantify the contribution of individual mental health indices to cognition. Overall, the scores for mental distress, alcohol and cannabis use, and self-harm behaviours relate positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events relate negatively to cognition.”

      Discussion

      Line 492: “Factor loadings derived from the PLSR model showed that the scores for mental distress, alcohol and cannabis use, and self-harm behaviours related positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events related negatively to the g-factor. Positive PLSR loadings of features related to mental distress may indicate greater susceptibility to or exaggerated perception of stressful events, psychological overexcitability, and predisposition to rumination in people with higher cognition [72]. On the other hand, these findings may be specific to the UK Biobank cohort and the way the questions for this mental health category were constructed. In particular, to evaluate mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress. In this regard, the estimate for mental distress may be more indicative of whether an individual experiencing mental distress had an opportunity or aspiration to visit a doctor and seek professional help [73]. Thus, people with better cognitive abilities and also with a higher socioeconomic status may indeed be more likely to seek professional help.

      Limited evidence supports a positive association between self-harm behaviours and cognitive abilities, with some studies indicating higher cognitive performance as a risk factor for non-suicidal self-harm. Research shows an inverse relationship between cognitive control of emotion and suicidal behaviours that weakens over the life course [73,74]. Some studies have found a positive correlation between cognitive abilities and the risk of nonsuicidal self-harm, suicidal thoughts, and suicidal plans that may be independent of or, conversely, affected by socioeconomic status [75,76]. In our study, the magnitude of the association between self-harm behaviours and cognition was low (Fig. 2), indicating a weak relationship.

      Positive PLSR loadings of features related to alcohol and cannabis may also indicate the influence of other factors. Overall, this relationship is believed to be largely affected by age, income, education, social status, social equality, social norms, and quality of life [79–80]. For example, education level and income correlate with cognitive ability and alcohol consumption [79,81–83]. Research also links a higher probability of having tried alcohol or recreational drugs, including cannabis, to a tendency of more intelligent individuals to approach evolutionarily novel stimuli [84,85]. This hypothesis is supported by studies showing that cannabis users perform better on some cognitive tasks [86]. Alternatively, frequent drinking can indicate higher social engagement, which is positively associated with cognition [87]. Young adults often drink alcohol as a social ritual in university settings to build connections with peers [88]. In older adults, drinking may accompany friends or family visits [89,90]. Mixed evidence on the link between alcohol and drug use and cognition makes it difficult to draw definite conclusions, leaving an open question about the nature of this relationship.

      Consistent with previous studies, we showed that anxiety and negative traumatic experiences were inversely associated with cognitive abilities [90–93]. Anxiety may be linked to poorer cognitive performance via reduced working memory capacity, increased focus on negative thoughts, and attentional bias to threatening stimuli that hinder the allocation of cognitive resources to a current task [94–96]. Individuals with PTSD consistently showed impaired verbal and working memory, visual attention, inhibitory function, task switching, cognitive flexibility, and cognitive control [97–100]. Exposure to traumatic events that did not reach the PTSD threshold was also linked to impaired cognition. For example, childhood trauma is associated with worse performance in processing speed, attention, and executive function tasks in adulthood, and age at a first traumatic event is predictive of the rate of executive function decline in midlife [101,102]. In the UK Biobank cohort, adverse life events have been linked to lower cognitive flexibility, partially via depression level [103].

      In agreement with our findings, cognitive deficits are often found in psychotic disorders [104,105]. We treated neurological and mental health symptoms as predictor variables and did not stratify or exclude people based on psychiatric status or symptom severity. Since no prior studies have examined isolated psychotic symptoms (e.g., recent unusual experiences, hearing unreal voices, or seeing unreal visions), we avoid speculating on how these symptoms relate to cognition in our sample.

      Finally, negative PLSR loadings of the features related to happiness and subjective well-being may be specific to the study cohort, as these findings do not agree with some previous research [107–109]. On the other hand, our results agree with the study linking excessive optimism or optimistic thinking to lower cognitive performance in memory, verbal fluency, fluid intelligence, and numerical reasoning tasks, and suggesting that pessimism or realism indicates better cognition [110]. The concept of realism/optimism as indicators of cognition is a plausible explanation for a negative association between the g-factor and friendship satisfaction, as well as a negative PLSR loading of feelings that life is meaningful, especially in older adults who tend to reflect more on the meaning of life [111]. The latter is supported by the study showing a negative association between cognitive function and the search for the meaning of life and a change in the pattern of this relationship after the age of 60 [112]. Finally, a UK Biobank study found a positive association of happiness with speed and visuospatial memory but a negative relationship with reasoning ability [113].”

      (9) In the discussion section (page 24, line 440-449), the authors explain why the diffusion measures have limited utility, but the arguments put forward also concern structural and rsfMRI measures.

      Thank you for this important observation. Indeed, the argument about voxel-averaged diffusion components (“… these metrics are less specific to the properties of individual white matter axons or bundles, and instead represent a composite of multiple diffusion components averaged within a voxel and across major fibre pathways”) could theoretically apply across other MRI modalities. We have therefore removed this point from the discussion to avoid overgeneralization. However, we maintain our central argument about the biological specificity of conventional tractography-derived diffusion metrics as their particular sensitivity to white matter microstructure (e.g., axonal integrity, myelin content) may make them better suited for detecting neuropathological changes than dynamic cognitive processes. This interpretation aligns with the mixed evidence linking these metrics to cognitive performance, despite their established utility in detecting white matter abnormalities in clinical populations (e.g., Bergamino et al., 2021; Silk et al., 2009). We clarify this distinction in the manuscript.

      Line 572: “The somewhat limited utility of diffusion metrics derived specifically from probabilistic tractography in serving as robust quantitative neuromarkers of cognition and its shared variance with mental health may stem from their greater sensitivity and specificity to neuronal integrity and white matter microstructure rather than to dynamic cognitive processes. Critically, probabilistic tractography may be less effective at capturing relationships between white matter microstructure and behavioural scores cross-sectionally, as this method is more sensitive to pathological changes or dynamic microstructural alterations like those occurring during maturation. While these indices can capture abnormal white matter microstructure in clinical populations such as Alzheimer’s disease, schizophrenia, or attention deficit hyperactivity disorder (ADHD) [117–119], the empirical evidence on their associations with cognitive performance is controversial [114, 120–126].”

      Bergamino M, Walsh RR, Stokes AM. Free-water diffusion tensor imaging improves the accuracy and sensitivity of white matter analysis in Alzheimer’s disease. Sci Rep. 2021;11:6990.

      Silk TJ, Vance A, Rinehart N, Bradshaw JL, Cunnington R. White-matter abnormalities in attention deficit hyperactivity disorder: a diffusion tensor imaging study. Hum Brain Mapp. 2009;30:2757–2765.

      Reviewer 2:

      This is an interesting study combining a lot of data to investigate the link between cognition and mental health. The description of the study is very clear, it's easy to read for someone like me who does not have a lot of expertise in machine learning.

      We thank you for your thorough review and constructive feedback. Your insightful comments have helped us identify conceptual and methodological aspects that required improvement in the manuscript. We have incorporated relevant changes throughout the paper, and below, we address each of your points in detail.

      Comment 1: My main concern with this manuscript is that it is not yet clear to me what it exactly means to look at the overlap between cognition and mental health. This relation is r=0.3 which is not that high, so why is it then necessary to explain this overlap with neuroimaging measures? And, could it be that the relation between cognition and mental health is explained by third variables (environment? opportunities?). In the introduction I miss an explanation of why it is important to study this and what it will tell us, and in the discussion I would like to read some kind of 'answer' to these questions.

      Thank you. It’s important to clarify why we investigated the relationship between cognition and mental health, and what we found using data from the UK Biobank.

      Conceptually, our work is grounded in the Research Domain Criteria (RDoC; Insel et al., 2010) framework. RDoC conceptualizes mental health not through traditional diagnostic categories, but through core functional domains that span the full spectrum from normal to abnormal functioning. These domains include cognition, negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. Within this framework, cognition is considered a fundamental domain that contributes to mental health across diagnostic boundaries. Meta-analytic evidence supports a link between cognitive functioning and mental health (Abramovitch, et al., 2021; East-Richard, et al., 2020). In the context of a large, population-based dataset like the UK Biobank, this implies that cognitive performance – as measured by various cognitive tasks – should be meaningfully associated with available mental health indicators.

      However, because cognition is only one of several functional domains implicated in mental health, we do not expect the covariation between cognition and mental health to be very high. Other domains, such as negative and positive valence systems, arousal and regulatory systems, or social processing, may also play significant roles. Theoretically, this places an upper bound on the strength of the cognition-mental health relationship, especially in normative, nonclinical samples.

      Our current findings from the UK Biobank reflect this. Most of the 133 mental health variables showed relatively weak individual correlations with cognition (mean r = 0.01, SD = 0.05, min r = –0.08, max r = 0.17; see Figure 2). However, using a PLS-based machine learning approach, we were able to integrate information across all mental-health variables to predict cognition, yielding an out-of-sample correlation of r = 0.31 [95% CI: 0.29, 0.32].

      We believe this estimate approximates the true strength of the cognition-mental health relationship in normative samples, consistent with both theoretical expectations and prior empirical findings. Theoretically, this aligns with the RDoC view that cognition is one of several contributing domains. Empirically, our results are consistent with findings from our previous mega-analysis in children (Wang et al., 2025). Moreover, in the field of gerontology, an effect size of r = 0.31 is not considered small. According to Brydges (2019), it falls around the 70th percentile of effect sizes reported in gerontological studies and approaches the threshold for a large effect (r = 0.32). Given that most studies report within-sample associations, our out-of-sample results are likely more robust and generalizable (Yarkoni & Westfall, 2017).

      To answer, “why is it then necessary to explain this overlap with neuroimaging measures”, we again draw on the conceptual foundation of the RDoC framework. RDoC emphasizes that each functional domain, such as cognition, should be studied not only at the behavioural level but also across multiple neurobiological units of analysis, including genes, molecules, cells, circuits, physiology, and behaviour.

      MRI-based neural markers represent one such level of analysis. While other biological systems (e.g., genetic, molecular, or physiological) also contribute to the cognition-mental health relationship, neuroimaging provides unique insights into the brain mechanisms underlying this association – insights that cannot be obtained from behavioural data alone.

      In response to the related question, “Could the relationship between cognition and mental health be explained by third variables (e.g., environment, opportunities)?”, we note that developing a neural marker of cognition capable of capturing its relationship with mental health is the central aim of this study. Using the MRI modalities available in the UK Biobank, we were able to account for 48% of the covariation between cognition and mental health.

      The remaining 52% of unexplained variance may stem from several sources. According to the RDoC framework, neuromarkers could be further refined by incorporating additional neuroimaging modalities (e.g., task-based fMRI, PET, ASL, MEG/EEG, fNIRS) and integrating other units of analysis such as genetic, molecular, cellular, and physiological data.

      Once more comprehensive neuromarkers are developed, capturing a greater proportion of the cognition-mental health covariation, they may also open a new research direction: investigating how environmental factors and life opportunities influence these markers. However, exploring those environmental contributions lies beyond the scope of the current study.

      We discuss these considerations and explain the motivation of our study in the revised Introduction and Discussion.

      Introduction

      Line 43: “Cognition and mental health are closely intertwined [1]. Cognitive dysfunction is present in various mental illnesses, including anxiety [2, 3], depression [4–6], and psychotic disorders [7–12]. The National Institute of Mental Health’s Research Domain Criteria (RDoC) [13,14] treats cognition as one of the main basic functional domains that transdiagnostically underlie mental health. According to RDoC, mental health should be studied in relation to cognition, alongside other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. RDoC further emphasizes that each domain, including cognition, should be investigated not only at the behavioural level but also through its neurobiological correlates. In this study, we aim to examine how the covariation between cognition and mental health is reflected in neural markers of cognition, as measured through multimodal neuroimaging.”

      Discussion

      Line 481: “Our analysis confirmed the validity of the g-factor [31] as a quantitative measure of cognition [31], demonstrating that it captures nearly half (39%) of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, et al. Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. AJP. 2010;167:748–751.

      Abramovitch, A., Short, T., & Schweiger, A. (2021). The C Factor: Cognitive dysfunction as a transdiagnostic dimension in psychopathology. Clinical Psychology Review, 86, 102007.

      East-Richard, C., R.-Mercier, A., Nadeau, D., & Cellard, C. (2020). Transdiagnostic neurocognitive deficits in psychiatry: A review of meta-analyses. Canadian Psychology / Psychologie Canadienne, 61(3), 190–214.

      Wang Y, Anney R, Pat N. The relationship between cognitive abilities and mental health as represented by cognitive abilities at the neural and genetic levels of analysis. eLife. 2025;14:RP105537.

      Brydges CR. Effect Size Guidelines, Sample Size Calculations, and Statistical Power in Gerontology. Innovation in Aging. 2019;3(4):igz036.

      Yarkoni T, Westfall J. Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspect Psychol Sci. 2017;12(6):1100-1122.

      Comment 2 Title: - Shouldn't it be "MRI markers" (plural)?

      We used the singular form (“marker”) intentionally, as it refers to the composite neuroimaging marker derived from all three MRI modalities in our stacked model. This multimodal marker represents the combined predictive power of all modalities and captures the highest proportion of the mental health-cognition relationship in our analyses.

      Comment 3: Introduction - I miss an explanation of why it is useful to look at cognition-mental health covariation

      We believe we have sufficiently addressed this comment in our response to Reviewer 2, comment 1 above.

      Comment 4: - "Demonstrating that MRI-based neural indicators of cognition capture the covariation between cognition and mental health will thereby support the utility of such indicators for understanding the etiology of mental health" (page 4, line 56-58) - how/why?

      Previous research has largely focused on developing MRI-based neural indicators that accurately predict cognitive performance (Marek et al., 2022; Vieira et al., 2020). Building on this foundation, our findings further demonstrate that the predictive performance of a neural indicator for cognition is closely tied to its ability to explain the covariation between cognition and mental health. In other words, the robustness of a neural indicator – its capacity to capture individual differences in cognition – is strongly associated with how well it reflects the shared variance between cognition and mental health.

      This insight is particularly important within the context of the RDoC framework, which seeks to understand the etiology of mental health through functional domains (such as cognition) and their underlying neurobiological units of analysis (Insel et al., 2010). According to RDoC, for a neural indicator of cognition to be informative for mental health research, it must not only predict cognitive performance but also capture its relationship with mental health.

      Furthermore, RDoC emphasizes the integration of neurobiological measures to investigate the influence of environmental and developmental factors on mental health. In line with this, our neural indicators of cognition may serve as valuable tools in future research aimed at understanding how environmental exposures and developmental trajectories shape mental health outcomes. We discuss this in more detail in the revised Discussion.

      Line 481: “Our analysis confirmed the validity of the g-factor [31] as a quantitative measure of cognition [31], demonstrating that it captures nearly half (39%) of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Marek S, Tervo-Clemmens B, Calabro FJ, Montez DF, Kay BP, Hatoum AS, et al. Reproducible brain-wide association studies require thousands of individuals. Nature. 2022;603:654–660.

      Vieira S, Gong QY, Pinaya WHL, et al. Using Machine Learning and Structural Neuroimaging to Detect First Episode Psychosis: Reconsidering the Evidence. Schizophr Bull. 2020;46(1):17-26.

      Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, et al. Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. AJP. 2010;167:748–751.

      Comment 5: - The explanation about the stacking approach is not yet completely clear to me. I don't understand how the target variable can be the dependent variable in both step one and two. Or are those different variables? It would be helpful to also give an example of the target variable in line 88 on page 5

      Thank you for this excellent question. In our stacking approach, the same target variable, the g-factor, is indeed used across both modeling stages, but with a key distinction in how predictions are generated and integrated.

      In the first-level models, we trained separate Partial Least Squares Regression (PLSR) models for each of the 72 neuroimaging phenotypes, each predicting the g-factor independently. The predicted values from these 72 models were then used as input features for the second-level stacked model, which combined them to generate a final prediction of the g-factor. This two-stage framework enables us to integrate information across multiple imaging modalities while maintaining a consistent prediction target.

      To avoid data leakage, both modeling stages were conducted entirely within the training set for each cross-validation fold. Only after the second-level model was trained was it applied to the outer-fold test participants who were not involved in any part of the model training process.
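
      The two-stage logic described above can be sketched in a few lines. This is only an illustration under stated assumptions: ridge regression stands in for both the PLSR base models and the second-level algorithm, the data are synthetic, and every function and variable name (`fit_stack`, `predict_stack`, `blocks`) is hypothetical rather than taken from the study's code.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Fit ridge regression on centered data; return (mean_x, mean_y, coef)."""
    mean_x, mean_y = X.mean(axis=0), y.mean()
    Xc, yc = X - mean_x, y - mean_y
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return mean_x, mean_y, coef


def predict_ridge(model, X):
    mean_x, mean_y, coef = model
    return (X - mean_x) @ coef + mean_y


def fit_stack(train_blocks, y_train):
    # Step 1: one base model per phenotype, each predicting the same target.
    base = [fit_ridge(X, y_train) for X in train_blocks]
    # Step 2: base predictions (training set only) become second-level features.
    Z_train = np.column_stack(
        [predict_ridge(m, X) for m, X in zip(base, train_blocks)]
    )
    return base, fit_ridge(Z_train, y_train)


def predict_stack(base, stacker, blocks):
    Z = np.column_stack([predict_ridge(m, X) for m, X in zip(base, blocks)])
    return predict_ridge(stacker, Z)


# Synthetic data: three hypothetical phenotypes sharing one latent signal.
rng = np.random.default_rng(0)
n_train, n_test = 400, 100
n = n_train + n_test
latent = rng.normal(size=n)
blocks = [
    latent[:, None] * rng.normal(size=(1, 10)) + rng.normal(size=(n, 10))
    for _ in range(3)
]
y = latent + 0.5 * rng.normal(size=n)  # the shared target (a stand-in g-factor)

train_blocks = [X[:n_train] for X in blocks]
test_blocks = [X[n_train:] for X in blocks]
base, stacker = fit_stack(train_blocks, y[:n_train])
# Test participants are only passed through already-fitted models.
y_pred = predict_stack(base, stacker, test_blocks)
r_test = np.corrcoef(y[n_train:], y_pred)[0, 1]
```

      The target `y` is the same at both levels; what changes is the feature set, which is raw phenotype data in step one and the base-model predictions in step two.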

      To improve accessibility, we have revised the Methods section (see Page 10) to clarify this approach, ensuring that the description remains technically accurate while being easier to follow.

      Line 188: “We employed nested cross-validation to predict cognition from mental health indices and 72 neuroimaging phenotypes (Fig. 1). Nested cross-validation is a robust method for evaluating machine-learning models while tuning their hyperparameters, ensuring that performance estimates are both accurate and unbiased. Here, we used a nested cross-validation scheme with five outer folds and ten inner folds.

      We started by dividing the entire dataset into five outer folds. Each fold took a turn being held out as the outer-fold test set (20% of the data), while the remaining four folds (80% of the data) were used as an outer-fold training set. Within each outer-fold training set, we performed a second layer of cross-validation – this time splitting the data into ten inner folds. These inner folds were used exclusively for hyperparameter tuning: models were trained on nine of the inner folds and validated on the remaining one, cycling through all ten combinations.

      We then selected the hyperparameter configuration that performed best across the inner-fold validation sets, as determined by the minimal mean squared error (MSE). The model was then retrained on the full outer-fold training set using this hyperparameter configuration and evaluated on the outer-fold test set, using four performance metrics: Pearson r, the coefficient of determination (R<sup>2</sup>), the mean absolute error (MAE), and the MSE. This entire process was repeated for each of the five outer folds, ensuring that every data point is used for both training and testing, but never at the same time. We opted for five outer folds instead of ten to reduce computational demands, particularly memory and processing time, given the substantial volume of neuroimaging data involved in model training. Five outer folds led to an outer-fold test set of at least n = 4 000, which should be sufficient for model evaluation. In contrast, we retained ten inner folds to ensure robust and stable hyperparameter tuning, maximising the reliability of model selection.

      To model the relationship between mental health and cognition, we employed Partial Least Squares Regression (PLSR) to predict the g-factor from 133 mental health variables. To model the relationship between neuroimaging data and cognition, we used a two-step stacking approach [15–17,61] to integrate information from 72 neuroimaging phenotypes across three MRI modalities. In the first step, we trained 72 base (first-level) PLSR models, each predicting the g-factor from a single neuroimaging phenotype. In the second step, we used the predicted values from these base models as input features for stacked models, which again predicted the g-factor. We constructed four stacked models based on the source of the base predictions: one each for dwMRI, rsMRI, sMRI, and a combined model incorporating all modalities (“dwMRI Stacked”, “rsMRI Stacked”, “sMRI Stacked”, and “All MRI Stacked”, respectively). Each stacked model was trained using one of four machine learning algorithms – ElasticNet, Random Forest, XGBoost, or Support Vector Regression – selected individually for each model (see Supplementary Materials, S6).

      For rsMRI phenotypes, we treated the choice of functional connectivity quantification method – full correlation, partial correlation, or tangent space parametrization – as a hyperparameter. The method yielding the highest performance on the outer-fold training set was selected for predicting the g-factor (see Supplementary Materials, S5).

      To prevent data leakage, we standardized the data using the mean and standard deviation derived from the training set and applied these parameters to the corresponding test set within each outer fold. This standardization was performed at three key stages: before g-factor derivation, before regressing out modality-specific confounds from the MRI data, and before stacking. Similarly, to maintain strict separation between training and testing data, both base and stacked models were trained exclusively on participants from the outer-fold training set and subsequently applied to the corresponding outer-fold test set.

      To evaluate model performance and assess statistical significance, we aggregated the predicted and observed g-factor values from each outer-fold test set. We then computed a bootstrap distribution of Pearson’s correlation coefficient (r) by resampling with replacement 5 000 times, generating 95% confidence intervals (CIs) (Fig. 1). Model performance was considered statistically significant if the 95% CI did not include zero, indicating that the observed associations were unlikely to have occurred by chance.”
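
      The nested cross-validation scheme quoted above (five outer folds, ten inner folds, selection by minimal MSE, training-set standardization) can be sketched as follows. This is a simplified illustration, not the study's pipeline: ridge regression with a hypothetical grid of penalties replaces PLSR and its hyperparameters, the data are synthetic, and the helper names are invented for this example.

```python
import numpy as np


def ridge_fit_predict(X_tr, y_tr, X_te, alpha):
    # Standardize with training-set statistics only, then apply them to the test set.
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
    A, B = (X_tr - mu) / sd, (X_te - mu) / sd
    coef = np.linalg.solve(
        A.T @ A + alpha * np.eye(A.shape[1]), A.T @ (y_tr - y_tr.mean())
    )
    return B @ coef + y_tr.mean()


def nested_cv(X, y, alphas, n_outer=5, n_inner=10, seed=0):
    rng = np.random.default_rng(seed)
    outer = np.array_split(rng.permutation(len(y)), n_outer)
    y_pred, chosen = np.empty(len(y)), []
    for k, test_idx in enumerate(outer):
        train_idx = np.concatenate([f for j, f in enumerate(outer) if j != k])
        # Inner folds are drawn from the outer-fold training set only.
        inner = np.array_split(rng.permutation(train_idx), n_inner)
        mse = np.zeros(len(alphas))
        for i, val_idx in enumerate(inner):
            fit_idx = np.concatenate([f for j, f in enumerate(inner) if j != i])
            for a, alpha in enumerate(alphas):
                pred = ridge_fit_predict(X[fit_idx], y[fit_idx], X[val_idx], alpha)
                mse[a] += np.mean((y[val_idx] - pred) ** 2) / n_inner
        best = alphas[int(np.argmin(mse))]  # minimal mean inner-fold MSE
        chosen.append(best)
        # Retrain on the full outer-fold training set, then predict the held-out fold.
        y_pred[test_idx] = ridge_fit_predict(
            X[train_idx], y[train_idx], X[test_idx], best
        )
    return y_pred, chosen


rng = np.random.default_rng(1)
X = rng.normal(size=(500, 20))
y = X[:, :5].sum(axis=1) + rng.normal(size=500)
alphas = [0.1, 1.0, 10.0, 100.0]
y_pred, chosen = nested_cv(X, y, alphas)
r = np.corrcoef(y, y_pred)[0, 1]
mse_outer = np.mean((y - y_pred) ** 2)
```

      Each participant receives exactly one out-of-sample prediction, so `y_pred` can be aggregated across outer folds and compared with the observed values.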

      Comment 6: Methods - It's not clear from the text and Figure 1 which 12 scores from 11 tests are being used to derive the g-factor. Figure 1 shows only 8 bullet points with 10 scores in A and 13 tests under 'Cognitive tests' in B. Moreover, Supplement S1 describes 12 tests and 14 measures (Prospective Memory test is in the text but not in Supplementary Table 1).

      Thank you for identifying this discrepancy. In the original Figure 1b and in the Supplementary Methods (S1), the “Prospective Memory” test was accidentally duplicated, although it was present in Supplementary Table 1 (Line 53, Supplementary Table 1). We have now corrected both figures for consistency. To clarify: Figure 1a presents the global mental health and cognitive domains studied, while Figure 1b now accurately lists 1) the 12 cognitive scores from 11 tests used to derive the g-factor (with the Trail Making Test contributing two measures – numeric and alphabetic trails) and 2) the three main categories of mental health indices used as machine learning features.

      We also corrected the Supplementary Materials to remove the duplicate test from the first paragraph. In Supplementary Table 1, there were 11 tests listed, and for the Trail Making test, we specified in the “Core measures” column that this test had 2 derivative scores: duration to complete the numeric path (Trail 1) and duration to complete the alphabetic path (Trail 2).

      Supplementary Materials, Line 46: “We used twelve scores from the eleven cognitive tests that represented the following cognitive domains: reaction time and processing speed (Reaction Time test), working memory (Numeric Memory test), verbal and numerical reasoning (Fluid Intelligence test), executive function (Trail Making Test), non-verbal fluid reasoning (Matrix Pattern Completion test), processing speed (Symbol Digit Substitution test), vocabulary (Picture Vocabulary test), planning abilities (Tower Rearranging test), verbal declarative memory (Paired Associate Learning test), prospective memory (Prospective Memory test), and visual memory (Pairs Matching test) [1].”

      Comment 7: - For the mental health measures: If I understand correctly, the questionnaire items were used individually, but also to create composite scores. This seems counterintuitive, because I would assume that if the raw data is used, the composite scores would not add additional information to that. When reading the Supplement, it seems like I'm not correct… It would be helpful to clarify the text on page 7 in the main text.

      You raise an excellent observation regarding the use of both individual questionnaire items and composite scores. This dual approach was methodologically justified by the properties of Partial Least Squares Regression (PLSR), our chosen first-level machine learning algorithm, which benefits from rich feature sets and can handle multicollinearity through dimensionality reduction. PLSR transforms correlated features into latent variables, meaning both individual items and composite scores can contribute unique information to the model. We elaborate on PLSR's mathematical principles in Supplementary Materials (S5).

      To directly address this concern, we conducted comparative analyses showing that the PLSR model (a single 80/20% training/test split), incorporating all 133 mental health features (both items and composites), outperformed models using either type alone. The full model achieved superior performance (MSE = 0.458, MAE = 0.537, R<sup>2</sup> = 0.112, Pearson r = 0.336, p-value = 6.936e-112) compared to using only composite scores (93 features; MSE = 0.461, MAE = 0.538, R<sup>2</sup> = 0.107, Pearson r = 0.328, p-value = 5.8e-106) or only questionnaire items (40 features; MSE = 0.499, MAE = 0.561, R<sup>2</sup> = 0.033, Pearson r = 0.184, p-value = 2.53e-33). These results confirm that including both data types provides complementary predictive value. We expand on these considerations in the revised Methods section.

      Line 123: “Mental health measures encompassed 133 variables from twelve groups: mental distress, depression, clinical diagnoses related to the nervous system and mental health, mania (including bipolar disorder), neuroticism, anxiety, addictions, alcohol and cannabis use, unusual/psychotic experiences, traumatic events, self-harm behaviours, and happiness and subjective well-being (Fig. 1 and Tables S4 and S5). We included both self-report questionnaire items from all participants and composite diagnostic scores computed following Davis et al. and Dutt et al. [35,36] as features in our first-level (for explanation, see Data analysis section) Partial Least Squares Regression (PLSR) model. This approach leverages PLSR’s ability to handle multicollinearity through dimensionality reduction, enabling simultaneous use of granular symptom-level information and robust composite measures (for mental health scoring details, see Supplementary Materials, S2). We assess the contribution of each mental health index to general cognition by examining the direction and magnitude of its PLSR-derived loadings on the identified latent variables.”

      Comment 8: - Results - The colors in Figure 4 B are a bit hard to differentiate.

      We have updated Figure 4 to enhance colour differentiation by adjusting saturation and brightness levels, improving visual distinction. For further clarity, we split the original figure into two separate figures.

      Comment 9: - Discussion - "Overall, the scores for mental distress, alcohol and cannabis use, and self-harm behaviours relate positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events relate negatively to cognition," - this seems counterintuitive, that some symptoms relate to better cognition and others relate to worse cognition. Could you elaborate on this finding and what it could mean?

      We appreciate you highlighting this important observation. While some associations between mental health indices and cognition may appear counterintuitive at first glance, these patterns are robust (emerging consistently across both univariate correlations and PLSR loadings) and align with previous literature (e.g., Karpinski et al., 2018; Ogueji et al., 2022). For instance, the positive relationship between cognitive ability and certain mental health indicators like help-seeking behaviour has been documented in other population studies (Karpinski et al., 2018; Ogueji et al., 2022), potentially reflecting greater health literacy and access to care among cognitively advantaged individuals. Conversely, the negative associations with conditions like psychotic experiences mirror established neurocognitive deficits in these domains.

      As was initially detailed in Supplementary Materials (S12) and now expanded in our Discussion, these findings likely reflect complex multidimensional interactions. The positive loadings for mental distress indicators may capture: (1) greater help-seeking behaviour among those with higher cognition and socioeconomic resources, and/or (2) psychological overexcitability and rumination tendencies in high-functioning individuals. These interpretations are particularly relevant to the UK Biobank's assessment methods, where mental distress items focused on medical help-seeking rather than symptom severity per se (e.g., as a measure of mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress).

      Line 492: “Factor loadings derived from the PLSR model showed that the scores for mental distress, alcohol and cannabis use, and self-harm behaviours related positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events related negatively to the g-factor. Positive PLSR loadings of features related to mental distress may indicate greater susceptibility to or exaggerated perception of stressful events, psychological overexcitability, and predisposition to rumination in people with higher cognition [72]. On the other hand, these findings may be specific to the UK Biobank cohort and the way the questions for this mental health category were constructed. In particular, to evaluate mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress. In this regard, the estimate for mental distress may be more indicative of whether an individual experiencing mental distress had an opportunity or aspiration to visit a doctor and seek professional help [73]. Thus, people with better cognitive abilities and also with a higher socioeconomic status may indeed be more likely to seek professional help.

      Limited evidence supports a positive association between self-harm behaviours and cognitive abilities, with some studies indicating higher cognitive performance as a risk factor for non-suicidal self-harm. Research shows an inverse relationship between cognitive control of emotion and suicidal behaviours that weakens over the life course [73,74]. Some studies have found a positive correlation between cognitive abilities and the risk of nonsuicidal self-harm, suicidal thoughts, and suicidal plans that may be independent of or, conversely, affected by socioeconomic status [75,76]. In our study, the magnitude of the association between self-harm behaviours and cognition was low (Fig. 2), indicating a weak relationship.

      Positive PLSR loadings of features related to alcohol and cannabis may also indicate the influence of other factors. Overall, this relationship is believed to be largely affected by age, income, education, social status, social equality, social norms, and quality of life [79–80]. For example, education level and income correlate with cognitive ability and alcohol consumption [79,81–83]. Research also links a higher probability of having tried alcohol or recreational drugs, including cannabis, to a tendency of more intelligent individuals to approach evolutionary novel stimuli [84,85]. This hypothesis is supported by studies showing that cannabis users perform better on some cognitive tasks [86]. Alternatively, frequent drinking can indicate higher social engagement, which is positively associated with cognition [87]. Young adults often drink alcohol as a social ritual in university settings to build connections with peers [88]. In older adults, drinking may accompany friends or family visits [89,90]. Mixed evidence on the link between alcohol and drug use and cognition makes it difficult to draw definite conclusions, leaving an open question about the nature of this relationship.

      Consistent with previous studies, we showed that anxiety and negative traumatic experiences were inversely associated with cognitive abilities [90–93]. Anxiety may be linked to poorer cognitive performance via reduced working memory capacity, increased focus on negative thoughts, and attentional bias to threatening stimuli that hinder the allocation of cognitive resources to a current task [94–96]. Individuals with PTSD consistently showed impaired verbal and working memory, visual attention, inhibitory function, task switching, cognitive flexibility, and cognitive control [97–100]. Exposure to traumatic events that did not reach the PTSD threshold was also linked to impaired cognition. For example, childhood trauma is associated with worse performance in processing speed, attention, and executive function tasks in adulthood, and age at a first traumatic event is predictive of the rate of executive function decline in midlife [101,102]. In the UK Biobank cohort, adverse life events have been linked to lower cognitive flexibility, partially via depression level [103].

      In agreement with our findings, cognitive deficits are often found in psychotic disorders [104,105]. We treated neurological and mental health symptoms as predictor variables and did not stratify or exclude people based on psychiatric status or symptom severity. Since no prior studies have examined isolated psychotic symptoms (e.g., recent unusual experiences, hearing unreal voices, or seeing unreal visions), we avoid speculating on how these symptoms relate to cognition in our sample.

      Finally, negative PLSR loadings of the features related to happiness and subjective well-being may be specific to the study cohort, as these findings do not agree with some previous research [107–109]. On the other hand, our results agree with the study linking excessive optimism or optimistic thinking to lower cognitive performance in memory, verbal fluency, fluid intelligence, and numerical reasoning tasks, and suggesting that pessimism or realism indicates better cognition [110]. The concept of realism/optimism as indicators of cognition is a plausible explanation for a negative association between the g-factor and friendship satisfaction, as well as a negative PLSR loading of feelings that life is meaningful, especially in older adults who tend to reflect more on the meaning of life [111]. The latter is supported by the study showing a negative association between cognitive function and the search for the meaning of life and a change in the pattern of this relationship after the age of 60 [112]. Finally, a UK Biobank study found a positive association of happiness with speed and visuospatial memory but a negative relationship with reasoning ability [113].”

      Karpinski RI, Kinase Kolb AM, Tetreault NA, Borowski TB. High intelligence: A risk factor for psychological and physiological overexcitabilities. Intelligence. 2018;66:8–23.

      Ogueji IA, Okoloba MM. Seeking Professional Help for Mental Illness: A Mixed-Methods Study of Black Family Members in the UK and Nigeria. Psychol Stud. 2022;67:164–177.

      Comment 10: - All neuroimaging factors together explain 48% of the variance in the cognition-mental health relationship. However, this relationship is only r=0.3 - so then the effect of neuroimaging factors seems a lot smaller… What does it mean?

      Thank you for raising this critical point. We have addressed it in our responses to Reviewer 1, comments 2 and 3, and Reviewer 2, comment 1.

      Briefly, cognition is related to mental health at around r = 0.3 and to neuroimaging phenotypes at around r = 0.4. These levels of relationship strength are consistent with what has been shown in the literature (e.g., Wang et al., 2025 and Vieira et al., 2020). We discussed the relationship between cognition and mental health in our response to Reviewer 2, comment 1 above. In short, this relationship reflects just one functional domain – mental health may also be associated with other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. Moreover, in the context of gerontology research, this effect size is considered relatively large (Brydges et al., 2019).

      We conducted a commonality analysis to investigate the unique and shared variance of mental health and neuroimaging phenotypes in explaining cognition. As we discussed in our response to Reviewer 1, comment 2, we were able to account for 48% of the covariation between cognition and mental health using the MRI modalities available in the UK Biobank. The remaining 52% of unexplained variance may arise from several sources.

      One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research from our group and others has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank (Tetereva et al., 2025).

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      We have now incorporated these considerations into the Discussion section.
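
      To make the distinction between the r = 0.3 association and the 48% figure concrete, the arithmetic of a two-set commonality analysis can be sketched as below. The R² values are hypothetical placeholders chosen for round numbers, not results from the study, and the sketch assumes that the reported percentage is the common component divided by the total variance in cognition explained by mental health.

```python
def commonality(r2_a, r2_b, r2_ab):
    """Two-predictor-set commonality analysis.

    r2_a, r2_b: R^2 of models using set A alone and set B alone.
    r2_ab: R^2 of the model using both sets together.
    """
    unique_a = r2_ab - r2_b          # variance only set A explains
    unique_b = r2_ab - r2_a          # variance only set B explains
    common = r2_a + r2_b - r2_ab     # variance both sets explain
    return unique_a, unique_b, common


# Hypothetical inputs: A = mental health, B = neuroimaging.
r2_mh, r2_mri, r2_both = 0.10, 0.16, 0.21
unique_mh, unique_mri, common = commonality(r2_mh, r2_mri, r2_both)

# Share of the mental-health-explained variance that neuroimaging also captures.
share_captured = common / r2_mh
```

      With these placeholder values, mental health explains 10% of the variance in cognition, and half of that 10% is shared with neuroimaging. A large shared proportion is therefore compatible with a modest overall association, because the proportion is expressed relative to the mental health-cognition relationship rather than to the total variance in cognition.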

      Line 481: “Our analysis confirmed the validity of the g-factor [31] as a quantitative measure of cognition [31], demonstrating that it captures nearly half (39%) of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Wang Y, Anney R, Pat N. The relationship between cognitive abilities and mental health as represented by cognitive abilities at the neural and genetic levels of analysis. eLife. 2025;14:RP105537.

      Vieira S, Gong QY, Pinaya WHL, et al. Using Machine Learning and Structural Neuroimaging to Detect First Episode Psychosis: Reconsidering the Evidence. Schizophr Bull. 2020;46(1):17-26.

      Brydges CR. Effect Size Guidelines, Sample Size Calculations, and Statistical Power in Gerontology. Innovation in Aging. 2019;3(4):igz036.

      Tetereva A, Knodt AR, Melzer TR, et al. Improving Predictability, Reliability and Generalisability of Brain-Wide Associations for Cognitive Abilities via Multimodal Stacking. Preprint. bioRxiv. 2025;2024.05.03.589404.

      Reviewer 3:

      Buianova et al. present a comprehensive analysis examining the predictive value of multimodal neuroimaging data for general cognitive ability, operationalized as a derived g-factor. The study demonstrates that functional MRI holds the strongest predictive power among the modalities, while integrating multiple MRI modalities through stacking further enhances prediction performance. The inclusion of a commonality analysis provides valuable insight into the extent to which shared and unique variance across mental health features and neuroimaging modalities contributes to the observed associations with cognition. The results are clearly presented and supported by high-quality visualizations. Limitations of the sample are stated clearly.

      Thank you once more for your constructive and encouraging feedback. We appreciate your careful reading and valuable methodological insights. Your expertise has helped us clarify key methodological concepts and improve the overall rigour of our study.

      Suggestions for improvement:

      (1) The manuscript would benefit from the inclusion of permutation testing to evaluate the statistical significance of the predictive models. This is particularly important given that some of the reported performance metrics are relatively modest, and permutation testing could help ensure that results are not driven by chance.

      Thank you, this is an excellent point. We agree that evaluating the statistical significance of our predictive models is essential.

      In our original analysis, we assessed model performance by generating a bootstrap distribution of Pearson’s r, resampling the data with replacement 5,000 times (see Figure 3b). In response to your feedback, we have made the following updates:

      (1) Improved Figure 3b to explicitly display the 95% confidence intervals.

      (2) Supplemented the results by reporting the exact confidence interval values.

      (3) Clarified our significance testing procedure in the Methods section.

      We considered model performance statistically significant when the 95% confidence interval did not include zero, indicating that the observed associations are unlikely to have occurred by chance.

      We chose bootstrapping over permutation testing because, while both can assess statistical significance, bootstrapping additionally provides uncertainty estimates in the form of confidence intervals. Given the large sample size in our study, significance testing can be less informative, as even small effects may reach statistical significance. Bootstrapping offers a more nuanced understanding of model uncertainty.

      Line 233: “To evaluate model performance and assess statistical significance, we aggregated the predicted and observed g-factor values from each outer-fold test set. We then computed a bootstrap distribution of Pearson’s correlation coefficient (r) by resampling with replacement 5 000 times, generating 95% confidence intervals (CIs) (Fig. 1). Model performance was considered statistically significant if the 95% CI did not include zero, indicating that the observed associations were unlikely to have occurred by chance.”
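      The bootstrap procedure quoted above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the percentile method for the interval, and the synthetic data are all assumptions made for illustration. It resamples (observed, predicted) pairs with replacement, recomputes Pearson's r each time, and reads the 95% CI off the sorted bootstrap distribution.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bootstrap_r_ci(observed, predicted, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for r (assumed method; pairs are resampled together)."""
    rng = random.Random(seed)
    n = len(observed)
    rs = []
    for _ in range(n_boot):
        # Draw n participant indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        rs.append(pearson_r([observed[i] for i in idx],
                            [predicted[i] for i in idx]))
    rs.sort()
    lower = rs[int((alpha / 2) * n_boot)]
    upper = rs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Synthetic stand-in for pooled outer-fold test predictions (hypothetical data).
    rng = random.Random(1)
    observed = [rng.gauss(0, 1) for _ in range(200)]
    predicted = [0.5 * o + rng.gauss(0, 1) for o in observed]
    lower, upper = bootstrap_r_ci(observed, predicted)
    # Under the criterion quoted above, performance counts as significant
    # when the interval excludes zero.
    print(pearson_r(observed, predicted), (lower, upper), lower > 0 or upper < 0)
```

      Resampling pairs rather than each vector separately keeps every participant's observed and predicted values together, which is what makes the resulting interval an interval for the correlation itself.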

      (2) Applying and testing the trained models on an external validation set would increase confidence in generalisability of the model.

      We appreciate this excellent suggestion. While we considered this approach, implementing it would require identifying an appropriate external dataset with comparable neuroimaging and behavioural measures, along with careful matching of acquisition protocols and variable definitions across sites. These challenges extend beyond the scope of the current study, though we fully agree that this represents an important direction for future research.

      Our findings, obtained from one of the largest neuroimaging datasets to date with training and test samples exceeding most previous studies, align closely with existing literature: the predictive accuracy of each neuroimaging phenotype and modality for cognition matches the effect size reported in meta-analyses (r ≈ 0.4; e.g., Vieira et al., 2020). The ability of dwMRI, rsMRI and sMRI to capture the cognition-mental health relationship is, in turn, consistent with our previous work in pediatric populations (Wang et al., 2025; Pat et al., 2022).

      Vieira S, Gong QY, Pinaya WHL, et al. Using Machine Learning and Structural Neuroimaging to Detect First Episode Psychosis: Reconsidering the Evidence. Schizophr Bull. 2020;46(1):17-26.

      Wang Y, Anney R, Pat N. The relationship between cognitive abilities and mental health as represented by cognitive abilities at the neural and genetic levels of analysis. eLife. 2025;14:RP105537.

      Pat N, Wang Y, Anney R, Riglin L, Thapar A, Stringaris A. Longitudinally stable, brain-based predictive models mediate the relationships between childhood cognition and socio-demographic, psychological and genetic factors. Hum Brain Mapp. 2022;43:5520–5542.

      (3) The rationale for selecting a 5-by-10-fold cross-validation scheme is not clearly explained. Clarifying why this structure was preferred over more commonly used alternatives, such as 10-by-10 or 5-by-5 cross-validation, would strengthen the methodological transparency.

      Thank you for this important methodological question. Our choice of a 5-by-10-fold cross-validation scheme was motivated by the need to balance robust hyperparameter tuning with computational efficiency, particularly memory and processing time. Retaining five outer folds allowed us to rigorously assess model performance across multiple data partitions, leading to an outer-fold test set of at least n = 4 000 and providing a substantial amount of neuroimaging data involved in model training. In contrast, employing ten inner folds ensured robust and stable hyperparameter tuning that maximizes the reliability of model selection. Thus, with our large sample, five outer folds provided a sufficiently large out-of-sample test set for reliable model evaluation and efficient computation, while ten inner folds enabled robust hyperparameter tuning. We now provide additional rationale for this design decision on Page 10.

      Line 188: “We employed nested cross-validation to predict cognition from mental health indices and 72 neuroimaging phenotypes (Fig. 1). Nested cross-validation is a robust method for evaluating machine-learning models while tuning their hyperparameters, ensuring that performance estimates are both accurate and unbiased. Here, we used a nested cross-validation scheme with five outer folds and ten inner folds.

      We started by dividing the entire dataset into five outer folds. Each fold took a turn being held out as the outer-fold test set (20% of the data), while the remaining four folds (80% of the data) were used as an outer-fold training set. Within each outer-fold training set, we performed a second layer of cross-validation – this time splitting the data into ten inner folds. These inner folds were used exclusively for hyperparameter tuning: models were trained on nine of the inner folds and validated on the remaining one, cycling through all ten combinations.

      We then selected the hyperparameter configuration that performed best across the inner-fold validation sets, as determined by the minimal mean squared error (MSE). The model was then retrained on the full outer-fold training set using this hyperparameter configuration and evaluated on the outer-fold test set, using four performance metrics: Pearson r, the coefficient of determination (R<sup>2</sup>), the mean absolute error (MAE), and the MSE. This entire process was repeated for each of the five outer folds, ensuring that every data point is used for both training and testing, but never at the same time. We opted for five outer folds instead of ten to reduce computational demands, particularly memory and processing time, given the substantial volume of neuroimaging data involved in model training. Five outer folds led to an outer-fold test set of at least n = 4 000, which should be sufficient for model evaluation. In contrast, we retained ten inner folds to ensure robust and stable hyperparameter tuning, maximising the reliability of model selection.”
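      The nested scheme described in the quoted Methods text can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: a one-predictor ridge regression stands in for the actual models, and every function name, the penalty grid, and the fold-splitting rule are hypothetical. The structure is the part that matches the text: five outer folds for evaluation, ten inner folds for tuning by minimal mean MSE, then a refit on the full outer-fold training set.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle the indices and split them into k nearly equal folds."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def fit_ridge_1d(x, y, lam):
    """Closed-form ridge fit with one centred predictor (stand-in model)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / (sxx + lam), mx, my


def mse(model, x, y):
    """Mean squared error of a fitted model on held-out data."""
    slope, mx, my = model
    return sum((my + slope * (a - mx) - b) ** 2 for a, b in zip(x, y)) / len(x)


def nested_cv(x, y, lambdas, n_outer=5, n_inner=10, seed=0):
    """Return one (chosen penalty, outer-fold test MSE) pair per outer fold."""
    rng = random.Random(seed)
    outer_folds = k_fold_indices(list(range(len(x))), n_outer, rng)
    results = []
    for i, test_idx in enumerate(outer_folds):
        train_idx = [j for f, fold in enumerate(outer_folds) if f != i for j in fold]
        # Inner folds are drawn only from the outer-fold training set.
        inner_folds = k_fold_indices(train_idx, n_inner, rng)

        def inner_mse(lam):
            errors = []
            for v, val_idx in enumerate(inner_folds):
                fit_idx = [j for f, fold in enumerate(inner_folds) if f != v for j in fold]
                model = fit_ridge_1d([x[j] for j in fit_idx], [y[j] for j in fit_idx], lam)
                errors.append(mse(model, [x[j] for j in val_idx], [y[j] for j in val_idx]))
            return sum(errors) / len(errors)

        # Pick the penalty with the lowest mean inner-fold validation MSE.
        best_lam = min(lambdas, key=inner_mse)
        # Refit on the whole outer-fold training set, then test once.
        model = fit_ridge_1d([x[j] for j in train_idx], [y[j] for j in train_idx], best_lam)
        results.append((best_lam, mse(model, [x[j] for j in test_idx], [y[j] for j in test_idx])))
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    x = [rng.gauss(0, 1) for _ in range(100)]       # synthetic predictor
    y = [2 * a + rng.gauss(0, 1) for a in x]        # synthetic outcome
    for lam, test_mse in nested_cv(x, y, [0.0, 1.0, 100.0]):
        print(lam, round(test_mse, 3))
```

      Because the outer-fold test indices never enter `inner_mse`, the hyperparameter choice cannot leak information from the data used for the final performance estimate.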

      (4) A more detailed discussion of which specific brain regions or features within each neuroimaging modality contributed most strongly to the prediction of cognition would enhance neurobiological relevance of the findings.

      Thank you for this thoughtful suggestion. To address this point, we have included feature importance plots for the top-performing neuroimaging phenotypes within each modality (Figure 5 and Figures S2–S4), demonstrating the relative contributions of individual features to the predictive models. While we maintain our primary focus on cross-modality performance comparisons in the main text, as this aligns with our central aim of evaluating multimodal MRI markers at the integrated level, we outline the contribution of neuroimaging features with the highest predictive performance for cognition in the revised Results and Discussion.

      Methods

      Line 255: “To determine which neuroimaging features contribute most to the predictive performance of top-performing phenotypes within each modality, while accounting for the potential latent components derived from neuroimaging, we assessed feature importance using the Haufe transformation [62]. Specifically, we calculated Pearson correlations between the predicted g-factor and scaled and centred neuroimaging features across five outer-fold test sets. We also examined whether the performance of neuroimaging phenotypes in predicting cognition per se is related to their ability to explain the link between cognition and mental health. Here, we computed the correlation between the predictive performance of each neuroimaging phenotype and the proportion of the cognition-mental health relationship it captures. To understand how demographic factors, including age and sex, contribute to this relationship, we also conducted a separate set of commonality analyses treating age, sex, age<sup>2</sup>, age×sex, and age<sup>2</sup>×sex as an additional set of explanatory variables (Fig. 1).”
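      As the quoted passage describes it, feature importance is the Pearson correlation between each feature and the model's predicted g-factor. The sketch below shows only that step; the function name, the feature labels, and the synthetic data are hypothetical, and the authors' implementation may differ in detail. The toy example also shows why this differs from reading raw model weights: a feature with zero weight can still correlate with the prediction if it shares signal with a weighted feature.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def haufe_style_importance(features, predicted):
    """Correlate every feature (name -> values) with the predicted scores."""
    return {name: pearson_r(values, predicted) for name, values in features.items()}


# Synthetic test-set data (hypothetical feature names).
rng = random.Random(0)
feature_a = [rng.gauss(0, 1) for _ in range(500)]
feature_b = [v + rng.gauss(0, 0.5) for v in feature_a]   # shares signal with A
feature_c = [rng.gauss(0, 1) for _ in range(500)]        # unrelated noise

# Toy model that uses only feature A (weights on B and C are zero).
predicted_g = [1.0 * v for v in feature_a]

importance = haufe_style_importance(
    {"feature_a": feature_a, "feature_b": feature_b, "feature_c": feature_c},
    predicted_g,
)
print(importance)
```

      Pearson's r is unchanged by centring or rescaling either variable, so applying it to scaled and centred features gives the same value as applying it to the raw ones.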

      Results

      dwMRI

      Line 331: “Overall, models based on structural connectivity metrics performed better than TBSS and probabilistic tractography (Fig. 3). TBSS, in turn, performed better than probabilistic tractography (Fig. 3 and Table S13). The number of streamlines connecting brain areas parcellated with aparc MSA-I had the best predictive performance among all dwMRI neuroimaging phenotypes (R<sup>2</sup><sub>mean</sub> = 0.052, r<sub>mean</sub> = 0.227, 95% CI [0.212, 0.235]). To identify features driving predictions, we correlated streamline counts in aparc MSA-I parcellation with the predicted g-factor values from the PLSR model. Positive associations with the predicted g-factor were strongest for left superior parietal-left caudal anterior cingulate, left caudate-right amygdala, and left putamen-left hippocampus connections. The most marked negative correlations involved left putamen-right posterior thalamus and right pars opercularis-right caudal anterior cingulate pathways (Fig. 5 and Supplementary Fig. S2).”

      rsMRI

      Line 353: “Among RSFC metrics for 55 and 21 ICs, tangent parameterization matrices yielded the highest performance in the training set compared to full and partial correlation, as indicated by the cross-validation score. Functional connections between the limbic (IC10) and dorsal attention (IC18) networks, as well as between the ventral attention (IC15) and default mode (IC11) networks, displayed the highest positive association with cognition. In contrast, functional connectivity between the limbic (IC43, the highest activation within network) and default mode (IC11) and limbic (IC45) and frontoparietal (IC40) networks, between the dorsal attention (IC18) and frontoparietal (IC25) networks, and between the ventral attention (IC15) and frontoparietal (IC40) networks, showed the highest negative association with cognition (Fig. 5 and Supplementary Fig. S3 and S4).”

      sMRI

      Line 373: “FreeSurfer subcortical volumetric subsegmentation and ASEG had the highest performance among all sMRI neuroimaging phenotypes (R<sup>2</sup><sub>mean</sub> = 0.068, r<sub>mean</sub> = 0.244, 95% CI [0.237, 0.259] and R<sup>2</sup><sub>mean</sub> = 0.059, r<sub>mean</sub> = 0.235, 95% CI [0.221, 0.243], respectively). In FreeSurfer subcortical volumetric subsegmentation, volumes of all subcortical structures, except for left and right hippocampal fissures, showed positive associations with cognition. The strongest relations were observed for the volumes of bilateral whole hippocampal head and whole hippocampus (Fig. 5 and Supplementary Fig. S5 for feature importance maps). Grey matter morphological characteristics from ex vivo Brodmann Area Maps showed the lowest predictive performance (R<sup>2</sup><sub>mean</sub> = 0.008, r<sub>mean</sub> = 0.089, 95% CI [0.075, 0.098]; Fig. 3 and Table S15).”

      Discussion

      dwMRI

      Line 562: “Among dwMRI-derived neuroimaging phenotypes, models based on structural connectivity between brain areas parcellated with aparc MSA-I (streamline count), particularly connections with bilateral caudal anterior cingulate (left superior parietal-left caudal anterior cingulate, right pars opercularis-right caudal anterior cingulate), left putamen (left putamen-left hippocampus, left putamen-right posterior thalamus), and amygdala (left caudate-right amygdala), result in a neural indicator that best reflects microstructural resources associated with cognition, as indicated by predictive modeling, and more importantly, shares the highest proportion of the variance with mental health-g, as indicated by commonality analysis.”

      rsMRI

      Line 583: “We extend findings on the superior performance of rsMRI in predicting cognition, which aligns with the literature [15, 28], by showing that it also explains almost a third of the variance in cognition that mental health captures. At the rsMRI neuroimaging phenotype level, this performance is mostly driven by RSFC patterns among 55 ICA-derived networks quantified using tangent space parameterization. At a feature level, these associations are best captured by the strength of functional connections among limbic, dorsal attention and ventral attention, frontoparietal and default mode networks. These functional networks have been consistently linked to cognitive processes in prior research [127–130].”

      sMRI

      Line 608: “Integrating information about brain anatomy by stacking sMRI neuroimaging phenotypes allowed us to explain a third of the link between cognition and mental health. Among all sMRI neuroimaging phenotypes, those that quantified the morphology of subcortical structures, particularly volumes of bilateral hippocampus and hippocampal head, explain the highest portion of the variance in cognition captured by mental health. Our findings show that, at least in older adults, volumetric properties of subcortical structures are not only more predictive of individual variations in cognition but also explain a greater portion of cognitive variance shared with mental health than structural characteristics of more distributed cortical grey and white matter. This aligns with the Scaffolding Theory that proposes stronger compensatory engagement of subcortical structures in cognitive processing in older adults [138–140].”

      (5) The formatting of some figure legends could be improved for clarity - for example, some subheadings were not formatted in bold (e.g., Figure 2 c)

      Thank you for noticing this. We have updated the figures to enhance clarity, keeping subheadings plain while bolding figure numbers and MRI modality names.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Evidence, reproducibility and clarity

      The manuscript by Egawa and colleagues investigates differences in nodal spacing in an avian auditory brain stem circuit. The results are clearly presented and data are of very high quality. The authors make two main conclusions:

      (1) Node spacing, i.e. internodal length, is intrinsically specified by the oligodendrocytes in the region they are found in, rather than axonal properties (branching or diameter).

      (2) Activity is necessary (we don't know what kind of signaling) for normal numbers of oligodendrocytes and therefore the extent of myelination.

      These are interesting observations, albeit phenomenon. I have only a few criticisms that should be addressed:

      (1) The use of the term 'distribution' when describing the location of nodes is confusing. I think the authors mean rather than the patterns of nodal distribution, the pattern of nodal spacing. They have investigated spacing along the axon. I encourage the authors to substitute node spacing or internodal length for node distribution.

      Thanks for your suggestion to avoid confusion. We used the phrase "nodal spacing" instead of "nodal distribution" throughout the revised manuscript.

      (2) In Seidl et al. (J Neurosci 2010) it was reported that axon diameter and internodal length (nodal spacing) were different for regions of the circuit. Can the authors help me better understand the difference between the Seidl results and those presented here?

      As a key distinction, our study focuses specifically on the main trunk of the contralateral projection of NM axons. This projection features a sequential branching structure known as the delay line, where collateral branches form terminal arbors and connect to the ventral dendritic layer of NL neurons. This structural organization plays a critical role in influencing the dynamic range of ITD detection by regulating conduction delays along the NM axon trunk.

      The study by Seidl et al. (2010) is a pioneering work that measured the diameter of NM axons using electron microscopy, providing highly reliable data. However, due to the technical limitations of electron microscopy, which does not allow for the continuous tracing of individual axons, it is not entirely clear whether the axons measured in the ventral NL region correspond to terminal arbors of collateral branches or the main trunk of NM axons (see Figure 9E, F in their paper). Instead, they categorized axon diameters based on their distance from the NL cell layer, showing that axon diameter increases distally (see Figure 9G in their paper). Notably, the diameters of ventral axons located more than 120 μm away from the NL cell layer are almost identical to those in the midline.

      As illustrated in our Figure 4D and Supplementary Video 2, the main trunk of the contralateral NM projection is predominantly located in these distal regions. Therefore, our findings complement those of Seidl et al. (2010) rather than contradicting them. We made this point as clear as possible in text (page 7, line 3).

      (3) The authors looked only in very young animals - are the results reported here applicable only to development, or does additional refinement take place with aging?

      In this study, we examined chick embryos from E9 to just before hatching (E21) and post-hatch chicks up to P9. Chickens begin to perceive sound around E12 and possess sound localization abilities at the time of hatching (Grier et al., 1967) (added to page 4, line 9). Therefore, by E21, the sound localization circuit is largely established.

      On the other hand, additional refinement of the circuit with aging is certainly possible. A key cue for sound localization, interaural time difference (ITD), depends on the distance between the two ears, which increases as the animal grows. As shown in Figure 2G, internodal length increased by approximately 20% between E18 and P9 while maintaining regional differences. Given that NM axons are nearly fully myelinated by E21 (Figure 4D, 6C), this suggests that myelin extends in proportion to the overall growth of the head and brain volume. We described this possibility in the text (page 5, line 21).

      Thus, our study covers not only the early stages of myelination but also the post-functional maturation in the sound localization circuit.

      (4) The fact that internodal length is specified by the oligodendrocyte suggests that activity may not modify the location of nodes of Ranvier - although again, the authors have only looked during early development. This is quite different than this reviewer's original thoughts - that activity altered internodal length and axon diameter. Thus, the results here argue against node plasticity. The authors may choose to highlight this point or argue for or against it based on results in adult birds?

      In this study, we demonstrated that although vesicular release did not affect internodal length, it selectively promoted oligodendrogenesis, thereby supporting the full myelination and hence the pattern of nodal spacing along the NM axons. We believe that this finding falls within the broader scope of 'activity-dependent plasticity' involving oligodendrocytes and nodes.

      As summarized in the excellent review by Bonetto et al. (2021), activity-dependent plasticity in oligodendrocytes encompasses a wide range of phenomena, not limited to changes in internodal length but also including oligodendrogenesis. Moreover, the effects of neuronal activity are not uniform but likely depend on the diversity of both neurons and oligodendrocytes. For example, in the mouse visual cortex, activity-dependent myelination occurs in interneurons but not in excitatory neurons (Yang et al., 2020). Additionally, expression of TeNT in axons affected myelination heterogeneously in zebrafish; some axons were impaired in myelination and the others were not affected at all (Koudelka et al., 2016). In the mouse corpus callosum, neuronal activity influences oligodendrogenesis, which in turn facilitates adaptive myelination (Gibson et al., 2014).

      Thus, rather than refuting the role of activity-dependent plasticity in nodal spacing, our findings emphasize the diversity of underlying regulatory mechanisms. We described these explicitly in text (page 10, line 18).

      Significance

      This paper may argue against node plasticity as a mechanism for tuning of neural circuits. Myelin plasticity is a very hot topic right now and node plasticity reflects myelin plasticity. This seems to be a circuit where perhaps plasticity is NOT occurring. That would be interesting to test directly. One limitation is that this is limited to development.

      This paper does not argue against node plasticity, but rather demonstrates that oligodendrocytes in the NL region exhibit a form of plasticity; they proliferate in response to vesicular release from NM axons, yet do not undergo morphological changes, ensuring adequate oligodendrocyte density for the full myelination of the auditory circuit. Thus, activity-dependent plasticity involving oligodendrocytes would contribute in various ways to each neural circuit, which is presumably attributed to the fact that myelination is driven by complex multicellular interactions between diverse axons and oligodendrocytes. Oligodendrocytes are known to exhibit heterogeneity in morphology, function, responsiveness, and gene profiles (Foerster et al., 2019; Sherafat et al., 2021; Osanai et al., 2022; Valihrach et al., 2022), but the functional significance of this heterogeneity remains largely unclear. This paper also provides insight into how oligodendrocyte heterogeneity may contribute to the fine-tuning of neural circuit function, adding further value to our findings. Importantly, our study covers the wide range of development in the sound localization circuit, from pre-myelination (E9) to post-functional maturation (P9), revealing how the nodal spacing pattern along the axon in this circuit emerges and matures.

      Reviewer #2:

      Evidence, reproducibility and clarity

      Egawa et al describe the developmental timeline of the assembly of nodes of Ranvier in the chick brainstem auditory circuit. In this unique system, the spacing between nodes varies significantly in different regions of the same axon from early stages, which the authors suggest is critical for accurate sound localization. Egawa et al set out to determine which factors regulate this differential node spacing. They do this by using immunohistological analyses to test the correlation of node spacing with morphological properties of the axons, and properties of oligodendrocytes, glial cells that wrap axons with the myelin sheaths that flank the nodes of Ranvier. They find that axonal structure does not vary significantly, but that oligodendrocyte density and morphology varies in the different regions traversed by these axons, which suggests this is a key determinant of the region-specific differences in node density and myelin sheath length. They also find that differential oligodendrocyte density is partly determined by secreted neuronal signals, as (presumed) blockage of vesicle fusion with tetanus toxin reduced oligodendrocyte density in the region where it is normally higher. Based on these findings, the authors propose that oligodendrocyte morphology, myelin sheath length, and consequently nodal distribution are primarily determined by intrinsic oligodendrocyte properties rather than neuronal factors such as activity.

      Major points, detailed below, need to be addressed to overcome some limitations of the study.

      Major comments:

      (1) It is essential that the authors validate the efficiency of TeNT to prove that vesicular release is indeed inhibited, to be able to make any claims about the effect of vesicular release on oligodendrogenesis/myelination.

      eTeNT is a widely used genetically encoded silencing tool and constructs similar to the one used in this study have been successfully applied in primates and rodents to suppress target behaviors via genetic dissection of specific pathways (Kinoshita et al., 2012; Sooksawate et al., 2013). However, precisely quantifying the extent of vesicular release inhibition from NM axons in the brainstem auditory circuit is technically problematic.

      One major limitation is that while A3V efficiently infects NM neurons, its transduction efficiency does not reach 100%. In electrophysiological evaluations, NL neurons receive inputs from multiple NM axons, meaning that responses may still include input from uninfected axons. Additionally, failure to evoke synaptic responses could either indicate successful silencing or failure to stimulate NM axons, making a clear distinction difficult. Furthermore, unlike in motor circuits, we cannot assess the effect of silencing by observing behavioral outputs.

      Thus, we instead opted to quantify the precise expression efficiency of GFP-tagged eTeNT in the cell bodies of NM neurons. The proportion of NM neurons expressing GFP-tagged eTeNT was 89.7 ± 1.6% (N = 6 chicks), which is consistent with previous reports evaluating A3V transduction efficiency in the brainstem auditory circuit (Matsui et al., 2012). These results strongly suggest that synaptic transmission from NM axons was globally silenced by eTeNT at the NL region. We described these explicitly in text (page 8, line 2).

      (2) Related to 1, can the authors clarify if their TeNT expression system results in the whole tract being silenced? It appears from Fig. 6 that their approach leads to sparse expression of TeNT in individual neurons, which enables them to measure myelination parameters. Can the authors discuss how silencing a single axon can lead to a regional effect in oligodendrocyte number?

      Figure 6D depicts a representative axon selected from a dense population of GFP-positive axons in a 200-μm-thick slice after A3V-eTeNT infection to bilateral NM. As shown in Supplementary Video 1 and 2, densely labeled GFP-positive axons can be traced along the main trunk. To prevent any misinterpretation, we have revised the description of Figure 6 in the main text and Figure legend (page 31, line 9), and stated the A3V-eTeNT infection efficiency was 89.7 ± 1.6% in NM neurons, as mentioned above. Based on this efficiency, we interpreted that the global occlusion of vesicular release from most of the NM axons altered the pericellular microenvironment of the NL region, which led to the regional effect on the oligodendrocyte density.

      On the other hand, your question regarding whether sparse expression of eTeNT still has an effect is highly relevant. As we also discussed in our reply to comment 4 by Reviewer #1, the relationship between neuronal activity and oligodendrocytes is highly diverse. In some types of axons, vesicular release is essential for normal myelination, and this process was disrupted by TeNT (Koudelka et al., 2016), suggesting that direct interaction with oligodendrocytes via vesicle release may actively promote myelination in these types of axons.

      To clarify whether the phenotype observed in Figure 6 arises from changes in the pericellular microenvironment at the NL region or from the direct suppression of axon-oligodendrocyte interactions, we included a new Supplementary Figure (Figure 6—figure supplement 1). In this figure, we evaluated node formation on axons sparsely expressing eTeNT after electroporation into the unilateral NM. The results showed that sparse eTeNT expression did not increase the percentages of heminodes or unmyelinated segments. This finding supports our conclusion that the increase in unmyelinated segments caused by A3V-eTeNT resulted from impaired synaptic transmission at NM terminals and subsequent alterations of the pericellular microenvironment at the NL region.

      (3) The authors need to fully revise their statistical analyses throughout and supply additional information that is needed to assess if their analyses are adequate:

      Thank you for your valuable suggestions to improve the rigor of our statistical analyses. We have reanalyzed all statistical tests using R software. In the revised Methods section and Figure Legends, we have clarified the rationale for selecting each statistical test, specified which test was used for each figure, and explicitly defined both n and N. After reevaluation with the Shapiro-Wilk test, we adjusted some analyses to non-parametric tests where appropriate. However, these adjustments did not alter the statistical significance of our results compared to the original analyses.

      (3.1) the authors use a variety of statistical tests and it is not always obvious why they chose a particular test. For example, in Fig. 2G they chose a Kruskal-Wallis test instead of a two-way ANOVA or Mann-Whitney U test, which are much more common in the field. What is the rationale for the test choice?

      We have revised the explanation of our statistical test choices to provide greater clarity and precision. For example, in Figure 2G, we first assessed the normality of the data in each of the four groups using the Shapiro-Wilk test, which revealed that some datasets did not follow a normal distribution. Given this, we selected the Kruskal-Wallis test, a commonly used non-parametric test for comparisons across three or more groups. Since the Kruskal-Wallis test indicated a significant difference, we conducted a post hoc Steel-Dwass test to determine which specific group comparisons were statistically significant.
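      For readers unfamiliar with the omnibus test named above, the following is a minimal worked sketch of the Kruskal-Wallis H statistic with tie correction and its chi-square approximation. It is an illustration only: the authors report using R, the function names here are hypothetical, and neither the Shapiro-Wilk normality check nor the Steel-Dwass post hoc comparison is reproduced.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    z = x / 2.0
    # Start from Q(1, z) for even df or Q(1/2, z) for odd df, then apply
    # the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, exp(-z)
    else:
        a, q = 0.5, erfc(sqrt(z))
    while a < df / 2.0:
        q += z ** a * exp(-z) / gamma(a + 1.0)
        a += 1.0
    return q


def kruskal_wallis(groups):
    """Return (H, p) for a list of samples, using midranks for ties."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0   # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    rank_sums = [sum(rank_of[v] for v in g) for g in groups]
    h = 12.0 / (n * (n + 1)) * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    h -= 3.0 * (n + 1)
    h /= 1.0 - tie_term / float(n ** 3 - n)      # tie correction
    return h, chi2_sf(h, len(groups) - 1)


# Toy data: three fully separated groups (hypothetical values).
h_stat, p_value = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h_stat, p_value)
```

      The test ranks all observations together and asks whether the mean rank differs between groups, which is why it does not require the normality assumption that an ANOVA would.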

      (3.2) in some cases, the choice of test appears wholly inappropriate. For example, in Fig. 3H-K, an unpaired t-test is inappropriate if the two regions were analysed in the same samples. In Fig. 5, was a t-test used for comparisons between multiple groups in the same dataset? If so, an ANOVA may be more appropriate.

      In the case of Figures 3H-K, we compared oligodendrocyte morphology between regions. However, since the number of sparsely labeled oligodendrocytes differs both between regions and across individual samples, there is no strict correspondence between paired measurements. On the other hand, in Figures 5B, C, and E, we compared the density of labeled cells between regions within the same slice, establishing a direct correspondence between paired data points. For these comparisons, we appropriately used a paired t-test.

      (3.3) in some cases, the authors do not mention which test was used (Fig 3: E-G no test indicated, despite asterisks; G/L/M - which regression test that was used? What does r indicate?)

      We have specified the statistical tests used for each figure in the Methods section and Figure Legends for better clarity. Additionally, we have revised the descriptions for Figure 4G, L, and M and their corresponding Figure Legends to explicitly indicate that Spearman’s rank correlation coefficient (rₛ) was used for evaluation.

      (3.4) more concerningly, throughout the results, data may have been pseudo-replicated. t-tests and ANOVAs assume that each observation in a dataset is independent of the other observations. In figures 1-4 and 6 there is a very large "n" number, but the authors do not indicate what this corresponds to. This leaves it open to interpretation, and the large values suggest that the number of nodes, internodal segments, or cells may have been used. These are not independent experimental units, and should be averaged per independent biological replicate - i.e. per animal (N).
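
      The per-animal averaging requested here can be sketched as follows. This is only an illustration of the distinction between n (cells or segments) and N (animals); the animal IDs, values, and the helper `per_animal_means` are invented for the example.

```python
from collections import defaultdict
from statistics import mean


def per_animal_means(measurements):
    """Collapse (animal_id, value) pairs into one mean value per animal."""
    by_animal = defaultdict(list)
    for animal_id, value in measurements:
        by_animal[animal_id].append(value)
    return {animal: mean(values) for animal, values in by_animal.items()}


# Hypothetical internodal lengths (µm); each tuple is one measured segment.
cells = [("chick1", 50.0), ("chick1", 54.0), ("chick2", 60.0),
         ("chick2", 62.0), ("chick2", 64.0), ("chick3", 58.0)]
animal_means = per_animal_means(cells)
n_cells, n_animals = len(cells), len(animal_means)
```

      Here six measurements (n = 6) reduce to three independent values (N = 3), one per animal.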

      We have now clarified what “n” represents in each figure, as well as the number of animals (N) used in each experiment, in the Figure Legends.

      In this study, developmental stages of chick embryos were defined by HH stage (Hamburger and Hamilton, 1951), minimizing individual variability. Additionally, since our study focuses on the distribution of morphological characteristics of individual cells, averaging measurements per animal would obscure important cellular-level variability and potentially mislead interpretation of data. Furthermore, we employed a strategy of sparse genetic labeling in many experiments, which naturally results in variability in the number of measurable cells per animal. Given the clear distinctions in our data distributions, we believe that averaging per biological replicate is not essential in this case.

      To further ensure the robustness of our statistical analysis, data presented as boxplots were preliminarily assessed using PlotsOfDifferences, a web-based application that calculates and visualizes effect sizes and 95% confidence intervals based on bootstrapping (https://huygens.science.uva.nl/PlotsOfDifferences/; https://doi.org/10.1101/578575). Effect sizes can serve as a valuable alternative to p-values (Ho, 2018; https://www.nature.com/articles/s41592-019-0470-3). The significant differences reported in our study are also supported by clear differences in effect sizes, ensuring that our conclusions remain robust regardless of the statistical approach used.
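
      As a rough illustration of the kind of estimate described above, the sketch below computes a difference in medians with a percentile bootstrap 95% confidence interval. It is not the PlotsOfDifferences implementation; the function name, the choice of the median as the summary statistic, the resample count, and the data are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_median_difference(group_a, group_b, n_resamples=2000, seed=0):
    """Return (observed difference, lower, upper) for median(b) - median(a)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistics.median(group_b) - statistics.median(group_a)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.median(resample_b) - statistics.median(resample_a))
    diffs.sort()
    # Percentile interval: cut 2.5% of the resampled differences from each tail.
    lower = diffs[round(0.025 * n_resamples)]
    upper = diffs[round(0.975 * n_resamples) - 1]
    return observed, lower, upper


# Hypothetical measurements for two regions (arbitrary units).
group_a = [20, 22, 25, 27, 30]
group_b = [60, 65, 70, 72, 80]
observed, lower, upper = bootstrap_median_difference(group_a, group_b)
```

      An interval that excludes zero, as it must for these well-separated example groups, indicates a difference whose direction does not depend on a p-value threshold.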

      If requested, we would be happy to provide PlotsOfDifferences outputs as supplementary source data files, similar to those used in eLife publications, for each figure.

      (3.5) related to the pseudo-replication issue, can the authors include individual datapoints in graphs for full transparency, per biological replicates, in addition or in alternative to bar-graphs (e.g. Fig. 5 and 6).

      We have now incorporated individual data points into the bar graphs in Figures 5 and 6.

      (4) The main finding of the study is that the density of nodes differs between two regions of the chicken auditory circuit, probably due to morphological differences in the respective oligodendrocytes. Can the authors discuss if this finding is likely to be specific to the bird auditory circuit?

      The morphological differences of oligodendrocytes between white and gray matter are well established (i.e., shorter myelin in gray matter), but their correspondence with the nodal spacing pattern along the long axonal projections of cortical neurons is not well understood. Future research may find similarities with our findings. Additionally, as mentioned in the final section of the Discussion, the mammalian brainstem auditory circuit is functionally analogous to the avian ITD circuit. Regional differences in nodal spacing along axons have also been observed in the mammalian system, raising the important question of whether these differences are supported by regional heterogeneity in oligodendrocytes. Investigating this possibility will facilitate our understanding of the underlying logic and mechanisms for determining node spacing patterns along axons, as well as provide valuable insights into evolutionary convergence in auditory processing mechanisms. We have described these points explicitly in the text (page 11, line 34).

      (5) Provided the authors amend their statistical analyses, and assuming significant differences remain as shown, the study shows a correlation (but not causation) between node spacing and oligodendrocyte density, but the authors did not manipulate oligodendrocyte density per se (i.e. cell-autonomously). Therefore, the authors should either include such experiments, or revise some of their phrasing to soften their claims and conclusions. For example, the word "determine" in the title could be replaced by "correlate with" for a more accurate representation of the work. Similar sentences throughout the main text should be amended.

      As you summarized in your comment, our results demonstrated that A3V-eTeNT suppressed oligodendrogenesis in the NL region, leading to a reduction in oligodendrocyte density (Figures 6L, M), which caused the emergence of unmyelinated segments. While this is an indirect manipulation of oligodendrocyte density, it nonetheless provides evidence supporting a causal relationship between oligodendrocyte density and nodal spacing.

      The emergence of unmyelinated segments at the NL region further suggests that the myelin extension capacity of oligodendrocytes differs between regions, highlighting regional differences in the intrinsic properties of oligodendrocytes as the most prominent determinant of nodal spacing variation. However, as you correctly pointed out, our findings do not establish direct causation.

      In the future, developing methods to artificially manipulate myelin length could provide a more definitive demonstration of causality. Given these considerations, we have modified the title to replace "determine" with "underlie", ensuring that our conclusions are presented with appropriate nuance.

      (6) The authors fail to introduce, or discuss, very pertinent prior studies, in particular to contextualize their findings with:

      (6.1) known neuron-autonomous modes of node formation prior to myelination, e.g. Zonta et al (PMID 18573915); Vagionitis et al (PMID 35172135); Freeman et al (PMID 25561543)

      (6.2) known effects of vesicular fusion directly on myelinating capacity and oligodendrogenesis, e.g. Mensch et al (PMID 25849985)

      (6.3) known correlation of myelin length and thickness with axonal diameter, e.g. Murray & Blakemore (PMID 7012280); Ibrahim et al (PMID 8583214); Hildebrand et al (PMID 8441812).

      (6.4) regional heterogeneity in the oligodendrocyte transcriptome (page 9, studies summarized in PMID 36313617)

      Thank you for your insightful suggestions. We have incorporated the relevant references you provided and revised the manuscript accordingly to contextualize our findings within the existing literature.

      Minor comments:

      (7) Can the authors amend Fig. 1G with the correct units of measurement, not millimetres.

      Response: 

      Thank you for your suggestion. We have corrected the units in Figure 1G to µm.

      (8) The Olig2 staining in Fig 2C does not appear to be nuclear, as would be expected of a transcription factor and as is well established for Olig2, but rather appears to be excluded from the nucleus, as it is in a ring or donut shape. Can the authors comment on this?

      Oligodendrocytes and OPCs have small cell bodies, often comparable in size to their nuclei. The central void in the ring-like Olig2 staining pattern appears too small to represent the nucleus. Additionally, a similar ring-like appearance is observed in BrdU labeling (Figure 5G), suggesting that this staining pattern may reflect nuclear morphology or other structural features.

      Significance

      In our view the study tackles a fundamental question likely to be of interest to a specialized audience of cellular neuroscientists. This descriptive study is suggestive that in the studied system, oligodendrocyte density determines the spacing between nodes of Ranvier, but further manipulations of oligodendrocyte density per se are needed to test this convincingly.

      The main finding of our study is that the primary determinant of the biased nodal spacing pattern in the sound localization circuit is the regional heterogeneity in the morphology of oligodendrocytes due to their intrinsic properties (e.g., their ability to produce and extend myelin sheaths) rather than the density of the cells. This conclusion is based on our observation that a reduction of oligodendrocyte density by A3V-eTeNT expression caused unmyelinated segments but did not increase internodal length (Figure 6), which further reveals the importance of oligodendrocyte density in ensuring full myelination of axons with short internodes. Thus, we believe that our study, through experimental manipulation of oligodendrocyte density, points to the significance of oligodendrocyte heterogeneity for circuit function as well as for nodal spacing.

      Reviewer #3:

      Evidence, reproducibility and clarity

      The authors have investigated the myelination pattern along the axons of chick avian cochlear nucleus. It has already been shown that there are regional differences in the internodal length of axons in the nucleus magnocellularis. In the tract region across the midline, internodes are longer than in the nucleus laminaris region. Here the authors suggest that the difference in internodal length is attributed to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only length of myelin internodes differs, but also along the same axon unmyelinated areas between two internodes may vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons. However, the demonstration falls rather short of being convincing. I have some major concerns:

      (1) The authors neglect the possibility that nodal cluster may be formed prior to myelin deposition. They have investigated stages E12 (no nodal clusters) and E15 (nodal cluster plus MAG+ myelin). Fig. 1D is of dubious quality. It would be important to investigate stages between E12 and E15 to observe the formation of pre-nodes, i.e., clustering of nodal components prior to myelin deposition.

      Thank you for your insightful comment regarding the potential role of pre-nodal clusters in determining internodal length. Indeed, studies in zebrafish have suggested that pre-nodal clustering of node components prior to myelination may prefigure internodal length (Vagionitis et al., 2022). We have incorporated a discussion on whether such pre-nodal clusters could contribute to regional differences in nodal spacing in our manuscript (page 9, line 35).

      Whether pre-nodal clusters are detectable before myelination appears to depend on neuronal subpopulation (Freeman et al., 2015). To investigate the presence of pre-nodal clusters along NM axons in the brainstem auditory circuit, we previously attempted to visualize AnkG signals at E13 and E14. However, we did not observe clear structures indicative of pre-nodal clusters; instead, we only detected sparse fibrous AnkG signals with weak Nav clustering at their ends, consistent with hemi-node features. This result does not exclude the possibility of pre-nodal clusters on NM axons, as such clusters may fall below the detection limit of immunostaining. In brainstem slices, where axons are densely packed, nodal molecules are expressed at low levels across a wide area, leading to a high background signal in immunostaining, which may mask weak pre-nodal cluster signals prior to myelination.

      Regarding the comment on Figure 1D, we assume you are referring to Figure 2D based on the context. The lack of clarity in the high-magnification images in Figure 2D results from both the high background signal and the limited penetration of the MAG antibody. Furthermore, we are unable to verify Neurofascin accumulation at pre-nodal clusters, as there is currently no commercially available antibody suitable for use in chickens, despite our more than 20 years of efforts to identify one for AIS research. Therefore, current methodologies pose significant challenges in visualizing pre-nodal clusters in our model. Future advancements, such as exogenous expression of fluorescently tagged Neurofascin at appropriate densities or knock-in tagging of endogenous molecules, may help overcome these limitations.

      However, a key issue to be discussed in this study is not merely the presence or absence of pre-nodal clusters, but rather whether pre-nodal clusters, if present, would determine regional differences in internodal length. To address this possibility, we have added new data in Figure 6I, measuring the length of unmyelinated segments that emerged following A3V-eTeNT expression.

      If pre-nodal clusters were fixed before myelination and predetermined internodal length, then the length of unmyelinated segments should be equal to or a multiple of the typical internodal length. However, our data showed that unmyelinated segments in the NL region were less than half the length of the typical NL internodal length, contradicting the hypothesis that fixed pre-nodal clusters determine internodal length along NM axons in this region.

      (2) The claim that axonal diameter is constant along the axonal length needs to be demonstrated at the EM level. This would also allow measurement of possible regional differences in the thickness of the myelin sheath and the number of myelin wraps.

      As mentioned in our reply to comment 2 by Reviewer #1, the diameter of NM axons was already evaluated using electron microscopy (EM) in the pioneering study by Seidl et al. (2010). Additionally, EM-based analysis makes it difficult to clearly distinguish between the main trunk of NM axons and thin collateral branches at the NL region. Accordingly, we did not perform EM analysis in this revision.

      In Figure 4, we used palGFP, which is targeted to the cell membrane, allowing us to measure axon diameter by evaluating the distance between two membrane signal peaks. This approach minimizes the influence of the blurring of fluorescence signals on diameter measurements. Thus, we believe that our method is sufficient to evaluate the relative difference in axon diameters between regions and hence to show that axon diameter is not the primary determinant of the 3-fold difference in internodal length between regions. 
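
      The peak-to-peak measurement described above can be sketched in a few lines of Python. This is a simplified illustration rather than the authors' analysis pipeline: the function name, the rule for picking peaks (the two strongest local maxima), the intensity profile, and the pixel size are all assumptions.

```python
def membrane_peak_distance(profile, pixel_size_um):
    """Distance between the two strongest local maxima of a line profile."""
    # A local maximum rises above its left neighbour and does not fall
    # below its right neighbour, so a flat-topped peak is counted once.
    peaks = [i for i in range(1, len(profile) - 1)
             if profile[i] > profile[i - 1] and profile[i] >= profile[i + 1]]
    if len(peaks) < 2:
        raise ValueError("need two membrane peaks to estimate a diameter")
    strongest = sorted(peaks, key=lambda i: profile[i], reverse=True)[:2]
    first, second = sorted(strongest)
    return (second - first) * pixel_size_um


# Hypothetical intensity profile drawn across one axon (arbitrary units).
profile = [1, 2, 9, 4, 3, 3, 5, 10, 2, 1]
diameter_um = membrane_peak_distance(profile, pixel_size_um=0.1)
```

      Because the estimate depends on peak positions rather than on where the signal fades to background, blurring that widens each peak has little effect on the measured distance.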

      (3) The claim that the difference in internodal length is explained by heterogeneity in the sources of oligodendrocytes is not convincing. Oligodendrocytes that are a priori from the same origin form shorter internodes when remyelinating after a demyelination event.

      The heterogeneity in oligodendrocyte morphology would reflect differences in gene profiles, which, in turn, may arise from differences in their developmental origin and/or the pericellular microenvironment of OPCs. We made this point as clear as possible in the Discussion (page 9, line 21).

      Significance

      The authors suggest that the difference in internodal length is attributed to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only length of myelin internodes differs, but also along the same axon unmyelinated areas between two internodes may vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The paper sets out to examine the social recognition abilities of a 'solitary' jumping spider species. It demonstrates that based on vision alone spiders can habituate and dishabituate to the presence of conspecifics. The data support the interpretation that these spiders can distinguish between conspecifics on the basis of their appearance.

      We appreciate the reviewer’s summary. We indeed aimed at investigating the social recognition abilities of the solitary jumping spider (Phidippus regius), using visual cues alone. By employing a habituation-dishabituation paradigm, well-established in developmental psychology, we found support for the interpretation that these spiders can distinguish between conspecifics based on their appearance, as the reviewer noted.

      Strengths:

      The study presents two experiments. The second set of data recapitulates the findings of the first experiment with an independent set of spiders, highlighting the strength of the results. The study also uses a highly quantitative approach to measuring relative interest between pairs of spiders based on their distance.

      We appreciate the reviewer's acknowledgement of the strengths of our study. The second set of data underscores the robustness and reliability of the results. In addition, the second experiment served to disentangle whether the habituation effect observed over sessions was caused by ‘physical’ or ‘cognitive’ fatigue, by employing ‘long-term’ dishabituation trials at the end of Session 3. These trials are critical in our study as they help to differentiate between recognition of individual identities versus recognition of familiar individuals (as opposed to unfamiliar ones) and to determine if the observed effects are due to ‘general habituation’ or ‘specific recognition’. We will elaborate on this further below in this revision.

      As stated by the reviewer, we employed a highly quantitative approach to measure relative interest between pairs of spiders based on their distance, providing precise and objective data to support our conclusions.

      Weaknesses:

      The study design is overly complicated, missing key controls, and the data presented in the figures are not clearly connected to the study. The discussion is challenging to understand and appears to make unsupported conclusions.

      While we acknowledge that the study design is indeed complex, this complexity is essential for conducting a well-controlled and balanced experiment regarding the experimental conditions.  

      The habituation-dishabituation paradigm is a well-established paradigm in developmental psychology with non-verbal infants. It is understood that during the habituation phase, an individual's attention to a repeated stimulus decreases as they engage in information processing and form a mental representation of it. As the stimulus becomes familiar, it loses its novelty and interest. When a new stimulus is introduced, a recovery of attention suggests that the individual has compared this new stimulus to the stored memory of the habituation stimulus and detected a difference. This process suggests that the individual not only remembered the original stimulus but also recognized the new one as distinct (for a review Kavšek & Bornstein, 2010).

      This paradigm has also been extensively applied in animal research, where, like infants, nonverbal subjects rely on recognition and discrimination processes to demonstrate their cognitive abilities. The use of this paradigm dates back to seminal studies such as Humphrey (1974), which explored the perceptual world of monkeys, illustrating how species and individuals are perceived and recognized. In another previous study (Dahl, Logothetis, and Hoffman, 2007), we utilized an even more complex experimental design that incorporated dedicated baseline trials for both habituation and dishabituation phases, which was well-received despite its complexity. In the current study, we contrast dishabituation and habituation trials directly, creating a sequential cascade where each trial is evaluated against the preceding one as its baseline.

      On the basis of these arguments, we respectfully disagree with the claim that this paradigm is inappropriate or lacks key controls. Our study design, though complex, is rigorously grounded in established methodologies and offers a robust framework for exploring individual recognition in Phidippus regius.

      However, we take the reviewer’s comments seriously and are committed to identifying and addressing the aspects in our manuscript that may have led to misunderstandings. We clarify these areas in our revision of the manuscript. Modifications were made in the Introduction, Methods, and Discussion sections.

      Dahl, C. D., Logothetis, N. K., & Hoffman, K. L. (2007). Individuation and holistic processing of faces in rhesus monkeys. Proceedings of the Royal Society B: Biological Sciences, 274(1622), 2069-2076.

      Humphrey, N. K. (1974). Species and individuals in the perceptual world of monkeys. Perception, 3(1), 105-114.

      Kavšek, M., & Bornstein, M. H. (2010). Visual habituation and dishabituation in preterm infants: A review and meta-analysis. Research in Developmental Disabilities, 31(5), 951-975.

      (1) Study design: The study design is rather complicated and as a result, it is difficult to interpret the results. The spiders are presented with the same individual twice in a row, called a habituation trial. Then a new individual is presented twice in a row. The first of these is a dishabituation trial and the second is another habituation trial (but now habituating to a second individual). This is done with three pairings and then this entire structure is repeated over three sessions. 

      While we acknowledge that the design is complex, this complexity is essential for conducting a well-controlled experiment, as described earlier. As the reviewer noted, our design involves presenting the same individual to the focal spider twice in a row (habituation trial), followed by a new individual (dishabituation trial), and then repeating this structure. This approach is fundamental to the habituation-dishabituation paradigm, which allows us to systematically compare the responses to a familiar individual with those elicited by a novel one. If the spiders exhibit different behaviours in terms of the distance they maintain when encountering the same individual versus a new one, it indicates that they are processing the stimuli differently, consistent with recognition memory. This differential response is a key indicator that the spiders can distinguish between familiar and unfamiliar individuals, demonstrating not only a decrease in interest or engagement due to repeated exposure but also a cognitive process where the lack of a matching memory template triggers a distinct behavioural response when confronted with novel stimuli.

      By repeating this sequence two more times (Sessions 2 and 3), we aim to assess the consistency of this recognition process over time. If the focal spider does not remember the individuals from the previous session (one hour ago), we expect consistent behavioural responses across sessions. Conversely, if there is a decrease in response magnitude but the overall response patterns are maintained, we can infer that the focal spider recognizes the previously presented individuals and exhibits habituation, reflected in reduced response intensity. In other words, with repeated exposure to the same individuals across sessions, the memory traces become more firmly established, so that a dishabituation trial introduces less novelty. The spider's recognition of previously encountered individuals eventually becomes robust enough that “habituation” and “dishabituation” trials are indistinguishable, as observed in Session 3. This method allows us to assess the duration of identity recognition in these spiders, indicating how long the memory of specific individuals persists.

      All of these outcomes were anticipated before we began Experiment 1. Given that the results aligned with our predictions, we then sought to determine whether the observed reduction in the magnitude of the effect (i.e., the difference between habituation and dishabituation trials) was due to a physical fatigue effect, where the spiders might simply be getting tired, or a cognitive fatigue effect, where the spiders recognized the individuals and as a result did not exhibit any novelty response. To address this, we replicated the experiment with a new group of spiders and introduced special (long-term dishabituation) trials at the end, where the focal spider was presented with a novel spider. 

      These extra trials allowed us to disentangle the nature of the diminishing response across repeated sessions: a lack of dishabituation (remaining distant) would suggest general physical fatigue, whereas a strong dishabituation response (approaching closely) to the novel spider would indicate cognitive fatigue, thereby confirming that the spiders were indeed recognizing the familiar individuals throughout the experiment. 

      In light of these considerations, we believe that the complexity of our design is not only justified but absolutely necessary to rigorously test the cognitive capabilities of the spiders. Nonetheless, we understand the need for clarity in presenting our findings and are committed to refining our manuscript to better communicate the rationale and results of our study.

      The data appear to show the strong effects of differences between habituation and dishabituation trials in the first session. The decrease in differential behavior between the so-called habituation and dishabituation trials in sessions 2 and 3 is explained as a consequence of the spiders beginning to habituate in general to all of the individuals.

      The key question, as mentioned above, is to determine the underlying cause of this general habituation across sessions. Specifically, we aim to differentiate between two potential causes: physical fatigue, where the spiders may simply become less responsive due to the demands of the three-hour testing period, or cognitive fatigue, where the repeated exposure to the same individuals leads to a decreased response because the spiders have started to recognize these individuals over multiple repetitions.

      To address this, we replicated the experiment and introduced each focal spider to a new individual in what we termed "long-term dishabituation" trials. By comparing the spiders' responses to these novel individuals with their responses in earlier trials, we sought to better understand the underlying mechanisms of habituation and the duration of individual recognition. The strong dishabituation response observed in these trials is indicative of cognitive fatigue, supporting the presence of recognition memory rather than a general physical fatigue effect.

      The claim that the spiders remember specific individuals is somewhat undercut because all of the 'dishabituation' trials in session 2 are toward spiders they already met for 14 minutes previously but seemingly do not remember in session 2. 

      We appreciate the reviewer’s comment regarding the claim that spiders do not remember specific individuals. This assessment does not align with the rationale of our experiment. The reviewer noted that the dishabituation trials in session 2 involved spiders previously encountered and suggested that the lack of a clear memory response might undercut the claim of specific individual recognition. 

      However, as we explained earlier, we expect habituation in Session 2 relative to Session 1 precisely because spiders recognize each other in Session 2. If there were no such habituation in Sessions 2 or 3, it would suggest that the spiders’ recognition memory does not persist beyond one hour. 

      Additionally, it is important to correct the timing noted by the reviewer: each individual spider re-encounters the same spider exactly one hour later, not 14 minutes. This is detailed in Table 2 of the manuscript, which outlines that each trial lasts 7 minutes, with a 3-minute visual separation between trials. With six trials per session, this totals 1 hour per session. Thus, every pair of spiders re-encounters exactly 1 hour after their last interaction.
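
      The timing arithmetic above can be checked with a short Python sketch. The trial and separation durations and the number of trials per session come from the response; the helper `trial_start`, the zero-based indexing, and the assumption that each session repeats the same trial order without extra gaps are illustrative only.

```python
TRIAL_MIN = 7            # duration of one trial, in minutes
SEPARATION_MIN = 3       # visual separation following each trial, in minutes
TRIALS_PER_SESSION = 6   # trials in one session


def trial_start(session, trial):
    """Start time in minutes of a trial; sessions and trials count from 0."""
    slot = TRIAL_MIN + SEPARATION_MIN
    return (session * TRIALS_PER_SESSION + trial) * slot


# One session is six 10-minute slots.
session_length = TRIALS_PER_SESSION * (TRIAL_MIN + SEPARATION_MIN)
# Gap between the same trial slot in two consecutive sessions.
reencounter_gap = trial_start(1, 0) - trial_start(0, 0)
```

      Both `session_length` and `reencounter_gap` evaluate to 60 minutes, matching the one-hour interval stated above.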

      Again, it is important to clarify that the observed decrease in differential behaviour is not indicative of a failure to remember specific individuals. Rather, it reflects a systematic pattern of habituation, which is a common and expected outcome in such paradigms. This systematic decrease in response strength suggests that the spiders recognize the previously encountered individuals and become less responsive over repeated exposures, consistent with the process of habituation. Put differently, repeated exposure to the same individuals leads to more firmly established memory traces, so that a dishabituation trial introduces less novelty as the spider's recognition of previously encountered individuals becomes more robust and consistent.

      Based on the explanations provided above, we respectfully reject the claim that “the spiders remember specific individuals is somewhat undercut […]”. On the contrary, the opposite is true. The very strength of our study lies in demonstrating that spiders possess robust recognition memory, as evidenced by a clear dissociation of habituation and dishabituation trials in Session 1, followed by a gradually diminishing effect over Sessions 2 and 3 as the spiders are increasingly exposed to the same individuals. It is further supported by the strong rebound from habituation observed in the long-term dishabituation trials, in which the spiders were exposed to novel individuals.

      This misunderstanding suggests that we should take additional care in the revised manuscript to clarify our explanations and provide more detail, ensuring that the rationale behind our experimental design and our findings are communicated effectively.

      In session 3 it is ambiguous what is happening because the spiders no longer differentiate between the trial types. This could be due to fatigue or familiarity. 

      The reviewer proposes that the absence of differentiation between 'habituation' and 'dishabituation' trials in Session 3 might be attributed to either fatigue or familiarity. We interpret "fatigue" as what we have termed the “physical fatigue effect” and "familiarity" as “cognitive fatigue effect.” In this context, we concur with the reviewer’s observation, and this very line of reasoning prompted us to conduct a further experiment following the outcome of Experiment 1.

      A second experiment is done to show that introducing a totally novel individual, recovers a large dishabituation response, suggesting that the lack of differences between 'habituation' and 'dishabituation' trials in session 3 is the result of general habituation to all of the spiders in the session rather than fatigue. As mentioned before, these data do support the claim that spiders differentiate among individuals.

      As the reviewer rightly noted, we addressed these possibilities in our second experiment by introducing a completely novel individual to the spiders, which resulted in a strong dishabituation response. This outcome suggests that the lack of differentiation in Session 3 is more likely due to cognitive habituation rather than physical fatigue. The robust response to novel individuals demonstrates that the spiders are capable of distinguishing between familiar and unfamiliar individuals, suggesting that the reduced differentiation is a consequence of habituation from repeated encounters with the same individuals. 

      We appreciate the reviewer's recognition that these findings support the conclusion that spiders are capable of differentiating between individual conspecifics.

      Additionally, it is important to clarify the structure of our sessions. Each of the 6 trials lasts 7 minutes with a 3-minute visual separation, resulting in a total of 1 hour per session. This ensures that each pair of spiders is encountered exactly one hour later, which controls for the timing and allows us to evaluate the spiders' recognition memory over repeated sessions.

      In summary, while the data show a decrease in differential behaviour between habituation and dishabituation trials in Sessions 2 and 3, the results from our second experiment support the interpretation that this is due to ‘cognitive habituation’ (familiarization) rather than ‘physical fatigue’ (general habituation). This habituation effect underscores the spiders' ability to recognize and become familiar with specific individuals over time, reinforcing our conclusion that they can differentiate among individuals.

      The data from session 1 are easy to interpret. The data from sessions 2 and 3 are harder to understand, but these are the trials in which they meet an individual again after a substantial period of separation. 

      The data from Session 1 are straightforward to interpret, showing clear differences between habituation and dishabituation trials. However, the data from Sessions 2 and 3 are more complex, as these sessions involve the spiders re-encountering individuals after a 1-hour period of separation. Importantly, the outcome is not an artefact of our experiment, but the consequence of a deliberate choice in the experimental design to assess whether spiders can recognise each other after this duration. We believe that this complexity aligns with our expectations, based on the assumption that spiders can recognise each other after one hour. The observed pattern of habituation in Sessions 2 and 3 suggests that the spiders retain memory of the individuals, leading to decreased responsiveness upon repeated encounters. This interpretation is further supported by Experiment 2, which introduced a novel individual and elicited a strong dishabituation response. This finding confirms that the reduced differentiation in later sessions is due to cognitive habituation rather than physical fatigue, supporting the conclusion that recognition memory lasts at least one hour.

      We hope this explanation clarifies our findings and the rationale behind our relatively complex experimental design choice. 

      Other studies looking at recognition in ants and wasps (cited by the authors) have done a 4 trial design in which focal animal A meets B in the first trial, then meets C in the second trial, meets B again in the third trial, and then meets D in the last trial. In that scenario trials 1, 2, and 4 are between unfamiliar individuals and trial 3 is between potentially familiar individuals. In both the ants and wasps, high aggression is seen in species with and without recognition on trial 1, with low aggression specifically for trials with familiar individuals in species with recognition. Across different tests, species or populations that lack recognition have shown a general reduction in aggression towards all individuals that become progressively less aggressive over time (reminiscent of the session 2 and 3 data) while others have maintained modest levels of aggression across all individuals. The 4 session design used in those other studies provides an unambiguous interpretation of the data while controlling for 'fatigue'. 

      We acknowledge that there are multiple ways to design experiments to test recognition memory. In fact, we considered using a paradigm similar to the one proposed by the reviewer and used in studies like Dreier et al., which involves a series of trials with unfamiliar and familiar individuals over extended intervals. However, we then opted for a more complex design to rigorously assess how habituation and recognition memory develop over repeated sessions with shorter intervals.

      In the following, we would like to describe the advantages and disadvantages of both paradigms and outline how we ended up using the more complex version:

      Advantages of our paradigm: 

      As pointed out, by repeating the sequence in exactly the same manner (every pair of spiders reoccurs after exactly 1 and 2 hours), we can comprehensively evaluate the effect of habituation over multiple exposures. This allows us to assess the extent of the spiders’ memory, for instance when a spider shows stronger habituation to individuals that were novel in Session 1 but “familiar” by the time it encounters them again in Session 2. To achieve this, we need to ensure that each trial and visual separation is precisely timed, ensuring consistent intervals between encounters. As a consequence, each individual spider undergoes the exact same experimental protocol. Most critical, however, are the novel individuals presented after Session 3 (long-term dishabituation trials), which help differentiate between cognitive habituation and physical fatigue.

      Disadvantages of our paradigm:

      The sequences of habituation and dishabituation trials may make the design more complex, as pointed out by the reviewer. As a consequence, the interpretation will become more difficult. However, the data perfectly align with our predictions, and the outcomes were as anticipated in two independently run experiments with two groups of spiders. This highlights the reliability of our experimental design and robustness of our findings.

      Advantages of the 4-trial paradigm proposed by the reviewer:

      Clearly, the structure of the proposed design is simpler, making interpretation easier. The paradigm also accommodates longer intervals between trials (e.g., 24 hours). Longer intervals could theoretically have been applied in our study. (However, we chose not to leave the spiders in the experimental box longer than necessary, opting instead to return them to their home containers for the night to ensure their well-being. Moreover, a 24-hour interval targets a different phase in the process of long-term memory; we return to this topic further below.)

      Disadvantages of the 4-trial paradigm proposed by the reviewer:

      Strictly replicating the 4-trial design would result in one familiar encounter versus three unfamiliar ones. This imbalance might introduce bias and limit the robustness of the measurements. Additionally, the design provides less data overall, as the focal individual will be confronted with three other individuals, who will then be excluded from further testing as focal subjects themselves. In contrast, our design ensures a balanced number of familiar (habituation) and novel encounters (dishabituation) for each focal individual, allowing for more efficient and comprehensive data collection without excluding individuals from further testing.
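
      The imbalance in the 4-trial sequence (focal animal A meets B, C, B, D) can be made explicit with a small sketch. This is an illustration only: the function name is hypothetical, and "familiar" is simplified here to mean "this partner has been met earlier in the sequence", which is an assumption made for counting purposes rather than a claim about the animals' memory.

```python
# Illustrative sketch; the helper name and the familiarity rule are assumptions.
def classify_encounters(partners):
    """Label each encounter as 'familiar' if the partner was met before."""
    seen = set()
    labels = []
    for partner in partners:
        labels.append("familiar" if partner in seen else "unfamiliar")
        seen.add(partner)
    return labels


# Partner sequence of the 4-trial design described by the reviewer.
four_trial_partners = ["B", "C", "B", "D"]
labels = classify_encounters(four_trial_partners)
n_familiar = labels.count("familiar")
n_unfamiliar = labels.count("unfamiliar")
print(labels, n_familiar, n_unfamiliar)
```

      Under this simplification, only the third trial counts as a familiar encounter, which is the one-versus-three imbalance referred to above.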

      Given the aforementioned considerations, we determined that the advantages of our experimental design, in particular the assessment of a cognitive fatigue effect when encountering the same individuals again, outweigh those of the proposed 4-trial design. The mentioned limitations of the 4-trial design, such as the potential for bias and less comprehensive data collection, do not justify re-running the study, especially when the best-case scenario would yield fewer insights than our existing findings. Our current paradigm yielded results that align perfectly with our predictions, offering a thorough and reliable understanding of recognition memory and habituation in spiders. Therefore, we believe our approach provides a more complete and robust answer to our research questions.

      However, we acknowledge that there might be insufficient information in the manuscript addressing the rationale behind our design choices, and we will revise the manuscript to provide a clearer explanation of why our approach is well suited to answering the research questions at hand.

      That all trials in sessions 2 and 3 are always with familiar individuals makes it challenging to understand how much the spiders are habituating to each other versus having some kind of associative learning of individual identity and behavior.

      We understand the reviewer's concern that having all trials in Sessions 2 and 3 involve familiar individuals could make it challenging to distinguish between general habituation and associative learning of individual identities. In our study, we contrast habituation and dishabituation trials: If general habituation were occurring, we would expect uniformly reduced responses (around the zero line) to all individuals over time, indicating that the spiders are getting used to any individual regardless of their specific identity. However, this is not the case. Our data show that while the responses in Session 2 are reduced in effect size compared to Session 1, they are not flat (around the zero line). This indicates that the spiders still differentiate between a repetition of a spider identity (habituation trials) and two different spider identities (dishabituation trials), albeit with a reduced response strength. The systematicity in the data suggests that the spiders are not merely habituating to any individual, but are instead retaining some level of recognition between specific individuals.

      Only by Session 3 do the spiders fully habituate to the point where the responses to habituation and dishabituation trials converge, indicating a complete habituation effect. The introduction of novel individuals in our long-term dishabituation trials further supports the idea that the spiders are recognizing specific individuals rather than exhibiting general habituation. If the spiders were experiencing general habituation, we would not expect the strong dishabituation response observed in our study.

      The data presentation is also very complicated. How is it the case that a negative proportion of time is spent? The methods reveal that this metric is derived by comparing the time individuals spent in each region relative to the previous time they saw that individual. 

      We understand the reviewer's concern regarding the complexity of the data presentation and the calculation of the negative proportion of time. Regarding the complexity of the design, we have already justified our choice of a more intricate experimental setup. This complexity is necessary for accurately assessing recognition memory and habituation over repeated sessions. 

      The metric is derived by comparing the time individuals spent in each region (relative to the transparent front panel) in the current trial (n) relative to the previous trial (n-1). With multiple trials, this results in a cascade of trials and conditions. This method was established in Humphrey’s study and our previous study (Humphrey, 1974; Dahl, Logothetis, Hoffman, 2007), where we demonstrated its effectiveness in assessing individuation of faces in macaque monkeys.

      Also in our current experimental design, each current trial is contrasted with the preceding one, allowing us to compare distributions of distances taken in two trials. In this context, every preceding trial serves as baseline for every current trial. 

      Figure 1 of the manuscript illustrates the structure and analysis of the trials:

      Panel a depicts the baseline, habituation, and dishabituation trials, where spiders are exposed to different conspecifics.

      Baseline (left panel, red): When two spiders are visually exposed to each other for the first time, it is expected that they will explore each other closely, exhibiting high levels of proximity (initial exploratory behaviour).

      Habituation (centre panel, green): When the same spiders are reintroduced in a subsequent round of exposure, it is anticipated that they will exhibit reduced exploratory behaviour and maintain a greater distance compared to the baseline trial, if they recognize each other from the previous encounter (indicative of habituation).

      Panel b (upper and middle panels; red and green): Demonstrates the theoretical assumptions and expected changes in behaviour:

      By subtracting the distribution of distances in the baseline trial from the habituation trial, we generate a delta distribution. This delta distribution reveals negative values near the transparent panel (indicating reduced proximity in the habituation trial) and positive values at mid- to far-distances (indicating increased distancing behaviour). This delta distribution is also what is reported in Figure 2.

      Dishabituation: In this trial, a new spider (different from the one in the habituation trial) is introduced. The dishabituation trial will be considered in contrast to the habituation trial described above. If the spider recognizes the new individual as different, it is expected to show increased exploratory behaviour and reduced distance, similar to the initial baseline trial.

      By subtracting the distribution of distances in the habituation trial from the dishabituation trial, we obtain another delta distribution. This delta distribution should reveal positive values near the transparent panel (indicating increased proximity in the dishabituation trial) and negative values at mid- to far-distances (indicating decreased proximity compared to the habituation trial).
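
      To make the origin of negative proportions concrete, the subtraction described above can be sketched as follows. This is a simplified illustration, not the analysis code used in the study: the function names, the three distance bins, and the sample distances are all hypothetical and chosen only to show how a difference of two proportion distributions yields negative and positive values.

```python
# Illustrative sketch; bins, data, and names are invented for demonstration.
def distance_histogram(distances, bin_edges):
    """Proportion of samples per bin; the last bin includes its right edge."""
    counts = [0] * (len(bin_edges) - 1)
    for d in distances:
        for i in range(len(counts)):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            is_last = i == len(counts) - 1
            if lo <= d < hi or (is_last and d == hi):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def delta_distribution(current, previous):
    """Current-trial proportions minus previous-trial proportions, per bin."""
    return [c - p for c, p in zip(current, previous)]


# Three hypothetical bins: near, mid, and far from the transparent panel.
bin_edges = [0, 5, 10, 15]

# Made-up distance samples: mostly near in the baseline trial,
# mostly mid to far in the following habituation trial.
baseline = distance_histogram([1, 2, 3, 4, 6, 7, 11, 2], bin_edges)
habituation = distance_histogram([1, 6, 7, 8, 11, 12, 13, 9], bin_edges)

delta = delta_distribution(habituation, baseline)
print(delta)
```

      With these invented samples, the near bin of the delta distribution is negative and the mid and far bins are positive, which is the pattern expected for a habituation trial. Because each input sums to one, the delta values sum to zero.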

      We hope this clarifies the rationale behind our data presentation and the methodological approach we employed. We have revised the figure to enhance its clarity and make it more intuitive for the reader.

      Dahl, C. D., Logothetis, N. K., & Hoffman, K. L. (2007). Individuation and holistic processing of faces in rhesus monkeys. Proceedings of the Royal Society B: Biological Sciences, 274(1622), 2069-2076.

      Humphrey, N. K. (1974). Species and individuals in the perceptual world of monkeys. Perception, 3(1), 105-114.

      At the very least, data showing the distribution of distances from the wall would be much easier to interpret for the reader.

      We understand the reviewer's concern that data showing the distribution of distances from the wall would be much easier to interpret for the reader. We initially considered this but came to the conclusion that this approach is not straightforward. For instance, if both spiders are positioned at the very front but in different corners, the distance to the panel would be very small, but the distance between the spiders would be large. Thus, using distances from the wall could misrepresent the actual spatial distribution between the spiders.

      (2) "Long-term social memory": It is not entirely clear what is meant by the authors when they say 'long-term social memory', though typically long-term memory refers to a form of a memory that requires protein synthesis.  

      To address this conceptually, we used the term "long-term social memory" to describe the spiders' ability to recognize and remember individual conspecifics over multiple experimental sessions. While social memory refers to the ability of an individual to recognize other individuals within a social context, long-term memory typically involves the retention of information over extended periods. Recognizing that the term “long-term social memory” is not commonly used, we have revised the manuscript to use the more standard term “long-term memory.”

      While the precise timing of memory formation varies across species and contexts, a general rule is that long-term memory should last for > 24 hours (e.g., Dreier et al 2007 Biol Letters). The longest time that spiders are apart in this trial setup is something like an hour. There is no basis to claim that spiders have long-term social memory as they are never asked to remember anyone after a long time apart.

      We appreciate the reviewer’s feedback regarding the term "long-term social memory." The statement "long-term memory should last for > 24 hours" is a generalisation in discussions about memory. It oversimplifies a more complex topic. That is, long-term memory is typically distinguished from short-term memory by its persistence over time, often lasting from hours to a lifetime. However, the exact duration that qualifies memory as "long-term" varies depending on the context, model species, and type of memory. In studies of synaptic plasticity (LTP), the objective might indeed be to look at memory that persists for at least 24 hours as a criterion for long-term memory. In studies of cellular and/or molecular mechanisms, where the stabilization and consolidation of memory traces over time are key areas of interest, this 24-hour interval is very common. But defining long-term memory strictly by a 24-hour duration is by no means universally accepted, nor does it apply across all fields of study.

      To clarify, long-term memory is a process involving consolidation starting within minutes to hours after learning. Clearly, full consolidation can take longer, while memory persisting 24 hours is considered fully consolidated. But this does not mean that memories lasting less than 24 hours are not part of long-term memory.

      In fact, Shiffrin and Atkinson (1969) proposed that information entering short-term memory remains there for about 20 to 30 seconds before being displaced due to space limitations. During this brief interval, initial encoding processes begin transferring information to long-term memory, establishing an initial memory trace. This transfer is not indicative of full consolidation but represents the initial "laying down" of the memory trace (encoding). In our study, the focal spider’s brain forms initial memory traces of the individuals it encounters. This process continues during the period of visual separation. Upon re-encountering the same individual a few minutes later, the spider accesses the initial memory trace stored in long-term memory. This trace is fragile and not fully consolidated. The re-encounter acts as a rehearsal, reactivating specific memory traces and potentially strengthening them through additional encoding processes, allowing the spider to recognize the individual even an hour later.

      According to Markowitsch (2013), initial encoding in long-term memory begins within seconds to minutes. It is also important to note that we argue for identity recognition rather than identity recall. Recognition involves correctly identifying a stimulus when it is presented again, while recall requires the volitional generation of information without an external stimulus. Thus, recall may rely on deeper forms of memory consolidation than recognition.

      Is protein synthesis required for long-term memory? 

      The role of protein synthesis in long-term memory has been extensively studied. According to Castellucci et al. (1978), explicit memory comprises a short-term phase that does not require protein synthesis and a long-term phase that does. Hebbian learning in its initial phase (early LTP) does not necessarily require protein synthesis. This phase involves the rapid strengthening of synapses through existing proteins and signaling pathways, such as the activation of NMDA receptors and the influx of Ca2+ ions. For the changes to persist (late LTP), protein synthesis is important. This phase involves the production of new proteins that contribute to long-term structural changes at the synapse, such as the growth of new synaptic connections or the stabilization of existing ones.

      This differentiation between the early and late phases of LTP highlights that long-term memory can begin forming without immediate protein synthesis. Our study focuses on this early phase of memory encoding, which involves the initial formation of memory traces that do not yet depend on protein synthesis. 

      It is however worth noting that recent research suggests that there is an early phase of protein synthesis (within minutes to hours) through the activation of immediate early genes (IEGs) and transcription factors. In this context, protein synthesis supports initial synaptic modifications. What the reviewer refers to is the consolidation phase (late phase), where continued synthesis of proteins induces structural changes at synapses, leading to the formation of new synaptic connections. In our study, it is plausible to assume that an early form of protein synthesis may contribute to stabilizing the initial memory traces during the encoding phase. However, whether or not protein synthesis occurred in our spiders is beyond the scope of this investigation and was not specifically addressed.

      The critical aspect of our study is that the information transitioned from short-term memory to long-term memory during an early encoding phase, allowing recognition after an hour. Due to the inherent limitations and transient nature of short-term memory, it is implausible for spiders to retain these memory representations solely within short-term memory for such durations. Our findings suggest that the initial encoding processes were robust enough to transfer these experiences into long-term memory, where they were stabilized and could be accessed later.

      In sum, it is important to note that long-term memory is a dynamic process, and while testing after 24 hours is a convention in some studies, this timing is arbitrary and not universally applicable to all contexts or species. The more critical consideration here is that we are dealing with a species where no prior evidence of long-term memory exists. Debating a 24-hour delay or the specifics of protein synthesis, while potentially interesting for future studies, detracts from the true significance of our findings. Our study is the first to show something akin to long-term memory representations in this species and this should remain in our focus.

      Shiffrin, R. M., & Atkinson, R. C. (1969). Storage and retrieval processes in long-term memory. Psychological Review, 76(2), 179.

      Markowitsch, H. J. (2013). Memory and self–Neuroscientific landscapes. International Scholarly Research Notices, 2013(1), 176027.

      Castellucci, V. F., Carew, T. J., & Kandel, E. R. (1978). Cellular analysis of long-term habituation of the gill-withdrawal reflex of Aplysia californica. Science, 202(4374), 1306-1308.

      The odd phrasing of the 'long-term dishabituation' trial makes it seem that it is testing a long-term memory, but it is not. The spiders have never met. The fact that they are very habituated to one set of stimuli and then respond to a new stimulus is not evidence of long-term memory. To clearly test memory (which is the part really lacking from the design), the authors would need to show that spiders - upon the first instance of re-encountering a previously encountered individual are already 'habituated' to them but not to some other individuals. The current data suggest this may be the case, but it is just very hard to interpret given the design does not directly test the memory of individuals in a clear and unambiguous manner.

      While we appreciate the reviewer's feedback, we believe there may have been some misunderstanding regarding the term “long-term dishabituation.” The introduction of novel individuals at the end of Session 3 was not intended to test long-term memory by having spiders recognize these novel individuals. Instead, it aimed to investigate the nature of the habituation observed over the three sessions.

      The novel individuals introduced at the end of Session 3 serve to differentiate between general habituation (a decline in response due to repeated exposure to any stimulus) and specific habituation (recognition and reduced response to previously encountered individuals). The novel spiders have never been encountered before, so the focal spiders cannot have prior representations of them. Thus, the strong dishabituation response to these novel individuals indicates that the habituation observed earlier is not due to a general fatigue effect or loss of interest but rather a specific habituation effect to the familiar individuals. By showing such a strong and increased response to novel individuals, the study demonstrates that the spiders' increasingly reduced responses in Sessions 2 and 3 are not merely due to a general decrease in responsiveness but suggest cognitive habituation. This cognitive habituation implies that the spiders remember the familiar individuals (as each of them occurred three times across the three sessions), a process that relies on long-term memory. Therefore, while the novel spiders themselves are not a direct test of long-term memory, the use of these novel spiders helps us infer that the habituation observed over the three sessions is indeed due to the formation of long-term memory traces.

      In other words, the organism detects and processes the novel stimulus as different from the habituated one. In our study, if a spider showed a strong dishabituation response to a novel individual introduced at the end of Session 3, it would indicate that the spider had formed specific representations of the individuals they encountered during the three sessions. These representations allow the spiders to recognise the novel individuals as different, leading to renewed interest and a stronger behavioural response. It is the absence of a prior representation for the novel spiders that triggers this dishabituation response. Since the novel spider does not match any stored representations of the previously encountered spiders, the focal spider responds more strongly.

      The introduction of novel individuals at the end of Session 3 helps clarify that the increasing habituation observed in Sessions 2 and 3 is specific to familiar individuals, indicating cognitive habituation. This supports the presence of long-term memory processes in the spiders, as they can distinguish between previously encountered individuals and new ones. The habituation-dishabituation paradigm thus effectively demonstrates the spiders' ability to form and reactivate encoded memory traces, providing clear evidence of recognition memory.

      For these reasons, we are convinced that our interpretation is accurate and hope this clarification renders the additional request for an entirely new experiment unnecessary.

      (3) Lack of a functional explanation and the emphasis on 'asociality': It is entirely plausible that recognition is a pleiotropic byproduct of the overall visual cognition abilities in the spiders.

      We agree with the reviewer that it is essential to consider the broader context of individual recognition and its potential adaptive significance. The possibility that recognition in jumping spiders could be a pleiotropic byproduct of their advanced visual cognition abilities is indeed a plausible explanation and has been discussed in our manuscript.

      However, the discussion that discounts territoriality as a potential explanation is not well laid out. First, many species that are 'asocial' nevertheless defend territories. It is perhaps best to say such species are not group living, but they have social lives because they encounter conspecifics and need to interact with them.

      The reviewer also correctly points out that many 'asocial' species still defend territories and have social interactions. Our use of the term 'asocial' was meant to indicate that jumping spiders do not live in cohesive social groups, but we acknowledge that they do have social lives in terms of interactions with conspecifics. It is more accurate to describe these spiders as a non-group-living, yet socially interactive species. A better term is “non-social”, referring to the jumping spider as a species that does not live in stable social groups and does not exhibit associated behaviours, such as cooperative behaviours. This also implies that individuals still interact with conspecifics, especially in contexts like mating, territorial disputes or aggression. We have thus changed the term from “asocial” to “non-social” in the manuscript.

      Indeed, there are many examples of solitary living species that show the dear enemy effect, a form of individual recognition, towards familiar territorial neighbors. The authors in this case note that territorial competition is mediated by the size or color of the chelicerae (seemingly a trait that could be used to distinguish among individuals). Apparently, because previous work has suggested that territorial disputes can be mediated by a trait in the absence of familiarity has led them to discount the possibility that keeping track of the local neighbors in a potentially cannibalistic species could be a sufficient functional reason. In any event, the current evidence presented certainly does not warrant discounting that hypothesis.

      The “dear enemy effect”, where solitary living species recognize and show reduced aggression towards familiar territorial neighbors, is a relevant consideration. This effect demonstrates that individual recognition can have significant functional implications even in species that are not group-living. We will elaborate on this effect in the revised manuscript to provide a more comprehensive discussion.

      The reviewer mentioned that territorial disputes can be mediated by the size or color of the chelicerae, potentially serving as a feature for individual recognition. Our intention was not to discount the role of such traits but to highlight that the level of identity recognition we observed represents subordinate classification. This is different from the basic-level classification, such as distinguishing between male and female based on chelicerae colour. While we acknowledge that colour can be an important feature for identity discrimination, our findings suggest that individual recognition in jumping spiders goes beyond simple colour differentiation. 

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors investigated whether a salticid spider, Phidippus regius, recognizes other individuals of the same species. The authors placed each spider inside a container from which it could see another spider for 7 minutes, before having its view of the other spider occluded by an opaque barrier for 3 minutes. The spider was then either presented with the same individual again (habituation trial) or a different individual (dishabituation trial). The authors recorded the distance between the two spiders during each trial. In habituation trials, the spiders were predicted to spend more time further away from each other and, in dishabituation trials, the spiders were predicted to spend more time closer to each other. The results followed these predictions, and the authors then considered whether the spiders in habituation trials were generally fatigued instead of being habituated to the appearance of the other spider, which may have explained why they spent less time near the other individual. The authors presented the spiders with a different (novel) individual after a longer period of time (which they considered to be a long-term dishabituation trial), and found that the spiders switched to spending more time closer to the other individual again during this trial. This suggested that the spiders had recognized and had habituated to the individual that they had seen before and that they became dishabituated when they encountered a different individual.

      We appreciate the reviewer's detailed summary of our study. The reviewer's summary accurately captures the essence of our experimental design, predictions, and findings.

      Strengths:

      It is interesting to consider individual recognition by Phidippus regius. Other work on individual recognition by an invertebrate has been, for instance, known for a species of social wasp, but Phidippus regius is a different animal. Importantly and more specifically, P. regius is a salticid spider, and these spiders are known to have exceptional eyesight for animals of their size, potentially making them especially suitable for studies on individual recognition. In the current study, the results from experiments were consistent with the authors' predictions, suggesting that the spiders were recognizing each other by being habituated to individuals they had encountered before and by being dishabituated to individuals they had not encountered before. This is a good start in considering individual recognition by this species.

      We appreciate the reviewer's positive summary and acknowledgment of the strengths of our study. We would like to point out some more details: 

      While the exceptional eyesight of salticid spiders is indeed a significant factor, our study reaches deeper in terms of processing. We argue at the level of perception rather than at the level of sensation, and identity recognition is, moreover, a higher-level perceptual process. This distinction is crucial: we are not merely examining the spiders' sensory capabilities (such as good eyesight), but rather how their brains interpret and represent what they “see”. This involves a cognitive process in which the sensory input (sensation) is processed and integrated into meaningful constructs (perception) and memorised in the form of representations.

      Our study also suggests that P. regius engages in “higher-level” perceptual processes. This most likely involves complex representations of individual conspecifics, which in mammalian brains are associated with regions such as the central inferior temporal (cIT) and anterior inferior temporal (aIT) areas. We provide evidence that these spiders do not merely respond to visual stimuli in a reflexive manner, but rather interpret them and recognize and distinguish between individual identities, indicating sophisticated perceptual and cognitive abilities. In this view, the spiders are not simple Braitenberg vehicles reacting to stimuli, but are thinking organisms capable of complex mental representations. This resonates with current trends in animal cognition research, which increasingly recognize some level of consciousness and advanced cognitive abilities across a wide range of animal species. Moreover, this aligns with the growing interest in and recognition of spider cognition, where research is beginning to provide evidence for the cognitive complexity and perceptual capabilities of these often underestimated creatures (Jackson and Cross, 2011).

      Jackson, R. R., & Cross, F. R. (2011). Spider cognition. Advances in Insect Physiology, 41, 115-174.

      Weaknesses:

      The experiments in this manuscript (habituation/dishabituation trials) are a good start for considering whether individuals of a salticid species recognize each other. I am left wondering, however, what features the spiders were specifically paying attention to when recognizing each other. The authors cited Sheehan and Tibbetts (2010) who stated that "Individual recognition requires individuals to uniquely identify their social partners based on phenotypic variation." Also, recognition was considered in a paper on another salticid by Tedore and Johnsen (2013).

      Tedore, C., & Johnsen, S. (2013). Pheromones exert top-down effects on visual recognition in the jumping spider Lyssomanes viridis. The Journal of Experimental Biology, 216, 1744-1756. doi: 10.1242/jeb.071118 

      In this elegant study, the authors presented spiders with manipulated images to find out what features matter to these spiders when recognizing individuals.

      The reviewer raises an important point regarding the specific features that Phidippus regius might be paying attention to when recognizing individual conspecifics. Our study indeed cited Sheehan and Tibbetts (2010) to highlight the importance of phenotypic variation in individual recognition. Additionally, we referenced the work by Tedore and Johnsen (2013) on visual recognition in another salticid species, which suggests that multiple sensory modalities, including visual and pheromonal cues, may be involved in the recognition process. While our current study focused on demonstrating that Phidippus regius can recognize individual conspecifics, we acknowledge that it does not specifically identify the phenotypic features involved in this recognition. 

      Part of the problem with using two living individuals in experiments is that the behavior of one individual can influence the behavior of the other, and this can bias the results.  

      We appreciate the reviewer's observation regarding the potential bias introduced by using two living individuals in experiments, as the behaviour of one individual can indeed influence the behaviour of the other. We shared this concern initially; however, the consistency of the data with our hypotheses suggests that this potential bias did not adversely affect the validity of our findings, rendering the concern largely illusory at least in the context of our study.

      We opted for the living-individual paradigm for the following reasons:

      There is a growing trend in ethological as well as animal cognition research towards more ecologically valid and biologically relevant settings, while simultaneously advancing the precision and quantification of the data collected. This is referred to as computational ethology.

      This approach advocates for assessing behaviour in environments that more closely resemble natural conditions, rather than relying solely on sterile and artificial experimental setups. The rationale is that such naturalistic arenas allow animals to exhibit a broader range of behaviours and interactions, providing a more accurate reflection of their cognitive and social abilities. The challenge, however, lies in navigating the inherent trade-off between the strict control offered by standardized procedures and the ecological validity of more naturalistic interactions.

      By allowing two spiders to confront each other, we aimed to capture authentic behavioural responses while maintaining a degree of experimental standardization through the use of a controlled setup. Our approach ensures that the behaviours observed are not merely artifacts of an artificial environment but are representative of genuine social interactions. Also, to minimize potential biases arising from mutual behavioural influences, we employed a controlled and repeatable experimental environment. 

      We believe that the chosen approach provides a meaningful balance (in the above-mentioned trade-off) between ecological validity and experimental rigour. By combining a standardized environment with the naturalistic interaction of real spiders, we ensured that our findings are both scientifically robust and biologically relevant.

      However, this issue can be readily avoided because salticids are well known, for example, to be highly responsive to lures (e.g. dead prey glued in lifelike posture onto cork disks) and to computer animation. 

      While it is true that salticid spiders are responsive to lures and computer animations, we carefully considered the most appropriate and ecologically valid approach for our study. Our aim was to capture genuine behavioural patterns in a context that closely mimics the natural encounters these spiders experience.

      Additionally, creating comparable video stimuli of spiders presents its own set of challenges: Video recordings or computer animations may not fully capture the nuanced behaviours and subtle variations that occur during real-life interactions. There is also a risk that such stimuli could be perceived differently by the spiders, potentially introducing new biases or confounding factors.

      Scientific progress is not made by merely relying on previously established paradigms, especially when they may not be suitable for the specific context of a study. While alternative methods like lures or computer animations can be valuable in certain situations, our approach was deliberately chosen to best capture the naturalistic and interactive aspects of spider behaviour.

      These methods have already been successful and helpful for standardizing the different stimuli presented during many different experiments for many different salticid spiders, and they would be helpful for better understanding how Phidippus regius might recognize another individual on the basis of phenotypic variation. There are all sorts of ways in which a salticid might recognize another individual. Differences in face or body structure, or body size, or all of these, might have an important role in recognition, but we won't know what these are using the current methods alone. Also, I didn't see any details about whether body size was standardized in the current manuscript.

      As mentioned previously, the goal of our study was to demonstrate that identity recognition occurs in spiders. This alone is of significant importance, as it challenges existing assumptions about the cognitive capabilities of small-brained animals. We did not aim to provide a proximate explanation (mechanism) for identity recognition in spiders.

      The problem with what the reviewer suggested is this: As long as we do not have conclusive evidence that spiders recognize individual conspecifics, any attempt to design and manipulate stimuli would lack a solid foundation. Without understanding whether spiders have this capability, we cannot make informed decisions about which features or characteristics to manipulate in stimuli. In other words, this uncertainty means we lack a starting point for our assumptions, making it nearly impossible to create stimuli that would be useful or relevant in testing identity recognition.

      Additionally, it is nearly impossible to artificially generate a stimulus set that encompasses the natural variance in features that spiders use for visual individuation. There is no guarantee that artificial stimuli, such as lures or computer animations, would capture the relevant features that spiders use in natural interactions.

      In other words, the question of how Phidippus regius recognizes another individual will be the subject of further investigation. In this study, we focus on whether or not they individuate others.

      For another perspective, my thoughts turn to a paper by Cross et al.

      Cross, F. R., Jackson, R. R., & Taylor, L. A. (2020). Influence of seeing a red face during the male-male encounters of mosquito-specialist spiders. Learning & Behavior, 48, 104-112. doi: 10.3758/s13420-020-00411-y

      These authors found that males of Evarcha culicivora, another salticid species that is known to have a red face, become less responsive to their own mirror images after having their faces painted with black eyeliner than if their faces remained red. In all instances, the spiders only saw their own mirror images and never another spider, and these results cannot be interpreted on the basis of habituation/dishabituation because the spiders were not responding differently when they simply saw their mirror image again. Instead, it was specifically the change to the spider's face which resulted in a change of behavior. The findings from this paper and from Tedore and Johnsen can help give us additional perspectives that the authors might like to consider. On the whole, I would like the authors to further consider the features that P. regius might use to discern and recognize another individual.

      We acknowledge that identifying the specific features used by P. regius for identity recognition is a valuable direction for future research. However, we must emphasise that without first establishing whether spiders are capable of individuating each other, it would be premature and challenging to determine the specific features they rely on for this process. A lack of response to certain features could either suggest that those features are not relevant or, more critically, that the spider does not recognize individual identities at all. Thus, our initial focus on demonstrating identity recognition is essential before delving into the specific cues or characteristics involved.

      While the call for addressing the proximate causation of identity recognition in jumping spiders is valid, we need to also reiterate the significance of our findings and why they stand on their own merit:

      Our study demonstrates for the first time that Phidippus regius can systematically individuate conspecifics, showing habituation within short intervals (10 minutes) and over longer intervals (1 hour). This behaviour is not due to general habituation or physical fatigue but is a result of cognitive habituation, as illustrated by the spiders' response to novel individuals introduced after repeated encounters with familiarized ones. 

      What are the implications of this? Our findings indicate that these spiders possess long-term memory and form representations that can be reactivated after an hour. While this is most likely not fully consolidated memory formation (see our reply to Reviewer 1), it represents an encoded long-term memory. This implies that small-brained animals can remember, represent, and potentially build internal mental images, which are crucial for sophisticated cognitive processing.

      Reviewer #3 (Public Review):

      Summary:

      Jumping spiders (family Salticidae) have extraordinarily good eyesight, but little is known about how sensitive these small animals might be to the identity of other individuals that they see. Here, experiments were carried out using Phidippus regius, a salticid spider from North America. There were three steps in the experiments; first, a spider could see another spider; then its view of the other spider was blocked; and then either the same or a different individual spider came into view. Whether it was the same or a different individual that came into view in the third step had a significant effect on how close together or far apart the spiders positioned themselves. It has been demonstrated before that salticids can discriminate between familiar and unfamiliar individuals while relying on chemical cues, but this new research on P. regius provides the first experimental evidence that a spider can discriminate by sight between familiar and unfamiliar individuals.

      Clark RJ, Jackson RR (1995) Araneophagic jumping spiders discriminate between the draglines of familiar and unfamiliar conspecifics. Ethology, Ecology and Evolution 7:185-190

      We appreciate the reviewer's comprehensive summary and acknowledgment of the significance of our findings.

      Strengths:

      This work is a useful step toward a fuller understanding of the perceptual and cognitive capacities of spiders and other animals with small nervous systems. By providing experimental evidence for a conclusion that a spider can, by sight, discriminate between familiar and unfamiliar individuals, this research will be an important milestone. We can anticipate a substantial influence on future research.

      We appreciate the reviewer’s recognition of the strengths and significance of our study. We are pleased that the reviewer considers our research an important milestone. Our findings indeed suggest that even animals with relatively simple nervous systems can perform complex cognitive tasks, which has substantial implications for the broader study of animal cognition.

      As pointed out by the reviewer, we also hope that our study will have a substantial influence on future research. By establishing a methodology and providing clear evidence of visual discrimination, we aim to encourage further investigations into the cognitive abilities of jumping spiders and other arthropods. Future research can build on our findings to explore the specific visual cues and mechanisms involved in individual recognition (as Reviewer 2 pointed out), as well as the ecological and evolutionary implications of these abilities.

      Weaknesses:

      (1) The conclusions should be stated more carefully.

      We agree that clarity in our conclusions is paramount. We will revise the manuscript to ensure that our conclusions are presented with precision and appropriately reflect the data. Specifically, we will emphasize the evidence supporting our findings of visual individual recognition and clarify the limitations and scope of our conclusions to avoid any potential overstatements.

      (2) It is not clearly the case that the experimental methods are based on 'habituation' (learning to ignore; learning not to respond). Saying 'habituation' seems to imply that certain distances are instances of responding and other distances are instances of not responding but, as a reasonable alternative, we might call distance in all instances a response. However, whether all distances are responses or not is a distracting issue because being based on habituation is not a necessity.

      We appreciate the reviewer's feedback and understand the concern regarding the use of the term 'habituation.' We agree that all distances maintained by the spiders are active responses. We interpret them as the spiders’ “active decisions”, based on their perception of the other individual and modulated by whether they recognize it as the same or a different individual.

      The terms 'habituation' and 'dishabituation' are used to label trial types for ease of discussion and to describe the expected behavioural modulation.

      (3) Besides data related to distances, other data might have been useful. For example, salticids are especially well known for the way they communicate using distinctive visual displays and, unlike distance, displaying is a discrete, unambiguous response.

      We appreciate the reviewer’s suggestion to incorporate data on visual displays, which are indeed well-known communication methods among salticids. We agree that visual displays are discrete and unambiguous responses that could provide additional insights into the spiders' recognition abilities.

      Our primary focus on distance measurements was driven by the need to quantify behaviour in a continuous and scalable manner, that is, how spiders modulate their proximity based on familiarity with other individuals.

      We acknowledge the potential value of including visual display measurements; however, in our study, we aimed to establish a foundational understanding of recognition behaviour through proximity measures first. Also, capturing displays requires a different experimental paradigm, in which the displays are clearly visible and analyzable.

      (4) Methods more aligned with salticids having extraordinarily good eyesight would be useful. For example, with salticids, standardising and manipulating stimuli in experiments can be achieved by using mounts, video playback, and computer-generated animation.

      There is no doubt that salticids have excellent eyesight. However, our study focuses on higher-level perceptual processes that require complex brain analysis, not just visual acuity. The goal was to investigate whether spiders can individuate and recognize conspecifics, which involves interpreting visual information and forming long-term representations.

      Clearly, methods like video playback and computer animations are useful in controlled settings, where the spider is mounted, but they pose challenges for our specific research question. At this stage of research, we lack precise knowledge of which visual features are critical for individual recognition in spiders, making it difficult to design effective artificial stimuli. 

      Our primary objective was to determine if spiders can individuate others. Before exploring the proximate mechanisms of how they individuate others, it was essential to establish that they have this capability. This foundational question needed to be addressed before delving into more detailed mechanistic studies.

      (5) An asocial-versus-social distinction is too imprecise, and it may have been emphasised too much. With P. regius, irrespective of whether we use the label asocial or social, the important question pertains to the frequency of encounters between the same individuals and the consequences of these encounters.

      Our intent was to convey that P. regius does not live in cohesive social groups but does engage in individual interactions that can have significant behavioral consequences. We will revise the manuscript to reduce the emphasis on the asocial-versus-social distinction. As discussed above, we also will change the term “asocial” to “non-social” in the manuscript.

      (6) Hypotheses related to not-so-strictly adaptive factors are discussed and these hypotheses are interesting, but these considerations are not necessarily incompatible with more strictly adaptive influences being relevant as well.

      We appreciate the reviewer's observation regarding the discussion of hypotheses related to not-so-strictly adaptive factors. We agree that our considerations of these factors do not preclude the relevance of more strictly adaptive influences.

      We will revise the manuscript to explicitly discuss how our findings can be interpreted in the context of adaptive hypotheses. This will provide a more comprehensive understanding of the evolutionary significance of individual recognition in P. regius. Modifications were made in the Discussion section.

      In the following, we comment on issues not mentioned in the “public reviews” section.

      Reviewer #1 (Recommendations For The Authors):

      (1) I would suggest conducting experiments that actually test for recognition memory, as this seems to be a claim that the authors make. Following the ant studies by Dreier cited in this manuscript would be sufficient to test for memory. Given the relative simplicity of the measures being taken (location of spiders), this would seem like a very simple addition that would provide a much stronger and more readily interpreted dataset.

      As previously explained in our detailed responses (public reviews), we believe that the current design effectively addresses the questions at hand. Our approach, using a habituation-dishabituation paradigm, provides robust evidence for recognition memory within the framework of early long-term memory.

      Additionally, we have explained why using the distance to the panel as a measure is not appropriate in this context. Specifically, using such a measure can misrepresent the actual interests of the spiders in each other.

      While we acknowledge the merits of the ant studies by Dreier, our current design allows for a detailed understanding of the spiders' recognition capabilities over short (10 min) and slightly longer intervals (up to one hour). This is sufficient to demonstrate the presence of recognition memory without the necessity of further experiments. The observed patterns of habituation and dishabituation responses in our study clearly indicate that the spiders can distinguish between familiar and novel individuals, which supports our claims.

      Given these points, we respectfully maintain that the current data and experimental design are adequate to support our findings and provide a comprehensive understanding of recognition memory in Phidippus regius.

      (2) The writing is rather impenetrable. The results explain the basic finding in terms of statistical variables rather than simply stating the results. A clear and straightforward statement such as 'the spiders showed reduced interest upon habituation trials, indicating xyz' (and then citing the stats) is preferable to the introduction of results as a statistical model. The statistical model is a means of assessing the results. It is not the result. Describe the data.

      We have tried to improve this in the current version.

      (3) Showing more straightforward data such as distance from the joint barrier would make the paper much easier to understand.

      This paper has been on bioRxiv for some time and my guess is that it has ended up here because it is having trouble in review. Collecting new data that more directly test the question at hand, presenting the data in a more direct manner, and more critically evaluating your own claims will improve the paper.

      While it is true that the paper has been on bioRxiv for a while, this submission marks the first instance where it has undergone peer review. Prior to this, the manuscript was submitted to other journals but was not reviewed.

      We hope the explanations provided in the “public reviews” section, along with the revised manuscript, sufficiently clarify our study and its conclusions. We believe the current data robustly address the research questions, and as outlined in our detailed responses, we have critically evaluated our claims and presented the data clearly. Given these clarifications, we do not see the necessity for new experiments as the existing data adequately support our findings. We trust that these revisions and explanations will clarify any misunderstandings.

      I am totally sold that the spiders are paying attention to identity at some level. The key now is to understand what that actually means in terms of recognition (i.e. memory of individuals) not just habituation.

      We appreciate the reviewer’s emphasis on the distinction between habituation and memory-based individual recognition. As detailed in the preceding discussion, we have taken great care to clarify how our paradigm distinguishes simple habituation effects from true memory for individual identity. We trust that the preceding sections make clear how our findings go beyond simple habituation to establish genuine individual recognition.

      Reviewer #2 (Recommendations For The Authors):

      Aside from the comments in the public review, I have some additional comments that the authors may wish to consider.

      Numerous times in the manuscript, the authors mentioned that recognizing individuals requires recognition memory. This seems rather obvious, and I wonder if the authors could instead be more precise about what they mean by 'recognition memory'?

      Recognition memory refers to the cognitive ability to identify a previously encountered stimulus, individual, or event as familiar. It involves both encoding and retrieval processes, allowing an organism to distinguish between novel and familiar stimuli. This form of memory is a fundamental component of cognitive functioning and is supported by neural mechanisms that, in the mammalian brain, involve the hippocampus and other brain regions associated with memory processing.

      In our study, we aimed to test whether Phidippus regius recognizes conspecifics or, in other words, uses recognition memory to distinguish between familiar and unfamiliar conspecifics. With the habituation-dishabituation paradigm, we assessed the spiders' ability to recognize previously encountered individuals and to retain that memory over short (10 min) and extended (1 hour) periods.

      Encoding: In the initial trial, when a spider encounters an individual for the first time (Figure 1A, “Baseline” or “Dishabituation” for every following trial), it encodes the visual information related to that specific individual. This encoding process involves creating a memory trace of the individual's phenotypic characteristics.

      Storage: During the visual separation period, this encoded information is stored in the spider's memory system. The memory trace, though initially fragile, starts to stabilize over the separation period. Whether or not this leads to some form of consolidated memory remains unaddressed. This aspect was highlighted by the first reviewer, but our focus is on the early process rather than on late processes, such as consolidation. 

      Retrieval: In the subsequent trial, when the same individual is presented again, the spider retrieves the stored memory trace. If the spider recognizes the individual, its behaviour reflects habituation, indicating memory retrieval. Conversely, when a novel individual is introduced, the lack of stored memory trace triggers a different behavioural response, indicating dishabituation. This differential response demonstrates the spider's ability to distinguish between familiar and unfamiliar individuals. This differential response is also key to understanding the nature of habituation over the three sessions, as introducing novel spiders leads to a significant dishabituation response after the three sessions in Experiment 2.

      In Line 39, the authors state that they used "a naturalistic experimental procedure". I would like to know how this experiment is 'naturalistic'. The authors' use of an arena does not appear naturalistic, or something the spiders would encounter in the wild.

      We appreciate the reviewer's comment regarding our use of the term 'naturalistic'. We acknowledge that the experimental arena itself does not replicate the conditions found in the wild. Our approach aimed to incorporate elements of natural behaviour by allowing two spiders to freely move and interact within the controlled environment. This approach aligns with principles from computational ethology, which seeks to balance the trade-off between repeatability/standardization and observing free, naturalistic behaviour. By using this paradigm, we aimed to capture behaviours that closely resemble those exhibited in their natural habitat. This setup was chosen to balance the need for ecological validity with the requirements for standardized data collection. 

      Also, and this point has been raised above, by observing the spiders' natural interactions without restraining them or using artificial stimuli like computer animations, we aimed to capture behaviours that closely resemble their natural responses to conspecifics. In contrast, we would not have any clear expectations regarding responses to arbitrarily designed artificial stimuli. This method provides a more ecologically valid assessment of the spiders' recognition abilities.

      There are a few details wrong in Line 41. 'Salticidae' is a family name and shouldn't be italicized. Also, the sentence suggests that there is a spider called a 'jumping spider' in the family Salticidae, which is technically called Phidippus regius. To clarify, all spiders in the family Salticidae are known as jumping spiders, and one species of jumping spiders is called Phidippus regius.

      We will correct this in the manuscript to accurately reflect the classification and terminology. Thank you for pointing out these inaccuracies.

      A manuscript on individual recognition by a salticid should include citations to earlier papers that have already considered individual recognition by salticids. As well as the paper by Tedore and Johnsen (2013), the authors should be aware of the following papers.

      Clark, R. J., & Jackson, R. R. (1994). Portia labiata, a cannibalistic jumping spider, discriminates between its own and foreign egg sacs. International Journal of Comparative Psychology, 7, 38-43.

      Clark, R. J., & Jackson, R. R. (1994). Self-recognition in a jumping spider: Portia labiata females discriminate between their own draglines and those of conspecifics. Ethology, Ecology & Evolution, 6, 371-375.

      Clark, R. J., & Jackson, R. R. (1995). Araneophagic jumping spiders discriminate between the draglines of familiar and unfamiliar conspecifics. Ethology, Ecology & Evolution, 7, 185-190.

      We appreciate the reviewer's suggestion to include citations to these earlier papers. We will add the recommended references to provide a comprehensive background.

      In Line 203, I would not consider "interaction with human caretakers and experimenters" to be a form of behavioral enrichment. This kind of interaction has the potential to be stressful for the spiders, rather than enriching. I suggest deleting that part of the sentence.

      We appreciate the reviewer's feedback and agree that interactions with human caretakers and experimenters might not always be enriching and could potentially be stressful for the spiders. We will remove that part of the sentence to better reflect the intended meaning.

      Reviewer #3 (Recommendations For The Authors):

      This manuscript is useful and interesting, and I predict that it will be influential, but more attention should be given to stating the objective and conclusion accurately and clearly. As I understand it, the objective was to investigate a specific hypothesis: that Phidippus regius has a capacity to identify conspecific individuals as particular individuals (i.e., individual identification). Strong evidence supporting this hypothesis being true would be especially remarkable because I am unaware of any published work having shown evidence of a spider expressing this specific perceptual capacity.

      Thank you for recognizing the significance and potential influence of our manuscript. We agree that clearly stating the objective and conclusions is essential for conveying the importance of our findings. Our results provide robust evidence supporting the hypothesis that Phidippus regius can recognize and remember individual conspecifics. We will revise the manuscript to more clearly highlight the objective and our conclusions, emphasizing the novel evidence for individual identification in these spiders.

      Based on reading this manuscript and based on my understanding of the meaning of 'individual identification', it seems to me that the hypothesis that P. regius has a capacity for individual identification might or might not be true, and the experiments in this manuscript cannot tell us which is the case. 

      We respectfully disagree with the reviewer's assessment. Our experiments were carefully designed to test whether P. regius has the capacity for individual identification, and our results provide clear evidence supporting this hypothesis. The systematic differences in the spiders' behaviour when encountering familiar versus novel individuals indicate that they can recognize and remember specific conspecifics. We will revise the manuscript to ensure that the evidence and conclusions are stated more clearly to address any potential misunderstandings.

      Determining which is the case would have required research that made better use of the literature, displayed more critical thinking, addressed credible alternative hypotheses, and adopted experimental methods that focused more strictly on individual identification.

      Whether P. regius has a capacity for individual identification is not ambiguous in our study. Our findings clearly demonstrate this capacity through systematic behavioural responses to familiar versus novel individuals. As pointed out above, the experimental procedure might be complex, but the results are systematic despite this complexity. The experiments were designed to directly address the hypothesis of individual identification, and the data robustly support our conclusions. While considering alternative hypotheses is important, the results we present provide a coherent and compelling case for individual identification in P. regius. We will ensure our manuscript clearly articulates this narrative and the supporting evidence.

      At the same time, I also appreciate that asking for all of that at once would be asking for too much. As I see it, this manuscript tells us about research that moves us closer to a clear focus on the details and questions that will matter in the context of considering a hypothesis that is strictly about individual identification. More importantly, I think this research reveals a perceptual capacity that is remarkable even if it is not strictly a capacity for individual identification.

      We understand the desire for a more focused exploration of individual identification with paradigms more familiar to the reviewers and we acknowledge that further detailed studies could enhance our understanding of this capacity. However, our findings do indeed suggest that Phidippus regius exhibits a remarkable perceptual capacity for recognizing and remembering individual conspecifics. The systematic behavioural responses observed in our experiments strongly indicate that these spiders possess the ability for individual recognition. While our study may not have explored every potential detail (e.g. which features are most crucial for the memory matching processes), the evidence we present robustly supports the conclusion of individual identification.

      We acknowledge that it is indeed valuable to follow established paradigms and build upon the frameworks that have been used successfully in similar species and studies. These paradigms provide a solid foundation for scientific inquiry and allow for comparability across different research efforts. However, it is equally important to acknowledge and explore alternative approaches. Scientific progress is driven not only by replication but also by innovation. By employing new paradigms, researchers can uncover novel insights and push the boundaries of current understanding. The paradigm we used in our study, while different from those traditionally applied to similar research, is not an invention but a well-established method in various domains. It represents an innovative application in the context of our specific research questions, offering a fresh perspective and contributing to the advancement of the field.

      As I understand it, 'individual identification' means identifying another individual as being a particular individual instead of a member of a larger set (or 'class') of individuals. An 'individual' is a set containing a single individual. Interesting examples of identifying members of larger sets include discriminating between familiar and unfamiliar individuals. In the context of the specific experiments in this manuscript, familiar-unfamiliar discrimination means discriminating between recently-seen and not-so-recently-seen individuals. My impression is that the experiments in this manuscript have given us a basis for concluding that P. regius has a capacity for familiar-unfamiliar (recently seen versus not so recently seen) discrimination. If this is the case, then I think this is the conclusion that should be emphasised. This would be an important conclusion.

      I appreciate that, depending on how we use the words, familiar-unfamiliar discrimination might be construed as being 'individual identification'. An individual is identified as 'the individual recently seen'. As a casual way of speaking, it can be reasonable to call this 'individual identification'. The difficulty comes from the way calling this 'individual identification' can suggest something more than has been demonstrated. To navigate through this difficulty, we need an expression to use for a capacity that goes beyond familiar-unfamiliar discrimination. In the context of this manuscript about P. regius, we need expressions that will make it easy to consider two things. One of these things is a capacity for familiar-unfamiliar discrimination. The other is the capacity to identify another individual as being a particular individual.

      We appreciate the reviewer's insightful comments on the distinction between familiar-unfamiliar discrimination and individual identity recognition. Our study indeed focuses on demonstrating that Phidippus regius can recognize and remember individual conspecifics, providing evidence for individual identity recognition.

      Two specific behavioural hallmarks that speak against familiarity recognition:

      First, the significant dishabituation response to novel individuals introduced after multiple sessions underscores the specificity of the recognition. This shows that the spiders' habituation is not general but specific to familiar individuals. 

      Second, the pattern of habituation over the sessions provides further evidence: We observed the strongest systematic modulation in Session 1, a reduced modulation in Session 2, and a further diminished effect in Session 3. If the spiders were only responding based on familiarity, we would expect a more drastic decrease, resulting in a washed-out non-effect by Session 2. However, the continued, though diminishing, differentiation between habituation and dishabituation trials across sessions indicates that the spiders are not merely responding to a general sense of familiarity but are engaging in individual recognition. In other words, the spiders' ability to distinguish between familiar and novel individuals even after repeated exposures suggests that they are not just recognizing a familiar status but are identifying specific individuals.

      Things people do might help clarify what this means. People have an extraordinary capacity for identifying other individuals as particular individuals. Often this is based on giving each other names. Imagine we are letting somebody see photographs and asking them to identify who they see. The answer might be, 'somebody familiar' or 'somebody I saw recently' (familiar-unfamiliar discrimination); or the question might be answered by naming a particular individual (individual identification).

      We appreciate the reviewer's efforts to clarify the distinction between familiar-unfamiliar discrimination and individual recognition using human examples. However, we believe this comparison might not fully capture the complexity of individual recognition in non-human animals. 

      Familiarity recognition refers to recognizing someone as having been seen or encountered before without necessarily distinguishing them from others in the same category. On the other hand, identity recognition involves recognizing a specific individual based on unique characteristics (or features). In humans, this often involves naming, but more critically, like in most animals, it involves recognizing visual, auditory, chemical or other sensory cues. In animals, including spiders, individual recognition does not involve, let alone rely on, naming; it relies on the ability to distinguish between individuals based on sensory cues and learnt associations. This is a valid and well-documented form of individual recognition across many species.

      Individual recognition does not require naming or the assignment of a referential label. Animals can distinguish between specific individuals based on previously perceived and stored features and characteristics. Naming is the exception rather than the rule in the animal kingdom. Only a few species, such as humans and maybe certain cetaceans, use naming for identity recognition. This is an evolutionary rarity and not the standard mechanism for individual recognition, which primarily relies on sensory cues and learnt associations. Furthermore, the mechanism of recognition in both humans and animals involves a complex process of matching incoming sensory and perceptual information with stored memory representations. Naming is merely a tool for communication, allowing us to convey which individual we are referring to. It is not the mechanism by which recognition occurs. The core of individual recognition is this matching process, where sensory cues (visual, auditory, chemical, etc.) are compared to memory traces of previously encountered individuals. Therefore, the suggestion that individual identification necessitates naming misrepresents the actual cognitive processes involved. 

      We can think of individual identification being based on more fine-grained discrimination (with this, set size = one), with familiar-unfamiliar discrimination being more coarse-grained discrimination (with this, set size can be more than one). Restricting the expression 'individual identification' to instances of having the capacity to identify another individual as being a particular individual (set size = one) is better aligned with normal usage of this expression.

      Absolutely, the distinction between fine-grained and coarse-grained discrimination aligns with the concept of different category levels, such as basic and subordinate levels, put forward by Eleanor Rosch (e.g. Rosch, 1973). In the context of individual recognition, fine-grained discrimination (where set size = one) refers to the ability to identify a specific individual based on unique characteristics. This is referred to as subordinate level categorization. Coarse-grained discrimination (where set size can be more than one) refers to recognizing someone as familiar without distinguishing them from others in the same category, more similar to basic level categorization. 

      Rosch, E.H. (1973). "Natural categories". Cognitive Psychology. 4 (3): 328–50. doi:10.1016/0010-0285(73)90017-0

      There is a strong emphasis on an asocial-social distinction in this manuscript. It seems to me that this needs to be focused more clearly on the specific factors that would make a capacity for individual identification beneficial. In the context of this manuscript, the term 'social' may suggest too much. It seems to me that the issue that matters the most is whether individuals live in situations where important encounters occur frequently between the same individuals. Irrespective of whether other notions of the meaning of 'social' also apply, there are salticids that live in aggregated situations where they frequently have important encounters with each other. This is the case with Phidippus regius in the field in Florida, but I realize that there may not be much published information about the natural history of this salticid. Even so, there are salticids to which the word 'social' has been applied in published literature.

      We appreciate the reviewer's comments on the asocial-social distinction and we agree that this terminology might need refinement. Our intent was not to categorize Phidippus regius rigidly but to explore the contextual factors influencing the benefits of individual identification. The critical factor in our study is indeed the frequency and importance of encounters between individuals, rather than a broader social structure. We will revise the manuscript to reflect this more nuanced perspective, focusing on the ecological validity of our experimental design and the adaptive significance of individual recognition in environments where repeated encounters can occur.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The authors observed a decline in autophagy and proteasome activity in the context of Milton knockdown. Through proteomic analysis, they identified an increase in the protein levels of eIF2β, subsequently pinpointing a novel interaction within eIF subunits where eIF2β contributes to the reduction of eIF2α phosphorylation levels. Furthermore, they demonstrated that overexpression of eIF2β suppresses autophagy and leads to diminished motor function. It was also shown that in a heterozygous mutant background of eIF2β, Milton knockdown could be rescued. This work represents a novel and significant contribution to the field, revealing for the first time that the loss of mitochondria from axons can lead to impaired autophagy function via eIF2β, potentially influencing the acceleration of aging.

      Thank you so much for your review and comments.

      Reviewer #2 (Public Review):

      In the manuscript, the authors aimed to elucidate the molecular mechanism that explains neurodegeneration caused by the depletion of axonal mitochondria. In Drosophila, starting with siRNA depletion of Milton and Miro, the authors attempted to demonstrate that the depletion of axonal mitochondria induces the defect in autophagy. From proteome analyses, the authors hypothesized that autophagy is impacted by the abundance of eIF2β and the phosphorylation of eIF2α. The authors followed up the proteome analyses by testing the effects of eIF2β overexpression and depletion on autophagy. With the results from those experiments, the authors proposed a novel role of eIF2β in proteostasis that underlies neurodegeneration derived from the depletion of axonal mitochondria.

      The manuscript has several weaknesses. The reader should take extra care while reading this manuscript and when acknowledging the findings and the model in this manuscript.

      The defect in autophagy by the depletion of axonal mitochondria is one of the main claims in the paper. The authors should work more on describing their results of LC3-II/LC3-I ratio, as there are multiple ways to interpret the LC3 blotting for the autophagy assessment. Lysosomal defects result in the accumulation of LC3-II thus the LC3-II/LC3-I ratio gets higher. On the other hand, the defect in the early steps of autophagosome formation could result in a lower LC3-II/LC3-I ratio. From the results of the actual blotting, the LC3-I abundance is the source of the major difference for all conditions (Milton RNAi and eIF2β overexpression and depletion).

      Thank you so much for your review and comments. As the reviewer pointed out, LC3-II/LC3-I ratio changes do not necessarily indicate autophagy defects. However, since p62 accumulated (Figure 2B, 2E, 3E, Figure 8C, Figure 9C), these results collectively suggest that autophagy is lowered.

      As the reviewer pointed out and as we described in v2, milton knockdown, eIF2β overexpression and heterozygosity increase LC3-I abundance. We do not know at this moment how these conditions increase LC3-I. We will investigate the cause of the increase in LC3-I by milton knockdown and how it contributes to impaired autophagy. We added this discussion as:

      Lines 388-393; ‘Our results also suggest that milton knockdown and overexpression of eIF2β affect autophagy via increased LC3-I abundance (Figures 2 and 7), suggesting an unconventional mechanism of autophagy suppression. To our knowledge, the roles of eIF2β in aging and autophagy independent of ISR have not been reported. Our results revealed a novel function of eIF2β to maintain proteostasis during aging, while further investigation is required to elucidate underlying mechanisms.’

      Another main point of the paper is the up-regulation of eIF2β by depleting the axonal mitochondria leads to the proteostasis crisis. This claim is formed by the findings from the proteome analyses. The authors should have presented their proteomic data with a much more thorough presentation and explanation. As in the experiment scheme shown in Figure 4A, the authors did two proteome analyses: one from the 7-day-old sample and the other from the 21-day-old sample. The manuscript only shows a plot of the result from the 7-day-old sample, but not that of the result from the 21-day-old sample. For the 21-day-old sample, the authors only provided data in the supplemental table, in which the abundance ratio of eIF2β from the 21-day-old sample is 0.753, meaning eIF2β is depleted in the 21-day-old sample. The authors should have explained the impact of the eIF2β depletion in the 21-day-old sample, so the reader could fully understand the authors' interpretation of the role of eIF2β on proteostasis.

      Thank you for pointing it out. Plots of the 21-day-old proteome results were included in the main figure (Figure 4C) in v2. In this revision, we further analyzed age-dependent changes of eIF2β levels by western blotting (Figure 4G). We found that eIF2β levels increased during aging until 49-day-old then reduced at 63-day-old (Figure 4G in the revised manuscript). At the young age, eIF2β levels were higher in milton knockdown brains compared to the control, and at 21-day-old, eIF2β levels were lower in milton knockdown brains than those in the control. These results suggest that milton knockdown accelerates age-dependent changes in eIF2β. We added these results and discussion in the revised manuscript.

      Lines 240-243: ‘We also investigated age-dependent changes in eIF2β by western blotting of control flies at 7-, 21-, 35-, and 49-, and 63-day-old. eIF2β levels increased during aging until 49-day-old (Figure 4G). These results suggest that upregulation of eIF2β in milton knockdown fly brain reflects early an onset of age-dependent increase of eIF2β levels.’

      Lines 363-368: ‘We also found that eIF2β protein levels increase in an age-dependent manner until 49-day-old and reduces after that (Figure 4G). In the brains with neuronal knockdown of milton, eIF2β levels were higher at 7-day-old than those in control and lower at the 21-day-old (Figure 4D and Supplementary table). These results suggest that milton knockdown is likely accelerating age-dependent changes rather than increasing their magnitude.’

      Our new data indicate that eIF2β levels increase during aging in control flies until 49-day-old, then reduce at 63-day-old (included as Figure 4G in the revised manuscript). These age-dependent changes might explain the reduction in eIF2β levels in Milton knockdown compared to the control in middle age: higher eIF2β levels in milton knockdown flies at a young age than control and lower eIF2β levels in the middle-aged flies may reflect premature aging.

      We included these sentences in the discussion section:

      Lines 240-243:‘We also investigated age-dependent changes in eIF2β by western blotting of control flies at 7-, 21-, 35-, and 49-, and 63-day-old. eIF2β levels increased during aging until 49-day-old (Figure 4G). These results suggest that upregulation of eIF2β in milton knockdown fly brain reflects early an onset of age-dependent increase of eIF2β levels.’

      Lines 359-371: ‘Our results suggest that the loss of axonal mitochondria is an event upstream of proteostasis collapse during aging. The number of puncta of ubiquitinated proteins was higher in milton knockdown at 14-day-old, but there was no significant difference at 30-day-old (Figure 1). Proteome analyses also showed that age-related pathways, such as immune responses, are enhanced in young flies with milton knockdown (Table 2). We also found that eIF2β protein levels increase in an age-dependent manner until 49-day-old and reduces after that (Figure 4G). In the brains with neuronal knockdown of milton, eIF2β levels were higher at 7-day-old than those in control and lower at the 21-day-old (Figure 4D and Supplementary table). These results suggest that milton knockdown is likely accelerating age-dependent changes rather than increasing their magnitude. Disruption of proteostasis is expected to contribute neurodegeneration38 , and it would be interesting to analyze the sequence of protein accumulation and axonal degeneration in milton knockdown (24,29 and Figure 1) in detail with higher time resolution.’


      With our new data, we revised some of our responses to the first round of reviewer’s comments.

      Reviewer #1 (Public Review):

      The authors observed a decline in autophagy and proteasome activity in the context of Milton knockdown. Through proteomic analysis, they identified an increase in the protein levels of eIF2β, subsequently pinpointing a novel interaction within eIF subunits where eIF2β contributes to the reduction of eIF2α phosphorylation levels. Furthermore, they demonstrated that overexpression of eIF2β suppresses autophagy and leads to diminished motor function. It was also shown that in a heterozygous mutant background of eIF2β, Milton knockdown could be rescued. This work represents a novel and significant contribution to the field, revealing for the first time that the loss of mitochondria from axons can lead to impaired autophagy function via eIF2β, potentially influencing the acceleration of aging. To further support the authors' claims, several improvements are necessary, particularly in the methods of quantification and the points that should be demonstrated quantitatively. It is crucial to investigate the correlation between aging and the proteins eIF2β and eIF2α.

      Thank you so much for your review and comments. We included analyses of protein levels of eIF2α, eIF2β, and eIF2γ at 7 days and 21 days (Figure 4D). The manuscript was revised as below:

      Lines 246-249 ‘As for the other subunits of eIF2 complex, proteome analysis did not detect a significant difference in the protein levels of eIF2α and eIF2γ between milton knockdown and control flies at 7 and 21 days (Figure 4D).’

      NEW TEXT: We analyzed age-dependent changes of eIF2β levels in more detail by western blotting (Figure 4G). We found that eIF2β levels increased during aging until 49-day-old then reduced at 63-day-old (Figure 4G in the revised manuscript). At the young age, eIF2β levels were higher in milton knockdown brains compared to the control, and at 21-day-old, eIF2β levels were lower in milton knockdown brains than those in the control. These results suggest that milton knockdown accelerates age-dependent changes in eIF2β. We added these results and discussion in the revised manuscript.

      NEW TEXT: Lines 240-243: ‘We also investigated age-dependent changes in eIF2β by western blotting of control flies at 7-, 21-, 35-, and 49-, and 63-day-old. eIF2β levels increased during aging until 49-day-old (Figure 4G). These results suggest that upregulation of eIF2β in milton knockdown fly brain reflects early an onset of age-dependent increase of eIF2β levels.’

      NEW TEXT: Lines 363-368: ‘We also found that eIF2β protein levels increase in an age-dependent manner until 49-day-old and reduces after that (Figure 4G). In the brains with neuronal knockdown of milton, eIF2β levels were higher at 7-day-old than those in control and lower at the 21-day-old (Figure 4D and Supplementary table). These results suggest that milton knockdown is likely accelerating age-dependent changes rather than increasing their magnitude.’

      Reviewer #2 (Public Review):

      In the manuscript, the authors aimed to elucidate the molecular mechanism that explains neurodegeneration caused by the depletion of axonal mitochondria. In Drosophila, starting with siRNA depletion of Milton and Miro, the authors attempted to demonstrate that the depletion of axonal mitochondria induces the defect in autophagy. From proteome analyses, the authors hypothesized that autophagy is impacted by the abundance of eIF2β and the phosphorylation of eIF2α. The authors followed up the proteome analyses by testing the effects of eIF2β overexpression and depletion on autophagy. With the results from those experiments, the authors proposed a novel role of eIF2β in proteostasis that underlies neurodegeneration derived from the depletion of axonal mitochondria.

      The manuscript has several weaknesses. The reader should take extra care while reading this manuscript and when acknowledging the findings and the model in this manuscript.

      The defect in autophagy by the depletion of axonal mitochondria is one of the main claims in the paper. The authors should work more on describing their results of LC3-II/LC3-I ratio, as there are multiple ways to interpret the LC3 blotting for the autophagy assessment. Lysosomal defects result in the accumulation of LC3-II thus the LC3-II/LC3-I ratio gets higher. On the other hand, the defect in the early steps of autophagosome formation could result in a lower LC3-II/LC3-I ratio. From the results of the actual blotting, the LC3-I abundance is the source of the major difference for all conditions (Milton RNAi and eIF2β overexpression and depletion). In the text, the authors simply state the observation of their LC3 blotting. The manuscript lacks an explanation of how to evaluate the LC3-II/LC3-I ratio. Also, the manuscript lacks an elaboration on what the results of the LC3 blotting indicate about the state of autophagy by the depletion of axonal mitochondria.

      Thank you for pointing it out, and we apologize for an insufficient description of the result. We included quantitation of the levels of LC3-I and LC3-II in Figures 2A, 2D, 3D, 7B (Figure 6B in the previous version), and 8B (Figure 7B in the previous version). As the reviewer pointed out, LC3-II/LC3-I ratio changes do not necessarily indicate autophagy defects. However, since p62 accumulated (Figure 2B, 2E, 3E, 7C (Figure 6C in the previous version), 8C (Figure 7C in the previous version)), these results collectively suggest that autophagy is lowered. We revised the manuscript to include this discussion as below:

      Lines 174-186 ‘During autophagy progression, LC3 is conjugated with phosphatidylethanolamine to form LC3-II, which localizes to isolation membranes and autophagosomes. LC3-I accumulation occurs when autophagosome formation is impaired, and LC3-II accumulation is associated with lysosomal defects31,32. p62 is an autophagy substrate, and its accumulation suggests autophagic defects31,32. We found that milton knockdown increased LC3-I, and the LC3-II/LC3-I ratio was lower in milton knockdown flies than in control flies at 14-day-old (Figure 2A). We also analyzed p62 levels in head lysates sequentially extracted using detergents with different stringencies (1% Triton X-100 and 2% SDS). Western blotting revealed that p62 levels were increased in the brains of 14-day-old of milton knockdown flies (Figure 2B). The increase in the p62 level was significant in the Triton X-100-soluble fraction but not in the SDS-soluble fraction (Figure 2B), suggesting that depletion of axonal mitochondria impairs the degradation of less-aggregated proteins.’

      Line 189-190: ‘At 30 day-old, LC3-I was still higher, and the LC3-II/LC3-I ratio was lower, in milton knockdown compared to the control (Figure 2D).’

      Line 202-203: ‘However, in contrast with milton knockdown, Pfk knockdown did not affect the levels of LC3-I, LC3-II or the LC3-II/LC3-I ratio (Figure 3D).’

      Line 279-285: ‘Neuronal overexpression of eIF2β increased LC3-II, while the LC3-II/LC3-I ratio was not significantly different (Figure 7A and B). Overexpression of eIF2β significantly increased the p62 level in the Triton X-100-soluble fraction (Figure 7C, 4-fold vs. control, p < 0.005 (1% Triton X-100)) but not in the SDS-soluble fraction (Figure 7C, 2-fold vs. control, p = 0.062 (2% SDS)), as observed in brains of milton knockdown flies (Figure 2B). These data suggest that neuronal overexpression of eIF2β accumulates autophagic substrates.’

      Line 311-319: ‘Neuronal knockdown of milton causes accumulation of autophagic substrate p62 in the Triton X-100-soluble fraction (Figure 2B), and we tested if lowering eIF2β ameliorates it. We found that eIF2β heterozygosity caused a mild increase in LC3-I levels and decreases in LC3-II levels, resulting in a significantly lower LC3-II/LC3-I ratio in milton knockdown flies (Figure 8B). eIF2β heterozygosity decreased the p62 level in the Triton X-100-soluble fraction in the brains of milton knockdown flies (Figure 8C). The p62 level in the SDS-soluble fraction, which is not sensitive to milton knockdown (Figure 2B), was not affected (Figure 8C). These results suggest that suppression of eIF2β ameliorates the impairment of autophagy caused by milton knockdown.’

      Another main point of the paper is the up-regulation of eIF2β by depleting the axonal mitochondria leads to the proteostasis crisis. This claim is formed by the findings from the proteome analyses. The authors should have presented their proteomic data with a much more thorough presentation and explanation. As in the experiment scheme shown in Figure 4A, the authors did two proteome analyses: one from the 7-day-old sample and the other from the 21-day-old sample. The manuscript only shows a plot of the result from the 7-day-old sample, but not that of the result from the 21-day-old sample. For the 21-day-old sample, the authors only provided data in the supplemental table, in which the abundance ratio of eIF2β from the 21-day-old sample is 0.753, meaning eIF2β is depleted in the 21-day-old sample. The authors should have explained the impact of the eIF2β depletion in the 21-day-old sample, so the reader could fully understand the authors' interpretation of the role of eIF2β on proteostasis.

      NEW TEXT: Thank you for pointing it out. We included plots of the 21-day-old proteome results as a part of the main figure (Figure 4C). As the reviewer pointed out, eIF2β protein levels are lower in milton knockdown background at the 21-day-old compared to the control. Since a reduction in eIF2β ameliorated milton knockdown-induced locomotor defects in aged flies (Figure 7D), the reduction in eIF2β observed in the 21-day-old milton knockdown flies is not likely to negatively contribute to milton knockdown-induced defects. Our new data indicate that eIF2β levels increase during aging in control flies until 49-day-old, then reduce at 63-day-old (included as Figure 4G in the revised manuscript). These age-dependent changes might explain the reduction in eIF2β levels in Milton knockdown compared to the control in middle age: higher eIF2β levels in milton knockdown flies at a young age than control and lower eIF2β levels in the middle-aged flies may reflect premature aging.

      NEW TEXT: We included these sentences in the discussion section:

      NEW TEXT: Lines 240-243: ‘We also investigated age-dependent changes in eIF2β by western blotting of control flies at 7-, 21-, 35-, 49-, and 63-day-old. eIF2β levels increased during aging until 49-day-old (Figure 4G). These results suggest that upregulation of eIF2β in milton knockdown fly brain reflects an early onset of the age-dependent increase of eIF2β levels.’

      NEW TEXT: Lines 359-371: ‘Our results suggest that the loss of axonal mitochondria is an event upstream of proteostasis collapse during aging. The number of puncta of ubiquitinated proteins was higher in milton knockdown at 14-day-old, but there was no significant difference at 30-day-old (Figure 1). Proteome analyses also showed that age-related pathways, such as immune responses, are enhanced in young flies with milton knockdown (Table 2). We also found that eIF2β protein levels increase in an age-dependent manner until 49-day-old and decrease after that (Figure 4G). In the brains with neuronal knockdown of milton, eIF2β levels were higher at 7-day-old than those in control and lower at the 21-day-old (Figure 4D and Supplementary table). These results suggest that milton knockdown is likely accelerating age-dependent changes rather than increasing their magnitude. Disruption of proteostasis is expected to contribute to neurodegeneration38, and it would be interesting to analyze the sequence of protein accumulation and axonal degeneration in milton knockdown (24,29 and Figure 1) in detail with higher time resolution.’

      The manuscript consists of several weaknesses in its data and explanation regarding translation.

      (1) The authors are likely misunderstanding the effect of phosphorylation of eIF2α on translation. The P-eIF2α is inhibitory for translation initiation. However, the authors seem to be mistaken that the down-regulation of P-eIF2α inhibits translation.

      We are sorry for our insufficient explanation in the previous version. As the reviewer pointed out, it is well known that the phosphorylated form of eIF2α inhibits translation initiation. Neuronal knockdown of milton caused a reduction in p-eIF2α (Figure 5D and E (Figure 4J and K in the previous version)), and it also lowered translation (Figure 6 (Figure 5 in the previous version)); the relationship between these two events is currently unclear. We do not think that a reduction in the p-eIF2α suppressed translation; rather, we propose that the unbalance of expression levels of the components of eIF2 complexes negatively affects translation. We revised discussion sections to describe our interpretation more in detail as below:

      Line 374-384: ‘eIF2β is a component of eIF2, which mediates translational regulation and ISR initiation. When ISR is activated, phosphorylated eIF2α suppresses global translation and induces translation of ATF4, which mediates transcription of autophagy-related genes39,40. Since ISR can positively regulate autophagy, we suspected that suppression of ISR underlies a reduction in autophagic protein degradation. We found neuronal knockdown of milton reduced phosphorylated eIF2α, suggesting that ISR is reduced (Figure 5). However, we also found that global translation was reduced (Figure 6). Increased levels of eIF2β might disrupt the eIF2 complex or alter its functions. The stoichiometric mismatch caused by an imbalance of eIF2 components may inhibit ISR induction. Supporting this model, we found that eIF2β upregulation reduced the levels of p-eIF2α (Figure 7).’

      We have revised the graphical abstract and removed the eIF2 complex since its role in the loss of proteostasis caused by milton knockdown has not been elucidated yet.

      (2) The result of polysome profiling in Figure 4H is implausible. By 10%-25% sucrose density gradient, polysomes are not expected to be observed. The authors should have used a gradient with much denser sucrose, such as 10-50%.

      Thank you for pointing it out. This was a typographical error: the gradient used was 10-50%, and we apologize for the oversight. It has been corrected (Figure 6 (Figure 5 in the previous version)).

      (3) Also on the polysome profiling, as in the method section, the authors seemed to fractionate ultra-centrifuged samples from top to bottom and then measured A260 by a plate reader. In that case, the authors should have provided a line plot with individual data points, not the smoothly connected ones in the manuscript.

      Thank you for pointing it out. We revised the graph (Figure 6 (Figure 5 in the previous version)).

      (4) For both the results from polysome profiling and puromycin incorporation (Figure 4H and I), the difference between control siRNA and Milton siRNA are subtle, if not nonexistent. This might arise from the lack of spatial resolution in their experiment as the authors used head lysate for these data but the ratio of Phospho-eIF2α/eIF2α only changes in the axons, based on their results in Figure 4E-G. The authors could have attempted to capture the spatial resolution for the axonal translation to see the difference between control siRNA and Milton siRNA.

      Thank you for your comment. We agree that it would be an interesting experiment, but it will take a considerable amount of time to analyze axonal translation with spatial resolution. We will try to include such analyses in the future. For this manuscript, we revised the discussion section to include the reviewer's suggestion as below;

      Lines 355-357: ‘Further analyses to dissect the effects of milton knockdown on proteostasis and translation in the cell body and axon by experiments with spatial resolution would be needed.’

      Recommendations for the authors:

      From the Reviewing Editor:

      As the Reviewing Editor, I have read your manuscript and the associated peer reviews. I have concerns about publishing this work in its current form. I think that your manuscript cannot claim to have found a novel function of eIF2beta because of technical uncertainties and conceptual problems that should be addressed.

      Thank you so much for your review and comments. We addressed all the concerns raised by the reviewers. Point-by-point responses are listed below.

      First, your manuscript is based partly on what appears to be a mistaken understanding of the mechanistic basis of the ISR. Specifically, eIF2 is a heterotrimeric complex of alpha, beta, and gamma subunits. When eIF2a is phosphorylated, the heterotrimer adopts a new conformation. This conformation directly binds and inhibits eIF2B, the decameric GEF that exchanges the GDP bound to the gamma subunit of the eIF2 complex for GTP. Unless I misunderstood your paper, you seem to propose that decreasing levels of phospho-eIF2a will inhibit translation, but this is backward from what we know about the ISR.

      Thank you for your insightful comment, and we are sorry for the confusion. We did not mean to propose that decreasing levels of phospho-eIF2a inhibit translation. We apologize for our insufficient explanation, which might have caused a misunderstanding (Lines 312-318 in the original version). We agree with the reviewer that ‘mismatch due to elevated eIF2-beta could change the behavior of the ISR’. We revised the text in the result section as follows:

      Lines 263-268 (in the Result section) ‘Phosphorylation of eIF2α induces conformational changes in the eIF2 complex and inhibits global translation36. To analyze the effects of milton knockdown on translation, we performed polysome gradient centrifugation to examine the level of ribosome binding to mRNA. Since p-eIF2α was downregulated, we hypothesized that milton knockdown would enhance translation. However, unexpectedly, we found that milton knockdown significantly reduced the level of mRNAs associated with polysomes (Figure 6A and B).’

      Lines 374-384 (in the Discussion section): ‘eIF2β is a component of eIF2, which mediates translational regulation and ISR initiation. When ISR is activated, phosphorylated eIF2α suppresses global translation and induces translation of ATF4, which mediates transcription of autophagy-related genes39,40. Since ISR can positively regulate autophagy, we suspected that suppression of ISR underlies a reduction in autophagic protein degradation. We found neuronal knockdown of milton reduced phosphorylated eIF2α, suggesting that ISR is reduced (Figure 5). However, we also found that global translation was reduced (Figure 6). Increased levels of eIF2β might disrupt the eIF2 complex or alter its functions. The stoichiometric mismatch caused by an imbalance of eIF2 components may inhibit ISR induction. Supporting this model, we found that eIF2β upregulation reduced the levels of p-eIF2α (Figure 7).’

      It may be possible that a stoichiometric mismatch due to elevated eIF2-beta could change the behavior of the ISR, but your paper doesn't adequately address the expression levels of all three eIF2 subunits: alpha, beta, and gamma. The proteomic data shown in Fig 4B is unconvincing on its own because the changes in the beta subunit are subtle. The Western blot in Figure 4C suggests that the KD changes the mass or mobility of the beta subunit, and most importantly, there are no Western blots measuring the levels of eIF2a, eIF2a-phospho, or eIF2-gamma.

      We appreciate the reviewer’s comment and agree that the stoichiometric mismatch due to elevated eIF2β may interfere with ISR. We found overexpression of eIF2β lowered p-eIF2 alpha (Figure S2 in V1), which supports this model. We included this data in the main figure in the revised manuscript (Figure 7D) and revised the text as below:

      Lines 286-289: ‘Since milton knockdown reduced the p-eIF2α level (Figure 5E), we asked whether an increase in eIF2β affects p-eIF2α. Neuronal overexpression of eIF2β did not affect the eIF2α level but significantly decreased the p-eIF2α level (Figure 7D and E).’

      Expression data for eIF2α and eIF2γ have been extracted from the proteome analyses and included as a table (Figure 4D). Western blots of phospho-eIF2a (Figure S1 in V1) have been moved to the main figure (Figure 5B). The result section was revised as below:

      Lines 246-249: ‘As for the other subunits of eIF2 complex, proteome analysis did not detect a significant difference in the protein levels of eIF2α and eIF2γ between milton knockdown and control flies at 7 and 21 days (Figure 4D).’

      NEW TEXT: We also analyzed age-dependent changes of eIF2β by western blotting and found that eIF2β increased during aging until 49-day-old. We included this result as Figure 4G and added these sentences in the result section:

      NEW TEXT: Line 240-243: ‘We also investigated age-dependent changes in eIF2β by western blotting of control flies at 7-, 21-, 35-, 49-, and 63-day-old. eIF2β levels increased during aging until 49-day-old (Figure 4G). These results suggest that upregulation of eIF2β in milton knockdown fly brain reflects an early onset of the age-dependent increase of eIF2β levels.’

      Reviewer #1 (Recommendations For The Authors):

      L125-128: In this section, while the efficiency of Milton knockdown is referenced from a previous publication, it is necessary to also mention that the Miro knockdown has been similarly reported in the literature. Additionally, the Methods section lacks details on the Miro RNAi line used, and Table 2 does not include the genotype for Miro RNAi. This information should be included for clarity and completeness.

      Thank you for pointing it out. Knockdown efficiency with this strain has been reported (Iijima-Ando et al., PLoS Genet, 2012). We revised the text to include the citation and knockdown efficiency as follows:

      Lines 136-147: ‘There was no significant increase in ubiquitinated proteins in milton knockdown flies at 1-day old, suggesting that the accumulation of ubiquitinated proteins caused by milton knockdown is age-dependent (Figure S1). We also analyzed the effect of the neuronal knockdown of Miro, a partner of milton, on the accumulation of ubiquitin-positive proteins. Since severe knockdown of Miro in neurons causes lethality, we used UAS-Miro RNAi strain with low knockdown efficiency, whose expression driven by elav-GAL4 caused 30% reduction of Miro mRNA in head extract24. Although there was a tendency for increased ubiquitin-positive puncta in Miro knockdown brains, the difference was not significant (Figure 1B, p>0.05 between control RNAi and Miro RNAi). These data suggest that the depletion of axonal mitochondria induced by milton knockdown leads to the accumulation of ubiquitinated proteins before neurodegeneration occurs.’

      L132-L136: The current phrasing in this section suggests an increase in ubiquitinated proteins for both Milton and Miro knockdowns. However, since there is no significant difference noted for Miro, it is incorrect to state an increase in ubiquitin-positive puncta. Furthermore, combining the results of Milton knockdown to claim an increase in ubiquitinated proteins prior to neurodegeneration is misleading. At the very least, the expression here needs to be moderated to accurately reflect the findings.

      Thank you for pointing it out. We revised the text as above.

      L137-L141: Results in Figure 1 indicate that Milton knockdown leads to an increase in ubiquitinated proteins at 14 days, while Miro knockdown shows no difference from the control at either 14 or 30 days. Conversely, both the control and Miro exhibit an increase in ubiquitinated proteins with aging, but this trend does not seem to apply to Milton knockdown. This observation suggests that Milton KD may not affect the changes in protein quality control associated with aging. It implies that Milton's function might be more related to protein homeostasis in younger cells, or that changes due to aging might overshadow the effects of Milton knockdown. These interpretations should be included in the Results or Discussion sections for a more comprehensive analysis.

      NEW TEXT: Thank you for your insightful comment. As you mentioned, the accumulation of ubiquitinated proteins significantly increases only in young flies. Age-related pathways, such as immune responses, are highlighted in young milton knockdown flies but not in the aged flies. Our new result indicates that eIF2β increases during aging in control flies (included as Figure 4G in the revised manuscript), and upregulation of eIF2β in milton knockdown is only observed at a young age. These results suggest that milton knockdown does not increase the magnitude of age-dependent changes but accelerates their onset. We revised the text to include those points as follows:

      NEW TEXT: Lines 152-153: ‘These results suggest that depletion of axonal mitochondria may have more impact on proteostasis in young neurons than in old neurons.’

      NEW TEXT: Lines 359-371: ‘Our results suggest that the loss of axonal mitochondria is an event upstream of proteostasis collapse during aging. The number of puncta of ubiquitinated proteins was higher in milton knockdown at 14-day-old, but there was no significant difference at 30-day-old (Figure 1). Proteome analyses also showed that age-related pathways, such as immune responses, are enhanced in young flies with milton knockdown (Table 2). We also found that eIF2β protein levels increase in an age-dependent manner until 49-day-old and decrease after that (Figure 4G). In the brains with neuronal knockdown of milton, eIF2β levels were higher at 7-day-old than those in control and lower at the 21-day-old (Figure 4 and Supplementary table). These results suggest that milton knockdown is likely accelerating age-dependent changes rather than increasing their magnitude. Disruption of proteostasis is expected to contribute to neurodegeneration38, and it would be interesting to analyze the sequence of protein accumulation and axonal degeneration in milton knockdown (24,29 and Figure 1) in detail with higher time resolution.’

      L143 : Please remove the erroneously included quotation mark.

      Thank you for pointing it out. We corrected it.

      L145-L147:

      While it is understood that Milton knockdown results in a reduction of mitochondria in axons, as reported previously and seemingly indicated in Figure 1E, this paper repeatedly refers to axonal depletion of mitochondria. Therefore, it would be beneficial to quantitatively assess the number of mitochondria in the axonal terminals located in the lamina via electron microscopy. Such quantification would robustly reinforce the argument that mitochondrial absence in axons is a consequence of Milton knockdown.

      Thank you for pointing it out. We included quantitation of the number of mitochondria in the synaptic terminals (Figure 1E).

      The text and figure legend were revised accordingly:

      Lines 156-157: ‘As previously reported24, the number of mitochondria in presynaptic terminals decreased in milton knockdown (Figure 1E).’

      The knockdown of Milton is known to reduce mitochondrial transport from an early stage, but what about swelling? By observing swelling at 1 day and 14 days, it may be possible to confirm the onset of swelling and discuss its correlation with the accumulation of ubiquitinated proteins.

      Quantitation of axonal swelling has also been included (Figure 1F).

      We appreciate the reviewer's comments on the correlation between the accumulation of ubiquitinated proteins and axonal swelling. Axonal swelling was not observed at 3-days-old (Iijima-Ando et al., PLoS Genetics, 2012), indicating that axonal swelling is an age-dependent event. Dense materials are found in swollen axons more often than in normal axons, suggesting a positive correlation between disruption of proteostasis and axonal damage. It would be interesting to analyze the time course of events further; however, we feel it is beyond the scope of this manuscript. We revised the text to include this discussion as:

      Lines 157-160: ‘The swelling of presynaptic terminals, characterized by the enlargement and roundness, was not reported at 3-day-old24 but observed at this age with about 4% of total presynaptic terminals (Figure 1F, asterisks).’

      Lines 162-167: ‘Dense materials are rarely found in age-matched control neurons, indicating that milton knockdown induces abnormal protein accumulation in the presynaptic terminals (Figure 1G and H). In milton knockdown neurons, dense materials are found in swollen presynaptic terminals more often than in presynaptic terminals without swelling, suggesting a positive correlation between the disruption of proteostasis and axonal damage (Figure 1G).’

      Lines 369-371: ‘Disruption of proteostasis is expected to contribute to neurodegeneration38, and it would be interesting to analyze the sequence of protein accumulation and axonal degeneration in milton knockdown (24,29 and Figure 1) in detail with higher time resolution.’

      L147-L151: Though Figures 1F and 1G provide qualitative representations, it is advisable to quantitatively assess whether dense materials significantly accumulate. Such quantitative analysis would be required to verify the accumulation of dense materials in the context of the study.

      Thank you for pointing it out. We included quantitation of the number of neurons with dense material (Figure 1G). We revised the manuscript as follows:

      Line 162-164: ‘Dense materials are rarely found in age-matched control neurons, indicating that milton knockdown induces abnormal protein accumulation in the presynaptic terminals (Figure 1G and H).’

      Regarding Figure 1B, C:

      Even though the count of puncta in the whole brain appears to be fewer than 400, the magnification of the optic lobe suggests a substantial presence of puncta. Please clarify in the Methods section what constitutes a puncta and whether the quantification in the whole brain is based on a 2D or 3D analysis. Detail the methodology used for quantification.

      Thank you for your comment. We revised the method section to include more details as below:

      Lines 440-443: ‘Quantitative analysis was performed using ImageJ (National Institutes of Health) with maximum projection images derived from Z-stack images acquired with the same settings. Puncta were identified with mean intensity and area using ImageJ.’
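
      The quantification described in the quoted Methods text (maximum projection of a Z-stack, then puncta selected by mean intensity and area) can be illustrated with a small sketch. This is not the authors' ImageJ procedure: the function names, the 4-connected flood fill, and every threshold below are assumptions chosen only to make the idea concrete.

```python
from collections import deque


def max_projection(stack):
    """Collapse a Z-stack (list of equally sized 2D slices) into one 2D image
    by taking the per-pixel maximum across slices."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(s[r][c] for s in stack) for c in range(cols)] for r in range(rows)]


def count_puncta(image, pixel_threshold, min_area, max_area, min_mean_intensity):
    """Count 4-connected regions of pixels >= pixel_threshold whose area and
    mean intensity fall within the given (hypothetical) limits."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or image[r0][c0] < pixel_threshold:
                continue
            # Flood-fill one connected region of above-threshold pixels.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            values = []
            while queue:
                r, c = queue.popleft()
                values.append(image[r][c])
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc]
                            and image[nr][nc] >= pixel_threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            area = len(values)
            mean_intensity = sum(values) / area
            if min_area <= area <= max_area and mean_intensity >= min_mean_intensity:
                count += 1
    return count


# Toy two-slice stack; the intensity values are made up for illustration.
slice_a = [[0, 0, 0, 0, 0],
           [0, 9, 9, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 5]]
slice_b = [[0, 0, 0, 0, 0],
           [0, 0, 8, 0, 0],
           [0, 0, 0, 0, 0],
           [7, 0, 0, 0, 0]]
projection = max_projection([slice_a, slice_b])
n_puncta = count_puncta(projection, pixel_threshold=5, min_area=2,
                        max_area=10, min_mean_intensity=6)
```

      Because the count depends on the projection (2D rather than 3D) and on the chosen cut-offs, the thresholds actually used would need to be reported for the numbers to be comparable across studies.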

      What about 1-day-old specimens? Does Milton knockdown already show an increase in ubiquitinated protein accumulation at this early stage? Investigating whether ubiquitin-protein accumulation is involved in aging promotion or is already prevalent during developmental stages is a necessary experiment.

      Thank you for your comment. We carried out immunostaining with an anti-ubiquitin antibody in the brains at 1-day-old. No significant difference was detected between the control and milton knockdown. This result has been included as Figure S1 in the revised manuscript. The result section was revised as below:

      Line 136-139 ‘There was no significant increase in ubiquitinated proteins in milton knockdown flies at 1-day old, suggesting that the accumulation of ubiquitinated proteins caused by milton knockdown is age-dependent (Figure S1).’

      For Figure 1E: In the Electron Microscopy section of the Methods, define how swollen axons were identified and describe the quantification methodology used.

      Thank you for your comment. Swollen axons are, unlike normal axons, round in shape and enlarged. We revised the text as below;

      Lines 157-160: ‘The swelling of presynaptic terminals, characterized by the enlargement and roundness, was not reported at 3-day-old24 but observed at this age with about 4% of total presynaptic terminals (Figure 1F, asterisks).’

      Lines 689-691, Figure 1 legend: ‘Swollen presynaptic terminals (asterisks in (F)), characterized by the enlargement and higher circularity, were found more frequently in milton knockdown neurons.’

      L218-L219: Throughout the text, the expression 'eIF2β is "upregulated" in response to Milton knockdown' is frequently used. However, considering the presented results, it might be more accurate to interpret that under the condition of Milton knockdown, eIF2β is not undergoing degradation but rather remains stable.

      Thank you for pointing it out. We replaced ‘upregulated’ with ‘increased’ throughout the text.

      L234-L235: On what basis is the conclusion drawn that there is a reduction? Given that three experiments have been conducted, it would be possible and more convincing to quantify the results to determine if there is a significant decrease.

      Thank you for pointing it out. We quantified the AUC of polysome fraction and carried out a statistical analysis. There is a significant decrease in polysome in milton knockdown, and this result has been included in Figure 5B. We revised the figure and the legend accordingly.
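
      As a rough illustration of what quantifying the area under the curve (AUC) of the polysome region can look like, the sketch below integrates A260 readings over fraction number with the trapezoidal rule. It is an assumption-laden example, not the authors' analysis: the function names, the fraction at which the polysome region is taken to begin, and the readings are all hypothetical.

```python
def trapezoid_auc(x, y):
    """Area under the piecewise-linear curve through the points (x[i], y[i])."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two matching (x, y) points")
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))


def polysome_auc(fractions, a260, polysome_start):
    """AUC restricted to fractions at or beyond polysome_start.

    polysome_start is a hypothetical boundary; in practice it would be chosen
    from the position of the monosome peak in each profile.
    """
    kept = [(f, a) for f, a in zip(fractions, a260) if f >= polysome_start]
    xs = [f for f, _ in kept]
    ys = [a for _, a in kept]
    return trapezoid_auc(xs, ys)


# Made-up plate-reader values, one A260 reading per collected fraction.
fractions = [1, 2, 3, 4, 5]
a260 = [1.0, 2.0, 1.0, 0.5, 0.5]

poly = polysome_auc(fractions, a260, polysome_start=3)
total = trapezoid_auc(fractions, a260)
polysome_share = poly / total  # polysome AUC as a share of the whole profile
```

      Computing one such value per biological replicate gives the per-sample numbers that a two-group comparison between control and knockdown would then be run on.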

      L236: 5H-> 4H

      Thank you for pointing it out, and we are sorry for the confusion. We corrected it.

      L238-L239: Since there is no significant difference observed, it may not be accurate to interpret a reduction in puromycin incorporation.

      Thank you for pointing it out. As described above, quantification of polysome fractions showed that milton knockdown significantly reduced polysome (Figure 6B (Figure 5B in the previous version)). We revised the manuscript as below;

      Lines 267-268: ‘However, unexpectedly, we found that milton knockdown significantly reduced the level of mRNAs associated with polysomes (Figure 6A and B).’

      Figure 5D and Figure 6D: Climbing assays have been conducted, but I believe experiments should also be performed to examine whether overexpression or heterozygous mutants of eIF2β induce or suppress degeneration.

      Thank you for pointing it out. We analyzed the eyes with eIF2β overexpression for neurodegeneration. Although there was a tendency of elevated neurodegeneration in the retina with eIF2β overexpression, the difference between control and eIF2β overexpression did not reach statistical significance (Figure S2). This result has been included as Figure S2 in the revised manuscript, and the following sentences have been included in the text:

      Lines 292-297: ‘We asked if eIF2β overexpression causes neurodegeneration, as depletion of axonal mitochondria in the photoreceptor neurons causes axon degeneration in an age-dependent manner24. eIF2β overexpression in photoreceptor neurons tends to increase neurodegeneration in aged flies, while it was not statistically significant (p>0.05, Figure S2).’

      L271-L272: The results in Figure 6B are surprising. I anticipated a greater increase compared to the Milton knockdown alone. While p62 appears to be reduced, it is not clear why these results lead to the conclusion that lowering eIF2β rescues autophagic impairment. Please add a discussion section to address this point.

      Thank you for pointing it out. We apologize for the unclear description of the result. Milton knockdown flies show p62 accumulation (Figure 2), and deleting one copy of eIF2beta in milton knockdown background reduced p62 accumulation (Figure 8C (Figure 7C in the previous version)). We revised the text as below:

      Lines 311-319: ‘Neuronal knockdown of milton causes accumulation of autophagic substrate p62 in the Triton X-100-soluble fraction (Figure 2B), and we tested if lowering eIF2β ameliorates it. We found that eIF2β heterozygosity caused a mild increase in LC3-I levels and decreases in LC3-II levels, resulting in a significantly lower LC3-II/LC3-I ratio in milton knockdown flies (Figure 8B). eIF2β heterozygosity decreased the p62 level in the Triton X-100-soluble fraction in the brains of milton knockdown flies (Figure 8C). The p62 level in the SDS-soluble fraction, which is not sensitive to milton knockdown (Figure 2B), was not affected (Figure 8C). These results suggest that suppression of eIF2β ameliorates the impairment of autophagy caused by milton knockdown.’

      L369: Please specify the source of the anti-ubiquitin antibody used.

      Thank you for pointing it out. We included the antibody information in the method section.

      Figure 7: While the relationship between Milton knockdown and the eIF2β and eIF2α proteins has been elucidated through the authors' efforts, I would like to see an investigation into whether eIF2β is upregulated and eIF2α phosphorylation is reduced in simply aged Drosophila. This would help us understand the correlation between aging and eIF2 protein dynamics.

      Thank you for your comment. We agree that it is an important question, and we are working on it. However, we feel that it is beyond the scope of the current manuscript.

      L645-L646: If the mushroom body is identified using mito-GFP, then include mito-GFP in the genotype listed in Supplementary Table 2.

      We are sorry for the oversight. We corrected it in Supplementary Table 2.

      Additionally, while it is presumed that the mito-GFP signal decreases in axons with Milton RNAi, how was the lobe tips area accurately selected for analysis? Please include these details along with a comprehensive description of the quantification methodology in the Methods section.

      Thank you for your comment. Although the mito-GFP signal in the axon is weak in the milton knockdown neurons, it is sufficient to distinguish the mushroom body structure from the background. We revised the method section to include this information:

      Line 443-447: ‘For eIF2α and p-eIF2α immunostaining, the mushroom body was detected by mitoGFP expression.’

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      I am impressed with the thoroughness with which the authors addressed my concerns. I don't have any further concerns and think that this paper makes an interesting and significant contribution to our understanding of VWM. I would only suggest adding citations to the newly added paragraph where the authors state "It could be argued that preparatory attention relies on the same mechanisms as working memory maintenance." They could cite work by Bettencourt and Xu, 2016; and Sheremata, Somers, and Shomstein (2018).

      We thank the reviewer for the positive feedback. We have now cited the referenced work in the manuscript (Page. 19, Line 371).

      Reviewer #2 (Public review):

      Overall, I think that the authors' revision has addressed most, if not all, of my major concerns noted in my previous comments. The results appear convincing and I do not have additional comments.

      We thank the reviewer for the positive feedback and are pleased that the revision addressed the major concerns.

      Reviewer #3 (Public review):

      (1) The authors addressed most of my previous concerns and provided additional data analysis. They conducted further analyses to demonstrate that the observed changes in network communication are associated with behavioral RTs, supporting the idea that the impulse-driven sensory-like template enhances informational connectivity between sensory and frontoparietal areas, and relates to behavior.

      We are pleased that the revision addressed the major concerns.

      (2) I would like to further clarify my previous points regarding the definition of the two types of templates and the evidence for their coexistence. The authors stated that the sensory-like template likely existed in a latent state and was reactivated by visual pings, proposing that sensory and non-sensory templates coexist. However, it remains unclear whether this reflects a dynamic switch between formats or true coexistence. If the templates are non-sensory in nature, what exactly do they represent? Are they meant to be abstract or conceptual representations, or, put simply, just "top-down attentional information"? If so, why did the generalization analyses (training classifiers on activity during the stimulus selection period and testing on preparatory activity) fail to yield significant results? While the stimulus selection period necessarily encodes both target and distractor information, it should still contain attentional information. I would appreciate more discussion from this perspective.

      We thank the reviewer for the helpful clarification of previous comments. Since we addressed similar comments from Reviewer 2 (Point 2) in the previous round, our response below may appear somewhat repetitive. First, regarding whether our findings reflect a dynamic switch between non-sensory and sensory-like templates, or the ‘coexistence’ of two template formats, we acknowledge that the temporal limitations of fMRI prevent us from directly testing dynamic representations. However, several aspects of our data favor the latter interpretation: (1) our key findings remained consistent in the subset of participants (N=14) who completed both No-Ping and Ping sessions in counterbalanced order. This makes it unlikely that participants systematically switched cognitive strategies (e.g., using non-sensory templates in the No-Ping session versus sensory-like templates in the Ping session) in response to the task-irrelevant, uninformative visual impulse; (2) while we agree that the temporal dynamics between the two templates remain unclear, it is difficult to imagine that orientation-specific templates observed in the Ping session emerged de novo from purely non-sensory templates and an exogenous ping. In other words, if there is no orientation information at all to begin with, how does it come into being from an orientation-less external ping? A more parsimonious explanation is that orientation information was already present in a latent format and was activated by the ping, in line with the models of “activity-silent” working memory. However, since the detailed circuit-level mechanism underlying such reactivation remains unclear, we acknowledge that this interpretation warrants direct investigation in future studies. This point is discussed in the main text (Page 19-20, Line 389-402).

      Second, while our data cannot definitively determine the nature of the non-sensory template, we consider categorical coding a plausible candidate based on prior visual search studies. For instance, categorical attributes (e.g., left-tilted vs. right-tilted) have been shown to effectively guide attention in orientation search tasks (Wolfe et al., 1992), similar to our paradigm. Further, categorical templates are more tolerant of stimulus variability, making them well-suited to our task, which involved trial-by-trial variations in target orientation around a reference (see Page 21, Line 427-437 for more detailed discussions).

      Third, the lack of generalization from stimulus selection to preparatory attention in the Ping session may relate to the limited overlap in shared information between these two periods. Neural activity during stimulus selection encodes sensory information about both orientations, along with sensory-like attentional signals (as indicated by the attention decoding and cross-task generalization from perception task to the stimulus-selection period). In contrast, preparatory activity likely involves a dominant non-sensory template, a latent sensory-like template, and residual sensory effects from the impulse stimulus. The limited overlap in sensory-like attentional signals may therefore be insufficient to support generalization across the two periods.

      Reviewer #2 (Recommendations for the authors)

      I think the central prediction of greater pattern similarity between 'attend leftward' and 'perceived leftward' in the ping session in comparison to the no-ping session (the same also holds for 'attend rightward' and 'perceived rightward') could be directly examined by a two-way ANOVA (session × whether the attended orientation is the same as or different from the perceived orientation) for each ROI (V1 and EVC). A three-way ANOVA might complicate readers' intuitive understanding of the implications of the statistical results.

      We thank the reviewer for the suggestion. Following the reviewer’s suggestion, we defined a new condition label based on orientation consistency between attended and perceived orientations: (1) same orientation: averaging “attend leftward/perceive leftward” and “attend rightward/perceive rightward”; and (2) different orientation: averaging “attend leftward/perceive rightward” and “attend rightward/perceive leftward”. A two-way mixed ANOVA (session × orientation consistency) on Mahalanobis distance revealed a main effect of orientation consistency in V1 (F(1,38) = 4.21, p = 0.047, η<sub>p</sub><sup>2</sup> = 0.100), indicating that activity patterns were more similar when attended and perceived orientations matched. No significant main effect of session was found (p = 0.923). Importantly, a significant interaction was found in V1 (F(1,38) = 5.00, p = 0.031, η<sub>p</sub><sup>2</sup> = 0.116), suggesting that visual impulse enhanced the similarity between preparatory attentional template and the perception of corresponding orientation. In EVC, the same analysis revealed only a main effect of orientation consistency (F(1,38) = 5.87, p = 0.020, η<sub>p</sub><sup>2</sup> = 0.134), with no significant other effects (ps > 0.240). The interaction results were consistent with those reported in the original three-way ANOVA. We have now replaced the previous analysis with the new one in the main texts (Page 11-12, Line 231-242).
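
      As a reader's cross-check on the effect sizes quoted above, partial eta squared can be recovered from an F statistic and its degrees of freedom using the standard identity η<sub>p</sub><sup>2</sup> = (F × df<sub>effect</sub>) / (F × df<sub>effect</sub> + df<sub>error</sub>). The sketch below is not the authors' analysis code; the function name and the dictionary labels are illustrative assumptions, and only the F values, degrees of freedom, and reported effect sizes are taken from the response.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Values copied from the response above: (F, df_effect, df_error, reported effect size).
# The labels are hypothetical names chosen for this illustration.
reported = {
    "V1 main effect of orientation consistency": (4.21, 1, 38, 0.100),
    "V1 session x consistency interaction": (5.00, 1, 38, 0.116),
    "EVC main effect of orientation consistency": (5.87, 1, 38, 0.134),
}

for label, (f_value, df_effect, df_error, eta_reported) in reported.items():
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: computed {eta:.3f}, reported {eta_reported:.3f}")
```

      Under this identity, the three recomputed values agree with the reported effect sizes to three decimal places.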

    1. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Yamazaki et al. conducted multiple microscopy-based GFP localization screens, from which they identified proteins that are associated with PM/cell wall damage stress response. Specifically, the authors identified that bud-localized TMD-containing proteins and endocytotic proteins are associated with PM damage stress. The authors further demonstrated that polarized exocytosis and CME are temporally coupled in response to PM damage, and CME is required for polarized exocytosis and the targeting of TMD-containing proteins to the damage site. From these results, the authors proposed a model that CME delivers TMD-containing repair proteins between the bud tip and the damage site.

      Strengths:

      Overall, this is a well-written manuscript, and the experiments are well-conducted. The authors identified many repair proteins and revealed the temporal coordination of different categories of repair proteins. Furthermore, the authors demonstrated that CME is required for targeting of repair proteins to the damage site, as well as cellular survival in response to stress related to PM/cell wall damage. Although the roles of CME and bud-localized proteins in damage repair are not completely new to the field, this work does have conceptual advances by identifying novel repair proteins and proposing the intriguing model that the repairing cargoes are shuttled between the bud tip and the damaged site through coupled exocytosis and endocytosis.

      Weaknesses:

      While the results presented in this manuscript are convincing, they might not be sufficient to support some of the authors' claims. Especially in the last two results sections, the authors claimed CME delivers TMD-containing repair proteins from the bud tip to the damage site. The model is no doubt highly possible based on the data, but caveats still exist. For example, the repair proteins might not be transported from one localization to another localization, but are degraded and resynthesized. Although the Gal-induced expression system can further support the model to some extent, I think more direct verification (such as FLIP or photo-convertible fluorescence tags to distinguish between pre-existing and newly synthesized proteins) would significantly improve the strength of evidence.

      Major experiment suggestions:

      (1) The authors may want to provide more direct evidence for "protein shuttling" and for excluding the possibility that proteins at the bud are degraded and synthesized de novo near the damage site. For example, if the authors could use FLIP to bleach bud-localized fluorescent proteins, and the damaged site does not show fluorescent proteins upon laser damage, this will strongly support the authors' model. Alternatively, the authors could use photo-convertible tags (e.g., Dendra) to differentiate between pre-existing repair proteins and newly synthesized proteins.

      (2) In line with point 1, the authors used Gal-inducible expression, which supported their model. However, the authors may need to show protein abundance in galactose, glucose, and upon PM damage. Western blot would be ideal to show the level of full-length proteins, or whole-cell fluorescence quantification can also roughly indicate the protein abundance. Otherwise, we cannot assume that the tagged proteins are only expressed when the cells are growing in galactose-containing media.

      (3) Similarly, for Myo2 and Exo70 localization in CME mutants (Figure 4), it might be worth doing a western or whole-cell fluorescence quantification to exclude the caveat that CME deficiency might affect protein abundance or synthesis.

      (4) From the authors' model in Figure 7, it looks like the repair proteins contribute to bud growth. Does laser damage to the mother cell prevent bud growth due to the reduction of TMD-containing repair proteins at the bud? If the authors could provide evidence for that, it would further support the model.

      (5) Is the PM repair cell-cycle-dependent? For example, would the recruitment of repair proteins to the damage site be impaired when the cells are under alpha-factor arrest?

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The small conductance calcium-activated potassium channel 2 (SK2) is an important drug target for treating neurological and cardiovascular diseases. However, structural information on this subtype of SK channels has been lacking, and it has been difficult to draw conclusions about activator and inhibitor binding and action in the absence of structural information.

      Here the authors set out to (1) determine the structure of the transmembrane regions of a mammalian SK2 channel, (2) determine the binding site of apamin, a historically important SK2 inhibitor whose mode of action is unclear, and (3) use the structural information to generate a novel set of activators/inhibitors that selectively target SK2.

      The authors largely achieved all the proposed goals, and they present their data clearly.

      Unable to solve the structure of the human SK2 due to excessive heterogeneity in its cytoplasmic regions, the authors create a chimeric construct using SK4, whose structure was previously solved, and use it for structural studies. The data reveal a unique extracellular structure formed by the S2-S3 loop, which appears to directly interact with the selectivity filter and modulate its conductivity. Structures of SK2 in the absence and presence of the activating Ca2+ ions both possess non-K+-selective/conductive selectivity filters, where only sites 3 and 4 are preserved. The S6 gates are captured in closed and open states, respectively. Apamin binds to the S2-S3 loop, and unexpectedly, induces a K+ selective/conductive conformation of the selectivity filter while closing the S6 gate.

      Through high-throughput screening of small compound libraries and compound optimization, the group identified a reasonably selective inhibitor and a related compound that acts as an activator. The characterization shows that these compounds bind in a novel binding site. Interestingly, the inhibitor, despite binding in a site different from that of apamin, also induces a K+ selective/conductive conformation of the selectivity filter while the activator induces a non-K+ selective/conductive conformation and an open S6 gate.

      The data suggest that the selectivity filter and the S6 gate are rarely open at the same time, and the authors hypothesize that this might be the underlying reason for the small conductance of SK2. The data will be valuable for understanding the mechanism of SK2 channel (and other SK subtypes).

      Overall, the data is of good quality and supports the claims made by the authors. However, a deeper analysis of the cryo-EM data sets might yield some important insights, i.e., about the relationship between the conformation of the selectivity filter and the opening of the S6 gate.

      We attempted focused 3D classification to identify subsets of particles with the S6 open and the SF in a conductive state but were not able to isolate such a particle class. This indicates that either none or a very small percentage of particles exists in a fully conductive state. This sentence was included in the results section: 

      “Focused 3D classification of the S3-S4 linker was unsuccessful in identifying particles subsets with a dilated extracellular constriction suggesting that either none or a very small percentage of Ca<sup>2+</sup>-bound SK2-4 is in a conductive state”

      Some insight and discussion about the allosteric networks between the SF and the S6 gate would also be a valuable addition.

      The extracellular constriction is in the same non-conductive conformation in the Ca<sup>2+</sup> bound and Ca<sup>2+</sup> -free SK2-4 structures suggesting that the conformation of S3-S4 linker/SF and the S6 are not allosterically coupled. We predict that Ca<sup>2+</sup> opens the intracellular gate and another physiological factor (not yet identified) promotes extracellular gate opening. These sentences were added to the results and discussion: “This along with the similar conformation of the S3-S4 linker in the Ca<sup>2+</sup> -bound and Ca<sup>2+</sup> -free states of SK2-4 suggest that Ca<sup>2+</sup> -dependent intracellular gate dynamics are not coupled to the conformation of the S3-S4 linker. Other yet to be identified physiological factors may be required to dilate the extracellular constriction.”

      “Alternatively, other physiological factors, such as PIP2[46,47] or protein-protein interactions[48-50], may exist in live cells that modulate the interaction between S3-S4 linker and the selectivity filter.”

      Reviewer #2 (Public review):

      Summary:

      The authors have used single-particle cryoEM imaging to determine how small-molecule regulators of the SK channel interact with it and modulate their function.

      Strengths:

      The reconstructions are of high quality, and the structural details are well described.

      Weaknesses:

      The electrophysiological data are poorly described. Several details of the structural observations require a mechanistic context, perhaps better relating them to what is known about SK channels or other K channel gating dynamics.

      As recommended, additional details for electrophysiological data were added to the results, methods, and figure legends for clarification.  

      The most pressing point I have to make, which could help improve the manuscript, relates to the selectivity filter (SF) conformation. Whether the two ion-bound state of SK2-4 (Figure 4A) represents a non-selective, conductive SF occluded by F243 or represents a C-type inactivated SF, further occluded by F243, is unclear. It would be important to discuss this. Reconstructions of Kv1.3 channels also feature a similar configuration, which has been correlated to its accelerated C-type inactivation.

      Structural overlays of Ca<sup>2+</sup> bound SK2-4, HCN, and C-type inactivated Kv1.3 selectivity filters demonstrate that each has conformational differences and it is difficult to definitively determine if the SK2-4 selectivity filter is in a non-selective conformation like HCN or a C-type inactivated conformation like Kv1.3. Based on the number of ions observed in the filter and the position of Tyr361 we believe the selectivity filter most closely resembles that of HCN. Importantly, the selectivity filter conformation observed in the SK2-4 Ca<sup>2+</sup> -bound and Ca<sup>2+</sup> -free structures is ultimately nonconductive due to the Phe243 extracellular constriction blocking K<sup>+</sup> efflux.

      A comparison of the SK2-4 selectivity filter to HCN and C-type inactivated Kv1.3 was included in Figure 4 and this sentence was included in the results section:

      “The selectivity filter of SK2-4 resembles that of HCN in both the position of Tyr361 and the number of K<sup>+</sup> coordination sites (Fig 4E,F,G,H)”

      Furthermore, binding of a toxin derivative to Kv1.3 restores the SF into a conductive form, though occluded by the toxin. It appears that apamin binding to SK2-4 might be doing something similar. Although I am not sure whether SK channels undergo C-type inactivation like gating, classical MTS accessibility studies have suggested that dynamics of the SF might play a role in the gating of SK channels. It would be really useful (if not essential) to discuss the SF dynamics observed in the study and relate them better to aspects of gating reported in the literature.

      Extracellular toxin binding to SK2-4 and K<sub>v</sub>1.3 induces a conformational change in the selectivity filter to produce a canonical K<sup>+</sup> selective structure with four coordination sites. However, the mechanism by which the toxins produce the conformational change is different. For SK2-4, apamin interacts primarily with S3-S4 linker residues and induces a shift in the S3-S4 linker away from the pore axis. This in turn prevents the hydrogen bonds between Arg240 and Tyr245 of the S3-S4 linker and Asp363 at the C-terminus of the selectivity filter to produce a selectivity filter conformation with four K<sup>+</sup> coordination sites. For K<sub>v</sub>1.3, the sea anemone toxin ShK binds directly to the C-terminus of the selectivity filter disrupting interactions required for the C-type inactivated structure and thereby inducing the conformational change. These sentences were added to the results:

      “Toxin induced selectivity filter conformational change has also been reported for K<sub>v</sub>1.3 with the sea anemone toxin ShK. However, unlike apamin binding to SK2-4, ShK binds directly to the K<sub>v</sub>1.3 selectivity filter to convert a C-type inactivated conformation to a canonical K<sup>+</sup> selective structure with four coordination sites [39,40]. The change in selectivity filter conformation in apamin-bound SK2-4 seems to be driven instead by the weakening of interactions between the selectivity filter and the S3-S4 linker.”

      The SF of K channels, in conductive states, are usually stabilized by an H-bond network involving water molecules bridged to residues behind the SF (D363 in the down-flipped conformation and Y361). Considering the high quality of the reconstructions, I would suspect that the authors might observe speckles of density (possibly in their sharpened map) at these sites, which overlap with water molecules identified in high-resolution X-ray structures of KcsA, MthK, NaK, NaK2K, etc. It could be useful to inspect this region of the density map.

      We did not observe strong density near Y361 or D363 that could be confidently modeled as water. However, in the structures of SK2-4 bound to apamin and compound 1, Tyr361 in the selectivity filter rotates 180° and forms a hydrogen bond with Thr355 in the pore helix. The homologous hydrogen bond is also observed in SK4 and the conductive/K<sup>+</sup> selective selectivity filter conformation of Kv1.3. The rotation of Tyr361 to form a hydrogen bond with Thr355, reorientation of Asp363 and Trp350 into hydrogen bonding position, and the presence of four K<sup>+</sup> coordination sites upon binding of apamin and compound 1 strongly suggest that the selectivity filter is in a K<sup>+</sup> selective/conductive conformation. The Tyr361/Thr355 hydrogen bond is now described in the paper and shown in Figures 4D, 5D, and S6F.

      Reviewer #3 (Public review):

      This is a fundamentally important study presenting cryo-EM structures of a human small conductance calcium-activated potassium (SK2) channel in the absence and presence of calcium, or with interesting pharmacological probes bound, including the bee toxin apamin, a small molecule inhibitor, and a small molecule activator. As efforts to solve structures of the wild-type hSK2 channel were unsuccessful, the authors engineered a chimera containing the intracellular domain of the SK4 channel, the subtype of SK channel that was successfully solved in a previous study (reference 13). The authors present many new and exciting findings, including opening of an internal gate (similar to SK4), for the first time resolving the S3-S4 linker sitting atop the outer vestibule of the pore and unanticipated plasticity of the ion selectivity filter, and the binding sites for apamin, one new small molecule inhibitor and another small molecule activator. Appropriate functional data are provided to frame interpretations arising from the structures of the chimeric protein; the data are compelling, the interpretations are sound, and the writing is clear. This high-quality study will be of interest to membrane protein structural biologists, ion channel biophysicists, and chemical biologists, and will be valuable for future drug development targeting SK channels.

      The following are suggestions for strengthening an already very strong and solid manuscript:

      (1) It would be good to include some information in the text of the results section about the method and configuration used to obtain electrophysiological data and the limitations. It is not until later in the text that the Qube instrument is mentioned in the results section, and it is not until the methods section that the reader learns it was used to obtain all the electrophysiological data. Even there, it is not explicitly mentioned that a series of different internal solutions were used in each cell where the free calcium concentration was varied to obtain the data in Figure 1C. Also, please state the concentration of free calcium for the data in Figure 1B.

      As recommended, additional details for electrophysiological data were added to the results, methods, and figure legends for clarification.  

      (2) The authors do a nice job of discussing the conformations of the selectivity filter they observed here in SK as they relate to previous work on NaK and HCN, but from my perspective the authors are missing an opportunity to point out even more striking relationships with slow C-type inactivation of the selectivity filter in Shaker and Kv1 channels. C-type inactivation of the filter in Shaker was seen in 150 mM K using the W434F mutant (PMC8932672) or in 4 mM K for the WT channel (PMC8932672), and similar results have been reported for Kv1.2 (PMC9032944; PMC11825129) and for Kv1.3 (PMC9253088; PMC8812516) channels. For Kv1.3, C-type inactivation occurs even in 150 mM K (PMC9253088; PMC8812516). Not unlike what is seen here with apamin, binding of the sea anemone toxin (ShK) with a Fab attached (or the related dalazatide) inserts a Lys into the selectivity filter and stabilizes the conducting conformation of Kv1.3 even though the Lys depletes occupancy of S1 by potassium (PMC9253088; PMC8812516). Or might the conformation of the filter be controlled by regulatory processes in SK2 channels? I think connecting the dots here would enhance the impact of this study, even if it remains relatively speculative.

      Please see the response to reviewer 2’s comments for a comparison of the selectivity filter structure between SK2-4 and C-type inactivated K<sub>v</sub>1.3 and a discussion of toxin induced selectivity filter conformational change.

      What is known about how the functional properties of SK2 channels (where the filter changes conformation) differ from SK4, where the filter remains conducting (reference 13)? Is there any evidence that SK2 channels inactivate?

      Compared with SK4, SK2 has some unique properties such as lower conductance and the ability to switch between low- and high-open probability states. Mutation of Phe243 suggests that the S3-S4 linker conformation contributes to the low conductance. This is included in the discussion.

      “Such a mechanism may explain some properties of SK2 that are not observed in SK4, which lacks an S3-S4 linker, such as its low conductance (~10 pS) and the ability to switch between low- and high-open probability states[3,4]. Indeed, mutation of Phe243 in rat SK2 produced a 2-fold increase in channel conductance[5].”

      Or might the conformation of the filter be controlled by regulatory processes in SK2 channels? I think connecting the dots here would enhance the impact of this study, even if it remains relatively speculative.

      Please see the response to reviewer 1’s comments for a discussion of the potential physiological role of the S3-S4 linker/extracellular constriction and its mechanism for opening.

      Reviewer #1 (Recommendations for the authors):

      I enjoyed reading your paper and am intrigued by your findings on the selectivity filter of SK2. I've got a few recommendations for data analysis and a couple of questions that might contribute to the discussion.

      In your Ca2+-bound dataset, have you tried to parse out any alternative conformations (e.g., by using 3D classification, or 3D variability)? Do you think there might be a small(er) population of particles that adopt a fully open conformation? If you haven't done this already, I would recommend doing so. You have a rather large number of particles in your final 3D reconstruction (~660k), so there might be some hidden conformations that could contribute to our understanding of the system.

      I would recommend doing the same for your compound 4-bound data set.

      Please see above for response to this recommendation.

      Do you think apamin works solely as a pore blocker, or does its binding perhaps also affect the S6 gate via allosteric networks (perhaps the same ones that induce the formation of the K+ conductive SF through binding of compound 1 above the S6 gate)?

      Apamin binding does not change the conformation of the pore helices (S5 or S6) and thus we believe it acts primarily as a pore blocker. The following was added to the results section:

      “Overall, the apamin-bound SK2-4/CaM structure resembles Ca<sup>2+</sup>-bound SK2-4. The N-terminal lobe of CaM engages with the S<sub>45</sub>A helix, the S5 and S6 helices adopt a similar conformation, and the intracellular gate Val390 is open with a radius of 3.5 Å (Fig 2D). The most significant conformational change is in the position of the S3-S4 linker, which shifts ~2 Å away from the pore axis to accommodate apamin binding.”

      Is there a mechanistic explanation for why it might be difficult/energetically costly for the SF to be conductive and the S6 gate to be open at the same time?

      Not to our knowledge.

      I also have these minor recommendations:

      -In all figures showing density, include the threshold/sigma value at which density is shown.

      -For all ligands and ions, include half-map data.

      Sigma values were added to all figure legends displaying cryoEM density. The displayed maps are the sharpened full maps.

      Reviewer #2 (Recommendations for the authors):

      Is it possible to provide a structure-sequence guided explanation for the different affinity of compound 1 for SK2 vs SK4?

      Yes. The following is now included in the results section and a panel was added to Figure S6D.

      “However, for SK4 Thr212 replaces SK2 Ser318 and Trp216 (homologous to SK2 Trp322) is conserved but adopts a different rotamer conformation (Fig S6D). Both changes occlude the compound 1 binding site in SK4 and would likely reduce compound 1 potency on SK4 as observed in the functional data.”

      Is it possible to propose a model of modulation by compound 1/4 where the authors can comment on the conformational dependence of compound binding? That is, do they bind exclusively to the identified conformational states of the channel, or are they able to bind to both closed and open channels, but bias one state over the other?

      The clash between compound 1 and Thr386 in the open conformation of the S6 helices suggests that compound 1 would preferentially bind to the closed state of SK2. Similarly, the clash between compound 4 and Ile380 in the closed conformation of the S6 helices suggests that compound 4 would preferentially bind to the open state of SK2. This was included in the discussion:

      “This proposed mechanism of modulation suggests that compound 1 may bind preferentially to the closed conformation of the S6 helices and compound 4 may bind preferentially to the open conformation of the S6 helices.” 

      Please provide the calcium concentration used to generate the data in Figure 1B.

      The calcium concentration is now stated in the legend for Fig 1B:

      “Intracellular solution contains 2 µM Ca<sup>2+</sup> based on calculation using Maxchelator (see methods)”

      Essential and critically important descriptions of experiments in Figure 7A are lacking. It would be essential to describe properly, with care, what the currents and the conditions of measurements are. If these currents are obtained by subtracting leak currents by adding other drugs, it would be good to comment on whether the latter compete with compounds 1/4.

      As recommended, additional details for electrophysiological data were added to the results, methods, and figure legends for clarification. SK currents were obtained by subtracting leak currents by adding UCL1684 only at the end of experiments. UCL1684 is not expected to interfere with the effect of compound 1 or 4 given different binding sites and mechanisms.

      If Compound 1 changes the structure of the SF (Figure 6F), would it also promote apamin binding? Given that both these agents produce a similar change in the SF, could each favor the binding of the other?

      Since apamin binds to the S3-S4 linker it is unlikely that the selectivity filter conformational change observed in the compound 1 bound structure would affect apamin binding.

    1. These temporary limitations will pass. The physics engines that underpin VR are improving. In years to come, the headsets will get smaller, and we will transition to glasses, contact lenses, and eventually retinal or brain implants. The resolution will get better, until a virtual world looks exactly like a nonvirtual world. We will figure out how to handle touch, smell, and taste. We may spend much of our lives in these environments, whether for work, socializing, or entertainment.

      It's so crazy to me how much VR can and will change the world. I think that it's really cool to use as a fun game or activity, but I do not think that it should be incorporated into everyday life. I feel as though it's going to make the world into such a fake environment and ruin true sociability and connection.

    2. Reality exists, independently of us. The truth matters. There are truths about reality, and we can try to find them. Even in an age of multiple realities, I still believe in objective reality.

      I find it interesting to say that reality exists independently of us. I mean, everyone lives a completely different life, and we tend to forget that. This can also be referred to as sonder. I think sonder can also be applied to the question of whether virtual reality is reality and where the truths are within reality. If reality is just within our minds, how does one go about trying to find out what is true? Just by living? I think we can create certain realities in our brains that may or may not come true.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      Summary:

      It seems as if the main point of the paper is about the new data related to ratfish, although your title describes it as extant cartilaginous fishes and you bounce around between the little skate and ratfish. So here's an opportunity for you to adjust the title to emphasize ratfish, given that you later describe how this is your significant new data contribution. Either way, the organization of the paper can be adjusted so that the reader can follow along in the same order for all sections, so that it's very clear for comparative purposes what the new data are and what they mean. My opinion is that I want to read, for each subheading in the results, about the ratfish first because this is your most interesting novel data. Then I want to know any confirmation about morphology in little skate. And then I want to know about any gaps you fill with the cat shark. (It is ok if you keep the order of "skate, ratfish, then shark", but I think it undersells the new data.)

      The main points of the paper are 1) to define terms for chondrichthyan skeletal features in order to unify research questions in the field, and 2) add novel data on how these features might be distributed among chondrichthyan clades. However, we agree with the reviewer that many readers might be more interested in the ratfish data, so we have adjusted the order of presentation to emphasize ratfish throughout the manuscript.

      Strengths:

      The imagery and new data availability for ratfish are valuable and may help to determine new phylogenetically informative characters for understanding the evolution of cartilaginous fishes. You also allude to the fossil record.

      Thank you for the nice feedback.

      Opportunities:

      I am concerned about the statement of ratfish paedomorphism because stage 32 and 33 were not statistically significantly different from one another (figure and prior sentences). So, these ratfish TMDs overlap the range of both 32 and 33. I think you need more specimens and stages to state this definitely based on TMD. What else leads you to think these are paedomorphic? Right now they are different, but it's unclear why. You need more outgroups.

      Sorry, but we had reported that the TMD of centra from little skate did significantly increase between stage 32 and 33. Supporting our argument that ratfish had features of little skate embryos, TMD of adult ratfish centra was significantly lower than TMD of adult skate centra (Fig1).  Also, it was significantly higher than stage 33 skate centra, but it was statistically indistinguishable from that of stage 33 and juvenile stages of skate centra.  While we do agree that more samples from these and additional groups would bolster these data, we feel they are sufficiently powered to support our conclusions for this current paper.

      Your headings for the results subsection and figures are nice snapshots of your interpretations of the results and I think they would be better repurposed in your abstract, which needs more depth.

      We have included more of the data summarized in the Results sub-headings in the abstract, as suggested (lines 32-37).

      Historical literature is more abundant than what you've listed. Your first sentence describes a long fascination and only goes back to 1990. But there are authors that have had this fascination for centuries and so I think you'll benefit from looking back. Especially because several of them have looked into histology and development of these fishes.

      I agree that in the past 15 years or so a lot more work has been done because it can be done using newer technologies and I don't think your list is exhaustive. You need to expand this list and history, which will help with your ultimate comparative analysis without you needing to sample too many new data yourself.

      We have added additional recent and older references: Kölliker, 1860; Daniel, 1934; Wurmbach, 1932; Liem, 2001; Arratia et al., 2001.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters illustrated in figure 7 and the body of the text.

      We address a similar comment from this reviewer in more detail below, hoping that any concerns about continuity have been addressed with inclusion of a summary of proposed characters in a new Table 1, re-writing of the Discussion, and modified Fig7 and re-written Fig7 legend.

      Generally Holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      While it was a little unclear exactly what was requested, we restructured the branches to indicate that holocephalans diverged earlier from the ancestors that led to elasmobranchs. Also in response to this comment, we added catshark (S. canicula) and little skate (L. erinacea) specifically to the character matrix.

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity, and we have changed the wording in this revision to hopefully avoid this misunderstanding.

      Reviewer #2 (Public Review):

      General comment:

      This is a very valuable and unique comparative study. An excellent combination of scanning and histological data from three different species is presented. Obtaining the material for such a comparative study is never trivial. The study presents new data and thus provides the basis for an in-depth discussion about chondrichthyan mineralised skeletal tissues.

      many thanks for the kind words

      I have, however, some comments. Some information is lacking and should be added to the manuscript text. I also suggest changes in the result and the discussion section of the manuscript.

      Introduction:

      The reader gets the impression that almost no research on chondrichthyan skeletal tissues was done before 2010 ("last 15 years", L45). I suggest correcting that and also citing previous studies on chondrichthyan skeletal tissues, including studies from before 1900.

      We have added additional older references, as detailed above.

      Material and Methods:

      Please complete L473-492: Three different Micro-CT scanners were used for three different species? SkyScan 117 for the skate samples. Catshark different scanner, please provide full details. Chimera synchrotron scan? Please provide full details for all scanning protocols.

      We clarified exact scanners and settings for each micro-CT experiment in the Methods (lines 476-497).

      TMD is established in the same way in all three scanners? Actually not possible. Or, all specimens were scanned with the same scanner to establish TMD? If so please provide the protocol.

      Indeed, the same scanner was used for TMD comparisons, and we included exact details on how TMD was established and compared with internal controls in the Methods (lines 486-488).

      Please complete L494 ff: Tissue embedding medium and embedding protocol is missing. Specimens have been decalcified, if yes how? Have specimens been sectioned non-decalcified or decalcified?

      Please complete L506 ff: Tissue embedding medium and embedding protocol is missing. Description of controls are missing.

      Methods were updated to include these details (lines 500-503).

      Results:

      L147: It is valuable and interesting to compare the degree of mineralisation in individuals from the three different species. It appears, however, not possible to provide numerical data for Tissue Mineral Density (TMD). First requirement, all specimens must be scanned with the same scanner and the same calibration values. This is not stated in the M&M section. But even if this was the case, all specimens derive from different sample locations and have been preserved differently. Type of fixation, extension of fixation time in formalin, frozen, unfrozen, conditions of sample storage, age of the samples, and many more parameters, all influence TMD values. Likewise the relative age of the animals (adult is not the same as adult) influences TMD. One must assume different sampling and storage conditions and different types of progression into adulthood. Thus, the observation of different degrees of mineralisation is very interesting but I suggest not to link this observation to numerical values.

      These are very good points, but for the following reasons we feel that they were not sufficiently relevant to our study, so the quantitative data for TMD remain scientifically valid and critical for the field moving forward.  Critically, 1) all of the samples used for TMD calculations underwent the same fixation protocols, and 2) most importantly, all samples for TMD were scanned on the same micro-CT scanner using the same calibration phantoms for each scanning session.  Finally, while the exact age of each adult was not specified, we note for Fig1 that clear statistically significant differences in TMD were observed among various skeletal elements from ratfish, shark, and skate.  Indeed, ratfish TMD was considerably lower than TMD reported for a variety of fishes and tetrapods (summarized in our paper about icefish skeletons, who actually have similar TMD to ratfish: https://doi.org/10.1111/joa.13537).

      In response, however, we added a caveat to the paper’s Methods (lines 466-469), stating that adult ratfish were frozen within 1 or 2 hours of collection from the wild, staying frozen for several years prior to thawing and immediate fixation.

      Parts of the results are mixed with discussion. Sometimes, a result chapter also needs a few references but this result chapter is full of references.

      As mentioned above, we reduced background-style writing and citations in each Results section.

      Based on different protocols, the staining characteristics of the tissue are analysed. This is very good and provides valuable additional data. The authors should inform the reader not only about the staining (positive or negative) but also about the histochemical characters of the staining. L218: "fast green positive" means what? L234: "marked by Trichrome acid fuchsin" means what? And so on, see also L237, L289, L291.

      We included more details throughout the Results upon each dye’s first description on what is generally reflected by the specific dyes of the staining protocols. (lines 178, 180, 184, 223, 227, and 243-244)

      Discussion

      Please completely remove figure 7, please adjust and severely downsize the discussion related to figure 7. It is very interesting and valuable to compare three species from three different groups of elasmobranchs. Results of this comparison also validate an interesting discussion about possible phylogenetic aspects. This is, however, not the basis for claims about the skeletal tissue organisation of all extinct and extant members of the groups to which the three species belong. The discussion refers to "selected representatives" (L364), but how representative are the selected species? Can there be an extant species that represents the entire large group, all sharks, rays or chimeras? Are the three selected species basal representatives with a generalist life style?

      These are good points, and yes, we certainly appreciate that the limited sampling in our data might lead to faulty general conclusions about these clades.  In fact, we stated this limitation clearly in the Introduction (lines 126-128), and we removed “representative” from this revision.  We also replaced general reference to chondrichthyans in the Title by listing the specific species sampled.  However, in the Discussion, we also compare our data with previously published additional species evaluated with similar assays, which confirms the trend that we are concluding.  We look forward to future papers specifically testing the hypotheses generated by our conclusions in this paper, which serves as a benchmark for identifying shared and derived features of the chondrichthyan endoskeleton.

      Please completely remove the discussion about paedomorphosis in chimeras (already in the result section). This discussion is based on a wrong idea about the definition of paedomorphosis. Paedomorphosis can occur in members of the same group. Humans have paedomorphic characters within the primates, Ambystoma mexicanum is paedomorphic within the urodeles. Paedomorphosis does not extend to members of different vertebrate branches. That elasmobranchs have a developmental stage that resembles chimera vertebra mineralisation does not define chimera vertebra centra as paedomorphic. Teleosts have a heterocercal caudal fin anlage during development; that does not mean the heterocercal fins in sturgeons or elasmobranchs are paedomorphic characters.

      We agree with the reviewer that discussion of paedomorphosis should apply to members of the same group.  In our paper, we are examining paedomorphosis in a holocephalan, relative to elasmobranch fishes in the same group (Chondrichthyes), so this is an appropriate application of paedomorphosis.  In response to this comment, we clarified that our statement of paedomorphosis in ratfish was made with respect to elasmobranchs (lines 37-39; 418-420).

      L432-435: In times of Gadow & Abbott (1895) science had completely wrong ideas about the phylogenetic position of chondrichthyans within the gnathostomes. It is curious that Gadow & Abbott (1895) are being cited in support of the paedomorphosis claim.

      If paedomorphosis is being examined within Chondrichthyes, such as in our paper and in the Gadow and Abbott paper, then it is an appropriate reference, even if Gadow and Abbott (and many others) got the relative position of Chondrichthyes among other vertebrates incorrect.

      The SCPP part of the discussion is unrelated to the data obtained by this study. Kawasaki & Weiss (2003) describe a gene family (called SCPP) that controls Ca-binding extracellular phosphoproteins in enamel, in bone and dentine, in saliva and in milk. It evolved by gene duplication and differentiation. They date it back to a first enamel matrix protein in conodonts (Reif 2006). Conodonts, a group of enigmatic invertebrates, have mineralised structures but these structures are neither bone nor mineralised cartilage. Catfish (6 % of all vertebrate species) on the other hand, have bone but do not have SCPP genes (Liu et al. 2016). Other calcium binding proteins, such as osteocalcin, were initially believed to be required for mineralisation. It turned out that osteocalcin is rather a mineralisation inhibitor, at best it regulates the arrangement of collagen fiber bundles. The osteocalcin -/- mouse has fully mineralised bone. As the function of the SCPP gene product for bone formation is unknown, there is no need to discuss SCPP genes. It would perhaps be better to finish the manuscript with a summary that focuses on the subject and the methodology of this nice study.

      We completely agree with the reviewer that many papers claim to associate the functions of SCPP genes with bone formation, or even mineralization generally.  The Science paper with the elephant shark genome made it very popular to associate SCPP genes with bone formation, but we feel that this was a false comparison (for many reasons)!  In response to the reviewer’s comments, however, we removed the SCPP discussion points, moving the previous general sentence about the genetic basis for reduced skeletal mineralization to the end of the previous paragraph (lines 435-439).  We also added another brief Discussion paragraph afterwards, ending as suggested with a summary of our proposed shared and derived chondrichthyan endoskeletal traits (lines 440-453).

      Reviewer #1 (Recommendations For The Authors):

      Further Strengths and Opportunities:

      Your headings for the results subsection and figures are nice snapshots of your interpretations of the results and I think they would be better repurposed in your abstract, which needs more depth. It's a little unusual to try and state an interpretation of results as the heading title in a results section and the figures so it feels out of place. You could also use the headings as the last statement of each section, after you've presented the results. In order I would change these results subheadings to:

      Tissue Mineral Density (TMD)

      Tissue Properties of Neural Arches

      Trabecular mineralization

      Cap zone and Body zone Mineralization Patterns

      Areolar mineralization

      Developmental Variation

      Sorry, but we feel that summary Results sub-headings are the best way to effectively communicate to readers the story that the data tell, and this style has been consistently used in our previous publications.  No changes were made.

      You allude to the fossil record and that is great. That said, historical literature is more abundant than what you've listed. Your first sentence describes a long fascination and only goes back to 1990. But there are authors that have had this fascination for centuries and so I think you'll benefit from looking back. Especially because several of them have looked into histology of these fishes. You even have one sentence citing Coates et al. 2018, Frey et al., 2019 and Ørvig 1951 to talk about the potential that fossils displayed trabecular mineralization. That feels like you are burying the lead and may have actually been part of the story for where you came up with your hypothesis in the beginning... or the next step in future research. I feel like this is really worth spending some more time on in the intro and/or the discussion.

      We’ve added older REFs as pointed out above.  Regarding fossil evidence for trabecular mineralization, no, those studies did not lead to our research question.  But after we discovered how widespread trabecular mineralization was in extant samples, we consulted these papers, which did not focus on the mineralization patterns per se, but certainly led us to emphasize how those patterns fit in the context of chondrichthyan evolution, which is how we discussed them.

      I agree that in the past 15 years or so a lot more work has been done because it can be done using newer technologies. That said, there's a lot more work by Mason Dean's lab starting in 2010 that you should take a look at related to tesserae structure... they're looking at additional taxa beyond what you did as well. It will be valuable for you to be able to make any sort of phylogenetic inference as part of your discussion and enhance the info you present in figure 7. Go further back in time... For example:

      de Beer, G. R. 1932. On the skeleton of the hyoid arch in rays and skates. Quarterly Journal of Microscopical Science. 75: 307-319, pls. 19-21.

      de Beer, G. R. 1937. The Development of the Vertebrate Skull. The University Press, Oxford.

      Indeed, we have read all of Mason’s work, citing 9 of his papers, and where possible, we have incorporated their data on different species into our Discussion and Fig7.  Thanks for the de Beer REFs.  While they contain histology of developing chondrichthyan elements, they appear to refer principally to gross anatomical features, so were not included in our Intro/Discussion.

      Most sections within the results read more like a discussion than a presentation of the new data and you jump directly into using an argument of those data too early. Go back in and remove the references or save those paragraphs for the discussion section. Particularly because this journal has you skip the method section until the end, I think it's important to set up this section with a little bit more brevity and conciseness. For instance, in the first section about tissue mineral density, change that subheading to just say tissue mineral density. Then you can go into the presentation of what you see in the ratfish, and then what you see in the little skate, and then that's it. You save the discussion about how other elasmobranchs are mineralizing their neural arches, etc., for another section.

      We dramatically reduced background-style writing and citations in each Results section (other than the first section of minor points about general features of the ratfish, compared to catshark and little skate), keeping only a few to briefly remind the general reader of the context of these skeletal features.

      I like that your first sentence in the paragraph is describing why you are doing a particular method and comparison because it shows me (the reader) where you're sampling from. Something else: maybe as part of the first figure, rather than having just the graph for each, have a small sketch for little skate and catshark to show where you sampled from for comparative purposes. That would then relate back to clarifying other figures as well.

      done (also adding a phylogenetic tree).

      Second instance is your section on trabecular mineralization. This has so many references in it. It does not read like results at all. It looks like a discussion. However, the trabecular mineralization is one of the most interesting aspects of this paper, and how you are describing it as a unique feature. I really just want a very clear description of what the definition of this trabecular mineralization is going to be.

      In addition to adding Table 1 to define each proposed endoskeletal character state, we have changed the structure of this section and hope it better communicates our novel trabecular mineralization results.  We also moved the topic of trabecular mineralization to the first detailed Discussion point (lines 347-363) to better emphasize this specific topic.

      Carry this reformatting through for all subsections of the results.

      As mentioned above, we significantly reduced background-style writing and citations in each Results section.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters illustrated in figure 7 and the body of the text. I think you can give the characters a number so that you can actually refer to them in each subsection of the results. They can even be numbered sequentially so that they are presented in a standard character matrix format that future researchers can add directly to their own character matrices. You could actually turn it into a separate table so it doesn't take up that entire space of the figure, because there need to be additional taxa referred to on the diagram. Namely, you don't have any outgroups in figure 7 so it's hard to describe any state specifically as ancestral or derived. Generally Holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      The character matrix is a fantastic idea, and we should have included it in the first place!  We created Table 1 summarizing the traits and terminology at the end of the Introduction, also adding the character matrix in Fig7 as suggested, including specific fossil and extant species.  For the Fig7 branching and catshark inclusion, please see above. 

      You can repurpose the figure captions as narrative body text. Use less narrative in the figure captions. These are your results actually, so move that text to the results section as a way to truncate and get to the point faster.

      By figure captions, we assume the reviewer refers to figure legends.  We like to explain figures to some degree of sufficiency in the legends, since some people do not read the main text and simply skim a manuscript’s abstract, figures, and figure legends.  That said, we did reduce the wording, as requested.

      More specific comments about semantics are listed here:

      The abstract starts negative and doesn't state a question although one is referenced. Potential revision - "Comprehensive examination of mineralized endoskeletal tissues warranted further exploration to understand the diversity of chondrichthyans... Evidence suggests for instance that trabecular structures are not common, however, this may be due to sampling (bring up fossil record.) We expand our understanding by characterizing the skate, cat shark, and ratfish... (Then add your current headings of the results section to the abstract, because those are the relevant takeaways.)"

      We re-wrote much of the abstract, hoping that the points come across more effectively.  For example, we started with “Specific character traits of mineralized endoskeletal tissues need to be clearly defined and comprehensively examined among extant chondrichthyans (elasmobranchs, such as sharks and skates, and holocephalans, such as chimaeras) to understand their evolution”.  We also stated an objective for the experiments presented in the paper: “To clarify the distribution of specific endoskeletal features among extant chondrichthyans”. 

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity, and we have changed the wording in this revision to hopefully avoid this misunderstanding.

      In the second paragraph of the TMD section, you mention the synarcual comparison. I'm not sure I follow. These are results, not methods. Tell me what you are comparing directly. The non-centrum part of the synarcual separate from the centrum? They both have both parts... did you mean the comparison of those both to the cat shark? Just be specific about which taxon, which region, and which density. No need to go into reasons why you chose those regions here. Put those into the methods and discussion for interpretation.

      We hope that we have now clarified wording of that section.

      Label the spokes somehow either in caption or on figure direction. I think I see it as part of figure 4E, I, and J, but maybe I'm misinterpreting.

      Based upon histological features (e.g., regions of very low cellularity with Trichrome unstained matrix) and hypermineralization, spokes in Fig4 are labelled with * and segmented in blue.  We detailed how spokes were identified in main text (lines 241-243; 252-254) and figure legend (lines 597-603). 

      Reviewer #2 (Recommendations For The Authors):

      Other comments

      L40: remove paedomorphism

      no change; see above

      L53: down tune language, remove "severely" and "major"

      done (lines 57-59)

      L86: provide species and endoskeletal elements that are mineralized

      no change; this paragraph was written generally, because the papers cited looked at cap zones of many different skeletal elements and neural arches in many different species

      L130: remove TMD, replace by relative, descriptive, values

      no change; see above

      L135: What are "segmented vertebral neural arches and centra" ?

      changed to “neural arches and centra of segmented vertebrae” (lines 140-141)

      L166: L168 "compact" vs. "irregular". Partial mineralisation is not necessarily irregular.

      thanks for pointing out this issue; we changed wording, instead contrasting “non-continuous” and “continuous” mineralization patterns (lines 171-174)

      L192: "several endoskeletal regions". Provide all regions

      all regions provided (lines 198-199)

      L269: "has never been carefully characterized in chimeras". Carefully means what? Here, also only one chimera is analysed, not several species.

      sentence removed

      302: Can't believe there is no better citation for elasmobranch vertebral centra development than Gadow and Abbott (1895)

      added Arratia and Kölliker REFs here (lines 293-295)

      L318 ff: remove discussion from result chapter

      references to paedomorphism were removed from this Results section

      L342: refer to the species studied, not to the entire group.

      sorry, the line numbering for the reviewer and our original manuscript have been a little off for some reason, and we were unclear exactly to which line of text this comment referred.  Generally in this revision, however, we have tried to restrict our direct analyses to the species analyzed, but in the Discussion we do extrapolate a bit from our data when considering relevant published papers of other species.

      346: "selected representative". Selection criteria are missing

      “selected representative” removed

      L348: down tune, remove "critical"

      Done

      L351: down tune, remove "critical"

      done

      L 364: "Since stem chondrichthyans did not typically mineralize their centra". Means there are fossil stem chondrichthyans with fully mineralised centra?

      Re-worded to “Stem chondrichthyans did not appear to mineralize their centra” (line 379)

      L379: down tune and change to: "we propose the term 'non-tesseral trabecular mineralization'. Possibly a plesiomorphic (ancestral) character of chondrichthyans"

      no change; sorry, but we feel this character state needs to be emphasized as we wrote in this paper, so that its evolutionary relationship to other chondrichthyan endoskeletal features, such as tesserae, can be clarified.

      L407: suggests so far palaeontologists have not been "careful" enough?

      apologies; sentence re-worded, emphasizing that synchrotron imaging might increase details of these descriptions (lines 406-408)

      414: down tune, remove "we propose". Replace by "possibly" or "it can be discussed if"

      sentence re-worded and “we propose” removed (lines 412-415)

      L420: remove paragraph

      no action; see above

      L436: remove paragraph

      no action; see above

      L450: perhaps add a summary of the discussion. A summary that focuses on the subject and the methodology of this nice study.

      yes, in response to the reviewer’s comment, we finished the discussion with a summary of the current study.  (lines 440-453)

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Manuscript number: RC-2025-03094

      Corresponding author(s): Saurabh S. Kulkarni

      1. General Statements

      We thank the reviewers for their strong praise of the manuscript, highlighting its rigor, depth, and conceptual importance. They consistently described the study as a beautiful, fascinating, and conceptually strong piece of work that addresses a timely question in multiciliated cells. They also noted the high quality of the data, careful quantification, and the use of multiple genetic and pharmacological approaches, all of which improve the reproducibility and credibility of the findings. Importantly, they emphasized the novelty of discovering a direct mechanistic link between Piezo1-mediated mechanotransduction and Foxj1-driven transcriptional control of multiciliation, representing a significant breakthrough for both the cilia field and mechanobiology more broadly. Collectively, these strengths highlight the manuscript’s wide impact and make it highly suitable for publication in a high-impact journal.

      2. Description of the planned revisions

      Reviewer #1:


      There are two experiments that would significantly strengthen these claims.

      • First, if their model is correct then even short term treatment with Yoda1 should induce the pathway and affect centriole numbers. While I appreciate the challenge of long term Yoda1 treatment, it's not clear to me why it would be needed if short term treatment is setting off the transcriptional cascade. Yoda1 is used throughout the paper to induce all the pathways but we don't know if it actually induces the phenotype. I think this should be addressed with either short term treatments or a dose response to find a dose that does not lead to skin peeling. It is hard to ignore this obvious deficiency.
      • Second, the model predicts that all of this is to regulate Foxj1 levels to regulate the subtle balance between cell size and centriole number. If this is correct, then the overexpression of Foxj1 should have a profound effect on centriole number in multiciliated cells. This is such an easy experiment that would validate many of the claims.

      RESPONSE:

      We recognize that the reviewer is asking us to test the sufficiency of the pathway with these comments: “If their model is correct, then they should be able to activate the pathway in one way or another to stimulate centriole number. This is a significant limitation to their overall model.” And “If this is correct, then the overexpression of Foxj1 should have a profound effect on centriole number in multiciliated cells.”

      To address reviewers’ suggestions, we will perform the following experiments.

      1. A brief exposure (15 and 30 mins) to Yoda1 and wait for 3 hours to examine changes in centriole amplification. This will avoid skin peeling from long-term exposure.
      2. A brief exposure to Yoda1 (15 mins) followed by a 30-minute wait period, and the cycle repeats a total of 4 times for a total of 3 hours to examine centriole amplification.
      3. The above two experiments will also be done in a constitutively active-Yap background to increase the probability that synergistic activation can lead to centriole amplification.
      4. Although Foxj1 is essential for multiciliogenesis, it is not sufficient to induce multiciliogenesis, as shown by multiple previous studies. Therefore, we do not expect overexpression of Foxj1 to have a profound effect on centriole number.

      While we will conduct the experiments because we truly want to address the suggestions and gain insight into the answers ourselves, we respectfully ask the Reviewer to consider the following responses to their concern.

      Yoda1 sufficiency: We agree that testing whether acute Yoda1 treatment can induce centriole amplification is an important question. We will conduct experiments with short-pulse and cyclic Yoda1 exposure, including in a constitutively active-YAP background (listed above), to address this possibility. However, several challenges complicate interpretation: (i) PIEZO1 adapts and desensitizes upon activation, (ii) transient signaling may be sufficient to cause secondary signaling but insufficient to drive stable transcriptional programs required for amplification, and (iii) centriole number is inherently variable, making modest effects difficult to resolve. However, we must recognize that failure to observe sufficiency under these conditions would not invalidate the model for two reasons: 1) absence of evidence is not evidence of absence, and thus, we may not have found the right experimental design. 2) PIEZO1–YAP is a necessary input but not sufficient on its own, as elaborated below. For both reasons, we are very careful about the interpretation of results in the manuscript, which shows that this pathway is necessary for centriole amplification using loss-of-function approaches.

      Foxj1 overexpression: Foxj1 is a well-established regulator essential for motile and multiciliogenesis across species (Xenopus, zebrafish, mouse). Loss of Foxj1 reduces cilia number in MCCs, but its activation alone does not have a profound effect on ciliogenesis/cilia number in MCCs. This is because Foxj1 is a part of a larger network essential for multiciliogenesis. This parallels the behavior of other transcriptional regulators, such as Myb, where loss of function impairs centriole amplification, but overexpression does not drive the formation of supernumerary centrioles. Both studies are seminal discoveries in the field of ciliogenesis, but they did not demonstrate the sufficiency of these molecules/pathways. Thus, our results, demonstrating that Foxj1 is necessary to induce tension-dependent centriole amplification, are significant, as the reviewer mentioned. The lack of Foxj1 sufficiency to induce centriole amplification is not a deficiency of the study, but rather evidence that Foxj1 is a part of a larger network essential for tension-dependent centriole amplification.

      Necessity versus sufficiency: We respectfully emphasize that sufficiency is not a prerequisite for demonstrating the significance of a pathway. Mechanochemical signaling is inherently complex, involving many mechanosensitive proteins and pathways. In our case, mechanical stretch increases centriole amplification, with PIEZO1–YAP signaling identified as a key mediator. However, we do not claim that PIEZO1–YAP alone is sufficient. Other pathways, including cadherin-mediated junctions, F-actin–myosin contractility, integrin–focal adhesion signaling, and nuclear mechanotransduction, likely contribute and may regulate unique downstream effectors that collectively promote centriole amplification. Therefore, PIEZO1–YAP should be regarded as one essential component within a larger network.


      TIMELINE: We will perform these additional proposed experiments. Since the first author, a postdoctoral researcher on this manuscript, has started a new job and will be coming in on weekends to complete the experiments, we estimate it will take approximately 2-3 months to finish them.


      Reviewer #2:

      1. Considering the Yap-piezo mechanism of action, the authors' logic for the selection of myb, foxj, plk4 and ccno as transcriptional targets is clear, but the HCR-derived signal and the differences seen in the yap morphants are not very strong, notwithstanding the statistical significance. There appear to be distinct subgroups within the treated populations (in Figure S6B, although these data seem quite different in Fig. 7H, so a comment on the technical differences might be helpful), so that the extent to which Yap1 regulates (Myb-)Foxj1 expression in MCCs is not clearly demonstrated by this experiment. Related to this point, it is unclear why 20-25% of the yap1/ piezo1 MO-treated embryos do not show a decline in FOXj1 in Fig. 6, given the qualitative nature of the scoring. Assuming the KD penetrance would vary on a cell-to-cell basis, rather than an embryo-to-embryo basis, this may suggest that there are additional relevant targets (some of which are discussed by the authors). Single-cell analysis might be a way to address this; however, this is not a trivial experiment, it might be sufficient to include a caveat in the text. Furthermore, the conclusion that Foxj1 regulates centriole amplification in a tension-dependent manner is well-supported by the data.

      RESPONSE: We appreciate the reviewer’s thoughtful observation. Differences in the expression of Foxj1 from experiment to experiment are possible due to a combination of factors, including heterogeneity in MCC development across embryos, slightly different embryonic stages, differences in embryo quality between fertilizations, and variability in morpholino delivery and knockdown penetrance, which can occur both across embryos and on a cell-to-cell basis within an embryo. We also note that technical aspects of HCR RNA-FISH, such as proteinase K treatment and washing steps, can affect signal intensity, potentially contributing to the appearance of distinct subgroups within treated populations.

      We agree that single-cell analysis would be a powerful way to dissect these differences, but as the reviewer notes, this is not a trivial experiment and is beyond the scope of the present study. We have therefore added clarifications in the text and discussion to acknowledge these sources of variability and to highlight the possibility of parallel pathways regulating foxj1 expression.

      ********************************************

      Controls for the knockdowns by the various MOs should be provided.

      RESPONSE: We appreciate the reviewer’s comment. The piezo1 MO has been previously established in Kulkarni et al. (2021). Additionally, the current manuscript includes MO control experiments for both erk2 and yap1, through KD at the 1-cell stage using the MO oligonucleotide, followed by mosaic-rescue with the respective WT RNA constructs (mCherry-ERK2 and yap1-GFP) and a nuclear tracer molecule such as H2B-RFP (Fig. 5, E-H, Fig. S5, C&D, Fig. 3, D-F). The mosaic-rescue is a robust experiment that provides an internal control within the same embryo, thereby avoiding differences that may arise due to embryo-to-embryo variability, embryo quality, or differences in fertilization batches. This approach also serves as a valuable tool for detecting cell-autonomous effects, providing a clear readout against uninjected neighboring cells, as the injected cells are labeled with a tracer. We will perform a similar mosaic-rescue experiment for the foxj1 MO.

      TIMELINE: We will conduct mosaic-rescue experiments for the foxj1 MO. We will need 1 month to complete the experiment.

      ********************************************

      Minor comments:

      Autocorrection of ERK1/2 or MEK1/2 pathways to 1/2 should be avoided. – We are unclear on this comment. Can the reviewer please clarify what they mean?


      Reviewer #3:

      Major concerns

      1- The presented data do not yet establish a specific, direct pathway linking mechanotransduction to centriole number, because the molecular players tested (PIEZO1, Ca²⁺, PKC, ERK, YAP, Foxj1) are highly pleiotropic. As such, the observed centriole number phenotypes, and some of the major conclusions, could be indirect. It is therefore critical to test the specificity and causality of the proposed pathway. This could be done with the authors' own strategies and/or with the following potential approaches:

      • Genetic dependency and sufficiency tests: It could be shown that Yoda1 has no effect in PIEZO1 loss-of-function MCCs, and that wild-type PIEZO1, but not conductance-dead PIEZO1 pore mutants, restores Yoda1 responsiveness across centriole number, pERK, and YAP readouts. For example, the PIEZO1 C terminus was shown to govern Ca²⁺ influx and ERK1/2 activation. Comparing full-length PIEZO1 with a C-terminal deletion in MCC-restricted rescue, loss of rescue of centriole amplification and ERK/YAP activation with the C-terminal deletion can provide a genetics-anchored specificity test beyond broad inhibitors.

      RESPONSE:

      • To address the reviewer’s concern, we will test whether Yoda1 affects ERK and Yap activation when Piezo1 is depleted. We appreciate the reviewer’s thoughtful suggestion to employ genetic rescue experiments with Piezo1 mutants. Unfortunately, these are not technically feasible in Xenopus, as the Piezo1 coding sequence is exceptionally large (~7.5 kb), and repeated attempts by our group to generate and express stable, translatable transcripts have been unsuccessful. To address genetic dependency and specificity despite these technical barriers, we have employed a combination of orthogonal strategies that together provide strong genetic and mechanistic evidence:

      • Mosaic loss-of-function experiments (Fig. 1) demonstrate that Piezo1 regulates centriole number in a cell-autonomous manner, ruling out global epithelial or indirect tissue-wide effects.

      • Pharmacological activation/inhibition with a Piezo1-specific agonist (Yoda1) and inhibitors (GsMTx4, gadolinium) produced consistent phenotypes, including activation of downstream ERK and YAP readouts. Notably, Yoda1 is a Piezo-specific agonist, not a broad pharmacological agent.
      • Downstream pathway dissection (calcium chelation, PKC inhibition, ERK2 depletion, and YAP1 knockdown/rescue) consistently converges on the same phenotypes, reduced centriole amplification and altered Foxj1 expression, providing multiple independent lines of evidence that the Piezo1–Ca²⁺–PKC–ERK–YAP axis specifically controls centriole number.
      • Positive feedback regulation of Piezo1 expression by YAP/Foxj1 (Fig. 7) further strengthens the argument for a pathway-specific role rather than pleiotropic, indirect effects.

      Taken together, while full-length Piezo1 rescue experiments are technically not possible in Xenopus due to gene size constraints, our data employ state-of-the-art genetic, pharmacological, and orthogonal functional assays to rigorously test pathway specificity. These complementary approaches provide compelling evidence for the causal role of Piezo1-mediated mechanotransduction in centriole number control in MCCs.

      • Downstream bypass/rescue experiments: In PIEZO1 loss-of-function or BAPTA conditions, can enforcing MEK/ERK activation or YAP rescue the centriole number defect? Conversely, can MEK inhibitors block Yoda1-induced effects?

      RESPONSE: We appreciate the reviewer’s insightful questions.

      • We will express CA Yap in the Piezo1 KD background to assess if we can rescue centriole number. We also note that the converse experiment has already been performed in our study: 1) PKC inhibition abolishes Yoda1-induced ERK phosphorylation and nuclear localization (Fig. 2), 2) both MEK inhibition and ERK2 depletion block Yoda1-induced Yap activation and nuclear entry (Figs. 4, S2). Thus, we have directly demonstrated that MEK inhibition prevents Yoda1-induced effects, satisfying this aspect of the reviewer’s concern.

      ********************************************

      2- Image quantification and analysis must be described in greater detail in the Methods section, as they are central to the major conclusions of the manuscript. For example, the authors should explain how nuclear, cytoplasmic, and centriole segmentation were performed, and how relative protein levels in the nucleus versus the cytoplasm (e.g., YAP, volume- or area-based) were quantified. Specifically, the thresholds and segmentation criteria applied to different cellular structures under various conditions, as well as the use of Imaris and other software, should be clearly detailed.

      RESPONSE: We will describe the methods in greater detail.

      ********************************************

      3- PIEZO1 mRNA was shown to increase in a Foxj1-linked feedback loop. Does this increase translate into an increase in total protein levels?

      RESPONSE: If the reviewer is referring to Figure 7B, that panel shows Piezo1 antibody staining, so yes, Piezo1 protein levels are increased.

      If the reviewer is referring to Figure 7C and D, we show that loss of Foxj1 leads to a reduction in Piezo1 mRNA expression.

      ********************************************

      4- Is the proposed signaling cascade active in mammalian multiciliated cells (e.g., airway epithelium)? If possible, testing this by using one of the major players of the pathway as a readout, such as ERK phosphorylation or YAP nuclear localization in mammalian MCCs, will reveal whether regulation of centriole number through this pathway is conserved and would strengthen the generality.


      RESPONSE: We agree with the reviewer that testing conservation of this pathway in mammalian MCCs is of great interest. Indeed, another group is currently investigating the role of Yap in the mammalian airway epithelium; in their temporally controlled Yap knockout model (the global Yap KO being embryonic lethal), they observed that Yap loss led to a reduction in centriole number. To avoid overlap and direct competition with this ongoing work, we chose to focus our efforts on Xenopus.

      Importantly, Xenopus has become a widely recognized and powerful system for MCC biology, enabling mechanistic dissection of centriole amplification and ciliogenesis. Several key discoveries in the field, including the identification of MCIDAS as a master regulator of MCC fate, were first made in Xenopus before being validated in mammals. Similarly, our study provides a mechanistic framework in Xenopus that can inform and guide ongoing studies in the mammalian airway.

      ********************************************

      5- Throughout the results section, there are multiple times where authors raised specific hypothesis about their data (e.g. foxj1 regulation of number control, apical actin/YAP). However, they have not tested them. These hypothesis are very exciting and if possible, testing experimentally, would strengthen the conclusions associated with them.

      RESPONSE: We are not sure what the reviewer means here by “authors raised specific hypothesis about their data (e.g., foxj1 regulation of number control, apical actin/YAP). However, they have not tested them”, because:

      • Foxj1 regulation of centriole number: We demonstrate a clear reduction in centriole number upon Foxj1 depletion, and importantly, we extend this finding by showing that the reduction is tension-dependent (Fig. 6). We will perform a rescue assay to demonstrate the specificity.
      • Foxj1 and YAP: We never claimed that Foxj1 regulates YAP expression, and this is not part of our proposed model. Instead, our data show that Piezo1–ERK–YAP signaling regulates Foxj1.
      • Foxj1 and apical actin: Foxj1 regulation of apical F-actin has already been established in prior work, and in our study, we clearly observe reduced apical actin intensity in Foxj1-depleted MCCs (Fig. 6). To further strengthen this conclusion, we will provide a quantitative analysis of apical actin intensity in Foxj1 morphants.

      ********************************************

      TIMELINE: We will perform these additional proposed experiments. Since the first author, a postdoc on this manuscript, has started a new job and will be coming in on weekends to finish the experiments, we estimate it will take approximately 2-3 months to complete them.

      Minor comments

      MCC vs non MCC identification (Fig. 1): Clarify how non MCCs were distinguished from MCCs (e.g. markers/criteria). – Can the reviewer please clarify which panel or panels? Or provide more specific text that needs to be changed.

      Add the Kintner group reference linking motile cilia number and centriole number in Xenopus MCCs.– Can the reviewer clarify where and which reference? Thank you.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.

      Reviewer 2

      Major comments:

      1. It should be clarified whether the immunoblots and the related quantitations in Figs. 2 and S2 are all from separate blots/ exposures. If so, they are not useful as controls, and these blots should be repeated with the relevant samples analyzed in parallel. Size markers and labels should be included (2B, 2G; S2B and S2G). An increase in total ERK would alter the interpretation of the increase in nuclear pERK in the IF experiments.

      RESPONSE: We thank the reviewer for raising this important point regarding clarification of the immunoblots. All experimental groups were analyzed in parallel with their corresponding controls. Because the primary antibodies for pERK and ERK were both raised in rabbit, we optimized our workflow to prevent protein loss during stripping and to ensure accurate visualization. Specifically, lysates from each experimental group were loaded in duplicate on the same gel, separated by a molecular weight ladder that served as a reference point. After transfer, the blot was cut along the ladder, and the two halves were processed in parallel: one probed with anti-pERK and the other with anti-ERK. This strategy ensured that all samples from a single experiment (e.g., Control and Yoda1-treated groups) were analyzed under identical conditions, with staining and imaging performed together at the same exposure. To enhance clarity, we have provided these data as uncut, full-length blots in Supplemental Figure 7 (Figure S7) of the revised manuscript.

      ********************************************

      Minor comments:

      1. Reference list should be checked for completeness; some citations lack journal/ volume/ page/ year details. – We have corrected the references.
      2. An 'overexposed' version of the image selected for centrioles in Figure 5F might be included with the Chibby-BFP at the same level as in the other figures. At present, the Yap KD cell in the image appears to have normal centrioles; this is potentially confusing, even though the authors clearly explain the matter in the text. – We have added a new panel to Fig. 5F to avoid confusion.
      3. It might be clearer to present injected/ uninjected in the same orientation in Fig. 6A and B. – Unfortunately, that is not possible because the injected and uninjected sides are left and right, and they cannot be in the same orientation.
      4. Figure 7B lacks the schematic described in the figure legend. – We have removed the schematic sentence from the figure legend. That was an error on our side. Thank you for catching it.


      Reviewer 3


      1. Abstract: "how MCCs regulate centriole/cilia numbers remains a major knowledge gap" overstates the field; please soften to reflect recent advances (mechanics/apical area scaling; PIEZO1 implication). – We changed the text to “incompletely understood”.
      2. GsMTx4 rationale: State that GsMTx4 is a spider venom peptide that inhibits cationic mechanosensitive channels (including PIEZO1) and justify its use alongside Yoda1. – GsMTx4 was used in the previous manuscript, and its use was justified there. Here, we are only comparing the results. However, we have added a sentence describing what GsMTx4 is. We have also included a sentence explaining the use of Yoda1. “GsMTx4, a spider venom peptide used in our previous study, inhibits cationic mechanosensitive channels, including Piezo1.”

      “For this experiment, we used the Piezo1 channel-specific chemical agonist, Yoda1, to increase the sensitivity of Piezo1 and upregulate calcium entry into cells”

      Timeline statement: "Centriole amplification to migration and apical docking takes ~4-5 h (personal observation)" is not appropriate; either cite time lapse literature or include your own time lapse data. – We have added a reference that showed imaging for 2 hours, but this was not long enough to capture the entire process from intercalation to maturation, so we have also kept “personal observation” in the manuscript. We are unaware of any study that has done time-lapse imaging for 4 hours to capture the entire process of centriole amplification.

      Redundancy: The description of Yoda1 as a channel specific agonist is repeated; keep only once.- Removed

      "WT yap1 GFP construct previously used by Dr. Lance Davidson ..." should move construct description to Methods and keep only the citation in Results.– We moved it to Methods.

      "(Unpublished data; Dr. Mahjoub)" should be removed unless data are shown.- Removed

      Replace "as shown previously in our eLife paper" with "as we previously showed or shown previously (Kulkarni et al., 2021)".– We have made the change.

      The two hypotheses for how Foxj1 could regulate number under tension (actin remodeling vs. transcriptional control of amplification genes) belong in the Discussion unless tested. Moreover, the part on Yap sequestration by apical actin and the two possibilities presented should also go to the Discussion. – We have moved both to the Discussion section.

      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.

      Reviewer 3

      1- The hypothesis about the centriole pool of Piezo as the mechanosensor for centriole number regulation is very exciting and novel. Can localization-controlled variants be used to test whether a centriole-associated pool directly senses tension for number control (for example, centrosome-targeted PIEZO1 via a PACT tag)? Alternatively, broad cellular Ca sensors (GCaMP) or centrosome-proximal Ca sensors (e.g., PACT-GCaMP) can be used to detect local calcium microdomains during tethering or Yoda1 treatment.

      RESPONSE: We appreciate the reviewer's curiosity and excitement; however, these experiments will not alter the conclusion of this paper and will be part of the next study, which aims to delve deeper into how different pools of Piezo1 at centrioles versus cell junctions function in MCCs. To that point, we had thought about these experiments. As mentioned earlier, the Piezo1 coding sequence is exceptionally large (~7.5 kb), and repeated attempts by our group to generate and express stable, translatable transcripts have been unsuccessful. Thus, the idea of centrosome-targeted PIEZO1 via a PACT tag is very exciting; however, it is not technically feasible. Beyond size, PIEZO1 is a large, trimeric plasma-membrane mechanosensitive channel that requires proper ER processing and bilayer incorporation. PACT localizes cargo to the centriole/pericentriolar material, not a membrane compartment; thus, a PACT-anchored PIEZO1 would be membrane-mismatched and almost certainly nonfunctional even if expressed.

      Second, centrosome-proximal GCaMP (PACT-GCaMP) would show correlation, not causation. This experiment does not address the question of the “centriole pool of Piezo as the mechanosensor for centriole number regulation”. It will only show whether Ca2+ influx is happening at the basal bodies, not whether and how that Ca2+ is essential for centriole amplification. For this purpose, we will need to find a way to block Ca2+ influx specifically at basal bodies, rather than junctions, which will require extensive controls.

      We do not claim that any specific Piezo1 or Ca2+ pool is critical for controlling centriole number and thus the suggested experiment would not alter the manuscript's conclusions. We therefore view the above as exciting future directions rather than prerequisites.

      ********************************************

      2- Because the proposed pathway is tension-sensing and the YAP pathway is tightly linked to the actin cytoskeleton, the role of the actin cytoskeleton in the proposed pathway should be tested directly. The authors mention different hypotheses around actin but have not tested them in the manuscript. For example, actin-dependent sequestration of Yap at the apical surface is intriguing. Does actin polymerization induced by drugs release Yap from the apical surface?

      RESPONSE: We would like to thank the reviewer for their suggestion. As per the reviewers' suggestion, we have moved this section to discussion, stating that “In the future, we plan to address this question by examining how Yap is sequestered by apical actin.”.

      However, we appreciate the reviewer’s enthusiasm and would like to share some experiments we are planning to test the hypothesis.

      We plan to examine whether actin polymerization or contractility is responsible for Yap sequestration at, or release from, the apical surface with the following experiments: 1) testing whether Yap is displaced by Jasplakinolide treatment, which stabilizes filamentous actin; 2) using a ROCK inhibitor to decrease contractility in the absence or presence of Yoda1; and 3) using genetic constructs such as Shroom3 to increase ROCK-mediated contractility and observe changes in Yap localization and dynamics.

      Although these experiments are interesting, they do not alter the conclusion of the current manuscript, and they represent future directions for our research.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      Summary: The authors have previously published Mass-spectrometry data that demonstrates a physical interaction between Sall4 and the BAF chromatin complex in iPSC derived neurectodermal cells that are a precursor cell state to neural crest cells. The authors sought to understand the basis of this interaction and investigate the role of Sall4 and the BAF chromatin remodelling complex during neural crest cell specification. The authors first validate this interaction with a co-IP between ARID1B subunit and Sall4 confirming the mass spec data. The authors then utilise in silico modelling to identify the specific interaction between the BAF complex and Sall4, suggesting that this contact is mediated through the BAF complex member DPF2. To functionally validate the role of Sall4 during neural crest specification, the authors utilise CRISPR-Cas9 to introduce a premature stop codon on one allele of Sall4 to generate iPSCs that are haploinsufficient for Sall4. Due to the reports of Sall4's role in pluripotency, the authors confirm that this model doesn't disrupt pluripotent stem cells and is viable to model the role of Sall4 during neural crest induction. The authors expand this assessment of Sall4 function further during their differentiation model to cranial neural crest cells, assessing Sall4 binding with Cut+Run sequencing, revealing that Sall4 binds to motifs that correspond to key genes in neural crest differentiation. Moreover, reduction in Sall4 expression also reduces the binding of the BAF complex, through Cut and Run for BRG1. Overall, the authors then propose a model by which Sall4 and BRG1 bind to and open enhancer regions in neurectodermal cells that enable complete differentiation to cranial neural crest cells.

      Overall, the data is clear and reproducible and offers a unique insight into the role of chromatin remodellers during cell fate specification.

      We thank the Reviewer for the nice words of appreciation of our manuscript.

      However, I have some minor comments.

      1- Using AlphaFold in silico modelling, the authors propose the interaction between the BAF complex with Sall4 is mediated by DPF2, but don't test it. Does a knockout, or knockdown of DPF2 prevent the interaction?

      We agree with the Reviewer that we are not functionally validating our computational prediction that DPF2 is the specific BAF subunit directly linking SALL4 with BAF. We chose not to perform the validation experiment for two main reasons:

      1) This would be outside of the scope of the paper. In fact, from a mechanistic point of view, we have confirmed via both Mass-spectrometry and co-IP with ARID1B that SALL4 and BAF interact in our system. Moreover, mechanistically we also extensively demonstrate that the interaction with SALL4 is required to recruit BAF at the neural crest induction enhancers and we further demonstrate that depletion of SALL4 impairs this. In our view, this was the focus of the manuscript. On the other hand, detecting with certainty which BAF subunit mediates the interaction with SALL4 would be outside the scope of the paper.

      2) Moreover, after careful consideration, we don’t think that even a knock-out of DPF2 would provide a definite answer to which exact BAF subunit mediates the interaction with SALL4. In fact, knock out of DPF2 could potentially disrupt BAF assembly or stability, and this could result in a disruption of the interaction with SALL4 even if DPF2 is not the very subunit mediating it (in other words the experiment could provide a false positive result). In our opinion, the only effective experiment would be mutating the DPF2 residues that we computationally predicted as responsible for the interaction with SALL4, but again this would be very laborious and out of the scope.

      That being said, we agree with the Reviewer that while the SALL4-BAF interaction was experimentally validated with robust approaches, the role of DPF2 in the interaction was only computationally predicted, which comes as a limitation of the study. We have now added a dedicated paragraph in the discussion to acknowledge such limitation.

      2- OPTIONAL: Does knockout of DPF2 phenocopy the Sall4 ko? This would be very interesting to include in the manuscript, but it would perhaps be a larger body of work.

      See point-1.

      3- Figure 1, the day of IP is not clearly described until later in the text. Please outline this in the figure.

      We thank the Reviewer for pointing this out. This has been fixed.

      3- What is the expression of Sall1 (and other Sall paralogs) during differentiation? The same with the protein levels of Sall4: does this remain below 50%, or is this just during pluripotency?

      As recommended by the Reviewer, we have performed time-course WB of SALL1 and SALL4. These experiments revealed that SALL1 remains very lowly expressed in wild-type conditions across time points and all the way through differentiation until CNCC (see updated Supplementary Fig. S9). This is consistent with previous studies that demonstrated that SALL4, but not SALL1, is required for early mammalian development (see for example Miller et al. 2016, Development, and Koulle et al. 2025, bioRxiv). We performed the same time-course WB for SALL4, which revealed that SALL4 expression progressively decreases after day-5 (as expected) and is very low at the CNCC stage (day-14); therefore, we would expect the KO to remain at an even lower level at this stage.

      4- The authors hypothesise that Sall4 binds to enhancers- with the criteria for an enhancer being that these peaks > 1KB from the TSS are enhancers. Can this be reinforced by overlaying with other ChIP tracks that would give more confidence in this? There are several datasets from Joanna Wysocka's lab that also utilise this protocol which can give you more evidence to reinforce the claim and provide further detail as to the role of Sall4.

      We thank the Reviewer for this great suggestion. As recommended, we have used publicly available ChIP-seq data generated by the Wysocka lab (H3K4me1, H3K4me3) and also generated new H3K27ac ChIP-seq data. These experiments and analyses confirmed that these regions are putative CNCC enhancers (and a minority of them putative promoters), decorated with H3K4me1 and with a progressive increase in H3K27ac after CNCC induction (day-5). See new Supplementary Figure S6.

      5- The authors state that cells fail to become cranial neural crest cells; however, they do not propose what the cells do instead. Do they become neural? Or do they stay pluripotent, which is one option given the higher expression of Nanog, OCT4 and OTX2 that are all expressed in pluripotent stem cells?

      We think that it is likely a mix of both. There is a mixed bag of expression of pluripotency markers, but also high expression of neuroectodermal markers. This suggests that most cells safely reach the neuroectodermal stage but fail to go beyond that, while some of the cells simply do not differentiate or regress back to pluripotency. We would rather refrain from overinterpreting what the KO cells become, as it is likely an aberrant cell type, but following the Reviewer’s indication we have added a paragraph in the discussion to speculate on this.

      6- In general, I would like to see the gating strategy and controls for the flow cytometry in a supplemental figure.

      As recommended by the Reviewer, we have added the gating strategy in Supplementary Fig. S4.

      7- For supplementary figure 1- please include the gene names in the main image panels rather than just the germ layer.

      Done. The figure is now Supplementary Figure S3 since two supplementary figures were added before.


      Reviewer #2

      Summary In this manuscript, the authors build on their previous work (Pagliaroli et al., 2021) where they identified an interaction between the transcription factor SALL4 and the BAF chromatin remodeling complex at Day-5 of an iPSC to CNCC differentiation protocol. In their current work, the authors begin by exploring this interaction further, leveraging AlphaFold to predict interaction surfaces between SALL4 and BAF complex members, considering both SALL4 splice isoforms: a longer SALL4A (associated with developmental processes) and a shorter SALL4B (associated with pluripotency). They propose that SALL4A may interact with DPF2, a BAF complex member, in an isoform-dependent manner. The authors next explore the role of SALL4 in craniofacial development, motivated by patient heterozygous loss of function mutations, leveraging iPSC cells with an engineered SALL4 frameshift mutation (SALL4-het-KO). Using this model, the authors first demonstrate that a reduced expression of SALL4 does not impact the iPSC identity, perhaps due to compensation via upregulation of SALL1. Upon differentiation to neuroectoderm, SALL4 haploinsufficiency causes a reduction in newly accessible sites which are associated with a reduction in SALL4 binding and therefore a loss of BAF complex recruitment. Interestingly, however, there were few transcriptional changes at this stage. Later in the CNCC differentiation at Day-14 when the wildtype cells have switched expression of CNCC markers, the SALL4-het-KO cells fail to switch cadherin expression associated with a transition from epithelial to mesenchymal state, and fail to induce CNCC specification and post-migratory markers. Together the authors propose that SALL4 recruits BAF to CNCC enhancers as early as the neuroectodermal stage, and failure of BAF recruitment in SALL4-het-KO lines results in a loss of open chromatin at regulatory regions required later for induction of the CNCC programme. 
The failure of the later differentiation is compelling in the light of the early stages of the differentiation progressing normally, and the authors outline an interesting proposed mechanism whereby SALL4 recruits BAF to remodel chromatin ahead of CNCC enhancer activation, a model that can be tested further in future work. The link between SALL4 DNA binding and BAF recruitment is nicely argued, and very interesting as altered chromatin accessibility at Day 5 in the neuroectodermal stage is associated with only few changes in gene expression, while gene expression is greatly impacted later in the CNCC stage at Day 14. The in silico predictions of SALL4-BAF interaction interfaces are perhaps less convincing, requiring experimental follow-up outside the scope of this paper. Some of the associated figures could perhaps be moved to the supplement to enhance the focus on the later functional genomics experiments.

      We thank the Reviewer for the nice words of appreciation of our manuscript.

      Major comments

      1. A lot of emphasis is placed on the AlphaFold predictions in Figure 1, however the predictions in Figure 1B appear to be mostly low or very low confidence scores (coloured yellow and orange). It is unclear how much weight can be placed on these predictions without functional follow-up, e.g. mutating certain residues and showing impact on the interaction by co-IP. The latter parts of the manuscript are much better supported experimentally, and therefore perhaps some of the Figure 1 could move to a Supplemental Figure (e.g. the right-hand part of 1B, and the lower part of Figure 1C showing SALL4B predicted interactions). The limitations of AlphaFold predictions should be acknowledged and the authors should discuss how these predicted interactions could be experimentally explored further in the future.

      As recommended by the Reviewer, we have moved part of the AlphaFold predictions to Supplementary Figure S1, and we added a paragraph in the discussion to acknowledge the limitations of AlphaFold.

      The authors only show data for one heterozygous knockout clone for SALL4. It is usual to have more than one clone to mitigate potential clonal effects. The authors should comment why they only have one clone and include any data for a second clone for key experiments if they already have this. Alternatively, the authors could provide any quality control information generated during production of this line, for example if any additional genotyping was performed.

      We apologize for the confusion and for our lack of clarity on this. We have used two clones (one generated with an 11 bp deletion, one with a 19 bp deletion, both in exon-1; see also point 6 of your minor points). The two clones were used as biological replicates: for example, the two ATAC-seq replicates performed at each time point were performed with the two different clones, and the three RNA-seq replicates were performed with two technical replicates of the clone with the 11 bp deletion and one replicate of the clone with the 19 bp deletion. We have clarified this in the methods section of the manuscript and added a Supplementary Figure (S2) showing the editing strategy for the two clones. Thank you for catching it.

      The authors show all genomics data (ATAC-seq, CUT&RUN and ChIP-seq) as heatmaps and average profiles. It would be valuable to see some representative loci for the ATAC seq (perhaps along with SALL4 and BRG1 recruitment) at some representative and interesting loci.

      As recommended by the Reviewer, we have added Genome Browser screenshots of representative loci in Fig. 6.

      Figure 4A. The schematic could be improved by including brightfield or immunofluorescent images at the three stages of the differentiation. Are the iPS cells seeded as single cells, or passaged as colonies before starting the differentiation. Further details are required in the methods to clarify how the differentiation is performed, for example at what Day are the differentiating cells passaged, this is not shown on the schematic in Figure 4A.

      As recommended, we added IF images in the Fig. 4A schematic, and added more details in the methods.

      There is likely some heterogeneity of cell types in the differentiation at Day 5 and Day 14. Can the authors comment on this from previous publications or perhaps conduct some IF for markers to demonstrate what proportions of cells are neuroectoderm at Day 5 and CNCCs at Day 14.

      The differentiation starts with single cells that aggregate to form neuroectodermal clusters, as per the original protocol. The CNCCs that we obtain with this protocol homogeneously express CNCC markers, as shown by IF of SOX9 in Fig. 4A. For day-5, as recommended, we have added IF for PAX6, which also shows homogeneous expression (Fig. 4A).

      For the motif analysis for Day 5-specific SALL4 binding sites (Figure 4E), was de novo motif calling performed? Were any binding sites reminiscent of a SALL4 binding site observed (e.g. an AT-rich motif)? Could the authors comment on this in the text - if there is no SALL4 binding motif, does this suggest SALL4 is recruited indirectly to these sites via interaction with another transcription factor for example?

      Similar to SALL4, SALL1 also recognizes AT-rich motifs. However, while we found AT-rich motifs as enriched in our day-5 motif analysis (in the regions that gain SALL4 binding upon differentiation), the enrichment is not particularly strong, and several other motifs are significantly more enriched, suggesting that, like the Reviewer mentioned, SALL4 might be recruited indirectly at these sites by other factors. We have added a paragraph on this in the discussion.

      Does SALL1 remain upregulated at Day-5 and Day-14 of the differentiation for the SALL4-het-KO line? Are binding sites known for this TF and were they detected in the motif analysis performed? Further discussion of the impact of the overexpression of SALL1 on the phenotypes observed is warranted - e.g. for Figure 5F, could the sites associated with a gain of BRG1 peaks upon loss of SALL4 be associated with SALL1 being upregulated and 'hijacking' BAF recruitment to distinct sites associated with nervous system development? Is SALL1 still upregulated at Day 5?

      As mentioned above, SALL1 also recognizes AT-rich motifs but, similar to SALL4, also binds nonspecifically, likely in cooperation with other TFs. As the Reviewer suggested, it is certainly possible that some of the sites associated with a gain of BRG1 peaks upon loss of SALL4 could be associated with SALL1 being upregulated and 'hijacking' BAF recruitment to distinct sites. While this is speculative, we have added a paragraph on this in the discussion.

      Related to the point above, SALL4A is proposed to have an isoform-specific interaction with the BAF complex. It would be valuable to plot SALL4A and SALL4B expression from the available RNA-seq data at Day 0, 5 and 14 to explore whether stage-specific isoform expression matches with the proposed role of SALL4A to interact with BAF at Day 5. It could be valuable to also look at expression of SALL1, 2 and 3 across the time course to see whether additional compensation mechanisms are at play during the differentiation.

      Thanks for suggesting this. We performed a time-course analysis of isoform-specific gene expression, which showed that SALL4B expression remains low throughout differentiation, while SALL4A expression increases upon differentiation cues and remains at high levels until the end. We have added this to Supplementary Fig. S9. Moreover, we have performed an additional experiment using pomalidomide, a thalidomide derivative that selectively degrades SALL4A but not SALL4B. Notably, SALL4A degradation recapitulated the main findings obtained with the CRISPR-KO of SALL4, further supporting that SALL4A is the isoform involved in CNCC induction (see new Fig. 8).

      At line 264, The authors state "SALL4 recruits the BAF complex at CNCC developmental enhancers to increase chromatin accessibility". Given that this analysis is performed at Day 5 of the differentiation, which is labelled as neuroectoderm what evidence do the authors have that these are specifically CNCC enhancers? Statements relating to enhancers should generally be re-phrased to putative enhancers (as no functional evidence is provided for enhancer activity), and further evidence could be provided to support that these are CNCC-specific regulatory elements, e.g. showing representative gene loci from CNCC-specific genes. Discussion of the RNA-seq presented in Supplementary Figure 2B may also be appropriate to introduce here given that large numbers of accessible chromatin sites are detected while the expression of very few genes is impacted, suggesting these sites may become active enhancers at a later developmental stage.

      As also recommended by the other Reviewer, to further characterize these sites, we have used publicly available histone modification ChIP-seq data (H3K4me1, H3K4me3) generated by the Wysocka lab and also generated new H3K27ac ChIP-seq data. These experiments and analyses confirmed that these regions are putative CNCC enhancers (and a minority of them putative promoters), all decorated with H3K4me1, and all showing a progressive increase in H3K27ac after CNCC induction (day-5). See new Supplementary Figure S6.

      1. Do any of the putative CNCC enhancers detected at Day 5 as being sensitive to SALL4 downregulation and loss of BAF recruitment overlap with previously tested VISTA enhancers (https://enhancer.lbl.gov/vista/)?

      Yes, we have found examples of overlap and have included two of them in the updated Figure 6 as Genome Browser screenshots.

      Minor comments

      1. The authors are missing references in the introduction "a subpopulation of neural crest cells that migrate dorsolaterally to give rise to the cartilage and bones of the face and anterior skull, as well as cranial neurons and glia".

      Fixed, thank you.

      The discussion of congenital malformations associated with SALL4 haploinsufficiency is brief in the introduction. From OMIM, SALL4 heterozygous mutations are implicated with the condition Duane-radial ray syndrome (DRRS) with "upper limb anomalies, ocular anomalies, and, in some cases, renal anomalies... The ocular anomalies usually include Duane anomaly". That Duane anomaly is one phenotype among a number for patients with SALL4 haploinsufficiency could be clarified in the introduction. Of note, this is stated more clearly in the discussion but needs re-wording in the introduction.

      Done, thank you.

      The statements "show that the SALL4A isoform directly interacts with the BAF complex subunit DPF2 through its zinc-finger-3 domain" and "this interaction occurs between the zinc-finger-cluster-3 (ZFC3) domain of SALL4A and the plant homeodomains (PHDs) of DPF2" in the introduction appear overstated and should be toned down. To show this the authors would need to mutate or delete the proposed important zinc-finger domains from SALL4A, which is outside the scope of this work. Notably, this is less strongly-stated elsewhere in the manuscript, e.g "predict that this interaction is mediated by the BAF subunit DPF2", Line 162.

      Done, thank you.

      Could the authors clarify why 3 Alphafold output models are shown for SALL4B in Figure 1C, and only one output model for SALL4A?

      AlphaFold3 produces five separate predicted models per protein combination (e.g., Model_0 … Model_4), each derived from slightly different network parameters or initializations. The final output prioritizes the model with the highest confidence score. This multi-model strategy enables the identification of the most robust conformation while providing a measure of structural uncertainty (as per GitHub documentation for AlphaFold3). We have conducted the same analysis for SALL4A as we did for SALL4B. Specifically, SALL4A interacts with the AT-rich DNA in models 0, 1, and 2, so models 3 and 4 were excluded. When analysing models 1 and 2, we found a higher number of residues involved in the interaction (>800 instead of 396). As for model 0, only the interactions between residues belonging to an annotated functional domain (ZFs and PHDs) were considered.

      In Model 1: SALL4A and DPF2 interact mainly through ZF6 and 7, and not 5 as in Model 0.

      In Model 2: SALL4A and DPF2 interact mainly through ZF5 and 6, and not 7 as in Model 0. In contrast, this model shows an interaction with ZF1 not shown in the other two models, but with a higher PAE (31 on average, compared to 25 to 27 on average for the other two ZFs).

      Therefore, we considered Model 0, as it is the model with the highest confidence and is representative of all significant models (it includes ZF5, 6, and 7).
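
      As an illustration of how an average PAE for a domain-domain contact can be obtained from a predicted aligned error matrix, a minimal sketch follows. It assumes the PAE matrix is available as a nested list indexed by residue, and it averages both directions because PAE is not symmetric. The function name, the toy matrix, and the residue indices are hypothetical; this is not necessarily how the averages quoted above were computed.

```python
def mean_interface_pae(pae, residues_a, residues_b):
    """Average PAE over all pairs between two residue index sets, in both directions."""
    values = []
    for i in residues_a:
        for j in residues_b:
            values.append(pae[i][j])
            values.append(pae[j][i])
    if not values:
        raise ValueError("both residue sets must be non-empty")
    return sum(values) / len(values)


# Hypothetical 4x4 PAE matrix (in angstroms): residues 0-1 stand in for a
# zinc finger, residues 2-3 for a PHD domain.
pae = [
    [1.0, 2.0, 26.0, 24.0],
    [2.0, 1.0, 25.0, 27.0],
    [24.0, 26.0, 1.0, 2.0],
    [26.0, 26.0, 2.0, 1.0],
]
print(mean_interface_pae(pae, residues_a=[0, 1], residues_b=[2, 3]))
```

      Lower values indicate higher confidence in the relative placement of the two domains, so a contact averaging 31 would be considered less reliable than one averaging 25 to 27.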

      Line 121. The authors state "DPF2, a broadly expressed BAF subunit,", but don't show expression during their CNCC differentiation. It would be good to include expression of DPF2 in Figure 1E.

      Done, thank you.

      The text states "a 11 bp deletion within the 3'-terminus of exon 1 of SALL4", while the figure legend states, "Sanger sequencing confirming the 19 bp deletion in one allele of SALL4 is displayed". The authors should clarify this disparity and experimentally confirm the deletion, e.g. by TA-cloning the two alleles and sequencing these separately to show that one allele is wildtype and the other has a frameshift deletion.

      We apologize for the confusion. As stated above (point-2 of the major comments), we have used two clones (one generated with an 11 bp deletion, one with a 19 bp deletion, both in exon-1). The two clones were used as biological replicates (see response above for details). The deletion for both clones was experimentally confirmed by Sanger sequencing by the company that generated the lines for us (Synthego). The strategy for the two clones is now also shown in Supplementary Fig. S2.

      The authors generate an 11-bp (or 19-bp?) deletion in exon-1 - it would be valuable to include a discussion whether patients have been identified with deletions and frame-shift mutations in this region of SALL4 exon-1. And also clarify, if not clearly stated in the text, that both SALL4A and SALL4B will be impacted by this mutation. Are there examples of patient mutations which only impact SALL4A?

      As requested, we have added a discussion paragraph to discuss this. And, yes, both SALL4A and SALL4B are impacted by both deletions in both clones (11 bp and 19 bp deletion).

      Regarding patient variants in exon-1 and patient variants that only impact SALL4A, we could only find one published pathogenic 170 bp deletion in exon 1 (VCV000642045.7). The majority of the pathogenic or likely pathogenic variants are located in exon 2. In particular, of the 63 reported pathogenic (or likely pathogenic) clinical variants, 42 were located in exon 2. Among these, 28 are located in the portion shared by both SALL4A and SALL4B, while the remaining 14 were SALL4A-specific.

      For the SALL4 blots in Figure 2B, is the antibody expected to detect both isoforms (SALL4A and SALL4B), and which isoform is shown? If two isoforms are detected, they should both be presented in the figure.

      Yes, the antibody detects both isoforms, and we now present both in the figure 2, as recommended.

      SALL4 expression should be shown for Figure 2C to see whether the >50% down-regulation of SALL4 at the protein level may be partially driven by transcriptional changes.

      Done, thank you. As expected, we observed that SALL4 mRNA expression in the KO line is comparable to wild-type conditions, but the SALL4 protein level is still significantly decreased, likely because of autoregulatory mechanisms coupled with nonsense-mediated decay of the mutated allele. Also, we note that SALL4 usually forms homodimers; therefore, an insufficient amount of protein could also lead to degradation of the monomers.

      The number of experimental replicates should be indicated in all figure legends where relevant. Raw data points should be plotted visibly over the violin plots (e.g. Figure 2C).

      Done, thank you.

      For Figure 3A, the images of the DAPI and NANOG/OCT4 staining should be shown separately in addition to the overlay.

      Done, thank you.

      The metric 'Corrected Total Cell Fluorescence (CTCF)' should be described in the methods. The number of images used for the quantification in Figure 3A should be

      Done, thank you.
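
      For reference, Corrected Total Cell Fluorescence is commonly computed as the integrated density of a cell minus the cell area multiplied by the mean background fluorescence. The sketch below illustrates that common definition only; the function name and the example measurements are hypothetical and are not taken from the manuscript's methods.

```python
def corrected_total_cell_fluorescence(integrated_density, cell_area, mean_background):
    """CTCF = integrated density - (cell area * mean background fluorescence)."""
    if cell_area < 0:
        raise ValueError("cell_area must be non-negative")
    return integrated_density - cell_area * mean_background


# Hypothetical measurements for a single cell (arbitrary units).
ctcf = corrected_total_cell_fluorescence(
    integrated_density=50000.0, cell_area=400.0, mean_background=20.0
)
print(ctcf)
```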

      Figure 3C - what are the 114 differentially expressed genes? Some interesting genes could be labelled on the plot and the data used to generate this plot should be included as a Supplementary Table. Supplementary Tables should similarly be provided for Figure 6C, Day 14 and Supplementary Figure 2B, Day 5.

      As recommended, we have highlighted some interesting genes in the volcano plot and also included all the expression data for all genes in Supplementary Table S3.

      Figure 4B. The shared peaks are not shown. For completeness, it would be ideal to show these sites also.

      Done, thank you.

      Figure 4C is difficult to interpret. Why is the plot asymmetric to the left versus right? What does the axis represent - % of binding sites?

      The asymmetry is due to the fact that there is a larger number of peaks that are downstream of the TSS than peaks that are upstream of TSS. This is consistent with the fact that many SALL4 peaks are in introns, likely representing intronic enhancers.
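
      The upstream/downstream asymmetry described above can be illustrated with a strand-aware signed distance between each peak and its nearest TSS. This is a minimal sketch under stated assumptions: the record layout, the function names, and the coordinates are hypothetical and are not taken from the analysis pipeline.

```python
def signed_distance_to_tss(peak_center, tss, strand):
    """Return the peak-to-TSS distance; positive means downstream of the TSS."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    offset = peak_center - tss
    return offset if strand == "+" else -offset


def count_up_and_downstream(records):
    """Count peaks upstream (<0) and downstream (>0) of their nearest TSS."""
    upstream = downstream = 0
    for peak_center, tss, strand in records:
        distance = signed_distance_to_tss(peak_center, tss, strand)
        if distance < 0:
            upstream += 1
        elif distance > 0:
            downstream += 1
    return upstream, downstream


# Hypothetical (peak_center, tss, strand) records.
records = [
    (1500, 1000, "+"),  # intronic peak, downstream
    (800, 1000, "+"),   # upstream
    (4000, 5000, "-"),  # downstream on the minus strand
    (5200, 5000, "-"),  # upstream on the minus strand
    (7000, 6000, "+"),  # downstream
]
print(count_up_and_downstream(records))
```

      A histogram of these signed distances is skewed to the right whenever more peaks fall within gene bodies than in upstream regions.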

      Line 224-225. What do n= 3,729 and n= 6,860 refer to? There appear to be many more binding sites indicated in Figure 4B, therefore these numbers cannot represent 86% and 97% of sites?

      Thank you for pointing this out; we should have specified this in the text. Those numbers refer to the genes whose TSS is closest to each SALL4 peak. Notably, multiple peaks can share the same closest TSS, hence the discrepancy between the number of peaks and the number of nearest genes.

      Raw numbers:

      • Day-0 RAW = 6,104 (peaks = 6,114);
      • Day-5 RAW = 17,131 (peaks = 17,137). The raw data are now reported in Supplementary Table 4.
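
      The difference between the number of peaks and the number of nearest genes can be shown with a small sketch that assigns each peak to the gene with the closest TSS and then counts unique genes. The gene names and coordinates are hypothetical, and a single chromosome is assumed for simplicity.

```python
def nearest_gene(peak_center, tss_by_gene):
    """Return the gene whose TSS is closest to the peak center."""
    return min(tss_by_gene, key=lambda gene: abs(tss_by_gene[gene] - peak_center))


# Hypothetical TSS coordinates and peak centers on one chromosome.
tss_by_gene = {"GENE_A": 1000, "GENE_B": 9000}
peak_centers = [900, 1400, 2500, 8800]

assigned = [nearest_gene(center, tss_by_gene) for center in peak_centers]
# Number of peaks versus number of distinct nearest genes.
print(len(peak_centers), len(set(assigned)))
```

      Here three of the four peaks map to the same gene, so the gene count is lower than the peak count.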

      Figure 4E. Several TFs mentioned in the text (Line 243) are not shown in the figure, it would be good to show all TFs motifs mentioned in the text in this figure. Again, there is no mention of whether a sequence-specific motif is detected for SALL4 (e.g. an AT-rich sequence) from this motif analysis.

      Done, thank you. An AT-rich sequence, resembling the SALL4 motif, was detected in a small minority of sites (this is now shown in Supplementary Figure S5), suggesting that SALL4 engages chromatin in a broad manner, going beyond its preferred motif, possibly in cooperation with other TFs. This is consistent with many studies in mESCs showing that SALL4 binds at OCT4/NANOG/SOX2 target motifs. This is now discussed in a dedicated paragraph in the discussion.

      Figure 4G. How was the ATAC-seq data normalized for the WT and SALL4-het-KO lines for this comparison? The background levels of accessibility seem quite different in Replicate 1.

      The bigwigs used to make the heatmaps are normalized by sequencing depth using the deepTools suite (RPKM normalization).
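
      For clarity, RPKM scales the read count in each bin by the bin length (in kilobases) and by the library size (in millions of mapped reads). In deepTools this corresponds to the `--normalizeUsing RPKM` option of `bamCoverage`; whether that exact command was used here is an assumption. The sketch below shows the arithmetic with hypothetical read counts.

```python
def rpkm(reads_in_bin, bin_length_bp, total_mapped_reads):
    """Reads per kilobase of bin per million mapped reads."""
    if bin_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("bin length and total mapped reads must be positive")
    return reads_in_bin / ((bin_length_bp / 1000) * (total_mapped_reads / 1_000_000))


# Hypothetical libraries: the same bin holds twice as many reads in a
# library that was sequenced twice as deeply.
wt = rpkm(reads_in_bin=50, bin_length_bp=50, total_mapped_reads=20_000_000)
ko = rpkm(reads_in_bin=100, bin_length_bp=50, total_mapped_reads=40_000_000)
print(wt, ko)
```

      After normalization the two bins carry the same value, so a difference in sequencing depth alone does not appear as a difference in signal.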

      Figures 5B-C could be exchanged to flow better with the text. A Venn diagram could be included to show the overlap between the sites losing BRG1 in SALL4-het-KO (13,505 sites) and the Day5-specific SALL4 sites (17,137 sites).

      Done, thank you.

      At Day 5, the authors suggest a shift towards neural differentiation. It could be interesting for the authors to perform qRT-PCR at Day 5 for some neural markers or look in the Day 14 data for markers of neural differentiation at the expense of CNCC markers.

      See updated Supplementary Fig. S8, where we show timecourse expression of several genes, including neural markers.

      Is the data used to plot Figure 5D the same as Figure 4G. If so, why is only one replicate shown in Figure 5D?

      Only one replicate was shown in the main figure purely for lack of space, but the experiment was replicated twice (with the two different clones), and the results were exactly the same. See plots below for your convenience:

      Figure 6A. How many replicates are shown? If n=2, boxplots are not appropriate to represent the distribution of the data. Please include n= X in the figure legend and plot the raw data points also.

      Done, thank you, and as suggested we are no longer using boxplots for this panel.

      Figure 6B. What is the significance of CD99 for CNCC differentiation?

      Figure 6F. No error bars are shown; how many replicates were performed for this time course? The linear regression line does not appear to add much value and could be removed.

      As suggested, we have removed these plots and replaced them with individual genes plots, which include error bars. See updated Supplementary Figure S8.

      At line 304, the authors state "while SALL4-het-KO showed a significant downregulation of these genes". Perhaps 'failed to induce these genes' may be more accurate unless they were expressed at Day 5 and downregulated at Day 14.

      Done, thank you.

      Lines 332-335. The genes selected for pluripotency, neural plate border, CNCC specification could be plotted separately in the Supplement to show individual gene expression dynamics.

      Done, thank you, see point 24.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      In this manuscript, Singh, Wu and colleagues explore functional links between septins and the exocyst complex. The exocyst is a conserved octameric complex that mediates the tethering of secretory vesicles for exocytosis in eukaryotes. In fission yeast cells, the exocyst is necessary for cell division, where it localizes mostly at the rim of the division plane, but septins, which localize in a similar manner, are non-essential. The main findings of the work are that septins are required for the specific localization of the exocyst to the rim of the division plane, and the likely consequent localization of the glucanase Eng1 at this same location, where it is known to promote cell separation. In the absence of septins, the exocyst still localizes to the division plane but is not restricted to the rim. They also show some defects in the localization of secretory vesicles and glucan synthase cargo. They further propose that interactions between septins and exocysts are direct, as shown through Alphafold2 predictions (of unclear strength) and clean coIP experiments. 

      Strengths: 

      The septin, exocyst and Eng1 localization data are well supported, showing that the septin rim recruits the exocyst and (likely consequently) the Eng1 glucanase at this location. One major finding of the manuscript is that of a physical interaction between septins and exocyst subunits. Indeed, many of the coIPs supporting this discovery are very clear. 

      Weaknesses: 

      I am less convinced by the strength of the physical interaction of septins with the exocyst complex. Notably, one important open question is whether septins interact with the intact exocyst complex, as claimed in the text, or whether the interactions occur only with individual subunits. The two-hybrid and coIP data only show weak interactions with individual subunits, and some coIPs (for instance Sec3 and Exo70 with Spn1 and Spn4) are negative, suggesting that the exocyst complex does not remain intact in these experiments.

      Given the known structure of the full exocyst complex and septin filaments (at least in S. cerevisiae), the Alphafold2 predicted structure could be used to probe whether the proposed interaction sites are compatible with full complex formation.  

      We thank the reviewer for these important and insightful comments. We agree that our current data, particularly the data from yeast two-hybrid and co-immunoprecipitation (coIP) assays, primarily reveal interactions between individual septin and exocyst subunits, and do not conclusively demonstrate binding of septins to the fully assembled exocyst complex. We recognize this as a key limitation and have revised the manuscript text accordingly to clarify this point.

      We also appreciate the reviewer’s suggestion to use structural prediction to further assess the plausibility of these interactions. We have now used the structure of the full Saccharomyces cerevisiae exocyst complex (4.4 Å resolution) published by the Guo group (Mei et al., 2018) to examine the septin-exocyst interaction interfaces, assuming that the S. pombe exocyst has a similar structure. We checked all the interacting residues on the exocyst complex and septins from our AlphaFold modeling to determine whether the predicted interactions are structurally compatible. Our analysis reveals that the majority of subunit interactions are sterically feasible, while a few would likely require partial disassembly or flexible conformations. These new insights have been added to the revised Results and Discussion sections (Figure Supplement S4, S5 and Videos 4-7).

      While we cannot fully resolve whether septins engage with the whole exocyst complex versus selected subunits, our combined data support a model that septins scaffold or spatially regulate the exocyst localization at the division site, potentially through dynamic and multivalent interactions. We now explicitly state this more cautious interpretation in the revised manuscript.

      Mei, K., Li, Y., Wang, S., Shao, G., Wang, J., Ding, Y., Luo, G., Yue, P., Liu, J.-J., Wang, X., Dong, M.-Q., Wang, H.-W., and Guo, W. 2018. Cryo-EM structure of the exocyst complex. Nature Structural & Molecular Biology, 25(2), pp. 139-146.

      The effect of spn1∆ on Eng1 localization is very clear, but the effect on secretory vesicles (Ypt3, Syb1) and glucan synthase Bgs1 is less convincing. The effect is small, and it is not clear how the cells are matched for the stage of cytokinesis. 

      For the localizations and quantifications of Eng1, Ypt3, Syb1, and Bgs1 shown in Figures 6 and 7, cells with a closed septum (at or after the end of contractile-ring constriction) were quantified or highlighted. Fluorescence intensity at the division site was quantified by line scan with a line width of 3 pixels. For Syb1 (Figure 6D), we quantified cells at the end of ring constriction (when Rlc1-tdTomato had constricted to a dot) in the middle focal plane. The exact same lines were drawn in both the Rlc1 and Syb1 channels. The center of the line scan was defined as the pixel with the brightest Rlc1 value. All data were aligned by the center and plotted. For Bgs1 (Figure 7A), we quantified cells in which the Rlc1 signal had disappeared from the division site. The line was drawn in the Bgs1 channel in the middle focal plane. The center of the line scan was defined as the pixel with the brightest Bgs1 value.

      All data were aligned by the center and plotted. These details were added to the Materials and Methods.
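
      The alignment step described above can be sketched as follows: each line scan is re-indexed so that offset 0 is the brightest pixel of the reference channel, and profiles from different cells are then averaged at matching offsets. This is a minimal sketch, not the analysis code used in the study; the function names and intensity values are hypothetical.

```python
def align_by_peak(reference_profile, target_profile):
    """Return (offset, value) pairs of target_profile, with offset 0 at the
    brightest pixel of reference_profile."""
    if len(reference_profile) != len(target_profile):
        raise ValueError("profiles must come from the same line")
    center = max(range(len(reference_profile)), key=reference_profile.__getitem__)
    return [(i - center, value) for i, value in enumerate(target_profile)]


def average_aligned(aligned_profiles):
    """Average values at each offset across cells."""
    sums, counts = {}, {}
    for profile in aligned_profiles:
        for offset, value in profile:
            sums[offset] = sums.get(offset, 0.0) + value
            counts[offset] = counts.get(offset, 0) + 1
    return {offset: sums[offset] / counts[offset] for offset in sorted(sums)}


# Hypothetical line scans from two cells (Rlc1 as reference, Syb1 as target).
cell_1 = align_by_peak([1, 2, 9, 2, 1], [3, 5, 8, 5, 3])
cell_2 = align_by_peak([1, 9, 2, 1, 1], [4, 8, 4, 2, 2])
print(average_aligned([cell_1, cell_2]))
```

      For a single-channel case such as Bgs1, the same profile would be passed as both the reference and the target.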

      Reviewer #2 (Public Review): 

      Summary: 

      This interesting study implicates the direct interaction between two multi-subunit complexes, known as the exocyst and septin complexes, in the function of both complexes during cytokinesis in fission yeast. While previous work from several labs had implicated roles for the exocyst and septin complexes in cytokinesis and cell separation, this study describes the importance of protein:protein interaction between these complexes in mediating the functions of these complexes in cytokinesis. Previous studies in neurons had suggested interactions between septins and exocyst complexes occur but the functional importance of such interactions was not known. Moreover, in baker's yeast where both of these complexes have been extensively studied - no evidence of such an interaction has been uncovered despite numerous studies which should have detected it. Therefore while exocyst:septin interactions appear to be conserved in several systems, it appears likely that budding yeast are the exception--having lost this conserved interaction. 

      Strengths: 

      The strengths of this work include the rigorous analysis of the interaction using multiple methods, including Co-IP of tagged but endogenously expressed proteins, two-hybrid interaction, and Alphafold Multimer. Careful quantitative analysis of the effects of loss of function in each complex and the effects on localization and dynamics of each complex was also a strength. Taken together, this work convincingly shows that these two complexes do interact and that this interaction plays an important role in post-Golgi vesicle targeting during cytokinesis. 

      Weaknesses: 

      The authors used Alphafold Multimer to predict (largely successfully) which subunits were most likely to be involved in direct interactions between the complexes. It would be very interesting to compare this to a parallel analysis on the budding yeast septin and exocyst complexes where it is quite clear that detectable interactions between the exocyst and septins (using the same methods) do not exist. Presumably the resulting pLDDT scores will be significantly lower. These are in silico experiments and should not be difficult to carry out. 

      We thank the reviewer for this insightful suggestion. To assess the specificity of the predicted interactions between septins and the exocyst complex in S. pombe, we performed a comparative AlphaFold2 analysis using some of the homologous subunits from Saccharomyces cerevisiae. We modeled two interactions, Cdc10-Sec5 and Cdc10-Sec15 (Cdc10 is the Spn2 homolog), using the same pipeline and parameters, at the time we modeled the S. pombe proteins. Using the criteria applied to the fission yeast proteins in this study, we did not find interactions between them. These results support the notion that the predicted septin–exocyst interactions in S. pombe are not generalizable to budding yeast. Unfortunately, we did not test the other combinations at that time, and the AlphaFold2 platform is no longer available to us (it showed system error messages when we tried recently). We thank the reviewer again for this helpful suggestion, which should strengthen the evolutionary interpretation of the septin-exocyst interactions once it can be carried out systematically.

      Reviewer #3 (Public Review): 

      Septins in several systems are thought to guide the location of exocytosis, and they have been found to interact with the exocyst vesicle-tethering complex in some cells. However, it is not known whether such interactions are direct or indirect. Moreover, septin-exocyst physical associations were not detected in several other systems, including yeasts, making it unclear whether such interactions reflect a conserved septin-exocytosis link or whether they may be missed if they depend on septin polymerization or association into higher-order structures. Singh et al. set out to define whether and how septins influence the exocyst during S. pombe cytokinesis. Based on three lines of evidence, the authors conclude that septins directly bind to exocyst subunits to regulate localization of the exocyst and vesicle secretion during cytokinesis. The conclusions are consistent with the data presented, but some interpretations need to be clarified and extended: 

      (1) The first line of evidence examines septin and exocyst localization during cytokinesis in wild-type and septin-mutant or exocyst-mutant yeast. Quantitative imaging convincingly shows that the detailed localization of the exocyst at the division site is perturbed in septin mutants, and that this is accompanied by modest accumulation of vesicles and vesicle cargos. Whether that is sufficient to explain the increased thickness of the division septum in septin mutants remains unclear.

      The modest accumulation of vesicles and vesicle cargos at the division site is one of the reasons for the increased thickness of the division septum in septin mutants. It is more likely that the misplaced exocyst can still tether vesicles along the division plane (less likely at the rim) without septins. Due to the lack of the glucanase Eng1 at the rim of the division plane in septin mutants, daughter-cell separation is delayed and then cells continue to thicken the septum. We have added these points to the Discussion.

      (2) The second line of evidence involves a comprehensive Alphafold2 analysis of potential pair-wise interactions between septin and exocyst subunits. This identifies several putative interactions in silico, but it is unclear whether the identified interaction surfaces would be available in the full septin or exocyst complexes.  

      We thank the reviewer for raising this important point. We fully agree that a key limitation of pairwise AlphaFold predictions is that they do not account for the higher-order structural context of multimeric protein complexes, such as septin hetero-oligomers or the assembled exocyst complex. As a result, some of the predicted interfaces could indeed be conformationally restricted in the native state.

      To address this concern, we predicted the S. pombe exocyst and septin structures using AlphaFold3 and mapped the predicted contact residues onto the predicted structures. Most predicted interfaces (86% for the exocyst and 86-96% for septins) appear to be located on accessible surfaces in the assembled complexes (Figure Supplements S4 and S5, Videos 4-7), suggesting that these interactions are sterically plausible. We have added this important caveat to the text of the revised manuscript, highlighting the interface accessibility within the assembled complexes. We appreciate the reviewer’s insight, which helped us strengthen the interpretation and limitations of the AlphaFold-based analysis.

      (3) The third line of evidence uses co-immunoprecipitation and yeast two hybrid assays to show that several physical interactions predicted by Alphafold2 can be detected, leading the authors to conclude that they have identified direct interactions. However, both methods leave open the possibility that the interactions are indirect and mediated by other proteins in the fission yeast extract (co-IP) or budding yeast cell (two-hybrid). 

      We thank the reviewer for this important clarification. We agree that co-immunoprecipitation (co-IP) and yeast two-hybrid (Y2H) assays cannot conclusively distinguish between direct and indirect interactions. As the reviewer points out, co-IPs may reflect associations mediated by bridging proteins within the fission yeast extract, and Y2H readouts can be influenced by fusion context or endogenous host proteins. We have now revised the relevant statements in the Results and Discussion sections of the manuscript to clarify that the observed associations are consistent with the direct interactions predicted by AlphaFold2, but cannot alone establish direct binding. We have also tempered our terminology, substituting phrases such as “direct interaction” with “physical association consistent with direct binding” where appropriate.

      (4) Based on prior studies it would be expected that the large majority of both septins and exocyst subunits are present in cells and extracts as stoichiometric complexes. Thus, one would expect any septin-exocyst interaction to yield associations detectable with multiple subunits, yet co-IPs were not detected in some combinations. It is therefore unclear whether the interactions reflect associations between fully-formed functional complexes or perhaps between transient folding intermediates. 

      We thank the reviewer for this thoughtful observation. We agree that both septins and exocyst subunits are generally understood to exist in cells as stable, stoichiometric complexes, and that interactions between fully assembled complexes might be expected to yield co-immunoprecipitation signals involving multiple subunits from each complex. However, it was also found that >50% of septins Spn1 and Spn4 are in the cytoplasm even during cytokinesis when the septin double rings are formed (Table 1 of Wu and Pollard, Science 2005, PMID: 16224022). Thus, it is possible that there are pools of free septin and exocyst subunits in the cytoplasm, which were detected in our Co-IP assays. 

      In our experiments, we observed selective co-IP signals between certain septin and exocyst subunits, while other combinations did not yield detectable interactions. We believe these findings could reflect several other possibilities besides the possible interactions among the free subunits in the cytoplasm:

      (1) Some interactions may be strong enough to detect only between specific subunits at exposed interfaces under the Co-IP conditions, rather than through interactions between whole complexes;

      (2) The detergent and/or salt conditions used in our co-IPs may disrupt labile complex interfaces or partially dissociate multimeric assemblies.

      To address this concern, we now include in the Discussion a paragraph highlighting the possibility that some of the observed interactions may not reflect binding between fully assembled, functional complexes. Notably, most detected interaction pairs are consistent with the AlphaFold predictions, which suggest that specific subunit interfaces may be responsible for mediating contact. While we cannot fully resolve whether septins engage with the whole exocyst complex or with selected subunits, our combined data support a model in which septins scaffold or spatially regulate exocyst localization at the division site, potentially through dynamic and multivalent interactions. We now explicitly state this more cautious interpretation in the revised manuscript. Future biochemical studies using native complex purifications, cross-linking mass spectrometry, in vitro reconstitution with fully assembled septin and exocyst complexes, or in vivo FRET assays will be essential to clarify whether the interactions we observe occur between intact assemblies or intermediate forms.

      Reviewer #1 (Recommendations for the Authors): 

      A major finding from the manuscript is the description of physical interaction of septin subunits with exocyst subunits. The analysis starts from Alphafold2 predictions, shown in Figures 3 and S3. However, some of the most useful metrics of Alphafold, the PAE plot and the pTM and ipTM values, are not provided. It is thus very difficult to estimate the value of the predicted structures (which are also obscured by all side chains). The power of a predicted structure is that it suggests binding interfaces, which is not explored here. At the very least, it would not be difficult to examine whether the proposed binding interfaces are free in the septin filaments and octameric exocyst complex. 

      Please also see response to reviewer #1 (Public Review).

      We thank the reviewer for these very helpful suggestions. We agree that inclusion of AlphaFold2 model confidence metrics—specifically the Predicted Aligned Error (PAE) plots, as well as pTM and ipTM values—is essential for evaluating the reliability of the predicted septin–exocyst interfaces.

      In the revised manuscript, we have now included the PAE plots (Figures 3 and S3) and summarized the pTM scores for each predicted septin–exocyst subunit pair. We also provide a short description of these metrics in the figure legend to help guide interpretation. The older AlphaFold2 version that we used (alphafold2advanced) does not report ipTM scores, so they are not included. However, in our methodology we counted only interacting residues with pLDDT scores >50, so we do not expect the resulting ipTM scores to be very low.
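
      The residue-level filter described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the `Contact` record, the `confident_contacts` helper, the residue labels, and the requirement that both residues of a pair exceed the threshold are all hypothetical; only the pLDDT > 50 cutoff comes from the response.

```python
from typing import List, NamedTuple


class Contact(NamedTuple):
    """One predicted residue-residue contact (hypothetical record layout)."""
    residue_a: str   # e.g. a septin residue label
    residue_b: str   # e.g. an exocyst residue label
    plddt_a: float   # per-residue confidence of residue_a (0-100 scale)
    plddt_b: float   # per-residue confidence of residue_b (0-100 scale)


def confident_contacts(contacts: List[Contact], min_plddt: float = 50.0) -> List[Contact]:
    """Keep contacts whose two residues both have pLDDT strictly above min_plddt.

    Requiring BOTH residues to pass is an assumption made for this sketch.
    """
    return [c for c in contacts if c.plddt_a > min_plddt and c.plddt_b > min_plddt]


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    example = [
        Contact("septin:R10", "exocyst:E200", 82.0, 61.5),
        Contact("septin:K11", "exocyst:D201", 45.0, 90.0),
    ]
    for contact in confident_contacts(example):
        print(contact.residue_a, contact.residue_b)
```

      A filter of this kind only removes low-confidence residues; it does not by itself say anything about the confidence of the interface as a whole, which is what ipTM and the PAE plots address.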

      In addition, we have updated Figures 3 and S3 to show simplified ribbon diagrams of the interface regions, with side chains hidden by default and selectively displayed only at predicted interaction hotspots. This improves structural clarity and makes the interface regions easier to interpret. We mentioned in the Discussion that the preliminary studies show that the predicted interacting interfaces of Sec15 and Sec5 with septin subunits are accessible for interaction in the whole exocyst complex. The new Figure Supplement S4 and S5 and Videos 4-7 now show the interface residues of both the exocyst and septins that are involved in the interactions.

      Two further points on the interaction: 

      The 2H interaction data is not very convincing. The insets showing beta-gal assays do not look very different from the negative control (compare for instance in panel 4E the Sec15BD alone, last column, with the Sec15-BD in combination with Spn4-AD, third column: roughly same color), which suggests it is mostly driven by autoactivation of Sec15-BD. Providing growth information in addition to beta-gal may be helpful. 

      We appreciate the reviewer’s close evaluation of the yeast two-hybrid (Y2H) assay data, and we agree that the signal observed for the Spn4–Sec15 combination is indeed weak. Unfortunately, we did not perform growth assays. However, we would like to clarify that a weak signal is consistent with the nature of the interactions that we are investigating. The interactions between individual septin and exocyst subunits are weak and/or transient, as supported by the weak interactions detected in the Co-IP experiments. Given that the exocyst tethers/docks vesicles on the plasma membrane for only tens of seconds before vesicle fusion, the multivalent interactions between septins and the exocyst should be very dynamic and not too strong. 

      As evidenced by our Co-IP experiments and the multivalent interactions predicted by Alphafold2, the interaction between Spn4 and Sec15 is detectable but weak, suggesting that it may be a low-affinity or transient interaction. Y2H assays have known limitations in detecting such low-affinity interactions, especially those that depend on conformational context or are not optimal in the yeast nucleus, so it is perhaps not surprising that the X-gal color development is subtle. These limitations of the Y2H system have been well documented (e.g., Braun et al., 2009; Vidal & Fields, 2014), particularly for interactions with affinities in the micromolar range or those requiring conformational specificity. Therefore, the weak signal observed is in line with expectations for a low-affinity, transient interaction such as that between Spn4 and Sec15.

      Vidal, M. and Fields, S., 2014. The yeast two-hybrid assay: still finding connections after 25 years. Nature methods, 11(12), pp.1203-1206.

      Braun, P., Tasan, M., Dreze, M., Barrios-Rodiles, M., Lemmens, I., Yu, H., Sahalie, J.M., Murray, R.R., Roncari, L., De Smet, A.S. and Venkatesan, K., 2009. An experimentally derived confidence score for binary protein-protein interactions. Nature methods, 6(1), pp.91-97.

      In the coIP experiments, I am confused by the presence of tubulin signal in some of the IPs. For instance, in Fig 4B, but not 4D, where the same Sec15-GFP is immunoprecipitated. There is also a signal in 4C but not 4A. This needs to be clarified. 

      The presence of tubulin in some immunoprecipitates is not unexpected, particularly in experiments involving cytoskeleton-associated proteins such as septins and exocyst subunits. The occasional presence of tubulin in our co-IP samples is consistent with well-documented reports showing tubulin as a frequent non-specific co-purifying protein, particularly under native lysis conditions used to preserve large complexes (Vega and Hsu, 2003; Gavin et al., 2006; Mellacheruvu et al., 2013; Hein et al., 2015). The CRAPome database and quantitative interactomics studies highlight tubulin as one of the most common background proteins in affinity-based workflows. Importantly, tubulin was used as a loading control but not as a marker for interaction in our study, and its variable presence does not reflect a specific interaction with Sec15-GFP or other bait proteins, and we have clarified this point in the revised figure legend.

      Gavin, A.C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L.J., Bastuck, S., Dümpelfeld, B. and Edelmann, A., 2006. Proteome survey reveals modularity of the yeast cell machinery. Nature, 440(7084), pp.631-636.

      Mellacheruvu, D., Wright, Z., Couzens, A.L., Lambert, J.P., St-Denis, N.A., Li, T., Miteva, Y.V., Hauri, S., Sardiu, M.E., Low, T.Y. and Halim, V.A., 2013. The CRAPome: a contaminant repository for affinity purification–mass spectrometry data. Nature methods, 10(8), pp.730-736.

      Hein, M.Y., Hubner, N.C., Poser, I., Cox, J., Nagaraj, N., Toyoda, Y., Gak, I.A., Weisswange, I., Mansfeld, J., Buchholz, F. and Hyman, A.A., 2015. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell, 163(3), pp.712-723.

      Vega, I.E., Hsu, S.C. 2003. The septin protein Nedd5 associates with both the exocyst complex and microtubules and disruption of its GTPase activity promotes aberrant neurite sprouting in PC12 cells. Neuroreport, 14, pp.31-37.

      Regarding the localization of Ypt3 and Syb1 in WT and spn1∆ in Figure 6C-D and Bgs1 in Figure 7A, it would help to add a contractile ring marker to be able to match the timing of cytokinesis between WT and mutants and ensure that cells of same stage are compared (and add some quantification for Ypt3). In fact, in Figure 7A, next to the cells being pointed at, there are very similar localizations of Bgs1 in WT and spn1∆ at the rim of the ingressing septum, which makes me wonder how the quantified cells were chosen. 

      For localizations and quantifications of Eng1, Ypt3, Syb1, and Bgs1 shown in Figures 6 and 7, cells with a closed septum (at or after the end of contractile-ring constriction) were quantified or highlighted. To quantify their fluorescence intensity at the division site using line scan, the line width used was 3 pixels. For Syb1 (Figure 6D), we quantified cells at the end of ring constriction (when Rlc1-tdTomato constricted to a dot) in the middle focal plane. The exact same lines were drawn in both Rlc1 and Syb1 channels. The center of line scan was defined as the pixel with the brightest Rlc1 value. All data were aligned by the center and plotted. For Bgs1 (Figure 7A), we quantified the cells that Rlc1 signal had disappeared from the division site. The line was drawn in the Bgs1 channel in the middle focal plane. The center of line scan was defined as the pixel with the brightest Bgs1 value. All data were aligned by the center and plotted. These details were added to the Materials and Methods.
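
      The alignment step described above (center each line scan on its brightest pixel, then pool the aligned profiles) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the `half_window` parameter, and the use of `None` for positions that fall outside a scan are assumptions made for this sketch. For the Syb1 case the reference channel would be Rlc1 and the target channel Syb1; for Bgs1 the same channel serves as both.

```python
def align_line_scans(reference_profiles, target_profiles=None, half_window=10):
    """Center each target profile on the brightest pixel of its reference profile.

    reference_profiles: list of intensity lists used to find the center.
    target_profiles: list of intensity lists to align (same line, other channel);
        defaults to the reference profiles themselves.
    Returns one row per scan of length 2 * half_window + 1, with None where the
    window extends past the end of the scan.
    """
    if target_profiles is None:
        target_profiles = reference_profiles
    aligned = []
    for ref, tgt in zip(reference_profiles, target_profiles):
        # Index of the brightest reference pixel (first one if there are ties).
        center = max(range(len(ref)), key=lambda i: ref[i])
        row = []
        for offset in range(-half_window, half_window + 1):
            idx = center + offset
            row.append(tgt[idx] if 0 <= idx < len(tgt) else None)
        aligned.append(row)
    return aligned


def mean_profile(aligned):
    """Average the aligned scans position by position, ignoring missing values."""
    means = []
    for column in zip(*aligned):
        values = [v for v in column if v is not None]
        means.append(sum(values) / len(values) if values else None)
    return means


if __name__ == "__main__":
    # Made-up intensities: two scans, reference channel and target channel.
    reference = [[0, 1, 5, 1, 0], [0, 0, 1, 7, 1]]
    target = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]]
    print(mean_profile(align_line_scans(reference, target, half_window=1)))
```

      Centering on the reference-channel peak rather than on the target-channel peak keeps the alignment independent of the signal being measured, which matters when the target signal is weak or broad.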

      Finally, the manuscript would benefit from some figure reorganization/compaction. Unless work on the binding interfaces is added, Figure 3 and S3 could be removed and summarized by providing the pTM and ipTM values of the predicted interactions. Figure 5 could be combined with Figure 2, as it is essentially a repeat with additional exocyst subunits. 

      Because the binding interfaces are added, we keep the original Figures 3 and S3. The experiments in Figure 5 could not be performed before the interaction tests between septins and the exocyst. Thus, to aid the flow of the story, we keep Figures 2 and 5 separated.

      Minor comments: 

      The last sentence of the first paragraph of the results does not make much sense at this point of the paper. After the first paragraph, there is no evidence that colocalization would be required for proper function.  

      We agree that the sentence in question may have overstated the functional implications of colocalization too early in the Results section, before presenting supporting evidence. Our intention was to introduce the hypothesis that spatial proximity between septins and exocyst subunits may be relevant for their coordination during cytokinesis, which we examine in later figures. We have revised the sentence to more accurately reflect the observational nature of the data at this stage in the manuscript as below:

      "These observations suggest the spatial proximity between septins and the exocyst during certain stage of cytokinesis, raising the possibility of their functional coordination, which we would further investigate below."

      What is the indicated n in Figure 6B? Number of cells? 

      Yes, the n in Figure 6B refers to the thin sections of electron microscopy quantified in the analysis. We have now updated the figure legend to explicitly state this for clarity.

      The causal inference made between the alteration of Exocyst localization in septin mutants and the thicker septum is possible, but by no means certain. It should be phrased more cautiously. 

      We agree that our original phrasing may have overstated the causal relationship between altered exocyst localization in septin mutants and septum thickening. Our data support a correlation between these phenotypes, but additional experiments would be required to establish direct causality.

      To reflect this, we have revised the relevant sentence in the Discussion to read:

      “The modest accumulation of vesicles and vesicle cargos at the division site is one of the reasons for the increased thickness of the division septum in septin mutants. It is more likely that the misplaced exocyst can still tether vesicles along the division plane without septins. Due to the lack of the glucanase Eng1 at the rim of the division plane in septin mutants, daughter-cell separation is delayed and then cells continue to thicken the septum.”

      Reviewer #2 (Recommendations for the Authors): 

      (1) In the display of the AlphaFold Model for the interactions (Figure 3 and Supplemental Figure 3) it is difficult to identify which subunits are where. Residue numbers and subunits should be labeled and only side chains important for the interactions should be present in the model. 

      We appreciate this valuable suggestion. We agree that clearer visual labeling is essential for interpreting the predicted interactions and have revised Figures 3 and S3 accordingly to improve readability and emphasize key structural features.

      Specifically, we have:

      • Labeled each subunit with its name and color-coded consistently across panels.

      • Annotated key interface residues with residue numbers directly in the figure.

      • Removed non-interacting side chains to declutter the model and highlight only those involved in predicted interactions, and expanded the figure legend to explain these changes.

      (2) In Table 1 the column label "Genetic Interaction at 25C" is confusing when synthetic growth defects are shown with a "plus". Rather this column could be labeled "Growth of double mutants at 25C" and then designate the relative growth rate observed at 25C as in Table 2. Designating a negative effect on growth with a plus is confusing. 

      Thanks for the thoughtful suggestions. We have made the suggested changes by deleting the last column so that Tables 1 and 2 are consistent.

      (3) In Figure 4, why is tubulin being co-immunoprecipitated in two of the four anti-GFP IPs? Are the IPs dirty and if so why does it vary between the four experiments? If they are dirty can the non-specific tubulin be removed by additional washes with IP buffer or conversely is it necessary to do minimal washes in order to detect the exocyst-septin interaction by coIP? A comment on this would be helpful. 

      The presence of tubulin in some immunoprecipitates is not unexpected, particularly in experiments involving cytoskeleton-associated proteins such as septins and exocyst subunits. The occasional presence of tubulin in our co-IP samples is consistent with well-documented reports showing tubulin as a frequent non-specific co-purifying protein, particularly under native lysis conditions used to preserve large complexes (Vega and Hsu, 2003; Gavin et al., 2006; Mellacheruvu et al., 2013; Hein et al., 2015). The CRAPome database and quantitative interactomics studies highlight tubulin as one of the most common background proteins in affinity-based workflows. Importantly, tubulin was used as a loading control but not as a marker for interaction in our study, and its variable presence does not reflect a specific interaction with Sec15-GFP or other bait proteins, and we have clarified this point in the revised figure legend.

      Gavin, A.C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L.J., Bastuck, S., Dümpelfeld, B. and Edelmann, A., 2006. Proteome survey reveals modularity of the yeast cell machinery. Nature, 440(7084), pp.631-636.

      Mellacheruvu, D., Wright, Z., Couzens, A.L., Lambert, J.P., St-Denis, N.A., Li, T., Miteva, Y.V., Hauri, S., Sardiu, M.E., Low, T.Y. and Halim, V.A., 2013. The CRAPome: a contaminant repository for affinity purification–mass spectrometry data. Nature methods, 10(8), pp.730-736.

      Hein, M.Y., Hubner, N.C., Poser, I., Cox, J., Nagaraj, N., Toyoda, Y., Gak, I.A., Weisswange, I., Mansfeld, J., Buchholz, F. and Hyman, A.A., 2015. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell, 163(3), pp.712-723.

      Vega, I.E., Hsu, S.C. 2003. The septin protein Nedd5 associates with both the exocyst complex and microtubules and disruption of its GTPase activity promotes aberrant neurite sprouting in PC12 cells. Neuroreport, 14, pp.31-37. 

      In response to the second part of the reviewer’s comment, we washed the pulldown product five times, each time with 1 ml of IP buffer at 4ºC. We used this standard protocol for all the Co-IP experiments to detect the interactions between different septin and exocyst subunits. We are therefore not sure whether, or how, more washes or more stringent buffer conditions would interfere with detection of the interactions.

      Reviewer #3 (Recommendations for the Authors): 

      In addition to the issues noted in the public review, there were some confusing findings and references to previous literature that merit further consideration or discussion: 

      • The current gold standard for validating Alphafold predictions involves making targeted mutants suggested by the structural predictions. The absence of any such validation weakens the conclusions significantly. 

      We agree that the targeted mutagenesis based on AlphaFold2-predicted interaction interfaces represents a powerful approach to experimentally validate the in silico models. While we did not pursue structure-guided mutagenesis in this study, our goal was to identify putative interactions between septin and exocyst subunits as a foundation for future functional work. Our current conclusions are intentionally limited to proposing putative interfaces, supported by co-immunoprecipitation and genetic interaction data.

      We recognize that direct validation of specific contact residues would significantly strengthen the model. Accordingly, we have revised the Discussion to explicitly state this limitation and to note that structure-based mutagenesis will be an important next step to test the functional relevance of predicted interactions. We have added the following statement:

      “Future studies are needed to refine the residues involved in the interactions because the predicted interacting residues from AlphaFold are too numerous. However, it is encouraging that most of the predicted interacting residues are clustered in several surface patches. Experimental validation through targeted mutagenesis is an important next step.”

      • Much of the writing appears to imply that differences in mutant phenotypes indicate differences in septin (or exocyst) subunit behaviors/functions. However, my reading of the work in budding yeast is that such differences reflect the partial functionality that can be conferred by aberrant partial septin complexes that assemble and may polymerize in mutants lacking different subunits. In this view, which is supported by data showing that essentially all septins are in stoichiometric octameric complexes in cells, the wild-type functions are all mediated by the full complex. Similarly, the separate exocyst subunit localizations based on tagged Sec3 (Finger et al) were not supported by later work from the Brennwald lab with untagged Sec3, and the idea that different exocyst subunits may function separately from the full complex has very limited support in yeast. I would suggest that the text be edited to better reflect the literature, or that different views be better justified. 

      Thanks for the suggestions. We have revised the text accordingly.

      • The comprehensive set of Alphafold2 predictions is a major strength of the paper, but it is unclear to this reader whether the multiple predicted interactions truly reflect multivalent multimode interactions or whether many (most?) predictions would not be consistent with interactions between full complexes and may not indicate physiological interactions. Better discussion of these issues is needed to interpret the findings. 

      We appreciate the reviewer’s suggestion to use structural prediction to further assess interaction plausibility. We have now employed the full Saccharomyces cerevisiae exocyst complex (with 4.4 Å resolution) published by the Guo group to examine the interfaces of the septin-exocyst interactions, assuming that the S. pombe exocyst has a similar structure. We mapped predicted contact residues onto the predicted structure. Most predicted interfaces (86% for the exocyst and 86-96% for septins) appear to be located on accessible surfaces in the assembled complexes (Figure Supplements S4 and S5, Videos 4-7), suggesting that these interactions are sterically plausible. We have added this important caveat to the text of the revised manuscript, highlighting the interface accessibility within the assembled complexes. We appreciate the reviewer’s insight, which helped us strengthen the interpretation and limitations of the AlphaFold-based analysis.

      • Some but not all co-IP blots appear to show tubulin (negative control) coming down with the GFP pull-downs. Why is that, and what does it imply for the reliability of the co-IP protocol? 

      The presence of tubulin in some immunoprecipitates is not unexpected, particularly in experiments involving cytoskeleton-associated proteins such as septins and exocyst subunits. The occasional presence of tubulin in our co-IP samples is consistent with well-documented reports showing tubulin as a frequent non-specific co-purifying protein, particularly under native lysis conditions used to preserve large complexes (Vega and Hsu, 2003; Gavin et al., 2006; Mellacheruvu et al., 2013; Hein et al., 2015). The CRAPome database and quantitative interactomics studies highlight tubulin as one of the most common background proteins in affinity-based workflows. Importantly, tubulin was used as a loading control but not a marker for interaction in our study, and its variable presence does not reflect a specific interaction with Sec15-GFP or other bait proteins, and we have clarified this point in the revised figure legend.

      Gavin, A.C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L.J., Bastuck, S., Dümpelfeld, B. and Edelmann, A., 2006. Proteome survey reveals modularity of the yeast cell machinery. Nature, 440(7084), pp.631-636.

      Mellacheruvu, D., Wright, Z., Couzens, A.L., Lambert, J.P., St-Denis, N.A., Li, T., Miteva, Y.V., Hauri, S., Sardiu, M.E., Low, T.Y. and Halim, V.A., 2013. The CRAPome: a contaminant repository for affinity purification–mass spectrometry data. Nature methods, 10(8), pp.730-736.

      Hein, M.Y., Hubner, N.C., Poser, I., Cox, J., Nagaraj, N., Toyoda, Y., Gak, I.A., Weisswange, I., Mansfeld, J., Buchholz, F. and Hyman, A.A., 2015. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell, 163(3), pp.712-723.

      Vega, I.E., Hsu, S.C. 2003. The septin protein Nedd5 associates with both the exocyst complex and microtubules and disruption of its GTPase activity promotes aberrant neurite sprouting in PC12 cells. Neuroreport, 14, pp.31-37.

      • Why were two different protocols used for different yeast-two-hybrid analyses? 

      The purpose of using two protocols was to test which protocol is more reliable and sensitive.

      • The different genetic interactions between septin and exocyst mutants when combined with TRAPP-II mutants merits further discussion: might the difference reflect relocation of exocyst from rim to center in septin mutants versus inactivation of exocyst in exocyst mutants? 

      We appreciate this insightful comment and agree that this distinction is likely meaningful. The reviewer correctly notes that septin mutants may not abolish exocyst function but rather cause its spatial mislocalization from the rim to the center of the division site, whereas exocyst mutants likely result in partial or complete loss of vesicle tethering activity at the plasma membrane.

      To address this important nuance, we have expanded the Discussion as follows:

      “The genetic interactions between mutations in the exocyst and septins when combined with TRAPP-II mutants may reflect fundamentally different consequences of compromising exocyst function (Tables 1 and 2). In septin mutants, the exocyst complex still localizes to the division site but is mispositioned from the rim to the center of the division plane. This mislocalization allows partial retention of exocyst function, leading to very mild synthetic or additive defects when combined with compromised TRAPP-II trafficking and tethering. In contrast, in exocyst subunit mutants, the exocyst becomes partially or completely non-functional, resulting in a more severe loss of exocyst activity. These differing consequences could explain the qualitative differences in genetic interactions observed with TRAPP-II mutants (Tables 1 and 2). Thus, septins and the exocyst also work in different genetic pathways for certain functions in fission yeast cytokinesis.”

      • The vesicle accumulation in septin mutants was quite modest. Does that imply that most vesicles are still fusing in the septum? Further discussion would be beneficial to understand what the authors think this means. 

      We thank the reviewer for this important point. We agree that the modest vesicle accumulation observed in septin mutants suggests that a significant proportion of vesicles continue to successfully fuse at the division site, even in the absence of fully functional septin structures.

      We now discuss this in greater detail in the revised manuscript:

      “The relatively modest vesicle accumulation in septin mutants suggests that septins are not absolutely required for vesicle tethering or fusion per se at the division site. Instead, septins primarily function to spatially organize the targeting sites of exocyst-directed vesicles by stabilizing the localization of the exocyst at the rim of the cleavage furrow. In septin mutants, mislocalization of the exocyst reduces the spatial precision of membrane insertion but still permits vesicle tethering and fusion, albeit in a less controlled manner. Thus, septins likely play a modulatory rather than essential role in exocytic vesicle delivery during cytokinesis. This interpretation aligns with our localization and genetic interaction data, which indicate that septins act as scaffolds to optimize secretion geometry, rather than as core components of the fusion machinery.”

      • It was unclear to this reader why relocation of some exocyst complexes from the rim to the center of the septal region would lead to dramatic thickening of the septum. Further discussion would be beneficial to understand what the authors think this means. 

      The modest accumulation of vesicles and vesicle cargos at the division site is one of the reasons for the increased thickness of the division septum in septin mutants. It is more likely that the misplaced exocyst can still tether vesicles along the division plane without septins. Because of the lack of glucanase Eng1 at the rim of the division plane in septin mutants, daughter-cell separation is delayed and then cells continue to thicken the septum. We have added these points to the Discussion.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The authors make a bold claim that a combination of repetitive transcranial magnetic stimulation (intermittent theta burst-iTBS) and transcranial alternating current stimulation (gamma tACS) causes slight improvements in memory in a face/name/profession task.

      Strengths:

      The idea of stimulating the human brain non-invasively is very attractive because, if it worked, it could lead to a host of interesting applications. The current study aims to evaluate one such exciting application.

      Weaknesses:

      (1) The title refers to the "precuneus-hippocampus" network. A clear definition of what is meant by this terminology is lacking. More importantly, mechanistic evidence that the precuneus and the hippocampus are involved in the potential effects of stimulation remains unconvincing.

      Thank you for the observation. We believe that the evidence collected supports our statements regarding the stimulation of the precuneus and the involvement of the hippocampus. In particular, given the existing evidence on TMS methodology and non-invasive precuneus stimulation (see Koch et al., Brain, 2022; Koch et al., Alzheimer's research & therapy, 2025), the biophysical model and E-field computation we produced (see Biophysical modeling and E-field calculation section in the supplementary information), and the individual identification of the precuneus through MRI (see iTBS+γtACS neuromodulation protocol and MRI data acquisition in the main text), we can reasonably assume that the individually identified precuneus (PC) was stimulated.

      As we acknowledged in the Limitations section, we cannot entirely rule out the possibility that our results might also reflect stimulation of more superficial parietal regions adjacent to the precuneus. Nor do we provide direct evidence of microscopic changes in the precuneus following stimulation. However, the changes we report in precuneus oscillatory activity and precuneus-hippocampi connectivity support both our claim that the precuneus was stimulated and the involvement of the hippocampi in the stimulation effects.

      Despite this consideration, we agree that a clear definition of what is meant by the term “precuneus-hippocampus network” is lacking. Moreover, our data and previous evidence support the notion of precuneus stimulation, whereas this study provides no direct evidence of hippocampal stimulation, only of the effect of the neuromodulation protocol on hippocampal connectivity with the precuneus. We have therefore softened the claim in the title by removing the mention of the precuneus-hippocampus network, so that the modified title reads: “Dual transcranial electromagnetic stimulation of the precuneus boosts human long-term memory.”

      (2) The question of the extent to which the stimulation approach and the stimulation parameters used in these experiments causes specific and functionally relevant neural effects remains open. Invasive recordings that could address this question remain out of the scope of this non-invasive study. The authors conducted scalp EEG experiments in an attempt to address this question using non-invasive methods. However, the results shown in Fig. 3 are unclear. The results are inconsistently reported in units of microvolts squared in some panels (3A, 3B) and in units of microvolts in other panels (3C). Also, there is insufficient consideration of potential contamination by signal components reflecting eye movements, other muscle artifacts, or another volume-conducted signal reflecting aggregate activity inside the brain.

      As you correctly noted, Figure 3 presents results obtained from the TMS–EEG recordings. However, there is no inconsistency regarding the measurement units, as we are referring to two distinct indices: one in the frequency domain (oscillatory power shown in Figures 3A and 3B, expressed in microvolts squared, μV²) and one in the time domain (the TMS-evoked potential shown in Figure 3C, expressed in microvolts, μV).
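
      As a purely illustrative aside on the unit distinction, the minimal sketch below (not our analysis pipeline) builds a synthetic 40 Hz sinusoid and computes one time-domain index (peak amplitude, in μV) and one power index (mean squared amplitude, in μV²). The sampling rate, amplitude, and function names are assumptions chosen for the example.

```python
import math

def peak_amplitude(signal_uv):
    """Largest absolute deflection; input in uV, output in uV."""
    return max(abs(x) for x in signal_uv)

def mean_power(signal_uv):
    """Mean squared amplitude; input in uV, output in uV^2."""
    return sum(x * x for x in signal_uv) / len(signal_uv)

# Hypothetical parameters, chosen only for illustration.
fs = 1600    # sampling rate in Hz
freq = 40    # gamma-band frequency in Hz
amp = 2.0    # sinusoid amplitude in uV

# One second of synthetic signal (40 complete cycles).
signal = [amp * math.sin(2 * math.pi * freq * n / fs) for n in range(fs)]

amplitude_uv = peak_amplitude(signal)   # time-domain index, in uV
power_uv2 = mean_power(signal)          # power index, in uV^2
```

      For a sinusoid of amplitude A, the mean power is A²/2, so the same signal yields numerically different values in different units depending on which index is reported.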

      Regarding the concern about artifacts, this is an important issue on which our group has strong expertise, having published well-established, highly cited procedures on how to record and clean TMS-EEG signals (e.g., Casula et al., Clinical Neurophysiology, 2017; Rocchi et al., Brain Stimulation, 2021). In the current study, we adopted a well-established and rigorous approach for both data acquisition and preprocessing. This ensured that the recorded TMS–EEG signals were not contaminated by physiological or electrical artifacts.

      As regards the recording procedure, all participants were instructed to fixate on a black cross to minimize eye movements. To avoid auditory-related components caused by the TMS click, we adopted an ad-hoc procedure optimized for TMS-EEG recordings (Rocchi et al., Brain Stimulation, 2021). First, participants were given earphones that continuously played an ad-hoc masking noise composed of white noise mixed with specific time-varying frequencies of the TMS click (Rocchi et al., Brain Stimulation, 2021). The masking noise volume was adjusted to ensure that participants could not detect the TMS click, or as much as tolerated (always below 90 dB). To further reduce the impact of the TMS click on the EEG signal, we placed ear defenders (SNR=30) on top of the earphones. Please see TMS–EEG data acquisition section in the main text.

      As regards the offline cleaning process, we applied Independent Component Analysis (INFOMAX-ICA) to the EEG data to identify and remove components associated with muscle activity, eye movements, blinking, and residual TMS-related artifacts, in line with the most recent guidelines on TMS–EEG preprocessing (Hernandez-Pavon et al., Brain Stimulation, 2023). Specifically, for TMS-related muscle artifacts, we strictly followed the criteria based on their scalp topography, spectral content, timing, and amplitude, which we published in a paper focused on this topic (Casula et al., Clinical Neurophysiology, 2017). We have added this detail to the TMS–EEG preprocessing and analysis section in the supplementary information (lines 119-120).

      (3) Figure 3 indicates "Precuneus oscillatory activity ...", but evidence that the activity presented reflects precuneus activity is lacking. The maps shown at the bottom of Figure 3C suggest that the EEG signals recorded with scalp EEG reflect activity generated across a wide spatial range, with a peak encompassing at least tens of centimeters. Thus, evidence that effects specifically reflect precuneus activity, as the paper's title and text throughout the manuscript suggest, is lacking.

      We believe there may have been a misunderstanding. As indicated in the figure caption, panels A and B represent oscillatory activity, whereas panel C displays the TMS-evoked potentials (TEPs). Therefore, the topographical maps mentioned (i.e., those in panel C) did not refer to oscillatory activity, but to differences in TEP amplitude. Specifically, the topographies shown in Figure 3C illustrate statistically significant differences in TEP amplitudes between post-stimulation time points (T1—immediately after stimulation, and T2—20 minutes after stimulation) and the pre-stimulation baseline (T0).

      In this figure, we focused our analysis on a cluster of electrodes overlying the individually identified precuneus, capturing EEG responses to single TMS pulses delivered to that target. This approach, widely used in previous literature (e.g., Koch et al., NeuroImage, 2018; Casula et al., Annals of Neurology, 2022; Koch et al., Brain, 2022; Maiella et al., Clinical Neurophysiology, 2024; Koch et al., Alzheimer’s Research & Therapy, 2025), supports the interpretation that the observed responses reflect precuneus-related activity. Furthermore, the spatially widespread change you mention proved to be statistically significant only when TMS-EEG was conducted over the precuneus (i.e., with the single TMS pulse delivered over the precuneus) and not when it was performed over the left parietal cortex. We modified the Discussion section in the main text to make this clearer (lines 196-199).

      “Moreover, we observed specific cortical changes in the posteromedial parietal areas, as evidenced by the whole-brain analysis conducted on TMS-EEG data when performed over the precuneus and the absence of effect when TMS-EEG was performed on the lateral posterior parietal cortex used as a control condition.”

      That said, we do not claim that the observed effects specifically reflect precuneus activity; indeed, we think the effect of the stimulation is broader, as discussed in the Discussion section. Rather, in line with the literature (Koch et al., Neuroimage 2018; Koch et al., Brain, 2022; Koch et al., Alzheimer's research & therapy, 2025), we maintain that the observed effects are a consequence of precuneus stimulation by the dual protocol.

      (4) The paper as currently presented (e.g., Figure 3) also lacks rigorous evidence of relevant oscillatory activity. Prior to filtering EEG signals in a particular frequency band, clear evidence of oscillations in the frequency band of interest should be shown (e.g., demonstration of a clear peak that emerges naturally in the frequency range of interest when spectral analysis is applied to "raw" signals). The authors claim that gamma oscillations change because of the stimulation, but a clear peak in the gamma range prior to stimulation is not apparent in the data as currently presented. Thus, the extent to which spectral measurements during stimulation reflect physiological gamma oscillations remains unclear.

      If we understand correctly, your concern relates to the lack of a clear gamma peak before neuromodulation, which may suggest uncertainty about the observed changes in gamma oscillatory activity. Is that correct?

      First, it is important to underline that the natural frequency typically observed in the precuneus falls within the beta range, not the gamma range (see Rosanova et al., Journal of Neuroscience, 2009; Casula et al., Annals of Neurology, 2022). This explains why a prominent gamma peak is not expected at baseline (T0).

      In contrast, our neuromodulatory protocol was specifically aimed at boosting gamma oscillatory activity, given its well-established role in learning and memory processes (Griffiths & Jensen, Trends in Neurosciences, 2023). Thus, to assess the effect of the neuromodulatory protocol, we compared the oscillatory activity before (T0) and after stimulation (T1 and T2), which showed a clear increase in the gamma band. This effect is visible in the raw oscillatory power plot and is most clearly represented in Figure 3B, where the gamma band emerged as the only frequency range showing significant changes across time points.

      (5) Concerns remain regarding the rigor of statistical analyses in the revised manuscript (see also point 8 below). Figure 3B shows an undefined statistical test with p<0.05. The statistical test that was used is not explained. Also, a description of how corrections for multiple comparisons were made is missing. Figures 3A and 3C are not accompanied by statistics, making the results difficult to interpret. For Figure 4C, a claim was made based on a significant p-value for one statistical test and a non-significant p-value in another test. This is a common statistical mistake (see Figure 1 and accompanying discussion in Makin and Orban de Xivry (2019) Science Forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife 8:e48175).

      All statistical tests are described in the Statistical Analysis section of the main text. Specifically, to assess cortical oscillation changes in Experiment 3, we conducted repeated-measures ANOVAs with stimulation condition (iTBS+γtACS vs. iTBS+sham-tACS) and time (ΔT1 = T1–T0; ΔT2 = T2–T0) as within-subject factors, for each frequency band. To further explore the effects of stimulation at each time point, we performed paired t-tests with Bonferroni correction for multiple comparisons. A one-tailed hypothesis was adopted, based on our a priori prediction of gamma-band increase derived from previous work (Maiella et al., 2022).
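
      For readers less familiar with this follow-up procedure, the sketch below illustrates the general logic of a paired t statistic and a Bonferroni adjustment using only the Python standard library. It is a minimal illustration under stated assumptions, not our analysis code: all numbers are hypothetical, the function names are ours, and converting the t statistic to a p-value (which requires the t-distribution with n − 1 degrees of freedom) is left to a statistics package.

```python
import math
import statistics

def paired_t_statistic(cond_a, cond_b):
    """t statistic for paired samples (n - 1 degrees of freedom)."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical per-participant changes from baseline (arbitrary units).
real_stim = [2.0, 4.0, 6.0, 8.0]
sham_stim = [1.0, 2.0, 3.0, 4.0]
t_stat = paired_t_statistic(real_stim, sham_stim)

# Hypothetical uncorrected p-values from three paired comparisons.
raw_p = [0.01, 0.04, 0.5]
adjusted_p = bonferroni(raw_p)
```

      With these illustrative inputs, only the first comparison would remain below 0.05 after correction, which shows how the adjustment guards against false positives when several time points are tested.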

      Please note that Figures 3A and 3C are purely descriptive and are therefore not accompanied by statistical tests. Figure 3A shows the full spectral profile across frequencies and conditions, while statistical significance for these data is reported in Figure 3B. Similarly, the upper part of Figure 3C displays the TMS-evoked potential (TEP) in the precuneus, while the statistical comparison of TEP amplitudes across time points is shown in the lower part of Figure 3C.

      Regarding Figure 4C and the article you cited, are you referring to the error described as “Interpreting comparisons between two effects without directly comparing them”? If we understand correctly, this refers to the mistake of inferring an effect by observing that a significant result occurs in one condition or group, while the corresponding result in another condition or group is not significant, without directly testing the difference between them.

      In the case of Experiment 4, which investigates fMRI effects and is illustrated in Figure 4, we employed a general linear model that explicitly modeled both conditions and time points, allowing for a direct statistical comparison. Therefore, the connectivity effect reported does not fall into the category of the error you mentioned.

      Importantly, Figure 4C does not depict the effect of the neuromodulatory protocol itself. Rather, its purpose is to show that, within the real stimulation condition, there is a correlation between the observed effect and the integrity of the bilateral Middle Longitudinal Fasciculus. No conclusions or assumptions were made based on the absence of a significant correlation in the sham condition. However, since this was an exploratory analysis, we decided to soften our claims regarding the neural mechanism in the Discussion section of the main text (lines 241-246).

      (6) In the second question posed in the original review, I highlighted that it was unclear how such stimulation would produce memory enhancement. The authors replied that, in the absence of mechanisms, there are many other studies that suffer from the same problem. This raises the question of placebo effects. The paper does not sufficiently address or discuss the possibility that any potential stimulation effects may reflect placebo effects.

      We agree with the reviewer on the potential role of a placebo effect in our study. For this reason, our experimental study had several stimulation conditions, including a placebo condition, which corresponded to the sham iTBS-sham tACS condition, which did not produce any effect.

      (7) The third major concern in the original review was the lack of evidence for a mechanism that is specific to the precuneus. Evidence for specific involvement of the precuneus remains lacking in the revised manuscript. The authors state: "the non-invasive stimulation protocol was applied to an individually identified precuneus for each participant". However, the meaning of this statement is unclear. Specifically, it is unclear how the authors know that they are specifically targeting the precuneus. Without directly recording from the precuneus and directly demonstrating effects, which is outside of the scope of the study, specific involvement of the precuneus seems speculative. Also, it does not seem as though a figure was included in the paper to show how the stimulation protocol specifically targets the precuneus. In their response to the original reviews, the authors state that posterior medial parietal areas are the only regions that show significant differences following the stimulation, but they did not cite a specific figure, or statistics reported in the text, that show this. In any event, posterior medial parietal areas encompass a wide area of the brain, so this would still not provide evidence for an effect specifically involving the precuneus.

      We respectfully disagree with the claim that targeting the precuneus in our study is speculative. The statement that “without directly recording from the precuneus and directly demonstrating effects, which is outside the scope of the study, specific involvement of the precuneus seems speculative” would, by that logic, implicitly call into question a large body of cognitive neuroscience research employing non-invasive techniques such as EEG and fMRI.

      Our methodological approach—combining MRI-guided stimulation, biophysical modeling, and TMS–EEG—is well established and widely used for targeting and studying the role of specific cortical regions, including the precuneus (e.g., Wang et al., Science, 2014; Koch et al., NeuroImage, 2018; Casula et al., Annals of Neurology, 2022, 2023; Koch et al., Brain, 2022; Maiella et al., Clinical Neurophysiology, 2024; Koch et al., Alzheimer’s Research & Therapy, 2025).

      In line with previously published protocols (Santarnecchi et al., Human Brain Mapping, 2018; Özdemir et al., PNAS, 2020; Mantovani et al., Journal of Psychiatric Research, 2021), we identified individual targets (i.e., the precuneus) for each participant based on structural and resting-state functional MRI data (see MRI Data Acquisition and Preprocessing section in the main text). This target was then accurately localized using MRI-guided stereotaxic neuronavigation, ensuring reproducible and anatomically precise stimulation across subjects.

      Finally, concerning the last comment about the lack of figures/statistics showing how the stimulation protocol targets the precuneus and the specificity of the effect observed, we would like to draw attention to:

      Figure 3 in the main text, where we show the results of the TMS-EEG over the posterior medial parietal areas;

      Figure S1 in the supplementary information, which shows, through the E-field simulation, how the stimulation protocol targets the brain;

      the Precuneus iTBS+γtACS increases gamma oscillatory activity section in the main text results, where we report the results of the statistical analysis of the TMS-EEG conducted over the precuneus and the left posterior parietal cortex, used as a control condition to test for the specificity of the neuromodulation protocol.

      (8) Regarding chance levels, it is unfortunate that the authors cannot quantify what chance levels are in the immediate and delayed recall conditions. This makes interpretation of the results challenging. In the immediate and delayed conditions, the authors state that the chance level is 33%. It would be useful to mark this in the figures. If I understand correctly, chance is 33% in Fig. 2A. If this is the case and if I am interpreting the figure correctly:

      Gray bars for the sham condition appear to be below chance (~20-25%). Why is this condition associated with an accuracy level that is lower than chance?

      Cyan bars and red bars do not appear to be significantly different from chance (i.e., 33%), with red slightly higher than cyan. What statistic was performed to obtain the level of significance indicated in the figure? The highest average value for the red condition appears to be around 35%. More details are needed to fully explain this figure and to support the claims associated with this figure.

      The immediate and delayed recall conditions you mention correspond to a free recall task. In this case, the notion of a fixed "chance level" is not as straightforward as it would be in recognition or forced-choice paradigms, which is why we did not quantify it at first. We explain this in detail below.

      Unlike multiple-choice tasks, where participants select the answer from a limited set of alternatives and the probability of a correct response by chance can be precisely quantified (e.g., 33% in a 3-alternative forced choice), free recall involves the spontaneous retrieval of items from memory without external cues or predefined options. As such, the response range in free recall is essentially unconstrained, encompassing the entire vocabulary of the participant.

      Because of this open-ended nature, the probability of correctly recalling a studied item purely by chance is exceedingly low and can be approximated as zero. In addition, in our task, participants had to correctly recollect both name and occupation, which further enlarges the space of possible answers.

      This assumption is further supported by the fact that random guesses in free recall are unlikely to match any of the studied items, given the vast number of possible alternatives. As a result, performance above zero can be reasonably interpreted as reflecting genuine memory retrieval, rather than random guessing.

      As regards statistics, we performed repeated-measures ANOVAs with stimulation condition as a within-subject factor (i.e., iTBS+γtACS; iTBS+sham-tACS; sham-iTBS+sham-tACS) for each dependent variable (see the Statistical Analysis section in the main text).

      (9) In the revised version of the paper, the authors did not address concerns associated with the block design (please see question 4d in the original review).

      We are sorry for the misunderstanding. We did not address your concerns related to block design since it does not apply to our study. As reported in the paper you mentioned in the original review, block design involves data collection performed in response to different stimuli of a given class presented in succession. If this is the case, it does not correspond to our experimental design since both TMS-EEG and fMRI were conducted in the resting state (i.e., without the presentation of stimuli) on different days according to the different randomized stimulation conditions.  

      In sum, this study presents an admirable aspirational goal, the notion that a non-invasive stimulation protocol could modulate activity in specific brain regions to enhance memory. However, the evidence presented at the behavioral level and at the mechanistic level (e.g. the putative involvement of specific brain regions) remains unconvincing.

      We hope our response will be carefully considered, fostering a constructive exchange and leading to a reassessment of your evaluation.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Borghi and colleagues provides evidence that the combination of intermittent theta burst TMS stimulation and gamma transcranial alternating current stimulation (γtACS) targeting the precuneus increases long-term associative memory in healthy subjects compared to iTBS alone and sham conditions. Using a rich dataset of TMS-EEG and resting-state functional connectivity (rs-FC) maps and structural MRI data, the authors also provide evidence that dual stimulation increased gamma oscillations and functional connectivity between the precuneus and hippocampus. Enhanced memory performance was linked to increased gamma oscillatory activity and connectivity through white matter tracts.

      Strengths:

      The combination of personalized repetitive TMS (iTBS) and gamma tACS is a novel approach to targeting the precuneus, and thereby, connected memory-related regions to enhance long-term associative memory. The authors leverage an existing neural mechanism engaged in memory binding, theta-gamma coupling, by applying TMS at theta burst patterns and tACS at gamma frequencies to enhance gamma oscillations. The authors conducted a thorough study that suggests that simultaneous iTBS and gamma tACS could be a powerful approach for enhancing long-term associative memory. The paper was well-written, clear, and concise.

      Comments on Revision:

      I thank the authors for their thoughtful responses to my first review and their inclusion of more detailed methodological discussion of their rationale for the stimulation protocol conditions and timing. Regarding the apparent difference in connectivity at baseline between conditions, the explanation that this is due to intrinsic dynamics, state, or noise implies the baseline is reflecting transient changes in dynamics rather than a true or stable baseline. Based on this, it looks like iTBS solely is significantly greater than the baseline before the iTBS and γtACS condition but maybe not that much lower than post-stimulation period for iTBS and γtACS. A longer baseline period should be used to ensure transient states are not driving baseline levels such that these endogenous fluctuations would average out. This also raises questions about whether the effect of iTBS and γtACS or iTBS alone are dependent on the intrinsic state at the time when stimulation begins. Their additional clarification of memory scoring is helpful but also reveals that the effect of dual iTBS+γtACS specifically on the association between faces and names is just significant. This modest increase in associative memory should be taken into consideration when interpreting these findings.

      We thank the reviewer for the feedback. We fully agree that considering baseline dynamics is critical when assessing the neurophysiological and connectivity effects of stimulation protocols.

      In Experiments 3 and 4, baseline measurements were specifically included in our design to account for the possibility that intrinsic dynamics, state, or noise could influence the observed effects of neuromodulation. Indeed, if we had compared only post-stimulation connectivity between the real and sham conditions, the effects might have appeared larger. The inclusion of baseline measurements allows us to contextualize and better isolate the neuromodulatory impact by controlling for such endogenous fluctuations. Importantly, the fMRI connectivity measurements, including the baseline, are derived from 10-minute BOLD signal acquisitions, which help mitigate the influence of transient fluctuations and provide a fairly stable estimate of intrinsic connectivity.

      Moreover, regarding the possibility that stimulation effects may depend on the intrinsic state at stimulation onset, we hypothesize that gamma-frequency entrainment induced by tACS could reduce the variability of intrinsic dynamics, promoting a more stable neural state that is favorable for the induction of long-term plasticity.

      As regards the memory scoring, we would like to clarify that the significant improvement observed in the dual iTBS+γtACS condition does not pertain solely to the face–name association. Rather, it concerns the more demanding task of recalling the association between face, name, and occupation. While we agree that the observed effect could be considered modest, it is worth noting that it follows from only 3 minutes of stimulation.

      Reviewer #3 (Public review):

      Summary:

      Borghi and colleagues present results from 4 experiments aimed at investigating the effects of dual γtACS and iTBS stimulation of the precuneus on behavioral and neural markers of memory formation. In their first experiment (n = 20), they find that a 3-minute offline (i.e., prior to task completion) stimulation that combines both techniques leads to superior memory recall performance in an associative memory task immediately after learning associations between pictures of faces, names, and occupation, as well as after a 15-minute delay, compared to iTBS alone (+ tACS sham) or no stimulation (sham for both iTBS and tACS). Performance in a second task probing short-term memory was unaffected by the stimulation condition. In a second experiment (n = 10), they show that these effects persist over 24 hours and up to a full week after initial stimulation. A third (n = 14) and fourth (n = 16) experiment were conducted to investigate neural effects of the stimulation protocol. The authors report that, once again, only combined iTBS and γtACS increases gamma oscillatory activity and neural excitability (as measured by concurrent TMS-EEG) specific to the stimulated area at the precuneus compared to a control region, as well as precuneus-hippocampus functional connectivity (measured by resting state MRI), which seemed to be associated with structural white matter integrity of the bilateral middle longitudinal fasciculus (measured by DTI).

      Strengths:

      Combining non-invasive brain stimulation techniques is a novel, potentially very powerful method to maximize the effects of these kinds of interventions that are usually well-tolerated and thus accepted by patients and healthy participants. It is also very impressive that the stimulation-induced improvements in memory performance resulted from a short (3 min) intervention protocol. If the effects reported here turn out to be as clinically meaningful and generalizable across populations as implied, this approach could represent a promising avenue for treatment of impaired memory functions in many conditions.

      Methodologically, this study is expertly done! I don't see any serious issues with the technical setup in any of the experiments. It is also very commendable that the authors conceptually replicated the behavioral effects of experiment 1 in experiment 2 and then conducted two additional experiments to probe the neural mechanisms associated with these effects. This certainly increases the value of the study and the confidence in the results considerably.

      The authors used a within-subject approach in their experiments, which increases statistical power and allows for stronger inferences about the tested effects. They also individualized stimulation locations and intensities, which should further optimize the signal-to-noise ratio.

      Weaknesses:

      I think one of the major weaknesses of this study is the overall low sample size in all of the experiments (between n = 10 and n = 20). This is, as I mentioned when discussing the strengths of the study, partly mitigated by the within-subject design and individualized stimulation parameters. The authors mention that they performed a power analysis but this analysis seemed to be based on electrophysiological readouts similar to those obtained in experiment 3. It is thus unclear whether the other experiments were sufficiently powered to reliably detect the behavioral effects of interest. In the revised manuscript, the authors provide post-hoc sensitivity analyses that help contextualize the strength of the findings.

      While the authors went to great lengths trying to probe the neural changes likely associated with the memory improvement after stimulation, it is impossible from their data to causally relate the findings from experiments 3 and 4 to the behavioral effects in experiments 1 and 2. This is acknowledged by the authors and there are good methodological reasons for why TMS-EEG and fMRI had to be collected in separate experiments, but readers should keep in mind that this limits inferences about how exactly dual iTBS and γtACS of the precuneus modulate learning and memory.

      We thank the reviewer for the feedback.

      Reviewer #1 (Recommendations for the authors):

      I suggest:

      (1) Removing all mechanistic claims about the precuneus and hippocampus.

      We soften our claims about the precuneus-hippocampus network.

      (2) Repeating and focusing on the behavioral experiments with a much larger number of images and stronger statistical power to try to demonstrate a compelling behavioral correlate of the proposed stimulation protocol.

      We clarified the misunderstanding relative to the chance level of the behavioral experiments raised by the reviewer.

      Reviewer #2 (Recommendations for the authors):

      Use longer baseline to establish stable gamma level for comparisons in Figure 3

      If we understand correctly, you propose to increase the baseline used to establish the gamma oscillatory activity shown in Figure 3 (the results of experiment 3). Is that right? The figure shows a baseline of -100 to 0 ms, which we use for purely graphical reasons, since no activity is usually observable before the TMS pulse. However, to establish the level of gamma, we used a longer baseline correction window ranging from -700 ms to -300 ms (i.e., 400 ms). We added this important information in the cortical oscillation section of the supplementary information (lines 134-135).
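
      As a rough illustration of the baseline window described above, the sketch below subtracts the mean power in the -700 to -300 ms window from a single power time course. The function name, the sample values, and the choice of a subtractive correction are assumptions made for illustration; the response does not state which normalization (subtraction, ratio, or dB) was applied.

```python
def baseline_correct(times_ms, power, window=(-700.0, -300.0)):
    """Subtract the mean power inside the baseline window from every sample."""
    lo, hi = window
    baseline = [p for t, p in zip(times_ms, power) if lo <= t <= hi]
    if not baseline:
        raise ValueError("no samples fall inside the baseline window")
    mean_baseline = sum(baseline) / len(baseline)
    return [p - mean_baseline for p in power]


# Hypothetical gamma-band power sampled every 100 ms; the TMS pulse is at 0 ms.
times_ms = [-700, -600, -500, -400, -300, -200, -100, 0, 100, 200]
power = [2.0, 2.2, 1.8, 2.1, 1.9, 2.0, 2.0, 2.0, 3.5, 3.0]
corrected = baseline_correct(times_ms, power)
```

      After correction, the samples inside the baseline window average to zero, so post-pulse values read directly as change relative to baseline.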

      Reviewer #3 (Recommendations for the authors):

      I think that the authors did a great job responding to the concerns raised by the reviewers. All of my own comments have been satisfactorily addressed. I will update my public review to be more concise, so that it only includes the overall assessment of the manuscript, including the strengths and weaknesses, but without the requests for clarification. Strengths and weaknesses remain largely the same, as the authors did not conduct additional experiments.

      Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      In this study, López-Jiménez and colleagues demonstrated the utility of using high-content microscopy in dissecting host and bacterial determinants that play a role in the establishment of infection using Shigella flexneri as a model. The manuscript nicely identifies that infection with Shigella results in a block to DNA replication and protein synthesis. At the same time, the host responds, in part, via the entrapment of Shigella in septin cages.

      Strengths:

      The main strength of this manuscript is its technical aspects. They nicely demonstrate how an automated microscopy pipeline coupled with artificial intelligence can be used to gain new insights regarding elements of bacterial pathogenesis, using Shigella flexneri as a model system. Using this pipeline enabled the investigators to enhance the field's general understanding regarding the role of septin cages in responding to invading Shigella. This platform should be of interest to those who study a variety of intracellular microbial pathogens.

      Another strength of the manuscript is the demonstration - using cell biology-based approaches- that infection with Shigella blocks DNA replication and protein synthesis. These observations nicely dovetail with the prior findings of other groups. Nevertheless, their clever click-chemistry-based approaches provide visual evidence of these phenomena and should interest many.

      We thank the Reviewer for their enthusiasm on technical aspects of this paper, regarding both the automated microscopy pipeline coupled with artificial intelligence and the click-chemistry based approaches to dissect DNA replication and protein synthesis by microscopy.

      Weaknesses:

      There are two main weaknesses of this work. First, the studies are limited to findings obtained using a single immortalized cell line. It is appreciated that HeLa cells serve as an excellent model for studying aspects of Shigella pathogenesis and host responses. However, it would be nice to see that similar observations are made with an epithelial cell line of intestinal, preferably colonic origin, and eventually, with a non-immortalized cell line, although it is appreciated that the latter studies are beyond the scope of this work.

      The immortalized cell line HeLa is widely regarded as a paradigm to study infection by Shigella and other intracellular pathogens. However, we agree that future studies beyond the scope of this work should include other cell lines (eg. epithelial cells of colonic origin, macrophages, primary cells). 

      The other weakness is that the studies are minimally mechanistic. For example, the investigators have data to suggest that infection with Shigella leads to an arrest in DNA replication and protein synthesis; however, no follow-up studies have been conducted to determine how these host cell processes are disabled. Interestingly, Zhang and colleagues recently identified that the Shigella OspC effectors target eukaryotic translation initiation factor 3 to block host cell translation (PMID: 38368608). This paper should be discussed and cited in the discussion.

      We appreciate the Reviewer’s concern about the lack of follow up work on observations of host DNA and protein synthesis arrest upon Shigella infection, which will be the focus of future studies. We acknowledge the recent work of Zhang et al. (Cell Reports, 2024) considering their similar results on protein translation arrest, and this reference has been more fully discussed in the revised version of the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Septin caging has emerged as one of the innate immune responses of eukaryotic cells to infections by intracellular bacteria. This fascinating assembly of eukaryotic proteins into complex structures restricts bacteria motility within the cytoplasm of host cells, thereby facilitating recognition by cytosolic sensors and components of the autophagy machinery. Given the different types of septin caging that have been described thus far, a single-cell, unbiased approach to quantify and characterise septin recruitment at bacteria is important to fully grasp the role and function of caging. Thus, the authors have developed an automated image analysis pipeline allowing bacterial segmentation and classification of septin cages that will be very useful in the future, applied to study the role of host and bacterial factors, compare different bacterial strains, or even compare infections by clinical isolates.

      Strengths:

      The authors developed a solid pipeline that has been thoroughly validated. When tested on infected cells, automated analysis corroborated previous observations and allowed the unbiased quantification of the different types of septin cages as well as the correlation between caging and bacterial metabolic activity. This approach will prove an essential asset in the further characterisation of septin cages for future studies.

      We thank the Reviewer for their positive comments, and for highlighting the strength of our imaging and analysis pipeline to analyse Shigella-septin interactions.

      Weaknesses:

      As the main aim of the manuscript is to describe the newly developed analysis pipeline, the results illustrated in the manuscript are essentially descriptive. The developed pipeline seems exceptionally efficient in recognising septin cages in infected cells but its application for a broader purpose or field of study remains limited.

      The main objective of this manuscript is the development of imaging and analysis tools to study Shigella infection, and in particular, Shigella interactions with the septin cytoskeleton. In future work we will provide more mechanistic insight with novel experiments and broader applicability, using different cell lines (in agreement with Reviewer 1), mutants or clinical isolates of Shigella and different bacteria species (eg. Listeria, Salmonella, mycobacteria).

      Reviewer #3 (Public Review):

      Summary:

      The manuscript uses high-content imaging and advanced image-analysis tools to monitor the infection of epithelial cells by Shigella. They perform some analysis on the state of the cells (through measurements of DNA and protein synthesis), and then they focus on differential recruitment of Sept7 to the bacteria. They link this recruitment with the activity of the bacterial T3SS, which is a very interesting discovery. Overall, I found numerous exciting elements in this manuscript, and I have a couple of reservations. Please see below for more details on my reservations. Nevertheless, I think that these issues can be addressed by the authors, and doing so will help to make it a convincing and interesting piece for the community working on intracellular pathogens. The authors should also carefully re-edit their manuscript to avoid overselling their data (see below for issues I see there). I would consider taking out the first figure and starting with Figure 3 (Figure 2 could be re-organized in the later parts)- that could help to make the flow of the manuscript better.

      Strengths:

      The high-content analysis including the innovative analytical workflows are very promising and could be used by a large number of scientists working on intracellular bacteria. The finding that Septins (through SEPT7) are differentially regulated through actively secreting bacteria is very exciting and can steer novel research directions.

      We thank the Reviewer for their constructive feedback and excitement for our results, including our findings on T3SS activity and Shigella-septin interactions. In accordance with the Reviewer’s comments, we avoid overselling our data in the revised version of the manuscript.

      Weaknesses:

      The manuscript makes a connection between two research lines (1: Shigella infection and DNA/protein synthesis, 2: regulation of septins around invading Shigella) that are not fully developed - this makes it sometimes difficult to understand the take-home messages of the authors.

      We agree that the manuscript is mostly technical and therefore some of our experimental observations would benefit from follow up mechanistic studies in the future. We highlight our vision for broader applicability in response to weaknesses raised by Reviewer 2.

      It is not clear whether the analysis that was done on projected images actually reflects the phenotypes of the original 3D data. This issue needs to be carefully addressed.

      We agree with the Reviewer that characterizing 3D data using 2D projected images has limitations.

      We observe an increase in cell and nuclear surface that does not strictly imply a change in volume. This is why we measure Hoechst intensity in the nucleus using SUM-projection (as it can be used as a proxy of DNA content of the cell). However, we agree that future use of other markers (such as fluorescently labelled histones) would make our conclusions more robust.
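
      To illustrate why a SUM projection, rather than a MAX projection, can act as a proxy for total signal, the toy example below projects a small Z-stack both ways. The stack values are invented and plain nested lists stand in for real image arrays; this is a sketch of the general idea, not the analysis pipeline used in the study.

```python
def project(stack, reducer):
    """Collapse a Z-stack (a list of equally sized 2D slices) along Z."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[reducer(z[r][c] for z in stack) for c in range(cols)]
            for r in range(rows)]


# Hypothetical 3-slice, 2x2-pixel stack of Hoechst-like intensities.
stack = [
    [[1, 0], [2, 5]],
    [[3, 0], [2, 1]],
    [[1, 4], [2, 0]],
]

sum_proj = project(stack, sum)  # integrates intensity through Z
max_proj = project(stack, max)  # keeps only the brightest voxel per pixel

total_in_stack = sum(v for z in stack for row in z for v in row)
total_in_sum_proj = sum(v for row in sum_proj for v in row)
```

      The SUM projection preserves the total intensity of the stack, whereas the MAX projection discards everything except the brightest voxel at each pixel.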

      Regarding the different orientation of intracellular bacteria, we agree that investigation of septin recruitment is more challenging when bacteria are placed perpendicular to the acquisition plane. In a first step, we trained a Convolutional Neural Network (CNN) using 2D data, as it is easier/faster to train and requires fewer annotated images. In doing so, we already managed to correctly identify 80% of Shigella interacting with septins, which enabled us to observe higher T3SS activity in this population. In future studies, we will maximize the 3D potential of our data and retrain a CNN that will allow more precise identification of Shigella-septin interactions and in depth characterization of volumetric parameters.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) To conclude that cell volume is indeed increased, the investigators should consider staining the cells with markers that demarcate cell boundaries and/or are confined to the cytosol, i.e., a cell tracker dye.

      Staining using our SEPT7 antibody enables us to define cell boundaries for cellular area measurements (Novel Figure 1 - figure supplement 1A). However, we agree with the Reviewer that staining cells with additional markers (such as a cell tracker dye) would be required to conclude that cell volume is increased. We therefore adjust our claims in the main text (lines 107-115 and 235-246).

      (2) Line 27: I understand what is meant by "recruited to actively pathogenic bacteria with increased T3SS activation." However, one could argue that intracytosolic bacteria play many different roles in pathogenesis, not just actively secreting effectors.

      T3SS secretion by cytosolic bacteria is tightly regulated and both T3SS states (active, inactive) likely contribute to the pathogenic lifestyle of S. flexneri. In agreement with this, we removed this statement from the manuscript (lines 27, 225 and 274).

      (3) Line 88: Please clarify in the text that HeLa cells are being studied.

      We explicitly mention that the epithelial cell line we study is HeLa in the main text (line 93), in addition to the Materials and methods (line 328).

      (4) Line 97: is it possible to quantify the average distance of the nuclei from the cell perimeter? This would help provide some context as to what it means to be a certain distance from the nucleus, i.e., is there another way to point out that distance from nuclei correlates with movement inward post-invasion at the periphery?

      To provide more context to the inward movement of bacteria to the cell centre, we provide calculations based on measurements in Figure 1G, I. If we approximate the geometric shape of both the cell and the nucleus as a circle, the median radius of a HeLa cell is 31.1 µm (uninfected cell) and 36.3 µm (infected cell). Similarly, the median radius of the nucleus is 22.2 µm (uninfected cell) and 24.57 µm (infected cell).
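
      The circle approximation mentioned above converts a measured area A into an equivalent radius r = sqrt(A / π). The short sketch below shows that conversion; the function name and the example area are hypothetical and are not taken from Figure 1.

```python
import math


def equivalent_radius(area_um2):
    """Radius (in µm) of a circle whose area (in µm^2) equals area_um2."""
    if area_um2 < 0:
        raise ValueError("area must be non-negative")
    return math.sqrt(area_um2 / math.pi)


# Hypothetical projected cell area, for illustration only.
example_area_um2 = 3000.0
radius_um = equivalent_radius(example_area_um2)
```

      Because the radius scales with the square root of the area, a given fractional increase in area corresponds to a smaller fractional increase in radius.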

      However, we note that Figure 1F shows distance of bacteria to the centroid of the cell, which is the geometric centre of the cell, and which does not necessarily coincide with the geometric centre of the nucleus. We also note that nuclear area increases with infection (in a bacterial dose dependent manner). Finally, we note that these measurements are performed on max projections of 3D Z-stacks. In this case we cannot fully appreciate distance to the nucleus for bacteria located above it.

      (5) Lines 212-213 - there is no Figure 9A, B - I think this should be Figure 7A, B.

      Text has been updated (lines 216-217).

      Reviewer #2 (Recommendations For The Authors):

      Testing the analysis pipeline on a proof-of-concept question, such as comparing caging around the laboratory strain with caging around one or a few clinical isolates or mutants of interest, would help stress the relevance of this new, remarkable tool.

      We thank the Reviewer for their enthusiasm.

      Future research in the Mostowy lab will capitalise on the high-content tools generated here to explore the frequency and heterogeneity of septin cage entrapment for a wide variety of S. flexneri mutants and Shigella clinical isolates.

      The sentence in line 215 ends with "in agreement with" followed by a reference.

      Text has been updated (line 219).

      The sentence in line 217 on the correlation between caging and T3SS is not very clear.

      Text has been clarified (lines 221-223).

      There is a typo in line 219 : "protrusSions"

      Text has been updated (line 223).

      Reviewer #3 (Recommendations For The Authors):

      Major points

      The quantitative analysis approach in Figure 1 has multiple issues. Some examples:

      (1) How was the cell area estimated? Normally, a marker for the whole cell (CellMask or similar) or cells expressing GFP would be good indicators. Here it is not clear to me what was done.

      The cell area was estimated using SEPT7 antibody staining which is enriched under the cell cortex. CellProfiler was used to segment cells based on SEPT7 staining, using a propagation method from the identified nucleus based on Otsu thresholding. To provide more clarity on how this was performed, we now include a new figure (Figure 1- figure supplement 1A) showing a representative image of HeLa cells stained with SEPT7 and the corresponding cell segmentation performed with CellProfiler software, together with an updated figure legend explaining the procedure (lines 784–787).
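
      For readers unfamiliar with Otsu thresholding, the sketch below shows the core idea in plain Python: choose the intensity cut-off that maximises the variance between background and foreground. It is a generic textbook version with invented pixel values. It does not reproduce CellProfiler's implementation, and it omits the propagation step from the nucleus.

```python
def otsu_threshold(pixels, levels=256):
    """Return t maximising between-class variance (background: value <= t)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is already in the background class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Hypothetical 8-bit intensities: four dim background and four bright pixels.
pixels = [10, 12, 11, 13, 200, 210, 205, 198]
threshold = otsu_threshold(pixels)
mask = [p > threshold for p in pixels]
```

      In a real pipeline the threshold would be computed on the full image histogram, and the resulting mask would then seed or constrain the cell segmentation.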

      (2) The authors use Hoechst and integrated z-projections (Figure 1 S1) as a proxy to estimate nuclear volume. Hoechst staining depends on the organization of the DNA within the nucleus and I find that the authors need to do better controls to estimate nuclear size - this would be possible with cells expressing fluorescently labeled histones, or even better with a fluorescently tagged nuclear pore/envelope marker. The current quantification approach is misleading.

      We understand Reviewer #3’s concerns about using Hoechst staining as a proxy of nuclear volume, due to potential differences in DNA organisation within the nucleus.

      Following the recommendation of Reviewer #3 in the following point 3, text has been updated (lines 107–115 and 235-246).

      (3) Was cell density assessed for the measurements? If cells are confluent, bacteria could spread between cells within 3 hrs, if cells are less dense, this does not occur. When epithelial cells are infected for some hours, they have the tendency to round up a bit (and to appear thicker in z), but a bit smaller in xy. My suggestion to the authors (as they use these findings to follow up with experiments on the underlying processes) would be to tone down their statements - eg, Hoechst staining could be simply indicated as altered, but not put in a context of size (this would require substantial control experiments).

      Local cell density was not directly measured, but the experiment was set up to infect at roughly 80% confluency (cells were seeded at 10<sup>4</sup> cells/well 2 days prior to infection in a 96-well microplate, as described in the Materials and methods section) and to ensure bacterial spread between cells.

      In agreement with Reviewer #3 we tone down statements in the main text (see response to point 2 above).

      In addition, I found Figure 1 (and parts of Figure 2) disconnected from the rest of the manuscript, and it may even be an idea to take it out of the manuscript (that could also help to deal with my feedback relating to Figure 1). I would suggest starting the manuscript with the current Figure 3 and building the biological story with a stronger focus on SEPT7 (and its links with T3 secretion and actively pathogenic bacteria) from there on. As it stands, the two parts of the manuscript are not well connected.

      We carefully considered this comment but following revisions we have not reorganised the manuscript. We believe that high-content characterisation of S. flexneri infection in Figures 1 and 2 provides insightful information about changes in host cells in response to infection. Following this, we move on to characterising intracellular bacteria (and in particular those entrapped in septin cages) in the second part of the manuscript (Figures 3-7). Similar methods were used to analyse both host and bacterial cells and results obtained offer complementary views on host-pathogen interactions.

      My major reservation with the experimental work of the current version of the manuscript relates to Figure 5: The analysis of the septin phenotypes in Figure 5 seems to be problematic - to me, it appears that analysis and training were done on projected image stacks. As bacteria are rod-shaped their orientation in space has an enormous impact on how the septin signal appears in a projection - this can lead to wrong interpretation of the phenotypes. The authors need to do some quantitative controls analyzing their data in 3D. To be more clear: the example "tight" (second row) shows a bacterium that appears short. It may be that it's actually longer if one looks in 3D, and the septin signal could possibly fall in the category "rings" or even "two poles".

      The deep learning training and subsequent analysis of septin-cage entrapment is done on projected Z-stacks, which presents limitations. Future work in the Mostowy lab will exploit this first study and dive deeper into 3D aspects of the data.

      To address Reviewer #3’s concern, we include a sentence explaining that this analysis was performed using 2D max projections (lines 708 and 724), as well as acknowledging its limitations in the main text (lines 259-262).

      Minor points

      The scale bar in Fig 1 is very thin.

      We corrected the scale bar in Fig. 1 to make it more visible.

      Could it be that Figure 1F is swapped with Figure1E in the description?

      Descriptions for Figure 1E and F are correct.

      Line 27: what does "actively pathogenic bacteria" mean? I propose to change the term.

      We agree with Reviewer #3 that “actively pathogenic bacteria” should be removed from the text. This update is also in agreement with Reviewer #1 (see Reviewer #1 point 2).

      Line 28: "dynamics" can be confusing as it relates to dynamic events imaged by time-lapse.

      Although we take a snapshot of the infection process at 3 hpi, we capture asynchronous processes in both host and bacterial cells (eg. host cells infected with different bacterial loads, bacterial cells undergoing actin polymerisation or septin cage entrapment). We agree that we are not following the dynamics of full events over time. However, our high content approach enables us to capture different stages of dynamic processes. To avoid confusion, we replace “dynamics” with “diverse interactions” (line 28), and we discuss the importance of follow-up studies using time-lapse microscopy (line 274).

      Paragraph 59 following: the concept of heterogeneity was investigated in some detail for viral infection by the Pelkmans group (PMID: 19710653) using advanced image analysis tools. Advanced machine-learning-based analysis was then performed on Salmonella invasion by Voznica and colleagues (PMID: 29084895). It would be great to include these somewhat "old" works here as they really paved the way for high-content imaging, and the way analyses were performed then should be also discussed in light of how analyses can be performed now with the approaches developed by the authors.

      We agree. These landmark studies have now been included in the main text (lines 71-74).

      Line 181: I do not know what "morphological conformations" means, perhaps the authors can change the wording or clarify.

      We replaced the phrase “morphological conformations” with “morphological patterns” to improve clarity in the main text (line 185).

      The authors claim (eg in the abstract) that they are measuring the dynamic infection process. To me, it appears that they look at one time-point, so no dynamic information can be extracted. I suggest that the authors tone down their claims.

      Please note our response above (Minor points, Line 28) which also refers to this question.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Nahas et al. investigated the roles of herpes simplex virus 1 (HSV-1) structural proteins using correlative cryo-light microscopy and soft X-ray tomography. The authors generated nine viral variants with deletions or mutations in genes encoding structural proteins. They employed a chemical fixation-free approach to study native-like events during viral assembly, enabling observation of a wider field of view compared to cryo-ET. The study effectively combined virology, cell biology, and structural biology to investigate the roles of viral proteins in virus assembly and budding.

      Strengths:

      (1) The study presented a novel approach to studying viral assembly in cellulo.

      (2) The authors generated nine mutant viruses to investigate the roles of essential proteins in nuclear egress and cytoplasmic envelopment.

      (3) The use of correlative imaging with cryoSIM and cryoSXT allowed for the study of viral assembly in a near-native state and in 3D.

      (4) The study identified the roles of VP16, pUL16, pUL21, pUL34, and pUS3 in nuclear egress.

      (5) The authors demonstrated that deletion of VP16, pUL11, gE, pUL51, or gK inhibits cytoplasmic envelopment.

      (6) The manuscript is well-written, clearly describing findings, methods, and experimental design.

      (7) The figures and data presentation are of good quality.

      (8) The study effectively correlated light microscopy and X-ray tomography to follow virus assembly, providing a valuable approach for studying other viruses and cellular events.

      (9) The research is a valuable starting point for investigating viral assembly using more sophisticated methods like cryo-ET with FIB-milling.

      (10) The study proposes a detailed assembly mechanism and tracks the contributions of studied proteins to the assembly process.

      (11) The study includes all necessary controls and tests for the influence of fluorescent proteins.

      Weaknesses:

      Overall, the manuscript does not have any major weaknesses, just a few minor comments:

      (1) The gel quality in Figure 1 is inconsistent for different samples, with some bands not well resolved (e.g., for pUL11, GAPDH, or pUL20).

      We thank the reviewer for their suggestion. We tried to resolve the bands several times, but unfortunately this was the best outcome we could achieve.

      (2) The manuscript would benefit from a summary figure or table to concisely present the findings for each protein. The manuscript covers a large body of work, and a summary figure showing the discovered functions would be great.

      We thank the reviewer for their suggestion. We have created a summary table (Table 2).

      (3) Figure 2 lacks clarity on the type of error bars used (range, standard error, or standard deviation). The legend says range; I am just checking that this is what the authors meant.

      We thank the reviewer for double-checking, but it is meant to be range, as reported in the legend. We used range because there are only two data points for each time point, which are insufficient to calculate standard deviation or standard error.
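
      One way to see why the range is the more transparent choice here: with exactly two data points, the sample standard deviation is fully determined by the range (it equals the range divided by √2), so it carries no additional information. The sketch below checks this on invented values; the names and numbers are illustrative only.

```python
import math
import statistics


def value_range(values):
    """Spread of the data expressed as maximum minus minimum."""
    return max(values) - min(values)


# Hypothetical pair of replicate measurements at a single time point.
replicates = [4.0, 6.0]

spread = value_range(replicates)
sample_sd = statistics.stdev(replicates)  # n - 1 denominator
sd_from_range = spread / math.sqrt(2)
```

      Plotting the range for two replicates therefore shows both observed values directly, without implying an estimate of the underlying variability.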

      (4) The manuscript could be improved by including details on how the plasma membrane boundary was estimated from the saturated gM-mCherry signal. An additional supplementary figure with the data showing the saturation used for the boundary definition would be helpful.

      We appreciate the suggestion and have included an example of how saturated gM-mCherry signal was used to delineate the cytoplasm in Supp. Fig. 4A.

      (5) Additional information or supplementary figures on the mask used to filter the YFP signal for Figure 4 would be helpful.

      Thanks, we have adapted the text in the results section to clarify: “eYFP-VP26 signal was manually inspected to determine threshold values that filtered out background and included pixels containing individual or clustered puncta that represent capsids.”

      (6) The figure legends could include information about which samples are used for comparison for significance calculations. As the colour of the brackets is different from the compared values (dUL34), it would be great to have this information in the figure legend.

      Thanks, we have adapted Fig. 4B to make the colour of the brackets match the colour used for the ΔUL34 mutant, and we have included labels next to the brackets for clarity. We have applied similar adjustments to Fig. 5D & E and Supp. Fig. 4C.

      (7) In Figure 5B, the association between YFP and mCherry signals is difficult to assess due to the abundance of mCherry signal; single-channel and combined images might improve visualization.

      Thanks, we have provided split and combined channel views in Supp. Fig. 4B to improve visualization.

      (8) In Figure 6D, staining for tubulin could help identify the cytoskeleton structures involved in the observed virus arrays.

      We thank the reviewer for their suggestion, which we think would be interesting future work to build on the current study. Given the competitive nature of access to cryoSIM and cryoSXT (CLXT), tubulin staining was outside the scope of the additional experiments we were able to conduct at this time.

      (9) It is unclear in Figure 6D if the microtubule-associated capsids are with the gM envelope or not, as the signal from mCherry is quite weak. It could be made clearer with the split signals to assess the presence of both viral components.

      We have added split channels to the figure to aid visualization.

      (10) The representation of voxel intensity in Figure 8 is somewhat confusing. Reversing the voxel intensity representation to align brighter values with higher absorption would simplify interpretation.

      We thank the reviewer for this suggestion. In contrast to fluorescence microscopy where high intensities reflect signal, low intensities represent signal (absorbance of X-rays) in cryoSXT. We respectfully decided not to reverse the values, as we believe that could cause more confusion. We have instead added a black-to-white gradient bar to illustrate that low voxel intensities correspond to dark signal in Fig 8.

      (11) The visualization in panel I of Figure 8 might benefit from a more divergent colormap to better show the variation in X-ray absorbance.

      We thank the reviewer for their suggestion. We experimented with a few different colour schemes but concluded that the current one produced the clearest results and was most accessible for color-blind viewers.

      (12) Figure 9 would be enhanced by images showing the different virus sizes measured for the comparative study, which would help assess the size differences between different assembly stages.

      We thank the reviewer for their suggestion and have included images to accompany the graph.

      Overall, this is an excellent manuscript and an enjoyable read. It would be interesting to see this approach applied to the study of other viruses, providing valuable insights before progressing to high-resolution methods.

      Reviewer #2 (Public review):

      Summary:

      For centuries, humans have been developing methods to see ever smaller objects, such as cells and their contents. This has included studies of viruses and their interactions with host cells during processes extending from virion structure to the complex interactions between viruses and their host cells: virion entry, virus replication and virion assembly, and release of newly constructed virions. Recent developments have enabled simultaneous application of fluorescence-based detection and intracellular localization of molecules of interest in the context of sub-micron resolution imaging of cellular structures by electron microscopy.

      The submission by Nahas et al., extends the state-of-the-art for visualization of important aspects of herpesvirus (HSV-1 in this instance) virion morphogenesis, a complex process that involves virus genome replication, and capsid assembly and filling in the nucleus, transport of the nascent nucleocapsid and some associated tegument proteins through the inner and outer nuclear membranes to the cytoplasm, orderly association of several thousand mostly viral proteins with the capsid to form the virion's tegument, envelopment of the tegumented capsid at a virus-tweaked secretory vesicle or at the plasma membrane, and release of mature virions at the plasma membrane.

      In this groundbreaking study, cells infected with HSV-1 mutants that express fluorescently tagged versions of capsid (eYFP-VP26) and tegument (gM-mCherry) proteins were visualized with 3D correlative structured illumination microscopy and X-ray tomography. The maturation and egress pathways thus illuminated were studied further in infections with fluorescently tagged viruses lacking one of nine viral proteins.

      Strengths:

      This outstanding paper meets the journal's definitions of Landmark, Fundamental, Important, Valuable, and Useful. The work is also Exceptional, Compelling, Convincing, and Solid. The work is a tour de force of classical and state-of-the-art molecular and cellular virology. Beautiful images accompanied by appropriate statistical analyses and excellent figures. The numerous complex issues addressed are explained in a clear and coordinated manner; the sum of what was learned is greater than the sum of the parts. Impacts go well beyond cytomegalovirus and the rest of the herpesviruses, to other viruses and cell biology in general.

      Reviewer #3 (Public review):

      Summary:

      Kamal L. Nahas et al. demonstrated that pUL16, pUL21, pUL34, VP16, and pUS3 are involved in the egress of the capsids from the nucleus, since mutant viruses ΔpUL16, ΔpUL21, ΔUL34, ΔVP16, and ΔUS3 HSV-1 show nuclear egress attenuation, as determined by measuring the nuclear:cytoplasmic ratio of the capsids, the dfParental, or the mutants. Then, they showed that gM-mCherry+ endomembrane association and capsid clustering were different in pUL11, pUL51, gE, gK, and VP16 mutants. Furthermore, the 3D view of cytoplasmic budding events suggests an envelopment mechanism where capsid budding into spherical/ellipsoidal vesicles drives the envelopment.

      Strengths:

      The authors employed both structured illumination microscopy and cellular ultrastructure analysis to examine the same infected cells, using cryo-soft-X-ray tomography to capture images. This combination, used here for the first time, enabled the authors to obtain holistic data regarding a biological process such as viral assembly. Using this approach, the researchers studied various stages of HSV-1 assembly. For this, they constructed a dual-fluorescently labelled recombinant virus, consisting of eYFP-tagged capsids and mCherry-tagged envelopes, allowing for the independent identification of both unenveloped and enveloped particles. They then constructed nine mutants, each targeting a single viral protein known to be involved in nuclear egress and envelopment in the cytoplasm, using this dual-fluorescent virus as the parental one. The experimental setting, both the microscopic and the virological, is robust and well-controlled. The manuscript is well-written, and the data generated is robust and consistent with previous observations made in the field.

      Weaknesses:

      It would be helpful to find out what role the targeted proteins play in nuclear egress or envelopment acquisition in a different orthoherpesvirus, like HSV-2. This would confirm the suitability of the technical approach set and would also act as a way to validate their mechanism at least in one additional herpesvirus beyond HSV-1. So, using the current manuscript as a starting point and for future studies, it would be advisable to focus on the protein functions of other viruses and compare them.

      We appreciate the suggestion and agree that this would be a great starting point for future studies. At present, we do not have a panel of mutant viruses in HSV-2 or another orthoherpesvirus, and it would be significant work to generate them, so we consider this outside the scope of the current study.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) There are enough uncommon abbreviations in the text to justify the inclusion of an abbreviation list.

      We thank the reviewer for the suggestion, but we define all uncommon abbreviations at first mention and an abbreviations list is not part of eLife’s house style.

      (2) The complex paragraph on p. 7 would be much easier to digest if broken into smaller chunks. Consider similar treatment for other lengthy landmark-free blocks of text, e.g., the one that begins on p. 14. Subheadings would help.

      We thank the reviewer for this suggestion. We have divided large paragraphs into more easily digestible chunks throughout the manuscript, for example in the discussion where the previous monolithic 3rd paragraph has been divided into five shorter, focussed paragraphs.

      (3) Table 1 needs units.

      We thank the reviewer for noticing our omission and apologise for the oversight - the table has been updated accordingly.

      Reviewer #3 (Recommendations for the authors):

      (1) Toward the end of the manuscript, I missed some lines attempting to speculate on the origin/nature of the spherical/ellipsoidal vesicles providing the envelopment. Would it be possible to incorporate this in the Discussion section?

      Thank you for noticing that omission. We have now included a few lines speculating that they may represent recycling endosomes, trans-Golgi network vesicles, or a hybrid compartment.

      (2) I congratulate the authors. The work is robust, and I personally highlight the way they managed to include others' results merged with their own, providing a complete view of the story.

      We thank the reviewer for their kind words.

      Note to editors

      In addition to these responses to the reviewer’s comments, we have also now included in the methods section details of the Tracking of Indels by Decomposition (TIDE) analysis we performed (data in Supplementary Figure 3) that was omitted by mistake from the original submission.

    1. But a multithreaded story can offer many voices at once without giving any one of them the last word. This is a reassuring format for encountering a traumatic event because it allows plenty of room for conflicting emotions.

      Here the author is contrasting linear story telling and multithread story telling. This can show how a single version of an event may feel limiting, specifically in context of trauma. The multithreaded version can validate and express multiple perspectives and emotions which I think relates nicely to Hana Feels as we experienced multiple perspectives which I thought helped the story develop.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary

      This paper summarises responses from a survey completed by around 5,000 academics on their manuscript submission behaviours. The authors find several interesting stylised facts, including (but not limited to):

      Women are less likely to submit their papers to highly influential journals (e.g., Nature, Science and PNAS).

      Women are more likely to cite the demands of co-authors as a reason why they didn't submit to highly influential journals.

      Women are also more likely to say that they were advised not to submit to highly influential journals.

      The paper highlights an important point, namely that the submission behaviours of men and women scientists may not be the same (either due to preferences that vary by gender, selection effects that arise earlier in scientists' careers or social factors that affect men and women differently and also influence submission patterns). As a result, simply observing gender differences in acceptance rates - or a lack thereof - should not be automatically interpreted as evidence for or against discrimination (broadly defined) in the peer review process.

      Major comments

      What do you mean by bias?

      In the second paragraph of the introduction, it is claimed that "if no biases were present in the case of peer review, then we should expect the rate with which members of less powerful social groups enjoy successful peer review outcomes to be proportionate to their representation in submission rates." There are a couple of issues with this statement.

      First, the authors are implicitly making a normative assumption that manuscript submission and acceptance rates *should* be equalised across groups. This may very well be the case, but there can also be valid reasons - even when women are not intrinsically better at research than men - why a greater fraction of female-authored submissions are accepted relative to male-authored submissions (or vice versa). For example, if men are more likely to submit their less ground-breaking work, then one might reasonably expect that they experience higher rejection rates compared to women, conditional on submission.

      We do assume that normative statement: unless we believe that men’s papers are intrinsically better than women’s papers, the acceptance rate should be the same. But the referee is right: we have no way of controlling for the intrinsic quality of the work of men and women. That said, our manuscript does not show that there is a different acceptance rate for men and women; it shows that women are less likely to submit papers to a subset of journals that are of a lower Journal Impact Factor, controlling for their most cited paper, in an attempt to control for intrinsic quality of the manuscripts.

      Second, I assume by "bias", the authors are taking a broad definition, i.e., they are not only including factors that specifically relate to gender but also factors that are themselves independent of gender but nevertheless disproportionately are associated with one gender or another (e.g., perhaps women are more likely to write on certain topics and those topics are rated more poorly by (more prevalent) male referees; alternatively, referees may be more likely to accept articles by authors they've met before, most referees are men and men are more likely to have met a given author if he's male instead of female). If that is the case, I would define more clearly what you mean by bias. (And if that isn't the case, then I would encourage the authors to consider a broader definition of "bias"!)

      Yes, the referee is right that we are taking a broad definition of bias. We provide a definition of bias on page 3, line 92. This definition is focused on differential evaluation which leads to differential outcomes. We also hedge our conversation (e.g., page 3, line 104) to acknowledge that observations of disparities may only be an indicator of potential bias, as many other things could explain the disparity. In short, disparities are a necessary but insufficient indicator of bias. We add a line in the introduction to reinforce this. The only other reference to the term bias comes on page 10, line 276. We add a reference to Lee here to contextualize.

      Identifying policy interventions is not a major contribution of this paper

      I would take out the final sentence in the abstract. In my opinion, your survey evidence isn't really strong enough to support definitive policy interventions to address the issue and, indeed, providing policy advice is not a major - or even minor - contribution of your paper. (Basically, I would hope that someone interested in policy interventions would consult another paper that much more thoughtfully and comprehensively discusses the costs and benefits of various interventions!) While it's fine to briefly discuss them at the end of your paper - as you currently do - I wouldn't highlight that in the abstract as being an important contribution of your paper.

      We thank the referee for this comment. While we agree that our results do not lead to definitive policy interventions, we believe that our findings point to a phenomenon that should be addressed through policy interventions. Given that some interventions are proposed in our conclusion, we feel like stating this in the abstract is coherent.

      Minor comments

      What is the rationale for conditioning on academic rank and does this have explanatory power on its own - i.e., does it at least superficially potentially explain part of the gender gap in intention to submit?

      Thank you for this thoughtful question. We conditioned on academic rank in all regression analyses to account for structural differences in career stage that may potentially influence submission behaviors. Academic rank (e.g., assistant, associate, full professor) is a key determinant of publishing capacity and strategic considerations, such as perceived likelihood of success at elite journals, tolerance for risk, and institutional expectations for publication venues.

      Importantly, academic rank is also correlated with gender due to cumulative career disadvantages that contribute to underrepresentation of women at more senior levels. Failing to adjust for rank would conflate gender effects with differences attributable to career stage. By including rank as a covariate, we aim to isolate gender-associated patterns in submission behavior within comparable career stages, thereby producing a more precise estimate of the gender effect.

      Regarding explanatory power, academic rank does indeed contribute significantly to model fit across our analyses, indicating that it captures meaningful variation in submission behavior. However, even after adjusting for rank, we continue to observe significant gender differences in submission patterns in several disciplines. This suggests that while academic rank explains part of the variation, it does not fully account for the gender gap—highlighting the importance of examining other structural and behavioral factors that shape the publication trajectory.
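      The conflation described above can be illustrated with a small sketch. It does not reproduce the authors' regression models; it swaps in a simpler stratified comparison (a Mantel-Haenszel pooled odds ratio) and uses toy counts that are invented purely for illustration, not taken from the survey. The rank labels and the `toy_counts` layout are likewise assumptions.

```python
# Toy counts invented for illustration only (NOT survey data).
# Layout: rank -> gender -> (submitted, did_not_submit)
toy_counts = {
    "assistant": {"women": (25, 75), "men": (18, 42)},
    "full": {"women": (22, 18), "men": (84, 56)},
}


def crude_odds_ratio(counts):
    """Odds ratio (women vs. men) after pooling all ranks together."""
    a = sum(stratum["women"][0] for stratum in counts.values())
    b = sum(stratum["women"][1] for stratum in counts.values())
    c = sum(stratum["men"][0] for stratum in counts.values())
    d = sum(stratum["men"][1] for stratum in counts.values())
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(counts):
    """Odds ratio (women vs. men) pooled across rank strata."""
    numerator = 0.0
    denominator = 0.0
    for stratum in counts.values():
        a, b = stratum["women"]  # submitted, did not submit
        c, d = stratum["men"]
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


crude_or = crude_odds_ratio(toy_counts)
adjusted_or = mantel_haenszel_odds_ratio(toy_counts)
print(f"crude OR = {crude_or:.2f}, rank-adjusted OR = {adjusted_or:.2f}")
```

      In these toy counts women are concentrated in the junior rank, where submission is less common for everyone, so the crude odds ratio sits further from 1 than the rank-adjusted one. A gap remains after adjustment, which mirrors the qualitative pattern the authors describe.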

      Reviewer #2 (Public review):

      Basson et al. present compelling evidence supporting a gender disparity in article submission to "elite" journals. Most notably, they found that women were more likely to avoid submitting to one of these journals based on advice from a colleague/mentor. Overall, this work is an important addition to the study of gender disparities in the publishing process.

      I thank the authors for addressing my concerns.

      Reviewer #4 (Public review):

      Main strengths

      The topic of the MS is very relevant given that across the sciences/academia, genders are unevenly represented, which has a range of potential negative consequences. To change this, we need to have the evidence on what mechanisms cause this pattern. Given that promotion and merit in academia are still largely based on the number of publications and the impact factor, one part of the gap likely originates from differences in publication rates of women compared to men.

      Women are underrepresented compared to men in journals with a high impact factor. While previous work has detected this gap and identified some potential mechanisms, the current MS provides strong evidence that this gap might be due to a lower submission rate of women compared to men, rather than the rejection rates. These results are based on a survey of close to 5000 authors. The survey seems to be conducted well (though I am not an expert in surveys), and data analysis is appropriate to address the main research aims. It was impossible to check the original data because of the privacy concerns.

      Interestingly, the results show no gender bias in rejection rates (desk rejection or overall) in three high-impact journals (Science, Nature, PNAS). However, submission rates are lower for women compared to men, indicating that gender biases might act through this pathway. The survey also showed that women are more likely to rate their work as not groundbreaking and are advised not to submit to prestigious journals, indicating that both intrinsic and extrinsic factors shape women's submission behaviour.

      With these results, the MS has the potential to inform actions to reduce gender bias in publishing, but also to inform assessment reform at a larger scale.

      I do not find any major weaknesses in the revised manuscript.

      Reviewer #4 (Recommendations for the authors):

      (1) Colour schemes of the Figures are not adjusted for colour-blindness (red-green is a big NO), some suggestions can be found here https://www.nceas.ucsb.edu/sites/default/files/2022-06/Colorblind%20Safe%20Color%20Schemes.pdf

      We appreciate the suggestion. We’ve adjusted the colors in the manuscript to be color-blind friendly using one of the colorblind safe palettes suggested by the reviewer.

      (2) I do not think that the authors have fully addressed the comment about APCs and the decision to submit, given that PNAS has publication charges that amount to double of someone's monthly salary. I would add a sentence or two to explain that publication charges should not be a factor for Nature and Science, but might be for PNAS.

      While APCs are definitely a factor affecting researchers’ submission behavior, they mostly do so for lower-prestige journals rather than for the three elite journals analyzed here. As mentioned in the previous round of revisions, Nature and Science have subscription options. And PNAS authors without funding have access to waivers: https://www.pnas.org/author-center/publication-charges

      (3) Line 268, the first suggestion here is not something that would likely work. Thus, I would not put it as the first suggestion.

      We made the suggested change.

      (4) Data availability - remove AND in 'Aggregated and de-identified data' because it sounds like both are shared. Suggest writing: 'Aggregated, de-identified data..'. I still suggest sharing data/code in a trusted repository (e.g. Dryad, ZENODO...) rather than on GitHub, as per the current recommendation on the best practices for data sharing.

      Thank you for your comment regarding data availability. Due to IRB restrictions and the conditions of our ethics approval, we are not permitted to share the survey data used in this study. However, to support transparency and reproducibility, we have made all analysis code available on Zenodo at https://doi.org/10.5281/zenodo.16327580. In addition, we have included a synthetic dataset with the same structure as the original survey data but containing randomly generated values. This allows others to understand the data structure and replicate our analysis pipeline without compromising participant confidentiality.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Chao et al. produced an updated version of the SpliceAI package using modern deep learning frameworks. This includes data preprocessing, model training, direct prediction, and variant effect prediction scripts. They also added functionality for model fine-tuning and model calibration. They convincingly evaluate their newly trained models against those from the original SpliceAI package and investigate how to extend SpliceAI to make predictions in new species. While their comparisons to the original SpliceAI models are convincing on the grounds of model performance, their evaluation of how well the new models match the original's understanding of non-local mutation effects is incomplete. Further, their evaluation of the new calibration functionality would benefit from a more nuanced discussion of what set of splice sites their calibration is expected to hold for, and tests in a context for which calibration is needed.

      Strengths:

      (1) They provide convincing evidence that their new implementation of SpliceAI matches the performance of the original model on a similar dataset while benefiting from improved computational efficiencies. This will enable faster prediction and retraining of splicing models for new species as well as easier integration with other modern deep learning tools.

      (2) They produce models with strong performance on non-human model species and a simple, well-documented pipeline for producing models tuned for any species of interest. This will be a boon for researchers working on splicing in these species and make it easy for researchers working on new species to generate their own models.

      (3) Their documentation is clear and abundant. This will greatly aid the ability of others to work with their code base.

      We thank the reviewer for these positive comments.  

      Weaknesses:

      (1) The authors' assessment of how much their model retains SpliceAI's understanding of "nonlocal effects of genomic mutations on splice site location and strength" (Figure 6) is not sufficiently supported. Demonstrating this would require showing that for a large number of (non-local) mutations, their model shows the same change in predictions as SpliceAI or that attribution maps for their model and SpliceAI are concordant even at distances from the splice site. Figure 6A comes close to demonstrating this, but only provides anecdotal evidence as it is limited to 2 loci. This could be overcome by summarizing the concordance between ISM maps for the two models and then comparing across many loci. Figure 6B also comes close, but falls short because instead of comparing splicing prediction differences between the models as a function of variants, it compares the average prediction difference as a function of the distance from the splice site. This limits it to only detecting differences in the model's understanding of the local splice site motif sequences. This could be overcome by looking at comparisons between differences in predictions with mutants directly and considering non-local mutants that cause differences in splicing predictions.

      We agree that two loci are insufficient to demonstrate preservation of non-local effects. To address this, we have extended our analysis to a larger set of sites: we randomly sampled 100 donor and 100 acceptor sites, applied our ISM procedure over a 5,001 nt window centered at each site for both models, and computed the ISM map as before. We then calculated the Pearson correlation between the collection of OSAI<sub>MANE</sub> and SpliceAI ISM importance scores. We also created 10 additional ISM maps similar to those in Figure 6A, which are now provided in Figure S23.

      The following is the revised paragraph in the manuscript’s Results section:

      First, we recreated the experiment from Jaganathan et al. in which they mutated every base in a window around exon 9 of the U2SURP gene and calculated its impact on the predicted probability of the acceptor site. We repeated this experiment on exon 2 of the DST gene, again using both SpliceAI and OSAI<sub>MANE</sub>. In both cases, we found a strong similarity between the resultant patterns between SpliceAI and OSAI<sub>MANE</sub>, as shown in Figure 6A. To evaluate concordance more broadly, we randomly selected 100 donor and 100 acceptor sites and performed the same ISM experiment on each site. The Pearson correlation between SpliceAI and OSAI<sub>MANE</sub> yielded an overall median correlation of 0.857 (see Methods; additional DNA logos in Figure S23).

      To characterize the local sequence features that both models focus on, we computed the average decrease in predicted splice-site probability resulting from each of the three possible single-nucleotide substitutions at every position within 80 bp for 100 donor and 100 acceptor sites randomly sampled from the test set (Chromosomes 1, 3, 5, 7, and 9). Figure 6B shows the average decrease in splice site strength for each mutation in the format of a DNA logo, for both tools.

      We added the following text to the Methods section:

      Concordance evaluation of ISM importance scores between OSAI<sub>MANE</sub> and SpliceAI

      To assess agreement between OSAI<sub>MANE</sub> and SpliceAI across a broad set of splice sites, we applied our ISM procedure to 100 randomly chosen donor sites and 100 randomly chosen acceptor sites. For each site, we extracted a 5,001 nt window centered on the annotated splice junction and, at every coordinate within that window, substituted the reference base with each of the three alternative nucleotides. We recorded the change in predicted splice-site probability for each mutation and then averaged these Δ-scores at each position to produce a 5,001-score ISM importance profile per site.

      Next, for each splice site we computed the Pearson correlation coefficient between the paired importance profiles from ensembled OSAI<sub>MANE</sub> and ensembled SpliceAI. The median correlation was 0.857 for all splice sites. Ten additional zoom-in representative splice site DNA logo comparisons are provided in Supplementary Figure S23.
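      The ISM concordance procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the two predictors are hypothetical toy functions standing in for SpliceAI and OSAI<sub>MANE</sub>, the window is 21 nt rather than 5,001 nt to keep the example fast, and all function names are assumptions.

```python
import math
import random
import statistics

BASES = "ACGT"


def make_toy_model(site_weight, g_weight):
    """Hypothetical stand-in for a splice-site predictor (NOT a real model)."""
    def predict(seq, site_idx):
        score = 0.1
        if seq[site_idx:site_idx + 2] == "GT":  # toy donor dinucleotide
            score += site_weight
        # Small, distance-decayed contribution from every G in the window.
        score += sum(g_weight / (1 + abs(i - site_idx))
                     for i, base in enumerate(seq) if base == "G")
        return score
    return predict


def ism_profile(seq, site_idx, predict):
    """Per-position mean drop in the predicted score over the 3 substitutions."""
    ref_score = predict(seq, site_idx)
    profile = []
    for pos, ref_base in enumerate(seq):
        deltas = [ref_score - predict(seq[:pos] + alt + seq[pos + 1:], site_idx)
                  for alt in BASES if alt != ref_base]
        profile.append(sum(deltas) / len(deltas))
    return profile


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def random_site(rng, length=21):
    """Random window with a GT placed at the central (toy) splice site."""
    center = length // 2
    seq = [rng.choice(BASES) for _ in range(length)]
    seq[center:center + 2] = "GT"
    return "".join(seq), center


rng = random.Random(0)
toy_model_a = make_toy_model(site_weight=0.6, g_weight=0.02)
toy_model_b = make_toy_model(site_weight=0.5, g_weight=0.03)

correlations = []
for seq, center in (random_site(rng) for _ in range(5)):
    prof_a = ism_profile(seq, center, toy_model_a)
    prof_b = ism_profile(seq, center, toy_model_b)
    correlations.append(pearson(prof_a, prof_b))

median_r = statistics.median(correlations)
print(f"median per-site Pearson r = {median_r:.3f}")
```

      With real models, `predict` would wrap each model's forward pass on the one-hot encoded window, and the per-site correlations would be summarized by their median as in the text.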

      (2) The utility of the calibration method described is unclear. When thinking about a calibrated model for splicing, the expectation would be that the models' predicted splicing probabilities would match the true probabilities that positions with that level of prediction confidence are splice sites. However, the actual calibration that they perform only considers positions as splice sites if they are splice sites in the longest isoform of the gene included in the MANE annotation. In other words, they calibrate the model such that the model's predicted splicing probabilities match the probability that a position with that level of confidence is a splice site in one particular isoform for each gene, not the probability that it is a splice site more broadly. Their level of calibration on this set of splice sites may very well not hold to broader sets of splice sites, such as sites from all annotated isoforms, sites that are commonly used in cryptic splicing, or poised sites that can be activated by a variant. This is a particularly important point as much of the utility of SpliceAI comes from its ability to issue variant effect predictions, and they have not demonstrated that this calibration holds in the context of variants. This section could be improved by expanding and clarifying the discussion of what set of splice sites they have demonstrated calibration on, what it means to calibrate against this set of splice sites, and how this calibration is expected to hold or not for other interesting sets of splice sites. Alternatively, or in addition, they could demonstrate how well their calibration holds on different sets of splice sites or show the effect of calibrating their models against different potentially interesting sets of splice sites and discuss how the results do or do not differ.

      We thank the reviewer for highlighting the need to clarify our calibration procedure. Both SpliceAI and OpenSpliceAI are trained on a single “canonical” transcript per gene: SpliceAI on the hg19 Ensembl/Gencode canonical set and OpenSpliceAI on the MANE transcript set. To calibrate each model, we applied post-hoc temperature scaling, i.e. a single learnable parameter that rescales the logits before the softmax. This adjustment does not alter the model’s ranking or discrimination (AUC/precision–recall) but simply aligns the predicted probabilities for donor, acceptor, and non-splice classes with their observed frequencies. As shown in our reliability diagrams (Fig. S16-S22), temperature scaling yields negligible changes in performance, confirming that both SpliceAI and OpenSpliceAI were already well-calibrated. However, we acknowledge that we didn’t measure how calibration might affect predictions on non-canonical splice sites or on cryptic splicing. It is possible that calibration might have a detrimental effect on those, but because this is not a key claim of our paper, we decided not to do further experiments. We have updated the manuscript to acknowledge this potential shortcoming; please see the revised paragraph in our next response.
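      As a rough illustration of post-hoc temperature scaling, the sketch below fits a single temperature on toy logits and checks that the predicted class never changes. It is not the OpenSpliceAI implementation: the logits and labels are invented, the class order (0 = non-splice, 1 = acceptor, 2 = donor) is an assumption, and the temperature is fitted by a simple grid search on negative log-likelihood, which is one common choice rather than necessarily the authors' optimizer.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (numerically stabilized)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(logit_rows, labels, temperature):
    """Average negative log-likelihood of the true labels."""
    losses = [-math.log(softmax(row, temperature)[label])
              for row, label in zip(logit_rows, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, low=0.05, high=10.0, steps=400):
    """Grid search (log-spaced) for the temperature minimizing mean NLL."""
    best_t, best_loss = 1.0, float("inf")
    for i in range(steps + 1):
        t = low * (high / low) ** (i / steps)
        loss = mean_nll(logit_rows, labels, t)
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


# Toy, deliberately overconfident logits (invented for illustration).
# Assumed class order: 0 = non-splice, 1 = acceptor, 2 = donor.
toy_logits = [[8, 0, 0], [0, 8, 0], [0, 0, 8], [8, 0, 0], [0, 8, 0], [8, 0, 0]]
toy_labels = [0, 1, 2, 1, 0, 0]  # two of the six confident calls are wrong

fitted_t = fit_temperature(toy_logits, toy_labels)
nll_before = mean_nll(toy_logits, toy_labels, 1.0)
nll_after = mean_nll(toy_logits, toy_labels, fitted_t)
ranking_unchanged = all(
    argmax(softmax(row)) == argmax(softmax(row, fitted_t)) for row in toy_logits
)
print(f"T = {fitted_t:.2f}, NLL {nll_before:.3f} -> {nll_after:.3f}, "
      f"ranking unchanged: {ranking_unchanged}")
```

      Because dividing all logits by one positive constant preserves their order, the top-ranked class per position is unchanged; only the confidence attached to it moves. A fitted temperature close to 1 would indicate a model that was already well calibrated on the calibration set.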

      (3) It is difficult to assess how well their calibration method works in general because their original models are already well calibrated, so their calibration method finds temperatures very close to 1 and only produces very small and hard to assess changes in calibration metrics. This makes it very hard to distinguish if the calibration method works, as it doesn't really produce any changes. It would be helpful to demonstrate the calibration method on a model that requires calibration or on a dataset for which the current model is not well calibrated, so that the impact of the calibration method could be observed.

      It’s true that the models we calibrated didn’t need many changes. It is possible that the calibration methods we used (which were not ours, but which were described in earlier publications) can’t improve the models much. We toned down our comments about this procedure, as follows.

      Original:

      “Collectively, these results demonstrate that OSAIs were already well-calibrated, and this consistency across species underscores the robustness of OpenSpliceAI’s training approach in diverse genomic contexts.”

      Revised:

      “We observed very small changes after calibration across phylogenetically diverse species, suggesting that OpenSpliceAI’s training regimen yielded well‐calibrated models, although it is possible that a different calibration algorithm might produce further improvements in performance.”

      Reviewer #2 (Public review):

      Summary:

      The paper by Chao et al offers a reimplementation of the SpliceAI algorithm in PyTorch so that the model can more easily/efficiently be retrained. They apply their new implementation of the SpliceAI algorithm, which they call OpenSpliceAI, to several species and compare it against the original model, showing that the results are very similar and that in some small species, pretraining on other species helps improve performance.

      Strengths:

      On the upside, the code runs fine, and it is well documented.

      Weaknesses:

      The paper itself does not offer much beyond reimplementing SpliceAI. There is no new algorithm, new analysis, new data, or new insights into RNA splicing. There is no comparison to many of the alternative methods that have since been published to surpass SpliceAI. Given that some of the authors are well-known with a long history of important contributions, our expectations were admittedly different. Still, we hope some readers will find the new implementation useful.

      We thank the reviewer for the feedback. We have clarified that OpenSpliceAI is an open-source PyTorch reimplementation optimized for efficient retraining and transfer learning, designed to analyze cross-species performance gains, and supported by a thorough benchmark and the release of several pretrained models to clearly position our contribution.

      Reviewer #3 (Public review):

      Summary:

      The authors present OpenSpliceAI, a PyTorch-based reimplementation of the well-known SpliceAI deep learning model for splicing prediction. The core architecture remains unchanged, but the reimplementation demonstrates convincing improvements in usability, runtime performance, and potential for cross-species application.

      Strengths:

      The improvements are well-supported by comparative benchmarks, and the work is valuable given its strong potential to broaden the adoption of splicing prediction tools across computational and experimental biology communities.

      Major comments:

      Can fine-tuning also be used to improve prediction for human splicing? Specifically, are models trained on other species and then fine-tuned with human data able to perform better on human splicing prediction? This would enhance the model's utility for more users, and ideally, such fine-tuned models should be made available.

      We evaluated transfer learning by fine-tuning models pretrained on mouse (OSAI<sub>Mouse</sub>), honeybee (OSAI<sub>Honeybee</sub>), Arabidopsis (OSAI<sub>Arabidopsis</sub>), and zebrafish (OSAI<sub>Zebrafish</sub>) on human data. While transfer learning accelerated convergence compared to training from scratch, the final human splicing prediction accuracy was comparable between fine-tuned and scratch-trained models, suggesting that performance on our current human dataset is nearing saturation under this architecture.

      We added the following paragraph to the Discussion section:

      We also evaluated pretraining on mouse (OSAI<sub>Mouse</sub>), honeybee (OSAI<sub>Honeybee</sub>), zebrafish (OSAI<sub>Zebrafish</sub>), and Arabidopsis (OSAI<sub>Arabidopsis</sub>) followed by fine-tuning on the human MANE dataset. While cross-species pretraining substantially accelerated convergence during fine-tuning, the final human splicing-prediction accuracy was comparable to that of a model trained from scratch on human data. This result indicates that our architecture seems to capture all relevant splicing features from human training data alone, and thus gains little or no benefit from cross-species transfer learning in this context (see Figure S24).

      Reviewer #1 (Recommendations for the authors):

      We thank the editor for summarizing the points raised by each reviewer. Below is our point-by-point response to each comment:

      (1) In Figure 3 (and generally in the other figures) OpenSpliceAI should be replaced with OSAI_{Training dataset} because otherwise it is hard to tell which precise model is being compared. And in Figure 3 it is especially important to emphasize that you are comparing a SpliceAI model trained on Human data to an OSAI model trained and evaluated on a different species.

      We have updated the labels in Figure 3, replacing “OpenSpliceAI” with “OSAI_{training dataset}” to more clearly specify which model is being compared.

      (2) Are genes paralogous to training set genes removed from the validation set as well as the test set? If you are worried about data leakage in the test set, it makes sense to also consider validation set leakage.

      Thank you for this helpful suggestion. We fully agree, and to avoid any data leakage we implemented the identical filtering pipeline for both validation and test sets: we excluded all sequences paralogous or homologous to sequences in the training set, and further removed any sequence sharing > 80 % length overlap and > 80 % sequence identity with training sequences. The effect of this filtering on the validation set is summarized in Supplementary Figure S7C.
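
      The similarity filter described above can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: the `Hit` record, the function names, and the definition of length overlap as aligned length divided by the held-out sequence length are assumptions. The same function is applied to both the validation and the test split.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    """One alignment of a held-out sequence against a training sequence (hypothetical record)."""
    query_id: str     # held-out (validation or test) sequence
    target_id: str    # training sequence it aligned to
    identity: float   # fraction of identical positions in the alignment
    aligned_len: int  # number of aligned query positions
    query_len: int    # full length of the held-out sequence

def leaky_ids(hits, min_identity=0.80, min_overlap=0.80):
    """IDs of held-out sequences exceeding both thresholds against any training sequence."""
    flagged = set()
    for h in hits:
        overlap = h.aligned_len / h.query_len  # assumed definition of length overlap
        if h.identity > min_identity and overlap > min_overlap:
            flagged.add(h.query_id)
    return flagged

def filter_split(seq_ids, hits, paralog_ids=frozenset()):
    """Drop annotated paralogs/homologs first, then anything flagged by the similarity filter."""
    drop = leaky_ids(hits) | set(paralog_ids)
    return [s for s in seq_ids if s not in drop]

# Toy example with made-up identifiers.
hits = [
    Hit("val_g1", "train_a", 0.95, 900, 1000),  # high identity, high overlap: removed
    Hit("val_g2", "train_b", 0.95, 500, 1000),  # high identity, low overlap: kept
    Hit("val_g3", "train_c", 0.70, 990, 1000),  # low identity, high overlap: kept
]
validation_kept = filter_split(["val_g1", "val_g2", "val_g3", "val_g4"],
                               hits, paralog_ids={"val_g4"})
```

      Requiring both thresholds to be exceeded mirrors the wording of the response; a stricter filter would drop a sequence when either threshold is exceeded.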

      Reviewer #3 (Recommendations for the authors):

      (1) The legend in Figure 3 is somewhat confusing. The labels like "SpliceAI-Keras (species name)" may imply that the model was retrained using data from that species, but that's not the case, correct?

      Yes, “SpliceAI-Keras (species name)” was not retrained; it refers to the released SpliceAI model evaluated on the specified species dataset. We have revised the Figure 3 legends, changing “SpliceAI-Keras (species name)” to “SpliceAI-Keras” to clarify this.

      (2) Please address the minor issues with the code, including ensuring the conda install works across various systems.

      We have addressed the issues you mentioned. OpenSpliceAI is now available on Conda and can be installed with:  conda install openspliceai. 

      The conda package homepage is at https://anaconda.org/khchao/openspliceai. We’ve also corrected all broken links in the documentation.

      (3) Utility:

      I followed all the steps in the Quick Start Guide, and aside from the issues mentioned below, everything worked as expected.

      I attempted installation using conda as described in the instructions, but it was unsuccessful. I assume this method is not yet supported.

      In Quick Start Guide: predict, the link labeled "GitHub (models/spliceai-mane/10000nt/)" appears to be incorrect. The correct path is likely "GitHub (models/openspliceaimane/10000nt/)".

      In Quick Start Guide: variant (https://ccb.jhu.edu/openspliceai/content/quick_start_guide/quickstart_variant.html#quick-startvariant), some of the download links for input files were broken. While I was able to find some files in the GitHub repository, I think the -A option should point to data/grch37.txt, not examples/data/input.vcf, and the -I option should be examples/data/input.vcf, not data/vcf/input.vcf.

      Thank you for catching these issues. We’ve now addressed all issues concerning Conda installation and file links. We thank the editor for thoroughly testing our code and reviewing the documentation.

    1. In this article, the authors present a study using different networks from various data sources to measure differences in gathering scholarly document topics and to show which networks provide the best information to represent the scientific topics considered appropriately. The work is built on a previous contribution and analyses networks obtained from six sources: scholarly document authors, Facebook users, Twitter users and conversations, patents, and policy documents. These networks are also accompanied by other networks, i.e. the text similarity network and the citation network, that are mainly used for comparison purposes.

      The work particularly interests the scholarly community, aiming to work with science map generation. However, some passages need further explanation to be clear to the reader.

      1. The abstract mentions traditional and non-traditional data sources. While the text of the article does clarify these, it would be ideal to briefly explain in the abstract what the authors mean by these terms, since it is not immediately clear what a traditional data source is in the context of topic identification.

      2. In the introduction, the authors anticipate the outcomes of a previous work they have conducted on a similar topic. They claim that some topics are well-represented in maps based on citation links and text similarity, while others are not. However, it is not clear which sources they have used to get to this claim, and it is also not evident what the main difference is that characterises the current work compared to the previous one.

      3. In section 3, the authors introduce all the methods and materials used for their analysis. Although some of the material cannot be shared since it is behind a paywall (e.g. the Web of Science data), it is not clear from reading the section that all the code developed and the data obtained from the analysis have been published on Zenodo. While it is okay to address this aspect in the appropriate section at the end of the article, I would suggest stating this information at the beginning of section 3, citing the Zenodo record appropriately and clarifying which material is not included in that record, thus explaining that the experiment cannot be fully reproduced.

      4. Considering all the external sources of networks, it is not clear what the datetime window of each source is - are all these sources containing information from the year of publication of the oldest article in the document set considered to 2024?

      5. As far as I understood from the formula in section 3.7.1, the Purity is always calculated against a particular topic M. Thus, why not refer to such "M" in the formula definition, defining it in a function-like way Purity(N, M)? In addition, still in this section, it is not clear how the N clusters considered are selected. A running example of Purity calculation would probably help the reader here.

      6. In section 3.7.2, the denominator of the formula is set to 5. However, it is unclear why such a number is sensible for the calculation presented. Why not 6 or 7? Why not 3? I think the authors should clearly justify the choice of such a denominator by bringing in explicit evidence.

      7. In section 3.7.3, it is not entirely clear what the difference is between topics and topic categories.

      8. In the discussion, it would be good to expand a bit on the work's limitations and to envision possible paths for future work in the area. A few points that I would love to see discussed in detail:

        • The analysis has been done using sources that may have changed drastically in the past months/years, e.g. Twitter, which, after becoming X, has seen many academics leave for more open (in a broad sense) platforms and networks (e.g. Mastodon and, more recently, BlueSky). Would it be possible to gather the necessary data from these platforms to run the study again? If yes, would it be possible to download them? If not, should we consider these sources unreliable for scientific purposes and, if so, what preconditions should be in place for their reliability? Considering the present situation, what is the relevance of the results obtained with the data gathered from Twitter (now X)?

        • The authors transparently claim that some of the data used (e.g. Web of Science data) are not freely available to the reader, thus preventing the full replication of the study. Is it possible to substitute these closed sources with others offering open research information? For instance, OpenCitations for gathering the citation network (full disclosure: I'm director of OpenCitations), PubMed and PubMed Central for gathering titles and abstracts of the article considered, etc.?

        • The core set of scholarly documents considered is primarily from the biomedical domain, since the authors considered only those with a PubMed identifier specified. While the results shown are sensible for this domain, how well does the approach the authors presented scale to other scholarly areas, e.g. Social Science and Humanities? Is it possible to speculate that the approach presented is discipline-agnostic? Is there any evidence for such a claim?

      Some final remarks:

      A. The figures should be closer (i.e. maximum on the next page) to the place they are mentioned the very first time.

      B. The research question introduced in the article is introduced in section 1, and then it is not explicitly mentioned anymore in the text. It would be ideal to add an explicit reference to that question when the authors present appropriate evidence to answer it (e.g. in section 4) and to recall the answer to that question in the conclusion of the paper.

    1. Note: This response was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      This manuscript describes the translational responses to single and combined BCAA shortages in mouse cell lines. Using Ribo-seq and RNA-seq analysis, the authors found selective ribosome pausing at codons that encode the depleted amino acids, where the pausing at valine codons was prominent under both single and triple starvation whereas isoleucine codons showed pausing only under a single depletion. They analyzed the mechanisms of the unexpected selective pausing and proposed that the positional codon usage bias could shape the ribosome stalling and tRNA charging patterns across different amino acids. They also examined the stress responses and the changes in the protein expression levels under BCAA starvation.

      The manuscript was well-written, and the findings are interesting, especially their model that positional codon usage bias could be a regulator of ribosome pausing and tRNA charging levels. Although different translational responses to distinct amino acid starvation have been widely documented, the positional codon usage bias is an interesting aspect. The manuscript's central message could have been made clearer. The authors may consider emphasizing this point more explicitly in the abstract. The rich multi-omics dataset in this work provides valuable resources for the translation field.

      We thank the reviewer for the thoughtful and positive evaluation of our work.

      Major comments

      1. The abstract may need to be revised since it is hard to immediately catch the authors' main point. If the authors regard this work as a resource paper, the current version is fine. But it could be better to point out the positional codon usages the authors found, which is a strong point of the current manuscript.

      Response: We thank the reviewer for highlighting the importance of positional codon usage, which indeed represents a key finding of our study. We revised the abstract, and we now emphasize this aspect more clearly. However, in response to review #2, we have framed the observed positional effects and the idea of an elongation bottleneck as one possible contributing mechanism among others and relate it specifically to the attenuation of isoleucine-specific stalling under triple starvation.

      1. Page 18 "Beyond these tRNA dynamics, our data also highlight the importance of the codon positional context within mRNAs, indicating that where a codon is located within the CDS can influence both the extent of ribosomal stalling and overall translation efficiency during nutrient stress." This idea is interesting. To what extent do the authors think this could be generalized? The authors may discuss whether they think their proposed model is specific to the different ribosome stalling patterns between valine and isoleucine codons or generalizes to other codon combinations. For example, the positional codon usage bias will differ among organisms; are there any previous reports on ribosome behaviors that align with their model?

      Response: We thank the reviewer for raising these important points. While our study primarily focuses on the differential stalling patterns of valine and isoleucine codons, we believe the underlying principle, that the position of codons within the CDS can modulate the extent of ribosome stalling, may under very specific circumstances extend beyond this amino acid pair. We expect this positional effect to be potentially relevant for combinations in which one amino acid has considerable enrichment near the 5′ end of coding sequences, coupled with starvation-sensitive tRNA isoacceptors, while the other does not. In our case, valine meets these criteria (see Fig. S11A and Fig. 6). In contrast, isoleucine and leucine codons, although also relatively frequent, show more variable positional distributions and are both decoded by isoacceptors that appear more resistant to starvation, as illustrated in Fig. 6 and reported for mammals and bacteria in Saikia et al. 2016; Darnell, Subramaniam, and O’Shea 2018; Elf et al. 2003; Dittmar et al. 2005. To explore the generalizability of this model, we have now included a transcriptome-wide analysis of codon position biases in mouse for all codons in the revised manuscript (Supplementary Figures 10 and 11). This analysis may serve as a basis to identify additional candidate codons for future studies. Furthermore, we now mention in the Discussion that amino acids with similar properties to valine regarding their positional distribution and tRNA isoacceptors, such as phenylalanine and glutamine, whose tRNA isoacceptors are predicted to be fully deacylated under their respective starvation in bacteria (Elf et al. 2003), could be promising candidates for testing this model, in combination with amino acids whose tRNAs are expected to remain partially charged under starvation or to be depleted at the start of the CDS, such as His (Supplementary Fig. 11C).
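
      To make the notion of positional codon bias concrete, the following minimal sketch compares how often a set of codons occurs in an early window of each CDS versus the remainder. It is not the analysis used in the manuscript: the function names, the window size, the DNA alphabet (T in place of U), and the two toy sequences are assumptions chosen for illustration.

```python
def split_codons(cds):
    """Split a CDS (length assumed to be a multiple of 3) into codons, dropping the stop codon."""
    return [cds[i:i + 3] for i in range(0, len(cds) - 3, 3)]

def window_frequencies(cds_list, target_codons, n_early):
    """Return (early, late) frequencies of target_codons across all CDSs.

    'early' covers the first n_early codons after the start codon and
    'late' covers every remaining codon before the stop codon.
    """
    early_hits = early_total = late_hits = late_total = 0
    for cds in cds_list:
        body = split_codons(cds)[1:]  # skip the start codon
        for idx, codon in enumerate(body):
            hit = codon in target_codons
            if idx < n_early:
                early_total += 1
                early_hits += hit
            else:
                late_total += 1
                late_hits += hit
    return early_hits / early_total, late_hits / late_total

VAL_CODONS = {"GTT", "GTC", "GTA", "GTG"}
ILE_CODONS = {"ATT", "ATC", "ATA"}

# Two made-up coding sequences, written codon by codon for readability.
toy_cds = [
    "".join(["ATG", "GTT", "GTC", "AAA", "CCC", "ATT", "TAA"]),
    "".join(["ATG", "GTG", "AAA", "CTG", "ATC", "GTA", "TGA"]),
]

val_early, val_late = window_frequencies(toy_cds, VAL_CODONS, n_early=2)
ile_early, ile_late = window_frequencies(toy_cds, ILE_CODONS, n_early=2)
```

      In this toy input, valine codons are concentrated in the early window while isoleucine codons appear only downstream, which is the kind of asymmetry the positional model relies on. A transcriptome-wide version would also need to handle CDSs shorter than the window and weight transcripts by expression.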

      Even if the authors think this model can be applied to BCAA starvation, would it be possible to explain the different isoleucine codon responses between single and double starvation? The authors may discuss why the ribosome stalling at isoleucine AUU and AUC codons was slightly attenuated under double starvation. And how about the different leucine codon responses among single, double, and triple starvations, although the pausing is not as strong as isoleucine and valine codons?

      Response: Regarding the attenuated isoleucine stalling under double starvation, we believe this is primarily due to stronger inhibition of the mTORC1 pathway when leucine is co-depleted (i.e., in the double starvation condition; Fig. 2D–F). This results in a more substantial suppression of global translation, reducing overall tRNA demand and thereby mitigating stalling (Darnell, 2018). A similar effect may explain the only mild leucine codon stalling observed under single leucine starvation, which also triggers strong mTORC1 inhibition and reduced initiation. In contrast, triple starvation does not suppress mTORC1 to the same extent, and thus reduced initiation alone cannot explain the absence of leucine codon stalling. Instead, we propose that additional features, such as the relative sensitivity of tRNA isoacceptors to starvation and their aminoacylation dynamics, must be considered. Valine tRNAs, for example, are known to be highly sensitive and become strongly deacylated under starvation in bacteria (Elf et al. 2003), a pattern that we also find in our own data (Fig. 6). Leucine tRNAs, by contrast, appear more resistant, possibly due to better amino acid recycling or isoacceptor-specific differences in charging kinetics, though further validation would be needed. Combined with the strong stalling at 5′-enriched valine codons, this could reduce downstream ribosome traffic and limit exposure of leucine codons, thus preventing stalling. However, our new analysis of the positional relationship between valine and leucine codons within individual transcripts (now shown in Supplementary Figure 11B) did not reveal as strong a pattern as we observed for valine and isoleucine codons. We now discuss these points and their implications in the revised Discussion.

      Experimental validation using artificial reporters carrying biased sequences may also be considered.

      Response: We appreciate the reviewer’s suggestion. In fact, we explored this experimentally using a dual-fluorescent reporter system (GFP–RFP) (Juszkiewicz and Hegde 2017) containing consecutive Val or Ile codons. However, the constructs yielded variable and non-reproducible results under starvation conditions. In addition, testing the role of codon position would require placing the same codons at multiple defined positions within a single transcript and performing ribosome profiling directly on the reporter. This type of targeted experimental validation is technically challenging and falls beyond the scope of the current study. We now mention this explicitly in the revised Discussion as an interesting direction for future work.

      1. Page 13 "Moreover, we noticed that DT changes extend beyond the ribosomal A-site, including the P-site, E-site, and even further positions (Supplementary Fig. 2A), consistent with other studies on single amino acid starvation 39 (Supplementary Fig. 2B-C)." Could the widespread DT changes be due to the Ribo-DT pipeline they used or to difficulties in offset determination? Indeed, the authors showed that this feature was found in other datasets, but it seems that those datasets were processed and analyzed in the same way as their data. The original Ribo-DT paper (Gobet and Naef, 2022, Methods) also showed some widespread DT changes even from RNA-seq. Another analysis method, the codon subsequence abundance shift that is part of the diricore analysis (Loayza-Puch et al., 2016, Nature), did not show such broadly changed regions. The authors are encouraged to re-analyze the data sets using different methods.

      Response: We agree with the reviewer that the extension of DT changes beyond the ribosomal A-site is puzzling, but this has already been seen in other papers using other approaches (Darnell, Subramaniam, and O’Shea 2018). To validate that this shift is not due to our A-site assignment, enrichment analysis, or DT method, we applied the Diricore pipeline to our Ribo-Seq data. The output of the pipeline provides either 5’-end ribosome density or “subsequence” analysis using an A-site offset for each read size based on the metagene profile at the start codon. Both analyses show the same enriched codons across the different conditions as in our analyses, and the broad shift is similar, with the maximum signal at E, -1 position (Fig. R1).
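
      The idea of a per-read-length A-site offset can be illustrated with a short sketch. This is neither the Diricore nor the Ribo-DT code: the offset values, the function names, and the toy reads are hypothetical. In practice, offsets are calibrated per library from the metagene profile around the start codon.

```python
from collections import Counter

# Hypothetical offsets (nt from the read 5' end to the first A-site nucleotide),
# one per read length. Real values must be calibrated for each library.
A_SITE_OFFSETS = {28: 15, 29: 15, 30: 16}

def a_site_codon_index(read_5p, read_len, cds_start, offsets=A_SITE_OFFSETS):
    """0-based index of the codon in the A-site, or None if it cannot be assigned.

    read_5p and cds_start are 0-based transcript coordinates.
    """
    offset = offsets.get(read_len)
    if offset is None:          # read length without a calibrated offset
        return None
    pos_in_cds = read_5p + offset - cds_start
    if pos_in_cds < 0:          # A-site lies upstream of the start codon
        return None
    return pos_in_cds // 3

def a_site_codon_counts(reads, cds, cds_start, offsets=A_SITE_OFFSETS):
    """Count A-site codons over (read_5p, read_len) pairs mapped to one transcript."""
    counts = Counter()
    for read_5p, read_len in reads:
        idx = a_site_codon_index(read_5p, read_len, cds_start, offsets)
        if idx is None:
            continue
        codon = cds[3 * idx: 3 * idx + 3]
        if len(codon) == 3:     # ignore A-sites that fall past the end of the CDS
            counts[codon] += 1
    return counts

toy_cds = "ATGGTTATTCTGTAA"  # ATG GTT ATT CTG TAA
toy_reads = [(8, 28), (11, 28), (9, 29), (5, 40)]
toy_counts = a_site_codon_counts(toy_reads, toy_cds, cds_start=20)
```

      Shifting every offset by 3 or 6 nt moves the same counts onto the P-site or E-site codon, which is why a miscalibrated offset is one candidate explanation for signal appearing outside the A-site, and why cross-checking with an independent pipeline is informative.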

      1. Page 13 "Intriguingly, only two of the three isoleucine codons (AUU and AUC) showed increased DTs upon Ile starvation (p < 0.01), while just one leucine codon (CUU) exhibited a modest but significant DT increase (p < 0.01) under Leu starvation (Figure 1A-B, Supplementary Figure 2A)." How can the authors explain the different strengths of ribosome pausing at Ile codons under Ile and double starvation? The AUA codon did not show any pausing under either of the starvation conditions. Throughout the manuscript, the authors mainly describe the difference between amino acids but it is desirable to discuss the codon-level difference as well.

      Response: Thank you for raising this point. The observed differences in stalling between the isoleucine codons can likely be explained by differences in tRNA isoacceptor charging and positional bias within transcripts. The AUA codon is decoded by a distinct tRNAIle isoacceptor (tRNAIleUAU), which, according to our tRNA charging data (Fig. 6), remains largely charged during Ile starvation. This observation aligns with previous reports suggesting that this isoacceptor is more resistant to starvation-induced deacylation in mammalian cells and bacteria (Saikia et al. 2016; Elf et al. 2003). In contrast, the AUU and AUC codons are primarily decoded by the tRNAIleAAU isoacceptor, which we find to be strongly deacylated under Ile starvation, likely contributing to the observed codon-specific ribosome pausing. Additionally, we found that the AUA codons are relatively rare in general and particularly underrepresented near the 5′ ends of coding sequences. Our new spatial analysis (now included in Supplementary Figure 11B) confirms that AUA codons tend to occur downstream of AUU and AUC codons within transcripts. This potentially further reduces stalling on these codons and further diminishes their apparent DT increase under starvation. In order to better explain these important points, we have now expanded the codon-level discussion of these differences in the revised manuscript.

      1. Page 13 "We examined the effects of single amino acid starvations (-Leu, -Ile and -Val), as well as combinations, including a double starvation of leucine and isoleucine (hereafter referred to as "double") and a starvation of leucine, isoleucine, and valine ("triple"), allowing us to identify potential non-additive effects." The different double starvations, isoleucine and valine, and leucine and valine, will further support their hypothesis on the effects of the positional codon usage bias on ribosome pausing and tRNA charging patterns. Although this could be beyond the scope of the current manuscript, the authors are encouraged to provide a rationale for the chosen combination.

      Response: Our experimental design evolved stepwise: we initially focused on leucine and isoleucine depletion because, in our previous work in the mouse liver (Gobet et al. 2020), we found that despite their structural similarity these had short and long dwell times, respectively. Valine was included at a later stage to cover all the BCAAs. At the time, we did not anticipate valine to yield particularly striking effects in cells, and therefore we did not include systematic pairwise depletions involving valine. However, the strong and unexpected stalling observed at valine codons, especially under triple starvation, became a central aspect of the study. Thus, we agree that additional combinations, such as Leu/Val or Val/Ile, could be informative and now mention this in the Discussion as a potential direction for future studies.

      Minor comments

      Page 16 "these results imply that BCAA deprivation lowers protein output through multiple pathways: a combination of reduced initiation, direct elongation blocks (stalling), and possibly an increased proteolysis" This conclusion is totally right but may be too general. Could the authors summarize BCAA-specific features of the events including reduced initiation, stalling, and proteolysis that all contribute to protein outputs? This is not well discussed in the latter sections including Discussion.

      Response: We thank the reviewer for this helpful suggestion. We agree that the original statement was too general and have revised the relevant section to more clearly delineate the distinct responses observed under each BCAA starvation condition. Specifically, we now summarize that valine starvation is characterized by strong, positionally biased ribosome stalling; leucine starvation primarily impacts translation initiation, likely via mTORC1 repression; and isoleucine starvation shows a mixed phenotype, with features of both impaired initiation and codon-specific elongation delays. We also clarify that while protein stability or degradation may contribute to the observed changes in protein output, our current data do not allow for quantitative assessment of proteolytic effects (e.g., changes in protein half-life). Therefore, we refrain from making direct quantitative conclusions about the differential modulations of proteolysis and instead focus our discussion on the translational mechanisms supported by our data.

      Reviewer #1 (Significance):

      The manuscript was well-written, and the findings are interesting, especially their model that positional codon usage bias could be a regulator of ribosome pausing and tRNA charging levels. Although different translational responses to distinct amino acid starvation have been widely documented, the positional codon usage bias is an interesting aspect. The manuscript's central message could have been made clearer. The authors may consider emphasizing this point more explicitly in the abstract. The rich multi-omics dataset in this work provides valuable resources for the translation field.

      We thank the reviewer for the encouraging comments and share the view that positional codon-usage bias is an important result; accordingly, we now underscore this point explicitly in the revised Abstract. We also emphasise that our other observations are, to our knowledge, novel: only a handful of multi-omics studies have combined ribosome-pausing profiles with direct tRNA-aminoacylation measurements, and none has systematically examined multiple amino-acid-deprivation conditions as presented here.

      Reviewer #2 (Evidence, reproducibility and clarity):

      This study examines the consequences of starvation for the BCAAs, either singly, for Leu & Ile, or for all three simultaneously in HeLa cells on overall translation rates, decoding rates at each codon, and on ribosome density, protein expression, and distribution of ribosome stalling events across the CDS for each expressed gene. The single amino acid starvation regimes specifically reduce the cognate intracellular amino acid pool and lead to deacylation of at least a subset of the cognate tRNAs in a manner dependent on continuing protein synthesis. They also induce the ISR equally and decrease bulk protein synthesis equally in a manner that appears to occur largely at the initiation level for -Leu and -Val, judging by the decreased polysome:monosome ratio, but at both the initiation and elongation levels for -Ile, a distinction that remains unexplained. Only -Leu appears to down-regulate mTORC1 and TOP mRNA translation. There is a significant down-regulation of protein levels for 50-200 genes, which tend to be unstable in nutrient-replete cells, only a fraction of which are associated with reduced ribosome occupancies (RPFs measured by Ribo-Seq) on the corresponding mRNAs in the manner expected for reduced initiation, suggesting that delayed elongation is responsible for reduced protein levels for the remaining fraction of genes. All three single starvations lead to increased decoding times for a subset of the cognate "hungry" codons: CUU for -Leu, AUU and AUC for -Ile, and all of the Val codons, in a manner that is said to correspond largely to the particular tRNA isoacceptors that become deacylated, although this correspondence was not explained explicitly and might not be as simple as claimed. 
All three single starvations also evoke skewing of RPFs towards the 5' ends of many CDSs in a manner correlated with an enrichment within the early regions of the CDSs for one or more of the cognate codons that showed increased decoding times for -Ile (AUC codon) and -Val (GUU, GUC, and GUG), but not for -Leu, a difference that was not accounted for. These last findings suggest that, at least for -Val and -Ile, delays in decoding N-terminal cognate codons cause elongating ribosomes to build up early in the CDS. They go on to employ a peak calling algorithm to identify stalling sites in an unbiased way within the CDS, which are greatest in number for -Val, and find that Val codons are enriched in the A-sites (slightly) and adjacent 5' nucleotides (to a greater extent) for -Val starvation; and similarly for Ile codons in -Ile conditions, but not for -Leu starvation, again for unknown reasons. It's unclear why their called stalling sites have various other non-hungry codons present in the A sites with the cognate hungry codons being enriched further upstream, given that stalling should occur with the "hungry" cognate codon in the A site. The proteins showing down-regulation are enriched for stalling sites only in the case of the -Val starvation in the manner expected if stalling is contributing to reduced translation of the corresponding mRNA. It's unclear why this enrichment apparently does not extend to -Ile starvation, which shows comparable skewing of RPFs towards the 5' ends, and this fact diminishes the claim that pausing generally contributes to reduced translation for genes with abundant hungry codons. 
All of the same analyses were carried out for the Double -Ile/-Leu and Triple starvations and yield unexpected results, particularly for the triple starvation wherein decoding times are increased only at Val codons, skewing of RPFs towards the 5' ends of CDSs is correlated only with an enrichment for Val codons within the early regions of the CDSs, and stall sites are enriched only for Val codons at nearby upstream sites, all consistent with the finding that only Val tRNAs become deacylated in the Triple regime. To explain why only Val tRNA charging is reduced despite the observed effective starvation for all three amino acids, they note first that stalling at Val codons is skewed towards the 5' ends of CDS for both -Val and triple starvations more so than observed for -Ile or -Leu starvation, which they attribute to a greater frequency of Val codons vs Ile codons in the 5' ends of CDSs. As such, charged Val tRNAs are said to be consumed in translating the 5' ends of CDSs and the resulting stalling prevents ribosomes from reaching downstream Ile and Leu codons at the same frequencies and thus prevents deacylation of the cognate Ile and Leu tRNAs. It's unclear whether this explanation is adequate to explain the complete lack of Ile or Leu tRNA deacylation observed even when amino acid recycling by the proteasome is inhibited, a treatment shown to exacerbate deacylation of cognate tRNAs in the single amino acid starvations and of Val tRNA in the triple starvation. As such, the statement in the Abstract "Notably, we could show that isoleucine starvation-specific stalling largely diminished under triple starvation, likely due to early elongation bottlenecks at valine codons" might be too strong and the word "possibly" would be preferred over "likely". It's also unclear why the proteins that are down-regulated in the triple starvation are not significantly enriched for stalling sites (Fig. 5B) given that the degree of skewing is comparable or greater than for -Val.
This last point seems to undermine their conclusion in the Abstract that "many proteins downregulated under BCAA deprivation harbor stalling sites, suggesting that compromised elongation contributes to decreased protein output." In the case of the double -Ile/-Leu starvation, a related phenomenon occurs wherein decoding rates are decreased for only the AUU Ile codon and only the AAU Ile tRNA becomes deacylated; although in this case increased RPFs in the 5' ends are not correlated with enrichment for Ile or Leu codons and, although not presented, apparently stall sites are not associated with the Ile codon in the double starvation. In addition, stalling sites are not enriched in the proteins down-regulated by the double starvation. Moreover, because Ile codons are not enriched in the 5' ends of CDS, it doesn't seem possible to explain the selective deacylation of the single Ile tRNA observed in the double starvation by the same "bottleneck" mechanism proposed to explain selective deacylation of only Val tRNAs during the triple starvation. This is another reason for questioning their "bottleneck" mechanism.

      We thank the reviewer for their deep assessment, exhaustive reading, and constructive feedback, which have greatly contributed to improving the clarity and contextualization of our manuscript. We would first like to clarify that all experiments in this study were conducted in NIH3T3 mouse fibroblasts, not HeLa cells; we assume this was a misunderstanding and have verified that the correct cell line is consistently indicated throughout the manuscript. We also clarify that our data show that -Leu, double starvation, and to a lesser extent -Ile, downregulate mTORC1 signaling and TOP mRNA translation, whereas -Val and triple starvation had minimal effects on these pathways. We agree that some of our conclusions and observed phenomena were not explained in sufficient detail in the original version. To address this, we have significantly reworked the discussion, added complementary figures, and clarified key points throughout the text to better convey the underlying rationale and biological interpretation of our findings. We address each of the reviewer’s points in detail in the point-by-point responses below.

      Specific comments (some of which were mentioned above):

      -The authors have treated cells with CHX in the Ribo-Seq experiments, which has been shown to cause artifacts in determining the locations of ribosome stalling in vivo owing to continued elongation in the presence of CHX (https://doi.org/10.1371/journal.pgen.1005732). The authors should comment on whether this artifact could be influencing some of their findings, particularly the results in Fig. 5C where the hungry codons are often present upstream of the A sites of called stalling sites in the manner expected if elongation continued slowly following stalling in the presence of CHX.

      Response: We thank the reviewer for raising this important concern. We would like to clarify that our ribosome profiling protocol did not include CHX pretreatment of live cells. CHX was added only during the brief PBS washes immediately before lysis and in the lysis buffer itself. This approach aligns with best practices aimed at minimizing post-lysis ribosome run-off, and is intended to prevent the downstream ribosome displacement artifacts described by Hussmann et al. 2015, which result from pre-incubation of live cells with CHX for several minutes before harvesting. Furthermore, recent studies have demonstrated that CHX-induced biases are species-specific. For instance, Sharma et al. 2021 found that human (and mouse) ribosomes are not susceptible to conformational restrictions by CHX, nor does CHX distort gene-level measurements of ribosome occupancy. This suggests that the use of CHX in the lysis buffer, as performed in our protocol, is unlikely to introduce significant artifacts in our ribosome profiling data. To further support this, we reanalyzed data from Darnell, Subramaniam, and O’Shea 2018, where the ribosome profiling samples were prepared without any CHX pretreatment or CHX in the wash buffer, and still observed similar upstream enrichments in their stalling profiles (see Supplementary Figure 2B-C in our manuscript). Additionally, in our previous work (Gobet et al. 2020), we compared ribosome dwell times with and without CHX in the lysis buffer and found no significant differences, reinforcing the notion that CHX use during lysis does not substantially affect the measurement of ribosome stalling. Given these considerations, we believe that CHX-related artifacts, such as downstream ribosome movement, are unlikely to explain the enrichment of hungry codons upstream of identified stalling sites in our data. We have now adjusted the Methods section to clarify this point.

      -p. 12: "These starvation-specific DT and ribosome density modulations were also evident at the individual transcript level, as exemplified by Col1a1, Col1a2, Aars, and Mki67 which showed persistent Val-codon-specific ribosome density increases but lost Ile-codon-specific increases under triple starvation (Supplementary Figure 3A-D). " This conclusion is hard to visualize for any but Val codons. It would help to annotate the relevant peaks of interest for -Ile starvation with arrows.

      Response: We agree and thank the reviewer for this observation. We have now annotated exemplary peaks in Supplementary Figure 3A–D to highlight ribosome pileups over Ile codons. However, we agree that it is still hard to visualize in the given Figure. Therefore, we added scatter plots for each of the transcripts that show the RPM of each position in the Ctrl vs starvation to allow for a better illustration of the milder effects upon Ile starvation (Supplementary Figure 4).

      -To better make the point that codon-specific stalling under BCAA starvation appears to be not driven by codon usage, rather than the analysis in Fig. 1H, wouldn't it be better to examine the correlation between increases in DT under the single amino acid starvation conditions and the codon frequencies across all codons?

      Response: We appreciate the suggestion. We have now added an additional analysis correlating the change in DT with codon usage frequency for each starvation condition. This is included in Supplementary Figure 5A-D and supports our interpretation that codon frequency alone does not explain the observed stalling behavior.
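
      As an illustration only, and not the analysis code behind Supplementary Figure 5A-D, a rank correlation between per-codon changes in dwell time (DT) and codon usage frequency could be computed as sketched below. The choice of Spearman correlation, the function names, and the toy values are all assumptions made for this sketch; the response does not state which statistic was used.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up toy values (NOT data from the manuscript), one entry per codon.
codon_frequency = [0.010, 0.020, 0.030, 0.040, 0.050]
delta_dt = [0.8, 0.1, 0.5, 0.0, 0.3]
rho = spearman(codon_frequency, delta_dt)
```

      A coefficient near zero across all codons would be in line with the interpretation that codon frequency alone does not explain the observed changes in DT.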

      -p. 13, entire paragraph beginning with "Our RNA-seq and Ribo-seq revealed a general activation of stress response pathways across all starvations..." It is difficult to glean any important conclusions from this lengthy analysis, and the results do not appear to be connected to the overall topic of the study. If there are important conclusions here that relate to the major findings then these connections should be made or noted later in the Discussion. If not, perhaps the analysis should be largely relegated to the Supplemental material.

      Response: We thank the reviewer for this comment. The paragraph in question is intended to provide a global overview of transcriptional and translational responses across the starvation conditions. It serves both as a quality control (e.g., PCA clustering and global shifts in RPF/RNA-seq profiles), and to confirm that expected starvation-induced responses are among the strongest detectable signals separating the starved samples from the control. Indeed, these observations establish that the perturbations are effective and that hallmark nutrient stress responses are globally engaged across conditions. Importantly, very few studies to date have examined transcriptional and translational responses under single or combined branched-chain amino acid (BCAA) starvation conditions. It therefore remains unclear to what extent BCAA depletion broadly remodels gene expression and translation. Our analysis contributes to addressing this gap, revealing that while certain stress pathways are commonly induced, others show condition-specific patterns such as we observed for -Ile starvation. To maintain focus, we have kept the detailed pathway analyses and transcript-level enrichments in the Supplement and rewritten the corresponding text in a more compact manner, reducing it by more than one third.

      -p. 15: "Together, these findings highlight that BCAA starvation triggers a combination of effects on initiation and elongation, with varying dynamics by amino acid starvation." I take issue with this statement as it appears that translation is reduced primarily at the initiation step for all conditions except -Ile. As noted above, there is no mention in the DISCUSSION of why only -Ile would show a marked elongation component to the inhibition whereas -Val gives the greatest amount of ribosome stalling.

      Response: We acknowledge the reviewer’s point. While the polysome profiles (Figure 3F-H) directly indicate that most conditions repress initiation, codon- and condition-specific elongation defects can still contribute to reduced protein output, even if they are not always detectable as global polysome shifts. Polysome profiles reflect the combined outcome of reduced initiation (which decreases polysome numbers) and ribosome stalling (which can, but does not always have to, increase ribosome density on individual transcripts, potentially counteracting the effects of reduced initiation). For valine starvation strong stalling occurs very early in the CDS (Figure 5F). This bottleneck restricts overall ribosome movement to downstream regions. Thus, while elongation is profoundly impaired, the total number of ribosomes per transcript (which polysome signals largely reflect) may appear low due to reduced overall ribosome traffic. In contrast, isoleucine codon stalling tends to occur also further downstream on the transcript (Figure 5F), allowing ribosomes to accumulate in larger numbers on the mRNA, leading to a clearer "elongation signature" in polysome profiles (Figure 3F, H). Additionally, we observed slightly higher inter-replicate variance for isoleucine starvation (Supplementary Figure 6B), which may have reduced the number of statistically significant stalling sites extracted compared to valine. We have revised the main text and discussion to clarify these points.

      -I cannot decipher Fig. 4D and more detail is required to indicate the identity of each column of data.

      Response: We thank the reviewer for pointing this out. Figure 4D (now Figure 4E) presents an UpSet plot, which is a scalable alternative to Venn diagrams commonly used to visualize intersections across multiple sets. Briefly, each bar in the upper plot represents the number of transcripts with increased 5′ ribosome coverage (Δpi < -0.15; p < 0.05) shared across the conditions indicated in the dot matrix below. Each column in the dot matrix highlights the specific combination of conditions contributing to a given intersection (e.g., dots under “Val” and “Triple” show the overlap between these two). To improve clarity, we have expanded the figure legend accordingly and now refer to the UpSetR methodology in the main text.
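
      To make the reading of such a plot concrete, the following sketch counts how many items fall into each combination of conditions. It assumes the exclusive-intersection convention commonly used for UpSet plots, where each transcript is counted once under the exact set of conditions in which it appears; the function name and the toy transcript identifiers are hypothetical and are not taken from the manuscript.

```python
from collections import Counter


def exclusive_intersections(sets_by_condition):
    """Count items by the exact combination of conditions they belong to."""
    membership = {}
    for condition, items in sets_by_condition.items():
        for item in items:
            membership.setdefault(item, set()).add(condition)
    # Each item contributes to exactly one bar: its full set of conditions.
    return Counter(tuple(sorted(conds)) for conds in membership.values())


# Hypothetical toy input: transcripts passing a polarity-shift filter per condition.
polarized = {
    "Val": {"t1", "t2", "t3"},
    "Triple": {"t2", "t3", "t4"},
    "Ile": {"t3", "t5"},
}
bars = exclusive_intersections(polarized)
```

      In this toy input, the bar above the dots for "Triple" and "Val" would have height 1, because only `t2` is found in exactly those two conditions.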

      -In Fig. 4E, one cannot determine what the P values actually are, which should be provided in the legend to confirm statistical significance.

      Response: Thank you for pointing that out. The legend in Figure 4E (now Figure 4F) for the p-values was accidentally removed during figure editing. We have added the legend back, so that the statistical significance is clear.

      -It's difficult to understand how the -Leu condition and the Double starvation can produce polarized RPFs (Fig. 4A) without evidence of stalling at the cognate hungry codons (Fig. 4E), despite showing later in Fig. 5A that the numbers of stall sites are comparable in those cases to that found for -Ile.

      Response: We appreciate this comment, which points to an important property of RPF profiles under nutrient stress. As shown in Figure 4A, all starvation conditions induce a degree of 5′ ribosome footprint polarization, a pattern that can be observed under various stress conditions and perturbations (Allen et al. 2021; Hwang and Buskirk 2017; Li et al. 2023). This general 5′ bias likely reflects a combination of slowed elongation and altered ribosome dynamics and is not necessarily linked to codon-specific stalling. However, Val and Triple starvation show a much stronger and more asymmetric polarization, characterized by pronounced 5′ accumulation and 3′ depletion of ribosome density. To better illustrate this, we have updated the visualization of polarity scores and added a new bar chart summarizing the number of transcripts showing strong 5′ polarization under each condition. This quantification highlights that the effect is markedly more prevalent under Val and Triple conditions than under Leu or Double starvation. In addition, Figure 4F demonstrates that this polarity is codon-specific under Val and Triple starvation. We clarify that this analysis tests for enrichment of specific codons near the start codon among the polarized transcripts and does not directly assess stalling. The observed enrichment of Val codons in the 5′ regions of polarized transcripts supports the interpretation that early elongation delays contribute to the RPF shift. In contrast, no such enrichment is observed for Leu starvation, reinforcing that Leu-induced polarity is not driven by stalling at Leu codons. While Figure 5 shows a similar number of peak-called stalling sites in -Leu, -Ile, and Double starvation, we note that Ribo-seq signal variability under Ile starvation was higher, which may have limited statistical power for detecting stalling sites, even though clear dwell time increases were observed at specific codons. 
Additionally, we have improved the metagene plots depicting total ribosome footprint density in Figure 4A. The previous version incorrectly showed sharp drops at CDS boundaries due to binning artifacts. The updated version more accurately reflects the density distribution and further highlights the stronger polarization in Val and Triple conditions. Together, these clarifications and improvements within the main text now more clearly distinguish between general polarity effects and codon-specific stalling.
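
For readers unfamiliar with polarity scores, the sketch below uses one commonly used position-weighted definition, in which each CDS position is weighted from -1 (5′ end) to +1 (3′ end) and the score is the density-weighted mean of those weights. Whether the manuscript uses exactly this definition is an assumption; the function name and toy profiles are hypothetical.

```python
def polarity_score(density):
    """Density-weighted mean position, scaled so -1 is the 5' end and +1 the 3' end."""
    length = len(density)
    total = sum(density)
    if length < 2 or total == 0:
        raise ValueError("need at least two positions and non-zero coverage")
    weighted = sum(
        d * (2 * i - (length + 1)) / (length - 1)
        for i, d in enumerate(density, start=1)
    )
    return weighted / total


# Toy footprint profiles along one CDS (NOT data from the manuscript).
control = [2, 2, 2, 2, 2, 2]
starved = [6, 3, 1, 1, 1, 0]

# A negative difference indicates a shift of footprints towards the 5' end.
delta_pi = polarity_score(starved) - polarity_score(control)
```

Under this definition, a threshold such as Δpi < -0.15 selects transcripts whose footprint density moved noticeably towards the start of the CDS relative to the control.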

      -Fig. 5B: the P values should be given for all five columns, and it should be explained here or in the Discussion why the authors conclude that stalling is an important determinant for reduced translation when a significant correlation seems to exist only for the -Val condition and not even for the Triple condition.

      Response: We thank the reviewer for this important observation. In response, we have revised both the text and the figures to provide a clearer and biologically more meaningful representation of the relationship between ribosome stalling and reduced protein output. Specifically, we have replaced the previous Figure 5B with a new analysis that stratifies transcripts based on the number of identified stalling sites. This updated analysis, now shown in Figure 5B, reveals that under Val and Triple starvation conditions, proteins that are downregulated tend to originate from transcripts with multiple stalling sites. Importantly, the corresponding p-values for all five conditions are now explicitly shown in the figure (as red lines). As the reviewer correctly notes, only the Val condition shows a statistically significant enrichment when considering overall overlap. Triple starvation shows a similarly high proportion of overlap (72.3%) but does not reach statistical significance, likely due to the more complex background composition under combined starvation, which increases the expected overlap and reduces statistical power. By stratifying transcripts by the number of stalling sites, we uncover that transcripts with ≥2 stalling sites are enriched among downregulated proteins specifically under Val and Triple conditions, providing a more robust indication of the link between stalling and translation repression under valine deprivation. We believe this refined approach, prompted by the reviewer’s comment, offers a clearer and biologically more relevant perspective on the role of ribosome stalling. The original analysis previously shown in Figure 5B is now provided as Supplemental Figure 10C for transparency and comparison. We have clarified this in the revised text and now interpret the relationship more cautiously.

      -p. 17: "Of note, in cases where valine or isoleucine codons were present just upstream (rather than at) the stalling position, we noted a strong bias for GAG (E), GAA (E), GAU (D), GAC (D), AAG (K), CAG (Q), GUG (V) and GGA (G) (Val starvation) and AAC (N), GAC (D), CUG (L), GAG (E), GCC (A), CAG (Q), GAA (E) and AAG (K) (Ile starvation) at the stalling site (Supplementary Figure 7B)." The authors fail to explain why these codons would be present in the A sites at stalling sites rather than the hungry codons themselves, especially since it is the decoding times of the hungry codons that are increased according to Fig. 1A-E. As suggested above, is this a CHX artifact?

      Response: We agree that the enrichment of the listed codons at identified stalling positions (now Supplementary Figure 10C), with the codon for the depleted amino acid located upstream, is a finding that needs a more detailed explanation. Importantly, this phenomenon is not attributable to CHX artifacts, as our Ribo-seq protocol employs CHX solely during brief washes and lysis to prevent post-lysis ribosome run-off, rather than live-cell pre-treatment. Instead, we propose two hypotheses to explain this pattern. First, many of these enriched codons are inherently slowly decoded, with longer DTs even under control conditions (Supplementary Figure 5H, newly added). Together with the upstream hungry codons, they might form a challenging consecutive decoding environment, which results in an attenuated ribosome slowdown downstream of the hungry codon. Second, ribosome queuing may further explain this pattern. When a ribosome encounters a critically hungry codon and stalls, subsequent ribosomes can form a queue. The codon within the A-site of the queued ribosome would be (more or less) independent of the identity of the hungry codon itself that caused the initial stall. Since the listed codons have a high frequency within the transcriptome (Supplementary Figure 5B), they have an increased likelihood of appearing at this “stalling site”. Importantly, neither phenomenon is necessarily reflected in a general increase of DT across all of the listed codons; both would therefore be captured only by the direct extraction of stalling sites and might be averaged out in the global dwell time analysis. We now mention this phenomenon in the Discussion.

      -Fig. 5D: P values for the significance, or lack thereof, of the different overlaps should be provided.

      Response: Thanks for pointing out this omission. We have now computed hypergeometric p-values for comparisons shown in Figure 5D and Figure 5E, and report them directly in the main text. As described, the overlap in stalling sites between Val and triple starvation is highly significant (2522 positions, p < 2.2×10⁻¹⁶), while overlaps involving Ile-specific stalling positions are smaller but still statistically robust (e.g., 149 positions for Ile – Triple, p = 1.77×10⁻⁵²). Notably, we also calculated p-values at the transcript level and found that a large fraction of transcripts with Ile-specific stalling under single starvation also stall under triple starvation, though often at different positions (1806 transcripts, p = 1.78×10⁻⁵⁸). These values are now included in the revised results section to support the interpretation of these overlaps.
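
      As a reminder of what such a hypergeometric p-value measures, the sketch below computes the upper-tail probability of seeing at least the observed overlap between two sets drawn from a common background. It is a standard-library illustration, not the code used for Figure 5D-E; the function and parameter names are hypothetical, and the choice of background (for example, all sufficiently covered positions or all expressed transcripts) is an assumption that the response does not specify.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(overlap, population, set_a, set_b):
    """P(X >= overlap) when set_b items are drawn without replacement
    from a population that contains set_a marked items."""
    upper = min(set_a, set_b)
    favourable = sum(
        comb(set_a, k) * comb(population - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    # Exact rational arithmetic avoids overflow before the final conversion.
    return float(Fraction(favourable, comb(population, set_b)))


# Toy numbers (NOT from the manuscript): 10 background positions, 4 stalling
# sites in condition A, 3 in condition B, 2 of them shared.
p_value = hypergeom_upper_tail(overlap=2, population=10, set_a=4, set_b=3)
```

      Because the result depends directly on the size of the background population, the same overlap can be more or less significant depending on how that background is defined.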

      -p. 17: "Nonetheless, when we examined entire transcripts rather than single positions, many transcripts that exhibited isoleucine-related stalling under Ile starvation also stalled under triple starvation, but at different sites along the CDS (Figure 5E). This finding is particularly intriguing, as it suggests that while Ile-starvation-specific stalling sites may shift under triple starvation, the overall tendency of these transcripts to stall remains." The authors never come back to account for this unexpected result.

      Response: Thank you for highlighting this point. We've incorporated this finding as part of the proposed "bottleneck" scenario. While the isoleucine-specific stalling sites identified under Ile starvation do shift or disappear under triple starvation, we've observed that the same transcripts still tend to exhibit stalling. However, this now primarily occurs at upstream valine codons. We interpret this as a consequence of early elongation stalling caused by strong pausing at Val codons. This restriction on ribosome progression effectively prevents ribosomes from reaching the original Ile stalling sites. Therefore, the stalling sites identified under triple starvation are largely explained by the Val codons, reflecting a redistribution of stalling rather than its loss. To further clarify this crucial point, we've now explicitly mentioned Figure 5D-E again in the subsequent paragraph, which introduces the bottleneck theory.

      -It seems very difficult to reconcile the results in Fig. 5F with those in Fig. 4A, where similar polarities in RPFs are observed for -Ile and -Val in Fig. 4A but dramatically different distributions of stalling sites in Fig. 5F. More discussion of these discrepancies is required.

      Response: Thank you for pointing this out. The apparent discrepancy between the RPF profiles shown in Figure 4A and the stalling site distributions in Figure 5F likely reflects the fact that RPF polarization includes both general (unspecific) and codon-specific components. Figure 4A displays total ribosome footprint density, capturing both broad stress-induced effects and codon-specific contributions, whereas Figure 5F focuses specifically on peak-called stalling sites, representing localized and statistically significant pauses. Importantly, we would like to emphasise that Fig 4 shows that -Val and -Ile starvation exhibit different responses and not the same patterns. To make these differences even clearer, we have now updated the visualizations in Figure 4, including improved polarity plots and a new bar chart summarizing the number of transcripts with strong 5′ polarization. These additions highlight that the RPF profiles under -Val starvation are more pronounced and asymmetric, particularly due to 3′ depletion, while the polarity under -Ile is milder and a distinct, much smaller subset of transcripts appears to show polarity score shifts. We believe the updated figures and accompanying explanations now make these distinctions clearer.

      -p. 18: "These isoacceptor-specific patterns correlate largely with the particular subsets of leucine and isoleucine codons that stalled (Figure 1A)." This correlation needs to be addressed for each codon-anticodon pair for all of the codons showing stalling in Fig. 1A.

      Response: We thank the reviewer for this important comment. In the revised manuscript, we have expanded the relevant sections to address codon–anticodon relationships more thoroughly. We now explicitly match codons that exhibited increased dwell times under starvation to the corresponding tRNA isoacceptors whose charging was affected, and we provide a clearer discussion of the caveats involved. As noted by the reviewer, this correlation is not straightforward, as it is complicated by wobble base pairing, anticodon modifications, and the fact that multiple codons can be decoded by more than one isoacceptor, and vice versa. Moreover, in our qPCR-based tRNA charging assay, certain isoacceptors cannot be distinguished due to highly similar sequences (e.g., LeuAAG and LeuUAG, and LeuCAA and LeuCAG), which limits resolution for exact pairing. In addition, we did not assess absolute tRNA abundance, which may further influence decoding capacity. Nevertheless, where resolution is possible, the patterns align well: All tRNAVal isoacceptors became uncharged under Val and triple starvation, matching the consistent dwell time increases across all Val codons. Only tRNAIleAAU (decoding AUU and AUC) was deacylated, matching to these codons showing increased dwell times, while AUA (decoded by still-charged tRNAIleUAU) did not. Only CUU (decoded by uncharged tRNALeuGAA) showed increased dwell time. A mild deacylation of the other Leu isoacceptors was observed, but isoacceptor-level resolution is limited by assay constraints. However, these rather minimal tRNA and DT changes were consistent with more dominant initiation repression rather than elongation stalls. To support this analysis, we included an illustrative figure (now in Supplementary Figure 12F) summarizing the codon–anticodon matches.

      -p. 19: "For instance, in our double starvation condition, unchanged tRNA charging levels (Figure 6E) may result from a pronounced downregulation of global translation initiation, likely driven by the activation of stress responses (Figure 2), subsequently lowering the demand for charged tRNAs as it has been observed previously for Leu starvation 39.” This seems at odds with the comparable down-regulation of protein synthesis for the Double starvation and -Leu and -Ile single starvations shown in Fig. 3C. Also, in the current study, Leu starvation does lower charging of certain Leu tRNAs.

      Response: We thank the reviewer for raising this important point. In the revised manuscript, we have clarified this section and now offer a more refined interpretation of the tRNA charging patterns observed under double starvation. While Figure 3C shows a comparable reduction in global protein synthesis across the -Leu, -Ile, and double starvation conditions, it needs to be considered that the OPP assay has limited sensitivity. It operates in a relatively low fluorescence intensity range and is subject to background signal, which may obscure subtle differences between conditions. Moreover, other factors such as changes in protein stability or turnover could also contribute to the observed differences. Therefore, inter-condition differences in translation repression should be interpreted with caution. However, based on our stress response analysis (Figure 2), mTORC1 inactivation appears strongest under double starvation, likely leading to more profound suppression of translation initiation. This would reduce the overall demand for charged tRNAs and could explain why no detectable tRNA deacylation was observed under double starvation, even though mild uncharging of Leu isoacceptors occurred under -Leu, which exhibited a milder stress response. This distinction is consistent with the observed mild dwell time increases for one Leu codon under -Leu, but not in the double condition. Similarly, the absence of Ile codon stalling and tRNA deacylation under double starvation may be attributed to stress-driven reductions in elongation demand, preventing the tRNA depletion and codon-specific delays observed under single Ile starvation. A more direct clarification is now included in the revised manuscript.

      Reviewer #2 (Significance):

      The results here are significant in showing that starvation for a single amino acid does not lead to deacylation of all isoacceptors for that amino acid and in revealing that starvation for one amino acid can prevent deacylation of tRNAs for other amino acids, as shown most dramatically for the selective deacylation of only Val tRNAs in the triple BCAA starvation condition. For the various reasons indicated above, however, I'm not convinced that their "bottleneck" mechanism is adequate to explain this phenomenon, especially in the case of the selective deacylation of Ile vs Leu tRNA in the Double starvation regime. It's also significant that deacylation leads to ribosome build-up near the 5' ends of CDS, which seems to be associated with an enrichment for the hungry codons in the case of Val and Ile starvation, but inexplicably, not for Leu or the Double starvations. This last discrepancy makes it hard to understand how the -Leu and Double starvations produce RPF buildups near the 5' ends of CDSs. In addition, the claim in the Discussion that "our data also highlight the importance of the codon positional context within mRNAs, indicating that where a codon is located within the CDS can influence both the extent of ribosomal stalling and overall translation efficiency during nutrient stress" overstates the strength of evidence that the stalling events lead to substantial decreases in translational efficiencies for the affected mRNAs, as the stalling frequency and decreased protein output are significantly correlated only for the -Val starvation, and the data in Fig. 3 D-H suggest that the reductions in protein synthesis generally occur at the level of initiation, even for -Val starvation, with a contribution from slow elongation only for -Ile, which is in itself difficult to understand considering that stalling frequencies are highest in -Val.
Thus, while many of the results are very intriguing and will be of considerable interest to the translation field, it is my opinion that a number of results have been overinterpreted and that important inconsistencies and complexities have been overlooked in concluding that a significant component of the translational inhibition arises from the increased decoding times at hungry codons during elongation and that the selective deacylation of Val tRNAs in the Triple starvation can be explained by the "bottleneck" mechanism. The complexities and limitations of the data and their interpretations should be discussed much more thoroughly in the Discussion, which currently is devoted mostly to other phenomena often of tangential importance to the current findings. A suitably revised manuscript would clearly state the limitations and caveats of the proposed mechanisms and consider other possible explanations as well.

      Again, we thank the reviewer for the valuable insights and constructive critiques. We believe that the concerns regarding potential overinterpretation and inconsistencies have now been addressed through clearer explanations and more cautious interpretation throughout the revised manuscript. We also agree that the original Discussion included aspects that, while interesting, were of secondary importance. In light of the reviewer’s suggestions, we have restructured and rebalanced the Discussion to focus more directly on the key findings and their implications. Importantly, we wish to clarify that we do not propose the elongation bottleneck model as a general mechanism across all conditions. In particular, for double (Leu/Ile) starvation, we attribute the observed effects primarily to stress response–mediated translational repression, and not to codon-specific stalling or tRNA depletion. We believe that this distinction is now more clearly conveyed in the revised manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary

      Worpenberg and colleagues investigated the translational consequences of branched-chain amino acid (BCAA) starvation in mouse cells. Limitation of individual BCAAs has been reported to cause codon-specific and global translational repression. In this paper, the authors use RNA-seq, ribosome profiling (Ribo-seq), proteomics, and tRNA charging assays to characterize the impacts of individual and combined depletion of leucine, isoleucine, and valine on translation. They find that BCAA starvation increases codon-specific ribosome dwell times, activates global translational stress responses and reduces global protein synthesis. They infer that this effect is due to decreased translation initiation and codon-specific translational stalling. They find that the effects of simultaneous depletion are non-additive. In valine and triple (valine, leucine, and isoleucine) depletion, they show that affected transcripts have a high density of valine codons early in their coding sequences, creating an "elongation bottleneck" that obscures the impact of starvation of other amino acids. Finally, they identify isoacceptor-specific differences in tRNA charging that help explain the codon-specific effects that they observe.

      We find the major findings convincing and clear. We find that some results are incompletely explained. We suggest an additional experiment and also have some minor comments that we hope will improve clarity and rigor.

      We thank the reviewer for the thorough and constructive feedback. We appreciate the recognition of our main findings and the helpful suggestions for improving the manuscript. Below we address each point in detail.

      Major comments

      Figure 3O: In this figure and the associated text, the authors try to determine whether differences in protein degradation can explain why some proteins have higher ribosome density but lower proteomic expression. However, since this analysis relies on published protein half-lives from non-starvation conditions and on the assumption that protein synthesis has entirely stopped, we are not convinced it is informative for this experimental context. It does not distinguish between a model in which protein synthesis has been reduced by stalling and a model in which both protein synthesis and degradation rate have increased, which are both consistent with their Ribo-seq and proteomic data. To address this issue, the authors should either perform protein half-life measurements under their starvation conditions, or more clearly explain these two models in the text and acknowledge that they cannot distinguish between them.

      Response: We agree with the reviewer that our current analysis, which is based on protein half-lives obtained under non-starvation conditions, cannot definitively separate the effects of reduced translation from those of increased protein degradation. We have revised the relevant section in the manuscript to more clearly state that this analysis is correlative in nature and serves only to explore one possible explanation for the observed disconnect between ribosome density and protein levels. We now also explicitly acknowledge that our dataset does not allow us to distinguish between a model in which protein output is reduced due to stalling and one in which both translation and degradation rates are altered. However, the observed log2FC values in the proteomics data are often milder than expected from the complete-medium half-lives alone, which would be difficult to reconcile with a dominant contribution from global protein destabilization. That said, we also acknowledge that protein degradation is highly context- and protein-specific, and that proteolytic regulation might still play a role. Performing a direct protein half-life measurement under our starvation conditions would indeed be required to rigorously test this, but such an experiment is outside the scope of this study. We now highlight this as a limitation and a valuable direction for future work, and we have softened any interpretations in the main text to reflect the uncertainty regarding the contribution of protein stability changes.
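
      The half-life argument in this exchange can be made concrete with a minimal sketch. It is not the authors' analysis: it assumes simple first-order decay, no dilution by cell growth, and a protein that starts at steady state. The function names, the 6 h starvation time, and the 6 h half-life are hypothetical values chosen only for illustration.

```python
import math


def expected_log2fc_decay_only(hours_starved, half_life_hours):
    """log2 fold change if synthesis stops entirely and the protein decays
    exponentially: P(t) = P0 * 2**(-t / half_life)."""
    if half_life_hours <= 0:
        raise ValueError("half_life_hours must be positive")
    return -hours_starved / half_life_hours


def expected_log2fc_reduced_synthesis(hours_starved, half_life_hours,
                                      synthesis_fraction):
    """log2 fold change if synthesis continues at a fraction f of its
    pre-starvation rate while the decay rate k stays unchanged.

    Solves dP/dt = f*k*P0 - k*P with P(0) = P0, which gives
    P(t)/P0 = f + (1 - f) * exp(-k*t), where k = ln(2) / half_life.
    """
    if half_life_hours <= 0:
        raise ValueError("half_life_hours must be positive")
    k = math.log(2) / half_life_hours
    remaining = synthesis_fraction + (1 - synthesis_fraction) * math.exp(-k * hours_starved)
    return math.log2(remaining)


# Hypothetical example: 6 h of starvation, 6 h half-life.
full_stop = expected_log2fc_decay_only(6, 6)
half_rate = expected_log2fc_reduced_synthesis(6, 6, 0.5)
print(full_stop, round(half_rate, 2))
```

      Under these assumptions, a complete stop of synthesis predicts a log2FC of -1, whereas synthesis at half the original rate predicts a milder change of about -0.42. A measured log2FC between the two values is therefore compatible with partially reduced synthesis at unchanged stability, but also with other combinations of synthesis and degradation rates, which is the ambiguity the reviewers describe.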

      Minor comments

      Figure 1G: Why does intracellular valine seem to be less depleted under starvation conditions than intracellular leucine or isoleucine? Are the limits of detections different for different amino acids? The authors should acknowledge this discrepancy and comment on whether it has any implications for interpretation of their results.

      Response: We thank the reviewer for this important point. While valine appears slightly less depleted than leucine or isoleucine in Figure 1G, the fold changes and absolute reductions are strong for all three BCAAs, including valine. To further illustrate this, we have added a supplementary bar chart showing the measured intracellular concentrations in µmol/L, including mean and variance across five biological replicates (Supplementary Figure 5A). We believe that the variation may reflect technical factors, such as differences in detection sensitivity or ionization efficiency between amino acids in the targeted metabolomics assay and, therefore, that the observed difference does not have a meaningful impact on the interpretation of our results. We now directly acknowledge these differences in the main text.

      Figure 1H: These data do not appear to meet the assumptions for linear regression. We suggest either reporting a Spearman R correlation (as the data appear linear in rank but not absolute value) or removing it entirely; we think the plot without statistics is sufficient.

      Response: We thank the reviewer for the suggestion. In the revised manuscript, we removed the statistical annotation and retained only the trend line to illustrate the general pattern. We agree that this visualization alone is sufficient to support the qualitative point we aimed to convey.
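
      For readers unfamiliar with the distinction the reviewers draw, the following is a minimal sketch of a Spearman rank correlation next to a Pearson correlation. It is written in plain Python rather than with a statistics library, and the data values are invented for illustration; they are not taken from Figure 1H.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented data: monotonic but strongly non-linear.
x = [1, 2, 3, 4, 5]
y = [1, 2, 4, 8, 100]
print(round(pearson(x, y), 2), round(spearman(x, y), 2))
```

      Because the invented values increase monotonically but not linearly, the rank-based coefficient is 1 while the Pearson coefficient is noticeably lower, which is the situation the reviewers describe as "linear in rank but not absolute value".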

      Figure 2B: The in-text description of this figure states that "most" ISR genes show a "robust induction," but only three genes are shown in the figure, two of which are upregulated. The authors should instead specify that 2 out of the 3 genes profiled were robustly induced.

      Response: We have rephrased the sentence to say “two of the three genes profiled…” for precision and consistency with the data shown.

      Figure 2D: Please include the full, uncropped blots in the supplementary materials.

      Response: We have now added the full, uncropped western blots to the supplementary material (Supplementary Figure 8).

      Figure 2E: Swap the positions of the RPS6 and 4E-BP1 plots so they line up with their respective blots to make these figures easier to interpret. Authors should consider doing a one-way ANOVA and post-hoc analysis, if we correctly understand that they are making a conclusion about the difference between multiple groups in aggregate.

      Response: We thank the reviewer for the suggestion. The alignment of the RPS6 and 4E-BP1 plots with their respective blots has been corrected. As this panel focuses on comparisons to the control condition only, we have retained the original presentation.

      Figure 4B: Panel A in this figure is very convincing, and these plots don't add additional information. The authors could consider removing them. If this panel stays in, we suggest removing the "mid index" plot, since it is never referenced in the text and doesn't seem relevant to the message of the figure.

      Response: We appreciate the feedback. While we considered removing panel B as suggested, we decided to retain it because it provides a useful summary of panel A. To improve clarity and visual interpretation, we replaced the original boxplot with a bar plot displaying mean values and SEM error bars. We believe the bar plot now nicely illustrates that Val and Triple starvation lead to stronger effects, especially in the reduction of the 3′ index. The “mid index” plot, which was not referenced in the text and did not contribute to the central message, has been removed as suggested.

      Figure 4E: Why is there a reduction in frequency of a Leu and a Val codon under Ile starvation?

      Response: Thank you for highlighting this observation. The reduction in the frequency of a specific Leu and Val codon under Ile starvation in Figure 4F (former Figure 4E) is indeed intriguing. This figure reflects codon usage in the first 20% of the CDSs among the subset of transcripts that exhibit a footprint polarization under each starvation condition. As such, the observed depletion likely arises from the specific transcript composition of the polarized subset under -Ile, which differs from that under -Val or other conditions. Importantly, this pattern is not consistently observed when analyzing the full transcripts (another Leu codon is affected), indicating that it is not a systematic depletion of these codons. One possibility is that an increased frequency of Ile codons (AUC) within the constrained region may lead to a relative underrepresentation of other codons, such as Leu and Val. Alternatively, this may reflect non-random codon co-occurrence patterns within specific transcripts. While our current data do not allow us to investigate this further, we acknowledge these as speculative explanations and now mention this point in the Discussion as a potential avenue for future study.
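
      The compositional explanation offered in this response can be illustrated with a short sketch. It assumes that codon frequency is computed as a fraction of all codons in a window at the start of the CDS; the function name and the toy codon lists are hypothetical, and codons are written in the DNA alphabet (ATC for AUC).

```python
from collections import Counter


def codon_frequencies(cds, fraction=0.2):
    """Relative codon frequencies within the first `fraction` of a CDS.

    `cds` is an in-frame coding sequence; trailing bases that do not
    complete a codon are ignored.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence contains no complete codon")
    window = codons[:max(1, int(len(codons) * fraction))]
    counts = Counter(window)
    return {codon: count / len(window) for codon, count in counts.items()}


# Toy windows with identical numbers of Val (GTG) and Leu (CTG) codons;
# only the number of Ile (ATC) codons differs.
background = "GTG" * 2 + "CTG" * 2 + "GCC" * 4 + "ATC" * 2
ile_rich = "GTG" * 2 + "CTG" * 2 + "GCC" * 4 + "ATC" * 6

freq_background = codon_frequencies(background, fraction=1.0)
freq_ile_rich = codon_frequencies(ile_rich, fraction=1.0)
print(freq_background["GTG"], round(freq_ile_rich["GTG"], 3))
```

      In this toy case the number of Val and Leu codons is unchanged, yet their relative frequency drops once more Ile codons occupy the window. The sketch only shows that such a dilution is arithmetically possible; it does not establish that it accounts for the pattern in the figure.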

      Figure 5G: There appears to be one Val codon early in the Hint1 transcript without much stalling under triple or valine starvation conditions. The authors should acknowledge this and comment on why this may be.

      Response: We thank the reviewer for pointing this out. While the Hint1 transcript indeed contains a valine codon early in its CDS, no clear stalling peak was observed at that position under valine or triple starvation. Several factors may contribute to this: local sequence context can influence ribosome pausing, and not all cognate codons necessarily lead to detectable stalling even under amino acid starvation. Additionally, coverage at the 5′ end of Hint1 is relatively sparse in our dataset, and potential mappability limitations, such as regions with low complexity or repetitive elements, may further reduce resolution at specific sites. We now briefly mention this in the manuscript to clarify the possible causes.

      Figure 5B: In the text referencing this figure, the authors state that "a high number of downregulated proteins with associated ribosome stalling sites did not show an overall decreased mean RPF count...as it would be expected from translation initiation defects, linking these stalling sites directly to proteomic changes." However, RPF is affected both by stalling (increases RPF) and initiation defects (decreases RPF). A gene with both stalling and decreased initiation may appear to have no RPF change. The data does suggest a contribution from stalling, but the authors should also acknowledge that reduced initiation may also be playing a role.

      Response: We agree with the reviewer's comment. Our cited statement should indeed be more nuanced. The reviewer correctly points out that RPFs are influenced by both increased ribosome density due to stalling and decreased ribosome density due to reduced initiation. Therefore, a gene experiencing both stalling and reduced initiation might appear to have no net change in RPF, or even a slight increase if stalling is dominant. Thus, while the presence of stalling sites strongly suggests a contribution from compromised elongation to reduced protein output, we cannot definitively rule out a concurrent role for reduced initiation, even in cases where RPF counts are not globally decreased. We revised this section in the manuscript to acknowledge this interplay.
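
      The cancellation described here can be sketched with a deliberately simple steady-state model. This is an assumption made for illustration, not the authors' model: ribosomes initiate at a constant rate, do not collide or drop off, and spend a fixed dwell time on each codon. The function name, rates, and dwell times are hypothetical and in arbitrary units.

```python
def mean_ribosomes_on_transcript(initiation_rate, dwell_times):
    """Steady-state mean ribosome number on one transcript.

    With no collisions and no drop-off, the mean occupancy equals the
    initiation rate multiplied by the total transit time (sum of dwells).
    """
    if initiation_rate < 0 or any(d < 0 for d in dwell_times):
        raise ValueError("rates and dwell times must be non-negative")
    return initiation_rate * sum(dwell_times)


# Control: 100 codons, 1 time unit each, initiation rate 0.02 per unit.
control_dwell = [1] * 100
control = mean_ribosomes_on_transcript(0.02, control_dwell)

# Starved (hypothetical): initiation halved, one codon dwell prolonged
# so that the total transit time doubles (99 * 1 + 101 = 200).
starved_dwell = [1] * 99 + [101]
starved = mean_ribosomes_on_transcript(0.01, starved_dwell)

print(control, starved)
```

      Both conditions give the same mean occupancy, so the footprint count for the gene would not change even though initiation is halved and one codon is strongly stalled. In this collision-free regime protein output follows the initiation rate, which is one reason why an unchanged RPF count alone cannot attribute reduced output to stalling or to initiation.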

      Figure 5E: the black text on dark brown in the center of the Venn diagram is difficult to read. The diagram should either have a different color scheme, or the text in the center should be white instead of black for higher contrast.

      Response: We have adjusted the text color for better contrast and improved readability.

      Supplementary Figure 1C: The ribosome dwell time data in this study is described as "highly correlated" with another published dwell time dataset, but the P and E site data do not seem strongly correlated. The authors should remove the word "highly."

      Response: We have removed the word “highly” so that the text describes the correlation more cautiously.

      Supplementary Figure 3E: Not all of the highlighted codons in this figure are ones with prolonged dwell times. To clarify the point that dwell time change is not related to codon frequency, this figure should only highlight codons that have a significantly prolonged dwell time in at least one starvation condition.

      Response: We thank the reviewer for pointing this out. To improve clarity, we have revised the figure and now specifically highlight codons with significantly prolonged dwell times with stars.

      Supplementary Figure 5C: The gene Chop is mentioned in the main text when referencing this figure, but is absent from the heatmap.

      Response: We thank the reviewer for noting this. The gene Chop is annotated under its alternative name Ddit3 in the current version of the heatmap and is indeed present. To avoid confusion, we have now updated the label in the figure to display Chop (Ddit3) directly.

      Supplementary Figure 7A: The authors could clarify this figure by adding additional language to either the figure panel or the figure legend specifying that the RPM metric being used comes from Ribo-seq.

      Response: We have updated the legend to explicitly state that the RPM values shown are derived from Ribo-seq data.

      Supplementary Figure 7D: The metric used to describe the spatial relationship between the first valine and isoleucine codons in transcripts in this figure seems to be describing something conceptually similar to the stalling sites in Figure 5G, but uses a different metric. These figures would be easier to interpret if these spatial relationships were presented in a consistent way throughout the manuscript.

      Response: We thank the reviewer for this helpful observation. Supplementary Figure 7D (now Supplementary Figure 11B) originally used a gene-length-normalized metric to describe codon spacing, whereas Figure 5G depicted absolute nucleotide distances to stalling sites. To ensure consistency across the manuscript, we have now updated Supplementary Figure 11B to also use absolute distances. We believe this adjustment improves clarity and allows for a more direct comparison between spatial codon patterns and stalling events.
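
      The difference between the two metrics mentioned in this response can be sketched as follows. The code is a hypothetical illustration, not the authors' pipeline: it measures the offset from the first in-frame Val codon to the first in-frame Ile codon, once in nucleotides and once divided by CDS length. The function names and toy sequences are invented, and codons use the DNA alphabet.

```python
VAL_CODONS = {"GTT", "GTC", "GTA", "GTG"}
ILE_CODONS = {"ATT", "ATC", "ATA"}


def first_codon_position(cds, codon_set):
    """0-based nucleotide offset of the first in-frame codon found in
    `codon_set`, or None if the CDS contains none."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in codon_set:
            return i
    return None


def val_to_ile_spacing(cds):
    """Return (absolute, normalized) offset from first Val to first Ile.

    absolute is in nucleotides; normalized divides by the CDS length.
    Returns None if either codon family is absent.
    """
    val = first_codon_position(cds, VAL_CODONS)
    ile = first_codon_position(cds, ILE_CODONS)
    if val is None or ile is None:
        return None
    absolute = ile - val
    return absolute, absolute / len(cds)


# Two toy CDSs with the same 5' region but different total lengths.
short_cds = "ATG" + "GTG" + "GCC" * 2 + "ATC" + "TAA"
long_cds = "ATG" + "GTG" + "GCC" * 2 + "ATC" + "GCC" * 12 + "TAA"

print(val_to_ile_spacing(short_cds), val_to_ile_spacing(long_cds))
```

      The two toy transcripts have the same absolute spacing of 9 nt, but the normalized value shrinks for the longer CDS. A length-normalized metric therefore cannot be compared directly with absolute distances to stalling sites, which is consistent with the reviewers' request for a single convention.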

      Discussion:

      Reader understanding would be improved if the relevance of paragraphs were established in the first sentence. For instance, in the paragraphs about adaptive misacylation and posttranscriptional modifications, it is unclear until the end of the paragraph how these topics are relevant. Introducing the relevant aspects of the study (the fact that some starvation conditions have less severe effects and the observation about m6A-related mRNAs) at the beginning of these paragraphs would improve clarity.

      Response: We thank the reviewer for this helpful comment. We agree that the flow and clarity of the Discussion can be improved by making the relevance of each paragraph clearer from the outset. In the revised manuscript, we have restructured these sections to better highlight the connection between each topic and our main findings. These changes also align with suggestions from Reviewer 2, and we believe they help to focus the Discussion more tightly around the core insights of our study.

      The authors should provide more information and speculation about possible physiological relevance of their findings, particularly about the way that the effects of triple starvation are highly valine-dependent. Are there physiological conditions under which starvation of all three BCAAs is more likely than starvation of one or two of them? If so, are there any reasons why a valine-based bottleneck might be advantageous?

      Response: We appreciate the reviewer's insightful question regarding the physiological relevance of our findings, particularly the valine-dependent bottleneck observed under triple BCAA starvation. This prompts a crucial discussion on the broader biological context of our work.

      While complete starvation of all three BCAAs might be less frequent than individual deficiencies, such conditions are physiologically relevant in several contexts. In prolonged fasting, starvation, or severe cachectic states associated with chronic diseases (e.g., advanced cancer, critical illness), systemic amino acid pools, including BCAAs, can become significantly depleted due to increased catabolism and insufficient intake (Yu et al. 2021). Moreover, certain specialized diets or therapeutic strategies aim to modulate BCAA levels. For instance, in some Maple Syrup Urine Disease (MSUD) management protocols, BCAA intake is severely restricted to prevent the accumulation of toxic BCAA metabolites (Mann et al. 2021). Similarly, emerging cancer therapies sometimes explore nutrient deprivation strategies to selectively target tumor cells, which could involve broad BCAA reduction (e.g. Sheen et al. 2011; Xiao et al. 2016).

      In these contexts, a valine-based bottleneck, as we describe, could indeed represent an adaptive strategy. If valine-tRNAs are particularly susceptible to deacylation and valine codons are strategically enriched at the 5' end of transcripts, stalling at these early positions could serve as a rapid "gatekeeper" for global translation. This early-stage inhibition would conserve cellular energy and available amino acids by quickly reducing the overall demand for charged tRNAs. Such a mechanism could potentially prioritize the translation of a subset of proteins that might have different codon usage biases or are translated via alternative, less valine-dependent mechanisms. This aligns with the concept of a multi-layered translational control where global initiation repression (as reflected in mTORC1 inhibition and polysome profiles) is complemented by specific elongation checkpoints, allowing for a more nuanced and adaptive response to severe nutrient stress.

      Reviewer #3 (Significance):

      Nature and significance of the advance

      The main contribution of this work is to demonstrate that depletion of multiple amino acids simultaneously impacts translation elongation in ways that are not necessarily additive. These impacts can depend on the distribution of codons in a transcript. It adds to a growing body of work showing that essential amino acid starvation can cause codon-specific ribosome stalling. The authors suggest that the position-dependent stalling they observe could be a novel regulatory mechanism to alleviate the effects of multi-amino acid starvation. However, it is not fully clear from the paper what the significance of a valine-based regulatory adaptation to BCAA starvation is, or whether simultaneous starvation of all three BCAAs is of particular physiological relevance. The paper's primary contribution is mainly focused on the similarity between valine and triple BCAA starvation, and it provides limited insight into the effects of combined depletion of two BCAAs.

      Context of existing literature

      Although ribosome profiling does not distinguish between actively-elongating and stalled ribosomes, sites with higher read coverage, and thereby higher inferred dwell time, can be used to infer ribosome stalling (Ingolia 2011). Various downstream effects of essential amino acid depletion have been documented, such as leucine deficiency being sensed by mTORC1 via leucyl-tRNA synthetase (Dittmar 2005, Han 2012), and shared transcriptional responses among many amino acid depletion conditions (Tang 2015). These authors have previously measured the translational effects of nutrient stress using ribosome profiling (e.g., Gobet 2020), as have others (Darnell 2018, Kochavi et al. 2024). The present work represents the first study (to our knowledge) combining BCAA depletions, representing an incremental and useful contribution to our understanding of translational responses to stress conditions.

      Audience

      This work is of interest to investigators studying the response of human cells in stress conditions, such as in human disease, as well as investigators studying the basic biology of eukaryotic translational control.

      Reviewer expertise: mRNA decay and translation regulation in bacteria.

      We hope the authors have found our comments thoughtful and useful. We welcome further discussion or clarification via email: Juliana Stanley (julianst@mit.edu) and Hannah LeBlanc (leblanch@mit.edu).

      We sincerely thank the reviewers for their thoughtful and constructive feedback, as well as for their careful and thorough reading of our manuscript. We also gratefully acknowledge the invitation for further discussion and would be happy to engage in future correspondence.

      References

      Allen, George E., Olesya O. Panasenko, Zoltan Villanyi, Marina Zagatti, Benjamin Weiss, Lucile Pagliazzo, Susanne Huch, et al. 2021. “Not4 and Not5 Modulate Translation Elongation by Rps7A Ubiquitination, Rli1 Moonlighting, and Condensates That Exclude eIF5A.” Cell Reports 36 (9): 109633. https://doi.org/10.1016/j.celrep.2021.109633.

      Darnell, Alicia M., Arvind R. Subramaniam, and Erin K. O’Shea. 2018. “Translational Control through Differential Ribosome Pausing during Amino Acid Limitation in Mammalian Cells.” Molecular Cell 71 (2): 229-243.e11. https://doi.org/10.1016/j.molcel.2018.06.041.

      Dittmar, Kimberly A., Michael A. Sørensen, Johan Elf, Måns Ehrenberg, and Tao Pan. 2005. “Selective Charging of tRNA Isoacceptors Induced by Amino-Acid Starvation.” EMBO Reports 6 (2): 151–57. https://doi.org/10.1038/sj.embor.7400341.

      Elf, Johan, Daniel Nilsson, Tanel Tenson, and Mans Ehrenberg. 2003. “Selective Charging of tRNA Isoacceptors Explains Patterns of Codon Usage.” Science (New York, N.Y.) 300 (5626): 1718–22. https://doi.org/10.1126/science.1083811.

      Gobet, Cédric, Benjamin Dieter Weger, Julien Marquis, Eva Martin, Nagammal Neelagandan, Frédéric Gachon, and Felix Naef. 2020. “Robust Landscapes of Ribosome Dwell Times and Aminoacyl-tRNAs in Response to Nutrient Stress in Liver.” Proceedings of the National Academy of Sciences of the United States of America 117 (17): 9630–41. https://doi.org/10.1073/pnas.1918145117.

      Hussmann, Jeffrey A., Stephanie Patchett, Arlen Johnson, Sara Sawyer, and William H. Press. 2015. “Understanding Biases in Ribosome Profiling Experiments Reveals Signatures of Translation Dynamics in Yeast.” Edited by Michael Snyder. PLOS Genetics 11 (12): e1005732. https://doi.org/10.1371/journal.pgen.1005732.

      Hwang, Jae-Yeon, and Allen R. Buskirk. 2017. “A Ribosome Profiling Study of mRNA Cleavage by the Endonuclease RelE.” Nucleic Acids Research 45 (1): 327–36. https://doi.org/10.1093/nar/gkw944.

      Juszkiewicz, Szymon, and Ramanujan S. Hegde. 2017. “Initiation of Quality Control during Poly(A) Translation Requires Site-Specific Ribosome Ubiquitination.” Molecular Cell 65 (4): 743-750.e4. https://doi.org/10.1016/j.molcel.2016.11.039.

      Li, Fajin, Jianhuo Fang, Yifan Yu, Sijia Hao, Qin Zou, Qinglin Zeng, and Xuerui Yang. 2023. “Reanalysis of Ribosome Profiling Datasets Reveals a Function of Rocaglamide A in Perturbing the Dynamics of Translation Elongation via eIF4A.” Nature Communications 14 (1): 553. https://doi.org/10.1038/s41467-023-36290-w.

      Mann, Gagandeep, Stephen Mora, Glory Madu, and Olasunkanmi A. J. Adegoke. 2021. “Branched-Chain Amino Acids: Catabolism in Skeletal Muscle and Implications for Muscle and Whole-Body Metabolism.” Frontiers in Physiology 12 (July): 702826. https://doi.org/10.3389/fphys.2021.702826.

      Saikia, Mridusmita, Xiaoyun Wang, Yuanhui Mao, Ji Wan, Tao Pan, and Shu-Bing Qian. 2016. “Codon Optimality Controls Differential mRNA Translation during Amino Acid Starvation.” RNA (New York, N.Y.) 22 (11): 1719–27. https://doi.org/10.1261/rna.058180.116.

      Sharma, Puneet, Jie Wu, Benedikt S. Nilges, and Sebastian A. Leidel. 2021. “Humans and Other Commonly Used Model Organisms Are Resistant to Cycloheximide-Mediated Biases in Ribosome Profiling Experiments.” Nature Communications 12 (1): 5094. https://doi.org/10.1038/s41467-021-25411-y.

      Sheen, Joon-Ho, Roberto Zoncu, Dohoon Kim, and David M. Sabatini. 2011. “Defective Regulation of Autophagy upon Leucine Deprivation Reveals a Targetable Liability of Human Melanoma Cells In Vitro and In Vivo.” Cancer Cell 19 (5): 613–28. https://doi.org/10.1016/j.ccr.2011.03.012.

      Xiao, Fei, Chunxia Wang, Hongkun Yin, Junjie Yu, Shanghai Chen, Jing Fang, and Feifan Guo. 2016. “Leucine Deprivation Inhibits Proliferation and Induces Apoptosis of Human Breast Cancer Cells via Fatty Acid Synthase.” Oncotarget 7 (39): 63679–89. https://doi.org/10.18632/oncotarget.11626.

      Yu, Deyang, Nicole E. Richardson, Cara L. Green, Alexandra B. Spicer, Michaela E. Murphy, Victoria Flores, Cholsoon Jang, et al. 2021. “The Adverse Metabolic Effects of Branched-Chain Amino Acids Are Mediated by Isoleucine and Valine.” Cell Metabolism 33 (5): 905-922.e6. https://doi.org/10.1016/j.cmet.2021.03.025.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      Summary

      Worpenberg and colleagues investigated the translational consequences of branched-chain amino acid (BCAA) starvation in mouse cells. Limitation of individual BCAAs has been reported to cause codon-specific and global translational repression. In this paper, the authors use RNA-seq, ribosome profiling (Ribo-seq), proteomics, and tRNA charging assays to characterize the impacts of individual and combined depletion of leucine, isoleucine, and valine on translation. They find that BCAA starvation increases codon-specific ribosome dwell times, activates global translational stress responses and reduces global protein synthesis. They infer that this effect is due to decreased translation initiation and codon-specific translational stalling. They find that the effects of simultaneous depletion are non-additive. In valine and triple (valine, leucine, and isoleucine) depletion, they show that affected transcripts have a high density of valine codons early in their coding sequences, creating an "elongation bottleneck" that obscures the impact of starvation of other amino acids. Finally, they identify isoacceptor-specific differences in tRNA charging that help explain the codon-specific effects that they observe.

      We find the major findings convincing and clear. We find that some results are incompletely explained. We suggest an additional experiment and also have some minor comments that we hope will improve clarity and rigor.

      Major comments

      Figure 3O: In this figure and the associated text, the authors try to determine whether differences in protein degradation can explain why some proteins have higher ribosome density but lower proteomic expression. However, since this analysis relies on published protein half-lives from non-starvation conditions and on the assumption that protein synthesis has entirely stopped, we are not convinced it is informative for this experimental context. It does not distinguish between a model in which protein synthesis has been reduced by stalling and a model in which both protein synthesis and degradation rate have increased, which are both consistent with their Ribo-seq and proteomic data. To address this issue, the authors should either perform protein half-life measurements under their starvation conditions, or more clearly explain these two models in the text and acknowledge that they cannot distinguish between them.

      Minor comments

      Figure 1G: Why does intracellular valine seem to be less depleted under starvation conditions than intracellular leucine or isoleucine? Are the limits of detections different for different amino acids? The authors should acknowledge this discrepancy and comment on whether it has any implications for interpretation of their results.

      Figure 1H: These data do not appear to meet the assumptions for linear regression. We suggest either reporting a Spearman R correlation (as the data appear linear in rank but not absolute value) or removing it entirely; we think the plot without statistics is sufficient.

      Figure 2B: The in-text description of this figure states that "most" ISR genes show a "robust induction," but only three genes are shown in the figure, two of which are upregulated. The authors should instead specify that 2 out of the 3 genes profiled were robustly induced.

      Figure 2D: Please include the full, uncropped blots in the supplementary materials.

      Figure 2E: Swap the positions of the RPS6 and 4E-BP1 plots so they line up with their respective blots to make these figures easier to interpret. Authors should consider doing a one-way ANOVA and post-hoc analysis, if we correctly understand that they are making a conclusion about the difference between multiple groups in aggregate.

      Figure 4B: Panel A in this figure is very convincing, and these plots don't add additional information. The authors could consider removing them. If this panel stays in, we suggest removing the "mid index" plot, since it is never referenced in the text and doesn't seem relevant to the message of the figure.

      Figure 4E: Why is there a reduction in frequency of a Leu and a Val codon under Ile starvation?

      Figure 5G: There appears to be one Val codon early in the Hint1 transcript without much stalling under triple or valine starvation conditions. The authors should acknowledge this and comment on why this may be.

      Figure 5B: In the text referencing this figure, the authors state that "a high number of downregulated proteins with associated ribosome stalling sites did not show an overall decreased mean RPF count...as it would be expected from translation initiation defects, linking these stalling sites directly to proteomic changes." However, RPF is affected both by stalling (increases RPF) and initiation defects (decreases RPF). A gene with both stalling and decreased initiation may appear to have no RPF change. The data does suggest a contribution from stalling, but the authors should also acknowledge that reduced initiation may also be playing a role.

      Figure 5E: the black text on dark brown in the center of the Venn diagram is difficult to read. The diagram should either have a different color scheme, or the text in the center should be white instead of black for higher contrast.

      Supplementary Figure 1C: The ribosome dwell time data in this study is described as "highly correlated" with another published dwell time dataset, but the P and E site data do not seem strongly correlated. The authors should remove the word "highly."

      Supplementary Figure 3E: Not all of the highlighted codons in this figure are ones with prolonged dwell times. To clarify the point that dwell time change is not related to codon frequency, this figure should only highlight codons that have a significantly prolonged dwell time in at least one starvation condition.

      Supplementary Figure 5C: The gene Chop is mentioned in the main text when referencing this figure, but is absent from the heatmap.

      Supplementary Figure 7A: The authors could clarify this figure by adding additional language to either the figure panel or the figure legend specifying that the RPM metric being used comes from Ribo-seq.

      Supplementary Figure 7D: The metric used to describe the spatial relationship between the first valine and isoleucine codons in transcripts in this figure seems to be describing something conceptually similar to the stalling sites in Figure 5G, but uses a different metric. These figures would be easier to interpret if these spatial relationships were presented in a consistent way throughout the manuscript.

      Discussion:

      Reader understanding would be improved if the relevance of paragraphs were established in the first sentence. For instance, in the paragraphs about adaptive misacylation and posttranscriptional modifications, it is unclear until the end of the paragraph how these topics are relevant. Introducing the relevant aspects of the study (the fact that some starvation conditions have less severe effects and the observation about m6A-related mRNAs) at the beginning of these paragraphs would improve clarity.

      The authors should provide more information and speculation about possible physiological relevance of their findings, particularly about the way that the effects of triple starvation are highly valine-dependent. Are there physiological conditions under which starvation of all three BCAAs is more likely than starvation of one or two of them? If so, are there any reasons why a valine-based bottleneck might be advantageous?

      We hope the authors have found our comments thoughtful and useful. We welcome further discussion or clarification via email: Juliana Stanley (julianst@mit.edu) and Hannah LeBlanc (leblanch@mit.edu).

      Significance

      Nature and significance of the advance

      The main contribution of this work is to demonstrate that depletion of multiple amino acids simultaneously impacts translation elongation in ways that are not necessarily additive. These impacts can depend on the distribution of codons in a transcript. It adds to a growing body of work showing that essential amino acid starvation can cause codon-specific ribosome stalling. The authors suggest that the position-dependent stalling they observe could be a novel regulatory mechanism to alleviate the effects of multi-amino acid starvation. However, it is not fully clear from the paper what the significance of a valine-based regulatory adaptation to BCAA starvation is, or whether simultaneous starvation of all three BCAAs is of particular physiological relevance. The paper's primary contribution is mainly focused on the similarity between valine and triple BCAA starvation, and it provides limited insight into the effects of combined depletion of two BCAAs.

      Context of existing literature

      Although ribosome profiling does not distinguish between actively-elongating and stalled ribosomes, sites with higher read coverage, and thereby higher inferred dwell time, can be used to infer ribosome stalling (Ingolia 2011). Various downstream effects of essential amino acid depletion have been documented, such as leucine deficiency being sensed by mTORC1 via leucyl-tRNA synthetase (Dittmar 2005, Han 2012), and shared transcriptional responses among many amino acid depletion conditions (Tang 2015). These authors have previously measured the translational effects of nutrient stress using ribosome profiling (e.g., Gobet 2020), as have others (Darnell 2018, Kochavi et al. 2024). The present work represents the first study (to our knowledge) combining BCAA depletions, representing an incremental and useful contribution to our understanding of translational responses to stress conditions.

      Audience

      This work is of interest to investigators studying the response of human cells in stress conditions, such as in human disease, as well as investigators studying the basic biology of eukaryotic translational control.

      Reviewer expertise: mRNA decay and translation regulation in bacteria.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      This manuscript describes the translational responses to single and combined BCAA shortages in mouse cell lines. Using Ribo-seq and RNA-seq analysis, the authors found selective ribosome pausing at codons that encode the depleted amino acids, where the pausing at valine codons was prominent under both single and triple starvation, whereas isoleucine codons showed pausing only under single depletion. They analyzed the mechanisms of the unexpected selective pausing and proposed that positional codon usage bias could shape the ribosome stalling and tRNA charging patterns across different amino acids. They also examined the stress responses and the changes in protein expression levels under BCAA starvation.

      The manuscript was well-written, and the findings are interesting, especially their model that positional codon usage bias could be a regulator of ribosome pausing and tRNA charging levels. Although different translational responses to distinct amino acid starvation have been widely documented, the positional codon usage bias is an interesting aspect. The manuscript's central message could have been made clearer. The authors may consider emphasizing this point more explicitly in the abstract. The rich multi-omics dataset in this work provides valuable resources for the translation field.

      Major comments

      1. The abstract may need to be revised since it is hard to immediately catch the authors' main point. If the authors regard this work as a resource paper, the current version is fine. But it could be better to point out the positional codon usages the authors found, which is a strong point of the current manuscript.
      2. Page 18 "Beyond these tRNA dynamics, our data also highlight the importance of the codon positional context within mRNAs, indicating that where a codon is located within the CDS can influence both the extent of ribosomal stalling and overall translation efficiency during nutrient stress."
         This idea is interesting. To what extent do the authors think this could be generalized? The authors may discuss whether they think their proposed model is specific to the different ribosome stalling patterns between valine and isoleucine codons or can be generalized to other codon combinations. For example, the positional codon usage bias will differ among organisms; are there any previous reports on ribosome behaviors that align with their model? Even if the authors think this model can be applied to BCAA starvation, would it be possible to explain the different isoleucine codon responses between single and double starvation? The authors may discuss why the ribosome stalling at isoleucine AUU and AUC codons was slightly attenuated under double starvation. And how about the different leucine codon responses among single, double, and triple starvations, although the pausing is not as strong as at isoleucine and valine codons? Experimental validation using artificial reporters carrying biased sequences may also be considered.
      3. Page 13 "Moreover, we noticed that DT changes extend beyond the ribosomal A-site, including the P-site, E-site, and even further positions (Supplementary Fig. 2A), consistent with other studies on single amino acid starvation 39 (Supplementary Fig. 2B-C)." Could the widespread DT changes be due to the Ribo-DT pipeline they used or to difficulties in offset determination? Indeed, the authors showed that this feature was found in other datasets, but it seems that those datasets were processed and analyzed in the same way as their data. The original Ribo-DT paper (Gobet and Naef, 2022, Methods) also showed some widespread DT changes even from RNA-seq. Another analysis method, the codon subsequence abundance shift that is part of diricore analysis (Loayza-Puch et al., 2016, Nature), did not show such broadly changed regions. The authors are encouraged to re-analyze the datasets using different methods.
      4. Page 13 "Intriguingly, only two of the three isoleucine codons (AUU and AUC) showed increased DTs upon Ile starvation (p < 0.01), while just one leucine codon (CUU) exhibited a modest but significant DT increase (p < 0.01) under Leu starvation (Figure 1A-B, Supplementary Figure 2A)." How can the authors explain the different strengths of ribosome pausing at Ile codons under Ile and double starvation? The AUA codon did not show any pausing under either of the starvation conditions. Throughout the manuscript, the authors mainly describe the difference between amino acids but it is desirable to discuss the codon-level difference as well.
      5. Page 13 "We examined the effects of single amino acid starvations (-Leu, -Ile and -Val), as well as combinations, including a double starvation of leucine and isoleucine (hereafter referred to as "double") and a starvation of leucine, isoleucine, and valine ("triple"), allowing us to identify potential non-additive effects." The other double starvations (isoleucine and valine, and leucine and valine) would further support their hypothesis on the effects of the positional codon usage bias on ribosome pausing and tRNA charging patterns. Although this could be beyond the scope of the current manuscript, the authors are encouraged to provide a rationale for the chosen combination.

      Minor comments

      Page 16 "these results imply that BCAA deprivation lowers protein output through multiple pathways: a combination of reduced initiation, direct elongation blocks (stalling), and possibly an increased proteolysis" This conclusion is totally right but may be too general. Could the authors summarize BCAA-specific features of the events including reduced initiation, stalling, and proteolysis that all contribute to protein outputs? This is not well discussed in the latter sections including Discussion.

      Significance

      The manuscript was well-written, and the findings are interesting, especially their model that positional codon usage bias could be a regulator of ribosome pausing and tRNA charging levels. Although different translational responses to distinct amino acid starvation have been widely documented, the positional codon usage bias is an interesting aspect. The manuscript's central message could have been made clearer. The authors may consider emphasizing this point more explicitly in the abstract. The rich multi-omics dataset in this work provides valuable resources for the translation field.

    1. Problem-posing education is revolutionary futurity. Hence it is prophetic (and, as such, hopeful). Hence, it corresponds to the historical nature of humankind. Hence, it affirms women and men as beings who transcend themselves, who move forward and look ahead, for whom immobility represents a fatal threat, for whom looking at the past must only be a means of understanding more clearly what and who they are so that they can more wisely build the future. Hence, it identifies with the movement which engages people as beings aware of their incompletion—an historical movement which has its point of departure, its Subjects and its objective.

      This reminds me of a doctrine from one of my mentors who allowed me to see that I must be perturbed over the thought of surpassing myself. In this sense it is a collaborative effort. If I may tie it to a metaphor, problem-posing education makes me think of a giant pump trolley where neither teacher nor student can properly advance without the other's contribution. We must also decide in what direction we'll travel.

    1. The 9 LGBTQ+ children’s books targeted in high court ruling upending education policy

       A selection of books featuring LGBTQ+ characters that are part of a Supreme Court case are pictured April 15 in Washington. (Pablo Martinez Monsivais / Associated Press)

       By Jenny Gold, Staff Writer. June 27, 2025

       Picture books are not usually the stuff of Supreme Court rulings. But on Friday, a majority of justices ruled that parents have a right to opt their children out of lessons that offend their religious beliefs — bringing the colorful pages of books like “Uncle Bobby’s Wedding” and “Pride Puppy” into the staid public record of the nation’s highest court.

       The ruling resulted from a lawsuit brought by parents in Montgomery County, Md., who sued for the right to remove their children from lessons where LGBTQ+ storybooks would be read aloud in elementary school classes from kindergarten through 5th grade. The books were part of an effort in the district to represent LGBTQ+ families in the English language arts curriculum.

       In a 6-3 decision, the Supreme Court ruled that schools must “notify them in advance” when one of the disputed storybooks would be used in their child’s class, so that they could have their children temporarily removed. The court’s three liberals dissented.

       As part of the decisions, briefings and petitions in the case, the justices and lawyers for the parents described in detail the story lines of nine picture books that were part of Montgomery County’s new curriculum. In her dissent, Justice Sonia Sotomayor even reproduced one, “Uncle Bobby’s Wedding,” in its entirety.

       Here are the nine books that were the subject of the case:

       Pride Puppy
       Author: Robin Stevenson. Illustrator: Julie McLaughlin. Publisher: Orca Book Publishers.

       “Pride Puppy,” a rhyming alphabet book for very young children, depicts a little girl who loses her dog during a joyful visit to a Pride parade. The story, which is available as a board book, invites readers to spot items starting with each of the letters of the alphabet, including apple, baseball and clouds — as well as items more specific to a Pride parade.

       Lawyers representing the parents said in their brief that the book “invites students barely old enough to tie their own shoes to search for images of ‘underwear,’ ‘leather,’ ‘lip ring,’ ‘[drag] king’ and ‘[drag] queen,’ and ‘Marsha P. Johnson,’ a controversial LGBTQ activist and sex worker.”

       The “leather” in question refers to a mother’s jacket, and the “underwear” to a pair of green briefs worn over tights by an older child as part of a colorful outfit.

       The Montgomery County Public Schools stopped teaching “Pride Puppy” in the midst of the legal battle.

       Love, Violet
       Author: Charlotte Sullivan Wild. Illustrator: Charlene Chua. Publisher: Macmillan Publishers.

       The story describes a little girl named Violet with a crush on another girl in her class named Mira, who “had a leaping laugh” and “made Violet’s heart skip.” But every time Mira tries to talk to her, Violet gets shy and quiet.

       On Valentine’s Day, Violet makes Mira a special valentine. As Violet gathers the courage to give it to her, the valentine ends up trampled in the snow. But Mira loves it anyway and also has a special gift for Violet — a locket with a violet inside. At the end of the book, the two girls go on an adventure together.

       Lawyers for the parents describe “Love, Violet” as a book about “two young girls and their same-sex playground romance.” They wrote that “teachers are encouraged to have a ‘think aloud’ moment to ask students how it feels when they don’t just ‘like’ but ‘like like’ someone.”

       Born Ready: The True Story of a Boy Named Penelope
       Author: Jodie Patterson. Illustrator: Charnelle Pinkney Barlow. Publisher: Random House.

       In “Born Ready,” 5-year-old Penelope was born a girl but is certain they are a boy. “I love you, Mama, but I don’t want to be you. I want to be Papa. I don’t want tomorrow to come because tomorrow I’ll look like you. Please help me, Mama. Help me be a boy,” Penelope tells their mom. “We will make a plan to tell everyone we know,” Penelope’s mom tells them, and they throw a big party to celebrate.

       In her dissent, Sotomayor notes, “When Penelope’s brother expresses skepticism, his mother says, ‘Not everything needs to make sense. This is about love.’ ” In their opening brief, lawyers for the families said that “teachers are told to instruct students that, at birth, people ‘guess about our gender,’ but ‘we know ourselves best.’ ”

       Prince and Knight
       Author: Daniel Haack. Illustrator: Stevie Lewis.

       “Prince and Knight” is a story about a prince whose parents want him to find a bride, but instead he falls in love with a knight. Together, they fight off a dragon. When the prince falls from a great height, his knight rescues him on horseback. When the king and queen find out about their love, they “were overwhelmed with joy. ‘We have finally found someone who is perfect for our boy!’ ” A great wedding is held, and “the prince and his shining knight would live happily ever after.”

       “The book Prince & Knight clearly conveys the message that same-sex marriage should be accepted by all as a cause for celebration,” said Justice Samuel Alito, who wrote the majority opinion, a concerning message for Americans whose religion tells them that same-sex marriage is wrong.

      This is just about acceptance, not really about conforming to certain views.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Manuscript number: RC-2025-02922

      Corresponding author(s): Christian Specht

      1. General Statements [optional]

      We thank the reviewers for their thorough and constructive evaluation of our work. We have revised the manuscript carefully and addressed all the criticisms raised, in particular the issues mentioned by several of the reviewers (see point-by-point response below). We have also added a number of explanations in the text for the sake of clarity, while trying to keep the manuscript as concise as possible.

      In our view, the novelty of our research is two-fold. From a neurobiological point of view, we provide conclusive evidence for the existence of glycine receptors (GlyRs) at inhibitory synapses in various brain regions including the hippocampus, dentate gyrus and sub-regions of the striatum. This solves several open questions and has fundamental implications for our understanding of the organisation and function of inhibitory synapses in the telencephalon. Secondly, our study makes use of the unique sensitivity of single molecule localisation microscopy (SMLM) to identify low protein copy numbers. This is a new way to think about SMLM as it goes beyond a mere structural characterisation and towards a quantitative assessment of synaptic protein assemblies.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, the authors investigate the nanoscopic distribution of glycine receptor subunits in the hippocampus, dorsal striatum, and ventral striatum of the mouse brain using single-molecule localization microscopy (SMLM). They demonstrate that only a small number of glycine receptors are localized at hippocampal inhibitory synapses. Using dual-color SMLM, they further show that clusters of glycine receptors are predominantly localized within gephyrin-positive synapses. A comparison between the dorsal and ventral striatum reveals that the ventral striatum contains approximately eight times more glycine receptors and this finding is consistent with electrophysiological data on postsynaptic inhibitory currents. Finally, using cultured hippocampal neurons, they examine the differential synaptic localization of glycine receptor subunits (α1, α2, and β). This study is significant as it provides insights into the nanoscopic localization patterns of glycine receptors in brain regions where this protein is expressed at low levels. Additionally, the study demonstrates the different localization patterns of GlyR in distinct striatal regions and its physiological relevance using SMLM and electrophysiological experiments. However, several concerns should be addressed.

      The following are specific comments:

      1. Colocalization analysis in Figure 1A. The colocalization between Sylite and mEos-GlyRβ appears to be quite low. It is essential to verify that the observed colocalization is not due to random overlap. The authors should consider quantifying colocalization using statistical methods, such as a pixel shift analysis, to determine whether colocalization frequencies remain similar after artificially displacing one of the channels.

      Following the suggestion of reviewer 1, we re-analysed CA3 images of Glrbeos/eos hippocampal slices by applying a pixel-shift type of control, in which the Sylite channel (in far red) was horizontally flipped relative to the mEos4b-GlyRβ channel (in green, see Methods). As expected, the number of mEos4b-GlyRβ detections per gephyrin cluster was markedly reduced compared to the original analysis (revised Fig. 1B), confirming that the synaptic mEos4b detections exceed chance levels (see page 5).
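
      The logic of such a flip control can be sketched on synthetic data. The example below is only an illustration and not the authors' analysis pipeline: the function names, image size, cluster positions and detection counts are all hypothetical. Detections are counted on a binary cluster mask, and the count is repeated after mirroring the mask left-right, which preserves cluster number and size but breaks any true spatial correspondence between the two channels.

```python
import numpy as np

def detections_per_cluster(det_xy, cluster_mask, n_clusters):
    """Mean number of detections that fall on cluster pixels."""
    cols = det_xy[:, 0].astype(int)  # x coordinate -> column index
    rows = det_xy[:, 1].astype(int)  # y coordinate -> row index
    return cluster_mask[rows, cols].sum() / n_clusters

def flipped_control(det_xy, cluster_mask, n_clusters):
    """Same count after mirroring the cluster channel left-right."""
    return detections_per_cluster(det_xy, cluster_mask[:, ::-1], n_clusters)

# Hypothetical 100 x 100 pixel image with three 5 x 5 pixel clusters,
# all placed in the left half so the mirrored mask cannot overlap them.
size = 100
centres = [(20, 20), (30, 60), (40, 85)]  # (x, y)
mask = np.zeros((size, size), dtype=bool)
for x, y in centres:
    mask[y - 2:y + 3, x - 2:x + 3] = True

rng = np.random.default_rng(0)
# Ten detections at each cluster centre plus 20 uniform background detections.
clustered = np.repeat(np.array(centres, dtype=float), 10, axis=0)
background = rng.uniform(0, size, size=(20, 2))
detections = np.vstack([clustered, background])

observed = detections_per_cluster(detections, mask, len(centres))
control = flipped_control(detections, mask, len(centres))
print(f"observed: {observed:.2f}, flipped control: {control:.2f}")
```

      In this toy case the flipped count reflects background detections only, so a clearly lower control value indicates that the observed colocalization exceeds chance.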

      Inconsistency between Figure 3A and 3B. While Figure 3B indicates an ~8-fold difference in the number of mEos4b-GlyRβ detections per synapse between the dorsal and ventral striatum, Figure 3A does not appear to show a pronounced difference in the localization of mEos4b-GlyRβ on Sylite puncta between these two regions. If the images presented in Figure 3A are not representative, the authors should consider replacing them with more representative examples or providing expanded images with multiple representative examples. Alternatively, if this inconsistency can be explained by differences in spot density within clusters, the authors should explain that.

      The pointillist images in Fig. 3A are essentially binary (red-black). Therefore, the density of detections at synapses cannot be easily judged by eye. For clarity, the original images in Fig. 3A have been replaced with two other examples that better reflect the different detection numbers in the dorsal and ventral striatum.

      Quantification in Figure 5. It is recommended that the authors provide quantitative data on cluster formation and colocalization with Sylite puncta in Figure 5 to support their qualitative observations.

      This is an important point that was also raised by the other reviewers. We have performed additional experiments to increase the data volume for analysis. For quantification, we used two approaches. First, we counted the percentage of infected cells in which synaptic localisation of the recombinant receptor subunit was observed (Fig. 5C). We found that mEos4b-GlyRα1 consistently localises at synapses, indicating that all cells express endogenous GlyRβ. When neurons were infected with mEos4b-GlyRβ, fewer cells had synaptic clusters, meaning that indeed, GlyR alpha subunits are the limiting factor for synaptic targeting. In cultures infected with mEos4b-GlyRα2, only very few neurons displayed synaptic localisation (as judged by epifluorescence imaging). We think this shows that GlyRα2 is less capable of forming heteromeric complexes than GlyRα1, in line with our previous interpretation (see pp. 9-10, 13).

      Secondly, we quantified the total intensity of each subunit at gephyrin-positive domains, both in infected neurons as well as non-infected control cultures (Fig. 5D). We observed that mEos4b-GlyRα1 intensity at gephyrin puncta was higher than that of the other subunits, again pointing to efficient synaptic targeting of GlyRα1. Gephyrin cluster intensities (Sylite labelling) were not significantly different in GlyRβ and GlyRα2 expressing neurons compared to the uninfected control, indicating that the lentiviral expression of recombinant subunits does not fundamentally alter the size of mixed inhibitory synapses in hippocampal neurons. Interestingly, gephyrin levels were slightly higher in hippocampal neurons expressing mEos4b-GlyRα1. In our view, this comes from an enhanced expression and synaptic targeting of mEos4b-GlyRα1 heteromers with endogenous GlyRβ, pointing to a structural role of GlyRα1/β in hippocampal synapses (pp. 10, 13).

      The new data and analyses have been described and illustrated in the relevant sections of the manuscript.

      Potential for pseudoreplication. It's not clear whether they're performing stats tests across biological replicates, images, or even synapses. They often quote mean +/- SEM with n = 1000s, so does that mean they're doing tests on those 1000s? Need to clarify.

      All experiments were repeated at least twice to ensure reproducibility (N independent experiments). Statistical tests were performed on pooled data across the biological replicates; n denotes the number of data points used for testing (e.g., number of synaptic clusters, detections, cells, as specified in each case). We have systematically given these numbers in the revised manuscript (n, N, and other experimental parameters such as the number of animals used, coverslips, images or cells). Data are generally given as mean +/- SEM or as mean +/- SD as indicated.
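
      Why the choice of unit matters can be seen in a toy calculation. The numbers below are invented for illustration and are not taken from the manuscript: the same measurements give a much smaller SEM when every data point is treated as independent (n) than when each independent experiment contributes one mean (N).

```python
import math
import statistics

def sem(values):
    """Standard error of the mean, using the sample standard deviation."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical per-synapse values from N = 3 independent experiments.
experiments = [
    [4.0, 5.0, 6.0, 5.0],
    [7.0, 8.0, 9.0, 8.0],
    [1.0, 2.0, 3.0, 2.0],
]

# Option 1: pool every data point (n = 12).
pooled = [value for experiment in experiments for value in experiment]
sem_pooled = sem(pooled)

# Option 2: one mean per experiment (N = 3).
experiment_means = [statistics.mean(experiment) for experiment in experiments]
sem_replicates = sem(experiment_means)

print(f"SEM over pooled points (n = {len(pooled)}): {sem_pooled:.3f}")
print(f"SEM over experiment means (N = {len(experiment_means)}): {sem_replicates:.3f}")
```

      Here the between-experiment spread dominates, so the pooled SEM understates the uncertainty relative to the per-experiment SEM. Reporting both n and N, as done in the revision, lets readers judge this for themselves.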

      Does mEoS affect expression levels or function of the protein? Can't see any experiments done to confirm this. Could suggest WB on homogenate, or mass spec?

      The Glrbeos/eos knock-in mouse line has been characterised previously and does not display any ultrastructural or functional deficits at inhibitory synapses (Maynard et al. 2021 eLife). GlyRβ expression and glycine-evoked responses were not significantly different to those of the wild-type. The synaptic localisation of mEos4b-GlyRβ in KI animals demonstrates correct assembly of heteromeric GlyRs and synaptic targeting. Accordingly, the animals do not display any obvious phenotype. We have clarified this in the manuscript (p. 4). In the case of cultured neurons, long-term expression of fluorescent receptor subunits with lentivirus has proven ideal to achieve efficient synaptic targeting. The low and continuous supply of recombinant receptors ensures assembly with endogenous subunits to form heteropentameric receptor complexes (e.g. Patrizio et al. 2017 Sci Rep). In the present study, lentivirus infection did not induce any obvious differences in the number or size of inhibitory synapses compared to control neurons, as judged by Sylite labelling of synaptic gephyrin puncta (new Fig. 5D).

      Quantification of protein numbers is challenging with SMLM. Issues include i) some of FP not correctly folded/mature, and ii) dependence of localisation rate on instrument, excitation/illumination intensities, and also the thresholds used in analysis. Can the authors compare with another protein that has known expression levels- e.g. PSD95? This is quite an ask, but if they could show copy number of something known to compare with, it would be useful.

      We agree that absolute quantification with SMLM is challenging, since the number of detections depends on fluorophore maturation, photophysics, imaging conditions, and analysis thresholds (discussed in Patrizio & Specht 2016, Neurophotonics). For this reason, only very few datasets provide reliable copy numbers, even for well-studied proteins such as PSD-95. One notable exception is the study by Maynard et al. (eLife 2021) that quantified endogenous GlyRβ-containing receptors in spinal cord synapses using SMLM combined with correlative electron microscopy. The strength of this work was the use of a KI mouse strain, which ensures that mEos4b-GlyRβ expression follows intrinsic regional and temporal profiles. The authors reported a stereotypic density of ~2,000 GlyRs/µm² at synapses, corresponding to ~120 receptors per synapse in the dorsal horn and ~240 in the ventral horn, taking into account various parameters including receptor stoichiometry and the functionality of the fluorophore. These values are very close to our own calculations of GlyR numbers at spinal cord synapses that were obtained slightly differently in terms of sample preparation, microscope setup, imaging conditions, and data analysis, lending support to our experimental approach. Nevertheless, the obtained GlyR copy numbers at hippocampal synapses clearly have to be taken as estimates rather than precise figures, because the number of detections from a single mEos4b fluorophore can vary substantially, meaning that the fluorophores are not represented equally in pointillist images. This can affect the copy number calculation for a specific synapse, in particular when the numbers are low (e.g. in hippocampus). However, it should not alter the average number of detections (Fig. 1B) or the (median) molecule numbers of the entire population of synapses (Fig. 1C). We have discussed the limitations of our approach (p. 11).
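
      As a quick consistency check on the spinal cord figures quoted above, the density and the per-synapse copy numbers together imply a mean synaptic area. The short calculation below is only arithmetic on the quoted approximate values, assuming a uniform receptor density across the synapse; the resulting areas are derived here for illustration and are not values reported in the cited study.

```python
# Approximate values quoted in the reply above.
density_per_um2 = 2000.0  # GlyRs per square micrometre at synapses
receptors_per_synapse = {"dorsal horn": 120.0, "ventral horn": 240.0}

# Implied mean synaptic area = receptors per synapse / receptor density.
implied_area_um2 = {
    region: count / density_per_um2
    for region, count in receptors_per_synapse.items()
}

for region, area in implied_area_um2.items():
    print(f"{region}: about {area:.2f} square micrometres per synapse")
```

      Under this assumption, the twofold difference in copy number between dorsal and ventral horn corresponds to a twofold difference in synaptic area rather than in receptor packing.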

      Rationale for doing nanobody dSTORM not clear at all. They don't explain the reason for doing the dSTORM experiments. Why not just rely on PALM for coincidence measurements, rather than tagging mEoS with a nanobody, and then doing dSTORM with that? Can they explain? Is it to get extra localisations- i.e. multiple per nanobody? If so, localising same FP multiple times wouldn't improve resolution. Also, no controls for nanobody dSTORM experiments- what about non-spec nb, or use on WT sections?

      *As discussed above (point 6), the detection of fluorophores with SMLM is influenced by many parameters, not least the noise produced by emitting molecules other than the fluorophore used for labelling. Our study is exceptional in that it attempts to identify extremely low molecule numbers (down to 1). To verify that the detections obtained with PALM correspond to mEos4b, we conducted robust control experiments (including pixel-shift as suggested by the reviewer, see point 1, revised Fig. 1B). The rationale for the nanobody-based dSTORM experiments was twofold: (1) to have an independent readout of the presence of low-copy GlyRs at inhibitory synapses and (2) to analyse the nanoscale organisation of GlyRs relative to the synaptic gephyrin scaffold using dual-colour dSTORM with spectral demixing (see p. 6). The organic fluorophores used in dSTORM (AF647, CF680) ensure high photon counts, essential for reliable co-localisation and distance analysis. PALM and dSTORM cannot be combined in dual-colour mode, as they require different buffers and imaging conditions.*

      The specificity of the anti-Eos nanobody was demonstrated by immunohistochemistry in spinal cord cultures expressing mEos4b-GlyRb and wildtype control tissue (Fig. S3). In response to the reviewer's remarks, we also performed a negative control experiment in Glrbeos/eos slices (dSTORM), in which the nanobody was omitted (new Fig. S4F,G). Under these conditions, spectral demixing produced a single peak corresponding to CF680 (gephyrin) without any AF647 contribution (Fig. S4F). The number of "false" AF647 background detections at synapses was significantly lower than in the slices labelled with the nanobody. We conclude that the fluorescence signal observed in our dual-colour dSTORM experiments arises from the specific detection of mEos4b-GlyRb by the nanobody, rather than from background, cross-reactivity or wrong attribution of colour during spectral demixing. We have added these data and explanations in the results (p. 7) and in the figure legend of Fig. S4F,G.

      What resolutions/precisions were obtained in SMLM experiments? Should perform Fourier Ring Correlation (FRC) on SR images to state resolutions obtained (particularly useful for when they're presenting distance histograms, as this will be dependent on resolution). Likewise for precision, what was mean precision? Can they show histograms of localisation precision.

      This is an interesting question in the context of our experiments with low-copy GlyRs, since the spatial resolution of SMLM is also limited by the density of molecules, i.e. the sampling of the structure in question (Nyquist-Shannon criterion). Accordingly, the priority of the PALM experiments was to improve the sensitivity of SMLM for the identification of mEos4b-GlyRb subunits, rather than to maximize the spatial resolution. The mean localisation precision in PALM was 33 +/- 12 nm, as calculated from the fitting parameters of each detection (Zeiss, ZEN software), which ultimately result from their signal-to-noise ratio. This is a relatively low precision for SMLM, which can be explained by the low brightness of mEos4b compared to organic fluorophores together with the elevated fluorescence background in tissue slices.

      In the case of dSTORM, the aim was to study the relative distribution of GlyRs within the synaptic scaffold, for which a higher localisation precision was required (p. 6). Therefore, detections with a localisation precision of ≥ 25 nm were excluded during analysis with NEO software (Abbelight). The retained detections had a mean localisation precision of 12 +/- 5 nm for CF680 (Sylite) and 11 +/- 4 nm for AF647 (nanobody). These values are given in the revised manuscript (pp. 18, 22).
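
As a minimal sketch of the filtering step described above, the following Python snippet discards detections at or above a precision cutoff and summarises the rest. The function names and example values are hypothetical, and the use of the sample standard deviation for the "+/-" term is an assumption; the text does not state how NEO or ZEN compute these summaries.

```python
from statistics import mean, stdev

def filter_by_precision(precisions_nm, cutoff_nm=25.0):
    """Keep detections whose localisation precision is below the cutoff.

    A precision of >= cutoff_nm is treated as too poor and is excluded,
    mirroring the 25 nm threshold quoted in the text.
    """
    return [p for p in precisions_nm if p < cutoff_nm]

def summarise(precisions_nm):
    """Return (mean, sample standard deviation) in nm (SD is an assumption)."""
    return mean(precisions_nm), stdev(precisions_nm)

# Hypothetical per-detection precisions in nm, for illustration only.
example = [8.0, 11.5, 14.0, 25.0, 31.0, 9.5, 40.2, 12.0]
kept = filter_by_precision(example)
mean_nm, sd_nm = summarise(kept)
```

Because poorly localised detections are removed before summarising, the reported mean describes only the retained detections, not the raw acquisition.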

      Why were DBSCAN parameters selected? How can they rule out multiple localisations per fluor? If low copy numbers (

      Multiple detections of the same fluorophore are intrinsic to dSTORM imaging and have not been eliminated from the analysis. Small clusters of detections likely represent individual molecules (e.g. single receptors in the extrasynaptic regions, Fig. 2A). DBSCAN is a robust clustering method that is quite insensitive to minor changes in the choice of parameters. For dSTORM of synaptic gephyrin clusters (CF680), a relatively low length (80 nm radius) together with a high number of detections (≥ 50 neighbours) were chosen to reconstruct the postsynaptic domain with high spatial resolution (see point 8). In the case of the GlyR (nanobody-AF647), the clustering was done mostly for practical reasons, as it provided the coordinates of the centre of mass of the detections. The low stringency of this clustering (200 nm radius, ≥ 5 neighbours) effectively filters single detections that can result from background noise or incorrect demixing. An additional reference explaining the use of DBSCAN including the choice of parameters is given on p. 22 (see also R2 point 4).
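
To make the role of the two DBSCAN parameters concrete, here is a small, self-contained Python sketch of the algorithm applied to 2D detection coordinates, followed by the centre-of-mass calculation mentioned above. It is not the NEO implementation: the function names are hypothetical, the coordinates are invented, and counting a point as its own neighbour (the scikit-learn convention) is an assumption, since the text does not say how NEO counts neighbours.

```python
from math import hypot

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points) if hypot(x - xi, y - yi) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster index, or NOISE for isolated detections."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = region_query(points, j, eps)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels

def centre_of_mass(points, labels, cluster):
    """Mean x and y of all detections assigned to one cluster."""
    members = [p for p, lab in zip(points, labels) if lab == cluster]
    n = len(members)
    return (sum(x for x, _ in members) / n, sum(y for _, y in members) / n)

# Hypothetical AF647 detections (nm): one small group plus one stray detection.
detections = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5), (20, 20), (1000, 1000)]
labels = dbscan(detections, eps=200, min_pts=5)
centre = centre_of_mass(detections, labels, cluster=0)
```

With the low-stringency setting (200 nm, at least 5 detections), the stray detection is labelled as noise while the small group is kept and reduced to a single coordinate, which is the practical use described for the GlyR channel.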

      For microscopy experiment methods, state power densities, not % or "nominal power".

      *Done. We now report the irradiance (laser power density) instead of nominal power (pp. 18, 21).*

      In general, not much data presented. Any SI file with extra images etc.?

      *The original submission included four supplementary figures with additional data and representative images that should have been available to the reviewer (Figs. S1-S4). The SI file has been updated during revision (new Fig. S4E-G).*

      Clarification of the discussion on GlyR expression and synaptic localization: The discussion on GlyR expression, complex formation, and synaptic localization is sometimes unclear, and needs terminological distinctions between "expression level", "complex formation" and "synaptic localization". For example, the authors state:"What then is the reason for the low protein expression of GlyRβ? One possibility is that the assembly of mature heteropentameric GlyR complexes depends critically on the expression of endogenous GlyR α subunits." Does this mean that GlyRβ proteins that fail to form complexes with GlyRα subunits are unstable and subject to rapid degradation? If so, the authors should clarify this point. The statement "This raises the interesting possibility that synaptic GlyRs may depend specifically on the concomitant expression of both α1 and β transcripts." suggests a dependency on α1 and β transcripts. However, is the authors' focus on synaptic localization or overall protein expression levels? If this means synaptic localization, it would be beneficial to state this explicitly to avoid confusion. To improve clarity, the authors should carefully distinguish between these different aspects of GlyR biology throughout the discussion. Additionally, a schematic diagram illustrating these processes would be highly beneficial for readers.

      We thank the reviewer for pointing this out. We are dealing with several processes: protein expression, which determines subunit availability and the assembly of pentameric GlyR complexes, as well as surface expression, membrane diffusion and accumulation of GlyRb-containing receptor complexes at inhibitory synapses. We have edited the manuscript, particularly the discussion, and tried to be as clear as possible in our wording.

      We chose not to add a schematic illustration for the time being, because any graphical representation is necessarily a simplification. Instead, we preferred to summarise the main numbers in tabular form (Table 1). We are of course open to any other suggestions.

      Interpretation of GlyR localization in the context of nanodomains. The distribution of GlyR molecules on inhibitory synapses appears to be non-homogeneous, instead forming nanoclusters or nanodomains, similar to many other synaptic proteins. It is important to interpret GlyR localization in the context of nanodomain organization.

      The dSTORM images in Fig. 2 are pointillist representations that show individual detections rather than molecules. Small clusters of detections are likely to originate from a single AF647 fluorophore (in the case of nanobody labelling) and therefore represent single GlyRb subunits. Since GlyR copy numbers are so low at hippocampal synapses (≤ 5), the notion of nanodomain is not directly applicable. Our analysis therefore focused on the integration of GlyRs within the postsynaptic scaffold, rather than attempting to define nanodomain structures (see also response to point 8 of R1). A clarification has been added in the revised manuscript (p. 6).

      Reviewer #1 (Significance (Required)):

      The paper presents biological and technical advances. The biological insights revolve mostly on the documentation of Glycine receptors in particular synapses in forebrain, where they are typically expressed at very low levels. The authors provide compelling data indicating that the expression is of physiological significance. The authors have done a nice job of combining genetically-tagged mice with advanced microscopy methods to tackle the question of distributions of synaptic proteins. Overall these advances are more incremental than groundbreaking.

      We thank the reviewer for acknowledging both the technical and biological advances of our study. While we recognize that our work builds upon established models, we consider that it also addresses important unresolved questions, namely that GlyRs are present and specifically anchored at inhibitory synapses in telencephalic regions, such as the hippocampus and striatum. From a methodological point of view, our study demonstrates that SMLM can be applied not only for structural analysis of highly abundant proteins, but also to reliably detect proteins present at very low copy numbers. This ability to identify and quantify sparse molecule populations adds a new dimension to SMLM applications, which we believe increases the overall impact of our study beyond the field of synaptic neuroscience.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In their manuscript "Single molecule counting detects low-copy glycine receptors in hippocampal and striatal synapses" Camuso and colleagues apply single molecule localization microscopy (SMLM) methods to visualize low copy numbers of GlyRs at inhibitory synapses in the hippocampal formation and the striatum. SMLM analysis revealed higher copy numbers in striatum compared to hippocampal inhibitory synapses. They further provide evidence that these low copy numbers are tightly linked to post-synaptic scaffolding protein gephyrin at inhibitory synapses. Their approach profits from the high sensitivity and resolution of SMLM and challenges the controversial view on the presence of GlyRs in these formations although there are reports (electrophysiology) on the presence of GlyRs in these particular brain regions. These new datasets in the current manuscript may certainly assist in understanding the complexity of fundamental building blocks of inhibitory synapses.

      However I have some minor points that the authors may address for clarification:

      1) In Figure 1 the authors apply PALM imaging of mEos4b-GlyRß (knockin) and here the corresponding Sylite label seems to be recorded in widefield, it is not clearly stated in the figure legend if it is widefield or super-resolved. In Fig 1 A - is the scale bar 5 µm? Some Sylite spots appear to be sized around 1 µm, especially the brighter spots, but maybe this is due to the lower resolution of widefield imaging? Regarding the statistical comparison: what method was chosen to test for normality distribution, I think this point is missing in the methods section.

      *This is correct; the apparent size of the Sylite spots does not reflect the real size of the synaptic gephyrin domain due to the limited resolution of widefield imaging, including the detection of out-of-focus light. We have clarified in the legend of Fig. 1A that the Sylite labelling was imaged with conventional epifluorescence microscopy. The scale bar in Fig. 1A corresponds to 5 µm. Since the data were not normally distributed, nonparametric tests (Kruskal-Wallis one-way ANOVA with Dunn's multiple comparison test or Mann-Whitney U-test for pairwise comparisons) were used (p. 23).*
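
For readers unfamiliar with the pairwise test named above, the following Python sketch computes the Mann-Whitney U statistic from ranks, using average ranks for ties. It is an illustration only: the function name and the example values are hypothetical, no p-value is derived, and it is not the statistics software used in the study.

```python
def mann_whitney_u(x, y):
    """Return the Mann-Whitney U statistic (the smaller of U_x and U_y)."""
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y)

# Hypothetical detection counts per synapse in two groups.
u_separated = mann_whitney_u([1, 2, 3], [4, 5, 6])
u_with_ties = mann_whitney_u([1, 2], [2, 3])
```

Because the statistic depends only on ranks, it makes no assumption about the shape of the distribution, which is why it suits count data that are not normally distributed.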

      Moreover I would appreciate a clarification and/or citation that the knockin model results in no structural and physiological changes at inhibitory synapses, I believe this model has been applied in previous studies and corresponding clarification can be provided.

      The Glrbeos/eos mouse model has been described previously and does not exhibit any structural or physiological phenotypes (Maynard et al. 2021 eLife). The issue was also raised by reviewer R1 (point 5) and has been clarified in the revised manuscript (p. 4).

      2) In the next set of experiments the authors switch to demixing dSTORM experiments - an explanation why this is performed is missing in the text - I guess better resolution to perform more detailed distance measurements? For these experiments: which region of the hippocampus did the authors select, I cannot find this information in legend or main text.

      Yes, the dSTORM experiments enable dual-colour structural analysis at high spatial resolution (see response to R1 point 7). An explanation has been added (p. 6).

      3) Regarding parameters of demixing experiments: the number of frames (10.000) seems quite low and the exposure time higher than expected for Alexa 647. Can the authors explain the reason for choosing these particular parameters (low expression profile of the target - so better separation?, less fluorophores on label and shorter collection time?) or is there a reference that can be cited? The laser power is given in the methods in percentage of maximal output power, but for better comparison and reproducibility I recommend to provide the values of a power meter (kW/cm2) as lasers may change their maximum output power during their lifetime.

      Acquisition parameters (laser power, exposure time) for dSTORM were chosen to obtain a good localisation precision (~12 nm; see R1 point 8). The number of frames is adequate to obtain well sampled gephyrin scaffolds in the CF680 channel. In the case of the GlyR (nanobody-AF647), the concept of spatial resolution does not really apply due to the low number of targets (see R1, point 13). Power density (irradiance) values have now been given (pp. 18, 21).

      4) For analysis of subsynaptic distribution: how did the authors decide to choose the parameters in the NEO software for DBSCAN clustering - was a series of parameters tested to find optimal conditions and did the analysis start with an initial test if data is indeed clustered (K-ripley) or is there a reference in literature that can be provided?

      DBSCAN parameters were optimised manually, by testing different values. Identification of dense and well-delimited gephyrin clusters (CF680) was achieved with a small radius and a high number of detections (80 nm, ≥ 50 neighbours), whereas filtering of low-density background in the AF647 channel (GlyRs) required less stringent parameters (200 nm, ≥ 5) due to the low number of target molecules. Similar parameters were used in a previous publication (Khayenko et al. 2022, Angewandte Chemie). The reference has been provided on p. 22 (see also R1 point 9).

      5) A conclusion/discussion of the results presented in Figure 5 is missing in the text/discussion.

      *This part of the manuscript has been completely overhauled. It includes new experimental data, quantification of the data (new Fig. 5), as well as the discussion and interpretation of our findings (see also R1, point 3). In agreement with our earlier interpretation, the data confirm that low availability of GlyRa1 subunits limits the expression and synaptic targeting of GlyRa1/b heteropentamers. The observation that GlyRa1 overexpression with lentivirus increases the size of the postsynaptic gephyrin domain further points to a structural role, whereby GlyRs can enhance the stability (and size) of inhibitory synapses in hippocampal neurons, even at low copy numbers (pp. 13-14).*

      6) in line 552 "suspension" is misleading, better use "solution"

      Done.

      Reviewer #2 (Significance (Required)):

      Significance: The manuscript provides new insights to presence of low-copy numbers by visualizing them via SMLM. This is the first report that visualizes GlyR optically in the brain applying the knock-in model of mEOS4b tagged GlyRß and quantifies their copy number comparing distribution and amount of GlyRs from hippocampus and striatum. Imaging data correspond well to electrophysiological measurements in the manuscript.

      Field of expertise: Super-Resolution Imaging and corresponding analysis

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      In this study, Camuso et al., make use of a knock-in mouse model expressing endogenously mEos4b-tagged GlyRβ to detect endogenous glycine receptors using single-molecule localization microscopy. The main conclusion from this study is that in the hippocampus GlyRβ molecules are barely detected, while inhibitory synapses in the ventral striatum seem to express functionally relevant GlyR numbers.

      I have a few points that I hope help to improve the strength of this study.

      • In the hippocampus, this study finds that the numbers of detections are very low. The authors perform adequate controls to indicate that these localizations are above noise level. Nevertheless, it remains questionable that these reflect proper GlyRs. The suggestion that in hippocampal synapses the low numbers of GlyRβ molecules "are important in assembly or maintenance of inhibitory synaptic structures in the brain" is on itself interesting, but is not at all supported. It is also difficult to envision how such low numbers could support the structure of a synapse. A functional experiment showing that knockdown of GlyRs affects inhibitory synapse structure in hippocampal neurons would be a minimal test of this.

      *It is not clear what the reviewer means by “it remains questionable that these reflect proper GlyRs”. The PALM experiments include a series of stringent controls (see R1, point 1) demonstrating the existence of low-copy GlyRs at inhibitory synapses in the hippocampus (Fig. 1) and in the striatum (Fig. 3), and are backed up by dSTORM experiments (Fig. 2). We have no reason to doubt that these receptors are fully functional (as demonstrated for the ventral striatum, Fig. 4). However, due to their low number, a role in inhibitory synaptic transmission is clearly limited, at least in the hippocampus and dorsal striatum.*

      We therefore propose a structural role, where the GlyRs could be required to stabilise the postsynaptic gephyrin domain in hippocampal neurons. This is based on the idea that the GlyR-gephyrin affinity is much higher than that of the GABAAR-gephyrin interaction (reviewed in Kasaragod & Schindelin 2018 Front Mol Neurosci). Accordingly, there is a close relationship between GlyRs and gephyrin numbers, sub-synaptic distribution, and dynamics in spinal cord synapses that are mostly glycinergic (Specht et al. 2013 Neuron; Maynard et al. 2021 eLife; Chapdelaine et al. 2021 Biophys J). It is reasonable to assume that low-copy GlyRs could play a similar structural role at hippocampal synapses. A knockdown experiment targeting these few receptors is technically very challenging and beyond the scope of this study. However, in response to the reviewer's question we have conducted new experiments in cultured hippocampal neurons (new Fig. 5). They demonstrate that overexpression of GlyRa1/b heteropentamers increases the size of the postsynaptic domain in these neurons, supporting our interpretation of a structural role of low-copy GlyRs (p. 14).

      • The endogenous tagging strategy is a very strong aspect of this study and provides confidence in the labeling of GlyRβ molecules. One caveat however, is that this labeling strategy does not discriminate whether GlyRβ molecules are on the cell membrane or in internal compartments. Can the authors provide an estimate of the ratio of surface to internal GlyRβ molecules?

      Gephyrin is known to form a two-dimensional scaffold below the synaptic membrane to which inhibitory GlyRs and GABAARs attach (reviewed in Alvarez 2017 Brain Res). The majority of the synaptic receptors are therefore thought to be located in the synaptic membrane, which is supported by the close relationship between the sub-synaptic distribution of GlyRs and gephyrin in spinal cord neurons (e.g. Maynard et al. 2021 eLife). To demonstrate the surface expression of GlyRs at hippocampal synapses, we labelled cultured hippocampal neurons expressing mEos4b-GlyRa1 with anti-Eos nanobody under non-permeabilising conditions (see Figure below for the reviewer only). The close correspondence between the nanobody (AF647) and the mEos4b signal confirms that the majority of the GlyRs are indeed located in the synaptic membrane.

      Figure (for the reviewer only). *Left: Lentivirus expression of mEos4b-GlyRa1 in fixed and non-permeabilised hippocampal neurons (mEos4b signal). Right: Surface labelling of the recombinant subunit with anti-Eos nanobody (AF647).*

      • 'We also estimated the absolute number of GlyRs per synapse in the hippocampus. The number of mEos4b detections was converted into copy numbers by dividing the detections at synapses by the average number of detections of individual mEos4b-GlyRβ containing receptor complexes'. In essence this is a correct method to estimate copy numbers, and the authors discuss some of the pitfalls associated with this approach (i.e., maturation of fluorophore and detection limit). Nevertheless, the authors did not subtract the number of background localizations determined in the two negative control groups. This is critical, particularly at these low-number estimations.

      We fully agree that background subtraction can be useful with low detection numbers. In the revised manuscript, copy numbers are now reported as background-corrected values. Specifically, the mean number of detections measured in wildtype slices was used to calculate an equivalent receptor number, which was then subtracted from the copy number estimates across hippocampus, spinal cord and striatum. This procedure is described in the methods (p. 20) and results (p. 5, 8), and mentioned in the figure legends of Fig. 1C, 3C. The background corrected values are given in the text and Table 1.
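
The background correction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name and all numbers are hypothetical, and the calculation simply follows the wording of the response (detections divided by the average detections per single receptor complex, minus the receptor-equivalent of the wildtype background).

```python
def background_corrected_copy_number(detections_at_synapse,
                                     detections_per_receptor,
                                     background_detections):
    """Convert synaptic detections into a background-corrected receptor number.

    detections_per_receptor: average detections from one isolated
        mEos4b-GlyRb containing receptor complex.
    background_detections: mean detections at synapses in wildtype slices.
    """
    raw_copy_number = detections_at_synapse / detections_per_receptor
    background_equivalent = background_detections / detections_per_receptor
    return raw_copy_number - background_equivalent

# Hypothetical values: 30 detections at a synapse, 10 detections per receptor
# complex, and 5 background detections measured in wildtype tissue.
corrected = background_corrected_copy_number(30, 10, 5)
```

At low copy numbers the background term is a large fraction of the raw estimate (here 0.5 of 3 receptors), which is why the correction matters most for hippocampal synapses.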

      Furthermore, the authors state that "The advantage of this estimation is that it is independent of the stoichiometry of heteropentameric GlyRs". However, if the stoichiometry is unknown, the number of counted GlyRβ subunits cannot simply be reported as the number of GlyRs. This should be discussed in more detail, and more carefully reported throughout the manuscript.

      *The reviewer is right to point this out. There is still some debate about the stoichiometry of heteropentameric GlyRs. Configurations with 2a:3b, 3a:2b and 4a:1b subunits have been advanced (e.g. Grudzinska et al. 2005 Neuron; Durisic et al. 2012 J Neurosci; Patrizio et al. 2017 Sci Rep; Zhu & Gouaux 2021 Nature). We have therefore chosen a quantification that is independent of the underlying stoichiometry. Since our quantification is based on very sparse clusters of mEos4b detections that likely originate from a single receptor complex (irrespective of its stoichiometry), the reported values actually reflect the number of GlyRs (and not GlyRb subunits). We have clarified this in the results (p. 5) and throughout the manuscript (Table 1).*

      • The dual-color imaging provides insights in the subsynaptic distribution of GlyRβ molecules in hippocampal synapses. Why are similar studies not performed on synapses in the ventral striatum where functionally relevant numbers of GlyRβ molecules are found? Here insights in the subsynaptic receptor distribution would be of much more interest as it can be tied to the function.

      This is an interesting suggestion. However, the primary aim of our study was to identify the existence of GlyRs in hippocampal regions. At low copy numbers, the concept of sub-synaptic domains (SSDs, e.g. Yang et al. 2021 EMBO Rep) becomes irrelevant (see R1 point 13). It should be pointed out that the dSTORM pointillist images (Fig. 2A) represent individual GlyR detections rather than clusters of molecules. In the striatum, our specific purpose was to solve an open question about the presence of GlyRs in different subregions (putamen, nucleus accumbens).

      • It is unclear how the experiments in Figure 5 add to this study. These results are valid, but do not seem to directly test the hypothesis that "the expression of α subunits may be limiting factor controlling the number of synaptic GlyRs". These experiments simply test if overexpressed α subunits can be detected. If the α subunits are limiting, measuring the effect of α subunit overexpression on GlyRβ surface expression would be a more direct test.

      Both R1 and R2 have also commented on the data in Fig. 5 and their interpretation. We have substantially revised this section as described before (see R1 point 3) including additional experiments and quantification of the data (new Fig. 5). The findings lend support to our earlier hypothesis that GlyR alpha subunits (in particular GlyRa1) are the limiting factor for the expression of heteropentameric GlyRa/b in hippocampal neurons (pp. 13-14). Since the GlyRa1 subunit itself does not bind to gephyrin (Patrizio et al. 2017 Sci Rep), the synaptic localisation of the recombinant mEos4b-GlyRa1 subunits is proof that they have formed heteropentamers with endogenous GlyRb subunits and driven their membrane trafficking, which the GlyRb subunits are incapable of doing on their own.

      Reviewer #4 (Significance (Required)):

      These results are based on carefully performed single-molecule localization experiments, and are well-presented and described. The knockin mouse with endogenously tagged GlyRβ molecules is a very strong aspect of this study and provides confidence in the labeling, the combination with single-molecule localization microscopy is very strong as it provides high sensitivity and spatial resolution.

      The conceptual innovation however seems relatively modest, these results confirm previous studies but do not seem to add novel insights. This study is entirely descriptive and does not bring new mechanistic insights.

      This study could be of interest to a specialized audience interested in glycine receptor biology, inhibitory synapse biology and super-resolution microscopy.

      my expertise is in super-resolution microscopy, synaptic transmission and plasticity

      As we have stated before, the novelty of our study lies in the use of SMLM for the identification of very small numbers of molecules, which requires careful control experiments. This is something that has not been done before and that can be of interest to a wider readership, as it opens up SMLM for ultrasensitive detection of rare molecular events. Using this approach, we solve two open scientific questions: (1) the demonstration that low-copy GlyRs are present at inhibitory synapses in the hippocampus, (2) the sub-region specific expression and functional role of GlyRs in the ventral versus dorsal striatum.

      The following review was provided later under the name “Reviewer #4”. To avoid confusion with the last reviewer from above we will refer to this review as R4-2.


      Reviewer #4-2 (Evidence, reproducibility and clarity (Required)):


      Summary:

      Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate).

      The authors investigate the presence of synaptic glycine receptors in the telencephalon, whose presence and function is poorly understood.

      Using a transgenically labeled glycine receptor beta subunit (Glrb-mEos4b) mouse model together with super-resolution microscopy (SMLM, dSTORM), they demonstrate the presence of a low but detectable amount of synaptically localized GLRB in the hippocampus. While they do not perform a functional analysis of these receptors, they do demonstrate that these subunits are integrated into the inhibitory postsynaptic density (iPSD) as labeled by the scaffold protein gephyrin. These findings demonstrate that a low level of synaptically localized glycine receptor subunits exist in the hippocampal formation, although whether or not they have a functional relevance remains unknown.

      They then proceed to quantify synaptic glycine receptors in the striatum, demonstrating that the ventral striatum has a significantly higher amount of GLRB co-localized with gephyrin than the dorsal striatum or the hippocampus. They then recorded pharmacologically isolated glycinergic miniature inhibitory postsynaptic currents (mIPSCs) from striatal neurons. In line with their structural observations, these recordings confirmed the presence of synaptic glycinergic signaling in the ventral striatum, and an almost complete absence in the dorsal striatum. Together, these findings demonstrate that synaptic glycine receptors in the ventral striatum are present and functional, while an important contribution to dorsal striatal activity is less likely.

      Lastly, the authors use existing mRNA and protein datasets to show that the expression level of GLRA1 across the brain positively correlates with the presence of synaptic GLRB.

      The authors use lentiviral expression of mEos4b-tagged glycine receptor alpha1, alpha2, and beta subunits (GLRA1, GLRA2, GLRB) in cultured hippocampal neurons to investigate the ability of these subunits to cause the synaptic localization of glycine receptors. They suggest that the alpha1 subunit has a higher propensity to localize at the inhibitory postsynapse (labeled via gephyrin) than the alpha2 or beta subunits, and may therefore contribute to the distribution of functional synaptic glycine receptors across the brain.

      Major comments:

      • Are the key conclusions convincing?

      The authors are generally precise in the formulation of their conclusions.

      • They demonstrate a very low, but detectable, amount of a synaptically localized glycine receptor subunit in a transgenic (GlrB-mEos4b) mouse model. They demonstrate that the GLRB-mEos4b fusion protein is integrated into the iPSD as determined by gephyrin labelling. The authors do not perform functional tests of these receptors and do not state any such conclusions.
      • The authors show that GLRB-mEos4b is clearly detectable in the striatum and integrated into gephyrin clusters at a significantly higher rate in the ventral striatum compared to the dorsal striatum, which is in line with previous studies.
      • Adding to their quantification of GLRB-mEos4b in the striatum, the authors demonstrate the presence of glycinergic miniature IPSCs in the ventral striatum, and an almost complete absence of mIPSCs in the dorsal striatum. These currents support the observation that GLRB-mEos4b is more synaptically integrated in the ventral striatum compared to the dorsal striatum.
      • The authors show that lentiviral expression of GLRA1-mEos4b leads to a visually higher number of GLR clusters in cultured hippocampal neurons, and a co-localization of some clusters with gephyrin. The authors claim that this supports the idea that GLRA1 may be an important driver of synaptic glycine receptor localization. However, no quantification or statistical analysis of the number of puncta or their colocalization with gephyrin is provided for any of the expressed subunits. Such a claim should be supported by quantification and statistics.

      A thorough analysis and quantification of the data in Fig. 5 has been carried out as requested by all the other reviewers (e.g. R1, point 3). The new data and results have been described in the revised manuscript (pp. 9-10, 13-14).

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      One unaddressed caveat is the fact that a GLRB-mEos4b fusion protein may behave differently in terms of localization and synaptic integration than wild-type GLRB. While unlikely, it is possible that mEos4b interacts either with itself or synaptic proteins in a way that changes the fused GLRB subunit’s localization. Such an effect would be unlikely to affect synaptic function in a measurable way, but might be detected at a structural level by highly sensitive methods such as SMLM and STORM in regions with very low molecule numbers (such as the hippocampus). Since reliable antibodies against GLRB in brain tissue sections are not available, this would be difficult to test. Considering that no functional measures of the hippocampal detections exist, we would suggest that this possible caveat be mentioned for this particular experiment.

      This question has also been raised before (R1, point 5). According to an earlier study the mEos4b-GlyRb knock-in does not cause any obvious phenotypes, with the possible exception of minor loss of glycine potency (Maynard et al. 2021 eLife). The fact that the synaptic levels in the spinal cord in heterozygous animals are precisely half of those of homozygous animals argues against differences in receptor expression, heteropentameric assembly, forward trafficking to the plasma membrane and integration into the synaptic membrane as confirmed using quantitative super-resolution CLEM (Maynard et al. 2021 eLife). Accordingly, we did not observe any behavioural deficits in these animals, making it a powerful experimental model. We have added this information in the revised manuscript (p. 4).

      In addition, without any quantification or statistical analysis, the authors’ claims regarding the necessity of GLRA1 expression for the synaptic localization of glycine receptors in cultured hippocampal neurons should probably be described as preliminary (Fig. 5).

      As mentioned before, we have substantially revised this part (R1, point 3). The quantification and analysis in the new Fig. 5 support our earlier interpretation.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      The authors show that there is colocalization of gephyrin with the mEos4b-GlyRβ subunit using dual-colour SMLM. This is a powerful approach that allows for a claim to be made on the synaptic location of the glycine receptors. The images presented in Figure 1, together with the distance analysis in Figure 2, display the co-localization of the fluorophores. The co-localization images in all the selected regions, hippocampus and striatum, also show detections outside of the gephyrin clusters, which the authors refer to as extrasynaptic. These small punctate clusters seem to have the same size as the ones detected and assigned as part of the synapse. It would be informative if the authors analysed the distribution, density and size of these non-synaptic clusters, presented the data in the manuscript, and compared them against the synaptic ones. Validating this extrasynaptic signal by staining for a dendritic marker, such as MAP-2 or maybe a somatic marker, and assessing the co-localization with the non-synaptic clusters would add even more credibility to them being extrasynaptic.

      The existence of extrasynaptic GlyRs is well attested in spinal cord neurons (e.g. Specht et al. 2013 Neuron; this study see Fig. S2). The fact that these appear as small clusters of detections in SMLM recordings results from the fact that a single fluorophore can be detected several times in consecutive image frames and because of blinking. Therefore, small clusters of detections likely represent single GlyRs (that can be counted), and not assemblies of several receptor complexes. Due to their diffusion in the neuronal membrane, they are seen as diffuse signals throughout the somatodendritic compartment in epifluorescence images (e.g. Fig. 5A). SMLM recordings of the same cells resolve this diffuse signal into discrete nanoclusters representing individual receptors (Fig. 5B). It is not clear what information co-localisation experiments with specific markers could provide, especially in hippocampal neurons, in which the copy numbers (and density) of GlyRs are next to zero.

      In addition we would encourage the authors to quantify the clustering and co-localization of virally expressed GLRA1, GLRA2, and GLRB with gephyrin in order to support the associated claims (Fig. 5). Preferably, the density of GLR and gephyrin clusters (at least on the somatic surface, the proximal dendrites, or both) as well as their co-localization probability should be quantified if a causal claim about subunit-specific requirements for synaptic localization is to be made.

      Quantification of the data has been carried out (new Fig. 5C,D). The results have been described before (R1, point 3) and support our earlier interpretation of the data (pp. 13-14).

      Lastly, even though it may be outside the scope of such a study, analysing other parts of the hippocampal area could provide additional important information. If one looks at the Allen Institute’s ISH of the beta subunit, the strongest signal comes from the stratum oriens in the CA1, for example, suggesting that interneurons residing there would more likely have a higher expression of the glycine receptors. This could also be assessed by looking more carefully at the single cell transcriptomics, to see which cell types in the hippocampus show the highest mRNA levels. If the authors think that this is too much additional work, then perhaps a mention of this in the discussion would be good.

      We have added the requested information from the ISH database of the Allen Institute in the discussion as suggested by the reviewer (p. 12). However, in combination with the transcriptomic data (Fig. S1) our findings strongly suggest that the expression of synaptic GlyRs depends on the availability of alpha subunits rather than on the presence of the GlyRb transcript. This is obvious when one compares the mRNA levels in the hippocampus with those in the basal ganglia (striatum) and medulla. While the transcript concentrations of GlyRb are elevated in all three regions and essentially the same, our data show that the GlyRb copy numbers at synapses differ over more than 2 orders of magnitude (Fig. 1B, Table 1).

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      Since the labeling and some imaging has been performed already, the requested experiment would be a matter of deploying a method of quantification. In principle, it should not require any additional wet-lab experiments, although it may require additional imaging of existing samples.

      • Are the data and the methods presented in such a way that they can be reproduced?

      Yes, for the most part.

      • Are the experiments adequately replicated and statistical analysis adequate?

      Yes

      Minor comments:

      • Specific experimental issues that are easily addressable.

      N/A

      • Are prior studies referenced appropriately?

      Yes

      • Are the text and figures clear and accurate?

      Yes, although quantification in figure 5 is currently not present.

      A quantification has been added (see R1, point 3).

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      This paper presents a method that could be used to localize receptors and perhaps other proteins that are in low abundance or for which a detailed quantification is necessary. I would therefore suggest that Figure S4 is included into Figure 2 as the first panel, showcasing the demixing, followed by the results.

      We agree in principle with this suggestion. However, the revised Fig. S4 is more complex and we think that it would distract from the data shown in Fig. 2. Given that Fig. S4 is mostly methodological and not essential to understand the text, we have kept it in the supplement for the time being. We leave the final decision on this point to the editor.

      Reviewer #4-2 (Significance (Required)):

      [This review was supplied later]

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      Using a novel and high-resolution method, the authors have provided strong evidence for the presence of glycine receptors in the murine hippocampus and in the dorsal striatum. The number of receptors calculated is small compared to the numbers found in the ventral striatum. This is the first study to quantify receptor numbers in these regions. In addition, it also lays a roadmap for future studies addressing similar questions.

      • Place the work in the context of the existing literature (provide references, where appropriate).

      This is done well by the authors in the curation of the literature. As stated above, the authors have filled a gap in the presence of glycine receptors in different brain regions, a subject of importance in understanding the role they play in brain activity and function.

      • State what audience might be interested in and influenced by the reported findings.

      Neuroscientists working at the synaptic level, on inhibitory neurotransmission and on fundamental mechanisms of expression of genes at low levels and their relationship to the presence of the protein would be interested. Furthermore, researchers in neuroscience and cell biology may benefit from and be inspired by the approach used in this manuscript, to potentially apply it to address their own aims.

      We thank the reviewer for the positive assessment of the technical and biological implications of our work, as well as the interest of our findings to a wide readership of neuroscientists and cell biologists.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Synaptic transmission, inhibitory cells and GABAergic synapses functionally and structurally, cortex and cortical circuits. No strong expertise in super-resolution imaging methods.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #4

      Evidence, reproducibility and clarity

      Summary: Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate).

      The authors investigate the presence of synaptic glycine receptors in the telencephalon, whose presence and function are poorly understood.

      Using a transgenically labeled glycine receptor beta subunit (Glrb-mEos4b) mouse model together with super-resolution microscopy (SMLM, dSTORM), they demonstrate the presence of a low but detectable amount of synaptically localized GLRB in the hippocampus. While they do not perform a functional analysis of these receptors, they do demonstrate that these subunits are integrated into the inhibitory postsynaptic density (iPSD) as labeled by the scaffold protein gephyrin. These findings demonstrate that a low level of synaptically localized glycine receptor subunits exists in the hippocampal formation, although whether or not they have a functional relevance remains unknown.

      They then proceed to quantify synaptic glycine receptors in the striatum, demonstrating that the ventral striatum has a significantly higher amount of GLRB co-localized with gephyrin than the dorsal striatum or the hippocampus. They then recorded pharmacologically isolated glycinergic miniature inhibitory postsynaptic currents (mIPSCs) from striatal neurons. In line with their structural observations, these recordings confirmed the presence of synaptic glycinergic signaling in the ventral striatum, and an almost complete absence in the dorsal striatum. Together, these findings demonstrate that synaptic glycine receptors in the ventral striatum are present and functional, while an important contribution to dorsal striatal activity is less likely.

      Lastly, the authors use existing mRNA and protein datasets to show that the expression level of GLRA1 across the brain positively correlates with the presence of synaptic GLRB. The authors use lentiviral expression of mEos4b-tagged glycine receptor alpha1, alpha2, and beta subunits (GLRA1, GLRA2, GLRB) in cultured hippocampal neurons to investigate the ability of these subunits to cause the synaptic localization of glycine receptors. They suggest that the alpha1 subunit has a higher propensity to localize at the inhibitory postsynapse (labeled via gephyrin) than the alpha2 or beta subunits, and may therefore contribute to the distribution of functional synaptic glycine receptors across the brain.

      Major comments:

      • Are the key conclusions convincing?

      The authors are generally precise in the formulation of their conclusions.

      1) They demonstrate a very low, but detectable, amount of a synaptically localized glycine receptor subunit in a transgenic (GlrB-mEos4b) mouse model. They demonstrate that the GLRB-mEos4b fusion protein is integrated into the iPSD as determined by gephyrin labelling. The authors do not perform functional tests of these receptors and do not state any such conclusions.
      2) The authors show that GLRB-mEos4b is clearly detectable in the striatum and integrated into gephyrin clusters at a significantly higher rate in the ventral striatum compared to the dorsal striatum, which is in line with previous studies.
      3) Adding to their quantification of GLRB-mEos4b in the striatum, the authors demonstrate the presence of glycinergic miniature IPSCs in the ventral striatum, and an almost complete absence of mIPSCs in the dorsal striatum. These currents support the observation that GLRB-mEos4b is more synaptically integrated in the ventral striatum compared to the dorsal striatum.
      4) The authors show that lentiviral expression of GLRA1-mEos4b leads to a visually higher number of GLR clusters in cultured hippocampal neurons, and a co-localization of some clusters with gephyrin. The authors claim that this supports the idea that GLRA1 may be an important driver of synaptic glycine receptor localization. However, no quantification or statistical analysis of the number of puncta or their colocalization with gephyrin is provided for any of the expressed subunits. Such a claim should be supported by quantification and statistics.

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      One unaddressed caveat is the fact that a GLRB-mEos4b fusion protein may behave differently in terms of localization and synaptic integration than wild-type GLRB. While unlikely, it is possible that mEos4b interacts either with itself or synaptic proteins in a way that changes the fused GLRB subunit's localization. Such an effect would be unlikely to affect synaptic function in a measurable way, but might be detected at a structural level by highly sensitive methods such as SMLM and STORM in regions with very low molecule numbers (such as the hippocampus). Since reliable antibodies against GLRB in brain tissue sections are not available, this would be difficult to test. Considering that no functional measures of the hippocampal detections exist, we would suggest that this possible caveat be mentioned for this particular experiment.

      In addition, without any quantification or statistical analysis, the authors' claims regarding the necessity of GLRA1 expression for the synaptic localization of glycine receptors in cultured hippocampal neurons should probably be described as preliminary (Fig. 5).

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      The authors show that there is colocalization of gephyrin with the mEos4b-GlyRβ subunit using dual-colour SMLM. This is a powerful approach that allows for a claim to be made on the synaptic location of the glycine receptors. The images presented in Figure 1, together with the distance analysis in Figure 2, display the co-localization of the fluorophores. The co-localization images in all the selected regions, hippocampus and striatum, also show detections outside of the gephyrin clusters, which the authors refer to as extrasynaptic. These small punctate clusters seem to have the same size as the ones detected and assigned as part of the synapse. It would be informative if the authors analysed the distribution, density and size of these non-synaptic clusters, presented the data in the manuscript, and compared them against the synaptic ones. Validating this extrasynaptic signal by staining for a dendritic marker, such as MAP-2 or maybe a somatic marker, and assessing the co-localization with the non-synaptic clusters would add even more credibility to them being extrasynaptic.

      In addition we would encourage the authors to quantify the clustering and co-localization of virally expressed GLRA1, GLRA2, and GLRB with gephyrin in order to support the associated claims (Fig. 5). Preferably, the density of GLR and gephyrin clusters (at least on the somatic surface, the proximal dendrites, or both) as well as their co-localization probability should be quantified if a causal claim about subunit-specific requirements for synaptic localization is to be made.

      Lastly, even though it may be outside the scope of such a study, analysing other parts of the hippocampal area could provide additional important information. If one looks at the Allen Institute's ISH of the beta subunit, the strongest signal comes from the stratum oriens in the CA1, for example, suggesting that interneurons residing there would more likely have a higher expression of the glycine receptors. This could also be assessed by looking more carefully at the single cell transcriptomics, to see which cell types in the hippocampus show the highest mRNA levels. If the authors think that this is too much additional work, then perhaps a mention of this in the discussion would be good.

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      Since the labeling and some imaging has been performed already, the requested experiment would be a matter of deploying a method of quantification. In principle, it should not require any additional wet-lab experiments, although it may require additional imaging of existing samples.

      • Are the data and the methods presented in such a way that they can be reproduced?

      Yes, for the most part.

      • Are the experiments adequately replicated and statistical analysis adequate?

      Yes

      Minor comments:

      • Specific experimental issues that are easily addressable.

      N/A

      • Are prior studies referenced appropriately?

      Yes

      • Are the text and figures clear and accurate?

      Yes, although quantification in figure 5 is currently not present.

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      This paper presents a method that could be used to localize receptors and perhaps other proteins that are in low abundance or for which a detailed quantification is necessary. I would therefore suggest that Figure S4 is included into Figure 2 as the first panel, showcasing the demixing, followed by the results.

      Significance

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      Using a novel and high-resolution method, the authors have provided strong evidence for the presence of glycine receptors in the murine hippocampus and in the dorsal striatum. The number of receptors calculated is small compared to the numbers found in the ventral striatum. This is the first study to quantify receptor numbers in these regions. In addition, it also lays a roadmap for future studies addressing similar questions.

      • Place the work in the context of the existing literature (provide references, where appropriate).

      This is done well by the authors in the curation of the literature. As stated above, the authors have filled a gap in the presence of glycine receptors in different brain regions, a subject of importance in understanding the role they play in brain activity and function.

      • State what audience might be interested in and influenced by the reported findings.

      Neuroscientists working at the synaptic level, on inhibitory neurotransmission and on fundamental mechanisms of expression of genes at low levels and their relationship to the presence of the protein would be interested. Furthermore, researchers in neuroscience and cell biology may benefit from and be inspired by the approach used in this manuscript, to potentially apply it to address their own aims.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Synaptic transmission, inhibitory cells and GABAergic synapses functionally and structurally, cortex and cortical circuits. No strong expertise in super-resolution imaging methods.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      We thank the reviewer for his valuable input and careful assessment, which have significantly improved the clarity and rigor of our manuscript.

      Summary:

      Mazer & Yovel 2025 dissect the inverse problem of how echolocators in groups manage to navigate their surroundings despite intense jamming using computational simulations.

      The authors show that despite the 'noisy' sensory environments that echolocating groups present, agents can still access some amount of echo-related information and use it to navigate their local environment. It is known that echolocating bats have strong small and large-scale spatial memory that plays an important role for individuals. The results from this paper also point to the potential importance of an even lower-level, short-term role of memory in the form of echo 'integration' across multiple calls, despite the unpredictability of echo detection in groups. The paper generates a useful basis to think about the mechanisms in echolocating groups for experimental investigations too.

      Strengths:

      (1) The paper builds on biologically well-motivated and parametrised 2D acoustics and sensory simulation setup to investigate the various key parameters of interest

      (2) The 'null-model' of echolocators not being able to tell apart objects & conspecifics while echolocating still shows agents successfully emerge from groups - even though the probability of emergence drops severely in comparison to cognitively more 'capable' agents. This is nonetheless an important result showing the direction-of-arrival of a sound itself is the 'minimum' set of ingredients needed for echolocators navigating their environment.

      (3) The results generate an important basis in unraveling how agents may navigate in sensorially noisy environments with a lot of irrelevant and very few relevant cues.

      (4) The 2D simulation framework is simple and computationally tractable enough to perform multiple runs to investigate many variables - while also remaining true to the aim of the investigation.

      Weaknesses:

      There are a few places in the paper that can be misunderstood or don't provide complete details. Here is a selection:

      (1) Line 61: '... studies have focused on movement algorithms while overlooking the sensory challenges involved' : This statement does not match the recent state of the literature. While the previous models may have had the assumption that all neighbours can be detected, there are models that specifically study the role of limited interaction arising from a potential inability to track all neighbours due to occlusion, and the effect of responding to only one/few neighbours at a time e.g. Bode et al. 2011 R. Soc. Interface, Rosenthal et al. 2015 PNAS, Jhawar et al. 2020 Nature Physics.

      We appreciate the reviewer's comment and the relevant references. We have revised the manuscript accordingly to clarify the distinction between studies that incorporate limited interactions and those that explicitly analyze sensory constraints and interference. We have refined our statement to acknowledge these contributions while maintaining our focus on sensory challenges beyond limited neighbor detection, such as signal degradation, occlusion effects, and multimodal sensory integration (see lines 58-64):

      (2) The word 'interference' is used loosely in places (Line 89: '...took all interference signals...', Line 319: 'spatial interference') - this is confusing as it is not clear whether the authors refer to interference in the physics/acoustics sense, or broadly speaking as a synonym for reflections and/or jamming.

      To improve clarity, we have revised the manuscript to distinguish between different types of interference:

      • Acoustic interference (jamming): Overlapping calls that completely obscure echo detection, preventing bats from perceiving necessary environmental cues.

      • Acoustic interference (masking): Partial reduction in signal clarity due to competing calls.

      • Spatial interference: Physical obstruction by conspecifics affecting movement and navigation.

      We have updated the manuscript to use these terms consistently and explicitly define them in relevant sections (see lines 84-85, 119-120). This distinction ensures that the reader can differentiate between interference as an acoustic phenomenon and its broader implications in navigation.

      (3) The paper discusses original results without reference to how they were obtained or what was done. The lack of detail here must be considered while interpreting the Discussion e.g. Line 302 ('our model suggests...increasing the call-rate..' - no clear mention of how/where call-rate was varied) & Line 323 '..no benefit beyond a certain level..' - also no clear mention of how/where call-level was manipulated in the simulations.

      All tested parameters, including call rate dynamics and call intensity variations, are detailed in the Methods section and Tables 1 and 2. Specifically:

      • Call Rate Variation: The Inter-Pulse Interval (IPI) was modeled based on documented echolocation behavior, decreasing from 100 msec during the search phase to 35 msec (~28 calls per second) at the end of the approach phase, and to 5 msec (200 calls per second) during the final buzz (see Table 2). This natural variation in call rate was not manually manipulated in the model but emerged from the simulated bat behavior.

      • Call Intensity Variation: The tested call intensity levels (100, 110, 120, 130 dB SPL) are presented in Table 1 under the “Call Level” parameter. The effect of increasing call intensity was analyzed in relation to exit probability, jamming probability, and collision rate. This is now explicitly referenced in the Discussion. We have revised the manuscript to explicitly reference these aspects in the Results and Discussion sections – see lines 346-349, 372-375.
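
      The call rates quoted above follow directly from the inter-pulse intervals, since the rate in calls per second is 1000 divided by the IPI in milliseconds. The following minimal Python sketch only reproduces that arithmetic for the three IPI values given in the response; the function name and the phase labels are hypothetical and are not taken from the authors' simulation code.

```python
def calls_per_second(ipi_ms: float) -> float:
    """Convert an inter-pulse interval in milliseconds to a call rate in calls/s."""
    if ipi_ms <= 0:
        raise ValueError("IPI must be positive")
    return 1000.0 / ipi_ms


# IPI values quoted in the response (labels are illustrative only).
PHASE_IPI_MS = {"search": 100.0, "approach_end": 35.0, "buzz": 5.0}

for phase, ipi in PHASE_IPI_MS.items():
    print(f"{phase}: IPI {ipi:g} ms -> {calls_per_second(ipi):.1f} calls/s")
```

      A 35 msec interval works out to roughly 28.6 calls per second, consistent with the approximate value of 28 quoted in the response.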

      Reviewer #2 (Public review):

      We are grateful for the reviewer’s insightful feedback, which has helped us clarify key aspects of our research and strengthen our conclusions.

      This manuscript describes a detailed model of bats flying together through a fixed geometry. The model considers elements that are faithful to both bat biosonar production and reception and the acoustics governing how sound moves in the air and interacts with obstacles. The model also incorporates behavioral patterns observed in bats, like one-dimensional feature following and temporal integration of cognitive maps. From a simulation study of the model and comparison of the results with the literature, the authors gain insight into how often bats may experience destructive interference of their acoustic signals and those of their peers, and how much such interference may actually negatively affect the groups' ability to navigate effectively. The authors use generalized linear models to test the significance of the effects they observe.

      In terms of its strengths, the work relies on a thoughtful and detailed model that faithfully incorporates salient features, such as acoustic elements like the filter for a biological receiver and temporal aggregation as a kind of memory in the system. At the same time, the authors abstract away features that would complicate the model without being expected to give additional insights, as can be seen in the choice of a two-dimensional rather than three-dimensional system. I thought that the level of abstraction in the model was perfect, enough to demonstrate their results without needless details. The results are compelling and interesting, and the authors do a great job discussing them in the context of the biological literature.

      The most notable weakness I found in this work was that some aspects of the model were not entirely clear to me. 

      For example, the directionality of the bat's sonar call in relation to its velocity. Are these the same?

      For simplicity, in our model, the head is aligned with the body, therefore the direction of the echolocation beam is the same as the direction of the flight. 

      Moreover, call directionality (directivity) is not directly influenced by velocity. Instead, directionality is estimated using the piston model, as described in the Methods section. The directionality is based on the emission frequency and is thus primarily linked to the behavioral phases of the bat, with frequency shifts occurring as the bat transitions from search to approach to buzz phases. During the approach phase, the bat emits calls with higher frequencies, resulting in increased directionality. This is supported by the literature (Jakobsen and Surlykke, 2010; Jakobsen, Brinkløv and Surlykke, 2013). This phase is also associated with a natural reduction in flight speed, which is a well-documented behavioral adaptation in echolocating bats (Jakobsen et al., 2024).

      To clarify this in the manuscript, we have updated the text to explicitly state that directionality follows phase-dependent frequency changes rather than being a direct function of velocity, see lines 543-545. 
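
      The frequency dependence described above can be illustrated with the textbook circular-piston directivity, D(θ) = |2 J1(ka sin θ) / (ka sin θ)| with k = 2πf/c. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the 5 mm aperture radius, the 30 and 60 kHz frequencies, the speed of sound, and all function names are hypothetical values chosen only for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air (assumed value)


def bessel_j1(x: float, steps: int = 2000) -> float:
    """J1(x) from Bessel's integral (1/pi) * int_0^pi cos(t - x sin t) dt, midpoint rule."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.cos(t - x * math.sin(t))
    return total * h / math.pi


def piston_directivity(freq_hz: float, angle_rad: float, radius_m: float) -> float:
    """Relative pressure amplitude of a circular piston at an off-axis angle."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    x = k * radius_m * math.sin(angle_rad)
    if abs(x) < 1e-9:
        return 1.0  # on-axis limit of 2*J1(x)/x
    return abs(2.0 * bessel_j1(x) / x)


RADIUS_M = 0.005  # hypothetical emitter radius
off_axis = math.radians(30.0)
for f in (30e3, 60e3):
    print(f"{f / 1e3:.0f} kHz: D(30 deg) = {piston_directivity(f, off_axis, RADIUS_M):.2f}")
```

      With these assumed values, the 60 kHz call is markedly weaker 30° off-axis than the 30 kHz call, which is the sense in which a higher emission frequency gives a more directional beam.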

      If so, what is the difference between phi_target and phi_tx in the model equations? 

      $\phi_{\mathrm{target}}$ is the angle between the bat and the reflecting object (target).

      $\phi_{Tx}$ is the angle [rad] between the masking bat and the target, from the transmitter’s perspective.

      $\phi_{TxRx}$ is the angle between the transmitting conspecific and the receiving focal bat, from the transmitter’s point of view.

      $\phi_{RxTx}$ is the angle between the receiving bat and the transmitting bat, from the receiver’s point of view.

      These definitions have been explicitly stated in the revised manuscript to prevent any ambiguity (lines 525-530). Additionally, a Supplementary figure demonstrating the geometrical relations has been added to the manuscript.

      What is a bat's response to colliding with a conspecific (rather than a wall)? 

      In nature, minor collisions between bats are common and typically do not result in significant disruptions to flight (Boerma et al., 2019; Roy et al., 2019; Goldshtein et al., 2025). Given this, our model does not explicitly simulate the physical impact of a collision event. Instead, during the collision event the bat keeps decreasing its velocity and changing its flight direction until the distance between bats is above the threshold (0.4 m). We assume that the primary cost of such interactions arises from the effort required to avoid collisions, rather than from the collision itself. This assumption aligns with observations of bat behavior in dense flight environments, where individuals prioritize collision avoidance rather than modeling post-collision dynamics. See lines 479-484.

      From the statistical side, it was not clear if replicate simulations were performed. If they were, which I believe is the right way due to stochasticity in the model, how many replicates were used, and are the standard errors referred to throughout the paper between individuals in the same simulation or between independent simulations, or both? 

      The number of repetitions for each scenario is detailed in Table 1, and we have now also stated it in a more prominent location in the text for clarity. Specifically, we now state (lines 110-111):

      "The number of repetitions for each scenario was as follows: 1 bat: 240; 2 bats: 120; 5 bats: 48; 10 bats: 24; 20 bats: 12; 40 bats: 12; 100 bats: 6."

      Regarding the reported standard errors, they are calculated across all individuals within each scenario, without distinguishing between different simulation trials. 

      We have clarified this in the revised text (lines 627-628 in Statistical Analysis).
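
      To make the pooling concrete, the sketch below multiplies the quoted repetitions by the number of bats per run to give the number of individuals behind each standard error, and computes a standard error over all individuals regardless of run. The repetition counts come from the response above; the function names are hypothetical, and this is not the authors' analysis code.

```python
import math
import statistics

# Bats per run -> number of repetitions, as quoted in the response.
REPETITIONS = {1: 240, 2: 120, 5: 48, 10: 24, 20: 12, 40: 12, 100: 6}

def pooled_individuals(bats_per_run):
    """Total simulated individuals in a scenario (bats per run x repetitions)."""
    return bats_per_run * REPETITIONS[bats_per_run]

def pooled_standard_error(values):
    """Standard error across all individuals, ignoring which run they came from."""
    return statistics.stdev(values) / math.sqrt(len(values))

for n_bats in REPETITIONS:
    print(f"{n_bats} bats per run: {pooled_individuals(n_bats)} individuals")
```

      With these counts, every scenario up to 20 bats pools 240 individuals, while the 40-bat and 100-bat scenarios pool 480 and 600.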

      Overall, I found these weaknesses to be superficial and easily remedied by the authors. The authors presented well-reasoned arguments that were supported by their results, and which were used to demonstrate how call interference impacts the collective's roost exit as measured by several variables. As the authors highlight, I think this work is valuable to individuals interested in bat biology and behavior, as well as to applications in engineered multi-agent systems like robotic swarms.

      Reviewer #3 (Public review):

      We sincerely appreciate the reviewer’s thoughtful comments and the time invested in evaluating our work, which have greatly contributed to refining our study.

      We would like to note that, in general, our model often simplifies some of the bats’ abilities, under the assumption that if the simulated bats manage to perform this difficult task with simpler mechanisms, real, better-adapted bats will probably perform even better. This line of reasoning recurs in several of the responses below.

      Summary:

      The authors describe a model to mimic bat echolocation behavior and flight under high-density conditions and conclude that the problem of acoustic jamming is less severe than previously thought, conflating the success of their simulations (as described in the manuscript) with hard evidence for what real bats are actually doing. The authors base their model on two species of bats that fly at "high densities" (defined by the authors as colony sizes from tens to tens of thousands of individuals and densities of up to 33.3 bats/m2), Pipistrellus kuhli and Rhinopoma microphyllum. This work fits into the broader discussion of bat sensorimotor strategies during collective flight, and simulations are important to try to understand bat behavior, especially given a lack of empirical data. However, I have major concerns about the assumptions of the parameters used for the simulation, which significantly impact both the results of the simulation and the conclusions that can be made from the data. These details are elaborated upon below, along with key recommendations the authors should consider to guide the refinement of the model.

      Strengths:

      This paper carries out a simulation of bat behavior in dense swarms as a way to explain how jamming does not pose a problem in dense groups. Simulations are important when we lack empirical data. The simulation aims to model two different species with different echolocation signals, which is very important when trying to model echolocation behavior. The analyses are fairly systematic in testing all ranges of parameters used and discussing the differential results.

      Weaknesses:

      The justification for how the different foraging phase call types were chosen for different object detection distances in the simulation is unclear. Do these distances match those recorded from empirical studies, and if so, are they identical for both species used in the simulation? 

      The distances at which bats transition between echolocation phases are identical for both species in our model (see Table 2). These distances are based on well-documented empirical studies of bat hunting and obstacle avoidance behavior (Griffin, Webster and Michael, 1958; Simmons and Kick, 1983; Schnitzler et al., 1987; Kalko, 1995; Hiryu et al., 2008; Vanderelst and Peremans, 2018). These references provide extensive evidence that insectivorous bats systematically adjust their echolocation calls in response to object proximity, following the characteristic phases of search, approach, and buzz.

      To improve clarity, we have updated the text to explicitly state that the phase transition distances are empirically grounded and apply equally to both modeled species (lines 499-508).

      What reasoning do the authors have for a bat using the same call characteristics to detect a cave wall as they would for detecting a small insect? 

      In echolocating bats, call parameters are primarily shaped by the target distance and echo strength. Accordingly, there is little difference in call structure between prey capture and obstacle-related maneuvers, aside from intensity adjustments based on target strength (Hagino et al., 2007; Hiryu et al., 2008; Surlykke, Ghose and Moss, 2009; Kothari et al., 2014). In our study, due to the dense cave environment, the bats are found to operate in the approach phase most of the time, which is consistent with natural cave emergence, where they are navigating through a cluttered environment rather than engaging in open-space search. For one of the species (Rhinopoma), we also have empirical recordings of individuals flying under similar conditions (Goldshtein et al., 2025). Our model was designed to remain as simple as possible while relying on conservative assumptions that may underestimate bat performance. If, in reality, bats fine-tune their echolocation calls even earlier or more precisely during navigation than assumed, our model would still conservatively reflect their actual capabilities. See lines 500-508.

      The two species modeled have different calls. In particular, the bandwidth varies by a factor of 10, meaning the species' sonars will have different spatial resolutions. Range resolution is about 10x better for PK compared to RM, but the authors appear to use the same thresholds for "correct detection" for both, which doesn't seem appropriate.

      The detection process in our model is based on Saillant’s method using a filterbank, as detailed in the paper (Saillant et al., 1993; Neretti et al., 2003; Sanderson et al., 2003). This approach inherently incorporates the advantages of a wider bandwidth, meaning that the differences in range resolution between the species are already accounted for within the signal-processing framework. Thus, there is no need to explicitly adjust the model parameters for bandwidth variations, as these effects emerge from the applied method.

      Also, the authors did not mention incorporating/correcting for/exploiting Doppler, which leads me to assume they did not model it.

      The reviewer is correct. To maintain model simplicity, we did not incorporate the Doppler effect or its impact on echolocation. The exclusion of Doppler effects was based on the assumption that while Doppler shifts can influence frequency perception, their impact on jamming and overall navigation performance is minor within the modelled context.

      The maximal Doppler shifts expected for the bats in this scenario are approximately 1 kHz. These shifts would be applied variably across signals due to the semi-random relative velocities between bats, leading to a mixed effect on frequency changes. This variability would likely result in an overall reduction in jamming rather than exacerbating it, aligning with our previous statement that our model may overestimate the severity of acoustic interference. Such Doppler shifts would result in errors of 2-4 cm in localization (i.e., 200-400 microseconds) (Boonman, Parsons and Jones, 2003).

      We have now explicitly highlighted this in the revised version (see lines 548-581).
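
      As a rough order-of-magnitude check on the Doppler estimate, the first-order two-way shift of an echo is about 2·v·f/c. The sketch below is a back-of-the-envelope illustration only: the 35 kHz call frequency, the 5 m/s closing speed, and the 343 m/s speed of sound are assumed values, not parameters taken from the manuscript.

```python
SPEED_OF_SOUND = 343.0  # m/s in air (assumed value)

def echo_doppler_shift_hz(call_freq_hz, closing_speed_ms):
    """First-order two-way Doppler shift of an echo: 2 * v * f / c."""
    return 2.0 * closing_speed_ms * call_freq_hz / SPEED_OF_SOUND

# Illustrative (assumed) values: a 35 kHz call and a 5 m/s closing speed.
shift = echo_doppler_shift_hz(35_000, 5.0)
print(f"Doppler shift: {shift:.0f} Hz")
```

      Under these assumptions the shift comes out on the order of 1 kHz, the same order of magnitude as the figure quoted in the response.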

      The success of the simulation may very well be due to variation in the calls of the bats, which ironically enough demonstrates the importance of a jamming avoidance response in dense flight. This explains why the performance of the simulation falls when bats are not able to distinguish their own echoes from other signals. For example, in Figure C2, there are calls that are labeled as conspecific calls and have markedly shorter durations and wider bandwidths than others. These three phases for call types used by the authors may be responsible for some (or most) of the performance of the model since the correlation between different call types is unlikely to exceed the detection threshold. But it turns out this variation in and of itself is what a jamming avoidance response may consist of. So, in essence, the authors are incorporating a jamming avoidance response into their simulation. 

      We fully agree that the natural variations in call design between the phases contribute significantly to interference reduction (see our discussion in a previous paper in Mazar & Yovel, 2020). However, we emphasize that this cannot be classified as a Jamming Avoidance Response (JAR). In our model, bats respond only to the physical presence of objects and not to the acoustic environment or interference itself. There is no active or adaptive adjustment of call design to minimize jamming beyond the natural phase-dependent variations in call structure. Therefore, while variation in call types does inherently reduce interference, this effect emerges passively from the modeled behavior rather than as an intentional strategy to avoid jamming. 

      The authors claim that integration over multiple pings (though I was not able to determine the specifics of this integration algorithm) reduces the masking problem. Indeed, it should: if you have two chances at detection, you've effectively increased your SNR by 3dB.  

      The reviewer is correct. Indeed, integration over multiple calls improves the signal-to-noise ratio (SNR), effectively increasing it by approximately 3 dB per doubling of observations. The specifics of the integration algorithm are detailed in the Methods section, where we describe how sensory information is aggregated across multiple time steps to enhance detection reliability.
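
      The "3 dB per doubling" figure follows from the idealized gain of 10·log10(N) obtained when N observations carry a consistent signal and independent noise. The sketch below only evaluates that formula; it is not the integration algorithm from the Methods, and the function name is hypothetical.

```python
import math

def integration_gain_db(n_calls):
    """Idealized SNR gain from combining n_calls observations with independent noise."""
    return 10.0 * math.log10(n_calls)

for n in (1, 2, 4, 8, 10):
    print(f"{n} calls: +{integration_gain_db(n):.1f} dB")
```

      Each doubling of the number of calls adds about 3 dB in this idealized case; correlated interference would yield a smaller gain.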

      They also claim - although it is almost an afterthought - that integration dramatically reduces the degradation caused by false echoes. This also makes sense: from one ping to the next, the bat's own echo delays will correlate extremely well with the bat's flight path. Echo delays due to conspecifics will jump around kind of randomly. However, the main concern is regarding the time interval and number of pings of the integration, especially in the context of the bat's flight speed. The authors say that a 1s integration interval (5-10 pings) dramatically reduces jamming probability and echo confusion. This number of pings isn't very high, and it occurs over a time interval during which the bat has moved 5-10m. This distance is large compared to the 0.4m distance-to-obstacle that triggers an evasive maneuver from the bat, so integration should produce a latency in navigation that significantly hinders the ability to avoid obstacles. Can the authors provide statistics that describe this latency, and discussion about why it doesn't seem to be a problem? 

      As described in the Methods section, the bat’s collision avoidance response does not solely rely on the integration process. Instead, the model incorporates real-time echoes from the last calls, which are used independently of the integration process for immediate obstacle avoidance maneuvers. This ensures that bats can react to nearby obstacles without being hindered by the integration latency. The slower integration, on the other hand, is used for clustering, outlier removal, and estimation of wall directions to support the pathfinding process, as illustrated in Supplementary Figure 1.

      Additionally, our model assumes that bats store the physical positions of echoes in an allocentric coordinate system (x-y). The integration occurs after transforming these detections from a local relative reference frame to a global spatial representation. This allows for stable environmental mapping while maintaining responsiveness to immediate changes in the bat’s surroundings.

      See lines 600-616 in the revised version.
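
      The two steps described above (transforming detections into an allocentric x-y frame, then integrating with outlier removal) can be sketched as follows. This is a simplified illustration under stated assumptions: the median-distance rejection rule, the 0.5 m threshold, and all function names are hypothetical and stand in for the clustering procedure described in the Methods.

```python
import math
import statistics

def to_allocentric(bat_x, bat_y, heading_rad, echo_range_m, echo_bearing_rad):
    """Convert an egocentric (range, bearing) detection into global x-y coordinates."""
    angle = heading_rad + echo_bearing_rad
    return (bat_x + echo_range_m * math.cos(angle),
            bat_y + echo_range_m * math.sin(angle))

def reject_outliers(points, max_dist_m=0.5):
    """Keep points within max_dist_m of the component-wise median (assumed rule)."""
    mx = statistics.median(p[0] for p in points)
    my = statistics.median(p[1] for p in points)
    return [p for p in points if math.hypot(p[0] - mx, p[1] - my) <= max_dist_m]

# A bat flying along +x detects the same wall point on three successive calls,
# plus one spurious detection (e.g. a misassigned conspecific echo).
detections = [
    to_allocentric(0.0, 0.0, 0.0, 3.0, 0.0),
    to_allocentric(0.5, 0.0, 0.0, 2.5, 0.0),
    to_allocentric(1.0, 0.0, 0.0, 2.0, 0.0),
    to_allocentric(1.0, 0.0, 0.0, 1.0, math.pi / 2),  # spurious
]
print(reject_outliers(detections))
```

      Because a real reflector maps to the same global position from call to call while spurious echoes do not, the stable detections survive the filter and the inconsistent one is dropped.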

      The authors are using a 2D simulation, but this very much simplifies the challenge of a 3D navigation task, and there is no explanation as to why this is appropriate. Bat densities and bat behavior are discussed per unit area when realistically it should be per unit volume. In fact, the authors reference studies to justify the densities used in the simulation, but these studies were done in a 3D world. If the authors have justification for why it is realistic to model a 3D world in a 2D simulation, I encourage them to provide references justifying this approach. 

      We acknowledge that this is a simplification; however, from an echolocation perspective, a 2D framework represents a worst-case scenario in terms of bat densities and maneuverability:

      • Higher Effective Density: A 2D model forces all bats into a single plane rather than distributing them through a 3D volume, increasing the likelihood of overlap in calls and echoes and making jamming more severe. As described in the text, the average distance to the nearest bat in our simulation is 0.27 m (with 100 bats), whereas reported distances in very dense colonies are 0.5 m (Fujioka et al., 2021), as observed in Myotis grisescens (Sabol and Hudson, 1995) and Tadarida brasiliensis (Theriault et al., no date; Betke et al., 2008; Gillam et al., 2010).

      • Reduced Maneuverability: In 3D space, bats can use vertical movement to avoid obstacles and conspecifics. A 2D constraint eliminates this degree of freedom, increasing collision risk and limiting escape options.

      Thus, our 2D model provides a conservative, difficult test case, ensuring that our findings are valid under conditions where jamming and collision risks are maximized. Additionally, the 2D framework is computationally efficient, allowing us to perform multiple simulation runs to explore a broad parameter space and systematically test the impact of different variables.

      To address the reviewer’s concern, we have clarified this justification in the revised text and will provide supporting references where applicable (see Methods, lines 450-455).

      The focus on "masking" (which appears to be just in-band noise), especially relative to the problem of misassigned echoes, is concerning. If the bat calls are all the same waveform (downsweep linear FM of some duration, I assume - it's not clear from the text), false echoes would be a major problem. Masking, as the authors define it, just reduces SNR. This reduction is something like sqrt(N), where N is the number of conspecifics whose echoes are audible to the bat, so this allows the detection threshold to be set lower, increasing the probability that a bat's echo will exceed a detection threshold. False echoes present a very different problem. They do not reduce SNR per se, but rather they cause spurious threshold excursions (N of them!) that the bat cannot help but interpret as obstacle detection. I would argue that in dense groups the mis-assignment problem is much more important than the SNR problem. 

      There is substantial literature supporting the assumption that bats can recognize their own echoes and distinguish them from conspecific signals (Schnitzler, Bioscience and 2001, no date; Kazial, Burnett and Masters, 2001; Burnett and Masters, 2002; Kazial, Kenny and Burnett, 2008; Chili, Xian and Moss, 2009; Yovel et al., 2009; Beetz and Hechavarría, 2022). However, we acknowledge that false echoes may present a major challenge in dense groups. To address this, we explicitly tested the impact of the self-echo identification assumption in our study; see Results Figure 1: The impact of confusion on performance, and lines 399-404 in the Discussion.

      Furthermore, we examined a full confusion scenario, where all reflected echoes from conspecifics were misinterpreted as obstacle reflections (i.e., 100% confusion). Our results show that this significantly degrades navigation performance, supporting the argument that echo misassignment is a critical issue. However, we also explored a simple mitigation strategy based on temporal integration with outlier rejection, which provided some improvement in performance. This suggests that real bats may possess additional mechanisms to enhance self-echo identification and reduce false detections. See lines 411-420 in the manuscript for further discussion. 

      We actually used logarithmically frequency modulated (FM) chirps, generated using the MATLAB built-in function chirp(t, f0, t1, f1, 'logarithmic'). This method aligns with the nonlinear FM characteristics of Pipistrellus kuhlii (PK) and Rhinopoma microphyllum (RM) and provides a realistic approximation of their echolocation signals. We acknowledge that this was not sufficiently emphasized in the original text, and we have now explicitly highlighted this in the revised version to ensure clarity (see Lines 509-512 in Methods).
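
      For readers without MATLAB, the logarithmic sweep law f(t) = f0·(f1/f0)^(t/t1) can be reproduced in a few lines. The sketch below is an illustrative reimplementation, not the authors' code; the sweep limits (80 to 40 kHz), the 3 ms duration, and the 250 kHz sample rate are assumed example values, and the function names are hypothetical.

```python
import math

def instantaneous_freq(f0_hz, f1_hz, duration_s, t):
    """Logarithmic sweep law: f(t) = f0 * (f1 / f0) ** (t / duration)."""
    return f0_hz * (f1_hz / f0_hz) ** (t / duration_s)

def log_chirp(f0_hz, f1_hz, duration_s, sample_rate_hz):
    """Sample a logarithmic FM chirp (requires f0 != f1, both positive)."""
    ratio = f1_hz / f0_hz
    n_samples = int(round(duration_s * sample_rate_hz))
    scale = 2 * math.pi * f0_hz * duration_s / math.log(ratio)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        # Phase is the time integral of 2*pi*f(t) from 0 to t.
        phase = scale * (ratio ** (t / duration_s) - 1)
        samples.append(math.cos(phase))
    return samples

# Assumed example: a downward sweep from 80 kHz to 40 kHz over 3 ms.
call = log_chirp(80e3, 40e3, 0.003, 250e3)
print(len(call), instantaneous_freq(80e3, 40e3, 0.003, 0.0015))
```

      In a logarithmic sweep the frequency changes by a constant ratio per unit time, so the sweep midpoint sits at the geometric mean of the start and end frequencies rather than the arithmetic mean.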

      The criteria set for flight behavior (lines 393-406) are not justified with any empirical evidence of the flight behavior of wild bats in collective flight. How did the authors determine the avoidance distances? Also, what is the justification for the time limit of 15 seconds to emerge from the opening? Instead of an exit probability, why not instead use a time criterion, similar to "How long does it take X% of bats to exit?"

      While we acknowledge that wild bats may employ more complex behaviors for collision avoidance, we chose to implement a simplified decision-making rule in our model to maintain computational tractability.

      The avoidance distances (1.5 m from walls and 0.4 m from other bats) were selected as internal parameters to support stable and realistic flight trajectories while maintaining a reasonable collision rate. These values reflect a trade-off between maneuverability and behavioral coherence under crowding. To address this point, we added a sensitivity analysis to the revised manuscript. Specifically, we tested the effect of varying the conspecific avoidance distance from 0.2 to 1.6 meters at bat densities of 2 to 40 bats/3m². The only statistically significant impact was at the highest density (40 bats/3m²), where exit probability increased slightly from 82% to 88% (p = 0.024, t = 2.25, DF = 958). No significant changes were observed in exit time, collision rate, or jamming probability across other densities or conditions (GLM, see revised Methods). These results suggest that the selected avoidance distances are robust and not a major driver of model performance, see lines 469-47.

      The 15-second exit limit was determined as described in the text (Lines 489-491): “A 15-second window was chosen because it is approximately twice the average exit time for 40 bats and allows for a second corrective maneuver if needed.” In other words, it allowed each bat to circle the ‘cave’ twice to exit even in the most crowded environment. This threshold was set to keep simulation time reasonable while allowing sufficient time for most bats to exit successfully.

      We acknowledge that the alternative approach suggested by the reviewer, measuring the time taken for a certain percentage of bats to exit, is also valid. However, in our model, some outlier bats fail to exit and continue flying for many minutes. Such simulations would lead to excessive simulation times, making it difficult to generate repetitions, without teaching us much: these cases usually resulted from the bat slightly missing the opening (see video S1). Our chosen approach ensures practical runtime constraints while still capturing relevant performance metrics.

      What is the empirical justification for the 1-10 calls used for integration?  

      The "average exit time for 40 bats" is also confusing and not well explained. Was this determined empirically? From the simulation? If the latter, what are the conditions?

      Does it include masking, no masking, or which species? 

      Previous studies have demonstrated that bats integrate acoustic information received sequentially over several echolocation calls (2-15), effectively constructing an auditory scene in complex environments (Ulanovsky and Moss, 2008; Chili, Xian and Moss, 2009; Moss and Surlykke, 2010; Yovel and Ulanovsky, 2017; Salles, Diebold and Moss, 2020). Additionally, bats are known to produce echolocation sound groups when spatiotemporal localization demands are high (Kothari et al., 2014). Studies have documented call sequences ranging from 2 to 15 grouped calls (Moss and Surlykke, 2010), and it has been hypothesized that grouping facilitates echo segregation.

      We did not use a single integration window; we tested integration sizes between 1 and 10 calls and presented the results in Figure 3A. This range was chosen based on prior empirical findings and to explore how different levels of temporal aggregation impact navigation performance. Indeed, the results showed that performance levels off for integration windows of 5-10 calls (Figure 3A).

      Regarding the average exit time for 40 bats, this value was determined from our simulations, where it represents the mean time for successful exits under standard conditions with masking. We have revised the text to clarify these details; see lines 489-491.

      Reviewer #1 (Recommendations for the authors):

      (1) Data Availability:

      As it stands now, this reviewer cannot vouch for the uploaded code as it wasn't accessible according to F.A.I.R principles. The link to the code/data points to a private company's file-hosting account that requires logging in or account creation to see its contents, and thus cannot be accessed.

      This reviewer urges the authors to consider uploading the code onto an academic data repository from the many on offer (e.g. Dryad, Zenodo, OSF). Some repositories offer an option to share a private link (e.g. Zenodo) to the folder that can then be shared only with reviewers so it is not completely public.

      This is a computational paper, and the credibility of the results is based on the code used to generate them.

      The code is available at GitHub as required:

      https://github.com/omermazar/Colony-Exit-Bat-Simulation

      (2) Abstract:

      Line 22: 'To explore whether..' - replace 'whether' with 'how'?

      The sentence was rephrased as suggested by the reviewer.

      (3) Main text:

      Line 43: '...which may share...' - correct to '...which share...', as elegantly framed in the authors' previous work - jamming avoidance is unavoidable because all FM bats of a species still share >90% of spectral bandwidth despite a few kHz shift here and there.

      The sentence was rephrased as suggested by the reviewer.

      Line 49: The authors may wish to additionally cite the work of Fawcett et al. 2015 (J. Comp. Phys A & Biology Open)

      Thank you for the suggestion. We have included a citation to the work of Fawcett et al. (2015) in the revised manuscript.

      Line 61: This statement does not match the recent state of the literature. While the previous models may have assumed that all neighbours can be detected, there are models that specifically study the role of limited interaction arising from the potential inability to track all neighbours, and the effect of responding to only one/few neighbours at a time e.g. Bode et al. 2011 R. Soc. Interface, Jhawar et al. 2020 Nature Physics.

      We have added citations to the important studies suggested by the reviewer, as detailed in the Public Review above.

      Line 89: '..took all interference signals into account...' - what is meant by 'interference signals' - are the authors referring to reflections, unclear.

      We have revised the sentence and detailed the acoustic signals involved in the process: self-generated echoes, calls from conspecifics, and echoes from cave walls and other bats evoked by those calls, see lines 99-106.

      Figure 1A: The colour scheme with overlapping points makes the figure very hard to understand what is happening. The legend has colours from subfigures B-D, adding to the confusion.

      What does the yellow colour represent? This is not clear. Also, in general, the color schemes in the simulation trajectories and the legend are not the same, creating some amount of confusion for the reader. It would be good to make the colour schemes consistent and visually separable (e.g. consp. call direct is very similar to consp. echo from consp. call), and perhaps also if possible add a higher resolution simulation visualisation. Maybe it is best to separate out the colour legends for each sub-figure.

      The updated figure now includes clearer, more visually separable colors, and consistent color coding across all sub-panels. The yellow trajectory representing the focal bat’s flight path is now explicitly labeled, and we adjusted the color mapping of acoustic signals (e.g., conspecific calls vs. echoes) to improve distinction. We also revised the figure caption accordingly and ensured that the legend is aligned with the updated visuals. These modifications aim to enhance interpretability and reduce ambiguity for the reader.

      Figure C3: What is 'FB Channel', this is not explained in the legend.

      ‘FB Channel’ stands for ‘Filter Bank Channel’. This clarification has been added to the caption of Figure 1.

      Figure 3: Visually noticing that the colour legend is placed only on sub-figure A is tricky and readers may be left searching for the colour legend. Maybe lay out the legend horizontally on top of the entire figure, so it stands out?

      We have adjusted the placement of the color legend in Figure 3 to improve visibility and consistency.

      Line 141: '..the probability of exiting..' - how is this probability calculated - not clear.

      We have clarified in the revised text that the probability of exiting the cave within 15 seconds is defined as the number of bats that exited the cave within that time divided by the total number of bats in each scenario; see lines 159-160.
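
      The definition above translates directly into a ratio. The sketch below assumes a hypothetical data layout (one exit time per bat, with None for bats that never exited) and is meant only to pin down the definition, not to reproduce the authors' code.

```python
EXIT_WINDOW_S = 15.0  # exit time limit stated in the response

def exit_probability(exit_times_s):
    """Fraction of bats that exited within the window.

    exit_times_s holds one entry per bat; None marks a bat that never exited
    (a hypothetical convention for this sketch).
    """
    exited = sum(1 for t in exit_times_s
                 if t is not None and t <= EXIT_WINDOW_S)
    return exited / len(exit_times_s)

# Five bats: three exit in time, one exits late, one never exits.
print(exit_probability([6.2, 9.8, None, 14.9, 17.3]))
```

      Bats that exit after the window and bats that never exit both count toward the denominator only.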

      Line 142: What are the sample sizes here - i.e. how many simulation replicates were performed?

      We have clarified the number of repetitions in each scenario in the revised text, as detailed in the Public Review above.

      Line 151: 'The jamming probability,...number of jammed echoes divided by the total number of reflected echoes' - it seems like these are referring to 'own' echoes or first-order reflections, it is important to clarify this.

      The reviewer is right. We have clarified it in the revised text; see lines 173-175.

      Line 153: '..with a maximum difference of ...' - how is this difference calculated? What two quantities are being compared - not clear.

      We have revised the text to clarify that the 14.3% value reflects the maximum difference in jamming probability between the RM and PK models, which occurred at a density of 10 bats. The values at each density are shown in Figure 2D, see lines 175-177.

      Line 221: '..temporal aggregation helps..' - I'm assuming the authors meant temporal integration? However, I would caution against using the exact term 'temporal integration' as it is used in the field of audition to mean something different. Perhaps something like 'sensory integration' , or 'multi-call integration'

      To avoid ambiguity and better reflect the process modeled in our work, we have replaced the term "temporal aggregation" with "multi-call integration" throughout the revised manuscript. This term more accurately conveys the idea of combining information from multiple echolocation calls without conflicting with existing terminology.

      (4) Discussion

      Lines 302: 'Our model suggests...increasing the call-rate..' - not clear where this is explicitly tested or referred to in this manuscript. Can't see what was done to measure/quantify the effect of this variable in the Methods or anywhere else.

      We have rephrased this paragraph as detailed in the Public Review above, see lines 346-349.

      Line 319: 'spatial interference' - unclear what this means. This reviewer would strongly caution against creating new terms unless there is an absolute need for it. What is meant by 'interference' in this paper is hard to assess given that the word seems to be used as a synonym for jamming and also for actual physical wave-based interference.

      We have rephrased this paragraph as detailed in the Public Review above; see lines 119-120 and 366-367.

      Line 323: '..no benefit beyond a certain level...' - also not clear where this is explicitly tested. It seems like there was a set of simulations run for a variety of parameters but this is not written anywhere explicitly. What type of parameter search was done, was it all possible parameter combinations - or only a subset? This is not clear.

      We have rephrased this paragraph as detailed in the Public Review above, see lines 372-375.

      Line 324: '..ca. 110 dB-SPL.' - what reference distance?

      All call levels were simulated and reported in dB-SPL, referenced at 0.1 meters from the emitting bat. We have clarified it in the revised text in the relevant contexts and specifically in line 529.
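
      Because echolocation studies quote levels at either 0.1 m or 1 m, it can help to convert between the two conventions. The sketch below assumes spherical spreading only (20·log10 of the distance ratio) and ignores atmospheric absorption; the function name is hypothetical.

```python
import math

def level_at_distance_db(level_db, ref_distance_m, new_distance_m):
    """Re-reference a sound level, assuming spherical spreading and no absorption."""
    return level_db - 20.0 * math.log10(new_distance_m / ref_distance_m)

# 110 dB-SPL referenced at 0.1 m, expressed at the 1 m convention.
print(level_at_distance_db(110.0, 0.1, 1.0))
```

      Under this assumption, a level quoted at 0.1 m is 20 dB higher than the same call quoted at 1 m, which is why the reference distance matters when comparing values across studies.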

      (5) Methods

      Line 389 : '...over a 2 x 1.5 m2 area..' It took a while to understand this statement and put it in context. Since there is no previous description of the entire L-arena, the reviewer took it to mean the simulations happened over the space of a 2 x 1.5 m2 area. Include a top-down description of the simulation's spatial setup and rephrase this sentence.

      To address the confusion, we revised the text to clarify that the full simulation environment represents a corridor-shaped cave measuring 14.5 × 2.5 meters, with a right-angle turn located 5.5 meters before the exit, as shown in Figure 1A. The 2 × 1.5 m area refers specifically to the small zone at the far end of the cave where bats begin their flight. The revised description now includes a clearer spatial overview to prevent ambiguity, see lines 456-460.

      Line 398: Replace 'High proximity' with 'Close proximity'

      Replaced.

      Line 427: 'uniform target strength of -23 dB' - at what distance is this target strength defined? Given the reference distance can vary by echolocation convention (0.1 or 1 m), one can't assess if this is a reasonable value or not.

      The reference distance for the reported target strength is 1 meter, in line with standard acoustic conventions. We have revised the text to clarify this explicitly (line 531).

      Also, independent of the reference distance, particularly with reference to bats, the target strength is geometry-dependent, based on whether the wings are open or not. Using the entire wingspan of a bat to parametrise the target strength is an overestimate of the available reflective area. The effective reflective area is likely to be somewhere closer to the surface area of the body and a fraction of the wingspan together. This is important to note and/or mention explicitly since the value is not experimentally parametrised.

      For comparison, experimentally based measurements used in Goetze et al. 2016 are -40 dB (presumably at 1 m since the source level is also defined at 1 m?), and Beleyur & Goerlitz 2019 show a range between -43 to -34 dB at 1 m.

      We agree with the reviewer that target strength in bats is strongly influenced by their geometry, particularly wing posture during flight. In our model, we simplified this aspect by using a constant target strength, as the detailed temporal variation in body and wing geometry is pseudo-random and not explicitly modeled. We acknowledge that this is a simplification, and have now stated this limitation clearly in the revised manuscript. We chose a fixed value of –23 dB at 1 meter to reflect a plausible mid-range estimate, informed by anatomical data and consistent with values reported for similarly sized species (Beleyur and Goerlitz, 2019). To support this, we directly measured the target strength of a 3D-printed RM bat model, obtaining –32 dB.

      Moreover, a sensitivity analysis across a wide range (–49 to –23 dB) confirmed that performance metrics remain largely stable, indicating that our conclusions are not sensitive to this parameter, and suggesting that our results hold for different-sized bats. See lines 384-390, 533-538, and Supplementary Figures 3 and 4 in the revised article. 

      Line 434: 'To model the bat's cochlea...'. Bats have two cochleas. This model only describes one, while the agents are also endowed with the ability to detect sound direction - which requires two ears/cochleas.... There is missing information about the steps in between that needs to be provided.

      We appreciate the reviewer’s observation. Indeed, our model is monaural, and simulates detection using a single cochlear-like filter bank receiver. We have clarified this in the revised text to avoid confusion. This paragraph specifically describes the detection stage of the auditory processing pipeline. The localization process, which builds on detection and includes directional estimation, is described in the following paragraph (see line 583 onward), as discussed in the next comment and response.

      Line 457: 'After detection, the bat estimates the range and Direction of Arrival...' This paragraph describes the overall idea, but not the implementation. What were the inputs and outputs for the range and DOA calculation performed by the agent? Or was this information 'fed' in by the simulation framework? If there was no explicit DOA step that the agent performed, but it was assumed that agents can detect DOA, then this needs to be stated.

      In the current simulation, the Direction of Arrival (DOA) was not modeled via an explicit binaural processing mechanism. Instead, based on experimental studies (Simmons et al., 1983; Popper and Fay, 1995), we assumed that bats can estimate the direction of an echo with an angular error that depends on the signal-to-noise ratio (SNR). Accordingly, the inputs to the DOA estimation were the peak level of the desired echo, the noise level, and the level of acoustic interference. The output was an estimated direction of arrival that included a random angular error, drawn from a normal distribution whose standard deviation varied with the SNR. We have revised the relevant paragraph (lines 583-592) to clarify this implementation.
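
      The DOA step described above can be sketched as follows. This is only an illustration under stated assumptions, not the simulation's code: the function names, the way noise and interference are combined, and the mapping from SNR to angular standard deviation (`sigma_at_0db_deg`, `db_per_halving`, `floor_deg`) are hypothetical placeholders.

```python
import math
import random


def snr_db(echo_peak_db, noise_db, interference_db):
    """SNR of the desired echo against noise plus interference (powers summed).

    Summing the two as powers is an assumption made for this sketch.
    """
    masker_power = 10 ** (noise_db / 10) + 10 ** (interference_db / 10)
    return echo_peak_db - 10 * math.log10(masker_power)


def angular_sigma_deg(snr, sigma_at_0db_deg=10.0, db_per_halving=6.0, floor_deg=1.0):
    """Hypothetical mapping: the error std halves for every `db_per_halving` dB of SNR."""
    return max(floor_deg, sigma_at_0db_deg * 2 ** (-snr / db_per_halving))


def estimate_doa_deg(true_doa_deg, echo_peak_db, noise_db, interference_db, rng=random):
    """Return the true direction plus a zero-mean normal error whose std depends on SNR."""
    sigma = angular_sigma_deg(snr_db(echo_peak_db, noise_db, interference_db))
    return true_doa_deg + rng.gauss(0.0, sigma)
```

      The only property taken from the response is that the angular error is normally distributed with a standard deviation that shrinks as SNR grows; any monotonic mapping could replace the placeholder one used here.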

      Line 464: 'To evaluate the impact of the assumption...' - the 'self' and 'non-self' echoes can be distinguished perhaps using pragmatic time-delay cues, but also using spectro-temporal differences in individual calls/echoes. Do the agents have individual call structures, or do all the agents have the same call 'shape'? The echolocation parameters for the two modelled species are given, but whether there is call parameter variation implemented in the agents is not mentioned.

      In our relatively simple model, all individuals emit the same type of chirp call, with parameters adapted only based on the distance to the nearest detected object. However, individual variation is introduced by assigning each bat a terminal frequency drawn from a normal distribution with a standard deviation of 1 kHz, as described in the revised version (lines 519-520). This small variation is not used explicitly as a spectro-temporal cue for echo discrimination.

      In our model, all spectro-temporal variations—whether due to call structure or variations resulting from overlapping echoes from nearby reflectors—are processed through the filter bank, which compares the received echoes to the transmitted call during the detection stage. As such, the detection process itself can act as a discriminative filter, to some extent, based on similarity to the emitted call.

      We acknowledge that real bats likely rely on a variety of spectro-temporal features for distinguishing self from non-self-echoes—such as call duration, received level, multi-harmonic structure, or amplitude modulation. In our simulation, we focus on comparing two limiting conditions: full recognition of self-generated echoes versus full confusion. Implementing a more nuanced self-recognition mechanism based on temporal or spectral cues would be a valuable extension for future work.

      (6) References

      Reference 22: Formatting error - and extra '4' in the reference.

      The error has been fixed.

      (7) Thoughts/comments

      Even without 'recognition' of walls & conspecifics, bats may be able to avoid obstacles - this is a neat result. Also, using their framework the authors show that successful 'blind' object-agnostic obstacle avoidance can occur only when supported by some sort of memory. In some sense, this is a nice intermediate step showing the role of memory in bat navigation. We know that bats have good long-term and long-spatial scale memory, and here the authors show that short-term spatial memory is important in situations where immediate sensory information is unreliable or unavailable.

      We appreciate the reviewer’s thoughtful summary. Indeed, one of the main takeaways of our study is that successful obstacle avoidance can occur even without explicit recognition of walls or conspecifics—provided that a clustered multi-call integration is in place. Our model shows that when immediate sensory information is unreliable, integrating detections over time becomes essential for effective navigation. This supports the broader view that memory, even on short timescales, plays an important role in bat behavior.

      (8) Reporting GLM results

      The p-value, t-statistic, and degrees of freedom are reported consistently across multiple GLM results. However, the most important part which is the effect size is not consistently reported - and this needs to be included in all results, and even in the table. The effect size provides an indicator of the parameter's magnitude, and thus scientific context.

      We agree that the effect size provides essential scientific context. In fact, we already include the effect size explicitly in Table 1, as shown in the “Effect Size” column for each tested parameter. These values describe the magnitude of each parameter’s effect on exit probability, jamming probability, and collision rate. In the main text, effect sizes are presented as concrete changes in performance metrics (e.g., “exit probability increased from 20% to 87%,” or “with a decrease of 3.5%±8% to 5.5%±5% (mean ± s.e.)”), which we believe improves interpretability and scientific relevance.  

      To further clarify this in the main text, we have reviewed the reported results and ensured that effect sizes are mentioned more consistently wherever GLM outcomes are discussed. Additionally, we have added a brief note in the table caption to emphasize that effect sizes are provided for all tested parameters.

      The 'tStat' appears multiple times and seems to be the output of the MATLAB GLM function. This acronym is specific to the MATLAB implementation and needs to be replaced with a conventionally used acronym such as 't', or the full form 't-statistic' too. This step is to keep the results independent of the programming language used.

      We have replaced all instances of tStat with the more conventional term ‘t’ throughout the manuscript to maintain consistency with standard reporting practices.

      Reviewer #2 (Recommendations for the authors):

      In addition to my public review, I had a few minor points that the authors may want to consider when revising their paper.

      (1) Figures 2, 3, and 4 may benefit from using different marker styles, in addition to different colors, to show the different cases.

      Thank you for the suggestion. In Figures 2–4, the markers represent means with standard error bars. To maintain clarity and consistency across all conditions, we have chosen to keep a standardized marker style – and we clarify this in the legend. We found that varying only the colors is sufficient for distinguishing between conditions without introducing visual clutter.

      (2) The text "PK" in the inset for Figure 2A is very difficult to read. I would suggest using grey as with "RM" in the other inset.

      We have updated the inset in Figure 2A to improve legibility.

      (3) Are the error bars in Figure 3 very small? I wasn't able to see them. If that is the case, the authors may want to mention this in the caption.

      You are correct—the error bars are present in all plots but appear very small due to the large number of simulation repetitions and low variability. We have revised the caption to explicitly mention this.

      (4) The species name of PK is spelled inconsistently (kuhli, khulli, and kuhlii).

      We have corrected the species name throughout the manuscript.

      (5) Table 1 is a great condensation of all the results, but the time to exit is missing. It may be helpful if summary statistics on that were here as well.

      We have added time-to-exit to the effect size column in Table 1, alongside the other performance metrics, to provide a more complete summary of the simulation results.

      (6) I may have missed it, but why are there two values for the exit probability when nominal flight speed is varied?

      The exit probability was not monotonic with flight speed, but rather showed a parabolic trend with a clear optimum. Therefore, we reported two values representing the effect before and after the peak. We have clarified this in the revised table and updated the caption accordingly.

      (7) Table 2 has an extra header after the page break on page 18.

      The extra header in Table 2 after the page break has been removed in the revised manuscript.

      (8) The G functions have 2 arguments in their definitions and Equation 1, but only one argument in Equations 2 and 3. I wasn't able to see why.

      Thank you for pointing this out. You are correct—this was a typographical error. We have corrected the argument notation in Equations 2 and 3 and explicitly included the frequency dependence of the gain (G) functions in both equations.

      (9) D_txrx was not defined but it was used in Equation 2.

      The variable D_txrx is defined in the equation notation section as: D_txrx, the distance [m] between the transmitting conspecific and the receiving focal bat, from the transmitter’s perspective. We have now ensured that this definition is clearly linked to Equation 2 in the revised text. Moreover, we have added a supplementary figure that illustrates the geometric configuration defined by the equations to further support clarity, as described in the Public Review above.

      (10) It was hard for me to understand what was meant by phi_rx and phi_tx. These were described as angles between the rx or tx bats and the target, but I couldn't tell what the point defining the angle was. Perhaps a diagram would help, or more precise definitions.

      We have revised the caption to provide clearer and more precise definitions. Additionally, we have included a geometric diagram as a supplementary figure, as noted in the Public Review above, to visually clarify the spatial relationships and angle definitions used in the equations (see lines 498-499).

      (11) Was the hearing threshold the same for both species?

      Yes. We have clarified it in the revised version.

      (12) Collision avoidance is described as turning to the "opposite direction" in the supplemental figure explaining the model. Is this 90 degrees or 180 degrees? If 90 degrees, how do these turns decide between right and left?

      In our model, the bat does not perform a fixed 90° or 180° turn. Instead, the avoidance behavior is implemented by setting the maximum angular velocity in the direction opposite to the detected echo. For example, if the obstacle or conspecific is detected on the bat’s right side, the bat begins turning left, and vice versa.

      This turning direction is re-evaluated at each decision step, which occurs after every echolocation pulse. The bat continues turning in the same direction if the obstacle remains in front, otherwise it resumes regular pathfinding. We have clarified this behavior in the updated figure caption and model description, see lines 478-493.
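
      The avoidance rule described above can be sketched as follows. The sign convention, the function names, and the tie-break for an echo that is exactly straight ahead are assumptions made for illustration. The sketch also simplifies one point: it re-derives the turn direction from the latest echo at every step instead of tracking whether the bat keeps turning the same way.

```python
def avoidance_turn_rate(echo_bearing_deg, max_turn_rate_deg_s):
    """Turn at the maximum rate away from the side on which the echo was detected.

    Assumed convention: positive bearing = echo on the bat's right,
    positive turn rate = turning right.
    """
    if echo_bearing_deg >= 0:
        # Obstacle on the right (or straight ahead, an assumed tie-break): turn left.
        return -max_turn_rate_deg_s
    # Obstacle on the left: turn right.
    return max_turn_rate_deg_s


def decide_turn_rate(echo_bearing_deg, obstacle_ahead, pathfinding_rate_deg_s,
                     max_turn_rate_deg_s):
    """Decision made after every call: avoid while an obstacle is in front, else pathfind."""
    if obstacle_ahead:
        return avoidance_turn_rate(echo_bearing_deg, max_turn_rate_deg_s)
    return pathfinding_rate_deg_s
```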

      Reviewer #3 (Recommendations for the authors):

      (1) Lines 27-31: These sentences mischaracterize the results. This claim appears to equate "the model works" with "this is what bats actually do." Also, the model does not indicate that bats' echolocation strategies are robust enough to mitigate the effects of jamming - this is self-evident from the fact that bats navigate successfully via echolocation in dense groups.

      Thank you for the comment. Our aim was not to claim that the model confirms actual bat behavior, but rather to demonstrate that simple and biologically plausible strategies—such as signal redundancy and basic pathfinding—are sufficient to explain how bats might cope with acoustic interference in dense settings. We have revised the wording to better reflect this goal and to avoid overinterpreting the model's implications.

      See abstract in the revised version.  

      (2) Line 37: This number underestimates the number of bats that form some of the largest aggregations of individuals worldwide - the free-tailed bats can form aggregations exceeding several million bats.

      We have revised the text to reflect that some bat species, such as free-tailed bats, are known to form colonies of several million individuals, which exceed the typical range. The updated sentence accounts for these extreme cases, see lines 36-37.

      (3) The flight densities explained in the introduction and chosen references are not representative of the literature - without providing additional justification for the chosen species, it can be interpreted that the selection of the species for the simulation is somewhat arbitrary. If the goal is to model dense emergence flight, why not use a species that has been studied in terms of acoustic and flight behavior during dense emergence flights---such as Tadarida brasiliensis?

      Our goal was to develop a general model applicable to a broad class of FM-echolocating bat species. The two species we selected, Pipistrellus kuhlii (PK) and Rhinopoma microphyllum (RM), span a wide range of signal characteristics, from wideband (PK) to narrowband (RM), providing a representative contrast in call structure.

      Although we did not include Tadarida brasiliensis (TB) specifically, its echolocation calls are acoustically similar to RM in terminal frequency and fall between PK and RM in bandwidth. Therefore, we believe our findings are likely to generalize to TB and other FM-bats.

      Moreover, as noted in a previous response, the average inter-bat distance in our highest-density simulations (0.27 m) is still smaller than those reported for Tadarida brasiliensis during dense emergences—further supporting the relevance of our model to such scenarios.

      To support broader applicability, we also provide a supplementary graphical user interface (GUI) that allows users to modify key echolocation parameters and explore their impact on behavior—making the framework adaptable to additional species, including TB.

      (4) Line 78: It is not clear how (or even if) the simulated bats estimate the direction of obstacles. The explanation given in lines 457-463 is quite confusing. What is the acoustic/neurological mechanism that enables this direction estimation? If there is some mechanism (such as binaural processing), how does this extrapolate to 3D?

      This comment echoes a similar concern raised by a previous reviewer. As explained earlier, in the current simulation, the Direction of Arrival (DOA) was not modeled via an explicit binaural processing mechanism. The complete explanation is detailed in our response to Reviewer #1, Line 457. This implementation is now clarified in the revised text, and a detailed description of the localization process is also provided in the Methods section (lines 583-592).

      (5) The authors propose they are modeling the dynamic echolocation of bats in the simulation (line 79), but it appears (whether this is due to a lack of information in the manuscript or true lack in the simulation) that the authors only modeled a flight response. How did the authors account for bats dynamically changing their echolocation? This is unclear and from what I can tell may just mean that the bats can switch between foraging phase call types depending on the distance to a detected obstacle. Can the authors elaborate more on this?

      The echolocation behavior of the bats, including dynamic call adjustments, was implemented in the simulation and is described in detail in the Methods section (lines 498-520 and Table 2). To avoid redundancy, the Results chapter originally referred to this section, but we have now added a brief explanation in the Results to clarify that the bats’ call parameters (IPI, duration, and frequency range) adapt based on the distance to detected objects, following empirically documented echolocation phases ("search," "approach," "buzz"). These dynamics are consistent with established bat behavior during navigation in cluttered environments such as caves.

      (6) Figure 1 C3: "Detection threshold": what is this and how was it derived?

      The caption also mentions yellow arrows, but they are absent from the figure. C4: Each threshold excursion is marked with an asterisk, but there are many more excursions than asterisks. Why are only some marked? Unclear.

      C3: The detection threshold is determined dynamically. It is set to the greater of either 7 dB above the noise level (0 dB-SPL) (Kick, 1982; Saillant et al., 1993; Sanderson et al., 2003; Boonman et al., 2013) or the maximal received level minus 70 dB, effectively applying a dynamic range of 70 dB. This clarification has been added to the Methods section. The yellow arrow has been added.

      C4: Thank you for this important observation. Only peaks marked with asterisks represent successful detections, that is, those that were identified in both the interference-free and full detection conditions, as explained in the Methods. Other visible peaks result from masking signals or overlapping echoes from nearby reflectors, but they do not meet the detection criteria. To keep the figure caption concise, we have elaborated on this process more clearly in the revised Methods section. We added this information to the legend.
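
      The C3 threshold rule can be written compactly as below. This is a sketch of the rule as stated in this response; the function and parameter names are ours, and the defaults simply restate the values quoted above (noise at 0 dB-SPL, 7 dB margin, 70 dB dynamic range).

```python
def detection_threshold_db(max_received_db, noise_db=0.0, margin_db=7.0,
                           dynamic_range_db=70.0):
    """Return the greater of (noise + margin) and (strongest received level - dynamic range)."""
    return max(noise_db + margin_db, max_received_db - dynamic_range_db)
```

      With these defaults the noise-based term sets the threshold until the strongest received level exceeds 77 dB-SPL, after which the dynamic-range term takes over.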

      (7) Figure 2: A line indicating RM, No Masking is absent

      Thank you for pointing this out. The missing line for RM, No Masking has now been added in the revised version of Figure 2.

      (8) Line 121: "reflected off conspecifics". Does this mean echoes due to conspecifics?

      The phrase "reflected off conspecifics" refers to echoes originating from the bat’s own call and reflected off the bodies of nearby conspecifics. We have clarified the wording in the revised text to avoid confusion.

      (9) Line 125: Why are low-frequency channels stimulated by higher frequencies? This needs further clarification.

      The cochlear filter bank in our model is implemented using gammatone filters, each modeled as an 8th-order Butterworth filter. Due to the non-ideal filter response and relatively broad bandwidths, especially in the lower-frequency channels, strong energy from the beginning of the downward FM chirp (at higher frequencies) can still produce residual activation in lower-frequency channels. While these stimulations are usually below the detection threshold, they may still be visible as early sub-threshold responses. Because this explanation is technical (a property of the filter implementation) and the effect does not influence the detection outcomes, we have chosen not to elaborate on it in the figure caption or Methods.

      (10) Lines 146-150: This is an interesting finding. Is there a theoretical justification for it?

      This outcome arises directly from the simulation results. As noted in the Discussion (lines 359-365), although Pipistrellus kuhlii (PK) shows a modest advantage in jamming resistance due to its broader bandwidth, the redundancy in sensory information across calls—enabled by frequent echolocation—appears to compensate for these signal differences. As a result, the small variations in echo quality between species do not translate into significant differences in performance. We speculate that if the difference in jamming probability had been larger, performance disparities would likely have emerged.

      (11) Line 151: The authors define a jammed echo as an echo entirely missed due to masking. Is this appropriate? Doesn't echo mis-assignment also constitute jamming?

      We agree that echo mis-assignment can also degrade performance; however, in our model, we distinguish between two outcomes: (1) complete masking (echo not detected), and (2) detection with a localization error. As explained in the Methods (lines 500–507), we run the detection analysis twice—once with only desired echoes (“interference-free detection”) and once including masking signals (“full detection”). If a previously detected echo is no longer detected, it is classified as a jammed echo. If the echo is still detected but the delay shifts by more than 100 µs compared to the interference-free condition, it is also considered jammed. If the delay shift is smaller, it is treated as a detection with localization error rather than full jamming. We have clarified this distinction in the revised Methods section.
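
      The classification described above, applied to a single echo that was detected in the interference-free run, can be sketched as follows. The function name, argument names, and return labels are hypothetical; only the 100 µs criterion comes from the text.

```python
def classify_echo(delay_clean_s, delay_full_s, max_shift_s=100e-6):
    """Classify one echo that was detected in the interference-free run.

    delay_clean_s: echo delay found without masking signals.
    delay_full_s:  echo delay found with masking signals, or None if the echo was missed.
    """
    if delay_full_s is None:
        return "jammed"      # completely masked
    if abs(delay_full_s - delay_clean_s) > max_shift_s:
        return "jammed"      # detected, but the delay shifted by more than 100 µs
    return "detected"        # any smaller shift counts as a localization error
```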

      (12) Figure 2-E: Detection probability statistics are of limited usefulness without accompanying false alarm rate (FAR) statistics. Do the authors have FAR numbers?

      We understand FAR to refer to instances where masking signals or other acoustic phenomena are mistakenly interpreted as real echoes from physical objects. As explained in the manuscript, we implemented two model versions: one without confusion, and one with full confusion.

      Figure 2E reports detection performance under the non-confusion model, in which only echoes from actual physical reflectors are used, and no false detections occur—hence, the false alarm rate is effectively zero in this condition. In the full-confusion model, all detected echoes—including those originating from masking signals or conspecific calls—are treated as valid detections, which may include false alarms. However, we did not explicitly quantify the false alarm rate as a separate metric in this simulation.

      We agree that tracking FAR could be informative and will consider incorporating it into future versions of the model.

      (13) Line 161: RM bats suffered from a significantly higher probability of the "desired conspecific's echoes" being jammed. What does "desired conspecific's echoes" mean? This is unclear.

      The term “desired conspecific's echoes” refers to echoes originating from the bat’s own call, reflected off nearby conspecifics, which are treated as relevant reflectors for collision avoidance. We have revised the wording in the text for clarity.

      (14) Line 188: Why didn't the size of the integration window affect jamming probability? I couldn't find this explained in the discussion.

      The jamming probability in our analysis is computed at the individual-echo level, prior to any temporal integration. Since the integration window is applied after the detection step, it does not influence whether a specific echo is masked (i.e., jammed) or not. Therefore, as expected, we did not observe a significant effect of integration window size on jamming probability.

      (15) Line 217-218: Why do the authors think this would be?

      Thank you for the thoughtful question. We agree that, in theory, increasing call intensity should raise the levels of both desired echoes and masking signals proportionally. However, in our model, the environmental noise floor and detection threshold remain constant, meaning that higher call intensities increase the signal-to-noise ratio (SNR) more effectively for weaker echoes, especially those at longer distances or with low reflectivity. This could lead to a higher likelihood of those echoes crossing the detection threshold, resulting in a small but measurable reduction in jamming probability.

      Additionally, the non-linear behavior of the filter-bank receiver, including thresholding at multiple stages, can introduce asymmetries in how increased signal levels affect the detection of target versus masking signals.

      That said, the effect size was small, and the improvement in jamming probability did not translate into any significant gain in behavioral performance (e.g., exit probability or collision rate), as shown in Figure 3C.

      (16) Line 233: I'm not sure I understand how a slightly improved aggregation model that clustered detected reflectors over one-second periods is different. Doesn't this just lead to on average more calls integrated into memory?

      While increasing the memory duration does lead to more detections being available, the enhanced aggregation model (we now refer to as multi-call clustering) differs fundamentally from the simpler one. As detailed in the Methods, it includes additional processing steps: clustering spatially close detections, removing outliers, and estimating wall directions based on the spatial structure of clustered echoes. In contrast, the simpler model treats each detection as an isolated point without estimating obstacle orientation. These additional steps allow for more robust environmental interpretation and significantly improve performance under high-confusion conditions. We have clarified it in revised text (lines 606-616) and added a Supplementary Figure 2B.

      (17) Table 1: What about conspecific target strength?

      We have now added the conspecific target strength as a tested parameter in Table 1, along with its tested range, default value, and measured effect sizes. A detailed sensitivity analysis is also presented in Supplementary Figure 4, demonstrating that variations in conspecific target strength had relatively minor effects on performance metrics.  

      (18) Figure 3-A: The x-axis is the number of calls in the integration window. But the leftmost sample on each curve is at 0 calls. Shouldn't this be 1?

      “0 calls” refers to the case where only the most recent call is used for pathfinding—without integrating any information from prior calls. The x-axis reflects the number of previous calls stored in memory, so a value of 0 still includes the current call. We’ve clarified this terminology in the figure caption.
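
      The indexing convention can be illustrated with a fixed-length buffer. This is a sketch of the convention only; `make_call_memory` and the example detections are hypothetical and not taken from the simulation.

```python
from collections import deque


def make_call_memory(n_previous_calls):
    """Buffer holding the current call plus `n_previous_calls` earlier ones."""
    return deque(maxlen=n_previous_calls + 1)


# With 0 previous calls, only the most recent call's detections are kept.
memory = make_call_memory(0)
for detections in (["wall A"], ["wall B"], ["wall C"]):
    memory.append(detections)  # the oldest entry is dropped automatically
```

      After the loop, `memory` holds only the detections from the last call, which corresponds to the leftmost point on the x-axis.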

      (19) Lines 282-283: This statement needs to be clarified that it is with the constraints of using a 2D simulation with at most 33 bats/m^2. It also should be clarified that it is assumed the bat can reliably distinguish between its own echoes and conspecific echoes, which is a very important caveat.

      We have revised the text to clarify that the results are based on a 2D simulation with a maximum tested density of 33 bats/m². We also now explicitly state that the model assumes bats can distinguish between their own echoes and those generated by conspecifics, an assumption we recognize as a simplification. These clarifications help place the results within the scope and constraints of the simulation. Moreover, as described in the text (and noted in a previous response), the average distance to the nearest bat in our simulation is 0.27 m (with 100 bats), whereas reported distances in very dense colonies are 0.5 m.

      (20) Line 294: What is this sentence referring to?

      The sentence refers to the finding that, even under high bat densities, a substantial portion of the echoes, particularly those reflected from nearby obstacles (e.g., 1 m away), were jammed due to masking. Nevertheless, the bats in the simulation were still able to navigate successfully using partial sensory input. We have clarified the sentence in the revised text to make this point more explicit (see lines 333-336).

      (21) Line 302: Was jamming less likely when IPI was higher or lower? I could not find this demonstrated anywhere in the manuscript.

      We agree that the original text was not sufficiently clear on this point. While we did not explicitly test fixed IPI values as a parameter, the model does simulate the natural behavior of decreasing IPI as bats approach obstacles. This behavior is supported by empirical observations and is incorporated into the echolocation dynamics of the simulation. We have clarified this point in the revised text (see Lines 346-351) and explained that while lower IPI introduces more acoustic overlap, it also increases redundancy and improves detection through temporal integration.

      (22) Lines 313-314: This is an interesting assumption, but it is not evident that is substantiated by the references.

      The claim is based on well-established principles in signal processing and bioacoustics. Wideband signals, such as those emitted by PK bats, distribute their energy over a broader frequency range, which makes them inherently more resistant to narrowband interference and masking. This concept is commonly applied in both biological and artificial sonar systems and is supported by empirical studies in bats and theory in acoustic sensing.

      For example, Beleyur & Goerlitz (2019) demonstrate that broader bandwidth calls improve detection in cluttered and jamming-prone environments. Similarly, Ulanovsky et al. (2004) and Schnitzler & Kalko (200) discuss how FM bats' wideband calls enhance temporal and spatial resolution, helping to reduce the impact of overlapping signals from conspecifics. These findings align with communication theory where spread-spectrum techniques improve robustness in noisy environments.

      We agree with the reviewer that this is an important point and we have updated the manuscript to clarify this rationale and cite the relevant literature accordingly (lines 631-363).

      (23) Lines 318-319: What is the justification for "probably"? Isn't this just a supposition?

      We agree with the reviewer’s point and have rephrased the sentence.

      (24) Line 320: How does this 63% performance match the sentence in line 295?

      The sentence in Line 295 refers to the overall ability of the bats to navigate successfully despite high jamming levels, highlighting the robustness of the strategy under challenging conditions. The figure in Line 320 (63%) quantifies this performance under the most extreme simulated scenario (100 bats / 3 m²), where both spatial and acoustic interferences are maximal. We have rephrased the text in the revised version (lines 324-327).

      (25) Lines 341-345: It seems like this is more likely to be the main takeaway of the paper.

      As noted in the Public Review above, there is substantial literature supporting the assumption that bats can recognize their own echoes and distinguish them from those of conspecifics (e.g., Schnitzler, Bioscience, 2001; Kazial et al., 2001, 2008; Burnett & Masters, 2002; Chiu et al., 2009; Yovel et al., 2009; Beetz & Hechavarría, 2022). Therefore, we consider our assumption of self-recognition to be well-supported, at least under typical conditions. That said, we agree that the impact of echo confusion on performance is significant and highlights a critical challenge in dense environments.

      To our knowledge, this is the first computational model to explicitly simulate both self-recognition and full echo confusion under high-density conditions. We believe that the combination of modeled constraints and the demonstrated robustness of simple sensorimotor strategies, even under worst-case assumptions, is what makes this contribution both novel and meaningful.

      (26) Lines 349-350: What is the aggregation model? What is meant by "integration"?

      We have revised the text to clarify that the “aggregation model” refers to a multi-call clustering process that includes clustering of detections, removal of outliers, and estimation of wall orientation, as described in detail in the revised Methods and Results sections.

      (27) Line 354: Again, why isn't this the assumption we're working under?

      As addressed in our response to Comment 25, our primary model assumes that bats can recognize their own echoes—an assumption supported by substantial empirical evidence. The alternative "full confusion" model was included to explore a worst-case scenario and highlight the behavioral consequences of failing to distinguish self from conspecific echoes. We assume that real bats may experience some degree of echo misidentification; however, our assumption of full confusion represents a worst-case scenario.

      (28) Line 382: "Under the assumption that..." I agree that bats probably can, but if we assume they can differentiate them all, where's the jamming problem?

      The assumption that bats can theoretically distinguish between different signal sources applies after successful detection. However, the jamming problem arises during the detection and localization stages, where acoustic interference can prevent echoes from crossing the detection threshold or distort their timing.

      (29) Lines 386-387: The paper referenced focused on JAR in the context of foraging. What changes were made to the simulation to switch to obstacle avoidance?

      While the simulation framework in Mazar & Yovel (2020) was developed to study jamming avoidance during foraging, the core components—such as the acoustic calculations, receiver model, and echolocation behavior—remain applicable. For the current study, we adapted the simulation extensively to address colony-exit behavior. These modifications include modeling cave walls as acoustic reflectors, implementing a pathfinding algorithm, integrating obstacle-avoidance maneuvers, and adapting the integration window and integration processes. These updates are detailed throughout the Methods section.

      (30) Line 400-402: Something doesn't add up with the statement: each decision relies on an integration window that records estimated locations of detected reflectors from the last five echolocation calls, with the parameter being tested between 1 and 10 calls. Can the authors reword this to make it less confusing?

      We have reworded the sentence to clarify that the default integration window includes five calls, while we systematically tested the effect of using 1 to 10 calls, see lines 486-487.
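
      The integration window described here can be pictured as a fixed-length buffer of per-call detections. The following is a minimal sketch under that reading, not the authors' simulation code; the class name `IntegrationWindow` and its methods are hypothetical.

```python
from collections import deque


class IntegrationWindow:
    """Holds reflector estimates from the most recent n_calls echolocation calls.

    Hypothetical illustration: the default of five calls follows the text,
    which also reports testing window lengths from 1 to 10 calls.
    """

    def __init__(self, n_calls=5):
        # A bounded deque discards the oldest call once the window is full.
        self.calls = deque(maxlen=n_calls)

    def add_call(self, detections):
        """Store the (x, y) location estimates obtained from one call."""
        self.calls.append(list(detections))

    def pooled(self):
        """Return every estimate currently inside the window as one flat list."""
        return [point for call in self.calls for point in call]


window = IntegrationWindow(n_calls=5)
for i in range(7):
    # One made-up detection per call, just to show the oldest calls dropping out.
    window.add_call([(float(i), 0.0)])
```

      After seven calls, only the detections from the last five remain available to the decision step.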

      (31) Line 393: "30 deg/sec" why was this value chosen?

      The turning rate of 30 deg/sec was manually selected to approximate the curvature of natural foraging flight paths observed in Rhinopoma microphyllum using on-board tags. Moreover, in Mazar & Yovel (2020), we showed that the flight dynamics of simulated bats in a closed room closely matched those of Pipistrellus kuhlii flying in a room of similar dimensions. However, in the current simulation, bats rarely follow a random-walk trajectory due to the structured environment and frequent obstacle detection. As a result, this parameter has no meaningful impact on the simulation outcomes.

      (32) Line 412: "Harmony" --- do you mean harmonic? And what is the empirical evidence that RM bats use the 2nd harmonic compared to the 1st?

      Perhaps showing a spectrogram of a real RM signal would be helpful.

      The typo has been corrected. For reference, see Goldshtein et al. (2025).

      (33) Table 2: Something is incorrect with the table. The first row on the next page is the wrong species name. Also, where are the citations for these parameter values?

      The table header has been corrected in the revised version. The parameter values for flight and echolocation behavior were derived from existing literature and empirical data: Pipistrellus kuhlii parameters were based on Kalko (1995), and Rhinopoma microphyllum parameters were extracted from our own recordings using on-board tags, as described in Goldshtein et al. (2025). We have added the appropriate citations to Table 2.

      (34) Line 442: How was the threshold level chosen?

      The detection threshold in each level is set to the greater of either 7 dB above the noise level (0 dB-SPL) or the maximal received level minus 70 dB, effectively applying a dynamic range of 70 dB.
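
      The threshold rule stated above can be written out directly. This is a small sketch of that rule only; the function and parameter names are hypothetical, and the default values are the ones quoted in the response.

```python
def detection_threshold(max_received_db, noise_db=0.0,
                        snr_margin_db=7.0, dynamic_range_db=70.0):
    """Return the detection threshold in dB SPL.

    The threshold is the greater of:
      - the noise level plus a fixed margin (7 dB above 0 dB SPL), and
      - the strongest received level minus the dynamic range (70 dB).
    """
    noise_floor_threshold = noise_db + snr_margin_db
    dynamic_range_threshold = max_received_db - dynamic_range_db
    return max(noise_floor_threshold, dynamic_range_threshold)


# Weak scene: the noise-based limit dominates.
quiet = detection_threshold(max_received_db=60.0)
# Loud scene: the 70 dB dynamic range dominates.
loud = detection_threshold(max_received_db=100.0)
```

      With a strongest received level of 60 dB SPL the threshold stays at 7 dB SPL, whereas at 100 dB SPL it rises to 30 dB SPL.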

      (35) Line 445: 100 micros: This is about 3cm. The resolution of PK is about 1cm. For RM it's about 10cm. So, this window is generous for PK, but too strict for RM.

      To keep the model simple and avoid introducing species-specific detection thresholds, we selected a biologically plausible compromise that could reasonably apply to both species. This simplification ensures consistency across simulations while remaining within the known behavioral range.

      (36) Line 448: What is the spectrum of the Gaussian noise, and did it change between PK and RM?

      We used the same white Gaussian noise with a flat spectrum across the relevant frequency range (10–80 kHz) for both species. We have clarified this in the revised text in lines 570-572.

      (37) Line 451: 4 milliseconds is 1.3m. Is this appropriate?

      The 4 milliseconds window was selected based on established auditory masking thresholds described in Mazar & Yovel (2020), and supported by Popper and Fay (1995, ch. 2.4.5), Blauert (1997, ch. 3.1), and Mohl and Surlykke (1989). These values provide conservative lower bounds on bats’ ability to cope with masking (Beleyur and Goerlitz, 2019). For simplicity, we used constant thresholds within each window, see lines 574-576.
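
      The time-to-distance figures quoted by the reviewer in comments 35 and 37 follow from multiplying the time window by the speed of sound. The sketch below reproduces that arithmetic; the value of 343 m/s (air at roughly 20 °C) is an assumption not stated in the text, and the function names are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def path_length_m(window_s):
    """Distance sound travels during a time window (one-way path length)."""
    return SPEED_OF_SOUND_M_S * window_s


def target_range_m(window_s):
    """Equivalent target range if the window is read as a two-way echo delay."""
    return path_length_m(window_s) / 2.0


coincidence_window = path_length_m(100e-6)  # 100 microseconds, about 0.034 m
masking_window = path_length_m(4e-3)        # 4 milliseconds, about 1.37 m
```

      The reviewer's values of about 3 cm and 1.3 m match the one-way path length; read as a two-way echo delay, each window would correspond to half that range.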

      (38) Line 452: Citation for the forward and backward masking durations?

      See the response to the previous comment.

      (39) Lines 460-461: This is unclear. How does the bat get directional information? The authors claim to be able to measure direction-of-arrival for each detection, but it is not clear how this is done.

      As noted in our response to Reviewer 1 (Comment on Line 457), directional information is not computed via an explicit binaural model. Instead, we assume the bat estimates the direction of arrival with an angular error that depends on the SNR, based on established studies (e.g., Simmons et al., 1983; Popper & Fay, 1995). We have clarified this in the revised text in lines 583-592.

      (40) Line 467: It seems like the authors are modeling pulse-echo ambiguity, at least in this one alternative model, which is good! However the alternative model doesn't get much attention in the paper. Is there a reason for this?

      We would like to clarify that we did not model pulse-echo ambiguity. In our confusion model, all echoes received within the IPI are attributed to the bat’s most recent call. This includes echoes that may in fact originate from conspecific calls, but the model does not assign self-echoes to earlier pulses or span multiple IPIs. Therefore, while the model captures echo confusion, it does not include true pulse-echo ambiguity. We have clarified this point in the revised text in lines 551-553.
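
      The attribution rule of the confusion model, as described in this response, can be sketched as a lookup of the latest own call that precedes each echo. This is an illustrative reading only, not the simulation's actual code; `attribute_echoes` and the example timings are hypothetical.

```python
from bisect import bisect_right


def attribute_echoes(call_times, echo_times):
    """Assign each received echo to the bat's most recent call.

    call_times: emission times of the bat's own calls, in seconds.
    echo_times: arrival times of all received echoes, regardless of whether
                they stem from the bat's own call or a conspecific's call.
    Returns a list of call indices (None for echoes arriving before any call).
    """
    calls = sorted(call_times)
    assigned = []
    for t in echo_times:
        # Index of the last call emitted at or before the echo arrival time.
        idx = bisect_right(calls, t) - 1
        assigned.append(idx if idx >= 0 else None)
    return assigned


calls = [0.0, 0.1, 0.2]           # made-up call times, 100 ms IPI
echoes = [0.01, 0.105, 0.19, 0.25]  # made-up echo arrival times
assignment = attribute_echoes(calls, echoes)
```

      Under this rule an echo arriving at 0.19 s is attributed to the call at 0.1 s and never to the earlier call at 0.0 s, which is why the model captures echo confusion but not pulse-echo ambiguity.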

      (41) Line 41: "continuous" is more appropriate than "constant".

      Thank you, we have rephrased the text accordingly.

      (42) Line 69: "band width" should be one word.

      Thank you, we have corrected it to “bandwidth”.

      (43) Line 79: "bats" should be in the possessive.

      Thank you, the text has been rephrased.

      (44) Line 128: "convoluted" don't you mean "convolved"?

      We have replaced “convoluted” with the correct term “convolved” in the revised text.

      (45) Please check your references, as there are some incomplete citations and typos.

      Thank you, we have reviewed and corrected all references for completeness and consistency.

      References

      Beetz, M.J. and Hechavarría, J.C. (2022) ‘Neural Processing of Naturalistic Echolocation Signals in Bats’, Frontiers in Neural Circuits, 16, p. 899370. Available at: https://doi.org/10.3389/FNCIR.2022.899370.

      Beleyur, T. and Goerlitz, H.R. (2019) ‘Modeling active sensing reveals echo detection even in large groups of bats’, Proceedings of the National Academy of Sciences of the United States of America, 116(52), pp. 26662–26668. Available at: https://doi.org/10.1073/pnas.1821722116.

      Betke, M. et al. (2008) ‘Thermal Imaging Reveals Significantly Smaller Brazilian Free-Tailed Bat Colonies Than Previously Estimated’, Journal of Mammalogy, 89(1), pp. 18–24. Available at: https://doi.org/10.1644/07-MAMM-A-011.1.

      Blauert, J. (1997) ‘Spatial Hearing: The Psychophysics of Human Sound Localization (rev. ed.)’.

      Boerma, D.B. et al. (2019) ‘Wings as inertial appendages: How bats recover from aerial stumbles’, Journal of Experimental Biology, 222(20). Available at: https://doi.org/10.1242/JEB.204255.

      Boonman, A. et al. (2013) ‘It’s not black or white-on the range of vision and echolocation in echolocating bats’, Frontiers in Physiology, 4, pp. 1–12. Available at: https://doi.org/10.3389/fphys.2013.00248.

      Boonman, A.M., Parsons, S. and Jones, G. (2003) ‘The influence of flight speed on the ranging performance of bats using frequency modulated echolocation pulses’, The Journal of the Acoustical Society of America, 113(1), p. 617. Available at: https://doi.org/10.1121/1.1528175.

      Burnett, S.C. and Masters, W.M. (2002) ‘Identifying Bats Using Computerized Analysis and Artificial Neural Networks’, North American Symposium on Bat Research, 9.

      Chiu, C., Xian, W. and Moss, C.F. (2009) ‘Adaptive echolocation behavior in bats for the analysis of auditory scenes’, Journal of Experimental Biology, 212(9), pp. 1392–1404. Available at: https://doi.org/10.1242/jeb.027045.

      Fujioka, E. et al. (2021) ‘Three-Dimensional Trajectory Construction and Observation of Group Behavior of Wild Bats During Cave Emergence’, Journal of Robotics and Mechatronics, 33(3), pp. 556–563. Available at: https://doi.org/10.20965/jrm.2021.p0556.

      Gillam, E.H. et al. (2010) ‘Echolocation behavior of Brazilian free-tailed bats during dense emergence flights’, Journal of Mammalogy, 91(4), pp. 967–975. Available at: https://doi.org/10.1644/09-MAMM-A-302.1.

      Goldshtein, A. et al. (2025) ‘Onboard recordings reveal how bats maneuver under severe acoustic interference’, Proceedings of the National Academy of Sciences, 122(14), p. e2407810122. Available at: https://doi.org/10.1073/PNAS.2407810122.

      Griffin, D.R., Webster, F.A. and Michael, C.R. (1958) ‘The echolocation of flying insects by bats’, Animal Behaviour, 8(3–4).

      Hagino, T. et al. (2007) ‘Adaptive SONAR sounds by echolocating bats’, International Symposium on Underwater Technology, UT 2007 - International Workshop on Scientific Use of Submarine Cables and Related Technologies 2007, pp. 647–651. Available at: https://doi.org/10.1109/UT.2007.370829.

      Hiryu, S. et al. (2008) ‘Adaptive echolocation sounds of insectivorous bats, Pipistrellus abramus, during foraging flights in the field’, The Journal of the Acoustical Society of America, 124(2), pp. EL51–EL56. Available at: https://doi.org/10.1121/1.2947629.

      Jakobsen, L. et al. (2024) ‘Velocity as an overlooked driver in the echolocation behavior of aerial hawking vespertilionid bats’. Available at: https://doi.org/10.1016/j.cub.2024.12.042.

      Jakobsen, L., Brinkløv, S. and Surlykke, A. (2013) ‘Intensity and directionality of bat echolocation signals’, Frontiers in Physiology, 4, pp. 1–9. Available at: https://doi.org/10.3389/fphys.2013.00089.

      Jakobsen, L. and Surlykke, A. (2010) ‘Vespertilionid bats control the width of their biosonar sound beam dynamically during prey pursuit’, Proceedings of the National Academy of Sciences, 107(31). Available at: https://doi.org/10.1073/pnas.1006630107.

      Kalko, E.K.V. (1995) ‘Insect pursuit, prey capture and echolocation in pipistrelle bats (Microchiroptera)’, Animal Behaviour, 50(4), pp. 861–880.

      Kazial, K.A., Burnett, S.C. and Masters, W.M. (2001) ‘Individual and Group Variation in Echolocation Calls of Big Brown Bats, Eptesicus Fuscus (Chiroptera: Vespertilionidae)’, Journal of Mammalogy, 82(2), pp. 339–351. Available at: https://doi.org/10.1644/1545-1542(2001)082<0339:iagvie>2.0.co;2.

      Kazial, K.A., Kenny, T.L. and Burnett, S.C. (2008) ‘Little brown bats (Myotis lucifugus) recognize individual identity of conspecifics using sonar calls’, Ethology, 114(5), pp. 469–478. Available at: https://doi.org/10.1111/j.1439-0310.2008.01483.x.

      Kick, S.A. (1982) ‘Target-detection by the echolocating bat, Eptesicus fuscus’, Journal of Comparative Physiology A, 145(4), pp. 431–435. Available at: https://doi.org/10.1007/BF00612808.

      Kothari, N.B. et al. (2014) ‘Timing matters: Sonar call groups facilitate target localization in bats’, Frontiers in Physiology, 5. Available at: https://doi.org/10.3389/fphys.2014.00168.

      Mohl, B. and Surlykke, A. (1989) ‘Detection of sonar signals in the presence of pulses of masking noise by the echolocating bat, Eptesicus fuscus’, pp. 119–124.

      Moss, C.F. and Surlykke, A. (2010) ‘Probing the natural scene by echolocation in bats’, Frontiers in Behavioral Neuroscience. Available at: https://doi.org/10.3389/fnbeh.2010.00033.

      Neretti, N. et al. (2003) ‘Time-frequency model for echo-delay resolution in wideband biosonar’, The Journal of the Acoustical Society of America, 113(4), pp. 2137–2145. Available at: https://doi.org/10.1121/1.1554693.

      Popper, A.N. and Fay, R.R. (1995) Hearing by Bats. Springer-Verlag.

      Roy, S. et al. (2019) ‘Extracting interactions between flying bat pairs using model-free methods’, Entropy, 21(1). Available at: https://doi.org/10.3390/e21010042.

      Sabol, B.M. and Hudson, M.K. (1995) ‘Technique using thermal infrared-imaging for estimating populations of gray bats’, Journal of Mammalogy, 76(4). Available at: https://doi.org/10.2307/1382618.

      Saillant, P.A. et al. (1993) ‘A computational model of echo processing and acoustic imaging in frequency-modulated echolocating bats: The spectrogram correlation and transformation receiver’, The Journal of the Acoustical Society of America, 94(5). Available at: https://doi.org/10.1121/1.407353.

      Salles, A., Diebold, C.A. and Moss, C.F. (2020) ‘Echolocating bats accumulate information from acoustic snapshots to predict auditory object motion’, Proceedings of the National Academy of Sciences of the United States of America, 117(46), pp. 29229–29238. Available at: https://doi.org/10.1073/PNAS.2011719117.

      Sanderson, M.I. et al. (2003) ‘Evaluation of an auditory model for echo delay accuracy in wideband biosonar’, The Journal of the Acoustical Society of America, 114(3), pp. 1648–1659. Available at: https://doi.org/10.1121/1.1598195.

      Schnitzler, H.-U. and Kalko, E.K.V. (2001) ‘Echolocation by insect-eating bats’, BioScience, 51(7), p. 557. Available at: https://academic.oup.com/bioscience/article-abstract/51/7/557/268230 (Accessed: 17 March 2025).

      Schnitzler, H.-U. et al. (1987) ‘The echolocation and hunting behavior of the bat, Pipistrellus kuhli’, Journal of Comparative Physiology A, 161(2), pp. 267–274. Available at: https://doi.org/10.1007/BF00615246.

      Simmons, J.A. et al. (1983) ‘Acuity of horizontal angle discrimination by the echolocating bat, Eptesicus fuscus’.

      Simmons, J.A. and Kick, S.A. (1983) ‘Interception of Flying Insects by Bats’, Neuroethology and Behavioral Physiology, pp. 267–279. Available at: https://doi.org/10.1007/978-3-642-69271-0_20.

      Surlykke, A., Ghose, K. and Moss, C.F. (2009) ‘Acoustic scanning of natural scenes by echolocation in the big brown bat, Eptesicus fuscus’, Journal of Experimental Biology, 212(7), pp. 1011–1020. Available at: https://doi.org/10.1242/JEB.024620.

      Theriault, D.H. et al. (no date) ‘Reconstruction and analysis of 3D trajectories of Brazilian free-tailed bats in flight’, cs-web.bu.edu [Preprint]. Available at: https://csweb.bu.edu/faculty/betke/papers/2010-027-3d-bat-trajectories.pdf (Accessed: 4 May 2023).

      Ulanovsky, N. and Moss, C.F. (2008) ‘What the bat’s voice tells the bat’s brain’, Proceedings of the National Academy of Sciences of the United States of America, 105(25), pp. 8491–8498. Available at: https://doi.org/10.1073/pnas.0703550105.

      Vanderelst, D. and Peremans, H. (2018) ‘Modeling bat prey capture in echolocating bats: The feasibility of reactive pursuit’, Journal of Theoretical Biology, 456, pp. 305–314.

      Yovel, Y. et al. (2009) ‘The voice of bats: How greater mouse-eared bats recognize individuals based on their echolocation calls’, PLoS Computational Biology, 5(6). Available at: https://doi.org/10.1371/journal.pcbi.1000400.

      Yovel, Y. and Ulanovsky, N. (2017) ‘Bat Navigation’, The Curated Reference Collection in Neuroscience and Biobehavioral Psychology, pp. 333–345. Available at: https://doi.org/10.1016/B978-0-12-809324-5.21031-6.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Based on the below reviews, we propose the following revision plan. Briefly:

      • We will re-focus the manuscript on the developmental data providing a molecular and cellular blueprint of lining macrophage development. The novelty and relevance of our developmental data have been highlighted by all three reviewers, and they have also praised the rigor of these experiments and their interpretation. We thus believe that this re-focus will improve the manuscript's message.
      • We will include our data on CSF1 as a key signal. Whilst previously appreciated as a factor required for tissue-resident macrophages, including those in the joint, our study is the first to show the requirement of lining macrophages over a complete developmental time course, using modern readouts, and in a model that circumvents the limitations of previously used approaches (see point-by-point response for details).
      • However, we will remove the functional data on TGFβ signaling and mechanical loading/mechanosensing. We agree with the reviewers that we would need to generate additional histological and molecular data from conditional knockout mice, antibody and (ant)agonist treatments and the optogenetic model to determine their exact involvement in lining macrophage maturation. These experiments require significant time and other resources. We would therefore like to uncouple this question for a follow-on manuscript, and to re-focus the current study as a developmental atlas. Removal of (some) of these data has been suggested in the reviewers' comments as well.
      • To further elevate our developmental atlas, we are proposing to include additional data and new analyses delineating the developmental dynamics of synovial fibroblasts at the single-cell (transcriptomic) level. This change to the original manuscript was not requested by the reviewers, but we are proposing it proactively because we believe it would be an impactful addition to a revised version of our study, also providing data on the maturation of the synovial (lining) macrophage niche. Again, this will re-focus the manuscript on the developmental data and provide a novel, valuable resource for those interested in joint biology.
      • We will otherwise respond to all individual reviewer comments and implement the requested changes, unless technically not possible. We are convinced that this revision plan will result in a manuscript that fits very well with the remit of Genes & Development.

      Please find below detailed point-by-point answers.

      Reviewer #1

      Evidence, reproducibility and clarity

      In their manuscript entitled "The synovial lining macrophage layer develops in the first weeks of life in a CSF1- and TGFβ-dependent but monocyte-independent process," the authors explore the developmental trajectory of synovial lining macrophages. They demonstrate that the formation of this specialized macrophage layer is age-dependent and governed by a distinct developmental program that proceeds independently of circulating monocytes. Through scRNA-Seq, the authors show that synovial lining macrophages originate locally from Aqp1⁺ macrophages and are marked by the expression of Csf1r, Tgfbr, and Piezo1. Notably, genetic ablation of each of these factors impaired the development of lining macrophages to varying degrees, suggesting differential contributions of CSF1, TGFβ, and PIEZO1 signaling pathways to their maturation and maintenance.

      The manuscript is well written, and the data quality and representation is of a high standard. The authors have employed a sophisticated array of state-of-the-art mouse models and cutting-edge technologies to elucidate the developmental origin of synovial lining macrophages. Notably, the supporting scRNA-Seq datasets are of excellence and provide valuable insights that will likely be of significant interest to researchers in the field of immunology and joint biology. Accordingly, the experimental approach and interpretations regarding macrophage origin are well-founded and compelling. However, in the eye of the reviewer, the section addressing the underlying molecular mechanisms is a bit less convincing. This part of the study appears slightly underdeveloped, and some of the mechanistic claims lack sufficient experimental clarity. A more rigorous experimental investigation would be essential to reinforce the manuscript's conclusions, particularly concerning the data related to Tgfbr and Piezo1, where the current evidence appears insufficiently substantiated.

      We thank the reviewer for their positive and constructive evaluation of our manuscript. We agree with them (and the other reviewers) that our functional data on the involvement of TGFβ signaling and mechanical loading/mechanosensing are comparably less convincing and substantiated than our developmental data. We are very grateful for their (and the other reviewers') suggestions to provide more support for the involvement of these factors in lining macrophage development. However, we think that carrying this out to the same high standard will require substantial time and other resources. We have therefore decided to uncouple this from the developmental data and pursue this in follow-up work. We will re-focus the current manuscript on the developmental data. We have proposed to the editors to instead include additional data on synovial fibroblast development, to complement our macrophage data and also delineate the maturation of their niche, thereby providing a conclusive developmental atlas.

      Major point:

      1. The numbers of VSIG4⁺ macrophages appear either unaffected or only minimally altered in both Csf1rMerCreMer Tgfbr2floxed and Fcgr1Cre Piezo1floxed mouse models, respectively. This raises an important question: was the gene deletion efficiency sufficient in each model? Accordingly, the authors are encouraged to include quantitative data on gene deletion efficiency for both mouse models, as this information is critical for interpreting the observed phenotypic outcomes and validating the conclusions regarding gene function. Furthermore, to better assess the impact of Tgfbr2 and Piezo1 disruption, the authors should provide more comprehensive flow cytometry analyses and histological data for these mouse models. Given the apparent homogeneity of VSIG4⁺ macrophages (as shown by the authors themselves), bulk RNA-Seq of sorted Tgfbr2- and Piezo1-deficient VSIG4⁺ macrophages (or from TGFβ-treated animals) would offer valuable insights into both the effectiveness of gene deletion and the molecular pathways governed by TGFβ and PIEZO1 in lining macrophages.

      As outlined above, we have decided to uncouple our functional data on TGFβ, Piezo1 and mechanical loading. The points raised here are all very valid, and we will implement your suggestions in our follow-up functional work focusing on signaling events regulating lining macrophage development. On the suggestion to perform bulk RNA sequencing for VSIG4+ macrophages: This is a good one in principle - although we will not be able to use this strategy where we want to assess the consequences of experimental treatments or genetic models on lining macrophage maturation, because acquisition of VSIG4 is a key maturation event that might be impaired in these conditions.

      Minor points:

      Consistent usage of Cx3cr1-GFP+ nomenclature (for instance: Fig. S1 legend "adult mouse synovial tissue, showing PDGFRα⁺ fibroblasts (yellow) and CX3CR1-GFP⁺ cells (cyan)." versus Fig. 1 legend "Automated spot detection highlights Cx3cr1-GFP⁺ macrophages)".

      We will implement these changes.

      Unclear Fig. 3 legend: "Representative immunofluorescence images of synovial tissue from Clec9aCre:Rosa26lsl-tdT mice at 3 weeks and in adulthood, showing and tdTomato (yellow) and stained for DAPI (blue), VSIG4 (cyan)" Check 'showing and tdTomato.'

      We will implement these changes.

      For greater clarity, it would have been helpful if the transcript names had been directly included within Figures 3C, S3A, and S3C.

      We will implement these changes.

      Page 24: "(Mki67CreERT2:Rosa26lsl-tdT)" Last bracket not superscript.

      We will implement these changes.

      Page 25: "we again leveraged our scRNAsequencing dataset" Missing punctuation.

      We will implement these changes.

      Page 27: Fig. 5C legend: " of synovial tissue of 1 week-old, 3 weeks-old and adult mice." Please specify and change to 'adult Csf1rΔFIRE/ΔFIRE mice'.

      We will implement these changes.

      Page 30: The outcome observed in the Acta1-rtTA:tetO-Cre:ChR2-V5fl mouse model appears to be inconclusive: "This approach resulted in an increased density of VSIG4+ and total (F4/80+) macrophages in the exposed leg of some 5 days-old pups, but others showed the opposite trend (Figure S5D)." This variability may reflect low efficiency of the model or other technical limitations (e.g. muscle contractions frequency or time point of analysis). Given this ambiguity, it is worth reconsidering whether the data are sufficiently robust to warrant inclusion. Should the authors choose to include these findings, further experimentation of appropriate depth and precision is required to allow a conclusive interpretation (either it increases the density of VSIG4+ macrophages or not). The same applies to the Yoda1-treated mice, for which additional data are needed to determine whether VSIG4⁺ macrophage density is truly affected.

      We have decided to remove the data on the optogenetic mouse model and Yoda1 treatment and to follow up on these separately, implementing these suggestions, including proof-of-concept data for optogenetically induced muscle contractions.

      Significance

      General assessment: provide a summary of the strengths and limitations of the study. What are the strongest and most important aspects? What aspects of the study should be improved or could be developed? This is a well-designed study that uses cutting-edge methodologies to investigate the developmental trajectory of synovial lining macrophages under homeostatic conditions. The authors present robust experimental evidence and compelling interpretations concerning synovial macrophage origin, which are both well-substantiated and impactful. Nonetheless, from the reviewer's perspective, the section exploring the molecular mechanisms underlying macrophage differentiation is comparatively less convincing. This section appears somewhat underdeveloped, as some of the mechanistic claims lack sufficient depth and experimental rigor to fully substantiate the conclusions.

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field: In contrast to earlier studies (PMID: 31391580, 32601335), the inclusion of fate-mapping experiments adds an important dimension, offering novel insight into the ontogeny of synovial macrophages. This expanded perspective may prove particularly valuable in advancing our understanding of joint immunology, especially regarding the local origins and lineage relationships of macrophage populations.

      Furthermore, the authors present novel insights into the molecular pathways underlying the differentiation and development of synovial lining macrophages. By demonstrating previously unrecognized regulatory mechanisms, this work significantly deepens our understanding of the cellular and transcriptional programs that drive macrophage specialization within the joint microenvironment.

      Place the work in the context of the existing literature (provide references, where appropriate): This study builds upon previous work characterizing the macrophage compartment in the joint (PMID: 31391580, 32601335), yet provides a substantially more comprehensive dataset that spans multiple developmental time points and data on the origin of this specialized macrophage subset.

      State what audience might be interested in and influenced by the reported findings: Immunologist, clinicians

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. This study falls well within the scope of the reviewer's expertise in innate immunity.

      Reviewer #2

      Evidence, reproducibility and clarity

      In the manuscript "The synovial lining macrophage layer develops in the first weeks of life in a CSF1- and TGFβ-dependent but monocyte-independent process", Magalhaes Pinto and colleagues carefully employ a wide range of technologies including single cell profiling, imaging and an exceptional combination of fate mapping models to characterize the ontogeny and development of lining macrophages in the joint, thus dissecting their maturation during postnatal development. Over the last decade, several landmark studies highlighted the imprinting of tissue-resident macrophages by a combination of ontogenetic and tissue-specific niche factors during development. So far, the ontogeny and the tissue niche factors governing the development and maturation of lining macrophages have not been described. Therefore, the results of this study offer insights into a small, highly adapted macrophage population with relevance in many disease settings in the joint. Furthermore, the findings nicely showcase how macrophages specialize to even very small tissue niches across development within one bigger anatomical compartment to serve dedicated functions within this niche.

      This manuscript is beautifully written and highlights many novel, highly relevant findings on lining macrophage biology and the authors employ a wide range of different technologies to carefully dissect the postnatal development of lining macrophages.

      In particular, the combination of scRNA-seq and fate mapping provides a unique link between transcriptional programs and ontogeny within the tissue niche. Furthermore, the integrative use of distinct fate mapping strategies, transgenic mouse lines, and treatment paradigms to elucidate key niche factors guiding the development and maturation of lining macrophages provides many interesting findings and data that are highly relevant to the field. I really enjoyed reading this manuscript.

      Thank you for your complimentary and constructive assessment of our manuscript, and the detailed comments below, which are very helpful. Please find point-by-point responses below.

      Major points:

      The authors show dynamic regulation of VSIG4 in lining macrophages during development; therefore, VSIG4 may not be an ideal choice for gating strategies to define lining macrophages, or to show as a single marker in immunofluorescence (IF) stainings to demonstrate their abundance across development (even though it is clear that this is the reason why the F4/80 staining is shown next to it). To demonstrate the increase of lining macrophages during development in IF, it would be more helpful if the authors showed quantifications of all F4/80+ cells and, additionally, VSIG4+ cells as a proportion of F4/80+ cells (or VSIG4+ F4/80+ and all F4/80+ cells in a stacked bar plot).

      We agree with the assessment of VSIG4 not being ideal since this is a key marker of mature lining macrophages only. We will provide additional data and analyses.

      In Figure 1C, the authors nicely demonstrate that the lining macrophages get closer in their distance across development to build the epithelial-like macrophage structure along the adult lining. Is the close proximity between lining macrophages already fully "matured" at 3 weeks of age and comparable to adults? Please quantify the distance in adult linings.

      We will provide additional data for adult joints.

      Can the authors explain how the grouping was performed between the analyzed human fetal joints? It is not clear why the cut between the groups was chosen at 16/17 weeks of age. It might also be beneficial if the authors considered not grouping these samples, but rather showing the specific quantifications for each sample individually and estimating the expansion over time across human development via linear regression. Furthermore, can the authors give additional information about the distancing of lining macrophages in the human fetal samples? It would be great to see if they follow the same dynamics as in mouse. A comparison to human juvenile/adult joints might also help to substantiate the findings in human samples (if possible).

      We will show samples ungrouped and perform new linear regression analysis as suggested.
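
      The ungrouped regression the reviewer suggests could be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name `fit_line` is hypothetical, and the gestational ages and macrophage counts are placeholder values chosen only to show the calculation, not data from the manuscript.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Coefficient of determination from residual and total sums of squares.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return slope, intercept, r_squared


# Placeholder values for illustration only (one entry per fetal sample).
gestational_weeks = [12, 14, 16, 17, 19]
lining_macrophages = [10.0, 14.0, 18.0, 20.0, 24.0]

slope, intercept, r_squared = fit_line(gestational_weeks, lining_macrophages)
print(f"slope={slope:.2f} per week, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

      Plotting each sample individually with the fitted line avoids having to justify a cut-off such as 16/17 weeks, because gestational age is treated as a continuous variable.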

      The scRNA-seq analysis leaves several questions open and some conclusions and workflows cannot be easily followed.

      We appreciate this comment and the complexity of the data, and will implement the below recommendations, and clarify the issues raised. Detailed:

      a. It is not clear how and especially why the signature genes to define macrophages vs. monocytes were chosen. Especially as the signature genes for monocytes would not include patrolling monocytes and the macrophage signature genes seem to be highly regulated during development, see also Apoe expression in NB vs. adult in Figure S2e. Why did the authors not take classical markers such as Itgam, Fcgr1a, Csf1r?

      We will include new analyses using these markers.

      b. Can dendritic cell signatures be excluded? Cluster 11 and 12 show indeed some DC markers, are these really macrophages?

      We will include new analyses to account for DC markers.

      c. The authors provide several figure panels showing TOP marker genes or key marker genes for the identified clusters, however it is not clear if these are TOP DE genes or if the genes were hand chosen. Somehow, the authors give the impression that the clusters were chosen and labeled not based on DE genes, but more on existing literature that previously reported these macrophage populations. DE gene lists for all annotated cell types and macrophage clusters need to be provided within the manuscript.

      We will provide the full DEG analysis results.

      d. The authors claim that Clusters 1 and 4 are "developing" macrophages. How is this defined? Why are these developing cells compared to other clusters? And why are these clusters later on not considered as progenitors of Aqp1 macrophages and Vsig4 macrophages? Why are Aqp1+ macrophages not labeled as developing when they are later on in the manuscript shown as potential intermediate progenitors of lining macrophages?

      As per below comment, we will expand on this and clarify nomenclature and (potential) relationships between these and other macrophages.

      e. Furthermore, it is again confusing that markers are used throughout Figure 2 which are labeled as "key marker genes" for a population and then later on they are claimed to be regulated during development within this population, see for example Figure 2D and 2H.

      We will clarify this as per above answer.

      f. It is appreciated that the authors distinguished cycling clusters such as 8, 9, and 10 based on their cycling gene signature. Here it would be very exciting to see a cell cycle analysis across all clusters and time points to see when exactly the cells are expanding during development; this would also substantiate the data later shown for the Mki67-CreERT2 mouse model.

      We will perform the proposed cell cycle analysis, and implement this and the other reviewer's suggestions for marker selection and cluster annotation (this is also covered in below comments from other reviewers).

      g. Can the authors identify certain gene modules during development of lining macrophages (and/or their progenitors) which are associated with certain functions (e.g. GO terms, GSEA enrichment)?

      This will be included in the revised manuscript.

      To determine the actual presence of the identified macrophage clusters from the scRNA-seq as macrophage populations in the joint, the authors should perform IF or FACS for key markers. Especially, Aqp1+ macrophages should be shown in the developing joint.

      We will provide additional data on Aqp1+ macrophages in the developing joint, and relate these to a study by collaborators currently in revision at Immunity, which characterizes the Aqp1+ population in detail (we are hoping to have a doi available during our revision process).

      The authors used a wide range of fate mapping models, which is quite unique and highly appreciated. The obtained results and the conclusions made from the models raise a couple of questions: Whereas contribution of HSC-derived/monocyte-derived macrophages to the lining compartment seems to be minor, there is still labeling across different models. Various aspects would need to be clarified.

      We will clarify these data throughout as per below suggestions.

      a. For example, the authors employ Ms4a3-Cre as a tracing model for GMP-derived monocytes, however all quantifications of the labeling efficiency are not normalized to the labeling in monocytes or another highly recombined cell population. This should be shown, similar to the other fate mapping models (Figure 3 F-I).

      Labelling efficacy for Ms4a3-Cre is near complete for GMP-derived monocytes (and neutrophils) with the Rosa-lsl-tdT (aka Ai14) reporter we have used (see also PMID: 31491389 and doi: 10.1101/2024.12.03.626330); but we will include normalized data as requested.

      b. Please show Ms4a3 expression across clusters across time points, to exclude expression in fetal-derived clusters.

      We will include this in the revised supplementary information, but there is indeed very little at birth (in line with the original report for other tissues PMID: 31491389).

      c. In line with the question raised above, can the authors exclude a development of the Egfr1+ and Clec4n+ developing macrophages into Aqp1+ macrophages and subsequently into Vsig4 lining macrophages? The data obtained from the Ms4a3-Cre model strongly suggest correlative labeling across these clusters, which could imply a relationship. However, the authors do not discuss the role of these developing macrophages anywhere in the manuscript. It is highly encouraged to include this in the manuscript, as it would be of high relevance for understanding lining macrophage development.

      This is an interesting point and we agree it deserves consideration in the revised manuscript. Indeed, our trajectory analyses do not predict differentiation of the Egfr1+ and Clec4n+ developing macrophages into Aqp1+ macrophages, and hence, ultimately lining macrophages. Conversely, Aqp1+ cells might also convert into Egfr1+ and Clec4n+ developing macrophages. We will elaborate on this more in the revised manuscript.

      d. The authors conclude from the pseudo bulk transcriptomic profiling of the different macrophage clusters that TdT+ and TdT- macrophages do not differ in their gene expression profile and that this is due to niche imprinting rather than origin imprinting. Even though the data supports that conclusion, the authors should verify if inkling cells early during development also show this similar gene expression profile and gene expression should be compared at the different developmental time points. Tissue niche imprinting is happening within the niche during development, most likely in a stepwise progress, and therefore there should be differences in the beginning.

      This is another important point that we will address in the revised manuscript by performing additional differential gene expression analyses at the different developmental time points, including the earliest stages, as suggested.

      The trajectorial analysis using different pseudotime pipelines is very interesting and nicely points out the potential role of Aqp1 macrophages as intermediates of Vsig4 lining macrophages. From my point of view, all trajectories seem to suggest that Egfr1 developing macrophages and Clec4n developing macrophages might differentiate into Aqp1 macrophages, however the authors are not exploring this further and the role of both developing macrophage clusters is not further discussed (see also comments above).

      We will address and discuss this in the revised manuscript.

      How was the starting point of the trajectorial analyses defined and is it the same for each pipeline used?

      We will clarify this in the revised manuscript.

      Are there potentially two trajectories? It looks like there is one in the beginning of postnatal life and a second one appearing from the monocyte-compartment later in life. If this is true, that would rather speak for a dual ontogeny of Vsig4+ macrophages, wouldn't it?

      We will discuss this in the revised manuscript.

      A heatmap (transcriptional shift) of trajectories between more clusters should be shown at least for Cluster 0,1,2, and 3. It is not sufficient to demonstrate this only between two clusters.

      We will add these analyses during revision.

      To show the similarity between Aqp1 macrophages and proliferating macrophage clusters, the authors should remove the cycling signature and compare these clusters to show that the cycling cells might be Aqp1 macrophages or earlier developing macrophage progenitors aka Clec4n or Egfr1 macrophages.

      We will address this in the revised manuscript.

      The conclusions made from the Mki67-CreERT2 data are a bit difficult to understand, as all progenitors (monocyte progenitors and macrophage progenitors) will proliferate at the neonatal time point, and no conclusions can be made as to whether the cells expand in the niche. The authors should employ Confetti mice or other models (Ubow mice) to analyze clonal expansion in the niche.

      We acknowledge that interpretation of the Mki67-CreERT2 data is complicated by labeling of other cells, and notably, labeling observed in BM-derived cells. To complement the Mki67-CreERT2 data, and specifically account for proliferation of BM-derived cells, we have tried using Ms4a3-Cre:Ubow mice to quantify expansion of the few monocyte-derived macrophages in the joint (lining). However, this yielded

      All predicted cell-cell interactions between macrophages and fibroblasts should be provided in a supplementary table. Are the interactions shown in Figure 5 chosen interactions or the TOP predicted ones? Whereas the authors show different numbers of interactions, it is most likely hand-picked and therefore biased.

      We will provide a full list of all predicted interactions in the revised supplementary material in addition to a list of the full differential gene expression analysis.

      The authors further aim to dissect the factors involved in the developmental niche imprinting of lining macrophages. Even though it is highly appreciated that the authors used so many experimental setups to show the reliance of lining macrophages on Csf1 and TGF-beta as well as mechanosensation, the wide range of models, the different methods used, and the selected developmental time points make it very difficult to really interpret the data. The authors should carefully choose time points and methods (either FACS analysis across all models or IF across all, or both). Often, deletion efficiencies for transgenic models and proof of concept that the inhibitors and agonists are working in the treatment paradigm are not provided. For example, Csf1rMer-iCre-Mer Tgfbr2fl/fl mice are used, but neither deletion efficiency nor different time points of analysis are shown; maybe the macrophages are not properly targeted in this setup.

      We have decided to uncouple our experimental data on Tgfb, Piezo1 and mechanosensing/mechanical loading, but are taking this into consideration for revision. In many cases, we have in fact performed flow cytometry and imaging analyses, and agree, we should be showing this consistently.

      The authors have shown the role of Csf1 and Tgfbr2 only for lining macrophages. Is this specific to this population in the joint, or are sublining macrophages affected in a similar manner?

      We will include data on sublining macrophages in the revised figure (for CSF1; Tgfb data will be uncoupled from this current manuscript).

      Can the authors confirm their results in CSF1R-FIRE mice with anti-Csf1 injections or in Csf1op/op mice?

      We will expand our discussion of the Csf1 findings, and aim to include data for anti-CSF1 antibody treatment during revision. Csf1 has previously been reported as a key factor required for maintenance of tissue-resident macrophages, including those in the joint (lining). Indeed, Csf1op/op mice are deficient in synovial lining macrophages from 2 days of age onwards (PMID: 8050349), and lining macrophages are also absent from 2-week-old and adult Csf1r-/- mice (PMID: 11756160). However, a full developmental analysis has not been performed. We are thus the first to show a full developmental time course, using state-of-the-art experimental readouts, and specifically focusing on the early postnatal window of lining maturation that we have identified in this study. Moreover, we have used a more specific model, Csf1rFIRE ko, in which Csf1r deficiency is restricted to myeloid cells. This model circumvents issues with other models, which show many developmental defects, some of which are unrelated to macrophages. These include growth retardation and skeletal defects, which may influence joint macrophage development. Therefore, although Csf1 dependence of synovial lining macrophages had indeed been previously reported in principle, our data substantially expand on and solidify these findings, thereby adding novelty.

      The setup in Figure S5G is very interesting to test the role of movement and mechanical load on the joint, however, there is basically no data on the model provided showing the efficiency of the induced optogenetic muscle contractions, and only one time point is shown.

      Data on mechanical loading will be uncoupled from the current manuscript and substantiated in a separate follow-up.

      The results regarding the role of Piezo1 and mechanosensation vary a lot. Could it be that analyses were done too early, or that proper weight load must actually be applied to the joint for the maturation of the macrophages? The authors should test this too.

      We will uncouple these data from the current manuscript during revision in order to investigate the contribution of these (and other) factors in sufficient detail. However, this is a possibility that we have discussed. In fact, the most appropriate experimental approach to address the involvement of mechanical loading, onset of walking and specifically, weight bearing would be a loss-of-function approach (i.e. paralysis at the newborn stage), for which we unfortunately could not obtain ethics approval from the UK Home Office.

      The Rolipram experiment is shown in Figure S5G, but is not described in the result section. It only appears at some point in the discussion part. The authors should move it to results or remove it from the manuscript.

      We will incorporate these data with the revised section on developing synovial macrophage populations.

      Minor points:

      Please reference the Figure panels in numeric order throughout the text.

      We will change this where not the case already.

      Figure 2a and 2b are a bit out of the storyline, it is not obvious why this is shown here and maybe it would be good to move it to the supplements. Gating strategy is also not used for scRNA-seq. Therefore, it would better fit to the later analysis of joint macrophages across different transgenic mouse models and treatment paradigms. The gating strategies are changing across different experiments throughout the figures, it would be nice to have a similar gating strategy for all experiments, see also Figure 3 where the defining markers for joint macrophages are changing between models.

      We will revise Figures 2, 3 and the related supplementary figures.

      A lot of figure panels have very small labeling that is basically unreadable. Axes at FACS plots for example. Sometimes, it is even impossible to distinguish cluster labels especially when they have similar colors.

      We will revise this, thanks for pointing it out.

      In the text on page 14, many markers are named which are specifically regulated during development in lining macrophages, but these factors are not labeled anywhere in the volcano plot. It would be good to showcase at least some of these named genes in the figure panel, e.g. Trem2.

      We will do this for revision.

      Figure 2F and Figure S2F are really nicely showing the percentage of cells per cluster in each analyzed biological sample. Maybe the authors could additionally consider to show a stacked bar plot with the mean percentage of cells per cluster and how the clusters are distributed across time points?

      We will include this in the revised manuscript.

      Figure 3A: IF for adult lining macrophages and the quantification are missing.

      This will be included in the revised version.

      Reviewer #3 - Major

      Generally, the story could be more streamlined by introducing the reporter lines and lineage-origin logic earlier. Clearly state which reporter/CreERT2 lines and crosses are used. It was unclear in Figure 2 that cells from the cross of the Cx3cr1-GFP and Ms4a3Cre:Rosa26lsl-tdT reporter lines were used for the scRNA-seq. The principle that there are fetal-derived and bone marrow (GMP)-derived monocytes and macrophages doesn't need to be "hidden" until Figure 3. For example, the imaging of Ms4a3Cre could also be introduced before the scRNA-seq.

      We will revise the structure and order of the manuscript during revision. However, we will streamline this between reviewer comments, and would also like to point out that the 2 other reviewers were very complimentary about the writing and clarity, i.e. we may not follow every specific suggestion of reviewer 3, but are very much taking on board their overall comment on structure and clarity.

      Figure 1 could benefit from a cartoon visualizing the anatomy of the knee joint. The terms "sublining" and "synovium" are now a bit unclear, as it appears that sometimes the synovium is indicated as sublining and vice versa. Additionally, a schematic developmental timeline could be added to indicate the parallels between mouse and human development (fetal and postnatal development in mouse versus gestational age in human). Also, the various waves of hematopoiesis could be indicated in this timeline, which would be particularly helpful for Figure 3 for the lineage-tracing readouts. Lastly, the authors could end the manuscript (a new Figure 6) with a general cartoon summarizing all the results presented.

      We will include these illustrations as suggested.

      Figure 1 could be rearranged: first introduce the markers CX3CR1 and VSIG4 (Figure 1D) and then present the quantifications (Figure 1B/E). Where possible, co-visualization CX3CR1-GFP and VSIG4 on tissue sections to strengthen the claims on the relationship between these 2 markers. Tying the scRNA-seq insights (Figure 2) to the imaging would be elegant. Moreover, it would be informative to represent the CX3CR1+ and VSIG4+ macrophages as a percentage of F4/80+ macrophages (Figure 1B/E). Similarly, for the flow cytometry data in Figure 2, the relationship between the markers CX3CR1 and VSIG4 on macrophages could be more clearly displayed and discussed.

      Thanks for this remark. We will endeavour to show co-localization and analysis of both markers wherever possible. However, where we did not use Cx3cr1gfp mice, co-staining was limited by antibody choice and availability.

      The 3D imaging of the joint is a nice addition to the manuscript, as it provides more context to the anatomical structure; however, while the text suggests several newborn joints were imaged, Figure 1F visualizes (again) the knee joint. Could other joints also be represented by 3D imaging? If the knee joint is the only joint available for imaging, and previous confocal imaging focused specifically on the meniscus in the knee joint, could the meniscus also be highlighted in the lightsheet imaging?

      Apologies if this was not clear from the original manuscript text, but we have only imaged the knee joint in 3D. We will clarify this during revision. Whilst we want to maintain the focus on knee joints throughout this manuscript, we will include additional 3D lightsheet imaging data from micro-dissected knee joints to further substantiate the original data.

      Clarification is requested regarding the imaging quantification representation. The M&M section under "Statistical analysis and reproducibility" states that individual data points are displayed, and bars represent the mean. However, some of the Figure legends (e.g., Figures 1B and S1C) specify that each dot corresponds to an individual mouse, with quantification based on 2-3 sections per mouse. While this appears to be a very reasonable representation of the data, does this mean that for each dot, the mean value from the 2-3 sections per mouse was calculated and plotted?

      We will clarify this.
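
      The reading the reviewer proposes, one dot per mouse computed as the mean of its 2-3 sections, could be sketched as below. This is an assumption about the intended aggregation, which the response has not yet confirmed; the function name `per_mouse_means` and the mouse identifiers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def per_mouse_means(section_values):
    """Collapse per-section measurements into one mean value per mouse.

    section_values: iterable of (mouse_id, measurement) pairs, with
    several sections per mouse. Returns {mouse_id: mean measurement}.
    """
    grouped = defaultdict(list)
    for mouse_id, value in section_values:
        grouped[mouse_id].append(value)
    # Each mouse contributes exactly one data point (one dot in the plot).
    return {mouse_id: mean(values) for mouse_id, values in grouped.items()}


# Placeholder measurements for illustration only.
sections = [
    ("mouse_1", 10.0), ("mouse_1", 14.0),
    ("mouse_2", 8.0), ("mouse_2", 9.0), ("mouse_2", 10.0),
]
print(per_mouse_means(sections))
```

      Under this scheme the mouse, not the section, is the unit of replication, which is also what the per-mouse adipocyte diameter plot requested in the minor comments would require.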

      It is not clear how the differential expression analysis was performed on the Vsig4+ cells. Please specify if Cluster 0 was used for analysis, or all Vsig4-expressing cells? Not all cells in Cluster 0 have Vsig4+ expression. The authors described the expression dynamics of Aqp1 as intriguing, but lack a reasoning on why this is interesting.

      We will revise this section.

      Figure S3E: In line with the previous comment, can the authors justify that the tdTomato+/- comparisons are not biased by scRNA-seq dropout (scRNA-seq is zero-inflated, so some tdTomato- cells could be false negatives), and provide methodological details (thresholds, ambient RNA correction, etc.) to support this?

      We will clarify this and include additional representations of the tdTomato transcript data.

      Although the sex-related differences in macrophage composition and the absence of differential expression are interesting, they distract from the manuscript's main messages. Moreover, the Discussion does not elaborate on how these observations relate to joint (disease) biology. Consider removing this section or integrating it clearly into the relevant biological context.

      We will remove this section as suggested.

      CreERT2 transgenic lines are often not 100% efficient in recombination, which also depends on whether tamoxifen or 4-OHT is used. Could the authors report the percentage of tdTomato+ cells in the joints and compare them to the recombination efficiencies in the monocytes/microglia under the same tamoxifen or 4-OHT conditions? This would help clarify how to interpret the macrophage labeling percentages.

      We will report labelling efficacies and/or show normalized data in the revised manuscript.
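
      A common way to express such normalized data is to divide the labeling in the population of interest by the labeling in a highly recombined reference population from the same animal. The sketch below assumes this convention; the function name `normalized_labeling` and the example percentages are hypothetical and are not taken from the manuscript.

```python
def normalized_labeling(target_pct, reference_pct):
    """Labeling in the target population as a percentage of the reference.

    target_pct: % tdTomato+ cells among joint macrophages.
    reference_pct: % tdTomato+ cells in the reference population
    (for example blood monocytes or microglia) in the same mouse.
    """
    if not 0 <= target_pct <= 100:
        raise ValueError("target_pct must be between 0 and 100")
    if not 0 < reference_pct <= 100:
        raise ValueError("reference_pct must be above 0 and at most 100")
    return 100.0 * target_pct / reference_pct


# Placeholder example: 20% labeled macrophages, 80% labeled reference cells.
print(normalized_labeling(20.0, 80.0))
```

      A normalized value well below 100 then indicates limited contribution from the traced lineage, rather than incomplete recombination.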

      Could the authors draw parallels between the observations in the mouse knee joint macrophage populations and literature on other joints in mouse and the knee joint in human (for example, as described in Alivernini et al., 2020 and in the very recent Raut et al., 2025)?

      We will include a section on this in the revised manuscript.

      Reviewer #3 - Minor comments:

      In general, the authors should clarify in the Results what each marker used for imaging, flow cytometry, or in the mouse reporter lines delineates. For example, mention that F4/80 is a marker for tissue-resident macrophages (correct?) in immunofluorescence, that IBA1 is a marker for macrophages on human tissue sections (Figure S1), and PDPN is GP38 (Figure S2 - align usage of marker reference across main text and figures).

      We will implement this request.

      Figure S1B: Is CX3CR1 also restricted to the lining macrophages in human? Could a co-staining with IBA1 be performed to strengthen the species similarities?

      To our knowledge, there is no antibody available that works for imaging of human CX3CR1. Moreover, CX3CR1 is only limited to the lining population in adult joints, in fetal and newborn (mouse) joints, all macrophages express this receptor, as do fetal progenitors to macrophages. However, Alivernini and colleagues have reported that TREM2high macrophages are the human counterpart of the mouse CX3CR1+ lining population (PMID: 32601335). We do not have access to postnatal human joint tissue samples, unfortunately, but we will attempt to stain for and quantify TREM2+ macrophages in human fetal joints for the revised manuscript.

      Adipocyte diameter quantification: Avoid plotting individual adipocytes from 2 mice without per-mouse visualization. Instead, report the mean adipocyte diameter per mouse and plot those means.

      We will implement this change.

      A little typo was spotted in the "Statistical analysis and reproducibility" section: it is Dunn's, not Bunn's multiple-comparison correction.

      Thanks for spotting this.

      Figure 2A: The gating strategy for the CX3CR1-GFP cells is missing.

      We will provide this in the revised manuscript or supplementary material.

      Improve the visualization of some plots. For example, Figure 2F is hard to read because of the big dot size. The dots seem to add no information to the graph and could be removed. Additionally, for comparing the clusters across the different time points, one could project the cells from the other time points in grey in the background.

      We will revise the presentation of these data.

      Figure S2: The dotplot is more informative than the heatmap, consider removing the heatmap.

      We will do that.

      Figure 3A: If technically feasible, image and visualize both the GFP and tdTomato expression. It would be informative to see the Cx3cr1+ and Ms4a3-derived cells in the same specimen.

      We will strive to show this in the revised manuscript.

      Figure 3C: Highlight that tdTomato expression is visualized here.

      We will do that.

      Figure 3G,F: The authors should place the schematics and graphs next to each other, so the data points can be more easily compared.

      We aim to do this in the revised manuscript.

      Figure 4B: Which co-staining was performed for the immunofluorescence to quantify the % of tdTomato+ cells?

      We co-stained for F4/80 and assessed localization in the lining or sublining. This will be clarified in the revised Figure legend.

      Figure 4C: The trajectory analysis appears to have an arrow pointing from the Ccr2+ macrophages to the Ly6c+ monocytes. Please verify this directionality, as it seems to run against the known biology.

      This will be addressed during revision.

      Figure 5 mentions that the Csf1r levels were reduced in a tissue-specific manner, but it is unclear how this tissue specificity was achieved.

      We apologize for this misunderstanding. Csf1rFIRE mice are not tissue-specific knockouts, but they are more specific than global knockout mice, since only a (myeloid-specific) enhancer is affected. We will clarify this in the relevant section.

      For the TGFb perturbations (Tgfbr2 KO and systemic TGFb depletion): did the authors validate reduced TGFb pathway activity in the macrophages, for example, reduced pSMAD2/3 levels? This would validate the effectiveness of the perturbations.

      This is an important point, and assessing signaling events downstream of TGFb is a very good suggestion. As per above comment, we have decided to uncouple the functional data with exception of CSF1 from the revised version of the current manuscript, but we will be taking this into account for substantiating our functional data in follow-up work.

      Figure 5F could benefit from a timeline of the treatment.

      As for 15., we will be taking this into account for follow-up work on the uncoupled functional data.

      The Methods mention that Gene Ontology analysis was performed on the single-cell data, but the results are not plotted in a figure. It would be informative to include this GO/pathway analysis in the appropriate figure(s).

      We will include this in the revised (supplementary) information.

    1. Reviewer #2 (Public review):

      Summary:

      In this work, Ganesh and colleagues use experimental data from Hi-C and from live-cell imaging to evaluate different polymer models of 3D genome organization in Drosophila based on both structural and dynamic properties. The authors consider several leading hypotheses, which are examined sequentially in increasing level of complexity - from the minimal Rouse polymer, to a model combining sequence-specific compartmentalization and loop-extrusion without extrusion blockers. They conclude that the combination of both compartmentalization and loop-extrusion gives the best agreement with the data. Their analysis also leads to concrete predictions about the processivity of cohesin loop extrusion in Drosophila, and a conclusion that the compartmental interaction strength is poised near criticality in the coil-globule phase space.

      Strengths:

      There is considerable interest in the field in understanding the mechanisms responsible for the 3D spatial organization of the genome and its dynamic movement, which has major implications for our understanding of long-range transcriptional regulation and other genome behaviors. The live-cell experimental work on which this study draws highlights the limitations of existing models in explaining even the dynamic behaviors observed in the data, which adds to the interest in further exploration. Therefore, this paper seeks to address an important gap in the field. The work is written in a well-organized, well-illustrated fashion. The text and figures are nicely integrated, easy to read, and explain challenging concepts with elegance and brevity in a manner that will be accessible to a broad audience.

      Weaknesses:

      The validity and utility of these conclusions are, in my view, substantially undermined by what appear to be unappreciated peculiarities of the live-cell data set that was used to constrain the model. The live-cell data come from embryos that were edited in a way that intentionally and substantively changed both the 3D genome structure and dynamics specifically at the loci which are imaged. This case is neither explained by any of the models suggested nor acknowledged in the current work, and it is not compatible with the Hi-C data that are simultaneously used to constrain these models. As these ignored synthetic alterations have been previously shown to be determinative of transcriptional activity, the relevance of the authors' work to transcriptional control (a prime motivation in the introduction) is unclear.

      The agreement in 3D organization, as represented in chromosome-scale contact frequency heatmaps, is substantially less impressive than the agreement seen in prior work with similar models. This discrepancy appears to be due in part to the unappreciated effects of the edits mentioned in the previous limitation, as well as inappropriate choices in the metrics used to evaluate agreement. It is also not particularly surprising that combining more models, with more free parameters, results in an improvement in the quality of fit.

      Some major results, including both theoretical works and experimental ones, are ignored, despite their relevance to the stated objective of the work. The current manuscript and analysis could be improved substantially by a consideration of these works.

      I describe these issues in more detail below.

      Major issues:

      (1) The genetic element "homie" is present in a subset of the data: The experimental data used in this analysis come from different fly lines, half of which have been edited explicitly to alter genome structure and consequent transcriptional behavior, yet the authors are trying to fit with a common model - a problem which substantially undermines the utility of the analysis.

      Specifically, the authors evaluate each of a series of models/simulations in turn by comparing it to Hi-C from wildtype Drosophila embryos on the chromosome scale and to 3D distances and dynamics from live-cell imaging in genetically edited embryos. The exercise fatally overlooks a critical fact (admittedly not easily noticed in the work from Bruckner et al.): the fly embryos used for nearly all their analyses contain not only fluorescent labels, but also two copies of a powerful genetic sequence, "homie", known for its ability to dramatically change the 3D organization and dynamics of the genome. Whether or not the fluorescent labels themselves used in the study further alter structure and dynamics is not entirely clear (and will require further work beyond the scope of either study), but at least these fluorescent labels aren't known to dramatically affect 3D structure and dynamics the way homie is. The critical problem is that adding or removing "homie", as shown in a collection of prior works I describe below in more detail, dramatically affects structure, dynamics, and gene expression. Whether or not the genome contains two distal cis-linked copies of homie fundamentally changes genome structure and dynamics, so to use one dataset which has this edit (the live-cell data) and one dataset which lacks it (the Hi-C data) is, in some sense, to guarantee failure of any model to match all the data.

      If the authors had chosen instead to focus exclusively on the 'no homie' genetic lines in the Bruckner data, they would have a much smaller dataset (just 2 distances), which would not cover all the length scales of interest, but it would at least be a dataset not known to be contradictory to the Hi-C. The two 'no homie' lines make much more plausible candidates for the sort of generalizable polymer dynamics these authors seek to explain, as will hopefully be made clearer by a brief review of what is known about homie. I next describe the published data that support these conclusions about how homie affects 3D genome spatial organization and dynamics:

      What is "homie" and how does it affect 3D genome distances, dynamics, and gene expression?

      The genetic element "homie" was named by James Jaynes' lab (Fujioka...Jaynes 2009) in reference to its remarkable "homing" ability - a fascinating and still poorly understood biological observation that some genetic sequences from Drosophila, when cloned on plasmids and reintegrated into the genome with p-elements, had a remarkable propensity to re-integrate near their endogenous sequence (Hama et al., 1990; Kassis, 2002; Taillebourg and Dura, 1999; Bender and Hudson, 2000; Fujioka...Jaynes 2009). By contrast, most genetic elements tend to incorporate at random across the genome in such assays (with some bias for active chromatin).

      The Jaynes lab subsequently showed that in flies carrying two copies of homie, one of them integrated in cis ~140 kb distal from the endogenous element, the two copies formed preferential cis contacts with one another. Indeed, if a promoter and reporter gene were included at this distal integration site, the reporter gene would be expressed in the pattern normally seen for the gene even-skipped. The endogenous copy of homie marks one border of a ~16 kb mini-TAD which contains the even-skipped gene (eve) and its developmental enhancers, so this functional interaction provides further evidence of physical proximity (as was also shown by 3C by Jaynes (Fujioka..., Schedl, Jaynes 2016), and later with elegant live imaging, by Jaynes and Gregor (Chen 2018)).

      Critically, if either copy of homie is deleted or substantially mutated, the 3D proximity is lost (Fujioka 2016, Chen 2018, Bruckner 2023), and the expression of the transgene is dramatically reduced (at 58 kb) or lost. Given the authors' motivation of understanding "E-P" interactions, the fact that the increased 3D proximity provided by homie is as essential for transcription as the promoter itself at the ~150 kb distance underscores that these are not negligible changes.

      These effects can be seen by plotting the data from Bruckner 2023, which includes data from labels with separations of 58 kb and ~150 kb "no homie" as well as homie. Unfortunately, the authors don't plot this data in the manuscript in the comparison of 3D distances, though the two-point MSD can be seen in Figure S13C, and laudably, the data is made public in a well-annotated repository on Zenodo, noted in the study. Note that the distance data in Figure S13 were filtered to exclude the transcriptionally off state, and are thus not the quantity the current authors are interested in. If they plot the published data for no homie, they will see the clear effect on the average 3D distance, R(s), and a somewhat stronger effect on the contact frequency P(s), which causes significant deviation from the trend-line followed by the homie-containing data.

      (2) The agreement between the "best performing" simulations for all models and the Hi-C data is not on par with prior studies using similar approaches, apparently due to some erroneous choices in how the optimization is carried out:

      Hi-C-comparison

      The 'best fit' simulation Hi-C looks strikingly different from the biological data in all comparisons, with clearly lower agreement than other authors have shown using highly similar methods (e.g., Shi and Thirumalai 2023; Di Pierro et al. 2017; Nuebler et al. 2018; Esposito et al. 2022; Conte et al. 2022), among many others. I believe this results from a few issues with how the current authors select and evaluate the data in their work:

      (a) Most works have used Pearson's correlation rather than Spearman's correlation when comparing simulation and Hi-C contact frequencies. Pearson's correlation is more appropriate when we expect the values to be linearly related, which they should be in this case, as they are indeed constructed to measure the same thing (contact frequency), just derived from two different methods. Spearman's correlation would have been justifiable for comparing how transcription output correlates with contact frequency. Switching to Pearson's correlation may fix the bafflingly low correlations reported at lower adhesion values in Figure S2C.

      (b) Choice of adhesion strengths - The Hi-C map comparison in Figure 3 strongly suggests that a much more striking visual agreement would have been achieved if much weaker (but still non-zero) homotypic monomer affinity had been selected. In the authors' simulation, the monomer state (A/B identity) strongly dominates polymer position, resulting in the visual appearance of an almost black-and-white checkerboard. The data, meanwhile, look like a weak checkerboard superimposed on the polymer.

      (c) A further confounding problem is the aforementioned issue that the Hi-C data don't come from the edited fly lines, and that the interaction of the two homie sites is vastly stronger than the compartment interactions of this region of the genome.
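
      To make the distinction raised in point (a) concrete, the following is a minimal sketch comparing Pearson's and Spearman's correlation on toy contact frequencies. All numbers and names (`hic`, `sim_linear`, `sim_cubed`) are hypothetical illustrations, not values from the manuscript or from any Hi-C dataset. A simulated map that is a linear rescaling of the data scores 1 under both metrics, whereas a nonlinearly distorted but still monotonic map keeps a perfect Spearman score while its Pearson score drops.

```python
import math


def pearson(x, y):
    """Pearson correlation: measures linear agreement between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical contact frequencies at five genomic separations (toy values).
hic = [0.50, 0.20, 0.10, 0.05, 0.02]
sim_linear = [2.0 * h for h in hic]  # same map up to a scale factor
sim_cubed = [h ** 3 for h in hic]    # monotonic but strongly nonlinear

print("linear:", pearson(hic, sim_linear), spearman(hic, sim_linear))
print("cubed: ", pearson(hic, sim_cubed), spearman(hic, sim_cubed))
```

      In this toy case Spearman's correlation cannot distinguish the rescaled map from the distorted one, which is one reason a linear metric may be the more informative choice when both quantities are meant to measure the same thing.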

      (3) Some important concepts from the field are ignored:

      The crumpled/fractal globule model is widely discussed in the literature (including the work containing the data used in this study) - its exclusion from this analysis thus appears as a substantial gap/oversight:

      A natural alternative to the much-discussed Rouse polymer model is the "crumpled polymer" (Grosberg et al. 1988; Grosberg 2016; Halverson et al. 2011; Halverson et al. 2011), also known as the "fractal globule" (Lieberman-Aiden et al. 2009; Mirny 2011; Dekker and Mirny 2016; Boettiger et al. 2016), much discussed for the way it captures the 1/3 scaling of R(s) found for much of the genome (or, equivalently, the -1 exponent of the probability of contact as a function of genome separation, P(s)). Given the 1/3rd scaling in the data, and the fact that the original authors highlighted the crumpled model in addition to the Rouse model, it seems that this comparison would be instructive and the lack of discussion an oversight. Moreover, while prior works (e.g., Bruckner, Gregor, 2023) used some traditional simplifying assumptions to estimate the MSD and relaxation time scaling of this model, I believe a more rigorous analysis with explicit simulations (as in Figure 1 for the Rouse model) would be instructive for the crumpled polymer. Note that the crumpled globule is not necessarily the same as the globule in the coil-globule transition discussed here - it requires some assumptions about non-entanglement to stay trapped in the meta-stable state which has the 1/3rd R(s) scaling that is indicative of this model, and not the 1/2 exhibited by equilibrium globules (for s << length of the polymer) and dilute polymers alike.

      While the fit in Figure 2 appears to get closer to the 1/3rd exponent (B = 0.32), this appears to be a largely coincidental illusion of agreement - the simulation data in truth show a systematic deviation, returning to the 1/2 scaling for distances from 500 kb to whole chromosomes. This feature is not very evident, as the authors restrict the analysis to only the few points available in the experimental data; had they tested intervening distances, I expect they would find that log-log P(s) is nonlinear (non-power-law) for distances from less than the typical loop length up to a few-fold larger than the loop length, and thereafter returns to the scaling provided by the 'base' polymer behavior. This appears to be Rouse-like in these authors' model, with R(s) scaling with exponent 1/2, even though the data are closer to 1/3rd, as is indeed the case for most published simulated P(s) curves based on loop extrusion, e.g., (Fudenberg et al. 2016; Nuebler et al. 2018). In this vein, it would be instructive to the readers if the authors would include additional predictions from the simulation on the plot that lie at genomic separation distances not tested in the data, to better appreciate the predictions.
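
      The scaling argument above can be illustrated with a minimal sketch that estimates the exponent of R(s) as the slope of log R versus log s, both globally and between consecutive points. The separations and the crossover curve below are synthetic and purely hypothetical (chosen to switch from 1/3 to 1/2 at 500 kb); they are not taken from the manuscript's simulations or from the experimental data.

```python
import math


def fit_exponent(separations, distances):
    """Least-squares slope of log(R) against log(s), i.e. nu in R(s) ~ s**nu."""
    xs = [math.log(s) for s in separations]
    ys = [math.log(r) for r in distances]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def local_exponents(separations, distances):
    """Log-log slope between each pair of consecutive points."""
    slopes = []
    for i in range(len(separations) - 1):
        d_log_r = math.log(distances[i + 1]) - math.log(distances[i])
        d_log_s = math.log(separations[i + 1]) - math.log(separations[i])
        slopes.append(d_log_r / d_log_s)
    return slopes


def crossover_distance(s, s_cross=500.0):
    """Toy R(s): exponent 1/3 up to s_cross, exponent 1/2 beyond (assumed form)."""
    if s <= s_cross:
        return s ** (1.0 / 3.0)
    return s_cross ** (1.0 / 3.0) * (s / s_cross) ** 0.5


# Hypothetical genomic separations in kb.
separations_kb = [50, 100, 200, 500, 1000, 3000, 10000]

crumpled = [s ** (1.0 / 3.0) for s in separations_kb]
ideal = [s ** 0.5 for s in separations_kb]
mixed = [crossover_distance(s) for s in separations_kb]

print("crumpled exponent:", fit_exponent(separations_kb, crumpled))
print("ideal exponent:   ", fit_exponent(separations_kb, ideal))
print("mixed, global fit:", fit_exponent(separations_kb, mixed))
print("mixed, local fits:", local_exponents(separations_kb, mixed))
```

      For the mixed curve, the single global exponent falls between 1/3 and 1/2, while the local slopes expose the crossover. This is the sense in which a fit restricted to a few separations can mask a systematic deviation from a power law.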

      Minor issues

      (1) I think it is too misleading to only describe the experimental data from Bruckner as "E-P" interactions from Drosophila. It is important to note somewhere that this is not an endogenous interaction with a functional role in Drosophila - it is a synthetic interaction between enhancers in the vicinity of the eve gene and a synthetic promoter placed at a variable distance away. The uniformity is elegant (it is the same pair of elements being studied at all distances), but also provides limited scope for generalization as suggested by the current text. Moreover, the enhancers were not directly labeled; rather, the 3D position of nascent RNA transcribed from eve was tracked with an RNA-binding protein and used as a proxy for the 3D position of the enhancers. There is not an individual enhancer at the eve locus that interacts with the transgene; rather, a collection of enhancers is distributed at different positions throughout the entire TAD that contains eve, and these must form separate loops to reach eve. Indeed, it was previously reported that differences in the local position of these enhancers, relative to eve, affect their ability to interact with the distal reporter gene and the endogenous eve gene (Chen 2018). There is also reported competition between these enhancers and the distal gene, which further complicates the analysis (especially since the state of eve and of its enhancers varies among the different cells as a function of stripe position) - see Chen 2018. All of this is ignored in the current work, despite the assertion of the application to understanding E-P interaction. A detailed discussion of these issues is not necessary, but I fear that ignoring them entirely is to invite further confusion and error.

      (2) I believe this sentence is overstated, given available data: "TAD borders are characterized by transitions between epigenetic states rather than by preferentially-bound CTCF [4, 23, 24]." Indeed, this claim has been repeatedly made in the literature as cited here. However, other data clearly demonstrate a strong enrichment of CTCF at TAD borders (and at epigenetic borders, which in Drosophila have a high correspondence with TAD borders, as the authors have already appropriately noted). See, for example, Figure 4 of Sexton Cell 2012, and compare to Figure 2 of Dixon 2012. Of minor note, CTCF peaks co-occupied by the Zinc Finger TF CP190 are more likely to be TAD borders than CTCF alone. How big a species-specific difference this is remains unclear, as it appears some mammalian CTCF-marked TAD boundaries may be co-occupied by additional ZNFs. While plenty of Drosophila TAD boundaries indeed lack CTCF, many are marked by CTCF, and this is an enrichment relative to what would be expected by chance (or relative to the alignment of other TFs, like Twist or Eve, with TAD boundaries). It has also been shown that CTCF loss is sufficient to remove a subset of these boundaries; see, for example, Figure 5 of Kaushal et al. 2021 (though it is possible that most will require mutation of all the border-associated factors that collectively bind many of the borders: dCTCF, CP190, mod(mdg4), and others).

      (3) This assertion is overstated given available data: "Although TAD boundaries in Drosophila are often associated with insulator proteins [20], there is no direct evidence that these elements block LEFs in vivo. Therefore, we did not impose boundary constraints in our simulations; LEFs were allowed to move freely unless stalled by collisions with other LEFs, with the possibility of crossover." Deletion of insulators in Drosophila that lie within a common epigenetic state leads to fusion of TADs (e.g., Mateo et al., 2019 - deletion of the CTCF-marked Fub insulator, in posterior tissues where both flanks of Fub are active; Kaushal, 2021, has examples as well). Loss of CTCF causes a small number of TADs to fuse as measured by Hi-C. This is far from 'direct evidence that insulators block LEFs' - as the authors have already noted, even the idea that cohesin extrudes loops in Drosophila in the first place is indeed controversial. However, LEF activity and stalling at insulators would provide a very natural explanation of why chromatin in a shared epigenetic state should form distinct TADs, and why these TADs should fuse upon insulator deletion. Justifying the lack of stalling sites based on empirical data is thus not very convincing to this reviewer. I believe it would be more apt to simply describe this as a simplifying assumption, rather than the above phrase, which may be misleading.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary

      In this investigation, Kapustin et al. demonstrate that exposure of vascular smooth muscle cells (VSMCs) to the extracellular matrix protein fibronectin stimulates the release of small extracellular vesicles (sEVs). The authors provide experimental evidence that stimulation of the actin cytoskeleton boosts sEV secretion and posit that sEVs themselves harbor both fibronectin and collagen IV protein, which, in turn, alter cell migration parameters. It is well established that fibronectin is associated with increased cell migration and adherence; therefore, this association with VSMCs is not novel.

      The reviewer is correct that FN has been associated with migration and adherence in previous studies. However, we have extended these observations to show that the extracellular fibronectin matrix stimulates small extracellular vesicle (sEV) secretion by modulating the actin cytoskeleton. We also showed that sEVs are trapped in the extracellular matrix and that, by presenting collagen VI, they induce early focal adhesion formation, reduce excessive cellular spreading and guide cell invasion directionality through a 3D matrix. Hence, sEVs mediate cell-matrix cross talk and change cell behaviour in the context of the fibronectin matrix. This is critically important for the vasculature, where regulated VSMC invasion is essential for repair, with its deregulation leading to pathology.

      The authors purport that sEV are largely born of filopodia origin; however, this data is not well executed and seems generally at odds with the presented data.

      Our experimental data showed that CD63 MVs are associated with filopodia in fixed and live cells (Fig 2E, 2F and Video S1) and that inhibition of filopodia formation using the formin inhibitor SMIFH2 reduced sEV secretion on FN (Fig 2B). However, we agree with the reviewer that further studies are required to connect sEV secretion to filopodia. To address this, we have provided further data analysis but also toned down our conclusions regarding this point. Changes include:

      (1) Title: Matrix-associated extracellular vesicles modulate smooth muscle cell adhesion and directionality by presenting collagen VI.

      (2) Results, section title: 2. FN-induced sEV secretion is modulated by Arp2/3 and formin-dependent actin cytoskeleton remodelling

      (3) Results, page 6 Line 27-44 and conclusion page 7, Ln 3 “Interestingly, CD63+ MVBs can be observed in filopodia-like structures suggesting that sEV secretion can also occur spatially via cellular protrusion-like filopodia but more studies are needed to confirm this hypothesis.”

      (4) Discussion, page 12, line 19. “Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      Similarly, the effect of sEVs on parameters of cell migration has almost no magnitude of effect, making mechanism exploration somewhat nebulous.

      VSMCs are mesenchymal-type cells with a low migration rate, and we agree that the changes in motility are not of great magnitude even for the positive controls, suggesting that this is a complex, multifactorial process for VSMCs. In our experiments we collected data from >5000 individual cells to measure the average speed and found that fibronectin matrix on its own increased VSMC speed from ~0.61 μm/min to ~0.68 μm/min (a ~12% increase), which was statistically significant (Fig 5A). Addition of a sEV inhibitor caused a modest but significant decrease in cellular speed. Interestingly, addition of ECM-associated sEVs did not influence cell speed in 2D or 3D assays. However, in a 3D model we observed a 22% change in cell directionality (Fig 5G) and a 235% change in cell alignment index (FMI, Fig 5H), which we believe is very strong evidence that VSMC-derived sEVs are involved in the regulation of VSMC invasion directionality. These data are also in agreement with sEV effects in tumour cells (Sung et al., 2015), though this previous study did not identify the factor driving the directionality, and we think our collagen VI data significantly extend these previous observations.

      Results, page 9: “Hence, ECM-associated sEVs have modest influence on VSMC speed but influence VSMC invasion directionality.”
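
      As a quick arithmetic check of the speed change quoted in the response above, the following minimal sketch computes the relative change between the two approximate mean speeds. Only the two speeds are taken from the text; the function name `percent_change` is a hypothetical helper introduced for illustration.

```python
def percent_change(baseline, value):
    """Relative change from baseline to value, expressed in percent."""
    return (value - baseline) / baseline * 100.0


# Approximate mean VSMC speeds quoted in the response (micrometres per minute).
speed_plastic = 0.61
speed_fn = 0.68

change = percent_change(speed_plastic, speed_fn)
print(f"{change:.1f}% increase")  # prints "11.5% increase"
```

      The result, about 11.5%, is consistent with the "~12%" figure given in the response.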

      Lastly, the proposed mechanism of VSMCs responding to, and depositing, ECM proteins via sEVs was not rigorously executed; again, making the conclusions challenging for the reader to interpret.

      We appreciate the reviewer’s comment regarding the mechanistic aspects of VSMCs responding to and depositing ECM proteins via sEVs. In our revised manuscript, we have expanded the data demonstrating that sEVs can be retained within the extracellular matrix (see Figs 3A, 3B, S3A, S3B). Additionally, we show that collagen VI is present on the surface of sEVs, where it may modulate cell adhesion and influence the directionality of cell invasion (Fig 7E). Our results further indicate that both fibronectin (FN) and collagen VI can be recycled through multivesicular bodies (see Figs S3C, S3D, S3E–S3G). However, we acknowledge that the precise mechanisms governing the selective loading of ECM proteins onto sEVs, as well as the specific contributions of sEVs to overall ECM organization, remain to be fully elucidated and warrant further investigation. Based on our current evidence, we propose that collagen VI–loaded sEVs act primarily in a signaling capacity by modulating focal adhesion formation but are not directly involved in ECM structural remodeling.

      Results, page 7: To quantify ECM-trapped sEVs we applied a modified protocol for the sequential extraction of extracellular proteins using salt buffer (0.5M NaCl) to release sEVs which are loosely-attached to ECM via ionic interactions, followed by 4M guanidine HCl buffer (GuHCl) treatment to solubilize strongly-bound sEVs (Fig S3A) [42]. We quantified total sEV and characterised the sEV tetraspanin profile in conditioned media, and the 0.5M NaCl and GuHCl fractions using ExoView. The total particle count showed that EVs are both loosely bound and strongly trapped within the ECM. sEV tetraspanin profiling showed differences between these 3 EV populations. There was close similarity between the conditioned media and the 0.5M NaCl fraction, with high abundance of CD63+/CD81+ sEVs as well as CD63+/CD81+/CD9+ in both fractions (Fig S3A). In contrast, the GuHCl fraction was particularly enriched with CD63+ and CD63+/CD81+ sEVs with very low abundance of CD9+ EVs (Fig S3A). The abundance of CD63+/CD81+ sEVs was confirmed independently by a CD63+ bead capture assay in the media and loosely bound fractions (Fig S3B).

      Results, page 7: We previously found that the serum protein prothrombin binds to the sEV surface both in the media and MVB lumen showing it is recycled in sEVs and catalyses thrombogenesis being on the sEV surface [43]. So we investigated whether FN can also be associated with sEV surface where it can be directly involved in sEV-cell cross-talk [43]. We treated serum-deprived primary human aortic VSMCs with FN-Alexa568 and found that it was endocytosed and subsequently delivered to early and late endosomes together with fetuin A, another abundant serum protein that is a recycled sEV cargo and elevated in plaques (Figs S3C and S3D). CD63 visualisation with a different fluorophore (Alexa488) confirmed FN colocalization with CD63+ MVBs (Fig S3E). Next, we stained non-serum deprived VSMC cultured in normal growth media (RPMI supplemented with 20% FBS) with an anti-FN antibody and observed colocalization of CD63 and serum-derived FN. Co-localisation was reduced, likely due to competitive bulk protein uptake by non-deprived cells (Fig S3F). Notably, when we compared FN distribution in sparsely growing VSMCs versus confluent cells we found that FN intracellular spots, as well as colocalization with CD63, completely disappeared in the confluent state (Fig S3F and S3G). This correlated with nearly complete loss of CD63+/CD81+ sEV secretion by the confluent cells indicating that confluence abrogates intracellular FN trafficking as well as sEV secretion by VSMCs (Fig S3H). Finally, FN could be co-purified with sEVs from VSMC conditioned media (Fig S3I) and detected on the surface of sEVs by flow cytometry confirming its loading and secretion via sEVs (Fig 3C).

      Results, page 10: Collagen VI was the most abundant protein in VSMC-derived sEVs (Fig 7B, Table S7) and was previously implicated in the interaction with the proteoglycan NG2 [53] and suppression of cell spreading on FN [54]. To confirm the presence of collagen VI in ECM-associated sEVs we analysed sEVs extracted from the 3D matrix using 0.5M NaCl treatment and showed that both collagen VI and FN are present (Fig 7D). Next, we analysed the distribution of collagen VI using dot-blot. Alix staining was bright only upon permeabilization of sEV indicating that it is preferentially a luminal protein (Fig 7E). On the contrary, CD63 staining was similar in both conditions showing that it is a surface protein (Fig 7E). Interestingly, collagen VI staining revealed that 40% of the protein is located on the outside surface with 60% in the sEV lumen (Fig 7E).

      Discussion, page 12: “In fact, we observed that an extensive secretion of sEVs effectively ceased protrusion activity; also VSMCs acquired a rounded morphology when “hovering” over the FN matrix decorated with sEVs (data not shown). Hence, it will be interesting in future studies to investigate whether sEVs can stimulate Rho activity by presenting adhesion modulators—particularly collagen VI—on their surface, thereby guiding cell directionality during invasion.”

      Discussion, page 14: “In summary, cooperative activation of integrin signalling and F-actin cytoskeleton pathways results in the secretion of sEVs which associate with the ECM and play a signalling role by controlling FA formation and cell-ECM crosstalk. Further studies are needed to test these mechanisms across various cell types and ECM matrices.”

      Strengths

      The authors provide a comprehensive battery of cytoskeletal experiments to test how fibronectin and sEVs impact both sEV release and vascular smooth muscle cell migratory activation.

      We appreciate this comment reflecting our efforts to apply a range of orthogonal methods to show the role of the integrin/actin cytoskeleton in ECM-stimulated sEV secretion.

      Weaknesses

      Unfortunately, this article suffers from many weaknesses. First, the rigor of the experimental approach is low, which calls into question the merit of the conclusions. In this vein, there is a lack of proper controls or inclusion of experiments addressing alternative explanations for the phenotype or lack thereof.

      We acknowledge this comment and agree that there was not sufficient evidence to conclude that sEV secretion occurs via filopodia despite the microscopy/inhibitory data, so this claim has now been excluded from the study. However, we believe that our experimental data do clearly show that FN stimulates the secretion of collagen VI-loaded sEVs which are trapped by the ECM and have the capacity to modulate VSMC adhesion and invasion directionality. To support this, we have now extended the dataset in the revised version:

      (1) In addition to the use of inhibitors and live cell analysis we have added quantitative data confirming that a large proportion of CD63+ endosomes are associated with F-actin/cortactin tails and this colocalization is increased upon the inhibition of sEV secretion with 3-OMS (Fig  2D, Fig S2B).

      (2) We developed a method to extract ECM-associated sEVs and quantified/characterized these using ExoView Assays further confirming significant sEV entrapment by the ECM (Figs 3B, S3A, S3B).    

      (3) We extended the controls to confirm FN delivery to CD63+ endosomes and showed that FN recycling is stopped upon reaching cell confluence (Figs S3F, S3G and Fig S3H).

      (4) We included more intensive characterisation of human atherosclerotic plaque morphology (H&E, Masson’s trichrome staining, Orcein, elastin fibers staining) to confirm predominant accumulation of sEV in the neointima (Figs S4A, S4B and S4C). We also excluded an endothelial origin for the  CD81+ sEVs (Fig 4G).

      (5) We added individual cellular tracks to the 2D migration analysis to confirm the statistical significance and concluded that ECM-associated sEVs regulate cell invasion directionality but not cell speed (Figs 5A and 5B).

      (6) We showed surface localisation of collagen VI on sEVs confirming that it can activate signalling pathways leading to early FA formation on the FN matrix  (Figs 7D and 7E).

      (7) We included alternative explanations for some of our data in the discussion.      

      Reviewer #2 (Public Review):

      Extracellular vesicles have recently gained significant attention across a wide variety of fields, and they have therefore been implicated in numerous physiological and pathophysiological processes. When such a discovery and an explosion of interest occur in science, there is often much excitement and hope for answers to mechanisms that have remained elusive and poorly understood. Unfortunately, there is an equal amount of hype and overstatement that may also be put forth in the name of "impact", but this temptation must be avoided so that scientists and the broader public are not misled by overreaching interpretations and statements that lack rigorous and fully convincing evidence.

      Thank you for your comment. We agree that investigating sEVs is particularly challenging due to their heterogeneity and nano-size, as well as complex biogenesis mechanisms. ECM-associated sEVs are a very new direction for the EV field but one that is particularly relevant to the vasculature, where cells must invade through a thick ECM and where the accumulation of ECM-bound EVs is a unique and documented phenomenon. To further strengthen our conclusions we have included new data to support our statements but also excluded statements regarding filopodia as the origin of sEVs, which are out of the scope of our study and need to be investigated further.

      The study presented by Kapustin et al. is certainly intriguing and timely, and it offers an interesting working hypothesis for the fields of extracellular vesicles and vascular biology to consider. The authors do a reasonable job at detecting these small extracellular vesicles, though some aspects of data presentation are missing such as full Western blots with accompanying size markers for the viewer to more fully appreciate that data and comparisons being made (see Figures 1 and 7).

      We agree with the reviewer and have now included molecular weight markers (Fig 1F, 7C, 7D, S3I, S4E) and provided all original western blot scans (uncropped and unedited) to the eLife editor. 

      Much of the imaging data from cell-based experiments is strong and conducted with many cutting-edge tools and approaches. That said, the static images and the dynamic imaging fall short of being fully convincing that the small extracellular vesicles found in the neighboring extracellular matrix are indeed being deposited there via the smooth muscle cell filopodia. Many of the lines of evidence presented suggest that this could occur, but alternative hypotheses also exist that were not fully ruled out, such as the ECM-deposited vesicles were secreted more from the soma and/or the lamellipodia that are also emitted and retracted from the cells. In particular, the authors show very nice dynamic imaging (Supplementary Figure S2A and Supplemental Video S1) that is interpreted as "extracellular vesicles being released from the cell" and these are seen as "bursts" of fluorescent signal; however, none of these appear to occur in filopodia as they appear within the cell proper (a "burst" of signal vs. a more intense "streak" of signal), which would be a stronger and more consistent observation predicted by the working model proposed by the authors.

      Our live and fixed cell microscopy data, as well as inhibitor analysis, showed that sEV secretion can be associated with the filopodia. However, we agree with the reviewer that the data generated using the pHluorin GFP marker clearly indicate that the majority of sEVs are secreted from the cell soma toward the ECM.

      To reflect this, we have added further changes:

      (1) Title: Matrix-associated extracellular vesicles modulate smooth muscle cell adhesion and directionality by presenting collagen VI.

      (2) Results, section title: 2. FN-induced sEV secretion is modulated by Arp2/3 and formin-dependent actin cytoskeleton remodelling

      (3)  Results, page 6 Line 27-36 “Formins and the Arp2/3 complex play a crucial role in the formation of filopodia, a cellular protrusion required for sensing the extracellular environment and cell-ECM interactions36. To test whether MVBs can be delivered to filopodia, we stained VSMCs for Myosin-10 (Myo10)37. We observed no difference between total filopodia number per cell on plastic or FN matrices (n=18±8 and n=14±3, respectively) however the presence of endogenous CD63+ MVBs along the Myo10-positive filopodia were observed in both conditions (Fig 2E, arrows). Filopodia have been implicated in sEV capture and delivery to endocytosis “hot-spots”38, so next we examined the directionality of CD63+ MVB movement in filopodia by overexpressing Myo10-GFP and CD63-RFP in live VSMCs. Importantly, we observed anterograde MVB transport toward the filopodia tip (Fig 2F and Supplementary Video S2) indicative of MVB secretion”.

      (4) Results, page 6, Ln 37-44 “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane39. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells18 (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia (Fig S2C and Supplemental Video S1).”

      (5) Results, page 7 Ln 3 “Interestingly, CD63+ MVBs can be observed in filopodia-like structures suggesting that sEV secretion can also occur spatially via cellular protrusion-like filopodia but more studies are needed to confirm this hypothesis.”

      (6) Discussion, page 12, line 19. “Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      Imaging of related human samples is certainly a strength of the paper, and the authors are commended for attempting to connect the findings from their cell culture experiments to an important clinical scenario. However, the marker selected for marking extracellular vesicles is CD81, which has been described as present on the endothelium of atherosclerotic plaques with a proposed role in the recruitment of monocytes into diseased arteries (Rohlena et al. Cardiovasc Res 2009). More data should address this potentially confounding interpretation of the signals presented in images within Figure 4.

      We thank the reviewer for this insightful comment that the sEV marker CD81 can originate from endothelial cells, in agreement with Rohlena et al., 2009. To address this, we investigated the spatial overlap between CD81 and the endothelial marker CD31. We observed very strong CD81 staining in the intact endothelial cell (intima) layer and occasional CD31-positive cells in the neointima. Importantly, quantification of colocalization confirmed that 80% of CD81 in the neointima does not overlap with CD31, excluding an endothelial origin of these sEVs (Fig 4G). Moreover, we included a complete morphological characterisation of the atherosclerotic plaques, confirming that CD81 sEVs were primarily observed in the neointima, where VSMCs constitute the cellular majority (Fig S4A, S4B, S4C and S4D).

      On a conceptual level, the idea that the small extracellular vesicles contain Type VI Collagen, and this element of their cargo is modulating smooth muscle cell migration, is an intriguing aspect of the authors' working model. Nevertheless, the evidence supporting this potential mechanism does not quite fit together as presented. It is not entirely clear how the collagen VI within the vesicles is somehow accessed by the smooth muscle cell filopodia during migration. Are the vesicles lysed open once on the extracellular matrix? If so, what is the proposed mechanism for that to occur? If not, how are the adhesion molecules on the smooth muscle cell surface engaging the collagen VI fibers that are contained within the vesicles? This aspect of the model does not quite fit together with the proposed mechanism and may be an interesting speculative interpretation, warranting further investigation, but it should not be considered a strong conclusion with sufficient convincing data supporting this idea.

      We thank the reviewer for their insightful comments regarding the mechanism by which collagen VI associated with sEVs could modulate smooth muscle cell adhesion and migration. To clarify, our new data suggest that collagen VI is predominantly present on the surface of the sEVs, as evidenced by Fig 7E. This surface localization strongly implies that collagen VI can be directly accessed by cell surface adhesion receptors, without the need for vesicle lysis or opening. While we cannot entirely rule out all alternative mechanisms, we consider vesicle rupture or lysis within the extracellular matrix to be a highly unlikely route for collagen VI exposure, given the known stability of sEVs under physiological conditions. We have added these points to clarify:

      (1) Results, page 10, Ln 45 “To confirm the presence of collagen VI in ECM-associated sEVs we analysed sEVs extracted from the 3D matrix using 0.5M NaCl treatment and showed that both collagen VI and FN are present (Fig 7D). Next, we analysed the distribution of collagen VI using dot-blot. Alix staining was bright only upon permeabilization of sEV indicating that it is preferentially a luminal protein (Fig 7E). On the contrary, CD63 staining was similar in both conditions showing that it is surface protein (Fig 7E). Interestingly, collagen VI staining revealed that 40% of the protein is located on the outside surface with 60% in the sEV lumen (Fig 7E).”

      (2) Discussion, page 13, Ln 2 “Hence, it will be interesting in future studies to investigate whether sEVs can stimulate Rho activity by presenting adhesion modulators, particularly collagen VI, on their surface, thereby guiding cell directionality during invasion.”

      (3) Discussion, page 14, Ln 30: In addition to collagen VI the unique adhesion cluster in VSMC-derived sEVs also includes EGF-like repeat and discoidin I-like domain-containing protein (EDIL3), transforming growth factor-beta-induced protein ig-h3 (TGFBI) and the lectin galactoside-binding soluble 3 binding protein (LGALS3BP) and these proteins are also directly implicated in activation of integrin signalling and cellular invasiveness85-87. Although we found that collagen VI plays the key role in sEV-induced early formation of FAs in VSMCs, it is tempting to speculate that the high sEV efficacy in stimulating FA formation is driven by cooperative action of this unique adhesion complex on the sEVs surface and targeting this novel sEV-dependent mechanism of VSMC invasion may open-up new therapeutic opportunities to modulate atherosclerotic plaque development or even to prevent undesired VSMC motility in restenosis.

      (4) Abstract Figure

      On a technical level, some of the statistical analysis is not readily understood from the data presented. It is very much appreciated that the authors show many of the graphs with technical and biological replicate values in addition to the means and standard deviations (though this is not clearly stated in all figure legends). However, in figures such as Figure 5, there are bars shown and indicated to be different by statistical comparison (see panel B in Figure 5). It is not clear how the values for Group 1 (no FN, no 3-OMS, no sEV) are statistically different (denoted by three asterisks but no p value provided in the legend) than Group 3 (no FN, 3-OMS added, no sEV), when their means and standard deviations appear almost identical. If this is an oversight, this needs to be corrected. If this is truly the outcome, further explanation is warranted. A higher level of transparency in such instances would certainly go a long way in helping address the current crisis of mistrust within the scientific community and at the interface with society at-large.

      We thank the reviewer for their careful reading and important comments on the statistical analysis. We acknowledge that the technical and biological replicate data were not clearly reported in all figure legends and that the statistical approach for Figures 5A and 5B required clarification. In response, we have made several changes for greater transparency and rigor:

      First, we have now explicitly included the numbers of biological replicates (N) and technical replicates (n) in all relevant figure legends for Figures 1–7. In addition, the number of individual cell tracks is now annotated for the migration/invasion analyses, along with the mean values for each dataset.

      Upon review, we found that the original statistical analyses for Figures 5A and 5B were conducted using pooled averaged data. To address this, we have repeated the statistical tests using pooled individual cell track data, applying the Kruskal–Wallis test with Dunn’s multiple comparison correction. This more stringent approach yielded revised p-values, which are now indicated in Figures 5A and 5B.
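For readers less familiar with this procedure, the following is a minimal sketch of a Kruskal–Wallis test followed by Dunn's pairwise comparisons on per-track values. It is not the analysis pipeline used for Figures 5A and 5B: the function names, group labels and numbers are hypothetical, and the Bonferroni adjustment is assumed here as one common choice of multiplicity correction. Only the H statistic is computed for the omnibus test; its p-value would normally be read from a chi-square distribution with k-1 degrees of freedom using a statistics package.

```python
import math
from itertools import combinations


def rank_with_ties(values):
    """Return average ranks (1-based) and the tie term sum(t^3 - t)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sum = 0.0
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_sum += t**3 - t
        i = j + 1
    return ranks, tie_sum


def kruskal_dunn(groups):
    """groups: dict mapping a label to a list of per-track values.

    Returns (H, {(a, b): Bonferroni-adjusted two-sided p}).
    Assumes the pooled values are not all identical.
    """
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    n_total = len(pooled)
    ranks, tie_sum = rank_with_ties(pooled)

    sizes, mean_rank = {}, {}
    rank_sq_term = 0.0
    start = 0
    for name in names:
        n = len(groups[name])
        r = ranks[start:start + n]
        start += n
        sizes[name] = n
        mean_rank[name] = sum(r) / n
        rank_sq_term += sum(r) ** 2 / n

    # Kruskal-Wallis H with the standard tie correction.
    h = 12 / (n_total * (n_total + 1)) * rank_sq_term - 3 * (n_total + 1)
    h /= 1 - tie_sum / (n_total**3 - n_total)

    # Dunn's z statistic for each pair, using the pooled ranks.
    pairs = list(combinations(names, 2))
    base_var = n_total * (n_total + 1) / 12 - tie_sum / (12 * (n_total - 1))
    adjusted = {}
    for a, b in pairs:
        se = math.sqrt(base_var * (1 / sizes[a] + 1 / sizes[b]))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        adjusted[(a, b)] = min(1.0, p * len(pairs))
    return h, adjusted


# Hypothetical per-track speeds, for illustration only.
tracks = {
    "ctrl": [1, 2, 3, 4, 5],
    "fn": [6, 7, 8, 9, 10],
    "fn_3oms": [11, 12, 13, 14, 15],
}
h_stat, dunn_p = kruskal_dunn(tracks)
```

Because the test operates on ranks of individual tracks rather than on per-experiment averages, the sample size per group is the number of tracks, which is why switching from averaged to individual data can change the resulting p-values.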

      With these corrections, we reconfirm our major findings: In the 2D model, fibronectin (FN) coating promotes VSMC velocity, while inhibition of sEV secretion with 3-OMS leads to reduced cell speed (Fig. 5A). Addition of sEVs to the ECM had no effect on VSMC speed at baseline but did rescue cell speed and distance in the presence of 3-OMS, consistent with EVs acting primarily on invasion directionality rather than speed in both 2D and 3D models (Fig. 5A, 5D). Furthermore, sEVs continue to significantly impact VSMC invasion directionality (Figs. 5G, 5H), in agreement with previous reports in tumor cells (Sung et al., 2015).

      In summary, we have implemented the following revisions:

      (1) Figures 5A and 5B: Individual cell track data are now shown, and statistical analyses have been repeated using the Kruskal–Wallis test with Dunn’s multiple comparisons.

      (2) Figure legends and results sections: Numbers of biological and technical replicates, as well as individual data points, are now clearly stated.

      Results, page 9, line 14: The text has been updated to clarify the statistical approach and major findings as described above.

      We hope that these changes address the reviewer’s concerns and improve the transparency and reproducibility of our data presentation.

      Reviewer #1 (Recommendations For The Authors):

      We are very thankful for the comprehensive review and comments which helped to improve our data.

      Figure 1.

      The authors clearly show that FN stimulation (immobilized or cell-derived) promotes sEV secretion via canonical integrin pathways. FN is a promigratory substrate, hence its extensive use as a cell adhesion aid; thus one could assume that simply plating on FN induces a pro-migratory phenotype (later data supports this notion). Does the addition of growth factors also increase sEV release? An endogenous function of FN is siloing of various GFs during clot formation. Also, FAK and SRC networks intersect with canonical RTK signaling in terms of promoting Rac1, CDC42 and other migration mediators. The reason I believe this is important is because the data could be interpreted in two ways: 1) FN induces pro-migration signaling and then sEVs are released, or vice versa, 2) FN induces sEV release and migration is initiated. GF supplementation in the absence of FN would clarify this relationship.

      We thank the reviewer for this insightful comment regarding the possible role of growth factors (GFs) and the mechanistic relationship between FN stimulation, sEV secretion, and cell migration. We agree that FN is a well-established promoter of cell migration, and it is important to distinguish whether FN directly induces a pro-migratory phenotype or does so via sEV-mediated signaling.

      Our data show that FN stimulation markedly increases VSMC motility, as reflected by enhanced cell speed (Fig. 5A), an increased number of focal adhesions (Fig. 6E), and facilitated centripetal movement of FAs (Fig. 6F). Interestingly, ECM-associated sEVs appear to play a complementary but distinct role: they do not significantly affect cell migration speed (Fig. 5A) but instead guide cell invasion directionality (Figs. 5G, 5H), reduce the number of FAs per cell (Fig. 6E), and promote early peripheral FA formation (Fig. 6F). In light of these findings, we have updated our graphical abstract to reflect the unique cross-talk mediated by sEVs between VSMCs and the ECM.

      Regarding the influence of growth factors, we acknowledge that FN can bind and present different GFs, which could also contribute to changes in sEV secretion. Although our inhibition studies and integrin-blocking antibody results support a primary role for β1 integrin activation and actin assembly in triggering sEV secretion, we cannot entirely exclude the possibility that FN-bound growth factors play a role in this process. We have now incorporated this point into the discussion to address the reviewer’s suggestion.

      Discussion, page 14, Ln 7 “Although our small inhibitors and integrin modulating antibody data clearly indicate that β1 activation triggers sEV secretion via activation of actin assembly we cannot fully rule out that FN may also be modulating growth factor activity which in turn contributes to sEV secretion by VSMCs<sup>23</sup>. Excessive collagen and elastin matrix breakdown in atheroma has been tightly linked to acute coronary events hence it will be interesting to study the possible link between sEV secretion and plaque stability as sEV-dependent invasion is also likely to influence the necessary ECM degradation induced by invading cells<sup>96</sup>.”

      Figure 2.

      • The authors provide no evidence (or references) that SMIFH2 or CK666 halts filopodia extensions.

      Thank you for this important note. We have included the corresponding references:

      Results, page 5: “So next we tested the contribution of Arp2/3 and formins by using the small molecule inhibitors, CK666 and SMIFH2, respectively31, 32”.  

      • Is there an increase in filopodia density when plated on FN vs plastic? Similarly, if there are more filopodia present is that associated with more sEV? Please provide evidence in this regard.

      We agree that relating the number of filopodia to the level of sEV secretion could provide an important clue as to whether sEV secretion is driven by FN-induced filopodia formation. However, Myosin-10 staining to quantify filopodia (Fig 2E) showed no difference between VSMCs plated on plastic versus FN matrix. Therefore, we agree with the reviewer that the filopodia contribution to sEV secretion needs to be investigated further. This is reflected in the following changes:

      (1) Results, page 6, Ln 29 “We observed no difference between total filopodia number per cell on plastic or FN matrices (n=18±8 and n=14±3, respectively) however the presence of endogenous CD63+ MVBs along the Myo10-positive filopodia were observed in both conditions (Fig 2E, arrows).”

      (2) Results, page 6, Ln 37 “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane39. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells18 (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia (Fig S2C and Supplemental Video S1).”

      (3) Discussion, page 12, Ln 15: “Focal complexes either disassemble or mature into the elongated centripetally located FAs48. In turn, these mature FAs anchor the ECM to actin stress fibres and the traction force generated by actomyosin-mediated contractility pulls the FAs rearward and the cell body forward12, 13. Here we report that β1 integrin activation triggers sEV release followed by sEV entrapment by the ECM. Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      As hinted above, this data could be interpreted in the light of generally inhibiting cell migration to blunt sEV shedding. Does cell confluence affect sEV release? If cells are cultured to 100% confluency this would limit filopodia formation regardless of ECM type. If sEV secretion remains elevated on FN in this culture condition it would suggest a lack of dependency on filopodia.

      We thank the reviewer for this thoughtful suggestion regarding the influence of cell confluence on sEV release and filopodia formation. To directly address this hypothesis, we performed additional experiments comparing VSMCs cultured at low and high confluency. As described in the revised Results (page 7, line 39), we found that high cellular confluency reduced FN recycling, as indicated by the marked decrease in intracellular FN-positive spots and loss of colocalization with CD63 (Figs S3F, S3G). Importantly, this was accompanied by a significant reduction in CD63+/CD81+ sEV secretion by confluent cells (Fig S3H). These results suggest that VSMC confluence, which suppresses filopodia formation, nearly abolishes both intracellular FN trafficking and sEV secretion, even in the presence of FN. Thus, under our experimental conditions, sEV secretion by VSMCs appears to be closely linked to dynamic cell–matrix interactions and is dramatically reduced when these processes are limited by confluence:

      (1) Results, page 7, Ln 39: “Notably, when we compared FN distribution in sparsely growing VSMCs versus confluent cells we found that FN intracellular spots, as well as colocalization with CD63, completely disappeared in the confluent state (Fig S3F and S3G). This correlated with nearly complete loss of CD63+/CD81+ sEV secretion by the confluent cells indicating that confluence abrogates intracellular FN trafficking as well as sEV secretion by VSMCs (Fig S3H).”

      • Inhibition of branched actin polymerization has been shown to reduce both exocytic and endocytic activity. Thus, it is hard to interpret the results of Fig. 2B as anything more than a generalized effect of losing actin.

      We thank the reviewer for this important point regarding the broad cellular functions of branched actin polymerization, and agree that generalized actin loss can influence both exocytic and endocytic pathways. To address this, we performed additional experiments and analyses to better define the relationship between branched actin structures and sEV-related processes in VSMCs.

      As described in the revised Results (page 6), we overexpressed ARPC2-GFP (an Arp2/3 subunit) together with F-tractin-RFP in VSMCs and carried out live-cell imaging. This approach revealed that Arp2/3 and F-actin organize into lamellipodial scaffolds at the cell cortex, as expected (Fig. S2A; Supplementary Video S2). Additionally, and more unexpectedly, we observed numerous Arp2/3– and F-actin–positive dynamic spots within the VSMC cytoplasm. These structures resemble actin comet tails seen in other systems, previously implicated in endosomal propulsion (Fig. S2A, arrow; Supplementary Video S2).

      Quantitative analysis confirmed that a substantial fraction of these dynamic F-actin/cortactin spots colocalized with CD63+ endosomes (Fig. 2D), and that these structures are indeed branched actin tails based on cortactin immunostaining. Furthermore, inhibition of SMPD3 (with 3-OMS) induced enlarged cortactin/F-actin/CD63+ complexes, morphologically similar to invadopodia (Fig. 2D, arrowheads), supporting a functional link between actin branching and MVB dynamics.

      To quantify the association, we calculated Manders’ colocalization coefficients for F-actin tails and CD63+ endosomal structures in fixed VSMCs, observing that ~50% of F-actin tails were associated with ~13% of endosomes. Upon 3-OMS treatment, this overlap increased further (Fig. S2B).
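As an illustration of how two such coefficients are obtained from a pair of channels, the sketch below computes thresholded Manders' coefficients from flattened pixel intensities. It is a simplified example, not the image-analysis workflow used for Fig. S2B: the function name, the manually chosen thresholds and the toy intensities are all hypothetical, and the formulation shown (each channel's above-threshold intensity in co-occurring pixels divided by that channel's total above-threshold intensity) is one common variant.

```python
def thresholded_manders(ch1, ch2, thr1, thr2):
    """Thresholded Manders' coefficients for two flattened channels.

    tM1: fraction of above-threshold channel-1 intensity found in pixels
         where channel 2 is also above its threshold.
    tM2: the same quantity computed for channel 2.
    """
    if len(ch1) != len(ch2):
        raise ValueError("channels must have the same number of pixels")
    total1 = sum(a for a in ch1 if a > thr1)
    total2 = sum(b for b in ch2 if b > thr2)
    coloc1 = sum(a for a, b in zip(ch1, ch2) if a > thr1 and b > thr2)
    coloc2 = sum(b for a, b in zip(ch1, ch2) if a > thr1 and b > thr2)
    tm1 = coloc1 / total1 if total1 else 0.0
    tm2 = coloc2 / total2 if total2 else 0.0
    return tm1, tm2


# Toy intensities, for illustration only: sparse "actin" signal and
# more widespread "CD63" signal across six pixels.
actin = [0, 10, 20, 0, 30, 0]
cd63 = [5, 0, 10, 40, 10, 50]
tm1, tm2 = thresholded_manders(actin, cd63, thr1=1, thr2=1)
```

The two coefficients are asymmetric by construction, so a large share of the F-actin tail signal can overlap CD63 while only a small share of the total CD63 signal overlaps F-actin, which is the pattern reported above.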

      Finally, using live-cell imaging (Fig 2C; Supplementary Video S4), we directly observed CD63+ MVBs being propelled through the cytoplasm by Arp2/3-driven actin tails, suggesting a mechanistic role for branched actin assembly in MVB intracellular transport, rather than a generalized effect of actin disruption alone.

      We believe these combined data reinforce a more specific mechanistic role for Arp2/3-mediated branched actin in MVB/endosome transport and, consequently, in sEV secretion in VSMCs—over and above an indirect effect of global actin loss. We hope these additional experiments and quantitative analyses address the reviewer’s concern and clarify the functional relevance of branched actin structures to sEV trafficking:

      (1) Results, page 6, Ln 3 “As regulators of branched actin assembly, the Arp2/3 complex and cortactin are thought to contribute to sEV secretion in tumour cells by mediating MVB intracellular transport and plasma membrane docking[28, 33]. Therefore, we overexpressed the Arp2/3 subunit, ARPC2-GFP and the F-actin marker, F-tractin-RFP in VSMCs and performed live-cell imaging. As expected, Arp2/3 and F-actin bundles formed a distinct lamellipodia scaffold in the cellular cortex (Fig S2A and Supplementary Video S2). Unexpectedly, we also observed numerous  Arp2/3/F-actin positive spots moving  through the VSMC cytoplasm that resembled previously described endosome actin tails observed in Xenopus eggs[33] and parasite infected cells where actin comet tails propel parasites via filopodia to neighbouring cells[34, 35] (Fig S2A, arrow, and Supplementary Video S2). Analysis of the intracellular distribution of Arp2/3 and CD63-positive endosomes in VSMCs showed CD63-MVB propulsion by the F-actin tail in live cells (Fig 2C and Supplementary Video S4).”

      (2) Results, New data Fig 2D, page 6, Ln 14. “we observed numerous F-actin spots in fixed VSMCs that were positive both for F-actin and cortactin indicating that these are branched-actin tails (Fig 2D). Moreover, cortactin/F-actin spots colocalised with CD63+ endosomes and addition of the SMPD3 inhibitor, 3-OMS, induced the appearance of enlarged doughnut-like cortactin/F-actin/CD63 complexes resembling invadopodia-like structures similar to those observed in tumour cells (Fig 2D, arrowheads)[18].”

      (3) Results, New data Fig S2B, page 6, Ln 19 “To quantify CD63 overlap with the actin tail-like structures, we extracted round-shaped actin structures and calculated the thresholded Manders colocalization coefficient (Fig S2B).  We observed overlap between F-actin tails and CD63 as well as close proximity of these markers in fixed VSMCs (Fig S2B). Approximately 50% of the F-actin tails were associated with 13% of all endosomes (tM1=0.44±0.23 and tM2= 0.13±0.06, respectively, N=3). Addition of 3-OMS enhanced this overlap further (tM1=0.75±0.18 and tM2=0.25±0.09) suggesting that Arp2/3-driven branched F-actin tails are involved in CD63+ MVB intracellular transport in VSMCs”

      • In video 1 the author states (lines 8-9; pg6) "intense CD63 staining along filopodia" Although, there is some fluorescence (not strong) in these structures, there was no visible exocytic activity. This data is more suggestive that sEVs (marked by CD63) are not associated with filopodia. The following conclusion statement the authors make is overreaching given this result.

      We thank the reviewer for this careful observation and agree that the previous conclusion regarding sEV release from filopodia was overstated. In response, we have revised both the Results and Discussion sections to more accurately reflect the data:

      (1) Results, page 6, Ln 37 “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane39. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells18 (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia (Fig S2C and Supplemental Video S1).”

      (2) Discussion, page 12, Ln 19 “Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      • Fig 2D and video 2 are wholly unconvincing with regard to sEV secretion sites. The authors could use their CD63-pHluorin construct to count exocytic events in the filopodia vs the whole cell. Given the movie, I have a suspicion this would not be significant. The authors could also perform CD63 staining in non-permeabilized cells to capture and count exocytic events at the plasma membrane, as well as their location, between groups.

      We thank the reviewer for these constructive suggestions and their critical assessment of our current data regarding the sites of sEV secretion. We agree that our CD63-pHluorin approach clearly indicates sEV secretion events in the soma at the cell–ECM interface, while we did not observe comparable events in filopodia. Accordingly, we have clarified these points in the revised manuscript.

      (1) Results, page 6, Ln 37 “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane39. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells18 (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia (Fig S2C and Supplemental Video S1).”

      (2) Discussion, page 12, Ln 19 “Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      • Fig. 2E and video 4. Again, the conclusions drawn from this data are very strained. First, no co-localization quantification is presented on the proportion of CD63 vesicles with actin. Once again, the movie, if anything, convinces the reader that 95-99% of all CD63 vesicles are not associated with actin; therefore, this is an unlikely mechanism of transport.

      We thank the reviewer for this valuable comment and for highlighting the need for quantitative co-localization analysis. In response, we developed a method to systematically quantify F-actin and CD63 co-localization in fixed VSMCs, as now presented in new Figures 2D and S2B. We acknowledge that the majority of CD63+ endosomes are not associated with F-actin, consistent with the reviewer’s interpretation. However, our quantitative data now show that a specific subpopulation of MVBs appears to utilize this actin-based mechanism for transport. We believe this addresses the concern and more accurately reflects the prevalence and significance of the mechanism described.

      (1) Results, page 6, Ln 19 “To quantify CD63 overlap with the actin tail-like structures, we extracted round-shaped actin structures and calculated the thresholded Manders colocalization coefficient (Fig S2B). We observed overlap between F-actin tails and CD63 as well as close proximity of these markers in fixed VSMCs (Fig S2B). Approximately 50% of the F-actin tails were associated with 13% of all endosomes (tM1=0.44±0.23 and tM2=0.13±0.06, respectively, N=3). Addition of 3-OMS enhanced this overlap further (tM1=0.75±0.18 and tM2=0.25±0.09) suggesting that Arp2/3-driven branched F-actin tails are involved in CD63+ MVB intracellular transport in VSMCs.”

      • Are there perturbations that increase filopodia numbers? A gain of function experiment would be valuable here.

      We thank the reviewer for this important suggestion regarding the potential value of gain-of-function experiments to clarify filopodia’s contribution to sEV release. In agreement with the reviewer’s scepticism, we have removed statements linking filopodia to sEV release from both the title and abstract to avoid overinterpretation. At present, our understanding of filopodia biology and the lack of robust tools to selectively and substantially increase filopodia numbers in VSMCs prevent us from directly addressing this question through gain-of-function assays. We acknowledge that future studies using established methods—such as overexpression of filopodia-inducing proteins (e.g., mDia2 or fascin)—could provide insight into whether an increased number of filopodia affects sEV release. However, such experiments are beyond the scope of the current manuscript. We have made the following changes to clarify these points:

      (1) Results, page 6, Ln 37 “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane39. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells18 (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia (Fig S2C and Supplemental Video S1).”

      (2) Discussion, page 12, Ln 19 “Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      Figure 3

      • Fig 3A. The CD63 staining is strongly associated with the entire plasma membrane. How are the authors distinguishing between normal membrane shedding and bona fide sEVs based on this staining alone? This is insufficient, as all membrane structures are seemingly positive. Additionally, very few sEVs are visible on scrutinizing the provided images. For the "sEV secretion, fold change" graphs in previous figures, could the authors provide absolute values, or an indication of what these values are in absolute terms?

      We thank the reviewer for raising this important point regarding the specificity of CD63 staining and the need to distinguish bona fide sEVs from membrane fragments or general membrane shedding. We agree that CD63 staining alone at the plasma membrane or in the extracellular matrix is not sufficient to unequivocally identify sEVs. To address this, we employed several complementary approaches to rigorously characterize ECM-associated sEVs:

      First, using high-resolution iSIM imaging, we confirmed the association of CD63-positive particles specifically with the FN-rich matrix, and demonstrated that SMPD3 knockdown significantly reduced the number of CD63+ particles in the matrix (Fig. 3B; revised from Fig. S3A).

      Second, by incubating FN matrices with purified and fluorescently labeled sEVs, we directly observed efficient entrapment of these labeled sEVs within the matrices (Fig. 3E), confirming that sEVs can interact with and be retained by the ECM.

      Third, we developed and applied a sequential extraction protocol using mild salt buffer (0.5M NaCl) and strong denaturant (4M guanidine HCl) to selectively extract ECM-associated sEVs based on the strength of their association (see new Figs. S3A and S3B). Extracted vesicles were then characterized by ExoView analysis, which demonstrated a tetraspanin profile (CD63+/CD81+/CD9+) closely matching that of sEVs from conditioned media, providing evidence that these particles are true sEVs and not merely membrane debris. We also found that the more weakly bound (NaCl-extracted) fraction closely resembles media-derived sEVs, while the strongly bound (GuHCl-extracted) fraction is more enriched in CD63+ and CD63+/CD81+ sEVs but contains very few CD9+ vesicles, further supporting distinct extracellular vesicle subpopulations within the ECM.

      In addition, the abundance of CD63+/CD81+ sEVs in both media and ECM-derived fractions was independently validated by CD63 bead-capture assay (Fig. S3B).

      We hope these clarifications and the expanded data set address the reviewer’s concerns about sEV identification and quantification in the extracellular matrix:

      (1) Results, page 7, Ln 16. “To quantify ECM-trapped sEVs we applied a modified protocol for the sequential extraction of extracellular proteins using salt buffer (0.5M NaCl) to release sEVs which are loosely-attached to ECM via ionic interactions, followed by 4M guanidine HCl buffer (GuHCl) treatment to solubilize strongly-bound sEVs (Fig S3A) [42]. We quantified total sEV and characterised the sEV tetraspanin profile in conditioned media, and the 0.5M NaCl and GuHCl fractions using ExoView. The total particle count showed that EVs are both loosely bound and strongly trapped within the ECM. sEV tetraspanin profiling showed differences between these 3 EV populations. There was close similarity between the conditioned media and the 0.5M NaCl fraction, with high abundance of CD63+/CD81+ sEVs as well as CD63+/CD81+/CD9+ in both fractions (Fig S3A). In contrast, the GuHCl fraction was particularly enriched with CD63+ and CD63+/CD81+ sEVs with very low abundance of CD9+ EVs (Fig S3A). The abundance of CD63+/CD81+ sEVs was confirmed independently by a CD63+ bead capture assay in the media and loosely bound fractions (Fig S3B).”

      • A control for Fig 3B would be helpful to parse out random uptake of extracellular debris versus targeted sEV internalization. It would be helpful if the authors added particles of similar size to that of the sEVs to test whether these structures are endocytosed/micropinocytosed at similar levels.

      We thank the reviewer for this useful suggestion regarding the need for better controls to distinguish specific sEV uptake from nonspecific internalization of extracellular debris or similarly sized particles. As a comparison, in our study we analyzed the uptake of both sEVs and serum proteins such as fibronectin and fetuin-A (Figs S3C and S3D), and observed similar patterns of intracellular trafficking. However, we acknowledge that inert nanoparticles or beads of a similar size to sEVs could serve as potential controls to assess nonspecific micropinocytosis or endocytosis.

      It is important to note, however, that the uptake of sEVs is strongly influenced by their surface protein composition and the so-called “protein corona.” Recent work from Prof. Khuloud T. Al-Jamal’s group underscores that exosome uptake mechanisms may be highly specific (Liam-Or et al., 2024), and studies from Mattias Belting’s lab have also shown the importance of heparan sulfate proteoglycans in exosome endocytosis (Cerezo-Magana et al., 2021). As a result, uptake comparisons with inert particles or beads may not fully recapitulate the specificity of sEV internalization, and distinct nanoparticle classes may rely on different uptake pathways.

      Figure 4

      • Fig. 4E,F,G. How are the authors determining the neointima and media compartments without ancillary staining for basement membrane or endothelial markers? Anatomic specific markers need to be incorporated here for the reader to evaluate the specificity of the FN and CD81 staining. It is also hard to understand the severity of the atherosclerotic lesion without a companion H&E cross section.

      We thank the reviewer for highlighting the need for more rigorous characterization of atherosclerotic lesion architecture and anatomical compartments in our study. In response, we have incorporated additional histological analyses and now provide ancillary staining and companion images to enable clear identification of the neointima and medial compartments, as well as to assess lesion severity (see new Figs S4A–S4D):

      (1) Results, page 8, Ln 28. “To test if FN associates with sEV markers in atherosclerosis, we investigated the spatial association of FN with sEV markers using the sEV-specific marker CD81. Staining of atherosclerotic plaques with haematoxylin and eosin revealed well-defined regions with the neointima as well as tunica media layers formed by phenotypically transitioned or contractile VSMCs, respectively (Fig S4A). Masson's trichrome staining of atherosclerotic plaques showed abundant haemorrhages in the neointima, and sporadic haemorrhages in the tunica media (Fig S4B). Staining of atherosclerotic plaques with orcein indicated weak connective tissue staining in the atheroma with a confluent extracellular lipid core, and strong specific staining at the tunica media containing elastic fibres which correlated well with the intact elastin fibrils in the tunica media (Figs S4C and S4D). Using this clear morphological demarcation, we found that FN accumulated both in the neointima and the tunica media where it was significantly colocalised with the sEV marker, CD81 (Fig. 4D, 4E and 4F). Notably CD81 and FN colocalization was particularly prominent in cell-free, matrix-rich plaque regions (Figs. 4E and 4F).”

      • Figs S4C, S4D: proper controls are not provided. Again, a non-FN internalization control as well as a 4 °C cold-block negative control are required to interpret this data.

      We thank the reviewer for this valuable suggestion. To enhance the rigor of our internalization assays, we have now included several additional controls using alternative treatments, fluorophore combinations, and internalization conditions:

      a) We performed FN-Alexa568 uptake assays, followed by immunostaining for CD63 with a distinct fluorophore (Alexa488), to confirm the colocalization of internalized FN with CD63+ endosomal compartments in VSMCs (new Fig. S3E).

      b) We also stained VSMCs, cultured under normal growth conditions, with an anti-FN antibody to visualize intracellular serum-derived FN and again observed colocalization with CD63 (new Figs. S3F and S3G). Notably, in cells grown to confluence, we observed a complete loss of intracellular FN staining and FN/CD63 colocalization, suggesting that FN recycling is prominent in sparse, motile cells, but not in confluent populations.

      These additional controls strengthen our conclusions regarding FN internalization pathways and the conditions under which FN trafficking to the endosomal system occurs:

      (1) Results, page 7, Ln 31 “We treated serum-deprived primary human aortic VSMCs with FN-Alexa568 and found that it was endocytosed and subsequently delivered to early and late endosomes together with fetuin A, another abundant serum protein that is a recycled sEV cargo and elevated in plaques (Figs S3C and S3D). CD63 visualisation with a different fluorophore (Alexa488) confirmed FN colocalization with CD63+ MVBs (Fig S3E). Next, we stained non-serum deprived VSMC cultured in normal growth media (RPMI supplemented with 20% FBS) with an anti-FN antibody and observed colocalization of CD63 and serum-derived FN. Colocalisation was reduced, likely due to competitive bulk protein uptake by non-deprived cells (Fig S3F). Notably, when we compared FN distribution in sparsely growing VSMCs versus confluent cells we found that FN intracellular spots, as well as colocalization with CD63, completely disappeared in the confluent state (Fig S3F and S3G).”

      • Can the authors please provide live and fixed imaging of FN and CD63-mediated filopodial secretion to amply support their conclusions?

      We have observed CD63 MVBs in both fixed (Fig 2E) and live VSMCs (Fig 2F) yet we agree that further studies are required to establish the contribution of filopodia to sEV secretion. Therefore, we have added the following changes:

      (1) Results, page 6, Ln37 “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane [39]. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells [18] (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia (Fig S2C and Supplementary Video S1).”

      (2) Discussion, page 12, Ln19 “Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      Figure 5

      • Fig. 5A,B. The authors claim that sEV supplementation enhances VSMC migration speed and distance. The provided graphs show only a marginal increase in speed with sEV addition (A) but, concerningly, there is a four-star significant difference between the FN condition compared with FN+sEV (B) while the means appear the same. How are these conditions statistically different? The statistics seem off for these comparisons.

      We thank the reviewer for highlighting concerns regarding the statistical analysis in Figures 5A and 5B. In response, we have carefully re-examined our data and statistical approach to ensure accuracy and transparency.

      First, we have now included all individual cell migration tracks in the data representation for these figures. The statistical tests were repeated using the Kruskal–Wallis test with Dunn’s multiple comparison correction across all groups. This more stringent analysis confirmed our key findings: fibronectin (FN) stimulates VSMC migration speed, while inhibition of sEV secretion (with 3-OMS) reduces cellular speed (Fig. 5A). Addition of exogenous ECM-associated sEVs modestly restored cell speed in the presence of 3-OMS, but had no effect on baseline migration speed in 2D or 3D models (Figs. 5A, 5D).
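
      For readers unfamiliar with this analysis, the following is a minimal sketch of a Kruskal–Wallis H statistic followed by Dunn's pairwise comparisons on per-cell values. It is not the authors' analysis code: the function names, the group labels, and the Bonferroni adjustment used for the pairwise p-values are assumptions made for illustration, and only the Python standard library is used, so no p-value is computed for the omnibus H statistic.

```python
from itertools import combinations
from statistics import NormalDist


def rank_with_ties(values):
    """Return 1-based average ranks and the size of every tie group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_dunn(groups):
    """groups: dict mapping a (hypothetical) condition name to per-cell values.

    Returns the tie-corrected H statistic and, for each pair of groups,
    Dunn's z statistic with a Bonferroni-adjusted two-sided p-value.
    Assumes the pooled values are not all identical.
    """
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    n_total = len(pooled)
    ranks, tie_sizes = rank_with_ties(pooled)

    # Mean rank of each group within the pooled ranking.
    mean_rank, sizes, start = {}, {}, 0
    for name in names:
        n = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + n]) / n
        sizes[name] = n
        start += n

    # Kruskal-Wallis H, divided by the standard tie-correction factor.
    tie_term = sum(t ** 3 - t for t in tie_sizes)
    h = 12 / (n_total * (n_total + 1)) * sum(
        sizes[g] * mean_rank[g] ** 2 for g in names
    ) - 3 * (n_total + 1)
    h /= 1 - tie_term / (n_total ** 3 - n_total)

    # Dunn's test: z-score of each difference in mean ranks.
    pairs = list(combinations(names, 2))
    var_base = n_total * (n_total + 1) / 12 - tie_term / (12 * (n_total - 1))
    results = {}
    for a, b in pairs:
        se = (var_base * (1 / sizes[a] + 1 / sizes[b])) ** 0.5
        z = (mean_rank[a] - mean_rank[b]) / se
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        results[(a, b)] = (z, min(1.0, p * len(pairs)))
    return h, results
```

      Because every cell track enters the ranking as its own observation, the comparison reflects the spread of individual cells rather than a handful of group averages.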

      Regarding the four-star significance observed in the original Fig. 5B, the previous result reflected an analysis based on pooled group averages, which may have overstated marginal differences. The revised analysis, based on individual cell tracks, does not support a substantial difference between FN and FN+sEV groups. The revised p-values and comparisons are now provided directly on the figures and described in the figure legends. We also clearly report the numbers of biological replicates, technical replicates, and individual data points for every condition.

      Further, the modest effect of ECM-associated sEVs on speed is consistent with our observation that sEVs influence invasion directionality rather than baseline migration velocity, in agreement with previous findings in tumor models (Sung et al., 2015).

      The manuscript has been revised accordingly, with updates in:

      (1) Figures 5A and 5B: Individual cell track data are now shown, and statistical analyses have been repeated using the Kruskal–Wallis test with Dunn’s multiple comparisons.

      (2) Figure legends and results sections: Numbers of biological and technical replicates, as well as individual data points, are now clearly stated.

      (3) Results, page 9, line 14: “FN as a cargo in sEVs promotes FA formation in tumour cells and increases cell speed [14, 15]. As we found that FN is loaded into VSMC-derived sEVs we hypothesized that ECM-entrapped sEVs can enhance cell migration by increasing cell adhesion and FA formation in the context of a FN-rich ECM. Therefore, we tested the effect of sEV deposition onto the FN matrix on VSMC migration in 2D and 3D models. We found that FN coating promoted VSMC velocity and inhibition of bulk sEV secretion with 3-OMS reduced VSMC speed in a 2D single-cell migration model (Figs. 5A, 5B) in agreement with previous studies using tumour cells [14, 15]. However, addition of sEVs to the ECM had no effect on VSMC speed at baseline but rescued cell speed and distance in the presence of the sEV secretion inhibitor, 3-OMS, suggesting the EVs are not primarily regulating cell speed (Figs 5A and 5B).”

      (4) Results, page 9, Ln 29 “Hence, ECM-associated sEVs have modest influence on VSMC speed but influence VSMC invasion directionality.”

      We hope that these changes address the reviewer’s concerns and improve the transparency and reproducibility of our data presentation.

      • Fig 5D-H. Generally, the magnitude of the difference between the presented conditions is biologically insignificant. Several of the graphs show a four-star difference with means that appear equivalent with overlapping error bars. Do the authors conclude that a 0.1%, or less, effect between groups is biologically meaningful?

      We thank the reviewer for drawing attention to the apparent mismatch between statistical significance and biological relevance in Figures 5d–h. In response, we have reanalyzed the data using individual cell tracks and more stringent non-parametric statistical tests, as described above. This reanalysis confirmed that the magnitude of differences in migration speed and related parameters between the groups is minimal and not biologically meaningful. Thus, we no longer claim that sEVs significantly affect VSMC migration speed under these conditions in either 2D or 3D assays. Our revised manuscript now accurately reflects this finding in both the Results and Discussion sections, and the updated figures and legends clarify the true extent of any differences observed.

      Figure 6

      • Generally, the authors' logic for looking into adhesion, focal adhesion and traction forces is hard to follow. If there are sEV-mediated migration differences, then there would inexorably be focal adhesion alterations. However, the data indicates few differences brought on by sEVs, which speaks to the lack of migration differences presented in Fig. 5. Overall, the sEV migration phenotype has so little of an effect that searching for a mechanism seems destined not to turn up anything significant.

      We thank the reviewer for highlighting the importance of connecting the observed phenotypic effects of sEVs to the investigation of adhesion and focal adhesion mechanisms. While our revised analysis confirms that sEVs have little to no effect on VSMC migration speed or distance in 2D and 3D models, we did observe a robust effect of sEVs on the directionality of cell invasion (Figs. 5G and 5H). This prompted us to look more closely at pathways involved in cell guidance rather than bulk cell motility.

      Our proteomic comparison between larger EVs (10K fraction) and sEVs (100K fraction) revealed a unique adhesion complex present specifically on the sEVs—comprising collagen VI, TGFBI, LGALS3BP, and EDIL3 (Figs. 7A–C)—each of which has previously been implicated in integrin signaling, cell adhesion, or invasion. Functional blocking and knockdown studies further identified collagen VI as a key mediator in the regulation of cell adhesion and invasion directionality influenced by sEVs (Figs. 7F and 7I).

      In response to this mechanistic insight, we have modified the graphical abstract and discussion to clarify our approach:

      We now explicitly state that our focus has shifted from analyzing baseline migration speed to mechanisms guiding invasion directionality, in line with our key phenotypic findings. We highlight that the unique adhesion cluster identified on sEVs, including collagen VI and its cooperative partners, provides a strong mechanistic rationale for examining focal adhesion dynamics and ECM interactions, even in the absence of changes in migration velocity. Discussion excerpts (pages 13–14) have been updated to reflect this rationale and to summarize the potential significance of these findings for vascular biology and disease.

      We hope this clarifies the logic underlying our approach and justifies the mechanistic studies performed in this context:

      (1) Discussion, page 13, Ln 2  “Hence, it will be interesting in future studies to investigate whether sEVs can stimulate Rho activity by presenting adhesion modulators—particularly collagen VI—on their surface, thereby guiding cell directionality during invasion.”

      (2) Discussion, page 13, Ln 30 “In addition to collagen VI the unique adhesion cluster in VSMC-derived sEVs also includes EGF-like repeat and discoidin I-like domain-containing protein (EDIL3), transforming growth factor-beta-induced protein ig-h3 (TGFBI) and the lectin galactoside-binding soluble 3 binding protein (LGALS3BP) and these proteins are also directly implicated in activation of integrin signalling and cellular invasiveness [85-87]. Although we found that collagen VI plays the key role in sEV-induced early formation of FAs in VSMCs, it is tempting to speculate that the high sEV efficacy in stimulating FA formation is driven by cooperative action of this unique adhesion complex on the sEVs surface and targeting this novel sEV-dependent mechanism of VSMC invasion may open-up new therapeutic opportunities to modulate atherosclerotic plaque development or even to prevent undesired VSMC motility in restenosis.”

      (3) Discussion, page 14, Ln 14 “In summary, cooperative activation of integrin signalling and F-actin cytoskeleton pathways results in the secretion of sEVs which associate with the ECM and play a signalling role by controlling FA formation and cell-ECM crosstalk. Further studies are needed to test these mechanisms across various cell types and ECM matrices.”

      Figure 7

      • The authors need to provide additional evidence Col IV is harbored in sEVs and not a contaminant of sEV isolation as VSMCs secrete a copious amount of this in culture. For instance, IHC of isolated sEVs stained for CD63 and Col IV as well as single cell staining of the same sort.

      We thank the reviewer for this important comment regarding the specificity of collagen VI detection in sEVs. To ensure that collagen VI is associated with bona fide sEVs—rather than being a contaminant resulting from high extracellular abundance—we performed a comparative analysis of vesicles isolated from the same conditioned media. Both proteomic mass spectrometry and western blotting revealed that collagen VI was exclusively present in the small EV (100K pellet) fraction and not in the larger EVs (10K pellet), as shown in Figs. 7B and 7C. Collagen VI was further identified in sEVs extracted from the ECM using our salt/guanidine protocol (new Fig. 7D).

      Reviewer #2 (Recommendations For The Authors):

      The authors have presented a nice collection of data with strong approaches to address their hypotheses. Nevertheless, an additional section within the Discussion would be welcome in addressing the potential limitations and important caveats to be considered alongside their study. These caveats and limitations could be reshaped by additional data supporting the ideas that: (1) small extracellular vesicles can be directly observed during their secretion from filopodia, (2) CD81 labeling in tissue can be interpreted clearly as extracellular vesicles and not the cell surface of other cell types (co-staining with an endothelial cell marker such as PECAM-1 perhaps), and (3) collagen VI within the vesicles is somehow accessed by adhesion molecules on the cell surface of migrating cells.

      We thank the reviewer for these important suggestions and we have now added further studies and modified our conclusions to reflect the data more accurately:

      (1) Results, page 6, Ln37 “We also attempted to visualise sEV release in filopodia using CD63-pHluorin where fluorescence is only observed upon the fusion of MVBs with the plasma membrane [39]. Using total internal reflection fluorescence microscopy (TIRF) we observed the typical “burst”-like appearance of sEV secretion at the cell-ECM interface in full agreement with an earlier report showing MVB recruitment to invadopodia-like structures in tumor cells [18] (Fig S2B and Supplementary Video S1). Although we also observed an intense CD63-pHluorin staining along filopodia-like structures we were not able to detect typical “burst”-like events to confirm sEV secretion in filopodia (Fig S2C and Supplementary Video S1).”

      (2) Discussion, page 12, Ln18: “Here we report that β1 integrin activation triggers sEV release followed by sEV entrapment by the ECM. Curiously we observed CD63+ MVB transport toward the filopodia tips as well as inhibition of sEV-secretion with filopodia formation inhibitors suggesting that sEV secretion can be directly linked to filopodia but further studies are needed to define the contribution of this pathway to the overall sEV secretion by cells.”

      We quantified the colocalization of CD81 and CD31 to exclude the endothelial cell origin of sEVs, extended the characterisation of the atherosclerotic matrix, and highlighted the limitations to interpretation, i.e. regarding CD81 ECM localisation:

      (1) Results, page 8, Ln 43 “An enhanced expression of CD81 by endothelial cells in early atheroma has been previously reported, so to study the contribution of CD81+ sEVs derived from endothelial cells we investigated the localisation of CD31 and CD81 [45]. In agreement with a previous study, we found that the majority of CD31 colocalises with CD81 (Thresholded Manders' split colocalization coefficient 0.54±0.11, N=6) indicating that endothelial cells express CD81 (Fig 4G) [45]. However, only a minor fraction of total CD81 colocalised with CD31 (Thresholded Manders' split colocalization coefficient 0.24±0.06, N=6) confirming that the majority of CD81 in the neointima is originating from the most abundant VSMCs.”

      (2) Results, page 8, Ln 28: “To test if FN associates with sEV markers in atherosclerosis, we investigated the spatial association of FN with sEV markers using the sEV-specific marker CD81. Staining of atherosclerotic plaques with haematoxylin and eosin revealed well-defined regions with the neointima as well as tunica media layers formed by phenotypically transitioned or contractile VSMCs, respectively (Fig S4A). Masson's trichrome staining of atherosclerotic plaques showed abundant haemorrhages in the neointima, and sporadic haemorrhages in the tunica media (Fig S4B). Staining of atherosclerotic plaques with orcein indicated weak connective tissue staining in the atheroma with a confluent extracellular lipid core, and strong specific staining at the tunica media containing elastic fibres which correlated well with the intact elastin fibrils in the tunica media (Figs S4C and S4D). Using this clear morphological demarcation, we found that FN accumulated both in the neointima and the tunica media where it was significantly colocalised with the sEV marker, CD81 (Fig. 4D, 4E and 4F). Notably CD81 and FN colocalization was particularly prominent in cell-free, matrix-rich plaque regions (Figs. 4E and 4F).”
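
      The two split coefficients quoted above differ because each one is normalised to a different channel. A minimal sketch of that asymmetry is given below. It assumes one common definition of the thresholded Manders coefficients (the above-threshold intensity of one channel found in pixels where the other channel is also above its threshold, divided by that channel's total above-threshold intensity); the function name, the flat pixel lists, and the fixed thresholds are hypothetical, and this is not the software used in the study.

```python
def thresholded_manders(ch_a, ch_b, thr_a, thr_b):
    """Return (tM1, tM2) for two equally sized flat lists of pixel intensities.

    tM1: fraction of above-threshold channel A signal that sits in pixels
         where channel B is also above its threshold.
    tM2: the same fraction computed for channel B relative to channel A.
    """
    if len(ch_a) != len(ch_b):
        raise ValueError("channels must contain the same number of pixels")

    a_total = sum(a for a in ch_a if a > thr_a)
    b_total = sum(b for b in ch_b if b > thr_b)

    # Pixels where both channels exceed their own thresholds.
    a_coloc = sum(a for a, b in zip(ch_a, ch_b) if a > thr_a and b > thr_b)
    b_coloc = sum(b for a, b in zip(ch_a, ch_b) if a > thr_a and b > thr_b)

    tm1 = a_coloc / a_total if a_total else float("nan")
    tm2 = b_coloc / b_total if b_total else float("nan")
    return tm1, tm2


# Toy image: the sparse channel (think CD31) mostly overlaps the abundant
# one (think CD81), but most of the abundant signal lies elsewhere.
sparse = [10, 0, 8, 0, 0, 0]
abundant = [9, 7, 0, 6, 5, 0]
tm_sparse, tm_abundant = thresholded_manders(sparse, abundant, 1, 1)
```

      In this toy case a larger share of the sparse channel overlaps the abundant channel than the reverse, which is the same pattern as the reported 0.54 versus 0.24.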

      We showed that collagen VI is presented on the surface of sEVs:

      (1) Results, page 10, Ln43: “Collagen VI was the most abundant protein in VSMC-derived sEVs (Fig 7B, Table S7) and was previously implicated in the interaction with the proteoglycan NG2 [53] and suppression of cell spreading on FN [54]. To confirm the presence of collagen VI in ECM-associated sEVs we analysed sEVs extracted from the 3D matrix using 0.5M NaCl treatment and showed that both collagen VI and FN are present (Fig 7D). Next, we analysed the distribution of collagen VI using dot-blot. Alix staining was bright only upon permeabilization of sEV indicating that it is preferentially a luminal protein (Fig 7E). On the contrary, CD63 staining was similar in both conditions showing that it is a surface protein (Fig 7E). Interestingly, collagen VI staining revealed that 40% of the protein is located on the outside surface with 60% in the sEV lumen (Fig 7E).”

    1. Reviewer #3 (Public review):

      Summary:

      The present manuscript investigates and proposes different mechanisms for the effects of two therapeutic approaches - cognitive distancing technique and use of antidepressants - on subjective ratings of happiness, confidence, and task engagement, and on the influence of such subjective experiences on choice behavior. Both approaches were found to link to changes in affective state dynamics in a choice task, specifically reduced drift (cognitive distancing) and increased baseline (antidepressant use). Results also suggest that cognitive distancing may reduce the weighting of recent expected values in the happiness model, while antidepressant use may reduce forgetting of choices and outcomes.

      Strengths:

      This is a timely topic and a significant contribution to ongoing efforts to improve our mechanistic understanding of psychopathology and devise effective novel interventions. The relevance of the manuscript's central question is clear, and the links to previous literature and the broader field of computational psychiatry are well established. The modelling approaches are thoughtful and rigorously tested, with appropriate model checks and persuasive evidence that modelling complements the theoretical argument and empirical findings.

      Weaknesses:

      Some vagueness and lack of clarity in theoretical mechanisms and interpretation of results leave outstanding questions regarding (a) the specific links drawn between affective biases, therapies aimed at mitigating them, and mental health function, and (b) the structure and assumptions of the modelling, and how they support the manuscript's central claims. Broadly, I do not fully understand the distinction between how choice behavior vs. affect are impacted separately or together by cognitive distancing. Clarification on this point is needed, possibly through a more explicit proposal of a mechanism (or several alternative mechanisms?) in the introduction and more explicit interpretation of the modelling results in the context of the cyclical choice-affect mechanism.

      (1) Theoretical framework and proposed mechanisms

      The link between affective biases and negative thinking patterns is a bit unclear. The authors seem to make a causal claim that "affective biases are precipitated and maintained by negative thinking patterns", but it is unclear what precisely these negative patterns are; earlier in the same paragraph, they state that affective biases "cause low mood" and possibly shift choices toward those that maintain low mood. So the directionality of the mechanism here is unclear - possibly explaining a bit more of the cyclic nature of this mechanism, and maybe clarifying what "negative thinking patterns" refer to will be helpful.

      More generally, this link between affect and choices, especially given the modelling results later on, should be clarified further. What is the mechanism by which these two impact each other? How do the models of choice and affect ratings in the RL task test this mechanism? I'm not quite sure the paper answers these questions clearly right now.

      The authors also seem to implicitly make the claim that symptoms of mental ill-health are at least in part related to choice behavior. I find this a persuasive claim generally; however, it is understated and undersupported in the introduction, to the point where a reader may need to rely on significant prior knowledge to understand why mitigating the impact of affective biases on choice behavior would make sense as the target of therapeutic interventions. This is a core tenet of the paper, and it would be beneficial to clarify this earlier on.

      It would be helpful to interpret a bit more clearly the findings from 3.4. on decreased drift in all three subjective assessments in the cognitive distancing group. What is the proposed mechanism for this? The discussion mentions that "attenuated declines [...] over time, [add] to our previously reported findings that this psychotherapeutic technique alters aspects of reward learning" - but this is vague and I do not understand, if an explanation for how this happens is offered, what that explanation is. Given the strong correlation of the drift with fatigue, is the explanation that cognitive distancing mitigates affect drift under fatigue? Or is this merely reporting the result without an interpretation around potential mechanisms?

      (Relatedly, aside from possibly explaining the drift parameter, do the fatigue ratings link with choice behavior in any way? Is it possible that the cognitive distancing was helping participants improve choices under fatigue?)

      (2) Task Structure and Modelling

      It is unclear what counted as a "rewarding" vs. "unrewarding" trial in the model. From my understanding of the task description, participants obtained positive or no reward (no losses), and verbal feedback, Correct/Incorrect. But given the probabilistic nature of the task, it follows that even some correct choices likely had unrewarding results. Was the verbal feedback still "Correct" in those cases, but with no points shown? I did not see any discussion on whether it is the #points earned or the verbal feedback that is considered a reward in the model. I am assuming the former, but based on previous literature, likely both play a role; so it would be interesting - and possibly necessary to strengthen the paper's argument - to see a model that assigns value to positive/negative feedback and earned points separately.

      From a theory perspective, it's interesting that the authors chose to assume separate learning rates for rewarding and non-rewarding trials. Why not, for example, separate reward sensitivity parameters? E.g., rather than a scaling parameter on the PE, a parameter modifying the r term inside the PE equation to, perhaps, assign different values to positive and zero points? (While I think overall the math works out similarly at the fitting time, this type of model should be less flexible on scaling the expected value and more flexible on scaling the actual #points / the subjective experience of the obtained verbal feedback, which seems more in line with the theoretical argument made in the introduction). The introduction explicitly states that negative biases "may cause low mood by making outcomes appear less rewarding" - which in modelling equations seems more likely to translate to different reward-perception biases, and not different learning rates. Alternatively, one might incorporate a perseveration parameter (e.g., similar to Collins et al. 2014) that would also accomplish a negative bias. Either of these two mechanisms seems perhaps worth testing out in a model - especially in a model that defines more clearly what rewarding vs. unrewarding may mean to the participant.
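
      To make the contrast in this comment concrete, the sketch below places the two update rules side by side. It is an illustration under stated assumptions, not the manuscript's model: the function and parameter names (`alpha_pos`, `alpha_neg`, `rho_pos`, `rho_zero`) are hypothetical, and a trial is treated as "rewarding" simply when the obtained points are greater than zero.

```python
def update_dual_learning_rate(q, reward, alpha_pos, alpha_neg):
    """Q-value update with separate learning rates by outcome type.

    The prediction error (PE) is computed on the objective reward and then
    scaled by a learning rate chosen according to whether points were won.
    """
    pe = reward - q
    alpha = alpha_pos if reward > 0 else alpha_neg
    return q + alpha * pe


def update_reward_sensitivity(q, reward, alpha, rho_pos, rho_zero):
    """Q-value update with a single learning rate and subjective outcomes.

    The outcome itself is transformed before the PE is computed: positive
    points are scaled by rho_pos, and a zero-point outcome is assigned the
    subjective value rho_zero (which may be negative).
    """
    subjective = rho_pos * reward if reward > 0 else rho_zero
    pe = subjective - q
    return q + alpha * pe


# Same starting value and outcomes under both rules (illustrative numbers).
q_start = 0.5
dual_win = update_dual_learning_rate(q_start, 1, alpha_pos=0.4, alpha_neg=0.1)
dual_miss = update_dual_learning_rate(q_start, 0, alpha_pos=0.4, alpha_neg=0.1)
sens_win = update_reward_sensitivity(q_start, 1, 0.2, rho_pos=0.8, rho_zero=-0.2)
sens_miss = update_reward_sensitivity(q_start, 0, 0.2, rho_pos=0.8, rho_zero=-0.2)
```

      In the first rule the bias acts on how quickly values change, while in the second it acts on what the outcome is perceived to be worth, so the value the learner converges to can differ even when the trial-by-trial updates look similar.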

      If I understand correctly, the affect ratings models assume that the Q-value and the PE independently impact rating (so they have different weights, w2 and w3), but there is no parameter allowing for different impact for perceived rewarding and unrewarding outcomes? (I may be misreading equations 4-5, but if not, Q-value and PE impact the model via static rather than dynamic parameters.) Given the joint RL-affect fit, this seems to carry the assumption that any perceptual processing differences leading to different subjective perceptions of reward associated with each outcome only impact choice behavior, but not affect? (whereas affect is more broadly impacted, if I'm understanding this correctly, just by the magnitude of the values and PEs?) This is an interesting assumption, and the authors seem to have tested it a bit more in the Supplementary material, as shown in Figure S4. I'm wondering why this was excluded from the main text - it seems like the more flexible model found some potentially interesting differences which may be worth including, especially as they might shed additional insight into the influence of cognitive distancing on the cyclical choice-affect mechanisms proposed.

      Minor comments:

      If fatigue ratings were strongly associated with drift in the best-fitting model (as per page 13), I wonder if it would make sense to use those fatigue ratings as a proxy rather than allow the parameter to vary freely? (This does not in any way detract from the winning model's explanatory power, but if a parameter seems to be strongly explained by a variable we have empirical data for, it's not clear what extra benefit is earned by having that parameter in the model).

    1. and that the Lord may behold us as a People offering Praise and thereby glorifying Him

      They want to receive praise from God for offering a day of peace where pilgrims and natives can feast together. I interpret this to mean that they view God as kind and forgiving and therefore think he will favor them if they behave similarly, even though they view the natives as heathens. This helps me understand how they interacted with the Native Americans, how they thought about God at the time, and what interpretations they used. It also shows how Thanksgiving has changed over time, from how it originally started to what we now celebrate.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Ferreiro et al. present a method to simulate protein sequence evolution under a birth-death model where sequence evolution is guided by structural constraints on protein stability. The authors then use this model to explore the predictability of sequence evolution in several viral proteins. In principle, this work is of great interest to molecular evolution and phylodynamics, which has struggled to couple non-neutral models of sequence evolution to phylodynamic models like birth-death processes. Unfortunately, though, the model shows little improvement over neutral models in predicting protein sequence evolution, although it can predict protein stability better than models assuming neutral evolution. It appears that more work is needed to determine exactly what aspects of protein sequence evolution are predictable under such non-neutral phylogenetic models. 

      We thank the reviewer for the positive comments about our work. We agree that further work is needed in the field of substitution models of molecular evolution to enable more accurate predictions of specific amino acid sequences in evolutionary processes.

      Major concerns: 

      (1) The authors have clarified the mapping between birth-death model parameters and fitness, but how fitness is modeled still appears somewhat problematic. The authors assume the death rate = 1 - birth rate. So a variant with a birth rate b = 1 would have a death rate d = 0 and so would be immortal and never die, which does not seem plausible. Also I'm not sure that this would "allow a constant global (birth-death) rate" as stated in line 172, as selection would still act to increase the population mean growth rate r = b - d. It seems more reasonable to assume that protein stability affects only either the birth or death rate and assume the other rate is constant, as in the Neher 2014 model. 

      The model proposed by Neher, et al. (2014), which incorporates a death rate (d) higher than 0 for any variant, was implemented and applied in the present method. In general, this model did not yield results different from those obtained using the model that assumes d = 1 – b, suggesting that this aspect may not be crucial for the study system. Next, the imposition of arbitrary death events based on an arbitrary death rate could be a point of concern. Regarding the original model, a variant with d = 0 can experience a decrease in fitness through the mutation process. In an evolutionary process, each variant is subject to mutation, and Markov models allow for the incorporation of mutations that decrease fitness (albeit with lower probability than beneficial ones, but they can still occur). All this information is included in the manuscript.

      (2) It is difficult to evaluate the predictive performance of protein sequence evolution. This is in part due to the fact that performance is compared in terms of percent divergence, which is difficult to compare across viral proteins and datasets. Some protein sequences would be expected to diverge more because they are evolving over longer time scales, under higher substitution rates or under weaker purifying selection. It might therefore help to normalize the divergence between predicted and observed sequences by the expected or empirically observed amount of divergence seen over the timescale of prediction. 

      The study protein datasets showed different levels of sequence divergence over their evolutionary times, as indicated for each dataset in the manuscript. For some metrics, we evaluated the accuracy (or error) of the predictions through direct comparisons between real and predicted protein variants using percentages to facilitate interpretation: 0% indicates a perfect prediction (no error), while 100% indicates a completely incorrect prediction (total error). Regarding normalization of these evaluations, we respectfully disagree with the suggestion because diverse factors can affect divergence (not only the substitution rate, but also the sample size and structural features of the protein that may affect stability when accommodating different sequences, among others), and this complicates defining a consistent and meaningful normalization criterion. Given that the manuscript provides detailed information for each dataset, we believe that presenting the prediction accuracy through direct comparisons between real and predicted protein variants, expressed as percentages of similarity, is the clearest way.
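      For readers following this exchange, the two quantities under discussion can be sketched as follows. This is a hedged illustration, not the authors' implementation: `percent_divergence` is the plain percentage of mismatched sites between two aligned sequences, and `normalized_divergence` is one hypothetical reading of the reviewer's suggestion, dividing prediction error by the divergence actually observed between the starting and the later real sequence.

```python
def percent_divergence(seq_a, seq_b):
    """Percentage of aligned sites that differ (0 = identical, 100 = all differ)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length, and non-empty")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * mismatches / len(seq_a)


def normalized_divergence(predicted, observed, ancestor):
    """Prediction error relative to the divergence observed over the same interval.

    Hypothetical normalization: values below 1 mean the prediction is closer
    to the observed sequence than the starting (ancestor) sequence is.
    """
    baseline = percent_divergence(ancestor, observed)
    if baseline == 0.0:
        raise ValueError("ancestor and observed sequences are identical")
    return percent_divergence(predicted, observed) / baseline


if __name__ == "__main__":
    ancestor = "ACDEFGHIKL"   # toy sequences, for illustration only
    observed = "ACDEYGHIKV"
    predicted = "ACDEFGHIKV"
    print(percent_divergence(predicted, observed))
    print(normalized_divergence(predicted, observed, ancestor))
```

      The normalized value depends on the choice of baseline, which is the difficulty the authors raise. The raw percentage avoids that choice but is harder to compare across datasets with different amounts of divergence.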

      (3) Predictability may also vary significantly across different sites in a protein. For example, mutations at many sites may have little impact on structural stability (in which case we would expect poor predictive performance) while even conservative changes at other sites may disrupt folding. I therefore feel that there remains much work to be done here in terms of figuring out where and when sequence evolution might be predictable under these types of models, and when sequence evolution might just be fundamentally unpredictable due to the high entropy of sequence space. 

      We agree with this reflection. Mutations can have different effects on folding stability, which are accounted for by the model presented in this study. However, accurately predicting the exact sequences of protein variants with similar stability remains difficult with current structurally constrained substitution models, and therefore, further work is needed in this regard. This aspect is indicated in the manuscript.

      We want to thank the reviewer again for taking the time to revise our work and for the insightful and helpful comments.

      Reviewer #2 (Public review): 

      In this study, the authors aim to forecast the evolution of viral proteins by simulating sequence changes under a constraint of folding stability. The central idea is that proteins must retain a certain level of structural stability (quantified by folding free energy, ΔG) to remain functional, and that this constraint can shape and restrict the space of viable evolutionary trajectories. The authors integrate a birth-death population model with a structurally constrained substitution (SCS) model and apply this simulation framework to several viral proteins from HIV-1, SARS-CoV-2, and influenza.

      The motivation to incorporate biophysical constraints into evolutionary models is scientifically sound, and the general approach aligns with a growing interest in bridging molecular evolution and structural biology. The authors focus on proteins where immune pressure is limited and stability is likely to be a dominant constraint, which is conceptually appropriate. The method generates sequence variants that preserve folding stability, suggesting that stability-based filtering may capture certain evolutionary patterns. 

      Correct. We thank the reviewer for the positive comments about our study.

      However, the study does not substantiate its central claim of forecasting. The model does not predict future sequences with measurable accuracy, nor does it reproduce observed evolutionary paths. Validation is limited to endpoint comparisons in a few datasets. While KL divergence is used to compare amino acid distributions, this analysis is only applied to a single protein (HIV-1 MA), and there is no assessment of mutation-level predictive accuracy or quantification of how well simulated sequences recapitulate real evolutionary paths. No comparison is made to real intermediate variants available from extensive viral sequencing datasets which gather thousands of sequences with detailed collection date annotation (SARS-CoV-2, Influenza, RSV). 

      There are several points in this comment.

      The presented method accurately predicts folding stability of forecasted variants, as shown through comparisons between real and predicted protein variants. However, as the reviewer correctly indicates, predicting the exact amino acid sequences remains challenging. This limitation is discussed in detail in the manuscript, where we also suggest that further improvements in substitution models of protein evolution are needed to better capture the evolutionary signatures of amino acid change at the sequence level, even between amino acids with similar physicochemical properties. Regarding the time points used for validation, the studied influenza NS1 dataset included two validation points. A key limitation in increasing the number of time points is the scarcity of datasets derived from monitoring protein evolution with sufficient molecular diversity between samples collected at consecutive time points (i.e., at least more than five polymorphic amino acid sites). 

      As described in the manuscript, calculating Kullback-Leibler (KL) divergence requires more than one sequence per studied time point. However, most datasets in the literature include only a single sequence per time point, typically a consensus sequence derived from bulk population sequencing. Generating multiple sequences per time point is experimentally more demanding, often requiring advanced methods such as single-virus sequencing or amplification of sublineages in viral subpopulations, as was done for the first dataset used in the study (Arenas, et al. 2016), which enabled the calculation of KL divergence. The extent to which the simulated sequences resemble real evolution is evaluated in the method validation. As noted, intermediate time point validation was performed using the influenza NS1 protein dataset. Although, as the reviewer indicates, thousands of viral sequences are available, these are usually consensus sequences from bulk sequencing. Indeed, many viral variants mainly differ through synonymous mutations, where the number of accumulated nonsynonymous mutations is small. For example, from the original Wuhan strain to the Omicron variant, the SARS-CoV-2 proteins Mpro and PLpro accumulated only 10 and 22 amino acid changes, respectively.
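      The requirement for several sequences per time point follows from how KL divergence is defined over per-site amino acid frequencies. A minimal sketch is given here under stated assumptions: aligned, equal-length sequences; a uniform pseudocount (an assumption, used so that unobserved amino acids do not produce division by zero); and averaging over sites. The function names are hypothetical, and this is not necessarily the procedure used in the manuscript.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(sequences, site, pseudocount=1.0):
    """Smoothed amino acid frequencies at one alignment column."""
    counts = Counter(seq[site] for seq in sequences if seq[site] in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}


def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the 20 amino acids."""
    return sum(p[aa] * math.log(p[aa] / q[aa]) for aa in AMINO_ACIDS)


def mean_site_kl(observed, predicted, pseudocount=1.0):
    """Average per-site KL divergence between observed and predicted samples."""
    n_sites = len(observed[0])
    total = 0.0
    for site in range(n_sites):
        p = site_frequencies(observed, site, pseudocount)
        q = site_frequencies(predicted, site, pseudocount)
        total += kl_divergence(p, q)
    return total / n_sites


if __name__ == "__main__":
    observed = ["ACD", "ACE", "ACD"]    # toy sample at one time point
    predicted = ["ACD", "ACD", "ACD"]   # toy simulated sample
    print(mean_site_kl(observed, predicted))
```

      With a single consensus sequence per time point, each column holds one observation, so the estimated frequencies are dominated by the pseudocount and say nothing about within-population variation. This is consistent with the authors' point that such data do not support this metric.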

      Analyzing intermediate variants of concern (e.g., Gamma or Delta) would reduce this number, affecting the statistics. In addition, many available viral sequences are not consecutive in evolutionary terms (one dataset does not represent the direct origin of another dataset at a subsequent time point), which further limits their applicability in this study. There is little data from monitored protein evolution with consecutive samples. The most suitable studies usually involve in vitro virus evolution, but the data from these studies often show low genetic variability between samples collected at different time points. Finally, it is important to note that the presented method can only be applied to proteins with known 3D structures, as it relies on selection based on folding stability. Non-structural proteins cannot be analyzed using this approach. Future work could incorporate additional selection constraints, which may improve the accuracy of predictions. These considerations and limitations are indicated in the manuscript.

      The selection of proteins is narrow and the rationale for including or excluding specific proteins is not clearly justified. 

      The viral proteins included in the study were selected based on two main criteria, general interest and data availability. In particular, we included proteins from viruses that affect humans and for which data from monitored protein evolution, with sufficient molecular diversity between consecutive time points, is available. These aspects are indicated in the manuscript.

      The analyzed datasets are also under-characterized: we are not given insight into how variable the sequences are or how surprising the simulated sequences might be relative to natural diversity. Furthermore, the use of consensus sequences to represent timepoints is problematic, particularly in the context of viral evolution, where divergent subclades often coexist - a consensus sequence may not accurately reflect the underlying population structure. 

      The manuscript indicates the sequence identity among protein datasets of different time points, along with other technical details. Next, the evaluation based on comparisons between simulated and real sequences reflects how surprising the simulated sequences might be relative to natural diversity, considering that the real dataset is representative. We believe that the diverse real datasets studied are useful to evaluate the accuracy of the method in predicting different molecular patterns. Regarding the use of consensus sequences, we agree that they provide an approximation. However, as previously indicated, most of the available data from monitored protein evolution consist of consensus sequences obtained through bulk sequencing. Additionally, analyzing every individual viral sequence within a viral population, which is typically large, would be ideal but computationally intractable.

      The fitness function used in the main simulations is based on absolute ΔG and rewards increased stability without testing whether real evolutionary trajectories tend to maintain, increase, or reduce folding stability over time for the particular systems (proteins) that are studied. While a variant of the model does attempt to center selection around empirical ΔG values, this more biologically plausible version is underutilized and not well validated.

      The applied fitness function, based on absolute ΔG, is well established in the field (Sella and Hirsh 2005; Goldstein 2013). The present study independently predicts ΔG for the real and simulated protein variants at each sampling point. This ΔG prediction accounts not only for negative design, informed by empirical data, but also for positive design based on the study data (Arenas, et al. 2013; Minning, et al. 2013), thereby enabling the detection of variation in folding stability among protein variants. These aspects are indicated in the manuscript. Therefore, in our view, the study provides a proper comparison of real and predicted evolutionary trajectories in terms of folding stability.

      Ultimately, the model constrains sequence evolution to stability-compatible trajectories but does not forecast which of these trajectories are likely to occur. It is better understood as a filter of biophysically plausible outcomes than as a predictive tool. The distinction between constraint-based plausibility and sequence-level forecasting should be made clearer. Despite these limitations, the work may be of interest to researchers developing simulation frameworks or exploring the role of protein stability in viral evolution, and it raises interesting questions about how biophysical constraints shape sequence space over time. 

      The presented method estimates the fitness of each protein variant, which can reflect the relative survival capacity of the variant. Therefore, despite the error due to evolutionary constraints not considered by the method, it indicates which variants are more likely to become fixed over time. In our view, the method does not merely filter plausible variants, rather, it generates predictions of variant survival through predicted fitness based on folding stability and simulations of protein evolution under structurally constrained substitution models integrated with birth-death population genetics approaches. The use of simulation-based approaches for prediction is well established in population genetics. For example, approaches such as approximate Bayesian computation (Beaumont, et al. 2002) rely on this strategy, and it has also been applied in other studies of forecasting evolution (e.g., Neher, et al. 2014). We believe that the distinction between forecasting folding stability and amino acid sequence is clearly shown in the manuscript, including the main text and the figures.

      Reviewer #2 (Recommendations for the authors): 

      I thank the authors for addressing the question about template switching, their clarification was helpful. However, the core concerns I raised remain unresolved: the claim that the method is useful for forecasting is not substantiated.  In order to support the paper's central claims or to prove its usefulness, several key improvements could be incorporated: 

      (1) Systematic analysis of more proteins: 

      The manuscript would be significantly strengthened by a systematic evaluation of model performance across a broader set of viral proteins, beyond the examples currently shown. Many human influenza and SARS-CoV-2 proteins have well-characterized structures or high-quality homology templates, making them suitable candidates. In light of the limited success of the method, presenting the model's behavior across a more comprehensive protein set, including those with varying structural constraints and immune pressures, would help assess generalizability and clarify the specific conditions under which the model is applicable. 

      Following a comment from the reviewer in a previous revision of the study, we included the analysis of an influenza NS1 protein dataset that contains two evaluation time points. Next, to validate the prediction method, it is necessary to have monitored protein sequences collected at least at two consecutive time points, with sufficient divergence between them to capture evolutionary signatures that allow for proper evaluation. Additionally, many data involve sequences that are not consecutive in evolutionary terms (one dataset is not a direct ancestor of another dataset existing at a later time point), which precludes their use in this study. Little data from monitored protein evolution with reliable consecutive (ancestor-descendant) samples exist. The most suitable studies often involve in vitro virus evolution, but they usually show low genetic variability between samples collected at different time points. Although thousands of sequences are available for some viruses, they are usually consensus sequences from bulk sequencing and often show a low number of nonsynonymous mutations in the studied protein-coding gene between time points. For example, from the original Wuhan strain to the Omicron variant, the SARS-CoV-2 proteins Mpro and PLpro accumulated only 10 and 22 amino acid changes, respectively. Analyzing intermediate variants of concern (e.g., Gamma or Delta) would reduce this number, affecting the statistics. Thus, in practice, we found a scarcity of data derived from monitoring protein evolution, with reliable ancestor and corresponding descendant data at consecutive time points and with sufficient molecular diversity between them (i.e., more than five polymorphic amino acid sites). 
In all, we believe that the diverse viral protein datasets used in the present study, along with the multiple analyzed datasets collected from monitored HIV-1 populations present in different patients, provide a representative application of the method, since similar patterns were generally obtained from the analysis of the different datasets.

      (2) Present clear data statistics: For each analyzed dataset, the authors should provide basic information about the number of unique sequences, levels of variability, and evolutionary divergence between start and end sequences. This would contextualize the forecasting task and clarify whether the simulations are non-trivial. In particular, it should be shown that the consensus sequence is indeed representative of the viral population at a given time point. In viral evolution we frequently observe co-circulation of subclades and the consensus sequence is then not representative. 

      For each dataset analyzed, the manuscript provides the sequence identity between samples at the study time points (which also informs about sequence variability), sample sizes, representative protein structure, and other technical details. The study assumes that consensus sequences, typically generated by bulk sequencing, are representative of the viral population. Next, samples at different time points should involve ancestor-descendant relationships, which is a requirement and one of the limitations to find appropriate data for this study, as noted in our previous response.

      (3) Explore other metrics for population level sequence comparison: 

      In light of the possible existence of subclades, mentioned above, the currently used metrics for sequence comparison may underestimate the performance of the simulations. It would be sufficient to see some overlap between the simulated clades and the observed clades. 

      We found this to be a good idea. However, in practice, we believe that the criteria used to define subclades could introduce biases into the results. For some metrics, we evaluated the accuracy of the predictions through direct comparisons between all real and predicted protein variants, using percentages to facilitate interpretation. We believe that using subclades could potentially reduce the current prediction errors, but this would complicate the interpretation of the results, as they would be influenced by the subjective criteria used to define the subclades.

      Currently, the manuscript presents a plausible filtering framework rather than a predictive model. Without these additional analyses, the main claims remain only partially supported. 

      Please see our reply to the comment of the reviewer just before the section titled “Recommendations for the authors”.

      Response to some rebuttal statements: 

      (1) "Sequence comparisons based on the KL divergence require, at the studied time point, an observed distribution of amino acid frequencies among sites and an estimated distribution of amino acid frequencies among sites. In the study datasets, this is only the case for the HIV-1 MA dataset, which belongs to a previous study from one of us and collaborators where we obtained at least 20 independent sequences at each sampling point (Arenas, et al. 2016)" 

      The available influenza and SARS-CoV-2 data gather isolates annotated with exact collection dates, providing rich datasets for such analysis. 

      The available influenza and SARS-CoV-2 sequences are typically derived from bulk sequencing and, therefore, they are consensus sequences. As a result, they cannot be used to calculate KL divergence. Additionally, many of the indicated sequences from databases are not demonstrated to be consecutive in evolutionary terms (one dataset is not a direct ancestor of another dataset existing at a later time point), which precludes their use in this study. The most suitable studies often involve in vitro virus evolution, but they usually show low genetic variability between samples collected at different time points.

      (2) "Regarding extending the analysis to other time points (other variants of concern), we kindly disagree because Omicron is the variant of concern with the highest genetic distance to the Wuhan variant, and a high genetic distance is  required to properly evaluate the prediction method." 

      There have been many more variants of concern subsequent to Omicron which circulated in 2021. 

      A key aspect is the accumulation of diversity in the study proteins across different time points. The SARS-CoV-2 proteins Mpro and PLpro accumulated only 10 and 22 amino acid changes from the original Wuhan variant to Omicron, respectively.

      Analyzing intermediate variants of concern (e.g., Gamma or Delta) or those closely related to Omicron would reduce the number of accumulated mutations even further.   

      We want to thank the reviewer again for taking the time to revise our work and for the insightful and helpful comments.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Ferreiro et al. present a method to simulate protein sequence evolution under a birth-death model where sequence evolution is constrained by structural constraints on protein stability. The authors then use this model to explore the predictability of sequence evolution in several viral structural proteins. In principle, this work is of great interest to molecular evolution and phylodynamics, which have struggled to couple non-neutral models of sequence evolution to phylodynamic models like birth-death. Unfortunately, though, the model shows little improvement over neutral models in predicting protein evolution, and this ultimately appears to be due to fundamental conceptual problems with how fitness is modeled and linked to the phylodynamic birth-death model. 

      We thank the reviewer for the positive comments about our work.

      Regarding predictive power, the study showed good accuracy in predicting the real folding stability of forecasted protein variants under a selection model, but not under a neutral model. Next, predicting the exact sequences was more challenging. In this revised version, where we added additional real data, we found that the accuracy of this prediction can vary among proteins (e.g., the SCS model was more accurate than the neutral model in predicting sequences of the influenza NS1 protein at different time points). Still, we consider that efforts are required in the field of substitution models of molecular evolution. For example, amino acids with similar physicochemical properties can result in predictions with appropriate folding stability but different specific sequences. The development of accurate substitution models of molecular evolution is an active area of research with ongoing progress, but further efforts are still needed. Next, forecasting the folding stability of future real proteins is fundamental for properly forecasting protein evolution, given the essential role of folding stability in protein function and its variety of applications. Regarding the conceptual concerns related to fitness modeling, we clarify them in detail in our responses to the specific comments below.

      Major concerns:

      (1) Fitness model: All lineages have the same growth rate r = b-d because the authors assume b+d=1. But under a birth-death model, the growth r is equivalent to fitness, so this is essentially assuming all lineages have the same absolute fitness since increases in reproductive fitness (b) will simply trade off with decreases in survival (d). Thus, even if the SCS model constrains sequence evolution, the birth-death model does not really allow for non-neutral evolution such that mutations can feed back and alter the structure of the phylogeny. 

      We thank the reviewer for this comment that aims to improve the realism of our model. In the model presented (but see later another model, derived from the proposal of the reviewer, that we have now implemented into the framework and applied to the study data), the fitness predicted from a protein variant is used to obtain the corresponding birth rate of that variant. In this way, protein variants with high fitness have high birth rates leading to overall more birth events, while protein variants with low fitness have low birth rates resulting in overall more extinction events, which has biological meaning for the study system. The statement “All lineages have the same growth rate r = b-d” is incorrect because, in our model, b and d can vary among lineages according to the fitness. For example, a lineage might have b=0.9, d=0.1, r=0.8, while another lineage could have b=0.6, d=0.4, r=0.2. Likewise, the statement “this is essentially assuming all lineages have the same absolute fitness” is incorrect. Clearly, assuming that all lineages have the same fitness would not make sense: in that situation the folding stability of the forecasted protein variants would be similar under any model, which is not the case as shown in the results. In our model, the fitness affects the reproductive success, where protein variants with a high fitness have higher birth rates leading to more birth events, while those with lower fitness have higher death rates leading to more extinction events. This parameterization is meaningful for protein evolution because the fitness of a protein variant can affect its survival (birth or extinction) without necessarily affecting its rate of evolution. While a faster growth rate can sometimes be associated with higher fitness, a variant with high fitness does not necessarily accumulate substitutions at a faster rate. 
Regarding the phylogenetic structure, the model presented considers variable birth and death events across different lineages according to the fitness of the corresponding protein variants, and this affects the derived phylogeny (i.e., protein variants selected against can go extinct while others with high fitness can produce descendants). We are not sure about the meaning of the term “mutations can feed back” in the context of our system. Note that we use Markov models of evolution, which are well established in the field (despite their limitations), and substitutions are fixed mutations, which still could be reverted later if selected by the substitution model (Yang 2006). Altogether, we find that the presented birth-death model is technically correct and appropriate for modeling our biological system. Its integration with structurally constrained substitution (SCS) models of protein evolution as Markov models follows general approaches of molecular evolution in population genetics (Yang 2006; Carvajal-Rodriguez 2010; Arenas 2012; Hoban, et al. 2012). We have now provided a more detailed description of the models in the manuscript.

      Apart from these clarifications about the birth-death model used, we could understand the point of the reviewer and following the suggestion we have now incorporated an additional birth-death model that accounts for variable global birth-death rate among lineages. Specifically, we followed the model proposed by Neher et al (2014), where the death rate is considered as 1 and the birth rate is modeled as 1 + fitness. In this model, the global birth-death rate can vary among lineages. We implemented this model into the computer framework and applied it to the data used for the evaluation of the models. The results indicated that, in general, this model yields similar predictive accuracy compared to the previous birth-death model. Thus, accounting for variability in the global birth-death rate does not appear to play a major role in the studied systems of protein evolution. We have now presented this additional birth-death model and its results in the manuscript.
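
      The two parameterizations discussed above can be sketched as follows. This is only an illustration under stated assumptions, not the code of the framework: the function names are hypothetical, and setting the birth rate equal to a fitness value scaled to [0, 1] is an assumption made to reproduce the numerical examples given in the response.

```python
def constant_total_rates(fitness):
    """Original parameterization: b follows fitness and d = 1 - b, so b + d = 1.

    Assumption: fitness is already scaled to [0, 1] and used directly as b.
    """
    if not 0.0 <= fitness <= 1.0:
        raise ValueError("fitness is assumed to be scaled to [0, 1]")
    b = fitness
    d = 1.0 - b
    return b, d, b - d  # birth rate, death rate, growth rate r = b - d


def variable_total_rates(fitness):
    """Neher et al. (2014)-style parameterization: d = 1 and b = 1 + fitness."""
    b = 1.0 + fitness
    d = 1.0
    return b, d, b - d  # here b + d = 2 + fitness varies among lineages


if __name__ == "__main__":
    for f in (0.9, 0.6):
        b, d, r = constant_total_rates(f)
        print(f"constant total: fitness={f} b={b:.1f} d={d:.1f} r={r:.1f} b+d={b + d:.1f}")
        b, d, r = variable_total_rates(f)
        print(f"variable total: fitness={f} b={b:.1f} d={d:.1f} r={r:.1f} b+d={b + d:.1f}")
```

      In the first function the growth rate r differs among lineages while b + d stays at 1; in the second both r and b + d change with fitness.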

      (2) Predictive performance: Similar performance in predicting amino acid frequencies is observed under both the SCS model and the neutral model. I suspect that this rather disappointing result owes to the fact that the absolute fitness of different viral variants could not actually change during the simulations (see comment #1). 

      As indicated in our previous answer, our study shows a good accuracy in predicting the real folding stability of forecasted protein variants under a selection model, but not under a neutral model. Next, predicting the exact sequences was more challenging, which was not surprising considering previous studies. In particular, inferring specific sequences is considerably challenging even for ancestral molecular reconstruction (Arenas, et al. 2017; Arenas and Bastolla 2020). Indeed, observed sequence diversity is much greater than observed structural diversity (Illergard, et al. 2009; Pascual-Garcia, et al. 2010), and substitutions between amino acids with similar physicochemical properties can yield modeled protein variants with more accurate folding stability, even when the exact amino acid sequences differ. As indicated, further work is demanded in the field of substitution models of molecular evolution. Next, in this revised version, where we included analyses of additional real datasets, we found that the accuracy of sequence prediction can vary among datasets. Notably, the analysis of an influenza NS1 protein dataset, with higher diversity than the other datasets studied, showed that the SCS model was more accurate than the neutral model in predicting sequences across different time points. Datasets with relatively high sequence diversity can contain more evolutionary information, which can improve prediction quality. In any case, as previously indicated, we believe that efforts are required in the field of substitution models of molecular evolution. Apart from that, forecasting the folding stability of future real proteins is an important advance in forecasting protein evolution, given the essential role of folding stability in protein function (Scheiblhofer, et al. 2017; Bloom and Neher 2023) and its variety of applications.

      Next, also as indicated in our previous response, the birth-death model used in this study accounts for variation in fitness among lineages producing variable reproductive success. The additional birth-death model that we have now incorporated, which considers variation of the global birth-death rate among lineages, produced similar prediction accuracy, suggesting a limited role in protein evolution modeling. Molecular evolution parameters, particularly the substitution model, appear to be more critical in this regard. We have now included these aspects in the manuscript.

      (3) Model assessment: It would be interesting to know how much the predictions were informed by the structurally constrained sequence evolution model versus the birth-death model. To explore this, the authors could consider three different models: 1) neutral, 2) SCS, and 3) SCS + BD. Simulations under the SCS model could be performed by simulating molecular evolution along just one hypothetical lineage. Seeing if the SCS + BD model improves over the SCS model alone would be another way of testing whether mutations could actually impact the evolutionary dynamics of lineages in the phylogeny. 

      In the present study, we compared the neutral model + birth-death (BD) with the SCS model + BD. Markov substitution models Q are applied over an evolutionary time (i.e., branch length, t), which allows one to determine the probability of substitution events during that time period [P(t) = exp (Qt)]. This approach is traditionally used in phylogenetics to model the incorporation of substitution events over time. Therefore, to compare the neutral and SCS models in terms of evolutionary inference, an evolutionary time is required, and in this case it is provided by the birth-death process. Thus, the cases 1) and 2) cannot be compared without an underlying evolutionary history. Next, comparisons in terms of likelihood, and other aspects, between models that ignore the protein structure and the implemented SCS models are already available in previous studies based on coalescent simulations or given phylogenetic trees (Arenas, et al. 2013; Arenas, et al. 2015). There, SCS models outperformed models that ignore evolutionary constraints from the protein structure, and those findings are consistent with the results obtained in the present study where we explored the application of these models to forecasting protein evolution. We would like to emphasize that forecasting the folding stability of future real proteins is a significant finding, because folding stability is fundamental to protein function and has a variety of applications. We have now indicated these aspects in the manuscript.
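
      As a brief illustration of why a branch length is needed, the relation P(t) = exp(Qt) can be sketched numerically. The two-state rate matrix below is a toy example and not one of the substitution models of the study; the function names and the scaling-and-squaring settings are assumptions chosen for the sketch.

```python
import math


def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(Q, t, squarings=10, terms=12):
    """Approximate P(t) = exp(Qt) by a Taylor series with scaling and squaring."""
    n = len(Q)
    scale = t / (2 ** squarings)
    M = [[q * scale for q in row] for row in Q]           # small matrix Qt / 2^s
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    P = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, M)]  # M^k / k!
        P = [[P[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):                             # undo the scaling
        P = mat_mul(P, P)
    return P


if __name__ == "__main__":
    # Toy two-state rate matrix (rows sum to zero); the rates are illustrative.
    Q = [[-1.0, 1.0],
         [2.0, -2.0]]
    for t in (0.1, 0.5, 2.0):
        P = transition_matrix(Q, t)
        print(t, [[round(p, 4) for p in row] for row in P])
```

      Without a value of t, only the rates in Q are defined and no substitution probabilities can be obtained, which is why the evolutionary history from the birth-death process is required.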

      (4) Background fitness effects: The model ignores background genetic variation in fitness. I think this is particularly important as the fitness effects of mutations in any one protein may be overshadowed by the fitness effects of mutations elsewhere in the genome. The model also ignores background changes in fitness due to the environment, but I acknowledge that might be beyond the scope of the current work. 

      This comment made us realize that more information about the features of the implemented SCS models should be included in the manuscript. In particular, the implemented SCS models consider a negative design based on the observed residue contacts in nearly all proteins available in the Protein Data Bank (Arenas, et al. 2013; Arenas, et al. 2015). This data is distributed with the framework, and it can be updated to incorporate new structures (further details are provided in the distributed framework documentation and practical examples). Therefore, the prediction of folding stability is a combination of positive design (direct analysis of the target protein) and negative design (consideration of background proteins from a database to improve the predictions), thus incorporating background molecular diversity. We have now indicated this important aspect in the manuscript. Regarding the fitness caused by the environment, we agree with the reviewer. This is a challenge for any method aiming to forecast evolution, as future environmental shifts are inherently unpredictable and may affect the accuracy of the predictions. Although one might attempt to incorporate such effects into the model, doing so risks overparameterization, especially when the additional factors are uncertain or speculative. We have now mentioned this aspect in the manuscript.

      (5) In contrast to the model explored here, recent work on multi-type birth-death processes has considered models where lineages have type-specific birth and/or death rates and therefore also type-specific growth rates and fitness (Stadler and Bonhoeffer, 2013; Kunhert et al., 2017; Barido-Sottani, 2023). Rasmussen & Stadler (eLife, 2019) even consider a multi-type birth-death model where the fitness effects of multiple mutations in a protein or viral genome collectively determine the overall fitness of a lineage. The key difference with this work presented here is that these models allow lineages to have different growth rates and fitness, so these models truly allow for non-neutral evolutionary dynamics. It would appear the authors might need to adopt a similar approach to successfully predict protein evolution. 

      We agree with the reviewer that robust birth-death models have been developed applying statistics and, in many cases, the primary aim of those studies is the development and refinement of the model itself. Regarding the study by Rasmussen and Stadler 2019, it incorporates an external evaluation of mutation events where the used fitness is specific for the proteins investigated in that study, which may pose challenges for users interested in analyzing other proteins. In contrast, our study takes a different approach. We implement a fitness function that can be predicted and evaluated for any type of structural protein (Goldstein 2013), making it broadly applicable. Indeed, in this revised version we added the analysis of additional data of another protein (influenza NS1 protein) with predictions at different time points. In addition, we provide a freely available and well-documented computational framework to facilitate its use. The primary aim of our study is not the development of novel or complex birth-death models. Rather, we aim to explore the integration of a standard birth-death model with SCS models for the purpose of predicting protein evolution. In the context of protein evolution, substitution models are a critical factor (Liberles, et al. 2012; Wilke 2012; Bordner and Mittelmann 2013; Echave, et al. 2016; Arenas, et al. 2017; Echave and Wilke 2017), and the presented combination with a birth-death model constitutes a first approximation upon which future studies can build to better understand this evolutionary system. We have now indicated these considerations in the manuscript.

      Reviewer #2 (Public review): 

      Summary: 

      In this study, "Forecasting protein evolution by integrating birth-death population models with structurally constrained substitution models", David Ferreiro and coauthors present a forward-in-time evolutionary simulation framework that integrates a birth-death population model with a fitness function based on protein folding stability. By incorporating structurally constrained substitution models and estimating fitness from ΔG values using homology-modeled structures, the authors aim to capture biophysically realistic evolutionary dynamics. The approach is implemented in a new version of their open-source software, ProteinEvolver2, and is applied to four viral proteins from HIV-1 and SARS-CoV-2. 

      Overall, the study presents a compelling rationale for using folding stability as a constraint in evolutionary simulations and offers a novel framework and software to explore such dynamics. While the results are promising, particularly for predicting biophysical properties, the current analysis provides only partial evidence for true evolutionary forecasting, especially at the sequence level. The work offers a meaningful conceptual advance and a useful simulation tool, and sets the stage for more extensive validation in future studies.

      We thank the reviewer for the positive comments on our study. Regarding the predictive power, the results showed good accuracy in predicting the folding stability of the forecasted protein variants. In this revised version, where we included analyses of additional real datasets, we found that the accuracy of sequence prediction can vary among datasets. Notably, the analysis of an influenza NS1 protein dataset, with higher diversity than the other datasets studied, showed that the SCS model was more accurate than the neutral model in predicting sequences across different time points. Datasets with relatively high sequence diversity can contain more evolutionary information, which can improve prediction quality. Still, we believe that further efforts are required in the field in improving the accuracy of substitution models of molecular evolution. Altogether, accurately forecasting the folding stability of future real proteins is fundamental for predicting their protein function and enabling a variety of applications. Also, we implemented the models into a freely available computer framework, with detailed documentation and a variety of practical examples.

      Strengths: 

      The results demonstrate that fitness constraints based on protein stability can prevent the emergence of unrealistic, destabilized variants - a limitation of traditional, neutral substitution models. In particular, the predicted folding stabilities of simulated protein variants closely match those observed in real variants, suggesting that the model captures relevant biophysical constraints. 

      We agree with the reviewer and appreciate the consideration that forecasting the folding stability of future real proteins is a relevant finding. For instance, folding stability is fundamental for protein function and affects several other molecular properties.

      Weaknesses: 

      The predictive scope of the method remains limited. While the model effectively preserves folding stability, its ability to forecast specific sequence content is not well supported. 

      Our study showed a good accuracy in predicting the real folding stability of forecasted protein variants under a selection model, but not under a neutral model. Predicting the exact sequences was more challenging, which was not surprising considering previous studies. In particular, inferring specific sequences is considerably challenging even for ancestral molecular reconstruction (Arenas, et al. 2017; Arenas and Bastolla 2020). Indeed, observed sequence diversity is much greater than observed structural diversity (Illergard, et al. 2009; Pascual-Garcia, et al. 2010), and substitutions between amino acids with similar physicochemical properties can yield modeled protein variants with more accurate folding stability, even when the exact amino acid sequences differ. As indicated, further work is demanded in the field of substitution models of molecular evolution. Next, in this revised version, where we included analyses of additional real datasets, we found that the accuracy of sequence prediction can vary among datasets. Notably, the analysis of an influenza NS1 protein dataset, with higher diversity than the other datasets studied, showed that the SCS model was more accurate than the neutral model in predicting sequences across different time points. Datasets with relatively high sequence diversity can contain more evolutionary information, which can improve prediction quality. In any case, as previously indicated, we believe that efforts are required in the field of substitution models of molecular evolution. Apart from that, forecasting the folding stability of future real proteins is an important advance in forecasting protein evolution, given the essential role of folding stability in protein function (Scheiblhofer, et al. 2017; Bloom and Neher 2023) and its variety of applications. We have now expanded these aspects in the manuscript.

      Only one dataset (HIV-1 MA) is evaluated for sequence-level divergence using KL divergence; this analysis is absent for the other proteins. The authors use a consensus Omicron sequence as a representative endpoint for SARS-CoV-2, which overlooks the rich longitudinal sequence data available from GISAID. The use of just one consensus from a single time point is not fully justified, given the extensive temporal and geographical sampling available. Extending the analysis to include multiple timepoints, particularly for SARS-CoV-2, would strengthen the predictive claims. Similarly, applying the model to other well-sampled viral proteins, such as those from influenza or RSV, would broaden its relevance and test its generalizability. 

      The evaluation of forecasting evolution using real datasets is complex due to several conceptual and practical aspects. In contrast to traditional phylogenetic reconstruction of past evolutionary events and ancestral sequences, forecasting evolution often begins with a variant that is evolved forward in time and requires a rough fitness landscape to select among possible future variants (Lässig, et al. 2017). Another concern for validating the method is the need to know the initial variant that gives rise to the corresponding future (forecasted) variants, and it is not always known. Thus, we investigated systems where the initial variant, or a close approximation, is known, such as scenarios of in vitro monitored evolution. In the case of SARS-CoV-2, the Wuhan variant is commonly used as the starting variant of the pandemic. Next, since forecasting evolution is highly dependent on the used model of evolution, unexpected external factors can be dramatic for the predictions. For this reason, systems with minimal external influences provide a more controlled context for evaluating forecasting evolution. For instance, scenarios of in vitro monitored virus evolution avoid some external factors such as host immune responses. Another important aspect is the availability of data at two (i.e., present and future) or more time points along the evolutionary trajectory, with sufficient genetic diversity between them to identify clear evolutionary signatures. Additionally, using consensus sequences can help mitigate effects from unfixed mutations, which should not be modeled by a substitution model of evolution. Altogether, not all datasets are appropriate to properly evaluate or apply forecasting evolution. These aspects are indicated in the manuscript. Sequence comparisons based on the KL divergence require, at the studied time point, an observed distribution of amino acid frequencies among sites and an estimated distribution of amino acid frequencies among sites. 
In the study datasets, this is only the case for the HIV-1 MA dataset, which belongs to a previous study from one of us and collaborators where we obtained at least 20 independent sequences at each sampling point (Arenas, et al. 2016). This aspect is now more clearly indicated in the manuscript. Regarding the Omicron datasets, we used 384 curated sequences of the Omicron variant of concern to construct the study data and we believe that it is a representative sample. The sequence used for the initial time point was the Wuhan variant (Wu, et al. 2020), which is commonly assumed to be the origin of the pandemic in SARS-CoV-2 studies. As previously indicated, the use of consensus sequences is convenient to avoid variants with unfixed mutations. Regarding extending the analysis to other time points (other variants of concern), we kindly disagree because Omicron is the variant of concern with the highest genetic distance to the Wuhan variant, and a high genetic distance is required to properly evaluate the prediction method. Actually, we noted that earlier variants of concern show a small number of fixed mutations in the study proteins, despite the availability of large numbers of sequences in databases such as GISAID. Additionally, we investigated the evolutionary trajectories of HIV-1 protease (PR) in 12 intra-host viral populations with predictions for up to four different time points. Apart from those aspects, following the proposal of the reviewer, we have now incorporated the analysis of an additional dataset of influenza NS1 protein (Bao, et al. 2008), with predictions for two different time points, to further assess the generalizability of the method. We have now included details of this influenza NS1 protein dataset and the predictions derived from it in the manuscript.
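
      The requirement of two per-site frequency distributions for the KL divergence can be illustrated with a short sketch. It is not the implementation used in the study: the function names, the pseudocount that avoids zero frequencies, and the direction of the divergence (observed relative to predicted) are assumptions made for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(sequences, site, pseudocount=1.0):
    """Amino acid frequencies at one alignment site (pseudocount is an assumption)."""
    counts = Counter(seq[site] for seq in sequences if seq[site] in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same amino acids."""
    return sum(p[aa] * math.log(p[aa] / q[aa]) for aa in AMINO_ACIDS if p[aa] > 0)


def mean_site_kl(observed, predicted):
    """Average per-site KL divergence between two sets of aligned sequences."""
    length = len(observed[0])
    total = 0.0
    for site in range(length):
        p = site_frequencies(observed, site)
        q = site_frequencies(predicted, site)
        total += kl_divergence(p, q)
    return total / length


if __name__ == "__main__":
    # Toy aligned sequences; real analyses need many sequences per time point.
    observed = ["MKV", "MKV", "MRV", "MKI"]
    predicted = ["MKV", "MRV", "MRV", "MKL"]
    print(mean_site_kl(observed, predicted))
```

      With a single consensus sequence per time point, the observed frequencies at each site would collapse to one residue, which is why this comparison was restricted to the dataset with multiple independent sequences at each sampling point.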

      It would also be informative to include a retrospective analysis of the evolution of protein stability along known historical trajectories. This would allow the authors to assess whether folding stability is indeed preserved in real-world evolution, as assumed in their model.

      Our present study does not aim to investigate the evolution of the folding stability over time, although it provides this information indirectly at the studied time points. Instead, the present study shows that the folding stability of the forecasted protein variants is similar to the folding stability of the corresponding real protein variants for diverse viral proteins, which provides an important evaluation of the prediction method. Next, the folding stability can indeed vary over time in both real and modeled evolutionary scenarios, and our present study is not in conflict with this. In that regard, which is not the aim of our present study, some previous phylogenetic-based studies have reported temporal fluctuations in folding stability for diverse protein data (Arenas, et al. 2017; Olabode, et al. 2017; Arenas and Bastolla 2020; Ferreiro, et al. 2022).

      Finally, a discussion on the impact of structural templates - and whether the fixed template remains valid across divergent sequences - would be valuable. Addressing the possibility of structural remodeling or template switching during evolution would improve confidence in the model's applicability to more divergent evolutionary scenarios.

      This is an important point. For the datasets that required homology modeling (in several cases it was not necessary because the sequence was present in a protein structure of the PDB), the structural templates were selected using SWISS-MODEL, and we applied the best-fitting template. We have now included in a supplementary table details about the fitting of the structural templates. Indeed, our proposal assumes that the protein structure is maintained over the studied evolutionary time, which can be generally reasonable for short timescales where the structure is conserved (Illergard, et al. 2009; Pascual-Garcia, et al. 2010). Over longer evolutionary timescales, structural changes may occur and, in such cases, modeling the evolution of the protein structure would be necessary. To our knowledge, modeling the evolution of the protein structure remains a challenging task that requires substantial methodological developments. Recent advances in artificial intelligence, particularly in protein structure prediction from sequence, may offer promising tools for addressing this challenge. However, we believe that evaluating such approaches in the context of structural evolution would be difficult, especially given the limited availability of real data with known evolutionary trajectories involving structural change. In any case, this is probably an important direction for future research. We have now included this discussion in the manuscript.

      Reviewer #1 (Recommendations for the authors): 

      (1) Abstract: "expectedly, the errors grew up in the prediction of the corresponding sequences" <- Not entirely clear what is meant by "errors grew up" or what the errors grew with.

      This sentence refers to the accuracy of sequence prediction in comparison to that of folding stability prediction. We have now clarified this aspect in the manuscript.

      (2) Lines 162-165: "Alternatively, if the fitness is determined based on the similarity in folding stability between the modeled variant and a real variant, the birth rate is assumed to be 1 minus the root mean square deviation (RMSD) in folding stability." <- What is the biological motivation for using the RMSD? It seems like a more stable variant would always have higher fitness, at least according to Equation 1.

      RMSD is commonly used in molecular biology to compare proteins in terms of structural distance, folding stability, kinetics, and other properties. It offers advantages such as minimizing the influence of small deviations while amplifying larger differences, thereby enhancing the detection of remarkable molecular changes. Additionally, RMSD would facilitate the incorporation of other biophysical parameters, such as structural divergences from a wild-type variant or entropy, which could be informative for fitness in future versions of the method. We have now included this consideration in the manuscript.
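
      The RMSD-based option quoted by the reviewer (birth rate equal to 1 minus the RMSD in folding stability, with d = 1 - b) can be sketched as below. The function names are hypothetical, and clamping the birth rate to [0, 1] is an assumption of this sketch, since how large deviations are handled is not specified here.

```python
import math


def rmsd(values, references):
    """Root mean square deviation between paired folding stability values."""
    if len(values) != len(references) or not values:
        raise ValueError("expected two non-empty lists of equal length")
    squared = sum((v - r) ** 2 for v, r in zip(values, references))
    return math.sqrt(squared / len(values))


def rates_from_stability_rmsd(modeled_dg, observed_dg):
    """Birth rate b = 1 - RMSD and death rate d = 1 - b.

    Assumption: b is clamped to [0, 1] so that both rates stay non-negative.
    """
    deviation = rmsd(modeled_dg, observed_dg)
    b = min(1.0, max(0.0, 1.0 - deviation))
    return b, 1.0 - b


if __name__ == "__main__":
    # Illustrative stability values only; units and magnitudes are hypothetical.
    print(rates_from_stability_rmsd([-5.0], [-5.3]))   # small deviation, high b
    print(rates_from_stability_rmsd([-2.0], [-5.3]))   # large deviation, b clamped to 0
```

      Because the deviation is squared before averaging, a variant far from the observed stability is penalized more strongly than several variants that each deviate slightly.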

      (3) Lines 165-166: "In both cases, the death rate (d) is considered as 1-b to allow a constant global (birth-death) rate" <- This would give a constant R = b / (1-b) over the entire phylogenetic tree. For applications to pathogens like viruses with epidemic dynamics, this is extremely implausible. Is there any need to make such a restrictive assumption? 

      Regarding technical considerations of the model, we refer to our answer to the first public review comment. Next, a constant global rate of evolution was observed in numerous genes and proteins of diverse organisms, including viruses (Gojobori, et al. 1990; Leitner and Albert 1999; Shankarappa, et al. 1999; Liu, et al. 2004; Lu, et al. 2018; Zhou, et al. 2019). However, following the comment of the reviewer, and as we indicated in our answer to the first public review comment, we have now implemented and evaluated an additional birth-death model that allows for variation in the global birth-death rate among lineages. We have implemented this additional model in the framework and described it along with its results in the manuscript.

      (4) Lines 187-188: "As a consequence, since b+d=1 at each node, tn is consistent across all nodes, according to (Harmon, 2019)." <- This would also imply that all lineages have a growth rate r = b - d, which under a birth-death model is equivalent to saying all lineages have the same fitness! 

      We clarified this aspect in our answer to the first public review comment. In particular, in the model presented, protein variants with higher fitness have higher birth rates, leading to more birth events, while protein variants with lower fitness have lower birth rates, leading to more extinction events, which is biologically meaningful for the study system. In our model b and d can vary among lineages according to the corresponding fitness (e.g., a lineage may have b=0.9, d=0.1, r=0.8, while another one may have b=0.6, d=0.4, r=0.2). Since the reproductive success varies among lineages in our model, the statement “this is essentially assuming all lineages have the same absolute fitness” is incorrect, although it could be interpreted like that in certain models. Fitness affects reproductive success, but fitness and growth rate of evolution are different biological processes (although a faster growth rate can sometimes be associated with higher fitness, a variant with high fitness does not necessarily accumulate substitutions at a higher rate). An example in molecular adaptation studies is the traditional nonsynonymous to synonymous substitution rates ratio (dN/dS), where dN/dS (which informs about selection derived from fitness) can be constant at different rates of evolution (dN and dS). In any case, we thank the reviewer for raising this point, which led us to incorporate an additional birth-death model and inspired some ideas. Thus, following the comment of the reviewer and as indicated in the answer to the first public review comment, we have now implemented and evaluated an additional birth-death model that allows for variation in the global birth-death rate among lineages. The results indicated that this model yields similar predictive accuracy compared to the previous birth-death model. We have now included these aspects, along with the results from the additional model, in the manuscript.

      (5) Line 321-322: "For the case of neutral evolution, all protein variants equally fit and are allowed, leading to only birth events," <- Why would there only be birth events? Lineages can die regardless of their fitness. 

      In the neutral evolution model, all protein variants have the same fitness, resulting in a flat fitness landscape. Since variants are observed, we allowed birth events. However, we assumed the absence of death events because no information independent of fitness is available to support their inclusion and quantification, thereby avoiding the imposition of arbitrary death events based on an arbitrary death rate. We have now provided a justification of this assumption in the manuscript.

      Reviewer #2 (Recommendations for the authors): 

      (1) Clarify the purpose of the alternative fitness mode ("ΔG similarity to a target variant"): 

      The manuscript briefly introduces an alternative fitness function based on the similarity of a simulated protein's folding stability to that of a real protein variant, but does not provide a clear motivation, usage scenario, or results derived from it. 

      The presented model provides two approaches for deriving fitness from predicted folding stability. The simpler approach assumes that a more stable protein variant has higher fitness than a less stable one. The alternative approach assigns high fitness to protein variants whose stability closely matches observed stability, acknowledging that the real observed stability is derived from the real selection process, and this approach considers negative design by contrasting the prediction with real information. For the analyses of real data in this study, we used the second approach, guided by these considerations. We have now clarified this aspect in the manuscript.

      (2) Report structural template quality and modeling confidence: 

      Since folding stability (ΔG) estimates rely on structural models derived from homology templates, the accuracy of these predictions will be sensitive to the choice and quality of the template structure. I recommend that the authors report, for each protein modeled, the template's sequence identity, coverage, and modeling quality scores. This will help readers assess the confidence in the ΔG estimates and interpret how template quality might impact simulation outcomes. 

      We agree with the reviewer and we have now included additional information in a supplementary table regarding sequence identity, modeling quality and coverage of the structural templates for the proteins that required homology modeling. The selection of templates was performed using the well-established framework SWISS-MODEL and the best-fitting template was chosen. Next, a large number of protein structures are available in the PDB for the study proteins, which favors the accuracy of the homology modeling. For some datasets, homology modeling was not required, as the modeled sequence was already present in an available protein structure. We have now included this information in the manuscript and in a supplementary table.

      (3) Clarify whether structural remodeling occurs during simulation: 

      It appears that folding stability (ΔG) for all simulated protein variants is computed by mapping them onto a single initial homology model, without remodeling the structure as sequences evolve. If correct, this should be clearly stated, as it assumes that the structural fold remains valid across all simulated variants. A discussion on the potential impact of structural drift would be welcome.

      We agree with the reviewer. As indicated in our answer to a previous comment, our method assumes that the protein structure is maintained over the studied evolutionary time, which is generally acceptable for short timescales where the structure is conserved (Illergard, et al. 2009; Pascual-Garcia, et al. 2010). At longer timescales the protein structure could change, requiring the modeling of structural evolution over the evolutionary time. To our knowledge, modeling the evolution of the protein structure remains a challenging task that requires substantial methodological developments. Recent advances in artificial intelligence, particularly in protein structure prediction from sequence, can be promising tools for addressing this challenge. However, we believe that evaluating such approaches in the context of structural evolution would be difficult, especially given the limited availability of real datasets with known evolutionary trajectories involving structural change. In any case, this is probably an important direction for future research. We have now included this discussion in the manuscript.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this study, the authors develop a complete integral drive system in Anopheles gambiae malaria mosquitoes. This type of gene drive is interesting, with special advantages and disadvantages compared to more common designs. Here, the authors develop the Cas9 element and combine it with a previously developed antimalaria effector element. The new element performs very well in terms of drive efficiency, but it has unintended fitness costs, and a higher than desirable rate of functional resistance allele formation. Nevertheless, this study represents a very good step forward toward developing effective gene drives and is thus of high impact.

      The format of the manuscript is a bit suboptimal for review. Please add line numbers next time for easy reference. It would also help to have spaces between paragraphs and to have figures (with legends) added to the text where they first appear.

      It might be useful to add subsections to the results, just like in the methods section. It could even be expanded a bit with some specific parts from the discussion, though this is optional.

      Abstract: The text says: "As a minimal genetic modification, nanosd does not induce widespread transcriptomic perturbations." However, it does seem to change things based on Figure 3c.

      Page 2: "drive technologies for public health and pest control applications" needs a period afterward.

      Page 2: "The fitness costs, homing efficiency, and resistance rate of the gene drive is" should be "The fitness costs, homing efficiency, and resistance rate of the gene drive are".

      Page 2: "When they target important mosquito genes, gene drives are designed to ensure that the nuclease activity window (germline) does not overlap with that of the target gene (somatic)." is not quite correct. This is, of course, sensible for suppression drives, but it's not a necessary requirement for modification drives with rescue elements in many situations.

      Page 2: "recessive somatic fitness cost phenotypes" is unclear. I think that you are trying to avoid the recessive fitness cost of null alleles becoming a dominant fitness cost from a gene drive allele (in drive-wild-type heterozygotes).

      Page 2: "This optimization approach has had only limited success, and suboptimal performance is commonly attributed to not capturing all the regulatory elements specific to the germline gene's expression9,12". I don't think this is correct. There are several examples where a new promoter helped a lot. The zpg promoter in Anopheles gambiae allowed success at the dsx site in suppression cage studies (Kyrou et al 2018), and nanos gave a big improvement to modification drives at the cardinal locus (Carballar-Lejarazú et al 2020). In flies, several promoters were tested, and one allowed success in cage experiments (Du et al 2024). In Aedes, the shu promoter allowed for high drive performance (Anderson et al 2023), though this last one hasn't been tested in more difficult situations. I think you could certainly argue in the general case that not all promoters will work the way their transcriptome says, but there are many examples where they seem to be pretty good.

      Page 2: "make it more likely that mutations that disrupt the drive components are selected against though loss of function of the host gene." I think that this needs a bit more explanation. You are referring to mutations in regulatory elements or frameshift mutations. This will make it more resistant to mutation. Also, these mutations would tend to have a minor effect except perhaps in the cargo gene of a modification drive. By using a cargo gene in an integral drive, you could still keep it somewhat safer, but whether this is 1.2x or 10x safer is unclear.

      Page 3: "they can incur severe unintended fitness costs". This is central to integral drives and this manuscript. It's worth elaborating on.

      Page 3: "Regulatory elements from germline genes that have worked sub-optimally in traditional gene drive designs for the reasons outlined above may work well in an IDG design20." This is setting up the integral drive with nanos, but nanos DOES work well in traditional Anopheles gambiae gene drive designs. It is possible that you might end up with less somatic expression than Hammond et al 2020 (though the comparison is unclear due to batch effects in that study), but there is no direct comparison in this manuscript to that.

      Page 3: "This suggests an impact of maternal deposition on drive efficiency only in female drive carriers." This is quite strange. Usually, I would expect to see an equal reduction in efficiency between male and female progeny. Could this be due to limited sample size? Random idea: It's also possible that almost all maternal deposition was mosaic and wouldn't be enough to directly affect drive conversion. However, it could cause enough of a fitness cost TOGETHER with new drive expression in females that perhaps only tissues with randomly low expression rates properly developed and led to progeny, reducing drive inheritance? Another possibility: Could the drive/resistance males have impaired fertility, so these ones are underrepresented in the batch cross? If nanos is needed in males and a single drive copy is not quite enough for good fertility or mating competitiveness, they may be underrepresented in your crosses. They might have worse fertility than drive homozygous males, which at least have two partially working copies of nanos rather than just one (in many cells, at least). Maybe check the testis for abnormal phenotypes?

      Overall, it would be favorable if the drive allele was somewhat more fit than a nonfunctional resistance allele. This could already be achieved in this drive, but it doesn't get much mention.

      Page 3: There should be a comma after, "Interestingly, while many of the observed mutations were predicted to abolish nanos expression" and "This could indicate that in these experiments".

      Page 3 last sentence: Please improve the clarity.

      Removing the EGFP is supposed to restore the fitness, and this was helpful in some previous integral drive constructs. This could get a bit more mention (it is possible that I missed this somewhere in the manuscript).

      Page 4: The MM-CP line and its association with the integral drive strategy could get a little more introduction. Maybe even a supplemental figure showing the general idea.

      Page 5: "cassette is predicted to disrupt the CP function entirely (Fig. 5d)" also lacks a period.

      Page 5: "The subsequent stabilization of the nanosd frequency and the lack of rapid loss suggests that any associated fitness cost is primarily recessive." This is not quite correct because by this point, drive/wild-type heterozygotes are rare, and this is where you'd find a potential dominant fitness cost. It should be correct in the end stages where it is a mix of drive and functional/nonfunctional resistance alleles (though the nonfunctional resistance alleles may cause greater fitness costs when together with a drive - see above).

      Page 6: "Maternal deposition of Cas9, or Cas9;gRNA, into the zygote can lead to cutting at stages when homing is not favoured, and has been commonly observed for canonical Anopheles nanos drives9,10,35." Reference 35 (which is more suitable for referencing an example of nanos in other Anopheles) found some resistance alleles by deep sequencing, but the timing that they formed was unclear (it's not certain if it was maternal deposition). This study may be a more suitable reference: Carballar-Lejarazú R, Tushar T, Pham TB, James AA. Cas9-mediated maternal-effect and derived resistance alleles in a gene-drive strain of the African malaria vector mosquito, Anopheles gambiae. Genetics, 2022.

      Page 8: "could further reduce the likelihood of resistance allele formation by increasing the frequency of HDR events." Multiple gRNAs would mostly help by reducing functional resistance allele formation, especially since drive conversion is already very high in Anopheles.

      Page 8, last paragraph: This conclusion is perhaps a little optimistic considering the functional resistance alleles, which should get a little more attention in the summary or elsewhere in the discussion section.

      Figure 1d: The vertical text saying "Non-WT" is confusing. The circles themselves show + and -. Also, "-" isn't necessarily a knockout allele, so I'm not sure if - is the best symbol for resistance.

      Figure 2e: The vertical scale is not the most intuitive. Consider rearranging it to "Transition from larvae to pupae" starting at zero and going to 1 when all the larvae become pupae.

      Figure 2e-f: For both of these, there are clear differences between males and females. Thus, when comparing drive homozygotes to wild-type, it would probably be better to have separate statistical comparisons for males and females.

      Figure 3: Can any of these transcription results in individual genes potentially explain the observed fitness cost?

      Figure 3b: The scale here also doesn't quite make sense. It's the fraction of underdeveloped ovaries, but the graph is also perhaps trying to show whether just 1-2 ovaries are present, or maybe how many ovaries are undeveloped, but then it would say "zero"? This should be clarified. Number of ovaries and how well-developed they are is separate (it can be put on the same graph, but needs to be more clear).

      Figure 4f: The vertical axis should say "ONNV."

      Figure 5c-d: These should be labeled as the most common resistance allele. Also, I'm not sure how relevant it is, but we also found an alternate start codon here: Hou S, Chen J, Feng R, Xu X, Liang N, Champer J. A homing rescue gene drive with multiplexed gRNAs reaches high frequency in cage populations but generates functional resistance. J Genet Genomics, 2024. Maybe this is a more common problem than one would expect?

      Figure 5cd,S4,S5: They have a bit of a weird plot. Why not make four line graphs for each? Also, some alleles use the  symbol. + is wild-type, which is well understood, but - as resistance is not always clear, and seeing them together may confuse readers. Additionally, the fact that you have the most common resistance allele in Figure 5cd might mean that you know more about the genotype? If so, it would be best to separate wild-type and resistance alleles in whatever the final figure looks like.

      Some supplemental raw data files would be useful if they were available, but the figures are thorough enough that this isn't essential.

      Review by:

      Jackson Champer, with major assistance from Ruobing Feng (essentially section B) and Jie Du

      Referee cross-commenting

      We don't have any cross-comments, other than supporting the idea of slightly more comparisons to the authors' previous construct.

      Significance

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      A key innovation of the nanosd gene drive is its integral gene drive (IGD) design, which inserts the drive cassette directly into the A. gambiae nanos gene, incorporating only the minimal components necessary for drive function. The drive achieves high transmission rates, without causing widespread disruption of gene expression or increasing susceptibility to malaria parasites, and imposes an acceptable fitness cost, primarily a reduction in female fecundity when homozygous. The strong performance of nanosd can be attributed to its design: Cas9 is expressed in the correct cells and with the correct timing to induce efficient homing, effectively hijacking the nanos gene's natural expression profile. However, despite the careful design aimed at preserving nanos function, the rescue was incomplete: homozygous female drive carriers exhibited a clear reduction in ovarian function.

      In caged population trials, both the drive and a co-introduced anti-malaria effector gene reached high frequencies, even in the presence of emerging resistance alleles. Because the drive is inserted into an essential gene, nonfunctional resistance alleles are selected against and tend to be purged over time. Nonetheless, functional resistance remains a concern. The use of a single, though precisely positioned, gRNA targeting the native nanos gene ATG site increases the likelihood of generating functional resistance alleles. Over the long term, if the drive imposes fitness costs, it may be outcompeted by such functional resistance alleles, potentially undermining the goal of sustained population modification.

      Overall, this study represents a notable advance in Anopheles mosquito gene drive development and can be considered high impact.

      • Place the work in the context of the existing literature (provide references, where appropriate).

      Previous IGD efforts in Drosophila, mice and mosquitoes have demonstrated nearly super‐Mendelian inheritance but often at the expense of host fitness. For example, Nash et al. built an intronic‐gRNA Cas9 drive at the D. melanogaster rcd-1r locus that propagated efficiently through cage populations (Nash et al., 2022), and Gonzalez et al. reported that a Cas9 drive inserted at the germline zpg locus in Anopheles stephensi biased inheritance by ~99.8% (Gonzalez et al., 2025). However, these strong drives disrupted essential genes: in A. gambiae, inserting Cas9 into zpg produced efficient homing but rendered females largely sterile (Ellis et al., 2022). A similar germline Cas9 knock-in in Mus musculus enabled gene conversion in both sexes, albeit with only modest efficiency and potential fitness trade-offs (Weitzel et al., 2021). The current nanosd IGD is explicitly designed to overcome this limitation by selecting a more permissive gene target and using a minimal drive cassette, so as to preserve mosquito viability while maintaining robust drive efficiency, although fertility is still reduced in female drive homozygotes.

      This nanosd gene drive, like previous homing drives in Anopheles, is capable of achieving a high level of inheritance bias. Although it uses the endogenous nanos regulatory elements, which have less leaky somatic expression compared to using vasa (Gantz et al., 2015; Hammond et al., 2016; Hammond et al., 2017) or zpg promoters (Hammond et al., 2021; Kyrou et al., 2018), to drive Cas9 expression and thereby reduces somatic expression-induced female sterility, the incomplete rescue of nanos function still leads to reduced female fertility in drive homozygotes.

      • State what audience might be interested in and influenced by the reported findings.

      It's worth noting the broad audience that will find this work relevant. Gene drive developers and molecular geneticists will be impressed by the good drive performance and directly influenced by the design principles showcased here. The study's integral gene drive architecture that leverages the endogenous nanos regulatory elements, in-frame E2A peptide linkage for co-expression, and intronic insertion of gRNA and selectable markers addresses long-standing challenges in promoter leakage, somatic fitness costs, and resistance allele evolution. What's more, vector biologists and malaria researchers will be interested in the successful deployment of a gene drive in A. gambiae that actually carries a disease-blocking trait.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      We have worked on CRISPR gene drive development in both fruit flies and Anopheles mosquitoes and have experience with modeling their spread.

      References

      Ellis, D.A., Avraam, G., Hoermann, A., Wyer, C.A.S., Ong, Y.X., Christophides, G.K., and Windbichler, N. (2022). Testing non-autonomous antimalarial gene drive effectors using self-eliminating drivers in the African mosquito vector Anopheles gambiae. PLOS Genetics 18, e1010244-e1010244.

      Gantz, V.M., Jasinskiene, N., Tatarenkova, O., Fazekas, A., Macias, V.M., Bier, E., and James, A.A. (2015). Highly efficient Cas9-mediated gene drive for population modification of the malaria vector mosquito Anopheles stephensi. Proc Natl Acad Sci U S A 112, E6736-E6743.

      Gonzalez, E., Anderson, M.A.E., Ang, J.X.D., Nevard, K., Shackleford, L., Larrosa-Godall, M., Leftwich, P.T., and Alphey, L. (2025). Optimization of SgRNA expression with RNA pol III regulatory elements in Anopheles stephensi. Scientific Reports 15, 13408.

      Hammond, A., Galizi, R., Kyrou, K., Simoni, A., Siniscalchi, C., Katsanos, D., Gribble, M., Baker, D., Marois, E., Russell, S., et al. (2016). A CRISPR-Cas9 gene drive system targeting female reproduction in the malaria mosquito vector Anopheles gambiae. Nat Biotechnol 34, 78-83.

      Hammond, A., Karlsson, X., Morianou, I., Kyrou, K., Beaghton, A., Gribble, M., Kranjc, N., Galizi, R., Burt, A., Crisanti, A., et al. (2021). Regulating the expression of gene drives is key to increasing their invasive potential and the mitigation of resistance. PLOS Genetics 17, e1009321-e1009321.

      Hammond, A.M., Kyrou, K., Bruttini, M., North, A., Galizi, R., Karlsson, X., Kranjc, N., Carpi, F.M., D'Aurizio, R., Crisanti, A., et al. (2017). The creation and selection of mutations resistant to a gene drive over multiple generations in the malaria mosquito. PLOS Genetics 13, e1007039-e1007039.

      Kyrou, K., Hammond, A.M., Galizi, R., Kranjc, N., Burt, A., Beaghton, A.K., Nolan, T., and Crisanti, A. (2018). A CRISPR-Cas9 gene drive targeting doublesex causes complete population suppression in caged Anopheles gambiae mosquitoes. Nature Biotechnology 36, 1062-1066.

      Nash, A., Capriotti, P., Hoermann, A., Papathanos, P.A., and Windbichler, N. (2022). Intronic gRNAs for the construction of minimal gene drive systems. Frontiers in Bioengineering and Biotechnology 0, 570-570.

      Weitzel, A.J., Grunwald, H.A., Ceri, W., Levina, R., Gantz, V.M., Hedrick, S.M., Bier, E., and Cooper, K.L. (2021). Meiotic Cas9 expression mediates gene conversion in the male and female mouse germline. Plos Biol 19, e3001478-e3001478.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Manuscript number: RC-2025-03064

      Corresponding author(s): Massimo, Hilliard; Sean, Coakley

      1. General Statements

      We are grateful to the reviewers for taking time to review our manuscript and for providing such clear, insightful and actionable suggestions. The consensus between 4 independent reviewers that this story is of general interest to cell biologists, neurobiologists and clinical researchers is remarkable. In addition to our mechanistic insights into the regulation of GTPase activity, we think that the experimental systems we have developed will be of great value to study how GTPases and their associated GAPs and GEFs function to maintain the nervous system, especially due to the demonstrated conservation of these molecules. We believe that our data provides a powerful and tractable model to study such molecules in a physiological context.

      We agree with the reviewers' concerns and propose the following plan to address them.

      2. Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):


      __Summary Stability of the PLM axon in C. elegans is maintained through interactions with the epidermis. Previous studies by this group found that loss of the tbc-10 Rab GTPase Activating Protein strongly enhanced the PLM axon break phenotype of unc-70/beta-spectrin mutants. TBC-10 is a GAP for RAB-35 and thus loss of rab-35 suppresses the tbc-10 phenotype. Of the two RAB-35 GEFs, loss of RME-4 partially suppressed the tbc-10 phenotype and FLCN-1 was not involved, suggesting that there may be an additional GEF involved. Here Bonacossa-Pereira et al identify a point mutation in agef-1a (vd92) as a suppressor of the tbc-10 PLM axon break phenotype (all experiments also have a dominant allele of unc-70) and confirm that the point mutation is causative by replicating the mutation via genome editing (vd123). Rescue experiments demonstrate that AGEF-1a is required in the epidermis and not PLM as previously demonstrated with tbc-10 and unc-70. Rescue is dependent on a functional SEC7/GEF activity. AGEF-1a is a functional ortholog to human BIG2/ArfGEF2 as its expression fully rescues tbc-10. AGEF-1a functions upstream of RAB-35 as expression of activated RAB-35 can suppress loss of agef-1. AGEF-1a functions in parallel to RME-4 as the double has stronger suppression of tbc-10. AGEF-1a is an ARF GEF, however it functions independently of ARF-1.2 as loss of arf-1.2 does not suppress tbc-10. They demonstrate that AGEF-1a interacts with RAB-35 through colocalization experiments suggesting that AGEF-1a could directly activate RAB-35. Finally, they demonstrate that AGEF-1a regulates the localization of the LET-805 epidermal attached complex component as it restores localization in a tbc-10 mutant.

      Major comments

      The manuscript is well written and easy to understand.

      The experiments are well done and controlled.

      I enjoyed reading this paper. However...

      Some of the claims are not supported by the data.__

      __1) The claim that AGEF-1a directly interacts with RAB-35 was not demonstrated. The evidence provided to support a direct interaction consists of colocalization experiments in Figure 3. AGEF-1a does partially colocalize with RAB-35 in the epidermis. However, colocalization does not indicate a physical interaction, direct or indirect. A simple fix would be to change the claim to state that they partially colocalize. Optionally, a physical interaction could be tested with the split-GFP, since they already have the AGEF-1 strain, or they could perform co-IP experiments, though neither of those is proof of a direct interaction.

      __

      We agree that the biochemical co-IP experiment could provide some answers; however, using a full length AGEF-1a would not only represent a significant technical challenge but would also not prove a direct interaction in a physiological context. To overcome this limitation, and to directly test their interaction in vivo, we propose to use a split-GFP approach as suggested by the reviewer. In this experiment, we will generate an endogenously tagged GFP1-10::rab-35 allele and combine it with the previously generated and available tagged agef-1a::GFP11x7. If AGEF-1 and RAB-35 closely interact, we should observe the reconstitution of full length GFP. It is possible that the endogenously tagged versions only provide a very weak GFP signal that will be difficult to detect. As an alternative approach, we will generate the same tagged molecules as overexpressed transgenes under epidermal-specific promoters (such as Pdpy-7). If the results are still negative, we agree to temper our claim that these molecules physically interact and rephrase the manuscript to reflect the new data.

      2) The claim that AGEF-1a facilitates RAB-35 activation is not supported. While it is likely that AGEF-1a facilitates RAB-35 activation based on the epistasis experiments as well as studies in mammalian cells, there were no experiments to demonstrate that modulating AGEF-1a activity resulted in a change in RAB-35 activity. I would suggest tempering this claim to something along the lines that the data are consistent with AGEF-1a regulating RAB-35 activity as shown in mammalian cells. An optional experiment would be to look at the colocalization of RAB-35 with a known effector in wild type and agef-1(vd92) with the expectation that there would be a higher level of colocalization in agef-1 mutants. Effector pull-down experiments or perhaps a cell-based GEF assay could be used (PMID: 35196081).


      We welcome this suggestion and acknowledge the limitations of these experiments. While we might be able to determine if AGEF-1 and RAB-35 physically interact in vivo with the experiments proposed above, screening for the relevant rab-35 effector in this context and/or doing effector pull-down/cell based GEF assays would be a significant technical challenge. We propose to temper our claim as suggested.

      3) The claim that AGEF-1a functions independently of ARF-1.2 is not well supported. The fact that the ARF-1.2 mutant does not suppress tbc-10 suggests that ARF-1.2 may not be involved but does not eliminate the possibility that ARF-1.2 functions redundantly with ARF-5 or WARF-1/ARF-1.1. This can be resolved by toning down the claim. Alternatively, this can be tested by RNAi of arf-5 and warf-1 in tbc-10 and arf-1.2; tbc-10 mutants.

      We agree that warf-1 and arf-5 could be functioning redundantly with arf-1.2. We have attempted to generate an AID::arf-5 allele to test the effect of cell-specific degradation, but homozygous AID::arf-5 animals were lethal. We have not yet examined warf-1. We believe the best way to test these two molecules is through RNAi knockdown, and we propose to do this experiment and adjust our interpretation and discussion according to the new data.

      Minor comments

      Figure 1C the CRISPR generated allele (vd123) is referred to as [S784L] and then in 1E vd92 is referred to as [S784L]. Perhaps it would be clearer if the allele name was used instead of the amino acid change.

      We will reformat the manuscript to include the allele names instead of amino acid change.

      Page 6 "We reasoned that if the S784L mutation we isolated causes a similar loss of the GTPase activation function, then SKIN::AGEF-1a[E608K] would not have the capacity to restore the rate of PLM axon breaks to background levels in agef-1[S784L]; tbc-10; vdSi2 animals." It was unclear to me whether you were testing if the S784L mutation could be disrupting a GEF independent function or might disrupt the nucleotide exchange activity as might be tested in a biochemical assay. There are many reasons this change could cause a loss of function phenotype (e.g., improper folding, mislocalization, etc.). The clearest explanation would be that you were testing if GEF function was required for rescue rather than testing if the S784L mutation disrupted GEF activity.

      Indeed, this experiment reveals that reducing the activation of the AGEF-1 target phenocopies the effect of S784L and does not further enhance the effect of S784L. However, it does not answer if, specifically, the GEF function is affected by S784L. We propose to rewrite the quoted sentence as follows: "We asked whether the GEF function is required for axonal damage. If that is the case, then SKIN::AGEF-1a[E608K] overexpression should phenocopy the effect of AGEF-1a[S784L]."

      Page 13. It was unclear how testing if AGEF-1, RME-4, ARF-5 and RAB-35 form complexes in vivo (I assume you are suggesting colocalize based on figure 3 interpretation) would resolve how AGEF-1 was regulating RAB-35.


      We apologize that our phrasing was not clear. We will rewrite this section to better reflect the following idea. Given literature data showing an allosteric interaction between RME-4/DENND1 and ARF-5/Arf5, and our own data showing that AGEF-1 regulates RAB-35, we believe these molecules could form a complex. Considering that we do not have data to support this notion, mostly due to the inability to test the effect of ARF-5, we will present this possibility in the discussion section.


      __**Cross-commenting**

      I agree with the comments made by the other reviewers and I stand by my own as well. I will echo that it is important to know the nature of their agef-1 allele.

      Reviewer #1 (Significance (Required)):

      The identification by Bonacossa-Pereira et al of AGEF-1 as a regulator of axon integrity that functions in a pathway with RAB-35 in the epidermis is an exciting finding. As pointed out in the discussion, mutations in the human ortholog cause neurodevelopmental defects, which leads to the obvious characterization of BIG2/ArfGEF2 in neurons, while this study indicates that this protein can have cell non-autonomous roles in regulating neurons. These findings could have important implications for understanding the etiology of these defects that would be of interest to neurobiologists and clinical researchers.

      The findings of this paper would also be of interest to cell biologists and particularly those studying the roles of Rab and Arf GTPases in membrane trafficking, such as myself. The idea that AGEF-1 might function as a Rab35 GEF is provocative and would generate a lot of interest and skepticism from the field. However, there is no data to support that AGEF-1 would be a direct regulator of Rab35 over the previously demonstrated cross regulation of Rab35 by Arf GTPases. Therefore, it would be fine to speculate in the discussion about a direct interaction, but I would refrain from suggesting this as a model and elsewhere in the manuscript.

      __

      Although we agree that current evidence is not sufficient to support the model where AGEF-1 is a direct regulator of RAB-35, our data points to the direction where there is an important genetic relationship between these molecules in a physiological context in a living animal, with a defined phenotype relevant to the nervous system maintenance. We think that the proposed revision experiments will provide a better understanding of how AGEF-1 functions with RAB-35 and we agree with the suggestion to rephrase our manuscript to reflect the limitations of our results.


      __Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This interesting manuscript reports the outcome of a fruitful C. elegans genetic screen with a complex but clever design. Through it, the authors identify AGEF-1 as a GEF that likely regulates the active state of the GTPase RAB-35 in the skin to protect touch receptor axons from mechanical breakage.

      Major points: 1. Based on localization experiments, the authors claim "AGEF-1a interacts with RAB-35 in the epidermis" (Results heading) and state "these data demonstrate that AGEF-1a interacts with a subset of RAB-35 molecules in the epidermis." In general, localization studies cannot be used to conclude physical interaction (with some exceptions such as single-molecule kinetics). In this case, the data in my view do not even make a compelling argument for co-localization. There is a lot of AGEF-1 and RAB-35 signal everywhere and it may not be meaningful that the signals sometimes overlap. A more quantitative approach with controls would be needed to conclude meaningful co-localization. Importantly, this would still not demonstrate interaction.__

      We thank the reviewer for the comment. Indeed, co-localization does not prove a physical interaction, and we appreciate the concern about our imaging data not making a compelling argument. To address this notion, we plan to perform an experiment using a more robust, quantitative and physiologically relevant strategy. We will generate an endogenously tagged mScarlet3::rab-35 allele for precise endogenous localization. In addition, as a positive control, we will generate an endogenous rme-4::GFP11x7 allele to cell-specifically demonstrate the level of colocalization of RME-4 with mScarlet3::RAB-35 within the epidermis. To address the possible interaction between AGEF-1a and RAB-35 we will leverage a split-GFP approach to assess their interaction in vivo, in the context relevant to the phenotypes we observed (see reply to reviewer #1 point 1).

      __2. The effect of the AGEF-1(S784L) mutation is not clear to me. Naively, as the S784L mutation lies in the auto-inhibitory domain, I would have expected AGEF-1 to become constitutively active, not inactive as the authors seem to suggest. Is the idea that it is constitutively auto-inhibited? The main evidence for a loss of function effect seems to be that a putative dominant negative mutation AGEF-1(E608K) does not further suppress axon breakage when co-expressed in trans to AGEF-1(S784L), but in my view this only shows that, once the defect is suppressed, it cannot be suppressed any further. Defining the nature of the S784L allele is important. Some suggestions, although the authors may come up with different approaches: use of an inducible or cell-specific depletion system like AID/TIR1, Cre/lox, or FLP/FRT to circumvent the lethality of agef-1(0) and reveal what a true loss-of-function looks like; testing if deletion of the auto-inhibitory domain phenocopies S784L to test if this mutation impairs autoinhibition.

      __

      This is a very insightful comment. To address this point, we will follow the reviewer's suggestion and deplete AGEF-1 cell-specifically in the epidermis using the auxin-inducible degron system. Specifically, we will generate an agef-1::AID allele to degrade this molecule in a spatially and temporally controlled fashion, which will allow us to circumvent the lethality of agef-1(0) and determine whether the S784L allele mimics the depletion of AGEF-1.

      Although it would be interesting to further dissect the effect of this mutation on AGEF-1 activity, we believe that this falls outside the scope of this manuscript. As an alternative, we propose to elaborate in the discussion on the possible roles of the S784L mutation to clarify our model of its function. Our data support a model in which this mutation reduces AGEF-1 function, leading to a reduction in the activity of its downstream target GTPases. It is possible that this is due to AGEF-1 becoming constitutively autoinhibited, or that this mutation affects the structure of the molecule in a way that reduces its affinity towards its downstream effectors.

      Minor points: 1. I am not able to see the "vesicle-like structures with a clear luminal space" or RAB-35 being "notably enriched at the membrane near the epidermal furrow" in Fig. 3. The "3D surface rendering" in Fig. 3e is grossly oversampled and should not be included.

      We will rectify this section and include new super-resolved images using Airyscan confocal microscopy. We hope these will yield a better-quality representation of these concepts.

      __2. As the agef-1a isoform is specifically referenced throughout, please describe the different agef-1 isoforms somewhere to save readers from having to look this up.__

      Yes, we will include a description of the isoforms. In C. elegans there are two: AGEF-1a, which has been confirmed by cDNA, and AGEF-1b, which is predicted and partially confirmed by cDNA. The mutation we isolated exclusively affects AGEF-1a.

      3. The authors include an interesting speculation in the Discussion: "Future investigations of BIG2-associated neurological disorders should consider... hyper-activity of BIG2 as a driver of neuropathology." If the authors have the tools to test the effect of hyperactive BIG2 in this system, it could be an exciting addition.


      This is an exciting idea that we would like to keep in the Discussion. The biology of BIG2 activity regulation is a nascent field of research and we believe that to accurately generate and characterise a hyperactive BIG2 would be beyond the scope of this manuscript.

      __ On a personal note, since GEFs act oppositely to GTPase Activating Proteins (GAPs), I had to stop and re-read carefully whenever the authors referred to a GEF "activating" a GTPase. I understand their meaning (i.e., putting the GTPase in its active GTP-bound state, not activating its GTPase function) but I wanted to point out this potential confusion in case there is a way to better define terms in the Introduction or change word choice. I realize this may be a standard jargon in the field.__

      Indeed, this is confusing nomenclature and a difficult concept to deliver in an accurate and succinct manner. We propose to include a clearer, more didactic explanation of their function. In a simple explanation, GTPases perform cellular functions when bound to GTP. GAPs terminate GTPase activity by catalysing GTP hydrolysis, generating GDP. GEFs initiate GTPase activity by catalysing the release of GDP and allowing GTP binding.

      __ Please check the correct nomenclature for CRISPR/Cas9.__


      We will rectify where appropriate.

      __6. p.7 "these molecules act in synergy", consider replacing with "redundantly".

      __

      We will rectify where appropriate.

      __Reviewer #2 (Significance (Required)):

      The significance of this story is to show that GEF-GTPase pairing can be highly context-dependent. Previous studies have identified GEFs that pair with RAB-35 and GTPases that pair with AGEF-1, but the authors find that these factors have at best a modest role in the context of skin-axon interactions. Instead, the authors suggest a novel GTPase-GEF pairing of RAB-35 with AGEF-1 and provide evidence that this relationship is conserved in the human homolog of AGEF-1. These results suggest that GTPase-GEF pairings depend not only on chemical affinity but also cellular context.

      The main strength of the study is its clever genetics. For the screen, the authors looked for suppressors of a synthetic defect in axon integrity caused in part by elevated activity of RAB-35 due to loss of its GAP TBC-10. It is satisfying that this screen isolated a mutation in a GEF that in principle could counterbalance the loss of a GAP.

      The main weakness of the study is the lack of direct evidence for an AGEF-1/RAB-35 interaction. While not necessary for publication, the inclusion of biochemical data to support the role of AGEF-1 as a GEF for RAB-35 and the effect of the S784L mutation on this activity would strongly elevate the study. The genetic data for this interaction are consistent with the model but not conclusive, and in my view the colocalization data are not compelling. Nevertheless this is a solid genetic story with a clever screen.__

      We appreciate the feedback and are grateful for the positive comments on the significance of our study. As explained in the significance section related to Reviewer 1, if we find evidence of a direct interaction between AGEF-1 and RAB-35 in the proposed new experiments, we will include it in the manuscript; alternatively, we will present it as a possibility in the discussion section, as suggested. We agree that a more nuanced understanding of the effect of the S784L mutation is interesting and that our colocalization data can be improved, and we have proposed experiments to address these concerns.

      __Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This paper investigates the mechanism by which molecular pathways in the skin protect the processes of nerves that innervate them from damage. The authors previously showed that spectrin and the small GTPase RAB-35 act in the epidermis of C. elegans to protect mechanosensory axons from breaking. In this paper they used a suppression screen to identify another gene involved in this process, an ARF-GEF called AGEF-1. Partial loss-of-function mutations in agef-1 suppress the axon-breakage phenotype of spectrin mutations, and genetic experiments by the authors are consistent with the possibility that AGEF-1 could act directly as an exchange factor for RAB-35. Consistent with this model, they show that AGEF-1 and RAB-35 colocalise in the skin.

      Major comments: The experiments in this paper are well-designed and well-controlled, and the interpretations of the results are all reasonable. On the other hand, I don't think the authors' hypothesis that AGEF-1 acts directly as an exchange factor for RAB-35, or that these two proteins directly interact, is definitively proven. This is not an issue of the authors overinterpreting their data--the paper is very carefully and thoughtfully written. However, the most interesting and counterintuitive finding--that an ARF-GEF could also be a RAB-GEF--might be strengthened with more experiments (for example, could they more directly show protein-protein interaction through co-IP or mass spec?).__

      We thank the reviewer for the suggestion. We propose to further investigate the notion that AGEF-1a might be a direct interactor of RAB-35 using a split-GFP approach to assess whether these molecules closely interact, in vivo, in the physiological context that is relevant for the maintenance of the touch sensing neurons (please see reply to reviewer #1 major point 1 and reviewer #2 major point 1 for more details).

      Minor comments: There are also two places where the fact that null mutations are lethal (for agef-1 and arf-5) prevented the authors from addressing the effect of agef-1 loss of function in the skin, and addressing whether ARF-5 could be an AGEF-1 target, respectively. In principle, they could have tried to make a CRISPR line in which these genes could be cell-specifically deleted in the skin (using a dpy-7-driven recombinase). I don't think either of these experiments are essential, but if it is feasible to make these lines it would tie up a couple of loose ends.

      We agree to explore the roles of agef-1 and arf-5 loss-of-function. We propose to tissue-specifically degrade AGEF-1 using an auxin-inducible degradation strategy (please see reviewer #2 major point 2 reply for more details). For arf-5, we propose knocking down its function using RNAi to overcome lethality (please see reviewer #1 major point 3 reply for more details).

      __Reviewer #3 (Significance (Required)):

      Overall I think this is an interesting paper on a topic of general interest. The most interesting finding is that an exchange factor for an ARF (a small GTPase involved in vesicle coating/uncoating) could also be an exchange factor for a RAB (a small GTPase involved in vesicle tethering). The evidence presented is suggestive and intriguing, though as noted above not completely definitive. In summary, I think it is an interesting paper in its current form, and anything it could do to more firmly establish a direct interaction between AGEF-1 and RAB-35 would increase its impact and importance.

      __

      We thank the reviewer for the positive evaluation of the significance of our study.

      __ Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      Summary: In this study Bonacossa-Pereira et al. identify AGEF-1a, an Arf-GEF, as a factor that functions in the epidermis through RAB-35 to regulate axonal integrity of the PLM mechanosensory neurons in C. elegans. Specifically, epidermal attachment sites are regulated by these genes from the epidermis and compromising these attachment sites results in axonal degeneration. The study provides some evidence that RAB-35 and AGEF-1 at least partially colocalize in the skin. Finally, the authors provide evidence that the human orthologue BIG2 is capable of functionally replacing AGEF-1a in C. elegans. Overall, the experiments are well designed and the paper is clear and succinct. The conclusions are supported by the findings and provide an important extension of the authors' findings a few years back, when they identified the role of rab-35 in mediating the epidermal-neuronal attachment sites.

      Major comments: 1. AGEF-1/BIG2 are known to regulate other GTPases such as ARF-5 or ARF-2. The authors exclude a non-redundant function for ARF-2, but are unable to establish a role for ARF-5 because of the lethality associated with the mutation. Alternative approaches, such as cell-specific knockout or knockdown experiments, could address this. In addition, studies to test potentially physical interaction such as pull-down assays, co-IP experiments and FRET could be used to test whether AGEF-can bind RAB-35 or ARF-5.__

      We thank the reviewer for this suggestion. We propose addressing these concerns using tissue-specific degradation of AGEF-1a (please see reviewer #1 major point 2 for details). To establish a role for ARF-5, we propose an RNAi-mediated knockdown to overcome lethality (please see reviewer #1 major point 3 for details). Finally, we plan to use a split-GFP approach to test the physical interaction between AGEF-1a and RAB-35 in vivo (please see reviewer #1 major point 1 for details).

      __ Phenotypic readout has been limited to only axon breaks. It may be interesting to also test other aspects such as axonal deformities including swellings and vesiculation in other parts of the nervous system. Moreover, behavioral or functional experiments such as response to gentle touch or synaptic integrity could be informative.__

      We have not observed any obvious axonal phenotypes in the touch receptor neurons other than axonal breaks in these mutants, and we will include a statement to this effect. We have not tested behavior, as the results would be difficult to interpret for two reasons: first, the breaks are not always bilateral and one neuron is sufficient to provide a mechanical response; second, the mixed identity of the PLM neurite allows it to retain some function despite being severed. However, if deemed essential, we will perform these experiments.

      __ Overexpression constructs such as SKIN::RAB-35[Q69L], SKIN::BIG2, SKIN::AGEF-1a[E608K] in extrachromosomal transgenes could lead to non-physiological localization or effects. Single copy expression using MosSCI or CRISPR insertions are generally considered better approaches (other than endogenous reporters) to provide accurate insights at the physiological level. While the authors tacitly acknowledge this by conducting the experiments in a rab-35 mutant background and very low transgene concentration, at the very least this caveat regarding the localization should be discussed.__

      This is an important remark, and we appreciate the comment. We acknowledge that experiments using extrachromosomal arrays have inherent caveats, especially for localization studies. To address the RAB-35 localization concern, we plan to repeat the localization studies using RAB-35 endogenously tagged by CRISPR, to overcome the possible artifacts caused by extrachromosomal array-driven expression (please see reviewer #1 point 1 for more details). For the cell-specific rescue and dominant-negative constructs, we believe that using extrachromosomal arrays is sufficient, since this allows us to compare genetically identical transgenic vs non-transgenic siblings of independent lines. Moreover, given that these constructs are already driven by a tissue-specific promoter that is inherently stronger than their respective endogenous promoters, even a single-copy insertion would have the same caveats.

      __4. The study does not address clearly whether AGEF-1a acts in parallel to spectrin or upstream/ downstream to it. Epistasis experiments could help to figure out the signaling pathway involved.

      __

      Indeed, this is a concept that we need to communicate more clearly. We have data showing that a mutation in agef-1 does not cause axonal damage on its own, and that it has no effect on the axonal damage caused by the unc-70 dominant-negative mutation alone. We only detect an effect of agef-1 when tbc-10 is mutated together with unc-70 (Fig. 1a of manuscript). Together, these data indicate that agef-1 functions upstream of rab-35, thus acting in parallel to unc-70 (see schematic below) to ensure the mechanical stability of neuron-epidermal attachment. We plan to include these data and the following schematic as a supplement to better convey the idea and discuss the results appropriately.

      __ The finding that BIG2 rescues the mutant defect is an important finding and rightfully finds its place in the abstract. I wonder whether a reference to the human diseases caused by loss of BIG2 in the abstract and introduction would not increase interest/impact for the study, rather than burying this potentially interesting connection in the discussion.

      __

      We appreciate the reviewer's comment, and welcome the suggestion. We propose to include relevant background about BIG2-related human diseases in the abstract and introduction as suggested and expand the discussion regarding BIG2 mutations.

      __Minor comments:

      1. Some explanation about how mutating the autoinhibitory domain could impact the catalytic activity of a GEF might be helpful.__

      We acknowledge that this notion was not well communicated. We propose to elaborate more about why we think a mutation in the autoinhibitory domain might be affecting the GEF activity and we plan to do further experiments to dissect how this might be happening. Please see reviewer #2 major point 2 for a more detailed explanation.

      __ The paper refers to rme-4(b1001) as a null allele while WormBase refers to the same as a missense allele. It would be more accurate to refer to rme-4(b1001) as a strong loss of function or putative null.__

      We agree and will refer to b1001 as a strong loss-of-function.

      __ The paper does not clearly discuss limitations of the hypomorphic agef-1[S784L] and that the observed phenotypes in this hypomorph might underestimate the complete role of AGEF-1a.__

      We thank the reviewer for this suggestion. We propose to elaborate more on these limitations, especially considering the possible new results from the experiments suggested in reply to reviewer #2 major comment point 2.

      __ In figure 1, was there really only one extrachromosomal transgenic line for some of the constructs tested? __

      For the Pdpy-7::AGEF-1a lines we have scored 3 transgenic lines (data not included) and only one yielded a full rescue. For all extrachromosomal lines presented, we tested 3 independent transgenic lines. For brevity, we only included the result for the positive rescues (1 for BIG2 and 1 for AGEF-1a), except for the Pmec-4 lines, of which none rescued the phenotype (data included in Table S2). We will update Table S2 to include all the lines tested.

      __ The concentrations of transgenes vary in different transgenes. Is there a rationale behind this? __

      Yes, we have attempted multiple concentrations of injections for each transgene and there was some variability for each construct injected, thus we only included the ones where we observed an effect. As mentioned in point 4 above, we will update Table S2 to include details of all lines tested.

      __ In Fig. 1e: It may be useful to also show the "WT" phenotype, i.e. the strong defects to get a visual comparison for the degree of rescue. __

      We think this suggestion will help the readers. We will include this as a representative dashed line showing the WT phenotype.

      __Reviewer #4 (Significance (Required)):

      The study has identified AGEF-1a as a regulator of axonal maintenance, functioning to protect neurons against mechanical stress by acting through RAB-35. Additionally, this epidermal GEF, AGEF-1a, is functionally conserved as its human orthologue BIG2 can replace AGEF-1a in C. elegans for axonal protection. Important points here are that the findings extend prior work by the authors on a non-autonomous mechanism that regulates epidermal-neuronal attachment. In my humble opinion, the human disease connection, in particular with regard to the unexplained neuronal phenotypes in patients could be better developed in the manuscript. It may also increase impact/interest of a wonderful story that right now reads a bit 'wormy'.__


      This is an important remark and we are grateful for the positive comments. The fact that the function of human BIG2 is conserved in C. elegans points to a fundamental role of this molecule in multicellular life, and it provides a tractable model to investigate the function of this molecule in a physiological context. We welcome the suggestion to elaborate on the connection with the unexplained neuronal phenotypes in patients and to use more accessible language to convey our findings to a wider audience.


      3. Description of the revisions that have already been incorporated in the transferred manuscript

      N/A

      4. Description of analyses that authors prefer not to carry out

      __Reviewer #1 __


      "...studies to test potentially physical interaction such as pull-down assays, co-IP experiments and FRET could be used to test whether AGEF-can bind RAB-35 or ARF-5."


      While pull-down assays, co-IP and FRET would reveal whether AGEF-1a can form a complex with RAB-35, we believe that using full-length AGEF-1a would not only represent a significant technical challenge but would also not prove a direct interaction in a physiological context.


      "...An optional experiment would be to look at the colocalization of RAB-35 with a known effector in wild type and agef-1(vd92) with the expectation that there would be a higher level of colocalization in agef-1 mutants. Effector pull-down experiments or perhaps a cell based GEF assay could be used (PMID: 35196081)."


      We think that screening for the relevant rab-35 effector in this context and/or doing effector pull-down/cell based GEF assays would be a significant technical challenge. We propose to address this concern by tempering our claim as suggested by the reviewer.


      "...It may be interesting to also test other aspects such as axonal deformities including swellings and vesiculation in other parts of the nervous system. Moreover, behavioral or functional experiments such as response to gentle touch or synaptic integrity could be informative."

      As indicated above in major point 2 of reviewer 4, these are interesting ideas that might answer how the function of these neurons might be affected. However, in addition to the challenges indicated above, they will not provide further insights into how their integrity is maintained. We believe these will fall outside the scope of the manuscript, but if deemed essential we will perform behavioral analysis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      McDougal et al. aimed to characterize the antiviral activity of mammalian IFIT1 orthologs. They first performed three different evolutionary selection analyses within each major mammalian clade and identified some overlapping positive selection sites in IFIT1. They found that one site that is positively selected in primates is in the RNA-binding exit tunnel of IFIT1 and is tolerant of mutations to amino acids with similar biochemical properties. They then tested 9 diverse mammalian IFIT1 proteins against VEEV, VSV, PIV3, and SINV and found that each ortholog has distinct antiviral activities. Lastly, they compared human and chimpanzee IFIT1 and found that the determinant of their differential anti-VEEV activity may be partly attributed to their ability to bind Cap0 RNA. 

      Strengths: 

      The study is one of the first to test the antiviral activity of IFIT1 from diverse mammalian clades against VEEV, VSV, PIV3, and SINV. Cloning and expressing these 39 IFIT1 orthologs in addition to single and combinatorial mutants is not a trivial task. The positive connection between anti-VEEV activity and Cap0 RNA binding is interesting, suggesting that differences in RNA binding may explain differences in antiviral activity. 

      Weaknesses: 

      The evolutionary selection analyses yielded interesting results, but were not used to inform follow-up studies except for a positively selected site identified in primates. Since positive selection is one of the two major angles the authors proposed to investigate mammalian IFIT1 orthologs with, they should integrate the positive selection results with the rest of the paper more seamlessly, such as discussing the positive selection results and their implications, rather than just pointing out that positively selected sites were identified. The paper should elaborate on how the positive selection analyses PAML, FUBAR, and MEME complement one another to explain why the tests gave them different results. Interestingly, MEME which usually provides more sites did not identify site 193 in primates that was identified by both PAML and FUBAR. The authors should also provide the rationale for choosing to focus on the 3 sites identified in primates only. One of those sites, 193, was also found to be positively selected in bats, although the authors did not discuss or integrate that finding into the study. In Figure 1A, they also showed a dN/dS < 1 from PAML, which is confusing and would suggest negative selection instead of positive selection. Importantly, since the authors focused on the rapidly evolving site 193 in primates, they should test the IFIT1 orthologs against viruses that are known to infect primates to directly investigate the impact of the evolutionary arms race at this site on IFIT1 function. 

      We thank the reviewer for their assessment and for acknowledging the breadth of our dataset regarding diverse IFIT1s, number of viruses tested, and the functional data that may correlate biochemical properties of IFIT1 orthologous proteins with antiviral function. We have expanded the introduction and results sections to better explain and distinguish between PAML, FUBAR, and MEME analyses. Furthermore, we have expanded the discussion to incorporate the observation that site 193 is rapidly evolving in bats, as well as the observation that sites near the TPR4 loop were identified as rapidly evolving in all clades of mammals tested. We do observe an overall gene dN/dS of <1; however, this is simply the average across all codons of the entire gene and does not rule out positive selection at specific sites. This is observed for other restriction factors, as many domains are undergoing purifying selection to retain core functions (e.g., enzymatic function, structural integrity) while other domains (e.g., interfaces with viral antagonists or viral proteins) show strong positive selection. Specific examples include the restriction factors BST-2/Tetherin (PMID: 19461879) and MxA (PMID: 23084925). Furthermore, we agree that testing more IFIT1-sensitive viruses that naturally infect primates with our IFIT1 193 mutagenesis library would shed light on the influence of host-virus arms races at this site. However, VEEV does naturally infect humans as well as at least one other species of primate (PMID: 39983680).

      Below we individually address the reviewers' claims of inaccurate data interpretation.

      Some of the data interpretation is not accurate. For example: 

      (1) Lines 232-234: "...western blot analysis revealed that the expression of IFIT1 orthologs was relatively uniform, except for the higher expression of orca IFIT1 and notably lower expression of pangolin IFIT1 (Figure 4B)." In fact, most of the orthologs are not expressed in a "relatively uniform" manner e.g. big brown bat vs. shrew are quite different. 

      We have now included quantification of the western blots to allow the reader to compare expression levels with the infection data (Updated Figure 4B and 4G). We have also removed the phrase “relatively uniform” from the text and have instead included text describing the quantified expression differences.

      (2) Line 245: "...mammalian IFIT1 species-specific differences in viral suppression are largely independent of expression differences." While it is true that there is no correlation between protein expression and antiviral activity in each species, the authors cannot definitively conclude that the species-specific differences are independent of expression differences. Since the orthologs are clearly not expressed in the same amounts, it is impossible to fully assess their true antiviral activity. At the very least, the authors should acknowledge that the protein expression can affect antiviral activity. They should also consider quantifying the IFIT1 protein bands and normalizing each to GAPDH for readers to better compare protein expression and antiviral activity. The same issue is in Line 267. 

      We have now included quantification and normalization of the western blots to allow the reader to compare expression levels with the infection data (Updated Figure 4B and 4G). Furthermore, we acknowledge in the text that expression differences may affect antiviral potency in infection experiments.

      (3) Line 263: "SINV... was modestly suppressed by pangolin, sheep, and chinchilla IFIT1 (Figure 4E)..." The term "modestly suppressed" does not seem fitting if there is 60-70% infection in cells expressing pangolin and chinchilla IFIT1. 

      We have modified the text to say “significantly suppressed” rather than “modestly suppressed.”

      (4) The study can be significantly improved if the authors can find a thread to connect each piece of data together, so the readers can form a cohesive story about mammalian IFIT1. 

      We appreciate the reviewer’s suggestion and have tried to make the story more cohesive through commentary on positive selection and by using the computational analysis to inform potential evolutionary consequences of IFIT1 functionality, first through an intraspecies (human) approach and then through an interspecies approach with diverse mammals that have great sequence diversity. Furthermore, we point out that almost all IFIT1s tested in the ortholog screen were also included in our computational analysis, allowing for the potential to connect functional observations with those seen in the evolutionary analyses.

      Reviewer #2 (Public review): 

      McDougal et al. describe the surprising finding that IFIT1 proteins from different mammalian species inhibit the replication of different viruses, indicating that the evolution of IFIT1 across mammals has resulted in host species-specific antiviral specificity. Before this work, research into the antiviral activity and specificity of IFIT1 had mostly focused on the human ortholog, which was described to inhibit viruses including vesicular stomatitis virus (VSV) and Venezuelan equine encephalitis virus (VEEV) but not other viruses including Sindbis virus (SINV) and parainfluenza virus type 3 (PIV3). In the current work, the authors first perform evolutionary analyses on IFIT1 genes across a wide range of mammalian species and reveal that IFIT1 genes have evolved under positive selection in primates, bats, carnivores, and ungulates. Based on these data, they hypothesize that IFIT1 proteins from these diverse mammalian groups may show distinct antiviral specificities against a panel of viruses. By generating human cells that express IFIT1 proteins from different mammalian species, the authors show a wide range of antiviral activities of mammalian IFIT1s. Most strikingly, they find several IFIT1 proteins that have completely different antiviral specificities relative to human IFIT1, including IFIT1s that fail to inhibit VSV or VEEV, but strongly inhibit PIV3 or SINV. These results indicate that there is potential for IFIT1 to inhibit a much wider range of viruses than human IFIT1 inhibits. Electrophoretic mobility shift assays (EMSAs) suggest that some of these changes in antiviral specificity can be ascribed to changes in the direct binding of viral RNAs. Interestingly, they also find that chimpanzee IFIT1, which is >98% identical to human IFIT1, fails to inhibit any tested virus. Replacing three residues from chimpanzee IFIT1 with those from human IFIT1, one of which has evolved under positive selection in primates, restores activity to chimpanzee IFIT1. Together, these data reveal a vast diversity of IFIT1 antiviral specificity encoded by mammals, consistent with an IFIT1-virus evolutionary "arms race".

      Overall, this is a very interesting and well-written manuscript that combines evolutionary and functional approaches to provide new insight into IFIT1 antiviral activity and species-specific antiviral immunity. The conclusion that IFIT1 genes in several mammalian lineages are evolving under positive selection is supported by the data, although there are some important analyses that need to be done to remove any confounding effects from gene recombination that has previously been described between IFIT1 and its paralog IFIT1B. The virology results, which convincingly show that IFIT1s from different species have distinct antiviral specificity, are the most surprising and exciting part of the paper. As such, this paper will be interesting for researchers studying mechanisms of innate antiviral immunity, as well as those interested in species-specific antiviral immunity. Moreover, it may prompt others to test a wide range of orthologs of antiviral factors beyond those from humans or mice, which could further the concept of host-specific innate antiviral specificity. Additional areas for improvement, which are mostly to clarify the presentation of data and conclusions, are described below. 

      Strengths: 

      (1) This paper is a very strong demonstration of the concept that orthologous innate immune proteins can evolve distinct antiviral specificities. Specifically, the authors show that IFIT1 proteins from different mammalian species are able to inhibit the replication of distinct groups of viruses, which is most clearly illustrated in Figure 4G. This is an unexpected finding, as the mechanism by which IFIT1 inhibits viral replication was assumed to be similar across orthologs. While the molecular basis for these differences remains unresolved, this is a clear indication that IFIT1 evolution functionally impacts host-specific antiviral immunity and that IFIT1 has the potential to inhibit a much wider range of viruses than previously described. 

      (2) By revealing these differences in antiviral specificity across IFIT1 orthologs, the authors highlight the importance of sampling antiviral proteins from different mammalian species to understand what functions are conserved and what functions are lineage- or species-specific. These results might therefore prompt similar investigations with other antiviral proteins, which could reveal a previously undiscovered diversity of specificities for other antiviral immunity proteins. 

      (3) The authors also surprisingly reveal that chimpanzee IFIT1 shows no antiviral activity against any tested virus despite only differing from human IFIT1 by eight amino acids. By mapping this loss of function to three residues on one helix of the protein, the authors shed new light on a region of the protein with no previously known function. 

      (4) Combined with evolutionary analyses that indicate that IFIT1 genes are evolving under positive selection in several mammalian groups, these functional data indicate that IFIT1 is engaged in an evolutionary "arms race" with viruses, which results in distinct antiviral specificities of IFIT1 proteins from different species. 

      Weaknesses: 

      (1) The evolutionary analyses the authors perform appear to indicate that IFIT1 genes in several mammalian groups have evolved under positive selection. However, IFIT1 has previously been shown to have undergone recurrent instances of recombination with the paralogous IFIT1B, which can confound positive selection analyses such as the ones the authors perform. The authors should analyze their alignments for evidence of recombination using a tool such as GARD (in the same HyPhy package along with MEME and FUBAR). Detection of recombination in these alignments would invalidate their positive selection inferences, in which case the authors need to either analyze individual non-recombining domains or limit the number of species to those that are not undergoing recombination. While it is likely that these analyses will still reveal a signature of positive selection, this step is necessary to ensure that the signatures of selection and sites of positive selection are accurate. 

      (2) The choice of IFIT1 homologs chosen for study needs to be described in more detail. Many mammalian species encode IFIT1 and IFIT1B proteins, which have been shown to have different antiviral specificity, and the evolutionary relationship between IFIT1 and IFIT1B paralogs is complicated by recombination. As such, the assertion that the proteins studied in this manuscript are IFIT1 orthologs requires additional support than the percent identity plot shown in Figure 3B. 

      (3) Some of the results and discussion text could be more focused on the model of evolution-driven changes in IFIT1 specificity. In particular, the chimpanzee data are interesting, but it would appear that this protein has lost all antiviral function, rather than changing its antiviral specificity like some other examples in this paper. As such, the connection between the functional mapping of individual residues with the positive selection analysis is somewhat confusing. It would be more clear to discuss this as a natural loss of function of this IFIT1, which has occurred elsewhere repeatedly across the mammalian tree. 

      (4) In other places in the manuscript, the strength of the differences in antiviral specificity could be highlighted to a greater degree. Specifically, the text describes a number of interesting examples of differences in inhibition of VSV versus VEEV from Figure 3C and 3D, but it is difficult for a reader to assess this as most of the dots are unlabeled and the primary data are not uploaded. A few potential suggestions would be to have a table of each ortholog with % infection by VSV and % infection by VEEV. Another possibility would be to plot these data as an XY scatter plot. This would highlight any species that deviate from the expected linear relationship between the inhibition of these two viruses, which would provide a larger panel of interesting IFIT1 antiviral specificities than the smaller number of species shown in Figure 4. 

      We thank the reviewer for their fair assessment of our manuscript. As the reviewer requested, we performed GARD analysis on our alignments used for PAML, FUBAR, and MEME (New Supp Fig 1). By GARD, we found 1 or 2 predicted breakpoints in each clade. However, much of the sequence was after or between the predicted breakpoints. Therefore, we were able to reanalyze for sites undergoing positive selection in the large regions of the sequence that do not span the breakpoints. We were able to validate that almost all sites originally identified as undergoing positive selection still exhibit signatures of positive selection when these breakpoints are taken into account: primates (11/12), bats (14/16), ungulates (30/37), and carnivores (2/4). To further validate our positive selection analysis, we used Recombination Detection Program 4 (RDP4) to remove inferred recombinant sequences from the primate IFIT1 alignment and performed PAML, FUBAR, and MEME. Once again, the sites in our original analysis were largely validated by this method. Importantly, sites 170, 193, and 366 in primates, which are discussed in our manuscript, were found to be undergoing positive selection in 2 of the 3 analyses using alignments after the indicated breakpoint in GARD and after removal of recombinant sequences by RDP4. We have updated the text to acknowledge IFIT1/IFIT1B recombination more clearly and include the GARD analysis as well as PAML, FUBAR, and MEME reanalysis taking into account predicted breakpoints by GARD and RDP4. Furthermore, to increase evidence that the sequences used in this study for both computational and functional analysis are IFIT1 orthologs rather than IFIT1B, we have included a maximum likelihood tree after aligning coding sequences on the C-terminal end (corresponding to bases 907-1437 of IFIT1). In Daugherty et al. 2016 (PMID: 27240734) this strategy was used to distinguish between IFIT1 and IFIT1B. All sequences used in our study grouped with IFIT1 sequences (including many confirmed IFIT1 sequences used in Daugherty et al.) rather than IFIT1B sequences or IFIT3. These new data, including the GARD, RDP4, and maximum likelihood tree analyses, are included as a new Supplementary Figure 1.
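      To make the revalidation counts above easier to compare across clades, the fractions can be tallied as in the short sketch below. The counts are the ones quoted in the response (sites retained / sites originally identified); the variable and function names are illustrative only and are not part of any analysis pipeline described in the manuscript.

```python
# Sites still showing positive selection after accounting for GARD breakpoints,
# as (retained, originally identified), taken from the response text.
validated = {
    "primates": (11, 12),
    "bats": (14, 16),
    "ungulates": (30, 37),
    "carnivores": (2, 4),
}


def fraction_retained(counts):
    """Return the retained fraction per clade."""
    return {clade: kept / total for clade, (kept, total) in counts.items()}


fractions = fraction_retained(validated)
total_kept = sum(kept for kept, _ in validated.values())
total_sites = sum(total for _, total in validated.values())

for clade, frac in fractions.items():
    print(f"{clade}: {frac:.1%} of original sites retained")
print(f"overall: {total_kept}/{total_sites} = {total_kept / total_sites:.1%}")
```

      Read this way, the carnivore clade stands out as the one where the smallest share of the original sites is retained, although it also starts from the fewest sites.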

      We also agree with the reviewer that it is possible that chimpanzee IFIT1 has lost antiviral function due to the residues 364 and 366 that differ from human IFIT1. We have updated the discussion sections to include the possibility that chimpanzee IFIT1 is an example of a natural loss of function that has occurred in other species over evolution as well as the potential consequences of this occurrence. Regarding highlighting the strength of differences in antiviral activity between IFIT1 orthologs, we have included several updates to strengthen the ability of the reader to assess these differences. First, we have included a supplementary table that includes the infection data for each ortholog from the VEEV and VSV screen to allow for readers to evaluate ranked antiviral activity of the species that suppress these viruses. In addition, the silhouettes next to the dot plots indicate the top ranked hits in order of viral inhibition (with the top being the most inhibitory) giving the reader a visual representation in the figure of top antiviral orthologs during our screen. We have also updated the figure legend to inform the reader of this information.

      Reviewer #3 (Public Review):  

      Summary: 

      This manuscript by McDougal et al. demonstrates species-specific activities of diverse IFIT1 orthologs and seeks to utilize evolutionary analysis to identify key amino acids under positive selection that contribute to the antiviral activity of this host factor. While the authors identify amino acid residues as important for the antiviral activity of some orthologs and propose a possible mechanism by which these residues may function, the significance or applicability of these findings to other orthologs is unclear. However, the subject matter is of interest to the field, and these findings could be significantly strengthened with additional data.

      Strengths:

      Assessment of multiple IFIT1 orthologs shows the wide variety of antiviral activity of IFIT1, and identification of residues outside of the known RNA binding pocket in the protein suggests additional novel mechanisms that may regulate IFIT1 activity.

      Weaknesses:

      Alternative hypotheses that might explain the variable and seemingly inconsistent antiviral activity of IFIT1 orthologs were not really considered. For example, studies show that IFIT1 activity may be regulated by interaction with other IFIT proteins, but this was not assessed in this study.

      Given that there appears to be very little overlap observed in orthologs that inhibited the viruses tested, it's possible that other amino acids may be key drivers of antiviral activity in these other orthologs. Thus, it's difficult to conclude whether the findings that residues 362/4/6 are important for IFIT1 activity can be broadly applied to other orthologs, or whether these are unique to human and chimpanzee IFIT1. Similarly, while the hypothesis that these residues impact IFIT1 activity in an allosteric manner is an attractive one, there is no data to support this.  

      We thank the reviewer for their fair assessment of our manuscript. To address the weaknesses that the reviewer has pointed out we have expanded the discussion to more directly address alternate hypotheses, such as the possibility of IFIT1 activity being regulated by interaction with other IFIT proteins. Furthermore, we expanded the discussion to include an alternate hypothesis for the role of residues 364 and 366 in primate IFIT1 besides allosteric regulation. In addition, we did not intend to claim or imply that residues 364/6 are the key drivers of antiviral activity for all IFITs tested. However, we speculate that within primates these residues may play a key role as these residues differ between chimpanzee IFIT1 (which lacks significant antiviral activity towards the viruses tested in this study) and human IFIT1 (which possesses significant antiviral activity). In addition, these residues seem to be generally conserved in primate species, apart from chimpanzee IFIT1. We have included changes to the text to more clearly indicate that we highlight the importance of these residues specifically for primate IFIT1, but not necessarily for all IFIT1 proteins in all clades.

      Reviewer #1 (Recommendations for the authors): 

      (1) The readers would benefit from a more detailed background on the concept and estimation of positive selection, including the M7/8 models in PAML. 

      We have included more information in the text to provide a better background for the concepts of positive selection and how PAML tests for this using M7 and M8 models.
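      For readers unfamiliar with this comparison, M7 (which does not allow sites with dN/dS > 1) and M8 (which does) are usually compared with a likelihood ratio test. The sketch below is only an illustration of that test under the conventional chi-square approximation with two degrees of freedom; the function name is hypothetical and the log-likelihood values are placeholders, not results from our analysis.

```python
import math


def lrt_m7_vs_m8(lnl_m7, lnl_m8):
    """Likelihood ratio test of the null model M7 against M8.

    Returns (test statistic, p-value). The p-value assumes a chi-square
    distribution with 2 degrees of freedom, whose survival function has
    the closed form exp(-x / 2).
    """
    stat = max(2.0 * (lnl_m8 - lnl_m7), 0.0)
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Placeholder log-likelihoods for illustration only.
stat, p_value = lrt_m7_vs_m8(lnl_m7=-1000.0, lnl_m8=-990.0)
print(f"2*dlnL = {stat:.2f}, p = {p_value:.2e}")
```

      A small p-value indicates that allowing a class of sites with dN/dS > 1 fits the alignment significantly better, which is the evidence for positive selection that the site-level analyses then localize.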

      (2) Presentation of data 

      a) Figure 3C and 3D: is there a better way to present the infection data so the readers can tell the ranked antiviral activity of the species that suppress VEEV? 

      We have included a supplementary table that includes the infection data for each ortholog from the VEEV and VSV screen to allow for readers to evaluate ranked antiviral activity of the species that suppress these viruses. In addition, the silhouettes next to the dot plots indicate the top ranked hits in order of viral inhibition (with the top being the most inhibitory). We have updated the figure legend to inform the reader of this information as well.

      b) Figure 4C and 4D: consider putting the western blot in Supplementary Figure 1 underneath the infection data or with the heatmap so readers can compare it with the antiviral activity. 

      We have also included quantification of the western blots performed to evaluate IFIT1 expression during the experiments shown in Figure 4C and 4D in an updated Figure 4B. We have also included normalized expression values with the heatmap shown in an updated Figure 4G so the reader can evaluate potential impact of protein expression on antiviral activity for all infection experiments shown in figure 4.

      (3) Line 269-270: as a rationale for narrowing the species to human, black flying fox, and chimp IFIT1, human and black flying fox were chosen because they strongly inhibit VEEV, but pangolin wasn't included even though it had the strongest anti-VEEV activity? 

      The rationale for narrowing the species to human, black flying fox, and chimpanzee IFIT1 was related to the availability of biological tools, high-quality genome/transcriptome sequencing databases, and other factors. Specifically, human and chimp IFIT1 are closely related but have variable antiviral activities, making their comparison highly relevant. Bats are well established as reservoirs for diverse viruses, whereas the reservoir status of many other mammals is less well defined. Furthermore, purifying large amounts of high-quality IFIT1 protein after bacterial expression was another limitation to functional studies. We have added this information into the manuscript text.

      (4) Figure 5A: to strengthen the claim that "species-specific antiviral activities of IFIT1s can be partly explained by RNA binding potential", it would be good to include one more positive and one more negative control. In other words, test the cap0 RNA binding activity of an IFIT1 ortholog that strongly inhibits VEEV and an ortholog that does not. It would also be good to discuss why chimp IFIT1 still shows dose-dependent RNA binding yet it is one of the weakest at inhibiting VEEV. 

      We appreciate the reviewer's suggestion to include more controls and expand the dataset. While we understand the potential value of expanding the dataset, we believe that human IFIT1 serves as a robust positive control and human IFIT1 R187 (RNA-binding deficient) serves as an established negative control. Future experiments with other purified IFITs from other species will indeed strengthen evidence linking IFIT1 species-specific activity and RNA-binding.

      Regarding chimpanzee IFIT1, we acknowledge there appears to be some dose-dependent Cap0 RNA-binding. However, the binding affinity is much weaker than that of human or black flying fox IFIT1. We speculate that during viral infection reduced binding affinity could impair the ability of chimpanzee IFIT1 to efficiently sequester viral RNA and inhibit viral translation. This reduction in binding affinity may, therefore, allow the cell to be overwhelmed by the exponential increase in viral RNA during replication, resulting in an ineffective antiviral IFIT1. In the literature, a similar phenomenon is observed by Hyde et al. (PMID: 24482115). In this study, the authors test mouse Ifit1 Cap0 RNA binding by EMSA of the 5’ UTR sequence of VEEV RNA containing an A or G at nucleotide position 3. EMSA shows binding of both the A3 and G3 Cap0 VEEV RNA sequences; however, stronger Ifit1 binding is observed for the A3 Cap0 RNA sequence. The consequences of the reduced Ifit1 binding of the G3 Cap0 VEEV RNA are observed in vitro by a substantial increase in viral titers produced from cells as well as an increase in protein produced in a luciferase-based translation assay. The authors also show in vivo relevance of this reduction of Ifit1 binding as WT B6 mice infected with VEEV containing the A3 UTR exhibited 100% survival, while WT B6 mice infected with VEEV containing the G3 UTR survived at a rate of only ~25%. Therefore, the literature supports that a decrease in Cap0 RNA binding by an IFIT protein (while still exhibiting Cap0 RNA binding) observed by EMSA can result in considerable alterations of viral infection both in vitro and in vivo.

      Minor: 

      (1) Line 82: "including 5' triphosphate (5'-ppp-RNA), or viral RNAs..." having a comma here will make the sentence clearer. 

      We have improved the clarity of this sentence. It now reads, “IFIT1 binds uncapped 5′triphosphate RNA (5′-ppp-RNA) and capped but unmethylated RNA (Cap0, an m<sup>7</sup>G cap lacking 2′-O methylation).”

      (2) Line 100: "...similar mechanisms have been at least partially evolutionarily conserved in IFIT proteins to restrict viral infection by IFIT proteins". 

      We have updated the text to improve clarity by revising the sentence to “VEEV TC-83 is sensitive to human IFIT1 and mouse Ifit1B, indicating at least partial conservation of antiviral function by IFIT proteins."

      (3) Line 109: "signatures of rapid evolution or positive selection" would put positive selection second because that is the more technical term that can benefit from the more layperson term (rapid evolution). 

      We have updated this sentence incorporating this suggestion. “Positive selection, or rapid evolution, is denoted by a high ratio of nonsynonymous to synonymous substitutions (dN/dS >1).”

      (4) Lines 116-117: "However, this was only assessed in a few species" would benefit from a citation. 

      We have inserted the citation.

      (5) Line 127 heading: "IFIT1 is rapidly evolving in mammals" would be more accurate to say "in major clades of mammals". 

      We have updated the text to include this suggestion.

      (6) Line 165: "IFIT1 L193 mutants". 

      We have updated the text to rephrase this for clarity.

      (7) Line 170: two strains of VEEV were mentioned in the Intro, so it would be good to specify which strain of VEEV was used?

      We have updated the text to clarify the VEEV strain. In this study, all experiments were performed using the VEEV TC-83 strain.

      (8) Line 174: "Indeed, all mutants at position 193, whether hydrophobic or positively charged, inhibited VEEV similarly to the WT..." It should read "all hydrophobic and positively charged mutants inhibited VEEV similarly to the WT...". 

      We corrected as suggested. 

      (9) Line 204: what are "control cells"? Cells that are mock-infected, or cells without IFIT1? 

      We have updated the text to improve clarity. What we refer to as control cells were cells expressing an empty vector control rather than an IFIT1.

      (10) Need to clarify n=2 and n=3 replicates throughout the manuscript. Does that refer to three independent experiments? Or an experiment with triplicate wells/samples? 

      We have updated the text to say “independent experiments” instead of “biological replicates” to prevent any confusion.  All n=2 or n=3 replicates denote independent experiments.

      (11) Line 254: "dominant antiviral effector against the related human parainfluenza virus type 5..." 

      We have updated the text to improve clarity.

      (12) Line 271: "The black flying fox (Pteropus alecto), is a model megabat species..." scientific name was italicized here but not elsewhere. Remove comma.

      We have updated the text accordingly.

      (13) Line 293: "...chimpanzee IFIT1 lacked these properties" but chimp IFIT1 can bind cap0 RNA, just at a lower level. 

      We have updated the text to acknowledge that chimpanzee IFIT1 can bind cap0 RNA, albeit at a lower level than human IFIT1.

      (14) Figure 6B: please fix the x-axis labels. They're very cramped. 

      We have updated the x-axis labels for figure 6B and figure 6D to improve clarity.

      (15) Line 609: "...trimmed and aligned"? 

      Our phrasing is to indicate that coding sequences were aligned, and gaps were removed to reduce the chance of false positive signal by underrepresented codons such as gaps or short insertions. We have removed “trimmed” from the text and changed the text to say “aligned sequences” to increase clarity.

      Reviewer #2 (Recommendations for the authors): 

      (1) Numbers less than 10 should be spelled out throughout the manuscript (e.g. line 138). 

      We have updated the text to reflect the request.

      (2) Line 165: "expression of IFIT1 193 mutants" should be rephrased. 

      We have updated the text to rephrase this sentence for clarity.

      (3) A supplemental table or file should be included that contains the accession number and species names of sequences used for evolutionary analyses and for functional testing. In addition, the alignments that were used for positive selection can be included.  

      We have included a supplemental file containing accession numbers, species names for evolutionary analysis and functional studies. In addition, this table includes the infection data for each IFIT1 homolog for the screen performed in figure 3.

      (4) The discussion of potential functions of the C-terminus of IFIT1 should include possible interactions with other proteins. In particular, the C-terminus of IFIT1 has been shown to interact with IFIT3 in a way that modulates its activity (PMID: 29525521). Although residues 362-366 were not shown in that paper to interact with a fragment of IFIT3, it is possible that these residues may be important for interaction with full-length IFIT3 or some other IFIT1 binding partner. 

      We thank the reviewer for their suggestion. We have expanded the discussion to explore the possibility that residues 364 and 366 of IFIT1 may be involved in IFIT1-IFIT3 interactions and consequently Cap0 RNA-binding and antiviral activity.

      (5) The quantification of the EMSAs should be described in more detail. In particular, from looking at the images shown in Figure 5A, it would appear that human and chimpanzee IFIT1 show similar degrees of probe shift, while the human R187H panel shows no shifting at all. However, the quantification shows chimpanzee IFIT1 as being statistically indistinguishable from human R187H. Additional information on how bands were quantified and whether they were normalized to unshifted RNA would be helpful in attempting to resolve this visual discordance. 

      EMSAs were quantified by determining Adj. Vol. Intensity in ImageLab (BioRad), which subtracts background signal, after imaging at the same exposure and SYBR Gold staining time. To determine Adj. Vol. Intensity, we drew a box (same size for each gel and lane for each replicate) for each lane above the free probe. These values were not normalized to unshifted RNA; however, equal RNA was loaded. While the ANOVA shows no significant difference between human R187H and chimpanzee IFIT1 band shift intensity, this is potentially due to the between-group variance in the ANOVA. The AUC value for chimpanzee IFIT1 is 36.4% higher than that of R187H.

      The AUC of Adj. Vol. Intensity of the human IFIT1 band shift is roughly 2-fold higher than that of chimpanzee IFIT1. We believe this matches the visual representation as well, as human IFIT1 has a darker “upper” band in the shift, as well as a clear dark “lower” band that is not well defined in the chimpanzee shift. Furthermore, the upper band of the chimpanzee IFIT1 shift in the 400 nM lane appears to be as intense as the upper band in the 240 nM human IFIT1 lane, without taking into account the lower band seen for human IFIT1 as well. We included this quantification because Kd could not be calculated due to no clear probe disappearance, and we do not intend for this quantification to act as a substitute for binding affinity calculations, but rather to aid the reader in data interpretation.
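      To illustrate the kind of comparison described above, the sketch below computes an area under the curve from band-shift intensities across a protein titration and expresses one AUC relative to another. It assumes a simple trapezoidal rule, which may differ from the exact procedure used for the figure, and every concentration and intensity value in it is a hypothetical placeholder rather than measured data.

```python
def trapezoid_auc(concentrations, intensities):
    """Area under the intensity-vs-concentration curve (trapezoidal rule)."""
    if len(concentrations) != len(intensities) or len(concentrations) < 2:
        raise ValueError("need two or more paired points")
    area = 0.0
    for i in range(len(concentrations) - 1):
        width = concentrations[i + 1] - concentrations[i]
        area += width * (intensities[i] + intensities[i + 1]) / 2.0
    return area


def percent_higher(value, reference):
    """How much larger `value` is than `reference`, in percent."""
    return 100.0 * (value - reference) / reference


# Hypothetical titration (nM) and background-subtracted shift intensities.
conc_nm = [0.0, 80.0, 240.0, 400.0]
shift_a = [0.0, 1.0, 3.0, 5.0]   # placeholder for a stronger binder
shift_b = [0.0, 0.5, 1.5, 2.5]   # placeholder for a weaker binder

auc_a = trapezoid_auc(conc_nm, shift_a)
auc_b = trapezoid_auc(conc_nm, shift_b)
print(f"AUC A = {auc_a:.1f}, AUC B = {auc_b:.1f}")
print(f"A is {percent_higher(auc_a, auc_b):.1f}% higher than B")
```

      Such a summary compares relative band-shift signal between proteins; it is not an estimate of binding affinity.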

      Reviewer #3 (Recommendations for the authors): 

      (1) IFIT1 has been demonstrated to function in conjunction with other IFIT proteins. Do you think the absence of antiviral activity is due to isolated expression of IFIT1 without these cofactors, and might this explain why there was little overlap observed in orthologs that inhibited the viruses tested (Figure 3, lines 209-210)? 

      We do not believe that isolated expression of IFIT1 without cofactors (such as orthologous IFIT proteins) would fully explain the disparities in antiviral activity, as many IFIT1s that were expressed inhibited either VSV or VEEV in our screen. However, we acknowledge that the expression of IFIT1 alone does create a limitation in our study, as IFIT1 antiviral activity and RNA-binding can be modulated by interactions with other IFIT proteins. Therefore, we do believe that it is possible that co-expression of IFIT1 with other IFITs from a given species might enhance antiviral activity. Future studies may shed light on this.

      (2) Figure 5 - Calculating the Kd for each protein would be more informative. How does the binding affinity of these IFIT1 proteins compare to that which has previously been reported? 

      We are unable to accurately determine Kd because the free-probe signal is not substantially diminished. Therefore, we are only able to compare IFIT1 protein binding between species without accurate mathematical calculation of binding affinity. Our result does appear similar to that of mouse Ifit1 binding to VEEV RNA (PMID: 24482115), in which the authors also do not calculate a Kd for their RNA EMSA.

      (3) Mutants 364 and 366 may not have direct contact with RNA, but RNA EMSA data presented suggest that the binding affinity may be different (though this is hard to conclude without Kd data). Additional biochemical data with these mutants might provide more insight here. 

      We agree that further studies using 364 and 366 double mutant human and chimpanzee protein in EMSAs would provide additional biochemical data and provide insight into the role of these residues in direct RNA binding. We acknowledge this is a limitation of our study as we provide only genetic data demonstrating the importance of these residues.

      (4) Given that there appears to be very little overlap observed in orthologs that inhibited the viruses tested, it's possible that other amino acids may be key drivers of antiviral activity in these other orthologs. Thus, it's difficult to conclude whether the findings that residues 362/4/6 are important for IFIT1 activity can be broadly applied to other orthologs. A more systematic assessment of the role of these mutations across multiple diverse orthologs would provide more insight here. Do other antiviral proteins show this trend (i.e., exhibit little overlap in orthologs that inhibit these viruses)? What do you think might be driving this? 

      We agree that other residues outside of 364 and 366 may be key drivers of antiviral activity across the IFIT1 orthologs tested. We do not hypothesize that this will broadly apply across IFIT1 from diverse clades of mammals as overall amino acid identity can differ by over 30%. However, based on the chimpanzee and human IFIT1 data, as well as sequence alignment within primates specifically, we believe these residues may be key for primate (but not necessarily other clades of mammals) IFIT1 antiviral activity.

      Regarding whether other antiviral proteins show little overlap in orthologs that inhibit a given virus, to our knowledge such a functional study with this large and divergent dataset of orthologs has not been performed. However, there are many examples of restriction factors exhibiting species-specific antiviral activity when ortholog screens have been performed. For example, HIV was reported to be suppressed by MX2 orthologs from human, rhesus macaque, and African green monkey, but not sheep or dog MX2 (PMID: 24760893). In addition, foamy virus was inhibited by the human and rhesus macaque orthologs of PHF11, but not the mouse and feline orthologs (PMID: 32678836). Furthermore, studies from our lab have shown variability in RTP4 ortholog antiviral activity towards viruses such as hepatitis C virus (HCV), West Nile virus (WNV), and Zika virus (ZIKV) (PMID: 33113352).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      Weiss and co-authors presented a versatile probabilistic tool. aTrack helps in classifying tracking behaviors and understanding important parameters for different types of single particle motion types: Brownian, Confined, or Directed motion. The tool can be used further to analyze populations of tracks and the number of motion states. This is a stand-alone software package, making it user-friendly for a broad group of researchers. 

      Strengths: 

      This manuscript presents a novel method for trajectory analysis. 

      Weaknesses: 

      (1) In the results section, is there any reason to choose the specific range of track length for determining the type of motion? The starting value is fine, and would be short enough, but do the authors have anything to report about how much is too long for the model? 

      We chose to test the range of track lengths (five to hundreds of steps) to cover the broad range of scenarios arising from single proteins or fluorophores to brighter objects with more labels. While there is no upper limit per se, the computation time of our method scales linearly with track length: 100 time points take ~2 minutes to run on a standard consumer-level desktop CPU. We have added the following sentence to note the time cost with trajectory length:

      “The recurrent formula enables our model computation time to scale linearly with the number of time points.”

      (2) Robustness to model mismatches is a very important section that the authors have uplifted diligently. Understanding where and how the model is limited is important. For example, the authors mentioned the limitation of trajectory length, do the authors have any information on the trajectory length range at which this method works accurately? This would be of interest to readers who would like to apply this method to their own data. 

      We agree that limitations are important to estimate, and trajectory length is an important consideration when choosing how to analyze a dataset. We report the categorization certainty, i.e. the likelihood differences, for a range of track lengths (Fig. 2 a,c, Fig. 3c-d, and Fig. 4 c,g.).

      For example, here are the key plots from Fig. 2 quantifying the relative likelihoods, where being within the light region is necessary. The light areas represent a useful likelihood ratio.

      We only performed analysis up to track lengths of 600 time steps, but parameter estimates and significance can only improve with increasing track length, as long as the model assumptions are verified. The broader limitations and future opportunities for new methods are now expanded upon in the discussion, for example switching between states, and state and model ambiguities (bound vs very slow diffusion vs very slow motion).

      (3) aTrack extracts certain parameters from the trajectories to determine the motion types. However, it is not very clear how certain parameters are calculated. For example, is the diffusion coefficient D calculated from fitting, and how is the confinement factor defined and estimated, with equations? This information will help the readers to understand the principles of this algorithm.

      We apologize for the confusion. All the model parameters are fit using the maximum likelihood approach. To make this point clearer in the manuscript, we have made three changes:

      (1) We modified the following sentence to replace “determined” with "fit”:

      “Finally, Maximum Likelihood Estimation (MLE) is used to fit the underlying parameter value”

      (2) We added the following sentence in the main text :

      “In our model, the velocity is the characteristic parameter of directed motion and the confinement factor represents the force within a potential well. More precisely, the confinement factor $l$ is defined such that at each time step the particle position is updated by $l$ times the distance particle/potential well center (see the Methods section for more details).”.

      (3) We have added a new section in the methods, called Fitting Method, where we have added the explanation below:

      “For the pure Brownian model, the parameters are the diffusion coefficient and the localization error. For the confinement model, the parameters are the diffusion coefficient, the localization error, the confinement factor, and the diffusion coefficient of the potential well. For the directed model, the parameters are the diffusion coefficient, the localization error, the initial velocity and the acceleration variance.

      These parameters are estimated using the maximum likelihood approach which consists in finding the parameters that maximize the likelihood. We realize this fitting step using gradient descent via a TensorFlow model. All the estimates presented in this article are obtained from a single set of initial parameters to demonstrate that the convergence capacity of aTrack is robust to the initial parameter values.”
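
      The definition of the confinement factor quoted above can be sketched as a small simulation. This is an illustration under stated assumptions, not aTrack's own generator: the function name `simulate_confined_track`, the use of per-step diffusion lengths (`d` for the particle, `d_well` for the potential well), and the order of the updates within a step are choices made for this example.

```python
import numpy as np

def simulate_confined_track(n_steps, d, s, l, d_well=0.0,
                            x0=(0.0, 0.0), center0=(0.0, 0.0), rng=None):
    """Simulate a confined track (hypothetical helper).

    d      : per-step diffusion length of the particle
    s      : localization error added to every observed position
    l      : confinement factor, the fraction of the particle-to-center
             distance by which the particle is pulled back at each step
    d_well : per-step diffusion length of the potential well center
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    center = np.array(center0, dtype=float)
    hidden = np.empty((n_steps + 1, x.size))
    hidden[0] = x
    for t in range(1, n_steps + 1):
        # Pull toward the well center, then add the Brownian step.
        x = x + l * (center - x) + rng.normal(0.0, d, size=x.size)
        # The well itself is allowed to diffuse.
        center = center + rng.normal(0.0, d_well, size=x.size)
        hidden[t] = x
    # Observed positions are hidden positions plus localization error.
    observed = hidden + rng.normal(0.0, s, size=hidden.shape)
    return hidden, observed
```

      With `l = 0` the sketch reduces to pure Brownian motion, and with `d = 0` and `d_well = 0` the distance to the center shrinks by a factor `1 - l` at every step.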

      (4) The authors mentioned the scenario where a particle may experience several types of motion simultaneously. How are these motions simulated and what do they mean in terms of motion types? Are they mixed motion (a particle switches motion types in the same trajectory) or do they simply present features of several motion types? It is not intuitive to the readers that a particle can be diffusive (Brownian) and directed at the same time.

      In the text, we present an example where one can observe this type of motion to help the reader understand when this type of motion can be met: “Sometimes, particles undergo diffusion and directed motion simultaneously, for example, particles diffusing in a flowing medium (Qian 1991).”

      This is simulated by the addition of two terms affecting the hidden position variable before adding a localization term to create the observed variable. In the analysis, this manifests as non-zero values for the diffusion coefficient and the linear velocity. For example, Figure 4g and the associated text, where a single particle moves with a directed component and a Brownian diffusion component at each step.
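
      A minimal sketch of this kind of simulation is given below, assuming a constant velocity and per-step units. The function name `simulate_diffusion_with_drift` and its parameters are hypothetical and are not taken from the aTrack repository.

```python
import numpy as np

def simulate_diffusion_with_drift(n_steps, d, s, velocity, rng=None):
    """Simulate a track that diffuses and drifts at every step (hypothetical helper).

    d        : per-step diffusion length (Brownian term)
    s        : localization error
    velocity : displacement per step due to directed motion
    """
    rng = np.random.default_rng() if rng is None else rng
    velocity = np.asarray(velocity, dtype=float)
    # Each hidden displacement is the sum of a Brownian and a directed term.
    steps = rng.normal(0.0, d, size=(n_steps, velocity.size)) + velocity
    hidden = np.vstack([np.zeros(velocity.size), np.cumsum(steps, axis=0)])
    # The localization error is added last to produce the observed positions.
    observed = hidden + rng.normal(0.0, s, size=hidden.shape)
    return hidden, observed

hidden, observed = simulate_diffusion_with_drift(
    10_000, d=0.1, s=0.02, velocity=(0.02, 0.0), rng=np.random.default_rng(0))
# The average observed displacement is a simple estimate of the velocity.
mean_step = np.diff(observed, axis=0).mean(axis=0)
```

      In such a track both the diffusion length and the velocity are non-zero, which is what a fit of the directed model with a Brownian component is expected to recover.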

      We did not simulate transitions between types of motion. Switching is not treated by this current model; however, this limitation is described in the discussion and our team and others are currently working on addressing this challenge.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors present a software package "aTrack" for identification of motion types and parameter estimation in single-particle tracking data. The software is based on maximum likelihood estimation of the time-series data given an assumed motion model and likelihood ratio tests for model selection. They characterized the performance of the software mostly on simulated data and showed that it is applicable to experimental data. 

      Strengths: 

      A potential advantage of the presented method is its wide applicability to different motion types. 

      Weaknesses: 

      (1) There has been a lot of similar work in this field. Even though the authors included many relevant citations in the introduction, it is still not clear what this work uniquely offers. Is it the first time that direct MLE of the time-series data was developed? Suggestions to improve would include (a) better wording in the introduction section, (b) comparing to other popular methods (based on MSD, step-size statistics (Spot-On, eLife 2018;7:e33125), for example) using the simulated dataset generated by the authors, (c) comparing to other methods using data set in challenges/competitions (Nat. Comm (2021) 12:6253).  

      We thank the reviewer for this suggestion and agree that the explanation of the innovative aspects of our method in the introduction was not clear enough. We have now modified the introduction to better explain what is improved here compared to previous approaches.

      “The main innovations of this model are: 1) it uses analytical recurrence formulas to perform the integration step for complex motion, improving speed and accuracy; 2) it handles both confined and directed motion; 3) anomalous parameters, such as the center of the potential well and the velocity vector are allowed to change through time to better represent tracks with changing directed motion or confinement area; and lastly 4) for a given track or set of tracks, aTrack can determine whether tracks can be statistically categorized as confined or directed, and the parameters that best describe their behavior, for example, diffusion coefficient, radius of confinement, and speed of directed motion.”

      Regarding alternatives, we compare our method in the text to the best-performing algorithm of the 2021 Anomalous Diffusion (AnDi) Challenge mentioned by the reviewer in Figure 6 (RANDI, Argun et al, arXiv, 2021, Muñoz-Gil et al, Nat Com. 2021). Notably, both methods performed similarly on fBm, but ours was more robust in cases where there were small differences between the process underlying the data and the model assumptions, a likely scenario in real datasets. Regarding Spot-On, this was not mentioned as it only deals with multiple populations of Brownian diffusers, preventing a quantitative comparison.

      (2) The Hypothesis testing method presented here has a number of issues: first, there is no definition of testing statistics. Usually, the testing statistics are defined given a specific (Type I and/or Type II) error rate. There is also no discussion of the specificity and sensitivity of the testing results (i.e. what's the probability of misidentification of a Brownian trajectory as directed? etc).

      We now explain our statistical approach and how to perform hypothesis testing with our metric in a new supplementary section, Statistical test. 

      We use the likelihood ratio as a more conservative alternative to the p-value. In Fig S2, we show that our metric is an upper bound of the p-value and can be used to perform hypothesis testing with a chosen type I error rate. 

      Related, it is not clear what Figure 2e (and other similar plots) means, as the likelihood ratio is small throughout the parameter space. Also, for likelihood ratio tests, the authors need to discuss how model complexity affects the testing outcome (as more complex models tend to be more "likely" for the data) and also how the likelihood function is normalized (normalization is not an issue for MLE but critical for ratio tests). 

      We present the likelihood ratio as an upper bound of the p-value. Therefore, we can reject the null hypothesis if it is smaller than a given threshold, e.g. 0.05, but this number should be decreased if multiple tests are performed. The colorscale we show in the figure is meant to highlight the working range (light), and ambiguous range (dark) of the method.

      As the reviewer mentions, we expect the alternative hypothesis to result in higher likelihoods than the simpler null hypothesis for null hypothesis tracks, but, as seen in the Fig S2, the likelihood ratio of a dataset corresponding to the null hypothesis is strongly skewed toward its upper limit 1. This means that for most of the tracks, the likelihood is not (or little) affected by the model complexity. The likelihoods of all the models are normalized so their integrals over the data equals 1/A with A the area of the field of view which is independent of the model complexity.
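
      The decision rule described in this response can be sketched as follows. The function name `likelihood_ratio_test` is hypothetical, and dividing the threshold by the number of tests (a Bonferroni-style correction) is an assumption made for this example; the response only states that the threshold should be decreased when multiple tests are performed.

```python
import numpy as np

def likelihood_ratio_test(log_l_null, log_l_alt, alpha=0.05, n_tests=1):
    """Reject the null (Brownian) model when the likelihood ratio is small.

    log_l_null, log_l_alt : maximized log-likelihoods of the null and the
                            alternative (confined or directed) model
    alpha                 : chosen type I error rate
    n_tests               : number of tests, used to tighten the threshold
                            (Bonferroni-style, an assumption of this sketch)
    """
    ratio = np.exp(np.asarray(log_l_null, dtype=float)
                   - np.asarray(log_l_alt, dtype=float))
    threshold = alpha / n_tests
    return ratio, ratio < threshold
```

      For instance, a log-likelihood difference of 5 gives a ratio of exp(-5), which is below 0.05 for a single test but not below 0.005 when ten tests are performed.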

      (3) Relating to the mathematical foundation (Figure 1b). The measured positions are drawn as direct arrows from the real position states: this implies instantaneous localization. In reality, there is motion blur which introduces a correlation of the measured locations. Motion blur is known to introduce bias in SPT analysis, how does it affect the method here?

      The reviewer raises an important point as our model does not explicitly consider motion blur. We have now added a paragraph that presents how our model performs in case of motion blur in the section called Robustness to model mismatches. This section and the corresponding new Supplemental Fig. S7 demonstrate that the estimated diffusion length is accurate so long as the static localization error is higher than the dynamic localization error. If the dynamic localization error is higher, our model systematically underestimates the diffusion length by a factor of $(2/3)^{0.5} \approx 0.81$, which can be corrected for with an added post-processing step.
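
      The post-processing step mentioned above could look like the sketch below. The function name is hypothetical, and the correction is only meaningful in the regime described in the response, where the dynamic localization error dominates the static one.

```python
import math

# The (2/3)**0.5 underestimation factor quoted in the response.
BLUR_FACTOR = math.sqrt(2.0 / 3.0)

def correct_blur_dominated_diffusion_length(d_est):
    """Undo the systematic underestimation of the diffusion length.

    Hypothetical helper: apply only when motion blur (dynamic localization
    error) is larger than the static localization error.
    """
    return d_est / BLUR_FACTOR
```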

      (4) The authors did not go through the interpretation of the figure. This may be a matter of style, but I find the figures ambiguous to interpret at times.  

      We thank the reviewer for their feedback on improving the readability. To avoid overly repetitive and lengthy sections of text, we have opted for a concise approach. This allows us to present closely related panels at the same point in the text, while not ignoring important variations and tests. In light of this feedback from the reviewers, we have added more information and interpretation throughout our manuscript to improve interpretability.

      (5) It is not clear to me how the classification of the 5 motion types was accomplished. 

      We have modified the specific text related to this figure to describe an illustrative example to show how one could use aTrack on a dataset where not that much is known: First, we present the method to determine the number of states; second, we verify the parameter estimates correspond to the different states.  

      Classifying individual tracks is possible. While not done in the section corresponding to Fig. 5, this is done in Fig. 7 and a new supplementary plot, Fig. S9b (shown below). In brief, this is accomplished with our method by computing the likelihood of each track given each state. The probability that a given track is in state k equals the likelihood of the track given the state divided by the sum of the likelihoods given the different states. 
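
      The classification rule described above can be written in a few lines. This is a sketch that works on log-likelihoods for numerical stability; the function name `state_probabilities` and the input layout (one row per track, one column per state) are assumptions made for this example.

```python
import numpy as np

def state_probabilities(log_likelihoods):
    """Probability of each state for each track (hypothetical helper).

    log_likelihoods : array of shape (n_tracks, n_states) holding the
                      log-likelihood of each track given each state.
    Returns the likelihood of each state divided by the sum over states.
    """
    log_l = np.asarray(log_likelihoods, dtype=float)
    # Subtract the row maximum before exponentiating to avoid underflow.
    shifted = log_l - log_l.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)
```

      A track whose likelihood is three times higher under the second of two states is assigned probabilities of 0.25 and 0.75, and each track can then be labelled with its most probable state.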

      (6) Figure 3. Caption: what is ((d_{est}-0.1)/0.1)? Also panel labeled as "d" should be "e". 

      Thank you for bringing these errors to our attention; the panel and caption have been corrected.

      Reviewer #3 (Public Review): 

      Summary: 

      In this work, Simon et al present a new computational tool to assess non-Brownian single-particle dynamics (aTrack). The authors provide a solid groundwork to determine the motion type of single trajectories via an analytical integration of multiple hidden variables, specifically accounting for localization uncertainty, directed/confined motion parameters, and, very novel, allowing for the evolution of the directed/confined motion parameters over time. This last step is, to the best of my knowledge, conceptually new and could prove very useful for the field in the future. The authors then use this groundwork to determine the motion type and its corresponding parameter values via a series of likelihood tests. This accounts for obtaining the motion type which is statistically most likely to be occurring (with Brownian motion as null hypothesis). Throughout the manuscript, aTrack is rigorously tested, and the limits of the methods are fully explored and clearly visualised. The authors conclude with allowing the characterization of multiple states in a single experiment with good accuracy and explore this in various experimental settings. Overall, the method is fundamentally strong, well-characterised, and tested, and will be of general interest to the single-particle-tracking field.

      Strengths: 

      (1) The use of likelihood ratios gives a strong statistical relevance to the methodology. There is a sharp decrease in likelihood ratio between e.g. confinement of 0.00 and 0.05 and velocity of 0.0 and 0.002 (figure 2c), which clearly shows the strength of the method - being able to determine 2nm/timepoint directed movement with 20 nm loc. error and 100 nm/timepoint diffusion is very impressive. 

      We apologize for the confusion: the directed tracks in Fig 2 have no Brownian-motion component, i.e. D=0. We have made this clearer in the main text. Specifically, this section of the text refers to a track in linear motion with 2 nm displacements per step. With 70 time points (69 steps), a single particle that moved 138 nm with a localization error of 20 nm (95% uncertainty range of 80 nm) can be statistically distinguished from slow diffusive motion.

      In Fig. 4g, we explore the capabilities of our method to detect if a diffusive particle also has a directed motion component. 

      (2) Allowing the hidden variables of confinement and directed motion to change during a trajectory (i.e. the q factor) is very interesting and allows for new interpretations of data. The quantifications of these variables are, to me, surprisingly accurate, but well-determined. 

      (3) The software is well-documented, easy to install, and easy to use. 

      Weaknesses: 

      (1) The aTrack principle is limited to the motions incorporated by the authors, with, as far as I can see, no way to add new analytical non-Brownian motion. For instance, being able to add a dynamical state-switching model (i.e. quick on/off switching between mobile and non-mobile, for instance, repeatable DNA binding of a protein), could be of interest. I don't believe this necessarily has to be incorporated by the authors, but it might be of interest to provide instructions on how to expand aTrack.

      We agree that handling dynamic state switching is very useful and highlight this potential future direction in the discussion. The revised text reads:

      “An important limitation of our approach is that it presumes that a given track follows a unique underlying model with fixed parameters. In biological systems, particles often transition from one motion type to another; for example, a diffusive particle can bind to a static substrate or molecular motor (46). In such cases, or in cases of significant mislinkings, our model is not suitable. However, this limitation can be alleviated by implicitly allowing state transitions with a hidden Markov Model (15) or alternatives such as change-point approaches (30, 47, 48), and spatial approaches (49).”

      (2) The experimental data does not very convincingly show the usefulness of aTrack. The authors mention that SPBs are directed in mitosis and not in interphase. This can be quantified and studied by microscopy analysis of individual cells and confirming the aTrack direction model based on this, but this is not performed. Similarly, the size of a confinement spot in optical tweezers can be changed by changing the power of the optical tweezer, and this would far more strongly show the quantitative power of aTrack. 

      We agree with the reviewer and have revised the biological experiment section significantly to better illustrate the potential of aTrack in various use cases.

      Now, we show an experiment to quantify the effect of LatA, an actin inhibitor, on the fraction of directed tracks obtained with aTrack. We find that LatA significantly decreases directed motion while a LatA-resistant mutant is not affected (Fig7a-c).

      As suggested by the reviewer, we have expanded the optical tweezer experiment by varying the laser power. As expected, increasing the laser power decreases the confinement radius.

      (3) The software has a very strict limit on the number of data points per trajectory, which is a user input. Shorter trajectories are discarded, while longer trajectories are cut off to the set length. It is not explained why this is necessary, and I feel it deletes a lot of useful data without clear benefit (in experimental conditions).

      We thank the reviewer for this recommendation; we have now modified the architecture of our model to enable users to consider tracks of multiple lengths. Note that the computation time is proportional to the longest track length times the number of tracks.  
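
      One common way to handle tracks of different lengths in a batched model, shown here only as an illustration, is to pad every track to the longest length and keep a mask of the valid time points. This is not necessarily how aTrack does it; the function name `pad_tracks` and the NaN padding are assumptions made for this example.

```python
import numpy as np

def pad_tracks(tracks):
    """Stack tracks of different lengths into one array (hypothetical helper).

    tracks : list of arrays, each of shape (n_points_i, n_dims)
    Returns a NaN-padded array of shape (n_tracks, max_len, n_dims) and a
    boolean mask of shape (n_tracks, max_len) marking real time points.
    """
    n_dims = np.asarray(tracks[0]).shape[1]
    max_len = max(len(t) for t in tracks)
    padded = np.full((len(tracks), max_len, n_dims), np.nan)
    mask = np.zeros((len(tracks), max_len), dtype=bool)
    for i, t in enumerate(tracks):
        t = np.asarray(t, dtype=float)
        padded[i, :len(t)] = t      # copy the real positions
        mask[i, :len(t)] = True     # remember which entries are real
    return padded, mask
```

      The padded array has as many entries along time as the longest track, for every track, which is consistent with a computation time proportional to the longest track length times the number of tracks.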

      Reviewer #2 (Recommendations For The Authors): 

      Develop a better mathematical foundation for the likelihood ratio tests. 

      We added more explanation of the likelihood ratio tests and their interpretation in a new section entitled Statistical test in the supplementary information to address this recommendation.

      Place this work in clearer contexts. 

      We have now revised the introduction to better contextualize this work.

      Improve manuscript clarity. 

      Based on reviewer feedback and input from others, we have addressed this point throughout the article to improve readability.

      Make the code available. 

      The code is available on https://github.com/FrancoisSimon/aTrack, now including code for track generation.

      Reviewer #3 (Recommendations For The Authors): 

      (1) I believe the underlying model presented in Figure 1 is of substantial impact, especially when considering it as a simulation tool. I would suggest the authors make their method also available as a simulator (as far as I can tell, this is not explicitly done in their code repository, although logically the code required for the simulator should already be in the codebase somewhere). 

      Thank you for this suggestion. The simulation scripts are now in the GitHub repository together with the rest of the analysis method: https://github.com/FrancoisSimon/aTrack

      (2) The authors should explore and/or discuss the effects of wrong trajectory linking to their method. Throughout the text, fully correct trajectory linking is assumed and assessed, while in real experiments, it is often the case that trajectory linking is wrong, e.g. due to blinking emitters, imaging artefacts, high-density localizations, etc etc. This would have a major impact on the accuracy of trajectories, and it is extremely relevant to explore how this is translated to the output of aTrack. 

      As the reviewer notes, our current model does not account for track mislinking. This limits the method to data with lower fluorophore-densities, which is the typical use-case for SPT. We have added a brief description of the issue into the discussion of limitations.  

      (3) aTrack only supports 2D-tracking, but I don't believe there is a conceptual reason not to have this expanded to three dimensions. 

      The stand-alone software is currently limited to 2D tracks; however, the aTrack Python package works for any number of dimensions (i.e. 1-3). Note that since the current implementation assumes a single localization error for all axes, more modifications may be required for some types of 3D tracking. See https://github.com/FrancoisSimon/aTrack for more details about aTrack implementations.

      (4) Crucial information is missing in the experimental demonstrations. Especially in the NP-bacteria dataset, I miss scalebars, and information on the number of tracks. It is not explained why 5 different states are obtained - especially because I would naively expect three states: immobile NPs (e.g. stuck to glass), diffusing NPs, and NPs attached to bacteria, and thus directed. Figure 7e shows three diffusive states (why more than one?), no immobile states (why?), and two directed states (why?). 

      We thank the reviewer for pointing out these issues. We have now added scalebars and more experimental details to the figure and text as well as modifying the plot to more clearly emphasize the directed nanoparticles that are attached to cells from the diffusive nanoparticles.  

      Likely, our focal plane was too high to see the particles stuck on glass. The multiple diffusive states may be caused by different sizes of nanoparticle complexes; the multiple directed states may arise because the directed motion of cell-attached nanoparticles occasionally shows drastic changes of orientation. We have also clarified in the text how multiple states can help handle a heterogeneous population, as was shown by Prindle et al. 2022, Microbiol Spectr. The characterization and phenotyping of microbial populations by nanoparticle tracking was published in Zapata et al. 2022, Nanoscale.

      (5) I don't think I agree that 'robustness to model mismatches' is a good thing. Very crudely, the fact that aTrack finds fractional Brownian motion to be normal Brownian motion is technically a downside - and this should be especially carefully positioned if (in the future) a fractional Brownian motion model would be added to aTrack. I think that the author's point can be better tested by e.g. widely varying simulated vs fitted loc precision/diffusion coefficient (which are somewhat interchangeable).

      In this context, by robustness to “model mismatches” we mean classifying subdiffusion as subdiffusive (and likewise superdiffusion as superdiffusive) irrespective of the exact physics underlying the motion, that is, using aTrack the way MSD analysis is often deployed. This is important in the context of real-world applications where simple mathematical models cannot perfectly represent real tracks with greater complexity.

      Inevitably, some fraction of tracks with a pure Brownian motion may appear to match with a fractional Brownian motion, and thus statistical tests are needed to determine if this is significant. In general, aTrack finds fBm to be normal Brownian motion only when the anomalous coefficient is near 1, i.e. when the two models are indeed the same. When analysing fBm tracks with anomalous coefficients of 0.5 or 1.5, aTrack finds that these tracks are better explained by our confined diffusion model or directed motion model, respectively (please see Fig. 6a, copied below).

      To better clarify our objective, the section now has a brief introduction that reads:

      “One of the most important features of a method is its robustness to deviations from its assumptions. Indeed, experimental tracking data will inevitably not match the model assumptions to some degree, and models need to be resilient to these small deviations.”  

      Smaller points: 

      (1) It is not clear what a biological example is of rotational diffusion. 

      We modified the text to better explain the use of rotational diffusion.

      (2) The text in the section on experimental data should be expanded and clarified, there currently are multiple 'floating sentences' that stop halfway, and it does not clearly describe the biological relevance and observed findings.  

      We thank the reviewer for pointing out this issue. We have reworked the experimental section to better and more clearly explain the biological relevance of the findings.

      (3) Caption of figure 3: 'd' should be 'e'. 

      (4) Caption of Figure 7: log-likelihood should be Lconfined - Lbrownian, I believe. 

      (5) Equation number missing in SI first sentence. 

      (6) Supplementary Figure 1 top part access should be Lc-Lb instead of Ld-Lb. 

      We have made these corrections, thank you for bringing them to our attention.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      We thank the reviewers for their careful assessment and enthusiastic appreciation of our work.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this article, Thomas et al. use a super-resolution approach in living cells to track proteins involved in the fusion event of sexual reproduction. They study the spatial organization and dynamics of the actin fusion focus, a key structure in cell-cell fusion in Schizosaccharomyces pombe. The researchers have adapted a high-precision centroid mapping method using three-color live-cell epifluorescence imaging to map the dynamic architecture of the fusion focus during yeast mating. The approach relies on tracking the centroid of fluorescence signals for proteins of interest, spatially referenced to Myo52-mScarlet-I (as a robust marker) and temporally referenced using a weakly fluorescent cytosolic protein (mRaspberry), which redistributes strongly upon fusion. The trajectories of five key proteins, including markers of polarity, cytoskeleton, exocytosis and membrane fusion, were compared to Myo52 over a 75-minute window spanning fusion. Their observations indicate that secretory vesicles maintain a constant distance from the plasma membrane whereas the actin network compacts. Most importantly, they discovered a positive feedback mechanism in which myosin V (Myo52) transports Fus1 formin along pre-existing actin filaments, thereby enhancing aster compaction.

      This article is well written, the arguments are convincing and the assertions are balanced. The centroid tracking method has been clearly and solidly controlled. Overall, this is a solid addition to our understanding of cytoskeletal organization in cell fusion.

      Major comments: No major comment.

      Minor comments:

      - On page 8, the authors wrote "Upon depletion of Myo52, Ypt3 did not accumulate at the fusion focus (Figure 3C). A thin, wide localization at the fusion site was occasionally observed (Figure 3C, Movies S3)": Is there a quantification of this accumulation in the mutant?

      We will provide the requested quantification. The localization is very faint, so we are not sure that quantification will capture this faithfully, but we will try.

      - The frame rate of the movies could be improved for reader comfort: for example, movie S6 lasts 0.5 sec.

      We agree that the frame rates of movies S3 and S6 could be improved. We will provide them with a slower frame rate.

      Reviewer #1 (Significance (Required)):

      This study represents a conceptual and technical breakthrough in our understanding of cytoskeletal organization during cell-cell fusion. The authors introduce a high-precision, three-color live-cell centroid mapping method capable of resolving the spatio-temporal dynamics of protein complexes at the nanometer scale in living yeast cells. This methodological innovation enables systematic and quantitative mapping of the dynamic architecture of proteins at the cell fusion site, making it a powerful live-cell imaging approach. However, it is important to keep in mind that the increased precision achieved through averaging comes at the expense of overlooking atypical or outlier behaviors. The authors discovered a myosin V-dependent mechanism for the recruitment of formin that leads to actin aster compaction. The identification of Myo52 (myosin V) as a transporter of Fus1 (formin) to the fusion focus adds a new layer to our understanding of how polarized actin structures are generated and maintained during developmentally regulated processes such as mating.

      Previous studies have shown the importance of formins and myosins during fusion, but this paper provides a quantitative and dynamic mapping that demonstrates how Myo52 modulates Fus1 positioning in living cells. This provides a better understanding of actin organization, beyond what has been demonstrated by fixed-cell imaging or genetic perturbation.

      Audience: Cell biologists working on actin dynamics, cell-cell fusion and intracellular transport. Scientists involved in live-cell imaging, single particle tracking and cytoskeleton modeling.

      I have expertise in live-cell microscopy, image analysis, fungal growth machinery and actin organization.

      We thank the reviewer for their appreciation of our work.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      A three-color imaging approach to use centroid tracking is employed to determine the high resolution position over time of tagged actin fusion focus proteins during mating in fission yeast. In particular, the positions of different protein components (tagged in a 3rd color) were determined in relation to the position (and axis) of the molecular motor Myo52, which is tagged with two different colors in the mating cells. Furthermore, time is normalized by the rapid diffusion of a weak fluorescent protein probe (mRaspberry) from one cell to the other upon fusion pore opening. From this approach multiple important mechanistic insights were determined for the compaction of fusion focus proteins during mating, including the general compaction of different components as fusion proceeds with different proteins having specific stereotypical behaviors that indicate underlying molecular insights. For example, secretory vesicles remain a constant distance from the plasma membrane, whereas the formin Fus1 rapidly accumulates at the fusion focus in a Myo52-dependent manner.

      I have minor suggestions/points: (1) Figure 1, for clarity it would be helpful if the cells shown in B were in the same orientation as the cartoon cells shown in A. Similarly, it would be helpful to have the orientation shown in D the same as the data that is subsequently presented in the rest of the manuscript (such as Figure 2) where time is on the X axis and distance (position) is on the Y axis.

      We have turned each image in panel B by 180° to match the cartoon in A. For panel D, we are not sure what the reviewer would like. This panel shows the coordinates of each Myo52 position, whereas Figure 2 shows oriented distance (on the Y axis) over time (on the X axis). Perhaps the reviewer suggests that we should display panel D with a rotation onto the Y axis rather than the X axis. We feel that this would not bring more clarity and prefer to keep it as is.

      (2) Figure 2, for clarity useful to introduce how the position of Myo52 changes over time with respect to the fusion site (plasma membrane) earlier, and then come back to the positions of different proteins with respect to Myo52 shown in 2E. Currently the authors discuss this point after introducing Figure 2E, but better for the reader to have this in mind beforehand.

      We have added a sentence at the start of the section describing Figure 2, pointing out that the static appearance of Myo52 is due to it being used as reference, but that in reality, it moves relative to the plasma membrane: “Because Myo52 is the reference, its trace is flat, even though in reality Myo52 also moves relative to other proteins and the plasma membrane (see Figure 2E)”. This change is already in the text.

      (3) First sentence of page 8 "..., peaked at fusion time and sharply dropped post-fusion (Figure S3)." Figure S3 should be cited so that the reader knows where this data is presented.

      Thanks, we have added the missing figure reference to the text.

      (4) Figure 3D-H, why is Exo70 used as a marker for vesicles instead of Ypt3 for these experiments? Exo70 seems to have a more confusing localization than Ypt3 (3C vs 3D), which seems to complicate interpretations.

      There are two main reasons for this choice. First, the GFP-Ypt3 fluorescence intensity is lower than that of Exo70-GFP, which makes analysis more difficult and less reliable. Second, in contrast to Exo70-GFP, where the endogenous gene is tagged at the native genomic locus, GFP-Ypt3 is expressed as an additional copy alongside endogenous untagged Ypt3. Although GFP-Ypt3 was reported to be fully functional, as it can complement the lethality of a ypt3 temperature-sensitive mutant (Cheng et al, MBoC 2002), its expression levels are non-native and we do not have a strain in which ypt3 is tagged at the 5’ end at the native genomic locus. For these reasons, we preferred to examine in detail the localization of Exo70. We do not think it complicates interpretations. Exo70 faithfully decorates vesicles and exhibits the same localization as Ypt3 in WT cells (see Figure 2D) and in myo52-AID (see Figure 3C-D). We realize that our text was a bit confusing as we opposed the localization of Exo70 and Ypt3, when all we wanted to state was that the Exo70-GFP signal is stronger. We have corrected this in the text.

      (5) Page 10, end of first paragraph, "We conclude...and promotes separation of Myo52 from the vesicles." This is an interesting hypothesis/interpretation that is consistent with the spatial-temporal organization of vesicles and the compacting fusion focus, but the underlying molecular mechanism has not been concluded.

      This is an interpretation that is in line with our data. Firm conclusion that the organization of the actin fusion focus imposes a steric barrier to bulk vesicle entry will require in vitro reconstitution of an actin aster driven by formin-myosin V feedback and addition of myosin V vesicle-like cargo, which can be a target for future studies. To make clear that it is an interpretation and not a definitive statement, we have added “likely” to the sentence, as in: “We conclude that the distal position of vesicles in WT cells is a likely steric consequence of the architecture of the fusion focus, which restricts space at the center of the actin aster and promotes separation of Myo52 from the vesicles”.

      (6) Figure 5F and 5G, the results are confusing and should be discussed further. Depletion of Myo52 decreases Fus1 long-range movements, indicating that Fus1 is being transported by Myo52 (5F). Similarly, the Fus1 actin assembly mutant greatly decreases Fus1 long-range movements and prevents Myo52 binding (5G), perhaps indicating that Fus1-mediated actin assembly is important. It seems the authors' interpretations are oversimplified.

      We show that Myo52 is critical for Fus1 long-range movements, as stated by the reviewer. We also show that Fus1-mediated actin assembly is important. The question is in what way.

      One possibility is that FH2-mediated actin assembly powers the movement, which in this case represents the displacement of the formin due to actin monomer addition on the polymerizing filament. A second possibility is that actin filaments assembled by Fus1 somehow help Myo52 move Fus1. This could be, for instance, because Fus1-assembled actin filaments are preferred tracks for Myo52-mediated movements, or because they allow Myo52 to accumulate in the vicinity of Fus1, enhancing their chance encounter and thus the number of long-range movements (on any actin track). Based on the analysis of the K1112A point mutant in the Fus1 FH2 domain, our data cannot discriminate between these three different options, which is why we concluded that the mutant allele does not allow us to make a firm conclusion. However, the Myo52-dependence clearly shows that a large fraction of the movements requires the myosin V. We have clarified the end of the paragraph in the following way: “Therefore, analysis of the K1112A mutant phenotype does not allow us to clearly distinguish Fus1-powered from Myo52-powered movements. Future work will be required to test whether, in addition to myosin V-dependent transport, Fus1-mediated actin polymerization also directly contributes to Fus1 long-range movements.”

      (7) Figure 6, why not measure the fluorescence intensity of Fus1 as a proxy for the number of Fus1 molecules (rather than the width of the Fus1 signal), which seems to be the more straight-forward analysis?

      The aim of the measurement was to test whether Myo52 and Fus1 activity help focalize the formin at the fusion site, not whether these are required for localization in this region. This is why we are measuring the lateral spread of the signal (its width) rather than the fluorescence intensity of the signal. We know from previous work that Fus1 localizes to the shmoo tip independently of myosin V (Dudin et al, JCB 2015), and we also show this in Figure 6. However, the precise distribution of Fus1 is wider in absence of the myosins.

      We can and will measure intensities to test whether there is also a quantitative difference in the number of molecules at the shmoo tip.
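
      The difference between lateral spread and total signal discussed above can be illustrated with a minimal Python sketch. It is an illustration only, not the analysis used in the manuscript: the Gaussian line profiles, the function names, the pixel spacing and the choice of an intensity-weighted standard deviation as the width measure are all assumptions made for this example.

```python
import math

def gaussian_profile(positions_um, sigma_um, total=1000.0):
    """Toy line profile along the contact zone, rescaled to a fixed total signal."""
    raw = [math.exp(-0.5 * (x / sigma_um) ** 2) for x in positions_um]
    scale = total / sum(raw)
    return [value * scale for value in raw]

def total_intensity(profile):
    """Summed signal: a proxy for how much protein is present."""
    return sum(profile)

def lateral_spread(positions_um, profile):
    """Intensity-weighted standard deviation: a proxy for how focused the signal is."""
    total = sum(profile)
    mean = sum(x * w for x, w in zip(positions_um, profile)) / total
    variance = sum(w * (x - mean) ** 2 for x, w in zip(positions_um, profile)) / total
    return math.sqrt(variance)

# Hypothetical sampling grid: 81 points from -2 to +2 micrometres.
positions = [-2.0 + 0.05 * i for i in range(81)]
focused = gaussian_profile(positions, sigma_um=0.2)
dispersed = gaussian_profile(positions, sigma_um=0.4)

print(total_intensity(focused), total_intensity(dispersed))
print(lateral_spread(positions, focused), lateral_spread(positions, dispersed))
```

      In this toy case both profiles carry the same total signal, so an intensity measurement alone would not separate them, whereas the spread measure does. The two readouts therefore answer different questions.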

      (8) Figure 7, the authors should note (and perhaps discuss) any evidence as to whether activation of Fus1 to facilitate actin assembly depends upon Fus1 dissociating from Myo52 or whether Fus1 can be activated while still associated with Myo52, as both circumstances are included in the figure.

      This is an interesting point. We have no experimental evidence for or against Fus1 dissociating from Myo52 to assemble actin. However, it is known that formins rotate along the actin filament double helix as they assemble it, a movement that seems poorly compatible with processive transport by myosin V. In Figure 7, we do not particularly want to imply that Myo52 associates with Fus1 linked or not with an actin filament. The figure serves to illustrate the focusing mechanism of myosin V transporting a formin, which is more evident when we draw the formin attached to a filament end. We have now added a sentence in the figure legend to clarify this point: “Note that it is unknown whether Myo52 transports Fus1 associated or not with an actin filament.”

      (9) Figure 7, the color of secretory vesicles should be the same in A and B.

      This is now corrected.

      Reviewer #2 (Significance (Required)):

      This is an impactful and high quality manuscript that describes an elegant experimental strategy with important insights determined. The experimental imaging strategy (and analysis), as well as the insight into the pombe mating fusion focus and its comparison to other cytoskeletal compaction events will be of broad scientific interest.

      We thank the reviewer for their appreciation of our work.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Fission yeast cell-cell fusion during mating is mediated by an actin-based structure called the 'fusion focus', which orchestrates actin polymerization by the mating-specific formin, Fus1, to direct polarized secretion towards the mating site. In the current study, Thomas and colleagues quantitatively map the spatial distribution of proteins mediating cell-cell fusion using a three-color fluorescence imaging methodology in the fission yeast Schizosaccharomyces pombe. Using Myo52 (Type V myosin) as a fluorescence reference point, the authors discover that proteins known to localize to the fusion focus have distinct spatial distributions and accumulation profiles at the mating site. Myo52 and Fus1 form a complex in vivo detected by co-immunoprecipitation and each contribute to directing secretory vesicles to the fusion focus. Previous work from this group has shown that the intrinsically disordered region (IDR) of Fus1 plays a critical role in forming the fusion focus. Here, the authors swap out the IDR of fission yeast Fus1 for the IDR of an unrelated mammalian protein, coincidentally called 'fused in sarcoma' (FUS). They express the Fus1∆IDR-FUSLC-27R chimera in mitotically dividing fission yeast cells, where Fus1 is not normally expressed, and discover that the Fus1∆IDR-FUSLC-27R chimera can travel with Myo52 on actively polymerizing actin cables. Additionally, they show that acute loss of Myo52 or Fus1 function, using Auxin-Inducible Degradation (AID) tags and point mutations, impair the normal compaction of the fusion focus, suggesting that direct interaction and coordination of Fus1 and Myo52 helps shape this structure.

      Major Comments:

      (1) In the Results section for Figure 2, the authors claim that actin filaments become shorter and more cross-linked as they move away from the fusion site during mating, and suggest that this may be due to the presence of Myo51. However, the evidence to support this claim is not made clear. Is it supported by high-resolution electron microscopy of the actin filaments, or some other results? This needs to be clarified.

      Sorry if our text was unclear. The basis for the claim that actin filaments become shorter comes from our observation that the average position of tropomyosin and Myo51, both of which decorate actin filaments, is progressively closer to both Fus1 and the plasma membrane. Thus, the actin structure protrudes less into the cytosol as fusion progresses. The basis for claiming that Myo51 promotes actin filament crosslinking comes mainly from previously published papers, which had shown that 1) Myo51 forms complexes with the Rng8 and Rng9 proteins (Wang et al, JCB 2014), and 2) the Myo51-Rng8/9 complex not only binds actin through the Myo51 head domain but also binds tropomyosin-decorated actin through the Rng8/9 moiety (Tang et al, JCB 2016; reference 27 in our manuscript). We had also previously shown that these proteins are necessary for compaction of the fusion focus (Dudin et al, PLoS Genetics 2017; reference 28 in our manuscript). Except for measuring the width of Fus1 distribution in myo51∆ mutants, which confirms previous findings, we did not re-investigate here the function of Myo51.

      We have now re-written this paragraph to present the previous data more clearly: “The distal localization of Myo51 was mirrored by that of tropomyosin Cdc8, which decorates linear actin filaments (Figure 2B) (Hatano et al, 2022). The distal position of the bulk of Myo51-decorated actin filaments was confirmed using Airyscan super-resolution microscopy (Figure 2B, right). Thus, the average position of actin filaments and decreasing distance to Myo52 indicates they initially extend a few hundred nanometers into the cytosol and become progressively shorter as fusion proceeds. Previous work had shown that Myo51 cross-links and slides Cdc8-decorated actin filaments relative to each other (Tang et al, 2016) and that both proteins contribute to compaction of the fusion focus in the lateral dimension along the cell-cell contact area (perpendicular to the fusion axis) (Dudin et al, 2017). We confirmed this function by measuring the lateral distribution of Fus1 along the cell-cell contact area (perpendicular to the fusion axis), which was indeed wider in myo51∆ than WT cells (see below Figure 6A-B).”

      (2) In Figure 4, the authors comment that disrupting Fus1 results in more disperse Myo52 spatial distribution at the fusion focus, raising the possibility that Myo52 normally becomes focused by moving on the actin filaments assembled by Fus1. This can be tested by asking whether latrunculin treatment phenocopies the 'more dispersed' Myo52 localization seen in fus1∆ cells? If Myo52 is focused instead by its direct interaction with Fus1, the latrunculin treatment should not cause the same phenotype.

      This is in principle a good idea, though it is technically challenging because pharmacological treatment of cell pairs in fusion is difficult to do without disturbing pheromone gradients which are critical throughout the fusion process (see Dudin et al, Genes and Dev 2016). We will try the experiment but are unsure about the likelihood of technical success.

      We note, however, that a similar experiment was done previously on Fus1 overexpressed in mitotic cells (Billault-Chaumartin et al, Curr Biol 2022; Fig 1D). Here, Fus1 also forms a focus and latrunculin A treatment leads to Myo52 dispersion while keeping the Fus1 focus, which is in line with our proposal that Myo52 becomes focused by moving on Fus1-assembled actin filaments. Similarly, we showed in Figure 5B that latrunculin A treatment of mitotic cells expressing Fus1∆IDR-FUSLC-27R also results in dispersion of Myo52 but not of Fus1.

      (3) The Fus1∆IDR-FUSLC-27R chimera used in Figure 5 is an interesting construct to examine actin-based transport of formins in cells. I was curious if the authors could provide the rates of movement for Myo52 and for Fus1∆IDR-FUSLC-27R, both before and after acute depletion of Myo52. It would be interesting to see if loss of Myo52 alters the rate of movement, or instead the movement stems from formin-mediated actin polymerization.

      We will measure these rates.

      (4) Also, Myo52 is known to interact with the mitotic formin For3. Does For3 colocalize with Myo52 and Fus1∆IDR-FUSLC-27R along actin cables?

      This is an interesting question for which we do not have an answer. For technical reasons, we do not have the tools to co-image For3 with Fus1∆IDR-FUSLC-27R because both are tagged with GFP. We feel that this question goes beyond the scope of this paper.

      (5) If Fus1∆IDR-FUSLC-27R is active, does having ectopic formin activity in mitotic cells affect actin cable architecture? This could be assessed by comparing phalloidin staining for wildtype and Fus1∆IDR-FUSLC-27R cells.

      We are not sure what the purpose of this experiment is, or how informative it would be. If it is to evaluate whether Fus1∆IDR-FUSLC-27R is active, our current data already demonstrate this. Indeed, Fus1∆IDR-FUSLC-27R recruits Myo52 in an F-actin and FH2 domain-dependent manner (shown in Figure 5B and 5G), which demonstrates that the Fus1∆IDR-FUSLC-27R FH2 domain is active. Even though Fus1∆IDR-FUSLC-27R assembles actin, we predict that its effect on general actin organization will be weak. Indeed, it is expressed under the endogenous fus1 promoter, leading to very low expression levels during mitotic growth, such that only a subset of cells exhibit a Fus1 focus. Furthermore, most of these Fus1 foci are at or close to cell poles, where linear actin cables are assembled by For3, such that they may not have a strong disturbing effect. Because analysis of actin cable organization by phalloidin staining is difficult (due to the more strongly staining actin patches), cells with a clear change in organization are predicted to be rare in the population, and the gain in knowledge would not be transformative, we are not keen to do this experiment.

      Minor Comments:

      Prior studies are referenced appropriately. Text and figures are clear and accurate. My only suggestion would be Figure 1E-H could be moved to the supplemental material, due to their extremely technical nature. I believe this would help the broad audience focus on the experimental design mapped out in Figure 1A-D.

      We are relatively neutral about this. If this suggestion is supported by the Editor, we can move these panels to supplement.

      Reviewer #3 (Significance (Required)):

      Significance: This study provides an improved imaging method for detecting the spatial distributions of proteins below 100 nm, providing new insights about how a relatively small cellular structure is organized. The use of three-color cell imaging to accurately measure accumulation rates of molecular components of the fusion focus provides new insight into the development of this structure and its roles in mating. This method could be applied to other multi-protein structures found in different cell types. This work uses rigorous genetic tools such as knockout, knockdown and point mutants to dissect the roles of the formin Fus1 and Type V myosin Myo52 in creating a proper fusion focus. The study could be improved by biochemical assays to test whether Myo52 and Fus1 directly interact, since the interaction is only shown by co-immunoprecipitation from extracts, which may reflect an indirect interaction.

      Indeed, future studies should dissect the Fus1-Myo52 interaction, to determine whether it is direct and identify mutants that impair it.

      I believe this work advances the cell-mating field by providing others with a spatial and temporal map of conserved factors arriving to the mating site. Additionally, they identified a way to study a mating specific protein in mitotically dividing cells, offering future questions to address.

      This study should appeal to a range of basic scientists interested in cell biology, the cytoskeleton, and model organisms. The three-colored quantitative imaging could be applied to defining the architecture of many other cellular structures in different systems. Myosin and actin scientists will be interested in how this work expands the interplay of these two fields.

      I am a cell biologist with expertise in live cell imaging, genetics and biochemistry.

      We thank the reviewer for their appreciation of our work.

    1. Reviewer #1 (Public review):

      Summary:

      Parise presents another instantiation of the Multisensory Correlation Detector model that can now accept stimulus-level inputs. This is a valuable development as it removes researcher involvement in the characterization/labeling of features and allows analysis of complex stimuli with a high degree of nuance that was previously unconsidered (i.e. spatial/spectral distributions across time). The author demonstrates the power of the model by fitting data from dozens of previous experiments including multiple species, tasks, behavioral modality, and pharmacological interventions.

      Strengths:

      One of the model's biggest strengths, in my opinion, is its ability to extract complex spatiotemporal co-relationships from multisensory stimuli. These relationships have typically been manually computed or assigned based on stimulus condition and often distilled to a single dimension or even a single number (e.g., "-50 ms asynchrony"). Thus, many models of multisensory integration depend heavily on human preprocessing of stimuli, and these models miss out on complex dynamics of stimuli; the lead modality distribution apparent in Figures 3b and c is provocative. I can imagine the model revealing interesting characteristics of the facial distribution of correlation during continuous audiovisual speech that have up to this point been largely described as "present" and almost solely focused on the lip area.

      Another aspect that makes the MCD stand out among other models is the biological inspiration and generalizability across domains. The model was developed to describe a separate process - motion perception - and in a much simpler organism - Drosophila. It could then describe a very basic neural computation that has been conserved across phylogeny (which is further demonstrated in the ability to predict rat, primate, and human data) and brain area. This aspect makes the model likely able to account for much more than what has already been demonstrated with only a few tweaks akin to the modifications described in this and previous articles from Parise.

      What allows this potential is that, as Parise and colleagues have demonstrated in those papers since our (re)introduction of the model in 2016, the MCD model is modular - both in its ability to interface with different inputs/outputs and its ability to chain MCD units in a way that can analyze spatial, spectral, or any other arbitrary dimension of a stimulus. This fact leaves wide open the possibilities for types of data, stimuli, and tasks a simplistic, neurally inspired model can account for.

      And so it's unsurprising (but impressive!) that Parise has demonstrated the model's ability here to account for such a wide range of empirical data from numerous tasks (synchrony/temporal order judgement, localization, detection, etc.) and behavior types (manual/saccade responses, gaze, etc.) using only the stimulus and a few free parameters. This ability is another of the model's main strengths that I think deserves some emphasis: it represents a kind of validation of those experiments - especially in the context of cross-experiment predictions.

      Finally, what is perhaps most impressive to me is that the MCD (and the accompanying decision model) does all this with very few (sometimes zero) free parameters. This highlights the utility of the model and the plausibility of its underlying architecture, but also helps to prevent extreme overfitting if fit correctly.

      Weaknesses:

      The model boasts incredible versatility across tasks and stimulus configurations, and its overall scope is to understand how and what relevant sensory information is extracted from a stimulus. We still need to exercise care when interpreting its parameters, especially considering the broader context of top-down control of perception and that some multisensory mappings may not be derivable purely from stimulus statistics (e.g., the complementary nature of some phonemes/visemes).

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Parise presents another instantiation of the Multisensory Correlation Detector model that can now accept stimulus-level inputs. This is a valuable development as it removes researcher involvement in the characterization/labeling of features and allows analysis of complex stimuli with a high degree of nuance that was previously unconsidered (i.e., spatial/spectral distributions across time). The author demonstrates the power of the model by fitting data from dozens of previous experiments, including multiple species, tasks, behavioral modalities, and pharmacological interventions.

      Thanks for the kind words!

      Strengths:

      One of the model's biggest strengths, in my opinion, is its ability to extract complex spatiotemporal co-relationships from multisensory stimuli. These relationships have typically been manually computed or assigned based on stimulus condition and often distilled to a single dimension or even a single number (e.g., "-50 ms asynchrony"). Thus, many models of multisensory integration depend heavily on human preprocessing of stimuli, and these models miss out on complex dynamics of stimuli; the lead modality distribution apparent in Figures 3b and c is provocative. I can imagine the model revealing interesting characteristics of the facial distribution of correlation during continuous audiovisual speech that have up to this point been largely described as "present" and almost solely focused on the lip area.

      Another aspect that makes the MCD stand out among other models is the biological inspiration and generalizability across domains. The model was developed to describe a separate process - motion perception - and in a much simpler organism - Drosophila. It could then describe a very basic neural computation that has been conserved across phylogeny (which is further demonstrated in the ability to predict rat, primate, and human data) and brain area. This aspect makes the model likely able to account for much more than what has already been demonstrated with only a few tweaks akin to the modifications described in this and previous articles from Parise.

      What allows this potential is that, as Parise and colleagues have demonstrated in those papers since our (re)introduction of the model in 2016, the MCD model is modular - both in its ability to interface with different inputs/outputs and its ability to chain MCD units in a way that can analyze spatial, spectral, or any other arbitrary dimension of a stimulus. This fact leaves wide open the possibilities for types of data, stimuli, and tasks a simplistic, neurally inspired model can account for.

      And so it's unsurprising (but impressive!) that Parise has demonstrated the model's ability here to account for such a wide range of empirical data from numerous tasks (synchrony/temporal order judgement, localization, detection, etc.) and behavior types (manual/saccade responses, gaze, etc.) using only the stimulus and a few free parameters. This ability is another of the model's main strengths that I think deserves some emphasis: it represents a kind of validation of those experiments, especially in the context of cross-experiment predictions (but see some criticism of that below).

      Finally, what is perhaps most impressive to me is that the MCD (and the accompanying decision model) does all this with very few (sometimes zero) free parameters. This highlights the utility of the model and the plausibility of its underlying architecture, but also helps to prevent extreme overfitting if fit correctly (but see a related concern below).

      We sincerely thank the reviewer for their thoughtful and generous comments. We are especially pleased that the core strengths of the model—its stimulus-computable architecture, biological grounding, modularity, and cross-domain applicability—were clearly recognized. As the reviewer rightly notes, removing researcher-defined abstractions and working directly from naturalistic stimuli opens the door to uncovering previously overlooked dynamics in complex multisensory signals, such as the spatial and temporal richness of audiovisual speech.

      We also appreciate the recognition of the model’s origins in a simple organism and its generalization across species and behaviors. This phylogenetic continuity reinforces our view that the MCD captures a fundamental computation with wide-ranging implications. Finally, we are grateful for the reviewer’s emphasis on the model’s predictive power across tasks and datasets with few or no free parameters—a property we see as key to both its parsimony and explanatory utility.

      We have highlighted these points more explicitly in the revised manuscript, and we thank the reviewer for their generous and insightful endorsement of the work.

      Weaknesses:

      There is an insufficient level of detail in the methods about model fitting. As a result, it's unclear what data the models were fitted and validated on. Were models fit individually or on average group data? Each condition separately? Is the model predictive of unseen data? Was the model cross-validated? Relatedly, the manuscript mentions a randomization test, but the shuffled data produces model responses that are still highly correlated to behavior despite shuffling. Could it be that any stimulus that varies in AV onset asynchrony can produce a psychometric curve that matches any other task with asynchrony judgements baked into the task? Does this mean all SJ or TOJ tasks produce correlated psychometric curves? Or more generally, is Pearson's correlation insensitive to subtle changes here, considering psychometric curves are typically sigmoidal? Curves can be non-overlapping and still highly correlated if one is, for example, scaled differently. Would an error term such as mean-squared or root mean-squared error be more sensitive to subtle changes in psychometric curves? Alternatively, perhaps if the models aren't cross-validated, the high correlation values are due to overfitting?

      The reviewer is right: the current version of the manuscript only provides limited information about parameter fitting. In the revised version of the manuscript, we included a parameter estimation and generalizability section that includes all information requested by the reviewer.

      To test whether using the MSE instead of Pearson correlation led to a similar estimated set of parameter values, we repeated the fitting using the MSE. The parameters estimated with this method (TauV, TauA, TauBim) closely followed those estimated using Pearson correlation (TauV, TauA, TauBim). Given the similarity of these results, we have chosen not to include further figures; however, this analysis is now included in the new section (pages 23-24).
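
      The reviewer's concern about scaling can be made concrete with a short Python sketch. It is a toy illustration under stated assumptions, not the fitting code used in the manuscript: the logistic curve, the lag grid and the function names are hypothetical. It shows that Pearson correlation is unchanged by a linear rescaling of a psychometric curve, whereas the root-mean-squared error is not.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rmse(x, y):
    """Root-mean-squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def logistic(lag_ms, slope_ms):
    """Toy sigmoidal psychometric curve (hypothetical stand-in for real data)."""
    return 1.0 / (1.0 + math.exp(-lag_ms / slope_ms))

lags = list(range(-400, 401, 50))
curve = [logistic(lag, 80.0) for lag in lags]
# Same shape, but squeezed into the range 0.25 to 0.75.
compressed = [0.25 + 0.5 * p for p in curve]

print(pearson(curve, compressed))  # equal to 1 up to rounding error
print(rmse(curve, compressed))     # clearly larger than 0
```

      Because the two metrics respond differently to vertical compression, agreement between parameters fitted with each of them, as reported above, is a useful consistency check.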

      Regarding the permutation test, it is expected that different stimuli produce analogous psychometric functions: after all, all studies relied on stimuli containing identical manipulation of lags. As a result, MCD population responses tend to be similar across experiments. Therefore, it is not a surprise that the permuted distribution of MCD-data correlation in Supplementary Figure 1K has a mean as high as 0.97. However, what is important is to demonstrate that the non-permuted dataset has an even higher goodness of fit. Supplementary Figure 1K demonstrates that none of the permuted stimuli could outperform the non-permuted dataset; the mean of the non-permuted distribution is 4.7 standard deviations above the mean of the already high permuted distribution.
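
      The logic of this comparison can be sketched in a few lines of Python. The numbers below are made-up toy values chosen only to exercise the code, not results from the manuscript, and the function name is hypothetical. The sketch reports how many standard deviations the observed goodness of fit lies above the permuted distribution, together with a conservative permutation p-value.

```python
import statistics

def permutation_summary(observed, permuted):
    """Compare an observed statistic with its permutation distribution.

    Returns the distance of the observed value from the permuted mean in
    units of the permuted standard deviation, and a permutation p-value
    with the usual +1 correction.
    """
    mean = statistics.fmean(permuted)
    sd = statistics.stdev(permuted)
    z = (observed - mean) / sd
    n_at_least = sum(1 for value in permuted if value >= observed)
    p_value = (n_at_least + 1) / (len(permuted) + 1)
    return z, p_value

# Toy correlations: permuted fits are already high, the matched fit is higher.
toy_permuted = [0.96, 0.97, 0.98, 0.97, 0.96, 0.98, 0.97]
toy_observed = 0.995

z, p_value = permutation_summary(toy_observed, toy_permuted)
print(z, p_value)
```

      A high permuted mean is therefore not a problem in itself; what matters is where the matched stimulus-data pairing falls relative to that distribution.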

      We believe the new section, along with the present response, fully addresses the legitimate concerns of the reviewer.

      While the model boasts incredible versatility across tasks and stimulus configurations, fitting behavioral data well doesn't mean we've captured the underlying neural processes, and thus, we need to be careful when interpreting results. For example, the model produces temporal parameters fitting rat behavior that are 4x faster than when fitting human data. This difference in slope and a difference at the tails were interpreted as differences in perceptual sensitivity related to general processing speeds of the rat, presumably related to brain/body size differences. While rats no doubt have these differences in neural processing speed/integration windows, it seems reasonable that a lot of the differences in human and rat psychometric functions could be explained by the (over)training and motivation of rats to perform on every trial for a reward - increasing attention/sensitivity (slope) - and a tendency to make mistakes (compression evident at the tails). Was there an attempt to fit these data with a lapse parameter built into the decisional model as was done in Equation 21? Likewise, the fitted parameters for the pharmacological manipulations during the SJ task indicated differences in the decisional (but not the perceptual) process and the article makes the claim that "all pharmacologically-induced changes in audiovisual time perception" can be attributed to decisional processes "with no need to postulate changes in low-level temporal processing." However, those papers discuss actual sensory effects of pharmacological manipulation, with one specifically reporting changes to response timing. Moreover, and again contrary to the conclusions drawn from model fits to those data, both papers also report a change in psychometric slope/JND in the TOJ task after pharmacological manipulation, which would presumably be reflected in changes to the perceptual (but not the decisional) parameters.

      Fitting or predicting behaviour does not in itself demonstrate that a model captures the underlying neural computations—though it may offer valuable constraints and insights. In line with this, we were careful not to extrapolate the implications of our simulations to specific neural mechanisms.

      Temporal sensitivity is, by definition, a behavioural metric, and—as the reviewer correctly notes—its estimation may reflect a range of contributing factors beyond low-level sensory processing, including attention, motivation, and lapse rates (i.e., stimulus-independent errors). In Equation 21, we introduced a lapse parameter specifically to account for such effects in the context of monkey eye-tracking data. For the rat datasets, however, the inclusion of a lapse term was not required to achieve a close fit to the psychometric data (ρ = 0.981). While it is likely that adding a lapse component would yield a marginally better fit, the absence of single-trial data prevents us from applying model comparison criteria such as AIC or BIC to justify the additional parameter. In light of this, and to avoid unnecessary model complexity, we opted not to include a lapse term in the rat simulations.
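
      As a rough illustration of why trial counts matter here, the following Python sketch shows a generic psychometric function with a symmetric lapse term, plus the binomial log-likelihood and AIC needed to compare models with and without it. This is an assumption-laden sketch: the cumulative-Gaussian form, the parameter names and the response counts are hypothetical, and it is not claimed to reproduce Equation 21 of the manuscript.

```python
import math

def psychometric(x, mu, sigma, lapse=0.0):
    """Cumulative Gaussian with a symmetric lapse rate (generic textbook form)."""
    core = 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return lapse / 2.0 + (1.0 - lapse) * core

def binomial_log_likelihood(xs, n_yes, n_total, mu, sigma, lapse=0.0):
    """Log-likelihood of response counts, omitting the constant binomial coefficient."""
    log_lik = 0.0
    for x, k, n in zip(xs, n_yes, n_total):
        p = psychometric(x, mu, sigma, lapse)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        log_lik += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return log_lik

def aic(log_lik, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_lik

# Hypothetical counts: 100 trials per lag, asymptotes near 5% and 95%.
lags = [-100.0, 0.0, 100.0]
yes = [5, 50, 95]
trials = [100, 100, 100]

ll_plain = binomial_log_likelihood(lags, yes, trials, mu=0.0, sigma=50.0)
ll_lapse = binomial_log_likelihood(lags, yes, trials, mu=0.0, sigma=50.0, lapse=0.1)
print(aic(ll_plain, n_params=2), aic(ll_lapse, n_params=3))
```

      The likelihood requires the number of trials behind each proportion, which is why summary curves alone do not support an AIC or BIC comparison.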

      With respect to the pharmacological manipulation data, we acknowledge the reviewer’s point that observed changes in slope and bias could plausibly arise from alterations at either the sensory or decisional level—or both. In our model, low-level sensory processing is instantiated by the MCD architecture, which outputs the MCDcorr and MCDlag signals that are then scaled and integrated during decision-making. Importantly, this scaling operation influences the slope of the resulting psychometric functions, such that changes in slope can arise even in the absence of any change to the MCD’s temporal filters. In our simulations, the temporal constants of the MCD units were fixed to the values estimated from the non-pharmacological condition (see parameter estimation section above), and only the decision-related parameters were allowed to vary. From this modelling perspective, the behavioural effects observed in the pharmacological datasets can be explained entirely by changes at the decisional level. However, we do not claim that such an explanation excludes the possibility of genuine sensory-level changes. Rather, we assert that our model can account for the observed data without requiring modifications to early temporal tuning.
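
      The argument that slope can change without any change in sensory tuning can be illustrated with a toy example. This is a sketch under strong assumptions, not the MCD model: `sensory_signal` is a hypothetical stand-in for a lag-sensitive sensory output with a fixed time constant `TAU`, and the decision stage is assumed to be a gain, a bias, and a normal CDF.

```python
import math

TAU = 100.0  # ms; hypothetical sensory time constant, held fixed throughout


def sensory_signal(lag_ms):
    """Toy stand-in for a lag-sensitive sensory output (NOT the MCD)."""
    return math.tanh(lag_ms / TAU)


def p_response(lag_ms, gain, bias=0.0):
    """Decision stage: scale the sensory signal, add a bias, apply a normal CDF."""
    z = gain * sensory_signal(lag_ms) + bias
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def slope_at_zero(gain, delta=1.0):
    """Central-difference estimate of the psychometric slope at zero lag."""
    return (p_response(delta, gain) - p_response(-delta, gain)) / (2.0 * delta)
```

      Under these assumptions, `slope_at_zero(2.0)` is larger than `slope_at_zero(1.0)` even though `TAU` never changes, so a slope difference in behavioural data does not by itself identify a sensory-level change.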

      To rigorously distinguish sensory from decisional effects, future experiments will need to employ stimuli with richer temporal structure—e.g., temporally modulated sequences of clicks and flashes that vary in frequency, phase, rhythm, or regularity (see Fujisaki & Nishida, 2007; Denison et al., 2012; Parise & Ernst, 2016, 2025; Locke & Landy, 2017; Nidiffer et al., 2018). Such stimuli engage the MCD in a more stimulus-dependent manner, enabling a clearer separation between early sensory encoding and later decision-making processes. Unfortunately, the current rat datasets—based exclusively on single click-flash pairings—lack the complexity needed for such disambiguation. As a result, while our simulations suggest that the observed pharmacologically induced effects can be attributed to changes in decision-level parameters, they do not rule out concurrent sensory-level changes.

      In summary, our results indicate that changes in the temporal tuning of MCD units are not necessary to reproduce the observed pharmacological effects on audiovisual timing behaviour. However, we do not assert that such changes are absent or unnecessary in principle. Disentangling sensory and decisional contributions will ultimately require richer datasets and experimental paradigms designed specifically for this purpose. We have now modified the results section (page 6) and the discussion (page 11) to clarify these points.

      The case for the utility of a stimulus-computable model is convincing (as I mentioned above), but its framing as mission-critical for understanding multisensory perception is overstated, I think. The line for what is "stimulus computable" is arbitrary and doesn't seem to be followed in the paper. A strict definition might realistically require inputs to be, e.g., the patterns of light and sound waves available to our eyes and ears, while an even more strict definition might (unrealistically) require those stimuli to be physically present and transduced by the model. A reasonable looser definition might allow an "abstract and low-dimensional representation of the stimulus," such as the stimulus envelope (which was used in the paper), to be an input. Ultimately, some preprocessing of a stimulus does not necessarily confound interpretations about (multi)sensory perception. And on the flip side, the stimulus-computable aspect doesn't necessarily give the model supreme insight into perception. For example, the MCD model was "confused" by the stimuli used in our 2018 paper (Nidiffer et al., 2018; Parise & Ernst, 2025). In each of our stimuli (including catch trials), the onset and offset drove strong AV temporal correlations across all stimulus conditions, but were irrelevant to participants performing an amplitude modulation detection task. The to-be-detected amplitude modulations, set at individual thresholds, were not a salient aspect of the physical stimulus, and thus only marginally affected stimulus correlations. The model was, of course, able to fit our data by "ignoring" the on/offsets (i.e., requiring human intervention), again highlighting that the model is tapping into a very basic and ubiquitous computational principle of (multi)sensory perception. But it does reveal a limitation of such a stimulus-computable model: that it is (so far) strictly bottom-up.

      We appreciate the reviewer’s thoughtful engagement with the concept of stimulus computability. We agree that the term requires careful definition and should not be taken as a guarantee of perceptual insight or neural plausibility. In our work, we define a model as “stimulus-computable” if all its inputs are derived directly from the stimulus, rather than from experimenter-defined summary descriptors such as temporal lag, spatial disparity, or cue reliability. In the context of multisensory integration, this implies that a model must account not only for how cues are combined, but also for how those cues are extracted from raw inputs—such as audio waveforms and visual contrast sequences.

      This distinction is central to our modelling philosophy. While ideal observer models often specify how information should be combined once identified, they typically do not address the upstream question of how this information is extracted from sensory input. In that sense, models that are not stimulus-computable leave out a key part of the perceptual pipeline. We do not present stimulus computability as a marker of theoretical superiority, but rather as a modelling constraint that is necessary if one’s aim is to explain how structured sensory input gives rise to perception. This is a view that is also explicitly acknowledged and supported by Reviewer 2.

      Framed in Marr’s (1982) terms, non–stimulus-computable models tend to operate at the computational level, defining what the system is doing (e.g., computing a maximum likelihood estimate), whereas stimulus-computable models aim to function at the algorithmic level, specifying how the relevant representations and operations might be implemented. When appropriately constrained by biological plausibility, such models may also inform hypotheses at the implementational level, pointing to potential neural substrates that could instantiate the computation.

      Regarding the reviewer’s example illustrating a limitation of the MCD model, we respectfully note that the account appears to be based on a misreading of our prior work. In Parise & Ernst (2025), where we simulated the stimuli from Nidiffer et al. (2018), the MCD model reproduced participants’ behavioural data without any human intervention or adjustment. The model was applied in a fully bottom-up, stimulus-driven manner, and its output aligned with observer responses as-is. We suspect the confusion may stem from analyses shown in Figure 6 - Supplement Figure 5 of Parise & Ernst (2025), where we investigated the lack of a frequency-doubling effect in the Nidiffer et al. data. However, those analyses were based solely on the Pearson correlation between auditory and visual stimulus envelopes and did not involve the MCD model. No manual exclusion of onset/offset events was applied, nor was the MCD used in those particular figures. We also note that Parise & Ernst (2025) is a separate, already published study and is not the manuscript currently under review. 

      In summary, while we fully agree that stimulus computability does not resolve all the complexities of multisensory perception (see comments below about speech), we maintain that it provides a valuable modelling constraint—one that enables robust, generalisable predictions when appropriately scoped. 

      The manuscript rightly chooses to focus a lot of the work on speech, fitting the MCD model to predict behavioral responses to speech. The range of findings from AV speech experiments that the MCD can account for is very convincing. Given the provided context that speech is "often claimed to be processed via dedicated mechanisms in the brain," a statement claiming a "first end-to-end account of multisensory perception," and findings that the MCD model can account for speech behaviors, it seems the reader is meant to infer that energetic correlation detection is a complete account of speech perception. I think this conclusion misses some facets of AV speech perception, such as integration of higher-order, non-redundant/correlated speech features (Campbell, 2008) and also the existence of top-down and predictive processing that aren't (yet!) explained by MCD. For example, one important benefit of AV speech is interactions on linguistic processes - how complementary sensitivity to articulatory features in the auditory and visual systems (Summerfield, 1987) allow constraint of linguistic processes (Peelle & Sommers, 2015; Tye-Murray et al., 2007).

      We thank the reviewer for their thoughtful comments, and especially for the kind words describing the range of findings from our AV speech simulations as “very convincing.”

      We would like to clarify that it is not our view that speech perception can be reduced to energetic correlation detection. While the MCD model captures low- to mid-level temporal dependencies between auditory and visual signals, we fully agree that a complete account of audiovisual speech perception must also include higher-order processes—including linguistic mechanisms and top-down predictions. These are critical components of AV speech comprehension, and lie beyond the scope of the current model.

      Our use of the term “end-to-end” is intended in a narrow operational sense: the model transforms raw audiovisual input (i.e., audio waveforms and video frames) directly into behavioural output (i.e., button press responses), without reliance on abstracted stimulus parameters such as lag, disparity or reliability. It is in this specific technical sense that the MCD offers an end-to-end model. We have revised the manuscript to clarify this usage to avoid any misunderstanding.

      In light of the reviewer’s valuable point, we have now edited the Discussion to acknowledge the importance of linguistic processes (page 13) and to clarify what we mean by end-to-end account (page 11). We agree that future work will need to explore how stimulus-computable models such as the MCD can be integrated with broader frameworks of linguistic and predictive processing (e.g., Summerfield, 1987; Campbell, 2008; Peelle & Sommers, 2015; Tye-Murray et al., 2007).

      References

      Campbell, R. (2008). The processing of audio-visual speech: empirical and neural bases. Philosophical Transactions of the Royal Society B: Biological Sciences, 363(1493), 1001-1010. https://doi.org/10.1098/rstb.2007.2155

      Nidiffer, A. R., Diederich, A., Ramachandran, R., & Wallace, M. T. (2018). Multisensory perception reflects individual differences in processing temporal correlations. Scientific Reports, 8(1), 1-15. https://doi.org/10.1038/s41598-018-32673-y

      Parise, C. V., & Ernst, M. O. (2025). Multisensory integration operates on correlated input from unimodal transient channels. eLife, 12. https://doi.org/10.7554/ELIFE.90841

      Peelle, J. E., & Sommers, M. S. (2015). Prediction and constraint in audiovisual speech perception. Cortex, 68, 169-181. https://doi.org/10.1016/j.cortex.2015.03.006

      Summerfield, Q. (1987). Some preliminaries to a comprehensive account of audio-visual speech perception. In B. Dodd & R. Campbell (Eds.), Hearing by Eye: The Psychology of Lip-Reading (pp. 3-51). Lawrence Erlbaum Associates.

      Tye-Murray, N., Sommers, M., & Spehar, B. (2007). Auditory and Visual Lexical Neighborhoods in Audiovisual Speech Perception. Trends in Amplification, 11(4), 233-241. https://doi.org/10.1177/1084713807307409

      Reviewer #2 (Public review):

      Summary:

      Building on previous models of multisensory integration (including their earlier correlation-detection framework used for non-spatial signals), the authors introduce a population-level Multisensory Correlation Detector (MCD) that processes raw auditory and visual data. Crucially, it does not rely on abstracted parameters, as is common in normative Bayesian models, but rather works directly on the stimulus itself (i.e., individual pixels and audio samples). By systematically testing the model against a range of experiments spanning human, monkey, and rat data, the authors show that their MCD population approach robustly predicts perception and behavior across species with a relatively small (0-4) number of free parameters.

      Strengths:

      (1) Unlike prior Bayesian models that used simplified or parameterized inputs, the model here is explicitly computable from full natural stimuli. This resolves a key gap in understanding how the brain might extract "time offsets" or "disparities" from continuously changing audio-visual streams.

      (2) The same population MCD architecture captures a remarkable range of multisensory phenomena, from classical illusions (McGurk, ventriloquism) and synchrony judgments, to attentional/gaze behavior driven by audio-visual salience. This generality strongly supports the idea that a single low-level computation (correlation detection) can underlie many distinct multisensory effects.

      (3) By tuning model parameters to different temporal rhythms (e.g., faster in rodents, slower in humans), the MCD explains cross-species perceptual data without reconfiguring the underlying architecture.

      We thank the reviewer for their positive evaluation of the manuscript, and particularly for highlighting the significance of the model's stimulus-computable architecture and its broad applicability across species and paradigms. Please find our responses to the individual points below.

      Weaknesses:

      (1) The authors show how a correlation-based model can account for the various multisensory integration effects observed in previous studies. However, a comparison of how the two accounts differ would shed light on whether the correlation model is an implementation of the Bayesian computations (different levels in Marr's hierarchy) or makes testable predictions that can distinguish between the two frameworks. For example, the Bayesian model predicts that the uncertainty in the cue-combined estimate is the harmonic mean of the unimodal uncertainties. So, how the MCD framework predicts this reduced uncertainty could be one potential difference (or similarity) to the Bayesian model.

      We fully agree with the reviewer that a comparison between the correlation-based MCD model and Bayesian accounts is valuable—particularly for clarifying how the two frameworks differ conceptually and where they may converge.

      As noted in the revised manuscript, the key distinction lies in the level of analysis described by Marr (1982). Bayesian models operate at the computational level, describing what the system is aiming to compute (e.g., optimal cue integration). In contrast, the MCD functions at the algorithmic level, offering a biologically plausible mechanism for how such integration might emerge from stimulus-driven representations.

      In this context, the MCD provides a concrete, stimulus-grounded account of how perceptual estimates might be constructed—potentially implementing computations with Bayesian-like characteristics (e.g., reduced uncertainty, cue weighting). Thus, the two models are not mutually exclusive but can be seen as complementary: the MCD may offer an algorithmic instantiation of computations that, at the abstract level, resemble Bayesian inference.
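
      For reference, the "reduced uncertainty" and "cue weighting" mentioned here correspond to the standard maximum-likelihood cue-combination result for two independent Gaussian cues. The block below states that textbook result only; it is not derived from the MCD, and the symbols (auditory and visual estimates with variances \sigma_A^2 and \sigma_V^2) are notation chosen for this illustration.

```latex
\[
\begin{aligned}
\hat{s}_{AV} &= w_A\,\hat{s}_A + w_V\,\hat{s}_V,
\qquad
w_A = \frac{1/\sigma_A^2}{1/\sigma_A^2 + 1/\sigma_V^2},
\qquad
w_V = 1 - w_A, \\[4pt]
\sigma_{AV}^2 &= \frac{\sigma_A^2\,\sigma_V^2}{\sigma_A^2 + \sigma_V^2}
\;\le\; \min\!\left(\sigma_A^2,\ \sigma_V^2\right).
\end{aligned}
\]
```

      Written this way, the combined variance is half the harmonic mean of the two unimodal variances, which is the quantity an algorithmic model would need to reproduce to match the Bayesian prediction.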

      We have now updated the manuscript to explicitly highlight this relationship (pages 2 and 11). In the revised manuscript, we also included a new figure (Figure 5) and movie (Supplementary Movie 3), to show how the present approach extends previous Bayesian models for the case of cue integration (i.e., the ventriloquist effect).

      (2) The authors show a good match for cue combination involving 2 cues. While Bayesian accounts provide a direction for extension to more cues (also seen empirically, e.g., in Hecht et al. 2008), a discussion of how the MCD model extends to more cues would benefit readers.

      We thank the reviewer for this insightful comment: extending the MCD model to include more than two sensory modalities is a natural and valuable next step. Indeed, one of the strengths of the MCD framework lies in its modularity. Let us consider the MCDcorr output (Equation 6), which is computed as the pointwise product of transient inputs across modalities. Extending this to include a third modality, such as touch, is straightforward: MCD units would simply multiply the transient channels from all three modalities, effectively acting as trimodal coincidence detectors that respond when all inputs are aligned in time and space.

      By contrast, extending MCDlag is less intuitive, due to its reliance on opponency between two subunits (via subtraction). A plausible solution is to compute MCDlag in a pairwise fashion (e.g., AV, VT, AT), capturing relative timing across modality pairs.
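
      The two extensions described above can be sketched as follows. This is an illustration of the idea only, not the formalisation in the revised manuscript: the inputs are assumed to be already-filtered transient channels given as equal-length lists, a one-sample delay stands in for the slower filter of each subunit, and all function names are hypothetical.

```python
from itertools import combinations


def trimodal_corr(a, v, t):
    """Pointwise product of three transient channels: non-zero only
    where all three modalities are active at the same sample."""
    return [x * y * z for x, y, z in zip(a, v, t)]


def delayed(x, n=1):
    """Crude stand-in for a slower filter: an n-sample delay, zero-padded."""
    return [0.0] * n + list(x[:len(x) - n])


def pairwise_lag(x, y):
    """Opponent (subtractive) output for one modality pair, summed over time.

    Positive when x leads y under this toy definition, negative when y leads x.
    """
    x_then_y = sum(p * q for p, q in zip(delayed(x), y))
    y_then_x = sum(p * q for p, q in zip(x, delayed(y)))
    return x_then_y - y_then_x


def all_pairwise_lags(channels):
    """Lag output for every modality pair, e.g. keys 'A', 'V', 'T'."""
    return {
        (m, n): pairwise_lag(channels[m], channels[n])
        for m, n in combinations(sorted(channels), 2)
    }
```

      With an auditory and a tactile transient at the same sample and a visual transient one sample later, `trimodal_corr` is zero everywhere, while `all_pairwise_lags` reports a positive lag for the audio-visual and tactile-visual pairs and zero for the audio-tactile pair.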

      Importantly, the bulk of the spatial integration in our framework is carried by MCDcorr, which generalises naturally to more than two modalities. We have now formalised this extension and included a graphical representation in a supplementary section of the revised manuscript.

      Likely Impact and Usefulness:

      The work offers a compelling unification of multiple multisensory tasks (temporal order judgments, illusions, Bayesian causal inference, and overt visual attention) under a single, fully stimulus-driven framework. Its success with natural stimuli should interest computational neuroscientists, systems neuroscientists, and machine learning scientists. This paper thus makes an important contribution to the field by moving beyond minimalistic lab stimuli, illustrating how raw audio and video can be integrated using elementary correlation analyses.

      Reviewer #1 (Recommendations for the authors):

      Recommendations:

      My biggest concern is a lack of specificity about model fitting, which would be assuaged by the inclusion of sufficient detail to replicate the analysis completely or by the inclusion of the analysis code. The code availability statement indicates that a script for the population model will be included, but it is unclear whether this code will provide the fitting details for the whole of the analysis.

      We thank the reviewer for raising this important point. A new methodological section has been added to the manuscript, detailing the model fitting procedures used throughout the study. In addition, the accompanying code repository now includes MATLAB scripts that allow full replication of the spatiotemporal MCD simulations.

      Perhaps it could be enlightening to re-evaluate the model with a measure of error rather than correlation? And I think many researchers would be interested in the model's performance on unseen data.

      The model has now been re-evaluated using mean squared error (MSE), and the results remain consistent with those obtained using Pearson correlation. Additionally, we have clarified which parts of the study involve testing the model on unseen data (i.e., data not used to fit the temporal constants of the units). These analyses are now included and discussed in the revised fitting section of the manuscript (pages 23-24).
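
      The distinction between the two metrics can be shown with a small sketch (hypothetical helper names, not the authors' analysis code): Pearson correlation is insensitive to a constant offset or a positive rescaling of the predictions, whereas mean squared error penalises both.

```python
import math


def pearson_r(pred, obs):
    """Pearson correlation between predicted and observed values."""
    n = len(pred)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)


def mse(pred, obs):
    """Mean squared error between predicted and observed values."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred)


# Predictions that track the data perfectly but are offset by 0.1:
predicted = [0.1, 0.3, 0.5]
observed = [0.2, 0.4, 0.6]
r = pearson_r(predicted, observed)   # correlation ignores the offset
err = mse(predicted, observed)       # error does not
```

      Reporting both, as the response describes, guards against a model that captures the shape of the data while missing its absolute level.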

      Otherwise, my concerns involve the interpretation of findings, and thus could be satisfied with minor rewording or tempering conclusions.

      The manuscript has been revised to address these interpretative concerns, with several conclusions reworded or tempered accordingly. All changes are marked in blue in the revised version.

      Miscellanea:

      Should b0 in equation 10 be bcrit to match the below text?

      Thank you for catching this inconsistency. We have corrected Equation 10 (and also Equation 21) to use the more transparent notation bcrit instead of b0, in line with the accompanying text.

      Equation 23, should time be averaged separately? For example, if multiple people are speaking, the average correlation for those frames will be higher than the average correlation across all times.

      We thank the reviewer for raising this thoughtful and important point. In response, we have clarified the notation of Equation 23 in the revised manuscript (page 20). Specifically, we now denote the averaging operations explicitly as spatial means and standard deviations across all pixel locations within each frame.

      This equation computes the z-score of the MCD correlation value at the current gaze location, normalized relative to the spatial distribution of correlation values in the same frame. That is, all operations are performed at the frame level, not across time. This ensures that temporally distinct events are treated independently and that the final measure reflects relative salience within each moment, not a global average over the stimulus. In other words, the spatial distribution of MCD activity is re-centered and rescaled at each frame, exactly to avoid the type of inflation or confounding the reviewer rightly cautioned against.
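
      The frame-level normalisation described above can be sketched as follows. This is a minimal illustration rather than Equation 23 itself: each frame is assumed to be a nested list of MCD correlation values, the population standard deviation is assumed, and the function names are hypothetical.

```python
import statistics


def gaze_zscore(frame, gaze_row, gaze_col):
    """Z-score of the value at the gaze location, relative to the spatial
    distribution of values within the SAME frame."""
    values = [v for row in frame for v in row]
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD; an assumption
    if sd == 0:
        return 0.0  # flat frame: no location is more salient than another
    return (frame[gaze_row][gaze_col] - mu) / sd


def mean_gaze_zscore(frames, gaze_positions):
    """Average the per-frame z-scores; normalisation happens before averaging,
    so frames with globally high correlation do not dominate."""
    scores = [gaze_zscore(f, r, c) for f, (r, c) in zip(frames, gaze_positions)]
    return statistics.fmean(scores)
```

      Because each frame is re-centred and rescaled on its own, multiplying every value in a frame by a constant (for example, when several people speak at once) leaves that frame's z-score unchanged.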

      Reviewer #2 (Recommendations for the authors):

      The authors have done a great job of providing a stimulus computable model of cue combination. I had just a few suggestions to strengthen the theoretical part of the paper:

      (1) While the authors have shown a good match between MCD and cue combination, some theoretical justification or equivalence analysis would benefit readers on how the two relate to each other. Something like Zhang et al. 2019 (which is for motion cue combination) would add to the paper.

      We agree that it is important to clarify the theoretical relationship between the Multisensory Correlation Detector (MCD) and normative models of cue integration, such as Bayesian combination. In the revised manuscript, we have now modified the introduction and added a paragraph in the Discussion addressing this link more explicitly. In brief, we see the MCD as an algorithmic-level implementation (in Marr’s terms) that may approximate or instantiate aspects of Bayesian inference.

      (2) Simulating cue combination for tasks that require integration of more than two cues (visual, auditory, haptic cues) would more strongly relate the correlation model to Bayesian cue combination. If that is a lot of work, at least discussing this would benefit the paper.

      This point has now been addressed, and a new paragraph discussing the extension of the MCD model to tasks involving more than two sensory modalities has been added to the Discussion section.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      Summary:

      Previous studies have shown that treatment with 17α-estradiol (a stereoisomer of 17β-estradiol) extends lifespan in male mice but not in females. The current study by Li et al. aimed to identify cell-specific clusters and populations in the hypothalamus of aged male rats treated with 17α-estradiol for 6 months. This study identifies genes and pathways affected by 17α-estradiol in the aged hypothalamus.

      Strengths:

      Using single-nucleus transcriptomic sequencing (snRNA-seq) on the hypothalamus from aged male rats treated with 17α-estradiol they show that 17α-estradiol significantly attenuated age-related increases in cellular metabolism, stress, and decreased synaptic activity in neurons.

      Thanks.

      Moreover, sc-analysis identified GnRH as one of the key mediators of 17α-estradiol's effects on energy homeostasis. Furthermore, they show that CRH neurons exhibited a senescent phenotype, suggesting a potential side effect of the 17α-estradiol. These conclusions are supported by supervised clustering by neuropeptides, hormones, and their receptors.

      Thanks.

      Weaknesses:

      However, the study has several limitations that reduce the strength of the key claims in the manuscript. In particular:

      (1) The study focused only on males and did not include comparisons with females. However, previous studies have shown that 17α-estradiol extends lifespan in a sex-specific manner in mice, affecting males but not females. Without the comparison with the female data, it's difficult to assess its relevance to the lifespan.

      This study was originally designed based on previous findings indicating that lifespan extension is only effective in males, leading to the exclusion of females from the analysis. The primary focus of our research was on the transcriptional changes and serum endocrine alterations induced by 17α-estradiol in aged males compared to untreated aged males. We believe that even in the absence of female subjects, the significant effects of 17α-estradiol on metabolism in the hypothalamus, synapses, and endocrine system remain evident, particularly regarding the expression levels of GnRH and testosterone. Notably, lower overall metabolism, increased synaptic activity, and elevated levels of GnRH and testosterone are strong indicators of health and well-being in males, supporting the validity of our primary conclusions. However, including female controls would enhance the depth of our findings. If female controls were incorporated, we propose redesigning the sample groups to include aged male control, aged female control, aged female treated, aged male treated, as well as young male control, young male treated, young female control, and young female treated. We regret that we cannot provide this data in the short term. Nevertheless, we believe this reviewer’s creative idea presents a valuable avenue for future research on this topic. In this study, we emphasize the role of 17α-estradiol in overall metabolism, synaptic function, GnRH, and testosterone in aged males and underscore the importance of supervised clustering of neuropeptide-secreting neurons in the hypothalamus.

      (2) It is not known whether 17α-estradiol leads to lifespan extension in male rats similar to male mice. Therefore, it is not possible to conclude that the observed effects in the hypothalamus are linked to the lifespan extension.

      Thank you for the reminder. 17α-estradiol has been reported to extend lifespan in male rats, as it does in male mice (PMID: 33289482). We have added this valuable reference to the introduction in the new version.

      (3) The effect of 17α-estradiol on non-neuronal cells such as microglia and astrocytes is not well-described (Figure 1). Previous studies demonstrated that 17α-estradiol reduces microgliosis and astrogliosis in the hypothalamus of aged male mice. Current data suggest that the proportions of oligodendrocytes and microglia were increased by the drug treatment, while the proportion of astrocytes was decreased. These data might suggest possible species differences, differences in the treatment regimen, or differences in drug efficiency. This has to be discussed.

      We have reviewed reports describing changes in cell numbers following 17α-estradiol treatment in the brain, using the keywords "17α-estradiol," "17alpha-estradiol," and "microglia" or "astrocyte." Only a limited amount of data was obtained. We found one article indicating that 17α-estradiol treatment in Tg (AβPP(swe)/PS1(ΔE9)) model mice resulted in a decreased microglial cell number compared to the placebo (AβPP(swe)/PS1(ΔE9) mice), but this change was not significant when compared to the non-transgenic control (PMID: 21157032). The transgenic AβPP(swe)/PS1(ΔE9) mouse model may differ from our wild-type aging rat model in this context.

      Moreover, the calculation of cell numbers was based on visual observation under a microscope across several brain tissue slices. This traditional method often yields controversial results. For example, oligodendrocytes in the corpus callosum, fornix, and spinal cord have been reported to be 20-40% more numerous in males than in females based on microscopic observations (PMID: 16452667). In contrast, another study found no significant difference in the number of oligodendrocytes between sexes when using immunohistochemistry staining (PMID: 18709647). Such discrepancies arising from traditional observational methods are inevitable.

      We believe the data presented in this article are reliable because the cell number and cell ratio data were derived from high-throughput cell counting of the entire hypothalamus using single-cell suspension and droplet wrapping (10x Genomics).

      (4) A more detailed analysis of glial cell types within the hypothalamus in response to drugs should be provided.

      We have provided additional enrichment analyses of the genes differentially expressed between Y, O, and O.T in microglia and astrocytes in Figure 2—figure supplement 3. In these supplemental data, we found that, unlike neurons, microglia (Micro) displayed lower levels of synapse-related cellular processes in O.T compared to O.

      (5) The conclusion that CRH neurons are going into senescence is not clearly supported by the data. A more detailed analysis of the hypothalamus such as histological examination to assess cellular senescence markers in CRH neurons, is needed to support this claim.

      We also noted the inappropriate claim and have changed "senescent phenotype" to "stressed phenotype" and "abnormal phenotype" in both the abstract and results sections. The stressed phenotype could be induced by heightened functional activity in the cells, potentially indicating higher cellular activity. The GnRH and CRH neurons discussed in this paper may represent such a case, as illustrated by the observed high serum GnRH, testosterone, and cortisol levels. This revision suggestion is highly valuable and constructive for our understanding of the unique physiological characteristics revealed by these data.

      Reviewer #2 (Public Review):

      Summary:

      Li et al. investigated the potential anti-ageing role of 17α-Estradiol on the hypothalamus of aged rats. To achieve this, they employed a very sophisticated method for single-cell genomic analysis that allowed them to analyze effects on various groups of neurons and non-neuronal cells. They were able to sub-categorize neurons according to their capacity to produce specific neurotransmitters, receptors, or hormones. They found that 17α-Estradiol treatment led to an improvement in several factors related to metabolism and synaptic transmission by bringing the expression levels of many of the genes of these pathways closer or to the same levels as those of young rats, reversing the ageing effect. Interestingly, among all neuronal groups, the proportion of Oxytocin-expressing neurons seems to be the one most significantly changing after treatment with 17α-Estradiol, suggesting an important role of these neurons in mediating its anti-ageing effects. This was also supported by an increase in circulating levels of oxytocin. It was also found that gene expression of corticotropin-releasing hormone neurons was significantly impacted by 17α-Estradiol even though it was not different between aged and young rats, suggesting that these neurons could be responsible for side effects related to this treatment. This article revealed some potential targets that should be further investigated in future studies regarding the role of 17α-Estradiol treatment in aged males.

      Strengths:

      (1) Single-nucleus mRNA sequencing is a very powerful method for gene expression analysis and clustering. The supervised clustering of neurons was very helpful in revealing otherwise invisible differences between neuronal groups and helped identify specific neuronal populations as targets.

      Thanks.

      (2) There is a variety of functions used that allow the differential analysis of a very complex type of data. This led to a better comparison between the different groups on many levels.

      Thanks.

      (3) There were some physiological parameters measured, such as circulating hormone levels, that helped the interpretation of the effects of the changes in hypothalamic gene expression.

      Thanks.

      Weaknesses

      (1) One main control group is missing from the study, the young males treated with 17α-Estradiol.

      Given that the treatment period lasts six months, which extends beyond the young male rats' age range, we aimed to investigate the perturbation of the normal aging process by 17α-Estradiol. Including data from young males could potentially obscure the treatment's effects in aged males due to age-related effects, though similar effects between young and aged animals may exist. Long-term hormone treatment may exert more developmental effects on young animals than on old ones. Consequently, we decided to exclude this group from our initial sample design. We apologize for this omission.

      (2) Even though the technical approach is a sophisticated one, analyzing the whole rat hypothalamus instead of specific nuclei or subregions makes the study weaker.

      The precise targets of 17α-Estradiol within the hypothalamus remain unresolved. Selecting a specific nucleus for study is challenging. The supervised clustering method described in this manuscript allows us to identify the more sensitive neuron subtypes influenced by 17α-Estradiol and aging across the entire hypothalamus, without the need to isolate specific nuclei in a disturbed hypothalamic environment.

      (3) Although the authors claim to have several findings, the data fail to support these claims. You may mean the claim of a senescent phenotype in Crh neurons induced by 17α-estradiol.

      Thanks. We have changed "senescent phenotype" to "stressed phenotype" in the abstract and results to avoid such a claim. The stressed phenotype may be induced by heightened functional activity in the cells, potentially indicating higher cellular activity.

      (4) The study is about improving ageing but no physiological data from the study demonstrated such a claim with the exception of the testes histology which was not properly analyzed and was not even significantly different between the groups.

      The primary objective of this study is to elucidate the effects of 17α-Estradiol on the endocrine system in the aging hypothalamus; exploring anti-aging effects is not the main focus. Previous publications (PMID: 37886966, PMID: 37048056, PMID: 22884327) indicate that down-regulated GnRH and testosterone levels, along with elevated mTOR signaling, are characteristic indicators of aging in these organs. The contrasting signaling networks related to metabolism and synaptic processes significantly differentiate young and aging hypothalami, and 17α-Estradiol helps rebalance these networks, suggesting its potential anti-aging effects.

      (5) Overall, the study remains descriptive with no physiological data to demonstrate that any of the effects on hypothalamic gene expression are related to metabolic, synaptic, or other functions.

      The study focuses on investigating cellular responses and endocrine changes in the aging hypothalamus induced by 17α-estradiol, utilizing single-nucleus RNA sequencing (snRNA-seq) and a novel data mining methodology to analyze various neuron subtypes. It is important to note that this study is not primarily aimed at exploring anti-aging effects. Consequently, we have revised the claim in the abstract from “the effects of 17α-estradiol in anti-aging in neurons” to “the effects of 17α-estradiol on aging neurons.” We observed that the lower overall metabolism and increased expression levels of cellular processes in the synapses align with findings previously reported regarding 17α-estradiol. To address the lack of physiological data and the challenges in measuring multiple endocrine factors due to their volatile nature, we employed several bidirectional Mendelian randomization (MR) analyses of genome-wide association study (GWAS) data related to these serum endocrine factors to identify their mutual causal effects.

      Reviewing Editor Comment:

      Based on the Public Reviews and Recommendations for Authors, the Reviewers strongly recommend that revisions include an experimental demonstration of the physiological effects of the treatment on ageing in rats as well as the CRH-senescence link. Additional analysis of the glia would greatly strengthen the study, as would inclusion of females and young male controls. The important point was also raised that the work linking 17a-estradiol was performed in mice, and the link with lifespan in rats is not known. Discussion of this point is recommended.

      We thank the reviewers for their constructive feedback. Regarding the recommendations in the Public Reviews and Recommendations for Authors:

      a)  Physiological effects & CRH-senescence link:

      We acknowledge that 17α-estradiol has been reported to extend lifespan in male rats, consistent with findings in male mice (PMID: 33289482). This point has now been noted in the Introduction. We regret that further experimental validation of the treatment's physiological effects on aging in rats was beyond the scope of this study.

      b) Phenotype terminology:

      In response to concerns about the "senescent" characterization of CRH neurons, we have revised this terminology to "stressed phenotype" throughout the abstract and results. While we were unable to conduct additional experiments to confirm senescence markers, this revised description better reflects the heightened cellular activity observed (as evidenced by elevated serum GnRH and testosterone levels), without implying confirmed senescence.

      c) Glial cell analysis:

      To address questions about glial cell function during treatment, we have added new enrichment analysis data of differentially expressed genes in microglia and astrocytes from young (Y), old (O), and old treated (O.T) groups in Figure 2—figure supplement 3. This analysis reveals that microglia exhibit contrasting synaptic-related cellular processes compared to total neurons.

      d) Female and young controls:

      We sincerely apologize for the absence of female subjects and young male controls in the current study. The reviewers' suggestion to examine the male-specific effects of 17α-estradiol using female controls represents an excellent direction for future research, which we plan to pursue in upcoming studies.

      Reviewer #2 (Recommendations For The Authors):

      General comments:

      (1) The manuscript is very hard to read. Proofreading and editing by software or a professional seems necessary. The words "enhanced", "extensive" etc. are not always used in the right way.

      Thanks for the suggestion. We have proofread and edited the manuscript. The usage of "enhanced" and "extensive" was also revised in most sentences.

      (2) The numbers of animals and samples are not well explained. Is it 9 rats overall or per group? If there are 8 testes samples per group, should we assume that there were 4 rats per group? How was the pooling of the hypothalami done? Were all the hypothalami from each group pooled together? A small table with the animals per group and the samples would help.

      We appreciate your reminder regarding the initial mistake in our manuscript preparation. In the preliminary submission, we reported 9 rats based solely on sequencing data and data mining. The revised version (v1) now includes additional experimental data, with an effective total of 12 animals (4 per group). Unfortunately, we overlooked updating this information in the v1 submission. We have since added detailed information in the Materials and Methods sections: Animals, Treatment and Tissues, and snRNA-seq Data Processing, Batch Effect Correction, and Cell Subset Annotation.

      (3) The Clustering is wrong. There are genes in there that do not fall into any of the 3 categories: Neurotransmitters, Receptors, Hormones.

      We acknowledge the error in gene clustering and have implemented the following corrections:

      (a) The description has been updated to state: 'Vast majority of these subtypes were clustered by neuropeptides, hormones, and their receptors among all neurons.'

      (b) Genes not belonging to these three categories have been substantially removed.

      (c) The neuropeptide category (now including several growth hormones) has been expanded to 104 genes, while their corresponding receptors (including several sex hormone receptors) now comprise 105 genes.

      (4) The coloring of groups in the graphs is inconsistent. It must be more homogeneous to make it easier to identify.

      We have changed the colors of groups in Fig. 1D to make the color of cell clusters consistent in Fig. 1A-D.

      (5) The groups c1-c4 are not well explained. How did the authors come up with these?

      We have added more descriptions of c1-c4 in materials and methods in the new version.

      (6) In most cases it's not clear if the authors are talking about cell numbers that express a certain mRNA, the level of expression of a certain mRNA, or both. They need to do a better job using more precise descriptions instead of using general terms such as "signatures", "expression profiles", "affected neurons" etc. It is very hard to understand if the number of neurons is compared between the groups or the gene expression.

      We have changed "signatures" to "gene signatures" to make the meaning more precise. "Affected neurons" was also changed to "sensitive neurons". We apologize that we were unable to find a better alternative to "expression profiles".

      (7) Sometimes there are claims made without justification or a reference. For example, the claim about the senescence of CRH neurons due to the upregulation of mitochondrial genes and downregulation of adherence junction genes (lines 326-328) should be supported by a reference or own findings.

      The term "senescence" is not appropriate here. We have changed it to "stressed phenotype" or "aberrant changes" in the abstract and results.

      (8) Young males treated with Estradiol as a control group is necessary and it is missing.

      Your suggestion is appreciated; however, the treatment duration for the aged rats (O.T) was set at 6 months, while the young rats were only 4 months old. This disparity makes it challenging to align treatment timelines for the young animals. The primary aim of this study is to investigate the perturbation of the aging process by 17α-estradiol, and any distinct age-related effects observed in young males might complicate our understanding of its role in aged males, though similar endocrine effects may exist in the young animals. Long-term hormone treatment may exert more developmental effects on young animals than on old ones. Therefore, we made the decision to exclude the young samples in our initial study design. We apologize for any confusion this may have caused.

      Specific Comments:

      Line 28: "elevated stresses and decreased synaptic activity": Please make this clearer. Can't claim changes in synaptic activity by gene expression.

      We have changed it to "the expression level of pathways involved in synapse"

      Line 32: "increased Oxytocin": serum Oxytocin.

      We have added the “serum”.

      Line 52 - 54: Any studies from rats?

      Thanks. In rats, 17α-estradiol has also been reported to have metabolic roles similar to those in mice (PMID: 33289482), and we have added this to the references. It is very useful for this manuscript.

      Line 62 - 65: It wasn't investigated thoroughly in this paper so why was it suggested in the introduction?

      We have deleted this sentence as suggested.

      Line 70: "synaptic activity" Same as line 28.

      We have changed it to "pathways involved in synaptic activity".

      Line 79: Why were aged rats caged alone and young by two? Could that introduce hypothalamic gene expression effects?

      The young males could be housed together peacefully, but the aged males fight and therefore must be housed individually.

      Lines 78, 99, 109-110: It is not clear how many animals per group were used and how many samples per group were used separately and/or grouped. Please be more specific.

      We have added this information to Materials and methods/Animals, treatment and tissues and Materials and methods/snRNA-seq data processing, batch effect correction, and cell subset annotation.

      Line 205: "in O" please add "versus young.".

      We have changed accordingly.

      Line 207: replace "were" with "was"

      We have alternatively changed the "proportion" to "proportions".

      Line 208: replace "that" with "compared to" and after "in O.T." add "compared to?"

      We have changed accordingly.

      Line 223: "O.T." compared to what? Figure?

      We have changed it accordingly.

      Line 227: Figure?

      We have added (Figure 1E) accordingly.

      Line 229: "synaptic activity" Same as line 28.

      We have revised it.

      Line 235: "synaptic activity" and "neuropeptide secretion" Same as line 28.

      We have revised it.

      Line 256: "interfered" please revise.

      We changed to "exerted".

      Line 263: "on the contrary" please revise.

      We have changed "on the contrary" to "opposite".

      Line 270: "conversed" did you mean "conserved"?

      We have changed "conversed" to "inversed".

      Line 296-298: Please explain. Why would these be side effects?

      This is hard to explain; therefore, we deleted the words "side effects".

      Line 308: "synaptic activity" Same as line 28.

      We have changed it to "expression levels of synapse-related cellular processes".

      Line 314: "and sex hormone secretion and signaling" Isn't this expected?

      Yes, it is expected. We have added it to the sentence "and, as expected, sex hormone secretion and signaling".

      Line 325-328: Why is this senescence? Reference?

      We have added “potent” to it.

      Line 360-361: This doesn't show elevated synaptic activity.

      "elevated synaptic activity" was changed to "The elevated expression of synapse-related pathways"

      Line 363-364: "Unfortunately" is not a scientific expression and shows bias.

      We have changed it to "Notably".

      Line 376: Similar as above.

      Yes, we have changed it to "in contrast".

      Lines 382-385: This is speculation. Please move to discussion.

      We apologize, but we consider the causal effects derived from the MR results to be evidence. As such, we have not changed it.

      Line 389: Please revise "hormone expressing".

      We have changed it accordingly.

      Line 401: Isn't this effect expected due to feedback inhibition of the biochemical pathway? Please comment.

      The binding capability of 17alpha-estradiol to estrogen receptors and its role in transcriptional activation remain core questions surrounded by controversy. Earlier studies suggest that 17alpha-estradiol exhibits at least 200 times less activity than 17beta-estradiol (PMID: 2249627, PMID: 16024755). However, recent data indicate that 17alpha-estradiol shows comparable genomic binding and transcriptional activation through estrogen receptor α (Esr1) to that of 17beta-estradiol (PMID: 33289482). Additionally, there is evidence that 17alpha-estradiol has anti-estrogenic effects in rats (PMID: 16042770). These findings imply possible feedback inhibition via estrogen receptors. Furthermore, 17alpha-estradiol likely differs from 17beta-estradiol due to its unique metabolic consequences and its potential to slow aging in males, an effect not attributed to 17beta-estradiol. For instance, neurons are also targets of 17alpha-estradiol, with Esr1 not being the sole target (PMID: 38776045). Intriguingly, neurons expressing Ar and Esr1 ranked among the top 20 most perturbed receptor subtypes during aging (O vs Y), but were no longer ranked in this group following treatment (O.T vs Y and O.T vs O comparisons). This indicates that 17α-estradiol administration attenuated age-associated perturbation in these neuronal subtypes, which may be a consequence of potential feedback (Figure 3D). Nevertheless, the precise effective targets of 17alpha-estradiol are still unresolved.

      Line 409: This conclusion cannot be made because the effect is not statistically significant. Can say "trend" etc.

      Thanks for the recommendation. We have added "potential" in front of the conclusion.

      Line 426: "suggesting" please revise.

      Sorry, it is used as a verb here.

      Lines 426-428: This is speculation. Please move to discussion.

      The elevated GnRH levels in O.T., observed through EIA analysis, suggest a deduction regarding the direct causal effects of 17alpha-estradiol on various endocrine factors related to feeding, energy homeostasis, reproduction, osmotic regulation, stress response, and neuronal plasticity through MR analysis. Thus, we have not amended our position. We apologize for any confusion.

      Lines 431-432: improved compared to what?

      The statement has been revised to "The most striking role of 17α-estradiol treatment revealed in this study showed that HPG axis was substantially improved in the levels of serum Gnrh and testosterone".

      Line 435: " Estrogen Receptor Antagonists". Please revise.

      Thanks for the recommendation. We have changed it to "estrogen receptor antagonists".

      Line 438: "Secrete". Please revise.

      Sorry, it is "secret".

      Lines 439-449: None of this has been demonstrated. Please remove these conclusions.

      We appreciate the reviewer's scrutiny regarding lines 439-449. While these statements should not be interpreted as definitive conclusions from our current data, we propose they serve as clinically relevant discussion points worthy of exploration. Our findings demonstrate 17α-estradiol's role in modulating testosterone levels in aged males. This mechanistic insight warrants consideration of its therapeutic potential for age-related hypogonadism - a hypothesis we believe merits discussion given the compound's specific endocrine effects.

      Lines 450-457: No females were included in this study. Why? Also, why is this discussed? It is relevant but doesn't belong in this manuscript since it was not studied here.

      Testosterone levels are crucial for male health, while estradiol levels are essential for the health and fertility of females. Previous studies have demonstrated that 17α-estradiol does not contribute to lifespan extension in females. Given the effects of 17α-estradiol on males—specifically, its role in promoting testosterone and reducing estradiol levels—we believe it is important to discuss the potential sex-biased effects of 17α-estradiol, as this could inform future investigations. We have refined this section to clarify that these points represent mechanistic hypotheses derived from our male data and existing literature, not conclusions about unstudied female physiology. This framing maintains the discussion's scientific value while respecting the study's scope.

      Lines 458-459: This was not demonstrated in this article. Please remove.

      We have restricted the claim to "expression level of energy metabolism in hypothalamic neurons".

      Line 464: "Promoted lifespan extension" Not demonstrated. Please remove.

      The end of the sentence was revised to "which may be a contributing factor in promoting lifespan extension".

      Line 466: "Showed" No.

      The whole sentence was deleted in the new version.

      Line 483: "the sex-based effects". Not studied here.

      Since the changes in testosterone levels are significant in this dataset and this hormone has a sex-biased nature, we find it worthwhile to suggest this as a topic for future investigation. We have added "which needs further verification in the future" at the end of this sentence.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      In this manuscript, Dillard and colleagues integrate cross-species genomic data with a systems approach to identify potential driver genes underlying human GWAS loci and establish the cell type(s) within which these genes act and potentially drive disease. Specifically, they utilize a large single-cell RNA-seq (scRNA-seq) dataset from an osteogenic cell culture model - bone marrow-derived stromal cells cultured under osteogenic conditions (BMSC-OBs) - from a genetically diverse outbred mouse population called the Diversity Outbred (DO) stock to discover network driver genes that likely underlie human bone mineral density (BMD) GWAS loci. The DO mice segregate over 40M single nucleotide variants, many of which affect gene expression levels, therefore making this an ideal population for systems genetic and co-expression analyses. The current study builds on previously published work from the same group that used co-expression analysis to identify co-expressed "modules" of genes that were enriched for BMD GWAS associations. In this study, the authors utilize a much larger scRNA-seq dataset from 80 DO BMSC-OBs, infer co-expression-based and Bayesian networks for each identified mesenchymal cell type, focused on networks with dynamic expression trajectories that are most likely driving differentiation of BMSC-OBs, and then prioritized genes ("differentiation driver genes" or DDGs) in these osteogenic differentiation networks that had known expression or splicing QTLs (eQTL/sQTLs) in any GTEx tissue that colocalized with human BMD GWAS loci. The systems analysis is impressive, the experimental methods are described in detail, and the experiments appear to be carefully done. The computational analysis of the single-cell data is comprehensive and thorough, and the evidence presented in support of the identified DDGs, including Tpx2 and Fgfrl1, is for the most part convincing. 
      Some limitations in the data resources and methods hamper enthusiasm somewhat and are discussed below. Overall, while this study will no doubt be valuable to the BMD community, the cross-species data integration and analytical framework may be more valuable and generally applicable to the study of other diseases, especially for diseases with robust human GWAS data but for which robust human genomic data in relevant cell types is lacking. 

      Specific strengths of the study include the large scRNA-seq dataset on BMSC-OBs from 80 DO mice, the clustering analysis to identify specific cell types and sub-types, the comparison of cell type frequencies across the DO mice, and the CELLECT analysis to prioritize cell clusters that are enriched for BMD heritability (Figure 1). The network analysis pipeline outlined in Figure 2 is also a strength, as is the pseudotime trajectory analysis (results in Figure 3). One weakness involves the focus on genes that were previously identified as having an eQTL or sQTL in any GTEx tissue. The authors rightly point out that the GTEx database does not contain data for bone tissue, but they reason that eQTLs can be shared across many tissues. This assumption is valid for many cis-eQTLs, but it could also exclude many genes as potential DDGs with effects that are specific to bone/osteoblasts. Indeed, the authors show that important BMD driver genes have cell-type-specific eQTLs. Furthermore, the mesenchymal cell type-specific co-expression analysis by iterative WGCNA identified an average of 76 co-expression modules per cell cluster (range 26-153). Based on the limited number of genes that are detected as expressed in a given cell due to sparse per-cell read depth (400-6200 reads/cell) and dropouts, it's hard to believe that as many as 153 co-expression modules could be distinguished within any cell cluster. I would suspect some degree of model overfitting here and would expect that many/most of these identified modules have very few gene members, but the methods list a minimum module size of 20 genes. How do the numbers of modules identified in this study compare to other published scRNA-seq studies that use iterative WGCNA? 

      In the section "Identification of differentiation driver genes (DDGs)", the authors identified 408 significant DDGs and found that 49 (12%) were reported by the International Mouse Knockout [sic] Consortium (IMPC) as having a significant effect on whole-body BMD when knocked out in mice. Is this enrichment significant? E.g., what is the background percentage of IMPC gene knockouts that show an effect on whole-body BMD? Similarly, they found that 21 of the 408 DDGs were genes that have BMD GWAS associations that colocalize with GTEx eQTLs/sQTLs. Given that there are > 1,000 BMD GWAS associations, is this enrichment (21/408) significant? Recommend performing a hypergeometric test to provide statistical context to the reported overlaps here. 

      We thank the reviewer for their constructive feedback and thoughtful questions. With regard to iterativeWGCNA, a larger number of modules is sometimes an outcome of the analysis, as reported in the iterativeWGCNA preprint (Greenfest-Allen et al., 2017). While we did not make a comparison to other works leveraging this tool for scRNA-seq, it has been used broadly across other published studies, such as PMID: 39640571, 40075303, 33677398, 33653874. While model overfitting, as you mention, may be a cause for more modules, the Bayesian network analysis we perform after iterativeWGCNA highlights smaller aspects of coexpression modules, as opposed to focusing on the entirety of any given module.

      We did not perform enrichment or statistical tests as our goal was to simply highlight attributes or unique features of these genes for additional context.
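
      For readers who wish to gauge such overlaps themselves, the hypergeometric test recommended by the reviewer can be sketched as follows. This is a minimal illustration and not an analysis performed by the authors. The counts 49 and 408 come from the review above; the background values `N_BACKGROUND` and `K_BACKGROUND` are hypothetical placeholders (they are not reported in this exchange) and would need to be replaced with the real number of genes phenotyped by the IMPC and the number of those with a significant BMD effect. The helper name `hypergeom_upper_tail` is likewise introduced here for illustration only.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric.

    N: size of the background gene set
    K: number of background genes carrying the annotation
    n: number of genes drawn (e.g. the DDG list)
    k: observed number of drawn genes carrying the annotation
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the exact probability mass from k up to the largest possible overlap.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Observed values taken from the review: 49 of 408 DDGs have an IMPC BMD effect.
N_DDG = 408
K_OBSERVED = 49

# HYPOTHETICAL placeholders, not from the source: replace with real IMPC numbers.
N_BACKGROUND = 8000   # assumed number of genes with IMPC BMD phenotyping
K_BACKGROUND = 400    # assumed number of those with a significant BMD effect

p_value = hypergeom_upper_tail(K_OBSERVED, N_BACKGROUND, K_BACKGROUND, N_DDG)
print(f"Upper-tail hypergeometric p-value (placeholder background): {p_value:.3g}")
```

      The same function could be applied to the 21/408 GWAS colocalization overlap once an appropriate background is defined. Because the result depends entirely on the chosen background, the placeholder p-value printed here carries no biological meaning.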

      Reviewer #2 (Public review): 

      Summary: 

      In this manuscript, Farber and colleagues have performed single-cell RNAseq analysis on bone marrow-derived stem cells from DO Mice. By performing network analysis, they look for driver genes that are associated with bone mineral density GWAS associations. They identify two genes as potential candidates to showcase the utility of this approach. 

      Strengths: 

      The study is very thorough and the approach is innovative and exciting. The manuscript contains some interesting data relating to how cell differentiation is occurring and the effects of genetics on this process. The section looking for genes with eQTLs that differ across the differentiation trajectory (Figure 4) was particularly exciting. 

      Weaknesses: 

      The manuscript is in parts hard to read due to the use of acronyms and there are some questions about data analysis that need to be addressed. 

      We thank the reviewer for their feedback and shared enthusiasm for our work. We tried to minimize the use of technical acronyms as much as we could without compromising readability. Additionally, we addressed questions regarding aspects of data analysis. 

      Reviewer #1 (Recommendations for the authors):

      (1) For increased transparency and to allow reproducibility, it would be necessary for the scripts used in the analysis to be shared along with the publication of the preprint. Also, where feasible, sharing the processed data in addition to the raw data would allow the community greater access to the results and be highly beneficial. 

      Thank you for this suggestion. The raw data will be available via GEO accession codes listed in the data availability statement. We will make available scripts for some analyses on our GitHub (https://github.com/Farber-Lab/DO80_project) and processed scRNA-seq data in a Seurat object (.rds) on Zenodo (https://zenodo.org/records/15299631).

      (2) Lines 55-76: I think the summary of previous work here is too long. I understand that they would like to cover what has been done previously, but this seems like overkill. 

      Good suggestion. We have streamlined some of the summary of our previous work.

      (3) Did the authors try to map QTL for cell-type proportion differences in their BMSC-OBs? While 80 samples certainly limit mapping power, the data shown in Figs 4C/D suggest that you might identify a large-effect modifier of LMP/OB1 proportions. 

      We did try to map QTL for cell type proportion differences, but no significant associations were identified. 

      (4) Methods question: Does the read alignment method used in your analysis account for SNPs/indels that segregate among the DO/CC founder strains? If not, the authors may wish to include this in their discussion of study limitations and speculate on how unmapped reads could affect expression results. 

      The read alignment method we used does not account for SNPs/indels from the DO founder strains that fall in RNA transcripts captured in the scRNA-seq data. We have included this as a limitation in our discussion (line 422-424). 

      (5) Much of the discussion reads as an overview of the methods, while a discussion of the results and their context to the existing BMD literature is relatively lacking in comparison.

      We have added additional explanation of the results and context to the discussion (line 381-382, 396-407). 

      (6) Figure 1E and lines 146-149: Adjusted p values should be reported in the figure and accompanying text instead of switching between unadjusted and adjusted p values. 

      We updated Figure 1e to portray adjusted p-values, listed the adjusted p-values in legend of Figure 1e, and listed them in the main text (line 153-154).

      (7) Why do the authors bring the IMPC KO gene list into the analysis so late? This seems like a highly relevant data resource (more so than the GTEx eQTLs/sQTLs) that could have been used much earlier to help identify DDGs. 

      Given that our scRNA-seq data is also from mice, we did choose to integrate information from the IMPC to highlight supplemental features of genes in networks (i.e., genes that have an experimentally-tested and significant effect on BMD in mice). However, our primary goal was to inform human GWAS and leverage our previous work in which we identified colocalizations between human BMD GWAS and eQTL/sQTL in a human GTEx tissue, which is why this information was used to guide our network analysis.

      (8) Does Fgfrl1 and/or Tpx2 have a cis-eQTL in your BMSC-OB scRNA-seq dataset? 

      We did not identify cis-eQTL effects for Fgfrl1 and Tpx2.

      (9) Figure 4B-C: These eQTLs may be real, but based on the diplotype patterns in Figure 4C, I suspect they are artifacts of low mapping power that are driven by rare genotype classes with one or two samples having outlier expression results. For example, if you look at the results in Fig 4C for S100a1 expression, the genotype classes with the highest/lowest expression have lower sample numbers. In the case of Pkm eQTL showing a PWK-low effect, the PWK genome has many SNPs that differ from the reference genome in the 3' UTR of this gene, and I wonder if reads overlapping these SNPs are not aligning correctly (see point 4 above) and resulting (falsely) in lower expression values for samples with a PWK haplotype. 

      As mentioned above, our alignment method did not consider DO founder genetic variation that is specifically located in the 3’ end of RNA transcripts in the scRNA-seq data. We have included this as a limitation in our discussion (line 422-424).

      In future studies, we intend to include larger populations of mice to potentially overcome, as you mention, any artifacts that may be attributable to low statistical power, rare genotype classes, or outlier expression.

      Reviewer #2 (Recommendations for the authors):

      Major Points 

      (1) The authors hypothesize "that many genes impacting BMD do so by influencing osteogenic differentiation or possibly bone marrow adipogenic differentiation". However, cell type itself does not correlate with any bone trait. Does this indicate that the hypothesis is not entirely correct, as genes that drive these phenotypes would not be enriched in one particular cell type? The authors have previously identified "high-priority target genes". So, are there any cell types that are enriched for these target genes? If not, this would indicate that all these genes are more ubiquitously expressed and this is probably why they would have a greater effect on the overall bone traits. Furthermore, are the 73 eGenes (so genes with eQTLs in a particular cell type that change around cell type boundaries) or the DDGs (Table 1) enriched for these high-priority target genes? 

      The bone traits measured in the DO mice are complex and impacted by many factors, including the differentiation propensity and abundance of certain cell types, both within and outside of bone. Though we did not identify correlations between cell type abundance and the bone traits we measured, we tailored our investigations to focus on cellular differentiation using the scRNA-seq data. However, future studies would need to be performed to investigate any connections between cellular differentiation, cell type abundance, and bone traits.

      We did not perform enrichment analyses of either the target genes identified from our other work or eGenes identified here, but instead used the target gene list to center our network analysis and the eGenes to showcase the utility of the DO mouse population.

      (2) The readability of the paper could be improved by minimising the use of acronyms and there are several instances of confusing wording throughout the paper. In many cases, this can be solved by re-organising sentences and adding a bit more detail. For example, it was unclear how you arrived at Fgfrl1 or Tpx2.

      One of the goals of our study was to identify genes that have (to our knowledge) little to no known connection to BMD. We chose to highlight Fgfrl1 and Tpx2 because there is minimal literature characterizing these genes in the context of bone, which we address in the results (lines 296-297). Additionally, we prioritized these genes in our previous work, and they were identified in this study through our network analyses of the scRNA-seq data, which we mention in the results (lines 276-279).

      (3) Technical aspects of the assay. In Figure 1d you show that the cell populations vary considerably between different DO mice. It would be useful to give some sense of the technical variance of this assay given that the assay involves culturing the cells in an exogenous environment. This could take the form of tests between mice within the same inbred strain, or even between different legs of the same DO mice to show that results are technically very consistent. It might also be prudent to identify that this is a potential limitation of the approach as in vitro culturing has the potential to substantially change the cell populations that are present. 

      We agree that in vitro culturing and the preparation of single cells for scRNA-seq are unavoidable sources of technical variation in this study. However, the total number of cells contributed by each of the 80 DO mice after data processing does not appear to be skewed and the distribution appears normal (see added figures, now included as Supplemental Figure 3). Therefore, technical variation is at least consistent across all samples. Nevertheless, we have mentioned the potential for technical variation artifacts in our study in the discussion (lines 414-416).

      (4) Need for permutation testing. "We identified 563 genes regulated by a significant eQTL in specific cell types. In total, 73 genes with eQTLs were also tradeSeq-identified genes in one or more cell type boundaries". These types of statements are fine but they need to be backed up with permutation testing to show that this level of enrichment is greater than one would expect by chance. 

      We did not perform enrichment tests, as our only goals were to (1) determine whether eQTL could be resolved in the DO mouse population using our scRNA-seq data and (2) predict in which cell type the eQTL and its associated eGene may have an effect.
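
      For readers unfamiliar with the reviewer's suggestion, the following is a minimal sketch of a permutation test for the overlap between two gene sets. It is not an analysis performed in the study: the function name `overlap_permutation_test`, the toy gene identifiers, and the background of 1,000 genes are hypothetical, and the choice of background (for example, all genes expressed in a given cell type) is an assumption that would strongly affect the result.

```python
import random


def overlap_permutation_test(set_a, set_b, background, n_perm=10000, seed=0):
    """Empirical p-value for the overlap between set_a and set_b.

    The null distribution is built by repeatedly drawing len(set_a) genes
    at random (without replacement) from the background and counting how
    many of them fall in set_b.
    """
    rng = random.Random(seed)
    background = sorted(background)  # fixed order keeps the draw reproducible
    set_a, set_b = set(set_a), set(set_b)
    observed = len(set_a & set_b)
    k = len(set_a)

    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(background, k)
        null_overlap = sum(1 for gene in sample if gene in set_b)
        if null_overlap >= observed:
            hits += 1

    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value


# Toy data (hypothetical): 1,000 background genes, 50 "trajectory" genes,
# and 40 "eGenes", of which 20 are also trajectory genes.
background = [f"gene_{i}" for i in range(1000)]
trajectory_genes = set(background[:50])
egenes = set(background[:20]) | set(background[500:520])

observed, p_value = overlap_permutation_test(
    egenes, trajectory_genes, background, n_perm=2000
)
print(f"observed overlap = {observed}, empirical p = {p_value:.4f}")
```

      Applied to the data discussed here, `set_a` would correspond to the 563 eGenes and `set_b` to the tradeSeq-identified genes, with an observed overlap of 73; the empirical p-value is the fraction of random draws whose overlap is at least that large.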

      (5) The main novelty of the paper seems to be that you have used single-cell RNA seq (given that you appear to have already detailed the candidates at the end). I don't think this makes the paper less interesting, but I think you need to reframe the paper more about the approach, and not the specific results. How you landed on these candidates is also not clear. So the paper might be improved by more robustly establishing the workflow and providing guidelines for how studies like this should be conducted in the future. 

      We sought to not only devise a rigorous approach to analyze our single cell data, but also showcase the utility of the approach in practice by highlighting targets for future research (i.e., Fgfrl1 and Tpx2).

      Our goal was to identify novel genes, and we landed on these candidate genes (Fgfrl1 and Tpx2) because they had substantial data supporting their causality and they have yet to be fully characterized in the context of bone and BMD (lines 295-297).

      In regards to establishing the workflow, we have included rationale for specific aspects of our approach throughout the paper. For example, Figure 2 itemizes each step of our network analysis and we explain why each step is utilized throughout various parts of the results (e.g., lines 168-170, 179-181, 191-193, 202-203, 257-260, 276-277).

      We have added a statement advocating for large-scale scRNA-seq from genetically diverse samples and network analyses for future studies (lines 436-438).

      Minor Points 

      (1) In the summary you use the word "trajectory". Trajectories for what? I assume the transition between cell types, but this is not clear. 

      We added text to clarify the use of trajectory in the summary (line 34).

      (2) This sentence: "By identifying networks enriched for genes implicated in GWAS we predicted putatively causal genes for hundreds of BMD associations based on their membership in enriched modules." is also not clear. Do you mean: "we predicted putatively causal genes by identifying clusters of co-expressed genes that were enriched for GWAS genes"? It is not clear how you identify the causal gene in the network. Is this just based on the hub gene?

      The aforementioned sentence has since been removed to streamline the introduction, as suggested by Reviewer 1.

      In regards to causal gene identification, it is not based on whether a gene is a hub gene. We prioritized a DDG (and its associated networks) if it was a causal gene that we identified in our previous work as having eQTL/sQTL in a GTEx tissue that colocalizes with human BMD GWAS.

      (3) Figure 3C. This is good but the labels are quite small. Would be good to make all the font sizes larger. 

      We have enlarged Figure 3C.

      (4) Line 341 in the Discussion should be "pseudotemporal". 

      We have edited “temporal” to “pseudotemporal”.

    1. Reviewer #1 (Public review):

      This is a well-designed and very interesting study examining the impact of imprecise feedback on outcomes on decision-making. I think this is an important addition to the literature and the results here, which provide a computational account of several decision-making biases, are insightful and interesting.

      I do not believe I have substantive concerns related to the actual results presented; my concerns are more related to the framing of some of the work. My main concern is regarding the assertion that the results prove that non-normative and non-Bayesian learning is taking place. I agree with the authors that their results demonstrate that people will make decisions in ways that demonstrate deviations from what would be optimal for maximizing reward in their task under a strict application of Bayes' rule. I also agree that they have built reinforcement learning models which do a good job of accounting for the observed behavior. However, the Bayesian models included are rather simple: per the authors' descriptions, they are applications of Bayes' rule with either fixed or learned credibility for the feedback agents. In contrast, several versions of the RL models are used, each modified to account for different possible biases. However, more complex Bayes-based models exist, notably active inference but also the hierarchical Gaussian filter. These formalisms are able to accommodate more complex behavior, such as affect and habits, which might make them more competitive with RL models. I think it is entirely fair to say that these results demonstrate deviations from an idealized and strict Bayesian context; however, the equivalence here of Bayesian and normative is, I think, misleading or at least requires better justification/explanation. This is because a great deal of work has been done to show that Bayes optimal models can generate behavior or other outcomes that are clearly not optimal to an observer within a given context (consider hallucinations for example) but which make sense in the context of how the model is constructed as well as the priors and desired states the model is given.

      As such, I would recommend that the language be adjusted to carefully define what is meant by normative and Bayesian and to recognize that work that is clearly Bayesian could potentially still be competitive with RL models if implemented to model this task. An even better approach would be to directly use one of these more complex modelling approaches, such as active inference, as the comparator to the RL models, though I would understand if the authors would want this to be a subject for future work.

      Abstract:

      The abstract is lacking in some detail about the experiments done, but this may be a limitation of the required word count? If word count is not an issue, I would recommend adding details of the experiments done and the results. One comment is that there is an appeal to normative learning patterns, but this suggests that learning patterns have a fixed optimal nature, which may not be true in cases where the purpose of the learning (e.g. to confirm the feeling of safety of being in an in-group) may not be about learning accurately to maximize reward. This can be accommodated in a Bayesian framework by modelling priors and desired outcomes. As such the central premise that biased learning is inherently non-normative or non-Bayesian I think would require more justification. This is true in the introduction as well.

      Introduction:

      As noted above, the conceptualization of Bayesian learning being equivalent to normative learning, I think, requires further justification. Bayesian belief updating can be biased and non-optimal from an observer perspective, while being optimal within the agent doing the updating if the priors/desired outcomes are set up to advantage these "non-optimal" modes of decision making.

      Results:

      I wonder why the agent was presented before the choice - since the agent is only relevant to the feedback after the choice is made. I wonder if that might have induced any false association between the agent identity and the choice itself. This is by no means a critical point but would be interesting to get the authors' thoughts.

      The finding that positive feedback increases learning is one that has been shown before and depends on valence, as the authors note. They expanded their reinforcement learning model to include valence; but they did not modify the Bayesian model in a similar manner. This lack of a valence or recency effect might also explain the failure of the Bayesian models in the preceding section where the contrast effect is discussed. It is not unreasonable to imagine that if humans do employ Bayesian reasoning that this reasoning system has had parameters tuned based on the real world, where recency of information does matter; affect has also been shown to be incorporable into Bayesian information processing (see the work by Hesp on affective charge and the large body of work by Ryan Smith). It may be that the Bayesian models chosen here require further complexity to capture the situation, just like some of the biases required updates to the RL models. This complexity, rather than being arbitrary, may be well justified by decision making in the real world.

      The methods mention several symptom scales; it would be interesting to have the results of these and any interesting correlations noted. It is possible that some of the individual variability here could be related to these symptoms, which could introduce precision parameter changes in a Bayesian context and things like reward sensitivity changes in an RL context.

      Discussion:

      (For discussion, not a specific comment on this paper): One wonders also about participant beliefs about the experiment or the intent of the experimenters. I have often had participants tell me they were trying to "figure out" a task or find patterns even when this was not part of the experiment. This is not specific to this paper, but it may be relevant in the future to try and model participant beliefs about the experiment especially in the context of disinformation, when they might be primed to try and "figure things out".

      As a general comment, in the active inference literature, there has been discussion of state-dependent actions, or "habits", which are learned in order to help agents more rapidly make decisions, based on previous learning. It is also possible that what is being observed is that these habits are at play, and that they represent the cognitive biases. This is likely especially true given, as the authors note, the high cognitive load of the task. It is true that this would mean that full-force Bayesian inference is not being used in each trial, or in each experience an agent might have in the world, but this is likely adaptive on the longer timescale of things, considering resource requirements. I think in this case you could argue that we have a departure from "normative" learning, but that is not necessarily a departure from any possible Bayesian framework, since these biases could potentially be modified by the agent or eschewed in favor of more expensive full-on Bayesian learning when warranted. Indeed, in their discussion on the strategy of amplifying credible news sources to drown out low-credibility sources, the authors hint at the possibility of longer term strategies that may produce optimal outcomes in some contexts, but which were not necessarily appropriate to this task. As such, the performance on this task- and the consideration of true departure from Bayesian processing- should be considered in this wider context. Another thing to consider is that Bayesian inference is occurring, but that priors present going in produce the biases, or these biases arise from another source, for example factoring in epistemic value over rewards when the actual reward is not large. This again would be covered under an active inference approach, depending on how the priors are tuned. Indeed, given the benefit of social cohesion in an evolutionary perspective, some of these "biases" may be the result of adaptation. For example, it might be better to amplify people's good qualities and minimize their bad qualities in order to make it easier to interact with them; this entails a cost (in this case, not adequately learning from feedback and potentially losing out sometimes), but may fulfill a greater imperative (improved cooperation on things that matter). Given the right priors/desired states, this could still be a Bayes-optimal inference at a social level and as such may be ingrained as a habit which requires effort to break at the individual level during a task such as this.

      The authors note that this task does not relate to "emotional engagement" or "deep, identity-related, issues". While I agree that this is likely mostly true, it is also possible that just being told one is being lied to might elicit an emotional response that could bias responses, even if this is a weak response.

      Comments on revisions:

      In their updated version, the authors have made some edits to address my concerns regarding the framing of the 'normative' Bayesian model, clarifying that they utilized a simple Bayesian model which is intended to adhere in an idealized manner to the intended task structure, though further simulations would have been ideal.

      The authors, however, did not take my recommendation to explore the symptoms in the symptom scales they collected as being a potential source of variability. They note that these were for hypothesis generation and were exploratory, fair enough, but this study is not small and there should have been sufficient sample size for a very reasonable analysis looking at symptom scores.

      However, overall the toned down claims and clarifications of intent are adequate responses to my previous review.

    2. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      This is a well-designed and very interesting study examining the impact of imprecise feedback on outcomes in decision-making. I think this is an important addition to the literature, and the results here, which provide a computational account of several decision-making biases, are insightful and interesting.

      We thank the reviewer for highlighting the strengths of this work.

      I do not believe I have substantive concerns related to the actual results presented; my concerns are more related to the framing of some of the work. My main concern is regarding the assertion that the results prove that non-normative and non-Bayesian learning is taking place. I agree with the authors that their results demonstrate that people will make decisions in ways that demonstrate deviations from what would be optimal for maximizing reward in their task under a strict application of Bayes' rule. I also agree that they have built reinforcement learning models that do a good job of accounting for the observed behavior. However, the Bayesian models included are rather simple, per the author's descriptions, applications of Bayes' rule with either fixed or learned credibility for the feedback agents. In contrast, several versions of the RL models are used, each modified to account for different possible biases. However, more complex Bayes-based models exist, notably active inference, but even the hierarchical Gaussian filter. These formalisms are able to accommodate more complex behavior, such as affect and habits, which might make them more competitive with RL models. I think it is entirely fair to say that these results demonstrate deviations from an idealized and strict Bayesian context; however, the equivalence here of Bayesian and normative is, I think, misleading or at least requires better justification/explanation. This is because a great deal of work has been done to show that Bayes optimal models can generate behavior or other outcomes that are clearly not optimal to an observer within a given context (consider hallucinations for example), but which make sense in the context of how the model is constructed as well as the priors and desired states the model is given.

      As such, I would recommend that the language be adjusted to carefully define what is meant by normative and Bayesian and to recognize that work that is clearly Bayesian could potentially still be competitive with RL models if implemented to model this task. An even better approach would be to directly use one of these more complex modelling approaches, such as active inference, as the comparator to the RL models, though I would understand if the authors would want this to be a subject for future work.

      We thank the reviewer for raising this crucial and insightful point regarding the framing of our results and the definitions of 'normative' and 'Bayesian' learning. Our primary aim in this work was to characterize specific behavioral signatures that demonstrate deviations from predictions generated by a strict, idealized Bayesian framework when learning from disinformation (which we term “biases”). We deliberately employed relatively simple Bayesian models as benchmarks to highlight these specific biases. We fully agree that more sophisticated Bayes-based models (as mentioned by the reviewer, or others) could potentially offer alternative mechanistic explanations for participant behavior. However, we currently do not have a strong notion about which Bayesian models can encompass our findings, and hence, we leave this important question for future work.

      To enhance clarity within the current manuscript, we now avoid the use of the term “normative” to refer to our Bayesian models, using the term “ideal” instead. We also define more clearly what exactly we mean by that notion when the ideal model is described:

      “This model is based on the idealized assumption that during the feedback stage of each trial, the value of the chosen bandit is updated (based on feedback valence and credibility) according to Bayes' rule, reflecting perfect adherence to the instructed task structure (i.e., how true outcomes and feedback are generated).”
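
      To make the idea of an ideal Bayesian update with source credibility concrete, here is a minimal sketch. It is an illustration under stated assumptions rather than the authors' model: it assumes that a source with credibility c reports the true outcome with probability c and the opposite outcome otherwise, and it represents the belief about the chosen bandit's reward probability on a discrete grid. The names `update_belief` and `posterior_mean`, the grid resolution, and the flat prior are all hypothetical.

```python
def update_belief(prior, grid, feedback_positive, credibility):
    """One application of Bayes' rule to a gridded belief.

    prior             -- probabilities over candidate reward rates (sums to 1)
    grid              -- candidate reward rates theta in [0, 1]
    feedback_positive -- True if the source reported a reward
    credibility       -- assumed probability that the source tells the truth
    """
    posterior = []
    for weight, theta in zip(prior, grid):
        # Probability of hearing "reward": a true reward reported honestly,
        # or a true non-reward reported dishonestly.
        p_positive = credibility * theta + (1 - credibility) * (1 - theta)
        likelihood = p_positive if feedback_positive else 1 - p_positive
        posterior.append(weight * likelihood)
    total = sum(posterior)
    return [w / total for w in posterior]


def posterior_mean(belief, grid):
    """Expected reward rate under the current belief."""
    return sum(w * theta for w, theta in zip(belief, grid))


# Flat prior over reward rates 0.00, 0.01, ..., 1.00 (hypothetical choice).
grid = [i / 100 for i in range(101)]
prior = [1 / len(grid)] * len(grid)

for credibility in (0.5, 0.75, 1.0):
    belief = update_belief(prior, grid, True, credibility)
    print(credibility, round(posterior_mean(belief, grid), 3))
```

      Under these assumptions, feedback from a source with credibility 0.5 carries no information and leaves the belief unchanged, while the same positive feedback shifts the belief more strongly as credibility approaches 1.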

      Moreover, we have added a few sentences in the discussion commenting on how more complex Bayesian models might account for our empirical findings:

      “However, as hypothesized, when facing potential disinformation, we also find that individuals exhibit several important biases i.e., deviations from strictly idealized Bayesian strategies. Future studies should explore if and under what assumptions, about the task’s generative structure and/or learner’s priors and objectives, more complex Bayesian models (e.g., active inference (58)) might account for our empirical findings.”

      Abstract:

      The abstract is lacking in some detail about the experiments done, but this may be a limitation of the required word count. If word count is not an issue, I would recommend adding details of the experiments done and the results.

      We thank the reviewer for their valuable suggestion. We have now included more details about the experiment in the abstract:

      “In two experiments, participants completed a two-armed bandit task, where they repeatedly chose between two lotteries and received outcome-feedback from sources of varying credibility, who occasionally disseminated disinformation by lying about true choice outcome (e.g., reporting non reward when a reward was truly earned or vice versa).”
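
      The task structure described in this passage can be sketched as a short simulation. This is an illustration only: the reward probabilities, the credibility values, and the function name `simulate_trial` are hypothetical and are not taken from the study, and credibility is assumed to be the probability that a source reports the true outcome.

```python
import random


def simulate_trial(rng, reward_prob, credibility):
    """Simulate one choice of a bandit and the feedback a source gives.

    Returns (outcome, feedback): the true outcome of the lottery and the
    outcome reported by the source, which is flipped when the source lies.
    """
    outcome = rng.random() < reward_prob
    truthful = rng.random() < credibility
    feedback = outcome if truthful else not outcome
    return outcome, feedback


rng = random.Random(1)
reward_prob = 0.7  # hypothetical reward rate of the chosen bandit
sources = {"high": 1.0, "medium": 0.75, "low": 0.5}  # hypothetical credibilities

for name, credibility in sources.items():
    trials = [simulate_trial(rng, reward_prob, credibility) for _ in range(1000)]
    agreement = sum(outcome == feedback for outcome, feedback in trials) / len(trials)
    print(f"{name}: feedback matched the true outcome on {agreement:.0%} of trials")
```

      In this sketch a source with credibility 1.0 never lies, whereas a source with credibility 0.5 gives feedback that is statistically unrelated to the true outcome.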

      One comment is that there is an appeal to normative learning patterns, but this suggests that learning patterns have a fixed optimal nature, which may not be true in cases where the purpose of the learning (e.g. to confirm the feeling of safety of being in an in-group) may not be about learning accurately to maximize reward. This can be accommodated in a Bayesian framework by modelling priors and desired outcomes. As such, the central premise that biased learning is inherently non-normative or non-Bayesian, I think, would require more justification. This is true in the introduction as well.

      Introduction:

      As noted above, the conceptualization of Bayesian learning being equivalent to normative learning, I think requires further justification. Bayesian belief updating can be biased and non-optimal from an observer perspective, while being optimal within the agent doing the updating if the priors/desired outcomes are set up to advantage these "non-optimal" modes of decision making.

      We appreciate the reviewer's thoughtful comment regarding the conceptualization of "normative" and "Bayesian" learning. We fully agree that the definition of "normative" is nuanced and can indeed depend on whether one considers reward-maximization or the underlying principles of belief updating. As explained above we now restrict our presentation to deviations from “ideal Bayes” learning patterns and we acknowledge the reviewer’s concern in a caveat in our discussion.

      Results:

      I wonder why the agent was presented before the choice, since the agent is only relevant to the feedback after the choice is made. I wonder if that might have induced any false association between the agent identity and the choice itself. This is by no means a critical point, but it would be interesting to get the authors' thoughts.

      We thank the reviewer for raising this interesting point regarding the presentation of the agent before the choice. Our decision to present the agent at this stage was intentional, as our original experimental design aimed to explore the possible effects of "expected source credibility" on participants' choices (e.g., whether knowledge of feedback credibility will affect choice speed and accuracy). However, we found nothing that would be interesting to report.

      The finding that positive feedback increases learning is one that has been shown before and depends on valence, as the authors note. They expanded their reinforcement learning model to include valence, but they did not modify the Bayesian model in a similar manner. This lack of a valence or recency effect might also explain the failure of the Bayesian models in the preceding section, where the contrast effect is discussed. It is not unreasonable to imagine that if humans do employ Bayesian reasoning that this reasoning system has had parameters tuned based on the real world, where recency of information does matter; affect has also been shown to be incorporable into Bayesian information processing (see the work by Hesp on affective charge and the large body of work by Ryan Smith). It may be that the Bayesian models chosen here require further complexity to capture the situation, just like some of the biases required updates to the RL models. This complexity, rather than being arbitrary, may be well justified by decision-making in the real world.

      We thank the reviewer for these additional important ideas, which speak further to the notion that more complex Bayesian frameworks may account for the biases we report.

      The methods mention several symptom scales- it would be interesting to have the results of these and any interesting correlations noted. It is possible that some of the individual variability here could be related to these symptoms, which could introduce precision parameter changes in a Bayesian context and things like reward sensitivity changes in an RL context.

      We included these questionnaires for exploratory purposes, with the aim of generating informed hypotheses for future research into individual differences in learning. Given the preliminary nature of these analyses, we believe further research is required on this important topic.

      Discussion:

      (For discussion, not a specific comment on this paper): One wonders also about participants' beliefs about the experiment or the intent of the experimenters. I have often had participants tell me they were trying to "figure out" a task or find patterns even when this was not part of the experiment. This is not specific to this paper, but it may be relevant in the future to try and model participant beliefs about the experiment especially in the context of disinformation, when they might be primed to try and "figure things out".

      We thank the reviewer for this important recommendation. We agree and this point is included in our caveat (cited above) that future research should address what assumptions about the generative task structure can allow Bayesian models to account for our empirical patterns.

      As a general comment, in the active inference literature, there has been discussion of state-dependent actions, or "habits", which are learned in order to help agents more rapidly make decisions, based on previous learning. It is also possible that what is being observed is that these habits are at play, and that they represent the cognitive biases. This is likely especially true given, as the authors note, the high cognitive load of the task. It is true that this would mean that full-force Bayesian inference is not being used in each trial, or in each experience an agent might have in the world, but this is likely adaptive on the longer timescale of things, considering resource requirements. I think in this case you could argue that we have a departure from "normative" learning, but that is not necessarily a departure from any possible Bayesian framework, since these biases could potentially be modified by the agent or eschewed in favor of more expensive full-on Bayesian learning when warranted. Indeed, in their discussion on the strategy of amplifying credible news sources to drown out low-credibility sources, the authors hint at the possibility of longer-term strategies that may produce optimal outcomes in some contexts, but which were not necessarily appropriate to this task. As such, the performance on this task- and the consideration of true departure from Bayesian processing- should be considered in this wider context.

      Another thing to consider is that Bayesian inference is occurring, but that priors present going in produce the biases, or these biases arise from another source, for example, factoring in epistemic value over rewards when the actual reward is not large. This again would be covered under an active inference approach, depending on how the priors are tuned. Indeed, given the benefit of social cohesion in an evolutionary perspective, some of these "biases" may be the result of adaptation. For example, it might be better to amplify people's good qualities and minimize their bad qualities in order to make it easier to interact with them; this entails a cost (in this case, not adequately learning from feedback and potentially losing out sometimes), but may fulfill a greater imperative (improved cooperation on things that matter). Given the right priors/desired states, this could still be a Bayes-optimal inference at a social level and, as such, may be ingrained as a habit that requires effort to break at the individual level during a task such as this.

      We thank the reviewer for these insightful suggestions speaking further to the point about more complex Bayesian models.

      The authors note that this task does not relate to "emotional engagement" or "deep, identity-related issues". While I agree that this is likely mostly true, it is also possible that just being told one is being lied to might elicit an emotional response that could bias responses, even if this is a weak response.

      We agree with the reviewer that a task involving performance-based bonuses, and particularly one where participants are explicitly told they are being lied to, might elicit a weak emotional response. However, our primary point is that the degree of these responses is expected to be substantially weaker than those typically observed in the broader disinformation literature, which frequently deals with highly salient political, social, or identity-related topics that inherently carry strong emotional and personal ties for participants, leading to much more pronounced affective engagement and potential biases. Our task deliberately avoids such issues, thus minimizing the potential for significant emotion-driven biases. We have toned down the discussion accordingly:

      “This occurs even when the decision at hand entails minimal emotional engagement or pertinence to deep, identity-related, issues.”

      Reviewer #2 (Public review):

      This valuable paper studies the problem of learning from feedback given by sources of varying credibility. The solid combination of experiment and computational modeling helps to pin down properties of learning, although some ambiguity remains in the interpretation of results.

      Summary:

      This paper studies the problem of learning from feedback given by sources of varying credibility. Two bandit-style experiments are conducted in which feedback is provided with uncertainty, but from known sources. Bayesian benchmarks are provided to assess normative facets of learning, and alternative credit assignment models are fit for comparison. Some aspects of normativity appear, in addition to deviations such as asymmetric updating from positive and negative outcomes.

      Strengths:

      The paper tackles an important topic, with a relatively clean cognitive perspective. The construction of the experiment enables the use of computational modeling. This helps to pinpoint quantitatively the properties of learning and formally evaluate their impact and importance. The analyses are generally sensible, and parameter recovery analyses help to provide some confidence in the model estimation and comparison.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      (1) The approach in the paper overlaps somewhat with various papers, such as Diaconescu et al. (2014) and Schulz et al. (forthcoming), which also consider the Bayesian problem of learning and applying source credibility, in terms of theory and experiment. The authors should discuss how these papers are complementary, to better provide an integrative picture for readers.

      Diaconescu, A. O., Mathys, C., Weber, L. A., Daunizeau, J., Kasper, L., Lomakina, E. I., ... & Stephan, K. E. (2014). Inferring the intentions of others by hierarchical Bayesian learning. PLoS computational biology, 10(9), e1003810.

      Schulz, L., Schulz, E., Bhui, R., & Dayan, P. Mechanisms of Mistrust: A Bayesian Account of Misinformation Learning. https://doi.org/10.31234/osf.io/8egxh

      We thank the reviewers for pointing us to this relevant work. We have updated the introduction, mentioning these precedents in the literature and highlighting our specific contributions:

      “To address these questions, we adopt a novel approach within the disinformation literature by exploiting a Reinforcement Learning (RL) experimental framework (36). While RL has guided disinformation research in recent years (37–41), our approach is novel in using one of its most popular tasks: the “bandit task”.”

      We also explain in the discussion how these papers relate to the current study:

      “Unlike previous studies wherein participants had to infer source credibility from experience (30,37,72), we took an explicit-instruction approach, allowing us to precisely assess source-credibility impact on learning, without confounding it with errors in learning about the sources themselves. More broadly, our work connects with prior research on observational learning, which examined how individuals learn from the actions or advice of social partners (72–75). This body of work has demonstrated that individuals integrate learning from their private experiences with learning based on others’ actions or advice—whether by inferring the value others attribute to different options or by mimicking their behavior (57,76). However, our task differs significantly from traditional observational learning. Firstly, our feedback agents interpret outcomes rather than demonstrating or recommending actions (30,37,72).”

      (2) It isn't completely clear what the "cross-fitting" procedure accomplishes. Can this be discussed further?

      We thank the reviewer for requesting further clarification on the cross-fitting procedure. Our study utilizes two distinct model families: Bayesian models and CA models. The credit assignment parameters from the CA models can be treated as “data/behavioural features” corresponding to how choice feedback affects choice-propensities. The cross-fitting approach allows us, in effect, to examine whether these propensity features are predicted by our Bayesian models. To the extent they are not, we can conclude that empirical behavior is “biased”.

      Thus, in our cross-fitting procedure we compare the CA model parameters extracted from participant data (empirical features) with those that would be expected if our Bayesian agents performed the task. Specifically, we first fit participant behavior with our Bayesian models, then simulate these models using the best-fitting parameters and fit those simulations with our CA models. This generates a set of CA parameters that would be predicted if participants' behavior reduced to a Bayesian account. By comparing these predicted Bayesian CA parameters with the actual CA parameters obtained from human participants, the cross-fitting procedure allows us to demonstrate quantitatively that the observed participant parameters deviate significantly from normative Bayesian processing. This provides a robust validation that the biases we identify are not artifacts of the CA model's structure but true departures from normative learning.

      We also note that Reviewer 3 suggested an intuitive way to think about the CA parameters: as analogous to logistic regression coefficients in a “sophisticated regression” of choice on (recency-weighted) choice-feedback. We find this suggestion potentially helpful for readers. Under this interpretation, the purpose of the cross-fitting method can be seen simply as estimating the regression coefficients that would be predicted by our Bayesian agents, and comparing those to the empirical coefficients.

      In our manuscript we now explain this issue more clearly by describing how our model is analogous to a logistic regression:

      “The probability to choose a bandit (say A over B) in this family of models is a logistic function of the contrast in choice-propensities between these two bandits. One interpretation of this model is as a “sophisticated” logistic regression, where the CA parameters take the role of “regression coefficients” corresponding to the change in log odds of repeating the just-taken action in future trials based on the feedback (+/- CA for positive or negative feedback, respectively; the model also includes gradual perseveration which allows for constant log-odds changes that are not affected by choice feedback). The forgetting rate captures the extent to which the effect of each trial on future choices diminishes with time. The Q-values are thus exponentially decaying sums of logistic choice propensities based on the types of feedback a bandit received.”
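      As a rough illustration of the regression reading described above (not the authors' code), the sketch below implements a stripped-down credit-assignment update. It assumes feedback is coded as +1/-1, uses hypothetical function names, and omits perseveration and any inverse-temperature scaling, so it should be read as a simplification of the model family rather than its definition.

```python
import math

def update_q(q, ca, feedback, forgetting):
    """Decay the old propensity, then add +CA or -CA (feedback is +1 or -1).

    Assumption: the names and the +1/-1 coding are illustrative only.
    """
    return (1.0 - forgetting) * q + ca * feedback

def p_choose_a(q_a, q_b):
    """Logistic function of the propensity contrast between bandits A and B.

    Simplification: no perseveration term and no temperature parameter.
    """
    return 1.0 / (1.0 + math.exp(-(q_a - q_b)))

# Three trials on bandit A: positive, positive, negative feedback.
q_a = 0.0
for fb in (+1, +1, -1):
    q_a = update_q(q_a, ca=1.0, feedback=fb, forgetting=0.2)

# Older feedback is down-weighted by the forgetting rate, so q_a is an
# exponentially decaying sum of signed CA contributions.
print(q_a, p_choose_a(q_a, 0.0))
```

      With a forgetting rate of zero, the propensity is simply a running sum of signed CA values, which is the sense in which CA behaves like a regression coefficient on feedback history.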

      We also explain our cross-fitting procedure in more detail:

      “To further characterise deviations between behaviour and our Bayesian learning models, we used a “cross-fitting” method. Treating CA parameters as data-features of interest (i.e., feedback dependent changes in choice propensity), our goal was to examine if and how empirical features differ from features extracted from simulations of our Bayesian learning models. Towards that goal, we simulated synthetic data based on Bayesian agents (using participants’ best fitting parameters), but fitted these data using the CA-models, obtaining what we term “Bayesian-CA parameters” (Fig. 2d; Methods). A comparison of these Bayesian-CA parameters with empirical-CA parameters, obtained by fitting CA models to empirical data, allowed us to uncover patterns consistent with, or deviating from, ideal-Bayesian value-based inference. Under the sophisticated logistic-regression interpretation of the CA-model family, the cross-fitting method comprises a comparison between empirical regression coefficients (i.e., empirical CA parameters) and regression coefficients based on simulations of Bayesian models (Bayesian CA parameters).”
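      The order of operations in the cross-fitting procedure can be sketched as a small pipeline. This is only a schematic under stated assumptions: `fit_bayes`, `simulate_bayes`, and `fit_ca` are hypothetical placeholders for whatever fitting and simulation routines are actually used, and are passed in as arguments rather than implemented here.

```python
def cross_fit(data, fit_bayes, simulate_bayes, fit_ca):
    """Return (empirical CA parameters, Bayesian-CA parameters) for one participant.

    All three callables are hypothetical stand-ins supplied by the caller.
    """
    # Step 1: fit the Bayesian model to the participant's real choices.
    bayes_params = fit_bayes(data)
    # Step 2: simulate synthetic choices from the fitted Bayesian agent.
    synthetic = simulate_bayes(bayes_params)
    # Step 3: fit the CA model to both the real and the synthetic data.
    empirical_ca = fit_ca(data)
    bayesian_ca = fit_ca(synthetic)
    return empirical_ca, bayesian_ca
```

      Comparing the two returned values across participants is what indicates whether the empirical CA parameters fall outside the range a Bayesian learner would produce.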

      (3) The Credibility-CA model seems to fit the same as the free-credibility Bayesian model in the first experiment and barely better in the second experiment. Why not use a more standard model comparison metric like the Bayesian Information Criterion (BIC)? Even if there are advantages to the bootstrap method (which should be described if so), the BIC would help for comparability between papers.

      We thank the reviewer for this important comment regarding our model comparison approach. We acknowledge that classical information criteria like AIC and BIC are widely used in RL studies. However, we argue our method for model-comparison is superior.

      We conducted a model recovery analysis demonstrating a significant limitation of using AIC or BIC for model comparison in our data. Both these methods are strongly biased in favor of the Bayesian models. Our PBCM method, on the other hand, is both unbiased and more accurate. We believe this is because “off-the-shelf” methods like AIC and BIC rely on strong assumptions (such as asymptotic sample size and trial-independence) that are not necessarily met in our tasks (data are finite; trials in RL tasks depend on previous trials). PBCM avoids such assumptions to obtain comparison criteria specifically tailored to the structure and size of our empirical data. We have now mentioned this fact in the results section of the main text:

      “We considered using AIC and BIC, which apply “off-the-shelf” penalties for model-complexity. However, these methods do not adapt to features like finite sample size (relying instead on asymptotic assumptions) or temporal dependence (as is common in reinforcement learning experiments). In contrast, the parametric bootstrap cross-fitting method replaces these fixed penalties with empirical, data-driven criteria for model-selection. Indeed, model-recovery simulations confirmed that whereas AIC and BIC were heavily biased in favour of the Bayesian models, the bootstrap method provided excellent model-recovery (see Fig. S20).”
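      For reference, the “off-the-shelf” penalties mentioned above follow the standard textbook definitions of AIC and BIC. The short sketch below shows that each penalty depends only on the number of free parameters (and, for BIC, the number of observations), not on the trial structure of the data. The numeric inputs are made-up values for illustration.

```python
import math

def aic(log_lik, k):
    """Akaike Information Criterion: fixed penalty of 2 per free parameter."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian Information Criterion: penalty of ln(n) per free parameter."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical fit: maximised log-likelihood of -100 with 3 free parameters
# on 100 trials. Lower scores indicate the preferred model.
print(aic(-100.0, 3), bic(-100.0, 3, 100))
```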

      We have also included such model recovery in the SI document:

      (4) As suggested in the discussion, the updating based on random feedback could be due to the interleaving of trials. If one is used to learning from the source on most trials, the occasional random trial may be hard to resist updating from. The exact interleaving structure should also be clarified (I assume different sources were shown for each bandit pair). This would also relate to work on RL and working memory: Collins, A. G., & Frank, M. J. (2012). How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis. European Journal of Neuroscience, 35(7), 1024-1035.

      We thank the reviewer for this point. The specific interleaved structure of the agents is described in the main text:

      “Each agent provided feedback for 5 trials for each bandit pair (with the agent order interleaved within the bandit pair).”

      As well as in the methods section:

      “Feedback agents were randomly interleaved across trials subject to the constraint that each agent appeared on 5-trials for each bandit pair.”
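      One way to satisfy the stated constraint (each agent appearing on exactly 5 trials per bandit pair, in random order) is sketched below. How bandit pairs themselves were ordered within a block is not specified in the quoted text, so shuffling all trials together is an assumption, as are the function name and labels.

```python
import random
from collections import Counter

def make_schedule(pairs, agents, reps=5, seed=None):
    """Build a shuffled list of (bandit_pair, agent) trials.

    Every (pair, agent) combination appears exactly `reps` times.
    Assumption: pairs and agents are shuffled jointly within one block.
    """
    rng = random.Random(seed)
    trials = [(p, a) for p in pairs for a in agents for _ in range(reps)]
    rng.shuffle(trials)
    return trials

# Three bandit pairs and three feedback agents (1-, 2-, and 3-star).
schedule = make_schedule(["pair1", "pair2", "pair3"], [1, 2, 3], reps=5, seed=0)
counts = Counter(schedule)
print(len(schedule), set(counts.values()))
```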

      We also thank the reviewer for mentioning the relevant work on working memory. We have now added it to our discussion point:

      “In our main study, we show that participants revised their beliefs based on entirely non-credible feedback, whereas an ideal Bayesian strategy dictates such feedback should be ignored. This finding resonates with the “continued-influence effect” whereby misleading information continues to influence an individual's beliefs even after it has been retracted (59,60). One possible explanation is that some participants failed to infer that feedback from the 1-star agent was statistically void of information content, essentially random (e.g., the group-level credibility of this agent was estimated by our free-credibility Bayesian model as higher than 50%). Participants were instructed that this feedback would be “a lie” 50% of the time but were not explicitly told that this meant it was random and should therefore be disregarded. Notably, however, there was no corresponding evidence random feedback affected behaviour in our discovery study. It is possible that an individual’s ability to filter out random information might have been limited due to a high cognitive load induced by our main study task, which required participants to track the values of three bandit pairs and juggle between three interleaved feedback agents (whereas in our discovery study each experimental block featured a single bandit pair). Future studies should explore more systematically how the ability to filter random feedback depends on cognitive load (61).”

      (5) Why does the choice-repetition regression include "only trials for which the last same-pair trial featured the 3-star agent and in which the context trial featured a different bandit pair"? This could be stated more plainly.

      We thank the reviewer for this question. When we previously submitted our manuscript, we thought that finding enhanced credit-assignment for fully credible feedback following potential disinformation from a different context would constitute a striking demonstration of our “contrast effect”. However, upon re-examining this finding we discovered a coding error (affecting how trials were filtered). We have now corrected and rerun this analysis. We have assessed the contrast effect for both "same-context" trials (where the contextual trial featured the same bandit pair as the learning trial) and "different-context" trials (where the contextual trial featured a different bandit pair). Our re-analysis reveals a selective significant contrast effect in the same-context condition, but no significant effect in the different-context condition. We have updated the main text to reflect these corrected findings and provide a clearer explanation of the analysis:

      “A comparison of empirical and Bayesian credit-assignment parameters revealed a further deviation from ideal Bayesian learning: participants showed an exaggerated credit-assignment for the 3-star agent compared with Bayesian models [Wilcoxon signed-rank test, instructed-credibility Bayesian model (median difference=0.74, z=11.14); free-credibility Bayesian model (median difference=0.62, z=10.71), all p’s<0.001] (Fig. 3a). One explanation for enhanced learning for the 3-star agent is a contrast effect, whereby credible information looms larger against a backdrop of non-credible information. To test this hypothesis, we examined whether the impact of feedback from the 3-star agent is modulated by the credibility of the agent in the trial immediately preceding it. More specifically, we reasoned that the impact of a 3-star agent would be amplified by a “low credibility context” (i.e., when it is preceded by a low credibility trial). In a binomial mixed effects model, we regressed choice-repetition on feedback valence from the last trial featuring the same bandit pair (i.e., the learning trial) and the feedback agent on the trial immediately preceding that last trial (i.e., the contextual credibility; see Methods for model-specification). This analysis included only learning trials featuring the 3-star agent, and context trials featuring the same bandit pair as the learning trial (Fig. 4a). We found that feedback valence interacted with contextual credibility (F(2,2086)=11.47, p<0.001) such that the feedback-effect (from the 3-star agent) decreased as a function of the preceding context-credibility (3-star context vs. 2-star context: b=-0.29, F(1,2086)=4.06, p=0.044; 2-star context vs. 1-star context: b=-0.41, t(2086)=-2.94, p=0.003; and 3-star context vs. 1-star context: b=-0.69, t(2086)=-4.74, p<0.001) (Fig. 4b). This contrast effect was not predicted by simulations of our main models of interest (Fig. 4c). No effect was found when focussing on contextual trials featuring a bandit pair different than the one in the learning trial (see SI 3.5). Thus, these results support an interpretation that credible feedback exerts a greater impact on participants’ learning when it follows non-credible feedback, in the same learning context.”

      We have modified the discussion accordingly as well:

      “A striking finding in our study was that for a fully credible feedback agent, credit assignment was exaggerated (i.e., higher than predicted by our Bayesian models). Furthermore, the effect of fully credible feedback on choice was further boosted when it was preceded by a low-credibility context related to current learning. We interpret this in terms of a “contrast effect”, whereby veridical information looms larger against a backdrop of disinformation (21). One upshot is that exaggerated learning might entail a risk of jumping to premature conclusions based on limited credible evidence (e.g., a strong conclusion that a vaccine produces significant side-effect risks based on weak credible information, following non-credible information about the same vaccine). An intriguing possibility, that could be tested in future studies, is that participants strategically amplify the extent of learning from credible feedback to dilute the impact of learning from non-credible feedback. For example, a person scrolling through a social media feed, encountering copious amounts of disinformation, might amplify the weight they assign to credible feedback in order to dilute effects of ‘fake news’. Ironically, these results also suggest that public campaigns might be more effective when embedding their messages in low-credibility contexts, which may boost their impact.”

      And we have included some additional analyses in the SI document:

      “3.5 Contrast effects for contexts featuring a different bandit

      Given that we observed a contrast effect when both the learning trial and the immediately preceding “context trial” involved the same pair of bandits, we next investigated whether this effect persisted when the context trial featured a different bandit pair – a situation where the context would be irrelevant to the current learning. Again, we used a binomial mixed effects model, regressing choice-repetition on feedback valence in the learning trial and the feedback agent in the context trial. This analysis included only learning trials featuring the 3-star agent, and context trials featuring a different bandit pair than the learning trial (Fig. S22a). We found no significant evidence of an interaction between feedback valence and contextual credibility (F(2,2364)=0.21, p=0.81) (Fig. S22b). This null result was consistent with the range of outcomes predicted by our main computational models (Fig. S22c).

      We aimed to formally compare the influence of two types of contextual trials: those featuring the same bandit pair as the learning trial versus those featuring a different pair. To achieve this, we extended our mixed-effects model by incorporating a new predictor variable, "CONTEXT_TYPE", which coded whether the contextual trial involved the same bandit pair (coded as -0.5) or a different bandit pair (+0.5) compared to the learning trial. The Wilkinson notation for this expanded mixed-effects model is:

      REPEAT ~ CONTEXT_TYPE * FEEDBACK * (CONTEXT_2-star + CONTEXT_3-star) + BETTER + (1|participant)

      This expanded model revealed a significant three-way interaction between feedback valence, contextual credibility, and context type (F(2,4451) = 7.71, p<0.001). Interpreting this interaction, we found a 2-way interaction between context-source and feedback valence when the context was the same (F(2,4451) = 12.03, p<0.001), but not when context was different (F(2,4451) = 0.23, p = 0.79). Further interpreting the double feedback-valence * context-source interaction (for the same context) we obtained the same conclusions as reported in the main text.”

      (6) Why apply the "Truth-CA" model and not the Bayesian variant that it was motivated by?

      Thanks for this very useful suggestion. We are unsure if we fully understand the question. The Truth-CA model was not motivated by a new Bayesian model. Our Bayesian models were simply used to make the point that participants may partially discriminate between truthful and untruthful feedback (for a given source). This led to the idea that perhaps more credit is assigned for truth (than lie) trials, which is what we found using our Truth-CA model. Note we show that our Bayesian models cannot account for this modulation.

      We have now improved our "Truth-CA" model. Previously, our Truth-CA model considered whether feedback on each trial was true or not based on realized latent true outcomes. However, it is possible that the very same feedback would have had an opposite truth-status if the latent true outcome was different (recall true outcomes are stochastic). This injects noise into the trial classification in our previous model. To avoid this, in our new model feedback is modulated by the probability the reported feedback is true (marginalized over stochasticity of true outcome).

      We have described this new model in the methods section:

      “Additionally, we formulated a “Truth-CA” model, which worked as our Credibility-CA model, but incorporated a free truth-bonus parameter (TB). This parameter modulates the extent of credit assignment for each agent based on the posterior probability of feedback being true (given the credibility of the feedback agent, and the true reward probability of the chosen bandit). The chosen bandit was updated as follows:

      $$Q \leftarrow (1 - f_Q) \cdot Q + \left[ CA(\text{agent}) + TB \cdot \left( P(\text{truth}) - 0.5 \right) \right] \cdot F$$

      where P(truth) is the posterior probability of the feedback being true in the current trial (for exact calculation of P(truth) see “Methods: Bayesian estimation of posterior belief that feedback is true”).”
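      The update rule above translates directly into a one-line function. In this sketch the argument names are illustrative, the feedback F is assumed to be coded as +1/-1, and P(truth) is taken as an input rather than computed, since its exact calculation is given in the Methods section referenced above.

```python
def truth_ca_update(q, ca_agent, tb, p_truth, feedback, f_q):
    """One Truth-CA update of the chosen bandit's Q-value.

    q        : current Q-value of the chosen bandit
    ca_agent : credit-assignment parameter of the current feedback agent
    tb       : truth-bonus parameter (TB)
    p_truth  : posterior probability that the feedback is true
    feedback : +1 for positive, -1 for negative feedback (assumed coding)
    f_q      : forgetting rate
    """
    return (1.0 - f_q) * q + (ca_agent + tb * (p_truth - 0.5)) * feedback

# When p_truth = 0.5 the bonus term vanishes and the rule reduces to the
# Credibility-CA update; higher p_truth increases the effective credit.
baseline = truth_ca_update(0.5, 1.0, 0.2, 0.50, +1, 0.1)
boosted = truth_ca_update(0.5, 1.0, 0.2, 0.75, +1, 0.1)
print(baseline, boosted)
```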

      All relevant results have been updated accordingly in the main text:

      “To formally address whether feedback truthfulness modulates credit assignment, we fitted a new variant of the CA model (the “Truth-CA” model) to the data. This variant works as our Credibility-CA model but incorporates a truth-bonus parameter (TB) which increases the degree of credit assignment for feedback as a function of the experimenter-determined likelihood the feedback is true (which is read from the curves in Fig. 6a when x is taken to be the true probability the bandit is rewarding). Specifically, after receiving feedback, the Q-value of the chosen option is updated according to the following rule: $Q \leftarrow (1 - f_Q) \cdot Q + [CA(\text{agent}) + TB \cdot (P(\text{truth}) - 0.5)] \cdot F$, where $TB$ is the free parameter representing the truth bonus, and $P(\text{truth})$ is the probability of the received feedback being true (from the experimenter’s perspective). We acknowledge that this model falls short of providing a mechanistically plausible description of the credit assignment process, because participants have no access to the experimenter’s truthfulness likelihoods (as the true bandit reward probabilities are unknown to them). Nonetheless, we use this ‘oracle model’ as a measurement tool to glean rough estimates for the extent to which credit assignment is boosted as a function of its truthfulness likelihood. Fitting this Truth-CA model to participants' behaviour revealed a significant positive truth-bonus (mean=0.21, t(203)=3.12, p=0.002), suggesting that participants indeed assign greater weight to feedback that is likely to be true (Fig. 6c; see SI 3.3.1 for detailed ML parameter results). Notably, simulations using our other models (Methods) consistently predicted smaller truth biases (compared to the empirical bias) (Fig. 6d). Moreover, truth bias was still detected even in a more flexible model that allowed for both a positivity bias and truth-bias (see SI 3.7). The upshot is that participants are biased to assign higher credit based on feedback that is more likely to be true in a manner that is inconsistent with our Bayesian models and above and beyond the previously identified positivity biases.”

      Finally, the Supplementary Information for the discovery study has also been revised to feature this analysis:

      “We next assessed whether participants infer whether the feedback they received on each trial was true or false and adjust their credit assignment based on this inference. We again used the “Truth-CA” model to obtain estimates for the truth bonus (TB), the increase in credit assignment as a function of the posterior probability of feedback being true. As in our main study, the fitted truth bias parameter was significantly positive, indicating that participants assign greater weight to feedback they believe is likely to be true (Fig. S4a; see SI 3.3.1 for detailed ML parameter results). Strikingly, model-simulations (Methods) predicted a lower truth bonus than the one observed in participants (Fig. S4b).”

      (7) "Overall, the results from this study support the exact same conclusions (See SI section 1.2) but with one difference. In the discovery study, we found no evidence for learning based on 50%-credibility feedback when examining either the feedback effect on choice repetition or CA in the credibility-CA model (SI 1.2.3)" - this seems like a very salient difference, when the paper reports the feedback effect as a primary finding of interest, though I understand there remains a valence-based difference.

      We agree with the reviewer and thank them for this suggestion. We now state explicitly throughout the manuscript that this finding was obtained only in one of our two studies. In the “Discovery study” section of the results, we state explicitly that this effect was not observed in the discovery study:

      “However, we found no evidence for learning based on 50%-credibility feedback when examining either the feedback effect on choice repetition or CA in the credibility-CA model (SI 1.2.3).”

      We also note that related to another concern from R3 (that perseveration may masquerade as positivity bias) we conducted additional analyses (detailed in SI 3.6.2). These analyses revealed that the observed positivity bias for the 1-star agent in the discovery study falls within the range predicted by simple choice-perseveration. Consequently, we have removed the suggestion that participants still learn from the random agent in the discovery study. Furthermore, we have modified the discussion section to include a possible explanation for this discrepancy between the two studies:

      “Notably, however, there was no corresponding evidence random feedback affected behaviour in our discovery study. It is possible that an individual’s ability to filter out random information might have been limited due to a high cognitive load induced by our main study task, which required participants to track the values of three bandit pairs and juggle between three interleaved feedback agents (whereas in our discovery study each experimental block featured a single bandit pair). Future studies should explore more systematically how the ability to filter random feedback depends on cognitive load (61).”

      (8) "Participants were instructed that this feedback would be "a lie 50% of the time but were not explicitly told that this meant it was random and should therefore be disregarded." - I agree that this is a possible explanation for updating from the random source. It is a meaningful caveat.

      Thank you for this thought. While this can be seen as a caveat—since we don’t know what would have happened with explicit instructions—we also believe it is interesting from another perspective. In many real-life situations, individuals may have all the necessary information to infer that the feedback they receive is uninformative, yet still fail to do so, especially when they are not explicitly told to ignore it.

      In future work, we plan to examine how behaviour changes when participants are given more explicit instructions—for example, that the 50%-credibility agent provides purely random feedback.

      (9) "Future studies should investigate conditions that enhance an ability to discard disinformation, such as providing explicit instructions to ignore misleading feedback, manipulations that increase the time available for evaluating information, or interventions that strengthen source memory." - there is work on some of this in the misinformation literature that should be cited, such as the "continued influence effect". For example: Johnson, H. M., & Seifert, C. M. (1994). Sources of the continued influence effect: When misinformation in memory affects later inferences. Journal of experimental psychology: Learning, memory, and cognition, 20(6), 1420.

      We thank the reviewer for pointing us towards the relevant literature. We have now included citations about the “continued influence effect” of misinformation in the discussion:

      “In our main study, we show that participants revised their beliefs based on entirely non-credible feedback, whereas an ideal Bayesian strategy dictates such feedback should be ignored. This finding resonates with the “continued-influence effect” whereby misleading information continues to influence an individual's beliefs even after it has been retracted (59,60).”

      (10) Are the authors arguing that choice-confirmation bias may be at play? Work on choice-confirmation bias generally includes counterfactual feedback, which is not present here.

      We agree with the reviewer that a definitive test for choice-confirmation bias typically requires counterfactual feedback, which is not present in our current task. In our discussion, we indeed suggest that the positivity bias we observe may stem from a form of choice-confirmation, drawing on the extensive literature on this bias in reinforcement learning (Lefebvre et al., 2017; Palminteri et al., 2017; Palminteri & Lebreton, 2022). However, we fully acknowledge that this link is a hypothesis and that explicitly testing for choice-confirmation bias would necessitate a future study specifically incorporating counterfactual feedback. We have included a clarification of this point in the discussion:

      “Previous reinforcement learning studies report greater credit-assignment based on positive compared to negative feedback, albeit only in the context of veridical feedback (43,44,62). Here, supporting our a-priori hypothesis, we show that this positivity bias is amplified for information of low and intermediate credibility (in absolute terms in the discovery study, and relative to the overall extent of CA in both studies). Of note, previous literature has interpreted enhanced learning for positive outcomes in reinforcement learning as indicative of a confirmation bias (42,44). For example, positive feedback may confirm, to a greater extent than negative feedback, one’s choice as superior (e.g., “I chose the better of the two options”). Leveraging the framework of motivated cognition (35), we posited that feedback of uncertain veracity (e.g., low credibility) amplifies this bias by incentivising individuals to self-servingly accept positive feedback as true (because it confers positive, desirable outcomes), and explain away undesirable, choice-disconfirming, negative feedback as false. This could imply an amplified confirmation bias on social media, where content from sources of uncertain credibility, such as unknown or unverified users, is more easily interpreted in a self-serving manner, disproportionately reinforcing existing beliefs (63). In turn, this could contribute to an exacerbation of the negative social outcomes previously linked to confirmation bias such as polarization (64,65), the formation of ‘echo chambers’ (19), and the persistence of misbelief regarding contemporary issues of importance such as vaccination (66,67) and climate change (68–71). We note, however, that further studies are required to determine whether positivity bias in our task is indeed a form of confirmation bias.”

      Reviewer #3 (Public review):

      Summary

      This paper investigates how disinformation affects reward learning processes in the context of a two-armed bandit task, where feedback is provided by agents with varying reliability (with lying probability explicitly instructed). They find that people learn more from credible sources, but also deviate systematically from optimal Bayesian learning: They learned from uninformative random feedback, learned more from positive feedback, and updated too quickly from fully credible feedback (especially following low-credibility feedback). Overall, this study highlights how misinformation could distort basic reward learning processes, without appeal to higher-order social constructs like identity.

      Strengths

      (1) The experimental design is simple and well-controlled; in particular, it isolates basic learning processes by abstracting away from social context.

      (2) Modeling and statistics meet or exceed the standards of rigor.

      (3) Limitations are acknowledged where appropriate, especially those regarding external validity.

      (4) The comparison model, Bayes with biased credibility estimates, is strong; deviations are much more compelling than e.g., a purely optimal model.

      (5) The conclusions are interesting, in particular the finding that positivity bias is stronger when learning from less reliable feedback (although I am somewhat uncertain about the validity of this conclusion)

      We deeply thank the reviewer for highlighting the strengths of this work.

      Weaknesses

      (1) Absolute or relative positivity bias?

      In my view, the biggest weakness in the paper is that the conclusion of greater positivity bias for lower credible feedback (Figure 5) hinges on the specific way in which positivity bias is defined. Specifically, we only see the effect when normalizing the difference in sensitivity to positive vs. negative feedback by the sum. I appreciate that the authors present both and add the caveat whenever they mention the conclusion (with the crucial exception of the abstract). However, what we really need here is an argument that the relative definition is the right way to define asymmetry....

      Unfortunately, my intuition is that the absolute difference is a better measure. I understand that the relative version is common in the RL literature; however previous studies have used standard TD models, whereas the current model updates based on the raw reward. The role of the CA parameter is thus importantly different from a traditional learning rate - in particular, it's more like a logistic regression coefficient (as described below) because it scales the feedback but not the decay. Under this interpretation, a difference in positivity bias across credibility conditions corresponds to a three-way interaction between the exponentially weighted sum of previous feedback of a given type (e.g., positive from the 75% credible agent), feedback positivity, and condition (dummy coded). This interaction corresponds to the nonnormalized, absolute difference.

      Importantly, I'm not terribly confident in this argument, but it does suggest that we need a compelling argument for the relative definition.

      We thank the reviewer for raising this important point about the definition of positivity bias, and for their thoughtful discussion on the absolute versus relative measures. We believe that the relative valence bias offers a distinct and valuable perspective on positivity bias. Conceptually, this measure describes positivity bias in a manner akin to a “percentage difference” relative to the overall level of learning, which allows us to control for the overall decrease in credit assignment as feedback becomes less credible. We are unsure whether one measure is better or more correct than the other, and we believe that reporting both measures enriches the understanding of positivity bias and allows for a more comprehensive characterization of this phenomenon (as long as these measures are interpreted carefully). We have stated the significance of the relative measure in the results section:

      “Following previous research, we quantified positivity bias in 2 ways: 1) as the absolute difference between credit-assignment based on positive or negative feedback, and 2) as the same difference but relative to the overall extent of learning. We note that the second, relative, definition is more akin to “percentage change” measurements, providing a control for the overall lower levels of credit-assignment for less credible agents.”

      We also wish to point out that in our discovery study we had some evidence for amplification of positivity bias in an absolute sense.
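
To make the two definitions concrete, here is a minimal sketch of how an absolute and a relative valence bias index could be computed from fitted credit-assignment parameters. The function name is hypothetical, and normalizing by the sum of absolute values (rather than the raw sum) is an assumption made here so that the index stays bounded when CA- is fitted as negative; the manuscript's exact normalization may differ.

```python
def valence_bias_indices(ca_pos, ca_neg):
    """Return (absolute, relative) valence bias for one feedback agent.

    ca_pos, ca_neg: fitted credit-assignment parameters for positive and
    negative feedback (hypothetical inputs for illustration).
    """
    absolute = ca_pos - ca_neg
    # Assumption: normalize by |CA+| + |CA-| so the index lies in [-1, 1]
    # even when CA- is negative.
    denom = abs(ca_pos) + abs(ca_neg)
    relative = absolute / denom if denom > 0 else float("nan")
    return absolute, relative


# Two illustrative agents with the same absolute bias but different
# overall levels of credit assignment.
high_credibility = valence_bias_indices(1.0, 0.5)
low_credibility = valence_bias_indices(0.6, 0.1)
print(high_credibility, low_credibility)
```

In this toy case both agents show the same absolute difference, while the relative index is larger for the agent with lower overall credit assignment, which is the situation in which the two measures can lead to different conclusions.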

      (2) Positivity bias or perseveration?

      A key challenge in interpreting many of the results is dissociating perseveration from other learning biases. In particular, a positivity bias (Figure 5) and perseveration will both predict a stronger correlation between positive feedback and future choice. Crucially, the authors do include a perseveration term, so one would hope that perseveration effects have been controlled for and that the CA parameters reflect true positivity biases. However, with finite data, we cannot be sure that the variance will be correctly allocated to each parameter (c.f. collinearity in regressions). The fact that CA- is fit to be negative for many participants (a pattern shown more strongly in the discovery study) is suggestive that this might be happening. A priori, the idea that you would ever increase your value estimate after negative feedback is highly implausible, which suggests that the parameter might be capturing variance besides that it is intended to capture.

      The best way to resolve this uncertainty would involve running a new study in which feedback was sometimes provided in the absence of a choice - this would isolate positivity bias. Short of that, perhaps one could fit a version of the Bayesian model that also includes perseveration. If the authors can show that this model cannot capture the pattern in Figure 5, that would be fairly convincing.

      We thank the reviewer for this very insightful and crucial point regarding the potential confound between positivity bias and perseveration. We entirely agree that distinguishing these effects can be challenging. To rigorously address this concern and ascertain that our observed positivity bias, particularly its inflation for low-credibility feedback, is not merely an artifact of perseveration, we conducted additional analyses as suggested.

      First, following the reviewer’s suggestion we simulated our Bayesian models, including a perseveration term, for both our main and discovery studies. Crucially, none of these simulations predicted the specific pattern of inflated positivity bias for low-credibility feedback that we identified in participants.

      Additionally, taking a “devil’s advocate” approach, we tested whether our Credibility-CA model (which includes perseveration but not a feedback valence bias) can predict our positivity bias findings. Thus, we simulated 100 datasets using our Credibility-CA model (based on empirical best-fitting parameters). We then fitted each of these simulated datasets using our Credibility-Valence CA model. By examining the distribution of results across the fits to these synthetic datasets and comparing them to the actual results from participants, we found that while perseveration could indeed lead (as the reviewer suspected) to an artifactual positivity bias, it could not predict the magnitude of the observed inflation of positivity bias for low-credibility feedback (whether measured in absolute or relative terms).

      Based on these comprehensive analyses, we are confident that our main results concerning the modulation of a valence bias as a function of source-credibility cannot be accounted for by simple choice-perseveration. We have briefly explained these analyses in the main results section:

      “Previous research has suggested that positivity bias may spuriously arise from pure choice-perseveration (i.e., a tendency to repeat previous choices regardless of outcome) (49,50). While our models included a perseveration-component, this control may not be perfect. Therefore, in additional control analyses, we generated synthetic datasets using models including choice-perseveration but devoid of feedback-valence bias, and fitted them with our credibility-valence model (see SI 3.6.1). These analyses confirmed that perseveration can masquerade as an apparent positivity bias. Critically, however, these analyses also confirmed that perseveration cannot account for our main finding of increased positivity bias, relative to the overall extent of CA, for low-credibility feedback.”

      Additionally, we have added a detailed description of these additional analyses and their findings to the Supplementary Information document:

      “3.6 Positivity bias results cannot be explained by pure perseveration

      3.6.1 Main study

      Previous research has suggested it may be challenging to dissociate between a feedback-valence positivity bias and perseveration (i.e., a tendency to repeat previous choices regardless of outcome). While our Credit Assignment (CA) models already include a perseveration mechanism to account for this, this control may not be perfect. We thus conducted several tests to examine if our positivity-bias related results could be accounted for by perseveration.

      First, we examined whether our Bayesian models, augmented by a perseveration mechanism (as in our CA model), can generate predictions similar to our empirical results. We repeated our cross-fitting procedure with these extended Bayesian models. To briefly recap, this involved fitting participant behavior with them, generating synthetic datasets based on the resulting maximum likelihood (ML) parameters, and then fitting these simulated datasets with our Credibility-Valence CA model (which is designed to detect positivity bias). This test revealed that adding perseveration to our Bayesian models did not predict a positivity bias in learning. In absolute terms there was a small negativity bias (instructed-credibility Bayesian: b=−0.19, F(1,1218)=17.78, p<0.001, Fig. S23a-b; free-credibility Bayesian: b=−0.17, F(1,1218)=13.74, p<0.001, Fig. S23d-e). In relative terms we detected no valence-related bias (instructed-credibility Bayesian: b=−0.034, F(1,609)=0.45, p=0.50, Fig. S23c; free-credibility Bayesian: b=−0.04, F(1,609)=0.51, p=0.47, Fig. S23f). More critically, these simulations also did not predict a change in the level of positivity bias as a function of feedback credibility, neither at an absolute level (instructed-credibility Bayesian: F(2,1218)=0.024, p=0.98, Fig. S23b; free-credibility Bayesian: F(2,1218)=0.008, p=0.99, Fig. S23e), nor at a relative level (instructed-credibility Bayesian: F(2,609)=1.57, p=0.21, Fig. S23c; free-credibility Bayesian: F(2,609)=0.13, p=0.88, Fig. S23f). The upshot is that our positivity-bias findings cannot be accounted for by our Bayesian models even when these are augmented with perseveration.

      However, it is still possible that empirical CA parameters from our credibility-valence model (reported in main text Fig. 5) were distorted, absorbing variance from perseveration. To address this, we took a “devil's advocate” approach, testing the assumption that CA parameters are not truly affected by feedback valence and that there is only perseveration in our data. Towards that goal, we simulated data using our Credibility-CA model (which includes perseveration but does not contain a valence bias in its learning mechanism) and then fitted these synthetic datasets using our Credibility-Valence CA model to see if the observed positivity bias could be explained by perseveration alone. Specifically, we generated 101 “group-level” synthetic datasets (each including one simulation for each participant, based on their empirical ML parameters), and fitted each dataset with our Credibility-Valence CA model. We then analysed the resulting ML parameters in each dataset using the same mixed-effects models as described in the main text, examining the distribution of effects of interest across these simulated datasets. Comparing these simulation results to the data from participants revealed a nuanced picture. While the positivity bias observed in participants is within the range predicted by a pure perseveration account when measured in absolute terms (Fig. S24a), it is much higher than predicted by pure perseveration when measured relative to the overall level of learning (Fig. S24c). More importantly, the inflation in positivity bias for lower credibility feedback is substantially higher in participants than what would be predicted by a pure perseveration account, a finding that holds true for both absolute (Fig. S24b) and relative (Fig. S24d) measures.”

      “3.6.2 Discovery study

      We then replicated these analyses in our discovery study to confirm our findings. We again checked whether extended versions of the Bayesian models (including perseveration) predicted the positivity bias results observed. Our cross-fitting procedure showed that the instructed-credibility Bayesian model with perseveration did predict a positivity bias for all credibility levels in this discovery study, both when measured in absolute terms [50% credibility (b=1.74,t(824)=6.15), 70% credibility (b=2.00,F(1,824)=49.98), 85% credibility (b=1.81,F(1,824)=40.78), 100% credibility (b=2.42,F(1,824)=72.50), all p's<0.001], and in relative terms [50% credibility (b=0.25,t(412)=3.44), 70% credibility (b=0.31,F(1,412)=17.72), 85% credibility (b=0.34,F(1,412)=21.06), 100% credibility (b=0.42,F(1,412)=31.24), all p's<0.001]. However, importantly, these simulations did not predict a change in the level of positivity bias as a function of feedback credibility, neither at an absolute level (F(3,412)=1.43,p=0.24), nor at a relative level (F(3,412)=2.06,p=0.13) (Fig. S25a-c). In contrast, simulations of the free-credibility Bayesian model (with perseveration) predicted a slight negativity bias when measured in absolute terms (b=−0.35,F(1,824)=5.14,p=0.024), and no valence bias when measured relative to the overall degree of learning (b=0.05,F(1,412)=0.55,p=0.46). Crucially, this model also did not predict a change in the level of positivity bias as a function of feedback credibility, neither at an absolute level (F(3,824)=0.27,p=0.77), nor at a relative level (F(3,412)=0.76,p=0.47) (Fig. S25d-f).

      As in our main study, we next assessed whether our Credibility-CA model (which includes perseveration but no valence bias) predicted the positivity bias results observed in participants in the discovery study. This analysis revealed that the average positivity bias in participants is higher than predicted by a pure perseveration account, both when measured in absolute terms (Fig. S26a) and in relative terms (Fig. S26c). Specifically, only the aVBI for the 70% credibility agent was above what a perseveration account would predict, while the rVBI for all agents except the completely credible one exceeded that threshold. Furthermore, the inflation in positivity bias for lower credibility feedback (compared to the 100% credibility agent) is significantly higher in participants than would be predicted by a pure perseveration account, in both absolute (Fig. S26b) and relative (Fig. S26d) terms.

      Together, these results show that the general positivity bias observed in participants could be predicted by an instructed-credibility Bayesian model with perseveration, or by a CA model with perseveration. Moreover, we find that these two models can predict a positivity bias for the 50% credibility agent, raising a concern that our positivity bias findings for this source may be an artefact of not-fully controlled for perseveration. However, the credibility modulation of this positivity bias, where the bias is amplified for lower credibility feedback, is consistently not predicted by perseveration alone, regardless of whether perseveration is incorporated into a Bayesian or a CA model. This finding suggests that participants are genuinely modulating their learning based on feedback credibility, and that this modulation is not merely an artifact of choice perseveration.”

      (3) Veracity detection or positivity bias?

      The "True feedback elicits greater learning" effect (Figure 6) may be simply a re-description of the positivity bias shown in Figure 5. This figure shows that people have higher CA for trials where the feedback was in fact accurate. But assuming that people tend to choose more rewarding options, true-feedback cases will tend to also be positive-feedback cases. Accordingly, a positivity bias would yield this effect, even if people are not at all sensitive to trial-level feedback veracity. Of course, the reverse logic also applies, such that the "positivity bias" could actually reflect discounting of feedback that is less likely to be true. This idea has been proposed before as an explanation for confirmation bias (see Pilgrim et al., 2024, https://doi.org/10.1016/j.cognition.2023.105693, and much previous work cited therein). The authors should discuss the ambiguity between the "positivity bias" and "true feedback" effects within the context of this literature....

      Before addressing these excellent comments, we first note that we have now improved our "Truth-CA" model. Previously, our Truth-CA model considered whether feedback on each trial was true or not based on realized latent true outcomes. However, it is possible that the very same feedback would have had the opposite truth-status if the latent true outcome had been different (recall that true outcomes are stochastic). This injected noise into the trial classification in our former model. To avoid this, in our new model feedback is modulated by the probability that the reported feedback is true (marginalized over the stochasticity of the true outcome). Please note in our responses below that we conducted extensive analyses to confirm that positivity bias does not in fact predict the truth bias we detect using our Truth-CA model.

      We have described this new model in the methods section:

      “Additionally, we formulated a “Truth-CA” model, which worked as our Credibility-CA model, but incorporated a free truth-bonus parameter (TB). This parameter modulates the extent of credit assignment for each agent based on the posterior probability of feedback being true (given the credibility of the feedback agent, and the true reward probability of the chosen bandit). The chosen bandit was updated as follows:

      𝑄 ← (1 – 𝑓<sub>Q</sub>) ∗ 𝑄 + [𝐶𝐴(𝑎𝑔𝑒𝑛𝑡) + 𝑇𝐵 ∗ (𝑃(𝑡𝑟𝑢𝑡ℎ) − 0.5)] ∗ 𝐹

      where P(truth) is the posterior probability of the feedback being true in the current trial (for exact calculation of P(truth) see “Methods: Bayesian estimation of posterior belief that feedback is true”).”
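
As an illustration of the update rule above, the following sketch implements one Truth-CA step together with a simple posterior for P(truth). It is not the authors' code: the function names are hypothetical, feedback is assumed to be coded as +1 (reward reported) or -1 (no reward reported), and the posterior assumes that an agent reports the true outcome with probability equal to its credibility and the opposite outcome otherwise. The exact calculation used in the manuscript is given in its Methods.

```python
def p_feedback_true(credibility, p_reward, feedback):
    """Posterior probability that the reported feedback is true.

    Assumption: the agent reports the true outcome with probability
    `credibility` and the flipped outcome otherwise.
    Undefined (division by zero) if the reported feedback is impossible,
    e.g. credibility = 1 and the reported outcome has probability 0.
    """
    # Probability of the outcome that the agent reported.
    p_reported = p_reward if feedback > 0 else 1.0 - p_reward
    truthful = credibility * p_reported               # outcome occurred, agent was truthful
    lying = (1.0 - credibility) * (1.0 - p_reported)  # opposite outcome occurred, agent lied
    return truthful / (truthful + lying)


def truth_ca_update(q, feedback, ca_agent, tb, p_truth, f_q):
    """One step of Q <- (1 - f_Q) * Q + [CA(agent) + TB * (P(truth) - 0.5)] * F."""
    return (1.0 - f_q) * q + (ca_agent + tb * (p_truth - 0.5)) * feedback


# Illustrative values only (not fitted parameters).
p_truth = p_feedback_true(credibility=0.75, p_reward=0.75, feedback=+1)
q_new = truth_ca_update(q=0.0, feedback=+1, ca_agent=1.0, tb=0.2, p_truth=p_truth, f_q=0.1)
print(p_truth, q_new)
```

With these illustrative numbers, positive feedback from a 75% credible agent about a bandit that rewards 75% of the time has a posterior of about 0.9 of being true, so the truth bonus adds to the agent-specific credit assignment. For a 50% credible agent the same formula returns the bandit's reward probability itself, which is why the bonus can still vary across bandits even when the agent is uninformative.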

      All relevant results have been updated accordingly in the main text:

      “To formally address whether feedback truthfulness modulates credit assignment, we fitted a new variant of the CA model (the “Truth-CA” model) to the data. This variant works as our Credibility-CA model, but incorporates a truth-bonus parameter (TB) which increases the degree of credit assignment for feedback as a function of the experimenter-determined likelihood that the feedback is true (which is read from the curves in Fig 6a when x is taken to be the true probability the bandit is rewarding). Specifically, after receiving feedback, the Q-value of the chosen option is updated according to the following rule:

      𝑄 ← (1 – 𝑓<sub>Q</sub>) ∗ 𝑄 + [𝐶𝐴(𝑎𝑔𝑒𝑛𝑡) + 𝑇𝐵 ∗ (𝑃(𝑡𝑟𝑢𝑡ℎ) − 0.5)] ∗ 𝐹

      where 𝑇𝐵 is the free parameter representing the truth bonus, and 𝑃(𝑡𝑟𝑢𝑡ℎ) is the probability of the received feedback being true (from the experimenter’s perspective). We acknowledge that this model falls short of providing a mechanistically plausible description of the credit assignment process, because participants have no access to the experimenter’s truthfulness likelihoods (as the true bandit reward probabilities are unknown to them). Nonetheless, we use this ‘oracle model’ as a measurement tool to glean rough estimates for the extent to which credit assignment is boosted as a function of its truthfulness likelihood.

      Fitting this Truth-CA model to participants' behaviour revealed a significant positive truth-bonus (mean=0.21, t(203)=3.12, p=0.002), suggesting that participants indeed assign greater weight to feedback that is likely to be true (Fig. 6c; see SI 3.3.1 for detailed ML parameter results). Notably, simulations using our other models (Methods) consistently predicted smaller truth biases (compared to the empirical bias) (Fig. 6d). Moreover, truth bias was still detected even in a more flexible model that allowed for both a positivity bias and truth-bias (see SI 3.7). The upshot is that participants are biased to assign higher credit based on feedback that is more likely to be true in a manner that is inconsistent with our Bayesian models and above and beyond the previously identified positivity biases.”

      Finally, the Supplementary Information for the discovery study has also been revised to feature this analysis:

      “We next assessed whether participants infer whether the feedback they received on each trial was true or false and adjust their credit assignment based on this inference. We again used the “Truth-CA” model to obtain estimates for the truth bonus (TB), the increase in credit assignment as a function of the posterior probability of feedback being true. As in our main study, the fitted truth bias parameter was significantly positive, indicating that participants assign greater weight to feedback they believe is likely to be true (Fig. S4a; see SI 3.3.1 for detailed ML parameter results). Strikingly, model-simulations (Methods) predicted a lower truth bonus than the one observed in participants (Fig. S4b).”

      Additionally, we thank the reviewer for pointing us to the relevant work by Pilgrim et al. (2024). We agree that the relationship between "true feedback" and "positivity bias" effects is nuanced, and their potential overlap warrants careful consideration. Our analyses suggest, however, that the truth-bonus effect is not solely a by-product of positivity bias. Firstly, simulations of our Credibility-Valence CA model predict only a small "truth bonus" effect, which is notably smaller than what we observed in participants. Secondly, we formulated an extension of our "Truth-CA" model that includes a valence bias in credit assignment. If our truth bonus results were merely an artifact of positivity bias, this extended model should absorb that variance, producing a null truth bonus parameter. However, fitting this model to participant data still revealed a significant positive truth bonus, which again exceeds the range predicted by simulations of our Credibility-CA model:

      “3.7 Truth inference is still detected when controlling for valence bias

      Given that participants frequently select bandits that are, on average, mostly rewarding, it is reasonable to assume that positive feedback is more likely to be objectively true than negative feedback. This raises the question of whether the "truth inference" effect we observed in participants might simply be an alternative description of a positivity bias in learning. To directly test this idea, we extended our Truth-CA model to explicitly account for a valence bias in credit assignment. This extended model features separate CA parameters for positive and negative feedback for each agent. When we fitted this new model to participant behavior, it still revealed a significant truth bonus in both the main study (Wilcoxon’s signed-rank test: median = 0.09, z(202)=2.12, p=0.034; Fig. S27a) and the discovery study (median = 3.52, z(102)=7.86, p<0.001; Fig. S27c). Moreover, in the main study, this truth bonus remained significantly higher than what was predicted by all the alternative models, with the exception of the instructed-credibility Bayesian model (Fig. S27b). In the discovery study, the truth bonus was significantly higher than what was predicted by all the alternative models (Fig. S27d).”

      Together, these findings suggest that our truth inference results are not simply a re-description of a positivity bias.

      Conversely, we acknowledge the reviewer's point that our positivity bias results could potentially stem from a more general truth inference mechanism. We believe that this possibility should be addressed in a future study where participants rate their belief that received feedback is true (rather than a lie). We have extended our discussion to clarify this possibility and to include the suggested citation:

      “Our findings show that individuals increase their credit assignment for feedback in proportion to the perceived probability that the feedback is true, even after controlling for source credibility and feedback valence. Strikingly, this learning bias was not predicted by any of our Bayesian or credit-assignment (CA) models. Notably, our evidence for this bias is based on an “oracle model” that incorporates the probability of feedback truthfulness from the experimenter's perspective, rather than the participant’s. This raises an important open question: how do individuals form beliefs about feedback truthfulness, and how do these beliefs influence credit assignment? Future research should address this by eliciting trial-by-trial beliefs about feedback truthfulness. Doing so would also allow for testing the intriguing possibility that an exaggerated positivity bias for non-credible sources reflects, to some extent, a truth-based discounting of negative feedback, i.e., participants may judge such feedback as less likely to be true. However, it is important to note that the positivity bias observed for fully credible sources (here and in other literature) cannot be attributed to a truth bias, unless participants were, against instructions, distrustful of that source.”

      The authors get close to this in the discussion, but they characterize their results as differing from the predictions of rational models, the opposite of my intuition. They write:

      “Alternative "informational" (motivation-independent) accounts of positivity and confirmation bias predict a contrasting trend (i.e., reduced bias in low- and medium credibility conditions) because in these contexts it is more ambiguous whether feedback confirms one's choice or outcome expectations, as compared to a full-credibility condition.”

      I don't follow the reasoning here at all. It seems to me that the possibility for bias will increase with ambiguity (or perhaps will be maximal at intermediate levels). In the extreme case, when feedback is fully reliable, it is impossible to rationally discount it (illustrated in Figure 6A). The authors should clarify their argument or revise their conclusion here.

      We apologize for the lack of clarity in our previous explanation. We removed the sentence you cited (it was intended to make a different point which we now consider non-essential). Our current narration is consistent with the point you are making.

      (4) Disinformation or less information?

      Zooming out, from a computational/functional perspective, the reliability of feedback is very similar to reward stochasticity (the difference is that reward stochasticity decreases the importance/value of learning in addition to its difficulty). I imagine that many of the effects reported here would be reproduced in that setting. To my surprise, I couldn't quickly find a study asking that precise question, but if the authors know of such work, it would be very useful to draw comparisons. To put a finer point on it, this study does not isolate which (if any) of these effects are specific to disinformation, rather than simply less information. I don't think the authors need to rigorously address this in the current study, but it would be a helpful discussion point.

      We thank the reviewer for highlighting the parallel (and difference) between feedback reliability and reward stochasticity. However, we have not found any comparable results in the literature. We also note that our discussion includes a paragraph addressing the locus of our effects making the point that more studies are necessary to determine whether our findings are due to disinformation per se or sources being less informative. While this paragraph was included in the previous version it led us to infer our Discussion was too long and we therefore shortened it considerably:

      “An important question arises as to the psychological locus of the biases we uncovered. Because we were interested in how individuals process disinformation (deliberately false or misleading information intended to deceive or manipulate), we framed the feedback agents in our study as deceptive, who would occasionally “lie” about the true choice outcome. However, statistically (though not necessarily psychologically), these agents are equivalent to agents who mix truth-telling with random “guessing” or “noise” where inaccuracies may arise from factors such as occasionally lacking access to true outcomes, simple laziness, or mistakes, rather than an intent to deceive. This raises the question of whether the biases we observed are driven by the perception of potential disinformation as deceitful per se or simply as deviating from the truth. Future studies could address this question by directly comparing learning from statistically equivalent sources framed as either lying or noisy. Unlike previous studies wherein participants had to infer source credibility from experience (30,37,72), we took an explicit-instruction approach, allowing us to precisely assess source-credibility impact on learning, without confounding it with errors in learning about the sources themselves. More broadly, our work connects with prior research on observational learning, which examined how individuals learn from the actions or advice of social partners (72–75). This body of work has demonstrated that individuals integrate learning from their private experiences with learning based on others’ actions or advice, whether by inferring the value others attribute to different options or by mimicking their behavior (57,76). However, our task differs significantly from traditional observational learning. Firstly, our feedback agents interpret outcomes rather than demonstrating or recommending actions (30,37,72). Secondly, participants in our study lack private experiences unmediated by feedback sources. Finally, unlike most observational learning paradigms, we systematically address scenarios with deliberately misleading social partners. Future studies could bridge this by incorporating deceptive social partners into observational learning, offering a chance to develop unified models of how individuals integrate social information when credibility is paramount for decision-making.”
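
The statistical equivalence mentioned in the passage above can be made explicit with a short worked sketch. It assumes binary outcomes, a "lying" agent that reports the truth with probability c (its credibility) and the opposite otherwise, and a "guessing" agent that reports the truth with probability q and otherwise flips a fair coin. The function names are hypothetical and the mapping q = 2c - 1 is derived here for illustration; it is not taken from the manuscript.

```python
def guessing_weight_for_lying_agent(credibility):
    """Truth-telling weight q of a guessing agent that matches a lying agent.

    Matching the probability that a report is correct:
        credibility = q + (1 - q) / 2   =>   q = 2 * credibility - 1
    The mapping is only valid for credibility in [0.5, 1].
    """
    if not 0.5 <= credibility <= 1.0:
        raise ValueError("equivalence only holds for credibility in [0.5, 1]")
    return 2.0 * credibility - 1.0


def p_correct_guessing_agent(q):
    """Probability that a guessing agent's report matches the true outcome."""
    return q + (1.0 - q) / 2.0


for c in (0.5, 0.75, 1.0):
    q = guessing_weight_for_lying_agent(c)
    print(c, q, p_correct_guessing_agent(q))
```

Under these assumptions, an agent that lies on a quarter of trials is indistinguishable, in terms of report accuracy, from one that knows the outcome on half of the trials and guesses on the rest, and a 50% credible agent corresponds to pure guessing.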

      (5) Over-reliance on analyzing model parameters

      Most of the results rely on interpreting model parameters, specifically, the "credit assignment" (CA) parameter. Exacerbating this, many key conclusions rest on a comparison of the CA parameters fit to human data vs. those fit to simulations from a Bayesian model. I've never seen anything like this, and the authors don't justify or even motivate this analysis choice. As a general rule, analyses of model parameters are less convincing than behavioral results because they inevitably depend on arbitrary modeling assumptions that cannot be fully supported. I imagine that most or even all of the results presented here would have behavioral analogues. The paper would benefit greatly from the inclusion of such results. It would also be helpful to provide a description of the model in the main text that makes it very clear what exactly the CA parameter is capturing (see next point).

      We thank the reviewer for this important suggestion which we address together with the following point.

      (6) RL or regression?

      I was initially very confused by the "RL" model because it doesn't update based on the TD error. Consequently, the "Q values" can go beyond the range of possible reward (SI Figure 5). These values are therefore not Q values, which are defined as expectations of future reward ("action values"). Instead, they reflect choice propensities, which are sometimes notated $h$ in the RL literature. This misuse of notation is unfortunately quite common in psychology, so I won't ask the authors to change the variable. However, they should clarify when introducing the model that the Q values are not action values in the technical sense. If there is precedent for this update rule, it should be cited.

      Although the change is subtle, it suggests a very different interpretation of the model.

      Specifically, I think the "RL model" is better understood as a sophisticated logistic regression, rather than a model of value learning. Ignoring the decay term, the CA term is simply the change in log odds of repeating the just-taken action in future trials (the change is negated for negative feedback). The PERS term is the same, but ignoring feedback. The decay captures that the effect of each trial on future choices diminishes with time. Importantly, however, we can re-parameterize the model such that the choice at each trial is a logistic regression where the independent variables are an exponentially decaying sum of feedback of each type (e.g., positive-cred50, positive-cred75, ... negative-cred100). The CA parameters are simply coefficients in this logistic regression.

      Critically, this is not meant to "deflate" the model. Instead, it clarifies that the CA parameter is actually not such an assumption-laden model estimate. It is really quite similar to a regression coefficient, something that is usually considered "model agnostic". It also recasts the non-standard "cross-fitting" approach as a very standard comparison of regression coefficients for model simulations vs. human data. Finally, using different CA parameters for true vs false feedback is no longer a strange and implausible model assumption; it's just another (perfectly valid) regression. This may be a personal thing, but after adopting this view, I found all the results much easier to understand.

      We thank the reviewer for their insightful and illuminating comments, particularly concerning the interpretation of our model parameters and the nature of our credit-assignment model. We believe your interpretation of the model is accurate, and we now present it to readers in the hope that our modelling will become clearer and more intuitive. We also explain to readers how this recasts our “cross-fitting” approach in the way you suggested (we return to this point below).

      Broadly, while we agree that modelling results depend on underlying assumptions, we believe that “model-agnostic” approaches also have important limitations—especially in reinforcement learning (RL), where choices are shaped by histories of past events, which such approaches often fail to fully account for. As students of RL, we are frequently struck by how careful modelling demonstrates that seemingly meaningful “model-agnostic” patterns can emerge as artefacts of unaccounted-for variables. We also note that the term “model-agnostic” is difficult to define—after all, even regression models rely on assumptions, and some computational models make richer or more transparent assumptions than others. Ideally, we aim to support our findings using converging methods wherever possible.

      We want to clarify that many of our reported findings indeed stem from straightforward behavioral analyses (e.g., simple regressions of choice-repetition), which do not rely on complex modeling assumptions. The two key results that primarily depend on the analysis of model parameters are our findings related to positivity bias and truth inference.

      Regarding the positivity bias, identifying truly model-agnostic behavioral signatures, distinct from effects like choice-perseveration, has historically been a significant challenge in the literature. Classical research on this bias rests on the interpretation of model parameters (Lefebvre et al., 2017; Palminteri et al., 2017), or at least on the use of models to assess what an “unbiased learner” baseline should look like (Palminteri & Lebreton, 2022). Some researchers have suggested possible regressions incorporating history effects to detect positivity bias from choice-repetition behavior, but these regressions (like our model) rely on subtle assumptions about forgetting and history effects (Toyama et al., 2019). Specifically, in our case, this issue is also demonstrated by the analysis we conducted in relation to the reviewer's previous point (about perseveration masquerading as positivity bias). We believe that clearly dissociating positivity bias from perseveration is an important challenge for the field going forward.

      For our truth inference results, obtaining purely behavioral signatures is similarly challenging due to the intricate interdependencies (which the reviewer identified in previous points) between agent credibility, feedback valence, feedback truthfulness, and choice accuracy within our task design.

      Finally, we agree with the reviewer that regression coefficients are often interpreted as a “model-agnostic” pattern. From this perspective, even our findings regarding positivity and truth bias are not a case of over-reliance on complex model assumptions but are rather a way to expose deviations between empirical “sophisticated” regression coefficients and coefficients predicted from Bayesian models.

      We have now described the main learning rule of our model in the main text to ensure that the meaning of the CA parameters is clearer for readers:

      “Next, we formulated a family of non-Bayesian computational RL models. Importantly, these models can flexibly express non-Bayesian learning patterns and, as we show in the following sections, can serve to identify learning biases deviating from an idealized Bayesian strategy. Here, an assumption is that during feedback, the choice propensity for the chosen bandit (which here is represented by a point estimate, “Q value”, rather than a distribution) either increases or decreases (for positive or negative feedback, respectively) according to a magnitude quantified by the free “Credit-Assignment (CA)” model parameters (47):

      $Q(\mathrm{chosen}) \leftarrow (1 - f_Q) \cdot Q(\mathrm{chosen}) + CA(\mathrm{agent}, \mathrm{valence}) \cdot F$

      where F is the feedback received from the agents (coded as 1 for reward feedback and -1 for non-reward feedback), while $f_Q$ (∈[0,1]) is the free parameter representing the forgetting rate of the Q-value (Fig. 2a, bottom panel; Fig. S5b; Methods). The probability of choosing a bandit (say A over B) in this family of models is a logistic function of the contrast in choice propensities between these two bandits. One interpretation of this model is as a “sophisticated” logistic regression, where the CA parameters take the role of “regression coefficients” corresponding to the change in log odds of repeating the just-taken action in future trials based on the feedback (+/- CA for positive or negative feedback, respectively; the model also includes gradual perseveration, which allows for constant log-odds changes that are not affected by choice feedback; see “Methods: RL models”). The forgetting rate captures the extent to which the effect of each trial on future choices diminishes with time. The Q-values are thus exponentially decaying sums of logistic choice propensities based on the types of feedback a bandit received.”
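      The update rule and its regression reading can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' code: the values in `CA` are hypothetical rather than fitted, a single bandit is assumed to be chosen (and therefore decayed) on every trial, and the perseveration term is omitted. The sketch checks that applying the update recursively yields the same propensity as an exponentially decaying sum of CA-weighted feedback.

```python
import math

# Hypothetical CA values per (agent stars, valence); NOT fitted estimates.
CA = {(1, "pos"): 0.2, (1, "neg"): 0.1,
      (2, "pos"): 0.6, (2, "neg"): 0.5,
      (3, "pos"): 1.2, (3, "neg"): 1.1}


def valence_of(feedback):
    """Map feedback coded +1 / -1 to a valence label."""
    return "pos" if feedback > 0 else "neg"


def ca_update(q, ca, feedback, f_q):
    """One credit-assignment step: decay the old propensity, then add CA * F."""
    return (1.0 - f_q) * q + ca * feedback


def choice_prob(q_a, q_b):
    """Logistic choice rule on the propensity contrast (perseveration omitted)."""
    return 1.0 / (1.0 + math.exp(-(q_a - q_b)))


def run_sequence(trials, f_q):
    """Apply the update trial by trial, starting from a propensity of zero."""
    q = 0.0
    for agent, feedback in trials:
        q = ca_update(q, CA[(agent, valence_of(feedback))], feedback, f_q)
    return q


def decayed_sum(trials, f_q):
    """Closed form: exponentially decaying sum of CA-weighted feedback."""
    n = len(trials)
    total = 0.0
    for k, (agent, feedback) in enumerate(trials):
        weight = (1.0 - f_q) ** (n - 1 - k)  # older trials are discounted more
        total += weight * CA[(agent, valence_of(feedback))] * feedback
    return total


# Each trial is (agent stars, feedback coded +1 / -1).
trials = [(3, 1), (1, -1), (2, 1), (3, -1)]
q_recursive = run_sequence(trials, f_q=0.3)
q_closed = decayed_sum(trials, f_q=0.3)
```

      Under these assumptions `q_recursive` and `q_closed` agree up to floating-point error, which is the sense in which the CA parameters act like coefficients on decayed feedback histories.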

      We also explain the implications of this perspective for our cross-fitting procedure:

      “To further characterise deviations between behaviour and our Bayesian learning models, we used a “cross-fitting” method. Treating CA parameters as data-features of interest (i.e., feedback-dependent changes in choice propensity), our goal was to examine if and how empirical features differ from features extracted from simulations of our Bayesian learning models. Towards that goal, we simulated synthetic data based on Bayesian agents (using participants’ best-fitting parameters), but fitted these data using the CA-models, obtaining what we term “Bayesian-CA parameters” (Fig. 2d; Methods). A comparison of these Bayesian-CA parameters with empirical-CA parameters, obtained by fitting CA models to empirical data, allowed us to uncover patterns consistent with, or deviating from, ideal-Bayesian value-based inference. Under the sophisticated logistic-regression interpretation of the CA-model family, the cross-fitting method comprises a comparison between empirical regression coefficients (i.e., empirical CA parameters) and regression coefficients based on simulations of Bayesian models (Bayesian-CA parameters). Using this approach, we found that both the instructed-credibility and free-credibility Bayesian models predicted increased Bayesian-CA parameters as a function of agent credibility (Fig. 3c; see SI 3.1.1.2 Tables S8 and S9). However, an in-depth comparison between Bayesian and empirical CA parameters revealed discrepancies from ideal Bayesian learning, which we describe in the following sections.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      (1) Keep terms consistent, e.g., follow-up vs. main; hallmark vs. traditional.

      We have now changed the text to keep terms consistent.

      (2) CA model is like a learning rate; but it's based on the raw reward, not the TD error - this seems strange.

      We thank the reviewer for this comment. We understand that the use of a CA model instead of a TD error model may seem unusual at first glance. However, the CA model offers an important advantage: it more easily accommodates what we term "negative learning rates". This means that some participants may treat certain agents (especially the random one) as consistently deceitful, leading them to effectively increase/reduce choice tendencies following negative/positive feedback. A CA model handles this naturally by allowing negative CA parameters as a simple extension of positive ones. In contrast, adapting a TD error model to account for this is more complex. For instance, attempting to introduce a "negative learning rate" makes the RW model behave in a non-stable manner (e.g., Q values become <0 or >1). At the initial stages of our project, we explored different approaches to dealing with this issue and we found the CA model provides the best approach. For these reasons, we decided to proceed with our CA model.

      Additionally, we used the CA model in previous studies (e.g., Moran, Dayan & Dolan (2021)), where we included (in the SI) a detailed discussion of the similarities and differences between credit-assignment and Rescorla-Wagner models.
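      The contrast between the two update rules under a negative parameter can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Rescorla-Wagner parameterization (rewards coded 0/1, starting value 0.5) and all numerical values are assumptions chosen only to make the behaviour visible.

```python
def rw_update(q, reward, alpha):
    """Rescorla-Wagner step: move q toward the reward by a fraction alpha."""
    return q + alpha * (reward - q)


def ca_update(q, feedback, ca, f_q):
    """Credit-assignment step: decay q, then add ca * feedback (+1 or -1)."""
    return (1.0 - f_q) * q + ca * feedback


# Hypothetical values, not estimates from the study.
alpha_negative = -0.3  # "negative learning rate" in the RW rule
ca_negative = -0.5     # negative credit-assignment parameter
f_q = 0.2              # forgetting rate

q_rw, q_ca = 0.5, 0.0
rw_path, ca_path = [], []
for _ in range(10):
    # The same positive outcome is delivered on every trial.
    q_rw = rw_update(q_rw, reward=1.0, alpha=alpha_negative)
    q_ca = ca_update(q_ca, feedback=1.0, ca=ca_negative, f_q=f_q)
    rw_path.append(q_rw)
    ca_path.append(q_ca)

# With a negative alpha, the RW value is pushed away from the reward and
# leaves the [0, 1] range; the CA propensity just settles at a negative level.
rw_left_range = any(q < 0.0 or q > 1.0 for q in rw_path)
ca_bound = abs(ca_negative) / f_q
```

      In this toy run the RW value moves ever further from the reward, whereas the CA propensity stays within `ca_bound`, so a negative CA parameter simply reverses the direction of learning.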

      (3) Why was the follow-up study not pre-registered?

      We appreciate the reviewer's comment regarding preregistration, which we should have done. Unfortunately, this is now “water under the bridge” but going forward we hope to pre-register increasing parts of our work.

      (4) Other work looking at reward stochasticity?

      As noted in point 4 of the main weaknesses, previous work on reward stochasticity primarily focused on explaining the increase/decrease in learning and its mechanistic bases under varying stochasticity levels. In our study, we uniquely characterize several specific learning biases that are modulated by source credibility, a topic not extensively explored within the existing reward stochasticity framework, as far as we know.

      (5) Equation 1 is different from the one in the figure?

      The reviewer is completely correct. The figure provides a simplified visual representation, primarily focusing on the feedback-based update of the Q-value, and for simplicity, it omits the forgetting term present in the full Equation 1. To ensure complete clarity and prevent any misunderstanding, we have now incorporated a more detailed explanation of the model, including the complete Equation 1 and its components, directly within the main text. This comprehensive description will ensure that readers are fully aware of how the model operates.

      “Next, we formulated a family of non-Bayesian computational RL models. Importantly, these models can flexibly express non-Bayesian learning patterns and, as we show in the following sections, can serve to identify learning biases deviating from an idealized Bayesian strategy. Here, an assumption is that during feedback, the choice propensity for the chosen bandit (which here is represented by a point estimate, “Q value”, rather than a distribution) either increases or decreases (for positive or negative feedback, respectively) according to a magnitude quantified by the free “Credit-Assignment (CA)” model parameters (47):

      $Q(\mathrm{chosen}) \leftarrow (1 - f_Q) \cdot Q(\mathrm{chosen}) + CA(\mathrm{agent}, \mathrm{valence}) \cdot F$

      where F is the feedback received from the agents (coded as 1 for reward feedback and -1 for non-reward feedback), while $f_Q$ (∈[0,1]) is the free parameter representing the forgetting rate of the Q-value (Fig. 2a, bottom panel; Fig. S5b; Methods).”

      (6) Please describe/plot the distribution of all fitted parameters in the supplement. I would include the mean and SD in the main text (methods) as well.

      Following the reviewer’s suggestions, we have included in the Supplementary Document tables displaying the mean and SD of fitted parameters from participants for our main models of interest. We have also plotted the distributions of such parameters. Both for the main study:

      (7) "A novel approach within the disinformation literature by exploiting a Reinforcement Learning (RL) experimental framework".

      The idea of applying RL to disinformation is not new. Please tone down novelty claims. It would be nice to cite/discuss some of this work as well.

      https://arxiv.org/abs/2106.05402?utm_source=chatgpt.com https://www.scirp.org/pdf/jbbs_2022110415273931.pdf https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4173312

      We thank the reviewer for pointing us towards relevant literature. We have now toned down the sentence in the introduction and cited the references provided:

      “To address these questions, we adopt a novel approach within the disinformation literature by exploiting a Reinforcement Learning (RL) experimental framework (36). While RL has guided disinformation research in recent years (37–40), our approach is novel in using one of its most popular tasks: the “bandit task”.”

      (8) Figure 3a - The figures should be in the order that they're referenced (3 is referenced before 2).

      We generally try to stick to this important rule but, in this case, we believe that our ordering better serves the narrative and hope the reviewer will excuse this small violation.

      (9) "Additionally, we found a positive feedback-effect for the 3-star agent"

      What is the analysis here? To avoid confusion with the "positive feedback" effect, consider using "positive effect of feedback". The dash wasn't sufficient to avoid confusion in my case.

      We have now updated the terms in the text to avoid confusion.

      (10) The discovery study revealed even stronger results supporting a conclusion that the credibility-CA model was superior to both Bayesian models for most subjects

      This is very subjective, but I'll just mention that my "cherry-picking" flag was raised by this sentence. Are you only mentioning cases where the discovery study was consistent with the main study? Upon a closer read, I think the answer is most likely "no", but you might consider adopting a more systematic (perhaps even explicit) policy on when and how you reference the discovery study to avoid creating this impression in a more casual reader.

      We thank the reviewer for this valuable suggestion. To prevent any impression of "cherry-picking", we have removed specific references to the discovery study from the main body of the text. Instead, all discussions regarding the convergence and divergence of results between the two studies are now in the dedicated section focusing on the discovery study:

      “The discovery study (n=104) used a disinformation task structurally similar to that used in our main study, but with three notable differences: 1) it included 4 feedback agents, with credibilities of 50%, 70%, 85% and 100%, represented by 1, 2, 3, and 4 stars, respectively; 2) each experimental block consisted of a single bandit pair, presented over 16 trials (with 4 trials for each feedback agent); and 3) in certain blocks, unbeknownst to participants, the two bandits within a pair were equally rewarding (see SI section 1.1). Overall, this study's results supported similar conclusions to our main study (see SI section 1.2), with a few differences. We found convergent support for increased learning from more credible sources (SI 1.2.1), superior fit for the CA model over Bayesian models (SI 1.2.2) and increased learning from feedback inferred to be true (SI 1.2.6). Additionally, we found an inflation of positivity bias for low-credibility sources both when measured relative to the overall level of credit assignment (as in our main study) and in absolute terms (unlike in our main study) (Fig. S3; SI 1.2.5). Moreover, choice-perseveration could not predict an amplification of positivity bias for low-credibility sources (see SI 3.6.2). However, we found no evidence for learning based on 50%-credibility feedback when examining either the feedback effect on choice repetition or CA in the credibility-CA model (SI 1.2.3).”

      (11) An in-depth comparison between Bayesian and empirical CA parameters revealed discrepancies from normative Bayesian learning.

      Consider saying where this in-depth comparison can be found (based on my reading, I think you're referring to the next section?).

      We have now modified the sentence for better clarity:

      “However, an in-depth comparison between Bayesian and empirical CA parameters revealed discrepancies from ideal Bayesian learning, which we describe in the following sections.”

      (12) "which essentially provides feedback" Perhaps you meant "random feedback"?

      We have modified the text as suggested by the reviewer.

      (13) Essentially random

      Why "essentially"? Isn't it just literally random?

      We have modified the text as suggested by the reviewer.

      (14) Both Bayesian models predicted an attenuated credit-assignment for the 3-star agent

      Attenuated relative to what? I wouldn't use this word if you mean weaker than what we see in the human data. Instead, I would say people show an exaggerated credit-assignment, since Bayes is the normative baseline.

      We changed the text according to the reviewer’s suggestion:

      “A comparison of empirical and Bayesian credit-assignment parameters revealed a further deviation from ideal Bayesian learning: participants showed an exaggerated credit-assignment for the 3-star agent compared with Bayesian models.”

      (15) "there was no difference between 2-star and 3-star agent contexts (b=0.051, F(1,2419)=0.39, p=0.53)"

      You cannot confirm the null hypothesis! Instead, you can write "The difference between 2-star and 3-star agent contexts was not significant". Although even with this language, you should be careful that your conclusions don't rest on the lack of a difference (the next sentence is somewhat ambiguous on this point).

      Additionally, the reported b coefs do not match the figure, which if anything, suggests a larger drop from 0.75 (2-star) to 1 (3-star). Is this a mixed vs fixed effects thing? It would be helpful to provide an explanation here.

      We thank the reviewer for this question. When we previously submitted our manuscript, we thought that finding enhanced credit-assignment for fully credible feedback following potential disinformation from a DIFFERENT context would constitute a striking demonstration of our “contrast effect”. However, upon reexamining this finding we found out we had a coding error (affecting how trials were filtered). We have now rerun and corrected this analysis. We have assessed the contrast effect for both "same-context" trials (where the contextual trial featured the same bandit pair as the learning trial) and "different-context" trials (where the contextual trial featured a different bandit pair). Our re-analysis reveals a selective significant contrast effect in the same-context condition, but no significant effect in the different-context condition. We have updated the main text to reflect these corrected findings and provide a clearer explanation of the analysis:

      “A comparison of empirical and Bayesian credit-assignment parameters revealed a further deviation from ideal Bayesian learning: participants showed an exaggerated credit-assignment for the 3-star agent compared with Bayesian models [Wilcoxon signed-rank test, instructed-credibility Bayesian model (median difference=0.74, z=11.14); free-credibility Bayesian model (median difference=0.62, z=10.71), all p’s<0.001] (Fig. 3a). One explanation for enhanced learning for the 3-star agents is a contrast effect, whereby credible information looms larger against a backdrop of non-credible information. To test this hypothesis, we examined whether the impact of feedback from the 3-star agent is modulated by the credibility of the agent in the trial immediately preceding it. More specifically, we reasoned that the impact of a 3-star agent would be amplified by a “low credibility context” (i.e., when it is preceded by a low credibility trial). In a binomial mixed effects model, we regressed choice-repetition on feedback valence from the last trial featuring the same bandit pair (i.e., the learning trial) and the feedback agent on the trial immediately preceding that last trial (i.e., the contextual credibility; see Methods for model-specification). This analysis included only learning trials featuring the 3-star agent, and context trials featuring the same bandit pair as the learning trial (Fig. 4a). We found that feedback valence interacted with contextual credibility (F(2,2086)=11.47, p<0.001) such that the feedback-effect (from the 3-star agent) decreased as a function of the preceding context-credibility (3-star context vs. 2-star context: b=-0.29, F(1,2086)=4.06, p=0.044; 2-star context vs. 1-star context: b=-0.41, t(2086)=-2.94, p=0.003; and 3-star context vs. 1-star context: b=0.69, t(2086)=-4.74, p<0.001) (Fig. 4b). This contrast effect was not predicted by simulations of our main models of interest (Fig. 4c). No effect was found when focussing on contextual trials featuring a bandit pair different than the one in the learning trial (see SI 3.5). Thus, these results support an interpretation that credible feedback exerts a greater impact on participants’ learning when it follows non-credible feedback, in the same learning context.”

      We have modified the discussion accordingly as well:

      “A striking finding in our study was that for a fully credible feedback agent, credit assignment was exaggerated (i.e., higher than predicted by our Bayesian models). Furthermore, the effect of fully credible feedback on choice was further boosted when it was preceded by a low-credibility context related to current learning. We interpret this in terms of a “contrast effect”, whereby veridical information looms larger against a backdrop of disinformation (21). One upshot is that exaggerated learning might entail a risk of jumping to premature conclusions based on limited credible evidence (e.g., a strong conclusion that a vaccine produces significant side-effect risks based on weak credible information, following non-credible information about the same vaccine). An intriguing possibility, that could be tested in future studies, is that participants strategically amplify the extent of learning from credible feedback to dilute the impact of learning from noncredible feedback. For example, a person scrolling through a social media feed, encountering copious amounts of disinformation, might amplify the weight they assign to credible feedback in order to dilute effects of ‘fake news’. Ironically, these results also suggest that public campaigns might be more effective when embedding their messages in low-credibility contexts, which may boost their impact.”

      And we have included some additional analyses in the SI document:

      “3.5 Contrast effects for contexts featuring a different bandit

      Given that we observed a contrast effect when both the learning and the immediately preceding “context trial” involved the same pair of bandits, we next investigated whether this effect persisted when the context trial featured a different bandit pair, a situation where the context would be irrelevant to the current learning. Again, we used a binomial mixed-effects model, regressing choice-repetition on feedback valence in the learning trial and the feedback agent in the context trial. This analysis included only learning trials featuring the 3-star agent, and context trials featuring a different bandit pair than the learning trial (Fig. S22a). We found no significant evidence of an interaction between feedback valence and contextual credibility (F(2,2364)=0.21, p=0.81) (Fig. S22b). This null result was consistent with the range of outcomes predicted by our main computational models (Fig. S22c).

      We aimed to formally compare the influence of two types of contextual trials: those featuring the same bandit pair as the learning trial versus those featuring a different pair. To achieve this, we extended our mixed-effects model by incorporating a new predictor variable, "CONTEXT_TYPE", which coded whether the contextual trial involved the same bandit pair (coded as -0.5) or a different bandit pair (+0.5) compared to the learning trial. The Wilkinson notation for this expanded mixed-effects model is:

      $\mathrm{REPEAT} \sim \mathrm{CONTEXT\_TYPE} * \mathrm{FEEDBACK} * (\mathrm{CONTEXT}_{\text{2-star}} + \mathrm{CONTEXT}_{\text{3-star}}) + \mathrm{BETTER} + (1 \mid \mathrm{participant})$

      This expanded model revealed a significant three-way interaction between feedback valence, contextual credibility, and context type (F(2,4451) = 7.71, p<0.001). Interpreting this interaction, we found a 2-way interaction between context-source and feedback valence when the context was the same (F(2,4451) = 12.03, p<0.001), but not when context was different (F(2,4451) = 0.23, p = 0.79). Further interpreting the double feedback-valence * context-source interaction (for the same context) we obtained the same conclusions as reported in the main text.”

      (16) "Strikingly, model-simulations (Methods) showed this pattern is not predicted by any of our other models"

      Why doesn't the Bayesian model predict this?

      Thanks for the comment. Overall, Bayesian models do predict a slight truth inference effect (see Figure 6d). However, these effects are not as strong as the ones observed in participants, suggesting that our results go beyond what would be predicted by a Bayesian model.

      Conceptually, it's important to note that the Bayesian model can infer (after controlling for source credibility and feedback valence) whether feedback is truthful based solely on prior beliefs about the chosen bandit. Using this inferred truth to amplify the weight of truthful feedback would effectively amount to “bootstrapping on one’s own beliefs.” This is most clearly illustrated with the 50% agent: if one believes that a chosen bandit yields rewards 70% of the time, then positive feedback is more likely to be truthful than negative feedback. However, a Bayesian observer would also recognize that, given the agent’s overall unreliability, such feedback should be ignored regardless.
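      The 50%-agent example can be worked through numerically. The following Python sketch assumes, as a simplification not spelled out in this response, that an agent with credibility c reports the true outcome with probability c and the opposite outcome otherwise; the function names are hypothetical, and reward beliefs strictly between 0 and 1 are assumed.

```python
def p_feedback_truthful(positive, credibility, p_reward):
    """Posterior probability that the feedback just received is truthful.

    positive    -- True if the agent reported a reward
    credibility -- probability that the agent reports the true outcome
    p_reward    -- prior belief that the chosen bandit yields a reward
    """
    if positive:
        truthful = credibility * p_reward               # reward, reported truthfully
        untruthful = (1 - credibility) * (1 - p_reward)  # no reward, misreported
    else:
        truthful = credibility * (1 - p_reward)          # no reward, reported truthfully
        untruthful = (1 - credibility) * p_reward        # reward, misreported
    return truthful / (truthful + untruthful)


def p_outcome_reward(positive, credibility, p_reward):
    """Posterior probability that the hidden outcome was actually a reward."""
    if positive:
        reward_path = credibility * p_reward
        no_reward_path = (1 - credibility) * (1 - p_reward)
    else:
        reward_path = (1 - credibility) * p_reward
        no_reward_path = credibility * (1 - p_reward)
    return reward_path / (reward_path + no_reward_path)


# 50%-credibility agent, prior belief of a 70% reward rate.
truthful_if_positive = p_feedback_truthful(True, 0.5, 0.7)
truthful_if_negative = p_feedback_truthful(False, 0.5, 0.7)
outcome_after_positive = p_outcome_reward(True, 0.5, 0.7)
outcome_after_negative = p_outcome_reward(False, 0.5, 0.7)
```

      Under these assumptions positive feedback is judged more likely to be truthful than negative feedback, yet the belief about the outcome itself stays at the prior whichever feedback arrives, which is why an ideal observer gains nothing from the 50% agent.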

      (17) "A striking finding in our study was that for a fully credible feedback agent, credit assignment was exaggerated (i.e., higher than predicted by a Bayesian strategy)".

      "Since we did not find any significant interactions between BETTER and the other regressors, we decided to omit it from the model formulation".

      Was this decision made after seeing the data? If so, please report the original analysis as well.

      We have included the BETTER regressor again, and we have re-run the analyses. We now report the results of such regression. We have also changed the methods section accordingly:

      “We used a different mixed-effects binomial regression model to test whether value learning from the 3-star agent was modulated by contextual credibility. We focused this analysis on instances where the previous trial with the same bandit pair featured the 3-star agent. We regressed the variable REPEAT, which indicated whether the current trial repeated the choice from the previous trial featuring the same bandit-pair (repeated choice=1, non-repeated choice=0). We included the following regressors: FEEDBACK coding the valence of feedback in the previous trial with the same bandit pair (positive=0.5, negative=-0.5), CONTEXT2-star indicating whether the trial immediately preceding the previous trial with the same bandit pair (context trial) featured the 2-star agent (feedback from 2-star agent=1, otherwise=0), and CONTEXT3-star indicating whether the trial immediately preceding the previous trial with the same bandit pair featured the 3-star agent. We also included a regressor (BETTER) coding whether the bandit chosen in the learning trial was the better (mostly rewarding) or the worse (mostly unrewarding) bandit within the pair. We included in this analysis only current trials where the context trial featured a different bandit pair. The model in Wilkinson’s notation was:

      $\mathrm{REPEAT} \sim \mathrm{FEEDBACK} * (\mathrm{CONTEXT}_{\text{2-star}} + \mathrm{CONTEXT}_{\text{3-star}}) + \mathrm{BETTER} + (1 \mid \mathrm{participant})$ (13)

      In Figure 4c, we independently calculated the repeat probability difference for the better (mostly rewarding) and worse (mostly non-rewarding) bandits and averaged across them. This calculation was done at the participant level, and finally averaged across participants.”
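      The averaging scheme described for Figure 4c can be sketched in Python as follows. This is a hedged illustration only: the record layout (`participant`, `better`, `feedback`, `repeat`), the toy data, and the choice to skip cells lacking either feedback valence are assumptions, not details taken from the manuscript.

```python
from collections import defaultdict
from statistics import mean


def repeat_probability_difference(trials):
    """Mean feedback effect on choice repetition.

    For each participant, P(repeat | positive) - P(repeat | negative) is
    computed separately for the better and the worse bandit, the two
    differences are averaged, and the result is averaged over participants.
    """
    cells = defaultdict(list)
    for t in trials:
        cells[(t["participant"], t["better"], t["feedback"])].append(t["repeat"])

    participants = sorted({key[0] for key in cells})
    participant_means = []
    for p in participants:
        diffs = []
        for better in (True, False):
            pos = cells.get((p, better, 1))
            neg = cells.get((p, better, -1))
            if pos and neg:  # assumption: skip incomplete cells
                diffs.append(mean(pos) - mean(neg))
        if diffs:
            participant_means.append(mean(diffs))
    return mean(participant_means)


def make(participant, better, feedback, repeats):
    """Build toy trial records for one participant-by-condition cell."""
    return [{"participant": participant, "better": better,
             "feedback": feedback, "repeat": r} for r in repeats]


# Toy data, invented purely for illustration.
toy_trials = (
    make("p1", True, 1, [1, 1]) + make("p1", True, -1, [0, 1]) +
    make("p1", False, 1, [1]) + make("p1", False, -1, [0]) +
    make("p2", True, 1, [1, 0]) + make("p2", True, -1, [0]) +
    make("p2", False, 1, [0]) + make("p2", False, -1, [0])
)
effect = repeat_probability_difference(toy_trials)
```

      Averaging within participants first gives each participant equal weight regardless of how many trials fall into each cell.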

    1. Shellfish reefs, particularly mussels, can form large areas of habitat that are vital to their infaunal communities (Cole and McQuaid, 2010), but past research has shown that as calcifying organisms, they are the most vulnerable to warming and acidification (Kroeker et al., 2013a; Parker et al., 2013). On temperate Australian rocky shores, habitats created by the native mussel Trichomya hirsuta, and to a lesser extent, the invasive mussel Mytilus galloprovincialis support a local diversity of annelids, crustaceans, molluscs, and echinoderms (People, 2006; Cole, 2010). Eastern Australia is a climate change “hot-spot” with sea surface temperatures in this region increasing three times faster than the global average (Wernberg et al., 2011; Hobday and Pecl, 2014), and oceans are acidifying worldwide (Collins et al., 2013). The invasive M. galloprovincialis is relatively tolerant to environmental change (Hiebenthal et al., 2013); whereas little is known about the tolerance of T. hirsuta. As the oceans warm and acidify, M. galloprovincialis may have the capacity to replace T. hirsuta as the dominant biogenic habitat on the Australian rocky shores. Any changes in the biogenic mussel habitat could alter the infaunal communities, with downstream consequences for dependent organisms. Such consequences will have an impact on the natural communities and the success of current and future shellfish reef restoration projects (Pereira et al., 2019).

      If natives are replaced by hardier shellfish, do we think organisms will adapt to consume the new shellfish? If softer-shelled mussels move into the territory, will these areas be more susceptible to storm surges and wave energy? The new species may sound good temporarily but could be quickly destroyed by storm systems. This may enable the new species to spread out further and possibly benefit, or lead to the softer-shelled mussels' demise. Could the stronger storm systems associated with climate change put more stress on these oyster beds?

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This work by Govorunova et al. identified three naturally blue-shifted channelrhodopsins (ChRs) from ancyromonads, namely AnsACR, FtACR, and NlCCR. The phylogenetic analysis places the ancyromonad ChRs in a distinct branch, highlighting their unique evolutionary origin and potential for novel applications in optogenetics. Further characterization revealed the spectral sensitivity, ionic selectivity, and kinetics of the newly discovered AnsACR, FtACR, and NlCCR. This study also offers valuable insights into the molecular mechanism underlying the function of these ChRs, including the roles of specific residues in the retinal-binding pocket. Finally, this study validated the functionality of these ChRs in both mouse brain slices (for AnsACR and FtACR) and in vivo in Caenorhabditis elegans (for AnsACR), demonstrating the versatility of these tools across different experimental systems.

      In summary, this work provides a potentially valuable addition to the optogenetic toolkit by identifying and characterizing novel blue-shifted ChRs with unique properties.

      Strengths:

      This study provides a thorough characterization of the biophysical properties of the ChRs and demonstrates the versatility of these tools in different ex vivo and in vivo experimental systems. The mutagenesis experiments also revealed the roles of key residues in the photoactive site that can affect the spectral and kinetic properties of the channel.

      We thank the Reviewer for his/her positive evaluation of our work.

      Weaknesses:

      While the novel ChRs identified in this work are spectrally blue-shifted, there still seems to be some spectral overlap with other optogenetic tools. The authors should provide more evidence to support the claim that they can be used for multiplex optogenetics and help potential end-users assess if they can be used together with other commonly applied ChRs. Additionally, further engineering or combination with other tools may be required to achieve truly orthogonal control in multiplexed experiments.

      To demonstrate the usefulness of ancyromonad ChRs for multiplex optogenetics as a proof of principle, we co-expressed AnsACR with the red-shifted cation-conducting ChR Chrimson and measured the net photocurrent generated by this combination as a function of wavelength. We found that it is hyperpolarizing in the blue region of the spectrum and depolarizing in the red region. In the revision, we added a new panel (Figure 1D) showing these results and the following paragraph to the main text:

      “To test the possibility of using AnsACR in multiplex optogenetics, we co-expressed it with the red-shifted CCR Chrimson (Klapoetke et al., 2014) fused to an EYFP tag in HEK293 cells. We measured the action spectrum of the net photocurrents with 4 mM Cl<sup>-</sup> in the pipette, matching the conditions in the neuronal cytoplasm (Doyon, Vinay et al. 2016). Figure 1D, black shows that the direction of photocurrents was hyperpolarizing upon illumination with λ<500 nm and depolarizing at longer wavelengths. A shoulder near 520 nm revealed a FRET contribution from EYFP (Govorunova, Sineshchekov et al. 2020), which was also observed upon expression of the Chrimson construct alone (Figure 1D, red)”.

      In the C. elegans experiments, partial recovery of pharyngeal pumping was observed after prolonged illumination, indicating potential adaptation. This suggests that the effectiveness of these ChRs may be limited by cellular adaptation mechanisms, which could be a drawback in long-term experiments. A thorough discussion of this challenge in the application of optogenetics tools would prove very valuable to the readership.

      We added the following paragraph to the revised Discussion:

      “One possible explanation of the partial recovery of pharyngeal pumping that we observed after 15-s illumination, even at the highest tested irradiance, is continued attenuation of photocurrent during prolonged illumination (desensitization). However, the rate of AnsACR desensitization (Figure 1 – figure supplement 4A and Figure 1 – figure supplement 5A) is much faster than the rate of the pumping recovery, reducing the likelihood that desensitization is driving this phenomenon. Another possible reason for the observed adaptation is an increase in the cytoplasmic Cl<sup>-</sup> concentration owing to AnsACR activity and hence a breakdown of the Cl<sup>-</sup> gradient on the neuronal membrane. The C. elegans pharynx is innervated by 20 neurons, 10 of which are cholinergic (Pereira, Kratsios et al. 2015). A pair of MC neurons is the most important for regulation of pharyngeal pumping, but other pharyngeal cholinergic neurons, including I1, M2, and M4, also play a role (Trojanowski, Padovan-Merhar et al. 2014). Moreover, the pharyngeal muscles generate autonomous contractions in the presence of acetylcholine tonically released from the pharyngeal neurons (Trojanowski, Raizen et al. 2016). Given this complexity, further elucidation of pharyngeal pumping adaptation mechanisms is beyond the scope of this study.”

      Reviewer #2 (Public review):

      Summary:

      Govorunova et al present three new anion opsins that have potential applications in silencing neurons. They identify new opsins by scanning numerous databases for sequence homology to known opsins, focusing on anion opsins. The three opsins identified are uncommonly fast, potent, and are able to silence neuronal activity. The authors characterize numerous parameters of the opsins.

      Strengths:

      This paper follows the tradition of the Spudich lab, presenting and rigorously characterizing potentially valuable opsins. Furthermore, they explore several mutations of the identified opsin that may make these opsins even more useful for the broader community. The opsins AnsACR and FtACR are particularly notable, having extraordinarily fast onset kinetics that could have utility in many domains. Furthermore, the authors show that AnsACR is usable in multiphoton experiments, having a peak photocurrent at a commonly used wavelength. Overall, the authors' detailed measurements and characterization make for an important resource, both presenting new opsins that may be important for future experiments, and providing characterizations to expand our understanding of opsin biophysics in general.

      We thank the Reviewer for his/her positive evaluation of our work.

      Weaknesses:

      First, while the authors frequently reference GtACR1, a well-used anion opsin, there is no side-by-side data comparing these new opsins to the existing state-of-the-art. Such comparisons are very useful to adopt new opsins.

      GtACR1 exhibits peak sensitivity at 515 nm and is therefore poorly suited for combination with red-shifted CCRs or fluorescent sensors, unlike blue-light-absorbing ancyromonad ACRs. Nevertheless, we conducted a side-by-side comparison of ancyromonad ChRs, GtACR1 and GtACR2, the latter of which has its spectral maximum at 470 nm. The results are shown in the new Figures 1E and F, and the new multipanel Figure 1 – figure supplement 4 added in the revision. We also added the following text, describing these results, to the revised Results section:

      “Figures 1E and F show the dependence of the peak photocurrent amplitude and reciprocal peak time, respectively, on the photon flux density for ancyromonad ChRs and GtACRs. The current amplitude saturated earlier than the time-to-peak for all tested ChRs. Figure 1 – figure supplement 4A-E shows normalized photocurrent traces recorded at different photon densities. Quantitation of desensitization at the end of 1-s illumination revealed a complex light dependence (Figure 1, Figure Supplement 4F). Figure 1 – figure supplement 5 shows normalized photocurrent traces recorded in response to a 5-s light pulse of the maximal available intensity and the magnitude of desensitization at its end.”

      Next, multiphoton optogenetics is a promising emerging field in neuroscience, and I appreciate that the authors began to evaluate this approach with these opsins. However, a few additional comparisons are needed to establish the user viability of this approach, principally the photocurrent evoked using the 2P process, for given power densities. Comparison across the presented opsins and GtACR1 would allow readers to assess if these opsins are meaningfully activated by 2P.

      We carried out additional 2P experiments in ancyromonad ChRs, GtACR1 and GtACR2 and added their results to a new main-text Figure 6 and Figure 6 – figure supplement 1. We added the new section describing these results, “Two-photon excitation”, to the main text in the revision:

      “To determine the 2P activation range of AnsACR, FtACR, and NlCCR, we conducted raster scanning using a conventional 2P laser, varying the excitation wavelength between 800 and 1,080 nm (Figure 6 – figure supplement 1). All three ChRs generated detectable photocurrents with action spectra showing maximal responses at ~925 nm for AnsACR, 945 nm for FtACR, and 890 nm for NlCCR (Figure 6A). These wavelengths fall within the excitation range of common Ti:Sapphire lasers, which are widely used in neuroscience laboratories and can be tuned between ~700 nm and 1,020-1,300 nm. To assess desensitization, cells expressing AnsACR, FtACR, or NlCCR were illuminated at the respective peak wavelength of each ChR at 15 mW for 5 seconds. GtACR1 and GtACR2, previously used in 2P experiments (Forli, Vecchia et al. 2018, Mardinly, Oldenburg et al. 2018), were included for comparison. The normalized photocurrent traces recorded under these conditions are shown in Figure 6B-F. The absolute amplitudes of 2P photocurrents at the peak time and at the end of illumination are shown in Figure 6G and H, respectively. All five tested variants exhibited comparable levels of desensitization at the end of illumination (Figure 6I).”

      Reviewer #3 (Public review):

      Summary:

      The authors aimed to develop Channelrhodopsins (ChRs), light-gated ion channels, with high potency and blue action spectra for use in multicolor (multiplex) optogenetics applications. To achieve this, they performed a bioinformatics analysis to identify ChR homologues in several protist species, focusing on ChRs from ancyromonads, which exhibited the highest photocurrents and the most blue-shifted action spectra among the tested candidates. Within the ancyromonad clade, the authors identified two new anion-conducting ChRs and one cation-conducting ChR. These were characterized in detail using a combination of manual and automated patch-clamp electrophysiology, absorption spectroscopy, and flash photolysis. The authors also explored sequence features that may explain the blue-shifted action spectra and differences in ion selectivity among closely related ChRs.

      Strengths:

      A key strength of this study is the high-quality experimental data, which were obtained using well-established techniques such as manual patch-clamp and absorption spectroscopy, complemented by modern automated patch-clamp approaches. These data convincingly support most of the claims. The newly characterized ChRs expand the optogenetics toolkit and will be of significant interest to researchers working with microbial rhodopsins, those developing new optogenetic tools, as well as neuro- and cardioscientists employing optogenetic methods.

      We thank the Reviewer for his/her positive evaluation of our work.

      Weaknesses:

      This study does not exhibit major methodological weaknesses. The primary limitation of the study is that it includes only a limited number of comparisons to known ChRs, which makes it difficult to assess whether these newly discovered tools offer significant advantages over currently available options.

      We conducted a side-by-side comparison of ancyromonad ChRs and GtACRs, widely used for optical inhibition of neuronal activity. The results are shown in the new Figures 1E and F, and the new multipanel Figure 1 – figure supplement 4 and Figure 1 – figure supplement 5 added in the revision. We also added the following text, describing these results, to the revised Results section:

      “Figures 1E and F show the dependence of the peak photocurrent amplitude and reciprocal peak time, respectively, on the photon flux density for ancyromonad ChRs and GtACRs. The current amplitude saturated earlier than the time-to-peak for all tested ChRs. Figure 1 – figure supplement 4A-E shows normalized photocurrent traces recorded at different photon densities. Quantitation of desensitization at the end of 1-s illumination revealed a complex light dependence (Figure 1, Figure Supplement 4F). Figure 1 – figure supplement 5 shows normalized photocurrent traces recorded in response to a 5-s light pulse of the maximal available intensity and the magnitude of desensitization at its end.”

      Additionally, although the study aims to present ChRs suitable for multiplex optogenetics, the new ChRs were not tested in combination with other tools. A key requirement for multiplexed applications is not just spectral separation of the blue-shifted ChR from the red-shifted tool of interest but also sufficient sensitivity and potency under low blue-light conditions to avoid cross-activation of the respective red-shifted tool. Future work directly comparing these new ChRs with existing tools in optogenetic applications and further evaluating their multiplexing potential would help clarify their impact.

      As a proof of principle, we co-expressed AnsACR with the red-shifted cation-conducting ChR Chrimson and demonstrated that the net photocurrent generated by this combination is hyperpolarizing in the blue region of the spectrum and depolarizing in the red region. In the revision, we added a new panel (Figure 1D) showing these results and the following paragraph to the main text:

      “To test the possibility of using AnsACR in multiplex optogenetics, we co-expressed it with the red-shifted CCR Chrimson (Klapoetke et al., 2014) fused to an EYFP tag in HEK293 cells. We measured the action spectrum of the net photocurrents with 4 mM Cl<sup>-</sup> in the pipette, matching the conditions in the neuronal cytoplasm (Doyon, Vinay et al. 2016). Figure 1D, black shows that the direction of photocurrents was hyperpolarizing upon illumination with λ<500 nm and depolarizing at longer wavelengths. A shoulder near 520 nm revealed a FRET contribution from EYFP (Govorunova, Sineshchekov et al. 2020), which was also observed upon expression of the Chrimson construct alone (Figure 1D, red)”.

      Reviewing Editor Comments:

      The reviewers suggest that direct comparison to GtACR1 is the most important step to make this work more useful to the community.

      We followed the Reviewers’ recommendations and carried out a side-by-side comparison of ancyromonad ChRs and GtACR1 as well as GtACR2 (Figure 1E and F, Figure 1 – figure supplement 4, Figure 1 – figure supplement 5, and Figure 6). Note, however, that GtACR1’s spectral maximum is at 515 nm, which makes it poorly suited for blue light excitation. Also, ChRs are known to perform very differently in different cell types and upon expression of their genes in different vector backbones, so our results cannot be generalized to all experimental systems. Each ChR user needs to select the most appropriate tool for his/her purpose by testing several candidates in his/her own experimental setting.

      Reviewer #1 (Recommendations for the authors):

      (1) The figure legend for Figure 2D-I appears to be incomplete. Please provide a detailed explanation of the panels.

      In the revision, we have expanded the legend of Figure 2 to explain all individual panels.

      (2) The meaning of the Vr shift (Y-axis in Figure 2H-I) should be clarified in the main text to aid reader understanding.

      In the revision, we added the phrase “which indicated higher relative permeability to NO<sub>3</sub><sup>-</sup> than to Cl<sup>-</sup>” to explain the meaning of the V<sub>r</sub> shift upon replacement of Cl<sup>-</sup> with NO<sub>3</sub><sup>-</sup>.

      (3) Adding statistical analysis for the peak and end photocurrent values in Figure 2D-F would strengthen the claim that there is minimal change in relative permeability during illumination.

      In the revision, we added the V<sub>r</sub> values for the peak photocurrent to Figure 2H-I, which already contained the V<sub>r</sub> values for the end photocurrent, and carried out a statistical analysis of their comparison. The following sentence was added to the text in the revision:

      “The V<sub>r</sub> values of the peak current and that at the end of illumination were not significantly different by the two-tailed Wilcoxon signed-rank test (Fig. 2G), indicating no change in the relative permeability during illumination.”
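
      For readers unfamiliar with the test named in the added sentence, the following is a minimal pure-Python sketch of an exact two-tailed Wilcoxon signed-rank test on paired values. The function name `wilcoxon_signed_rank_exact` and the six pairs of V<sub>r</sub> values are hypothetical, chosen only for illustration; they are not data from the manuscript, and the authors' own analysis software is not specified here.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-tailed Wilcoxon signed-rank test for paired samples.

    Returns (w_plus, p_value). Zero differences are dropped and tied absolute
    differences receive average ranks. The null distribution is enumerated
    over all sign assignments, so this is only practical for small samples.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences (1-based), averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Test statistic: sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2
    observed_deviation = abs(w_plus - expected)

    # Under the null hypothesis each rank is equally likely to carry either sign.
    at_least_as_extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_deviation - 1e-12:
            at_least_as_extreme += 1
    return w_plus, at_least_as_extreme / 2 ** n


# Hypothetical paired reversal potentials (mV) from six cells: NOT real data.
vr_peak = [-12.0, -15.5, -10.0, -14.0, -11.5, -13.0]
vr_end = [-13.0, -15.0, -12.0, -13.75, -10.0, -9.5]

w_plus, p_value = wilcoxon_signed_rank_exact(vr_peak, vr_end)
print(f"W+ = {w_plus}, two-tailed p = {p_value}")  # W+ = 8.0, p = 0.6875
```

      With these made-up numbers the p-value is far above 0.05, which is the pattern that would be reported as "not significantly different". For larger samples, a library routine with a normal approximation would be the more practical choice.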

      (4) Figure 4H and I seem out of place in Figure 4, as the title suggests a focus on wild-type proteins and AnsACR mutants. The authors could consider moving these panels to Figure 3 for better alignment with the content.

      As noted below, we changed the panel order in Figure 4 upon the Reviewer’s request. In particular, former Figure 4I is Figure 4C in the revision, and former Figure 4H is now panel C in Figure 3 – figure supplement 1 in the revision. We rearranged the corresponding section of the text (highlighted yellow in the manuscript).

      (5) The characterization section could be strengthened by including data on the pH sensitivity of FtACR, which is currently missing from the main figures.

      Upon the Reviewer’s request, we carried out pH titration of FtACR absorbance and added the results as Figure 4B in the revision.

      (6) The logic in Figure 4A-G appears somewhat disjointed. For example, Figure 4A shows pH sensitivity for WT AnsACR and the G86E mutant, while Figure 4 B-D shifts to WT AnsACR and the D226N mutant, and Figure 4E returns to the G86E mutant. Reorganizing or clarifying the flow would improve readability.

      We followed the Reviewer’s advice and changed the panel order in Figure 4. In the revised version, the upper row (panels A-C) shows the pH titration data of the three WTs, the middle row (panels D-F) shows analysis of the AnsACR_D226N mutant, and the lower row (panels G-I) shows analysis of the AnsACR_G88E mutant. We also rearranged accordingly the description of these panels in the text.

      (7) In Figure 5A, "NIACR" should likely be corrected to "NlCCR".

      We corrected the typo in the revision.

      (8) The statistical significance in Figure 6C and D is somewhat confusing. Clarifying which groups are being compared and using consistent symbols would improve interpretability.

      In the revision, we improved the figure panels and legend to clarify that the comparisons are between the dark and light stimulation groups within the same current injection.

      (9) The authors pointed out that at rest or when a small negative current was injected, the neurons expressing Cl- permeable ChRs could generate a single action potential at the beginning of photostimulation, as has been reported before. The authors could help by further discussing if and how this phenomenon would affect the applicability of such tools.

      We mentioned in the revised Discussion section that activation of ACRs in the axons could depolarize the axons and trigger synaptic transmission at the onset of light stimulation, and that this undesired excitatory effect needs to be taken into consideration when using ACRs.

      Reviewer #2 (Recommendations for the authors):

      Govorunova et al present three new anion opsins that have potential applications in silencing neurons. This paper follows the tradition of the Spudich lab, presenting and rigorously characterizing potentially valuable opsins. Furthermore, they explore several mutations of the identified opsin that may make these opsins even more useful for the broader community. In general, I feel positively about this manuscript. It presents new potentially useful opsins and provides characterization that would enable its use. I have a few recommendations below, mostly centered around side-by-side comparisons to existing opsins.

      (1) My primary concern is that while there is a reference to GtACR1, a highly used opsin first described by this team, they do not present any of this data side by side.

      When evaluating opsins to use, it is important to compare them to the existing state of the art. As a potential user, I need to know where these opsins differ. Citing other papers does not solve this as, even within the same lab, subtle methodological differences or data plotting decisions can obscure important differences.

      As we explained in the response to the public comments, we carried out a side-by-side comparison of ancyromonad ChRs and GtACRs as requested by the Reviewer. The results are shown in the new Figures 1E and F, and the new multipanel Figure 1 – figure supplement 4 and Figure 1 – figure supplement 5, added in the revision. However, we would like to emphasize the limited usefulness of such comparative analysis, as ChRs are known to perform very differently in different cell types and upon expression of their genes in different vector backbones, so our results cannot be generalized to all experimental systems. Each ChR user needs to select the most appropriate tool for his/her purpose by testing several candidates in his/her own experimental setting.

      (2) Multiphoton optogenetics is an emerging field of optogenetics, and it is admirable that the authors address it here. The authors should present more 2p characterization, so that it can be established if these new opsins are viable for use with 2P methods, the way GtACR1 is. The following would be very useful for 2P characterization:

      Photocurrents for a given power density, compared to GtACR1 and GtACR2.

      The new Figure 6 (B-F) added in the revision shows photocurrent traces recorded from the three ancyromonad ChRs and two GtACRs upon 2P excitation of a given power density.

      Comparing NlCCR and FtACR's wavelength specificity and photocurrent. If these opsins are too weak to create reasonable 2P spectra, this difference should be discussed.

      The new Figure 6A shows the 2P action spectra of all three ancyromonad ChRs.

      A trace and calculated photocurrent kinetics to compare 1P and 2P. This need not be the flash-based absorption characterization of Figure 3, but a side-by-side photocurrent as in Figure 2.

      As mentioned above, photocurrent traces recorded from ancyromonad ChRs and GtACRs upon 2P excitation are shown in the new Figure 6 (B-F). However, direct comparison of the 2P data with the 1P data is not possible, as we used laser scanning illumination for the former and wide-field illumination for the latter.

      Characterization of desensitization. As the authors mention, many opsins undergo desensitization; presenting the ratio of peak photocurrent vs that at multiple time points (probably up to a few seconds) would provide evidence for how effectively these constructs could be used in different scenarios.

      We conducted a detailed analysis of desensitization under both 1P and 2P excitation. The new Figure 1 – figure supplement 4 and Figure 1 – figure supplement 5 show the data obtained under 1P excitation, and the new Figure 6 shows the data for 2P conditions.

      I have to admit, that by the end of the paper, I was getting confused as to which of the three original constructs had which property, and how that was changing with each mutation. I would suggest that a table summarizing each opsin and mutation with its onset and offset kinetics, peak wavelength, photocurrent, and ion selectivity would greatly increase the ability to select and use opsins in the future.

      In the revision, we added a table of the spectroscopic properties of all tested mutants as Supplementary File 2. This study did not aim to analyze other parameters listed by the Reviewer. We added the following sentence referring to this table to the main text:

      “Supplementary File 2 contains the λ values of the half-maximal amplitude of the long-wavelength slope of the spectrum, which can be estimated more accurately from the action spectra than the λ of the maximum.”

      It may be out of the scope of this manuscript, but if a soma localization sequence can be shown to remove the 'axonal spiking' (as described in line 441), this would be a significant addition to the paper.

      Our previous study (Messier et al., 2018, doi: 10.7554/eLife.38506) showed that a soma localization sequence can reduce, but not eliminate, the axonal spiking. We plan to test these new ACRs with the trafficking motifs in the future.

      NlCCR appears to have the best photocurrents of all tested opsins in this paper. It seems odd that it was omitted from the mouse cortical neurons experiments.

      We have not included analysis of NlCCR behavior in neurons because we are preparing a separate manuscript on this ChR.

      Figure 6 would benefit from more gradation in the light powers used to silence and would benefit from comparison to GtACR. I suggest using a fixed current with a series of illumination intensities to see which of the three opsins (or GtACR) is most effective at silencing. At present, it looks binary, and a user cannot evaluate if any of these opsins would be better than what is already available.

      In the revision, we added the data comparing the light sensitivity of AnsACR and FtACR with the previously identified GtACR1 and GtACR2 (new Figure 1E and F) to help users compare these ACRs. Although they are less sensitive to light compared to GtACR1 and GtACR2, they could still be activated by commercially available light sources if the expression levels are similar. Less sensitive ACRs may have less unwanted activation when used with other optogenetic tools.

      Reviewer #3 (Recommendations for the authors):

      Suggested Improvements to Experiments, Data, or Analyses:

      (1) Line 25: "significantly exceeding those by previously known tools" and Line 408: "NlCCR is the most blue-shifted among ancyromonad ChRs and generates larger photocurrents than the earlier known CCRs with a similar absorption maximum." As noted in the public review, this statement applies only to a very specific subgroup of ChRs with spectral maxima below 450 nm. If the goal was to claim that NlCCR is a superior tool among a broader range of blue-light-activated ChRs, direct comparisons with state-of-the-art ChRs such as ChR2 T159C (Berndt et al., 2011), CatCh (Kleinlogel et al., 2014), CoChR (Klapoetke et al., 2014), CoChR-3M (Ganjawala et al., 2019), or XXM 2.0 (Ding et al., 2022) would be beneficial. If the goal was to demonstrate superiority among tools with spectra below 450 nm, I suggest explicitly stating this in the paper.

      The Reviewer correctly inferred that we emphasized the superiority of NlCCR among tools with similar spectral maxima, not all blue-light-activated ChRs available for neuronal photoexcitation, most of which exhibit absorption maxima at longer wavelengths. To clarify this, we added “with similar spectral maxima” to the sentence in the original Line 25. The sentence in Line 408 already contains this clarification: “with a similar absorption maximum”.

      (2) Lines 111-113: "The absorption spectra of the purified proteins were slightly blue-shifted from the respective photocurrent action spectra (Figure 1D), likely due to the presence of non-electrogenic cis-retinal-bound forms." I would be skeptical of this statement. The spectral shifts in NlCCR and AnsACR are small and may fall within the range of experimental error. The shift in FtACR is more apparent; however, if two forms coexist in purified protein, this should be reflected as two Gaussian peaks in the absorption spectrum (or at least as a broader total peak reflecting two states with close maxima and similar populations). On the contrary, the action spectrum appears to have two peaks, one potentially below 465 nm. Generally, neither spectrum appears significantly broader than a typical microbial rhodopsin spectrum. This question could be clarified by quantifying the widths of the absorption and action spectra or by overlaying them on the same axis. In my opinion, the two spectra seem very similar, and just the appearance of the "bump" in the action spectrum shifts the apparent maximum of the action spectrum to the red. If there were two states, then they should both be electrogenic, and the slight difference in spectra might be explained by something else (e.g. by a slight difference in the quantum yields of the two states).

      As the Reviewer suggested, in the revision we added a new figure (Figure 1 – figure supplement 2), showing the overlay of the absorption and action spectra of each ancyromonad ChR. This figure shows that the absorption spectra are wider than the action spectra (especially in AnsACR and FtACR), which confirms our interpretation (contribution of the non-electrogenic blue-shifted cis-retinal-bound forms to the absorption spectrum). Note that the presence of such forms explaining a blue shift of the absorption spectrum has been experimentally verified in HcKCR1 (doi: 10.1016/j.cell.2023.08.009; 10.1038/s41467-025-56491-9). Therefore, we revised the text as follows:

      “The absorption spectra of the purified proteins (Figure 1C) were slightly blue-shifted from the respective photocurrent action spectra (Figure 1 – figure supplement 3), likely due to the presence of non-electrogenic cis-retinal-bound forms. The presence of such forms, explaining the discrepancy between the absorption and the action spectra, was verified by HPLC in KCRs (Tajima et al. 2023, Morizumi et al., 2025).”

      (3) Lines 135-136: "The SyncroPatch enables unbiased estimation of the photocurrent amplitude because the cells are drawn into the wells without considering their tag fluorescence." While SyncroPatch does allow unbiased selection of patched cells, it does not account for the fraction of transfected cells. Without a method to exclude non-transfected cells, which are always present in transient transfections, the comparison of photocurrents may be affected by the proportion of untransfected cells, which could vary between constructs. To clarify whether the statistically significant difference in the Kolmogorov-Smirnov test could indicate that the fraction of transfected cells after 48-72h differs between constructs, I suggest analyzing only transfected cells or reporting fractions of transfected cells by each construct.

      The Reviewer correctly states that non-transfected cells are always present in transiently transfected cell populations. However, his/her suggestion to “exclude non-transfected cells” is not feasible in the absence of a criterion for such exclusion. As it is evident from our data, transient transfection results in a continuum of the amplitude values, and it is not possible to distinguish a small photocurrent from no photocurrent, considering the noise level. We would like, however, to emphasize that not excluding any cells provides an estimate of the overall potency of each ChR variant, which depends on both the fraction of transfected cells and their photocurrents. This approach mimics the conditions of in vivo experiments, when non-expressing cells also cannot be excluded.

      (4) Line 176: "AnsACR and FtACR photocurrents exhibited biphasic rise." The fastest characteristic time is very close to the typical resolution of a patch-clamp experiment (RC = 50 μs for a 10 pF cell with a 5 MΩ series resistance). Thus, I am skeptical that the faster time constant of the biphasic opening represents a protein-specific characteristic time. It may not be fully resolved by patch-clamp and could simply result from low-pass filtering of a specific cell. I suggest clarifying this for the reader.

      The Reviewer is right that the patch clamp setup acts as a lowpass filter. Earlier, we directly measured its time resolution (~15 μs) by recording the ultrafast (occurring on the ps time scale) charge movements related to the trans-cis isomerization (doi: 10.1111/php.12558). However, the lowpass filter of the setup can only slow the entire signal, but cannot lead to the appearance of a separate kinetic component (i.e. a monophasic process cannot become biphasic). Therefore, we believe that the biphasic photocurrent rise reflects biphasic channel opening rather than a measurement artifact. Two phases in the channel opening have also been detected in GtACR1 (doi: 10.1073/pnas.1513602112) and CrChR2 (doi: 10.1073/pnas.1818707116).
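
      As a quick check of the numbers in this exchange, the reviewer's estimate follows from τ = R·C. The short Python sketch below uses only the values quoted above (5 MΩ, 10 pF, and the ~15 μs measured resolution). The function names are illustrative, and two things are assumptions made for illustration rather than statements from the response: that the setup behaves as an idealized single-pole filter, and that the ~15 μs resolution can be treated as an equivalent time constant.

```python
import math


def rc_time_constant_s(resistance_ohm, capacitance_f):
    """Time constant tau = R * C of a single-pole RC low-pass filter, in seconds."""
    return resistance_ohm * capacitance_f


def corner_frequency_hz(tau_s):
    """-3 dB corner frequency of a single-pole filter with time constant tau."""
    return 1.0 / (2.0 * math.pi * tau_s)


# Values quoted by the reviewer: 5 MOhm series resistance, 10 pF cell capacitance.
tau_reviewer = rc_time_constant_s(5e6, 10e-12)

# Time resolution reported by the authors (~15 us), treated here as an
# equivalent time constant purely for illustration (an assumption).
tau_measured = 15e-6

for label, tau in (("reviewer estimate", tau_reviewer), ("measured", tau_measured)):
    print(f"{label}: tau = {tau * 1e6:.0f} us, "
          f"f_c = {corner_frequency_hz(tau) / 1e3:.1f} kHz")
```

      The first line of output reproduces the reviewer's 50 μs figure. The comparison shows only that the two quoted time scales differ by a factor of about three; it does not by itself settle whether the fast component is resolved.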

      (5) Line 516: "The forward LED current was 900 mA." It would be more informative to report the light intensity rather than the forward current, as many readers may not be familiar with the specific light output of the used LED modules at this forward current.

      We have added the light intensity value in the revision:

      “The forward LED current was 900 mA (which corresponded to the irradiance of ~2 mW mm<sup>-2</sup>)…”

      (6) Lines 402-403: "The NlCCR ... contains a neutral residue in the counterion position (Asp85 in BR), which is typical of all ACRs. Yet, NlCCR does not conduct anions, instead showing permeability to Na+." This is not atypical for CCRs and has been demonstrated in previous works of the authors (CtCCR in Govorunova et al. 2021, ChvCCR1 in Govorunova et al. 2022). What is unique is the absence of negatively charged residues in TM2, as noted later in the current study. However, the absence of negatively charged residues in TM2 appears to be rare for ACRs as well. Not as a strong point of criticism, but to enhance clarity, I suggest analyzing the frequency of carboxylate residues in TM2 of ACRs to determine whether the unique finding is relevant to ion selectivity or to another property.

      The Reviewer is correct that some CCRs lack a carboxylate residue in the D85 position, so this feature alone cannot be considered as a differentiating criterion. However, the complete absence of glutamates in TM2 is not rare in ACRs and is found, for example, in HfACR1 and CarACR2. We have discussed this issue in our earlier review (doi: 10.3389/fncel.2021.800313) and do not think that repeating this discussion in this manuscript is appropriate.

      Recommendations for Writing and Presentation:

      (1) Some figures contain incomplete or missing labels:

      Figure 2: Panels D to I lack labels.

      In the revision, we have expanded the legend of Figure 2 to explain all individual panels.

      Figure 3 - Figure Supplement 1: Missing explanations for each panel.

      In the revision, we changed the order of panels and explained all individual panels in the legend.

      Figure 5 - Figure Supplement 1: Missing explanations for each panel.

      No further explanation for individual panels in this Figure is needed because all panels show the action spectra of various mutants, the names of which are provided in the panels themselves. Repeating this information in the figure legend would be redundant.

      (2) In Figure 2, "sem" is written in lowercase, whereas "SEM" is capitalized in other figures. Standardizing the format would improve consistency.

      In the revision, we capitalized the SEM abbreviation in all instances.

      (3) Line 20: "spectrally separated molecules must be found in nature." There is no proof that they cannot be developed synthetically; rather, it is just difficult. I suggest softening this statement, as the findings of this study, together with others, will probably allow designing molecules with specified spectral properties in the future.

      In the revision, we changed the cited sentence to the following:

      “Multiplex optogenetic applications require spectrally separated molecules, which are difficult to engineer without disrupting channel function”.

      (4) Line 216-219: "Acidification increased the amplitude of the fast current ~10-fold (Figure 4F) and shifted its Vr ~100 mV (Figure 3 - figure supplement 1D), as expected of passive proton transport. The number of charges transferred during the fast peak current was >2,000 times smaller than during the channel opening, from which we concluded that the fast current reflects the movement of the RSB proton." The claim about passive transport of the RSB proton should be clarified, as typically, passive transport is not limited to exactly one proton per photocycle, and the authors observe the increase in the fast photocurrents upon acidification.

      We thank the Reviewer for pointing out the confusing character of our description. To clarify the matter, we added a new photocurrent trace to Figure 4I in the revision recorded from AnsACR_G86E at 0 mV and pH 7.4. We have rewritten the corresponding section of Results as follows:

      “Its rise and decay τ corresponded to the rise and decay τ of the fast positive current recorded from AnsACR_G86E at 0 mV and neutral pH, superimposed on the fast negative current reflecting the chromophore isomerization (Figure 4I, upper black trace). We interpret this positive current as an intramolecular proton transfer to the mutagenetically introduced primary acceptor (Glu86), which was suppressed by negative voltage (Figure 4I, lower black trace). Acidification increased the amplitude of the fast negative current ~10-fold (Figure 4I, black arrow) and shifted its V<sub>r</sub> ~100 mV to more depolarized values (Figure 4 – figure supplement 2A). This can be explained by passive inward movement of the RSB proton along the large electrochemical gradient.”

      Minor Corrections:

      (1) Line 204: Missing bracket in "phases in the WT (Figure 4D."

      The quoted sentence was deleted during the revision.

      (2) Line 288: Typo-"This Ala is conserved" should probably be "This Met is conserved."

      We mean here the Ala four residues downstream from the first Ala. To avoid confusion, we changed the cited sentence to the following:

      “The Ala corresponding to BR’s Gly122 is also found in AnsACR and NlCCR (Figure 5A)…”

      (3) Lines 702-704: Missing Addgene plasmid IDs in "(plasmids #XXX and #YYY, respectively)."

      In the revision, we added the missing plasmid IDs.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      There is growing appreciation for the importance of luminal (apical) ECM in tube development, but such matrices are much less well understood than basal ECMs. Here the authors provide insights into the aECM that shapes the Drosophila salivary gland (SG) tube and the importance of PAPSS-dependent sulfation in its organization and function.

      The first part of the paper focuses on careful phenotypic characterization of papss mutants, using multiple markers and TEM. This revealed reduced markers of sulfation and defects in both apical and basal ECM organization, Golgi (but not ER) morphology, number and localization of other endosomal compartments, plus increased cell death. The authors focus on the fact that papss mutants have an irregular SG lumen diameter, with both narrowed regions and bulged regions. They address the pleiotropy, showing that preventing the cell death and resultant gaps in the tube did not rescue the SG luminal shape defects and discussing similarities and differences between the papss mutant phenotype and those caused by more general trafficking defects. The analysis uses a papss nonsense mutant from an EMS screen - I appreciate the rigorous approach the authors took to analyze transheterozygotes (as well as homozygotes) plus rescued animals in order to rule out effects of linked mutations. Importantly, the rescue experiments also demonstrated that sulfation enzymatic activity is important.

      The 2nd part of the paper focuses on the SG aECM, showing that Dpy and Pio ZP protein fusions localize abnormally in papss mutants and that these ZP mutants (and Np protease mutants) have similar SG lumen shaping defects to the papss mutants. A key conclusion is that SG lumen defects correlate with loss of a Pio+Dpy-dependent filamentous structure in the lumen. These data suggest that ZP protein misregulation could explain this part of the papss phenotype.

      Overall, the text is very well written and clear. Figures are clearly labeled. The methods involve rigorous genetic approaches, microscopy, and quantifications/statistics and are documented appropriately. The findings are convincing.

      Significance:

      This study will be of interest to researchers studying developmental morphogenesis in general and specifically tube biology or the aECM. It should be particularly of interest to those studying sulfation or ZP proteins (which are broadly present in aECMs across organisms, including humans).

      This study adds to the literature demonstrating the importance of luminal matrix in shaping tubular organs and greatly advances understanding of the luminal matrix in the Drosophila salivary gland, an important model of tubular organ development and one that has key matrix differences (such as no chitin) compared to other highly studied Drosophila tubes like the trachea.

      The detailed description of the defects resulting from papss loss suggests that there are multiple different sulfated targets, with a subset specifically relevant to aECM biology. A limitation is that specific sulfated substrates are not identified here (e.g. are these the ZP proteins themselves or other matrix glycoproteins or lipids?); therefore, it's not clear how direct or indirect the effects of papss are on ZP proteins. However, this is clearly a direction for future work and does not detract from the excellent beginning made here.

      Comments on revised version:

      Overall, I am pleased with the authors' revisions in response to my original comments and those of the other reviewers.

      Reviewer #2 (Public review):

      Summary

      This study provides new insights into organ morphogenesis using the Drosophila salivary gland (SG) as a model. The authors identify a requirement for sulfation in regulating lumen expansion, which correlates with several effects at the cellular level, including regulation of intracellular trafficking and the organization of Golgi, the aECM and the apical membrane. In addition, the authors show that the ZP proteins Dumpy (Dpy) and Pio form an aECM regulating lumen expansion. Previous reports already pointed to a role for Papss in sulfation in SG and the presence of Dpy and Pio in the SG. Now this work extends these previous analyses and provides more detailed descriptions that may be relevant to the fields of morphogenesis and cell biology (with particular focus on ECM research and tubulogenesis). This study nicely presents valuable information regarding the requirements of sulfation and the aECM in SG development.

      Strengths

      -The results supporting a role for sulfation in SG development are strong. In addition, the results supporting the involvement of Dpy and Pio in the aECM of the SG, their role in lumen expansion, and their interactions, are also strong.

      -The authors have done an excellent job in revising and clarifying the many different issues raised by the reviewers, particularly with the addition of new experiments and quantifications. I consider that the manuscript has improved considerably.

      -The authors generated a catalytically inactive Papss enzyme, which is not able to rescue the defects in Papss mutants, in contrast to wild type Papss. This result clearly indicates that the sulfation activity of Papss is required for SG development.

      Weaknesses

      -The main concern is the lack of a clear connection between sulfation and the phenotypes observed at the cellular level, and, importantly, the lack of connection between sulfation and the Pio-Dpy matrix. Indeed, the mechanism/s by which sulfation affects lumen expansion are not elucidated and no targets of this modification are identified or investigated. A direct (or instructive) role for sulfation in aECM organization is not clearly supported by the results, and the connection between sulfation and Pio/Dpy roles seems correlative rather than causative. As presented, the mechanisms by which sulfation regulates SG lumen expansion remain elusive in this study.

      -In my opinion the authors overestimate their findings with several conclusions, as exemplified in the abstract:

      "In the absence of Papss, Pio is gradually lost in the aECM, while the Dpy-positive aECM structure is condensed and dissociates from the apical membrane, leading to a thin lumen. Mutations in dpy or pio, or in Notopleural, which encodes a matriptase that cleaves Pio to form the luminal Pio pool, result in a SG lumen with alternating bulges and constrictions, with the loss of pio leading to the loss of Dpy in the lumen. Our findings underscore the essential role of sulfation in organizing the aECM during tubular organ formation and highlight the mechanical support provided by ZP domain proteins in maintaining luminal diameter."

      The findings supporting the conclusions that sulfation organizes the aECM and that the absence of Papss leads to a thin lumen due to defects in Dpy/Pio are not strong. The authors certainly show that Papss is required for proper Pio and Dpy accumulation. They also show that Pio is required for Dpy accumulation, and that Pio and Dpy form an aECM required for lumen expansion. However, the absence of Pio and Dpy does not fully recapitulate Papss mutant defects (thin lumen). I wonder whether other hypotheses and models could account for the observed results. For instance, Papss could affect secretion, in which case sulfation would have an indirect role in aECM organization. This study does not address the mechanical properties of Dpy in normal and mutant salivary glands.

      -Minor issues relate to the genotype/phenotype analysis. It is surprising that the authors detect only mild effects on sulfation in Papss mutants using an anti-sulfoTyr antibody, as Papss is the only PAPS synthetase. Generating germ line clones (which is a feasible experiment) would have helped to prove that this minor effect is due to the contribution of maternal product. The loss of function allele used in this study seems problematic, as it produces effects in heterozygous conditions that are difficult to interpret. Cleaning the chromosome or using an alternative loss of function condition (another allele, RNAi, etc...) would have helped to present a more reliable explanation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Overall, I am pleased with the authors' revisions in response to my original comments and those of the other reviewers. The addition of the sulfation(-) mutant to Fig. 1 is particularly nice. I have just a few additional suggestions for text changes to improve clarity/precision.

      (1) The current title of this manuscript is quite broad, making it sound like a review article. I recommend adding sulfation and salivary gland to the title to convey the main points more clearly. e.g. Sulfation affects apical extracellular matrix organization during development of the Drosophila salivary gland tube.

      Thank you for the suggestion. We agree and have changed the title of the paper as suggested.

      (2) Figure 1B shows very striking enrichment of papss expression in the salivary gland compared to other tubes like the trachea that also contain Pio and Dpy. To me, this implies that the key substrate(s) of Papss are likely to be unique, or at least more highly enriched, in the salivary gland aECM compared to the tracheal aECM (e.g. probably not Pio or Dpy themselves). I suggest that the authors address the implications of this apparent SG specificity in the discussion (paragraph beginning on p. 21, line 559).

      Yes, we agree that there may be other key substrates of Papss in the SG, such as mucins, which play an important role in organizing the aECM and expanding the lumen. We have included a discussion.

      (3) p. 15, lines 374-376 "The Pio protein is known to be cleaved, at one cleavage site after the ZP domain by the furin protease and at another cleavage site within the ZP domain by the matriptase Notopleural (Np) (Drees et al., 2019; Drees et al., 2023; Figure 5B)." As far as I can see, the Drees papers show that Pio is cleaved somewhere in the vicinity of a consensus furin cleavage site, but do not actually establish that the cleavage happens at this exact site or is done by a furin protease (this is just an assumption). Please word more carefully, e.g. "at one cleavage site after the ZP domain, possibly by a furin protease".

      Thank you for pointing this out. We have edited the text.

      Reviewer #2 (Recommendations for the authors):

      Throughout the paper, I find the description of the lumen phenotypes and their interpretation a bit confusing.

      Papss mutants produce SG that are either "thin" or show "irregular lumen with bulges". Do the authors think that these are two different manifestations of the same effect? or do they think that there are different causes behind?

      The thin lumen phenotype appears to occur when the Pio-Dpy matrix is significantly condensed. When this matrix is less condensed in one region of the lumen than in other regions, the lumen appears irregular with bulges.

      Are the defects in Grasp65 mutants categorized as "irregular lumen with bulges" similar to those in Papss mutants? Why don't these mutants show a "thin lumen" defect?

      Grasp65 mutant phenotypes are milder than those of Papss mutants. Multiple mutations in several Golgi components that more significantly disrupt Golgi structures and function may cause more severe defects in lumen expansion and shape.

      How do the defects described for Pio ("multiple constrictions with a slight expansion between constrictions") and Dpy mutants ("lumen with multiple bulges and constrictions") relate to the "irregular lumen with bulges" in Papss mutants?

      pio and dpy mutants show more stereotypical phenotypes, while Papss mutants exhibit more irregular and random phenotypes. The irregular lumen phenotypes in Papss mutants are associated with a condensed Pio-Dpy matrix.

  4. Aug 2025
    1. L ‘„»I2'8

      If we can assume by the sign-off that he began his book in 1938, and then published in 1944, that would place us in Switzerland during Nazi Germany and the beginning of WWII. I wonder if any of his writings in this book were influenced by current events and if he considered war strategy as a form of play. It is easy for us to think of war as play, but for those who lived through it, it may have seemed like an outrageous statement.

    1. It's not: Can schools save more of our students? Because I think we have the answer to that -- and it's yes they can, if we save our schools first. We can start by caring about the education of other people's children ...

      Tying the amount of money we have lost as a nation to the lack of attention paid to the education system was an interesting point. The financial loss could sway people who previously did not care about other people's children (and their education). Due to the current state of the country, it may be difficult to get people to "start caring about other people's children" in terms of improving the condition of our current educational system, but the financial implications and lost earning potential could sway stakeholders to invest in educational reform.

    1. This is because our expectations are often based on previous experience and patterns we have observed and internalized, which allows our brains to go on “autopilot” sometimes and fill in things that are missing or overlook extra things.

      This sentence is very relatable. It highlights how our brains rely on past experiences and familiar patterns to make sense of what’s around us, sometimes without us even realizing it. The idea of going on “autopilot,” as stated in the text, is something I experience often. For example, there are times when I’m sitting in my living room and I think I see someone walking past my big front window. But when I actually look outside, there’s nobody there. This has happened multiple times, and I’ve always wondered why. Now, I think it’s because the walkway to the front door is right outside that window, so my brain may be expecting someone to come up to the door.

    1. We anticipate that layers that account for this depth order, e.g. through convolutions or possibly self-attention (as used in spatio-temporal graphs (e.g. Guo et al. 2019, Su et al. 2020)), will often be complementary to other layers acting on the topology (encoded in the phylogenetic graph), e.g. through graph convolutions.

      Related to the pooling operator, I think large gains may come from the use of 1) edge weights in your GCN layers so that not all neighbors are treated equally by the message passing mechanism, and 2) alternative MPNN layer types, including use of the graph attention mechanism (i.e. GAT) or graph transformers, which use the attention mechanism to learn which neighbors are more "important." I suspect that even with simple mean-pooling, these alternative layer types will be much more performant and generalizable (e.g. from CRBD to BiSSE). In effect, the GCN layers (particularly without edge weights) are more akin to the CRBD in that they assume a uniform, homogeneous contribution by all neighbors to feature updates.
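      To make the contrast concrete, here is a minimal sketch of the three neighbor-aggregation schemes mentioned above: a uniform mean, an edge-weighted mean, and attention-style weights. It is a toy illustration in plain Python, not the annotated paper's model. The learned linear maps and nonlinearities of real GCN/GAT layers are omitted, and the feature vectors, edge weights, and attention scores are made-up values (in a real attention layer the scores would be learned).

```python
import math


def mean_aggregate(features, neighbors):
    """Unweighted mean of neighbor feature vectors: every neighbor counts equally."""
    dim = len(features[neighbors[0]])
    return [sum(features[n][k] for n in neighbors) / len(neighbors) for k in range(dim)]


def weighted_aggregate(features, weights):
    """Weighted mean; `weights` maps neighbor -> non-negative weight (not all zero)."""
    total = sum(weights.values())
    dim = len(next(iter(features.values())))
    return [sum(w * features[n][k] for n, w in weights.items()) / total for k in range(dim)]


def softmax_weights(scores):
    """Turn per-neighbor scores into attention-style weights that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {n: math.exp(s - top) for n, s in scores.items()}
    norm = sum(exps.values())
    return {n: e / norm for n, e in exps.items()}


# Hypothetical 2-dimensional features for three neighbors of one node.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}

uniform = mean_aggregate(features, ["a", "b", "c"])
# Hypothetical edge weights (e.g. derived from branch lengths): "c" counts double.
edge_weighted = weighted_aggregate(features, {"a": 1.0, "b": 1.0, "c": 2.0})
# Hypothetical attention scores; a higher score gives a larger share of the update.
attention = softmax_weights({"a": 0.0, "b": 0.0, "c": 1.0})
attended = weighted_aggregate(features, attention)

print(uniform, edge_weighted, attended)
```

      With all weights equal, `weighted_aggregate` reduces to `mean_aggregate`, which is the sense in which an unweighted layer assumes homogeneous neighbor contributions.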

    1. And some have suggested we may have been thinking about agriculture wrong. It now seems likely that agriculture began in a very gradual process that goes back much farther than we had imagined.

      I find it interesting how our understanding of the agricultural revolution has changed over the years. We as humans tend to think about history, and really a lot of things, in a chronological order. We’ve learned over the years that this isn’t always the case, especially in our understanding of pre-written eras.

    1. Author response:

      Reviewer 1:

      (1) Line 65 "(Figure 1A). Inactivation causes a change in the leg's rest position; however, in preliminary experiments, the body rotation did not have a large effect on the rest positions of the leg following inactivation. This result is consistent with the one already reported for stick insects and shows that passive forces within the leg are much larger than the gravitational force on a leg and dominate limb position [1]." This is the direct replication of the previous work by Hooper et al 2009 and therefore authors should ideally show the data for this condition (no weight attached).

      We did not present these data (the effect of inactivation on the rest position of an unweighted leg) because this effect was already reported in stick insects. However, we understand the reviewer’s point that it is important to present the data showing this replication. We will include these data in the revised version.

      (2) The authors use vglut-gal4, a very broad driver for inactivating motor neurons. The driver labels all glutamatergic neurons, including brain descending neurons and nerve cord interneurons, in addition to motor neurons. Additionally, the strength of inactivation might differ in different neurons (including motor neurons) depending on the expression levels of the opsins. As a result, in this condition, the authors might not be removing all active forces. This is a major caveat that authors do not address. They explore that they are not potentially silencing all inputs to muscles by using an additional octopaminergic driver, but this doesn't address the points mentioned above. At the very least, the authors should try using other motor neuron drivers, as well as other neuronal silencers. This driver is so broad that authors couldn't even use it for physiology experiments. Additionally, the authors could silence VGlut-labeled motor neurons and record muscle activity (potentially using GCaMP as has been done in several recent papers cited by the authors, Azevedo et al, 2020) as a much more direct readout.

      This critique concerns the use of vglut-gal4, a broad driver, to inactivate motor neurons (MNs). The reviewer argues that the use of a broad driver might result in some effects that are not due to MN inactivation. Conversely, it is possible that not all MNs are inactivated. These critiques raise important points that we will address in the revision by 1) performing experiments with other MN drivers as suggested by the reviewer, and 2) performing experiments in flies that are inactivated by freezing. These measurements will provide other estimates of passive forces, allowing us to better triangulate the range of values for the passive forces. Moreover, it appears that one of the reviewer’s main concerns is that the passive forces are overestimated because of residual active forces. We will discuss this possibility in detail. It is important to note that, in the end, what we hope to accomplish is to provide a useful estimate of the passive forces. It is unlikely that the passive force will be a precise number like a physical constant, as the passive forces likely depend on recent history.

      (3) Figure 4 uses an extremely simplified OpenSim model that makes several assumptions that are known to be false. For example, the Thorax-Coxa joint is assumed to be a ball and socket joint, which it is not. Tibia-tarsus joint is completely ignored and likely makes a major contribution in supporting overall posture, given the importance of the leg "claw" for adhering to substrates. Moreover, there are a couple of recent open-source neuromechanical models that include all these details (NeuromechFly by Lobato-Rios et al, 2022, Nat. Methods, and the fly body model by Vaxenburg et al, 2025, Nature). Leveraging these models to rule in or rule out contributions at other joints that are ignored in the authors' OpenSim model would be very helpful to make their case.

      Our OpenSim model predates the newer mechanical model. In the revised manuscript, we will revisit the model in light of recent developments.

      (4) Figure 5 shows the experimental validation of Figure 4 simulations; however, it suffers from several caveats.

      a) The authors track a single point on the head of the fly to estimate the height of the fly. This has several issues. Firstly, it is not clear how accurate the tracking would be. Secondly, it is not clear how the fly actually "falls" on VGlut silencing; do all flies fall in a similar manner in every trial? Almost certainly, there will be some "pitch" and "roll" in the way the fly falls. These will affect the location of this single-tracked point in ways that do not reflect the authors' expectations. Unless the authors track multiple points on the fly and show examples of tracked videos, it is hard to believe this dataset and, hence, any of the resulting interpretations.

      b) As described in the previous point, the "reason" the fly falls on silencing all glutamatergic neurons could be due to silencing all sorts of premotor/interneurons in addition to the silencing of motor neurons.

      c) (line 175) "The first finding is that there was a large variation in the initial height of the fly (Figure 5C), consistent with a recent study of flies walking on a treadmill[20]." The cited paper refers to how height varies during "walking". However, in the current study, the authors are only looking at "standing" (i.e. non-walking) flies. So it is not the correct reference. In my opinion, this could simply reflect poor estimation of the fly's height based on poor tracking or other factors like pitch and roll.

      d) "The rate at which the fly fell to the ground was much smaller in the experimental flies than it was in the simulated flies (Figure 5E). The median rate of falling was 1.3 mm/s compared to 37 mm/s for the simulated flies (Figure 5F). (Line 190) The most likely reason for the longer than expected time for the fly to fall is delays associated with motor neuron inactivation and muscle inactivation." I don't believe this reasoning. There are so many caveats (which I described in the above points) in the model and the experiment, that any of those could be responsible for this massive difference between experiment and modeling. Simply not getting rid of all active forces (inadequate silencing) could be one obvious reason. Other reasons could be that the model is using underestimates of passive forces, as alluded to in point 3.

      (4a) Although we agree that measuring different points on the body would allow us to estimate the moments, we disagree that the height of the fly cannot be evaluated from the measurement of a single point. The measurements were performed using the same techniques that we used to assess the fly’s height in a different study, where we estimated the resolution of our imaging system to be ~20 μm (Chun et al. 2021). We will include these details in the revised manuscript. The videos showing the falling experiments are not currently available or referenced in the manuscript. These will be made available.

      b) We will repeat the “falling” experiment with a more restrictive driver.

      c) We disagree with the reviewer on this point. The system has a resolution of ~20 μm, which is sufficient to draw conclusions about differences in the height of the fly. We will clarify this point in the revised manuscript.

      d) We do not follow the reviewer’s rationale here. The passive forces (along with any residual forces) are the same in the model and in the experiment. Moreover, there will be a delay between light onset, neuronal inactivation, and muscle inactivation. These processes are not instantaneous. In Figure 6, we estimate these delays and conclude that they will substantially slow the fall. In the revised manuscript, we will discuss the other reasons for the delay suggested by the reviewer.

      (5) Final figure (Figure 6) focuses on understanding the time course of neuronal silencing. First of all, I'm not entirely sure how relevant this is for the story. It could be an interesting supplemental data. But it seems a bit tangential. Additionally, it also suffers from major caveats.

      a) The authors now use a new genetic driver for which they don't have any behavioral data in any previous figures. So we do not know if any of this data holds true for the previous experiments. The authors perform whole-cell recordings from random unidentified motor neurons labeled by E49-Gal4>GtACR1 to deduce a time constant for behavioral results obtained in the VGlut-Gal4>GtACR1 experiments.

      b) The DMD setup is useful for focal inactivation; however, the appropriate controls and data are not presented. Line 200 "A spot of light on the cell body produces as much of the hyperpolarization as stimulating the entire fly (mean of 11.3 mV vs 13.1 mV across 9 neurons). Conversely, excluding the cell body produces only a small effect on the MN (mean of 2.6 mV)." First of all, the control experiment for showing that DMD is indeed causing focal inactivation would be to gradually move the spot of light away from the labeled soma, i.e. to the neighboring "labelled" soma, and show that there is indeed focal inactivation. Instead, the authors move it quite a long distance into unlabeled neuropil. Secondly, I still don't get why the authors are doing this experiment. Even if we believe the DMD is functioning perfectly, all this really tells us is that a random subset of motor neurons (maybe 5 or 6 cells, legend is missing this info) labeled by E49-Gal4 is strongly hyperpolarized by its own GtACR1 channel opening, rather than being impacted because of hyperpolarizations in other E49-Gal4 labeled neurons. This has no relevance to the interpretation of any of the VGlut-Gal4 behavioral data. VGlut-Gal4 is much broader and also labels all glutamatergic neurons, most of which are inhibitory interneurons whose silencing could lead to disinhibition of downstream networks.

      (5a) We can address the reviewer's critique by recording from the Vglut line while using a MN line to target the recordings to MNs.

      b) Once we use the Vglut driver to perform these recordings, it will help assess how much of the MN inactivation is due to the GtACR expressed in the MN versus other neurons.

      Reviewer 2:

      While (as mentioned above) the study's conclusions are well-supported by the results and modeling, limitations arise because of the assumptions made. For instance, using a linear approximation may not hold at larger joint angles, and future studies would benefit from accounting for nonlinearities. Future studies could also delve into the source of passive forces, which is important for more deeply understanding the anatomical and physical basis of the results in this study. For instance, assessments of muscle or joint properties to correlate stiffness values with physical structure might be an area of future consideration.

      We agree with these comments but believe that these studies represent avenues for future work.

      Reviewer 3:

      (1) Passive torques are measured, but only some short speculative statements, largely based on previous work, are offered on their functional significance; some of these claims are not well supported by experimental evidence or theoretical arguments. Passive forces are judged as "large" compared to the weight force of the limb, but the arguably more relevant force is the force limb muscles can generate, which, even in equilibrium conditions, is already about two orders of magnitude larger. The conclusion that passive forces are dynamically irrelevant seems natural, but contrasts with the assertion that "passive forces [...] will have a strong influence on limb kinematics". As a result, the functional significance of passive joint torques in the fruit fly, if any, remains unclear, and this ambiguity represents a missed opportunity. We now know the magnitude of passive joint torques - do they matter and for what? Are they helpful, for example, to maintain robust neuronal control, or a mechanical constraint that negatively impacts performance, e.g., because they present a sink for muscle work?

      To us, measuring passive forces was the first step towards understanding the neural and biomechanical control of the limb. In general, we agree with these comments and would like to understand the role of passive forces in the overall control of the limb. A complete discussion of the functional significance of passive forces in limb control is beyond the scope of this study. We would like to note that it is unlikely that the active forces are two orders of magnitude larger during unloaded movement of the limb. However, these issues will have to be settled in future work.

      (2) The work is framed with a scaling argument, but the assumptions that underpin the associated claims are not explicit and can thus not be evaluated. This is problematic because at least some arguments appear to contradict textbook scaling theory or everyday experience. For example, active forces are assumed to scale with limb volume, when every textbook would have them scale with area instead; and the asserted scaling of passive forces involves some hidden assumptions that demand more explicit discussion to alert the reader to associated limitations. Passive forces are said to be important only in small animals, but a quick self-experiment confirms that they are sufficient to stabilize human fingers or ankles against gravity, systems orders of magnitude larger than an insect limb, in seeming contradiction with the alleged dominance of scale. Throughout the manuscript, there are such and similar inaccuracies or ambiguities in the mechanical framing and interpretation, making it hard to fairly evaluate some claims, and rendering others likely incorrect.

      We interpret this comment as making two separate points. The first is that our statement that active forces depend on the third power of limb length (L<sup>3</sup>) is incorrect. We agree and apologize for this oversight. Specifically, on L6-7 we say, “both inertial forces and active forces scale with the mass of the limb which in turn scales with the volume of the limb and therefore depends on the third power of limb length (L<sup>3</sup>)”. Instead, this statement should read “inertial forces scale with the mass of the limb which in turn scales with the volume of the limb and therefore depends on the third power of limb length (L<sup>3</sup>)”. However, this oversight does not affect the scaling argument, as the scaling arguments in the rest of the manuscript involve only inertial forces and not active forces.

      The second point is about the scaling law that governs passive forces. In the current manuscript, we have assumed that the passive forces scale as L<sup>2</sup> based on previous work. The reviewer has pointed out that this assumption might be incorrect or at the very least needs a rationale. We agree with this assessment: passive forces that arise in the muscle are likely to scale as L<sup>2</sup> but passive forces that arise in the joint might not. In the revised manuscript, we will discuss this concern.
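
      As a rough sketch of the scaling argument discussed above, and assuming only what is stated in the text (inertial forces scale with limb volume, and muscle-derived passive forces scale as L<sup>2</sup>), the two relations can be written side by side. Proportionality constants are omitted, and joint-derived passive forces may not follow the same law:

      ```latex
      \[
      F_{\text{inertial}} \propto m \propto L^{3}, \qquad
      F_{\text{passive}} \propto L^{2}
      \quad\Longrightarrow\quad
      \frac{F_{\text{passive}}}{F_{\text{inertial}}} \propto \frac{L^{2}}{L^{3}} = L^{-1}.
      \]
      ```

      Under these assumptions, the contribution of passive forces relative to inertial forces grows as limb length decreases; the conclusion depends on the L<sup>2</sup> assumption holding for the dominant source of passive force.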

      Response to the public comment:

      There was a comment from a reader: “None of our work cited in various places in this preprint (i.e., Zakotnik et al. 2006, Guschlbauer et al. 2007, Page et al. 2008, Hooper et al. 2009, Hooper 2012, Ache and Matheson 2012, Blümel et al. 2012, Ache and Matheson 2013, von Twickel et al. 2019, and Guschlbauer et al. 2022) claims or implies that passive forces could be sufficient to support the weight of an insect or any animal. To claim or suggest otherwise (as done in lines 33-35) is incorrect and sets up a misleading straw man that misrepresents our work. All statements in the preprint regarding our work related to this specific matter need to be removed or edited accordingly. For instance, the investigations, calculations, and interpretations in Hooper et al. 2009 are solely about limbs that are not being used in stance or other loaded tasks (indeed, the article's title specifically refers to "unloaded" leg posture and movements). Trying to use this work to predict whether passive muscle forces alone can support a stick insect against gravity requires considering much more than the oversimplified calculation given in lines 290-292. Other “back of the envelope calculations” (lines 299-300) are likely also insufficient and erroneous. The discussion in lines 289-304 needs to be edited accordingly”

      We thank the reader for their comment. However, we interpret these studies differently. The studies above rightly focused on unloaded legs because it would be difficult to study passive forces in an intact insect without genetic tools. The commenter correctly points out that these studies do not comment on whether passive forces are strong enough to support the weight of the fly. However, we disagree that our arguments based on their results are unreasonable or a straw man. We think that our interpretation of their measurements is correct. Moreover, we were motivated by Yox et al. 1982, who state in so many words: “Stiffness of the muscles in the joints of all the legs might be sufficient to support a resting arthropod. A more rigorous analysis of all supporting limbs and joint angles would be required to prove this hypothesis”. We were inspired by this comment. In the revised manuscript, we will make it clear that the statement made in Line 33 is based on Yox et al. and our interpretation of measurements made by others.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      GENERAL COMMENTS

      We thank the three reviewers for their comments on the paper.

      We are pleased to see that they consider it to be a comprehensive and well-executed study, which clearly establishes a previously overlooked connection between MRTF-SRF signalling and proliferation, and that its conclusions require no further experimentation.

      As review 3 points out, this work has implications for cancer biology, and suggests new research routes to understand the relation between cell adhesion, proliferation, and transformation.

      However, two referees raise significant concerns about its impact.

      Review 1 suggests that the paper lacks impact without exploration of the wider biological significance of our observations, although it considers it to be a good basic cell biology study. It suggests further work extending the findings to tissue- or tumor-based systems. While we consider such studies worthwhile – indeed we are currently pursuing these directions – we consider them beyond the scope of the present paper.

      Review 2 questions the novelty of our findings. We strongly disagree. This is the first study to show that MRTF-SRF signalling is required for the proliferation of both primary and immortalised fibroblasts, and epithelial cells. We show that MRTF inactivation leads cells to enter a quiescence-like state under conditions that would permit efficient cell cycle progression in wildtype cells. The study will alter the field's perspective on the role of MRTF-SRF signalling, previously viewed as concerned with cell adhesion, morphology, and motility.

      Responses to individual reviews (italic) follow in regular text.

      RESPONSE TO INDIVIDUAL REVIEWS (comments in italic, response in regular, changes made)

      Reviewer #1

      *(Evidence, reproducibility and clarity (Required)): *

      *The manuscript by Nielsen et al. presents a thorough and well-structured study showing that Myocardin-related transcription factors (MRTF-A/B), via MRTF-SRF, are essential for the proliferation of both primary and immortalized fibroblasts and epithelial cells. Using a combination of knockouts/rescue experiments, cytoskeletal analysis, and transcriptomics, the authors demonstrate that MRTF-SRF signalling controls actin dynamics and contractility, key drivers of cell cycle progression. Notably, they show that the proliferative arrest caused by MRTF loss is reversible, distinguishing it from classical senescence.*

      *Major points*

      • *The link between MRTF-SRF activity, cytoskeletal organisation, and cell proliferation is clearly established. The fact that disrupting contractility phenocopies MRTF loss strengthens the case that the pathway acts through mechanical control.*
      • *The authors support their conclusions using multiple cell types (MEFs, primary fibroblasts, epithelial cells), a range of complementary assays (RNA-seq, traction force microscopy, adhesion/spreading), and genetic tools (CRISPR, inducible rescue).*
      • *The ability to restore proliferation by re-expressing MRTF-A argues against true senescence and instead suggests a quiescence-like state driven by cytoskeletal disruption.*
      • *This work particularly highlights how mechanical inputs feed into transcriptional programs to regulate proliferation, with implications for understanding anchorage-dependent growth.*

      *Suggestions*

      • *While the authors argue convincingly against classical senescence, elevated SA-βGal and SASP expression suggest a more nuanced arrest state. It is not really clear what this state is or is not, therefore a deeper discussion of possible hybrid or intermediate states would be helpful - maybe potential additional experiments to include or exclude potential explanations - e.g. how does it differ from G0 exit?*

      Our findings show that MRTF inactivation inhibits cell proliferation under conditions that would permit efficient cell cycle progression in wildtype cells, inducing a state with some features associated with classical senescence, and others conventionally associated with reversible cell cycle arrest/quiescence. The reviewer correctly points out that this raises problems with accurately defining the nature of the MRTF-null proliferation defect.

      To our knowledge there are no rigorously defined unambiguous markers for senescence, quiescence, or G0. Indeed, recent studies have shown that senescence and quiescence / G0 states are not as distinct as previously assumed (Anwar et al, 2018; Ashraf et al 2023) as we reviewed in detail in Discussion p27, §2; p28 §3. We therefore do not consider it a productive endeavour to define markers for the MRTF-null state as opposed to defining its mechanistic basis. However, we agree that we should have been clearer about how the phenotypes we observe relate to classical cell arrest states.

      We have therefore revised the presentation of the Results to make it clear which features of the non-proliferative state associated with MRTF inactivation are seen in classical senescence, and which are found in reversible cell cycle exit or quiescence.

      Things done:

      • Results pp16-17 and Fig 1. Figure panels and presentation are reordered to present “senescence” features together before marker expression (panel G is now panel I). Text now explicitly points out that the spectrum of cell cycle markers, specifically p27 upregulation, is not that associated with classical senescence (p16, p21, etc.) but previously linked to reversible arrest or quiescence. Lines 371-380 have been moved up from the succeeding paragraph; statement added re p27 and reversible cell cycle exit on lines 387-389; summary sentence added in lines 398-401.
      • Statement added that reversibility distinguishes the MRTF defect from classical senescence p20§1 line 454-455.
      • Note that p27 is associated with reversible arrest included on p20§2 line 460. We also explicitly summarised the features of the phenotype at the start of the Discussion.

      • Sentences added p27§1 lines 626-631.

      • Emphasis that p27 protein upregulation is associated with reversible cell cycle inhibition and quiescence is added on p28 line 668-669.

      • *The transcriptomic data are strong, but the paper would benefit from zooming in on specific MRTF-SRF targets (e.g., actin isoforms, adhesion molecules) that directly link cytoskeletal regulation to cell cycle control.*

      We have now clarified presentation of the RNAseq data in Figure 5 and the data summary tables. Figure 5B now identifies which of those genes showing deficits in MRTF-null MEFs were previously identified as direct genomic targets for MRTF-SRF, and that the majority are cytoskeletal.

      • Additional columns added in Table 1 to indicate whether genes are candidate genomic MRTF-SRF targets; Table 2 now shows gene symbol lists as well as Ensembl IDs for GO categories and NCBI Entrez IDs for GSEA categories, respectively.
      • Figure 5B revised to point out cytoskeletal genes that are genomic MRTF-SRF targets in bold, legend clarified p40 lines 920-922.
      • Now noted p23 lines 527-529 that cytoskeletal genes affected include many direct MRTF-SRF targets.

      Our data confirm that in MEFs, MRTF inactivation affects fibroblast cell morphology, adhesion, spreading, motility and contractility (Figures 5, 6), as seen in many other settings.

      A critical question remains as to whether these effects reflect a limitation in one MRTF target gene or several, and how this defect relates to proliferation.

      Concerning specific MRTF-SRF gene targets:

      Cells lacking cytoplasmic actins are reported to exhibit defective proliferation (now noted in Results p23 lines 529-532). We are currently evaluating whether this defect has similarities with the MRTF-null proliferation phenotype (see Discussion p31, §2).

      Previous findings suggest that defective cytoplasmic actin expression may underlie most MRTF knockout phenotypes (Salvany et al, 2014; Maurice et al., 2024), as previously noted in the Discussion (see p31, §2).

      The myoferlin gene promotes growth of liver cancer cells by inhibiting ERK activation and oncogene-induced senescence. We showed that myoferlin expression does not promote proliferation of MRTF-null MEFs in the original submission (see Figure S5E). Additionally, we now point out that the RNAseq data show that myoferlin expression is not significantly affected in MRTF-null MEFs (new text p23, lines 532-534).

      • *It depends on what the target journal would be, but this is a very well-executed mechanistic study that doesn't really have an impact. Extending the discussion to human systems-or tissues where contractility is critical-could broaden the impact and applicability of the findings.*

      We interpret this comment as indicating that our paper does not address the wider biological implications of our findings by extension to studies in tissue or tumour systems.

      As outlined in our response to review 3, our study provides strong evidence that MRTF-SRF will be required for cell proliferation in settings where physical progression through cell cycle transitions requires high contractility, either owing to intrinsic factors or external physical constraints such as tissue stiffness, fibrosis, or tumour microenvironment.

      Discussion now explicitly addresses potential roles for tissue stiffness (pp30§2 lines 717-718, and p32§1 725-727). However, we feel that resolution of this question is beyond the scope of the present paper.

      • *As above, the paper briefly mentions transformation, but it would be valuable to elaborate on whether MRTF-SRF acts as a barrier or enabler in tumorigenesis under different conditions. This I feel is the main weakness remaining - e.g. it would be fine with enabling different effects driven by other transcription events in emerging tumour cells (oncogenic in context of RAS, suppressive in context of p53) but I think the manuscript fails to be definitive on these points. Addressing this would make a much stronger and impactful study. I believe they have an impact piece of science that outlines how mechanical events impact cell fate decisions, but this is unlikely to be the driver - ie it facilitates cell fate decisions in context of tissue stiffness.*

      We find it difficult to understand the precise points being made here.

      However, transformation has long been known to bypass physical constraints on proliferation such as the requirement for adhesion. Moreover, MRTF-SRF activity is not necessarily required for proliferation of all transformed cells (Hampl et al, 2013; Medjkane et al, 2009; our unpublished data). The relation of our findings to transformation is thus an open question, which we are actively pursuing. Now noted in revised Discussion p32, lines 752-755.

      MRTF-independent proliferation of tumor cells could reflect oncogenic signals substituting for MRTF-dependent ones (eg from focal adhesions), or relief of cytoskeletal constraints on proliferation (adhesion-independent proliferation). In contrast, proliferation of DLC1-deleted cancer cells is dependent on suppression of oncogene-induced senescence by MRTF-SRF signalling (Hampl et al, 2013). These points were already made in Discussion p28, pp30-31.

      Although our current work is focussed on cell transformation, we would respectfully suggest the in-depth resolution of this complex question is beyond the scope of the present paper.

      See also response to (3) above.

      *Reviewer #1 (Significance (Required)): *

      *Overall *

      This is a well-executed and insightful study that deepens our understanding of how cytoskeletal signals drive proliferation through MRTF-SRF. It broadens the role of this pathway beyond motility and offers new perspectives on mechanotransduction and cellular plasticity. It is weak in its demonstration of biological significance, but if the aim is to present a pure basic cell biology story it is good.

      The vast majority of work with the SRF system has led to the common perception that its role is exclusively with cell motility and adhesive processes, not proliferation. The results presented in the paper, even if limited to cell culture models, are therefore novel.

      Reviewer #2

      (Evidence, reproducibility and clarity (Required)):

      *In this manuscript, Nielsen and colleagues examine the impact of MRTF-A/B and SRF gene inactivation on cell proliferation. They performed an extensive body of work (using multiple cell types and multiple clones) to show that MRTF inactivation causes cell cycle arrest and senescence (mimicking the phenotype of SRF knockout cells) although the changes in the expression of various CDK inhibitors were cell-type specific. *

      *Very interestingly, simultaneous inactivation of all three major CDK inhibitors failed to rescue MRTF knockout cells from their proliferation defect. Expectedly, MRTF knockout cells exhibited defects in actin cytoskeleton, adhesion, and contractility. Interestingly, hyperactivating Rho also failed to rescue MRTF knockout cells from proliferation defect. The main conclusion of the paper was derived from experiments which showed that inhibition of either ROCK or myosin caused wild-type cells to behave like MRTF knockout cells rather than demonstration of any molecular perturbation that could reverse the proliferation defect of MRTF knockout cells. *

      While the experimental studies are thorough and rigorous, a vast majority of the core findings related to the loss-of-function of MRTF that are reported herein (i.e. defects in cell proliferation, elevation of CDK inhibitors, migration, actin cytoskeleton, contractility) are not conceptually new and have been previously reported in other cell systems by several investigators including this research group.

      This is the first study showing that MRTF-SRF signalling is required for the proliferation of both primary and immortalised fibroblasts, and epithelial cells. We show that the MRTF-SRF non-proliferative state combines features of both classical senescence and reversible cell cycle exit / quiescence.

      The vast majority of previous work with the SRF system has led to the common perception that its role is exclusively related to cell motility and adhesive processes and not proliferation (see Olson and Nordheim 2010). Where proliferation has been examined directly, both others' and our own previous studies of the MRTFs in immune cells and cancer cell lines have revealed no direct role in proliferation (Schratt et al, 2001; Medjkane et al 2009; Maurice et al, 2024).

      The results presented here are therefore novel.

      In the reviewer's opinion, since the authors have not been able to identify a molecular strategy to reverse the proliferation phenotype of MRTF knockout cells, the underlying mechanisms of MRTF-dependent regulation of cell proliferation remain largely unanswered.

      Indeed, our attempts to rescue the phenotype (knockouts of the CKIs, and overexpression of different downregulated factors) did not restore proliferation. We therefore now aim to attack the problem (i) through overexpression screens, and (ii) by identifying differences between MRTF-SRF dependent and -independent (eg transformed) cells. However, these are new projects that are beyond the scope of a revised paper.

      Other comments: Majority of the immunoblot data have not been quantified.

      P16 data in Fig 1G vs Fig S1A are not similar (although the authors mention that the findings are similar)

      We have addressed these issues by reorganising and quantifying the immunoblotting data as follows:

      • Figure S1A has been moved to new Figure 1I, replacing the limited analysis shown in old Figure 1G. This is more comprehensive, and displays data from all three WT and Mrtfab-/- pools.
      • Figure 1I data is quantified. Marker expression in each Mrtfab-/- pool is evaluated relative to its mean expression in the three WT pools treated in parallel.
      • A new Figure S1A shows mean marker expression across the three Mrtfab-/- pools, drawn from 5 independent analyses (not all markers included in each analysis). Different analyses of marker expression may exhibit variation, resulting from differences in handling, culture medium, plating density, relative confluence, etc. However, Mrtfab-/- cells exhibit markedly increased p27 and TLR2 expression, while expression of the other markers tested, including p16, consistently decreases.
      • Spearman comparisons among the WT and Mrtfab-/- pools show that relative marker expression is indeed well correlated between the pools of each genotype. Note on quantitation added in Methods p10 lines 209-213.

      Figure 1I moved from former Figure S1A, to replace former Figure 1G. New legend now includes quantitation, and reference to Spearman correlations, p44 lines 834-841.

      New Figure S1A displays data from multiple independent experiments with all 3 Mrtfab-/- pools. New legend, p44 lines 997-1002.

      Figure S1B legend notes correlation between relative marker expression in untreated WT and Mrtfab-/- cells, p44, lines 1005-1008.

      Results text rewritten p17 lines 383-391; no reference to “similar”.

      *Reviewer #2 (Significance (Required)): *

      *This study aims to investigate a fundamental biological question of how an actin-regulated transcription machinery regulates cell proliferation and is therefore of broad significance. Strengths and limitations of this study are described above. *

      Reviewer #3

      *(Evidence, reproducibility and clarity (Required)): *

      Summary

      *The manuscript by Nielsen et al. (Treisman lab) entitled "MRTF-dependent cytoskeletal dynamics drive efficient cell cycle progression" investigates the effects on cell proliferation elicited upon cellular depletion of the transcription factors MRTF-A and MRTF-B. The MRTFs are actin-dependent co-factors of SRF, which direct the transcription of SRF target genes. The MRTF-SRF regulatory circuit defines both the functioning and the control of actin-driven cytoskeletal dynamics. *

      *The work presented identifies essential molecular links that interconnect cytoskeleton-dependent cellular activities (cell-cell adhesion, cell-substrate contact, cell spreading) and cell proliferation. *

      *General assessment on used methodology. *

      *The presented comprehensive body of work is performed competently; it includes all relevant and necessary state-of-the-art technologies. *

      Reviewer #3 (Significance (Required)):

      Advance

      Previously published evidence by others (including the Treisman group) had indicated that SRF does not seem essential for the proliferation of some cell types (i. e., embryonic (stem) cells, activation-dependent immune cells, etc.). In regard to this, the authors discuss in the current manuscript: "Although further work is needed to elucidate the basis for these context-dependent differences, our data show that MRTF-SRF signalling is likely to play a more general role in proliferation than previously thought." The current manuscript already delineates this "general role": MRTF-SRF signalling impinges on cell proliferation whenever proliferative activities are dependent upon cytoskeletal dynamics.

      We of course support the view that it is MRTF-SRF's role in cytoskeletal dynamics, especially contractility, that is a limiting factor for cell cycle progression in our cells; however, this may not be the case for other cell types or settings, such as adhesion-independent or transformed cells, and/or stiff tissue environments.

      We have stated this view more strongly, modifying the abstract and discussion, and rewording the sentence quoted above.

      The major point is that MRTF-SRF-dependent proliferation may be more common than previously thought, the field having focussed on its role in cytoskeletal dynamics rather than proliferation.

      Abstract lines 48-49; Discussion p28, line 668-669; pp30-31, lines 713-714, 725-727. See also last para pp31/32, added lines 752-755.

      *The work has implications for cancer biology. It offers new directions to investigate the regulation of proliferative activities of anchorage-independent tumor cells.*

      *Audience*

      *The insights generated serve the wide interests of a large and diverse group of cell and tumor biologists. *

      *Reviewers field of expertise (keywords). *

      Cytoskeletal dynamics, transcriptional con*

    1. Author response:

      The following is the authors’ response to the current reviews

      Reviewer #2 (Public review): 

      This manuscript describes the role of the production of c-di-AMP on the chlamydial developmental cycle. The main findings remain the same. The authors show that overexpression of the dacA-ybbR operon results in increased production of c-di-AMP and early expression of transitionary and late genes. The authors also knocked down the expression of the dacA-ybbR operon and reported a modest reduction in the expression of both hctA and omcB. The authors conclude with a model suggesting the amount of c-di-AMP determines the fate of the RB, continued replication, or EB conversion. 

      Overall, this is a very intriguing study with important implications; however, the data is very preliminary and the model is very rudimentary. The data support the observation that dramatically increased c-di-AMP has an impact on transitionary gene expression and late gene expression, suggesting dysregulation of the developmental cycle. This effect goes away with modest changes in c-di-AMP (deltaTM-DacA vs deltaTM-DacA (D164N)). However, the model's prediction that low levels of c-di-AMP delay EB production is not well supported by the data. If this prediction were true then the growth rate would increase with c-di-AMP reduction and the data does not show this. The levels of c-di-AMP at the lower levels need to be better validated as it seems like only very high levels make a difference for dysregulated late gene expression. However, on the low end it's not clear what levels are needed to have an effect as only DacAopMut and DacAopKD show any effects on the cycle and the c-di-AMP levels are only different at 24 hours.

      These appear to be the same comments the reviewer presented last time, so we will reiterate our prior points here and elsewhere. We do not think, nor do we predict, that low c-di-AMP levels should increase growth rate (as measured by gDNA levels), and this conclusion cannot be drawn from our data. Rather, we predict that the inability to accumulate c-di-AMP should delay production of EBs, and this is what the data show. The reviewer has applied their own subjective (and erroneous) interpretation to the model. The asynchronicity of the normal developmental cycle means RBs continue to replicate as EBs are forming, so gDNA levels cannot be used as the sole metric for determining RB levels. We show that reduced c-di-AMP levels reduce EB levels as well as transcripts associated with late stages of development. The parsimonious interpretation of these data supports that low c-di-AMP levels delay progression through the developmental cycle, consistent with our model.

      The data still do not support the overall model.

      We disagree.  We have presented quantified data that include appropriate controls and statistical tests, and the reviewer has not disputed that or pointed to additional experiments that need to be performed.  The reviewer has imposed a subjective interpretation of our model based on their own biases.  A reader is free, of course, to disagree with our model, but a reviewer should not block a manuscript based on such a disagreement if no experimental flaws have been identified. 

      In Figure 1 the authors show at 24 hpi. 

      We also showed data from 16hpi, which is a more relevant timepoint for assessing premature transition to EBs.  In contrast, the 24hpi is more important for assessing developmental effects of reduced c-di-AMP levels.

      DacA overexpression increases cdiAMP to ~4000 pg/ml 

      DacAmut overexpression reduces cdiAMP dramatically to ~256 pg/ml.

      DacATM overexpression increases cdiAMP to ~4000 pg/ml. 

      DacAmutTM overexpression does not seem to change cdiAMP ~1500 pg/ml . 

      dacAKD decreases cdiAMP to ~300 pg/ml . 

      dacAKDcom increased cdiAMP to ~8000 pg/ml. 

      DacA-ybbRop overexpression increased cdiAMP to ~500,000 pg/ml. 

      DacA-ybbRopmut ~300 pg/ml. 

      However in Figure 2 the data show that overexpression of DacA (cdiAMP ~4000 pg/ml) did not have a different phenotype than over expression of the mutant (cdiAMP ~256 pg/ml). HctA expression down, omcB expression down, euo not much change, replication down, and IFUs down. Additionally, Figure 3 shows no differences in anything measured although cdiAMP levels were again dramatically different. DacATM overexpression (~4000 pg/ml) and DacAmutTM (~1500). This makes it unclear what cdiAMP is doing to the developmental cycle. 

      As we have explained in the text and in response to reviewer comments on previous rounds of review, overexpressing the full-length WT or mutant DacA is detrimental to developmental cycle progression for reasons that have nothing to do with c-di-AMP levels (likely disrupting membrane function), since, as the reviewer notes, the WT DacA deltaTM strain had similar c-di-AMP levels but no negative effects on growth/development. If we had not presented the effects of overexpressing the individual isoforms, then a reviewer would surely have requested such, which is why we present these data even though they don’t seem to support our model.  This is an honest representation of our findings.  The reviewer seems intent on nitpicking a minor datapoint that seems to contradict the rest of the manuscript while ignoring or not carefully reading the rest of the manuscript.

      In Figure 4 the authors knockdown dacA (dacA-KD) and complement the knockdown (dacA-KDcom) 

      dacAKD decreases cdiAMP (~300) while DacA-KDcom increases cdiAMP much above wt (~8000). 

      KD decreased hctA and omcB at 24hpi. Complementation resulted in a moderate increase in hctA at a single time point but not at 24 hpi and had no effect on euo or omcB expression.

      By 24hpi, late gene transcripts are being maximally produced during a normal developmental cycle. It is unclear why the reviewer thinks that these transcripts should be elevated above this level in any of our strains that prematurely transition to EBs. There is no basis in the literature to support such an assumption. As we noted in the text, the dacA-KDcom strain phenocopied the dacAop OE strain, and we showed RNAseq data and EB production curves for the latter that support our conclusions of the effect of increased c-di-AMP levels on developmental progression.

      Importantly, complementation decreased the growth rate.

      Yes, since the c-di-AMP levels breached the “EB threshold” at 16hpi, it causes premature transition to EBs, which do not replicate their gDNA, at an earlier stage of the cycle when fewer organisms are present. Therefore, the gDNA levels are decreased at 24hpi, which is consistent with our model.
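      The argument above (earlier conversion to non-replicating EBs leaves fewer genome copies at 24hpi) can be illustrated with a toy calculation. This is only a sketch under assumptions that are not taken from the manuscript: the doubling time, the start of replication, and the function name `genomes_at` are hypothetical, and the model treats all organisms as converting at once, which ignores the asynchronicity of the real developmental cycle.

```python
def genomes_at(hours, doubling_time_h, conversion_start_h, start_h=8.0):
    """Relative genome copies at `hours` post-infection in a toy model.

    RBs double every `doubling_time_h` from `start_h` until
    `conversion_start_h`; after that, all organisms are treated as
    non-replicating EBs (a deliberate oversimplification).
    All parameter values used below are hypothetical.
    """
    replicating_until = min(hours, conversion_start_h)
    growth_time = max(0.0, replicating_until - start_h)
    return 2.0 ** (growth_time / doubling_time_h)


# Conversion at 24 h (normal timing in this toy) vs. premature conversion at 16 h
normal = genomes_at(24, doubling_time_h=2.0, conversion_start_h=24)
premature = genomes_at(24, doubling_time_h=2.0, conversion_start_h=16)
print(normal, premature)
```

      In this toy setting, the strain that stops replicating at 16 h ends up with fewer genome copies at 24 h than the one that keeps replicating, which is the qualitative point being made; it says nothing about the actual magnitudes in the data.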

      Based on the proposed model, growth rate should increase as the chlamydia should all be RBs and replicating and not exiting the cell cycle to become EBs (not replicating).

      This is a spurious conclusion from the reviewer. As we clearly showed, the dacA-KDcom did not restore a wild-type phenotype and instead mimicked the dacAop OE strain. This was commented on in the text.

      Interestingly reducing cdiAMP levels by over expressing DacAmut (~256 pg/ml) did not have an effect on the cycle but the reduction in cdiAMP by knockdown of dacA (~300 pg/ml) did have a moderate effect on the cycle. 

      This is again a spurious conclusion from the reviewer. The dacAMut and dacA-KD strains are distinct. As noted in the text and above for DacA WT OE, overexpressing the DacAMut similarly disrupts organism morphology, which is different from dacA-KD. These strains should not be directly compared because of this. This point has been previously highlighted in the text (in Results and Discussion).

      For Figure 5 DacA-ybbRop was overexpressed and this increased cdiAMP dramatically ~500,000 pg/ml as compared to wt ~1500. This increased hctA only at an early timepoint and not at 24hpi and again had no effect on omcB or euo.

      As we explained in prior reviews, our RNAseq data more comprehensively assessed transcripts for the dacAop OE strain. These data show convincingly that late gene transcripts (not just hctA and omcB) are elevated earlier in the developmental cycle. Again, it is not clear why the reviewer should expect that late gene transcripts should be higher in these strains than they are during a normal developmental cycle. This is not part of our model and appears to be a bias that the reviewer has imposed that is not supported by the literature.

      Overexpression of the operon with the mutation DacA-ybbRopmut reduced cdiAMP to ~300 pg/ml and this showed a reduction in growth rate similar to dacAmut but a more dramatic decrease in IFUs. 

      As we described in the text, in earlier revisions, and above, the dacAMut OE strain has distinct effects unrelated to c-di-AMP levels and, therefore, should not be compared to other strains in terms of linking its c-di-AMP levels to its phenotype.

      Overall: 

      DacA overexpression increases cdiAMP to ~4000 pg/ml (decreased everything except euo) 

      DacAmut overexpression reduces cdiAMP dramatically (~256 pg/ml). (decreased everything except euo) 

      DacATM overexpression increases cdiAMP to ~4000 pg/ml (no changes noted) 

      DacAmutTM overexpression does not seem to change cdiAMP ~1500 pg/ml (no changes noted) 

      dacAKD decrease cdiAMP to ~300 pg/ml (decreased everything except euo) 

      dacAKDcom increased cdiAMP to ~8000 pg/ml (decreases growth rate, increase hctA a little but not omcB) 

      DacA-ybbRop overexpression increased cdiAMP to ~500,000 pg/ml (decreases growth rate, increase hctA a little but not omcB) 

      DacA-ybbRopmut ~300 pg/ml (decreased everything except euo) 

      Overall, the data show that increasing cdiAMP only has a phenotype if it is dramatically increased, no effect at 4000 pg/ml.

      Yes, this clearly shows there is a threshold - as we hypothesize!  However, these thresholds are more important at the 16hpi timepoint not 24hpi (which the reviewer is referencing) when assessing premature transition to EBs.  We specifically highlighted in our prior revision in Figure 1E this EB threshold to make this point clearer for the reader.  Once the threshold is breached, then the overall c-di-AMP levels become irrelevant as the RBs have begun their transition to EBs.

      Decreasing cdiAMP has a consistent effect, decreased growth rate, IFU, hctA expression and omcB expression. However, if their proposed model was correct and low levels of cdiAMP blocked EB conversion then more chlamydial cells would be RBs (dividing cells) and the growth rate should increase.

      The only effect should be normal gDNA levels, which is what we see in the dacA-KD.  Given the asynchronicity of a normal developmental cycle in which RBs continue to replicate as EBs are still forming, there is no basis to assume gDNA levels should increase under these conditions for the dacA-KD strain at 24hpi.

      Conversely, if cdiAMP levels were dramatically raised then all RBs would all convert and the growth rate would be very low.

      We agree. This is what is reflected by the dacAop OE and dacA-KDcom strains, with reduced gDNA levels at 24hpi since organisms have transitioned to EBs at an earlier time post-infection.

      When cdiAMP was raised to ~4000 pg/ml there was no effect on the growth rate.

      Yes, because it had not breached the EB threshold at 16hpi – consistent with our model!  The reviewer is confusing effects of elevated c-di-AMP at 24hpi when they should be assessed at the 16hpi timepoint for strains overproducing this molecule.

      However, an increase to ~8000 pg/ml resulted in a significant decrease but growth continued.

      If the reviewer is referring to the dacA-KDcom strain, then this is not accurate. gDNA levels were decreased in this strain at 24hpi when the c-di-AMP levels were increased compared to the WT (mCherry OE) control at 16hpi, indicating this strain had breached the “EB threshold” and initiated conversion to EBs at an earlier timepoint post-infection when fewer organisms were present.

      Increasing cdiAMP to ~500,000 pg/ml had less of an impact on the growth rate.

      It is not clear what this conclusion is based on and what the reviewer is comparing to.  This is a subjective assessment not based on our data.

      Overall, the data does not cleanly support the proposed model.

      It is an unfortunate aspect of biology, particularly for obligate intracellular bacteria – a challenging experimental system on which to work, that the data are not always “clean”.  The overall effects of increased c-di-AMP levels on chlamydial developmental cycle progression we have documented support our model, and we think the reader, as always, should make their own assessment.


      The following is the authors’ response to the original reviews.

      Reviewer #2 (Public review): 

      This manuscript describes the role of the production of c-di-AMP on the chlamydial developmental cycle. The main findings remain the same. The authors show that overexpression of the dacA-ybbR operon results in increased production of c-di-AMP and early expression of transitionary and late genes. The authors also knocked down the expression of the dacA-ybbR operon and reported a modest reduction in the expression of both hctA and omcB. The authors conclude with a model suggesting the amount of c-di-AMP determines the fate of the RB, continued replication, or EB conversion. 

      Overall, this is a very intriguing study with important implications; however, the data is very preliminary, and the model is very rudimentary. The data support the observation that dramatically increased c-di-AMP has an impact on transitionary gene expression and late gene expression, suggesting dysregulation of the developmental cycle. This effect goes away with modest changes in c-di-AMP (deltaTM-DacA vs deltaTM-DacA (D164N)). However, the model's prediction that low levels of c-di-AMP delay EB production is not well supported by the data. If this prediction were true then the growth rate would increase with c-di-AMP reduction and the data does not show this.

      Thank you for the comments. We have apparently not adequately communicated our predictions and the model. We do not think, nor do we predict, that low c-di-AMP levels should increase growth rate, and there is no basis in any of our data to support that. Rather, we predict that the inability to accumulate c-di-AMP should delay production of EBs, and this is what the data show. We have clarified this in the text (line 89 paragraph).

      The levels of c-di-AMP at the lower levels need to be better validated as it seems like only very high levels make a difference for dysregulated late gene expression. However, on the low end it's not clear what levels are needed to have an effect as only DacAopMut and DacAopKD show any effects on the cycle and the c-di-AMP levels are only different at 24 hours.

      Our hypothesis is that increasing concentrations of c-di-AMP within a given RB is a signal for it to undergo secondary differentiation to the EB, and the data support this as noted by the reviewers. Again, we stress that low levels of c-di-AMP are irrelevant to the model. We have revised Figure 1E to indicate the level of c-di-AMP in the control strain at the 24hpi timepoint that coincides with increased EB levels. We hope this will further clarify the goals of our study. That a given strain might be below the EB control is not relevant to the model beyond indicating that it has not reached the necessary threshold for triggering secondary differentiation.

      The authors responded to reviewers' critiques by adding the overexpression of DacA without the transmembrane region. This addition does not really help their case. They show that deltaTM-DacA and deltaTM-DacA (D164N) had the same effects on c-di-AMP levels but the figure shows no effects on the developmental cycle.

      As it relates directly to the reviewer’s point, the delta-TM strains did not show the same level of c-di-AMP. It may be that the reviewer misread the graph. The purpose of testing these strains was to show that the negative effects of overexpressing full-length WT DacA were due to its membrane localization. Both the FL and deltaTM-DacA (WT) overexpression had equivalent c-di-AMP levels even though the delta-TM overexpression looked like the mCherry-expressing strain based on the measured parameters. This shows that the c-di-AMP levels were irrelevant to the phenotypes observed when overexpressing these WT isoforms. For the mutant isoforms, the delta-TM looked like the mCherry-expressing control while the FL isoform was negatively impacted for reasons we described in the Discussion (e.g., dominant negative effect). In addition, at 16hpi, neither delta-TM strain had c-di-AMP levels that approached the 24h control as denoted in Figure 1E (dashed line) and in the text, which explains why these strains did not show increased late gene transcripts at an earlier timepoint like the dacAop and dacA-KDcom strains.

      Describing the significance of the findings: 

      The findings are important and point to very exciting new avenues to explore the important questions in chlamydial cell form development. The authors present a model that is not quantified and does not match the data well. 

      We respectfully disagree with this assessment as noted above in response to the reviewer’s critique. All of our data are quantified and support the hypothesis as stated.

      Describing the strength of evidence: 

      The evidence presented is incomplete. The authors do a nice job of showing that overexpression of the dacA-ybbR operon increases c-di-AMP and that knockdown or overexpression of the catalytically dead DacA protein decreases the c-di-AMP levels. However, the effects on the developmental cycle and how they fit the proposed model are less well supported. 

      Overall this is a very intriguing finding that will require more gene expression data, phenotypic characterization of cell forms, and better quantitative models to fully interpret these findings. 

      It is not clear what quantitative models the reviewer would prefer, but, ultimately, it is up to the reader to decide whether they agree or not with the model we present. The data are the data, and we have tried to present them as clearly as possible. We would emphasize that, with the number of strains we have analyzed, we have presented a huge amount of data for a study with an obligate intracellular bacterium. As a comparison, most publications on Chlamydia might use a handful of transformant strains, if any. Given the cost and time associated with performing such studies, it is prohibitive to attempt all the time points that one might like to do, and it is not clear to us that further studies will add to or alter the conclusions of the current manuscript.

      Reviewer #2 (Recommendations for the authors): 

      Minor critiques 

      The graphs have red and blue lines but the figure legends are red and black. It would be better if these matched. 

      Changed.

      For Figure 1C. The labels are not very helpful. It's not clear what is HeLa vs mCherry. I believe it is uninfected vs Chlamydia infected.

      Changed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study uses mesoscale simulations to investigate how membrane geometry regulates the multiphase organization of postsynaptic condensates. It reveals that dimensionality shifts the balance between specific and non-specific interactions, thereby reversing domain morphology observed in vitro versus in vivo.

      Strengths:

      The model is grounded in experimental binding affinities, reproduces key experimental observations in 3D and 2D contexts, and offers mechanistic insight into how geometry and molecular features drive phase behavior.

      Weaknesses:

      The model omits other synaptic components that may influence domain organization and does not extensively explore parameter sensitivity or broader physiological variability.

      We thank the reviewer for his/her time and effort spent on our manuscript. We agree that the contribution of other synaptic components should be addressed. We have included a discussion of the effects of environmental factors such as protein and ion concentrations, as well as other omitted postsynaptic components (SAPAP, Shank, and Homer), on phase morphology. In the middle of the second paragraph of the Discussion, we added: 

      “While these in vivo results contain additional scaffold and cytoskeletal elements omitted in our model, such as SAPAP, Shank and Homer, nearly all proteins in the middle and lower layers of the PSD associate directly or indirectly with PSD-95 in the upper PSD layer. Consequently, it is probable that other scaffold proteins contribute to the mobility of AMPAR-containing and NMDAR-containing nanodomains indistinguishably. They may increase the stability of the AMPAR and NMDAR clusters but are unlikely to have a distinct effect to reverse the phase-separation phenomenon.”

      Also, as the reviewer pointed out, we agree that physiological factors such as ion concentration may influence the phase. However, conditions such as ion concentration are implicitly implemented as the specific and nonspecific interactions in this model, which makes it difficult to estimate the effect of each physiological condition individually. We added the potential variability of physiological conditions to the Discussion section as a limitation of this model. To investigate parameter sensitivity in more detail, we performed additional MD simulations with weakened membrane constraints to account for the behavior between 3D and 2D. We added:

      “First, our results did not provide direct insights to physiological conditions, such as ion concentrations. Since such factors are implicitly implemented in our model, it is difficult to estimate these effects individually. This suggests the need for future implementation of environmental factors and validation under a broader range of in vivo-like settings.”

      Reviewer #2 (Public review):

      This is a timely and insightful study aiming to explore the general physical principles for the sub-compartmentalization--or lack thereof--in the phase separation processes underlying the assembly of postsynaptic densities (PSDs), especially the markedly different organizations in three-dimensional (3D) droplets on one hand and the two-dimensional (2D) condensates associated with a cellular membrane on the other. Simulation of a highly simplified model (one bead per protein domain) is carefully executed. Based on a thorough consideration of various control cases, the main conclusion regarding the trade-off between repulsive excluded volume interactions and attractive interactions among protein domains in determining the structures of 3D vs 2D model PSD condensates is quite convincing. The results in this manuscript are novel; however, as it stands, there is substantial room for improvement in the presentation of the background and the findings of this work. In particular,

      (i) conceptual connections with prior works should be better discussed 

      (ii) essential details of the model should be clarified, and

      (iii) the generality and limitations of the authors' approach should be better delineated.

      We thank the reviewer for his/her time and effort on our manuscript and for the encouraging comments and helpful suggestions. We have answered every technical comment the reviewer raised below.

      Specifically, the following items should be addressed (with the additional references mentioned below cited and discussed):

      (1) Excluded volume effects are referred to throughout the text by various terms and descriptions such as "repulsive force according to the volume" (e.g., in the Introduction), "nonspecific volume interaction", and "volume effects" in this manuscript. This is somewhat curious and not conducive to clarity, because these terms have alternate or connotations of alternate meanings (e.g., in biomolecular modeling, repulsive interactions usually refer to those with longer spatial ranges, such as that between like charges). It will be much clearer if the authors simply refer to excluded volume interactions as excluded volume interactions (or effects).  

      Thank you for this comment. We have substituted the words “excluded volume interactions” for words of similar meaning. However, we have left the expression of “non-specific interactions” as they are referring to explicit interactions that are given as force fields in the model, rather than in the general meaning of excluded volume effect.

      (2) In as much as the impact of excluded volume effects on subcompartmentalization of condensates ("multiple phases" in the authors' terminology), it has been demonstrated by both coarse-grained molecular dynamics and field-theoretic simulations that excluded volume is conducive to demixing of molecular species in condensates [Pal et al., Phys Rev E 103:042406 (2021); see especially Figures 4-5 of this reference]. This prior work bears directly on the authors' observation. Its relationship with the present work should be discussed.  

      We appreciate the reviewer’s insightful comment. We have now included a more detailed discussion on excluded volume effect in the revised manuscript, which provides important context for our findings. Furthermore, we have cited the references to support and enrich the discussion, as recommended.

      (3)  In the present model setup, activation of the CaMKII kinase affects only its binding to GluN2Bc. This approach is reasonable and leads to model predictions that are essentially consistent with the experiment. More broadly, however, do the authors expect activation of the CaMKII kinase to lead to phosphorylation of some of the molecular species involved with PSDs? This may be of interest since biomolecular condensates are known to be modulated by phosphorylation [Kim et al., Science 365:825-829 (2019); Lin et al, eLife 13:RP100284 (2025)].  

      We agree that phosphorylation effect on phase separation is an important and interesting aspect to consider. Some experimental results have shown that activation of CaMKII can lead to phosphorylation of various proteins and make PSD condensate more stable by altering their interactions. We included the sentence below in limitations:

      “In this context, we also do not explicitly account for downstream phosphorylation events. Although such proteins are not included in the current components, they will regulate PSD-95, affecting its binding valency, or diffusion coefficient. This is a subject worthy of future research.”

      (4) The forcefield for confinement of AMPAR/TARP and NMDAR/GluN2Bc to 2D should be specified in the main text. Have the authors explored the sensitivity of their 2D findings on the strength of this confinement?

      We thank the reviewer for the helpful recommendation. We have revised the manuscript to include the membrane-mimicking potential in the main text. Furthermore, we also think that exploring how the shape of the 3D/2D condensate phase depends on the strength of confinement is a very interesting point. We have additionally performed MD simulations with smaller/larger membrane constraints and included the results in the supporting information as Figure S5. The following parts are added:

      “We further attempted to mimic intermediate conditions between 3D and 2D systems in two different manners. First, we applied a weaker membrane constraint in 2D system. Even when the strength of membrane constraints is reduced by a factor of 1000, NMDARs are located on the inner side when the CaMKII was active, as well as the result in 2D system (Fig.S5ABC). Second, to weaken further the effect of membrane constraints, we artificially altered the membrane thickness from 5 nm to 50 nm, in addition to reducing the membrane constraints by 1000. As a result, NMDAR clusters move to the bottom and surround AMPAR (Fig.S5DEF). In this artificial intermediate condition, both states in which the NMDARs are outside (corresponding to 3D) and in which the NMDARs are inside (corresponding to 2D) are observed, depending on the strength of the membrane constraint.”
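      To give a feel for what reducing a membrane constraint by a factor of 1000 means, the following is a minimal sketch of a flat-bottomed harmonic slab restraint. The functional form, the force constants, and the names `membrane_energy` and `thermal_overshoot` are assumptions made for illustration; the source does not specify the form or parameters of the membrane-mimicking potential used in the simulations. Only the 5 nm thickness and the factor of 1000 come from the text above.

```python
import math

KT = 2.494  # thermal energy in kJ/mol at about 300 K


def membrane_energy(z, k, thickness):
    """Flat-bottomed harmonic restraint (assumed form, not from the source).

    Zero inside the slab |z| <= thickness / 2, harmonic with force
    constant k (kJ/mol/nm^2) outside it. z and thickness are in nm.
    """
    excess = max(0.0, abs(z) - thickness / 2.0)
    return 0.5 * k * excess ** 2


def thermal_overshoot(k, kt=KT):
    """Distance beyond the slab edge at which the energy equals kT / 2."""
    return math.sqrt(kt / k)


k_strong = 1000.0          # hypothetical force constant
k_weak = k_strong / 1000   # constraint reduced by a factor of 1000

ratio = thermal_overshoot(k_weak) / thermal_overshoot(k_strong)
print(ratio)  # the overshoot grows as the square root of the reduction factor
```

      For a harmonic wall, the thermally accessible excursion scales as the inverse square root of the force constant, so a 1000-fold weaker constraint widens it by only about 32-fold. Under this assumed form, that is one way to rationalize why the authors also widened the slab from 5 nm to 50 nm to reach an intermediate regime.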

      (5)  Some of the labels in Figure 1 are confusing. In Figure 1A, the structure labeled as AMPAR has the same shape as the structure labeled as TARP in Figure 1B, but TARP is labeled as one of the smaller structures (like small legs) in the lower part of AMPAR in Figure 1A. Does the TARP in Figure 1B correspond to the small structures in the lower part of AMPAR? If so, this should be specified (and better indicated graphically), and in that case, it would be better not to use the same structural drawing for the overall structure and a substructure. The same issue is seen for NMDAR in Figure 1A and GluN2Bc in Figure 1B. 

      (6) In addition to clarifying Figure 1, the authors should clarify the usage of AMPAR vs TARP and NMDAR vs GluN2Bc in other parts of the text as well.

      (7) The physics of the authors' model will be much clearer if they provide an easily accessible graphical description of the relative interaction strengths between different domain-representing spheres (beads) in their model. For this purpose, a representation similar to that given by Feric et al., Cell 165:1686-1697 (2016) (especially Figure 6B in this reference) of the pairwise interactions among the beads in the authors' model should be provided as an additional main-text figure. Different interaction schemes corresponding to inactive and activated CaMKII should be given. In this way, the general principles (beyond the PSD system) governing 3D vs 2D multiple-component condensate organization can be made much more apparent.

      We sincerely appreciate the reviewer’s comments. Following the recommendation, we have changed the diagram in Figure 1B into an interaction matrix with each mesoscale molecular representation, and we have revised the wording in the main text to be clearer about the relationship between AMPAR and TARP and between NMDAR and GluN2Bc. The former diagram of the pairs of specific interactions has been moved to a supplementary figure. 

      (8) Can the authors' rationalization of the observed difference between 3D and 2D model PSD condensates be captured by an intuitive appreciation of the restriction on favorable interactions by steric hindrance and the reduction in interaction cooperativity in 2D vs 3D?  

      We thank the reviewer for the comment. As pointed out, the multiphase morphology change observed in this study can be attributed to a decrease in coordination number in 2D compared to 3D. We have included the physicochemical rationalization in the discussion.  

      (9) In the authors' model, the propensity to form 2D condensates is quite weak. Is this prediction consistent with the experiment? Real PSDs do form 2D condensates around synapses.  

      We are grateful to the reviewer for highlighting this important point. We agree that the real PSD forms 3D condensates beneath the 2D membrane. Some lower PSD components under the membrane (i.e., SAPAP, Shank, and Homer) are omitted in our system, which may cause weak condensation. To emphasize this, we have added the following sentence:

      “While these in vivo results contain additional scaffold and cytoskeletal elements omitted in our model, such as SAPAP, Shank and Homer, nearly all proteins in the middle and lower layers of the PSD associate directly or indirectly with PSD-95 in the upper PSD layer. Consequently, it is probable that other scaffold proteins contribute to the mobility of AMPAR-containing and NMDAR-containing nanodomains indistinguishably. They may increase the stability of the AMPAR and NMDAR clusters but are unlikely to have a distinct effect to reverse the phase-separation phenomenon.”

      However, we believe that the clusters formed on the 2D membrane are not a robust “phase” because they do not follow a scaling law. In fact, in our previous study of a PSD system with AMPAR(TARP)₄ and PSD-95, we reported that phase separation is less likely to occur in 2D than in 3D. That result suggests that phase separation on a membrane may be difficult to achieve, which is consistent with the results of this study.

      (10) More theoretical context should be provided in the Introduction and/or Discussion by drawing connections to pertinent prior works on physical determinants of co-mixing and de-mixing in multiple-component condensates (e.g., amino acid sequence), such as Lin et al., New J Phys 19:115003 (2017) and Lin et al., Biochemistry 57:2499-2508 (2018). 

      (11) In the discussion of the physiological/neurological significance of PSD in the Introduction and/or Discussion, for general interest it is useful to point to a recently studied possible connection between the hydrostatic pressure-induced dissolution of model PSD and high-pressure neurological syndrome [Lin et al., Chem Eur J 26:11024-11031 (2020)].

      We thank the reviewer for the helpful recommendation. We have added the recommended references in each relevant part in introduction, respectively.

      (12) It is more accurate to use "perpendicular to the membrane" rather than "vertical" in the caption for Figure 3E and other such descriptions of the orientation of the CaMKII hexagonal plane in the text.

      We thank you for your comment. We replaced the word “vertical” with “perpendicular" in the main text and caption.

      Reviewer #3 (Public review):

      Summary:

      In this work, Yamada, Brandani, and Takada have developed a mesoscopic model of the interacting proteins in the postsynaptic density. They have performed simulations, based on this model and using the software ReaDDy, to study the phase separation in this system in 2D (on the membrane) and 3D (in the bulk). They have carefully investigated the reasons behind different morphologies observed in each case, and have looked at differences in valency, specific/non-specific interactions, and interfacial tension.

      Strengths:

      The simulation model is developed very carefully, with strong reliance on binding valency and geometry, experimentally measured affinities, and physical considerations like the hydrodynamic radii. The presented analyses are also thorough, and great effort has been put into investigating different scenarios that might explain the observed effects.

      Weaknesses:

      The biggest weakness of the study, in my opinion, has to do with a lack of more in-depth physical insight about phase separation. For example, the authors express surprise about similar interactions between components resulting in different phase separation in 2D and 3D. This is not surprising at all, as in 3D, higher coordination numbers and more available volume translate to lower free energy, which easily explains phase separation. The role of entropy is also significantly missing from the analyses. When interaction strengths are small, entropic effects play major roles. In the introduction, the authors present an oversimplified view of associative and segregative phase transitions based on the attractive and repulsive interactions, and I'm afraid that this view, in which all the observed morphologies should have clear pairwise enthalpic explanations, diffuses throughout the analysis. Meanwhile, I believe the authors correctly identify some relevant effects, where they consider specific/nonspecific interactions, or when they investigate the reduced valency of CaMKII in the 2D system.

      We thank the reviewer for the insightful and constructive comments. Regarding the difference in phase behavior between 2D and 3D systems, we appreciate the reviewer’s clarification that differences in coordination number and entropy in higher dimensions can account for the observed morphology of the phases. While it may be clear that entropy decreases due to the decrease of coordination number, our objective was to uncover how such an isotropic entropy reduction regulates the behavior of each phase driven by different interactions, which remains largely unknown. To emphasize this, we modified the introduction and have now included a discussion of the entropic contributions to phase behavior in both 2D and 3D systems, and we have made this clearer in the revised manuscript by referencing relevant theoretical frameworks. In the Discussion, we added the sentence below:

      “Generally, phase separation can be explained by the Flory-Huggins theory and its extensions: phase separation can be favored by the difference in the effective pairwise interactions in the same phase compared to those across different phases, and is disfavored by mixing entropy. The effective interactions contain various molecular interactions, including direct van der Waals and electrostatic interactions, hydrophobic interactions, and purely entropic macromolecular excluded volume interactions. For the latter, Asakura-Oosawa depletion force can drive the phase separation. Furthermore, the demixing effect was explicitly demonstrated in previous simulations and field theory (61). Importantly, we note that the effective pairwise interactions scale with the coordination number z. The coordination number is a clear and major difference between 3D and 2D systems. In 3D systems, large z allows both relatively strong few specific interactions and many weak non-specific interactions. While a single specific interaction is, by definition, stronger than a single non-specific interaction, contribution of the latter can have strong impact due to its large number. On the other hand, a smaller z in the membrane-bound 2D system limits the number of interactions. In case of limited competitive binding, specific interactions tend to be prioritized compared to non-specific ones. In fact, Fig. 3A clearly shows that number of specific interactions in 2D is similar to that in 3D, while that of non-specific interactions is dramatically reduced in 2D. In the current PSD system, CaMKII is characterized by large valency and large volume. In the 3D solution system, non-specific excluded volume interactions drive CaMKII to the outer phase, while this effect is largely reduced in 2D, resulting in the reversed multiphase.”
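
      The scaling of the effective interaction with the coordination number z can be illustrated with a minimal regular-solution sketch. This is not the authors' simulation model: it assumes a symmetric binary mixture of monomer-sized species (so the critical Flory-Huggins parameter is 2), lattice coordination numbers of 4 (2D square) and 6 (3D cubic), and a hypothetical exchange energy `DELTA_EPS` chosen only for illustration.

```python
import math


def chi(z, delta_eps_kT):
    """Flory-Huggins parameter: chi = z * delta_eps / (k_B T)."""
    return z * delta_eps_kT


def mixing_free_energy(phi, chi_value):
    """Free energy of mixing per lattice site, in units of k_B T.

    Assumes both species occupy one lattice site each (regular solution).
    """
    entropy_term = phi * math.log(phi) + (1 - phi) * math.log(1 - phi)
    return entropy_term + chi_value * phi * (1 - phi)


def demixes(chi_value, chi_critical=2.0):
    """True if chi exceeds the critical value for a symmetric monomer mixture."""
    return chi_value > chi_critical


# Hypothetical exchange energy per contact, in units of k_B T (assumption).
DELTA_EPS = 0.4

for label, z in (("2D square lattice", 4), ("3D cubic lattice", 6)):
    c = chi(z, DELTA_EPS)
    print(f"{label}: z = {z}, chi = {c:.2f}, demixes = {demixes(c)}")
```

      With the same per-contact energy, the sketch crosses the demixing threshold only for the larger z, which is the qualitative point of the quoted paragraph; real multicomponent, multivalent systems need the extensions the authors cite.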

      Also, I sense some haste in comparing the findings with experimental observations. For example, the authors mention that "For the current four component PSD system, the product of concentrations of each molecule in the dilute phase is in good agreement with that of the experimental concentrations (Table S2)." But the data used here is the dilute phase, which is the remnant of a system prepared at very high concentrations and allowed to phase separate. The errors reported in Table S2 already cast doubt on this comparison. 

      We thank the reviewer for the insightful comment. In the validation process, we adjusted the parameters so that the number of molecules in the dilute phase is consistent with the experimental lower concentration limit for phase separation, on the assumption that the concentration of the dilute phase after phase separation equals the critical concentration. That is why we focus on comparing dilute-phase concentrations in Table S2. However, in our simulations, the number of protein molecules is relatively small, since it is based on the average number per synaptic spine. For example, there are only about 60 CaMKII molecules at most, and their presence in the dilute phase is highly sensitive to concentration, as the reviewer pointed out. This is one of the limitations of the study, so we have added a description to the Limitations section. We added:

      “Second, parameter calibration contains some uncertainty. Previous in vitro study results used for parameter validation are at relatively high concentrations for phase separation, which may shift critical thresholds compared to that in in vivo environments. Also, since the number of molecules included in the model is small, the difference of a single molecule could result in a large error during this validation process.”

      Or while the 2D system is prepared via confining the particles to the vicinity of the membrane, the different diffusive behavior in the membrane, in contrast to the bulk (i.e., the Saffman-Delbrück model), is not considered. This would thus make it difficult to interpret the results of a coupled 2D/3D system and compare them to the actual system.

      We appreciate the reviewer’s helpful comment. We agree that there is a concern that the Einstein-Stokes equation does not adequately reproduce the diffusion of membrane-embedded particles. We recalculated the diffusion coefficients for every membrane particle used in this model using the Saffman-Delbrück model and found that the diffusion coefficients for the receptor cores (AMPAR and NMDAR) were approximately three times larger. These values are still about 10 times smaller than those of molecules diffusing in the cytoplasm. Additionally, since this study focuses on the morphology of the phase/cluster at thermodynamic equilibrium, we think that the magnitude of the diffusion coefficient has little influence on the final structure of the cluster. However, we will incorporate membrane-embedded diffusion as a future improvement for better modelling and implementation. We added:

      “Third, we estimated all the diffusion coefficients from the Einstein-Stokes equation, which may oversimplify membrane-associated dynamics. Applying the Saffman-Delbrück model to membrane-embedded particles would be desirable, although the resulting diffusion coefficients remain of the same order of magnitude. These limitations highlight the need for further research, yet they do not undermine the core significance of the present findings in advancing our understanding of multiphase morphologies.”
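
      For readers unfamiliar with the two diffusion models named above, the following sketch compares the bulk Einstein-Stokes estimate with the Saffman-Delbrück expression for a cylindrical membrane inclusion. All numerical values (temperature, viscosities, membrane thickness, particle radius) are illustrative assumptions, not the parameters used in the manuscript, so the output is not expected to reproduce the authors' reported ratios.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def stokes_einstein(temperature, viscosity, radius):
    """D = k_B T / (6 pi eta a) for a sphere of radius a in a bulk fluid."""
    return K_B * temperature / (6 * math.pi * viscosity * radius)


def saffman_delbruck(temperature, membrane_viscosity, thickness,
                     fluid_viscosity, radius):
    """D = k_B T / (4 pi mu_m h) * (ln(mu_m h / (mu_w a)) - gamma)."""
    log_term = math.log(
        membrane_viscosity * thickness / (fluid_viscosity * radius)
    ) - EULER_GAMMA
    if log_term <= 0:
        # The approximation only holds for inclusions much smaller than
        # the Saffman-Delbrück length mu_m * h / mu_w.
        raise ValueError("radius too large for the Saffman-Delbrück limit")
    prefactor = K_B * temperature / (4 * math.pi * membrane_viscosity * thickness)
    return prefactor * log_term


# Illustrative values only (assumptions, not taken from the manuscript).
T = 310.0     # temperature, K
MU_M = 0.1    # membrane viscosity, Pa s
H = 4e-9      # membrane thickness, m
MU_W = 1e-3   # viscosity of the surrounding fluid, Pa s
A = 3e-9      # particle radius, m

d_bulk = stokes_einstein(T, MU_W, A)
d_membrane = saffman_delbruck(T, MU_M, H, MU_W, A)
print(f"Bulk (Einstein-Stokes):      {d_bulk * 1e12:.2f} um^2/s")
print(f"Membrane (Saffman-Delbrück): {d_membrane * 1e12:.2f} um^2/s")
```

      The main qualitative difference is the size dependence: the bulk estimate falls as 1/a, whereas the membrane estimate depends only logarithmically on a.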

    1. Reviewer #2 (Public review):

      In this study, Fontana et al. develop a paradigm for associative conditioning by pairing exposure to an alarm substance with a novel tank. Exposure to conspecific alarm substance (CAS) in the novel tank triggers freezing and what they characterize as evasive swimming behaviour, which is subsequently seen in a re-exposure to the novel tank without the CAS present. Importantly, these states are identified via automated processes, including postural tracking and a random forest classification process, which could be very useful tools for subsequent studies.

      In their experiments, they focus on the differences in behaviour among strains of zebrafish (both males and females), and among individual zebrafish. For males and females of different strains, they find some differences, though the clearest message seems to be that the most robust measure of the behaviour in response to both the CAS and in the memory trials is the freezing behaviour, while evasive behaviour is more variable and not always seen. This may relate to their observation of significant "evasiveness" in vehicle control experiments (discussed further below).

      Moving on to individual variation from within this multi-strain male/female dataset, they first examine transition matrices between states and find that this is not dramatically altered by stimulus exposure. They then use clustering to identify 4 different "classes" of zebrafish that differ in their expression (or not) of two types of behaviour: freezing and/or evasive behaviour. They show that over the three exposure epochs of the experiment, this classification is somewhat stable in an individual fish, though many fish change their behaviour - e.g., evading + freezing -> only freezing.
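
      As a reminder of what a transition matrix between behavioural states encodes, here is a minimal sketch that row-normalises transition counts from a labelled state sequence. The state labels and the example sequence are hypothetical and are not taken from the study; the authors' own pipeline (postural tracking followed by random forest classification) is not reproduced here.

```python
from collections import Counter


def transition_matrix(states, labels):
    """Return row-normalised transition probabilities between labelled states.

    Rows with no outgoing transitions are filled with zeros.
    """
    counts = Counter(zip(states, states[1:]))
    matrix = {}
    for src in labels:
        row_total = sum(counts[(src, dst)] for dst in labels)
        matrix[src] = {
            dst: (counts[(src, dst)] / row_total if row_total else 0.0)
            for dst in labels
        }
    return matrix


# Hypothetical per-frame state labels for one fish (illustration only).
LABELS = ("swim", "freeze", "evade")
sequence = ["swim", "swim", "freeze", "freeze", "freeze",
            "evade", "swim", "evade", "swim"]

tm = transition_matrix(sequence, LABELS)
for src in LABELS:
    print(src, {dst: round(p, 2) for dst, p in tm[src].items()})
```

      Comparing such matrices before and after stimulus exposure is one way to ask whether the dynamics between states change, as opposed to the time spent in each state.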

      In the final set of experiments, the authors move beyond behavioural analyses and perform whole-brain cFos mapping of these individual zebrafish. They perform analyses aimed at identifying correlations between individual behavioural expression and the number of cFos-positive cells in different brain regions. Using partial least squares analysis, they find areas associated with two types of behavioural contrasts, which differ in their weighting of different behavioural expression during the Memory trials. Covariation and network structure analysis within different classes of larvae also find some differences in covariation among brain areas, providing hypotheses as to underlying network effects that may govern the expression of freezing and/or evasive behavior in the memory trial phases.

      Overall, I find this to be an interesting study that employs state-of-the-art methods of behavioural analysis and whole-brain cFos analysis, but I am left a little bit confused as to what the take-home message is and what can be concluded from this complex study that mixes in analyses of strain, sex, and individuality within a quite complex assay with multiple behavioural parameters.

      My suggestions are as follows:

      (1) My first concern relates to the claim in the abstract that "We found that fear memory behavior fell into four distinct groups: non-reactive, evaders, evading freezers, and freezers".

      In my opinion, the "freezing" aspect is well supported as being both triggered by the CAS and for memory effect upon re-exposure to the tank, but I am less convinced about the "evasive" behaviour. In Figure 2, it appears that "evasiveness" is generally not increased in either the Exposure or Memory phases for many groups, and in Figure 5, it appears that "evasiveness" is expressed by nearly 50% of the fish in the pre-exposure condition before CAS addition and in all phases in the vehicle condition. Therefore, it appears that most of the expression of this behaviour is independent of any memory-based effect.

      (2) My second concern relates to the claim in the abstract that "background strain and sex influenced how fish respond to CAS, with males more likely to increase evasive behaviors than females and the TU strain more likely to be non-reactive."

      My understanding, based on the introduction and on the methods, is that it is likely important that the CAS be prepared from conspecifics of the same strain and sex, and for this reason, they prepared different CAS specific for each strain and each sex. Therefore, the "CAS" that is applied is necessarily different for each condition, and I am concerned that the differences observed could relate more to variation in the quality, purity, concentration, etc. of the specific CAS samples for different groups, rather than their reactivity to the substance or their ability to form memories based on such experiences.

      (3) My third concern relates to the interpretation of the cFos data.

      As I mentioned above, I feel as though the behavioural analysis is perhaps more complex than is warranted via the inclusion of evasiveness, and I wonder if the conclusions from the experiments would be simpler if analyzed only from the perspective of freezing.

      But considering the presented analyses: while I don't think there is anything wrong with the partial least squares approach and the network analyses, I am concerned that the simple messaging in the text does not reflect the complexity of this analysis combining different weightings of different behavioural characteristics in a behavioural contrast, or covariations among many regions and what such analyses mean at the level of brain function. For these reasons, I feel like statements along the lines of "Behavioral variation is driven by differences in the activity of brain regions outside the telencephalon, such as the cerebellum, preglomerular nuclei, preoptic area and hypothalamus" are not well supported.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript from Jones and colleagues investigates a previously described phenomenon in which P. falciparum malaria parasites display increased trafficking of proteins displayed on the surface of infected RBCs, as well as increased cytoadherence in response to febrile temperatures. While this parasite response was previously described, it was not uniformly accepted, and conflicting reports can be found in the literature. This variability likely arises due to differences in the methods employed and the degree of temperature increase to which the parasites were exposed. Here, the authors are very careful to employ a temperature shift that likely reflects what is happening in infected humans and that they demonstrate is not detrimental to parasite viability or replication. In addition, they go on to investigate what steps in protein trafficking are affected by exposure to increased temperature and show that the effect is not specific to PfEMP1 but rather likely affects all transmembrane domain-containing proteins that are trafficked to the RBC. They also detect increased rates of phosphorylation of trafficked proteins, consistent with overall increased protein export.

      Strengths:

      The authors used a relatively mild increase in temperature (39 degrees), which they demonstrate is not detrimental to parasite viability or replication. This enabled them to avoid potential complications of a more severe heat shock that might have affected previously published studies. They employed a clever method of fractionation of RBCs infected with a var2csa-nanoluc fusion protein expressing parasite line to determine which step in the export pathway was likely accelerating in response to increased temperature. This enabled them to determine that export across the PVM is being affected. They also explored changes in phosphorylation of exported proteins and demonstrated that the effect is not limited to PfEMP1 but appears to affect numerous (or potentially all) exported transmembrane domain-containing proteins.

      Weaknesses:

      All the experiments investigating changes resulting from increased temperature were conducted after an increase in temperature from 16 to 24 hours, with sampling or assays conducted at the 24 hr mark. While this provided consistency throughout the study, this is a time point relatively early in the export of proteins to the RBC surface, as shown in Figure 1E. At 24 hrs, only approximately 50% of wildtype parasites are positive for PfEMP1, while at 32 hrs this approaches 80%. Since the authors only checked the effect of heat stress at 24 hrs, it is not possible to determine if the changes they observe reflect an overall increase in protein trafficking or instead a shift to earlier (or an accelerated) trafficking. In other words, if a second time point had been considered (for example, 32 hrs or later), would the parasites grown in the absence of heat stress catch up?

      We did not assess cytoadhesion at later stages, but the supplementary figures show that at 40 hours post infection the heat-stress and control conditions have comparable proportions of VAR2CSA-positive iRBCs, whereas they differ at 24 h. This holds for the DMSO-treated (control, wild-type-like) HA-tagged lines of HSP70x and PF3D7_072500 (Supplementary Figures 9 and 12, respectively). Given that protein levels appear unchanged, we conclude that trafficking is accelerated at these earlier time points but is comparable at later stages. This would still increase the overall bound parasite mass, as parasites start to adhere earlier during or after heat stress.

      Reviewer #2 (Public review):

      This manuscript describes experiments characterising how malaria parasites respond to physiologically relevant heat-shock conditions. The authors show, quite convincingly, that moderate heat-shock appears to increase cytoadherence, likely by increasing trafficking of surface proteins involved in this process.

      While generally of a high quality and including a lot of data, I have a few small questions and comments, mainly regarding data interpretation.

      (1) The authors use sorbitol lysis as a proxy for trafficking of PSAC components. This is a very roundabout way of doing things and does not, I think, really show what they claim. There could be a myriad of other reasons for this increased activity (indeed, the authors note potential PSAC activation under these conditions). One further reason could be a difference in the membrane stability following heat shock, which may affect sorbitol uptake, or the fragility of the erythrocytes to hypotonic shock. I really suggest that the authors stick to what they show (increased PSAC) without trying to use this as evidence for increased trafficking of a number of non-specified proteins that they cannot follow directly.

      This is a valid point, however, uninfected RBCs do not lyse following heat stress, nor do much younger iRBCs, indicating that the observed effect is specific to infected RBCs at a defined stage. The sorbitol sensitivity assay is performed at 37°C under normal conditions after cells are returned to non–heat stress temperatures, so the effect is not due to transient changes in membrane permeability at elevated temperature. 

      Planned experiment: However, to increase the strength of our conclusions and further test our hypothesis, we will perform sorbitol sensitivity assays on >20 hours post infection iRBCs following heat stress in the presence and absence of furosemide, a PSAC inhibitor. If iRBC lysis is abolished with furosemide present, this would confirm that the effect is PSAC-dependent. However, the effect could also possibly be due to altered PSAC activity during heat stress which is maintained at lower temperatures, as outlined in the discussion.

      (2) Supplementary Figure 6C/D: The KAHRP signal does not look like it should. In fact, it doesn't look like anything specific. The HSP70-X signal is also blurry and overexposed. These pictures cannot be used to justify the authors' statements about a lack of colocalisation in any way.

      Planned experiment: We agree that the IFAs are not the best as presented and will include better quality supplementary images in a revised version.

      (3) Figure 6: This experiment confuses me. The authors purport to fractionate proteins using differential lysis, but the proteins they detect are supposed to be transmembrane proteins and thus should always be found associated with the pellet, whether lysis is done using equinatoxin or saponin. Have they discovered a currently unknown trafficking pathway to tell us about? Whilst there is a lot of discussion about the trafficking pathways for TM proteins through the host cell, a number of studies have shown that these proteins are generally found in a membrane-bound state. The authors should elaborate, or choose an experiment that is capable of showing compartment-specific localisation of membrane-bound proteins (protease protection, for example).

      We do not believe we have identified a novel trafficking pathway, but rather that we capture trafficking intermediates of PfEMP1 between the PVM and the RBC periphery, in small vesicles and/or possibly Maurer’s clefts. These would still be membrane embedded but, because of their small size, would not be pelleted at the centrifugation speeds used in our study (we did not use ultracentrifugation). This explanation, we believe, is in line with the current hypothesis of PfEMP1 and other exported TMD protein trafficking to the periphery or the Maurer’s clefts.

      (4) The red blood cell contains, in addition to HSP70-X, a number of human HSPs (HSP70 and HSP90 are significant in this current case). As the name suggests, these proteins non-specifically shield exposed hydrophobic domains revealed upon partial protein unfolding following thermal insult. I would thus have expected to find significantly more enrichment following heat shock, but this is not the case. Is it possible that the physiological heat shock conditions used in this current study are not high enough to cause a real heat shock?

      As noted by the reviewer, we do not see enrichment of red blood cell heat shock proteins following heat stress, either with FIKK10.2-TurboID or in the phosphoproteome. We used a physiologically relevant heat stress that significantly modifies the iRBC, as shown by our functional assays. While a higher temperature might induce an association of red blood cell heat shock proteins, such conditions may not accurately reflect the most commonly found context of malaria infection.

      Reviewer #3 (Public review):

      Summary:

      In this paper, it is established that high fever-like 39 C temperatures cause parasite-infected red blood cells to become stickier. It is thought that high temperatures might help the spleen to destroy parasite-infected cells, and they become stickier in order to remain trapped in blood vessels, so they stop passing through the spleen.

      Strengths:

      The strength of this research is that it shows that fever-like temperatures can cause parasite-infected red blood cells to stick to surfaces designed to mimic the walls of small blood vessels. In a natural infection, this would cause parasite-infected red blood cells to stop circulating through the spleen, where the parasites would be destroyed by the immune system. It is thought that fevers could lead to infected red blood cells becoming stiffer and therefore more easily destroyed in the spleen. Parasites respond to fevers by making their red blood cells stickier, so they stop flowing around the body and into the spleen. The experiments here prove that fever temperatures increase the export of Velcro-like sticky proteins onto the surface of the infected red blood cells and are very thorough and convincing.

      Weaknesses:

      A minor weakness of the paper is that the effects of fever on the stiffness of infected red blood cells were not measured. This can be easily done in the laboratory by measuring how the passage of infected red blood cells through a bed of tiny metal balls is delayed under fever-like temperatures.

      Previous work by Marinkovic et al. (cited in this manuscript) reported that all RBCs, both infected and uninfected, increase in stiffness at 41 °C compared with 37 °C, with trophozoites and schizonts exhibiting a particularly pronounced increase. We agree that it would be interesting to determine whether similar changes occur at physiological fever-like temperatures, and whether this increase in stiffness coincides with the period of elevated protein trafficking. However, since we have already demonstrated enhanced protein export using multiple complementary approaches, we have chosen to address these questions in a follow-up study.

    1. Different individuals, cultures and societies may place more value on one type of knowing than another, although most use a combination that includes science and religion.

      It's interesting to look at the differences in cultural values and morals, especially when it comes to knowing and understanding. For centuries, society has continued to attempt to answer the questions of knowing using both science and religion as a temporary band-aid for what we have yet to understand. In no way do I think using these concepts is a negative thing; on the contrary, they give us something that keeps us content.

    1. But it has progressively realized that there is some kind of intelligibility in the world, that the world can, in part, be understood, and that we have experiences which, if properly interrogated, will yield answers to our questions.

      I don't think this is true. I don't think that we can always have an answer to our questions. Even when we do it can change. I don't think that there can ever truly be a universal truth because as humans we are not able to truly grasp all of the ideas and concepts that go on in the world. Even when we think we find a truth, that may change for us in the future.

    1. Author response:

      The following is the authors’ response to the original reviews

      We would like to express our sincere gratitude to the reviewers for their thorough analysis of the manuscript and their extremely helpful comments. We have taken all the suggestions into consideration and conducted a range of additional experiments to address the points raised. We have also extensively revised the manuscript to clarify descriptions, correct inaccuracies and remove inconsistencies. We have modified the figures for clarity and content.

      Overall, we expanded the description of the EBH structure to emphasise its dimeric nature and the impact of the two binding sites on interpreting the binding data, including cooperativity. Using ITC, we tested the effect of the pre-SxIP residues on the binding affinity with additional peptides. We found that these residues had a significant effect, albeit much smaller than that of the post-SxIP residues. We analysed the binding of the 11MACF-VLL mutant with EBH-ΔC and evaluated the exchange rates. In agreement with our model, we found that the EBH affinity for the SxIP peptide from CK5P2 (KKSRLPRILIKRSR), which has a C-terminal sequence similar to that of the 11MACF-VLLRK mutant, is 21 nM, which is similar to the affinity of the mutant itself. This demonstrates the significant variation in affinity observed among natural SxIP ligands, as predicted by our study. Our responses to the specific points raised by the reviewers are provided below.

      Reviewer #1 (Public Review):

      There is no direct experimental evidence for independent dock and lock steps. The model is certainly plausible given their structural data, but all titration and CEST measurements are fully consistent with a simple one-step binding mechanism. Indeed, it is acknowledged that the results for the VLL peptide are not consistent with the predictions of this model, as affinity and dissociation rates do not co-vary. The model may still be a helpful way to interpret and discuss their results, and may indeed be the correct mechanism, but this has not yet been proven.

      Unfortunately, it is not possible to obtain direct experimental evidence because the folding of the C-terminus is too fast to influence the NMR parameters. However, as the reviewer pointed out, our structural data support the two-step model, since folding of the C-terminus is only possible once the ligand containing the post-SxIP residues has bound. By adopting a mechanistically supported model, we can analyse the contributions to binding and relate them to the structural characteristics of the complex. This provides a clearer insight into the roles of the various regions in the interaction and allows them to be modified rationally to enhance the ligand affinity.

      In the revised version, we restate the equations in terms of comparing the on-rates. This provides a clearer view of the effect of the additional stage, which cannot increase the overall on-rate since the two stages are sequential. If the forward rate of the second stage is comparable to or slower than the off-rate of the first stage, the overall on-rate decreases. Conversely, if the forward rate is much faster, the overall on-rate remains unchanged. For the wild-type 11MACF peptide, we observed that the presence of the EBH C-terminus does not affect the on-rate of binding, which is in perfect agreement with the two-step model and indicates that the C-terminus folds very quickly.
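
      The argument about sequential stages can be made concrete with a minimal kinetic sketch. It assumes a dock step (rates `k1`, `k_minus1`) followed by a lock step (forward rate `k2`) and a steady-state treatment of the docked intermediate, which gives an overall on-rate of k1·k2/(k_minus1 + k2). The authors' exact equations are not shown in the response, so this form and all rate values below are assumptions for illustration.

```python
def effective_on_rate(k1, k_minus1, k2):
    """Overall on-rate for E + L <-> EL -> EL* with EL at steady state.

    k1        docking on-rate (M^-1 s^-1)
    k_minus1  undocking rate of the intermediate (s^-1)
    k2        forward rate of the second (lock) step (s^-1)
    """
    return k1 * k2 / (k_minus1 + k2)


# Hypothetical rate constants (illustration only).
K1 = 1e7          # M^-1 s^-1
K_MINUS1 = 100.0  # s^-1

for k2 in (1.0, 100.0, 1e6):
    ratio = effective_on_rate(K1, K_MINUS1, k2) / K1
    print(f"k2 = {k2:g} s^-1: overall on-rate is {ratio:.4f} x docking on-rate")
```

      In this form the overall on-rate can never exceed the docking on-rate: it approaches it when the lock step is much faster than undocking and falls below it when the two rates are comparable or the lock step is slower.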

      Additionally, we evaluated the binding of the 11MACF-VLL mutant to EBH-ΔC and observed a twofold decrease in Kd compared to WT 11MACF, primarily due to an increase in the on-rate. Interestingly, this on-rate is approximately half the overall on-rate for EBH/11MACF-VLL binding, which contradicts the sequential two-step model. This suggests a more complex binding process in which binding is accelerated by additional hydrophobic interactions with the unfolded C-terminus. However, given the difficulty of quantifying very slow exchange rates, it is more likely that the discrepancy reflects the limited accuracy of the rate measurements. Therefore, the model allows the rational analysis of changes in binding parameters due to mutations.

      There is little discussion of the fact that binding occurs to EBH dimers - either in terms of the functional significance of this or in the acquisition and analysis of their data. There is no discussion of cooperation in binding (or its absence), either in the analysis of NMR titrations or in ITC measurements. Complete ITC fit results have not been reported so it is not possible to evaluate this for oneself.

      We added information about the dimer to the introduction, emphasising its role in enhancing interaction with microtubules (MTs) and its structural role in SxIP binding. The ITC data do not exhibit any biphasic behaviour and can be fitted to a single-site model with 1:1 stoichiometry relative to the EB1c monomer. This corresponds to two independent binding sites in the dimer. We have added the stoichiometry to Table 1 and the description. The NMR titration data for the 11MACF and 11MACF-VLL interactions were fitted to the TITAN dimer model, which includes cooperativity parameters. For WT 11MACF, both cooperativity parameters were zero, corresponding to independent binding sites in the ITC model. For 11MACF-VLL, the fitting suggests weak negative cooperativity, with a ~3-fold increase in Kd for binding to the second site and no change in the off-rate. This difference in Kd is likely to be too small to induce a biphasic shape to the ITC curve. As the cooperativity effect on the NMR spectra is small and absent in the ITC, we used the independent sites model for data analysis, as there is insufficient justification for introducing extra parameters into the model. Crucially, fitting to this model did not alter the off-rate value obtained by NMR or affect the conclusions. We added a description of cooperativity to the results and discussion.
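
      To show what a ~3-fold weaker second site means for occupancy, the sketch below evaluates a two-site binding polynomial for a dimer. It assumes identical microscopic sites, a second-site Kd equal to `alpha` times the first, and a known free ligand concentration; it is not the TITAN dimer model used by the authors, and the concentrations are arbitrary.

```python
def mean_bound_per_dimer(free_ligand, kd_first, alpha=1.0):
    """Average number of ligands bound per dimer (0 to 2).

    kd_first  microscopic Kd for binding to an empty dimer
    alpha     Kd(second site) / Kd(first site); alpha = 1 means independent
              sites, alpha > 1 means negative cooperativity
    """
    x = free_ligand / kd_first
    # Binding polynomial: empty + two singly bound states + doubly bound state.
    partition = 1 + 2 * x + x * x / alpha
    return (2 * x + 2 * x * x / alpha) / partition


# Arbitrary units; free ligand expressed as multiples of the first-site Kd.
for ligand in (0.1, 1.0, 10.0):
    independent = mean_bound_per_dimer(ligand, 1.0)
    weak_negative = mean_bound_per_dimer(ligand, 1.0, alpha=3.0)
    print(f"[L]/Kd = {ligand:>4}: independent = {independent:.3f}, "
          f"alpha = 3 -> {weak_negative:.3f}")
```

      With alpha = 1 the expression reduces to 2·[L]/(Kd + [L]), i.e. two independent sites, which is the model the authors retain for their analysis.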

      Three peptides are used to examine the role of C-terminal residues in SxIP motifs: 4-MACF (SKIP), 6-MACF (SKIPTP), and 11-MACF (KPSKIPTPQRK). The 11-mer demonstrates the strongest binding, but this has added residues to the N-terminal as well. It has also introduced charges at both termini, further complicating the interpretation of changes in binding affinities. Given this, I do not believe the authors can reasonably attribute increased affinities solely to post-SxIP residues.

      We tested the 9MACF peptide SKIPTPQRK, which has the same N-terminus as the 4- and 6-MACF peptides, and found that its binding affinity is ~10-fold weaker than that of 11MACF. This demonstrates the contribution of both the pre- and post-SxIP residues. This is likely due to electrostatic interactions between the positively charged N-terminus and the negatively charged EBH surface, similar to those involving the positive charges at the peptide C-terminus. Although significant, the contribution of the N-terminal peptide region is approximately one order of magnitude lower than that of the post-SxIP residues, meaning the post-SxIP region is the main affinity modulator. We have added the binding data on 9MACF and a discussion of the contributions to the manuscript.

      Experimental uncertainties are, with exceptions, not reported.

      Uncertainties have been added to the values in Table 1 and in the text. Information on how the uncertainties were calculated has been added to Table 1.

      Reviewer #1 (Recommendations For The Authors):

      (1) Have you tested the binding of the WT dimer in your cell model?

      We haven’t tested the WT dimer because it has already been reported in the 2009 Cell paper by Honnappa et al. In the cell experiments, our main focus was on recruiting the high-affinity mutant to MTs. The low level of recruitment, despite the mutant's high affinity, highlights the importance of dimerisation or additional contributions to binding.

      (2) Please deposit all NMR dynamics measurements (relaxation rates and derived model-free parameters) alongside structural data in the BMRB.

      The relaxation data have been submitted to the BMRB (IDs 53187 and 53188).

      (3) Please report complete fitting results, e.g. for ITC, including stoichiometries. Clarify what this means for binding to a dimer, and if there is any evidence of cooperativity. Figure 3C, right hand panel, shows an unusual stoichiometry, can the authors comment on this?

      We have added more information on stoichiometry and cooperativity; please refer to our response to the above comment for details. We repeated the titration for the VLLRK mutant using fresh peptide stock. As expected, the stoichiometry was close to 1:1 relative to the EB1c monomer. The new data are now included in the table and figure.

      (4) Please report uncertainties for all measurements of Kd, koff, kon, ∆G, ∆H, ∆S, and explain whether these are determined from statistical analysis, technical or biological repeats (and where reported, clarify between standard deviation/standard error). Please also be aware of standard guidelines for reporting significant figures for data with uncertainties, as these have not been followed in Table 1.

      Uncertainties have been added to the values in Table 1 and in the text, and Table 1 now explains how they were calculated.
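
      The significant-figure convention the reviewer refers to can be sketched as follows. This is a minimal illustration, not the authors' procedure: the helper name `format_with_uncertainty`, the choice of one significant figure for the uncertainty, and the example numbers are all assumptions.

```python
import math


def format_with_uncertainty(value, uncertainty, sig_figs=1):
    """Round the uncertainty to `sig_figs` significant figures and
    round the value to the same decimal place."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    # Position of the leading digit of the uncertainty (e.g. 0.234 -> -1).
    exponent = math.floor(math.log10(uncertainty))
    decimals = sig_figs - 1 - exponent
    rounded_unc = round(uncertainty, decimals)
    rounded_val = round(value, decimals)
    places = max(decimals, 0)  # never print a negative number of decimals
    return f"{rounded_val:.{places}f} ± {rounded_unc:.{places}f}"


# Hypothetical numbers, for illustration only.
print(format_with_uncertainty(12.3456, 0.234))
print(format_with_uncertainty(1234.5, 56))
```

      Some guidelines keep two significant figures when the leading digit of the uncertainty is 1 or 2; passing `sig_figs=2` covers that case.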

      (5) The construct design for the cell model is unclear - given the importance of flanking residues, please report and discuss how the sequences are attached to venus: which termini is attached, and what is the linker composition?

      We cloned the peptides at the C-terminus of mTFP, after the GS linker of the vector. The peptide itself contains a GS sequence at the N-terminus, creating a highly flexible GSGS linker that separates the SxIP region from mTFP and minimises the potential effect of mTFP on binding. We followed the design of Honnappa et al. to enable direct comparison with the published results. We have added this information to the 'Methods' section.

      (6) Which HSQC pulse sequence was used for 2D lineshape analysis? The authors mention non-linear chemical shift changes, presumably associated with the dimer interface - this would be useful to expand upon and clarify.

      For the lineshape analysis, we used the standard Bruker sequence hsqcfpf3gpphwg with soft-pulse watergate water suppression and flip-back. This sequence is included in the TITAN model. We added a description of the non-linear chemical shift changes, and of their connection to the allosteric effect of binding, to the supplementary information that details the lineshape analysis.

      (7) Figure 1A could usefully highlight the dimer interface in the surface representation also.

      We believe that including the interface would make the figure too complicated. The dimer configuration is shown in different colours for the two subunits, clearly demonstrating their involvement in forming the binding site.

      (8) Figures 1C and 1D could usefully show a secondary structure schematic to assist the reader. The x-axis in these figures is not linear and this should be corrected. The calculation of combined chemical shift perturbations should be described.

      Thank you for the helpful suggestion. We changed the scale of the figures and added the diagram of the secondary structure.
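
      For readers unfamiliar with combined chemical shift perturbations, one commonly used definition is a weighted Euclidean distance between the 1H and 15N shift changes. The sketch below is an assumption for illustration only: the response does not state which formula or weighting factor the manuscript uses, and `combined_csp` and `n_weight` are hypothetical names.

```python
import math


def combined_csp(delta_h_ppm, delta_n_ppm, n_weight=0.2):
    """Combine 1H and 15N shift changes (ppm) into a single value.

    n_weight scales the 15N change to the 1H range; values between
    roughly 0.1 and 0.2 are typical, and 0.2 here is an assumption.
    """
    return math.sqrt(delta_h_ppm ** 2 + (n_weight * delta_n_ppm) ** 2)


# Hypothetical shift changes for a single residue.
print(f"{combined_csp(0.03, 0.40):.3f} ppm")
```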

      (9) Units are missing from many figure axes.

      We added missing units to the axes. Thank you for highlighting this.

      (10) What peptide concentrations are used in Figure 1C? Presumably, these should be reported at saturation for this to be a fair comparison, this should be clarified.

      The protein concentration was 50 µM. Peptides 4MACF and 6MACF were added at a 100-fold molar excess and peptide 11MACF was added at a 4-fold excess. Saturation was achieved for 11MACF. This was impossible for the short peptides due to their mM affinity. This information has been added to the figure legend. The figure's main aim is to illustrate the differences in the chemical shift perturbation profiles, which can be achieved even if full saturation is not attained. Although the absolute value of the chemical shifts is proportional to the degree of saturation, the distribution of the largest chemical shift changes is independent of this degree. Therefore, we can draw conclusions about the distribution of changes by comparing under non-saturation conditions.
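
      The saturation argument can be made concrete with the exact solution for 1:1 binding. This is a minimal sketch: the protein concentration and molar excesses come from the response above, but both Kd values are placeholders chosen to represent "mM" and much tighter binding, not values from the manuscript.

```python
import math


def fraction_bound(protein_um, ligand_um, kd_um):
    """Fraction of protein in complex for 1:1 binding (quadratic solution).

    All concentrations are totals, in micromolar.
    """
    total = protein_um + ligand_um + kd_um
    complex_um = (total - math.sqrt(total ** 2 - 4.0 * protein_um * ligand_um)) / 2.0
    return complex_um / protein_um


PROTEIN_UM = 50.0  # protein concentration stated in the response

# Hypothetical Kd values, for illustration only.
short_peptide = fraction_bound(PROTEIN_UM, 100 * PROTEIN_UM, kd_um=1000.0)
long_peptide = fraction_bound(PROTEIN_UM, 4 * PROTEIN_UM, kd_um=5.0)
print(f"100-fold excess, Kd = 1 mM: fraction bound = {short_peptide:.2f}")
print(f"4-fold excess, Kd = 5 uM: fraction bound = {long_peptide:.2f}")
```

      Under these assumed values the weak binder remains visibly short of saturation even at a 100-fold excess, which is consistent with the authors' point that only the perturbation profile, not its absolute magnitude, is compared.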

      (11) The presentation of raw peak intensities in Figure 1D shows primarily the flexibility of the C-terminal region associated with high intensities. Beyond this, when comparing the binding of peptides it would be much more informative to show relative peak intensities. Residues around 210-225 appear to show strong broadening in the presence of peptide, but this is masked by the low initial intensity. Can the authors clarify and discuss this? Also, what peptide concentrations were used for this comparison? For a fair comparison, it should be close to saturation - particularly to exclude exchange broadening contributions.

      The protein concentration was 50 µM. The 4MACF and 6MACF peptides were added at a 100-fold excess and 11MACF at a 4-fold excess. Saturation was achieved for 11MACF. This was impossible to achieve for the short peptides due to their mM affinity. This information has been added to the figure legend. Upon checking the data, we found a small systematic offset in the coiled-coil region of some of the complexes, as the integral intensity had been used in the initial plot. While this does not change the conclusion regarding the high dynamics of the C-terminus, it does create an inaccurate perception of the relative intensities of the folded regions in the different complexes, as noted by the reviewer. We have now plotted the amplitudes at the maximum of the peaks, which do not exhibit any systematic offset as they are much less susceptible to baseline distortions. We are grateful to the reviewer for highlighting this apparent discrepancy.

      (12) Figure 2 - the scale for S2 order parameters appears to be backwards, given the caption, but its range should be indicated. Similarly, the range of values for Rex should also be indicated. These data should also be tabulated/plotted in supporting information.

      We have corrected the figure legend and added S2 and Rex plots to the supplementary material. The figure aims to highlight regions of increased mobility, while the plots provide full quantitative information on the values. We thank the reviewer for pointing out the error in the figure legend and for the suggestions regarding the plots.

      (13) The scale in Figure 3B is illegible. Indeed, the whole structure is quite small and could usefully be expanded.

      We increased the size of the structure panels and added a scale.

      (14) Figure 4 does not show a decrease in exchange rates, as per the caption - no comparison of exchange rates is shown, only thermodynamic information in panel E. Panel C shows CEST measurements, but it is not clear what system this is for - please clarify, and consider showing the comparable data for the ∆C construct for comparison.

      We have amended the figure legend to clarify that the figure shows binding parameters. We added information about the CEST profiles for the EBH/11MACF interaction to the figure legend (Figure 4C). Exchange with the ∆C construct is too fast for CEST measurements. We used lineshape analysis to evaluate the exchange rates for this construct.

      (15) The schematics shown in Figure 4D, and elsewhere, are really quite difficult to understand. They may pose additional challenges to colourblind readers. Please consider ways that this could be clarified.

      We simplified the colour scheme in the model to make the colours easier to see and to highlight SxIP and non-SxIP regions. We believe that this improved the clarity of the figure.

      (16) Figures S1D/E - the x-axes are unclear and units are missing from the y-axes.

      We re-labelled the axes to clarify the scale and units. Thank you for pointing this out.

      Reviewer #2 (Public Review):

      The C-terminal tail of EB1, which is adjacent to EBH and is not analyzed in this study, is highly acidic and plays an important role in protein interactions. If the authors discuss the C-terminus of EB1, they should analyze the whole C-terminus of EB1, which would strengthen the conclusion they have made.

      Honnappa et al., Cell, 2009, reported chemical shift perturbations (CSPs) on peptide binding for the full EB1c fragment, which includes the negatively charged C-terminus. Similar to our study, they observed significant CSPs in the FVIP region but negligible CSPs at the negatively charged EEY end. They concluded that the final eight EB1c residues did not contribute to binding and used a truncated EB1c construct for their structural analysis. Building on that study, we used the same EEY-truncated construct to analyse the contribution of the C-terminus in more detail. We believe that conducting additional experiments with the full C-terminus with respect to SxIP binding would be superfluous, as it would merely replicate the findings of Honnappa et al. We have added the rationale for selecting the truncated EB1c construct to the text, referencing Honnappa et al.

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 2C: The authors can analyze the 11MACF peptide as well, to provide more assurance to their argument. It would be easier to distinguish the sequences of "SKIP" and "FVIP" by changing their colors.

      Our relaxation analysis (Fig. 2C) focuses on the dynamics of the unstructured C-terminal region in both the free and complex forms. Further relaxation analysis of the peptide would not provide additional information on this, and would be complicated by the presence of free peptide in solution.

      (2) Figure 3B: Acidic residues in EBH should be labeled.
      Page 6, line 11: If the authors insist that the acidic patch will influence the interactions between EB1 and the peptide, the data of the analysis using the entire EB1 C-terminus should be included, given that the C-terminal tail of EB1 is highly acidic.

      To test the contribution of charge to binding, we conducted an ITC experiment at increasing salt concentrations. We observed a significant increase in Kd values when the concentration of NaCl increased from 50 to 150 mM, which supports our conclusion regarding the significant electrostatic contribution. This conclusion is independent of the presence or absence of the C-terminus.

      As we explained earlier, Honnappa et al., Cell 2009, conducted an NMR experiment on the full EB1c and observed no CSPs in the EEY region, indicating a negligible contribution from the EEY region to SxIP binding. Therefore, we think that additional experiments involving the entire C-terminus are unnecessary, as they would simply replicate the results of Honnappa et al. We have added the rationale for selecting the truncated EB1c to the text, referencing Honnappa et al.

      It would be very difficult to label the acidic residues without enlarging 3B considerably. However, we do not think this is necessary as we are not discussing any specific residues. The current figure shows the distribution of the surface charge, which is sufficient for our purposes.

      (3) Figure 2B (Page 4, line 27): The side chain of S5477 should be drawn. The authors should include a figure of the crystal structure of EBH and SxIP as a comparison (Honnappa et al., Cell, 2009). In their paper, Honnappa et al. performed chemical shift perturbation titrations by NMR. From their analysis, I imagine that the EB1 tail may not be critical for the EB1 C-terminus:SxIP interactions, since the signals in the tail are not significantly perturbed. The authors should cite this paper.

      We are grateful to the reviewer for highlighting this. The CSP analysis of Honnappa et al. revealed significant changes in the FVIP region, which we also observed. They also reported negligible CSPs at the EEY end, demonstrating that this part of the tail is non-critical and can be removed. We have added text to the manuscript to highlight the similarity between our CSPs and those observed by Honnappa et al. Figure 2B shows the side chains for the residues with the strongest detected contacts. These do not include S5477.

      (4) Figure 3C (ITC data): The stoichiometric ratios in the ITC data look strange. EBH vs KPSKIPVLLRKRK, is it 1:1?

      We repeated the ITC experiments using a new stock of the peptide and a new batch of the protein, checking the concentrations using UV spectroscopy. The new experiments produced a stoichiometry close to 1, as shown in the table.

      (5) Page 10, line 27: "The TPQ sequence of 11MACF is not optimal...": What is the meaning of "optimal"? The transient interaction between EB1 and its binding partner is responsible for the dynamics of the microtubule cytoskeleton. In a sense, the relatively weak interaction is "optimal" for the system. The authors should rephrase the word.

      We agree that weak interactions are optimal from a functional perspective, as they have been selected through evolution. In our case, 'optimal' refers to the hydrophobic interaction with the C-terminus. We replaced 'optimal' with 'ideal' to draw more attention to the second part of the sentence, which clarifies the context.

      (6) Page 11, line 2: "small number of comets enriched in the peptide that were too faint for the quantitative analysis, comparable to the reported previously (Honnappa, Gouveia et al. 2009)." Honnappa et al. used EGFP-fusion constructs in their study: EGFP forms a weak dimer, which presumably gave different results from the authors' mTFP-constructs. The authors can note this point in the text.

      We are grateful to the reviewer for highlighting this. This aligns well with our conclusion that dimerisation is important for localisation to comets. We have added this point to the text.

      (7) Page 10, line 21: The authors calculate the free energy of complex formation between EBH and MACF peptide and explain in the text, but it is hard to follow.

      We simplified and clarified the description of the energy contributions by focusing on the SxIP and non-SxIP regions of the peptide, as well as the EBH C-terminus.

      Minor points:

      Page 2, line 9: IP motifs are not usually located in the C-terminus. For example, SxIP in Tastin is located in the N-terminal region, and SxIPs in CLASP are in the middle.

      We corrected this statement, removing C-terminal.

      Page 3, line 4: The authors should note the residue numbers of SKIP.

      We think that in this context the residue numbers of the SxIP region are not important and would be distracting.

      Figure 3D and Figure S3F: Make the colors and the order the same between the two figures.

      We changed the colour scheme and the order of ITC parameters in S3F to match the main figure.

      Figure 1A, 2B, Figure S5: Change the color of SKIP from other residues in the same chain, otherwise the readers cannot distinguish. Likewise, change the color of FVIP in Figure 2B.

      We think that changing the colours will complicate the figures unnecessarily. The corresponding residues are clearly labelled in the figures.

      Figure 3, Figure S5, S6, S7: Box the letters of SKIP for clarity.

      We boxed the SxIP region in S5 (new S6) and underlined in S6 (new S7). In S7 (new S8) the location of SxIP is very clear from the homology.

      Figure 3B; Figure S2: Hard to recognize the peptide (MACF in green).

      We increased the size of 3D and S2, making it easier to see the peptide.

      Figure 1C and D: Make the residual numbers of the x-axes the same between the two graphs.

      We made new plots with a linear scale for the residue numbers.

      Figure 2A: The structures shown are not EB1. It should be described as EBH or EB1(191-260 a.a.).

      Corrected.

      Page 5, line 17: "the S2 values of the C-terminus" should be "the S2 values of the C-terminal loop in EBH", otherwise it is confusing.

      Corrected.

      Page 6, line 27; Figure S3C and S6: Please indicate the assignments of the resonances from "253FVI255" in the Figures.

      We labelled the peaks corresponding to the 253FVI255 region in figure S6 (new S7). Figure S3 shows EBH-ΔC that does not include this region.

      Page 7, line 25: Figure S7 should be S8.

      Corrected.

      Page 12, line 6: "sulfatrahsferases" must be a typo.

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study examined the effect of blood pressure variability on brain microvascular function and cognitive performance. By implementing a model of blood pressure variability using an intermittent infusion of AngII for 25 days, the authors examined different cardiovascular variables, cerebral blood flow, and cognitive function during midlife (12-15-month-old mice). Key findings from this study demonstrate that blood pressure variability impairs baroreceptor reflex and impairs myogenic tone in brain arterioles, particularly at higher blood pressure. They also provide evidence that blood pressure variability blunts functional hyperemia and impairs cognitive function and activity. Simultaneous monitoring of cardiovascular parameters, in vivo imaging recordings, and the combination of physiological and behavioral studies reflect rigor in addressing the hypothesis. The experiments are well-designed, and the data generated are clear. I list below a number of suggestions to enhance this important work:

      (1) Figure 1B: It is surprising that the BP circadian rhythm is not distinguishable in either group. Figure 2, however, shows differences in circadian rhythm at different timepoints during infusion. Could the authors explain the lack of circadian effect in the 24-h traces?

      The circadian rhythm pattern is apparent in Figure 2 (Active BP higher than Inactive BP), where BP is presented as 12-hour averages. When the BP data are expressed as one-hour averages (rather than minute-to-minute) over 24 hours, now included in the revised manuscript as Supplemental Figure 3C-D, the circadian rhythm becomes noticeable. In addition, we have included one-hour average BP data for all mice in the control and BPV groups, Supplemental Figure 3A-B.

      Notably, the Ang II-induced pulsatile BP pattern remains evident in the one-hour averages for the BPV group, Supplemental Figure 3B. To minimize bias and validate variability, pump administration start times were randomized for both control and BPV groups, Supplemental Figure 3A-B. Despite these adjustments, the circadian rhythm profile of BP is consistently maintained across individual mice and in the collective dataset, Supplemental Figure 3C-D.
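
      The effect of averaging minute-to-minute readings into one-hour bins can be sketched as below. This is an illustrative helper under stated assumptions: `hourly_means` is a hypothetical name, the readings are synthetic, and the actual telemetry processing pipeline is not described in this response.

```python
def hourly_means(minute_values, bin_size=60):
    """Average consecutive readings in bins of `bin_size` samples.

    A trailing partial bin is averaged over the samples it contains.
    """
    bins = (minute_values[i:i + bin_size]
            for i in range(0, len(minute_values), bin_size))
    return [sum(chunk) / len(chunk) for chunk in bins]


# Synthetic minute-to-minute MAP readings (mmHg): one quiet hour followed
# by one hour with brief activity-driven spikes every 10 minutes.
quiet_hour = [95.0] * 60
active_hour = [105.0 if minute % 10 else 135.0 for minute in range(60)]
print(hourly_means(quiet_hour + active_hour))
```

      Binning smooths the brief spikes that dominate the raw trace, so slower differences between periods become easier to see.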

      (2) While saline infusion does not result in elevation of BP when compared to Ang II, there is an evident "and huge" BP variability in the saline group, at least 40mmHg within 1 hour. This is a significant physiological effect to take into consideration, and therefore it warrants discussion.

      Thank you for this comment. The large variations in BP in the raw traces during saline infusion reflect transient BP changes induced by movement/activity, which are now included in Figure 1B (maroon trace). The revised manuscript now includes, at Line 222, “Note that dynamic activity-driven BP changes were apparent during both saline- and Ang II infusions, Figure 1B”.

      (3) The decrease in DBP in the BPV group is very interesting. It is known that chronic Ang II increases cardiac hypertrophy, are there any changes to heart morphology, mass, and/or function during BPV? Can the decrease in DBP in BPV be attributed to preload dysfunction? This observation should be discussed.

      The lower DBP in the BPV group was already present at baseline, while both groups were still infused with saline, and was a difference beyond our control. However, this is an important and valid consideration, particularly considering the minimal yet significant increase in SBP within the BPV group (Figure 1D). Our goal was to induce significant transient blood pressure responses (BPV) and investigate the impact on cardiovascular and neurovascular outcomes in the absence of hypertension. We did not anticipate any major cardiac remodeling at this early time point (considering the absence of overt hypertension) and thus cardiac remodeling was not assessed and this is now discussed in the revised manuscript (Line 443-453).

      (4) Examining the baroreceptor reflex during the early and late phases of BPV is quite compelling. Figures 3D and 3E clearly delineate the differences between the two phases. For clarity, I would recommend plotting the data as is shown in panels D and E, rather than showing the mathematical ratio. Alternatively, plotting the correlation of ∆HR to ∆SBP and analyzing the slopes might be more digestible to the reader. The impairment in baroreceptor reflex in the BPV during high BP is clear, is there any indication whether this response might be due to loss of sympathetic or gain of parasympathetic response based on the model used?

      We appreciate the reviewer’s suggestion and have accordingly generated new figures displaying scatter plots of SBP vs HR with linear regression analysis (Figure 3D-G). Our goal is to further investigate which branch of the autonomic nervous system is affected in this model. The loss of a bradycardic response suggests either an enhancement of sympathetic activity, a reduction in parasympathetic activity, or a combination of both. This is briefly discussed in the revised manuscript (Line 486-496).
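
      The regression described above can be sketched with an ordinary least-squares fit of HR against SBP, where the slope summarises the bradycardic response. This is a minimal sketch: `linear_fit` is a hypothetical helper, the data points are invented for illustration, and the authors' actual fitting software and settings are not stated here.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented paired readings: SBP in mmHg, HR in beats per minute.
sbp = [120.0, 130.0, 140.0, 150.0, 160.0]
hr = [600.0, 585.0, 570.0, 555.0, 540.0]
slope, intercept = linear_fit(sbp, hr)
print(f"slope = {slope:.2f} bpm/mmHg, intercept = {intercept:.1f} bpm")
```

      A slope closer to zero at high BP would correspond to the blunted bradycardic response discussed in the response.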

      Heart rate variability (HRV) serves as an index of neurocardiac function and dynamic, non-linear autonomic nervous system processes, as described in Shaffer and Ginsberg[1]. However, given that our data were limited to BP and HR readings collected at one-minute intervals, our primary assessment of autonomic function is limited to the bradycardic response. Further studies will be necessary to fully characterize the autonomic parameters influenced by chronic BPV.

      (5) Figure 3B shows a drop in HR when the pump is ON irrespective of treatment (i.e., independent of BP changes). What is the underlying mechanism?

      We apologize for any lack of clarity. These observed heart rate (HR) changes occurred during Ang II infusion, when blood pressure (BP) was actively increasing. In the control group, the pump solution was switched to Ang II during specific periods (days 3-5 and 21-25 of the treatment protocol) to induce BP elevations and a baroreceptor response, allowing direct comparisons between the control and BPV group.

      To clarify this point, we have revised Line 260-263 of the manuscript: “To compare pressure-induced bradycardic responses between BPV and control mice at both early and later treatment stages, a cohort of control mice received Ang II infusion on days 3-5 (early phase) (Supplemental Figure 4) and days 21-25 (late phase) thereby transiently increasing BP”.

      Additionally, a detailed description has been added to the Methods section (Line 96-101): “Controls receiving Ang II: To facilitate between-group comparisons (control vs BPV), a separate cohort of control mice were subjected to the same pump infusion parameters as BPV mice but for a brief period receiving Ang II infusions on days 3-5 and 21-25 for experiments assessing pressure-evoked responses, including bradycardic reflex, myogenic response, and functional hyperemia at high BP.”

      (6) The correlation of ∆diameter vs MAP during low and high BP is compelling, and the shift in the cerebral autoregulation curve is also a good observation. I would strongly recommend that the authors include a schematic showing the working hypothesis that depicts the shift of the curve during BPV.

      Thank you for this insightful comment. The increase in vessel reactivity to BP elevations in parenchymal arterioles of BPV mice suggests that chronic BPV induces a leftward shift and a potential narrowing of the cerebral autoregulation range (lower BP thresholds for both the upper and lower limits of autoregulation). This has been incorporated (and discussed) into the revised manuscript (see Figure 5N).

      One potential explanation for these changes is that the absence of sustained hypertension, a prominent feature in most rodent models of hypertension, limits adaptive processes that protect the cerebral microcirculation from large BP fluctuations (e.g., vascular remodeling). While this study does not specifically address arteriole remodeling, the lack of such adaptation may reduce pressure buffering by upstream arterioles, thereby rendering the microcirculation more vulnerable to significant BP fluctuations.

      The unique model allows for measurements of parenchymal arteriole reactivity to acute dynamic changes in BP (both an increase and decrease in MAP). Our findings indicate that chronic BPV enhances the reactivity of parenchymal arterioles to BP changes—both during an increase in BP and upon its return to baseline, Supplemental Figure 5C, F. The data suggest an increased myogenic response to pressure elevation, indicative of heightened contractility, a common adaptive process observed in rodent models of hypertension[2-4]. However, our model also reveals a notable tendency for greater dilation when the BP drops, Supplemental Figure 5F. This intriguing observation may suggest ischemia during the vasoconstriction phase (at higher BP), leading to enhanced release of dilatory signals, which subsequently manifest as a greater dilation upon BP reduction. This phenomenon bears similarities to chronic hypoperfusion models[5,6], where vasodilatory mechanisms become more pronounced in response to sustained ischemic conditions. Future studies investigating the effects of BPV on myogenic responses and brain perfusion will be a priority for our ongoing research.

      (7) Functional hyperemia impairment in the BPV group is clear and well-described. Pairing this response with the kinetics of the recovery phase is an interesting observation. I suggest elaborating on why BPV group exerts lower responses and how this links to the rapid decline during recovery.

      Based on the heightened reactivity of BPV parenchymal arterioles to intravascular pressure (Figure 5), we anticipate that the reduction of sensory-evoked dilations results from an increased vasoconstrictive activity and/or a decreased availability of vasodilatory signaling pathways (NO, EETs, COX-derived prostaglandins)[7,8]. Consequently, the magnitude of the FH response is blunted during periods of elevated BP in BPV mice.

      Additionally, upon termination of the stimulus-induced response, when vasodilatory signals would typically dominate, vasoconstrictive mechanisms are rapidly engaged (or unmasked), leading to a quicker return to baseline. This shift in the balance between vasodilatory and vasoconstrictive forces favors vasoconstriction, contributing to the altered recovery kinetics observed in BPV mice. This has been included in the Discussion section of the revised manuscript.

      (8) The experimental design for the cognitive/behavioral assessment is clear and it is a reasonable experiment based on previous results. However, the discussion associated with these results falls short. I recommend that the authors describe the rationale to assess recognition memory, short-term spatial memory, and mice activity, and explain why these outcomes are relevant in the BPV context. Are there other studies that support these findings? The authors discussed that no changes in alternation might be due to the age of the mice, which could already exhibit cognitive deficits. In this line of thought, what is the primary contributor to behavioral impairment? I think that this sentence weakens the conclusion on BPV impairing cognitive function and might even imply that age per se might be the factor that modulates the various physiological outcomes observed here. I recommend clarifying this section in the discussion.

      We thank the reviewer for this comment. Clinical studies have demonstrated that patients with elevated BPV exhibit impairments across multiple cognitive domains, including declines in processing speed[9] and episodic memory[10]. To evaluate memory function, we utilized behavioral tests: the novel object recognition (NOR) task to assess episodic memory[11] and the spontaneous Y-maze to evaluate short-term spatial memory[12].

      Previous research indicates that older C57BL/6 mice (14-month-old) exhibit cognitive deficits compared to younger counterparts (4- and 9-month-old)[13]. To ensure rigorous selection for behavioral testing, we conducted a preliminary NOR assessment, in which mice showed recognition memory at the one-hour delay but failed at the four- and 24-hour delays, indicating age-related deficits. Based on these results, animals failing recognition criteria were excluded from subsequent behavioral assessment. However, because no baseline cognitive testing was conducted for the spontaneous Y-maze, it is possible that some mice with age-related deficits were included in this test, which may have influenced data interpretation.

      Additionally, the absence of differences in the Y-maze performance may suggest that short-term spatial memory remains intact following 25 days of BPV, a point that is now discussed in the revised manuscript.

      (9) Why were only male mice used?

      We appreciate this comment and acknowledge the importance of conducting experiments in both male and female mice. Studies involving female mice are currently ongoing, with telemetry data collection approximately halfway completed and two-photon imaging studies on functional hyperemia also partially completed. However, using middle-aged mice for these experiments has proven challenging due to high mortality rates following telemetry surgeries. As a result, we initially limited our first cohort to male mice.

      (10) In the results for Figure 3: "Ang II evoked significant increases in SBP in both control and BPV groups;...". Also, in the figure legend: "B. Five-minute average HR when the pump is OFF or ON (infusing Ang II) for control and BPV groups...." The authors should clarify this as the methods do not state a control group that receives Ang II.

      Please refer to response to comment 5.

      Reviewer #2 (Public review):

      Summary:

      Blood pressure variability has been identified as an important risk factor for dementia. However, there are no established animal models to study the molecular mechanisms of increased blood pressure variability. In this manuscript, the authors present a novel mouse model of elevated BPV produced by pulsatile infusions of high-dose angiotensin II (3.1 µg/hour) in middle-aged male mice. Using elegant methodology, including direct blood pressure measurement by telemetry, programmable infusion pumps, in vivo two-photon microscopy, and neurobehavioral tests, the authors show that this BPV model resulted in a blunted bradycardic response and cognitive deficits, enhanced myogenic response in parenchymal arterioles, and a loss of the pressure-evoked increase in functional hyperemia to whisker stimulation.

      Strengths:

      As the presentation of the first model of increased blood pressure variability, this manuscript establishes a method for assessing molecular mechanisms. The state-of-the-art methodology and robust data analysis provide convincing evidence that increased blood pressure variability impacts brain health.

      Weaknesses:

      One major drawback is that there is no comparison with another pressor agent (such as phenylephrine); therefore, it is not possible to conclude whether the observed effects are a result of increased blood pressure variability or caused by direct actions of Ang II.

      We acknowledge this limitation and have attempted to address the concern by introducing an alternative vasopressor, norepinephrine (NE), Figure 4. A subcutaneous dose of 45 µg/kg/min was titrated to match Ang II-induced transient BP pulse (Systolic BP ~150-180 mmHg), Figure 4A. Similar to Ang II treated mice, NE-treated mice exhibited no significant changes in average mean arterial pressure (MAP) throughout the 20-day treatment period (Figure 4B). Although there was a trend (P=0.08) towards increased average real variability (ARV) (Figure 4C left), it did not reach statistical significance. The coefficient of variation (CV) (Figure 4C right) was significantly increased by day 3-4 of treatment (P=0.02).
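
      The two variability indices mentioned above can be sketched as follows, using their usual definitions: ARV as the mean absolute difference between successive readings, and CV as the standard deviation divided by the mean. The function names, the use of the sample standard deviation, and the synthetic readings are assumptions; the authors' exact computation is not given in this response.

```python
import statistics


def average_real_variability(values):
    """Mean absolute difference between successive readings."""
    if len(values) < 2:
        raise ValueError("need at least two readings")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Synthetic MAP readings (mmHg) with intermittent pressor pulses.
readings = [100.0, 100.0, 150.0, 100.0, 100.0, 150.0]
print(f"ARV = {average_real_variability(readings):.1f} mmHg")
print(f"CV = {coefficient_of_variation(readings):.1f} %")
```

      Unlike CV, ARV depends on the order of the readings, so the two indices can respond differently to the same set of pressure pulses.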

      Notably, unlike the bradycardic response observed during Ang II-induced BP elevations, NE infusions elicited a tachycardic response (Figure 4A), likely due to β-1 adrenergic receptor activation. However, significant mortality was observed within the NE cohort: three of six mice died prematurely during the second week of treatment, and two additional mice required euthanasia on days 18 and 20 due to lethargy, impaired mobility, and tachypnea.

      While we recognize the importance of comparing results across vasopressors, further investigation using additional vasopressors would require a dedicated study, as each agent may induce distinct off-target effects, potentially generating unique animal models. Alternatively, a mechanical approach, such as implanting a tethered intra-aortic balloon[14] connected to a syringe pump, could be explored to modulate blood pressure variability without pharmacological intervention. However, such an approach falls beyond the scope of the present study.

      Ang II is known to have direct actions on cerebrovascular reactivity, neuronal function, and learning and memory. Given that Ang II is increased in only 15% of human hypertensive patients (and an even lower percentage of non-hypertensive), the clinical relevance is diminished. Nonetheless, this is an important study establishing the first mouse model of increased BPV.

      We agree that high Ang II levels are not a predominant cause of hypertension in humans, which is why it is critical that our pulsatile Ang II dosing did not cause overt hypertension (no increase in 24-hour MAP). Ang II was solely a tool to produce controlled, transient increases in BP to yield a significant increase in BPV.

      Regarding BPV specifically, prior studies indicate that primary hypertensive patients with an elevated urinary angiotensinogen-to-creatinine ratio exhibit significantly higher mean 24-hour systolic ARV than those with lower ratios[15]. However, the fundamental mechanisms driving these harmful increases in BPV remain poorly defined. A central theme across clinical BPV studies is increased arterial stiffness, which has been proposed to contribute to BPV through reduced arterial compliance and diminished baroreflex sensitivity. Moreover, increased BPV can exert mechanical stress on arterial walls, leading to arterial remodeling and stiffness, ultimately perpetuating a detrimental feed-forward cycle[16].

      In our model, male BPV mice exhibited a minimal yet significant elevation in SBP without corresponding increases in DBP, potentially reflecting isolated systolic hypertension, which is strongly associated with arterial stiffness[17,18]. Our initial goal was to establish controlled rapid fluctuations in BP, and Ang II was selected as the pressor due to its potent vasoconstrictive properties and short half-life[19].

      We appreciate the reviewer’s insightful comment and acknowledge the necessity of exploring alternative mechanisms underlying BPV that are independent of Ang II. It is our long-term goal to investigate these factors in future studies.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) How was the dose of Ang II determined? It seems that this dose (3.1 µg/hr) is quite high.

      The Ang II dose was titrated in a preliminary study to one that induced a significant and transient BP response without increasing 24-hour blood pressure (i.e. no hypertension).

      Ang II was delivered subcutaneously at 3.1 µg/hr, an infusion rate comparable to high-dose Ang II administration via mini-osmotic pumps (~1700 ng/kg/min)[20], with one-hour pulses occurring every 3-4 hours. With 6 pulses per day, the total daily dose equates to 18.6 µg/day in a ~30 gram mouse.

      For comparison, if the same 18.6 µg/day dose were administered continuously via a mini-osmotic pump (18.6 µg/0.03kg/1440min), the resulting dosage would be approximately 431 ng/kg/min[21,22], aligning with subpressor dose levels. Thus, while the total dose may appear high, it is not delivered in a constant manner but rather intermittently, allowing for controlled, rapid variations in blood pressure.
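      The unit conversions in this answer can be checked with a short calculation. The sketch below simply re-derives the figures quoted above from the stated inputs (3.1 µg/hr, six one-hour pulses per day, a 30 g mouse); the function and variable names are hypothetical and chosen only for this illustration.

```python
def pulsed_daily_dose_ug(rate_ug_per_hr, pulse_hours, pulses_per_day):
    """Total mass delivered per day by intermittent pulses, in micrograms."""
    return rate_ug_per_hr * pulse_hours * pulses_per_day


def to_ng_per_kg_per_min(dose_ug, body_mass_kg, minutes):
    """Convert a dose in micrograms, delivered over `minutes`, to ng/kg/min."""
    return dose_ug * 1000.0 / body_mass_kg / minutes


BODY_MASS_KG = 0.030  # ~30 g mouse, as stated in the response

# Six one-hour pulses at 3.1 ug/hr -> 18.6 ug/day.
daily_dose_ug = pulsed_daily_dose_ug(3.1, 1.0, 6)

# The same daily dose spread evenly over 24 h (1440 min) -> ~431 ng/kg/min.
continuous_equivalent = to_ng_per_kg_per_min(daily_dose_ug, BODY_MASS_KG, 1440)

# Rate during a single one-hour pulse -> ~1722 ng/kg/min (quoted as ~1700).
rate_during_pulse = to_ng_per_kg_per_min(3.1, BODY_MASS_KG, 60)
```

      The two results bracket the comparison being made: the rate during a pulse is in the high-dose range, while the time-averaged rate over a full day is about a quarter of that.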

      (2) Were behavioral studies performed on the same mice that were individually housed? Individual housing causes significant stress in mice that can affect learning and memory tasks (PMC6709207). It's not a huge issue since the control mice would have been housed the same way, but it is something that could be mentioned in the discussion section.

      Behavioral studies were performed on mice that were individually housed following the telemetry surgery. The study was started once BP levels stabilized, as mice required several days to achieve hemodynamic stability post-surgery. Consequently, all mice were individually housed for several days before undergoing behavioral assessment.

      To account for potential cognitive variability, earlier novel object recognition (NOR) tests were conducted to establish cognitive capacity, and mice that did not meet criteria were excluded from further behavioral testing. However, we acknowledge that individual housing induces stress, which can influence learning and memory, and this is a factor we were unable to fully control. Given that both experimental and control groups experienced the same housing conditions, this stress effect should be comparable across cohorts. A discussion of this limitation is now included in the text.

      (3) It looks like one control mouse that was included in both Figures 1 and 2 (control n=12) but was excluded in Table 1 (control n=11), this isn't mentioned in the text - please include the exclusion criteria in the manuscript.

      We apologize for the typo: 12 control animals were consistently used across Figures 1-2, Table 1, Supplemental Table 1, Figure 6C, and Supplemental Figure 2B. Since the initial submission, one additional control mouse completed the protocol and was added to the telemetry control cohort. Thus, in the updated manuscript, we have corrected the control sample size to 13 mice across these figures and tables, ensuring consistency.

      Additionally, exclusion criteria have now been explicitly included in the manuscript (Lines 173-175). Mice were excluded from the study if they died prior to treatment onset or exhibited abnormally elevated pressure while receiving saline, likely due to complications from telemetry surgery.

      (4) Please include a statement on why female mice were not included in this study.

      As discussed in our response to Reviewer #1, our initial intention was to include both male and female mice in this study. However, high mortality rates following telemetry surgeries significantly constrained our ability to advance all aspects of the study. As a result, we limited our first cohort to males to establish the basics of the model. A statement is now included in the manuscript, Lines 50-53: “Female mice were not included in the present study due to high post-surgery mortality observed in 12-14-month-old mice following complex procedures. To minimize confounding effects of differential survival and to establish foundational data for this model, we restricted the investigation to male mice.”

      Potential sex differences might be complex and warrant separate studies, which are currently ongoing, to comprehensively assess sex as a biological variable.

      (5) On page 14, "experiments from control vs experimental mice were not equally conducted in the same season raising the possibility for a seasonal effect" - does this mean that control experiments were not conducted at the same time as the Ang II infusions in BPV mice? This has huge implications on whether the effects observed are induced by treatment or just batch seasonal effects.

      We fully acknowledge the reviewer’s concern, and our statement aims to provide transparency regarding the study’s limitations. Several challenges contributed to this outcome, including high mortality rates following surgeries (primarily telemetry implantation) and technical issues related to instrumentation, particularly telemetry functionality.

      Differences in the timing of BPV and saline experiments arose primarily from mortality or telemetry failures: some mice did not survive post-surgery, while others remained healthy but had non-functional telemeters. This issue was particularly pronounced in 14-month-old mice, as their fragile vasculature occasionally prevented proper BP readings.

      Each experiment required a minimum of two and a half months per mouse to complete, with a cost (also per mouse) exceeding $1500 USD ($300 pump, $175 mouse, $900 telemeters, per diem, drugs, reagents etc.). Despite our best effort to ensure comparable seasonal/batch data, these logistical and technical constraints prevented perfect synchronization.

      To evaluate whether seasonal differences influenced our results, we incorporated additional telemetry data into the control cohort. Of the seven included control mice, six underwent the same treatment but were allocated to a separate branch of the study, whose endpoints did not require a chronic cranial window. We found no significant differences in 24-hour average MAP during the baseline period between control mice with or without a cranial window (Supplemental Figure 2A). Additionally, we grouped mice into seasonal categories based on Georgia’s climate, “Spring-Summer” (May-September) and “Fall-Winter” (October-April), but observed no BP differences between these periods (Supplemental Figure 2B).

      Given the absence of seasonal effects on BP and the fact that mice were sourced from two independent suppliers (Jackson Laboratory and NIA), we anticipate that the observed results are driven by treatment rather than seasonal or batch effects.

      (6) Methods, two-photon imaging: did the authors mean "retro-orbital" instead of "intra-orbital" injection of the Texas red dye? Also, is this a Texas red-dextran? If so, what molecular weight?

      Thank you for this comment. The correct terminology is “retro-orbital” rather than “intra-orbital” injection. Additionally, we utilized Texas Red-dextran (70 kDa, 5% [wt/vol] in saline) for the imaging experiments. These details have now been incorporated into the Methods section.

      (1) Shaffer F, Ginsberg JP. An Overview of Heart Rate Variability Metrics and Norms. Front Public Health. 2017;5:258. doi: 10.3389/fpubh.2017.00258

      (2) Pires PW, Jackson WF, Dorrance AM. Regulation of myogenic tone and structure of parenchymal arterioles by hypertension and the mineralocorticoid receptor. Am J Physiol Heart Circ Physiol. 2015;309:H127-136. doi: 10.1152/ajpheart.00168.2015

      (3) Iddings JA, Kim KJ, Zhou Y, Higashimori H, Filosa JA. Enhanced parenchymal arteriole tone and astrocyte signaling protect neurovascular coupling mediated parenchymal arteriole vasodilation in the spontaneously hypertensive rat. J Cereb Blood Flow Metab. 2015;35:1127-1136. doi: 10.1038/jcbfm.2015.31

      (4) Diaz JR, Kim KJ, Brands MW, Filosa JA. Augmented astrocyte microdomain Ca(2+) dynamics and parenchymal arteriole tone in angiotensin II-infused hypertensive mice. Glia. 2019;67:551-565. doi: 10.1002/glia.23564

      (5) Kim KJ, Diaz JR, Presa JL, Muller PR, Brands MW, Khan MB, Hess DC, Althammer F, Stern JE, Filosa JA. Decreased parenchymal arteriolar tone uncouples vessel-to-neuronal communication in a mouse model of vascular cognitive impairment. GeroScience. 2021. doi: 10.1007/s11357-020-00305-x

      (6) Chan SL, Nelson MT, Cipolla MJ. Transient receptor potential vanilloid-4 channels are involved in diminished myogenic tone in brain parenchymal arterioles in response to chronic hypoperfusion in mice. Acta Physiol (Oxf). 2019;225:e13181. doi: 10.1111/apha.13181

      (7) Tarantini S, Hertelendy P, Tucsek Z, Valcarcel-Ares MN, Smith N, Menyhart A, Farkas E, Hodges EL, Towner R, Deak F, et al. Pharmacologically-induced neurovascular uncoupling is associated with cognitive impairment in mice. J Cereb Blood Flow Metab. 2015;35:1871-1881. doi: 10.1038/jcbfm.2015.162

      (8) Ma J, Ayata C, Huang PL, Fishman MC, Moskowitz MA. Regional cerebral blood flow response to vibrissal stimulation in mice lacking type I NOS gene expression. Am J Physiol. 1996;270:H1085-1090. doi: 10.1152/ajpheart.1996.270.3.H1085

      (9) Sible IJ, Nation DA. Blood Pressure Variability and Cognitive Decline: A Post Hoc Analysis of the SPRINT MIND Trial. Am J Hypertens. 2023;36:168-175. doi: 10.1093/ajh/hpac128

      (10) Epstein NU, Lane KA, Farlow MR, Risacher SL, Saykin AJ, Gao S. Cognitive dysfunction and greater visit-to-visit systolic blood pressure variability. Journal of the American Geriatrics Society. 2013;61:2168-2173. doi: 10.1111/jgs.12542

      (11) Antunes M, Biala G. The novel object recognition memory: neurobiology, test procedure, and its modifications. Cognitive processing. 2012;13:93-110. doi: 10.1007/s10339-011-0430-z

      (12) Kraeuter AK, Guest PC, Sarnyai Z. The Y-Maze for Assessment of Spatial Working and Reference Memory in Mice. Methods Mol Biol. 2019;1916:105-111. doi: 10.1007/978-1-4939-8994-2_10

      (13) Singhal G, Morgan J, Jawahar MC, Corrigan F, Jaehne EJ, Toben C, Breen J, Pederson SM, Manavis J, Hannan AJ, et al. Effects of aging on the motor, cognitive and affective behaviors, neuroimmune responses and hippocampal gene expression. Behav Brain Res. 2020;383:112501. doi: 10.1016/j.bbr.2020.112501

      (14) Tediashvili G, Wang D, Reichenspurner H, Deuse T, Schrepfer S. Balloon-based Injury to Induce Myointimal Hyperplasia in the Mouse Abdominal Aorta. J Vis Exp. 2018. doi: 10.3791/56477

      (15) Ozkayar N, Dede F, Akyel F, Yildirim T, Ates I, Turhan T, Altun B. Relationship between blood pressure variability and renal activity of the renin-angiotensin system. J Hum Hypertens. 2016;30:297-302. doi: 10.1038/jhh.2015.71

      (16) Kajikawa M, Higashi Y. Blood pressure variability and arterial stiffness: the chicken or the egg? Hypertens Res. 2024;47:1223-1224. doi: 10.1038/s41440-024-01589-8

      (17) Laurent S, Boutouyrie P. Arterial Stiffness and Hypertension in the Elderly. Front Cardiovasc Med. 2020;7:544302. doi: 10.3389/fcvm.2020.544302

      (18) Wallace SM, Yasmin, McEniery CM, Maki-Petaja KM, Booth AD, Cockcroft JR, Wilkinson IB. Isolated systolic hypertension is characterized by increased aortic stiffness and endothelial dysfunction. Hypertension. 2007;50:228-233. doi: 10.1161/HYPERTENSIONAHA.107.089391

      (19) Al-Merani SA, Brooks DP, Chapman BJ, Munday KA. The half-lives of angiotensin II, angiotensin II-amide, angiotensin III, Sar1-Ala8-angiotensin II and renin in the circulatory system of the rat. J Physiol. 1978;278:471-490. doi: 10.1113/jphysiol.1978.sp012318

      (20) Zimmerman MC, Lazartigues E, Sharma RV, Davisson RL. Hypertension caused by angiotensin II infusion involves increased superoxide production in the central nervous system. Circ Res. 2004;95:210-216. doi: 10.1161/01.RES.0000135483.12297.e4

      (21) Gonzalez-Villalobos RA, Seth DM, Satou R, Horton H, Ohashi N, Miyata K, Katsurada A, Tran DV, Kobori H, Navar LG. Intrarenal angiotensin II and angiotensinogen augmentation in chronic angiotensin II-infused mice. Am J Physiol Renal Physiol. 2008;295:F772-779. doi: 10.1152/ajprenal.00019.2008

      (22) Nakagawa P, Nair AR, Agbor LN, Gomez J, Wu J, Zhang SY, Lu KT, Morgan DA, Rahmouni K, Grobe JL, et al. Increased Susceptibility of Mice Lacking Renin-b to Angiotensin II-Induced Organ Damage. Hypertension. 2020;76:468-477. doi: 10.1161/HYPERTENSIONAHA.120.14972

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Proposed revision plan

      Based on the below reviews, we propose the following revision plan. Briefly:

      • We will remove the functional data on TGFβ signaling and mechanical loading/mechanosensing. We agree with the reviewers that we would need to generate additional histological and molecular data from conditional knockout mice, antibody and (ant)agonist treatments and the optogenetic model to determine their exact involvement in lining macrophage maturation. These experiments require significant time and other resources.
      • We would therefore like to uncouple this question for a follow-on manuscript. We will re-focus the manuscript on the developmental data providing a molecular and cellular blueprint of lining macrophage development. This will include our data on CSF1 as a key signal. The novelty and relevance of our developmental data have been highlighted by all three reviewers, and they have also praised the rigor of these experiments and their interpretation. We thus believe that this re-focus will improve the manuscript message.
      • To further enhance this, we are proposing to include additional data delineating the developmental dynamics of synovial fibroblasts. We have generated an in-depth single cell RNAsequencing dataset but did not include fibroblast-specific analyses in the original manuscript. This is not a change proposed by the reviewers, but we are proposing this because we believe this would be an impactful addition to a revised version of our study, providing data also on the maturation of the synovial (lining) macrophage niche.
      • We will otherwise respond to all individual reviewer comments and implement the requested changes, unless technically not possible. Please find below detailed point-by-point answers.

      Reviewer #1

      Evidence, reproducibility and clarity

      In their manuscript entitled "The synovial lining macrophage layer develops in the first weeks of life in a CSF1- and TGFβ-dependent but monocyte-independent process," the authors explore the developmental trajectory of synovial lining macrophages. They demonstrate that the formation of this specialized macrophage layer is age-dependent and governed by a distinct developmental program that proceeds independently of circulating monocytes. Through scRNA-Seq, the authors show that synovial lining macrophages originate locally from Aqp1⁺ macrophages and are marked by the expression of Csf1r, Tgfbr, and Piezo1. Notably, genetic ablation of each of these factors impaired the development of lining macrophages to varying degrees, suggesting differential contributions of CSF1, TGFβ, and PIEZO1 signaling pathways to their maturation and maintenance.

      The manuscript is well written, and the data quality and representation is of a high standard. The authors have employed a sophisticated array of state-of-the-art mouse models and cutting-edge technologies to elucidate the developmental origin of synovial lining macrophages. Notably, the supporting scRNA-Seq datasets are of excellence and provide valuable insights that will likely be of significant interest to researchers in the field of immunology and joint biology. Accordingly, the experimental approach and interpretations regarding macrophage origin are well-founded and compelling. However, in the eye of the reviewer, the section addressing the underlying molecular mechanisms is a bit less convincing. This part of the study appears slightly underdeveloped, and some of the mechanistic claims lack sufficient experimental clarity. A more rigorous experimental investigation would be essential to reinforce the manuscript's conclusions, particularly concerning the data related to Tgfbr and Piezo1, where the current evidence appears insufficiently substantiated.

      We thank the reviewer for their positive and constructive evaluation of our manuscript. We agree with them (and the other reviewers) that our functional data on the involvement of TGFβ signaling and mechanical loading/mechanosensing are comparably less convincing and substantiated than our developmental data. We are very grateful for their (and the other reviewers’) suggestions to provide more support for the involvement of these factors in lining macrophage development. However, we think that carrying this out to the same high standard will require substantial time and other resources. We have therefore decided to uncouple this from the developmental data and pursue this in follow-up work. We will re-focus the current manuscript on the developmental data. We have proposed to the editors to instead include additional data on synovial fibroblast development, to complement our macrophage data and also delineate the maturation of their niche, thereby providing a conclusive developmental atlas.

      Major point:

      1. The numbers of VSIG4⁺ macrophages appear either unaffected or only minimally altered in both Csf1rMerCreMer Tgfbr2floxed and Fcgr1Cre Piezo1floxed mouse models, respectively. This raises an important question: was the gene deletion efficiency sufficient in each model? Accordingly, the authors are encouraged to include quantitative data on gene deletion efficiency for both mouse models, as this information is critical for interpreting the observed phenotypic outcomes and validating the conclusions regarding gene function. Furthermore, to better assess the impact of Tgfbr2 and Piezo1 disruption, the authors should provide more comprehensive flow cytometry analyses and histological data for these mouse models. Given the apparent homogeneity of VSIG4⁺ macrophages (as shown by the authors themselves), bulk RNA-Seq of sorted Tgfbr2- and Piezo1-deficient VSIG4⁺ macrophages (or from TGFβ-treated animals) would offer valuable insights into both the effectiveness of gene deletion and the molecular pathways governed by TGFβ and PIEZO1 in lining macrophages.

      As outlined above, we have decided to uncouple our functional data on TGFβ, Piezo1 and mechanical loading. The points raised here are all very valid, and we will implement your suggestions in our follow-up functional work focusing on signaling events regulating lining macrophage development. The suggestion to perform bulk RNA sequencing of VSIG4+ macrophages is a good one in principle. However, we will not be able to use this strategy to assess the consequences of experimental treatments or genetic models on lining macrophage maturation, because acquisition of VSIG4 is a key maturation event that might be impaired in these conditions.

      Minor points:

      Consistent usage of Cx3cr1-GFP+ nomenclature (for instance: Fig. S1 legend "adult mouse synovial tissue, showing PDGFRα⁺ fibroblasts (yellow) and CX3CR1-GFP⁺ cells (cyan)." versus Fig. 1 legend "Automated spot detection highlights Cx3cr1-GFP⁺ macrophages)".

      We will implement these changes.

      Unclear Fig. 3 legend: "Representative immunofluorescence images of synovial tissue from Clec9aCre:Rosa26lsl-tdT mice at 3 weeks and in adulthood, showing and tdTomato (yellow) and stained for DAPI (blue), VSIG4 (cyan)" Check 'showing and tdTomato.'

      We will implement these changes.

      For greater clarity, it would have been helpful if the transcript names had been directly included within Figures 3C, S3A, and S3C.

      We will implement these changes.

      Page 24: "(Mki67CreERT2:Rosa26lsl-tdT)" Last bracket not superscript.

      We will implement these changes.

      Page 25: "we again leveraged our scRNAsequencing dataset" Missing punctuation.

      We will implement these changes.

      Page 27: Fig. 5C legend: " of synovial tissue of 1 week-old, 3 weeks-old and adult mice." Please specify and change to 'adult Csf1rΔFIRE/ΔFIRE mice'.

      We will implement these changes.

      Page 30: The outcome observed in the Acta1-rtTA:tetO-Cre:ChR2-V5fl mouse model appears to be inconclusive: "This approach resulted in an increased density of VSIG4+ and total (F4/80+) macrophages in the exposed leg of some 5 days-old pups, but others showed the opposite trend (Figure S5D)." This variability may reflect low efficiency of the model or other technical limitations (e.g. muscle contractions frequency or time point of analysis). Given this ambiguity, it is worth reconsidering whether the data are sufficiently robust to warrant inclusion. Should the authors choose to include these findings, further experimentation of appropriate depth and precision is required to allow a conclusive interpretation (either it increases the density of VSIG4+ macrophages or not). The same applies to the Yoda1-treated mice, for which additional data are needed to determine whether VSIG4⁺ macrophage density is truly affected.

      We have decided to remove the data on the optogenetic mouse model and Yoda1 treatment and to follow up on these separately, implementing these suggestions, including proof-of-concept data for optogenetically induced muscle contractions.

      Significance

      General assessment: provide a summary of the strengths and limitations of the study. What are the strongest and most important aspects? What aspects of the study should be improved or could be developed? This is a well-designed study that uses cutting-edge methodologies to investigate the developmental trajectory of synovial lining macrophages under homeostatic conditions. The authors present robust experimental evidence and compelling interpretations concerning synovial macrophage origin, which are both well-substantiated and impactful. Nonetheless, from the reviewer's perspective, the section exploring the molecular mechanisms underlying macrophage differentiation is comparatively less convincing. This section appears somewhat underdeveloped, as some of the mechanistic claims lack sufficient depth and experimental rigor to fully substantiate the conclusions.

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field: In contrast to earlier studies (PMID: 31391580, 32601335), the inclusion of fate-mapping experiments adds an important dimension, offering novel insight into the ontogeny of synovial macrophages. This expanded perspective may prove particularly valuable in advancing our understanding of joint immunology, especially regarding the local origins and lineage relationships of macrophage populations.

      Furthermore, the authors present novel insights into the molecular pathways underlying the differentiation and development of synovial lining macrophages. By demonstrating previously unrecognized regulatory mechanisms, this work significantly deepens our understanding of the cellular and transcriptional programs that drive macrophage specialization within the joint microenvironment.

      Place the work in the context of the existing literature (provide references, where appropriate): This study builds upon previous work characterizing the macrophage compartment in the joint (PMID: 31391580, 32601335), yet provides a substantially more comprehensive dataset that spans multiple developmental time points and data on the origin of this specialized macrophage subset.

      State what audience might be interested in and influenced by the reported findings: Immunologists, clinicians

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. This study falls well within the scope of the reviewer's expertise in innate immunity.

      Reviewer #2

      Evidence, reproducibility and clarity

      In the manuscript "The synovial lining macrophage layer develops in the first weeks of life in a CSF1- and TGFβ-dependent but monocyte-independent process", Magalhaes Pinto and colleagues carefully employ a wide range of technologies including single cell profiling, imaging and an exceptional combination of fate mapping models to characterize the ontogeny and development of lining macrophages in the joint, thus dissecting their maturation during postnatal development. Over the last decade, several landmark studies highlighted the imprinting of tissue-resident macrophages by a combination of ontogenetic and tissue-specific niche factors during development. So far, the ontogeny and the tissue niche factors governing the development and maturation of lining macrophages have not been described. Therefore, the results of this study offer insights into a small, highly adapted macrophage population with relevance in many disease settings in the joint. Furthermore, the findings nicely showcase how macrophages specialize to even very small tissue niches across development within one bigger anatomical compartment to serve dedicated functions within this niche.

      This manuscript is beautifully written and highlights many novel, highly relevant findings on lining macrophage biology and the authors employ a wide range of different technologies to carefully dissect the postnatal development of lining macrophages.

      In particular, the combination of scRNA-seq and fate mapping provides a unique link between transcriptional programs and ontogeny within the tissue niche. Furthermore, the integrative use of distinct fate mapping strategies, transgenic mouse lines, and treatment paradigms to elucidate key niche factors guiding the development and maturation of lining macrophages provides many interesting findings and data that are highly relevant to the field. I really enjoyed reading this manuscript.

      Thank you for your complimentary and constructive assessment of our manuscript, and the detailed comments below, which are very helpful. Please find point-by-point responses below.

      Major points:

      The authors show dynamic regulation of VSIG4 in lining macrophages during development, therefore VSIG4 is maybe not an ideal choice for gating strategies to define lining macrophages or to show as a single marker in immunofluorescence (IF) stainings to demonstrate their abundance across development (even though it is clear that this is the reason why the F4/80 staining is shown next to it). To demonstrate the increase of lining macrophages during development in IF, it would be more helpful if the authors would show quantifications of all F4/80+ cells and additionally VSIG4+ as a proportion of F4/80+ cells (or VSIG4+ F4/80+ and all F4/80+ in a stacked bar plot).

      We agree with the assessment of VSIG4 not being ideal since this is a key marker of mature lining macrophages only. We will provide these additional analyses.

      In Figure 1C, the authors nicely demonstrate that the lining macrophages get closer in their distance across development to build the epithelial-like macrophage structure along the adult lining. Is the close proximity between lining macrophages already fully "matured" at 3 weeks of age and comparable to adults? Please quantify the distance in adult linings.

      We will provide data for adult joints.

      Can the authors explain how the grouping was performed between the analyzed human fetal joints? It is not clear why the cut was chosen between the groups at 16/17 weeks of age. Maybe it would be also beneficial if the authors would consider not grouping these samples but rather show the specific quantifications for each samples individually and estimate via linear regression the expansion over time across human development. Furthermore, can the authors give additional information about the distancing of lining macrophages in the human fetal samples, it would be great to see if they follow the same dynamics as in mouse. Maybe comparison to human juvenile/adult joints would also add on to substantiate the findings in human samples (if possible).

      We will show samples ungrouped and perform linear regression analysis as suggested.

      The scRNA-seq analysis leaves several questions open and some conclusions and workflows cannot be easily followed.

      We appreciate this comment and the complexity of the data, and will implement the below recommendations, and clarify the issues raised.

      It is not clear how and especially why the signature genes to define macrophages vs. monocytes were chosen. Especially as the signature genes for monocytes would not include patrolling monocytes and the macrophage signature genes seem to be highly regulated during development, see also Apoe expression in NB vs. adult in Figure S2e. Why did the authors not take classical markers such as Itgam, Fcgr1a, Csf1r?

      Can dendritic cell signatures be excluded? Cluster 11 and 12 show indeed some DC markers, are these really macrophages?

      The authors provide several figure panels showing TOP marker genes or key marker genes for the identified clusters, however it is not clear if these are TOP DE genes or if the genes were hand chosen. Somehow, the authors give the impression that the clusters were chosen and labeled not based on DE genes, but more on existing literature that previously reported these macrophage populations. DE gene lists for all annotated cell types and macrophage clusters need to be provided within the manuscript.

      The authors claim that Clusters 1 and 4 are "developing" macrophages. How is this defined? Why are these developing cells compared to other clusters? And why are these clusters later on not considered as progenitors of Aqp1 macrophages and Vsig4 macrophages? Why are Aqp1+ macrophages not labeled as developing when they are later on in the manuscript shown as potential intermediate progenitors of lining macrophages?

      Furthermore, it is again confusing that markers are used throughout Figure 2 which are labeled as "key marker genes" for a population and then later on they are claimed to be regulated during development within this population, see for example Figure 2D and 2H.

      It is appreciated that the authors distinguished cycling clusters such as 8, 9, and 10 based on their cycling gene signature. Here it would be very exciting to see a cell cycle analysis across all clusters and time points to see when exactly the cells are expanding during development; this would also substantiate the data later shown for the Mki67-CreERT2 mouse model.

      Can the authors identify certain gene modules during development of lining macrophages (and/or their progenitors) which are associated with certain functions (e.g. GO terms, GSEA enrichment)?

      To determine the actual presence of the identified macrophage clusters from the scRNA-seq as macrophage populations in the joint, the authors should perform IF or FACS for key markers. Especially, Aqp1+ macrophages should be shown in the developing joint.

      We will provide additional data, but would also like to reference a study by collaborators currently in revision at Immunity, which characterizes the Aqp1+ population in detail. We are hoping to have a doi available during our revision process.

      The authors used a wide range of fate mapping models, which is quite unique and highly appreciated. The obtained results and the conclusions made from the models raise a couple of questions: Whereas contribution of HSC-derived/monocyte-derived macrophages to the lining compartment seems to be minor, there is still labeling across different models. Various aspects would need to be clarified.

      We will clarify these data throughout as per below suggestions.

      For example, the authors employ Ms4a3-Cre as a tracing model for GMP-derived monocytes, however all quantifications of the labeling efficiency are not normalized to the labeling in monocytes or another highly recombined cell population. This should be shown, similar to the other fate mapping models (Figure 3 F-I).

      Labelling efficacy for Ms4a3-Cre is near complete for GMP-derived monocytes (and neutrophils) with the Rosa-lsl-tdT (aka Ai14) reporter we have used (see also PMID: 31491389 and doi: 10.1101/2024.12.03.626330); but we will include normalized data as requested.

      Please show Ms4a3 expression across clusters across time points, to exclude expression in fetal-derived clusters.

      We will include this in the revised supplementary information, but there is indeed very little at birth (in line with the original report for other tissues PMID: 31491389).

      In line with the question raised above, can the authors exclude a development of the Egfr1+ and Clec4n+ developing macrophages into Aqp1+ macrophages and subsequently into Vsig4 lining macrophages? The data obtained from the Ms4a3-Cre model strongly suggest correlated labeling across these clusters, which could imply a relationship. However, the authors do not discuss the role of these developing macrophages anywhere in the manuscript. They are strongly encouraged to do so, as this would be highly relevant to understanding lining macrophage development.

      This is an interesting point and we agree it deserves consideration in the revised manuscript. Indeed, our trajectory analyses do not predict differentiation of the Egfr1+ and Clec4n+ developing macrophages into Aqp1+ macrophages, and hence, ultimately lining macrophages. Conversely, Aqp1+ cells might also convert into Egfr1+ and Clec4n+ developing macrophages. We will elaborate on this more in the revised manuscript.

      The authors conclude from the pseudo bulk transcriptomic profiling of the different macrophage clusters that TdT+ and TdT- macrophages do not differ in their gene expression profile and that this is due to niche imprinting rather than origin imprinting. Even though the data support that conclusion, the authors should verify whether lining cells early during development also show this similar gene expression profile, and gene expression should be compared at the different developmental time points. Tissue niche imprinting happens within the niche during development, most likely as a stepwise process, and therefore there should be differences in the beginning.

      This is another important point that we will address in the revised manuscript by performing additional differential gene expression analyses at the different developmental time points, including the earliest stages, as suggested.

      The trajectorial analysis using different pseudotime pipelines is very interesting and nicely points out the potential role of Aqp1 macrophages as intermediates of Vsig4 lining macrophages. From my point of view, all trajectories seem to suggest that Egfr1 developing macrophages and Clec4n developing macrophages might differentiate into Aqp1 macrophages, however the authors are not exploring this further and the role of both developing macrophage clusters is not further discussed (see also comments above).

      We will address and discuss this in the revised manuscript.

      How was the starting point of the trajectorial analyses defined and is it the same for each pipeline used?

      We will clarify this in the revised manuscript.

      Are there potentially two trajectories? It looks like there is one in the beginning of postnatal life and a second one appearing from the monocyte-compartment later in life. If this is true, that would rather speak for a dual ontogeny of Vsig4+ macrophages, wouldn't it?

      We will discuss this in the revised manuscript.

      A heatmap (transcriptional shift) of trajectories between more clusters should be shown at least for Cluster 0,1,2, and 3. It is not sufficient to demonstrate this only between two clusters.

      We will add these analyses during revision.

      To show the similarity between Aqp1 macrophages and proliferating macrophage clusters, the authors should remove the cycling signature and compare these clusters to show that the cycling cells might be Aqp1 macrophages or earlier developing macrophage progenitors aka Clec4n or Egfr1 macrophages.

      We will address this in the revised manuscript.

      The conclusions drawn from the Mki67-CreERT2 data are a bit difficult to understand, since all progenitors (monocyte progenitors and macrophage progenitors) will proliferate at the neonatal time point, and no conclusions can be drawn as to whether the cells expand in the niche. The authors should employ Confetti mice or other models (Ubow mice) to analyze clonal expansion in the niche.

      We agree that interpretation of the Mki67-CreERT2 data is complicated by labeling of other cells, notably BM-derived cells. We will highlight this better in the revised manuscript. We have tried using Ubow mice to address this issue, but the recombination efficiency we obtained was too low to draw conclusions. We will address this during revision.

      All predicted cell-cell interactions between macrophages and fibroblasts should be provided in a supplementary table. Are the interactions shown in Figure 5 selected interactions or the TOP predicted ones? Since the authors show different numbers of interactions, they are most likely hand-picked and therefore biased.

      We will provide a full list of all predicted interactions in the revised supplementary material in addition to a list of the full differential gene expression analysis.

      The authors further aim to dissect the factors involved in the developmental niche imprinting of lining macrophages. Even though it is highly appreciated that the authors used so many experimental setups to show the reliance of lining macrophages on Csf1 and TGF-beta as well as mechanosensation, the wide range of models, the different methods used, and the selected developmental time points make it very difficult to interpret the data. The authors should carefully choose time points and methods (either FACS analysis across all models or IF across all, or both). Often, deletion efficiencies for transgenic models and proof of concept that the inhibitors and agonists are working in the treatment paradigm are not provided. For example, Csf1rMer-iCre-Mer Tgfbr2fl/fl mice are used, but neither the deletion efficiency nor different time points of analysis are shown; the macrophages may not be properly targeted in this setup.

      We have decided to uncouple our experimental data on Tgfb, Piezo1 and mechanosensing/mechanical loading, but are taking this into consideration for revision. In many cases, we have in fact performed flow cytometry and imaging analyses, and agree, we should be showing this consistently.

      The authors have shown the role of Csf1 and Tgfbr2 only for lining macrophages. Is this specific to this population in the joint, or are sublining macrophages affected in a similar manner?

      We will include data on sublining macrophages in the revised figure (for CSF1; Tgfb data will be uncoupled from this current manuscript).

      Can the authors confirm their results in CSF1R-FIRE mice with anti-Csf1 injections or in Csf1op/op mice?

      We will expand our discussion of the Csf1 findings, and will consider including anti-CSF1 data during revision. Phenotypes on other Csf1(r) deficient mice are published, if not with the same developmental resolution as our time course in Csf1rFIRE knockout mice and with simpler readouts. Csf1op/op mice are indeed deficient in synovial lining macrophages, from 2 days of age onwards (PMID: 8050349), and lining macrophages are also absent from 2-weeks-old and adult Csf1r-/- mice (PMID: 11756160).

      The setup in Figure S5G is very interesting to test the role of movement and mechanical load on the joint, however, there is basically no data on the model provided showing the efficiency of the induced optogenetic muscle contractions, and only one time point is shown.

      Data on mechanical loading will be uncoupled from the current manuscript and substantiated in a separate follow-up.

      The results regarding the role of Piezo1 and mechanosensation vary a lot. Could it be that the analyses were done too early, or that proper weight load must actually be applied to the joint for the maturation of the macrophages? The authors should test this too.

      We will uncouple these data from the current manuscript during revision. However, this is a possibility that we have discussed. In fact, the most appropriate experimental approach to address the involvement of mechanical loading, onset of walking and specifically, weight bearing would be a loss-of-function approach (i.e. paralysis at the newborn stage), for which we unfortunately could not obtain ethics approval from the UK Home Office.

      The Rolipram experiment is shown in Figure S5G, but is not described in the result section. It only appears at some point in the discussion part. The authors should move it to results or remove it from the manuscript.

      We will incorporate these data with the revised section on developing synovial macrophage populations.

      Minor points:

      Please reference the Figure panels in numeric order throughout the text.

      We will change this wherever it is not already the case.

      Figure 2a and 2b are a bit out of the storyline, it is not obvious why this is shown here and maybe it would be good to move it to the supplements. Gating strategy is also not used for scRNA-seq. Therefore, it would better fit to the later analysis of joint macrophages across different transgenic mouse models and treatment paradigms. The gating strategies are changing across different experiments throughout the figures, it would be nice to have a similar gating strategy for all experiments, see also Figure 3 where the defining markers for joint macrophages are changing between models.

      We will revise Figures 2, 3 and the related supplementary figures.

      A lot of figure panels have very small labeling that is basically unreadable. Axes at FACS plots for example. Sometimes, it is even impossible to distinguish cluster labels especially when they have similar colors.

      We will revise this, thanks for pointing it out.

      In the text on page 14, many markers are named which are specifically regulated during development in lining macrophages, but these factors are not labeled anywhere in the volcano plot. It would be good to showcase at least some of these named genes in the figure panel, e.g. Trem2.

      We will do this for revision.

      Figure 2F and Figure S2F show very nicely the percentage of cells per cluster in each analyzed biological sample. Maybe the authors could additionally consider showing a stacked bar plot with the mean percentage of cells per cluster and how the clusters are distributed across time points?

      We will include this in the revised manuscript.

      Figure 3A: IF for adult lining macrophages and the quantification are missing.

      This will be included in the revised version.

      Significance

      This manuscript highlights novel, highly relevant findings on lining macrophage biology and the authors employ a wide range of different technologies to carefully dissect the postnatal development of lining macrophages. Furthermore, this study showcases in a very elegant and detailed way the adaptation of macrophage progenitors to a highly specific anatomical tissue niche.

      The manuscript is of high interest to basic scientists focussing on macrophage biology and immune cell development and clinicians and clinician scientists focussing on joint diseases such as RA.

      Therefore the manuscript is of interest to a wide community working in immunology.

      Reviewer #3

      Summary:

      Magalhaes Pinto, Malengier-Devlies, and co-authors investigated the developmental origins and maturation of synovial (lining and sublining) macrophages across embryonic, newborn, and postnatal stages in mouse. The authors used multiple transgenic reporter lines, lineage tracing, scRNA-seq, 2D confocal and 3D lightsheet imaging, and perturbations to delineate the macrophage states and ontogeny. They propose a model in which the majority of the joint lining macrophages has a fetal (EMP-derived) origin and a small proportion has a definitive HSC-derived monocyte origin, which both seed and mature within the synovial space in the postnatal period in the first 3 weeks of life. Using cell-cell communication analysis on their scRNA-seq data, they identified Fgf2, Csf1, and Tgfb as candidate signaling pathways that support (lining) macrophage development and maturation. Functional experiments indicate that the process is CSF1 and TGFb-dependent and also partly dependent on mechanosensing through Piezo1.

      The key conclusions on the composition of the synovial macrophages are convincing based on the presented results, and are carefully phrased. The study is very comprehensive, yet the description and organization of the results of the different mouse models could be altered to improve the storyline. Several refinements in data presentation, formulation, and minor validation experiments would further improve the clarity of the story, as well as summary recaps of the major findings throughout the text.

      We thank this reviewer for their detailed review. We will be implementing the requested changes wherever technically feasible.

      Major comments:

      Generally, the story could be more streamlined by introducing the reporter lines and lineage-origin logic earlier. Clearly state which reporter/CreERT2 lines and crosses are used. It was unclear in Figure 2 that cells from the cross of the Cx3cr1-GFP and Ms4a3Cre:Rosa26lsl-tdT reporter lines were used for the scRNA-seq. The principle that there are fetal-derived and bone marrow (GMP)-derived monocytes and macrophages doesn't need to be "hidden" until Figure 3. For example, the imaging of Ms4a3Cre could also be introduced before the scRNA-seq.

      We will revise the structure and order of the manuscript during revision.

      Figure 1 could benefit from a cartoon visualizing the anatomy of the knee joint. The terms "sublining" and "synovium" are now a bit unclear, as it appears that sometimes the synovium is indicated as sublining and vice versa. Additionally, a schematic developmental timeline could be added to indicate the parallels between mouse and human development (fetal and postnatal development in mouse versus gestational age in human). Also, the various waves of hematopoiesis could be indicated in this timeline, which would be particularly helpful for Figure 3 for the lineage-tracing readouts. Lastly, the authors could end the manuscript (a new Figure 6) with a general cartoon summarizing all the results presented.

      We will include illustrations as suggested.

      Figure 1 could be rearranged: first introduce the markers CX3CR1 and VSIG4 (Figure 1D) and then present the quantifications (Figure 1B/E). Where possible, co-visualization CX3CR1-GFP and VSIG4 on tissue sections to strengthen the claims on the relationship between these 2 markers. Tying the scRNA-seq insights (Figure 2) to the imaging would be elegant. Moreover, it would be informative to represent the CX3CR1+ and VSIG4+ macrophages as a percentage of F4/80+ macrophages (Figure 1B/E). Similarly, for the flow cytometry data in Figure 2, the relationship between the markers CX3CR1 and VSIG4 on macrophages could be more clearly displayed and discussed.

      Thanks for this remark. We will endeavor to show co-localization and analysis of both markers wherever possible. However, where we did not use Cx3cr1gfp mice, co-staining was limited by antibody choice.

      The 3D imaging of the joint is a nice addition to the manuscript, as it provides more context to the anatomical structure; however, while the text suggests several newborn joints were imaged, Figure 1F visualizes (again) the knee joint. Could other joints also be represented by 3D imaging? If the knee joint is the only joint available for imaging, and previous confocal imaging focused specifically on the meniscus in the knee joint, could the meniscus also be highlighted in the lightsheet imaging?

      Apologies if this was not clear from the original manuscript text, but we have only imaged the knee joint in 3D. We will clarify this during revision and consider inclusion of additional imaging data.

      Clarification is requested regarding the imaging quantification representation. The M&M section under "Statistical analysis and reproducibility" states that individual data points are displayed, and bars represent the mean. However, some of the Figure legends (e.g., Figures 1B and S1C) specify that each dot corresponds to an individual mouse, with quantification based on 2-3 sections per mouse. While this appears to be a very reasonable representation of the data, does this mean that for each dot, the mean value from the 2-3 sections per mouse was calculated and plotted?

      We will clarify this.

      It is not clear how the differential expression analysis was performed on the Vsig4+ cells. Please specify whether Cluster 0 was used for the analysis, or all Vsig4-expressing cells. Not all cells in Cluster 0 express Vsig4. The authors describe the expression dynamics of Aqp1 as intriguing, but do not explain why this is interesting.

      We will revise this section.

      Figure S3E: In line with the previous comment, can the authors justify that the tdTomato+/- comparisons are not biased by scRNA-seq dropout (scRNA-seq is zero-inflated, so some tdTomato- cells could be false negatives), and provide methodological details (thresholds, ambient RNA correction, etc.) to support this?

      We will clarify this and include additional representations of the tdTomato transcript data.

      Although the sex-related differences in macrophage composition and the absence of differential expression are interesting, they distract from the manuscript's main messages. Moreover, the Discussion does not elaborate on how these observations relate to joint (disease) biology. Consider removing this section or integrating it clearly into the relevant biological context.

      We will remove this section as suggested.

      CreERT2 transgenic lines are often not 100% efficient in recombination, which also depends on whether tamoxifen or 4-OHT is used. Could the authors report the percentage of tdTomato+ cells in the joints and compare them to the recombination efficiencies in the monocytes/microglia under the same tamoxifen or 4-OHT conditions? This would help clarify how to interpret the macrophage labeling percentages.

      We will report labelling efficacies and/or show normalized data in the revised manuscript.
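
      One common way to express such normalized data is to divide the labeled fraction in the population of interest by the labeled fraction in a highly recombined reference population from the same animal. The sketch below assumes exactly that definition; the function name `normalized_labeling` and the example percentages are hypothetical and are not taken from the manuscript.

```python
def normalized_labeling(pct_target, pct_reference):
    """Express labeling in a target population relative to a reference population.

    pct_target:    % tdTomato+ cells among, e.g., joint lining macrophages
    pct_reference: % tdTomato+ cells among the reference population
                   (e.g. blood monocytes or microglia) in the same animal
    Returns the target labeling as a percentage of the reference labeling.
    """
    if not 0 <= pct_target <= 100:
        raise ValueError("pct_target must be a percentage between 0 and 100")
    if not 0 < pct_reference <= 100:
        raise ValueError("pct_reference must be a percentage above 0 and at most 100")
    return 100.0 * pct_target / pct_reference


# Hypothetical example: 30% labeled macrophages, 60% labeled reference cells.
print(normalized_labeling(30.0, 60.0))
```

      Reporting both the raw and the normalized values lets readers judge how much of a low labeling percentage reflects incomplete recombination rather than ontogeny.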

      Could the authors draw parallels between the observations in the mouse knee joint macrophage populations and literature on other joints in mouse and the knee joint in human (for example, as described in Alivernini et al., 2020 and in the very recent Raut et al., 2025)?

      We will include a section on this in the revised manuscript.

      Minor comments:

      In general, the authors should clarify in the Results what each marker used for imaging, flow cytometry, or in the mouse reporter lines delineates. For example, mention that F4/80 is a marker for tissue-resident macrophages (correct?) in immunofluorescence, that IBA1 is a marker for macrophages on human tissue sections (Figure S1), and PDPN is GP38 (Figure S2 - align usage of marker reference across main text and figures).

      We will implement this request.

      For clarity in the microscopy representation, the single channels should be represented in a grey scale.

      We will revise image presentation.

      Figure S1B: Is CX3CR1 also restricted to the lining macrophages in human? Could a co-staining with IBA1 be performed to strengthen the species similarities?

      To our knowledge, there is no antibody available that works for imaging of human CX3CR1. Moreover, CX3CR1 is restricted to the lining population only in adult joints; in fetal and newborn (mouse) joints, all macrophages express this receptor, as do fetal macrophage progenitors. However, Alivernini and colleagues have reported that TREM2high macrophages are the human counterpart of the mouse CX3CR1+ lining population (PMID: 32601335).

      Adipocyte diameter quantification: Avoid plotting individual adipocytes from 2 mice without per-mouse visualization. Instead, report the mean adipocyte diameter per mouse and plot those means.

      We will implement this change.
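
      The per-mouse summary requested here amounts to collapsing the individual measurements to one mean per animal before plotting or testing. A minimal sketch, assuming the measurements are available as (mouse ID, diameter) pairs; the function name `per_mouse_means` and the example values are illustrative only.

```python
from collections import defaultdict
from statistics import mean


def per_mouse_means(measurements):
    """Collapse (mouse_id, value) pairs to one mean value per mouse.

    Using one value per animal keeps the mouse, not the single cell,
    as the unit of replication.
    """
    values_by_mouse = defaultdict(list)
    for mouse_id, value in measurements:
        values_by_mouse[mouse_id].append(value)
    return {mouse_id: mean(values) for mouse_id, values in values_by_mouse.items()}


# Illustrative diameters in micrometres (placeholder numbers, not study data).
example = [("mouse_1", 10.0), ("mouse_1", 20.0), ("mouse_2", 30.0)]
print(per_mouse_means(example))
```

      The same approach applies where several sections per mouse are quantified: each plotted dot is then the mean of that mouse's sections.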

      A little typo was spotted in the "Statistical analysis and reproducibility" section: it is Dunn's, not Bunn's multiple-comparison correction.

      Thanks for spotting this.

      Figure 2A: The gating strategy for the CX3CR1-GFP cells is missing.

      We will provide this in the revised manuscript or supplementary material.

      Improve the visualization of some plots. For example, Figure 2F is hard to read because of the big dot size. The dots seem to add no information to the graph and could be removed. Additionally, for comparing the clusters across the different time points, one could project the cells from the other time points in grey in the background.

      We will revise the presentation of these data.

      Figure S2: The dotplot is more informative than the heatmap, consider removing the heatmap.

      We will do that.

      Figure 3A: If technically feasible, image and visualize both the GFP and tdTomato expression. It would be informative to see the Cx3cr1+ and Ms4a3-derived cells in the same specimen.

      We will strive to show this in the revised manuscript.

      Figure 3C: Highlight that tdTomato expression is visualized here.

      We will do that.

      Figure 3G,F: The authors should place the schematics and graphs next to each other, so the data points can be more easily compared.

      We aim to do this in the revised manuscript.

      Figure 4B: Which co-staining was performed for the immunofluorescence to quantify the % of tdTomato+ cells?

      We co-stained for F4/80 and assessed localization in the lining or sublining. This will be clarified in the revised Figure legend.

      Figure 4C: The trajectory analysis appears to have an arrow pointing from the Ccr2+ macrophages to the Ly6c+ monocytes. Please verify this directionality, as it seems to run against the known biology.

      This will be addressed during revision.

      Figure 5 mentions that the Csf1r levels were reduced in a tissue-specific manner, but it is unclear how this tissue specificity was achieved.

      We apologize for this misunderstanding. Csf1rFIRE mice are not tissue-specific knockouts, but they are more specific than global knockout mice, since only a (myeloid-specific) enhancer is affected. We will clarify this in the relevant section.

      For the TGFb perturbations (Tgfbr2 KO and systemic TGFb depletion): did the authors validate reduced TGFb pathway activity in the macrophages, for example, reduced pSMAD2/3 levels? This would validate the effectiveness of the perturbations.

      This is an important point, and assessing signaling events downstream of TGFb is a very good suggestion. As per the above comment, we have decided to uncouple the functional data, with the exception of CSF1, from the revised version of the current manuscript, but we will take this into account when substantiating our functional data in follow-up work.

      Figure 5F could benefit from a timeline of the treatment.

      As for the previous point raised, we will be taking this into account for follow-up work on the uncoupled functional data.

      The Methods mention that Gene Ontology analysis was performed on the single-cell data, but the results are not plotted in a figure. It would be informative to include this GO/pathway analysis in the appropriate figure(s).

      We will include this in the revised (supplementary) information.

      Significance:

      This work provides a high temporal-resolution and "spatial" resolution reference map of the ontogeny and maturation of the synovial lining macrophages in the knee joint. It complements existing literature that demonstrated the presence of tissue-resident macrophages in the synovial space and lining (Culemann, et al., 2019 and others) by charting the embryonic-to-postnatal emergence of lining and sublining subsets. In particular, this mouse work identified some key signaling pathways in shaping this tissue compartment. This dataset serves as a robust, steady-state reference for joint pathology and can be implemented with human studies on disease biology of the knee joint (e.g., Alivernini et al., 2020; Raut et al., 2025). Insights into the exact developmental origins, mechanisms contributing to diverse or seemingly similar cell types, and distinct maturation processes are crucial to understanding disease biology, in which developmental processes can be hijacked/reactivated.

      These findings will interest researchers in joint disease biology (osteoarthritis and immune-mediated arthritides such as RA and psoriasis), macrophage development (tissue-resident vs monocyte-derived lineages), the bone/joint microenvironment, and joint mechanobiology.

      The reviewer's expertise is in developmental biology, mesoderm, bone biology, hematopoiesis, and monocyte/macrophage biology in disease.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The authors investigate how the population of microtubules (LSPMB) that originates from sporozoite subpellicular microtubules (SSPM) is remodelled during liver-stage development of malaria parasites. These bundles shrink over time and help form structures needed for cell division. The authors have used expansion microscopy, live-cell imaging, genetically engineered mutants, and pharmacological perturbation to study parasite development within liver cells.

      A major strength of the manuscript is the live cell imaging and expansion microscopy to study this challenging liver stage of parasite development. It gives important knowledge that PTMs of α-tubulin, such as polyglutamylation and tyrosination/detyrosination, are crucial for microtubule stability. Mutations in α-tubulin reduce the parasite's ability to move and proliferate in the liver cells. The drug oryzalin, which targets microtubules, also blocks parasite development, showing how important dynamic microtubules are at this stage.

      The major problem in the manuscript was the way it flows, as the authors keep shifting from the liver stage to the sporogony stages and then back to the liver stages. It was very confusing at times to know what the real focus of the study is, whether sporozoite development or liver stage development. The flow of the manuscript could be improved. Some of the findings reported here substantiate the previous electron microscopy.

      Overall, the study represents an important contribution towards understanding cytoskeletal remodelling during liver stage infection. The study suggests that tubulin modifications are key for the parasite's survival in the liver and could be targets for new malaria treatments. This is also the stage that has been used for vaccine development, so any knowledge of how parasites proliferate in the liver cells will be beneficial towards intervention approaches.

      We would like to express our sincere gratitude to Reviewer #1 for the positive and encouraging feedback on our manuscript. We are delighted that the reviewer found our experimental design and methodologies appropriate and that our study represents an important contribution to understanding cytoskeletal remodelling during liver stage infection, a critical phase for vaccine development. We are also grateful to the reviewer for highlighting the issue with the manuscript's flow. We acknowledge this limitation and will significantly improve the narrative structure and logical progression in the revised manuscript to ensure clarity and avoid any potential confusion. Thank you again for your thoughtful and constructive comments.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated microtubule distribution and their possible post-translational modifications (PTM) in Plasmodium berghei during development of the liver stage, using either hepatocytes or HeLa cells as models. They used conventional immunofluorescence assays and expansion microscopy with various antibodies recognising tubulin and, in the second part of the work, its candidate PTMs, as well as markers of Plasmodium, in addition to live imaging with a fluorescent marker for tubulin. In the third part of the study, they generated 3 mutants deprived of either the last four residues or the last 11 residues, or where a candidate polyglutamylation site was substituted by an alanine residue.

      Strengths:

      In the first part, microtubules are monitored by a combination of two approaches (IFA and live), revealing nicely the evolution of the sporozoite subpellicular microtubules (SSPM, the sporozoite is the developmental stage present in salivary glands of the mosquitoes and that infects hepatocytes) into a different structure termed liver-stage parasite microtubule bundle (LSPMB). The LSPMB shrinks during the course of parasite development and finally disappears while hemi-spindles emerge over time. Contact points between these two structures are observed frequently in live cells and occasionally in fixed cells, suggesting the intriguing possibility that tubulin might be recycled from the LSPMB to contribute to hemi-spindle formation.

      In the second part, antibodies recognising (1) the final tyrosine found at the C-terminal tail and (2) a stretch of 3 glutamate residues in a side chain are used to monitor these candidate PTMs. Both signals are positive at the SSPM. At the LSPMB, the signal remains positive for polyglutamylation but becomes negative for the final tyrosine, while a positive tyrosine signal emerges at hemi-spindles at later stages of development.

      In the last part, the three mutants are fed to mosquitoes, where they show reduced development, the one lacking the alpha-tubulin tail even failing to reach the salivary glands. However, the two other mutants infect HeLa cells normally, whereas sporozoites with the C-terminal tail deletion recovered from the haemolymph did not develop in these cells.

      The first part provides convincing evidence that microtubules are extensively remodelled during the infection of hepatocytes and HeLa cells, in agreement with the spectacular Plasmodium morphogenetic changes accompanying massive and rapid proliferation. The third part brings further confirmation that the C-terminal tail of alpha-tubulin is essential for multiple stages of parasite development, in agreement with previous work (50). Since it is the region where several post-translational modifications take place in other organisms (detyrosination, polyglutamylation, glycylation), it makes sense to propose that the essential function is related to these PTMs also in Plasmodium.

      Weaknesses:

      The significance of tubulin PTM relies on two antibodies whose reactivity to Plasmodium tubulins is unclear (see below). The interpretation of the literature on detyrosination and polyglutamylation is confusing in several places, meaning that the statements about the possible role of these PTMs need to be carefully revisited.

      The authors use the term "tyrosination" but the alpha1-tubulin studied here possesses the final tyrosine when it is synthesised, so it is "tyrosinated" by default. It could potentially be removed by a tyrosine carboxypeptidase of the vasoinhibin family (VASH) as reported in other species. After removal, this tyrosine can be added again by a tubulin-tyrosine ligase (TTL) enzyme. It is therefore more appropriate to talk about detyrosination-retyrosination rather than tyrosination (this confusion is unfortunately common in the literature, see Janke & Magiera, 2020).

      The difficulty here is that there is so far no evidence that detyrosination takes place in Plasmodium. Neither VASH nor TTL could be identified in the Plasmodium genome (ref 31, something we can confirm with our unsuccessful BLAST analyses), and mass spectrometry studies of purified tubulin, albeit from blood stages, did not find evidence for detyrosination (reference 43). Western blots using an antibody against detyrosinated tubulin did not produce a positive signal, neither on purified tubulin, nor on whole parasites (43). Of course, the situation could be different in liver stages, but the question of the detyrosinating enzyme is still there. The existence of a unique Plasmodium system for detyrosination cannot be formally ruled out but given the high degree of conservation of these PTMs and their associated enzymes, it sounds difficult to imagine.

      The fact that the anti-tyrosinated antibody still produced a signal in the cell line where the final tyrosine is deleted raises issues about its specificity. A cross-reactivity with beta-tubulin is proposed, but the Plasmodium beta-tubulin does not carry a final tyrosine, further raising concerns about antibody specificity.

      The interpretation of these results should therefore be considered carefully. There also seems to be some confusion in the function of detyrosination cited from the literature. It is said in line 229 that "tyrosination has been associated with stable microtubules" (33, 34, 50, 55). References 33 and 34 actually show that tyrosinated microtubules turn over faster in neurons or in epithelial cells, respectively, while references 50 and 55 do not study de/retyrosination. The general consensus is that tyrosinated microtubules are more dynamic (see reference 24).

      The situation is a bit different for polyglutamylation since several candidate poly- or mono-glutamylases have been identified in the Plasmodium genome, and at least mono-glutamylation of beta-tubulin has been formally proven, still in bloodstream stages (ref 43). The authors propose that the residue E445 is the polyglutamylation site. To our knowledge, this has not been demonstrated for Plasmodium. This residue is indeed the favourite one in several organisms such as humans and trypanosomes (Eddé et al., Science 1990; Schneider et al., JCS, 1997), and it is tempting to propose it would be the same here. However, TTLLs bind the tubulin tails from their C-terminal end like a glove on a finger (Garnham et al., Cell, 2015), and the presence of two extra residues in Plasmodium tubulins would mean that the reactive glutamate might be in position E447 rather than E445. This is worth discussing.
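      The position argument above is simple arithmetic: if the modifying enzyme reads the tail from its C-terminal end, the site keeps its distance from the last residue, so its residue number shifts by the length difference. A minimal sketch follows; the sequence lengths (451 and 453) are illustrative assumptions chosen only to reproduce the "two extra residues" mentioned above, and the function name is hypothetical.

```python
def equivalent_position(ref_pos: int, ref_len: int, query_len: int) -> int:
    """Map a residue number between two tails aligned at their C-terminal ends."""
    # Distance of the reference site from the last residue of the reference protein.
    offset_from_c_term = ref_len - ref_pos
    # Same distance from the C-terminus, expressed in the query protein's numbering.
    return query_len - offset_from_c_term


# Assumed lengths for illustration only: the query is two residues longer.
REF_LEN = 451
QUERY_LEN = 453

shifted = equivalent_position(445, REF_LEN, QUERY_LEN)
print(f"E445 in the reference corresponds to position {shifted} in the query")
```

      Under these assumptions the favoured glutamate would be numbered 447 in the longer tail, which is the alternative site the comment above raises.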

      On the positive side, it is encouraging to see that signals for both the anti-tyrosinated tail and the poly-glutamylated side chain go down in the various mutants, but this would need validation by comparison with the alpha-tubulin signal.
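      One way such a comparison could be expressed is as a per-parasite ratio of PTM signal to total alpha-tubulin signal, so that a drop in PTM staining is not confused with a drop in tubulin itself. The sketch below is only an illustration: the function name is hypothetical and all intensity values are invented placeholders, not data from the manuscript.

```python
from statistics import mean


def normalized_ptm_signal(ptm_intensity, tubulin_intensity):
    """Return per-parasite PTM / alpha-tubulin ratios, skipping zero tubulin values."""
    if len(ptm_intensity) != len(tubulin_intensity):
        raise ValueError("Each PTM measurement needs a matching tubulin measurement")
    return [p / t for p, t in zip(ptm_intensity, tubulin_intensity) if t > 0]


# Invented intensities (arbitrary units), purely for illustration.
control_ptm = [120.0, 100.0, 110.0]
control_tub = [60.0, 50.0, 55.0]
mutant_ptm = [60.0, 50.0, 55.0]
mutant_tub = [60.0, 50.0, 55.0]

control_ratio = mean(normalized_ptm_signal(control_ptm, control_tub))
mutant_ratio = mean(normalized_ptm_signal(mutant_ptm, mutant_tub))
print(f"control ratio: {control_ratio:.2f}, mutant ratio: {mutant_ratio:.2f}")
```

      In this toy case the tubulin signal is the same in both lines, so the lower mutant ratio would reflect a genuine reduction in the PTM signal rather than less tubulin.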

      Line 316: polyglutamylation "is commonly associated with dynamic microtubule behavior (78-80)". Actually, references 78 and 79 show the impact of this PTM on interaction with spastin, and reference 80 discusses polyglutamylation as a marker of stable microtubules in the context of cilia and flagella. The consensus is that polyglutamylated microtubules tend to be more stable (ref 24).

      Conclusion:

      The first and the third parts of this manuscript - evolution of microtubules and importance of the C-terminal tails for Plasmodium development - are convincing and well supported by data. However, the presence and role of tubulin PTM should be carefully reconsidered.

      Plasmodium tubulins are more closely related to plant tubulins and are sensitive to inhibitors that do not affect mammalian microtubules. They therefore represent promising drug targets as several well-characterised compounds used as herbicides are available. The work produced here further defines the evolution of the microtubule network in sporozoites and liver stages, which are the initial and essential first steps of the infection. Moreover, Plasmodium has multiple specificities that make it a fascinating organism to study both for cell biology and evolution. The data reported here are elegant and will attract the attention of the community working on parasites but also on the cytoskeleton at large. It will be interesting to have the feedback of other people working on tubulin PTMs to figure out the significance of this part of the work.

      We thank Reviewer #2 for the thoughtful and detailed evaluation of our manuscript. We are pleased that the reviewer found our study elegant and believes it will attract the attention of the broader scientific community, both those working on parasites and those focused on cytoskeleton biology. We also acknowledge the concerns raised regarding the specificity of the antibodies used to detect tubulin post-translational modifications (PTMs), as well as the interpretation of their signals and the current lack of identified detyrosination enzymes in the Plasmodium genome. We agree that these are important limitations, and we will address them thoroughly in the revised manuscript. This includes clarifying our interpretation of tyrosination versus detyrosination, adjusting our claims regarding polyglutamylation sites, and carefully revisiting the literature cited to ensure accurate contextualization of PTM function in microtubule stability.

      We are grateful for the reviewer’s close reading and critical feedback, which will help us substantially improve the clarity, precision, and strength of our manuscript.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Atchou et al. investigates the role of the microtubule cytoskeleton in sporozoites of Plasmodium berghei, including possible functions of microtubule post-translational modifications (tyrosination and polyglutamylation) in the development of sporozoites in the liver. They also assessed the development of sporozoites in the mosquito. Using cell culture models and in vivo infections with parasites that contain tubulin mutants deficient in certain PTMs, they show that many aspects of the life cycle progression are impaired. The main conclusion is that microtubule PTMs play a major role in the differentiation processes of the parasites.

      However, there are a number of major and minor points of criticism that relate to the interpretation of some of the data.

      We thank Reviewer #3 for the overall positive assessment of our study and for recognizing its contribution to advancing our understanding of Plasmodium biology and malaria pathogenesis. We appreciate the reviewer’s constructive feedback, particularly regarding the interpretation of some of our data. These comments have been very helpful in guiding our revisions, and we have worked to improve both the clarity of our presentation and the precision of our interpretations in the revised manuscript.

      Below, we respond in detail to each of the reviewer’s points.

      Comments:

      (1) The first paragraph of "Results" almost suggests that the presence of a subpellicular MT-array in sporozoites is a new discovery. This is not the case, see e.g. the recent publication by Ferreira et al. (Nature Communications, 2023).

      We thank the reviewer for pointing this out and fully agree that the subpellicular microtubule (SPM) array in sporozoites is well established, as documented in earlier work (e.g., Cyrklaff et al., 2007) and more recently by Ferreira et al. (Nat. Commun., 2023). Our intention was not to suggest that the existence of the SSPM is a novel finding. Rather, our study builds on this existing knowledge by demonstrating that these sporozoite-derived microtubules are not disassembled upon hepatocyte entry but are repurposed into a newly described structure, the liver stage parasite microtubule bundle (LSPMB). This reorganization, its persistence into liver stage development, and its dynamic role in microtubule remodeling and nuclear division are, to our knowledge, novel observations. We will revise the manuscript to make this distinction clearer in the introduction and the results section.

      (2) Why were HeLa cells and not hepatocytes (as in Figure 3) used for measuring infection rates of the mutants in Figure 5H and 5L? As I understand, HeLa cells are not natural host cells for invading sporozoites. HeLa cells are epithelial cells derived from a cervical tumour. I am not an expert in Plasmodium biology, but is a HeLa infection an accepted surrogate model for liver stage development?

      We appreciate the opportunity to clarify our experimental model. While HeLa cells are not the natural host cells, they are a well-established and validated in vitro model for studying Plasmodium berghei liver stage development in our lab and others. In this system, the parasite completes its full development and generates infectious merozoites. Numerous studies have successfully used HeLa cells as a liver stage infection model, with key findings subsequently validated in primary hepatocytes or in vivo, confirming its utility as a representative model. We employed this cell line primarily to reduce animal usage in accordance with the 3Rs principles (Replacement, Reduction, Refinement). Importantly, to ensure the biological relevance of our discoveries in HeLa cells, we validated our key findings in primary mouse hepatocytes, as shown in Figure 3. Furthermore, we confirmed the in vivo infectivity of mutant parasite lines that produced typical salivary gland sporozoites through an in vivo infection assay, presented in Figure S4C.

      (3) The tubulin staining in Figures 1A and 1B is confusing and doesn't seem to make sense. Whereas in 1A the antibody nicely stains host and parasite tubulin, in 1B, only parasite tubulin is visible. If the same antibody and the same host cells have been used, HeLa cytoplasmic microtubules should be visible in 1B. In fact, they should be the predominant antigen. The same applies to Figure 2, where host microtubules are also not visible.

      We thank the reviewer for this careful observation regarding the α-tubulin staining in Figures 1A and 1B. The same host cell type (HeLa) and α-tubulin antibody were indeed used in both experiments. Figure 1A shows results from conventional immunofluorescence assays, where both host and parasite microtubules are clearly stained. In contrast, Figure 1B shows the outcome of ultrastructure expansion microscopy (U-ExM), where parasite microtubules appear prominently, while host microtubules are less visible.

      This effect appears to be a technical outcome of the U-ExM protocol, which can differentially preserve or reveal microtubule epitopes. We consistently observed stronger parasite signal across various cell types, including primary hepatocytes (Figure 3A,B). The lack of visible host microtubules in some U-ExM images does not reflect their absence, but rather reduced signal intensity relative to the parasite structures. This is not observed with all antibodies; for example, host microtubules stain strongly with anti-tyrosinated α-tubulin (Figure 3B), likely reflecting their high tyrosination state.

      To overcome this limitation, we employed PS-ExM and combined PS-ExM/U-ExM approaches (as described in reference 56), which allowed simultaneous high-resolution visualization of both host and parasite microtubule networks. These combined methods are now being used in follow-up studies to investigate host–parasite microtubule interactions in more detail.

      We will clarify this point in the revised manuscript to avoid confusion.

      (4) In Figures 2A and B, the host nuclei appear to have very different sizes in the DMSO controls and in the drug-treated cells. For example, in the 20 µM (-) image (bottom right), the nuclei are much larger than in the DMSO (-) control (top left). If this is the case, expansion microscopy hasn't worked reproducibly, and therefore, quantification of fluorescence is problematic. The scalebar is the same for all panels.

      The expansion microscopy methods used in this study have been rigorously validated for both reproducibility and isotropicity. However, as the reviewer rightly notes, host cell nuclei can vary in size due to several factors, including cell cycle stage, infection status, and the extent of parasite development, all of which can influence host nuclei morphology and size.

      Importantly, the quantifications relevant to our conclusions were focused specifically on parasite structures. We did not rely on host nuclear size or host fluorescence intensity as a quantitative readout in this context. While we acknowledge the observed variability in host nuclear dimensions, it does not compromise the accuracy or reproducibility of the parasite-specific measurements central to our study.

      We will clarify this point in the revised figure legend and manuscript.

      (5) I don't quite follow the argument that spindles and the LSPMB are dynamic structures (e.g., lines 145, 174). That is a trivial statement for the spindle, as it is always dynamic, but beyond that, it has only been shown that the structure is sensitive to oryzalin. That says little about any "natural" dynamic behaviour. Any microtubule structure can be destroyed by a particular physical or chemical treatment, but that doesn't mean all structures are dynamic. It also depends on the definition of "dynamic" in a particular context, for example, the time scale of dynamic behaviour (changes within seconds, minutes, or hours).

      We agree that sensitivity to chemical depolymerization alone does not necessarily indicate dynamic behavior, particularly in the absence of data on turnover kinetics or temporal changes.

      Our interpretation was based on two observations: first, that the LSPMB, which derives from the highly stable sporozoite subpellicular microtubules (known to be drug-resistant), becomes susceptible to depolymerization during the liver stage; and second, that the LSPMB gradually shrinks over time during parasite development. These features suggested a transition toward a more dynamic state compared to its origin. However, we fully agree that “dynamic” is a context-dependent term and that direct evidence, such as turnover rates or structural changes on short time scales, is required to rigorously define microtubule dynamics.

      We will revise the manuscript to clarify our use of this term and explicitly acknowledge the need for further studies to characterize the timescale and mechanisms underlying LSPMB remodeling.

      (6) I am not sure what part in the story EB1 plays. The data are only shown in the Supplements and don't seem to be of particular relevance. EB1 is a ubiquitous protein associated with microtubule plus ends. The statement (line 192) that it "may play a broader role..." is unsubstantiated and cannot be based merely on the observation that it is expressed in a particular life cycle stage.

      We agree that EB1 is a ubiquitous microtubule plus-end binding protein and that its presence alone does not imply a novel function. Previous studies (e.g., Maurer et al., 2023; Yang et al., 2023; Zeeshan et al., 2023) have focused on its role during Plasmodium sexual stages, while its expression during liver and mosquito stages has not been previously documented.

      Our data extend this knowledge by showing that EB1 is also expressed during liver stage development, particularly during the highly mitotic schizont phase. While we agree that this observation alone does not prove functional involvement, it raises the possibility of a broader role for EB1 in regulating microtubule dynamics beyond sexual stages. To avoid overinterpretation, we have presented these findings in the supplementary material and will revise the manuscript to tone down speculative statements and clearly frame this as a preliminary observation that warrants further investigation.

      (7) Line 196 onwards: The antibody IN105 is better known in the field as polyE. Maybe that should be added in Materials and Methods. Also, the antibody T9028 against tyrosinated tubulin is poorly validated in the literature and rarely used. Usually, researchers in this field use the monoclonal antibody YL1/2. I am not sure why this unusual antibody was chosen in this study. In fact, has its specificity against tyrosinated α-tubulin from Plasmodium berghei ever been shown? The original antigen was human and had the sequence EGEEY. The Plasmodium sequence is YEADY and hence very different. It is stated that the LSPMB is both polyglutamylated and tyrosinated. This is unusual because polyglutamylated microtubules are usually indicative of stable microtubules, whereas tyrosinated microtubules are found on freshly polymerised and dynamic microtubules. However, a co-localisation within the same cell has not been attempted. This is, however, possible since polyE is a rabbit antibody and T9028 is a mouse antibody. I suspect that differences or gradients along the LSPMB would have been noticed. Also, in lines 207/208, it is said that tyrosination disappears after hepatocyte invasion, which is shown in Figure 3. However, in Figure 3A, quite a lot of positive signals for tyrosination are visible in the 54 and 56 hpi panels.

      First, we acknowledge that the IN105 antibody is more widely known as "polyE" in the field. We will update the Materials and Methods section accordingly to reflect this nomenclature.

      Regarding the use of the T9028 antibody against tyrosinated α-tubulin: we agree that this monoclonal antibody is less commonly used than YL1/2, and we appreciate the reviewer drawing attention to this. The original antigen for T9028 is based on the mammalian C-terminal sequence EGEEY, which differs from the Plasmodium α1-tubulin sequence (YEADY). Like many in the field, we face the challenge that most available antibodies are raised against mammalian epitopes, and specificity in Plasmodium can vary. Nonetheless, the literature (e.g., Hirst et al., 2022; Fennell et al., 2008) has demonstrated that tyrosination occurs in Plasmodium α1-tubulin, using anti-tyrosination antibodies including YL1/2.

      Following the reviewer’s excellent suggestion, we are currently repeating the key experiments using the YL1/2 antibody to compare staining patterns directly with those obtained using T9028. We will include these results in the revised manuscript.

      Concerning the potential co-localization of polyglutamylation and tyrosination on the LSPMB: we agree that this is an interesting and testable hypothesis. In the current manuscript, Figures 3A and 3B were generated from independent experiments, and thus co-localization was not assessed. However, as the reviewer correctly notes, polyE and T9028 antibodies are raised in rabbit and mouse, respectively, making co-staining feasible. We will follow up on this experimentally and, if feasible within our revision timeline, include data in the revised version or highlight this as a future direction.

      Finally, with regard to Figure 3 and the observation that tyrosination appears to persist at 54 and 56 hpi (Figure 3B): the reviewer is correct that tyrosination signal is still detectable at these time points. Our statement that tyrosination “disappears after hepatocyte invasion” was intended to refer to an overall decrease in signal intensity during early liver stage development, with a reappearance at later stages (e.g., cytomere formation). We will rephrase this section for greater clarity and ensure that figure annotations and legends unambiguously reflect the dynamics observed.

      (8) In line 229, it is stated that tyrosination "has previously been associated with stable microtubule in motility". This statement is not correct. In fact, none of the cited references that apparently support this statement show that this is the case. On the contrary, stable microtubules, such as flagellar axonemes, are almost completely detyrosinated. Therefore, tyrosination is a marker for dynamic microtubules, whereas detyrosinated microtubules are indicative of stable microtubules. This is an established fact, and it is odd that the authors claim the opposite.

      We fully agree that in canonical eukaryotic systems, tyrosinated microtubules are generally markers of dynamic microtubule populations, whereas detyrosinated microtubules are typically associated with stability, particularly in structures such as flagellar axonemes.

      Our original statement will be corrected. In our study, we observed that tyrosinated microtubules are prevalent in invasive stages (sporozoites and merozoites), while detyrosinated forms become more prominent during intracellular liver stage development. This pattern is consistent with the established link between tyrosination and dynamic microtubules.

      What is particularly intriguing in Plasmodium is the apparent cycling of tyrosination despite the absence of known tubulin tyrosine ligase (TTL) homologs in the genome. This suggests either a highly divergent enzyme or the involvement of host cell factors, a hypothesis supported by the reappearance of tyrosinated microtubules during liver stage schizogony (Figure 3B).

      We will revise the relevant text and the Discussion section to reflect these mechanistic considerations more accurately and to avoid misrepresenting established principles of microtubule biology.

      (9) Line 236 onwards: Concerning the generation of tubulin mutants, I think it is necessary to demonstrate successful replacement of the wild-type allele by the mutant allele. I am sure the authors have done this by amplification and subsequent sequencing of the genomic locus using PCR primers outside the plasmid sequences. I suggest including this information, e.g., by displaying the chromatograph trace in a supplementary figure. Or are the sequences displayed in Figure S3B already derived from sequenced genomic DNA? This is not described in the Legend or in Materials and Methods. The left PCR products obtained for Figure S3 B would be a suitable template for sequencing.

      Indeed, these data are presented in Figure 4B and the corresponding sequence data are shown in Figure S3B. We appreciate the reviewer’s suggestion, which will help improve the transparency and reproducibility of our methodology.

      (10) It is also important to be aware of the fact that glutamylation also occurs on β-tubulin. This signal will also be detected by polyE (IN105). Therefore, it is surprising that IN105 immunofluorescence is negative on the C-term Δ cells (Figure S3 D). Is there anything known about confirmed polyglutamylation sites on both α- and β-tubulins in Plasmodium, e.g., by MS? In Toxoplasma, both α- and β-tubulin have been shown to be polyglutamylated.

      Indeed, polyglutamylation is known to occur not only on α-tubulin but also on β-tubulin in many organisms, including Toxoplasma gondii, and the polyE (IN105) antibody is expected to detect polyglutamylation on both tubulin isoforms.

      The parasites shown in Figure S3D correspond to mutant lines originally generated by Spreng et al. (2019): the IntronΔ mutant (with deletion of introns in the Plasmodium α1-tubulin gene) and the C-termΔ mutant (with deletion of the final three C-terminal residues: ADY). As the reviewer correctly notes, this particular C-terminal deletion does not include the predicted polyglutamylation site (E445 or E447, depending on alignment), and thus should not abolish all polyglutamylation. However, in our experiments, the IN105 signal is substantially reduced in this mutant. This may suggest that structural alterations in the tubulin tail affect accessibility of the polyglutamylation epitope or influence the modification itself, though we cannot exclude other possibilities, including changes in antibody recognition.

      To date, polyglutamylation sites in Plasmodium tubulins have not been definitively confirmed by mass spectrometry. However, a recent MS-based study (reference 43) detected monoglutamylation on β-tubulin in blood stage parasites. Direct MS evidence for polyglutamylation of either α- or β-tubulin in Plasmodium liver stages is still lacking. We will clarify these points in the revised manuscript to avoid potential confusion and to highlight the need for future biochemical validation of PTM sites.

      (11) Figure S3 is very confusing. In the legend, certain intron deletions are mentioned. How does this relate to posttranslational tubulin modifications? The corresponding section in Results (lines 288-292) is also not very helpful in understanding this.

      The parasite lines shown in Figure S3D were originally generated by Spreng et al. (2019) and are not directly part of the main set of PTM-targeted mutants described in our study. Specifically, the IntronΔ line carries deletions in introns of the Plasmodium α1-tubulin gene, while the C-termΔ line lacks the final three C-terminal residues (ADY). These lines were included for comparative purposes to explore whether structural changes in α-tubulin could impact polyglutamylation signal, as detected by the polyE (IN105) antibody.

      We acknowledge that the figure legend and corresponding text (lines 288–292) did not adequately explain the rationale for including these control lines. We will revise both the legend and Results section to more clearly describe the origin, purpose, and relevance of these mutants to the overall study.

      (12) Figure 4E doesn't look like brightfield microscopy but like some sort of fluorescent imaging. In Figure 4C, were the control (NoΔ) cells with an integrated cassette, but no mutations, or non-transgenic cells?

      The reviewer is absolutely correct: Figure 4E shows a fluorescent image acquired using widefield microscopy and not a brightfield image. We will revise the figure legend accordingly to avoid confusion. The “BF” (brightfield) label applies only to the left panel in Figure 4C, which depicts oocysts imaged using transmitted light.

      Regarding the controls labeled "NoΔ" in Figure 4C, we confirm that these parasites contain the integrated selection cassette but do not harbor any mutations in the target gene. They serve as proper integration controls, allowing us to distinguish the effects of the point mutations or deletions introduced in the experimental lines.

      (13) It is difficult to understand why the TyΔ and the CtΔ mutants still show quite a strong signal using the anti-tyrosination antibody. If the mutants have replaced all wild-type alleles, the signal should be completely absent, unless the antibody (see my comment above concerning T9028) cross-reacts with detyrosinated microtubules. Therefore, the quantitation in Figures 5F and 5G is actually indicative of something that shouldn't be like that. The quantitation of 5F is at odds with the microscopy image in 5D. If this image is representative, the anti-Ty staining in TyΔ is as strong as in the control NoΔ.

      We agree that the persistence of anti-tyrosination signal in the TyΔ and CtΔ mutant lines is unexpected, given that all wild-type alleles were replaced. This discrepancy has led us to further investigate the specificity of the T9028 antibody, as raised in the reviewer’s earlier comment. To address this concern, we are currently repeating the key experiments using the well-established YL1/2 monoclonal antibody, which is widely accepted for detecting tyrosinated α-tubulin in other systems.

      We also acknowledge that Figure 5F shows residual tyrosination signal, and the reviewer is correct that this should not occur if the modified residues are the exclusive PTM sites. One possible explanation is that adjacent residues or even alternative tubulin isoforms may serve as substrates. While α1-tubulin is the dominant isoform in Plasmodium, low-level expression of α2-tubulin has been detected in liver stages based on transcriptomic data, and it may contribute to the observed signal.

      Regarding the apparent discrepancy between the quantification in Figure 5F and the representative image in Figure 5D, we will revise the figure legend to clarify that image selection aimed to show detectable signal, not necessarily the average phenotype. We will also reassess and, if needed, repeat the quantification with improved image sets to ensure accuracy and consistency.

      We will revise the manuscript to reflect these points and include a more nuanced interpretation of the residual staining in the mutant lines.

      (14) The statement that the failure of CtΔ mutants to generate viable sporozoites is due to the lack of microtubule PTMs (lines 295-296) is speculative. The lack of the entire C-terminal tail could have a number of consequences, such as impaired microtubule assembly or failure to recruit and bind associated proteins. This is not necessarily linked to PTMs. Also, it has been shown in yeast that for microtubules to form properly, an exquisite regulation (proteostasis) of the ratio between α- and β-tubulin is essential (Wethekam and Moore, 2023). I am not sure, but according to Materials and Methods (line 423), the gene cassettes for replacing the wild-type tubulin gene with the mutant versions contain a selectable marker gene for pyrimethamine selection. Are there qPCR data that show that expression levels of mutant α-tubulin are more or less the same as the wild-type levels?

      We agree that attributing the developmental failure of the CtΔ mutants solely to the absence of microtubule post-translational modifications (PTMs) is speculative. As the reviewer rightly points out, deletion of the entire C-terminal tail may have multiple effects, including impaired microtubule assembly, altered α/β-tubulin stoichiometry, or disruption of interactions with essential microtubule-associated proteins (MAPs). These consequences may arise independently of PTMs.

      That said, we note that PTMs, particularly polyglutamylation, can modulate MAP binding by altering the surface charge of microtubules (Genova et al., 2023; Mitchell et al., 2010). Therefore, while PTM loss may be a contributing factor, we acknowledge that the phenotype likely results from a combination of mechanisms. We will revise the relevant section of the manuscript to present a more cautious and balanced interpretation.

      Regarding the reviewer’s question on expression levels: although the replacement constructs include a pyrimethamine resistance cassette, we have not yet quantified α-tubulin transcript levels by qPCR. In the interim, the study by Spreng et al. (2019) (reference 50) on related α1-tubulin mutations provides valuable insight. They observed no difference in mRNA levels in day 12 oocysts, yet reported fainter microtubule staining and shorter sporozoites, suggesting a post-transcriptional mechanism affecting protein expression or function in later stages. Furthermore, the phenotypic spectrum across their mutant panel (Suppl. Fig. 3 D and E) implies that robust α-tubulin regulation is highly sensitive to specific sequences.

      We acknowledge this as a current limitation in our study and will address it in the revised manuscript, noting that direct measurement of transcript levels is a key area for future investigation.

      (15) In the Discussion, my impression is that two recent studies, the superb Expansion Microscopy study by Bertiaux et al. (2021) and the cryo-EM study by Ferreira et al. (2023), are not sufficiently recognised (although they are cited elsewhere in the manuscript). The latter study includes a detailed description of the microtubule cytoskeleton in sporozoites. However, the present study clearly expands the knowledge about the structure of the cytoskeleton in liver stage parasites and is one of the few studies addressing the distribution and function of microtubule post-translational modifications in Plasmodium.

      Indeed, our work builds upon the established knowledge from Bertiaux et al. (2021) and the cryo-EM study by Ferreira et al. (2023), as rightly mentioned by the reviewer. We agree that these foundational studies, combined with our findings, will significantly expand the understanding of Plasmodium biology and cytoskeleton dynamics across its life cycle and will open the door for further investigations. We are grateful for this suggestion and will ensure these key studies are appropriately acknowledged in the revised manuscript.

      (16) I somewhat disagree with the statement of a co-occurrence of polyglutamylated and tyrosinated microtubules. I think the resolution is too low to reach that conclusion. As this is a bold claim, and would be contrary to what is known from other organisms, it would require a more rigorous validation. Given the apparent problems with the anti-Ty antibody (signal in the TyΔ mutant), one should be very cautious with this claim.

      This is a very important point to clarify. As mentioned previously, the initial experiments for these modifications were performed independently. It is established that sporozoite subpellicular microtubules exhibit both tyrosination and polyglutamylation. We will revise the manuscript to temper this statement and clearly indicate that the co-occurrence of these PTMs remains a hypothesis that requires more rigorous validation. As suggested, we are now conducting additional co-staining experiments using the better-validated YL1/2 antibody to re-examine and directly compare the distribution of both PTMs within the same cell. These follow-up experiments will help clarify whether both modifications occur simultaneously on the same microtubule structures in Plasmodium liver stages.

      (17) In the Discussion (lines 311 and 377), it is again claimed that tyrosinated microtubules are "a well-known marker of stable microtubules". This statement is completely incorrect, and I am surprised by this serious mistake. A few lines later, the authors say that polyglutamylated is "commonly associated with dynamic microtubule behaviour". Again, this is completely incorrect and is the opposite of what is firmly established in the literature. Polyglutamylation and detyrosination are markers of stable microtubules.

      Indeed, in canonical eukaryotic systems, tyrosinated microtubules are generally considered markers of dynamic microtubule populations, whereas detyrosinated and polyglutamylated microtubules are more commonly associated with stability.

      We acknowledge this mistake and will revise the Discussion to correct these statements accordingly. In the context of Plasmodium, our observations suggest an unusual regulation of microtubule dynamics, which may reflect parasite-specific adaptations. For example, we observed tyrosinated α-tubulin in the subpellicular microtubules of sporozoites, structures typically known for their exceptional stability. This atypical association implies either non-canonical roles for tyrosination or parasite-specific mechanisms for modulating microtubule properties. Additionally, the presence of both PTMs at different stages of development and on different microtubule populations suggests tightly regulated spatial and temporal modulation of microtubule function.

      We will carefully revise the relevant sections of the manuscript to remove incorrect generalizations and ensure accurate representation of the current consensus in the field, while emphasizing the possibility of Plasmodium-specific adaptations that merit further study.

      (18) In line 339, the authors interpret the residual antibody staining after the introduction of the mutant tubulin as a compensatory mechanism. There is no evidence for this. More likely explanations are firstly the quality of the anti-Ty-antibody used (see comment above), and the fact that also β-tubulin carries C-terminal polyglutamylation sites, which haven't been investigated in this study. PTMs on β-tubulin are not compensatory, but normal PTMs, at least in all other organisms where microtubule PTMs have been investigated.

      As mentioned above, we are currently repeating the key experiments with the YL1/2 antibody, as suggested. Furthermore, we fully agree with the reviewer's point regarding polyglutamylation on β-tubulin. The C-terminal tail of β-tubulin does indeed contain polyglutamylation sites. As we noted in the manuscript (Lines 340-352), this aspect has not been investigated in the present study, and we acknowledge it as a valuable direction for future research. We will revise the text accordingly to avoid overinterpretation and to more accurately reflect the limitations of our current data.

    1. many therapists believe strongly in the unconscious and the impact of early childhood experiences on the rest of a person’s life.

      Many therapists think that things we go through as young children, along with unconscious thoughts we may not even be aware of, can shape how we think, feel, and act for the rest of our lives. How might early childhood experiences affect the way a person handles relationships or emotions as an adult?

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this work, the authors apply TDCS to awake and anesthetized macaques to determine the effect of this modality on dynamic connectivity measured by fMRI. The question is to understand the extent to which TDCS can influence conscious or unconscious states. Their target was the PFC. During the conscious states, the animals were executing a fixation task. Unconsciousness was achieved by administering a constant infusion of propofol and a continuous infusion of the muscle relaxant cisatracurium. They observed the animals while awake receiving anodal or cathodal hd-TDCS applied to the PFC. During the cathodal stimulation, they found disruption of functional connectivity patterns, enhanced structure-function correlations, a decrease in Shannon entropy, and a transition towards patterns that were more commonly anatomically based. In contrast under propofol anesthesia anodal hd-TDCS stimulation appreciably altered the brain connectivity patterns and decreased the correlation between structure and function. The PFC stimulations altered patterns associated with consciousness as well as those associated with unconsciousness.

      Strengths: 

      The authors carefully executed a set of very challenging experiments that involved applying tDCS in awake and anesthetized non-human primates while conducting functional imaging.

      We thank the Reviewer for summarising our study and for his appreciation of the highly challenging experiments we performed.

      Weaknesses:

      The authors show that tDCS can alter functional connectivity measured by fMRI but they do not make clear what their studies teach the reader about the effects of tDCS on the brain during different states of consciousness. No important finding is stated contrary to what is stated in the abstract. It is also not clear what the work teaches us about how tDCS works nor is it clear what are the "clinical implications for disorders of consciousness." The deep anesthesia is akin to being in a state of coma. This was not discussed.  

      While the authors have executed a set of technically challenging experiments, it is not clear what they teach us about how tDCS works, normal brain neurophysiology, or brain pathological states such as disorders of consciousness.

      We thank the reviewer for his comments. We agree that we could better highlight the value and implications of our work, and we take this opportunity to improve our manuscript according to the suggestions.

      Actions in the text: We have added several new paragraphs in the Discussion section, considering these comments and other related remarks from the Reviewing Editor (see below our answer to the first comment of the Reviewing Editor: REC#1).

      Reviewer #2 (Public review): 

      General comments: 

      The authors investigated the effects of tDCS on brain dynamics in awake and anesthetized monkeys using functional MRI. They claim that cathodal tDCS disrupts the functional connectivity pattern in awake monkeys while anodal tDCS alters brain patterns in anesthetized monkeys. This study offers valuable insight into how brain states can influence the outcomes of noninvasive brain stimulation. However, there are several aspects of the methods and results sections that should be improved to clarify the findings.

      We thank the Reviewer for the summary and appreciation of our study.  

      Major comments 

      For the anesthetized monkeys, the anode location differs between subjects, with the electrode positioned to stimulate the left DLPFC in monkey R and the right DLPFC in monkey N. The authors mention that this discrepancy does not result in significant differences in the electric field due to the monkeys' small head size. However, this is incorrect, as placing the anode on the left hemisphere would result in a much lower EF in the right DLPFC than placing the anode on the right side. Running an electric field simulation would confirm this. Additionally, the small electrode size suggested by the Easy cap configuration for NHP appears sufficient to stimulate the targeted regions focally. If this interpretation is correct, the authors should provide additional evidence to support their claim, such as a computational simulation of the EF distribution.

      We thank the Reviewer for the comments. First, regarding the reviewer’s statement that placing the anode on the left hemisphere would result in a much lower EF in the right DLPFC than placing the anode on the right side, we would like to clarify that we did not use a typical 4 x 1 concentric ring high-definition setup (which consists of a small centre electrode surrounded by four return electrodes), but a two-electrode montage, with one electrode over the left or right PFC and the other one over the contralateral occipital cortex. According to EF modelling papers, a 4 x 1 high-definition setup would produce an EF that is focused and limited to the cortical area circumscribed by the ring of the return electrodes (Datta et al. 2009; Alam et al. 2016). Therefore, targeting the left or right DLPFC with a 4 x 1 setup would produce an EF confined to the targeted hemisphere of the PFC. In contrast, we expect the brain current flow generated with our 2-electrode setup to be broader, despite the small size of the electrodes,  because there is no constraint from return electrodes. Thus, with our setup, the current is expected to flow between the PFC and the occipital cortex (see also our responses to comments R3.3., R.E.C.#2.1. and R.E.C.#2.2.). 

      Second, we would like to point out that in awake experiments, in which we stimulated the right PFC of both monkeys, there was no gross evidence of left or right asymmetry in the computed functional connectivity patterns (Figure 3A, Figure 3 - figure supplement 2A; Figure 5A). These results, showing that our stimulation montages did not induce asymmetric dynamic FC changes in NHPs, support the idea that our setups did not generate EFs that were spatially focused enough to alter brain activity in one hemisphere substantially more than the other.

      Third, it is also worth noting that current evidence suggests that human brains are significantly more lateralized than those of macaques. Macaque monkeys have been found to have some degree of lateralized networks, but these are of lower complexity, and the lateralization is less pronounced and less functionally organized than in humans (Whey et al., 2014; Mantini et al., 2013). This suggests that, even if the stimulation were focal enough to stimulate the left or the right part of the PFC only, the behavioural effects would likely be similar.

      We strongly agree with the reviewer that conducting an EF simulation would be valuable to confirm our expectations and to gain a comprehensive view of the characteristics of the EFs generated with our different setups in NHPs. However, the challenge is that EF computational models have been developed for humans, and their use in NHPs is not straightforward due to significant anatomical differences. For example, macaque monkeys differ from humans in brain size, shape and cortical organisation, skull thickness, the presence of muscles, and tissue conductivities (Lee et al. 2015; Datta et al. 2016; Mantell et al. 2023). We plan to address this in future work.

      Actions in the text: In the Materials and Methods section, we have modified the sentence: “Because of the small size of the monkey's head and because we did not use return electrodes to restrict the current flow (as is achieved with typical high-definition montages (Datta et al. 2009; Alam et al. 2016)), we expected that tDCS stimulation with the two symmetrical montages would result in nearly equivalent electric fields across the monkey’s head and produce roughly similar effects on brain activity.” 

      We also added a new sentence about EF simulation: 

      “This would need to be confirmed by running an electric field simulation. However, computational electric field models have been developed for humans, and their use in NHPs is not straightforward due to anatomical specificities. Indeed, monkeys differ from humans in terms of brain size, shape and cortical organization, skull thickness, tissue conductivities and the presence of muscles (Lee et al. 2015; Datta et al. 2016; Mantell et al. 2023). Modelling of EFs generated with the specific tDCS montages employed in this study will be performed in future work.”

      For the anesthetized monkeys, the authors applied 1 mA tDCS first, followed by 2 mA tDCS. A 20-minute stimulation duration of 1 mA tDCS is strong enough to produce after-effects that could influence the brain state during the 2 mA tDCS. This raises some concerns. Previous studies have shown that 1 mA tDCS can generate EF of over 1 V/m in the brain, and the effects of stimulation are sensitive to brain state (e.g., eye closed vs. eye open). How do the authors ensure that there are no after-effects from the 1 mA tDCS? This issue makes it challenging to directly compare the effects of 1 mA and 2 mA stimulation.

      We agree with the reviewer's comment that 1 mA tDCS may induce aftereffects, as has been observed in several human studies (e.g., Jamil et al. 2017, 2020). Although the differences between the 1 mA post-stimulation and baseline conditions were not significant in our analyses, it is still possible that the stimulation produced some effects below the threshold of significance that may contribute, albeit weakly, to the changes observed during and after 2 mA stimulation. We have, therefore, amended the paper in line with the reviewer's comments.

      Actions in the text: We have added the following text in the Results section:

      “While several human studies have reported that 1 mA transcranial stimulation induces aftereffects (e.g., Jamil et al. 2017, 2020; Monte-Silva et al. 2010), the differences between the 1 mA post-stimulation and baseline conditions were not significant in our analyses. However, it is still possible that the 1 mA stimulation produced some effects below the threshold of significance that may contribute to the changes observed during and after the 2 mA stimulation.”

      The occurrence rate of a specific structural-functional coupling pattern among random brain regions shows significant effects of tDCS. However, these results seem counterintuitive. It is generally understood that noninvasive brain stimulation tends to modulate functional connectivity rather than structural or structural-functional connectivity. How does the occurrence rate of structural-functional coupling patterns provide a more suitable measure of the effectiveness of tDCS than functional connectivity alone? I would recommend that the authors present the results based on functional connectivity itself. If there is no change in functional connectivity, the relevance of changes in structural-functional coupling might not translate into a meaningful alteration in brain function, making it unclear how significant this finding is without corresponding functional evidence.

      First of all, we would like to make it clear that the occurrence rate of patterns as a function of their SFC is not intended to be used or seen as a ‘better’ measure of the efficacy of tDCS. Instead, it is one aspect of the effects of tDCS on whole-brain functional cortical dynamics, obtained from refined measures (phase-coherences), that specifically addresses the coupling between structure and function. This type of analysis is further motivated by its increasing use in the literature due to its suspected relationship to wakefulness (e.g., Barttfeld et al. 2015; Demertzi et al. 2019; Castro et al. 2023). Also, in our analysis, the structure is kept constant: the connectivity matrix used to correlate the functional brain states is always the same (CoCoMac82). Thus, the influence of tDCS on the structure-function side can only be explained by a modulation of the functional aspects, as suggested by intuition and previous results.
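
      To make the quantities discussed in this response concrete, the following is a minimal sketch and not the authors' pipeline. It assumes that each brain pattern is a square region-by-region matrix, that its structure-function correlation (SFC) is the Pearson correlation between its upper triangle and that of a fixed structural matrix, and that Shannon entropy is taken over the pattern occurrence rates. All function names and the toy 3 x 3 matrices are hypothetical.

```python
import math


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def structure_function_correlation(pattern, structure):
    """Correlate a functional pattern with the (fixed) structural matrix."""
    return pearson(upper_triangle(pattern), upper_triangle(structure))


def occurrence_rates(labels, n_patterns):
    """Fraction of time points assigned to each pattern index."""
    counts = [0] * n_patterns
    for label in labels:
        counts[label] += 1
    return [count / len(labels) for count in counts]


def shannon_entropy(rates):
    """Shannon entropy (in bits) of an occurrence-rate distribution."""
    return -sum(p * math.log2(p) for p in rates if p > 0)


# Hypothetical toy data: one structural matrix and two functional patterns.
structure = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
pattern_a = [[0, 0.9, 0.1], [0.9, 0, 0.8], [0.1, 0.8, 0]]  # resembles structure
pattern_b = [[0, 0.1, 0.9], [0.1, 0, 0.2], [0.9, 0.2, 0]]  # does not

sfc_a = structure_function_correlation(pattern_a, structure)
sfc_b = structure_function_correlation(pattern_b, structure)

# Hypothetical sequence of pattern labels, one per time point.
labels = [0, 0, 0, 1, 0, 1, 0, 0]
rates = occurrence_rates(labels, n_patterns=2)
entropy = shannon_entropy(rates)
```

      Because the structural matrix is the same in every condition, any change in how occurrence rates relate to SFC in such a scheme has to come from the functional label sequence, which is the point made in the paragraph above.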

      Second, we agree with the reviewer that studying the functional changes induced by tDCS alone could be valuable. However, the metrics commonly used in FC analysis are static. FC states are either computed by averaging spatial correlations over time and then analyzed through, for instance, graph-theoretical properties (or by directly computing element-wise differences), or derived from spatial correlations computed over a sliding time window, with the properties of the visited FC states then analyzed in the same way. Because these metrics are static, they would show no difference if the states visited remain essentially the same (which is expected from non-invasive neuromodulation that has not already demonstrated a strong or characteristic impact) while the dynamical process of visiting those states changes. Moreover, in resting-state fMRI, differences in FC are hard to interpret because between-session, within-condition differences are usually found, with some degree of variance for each condition. Interpreting between-condition differences is therefore tricky when the modulation of the system’s activity is subtle. On the other hand, more subtle differences can be captured by more detailed analyses: using phase-based methods, as we did; incorporating a statistical learning component with regard to the dynamics of the system (for instance supervised learning, as we did, followed by temporal and transition-based methodology); and adding dimensions along which the analysis can be interpreted. In our case, we were interested in characterizing resting-state differences between stimulation conditions, which have nuanced and subtle interactions with the biological system.
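
      The argument that the same states can be visited under a different dynamical regime can be illustrated with a toy example. This is only a sketch under stated assumptions: each time point has already been assigned to one of a small set of FC states (the integer labels and both sequences below are hypothetical), and the helper names are not taken from the authors' code.

```python
from collections import Counter


def visited_states(labels):
    """Set of states that appear at least once (a purely static summary)."""
    return set(labels)


def occurrence_rates(labels):
    """Fraction of time points spent in each state."""
    counts = Counter(labels)
    return {state: counts[state] / len(labels) for state in sorted(counts)}


def transition_probabilities(labels):
    """Empirical probability of moving from state a to state b in one step."""
    pair_counts = Counter(zip(labels[:-1], labels[1:]))
    departures = Counter(labels[:-1])
    return {
        (a, b): count / departures[a]
        for (a, b), count in pair_counts.items()
    }


# Two hypothetical recordings that visit exactly the same two states.
condition_a = [0, 0, 0, 0, 0, 0, 1, 1]  # long dwell in state 0
condition_b = [0, 1, 0, 1, 0, 1, 0, 1]  # rapid alternation

same_states = visited_states(condition_a) == visited_states(condition_b)
rates_a = occurrence_rates(condition_a)
rates_b = occurrence_rates(condition_b)
transitions_a = transition_probabilities(condition_a)
transitions_b = transition_probabilities(condition_b)
```

      In this toy case a comparison restricted to which states appear finds no difference between the two conditions, while the occurrence rates and transition probabilities differ clearly.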

      As such, classical measures of differences between FC states are unlikely to be refined and precise enough. In fact, we provide additional files investigating these classically used measures, such as differences in average FC matrices or changes in functional graph properties (like modularity, efficiency and density) of the visited FC states. These figures show that, in the first case, comparing region-to-region specific FCs provides very few statistically significant results. In the second case, virtually no differences are observed in the properties of the functional states visited.

      These results suggest, as expected, that the brain states visited across the different stimulation conditions are topologically quite similar, and that only very few region-specific pairwise functional connectivities are modulated by specific tDCS montages. By contrast, the dynamical process dictating how brain activity passes from one state to another is influenced, as shown by the dynamical analysis presented in the main figures, in a more apparent and meaningful way: the effect depends on the montage, is somewhat consistent across the post-stimulation conditions, and can be made sense of by considering the theoretical effects of near-anodal versus near-cathodal neuromodulation.

      Actions in the text: We have added new supplementary files showing the effects of the stimulations on FC matrices and on classical functional graph properties in awake and anesthesia datasets (Supplementary Files 3 & 4).

      We have added new sentences about these new analyses on the effects of the stimulations on FC matrices and on classical functional graph properties in the Results section:

      “In addition, we performed the main analyses separately for the two monkeys, explored the inter-condition variability (Supplementary File 2), and computed classical measures of functional connectivity such as average FC matrices and functional graph properties (modularity, efficiency and density) of the visited FC states (Supplementary File 3).... In contrast, classical FC metrics did not show significant differences across stimulation conditions, highlighting the value of dynamic FC metrics to capture the neuromodulatory effects of tDCS.”

      “Analyses of the two monkeys separately showed that the changes in slope and Shannon entropy were bigger in one of the two monkeys but went in the same direction (Supplementary File 2), while classical FC metrics did not capture any statistical differences between the different stimulation conditions (Supplementary File 3).”

      The authors recorded data from only two monkeys, which may limit the investigation of the group effects of tDCS. As the number of scans for the second monkey in each consciousness condition is lower than that in the first monkey, there is a concern that the main effects might primarily reflect the data from a single monkey. I suggest that the authors should analyze the data for each monkey individually to determine if similar trends are observed in both subjects.

      We agree that the small number of subjects is a limitation of our study. However, we have already addressed these aspects by reporting statistical analyses that consider them, using linear models of such variables, and running them through ANOVA tests. In addition, we experimentally ensured that we recorded a relatively high number of sessions over a period of several years. Regardless, we agree that our study would benefit from further investigation into this matter. We have therefore prepared complementary figures showing the main analysis performed separately for the two monkeys as proposed, as well as further investigations showing that inter-condition variability exceeds inter-individual variability, which is itself exceeded by intra-individual changes.

      Actions in the text: We have added a supplementary file showing the main analyses performed separately for the two monkeys (Supplementary File 2) and further investigations into the inter-condition variability (Supplementary Files 3 & 4).

      We have added new sentences about these analyses performed separately for the two monkeys in the Results section:

      “In addition, we performed the main analyses separately for the two monkeys, explored the inter-condition variability (Supplementary File 2), and computed classical measures of functional connectivity such as average FC matrices and functional graph properties (modularity, efficiency and density) of the visited FC states (Supplementary File 3). The separate analyses showed that the changes in slope and Shannon entropy were substantially more pronounced in one of the two monkeys, corroborating some of the effects captured in the ANOVA tests.”

      “Analyses of the two monkeys separately showed that the changes in slope and Shannon entropy were bigger in one of the two monkeys but went in the same direction (Supplementary File 2).”

      Anodal tDCS was only applied to anesthetized monkeys, which limits the conclusion that the authors are aiming for. It raises questions about the conclusion regarding brain state dependency. To address this, it would be better to include the cathodal tDCS session for anesthetized monkeys. If cathodal tDCS changes the connectivity during anesthesia, it becomes difficult to argue that the effects of cathodal tDCS vary depending on the state of consciousness as discussed in this paper. On the other hand, if cathodal tDCS would not produce any changes, the conclusion would then focus on the relationship between the polarity of tDCS and consciousness. In that case, the authors could maintain their conclusion but might need to refine it to reflect this specific relationship more accurately. 

      We agree with the reviewer that it would have been interesting to investigate the effects of cathodal tDCS in anesthetized monkeys. However, due to the challenging nature of the experimental procedures under anesthesia, we had to limit the investigations to only one stimulation modality. We chose to deliver anodal stimulation because, from a translational point of view, we aimed to provide new information on the effects of tDCS under anesthesia as a model for disorders of consciousness. It also made much more sense to increase the cortical excitability of the prefrontal cortex in an attempt to wake up the sedated monkeys rather than doing the opposite.

      Actions in the text: We have added a new sentence in the Results section:

      “Due to the challenging nature of the experimental procedures under anesthesia, we limited the investigations to only one stimulation modality. We chose to deliver anodal stimulation to provide new information on the effects of tDCS under anesthesia as a model for disorders of consciousness and to increase the cortical excitability of the PFC in an attempt to wake up the sedated monkeys.”

      Reviewer #3 (Public review): 

      Summary: 

      This study used transcranial direct current stimulation administered using small 'high-definition' electrodes to modulate neural activity within the non-human primate prefrontal cortex during both wakefulness and anaesthesia. Functional magnetic resonance imaging (fMRI) was used to assess the neuromodulatory effects of stimulation. The authors report on the modification of brain dynamics during and following anodal and cathodal stimulation during wakefulness and following anodal stimulation at two intensities (1 mA, 2 mA) during anaesthesia. This study provides some possible support that prefrontal direct current stimulation can alter neural activity patterns across wakefulness and sedation in monkeys. However, the reported findings need to be considered carefully against several important methodological limitations. 

      Strengths: 

      A key strength of this work is the use of fMRI-based methods to track changes in brain activity with good spatial precision. Another strength is the exploration of stimulation effects across wakefulness and sedation, which has the potential to provide novel information on the impact of electrical stimulation across states of consciousness.

      We thank the Reviewer for the summary and for highlighting the strengths of our study. 

      Weaknesses: 

      The lack of a sham stimulation condition is a significant limitation, for instance, how can the authors be sure that results were not affected by drowsiness or fatigue as a result of the experimental procedure?

      We agree with the reviewer that adding control conditions could have strengthened our study. Control conditions usually consist of a sham condition or active control conditions. However, as mentioned in response to one of Reviewer 2's comments (R.2.5), we had to make choices, as the demanding nature of the experiments, especially under anesthesia, limited the number we could perform.

      In the awake state, we acquired data with two experimental conditions; the monkeys were exposed to either anodal (F4/O1) or cathodal (O1/F4) PFC tDCS. As anodal tDCS of the PFC induced only minor changes in brain dynamics, it could be considered as an active control condition for the cathodal condition, which had striking effects on the cortical dynamics. It is also worth noting that doubts have been raised about the neurobiological inertness of certain sham protocols. Indeed, different sham protocols have been employed in the literature, some of which may produce unintended effects (Fonteneau et al. 2019). Therefore, active control conditions, such as reversing the polarity of the stimulation or targeting a different brain region, have been proposed to provide better control (Fonteneau et al. 2019). Furthermore, in the context of experiments performed under anesthesia, the relevance of a sham control condition typically used to achieve adequate blinding is questionable.

      With regard to drowsiness and fatigue as a result of the experimental procedure, we agree with the reviewer that this is a common problem in functional imaging due to the length of the recording sessions. We assumed, as was done in previous work (Uhrig, Dehaene, and Jarraya 2014; Wang et al. 2015), that the monkeys' performance on the fixation task during acquisition would capture these periods of fatigue. Therefore, only sessions with fixation rates above 85% were included in our analysis. 
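
      The inclusion criterion described above amounts to a simple threshold filter. The sketch below is illustrative only: the record layout, the field names `run_id` and `fixation_rate`, and the example values are hypothetical, and the only element taken from the response is the 85% cutoff.

```python
FIXATION_THRESHOLD = 0.85  # runs must exceed this fixation rate to be kept


def select_runs(runs, threshold=FIXATION_THRESHOLD):
    """Keep only runs whose fixation rate is strictly above the threshold."""
    return [run for run in runs if run["fixation_rate"] > threshold]


# Hypothetical awake-state runs with their measured fixation rates.
runs = [
    {"run_id": "run-01", "fixation_rate": 0.93},
    {"run_id": "run-02", "fixation_rate": 0.85},  # not strictly above cutoff
    {"run_id": "run-03", "fixation_rate": 0.71},
    {"run_id": "run-04", "fixation_rate": 0.88},
]

included = select_runs(runs)
included_ids = [run["run_id"] for run in included]
```

      A strict inequality is used here to match the "> 85%" wording; whether runs at exactly 85% were kept is not stated in the response.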

      Actions in the text: We have now specified, in the Materials and Methods section, the fact that only runs with a high fixation rate (> 85%) were included in the study: 

      “To ensure that the results were not biased by fatigue or drowsiness due to the lengthy

      In the anaesthesia condition, the authors investigated the effects of two intensities of stimulation (1 mA and 2 mA). However, a potential confound here relates to the possibility that the initial 1 mA stimulation block might have caused plasticity-related changes in neural activity that could have interfered with the following 2 mA block due to the lack of a sufficient wash-out period. Hence, I am not sure any findings from the 2 mA block can really be interpreted as completely separate from the initial 1 mA stimulation period, given that they were administered consecutively. Several previous studies have shown that same-day repeated tDCS stimulation blocks can influence the effects of neuromodulation (e.g., Bastani and Jaberzadeh, 2014, Clin Neurophysiol; Monte-Silva et al., J. Neurophysiology). 

      We agree with the reviewer’s comment that the initial 1 mA stimulation block might have induced changes in neural activity and that the 20-minute post 1 mA block would not be long enough to wash out these changes. This comment is very similar to the second comment made by Reviewer 2 (R.2.2). Although our experimental data do not support this possibility (as the differences between the 1 mA post-stimulation and baseline conditions were not significant), it is still conceivable that the stimulation produced some effects below the threshold of significance and that these might weakly contribute to the changes observed during and after the 2 mA stimulation. 

      Actions in the text: We have modified the paper according to the reviewers' comments (please see our answer and actions in the text to R.2.2.).

      The different electrode placement for the two anaesthetised monkeys (i.e., Monkey R: F3/O2 montage, Monkey N: F4/O1 montage) is problematic, as it is likely to have resulted in stimulation over different brain regions. The authors state that "Because of the small size of the monkey's head, we expected that tDCS stimulation with these two symmetrical montages would result in nearly equivalent electric fields across the monkey's head and produce roughly similar effects on brain activity"; however, I am not totally convinced of this, and it really would need E-field models to confirm. It is also more likely that there would in fact be notable differences in the brain regions stimulated as the authors used HD-tDCS electrodes, which are generally more focal.

      We thank the Reviewer for the remark, which is very similar to the first comment from Reviewer 2. Please see our answer to the first comment of Reviewer 2 (R.2.1).

      Actions in the text: We have modified the paper according to the reviewers' comments (please see the actions taken in response to R.2.1.).

      Given the very small sample size, I think it is also important to consider the possibility that some results might also be impacted by individual differences in response to stimulation. For instance, in the discussion (page 9, paragraph 2) the authors contrast findings observed in awake animals versus anaesthetised animals. However, different monkeys were examined for these two conditions, and there were only two monkeys in each group (monkeys J and Y for awake experiments [both male], and monkeys R and N [male and female] for the anaesthesia condition). From the human literature, it is well known that there is a considerable amount of inter-individual variability in response to stimulation (e.g., Lopez-Alonso et al., 2014, Brain Stimulation; Chew et al., 2015, Brain Stimulation), therefore I wonder if some of these differences could also possibly result from differences in responsiveness to stimulation between the different monkeys? At the end of the paragraph, the authors also state "Our findings also support the use of tDCS to promote rapid recovery from general anesthesia in humans...and suggest that a single anodal prefrontal stimulation at the end of the anesthesia protocol may be effective." However, I'm not sure if this statement is really backed-up by the results, which failed to report "any behavioural signs of awakening in the animals" (page 7)?

      We thank the Reviewer for this comment. Because working with non-human primates is expensive and labor intensive, the sample sizes in classical macaque experiments are generally small (typically 2-4 subjects per experiment). Our sample size (i.e. 2 rhesus macaques in awake experiments and 2 macaques under sedation, 11 +/- 9 scan sessions per animal, 288 and 136 runs in the awake and anesthesia state, respectively) is comparable to other previous work in non-human primates using fMRI (Milham et al. 2018; Yacoub et al. 2020; Uchimura, Kumano, and Kitazawa 2024). In addition, we would like to point out that the baseline cortical dynamics we found before stimulation, whether in the awake or sedated state, are comparable to previous studies (Barttfeld et al. 2015; Uhrig et al. 2018; Tasserie et al. 2022). This suggests our results are reproducible across datasets, despite the small sample size.

      That being said, we agree with the reviewer that inter-individual variability in response to stimulation can be considerable, as shown by a large body of literature in the field. It seems possible that the two monkeys studied in each condition responded differently to the stimulation. But even if that’s the case, our results suggest that at least in one of the two monkeys, cathodal PFC stimulation in the awake state and anodal PFC stimulation under propofol anesthesia induced striking changes in brain dynamics, which we believe is a significant contribution to the field. 

      In fact, the supplementary analysis proposed by Reviewer 2 (cf. R2.4), which investigated how the different measures we used were affected by tDCS, shows that the effects are indeed more pronounced and significant in monkey Y than in monkey J. Nevertheless, the effects observed in monkey J are congruent with those observed in monkey Y and at the population level, although less marked. We also show that this inter-individual variability is outweighed by the inter-condition variability (as indicated by our initially strong statistical results at the population level). Thus, even though responses differ between subjects, the effects observed at the population level cannot be accounted for solely by subject-specific differences.

      Lastly, the Reviewer questioned whether our results support that a single anodal prefrontal stimulation at the end of the anesthesia protocol could effectively promote rapid recovery from general anesthesia, because the stimulation did not wake the animals in our experiments. It should be emphasized that in our case, the monkeys were stimulated while they were still receiving continuous propofol perfusion. In contrast, during the recovery process from anesthesia, the delivery of the anesthetic drug is stopped. It is therefore conceivable that anodal PFC tDCS, which successfully enriched brain dynamics in sedated monkeys in our experiments, may accelerate the recovery from anesthesia when the drug is no longer administered. 

      Actions in the text: We have added a line in the Materials and Methods to compare to other studies:

      “Our sample size is comparable to previous work in NHP using fMRI (Milham et al. 2018; Yacoub et al. 2020; Uchimura, Kumano, and Kitazawa 2024).”

      Reviewing Editor Comments: 

      In some cases, authors opt to submit a revised manuscript. Should you choose to do so, please be aware that the reviewers have indicated that their appraisal is unlikely to change unless some of the suggested field modelling is incorporated into the work. This may change the evaluation of the strength of evidence, but the final wording will be subject to reviewer discretion. Details for responding to the reviews are provided at the bottom of this email.

      Reviewer #1 (Recommendations for the authors): 

      The work should discuss the implications of their experiments for using tDCS to arouse a patient from a coma. The anesthetized animal is effectively in a drug-induced coma. While they observed connectivity changes, these changes did not map nicely onto behavioral changes. 

      I would suggest that the authors spell out more clearly what they view as the clinical implications of their work in terms of new insights into how tDCS may be used to either understand and or treat disorders of consciousness.

      We thank the Reviewer for his thoughtful comments. We appreciate the opportunity to clarify and expand on the key findings and implications of our work, particularly regarding the new insights into how tDCS can be used to understand and treat disorders of consciousness. We therefore provide a broader perspective on the clinical implications of our experiments regarding coma and disorders of consciousness. We also agree with the Reviewer that the absence of behavioral changes but the presence of functional differences should be more clearly addressed. 

      Actions in the text: We have added a few lines about the relevance of anesthesia as a model for disorders of consciousness in the Introduction part:

      “Anesthesia provides a unique model for studying consciousness, which, similarly to DOC, is characterized by the disruption or even the loss of consciousness (Luppi 2024). Additionally, anesthesia mechanisms involve several subcortical nuclei that are key components of the brain's sleep and arousal circuits (Kelz and Mashour 2019).”

      In the Discussion section, we have modified and expanded a paragraph about the effects of tDCS in DOC patients and how this technique could be further used to study consciousness: “From another clinical perspective, our results demonstrating that 2 mA anodal PFC tDCS decreased the structure-function correlation and modified the dynamic repertoire of brain patterns during anesthesia (Figures 6 and 7) are consistent with the beneficial effects of such stimulation in DOC patients (Thibaut et al., 2014; Angelakis et al., 2014; Thibaut et al., 2017; Zhang et al., 2017; Martens et al., 2018; Cavinato et al., 2019; Wu et al., 2019; Hermann et al., 2020; Peng et al., 2022; Thibaut et al., 2023). Although some clinical trials investigated the effects of stimulating other brain regions, such as the motor cortex (Martens et al., 2019; Straudi et al., 2019) or the parietal cortex (Huang et al., 2017; Guo et al., 2019; Zhang et al., 2022; Wan et al., 2023; Wang et al., 2020), the DLPFC appears to be the most effective target for patients with a minimally conscious state (Liu et al., 2023). In terms of neuromodulatory effects in DOC patients, DLPFC tDCS has been reported to increase global excitability (Bai et al., 2017), increase the P300 amplitude (Zhang et al., 2017; Hermann et al., 2020), improve the fronto-parietal coherence in the theta band (Bai et al., 2018), enhance the putative EEG markers of consciousness (Bai et al., 2018; Hermann et al., 2020) and reduce the incidence of slow-waves in the resting state (Mensen et al., 2020). Our findings further support the PFC as a relevant target for modulating consciousness level and align with growing evidence showing that the PFC plays a key role in conscious access networks (Mashour, Pal, and Brown 2022; Panagiotaropoulos 2024). Nevertheless, we hypothesize that other brain targets for tDCS may be of interest for consciousness restoration, potentially using multi-channel tDCS (Havlík et al., 2023). 
Among transcranial electrical stimulation techniques, tDCS has the great advantage of facilitating either excitation or inhibition of brain regions, depending on the polarity of the stimulation. Sdoia et al. (2019) exploited this advantage to investigate the causal involvement of the DLPFC in conscious access to a visual stimulus during an attentional blink paradigm. While conscious access was enhanced by anodal stimulation of the left DLPFC compared to sham stimulation, opposite effects were found with cathodal stimulation compared to sham over the same locus. Finally, this literature and our findings suggest that tDCS constitutes a non-invasive, reversible, and powerful tool for studying consciousness.”

      We have added a new paragraph about patients with cognitive-motor dissociation and dissociation between consciousness and behavioral responsiveness:

      “Changes in the state of consciousness are generally closely associated with changes in behavioural responsiveness, although some rare cases of dissociation have been described. Cognitive-motor dissociation (CMD) is a condition observed in patients with severe brain injury, characterized by behavior consistent with unresponsive wakefulness syndrome or a minimally conscious state minus (Thibaut et al., 2019). However, in these patients, specific cortical brain areas activate in response to mental imagery tasks (e.g., imagining playing tennis or returning home) in a manner indistinguishable from that of healthy controls, as shown through fMRI or EEG (Thibaut et al., 2019; Owen et al., 2006; Monti et al., 2010; Bodien et al., 2024). Thus, although CMD patients are behaviorally unresponsive, they demonstrate cognitive awareness that is not outwardly apparent. It is worth noting that both the structure-function correlation and the rate of the pattern closest to the anatomy were shown to be significantly reduced in unresponsive patients showing command following during mental imagery tasks compared to those who do not show command following (Demertzi et al., 2019). These observations would be compatible with our findings in anesthetized macaques exposed to 2 mA anodal PFC tDCS. The richness of the brain dynamics would be recovered (at least partially, in our experiments), but not the behaviour. This hypothesis also fits with a recent longitudinal fMRI study on patients recovering from coma (Crone et al., 2020). The researchers examined two groups of patients: one group consisted of individuals who were unconscious at the acute scanning session but regained consciousness and improved behavioral responsiveness a few months later, and the second group consisted of patients who were already conscious from the start and only improved behavioral responsiveness at follow-up. 
By comparing these two groups, the authors could distinguish between the recovery of consciousness and the recovery of behavioral responsiveness. They demonstrated that only initially conscious patients exhibited rich brain dynamics at baseline. In contrast, patients who were unconscious in the acute phase and later regained consciousness had poor baseline dynamics, which became more complex at follow-up. Complete recovery of both consciousness and responsiveness under general anesthesia is possible through electrical stimulation of the central thalamus (Redinbaugh et al., 2020; Tasserie et al., 2022).”

      Reviewer #2 (Recommendations for the authors): 

      Method 

      (1) The authors mentioned that they used HD-tDCS in their experiments; however, they used 1 x 1 tDCS, which is not HD-tDCS but rather single-channel tDCS.

      We thank the Reviewing Editor for pointing out this ambiguous wording. We understand that "HD-tDCS", which we used in our paper to refer to high-density 1x1 tDCS (because we used small carbon electrodes instead of the large sponge electrodes employed in conventional tDCS), may cause some confusion with high-definition tDCS, which uses compact ring electrodes and most commonly refers to a 4x1 montage (1 active central electrode over the target area and 4 return electrodes placed around the central electrode).

      Therefore, to avoid any confusion, we will use the term “tDCS” rather than “HD-tDCS” to describe the technique used in this paper and remove mentions of high-density or high-definition tDCS.

      Actions in the text: We have replaced the abbreviation “HD-tDCS” with “tDCS” throughout the paper. We have also removed the sentence about high-definition tDCS in the Introduction (“While conventional tDCS relies on the use of relatively large rectangular pad electrodes, high-density tDCS (HD-tDCS) utilizes more compact ring electrodes, allowing for increased focality, stronger electric fields, and presumably, greater neurophysiological changes (Datta et al. 2009; Dmochowski et al. 2011)”) and the two related citations in the References section.

      (2) Please provide the characteristics of electrodes, including their size, shape, and thickness.

      We thank the Reviewing Editor for this recommendation. We now provide the complete characteristics of the tDCS electrodes used in the paper.

      Actions in the text: We have added a sentence describing the characteristics of the tDCS electrodes in the Materials and Methods section:

      “We used a 1x1 electrode montage with two carbon rubber electrodes (dimensions: 1.4 cm x 1.85 cm, 0.93 cm thick) inserted into Soterix HD-tES MRI electrode holders (base diameter: 25 mm; height: 10.5 mm), which are in contact with the scalp. These electrodes (2.59 cm<sup>2</sup>) are smaller than conventional tDCS sponge electrodes (typically 25 to 35 cm<sup>2</sup>).”
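The electrode area quoted above follows directly from the stated dimensions, and it can be combined with the 1 mA and 2 mA intensities mentioned elsewhere in this response to give a nominal current density at the electrode. The short sketch below is only a back-of-the-envelope illustration: it assumes a rectangular contact surface and a uniform current distribution, and it says nothing about the electric field actually reaching the brain.

```python
# Electrode dimensions as given in the Materials and Methods sentence.
length_cm, width_cm = 1.4, 1.85
area_cm2 = length_cm * width_cm  # about 2.59 cm^2, assuming a rectangular surface

# Nominal current density (mA/cm^2), assuming uniform current over the electrode.
densities = {current_mA: current_mA / area_cm2 for current_mA in (1.0, 2.0)}

# Size ratio relative to conventional sponge electrodes (25 to 35 cm^2).
size_ratios = [conventional / area_cm2 for conventional in (25.0, 35.0)]

print(f"area: {area_cm2:.2f} cm^2")
for current_mA, density in densities.items():
    print(f"{current_mA:.0f} mA -> {density:.3f} mA/cm^2")
print(f"conventional electrodes are {size_ratios[0]:.1f}x to {size_ratios[1]:.1f}x larger")
```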

      (3) Could the authors clarify why they chose to stimulate the right DLPFC? Is there a specific rationale for this choice? Additionally, could the authors explain how they ensured that the stimulation targeted the DLPFC, given that the monkey cap might differ from human configurations? In many NHP studies, structural MRI is used to accurately determine electrode placement. Considering that a single channel F4 - O2 montage was used, even a small displacement of the frontal electrode laterally could result in the electric field not adequately covering the DLPFC. Could the authors provide structural MRI images and details of electrode positioning to help readers better understand targeting accuracy?

      We thank the Reviewing Editor for the thoughtful comments and recommendations. We appreciate the opportunity to further clarify our rationale for stimulating the right DLPFC and also the suggestion to provide structural MRI images and details of electrode positioning, which we think will improve the quality of the paper by showing targeting accuracy.

      First, we would like to clarify that our initial decision to stimulate the right PFC in most animals was driven by experimental constraints. Indeed, we had limited access to the left PFC in three of the four macaques, either due to the presence of cement (spreading asymmetrically from the centre of the head) used to fix the head post in awake animals or due to a scar in one of the two animals studied under anesthesia. 

      Second, we agree with the Reviewing Editor on the importance of showing details of electrode positioning and evidence of targeting accuracy across MRI sessions. Therefore, we now provide structural images showing the positions of anodal and cathodal electrodes in almost all acquired sessions: 10 sessions (out of 10) under anesthesia and 30 sessions in the awake state (out of 34 sessions, because we could not acquire structural images in four sessions). These images show that, in anesthesia experiments, the anodal electrode was positioned over the dorsal prefrontal cortex and the cathodal electrode was placed over the contralateral occipital cortex (at the level of the parieto–occipital junction) in both monkeys. In the awake state, the montage still targeted the prefrontal cortex and the occipital cortex, but with a slightly different placement. One of the electrodes was placed over the prefrontal cortex, closer to the premotor cortex than in anesthesia experiments, while the other one was placed over the occipital cortex (V1), slightly more posterior than in anesthesia experiments. These images therefore show that the placement was relatively accurate across sessions and reproducible between monkeys in each of the two arousal conditions.

      Actions in the text: We have added a supplementary file showing electrode positioning in 40 of the 44 acquired MRI sessions (Supplementary File 1). We have also added a new supplement figure (Figure 1 - figure supplement 1) showing electrode positioning in representative MRI sessions of the awake and anesthetized experiments in the main manuscript. 

      We added a few sentences referring to these figures in the Results section: 

      “Representative structural images showing electrode placements on the head of the two awake monkeys are shown in Figure 1 - figure supplement 1A. Supplementary File 1 displays the complete set of structural images, showing that the two electrodes were accurately placed over the prefrontal cortex and the occipital cortex in a reproducible manner across awake sessions.”

      Figure 1 - figure supplement 1. Structural images displaying electrode placements on the head of monkeys. A) Awake experiments. Representative sagittal, coronal and transverse MRI sections, and the corresponding skin reconstruction images showing the position of the prefrontal and the occipital electrodes on the head of monkeys J. and Y. B) Anesthesia experiments. Representative sagittal, coronal and transverse MRI sections, and the corresponding skin reconstruction images showing the position of the prefrontal and occipital electrodes on the head of monkeys R. and N.

      Supplementary File 1 (see attached file). Structural images showing the position of the tDCS electrodes on the monkey's head across sessions. Sagittal, coronal and transverse MRI sections, and corresponding skin reconstruction images showing the position of the prefrontal and occipital electrodes on the monkey's head for each MRI session (except for 4 sessions in which no anatomical scan was acquired). The two electrodes were accurately placed over the prefrontal cortex and the occipital cortex in a reproducible manner across sessions and between the two monkeys studied in each arousal state. In anesthesia experiments, the anodal electrode was placed over the dorsal prefrontal cortex, while the cathodal electrode was positioned over the parieto-occipital junction. In awake experiments, the prefrontal electrode was positioned over the dorsal prefrontal cortex/pre-motor cortex, while the occipital electrode was placed over visual area 1 (V1). The position of the two electrodes differed slightly between the anesthetized and awake experiments due to different body positions (the prone position of the sedated monkeys prevented a more posterior position of the occipital electrode) and also due to the presence of a headpost on the head of the two monkeys in awake experiments (the monkeys we worked with in anesthesia experiments did not have a headpost).

      (4) If the authors did not analyze the data for the passive event-related auditory response, it may be helpful to remove the related sentence to avoid potential confusion for readers.

      We thank the Reviewing Editor for the comment. Although we understand the reviewer’s point of view, we decided to keep this information in the paper to inform the reader that the macaques were passively engaged in an auditory task, as this could have some influence on the brain state. In the Materials and Methods section, we already mentioned that the analysis of the cerebral responses to the auditory paradigm is not part of the paper. We have modified the sentence to make it clearer and to avoid potential confusion for readers.

      Actions in the text: We have modified the sentence referring to the passive event-related auditory response in the Materials and Methods section:

      “All fMRI data were acquired while the monkeys were engaged in a passive event-related auditory task, the local-global paradigm, which is based on local and global deviations from temporal regularities (Bekinschtein et al. 2009; Uhrig, Dehaene, and Jarraya 2014). The present paper does not address how tDCS perturbs cerebral responses to local and global deviants, which will be the subject of future work.”

      (5) Could the authors clarify what x(t) represents in the equation? Additionally, it would be better to number the equations.

      We apologize for the confusion: x(t) represents the evolution of the BOLD signals over time. We have numbered the equations as suggested. 

      Actions in the text: We have added explanations about the notation and have numbered the equations.

      (6) It would be much better to provide schematic illustrations to explain what the authors did for analyzing fMRI data.

      We thank the Reviewing Editor for the suggestion and now provide a new figure as suggested.  

      Actions in the text: We have added a new figure (Figure 2) graphically showing the overall analysis performed. We have added a sentence about the new Figure 2 in the Results section:  “A graphical overview of the overall analysis is shown in Figure 2.” We have renumbered Figure 2 - supplement figures accordingly.

      Figure 2. fMRI Phase Coherence analysis. A) Left) Animals were scanned before, during and after PFC tDCS stimulation in the awake state (two macaques) or under deep propofol anesthesia (two macaques). Right) Example of z-scored filtered BOLD time series for one macaque, 111 time points with a TR of 2.4 s. B) Hilbert transform of the z-scored BOLD signal of one ROI into its time-varying amplitude A(t) (red) and the real part of the phase φ (green). In blue, we recover the original z-scored BOLD signal as A(t)cos(φ). C) Example of the phase of the Hilbert transform for each brain region at one TR. D) Symmetric matrix of cosines of the phase differences between all pairs of brain regions. E) We concatenated the vectorized upper triangles of the phase difference matrices for all TRs and all participants, in all conditions, separately for each dataset. Applying the K-means algorithm to these vectors yielded the brain patterns, whose statistics were then analyzed in the different conditions.
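Steps B to D of this caption can be sketched in a few lines of standard-library Python. The example below is a minimal illustration and not the authors' pipeline: the three synthetic "ROI" signals and the 0.05 Hz frequency are hypothetical, band-pass filtering and z-scoring are skipped, and the K-means step of panel E is omitted. The analytic signal is computed with a plain discrete Fourier transform; in practice a library routine such as `scipy.signal.hilbert` would normally be used for this step.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via the discrete Fourier transform.

    Negative frequencies are zeroed and positive ones doubled, so the real
    part of the result equals the input. Plain O(n^2) DFT, fine for short series.
    """
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    weights = [0.0] * n
    weights[0] = 1.0                  # keep the DC term
    if n % 2 == 0:
        weights[n // 2] = 1.0         # keep the Nyquist term
    for k in range(1, (n + 1) // 2):
        weights[k] = 2.0              # double the positive frequencies
    return [sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def phase_coherence(phases):
    """Symmetric matrix of cos(phi_i - phi_j) for one time point (panel D)."""
    return [[math.cos(a - b) for b in phases] for a in phases]


def upper_triangle(matrix):
    """Vectorize the strictly upper triangular part of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


# Synthetic data: 111 time points with a TR of 2.4 s, as in the caption.
N_TR, TR = 111, 2.4
FREQ_HZ = 0.05  # hypothetical oscillation frequency, for illustration only
base = [math.sin(2 * math.pi * FREQ_HZ * TR * t) for t in range(N_TR)]
rois = [base, [0.5 * v for v in base], [-v for v in base]]  # in phase, in phase, anti-phase

analytic = [analytic_signal(x) for x in rois]
t = 50  # one TR
phases_t = [cmath.phase(z[t]) for z in analytic]
coherence = phase_coherence(phases_t)
vector = upper_triangle(coherence)

# Panel B check: A(t) * cos(phi(t)) recovers the original signal.
reconstructed = [abs(z) * math.cos(cmath.phase(z)) for z in analytic[0]]
```

In this toy case the two in-phase signals have a coherence close to +1 and the anti-phase pair a coherence close to -1; stacking `vector` over all TRs would give the input expected by a clustering step.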

      Results 

      (1) In Figures 3A, 5A, and 6A showing brain connectivity, it is difficult to relate the connectivity variability among the brain regions. Instead of displaying connection lines for nodes, it would be more effective if the authors highlighted significant, strong connectivity within specific brain regions using additional methods, such as bootstrapping.

      We thank the Reviewing Editor for the comment and suggestion. The connection lines represent all synchronizations above 0.5 and all anti-synchronizations below -0.5 between pairs of brain regions. As the comment suggests, one element we had not addressed is the heterogeneity in coherence between individual brain regions. We therefore provide additional supplementary figures showing, for all centroids mentioned in the main figures, the variance of the distribution of phase-based coherence between each brain region and the rest of the brain. A high value indicates that a region's coherence with the rest of the brain spans a wide range of values, while a low value indicates that its coherence values with the rest of the brain are similar. Thus, a brain with uniform color indicates high homogeneity in coherence among brain regions, while sharp changes in color reveal that certain regions have a much higher variance in their coherence distributions than others. We expect these new figures to expose the connectivity variability among brain regions more clearly.

      Actions in the text: We have added new figures showing, for all centroids mentioned in the main figures, the variances in phase-based connectivity of the distributions of coherence (Figure 3 - figure supplement 3; Figure 5 - figure supplement 2; Figure 6 - figure supplement 3; Figure 7 - figure supplement 2). One of them is shown below, for the awake analysis only (Figure 3 - figure supplement 3).

      Figure 3 - figure supplement 3. Variance in inter-region phase coherences of brain patterns. Low values (red and light red) indicate that the distribution of synchronizations between a brain region and the rest of the brain has relatively low variance, while high values (blue and light blue) indicate relatively high variance. Both supra (top) and subdorsal (bottom) views are displayed for each brain pattern from the main figure, in the same order as before: from left (1) to right (6) as their respective SFC increases. 

      We added a few sentences about variances in phase-based connectivity of the distributions of coherence in the Results section: 
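The quantity mapped in this figure can be illustrated with a small sketch. For one brain pattern (a symmetric matrix of phase coherences), each region is assigned the variance of its coherence values with all other regions. The 4 x 4 matrix below is hypothetical, and two choices are assumptions of this sketch rather than details taken from the paper: the diagonal is excluded, and the population variance is used.

```python
from statistics import pvariance


def coherence_variance(pattern):
    """Per-region variance of coherence with all other regions (diagonal excluded)."""
    n = len(pattern)
    return [pvariance([pattern[i][j] for j in range(n) if j != i]) for i in range(n)]


# Hypothetical brain pattern with 4 regions (symmetric, ones on the diagonal).
pattern = [
    [1.0, 0.8, 0.8, 0.8],
    [0.8, 1.0, 0.9, -0.7],
    [0.8, 0.9, 1.0, -0.6],
    [0.8, -0.7, -0.6, 1.0],
]
variances = coherence_variance(pattern)
print([round(v, 3) for v in variances])
```

Region 0 is equally coherent with every other region, so its variance is zero; the other regions mix synchronized and anti-synchronized partners and therefore show a high variance.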

      “Further investigation of the variances in inter-region phase coherences of brain patterns, presented in Figure 3 - figure supplement 3, revealed two main findings. First, all the patterns exhibited some degree of lateral symmetry. Second, except for the pattern with the highest SFC, most patterns displayed high heterogeneity in their coherence variances and striking inter-pattern differences. These observations reflect both the segmentation of distinct functional networks across patterns and a topological organization within the patterns themselves: some regions showed a broader spectrum of synchrony with the rest of the brain, while others exhibited narrower distributions of coherence variances. For instance, unlike other brain patterns, pattern 5 was characterized by a high coherence variance in the frontal premotor areas and low variance in the occipital cortex, whereas pattern 3 had a high variance in the frontal and orbitofrontal regions. In addition, we performed the main analyses separately for the two monkeys, explored the inter-condition variability (Supplementary File 2), and computed classical measures of functional connectivity such as average FC matrices and functional graph properties (modularity, efficiency and density) of the visited FC states (Supplementary File 3).”

      “The variance in inter-regional phase coherence across brain patterns showed notably that pattern 4, in contrast to most other patterns, was characterized by a high variance in frontal premotor areas and a low variance in the occipital cortex (Figure 5 - figure supplement 2)." 

      “The variance in inter-region phase coherences of the brain patterns is displayed in Figure 6 - figure supplement 3 and showed a striking heterogeneity between the patterns. For example, pattern 5 had a low overall variance (except in the frontal cortex), while pattern 1 was the only pattern with a high variance in the occipital cortex.”

      “The variance in inter-region phase coherences of brain patterns is displayed in Figure 7 - figure supplement 2.”

      (2) For both conditions, only 2 to 3 out of 6 patterns showed significant effects of tDCS on the occurrence rate. Is it sufficient to claim the authors' conclusion?

      We thank the Reviewing Editor for the comment. We would like to point out that similar differences in the occurrence rates of specific brain patterns (particularly patterns at the extremes of the SFC scale) have been reported previously. Prior work in patients suffering from disorders of consciousness, in healthy humans, and in non-human primates has shown, using a similar method of analysis, that not all brain states are equally disturbed by loss of consciousness, even across different modalities of transition to unconsciousness (Luppi et al. 2021; Z. Huang et al. 2020; Demertzi et al. 2019; Castro et al. 2023; Golkowski et al. 2019; Barttfeld et al. 2015). Therefore, we believe that our conclusions are supported by the results.
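For readers less familiar with this measure, the occurrence rate of a brain pattern is the fraction of time points (TRs) assigned to that pattern by the clustering step. The sketch below uses a hypothetical label sequence and six patterns, matching the number discussed here; the statistical comparison between conditions is not shown.

```python
from collections import Counter


def occurrence_rates(labels, n_patterns):
    """Fraction of time points assigned to each pattern (patterns numbered from 1)."""
    counts = Counter(labels)
    total = len(labels)
    return {k: counts.get(k, 0) / total for k in range(1, n_patterns + 1)}


# Hypothetical sequence of pattern labels, one per TR.
labels = [1, 1, 3, 6, 6, 6, 2, 1, 6, 6]
rates = occurrence_rates(labels, n_patterns=6)
print(rates)
```

Patterns that are never visited in a condition get a rate of zero rather than being dropped, which keeps the rates comparable across conditions.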

      (3) If the authors want to assert that the brain state significantly influences the effects of tDCS as discussed in the manuscript, further analysis is necessary. First, it would be great to show the difference in connectivity between two consciousness conditions during the baseline (resting state) to see how resting state connectivity or structural connectivity varies. Second, demonstrating the difference in connectivity between the awake and anesthetized conditions (e.g., awake during cathodal vs. anesthetized cathodal) to show how the connectivity among the brain regions was changed by the brain state during tDCS. This would strengthen the authors' conclusion.

      We thank the reviewer for this comment. Firstly, we would like to clarify that the structural connectivity does not change from one session to another in the same animal and varies only minimally between subjects. Secondly, we agree with the Reviewing Editor that it is informative to show the differences between the baselines, and this is what we have done. The results are shown in Figures 5 and 7. Regarding the comparison of the stimulation conditions across arousal levels, the only contrast that we could make is to compare 2 mA anodal awake with 2 mA anodal anesthetized (during and post-stimulation). However, as 2 mA anodal stimulation in the awake state did not affect the connectivity much (compared to the awake baseline), the results would be very similar to the comparison of the awake baseline with 2 mA anodal anesthetized, which is shown in Figure 7. Therefore, we believe that this comparison would add little information and would largely be redundant.

      Reviewer #3 (Recommendations for the authors): 

      Introduction, par 2: HD-tDCS does not necessarily produce stronger electric fields (E-fields) in the brain. The E-field is largely montage-dependent, and some configurations such as the 4x1 configuration can actually have weaker E-fields compared to conventional tDCS designs (i.e., with two sponge electrodes) as electrodes are often closer together resulting in more current being shunted by skull, scalp, and CSF. I would consider re-phrasing this section.

      We agree with the Reviewing Editor that high-definition tDCS does not necessarily produce stronger electric fields in the brain and apologize for the confusion caused by our use of HD-tDCS to refer to high-density tDCS. To avoid any confusion, we have removed the sentence mentioning that HD-tDCS produces stronger electric fields.

      Actions in the text: We have removed the sentence about high-definition tDCS in the Introduction (“While conventional tDCS relies on the use of relatively large rectangular pad electrodes, high-density tDCS (HD-tDCS) utilizes more compact ring electrodes, allowing for increased focality, stronger electric fields, and presumably, greater neurophysiological changes (Datta et al. 2009; Dmochowski et al. 2011)”) and the two related citations in the References section.

    1. Author response:

      General Statements:

      The formation of three-dimensional tubes is a fundamental process in the development of organs, and aberrant tube size leads to common diseases and congenital disorders, such as polycystic kidney disease, asthma, and lung hypoplasia. The apical (luminal) extracellular matrix (ECM) plays a critical role in epithelial tube morphogenesis during organ formation, but its composition and organization remain poorly understood. Using the Drosophila embryonic salivary gland as a model, we identify PAPS Synthetase (Papss), an enzyme that synthesizes the universal sulfate donor PAPS, as a critical regulator of tube lumen expansion. Additionally, we identify two zona pellucida (ZP) domain proteins, Piopio (Pio) and Dumpy (Dpy), as key apical ECM components that provide mechanical support to maintain a uniform tube diameter.

      The apical ECM has a distinct composition compared to the basal ECM, featuring a diverse array of components. Many studies of the apical ECM have focused on the role of chitin and its modification, but the composition of the non-chitinous apical ECM and its role, and how modification of the apical ECM affects organogenesis remain elusive. The main findings of this manuscript are listed below.

      (1) Through a deficiency screen targeting ECM-modifying enzymes, we identify Papss as a key enzyme regulating luminal expansion during salivary gland morphogenesis. 

      (2) Our confocal and transmission electron microscopy analyses reveal that Papss mutants exhibit a disorganized apical membrane and condensed aECM, which are at least partially linked to disruptions in Golgi structures and intracellular trafficking. Papss is also essential for cell survival and basal ECM integrity, highlighting the role of sulfation in regulating both apical and basal ECM.

      (3) Salivary gland-specific overexpression of wild-type Papss rescues all defects in Papss mutants, but the catalytically inactive mutant form does not, suggesting that defects in sulfation are the underlying cause of the phenotypes.

      (4) We identify two ZP domain proteins, Piopio (Pio) and Dumpy (Dpy), as key components of the salivary gland aECM. In the absence of Papss, Pio is progressively lost from the aECM, while the Dpy-positive aECM structure is condensed and detaches from the apical membrane, resulting in a narrowed lumen. 

      (5) Mutations in pio or dpy, or in Notopleural (Np), which encodes a matriptase that cleaves Pio, cause the salivary gland lumen to develop alternating bulges and constrictions. Additionally, loss of pio results in loss of Dpy in the salivary gland lumen, suggesting that the Dpy-containing filamentous structures of the aECM are critical for maintaining luminal diameter, with Pio playing an essential role in organizing these structures.

      (6) We further reveal that the cleavage of the ZP domain of Pio by Np is critical for the role of Pio in organizing the aECM structure.

      Overall, our findings underscore the essential role of sulfation in organizing the aECM during tubular organ formation and highlight the mechanical support provided by ZP domain proteins in maintaining tube diameter. Mammals have two isoforms of Papss, Papss1 and Papss2. Papss1 shows ubiquitous expression, with higher levels in glandular cells and salivary duct cells, suggesting a high requirement for sulfation in these cell types. Papss2 shows a more restricted expression, such as in cartilage, and mutations in Papss2 have been associated with skeletal dysplasia in humans. Our analysis of the Drosophila Papss gene, a single ortholog of human Papss1 and Papss2, reveals its multiple roles during salivary gland development. We expect that these findings will provide valuable insights into the function of these enzymes in normal development and disease in humans. Our findings on the key role of two ZP proteins, Pio and Dpy, as major components of the salivary gland aECM also provide valuable information on the organization of the non-chitinous aECM during organ formation.

      We believe that our results will be of broad interest to many cell and developmental biologists studying organogenesis and the ECM, as well as those investigating the mechanisms underlying human diseases associated with conserved mutations.

      Point-by-point description of the revisions:

      We are delighted that all three reviewers were enthusiastic about the work. Their comments and suggestions have improved the paper. The details of the changes we have made in response to each reviewer’s comments are included in italicized text below.

      Reviewer #1 (Evidence, reproducibility and clarity):

      PAPS is required for all sulfotransferase reactions in which a sulfate group is covalently attached to amino acid residues of proteins or to side chains of proteoglycans. This sulfation is crucial for properly organizing the apical extracellular matrix (aECM) and expanding the lumen in the Drosophila salivary gland. Loss of Papss potentially leads to decreased sulfation, disorganizing the aECM, and defects in lumen formation. In addition, Papss loss destabilizes the Golgi structures.

      In Papss mutants, several changes occur in the salivary gland lumen of Drosophila. The tube lumen is very thin and shows irregular apical protrusions. There is a disorganization of the apical membrane and a compaction of the apical extracellular matrix (aECM). The Golgi structures and intracellular transport are disturbed. In addition, the ZP domain proteins Piopio (Pio) and Dumpy (Dpy) lose their normal distribution in the lumen, which leads to condensation and dissociation of the Dpy-positive aECM structure from the apical membrane. This results in a thin and irregularly dilated lumen.

      (1) The authors describe various changes in the lumen in mutants, from thin lumen to irregular expansion. I would like to know the correct lumen diameter, and length, besides the total area, by which one can recognize thin and irregular.

      We have included quantification of the length and diameter of the salivary gland lumen in the stage 16 salivary glands of control, Papss mutant, and salivary gland-specific rescue embryos (Figure 1J, K). As described, Papss mutant embryos have two distinct phenotypes, one group with a thin lumen along the entire lumen and the other group with irregular lumen shapes. Therefore, we separated the two groups for quantification of lumen diameter. Additionally, we have analyzed the degree of variability for the lumen diameter to better capture the range of phenotypes observed (Figure 1K’). These quantifications enable a more precise assessment of lumen morphology, allowing readers to distinguish between thin and irregular lumen phenotypes.
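      As an illustration only, one common way to express the "degree of variability" of lumen diameter within a gland is the coefficient of variation (standard deviation divided by mean). The response does not state which measure underlies Figure 1K’, so the function name, the choice of statistic, and the diameter values below are assumptions made for this sketch.

```python
from statistics import mean, stdev


def diameter_cv(diameters_um):
    """Coefficient of variation (sample SD / mean) of lumen diameters
    measured at several positions along one salivary gland."""
    if len(diameters_um) < 2:
        raise ValueError("need at least two diameter measurements")
    return stdev(diameters_um) / mean(diameters_um)


# Hypothetical diameters in micrometres (not data from the manuscript).
uniform_lumen = [8.0, 8.2, 7.9, 8.1, 8.0]      # evenly expanded lumen
irregular_lumen = [2.0, 9.5, 3.1, 10.2, 2.4]   # alternating thin and expanded

print("uniform CV:", round(diameter_cv(uniform_lumen), 3))
print("irregular CV:", round(diameter_cv(irregular_lumen), 3))
```

      Under this measure, a uniformly thin lumen and a uniformly expanded lumen both give a low value, which is why mean diameter and variability are reported separately.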

      (2) The rescue is about 30%, which is not as good as expected. Maybe the wrong isoform was taken. Is it possible to find out which isoform is expressed in the salivary glands, e.g., by RNA in situ Hyb? This could then be used to analyze a more focused rescue beyond the paper.

      Thank you for this point, but we do not agree that the rescue is about 30%. In Papss mutants, about 50% of the embryos show the thin lumen phenotype whereas the other 50% show irregular lumen shapes. In the rescue embryos with a WT Papss, few embryos showed thin lumen phenotypes. About 40% of the rescue embryos showed “normal, fully expanded” lumen shapes, and the remaining 60% showed either irregular (thin+expanded) or slightly overexpanded lumen. It is not uncommon that rescue with the Gal4/UAS system results in a partial rescue because it is often not easy to achieve the balance of the proper amount of the protein with the overexpression system. 

      To address the possibility that the wrong isoform was used, we performed in situ hybridization to examine the expression of different Papss splice forms in the salivary gland. We used probes that detect subsets of splice forms: A/B/C/F/G, D/H, and E/F/H, and found that all probes showed expression in the salivary gland, with varying intensities. The original probe, which detects all splice forms, showed the strongest signals in the salivary gland compared to the new probes, which detect only a subset. However, the difference in signal intensity may be due to the longer length of the original probe (>800 bp) compared to the other probes, which were made from much smaller regions (~200 bp). Digoxigenin in the DIG labeling kit for mRNA detection labels the uridine nucleotide in the transcript, and the probes with weaker signals contain fewer uridines (all: 147; ABCFG, 29; D, 36; EFH, 66). We also used the Papss-PD isoform for a salivary gland-specific rescue experiment and obtained results similar to those with Papss-PE (Figure 1I-L, Figure 4D and E).

      Furthermore, we performed additional experiments to validate our findings. We performed a rescue experiment with a mutant form of Papss that has mutations in critical residues of the catalytic domains of the enzyme, which failed to rescue any phenotypes, including the thin lumen phenotype (Figure 1H, J-L), the number and intensity of WGA puncta (Figure 3I, I’), and cell death (Figure 4D, E). These results provide strong evidence that the defects observed in Papss mutants are due to the lack of sulfation.

      (3) Crb is a transmembrane protein on the apicolateral side of the membrane. Accordingly, the apicolateral distribution can be seen in the control and the mutant. I believe there are no apparent differences here, not even in the amount of expression. However, the view of the cells (frame) shows possible differences. To be sure, a more in-depth analysis of the images is required. Confocal Z-stack images, with 3D visualization and orthogonal projections to analyze the membranes showing Crb staining together with a suitable membrane marker (e.g. SAS or Uif). This is the only way to show whether Crb is incorrectly distributed. Statistics of several papss mutants would also be desirable and not just a single representative image. When do the observed changes in Crb distribution occur in the development of the tubes, only during stage 16? Is papss only involved in the maintenance of the apical membrane? This is particularly important when considering the SJ and AJ, because the latter show no change in the mutants.

      We appreciate your suggestion to more thoroughly analyze Crb distribution. We adapted a method from a previous study (Olivares-Castiñeira and Llimargas, 2017) to quantify Crb signals in the subapical region and apical free region of salivary gland cells. Using E-Cad signals as a reference, we marked the apical cell boundaries of individual cells and calculated the intensity of Crb signals in the subapical region (along the cell membrane) and in the apical free region. We focused on the expanded region of the SG lumen in Papss mutants for quantification, as the thin lumen region was challenging to analyze. This quantification is included in Figure 2D. Statistical analysis shows that Crb signals were more dispersed in SG cells in Papss mutants compared to WT.
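      The region-based comparison described above can be sketched as a simple ratio of mean intensities. This is a minimal illustration and not the published procedure: the function name, the per-pixel region labels, and the intensity values are hypothetical, and in practice the regions would be derived from the E-Cad-marked cell boundaries.

```python
from statistics import mean


def crb_dispersion_index(intensities, region_labels):
    """Ratio of mean Crb intensity in the apical free region to the mean
    intensity in the subapical region of one cell. Higher values indicate
    signal that is more dispersed across the apical domain."""
    subapical = [i for i, r in zip(intensities, region_labels) if r == "subapical"]
    apical_free = [i for i, r in zip(intensities, region_labels) if r == "apical_free"]
    if not subapical or not apical_free:
        raise ValueError("both regions must contain at least one pixel")
    return mean(apical_free) / mean(subapical)


# Hypothetical pixel intensities for one cell (not data from the manuscript).
labels = ["subapical", "subapical", "apical_free", "apical_free"]
wt_like = [100, 100, 20, 20]       # signal concentrated at the cell boundary
mutant_like = [80, 80, 60, 60]     # signal spread into the apical free region

print(crb_dispersion_index(wt_like, labels))
print(crb_dispersion_index(mutant_like, labels))
```

      Computing one index per cell, rather than pooling pixels across cells, keeps the cell as the unit for any subsequent statistical test.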

      (4) A change in the ECM is only inferred based on the WGA localization. This is too few to make a clear statement. WGA is only an indirect marker of the cell surface and glycosylated proteins, but it does not indicate whether the ECM is altered in its composition and expression. Other important factors are missing here. In addition, only a single observation is shown, and statistics are missing.

      We understand your concern that WGA localization alone may not be sufficient to conclude changes in the ECM. However, we observed that luminal WGA signals colocalize with Dpy-YFP in the WT SG (Figure 5-figure supplement 2C), suggesting that WGA detects the aECM structure containing Dpy. The similar behavior of WGA and Dpy-YFP signals in multiple genotypes further supports this idea. In Papss mutants with a thin lumen phenotype, both WGA and Dpy-YFP signals are condensed (Figure 5E-H), and in pio mutants, both are absent from the lumen (Figure 6B, D). We analyzed WGA signals in over 25 samples of WT and Papss mutants, observing consistent phenotypes. We have included the number of samples in the text. While we acknowledge that WGA is an indirect marker, our data suggest that it is a reliable indicator of the aECM structure containing Dpy. 

      (5) Reduced WGA staining is seen in papss mutants, but this could be due to other circumstances. To be sure, a statistic with the number of dots must be shown, as well as an intensity blot on several independent samples. The images are from single confocal sections. It could be that the dots appear in a different Z-plane. Therefore, a 3D visualization of the voxels must be shown to identify and, at best, quantify the dots in the organ.

      We have quantified cytoplasmic punctate WGA signals. Using spinning disk microscopy with super-resolution technology (Olympus SpinSR10 Sora), we obtained high-resolution images of cytoplasmic punctate signals of WGA in WT, Papss mutant, and rescue SGs with the WT and mutant forms of Papss-PD. We then generated 3D reconstructed images of these signals using Imaris software (Figure 3E-H) and quantified the number and intensity of puncta. Statistical analysis of these data confirms the reduction of the number and intensity of WGA puncta in Papss mutants (Figure 3I, I’). The number of WGA puncta was restored by expressing WT Papss but not the mutant form. By using 3D visualization and quantification, we have ensured that our results are not limited to a single confocal section and account for potential variations in Z-plane localization of the dots.

      (6) A colocalization analysis (statistics) should be shown for the overlap of WGA with ManII-GFP.

      Since WGA labels multiple structures, including the nuclear envelope and ECM structures, we focused on assessing the colocalization of the cytoplasmic WGA punctate signals and ManII-GFP signals. Standard colocalization analysis methods, such as Pearson’s correlation coefficient or Manders’ overlap coefficient, would be confounded by WGA signals in other tissues. Therefore, we used a fluorescent intensity line profile to examine the spatial relationship between WGA and ManII-GFP signals in WT and Papss mutants (Figure 3L, L’).
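      For readers unfamiliar with line-profile comparisons, the idea can be sketched as follows: intensities of two channels are sampled along the same line, each profile is rescaled to the range 0 to 1, and the positions of the peaks are compared. The function names and the sample values are hypothetical, and this is not a description of the software or settings used for Figure 3L, L’.

```python
def normalize(profile):
    """Rescale a line profile to the range 0..1 (min-max normalization)."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        raise ValueError("profile is flat and cannot be normalized")
    return [(v - lo) / (hi - lo) for v in profile]


def peak_offset(profile_a, profile_b):
    """Distance in pixels between the maxima of two profiles that were
    sampled along the same line. Zero means the peaks coincide."""
    return abs(profile_a.index(max(profile_a)) - profile_b.index(max(profile_b)))


# Hypothetical intensities along a 5-pixel line (not data from the manuscript).
wga = [1, 2, 9, 3, 1]
manii_overlapping = [0, 1, 8, 2, 0]   # peak at the same position as WGA
manii_shifted = [8, 1, 0, 0, 0]       # peak two pixels away from the WGA peak

print(normalize(wga))
print(peak_offset(wga, manii_overlapping))
print(peak_offset(wga, manii_shifted))
```

      A line profile restricts the comparison to a chosen cytoplasmic punctum, which is how it avoids the contribution of WGA signal from the nuclear envelope or the lumen.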

      (7) I do not understand how the authors describe "statistics of secretory vesicles" as an axis in Figure 3p. The TEM images do not show labeled secretory vesicles but empty structures that could be vesicles.

      Previous studies have analyzed “filled” electron-dense secretory vesicles in TEM images of SG cells (Myat and Andrew, 2002, Cell; Fox et al., 2010, J Cell Biol; Chung and Andrew, 2014, Development). Consistent with these studies, our WT TEM images show these vesicles. In contrast, Papss mutants show a mix of filled and empty structures. For quantification, we specifically counted the filled electron-dense vesicles (now Figure 3W). A clear description of our analysis is provided in the figure legend.

      (8) The quality of the presented TEM images is too low to judge any difference between control and mutants. Therefore, the supplement must present them in better detail (higher pixel number?).

      We disagree that the quality of the presented TEM images is too low. Our TEM images have sufficient resolution to reveal details of many subcellular structures, such as mitochondrial cristae. The PDF file of the original submission may not have been high resolution. To address this concern, we have provided several original high-quality TEM images of both WT and Papss mutants at various magnifications in Figure 2-figure supplement 2. Additionally, we have included low-magnification TEM images of WT and Papss mutants in Figure 2H and I to provide a clearer view of the overall SG lumen morphology.

      (9) Line 266: the conclusion that apical trafficking is "significantly impaired" does not hold. This implies that Papss is essential for apical trafficking, but the analyzed ECM proteins (Pio, Dumpy) are found apically enriched in the mutants, and Dumpy is even secreted. Moreover, they analyze only one marker, Sec15, and don't provide data about the quantification of the secretion of proteins.

      We agree and have revised our statement to “defective sulfation affects Golgi structures and multiple routes of intracellular trafficking”. 

      (10) DCP-1 was used to detect apoptosis in the glands to analyze acellular regions. However, the authors compare ST16 control with ST15 mutant salivary glands, which is problematic. Further, it is not commented on how many embryos were analyzed and how often they detect the dying cells in control and mutant embryos. This part must be improved.

      Thank you for the comment. We agree and have included quantification. We used stage 16 samples from WT and Papss mutants to quantify acellular regions. Since DCP-1 signals are only present at a specific stage of apoptosis, some acellular regions do not show DCP-1 signals. Therefore, we counted acellular regions regardless of DCP-1 signals. We also quantified this in rescue embryos with WT and mutant forms of Papss, which show complete rescue with WT and no rescue with the mutant form, respectively. The graph with a statistical analysis is included (Figure 4D, E).

      (11) WGA and Dumpy show similar condensed patterns within the tube lumen. The authors show that dumpy is enriched from stage 14 onwards. How is it with WGA? Does it show the same pattern from stage 14 to 16? Papss mutants can suffer from a developmental delay in organizing the ECM or lack of internalization of luminal proteins during/after tube expansion, which is the case in the trachea.

      Dpy-YFP and WGA show overlapping signals in the SG lumen throughout morphogenesis. Dpy-YFP is enriched in the SG lumen from stage 11, not stage 14 (Figure 5-figure supplement 2). WGA is also detected in the lumen throughout SG morphogenesis, similar to Dpy. In the original supplemental figure, only a stage 16 SG image was shown for co-localization of Dpy-YFP and WGA signals in the SG lumen. We have now included images from stages 14 and 15 in Figure 5-figure supplement 2C.

      Given that luminal Pio signals are lost at stage 16 only and that Dpy signals appear as condensed structures in the lumen of Papss mutants, it suggests that the internalization of luminal proteins is not impaired in Papss mutants. Rather, these proteins are secreted but fail to organize properly. 

      (12) Line 366. Luminal morphology is characterized by bulging and constrictions. In the trachea, bulges indicate the deformation of the apical membrane and the detachment from the aECM. I can see constrictions and the collapsed tube lumen in Fig. 6C, but I don't find the bulges of the apical membrane in pio and Np mutants. Maybe showing it more clearly and with better quality will be helpful.

      Since the bulging phenotype appears to vary from sample to sample, we have revised the description of the phenotype to “constrictions” to more accurately reflect the consistent observations. We quantified the number of constrictions along the entire lumen in pio and Np mutants and included the graph in Figure 6F.

      (13) The authors state that Papss controls luminal secretion of Pio and Dumpy, as they observe reduced luminal staining of both in papss mutants. However, the mCh-Pio and Dumpy-YFP are secreted towards the lumen. Does papss overexpression change Pio and Dumpy secretion towards the lumen, and could this be another explanation for the multiple phenotypes? 

      Thank you for the comment. To clarify, we did not observe reduced luminal staining of Pio and Dpy in Papss mutants, nor did we state that Papss controls luminal secretion of Pio and Dpy. In Papss mutants, Pio luminal signals are absent specifically at stage 16 (Figure 5H), whereas strong luminal Pio signals are present until stage 15 (Figure 5G). For Dpy-YFP, the signals are not reduced but condensed in Papss mutants from stages 14-16 (Figure 5D, H). 

      It remains unclear whether the apparent loss of Pio signals is due to a loss of Pio protein in the lumen or due to epitope masking resulting from protein aggregation or condensation. As noted in our response to Comment 11, internalization of luminal proteins seems unaffected in Papss mutants; proteins like Pio and Dpy are secreted into the lumen but fail to properly organize. Therefore, we have not tested whether Papss overexpression alters the secretion of Pio or Dpy.

      In our original submission, we incorrectly stated that uniform luminal mCh-Pio signals were unchanged in Papss mutants. Upon closer examination, we found these signals are absent in the expanded luminal region in stage 16 SG (where Dpy-YFP is also absent), and weak mCh-Pio signals colocalize with the condensed Dpy-YFP signals (Figure 5C, D). We have revised the text accordingly. 

      Regulation of luminal ZP protein level is essential to modulate the tube expansion; therefore, Np releases Pio and Dumpy in a controlled manner during st15/16. Thus, the analysis of Pio and Dumpy in NP overexpression embryos will be critical to this manuscript to understand more about the control of luminal ZP matrix proteins.

      Thanks for the insightful suggestion. We overexpressed both the WT and mutant form of Np using UAS-Np.WT and UAS-Np.S990A lines (Drees et al., 2019) and analyzed mCh-Pio, Pio antibody, and Dpy-YFP signals. It is important to note that these overexpression experiments were done in the presence of the endogenous WT Np. 

      Overexpression of Np.WT led to increased levels of mCh-Pio, Pio, and Dpy-YFP signals in the lumen and at the apical membrane. In contrast, overexpression of Np.S990A resulted in a near complete loss of luminal mCh-Pio signals. Pio antibody signals remained strong at the apical membrane but were weaker in the luminal filamentous structures compared to WT.

      Due to the GFP tag present in the UAS-Np.S990A line, we could not reliably analyze Dpy-YFP signals because of overlapping fluorescent signals in the same channel. However, the filamentous Pio signals in the lumen co-localized with GFP signals, suggesting that these structures might also include Dpy-YFP, although this cannot be confirmed definitively. 

      These results suggest that overexpressed Np.S990A may act in a dominant-negative manner, competing with endogenous Np and impairing proper cleavage of Pio (and mCh-Pio). Nevertheless, some level of cleavage by endogenous Np still appears to occur, as indicated by the residual luminal filamentous Pio signals. These new findings have been incorporated into the revised manuscript and are shown in Figure 6H and 6I.

      (14) Minor:

      Fig. 5 C': mChe-Pio and Dumpy-YFP are mixed up at the top of the images.

      Thanks for catching this error.  It has been corrected.

      Sup. Fig7. A shows Pio in purple but B in green. Please indicate it correctly.

      It has been corrected.

      Reviewer #1 (Significance):

      In 2023, the functions of Pio, Dumpy, and Np in the tracheal tubes of Drosophila were published. The study here shows similar results, with the difference that the salivary glands do not possess chitin, but the two ZP proteins Pio and Dumpy take over its function. It is, therefore, a significant and exciting extension of the known function of the three proteins to another tube system. In addition, the authors identify papss as a new protein and show its essential function in forming the luminal matrix in the salivary glands. Considering the high degree of conservation of these proteins in other species, the results presented are crucial for future analyses and will have further implications for tubular development, including humans.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary:

      There is growing appreciation for the importance of luminal (apical) ECM in tube development, but such matrices are much less well understood than basal ECMs. Here the authors provide insights into the aECM that shapes the Drosophila salivary gland (SG) tube and the importance of PAPSS-dependent sulfation in its organization and function.

      The first part of the paper focuses on careful phenotypic characterization of papss mutants, using multiple markers and TEM. This revealed reduced markers of sulfation (Alcian Blue staining) and defects in both apical and basal ECM organization, Golgi (but not ER) morphology, number and localization of other endosomal compartments, plus increased cell death. The authors focus on the fact that papss mutants have an irregular SG lumen diameter, with both narrowed regions and bulged regions. They address the pleiotropy, showing that preventing the cell death and resultant gaps in the tube did not rescue the SG luminal shape defects and discussing similarities and differences between the papss mutant phenotype and those caused by more general trafficking defects. The analysis uses a papss nonsense mutant from an EMS screen - I appreciate the rigorous approach the authors took to analyze transheterozygotes (as well as homozygotes) plus rescued animals in order to rule out effects of linked mutations.

      The 2nd part of the paper focuses on the SG aECM, showing that Dpy and Pio ZP protein fusions localize abnormally in papss mutants and that these ZP mutants (and Np protease mutants) have similar SG lumen shaping defects to the papss mutants. A key conclusion is that SG lumen defects correlate with loss of a Pio+Dpy-dependent filamentous structure in the lumen. These data suggest that ZP protein misregulation could explain this part of the papss phenotype.

      Overall, the text is very well written and clear. Figures are clearly labeled. The methods involve rigorous genetic approaches, microscopy, and quantifications/statistics and are documented appropriately. The findings are convincing, with just a few things about the fusions needing clarification.

      Minor comments

      (1) Although the Dpy and Qsm fusions are published reagents, it would still be helpful to mention whether the tags are C-terminal as suggested by the nomenclature, and whether Westerns have been performed, since (as discussed for Pio) cleavage could also affect the appearance of these fusions.

      Thanks for the comment. Dpy-YFP is a knock-in line in which YFP is inserted into the middle of the dpy locus (Lye et al., 2014; the insertion site is available on Flybase). mCh-Qsm is also a knock-in line, with mCh inserted near the N-terminus of the qsm gene using phi-mediated recombination using the qsm<sup>MI07716</sup> line (Chu and Hayashi, 2021; insertion site available on Flybase). Based on this, we have updated the nomenclature from Qsm-mCh to mCh-Qsm throughout the manuscript to accurately reflect the tag position. To our knowledge, no western blot has been performed on Dpy-YFP or mCh-Qsm lines. We have mentioned this explicitly in the Discussion.  

      (2) The Dpy-YFP reagent is a non-functional fusion and therefore may not be a wholly reliable reporter of Dpy localization. There is no antibody confirmation. As other reagents are not available to my knowledge, this issue can be addressed with text acknowledgement of possible caveats.

      Thanks for raising this important point. We have added a caveat in the Discussion noting this limitation and the need for additional tools, such as an antibody or a functional fusion protein, to confirm the localization of Dpy.

      (3) TEM was done by standard chemical fixation, which is fine for viewing intracellular organelles, but high pressure freezing probably would do a better job of preserving aECM structure, which looks fairly bad in Fig. 2G WT, without evidence of the filamentous structures seen by light microscopy. Nevertheless, the images are sufficient for showing the extreme disorganization of aECM in papss mutants.

      We agree that HPF is a better method and intend to use the HPF system in future studies. We acknowledge that chemical fixation contributes to the appearance of a gap between the apical membrane and the aECM, which we did not observe with the HPF/FS method (Chung and Andrew, 2014). Despite this, the TEM images still clearly reveal that Papss mutants show a much thinner and more electron-dense aECM compared to WT (Figure 2H, I), consistent with the condensed WGA, Dpy, and Pio signals in our confocal analyses. As the reviewer mentioned, we believe that the current TEM data are sufficient to support the conclusion of severe aECM disorganization and Golgi defects in Papss mutants.

      (4) The authors may consider citing some of the work that has been done on sulfation in nematodes, e.g. as reviewed here: https://pubmed.ncbi.nlm.nih.gov/35223994/ Sulfation has been tied to multiple aspects of nematode aECM organization, though not specifically to ZP proteins.

      Thank you for the suggestion. Pioneering studies in C. elegans have highlighted the key role of sulfation in diverse developmental processes, including neuronal organization, reproductive tissue development, and phenotypic plasticity. We have now cited several works.  

      Reviewer #2 (Significance):

      This study will be of interest to researchers studying developmental morphogenesis in general and specifically tube biology or the aECM. It should be particularly of interest to those studying sulfation or ZP proteins (which are broadly present in aECMs across organisms, including humans).

      This study adds to the literature demonstrating the importance of luminal matrix in shaping tubular organs and greatly advances understanding of the luminal matrix in the Drosophila salivary gland, an important model of tubular organ development and one that has key matrix differences (such as no chitin) compared to other highly studied Drosophila tubes like the trachea.

      The detailed description of the defects resulting from papss loss suggests that there are multiple different sulfated targets, with a subset specifically relevant to aECM biology. A limitation is that specific sulfated substrates are not identified here (e.g. are these the ZP proteins themselves or other matrix glycoproteins or lipids?); therefore it's not clear how direct or indirect the effects of papss are on ZP proteins. However, this is clearly a direction for future work and does not detract from the excellent beginning made here.

      My expertise: I am a developmental geneticist with interests in apical ECM

      Reviewer #3 (Evidence, reproducibility and clarity):

      In this work Woodward et al focus on the apical extracellular matrix (aECM) in the tubular salivary gland (SG) of Drosophila. They provide new insights into the composition of this aECM, formed by ZP proteins, in particular Pio and Dumpy. They also describe the functional requirements of PAPSS, a critical enzyme involved in sulfation, in regulating the expansion of the lumen of the SG. A detailed cellular analysis of Papss mutants indicates defects in the apical membrane, the aECM and in Golgi organization. They also find that Papss controls the proper organization of the Pio-Dpy matrix in the lumen. The work is well presented and the results are consistent.

      Main comments

      - This work provides a detailed description of the defects produced by the absence of Papss. In addition, it provides many interesting observations at the cellular and tissular level. However, this work lacks a clear connection between these observations and the role of sulfation. Thus, the mechanisms underlying the phenotypes observed are elusive. Efforts directed to strengthen this connection (ideally experimentally) would greatly increase the interest and relevance of this work.

      Thank you for this thoughtful comment. To directly test whether the phenotypes observed in Papss mutants are due to the loss of sulfation activity, we generated transgenic lines expressing catalytically inactive forms of Papss, UAS-PapssK193A, F593P, in which key residues in the APS kinase and ATP sulfurylase domains are mutated. Unlike WT UAS-Papss (both the Papss-PD and Papss-PE isoforms), the catalytically inactive UAS-Papssmut failed to rescue any of the phenotypes, including the thin lumen phenotype (Figure 1I-L), altered WGA signals (Figure I, I’) and the cell death phenotype (Figure 4D, E). These findings strongly support the conclusion that the enzymatic sulfation activity of Papss is essential for the developmental processes described in this study.

      - A main issue that arises from this work is the role of Papss at the cellular level. The results presented convincingly indicate defects in Golgi organization in Papss mutants. Therefore, the defects observed could stem from general defects in the secretion pathway rather than from specific defects on sulfation. This could even underlie general/catastrophic cellular defects and lead to cell death (as observed).

      This observation has different implications. Is this effect observed in SGs also observed in other cells in the embryo? If Papss has a general role in Golgi organization this would be expected, as Papss encodes the only PAPS synthetase in Drosophila.

      Can the authors test any other mutant that specifically affects Golgi organization and investigate whether this produces a similar phenotype to that of Papss?

      Thank you for the comment. To address whether the defects observed in Papss mutants stem from general disruption of the secretory pathway due to Golgi disorganization, we examined mutants of two key Golgi components: Grasp65 and GM130. 

      In Grasp65 mutants, we observed significant defects in SG lumen morphology, including highly irregular SG lumen shape and multiple constrictions (100%; n=10/10). However, the lumen was not uniformly thin as in Papss mutants. In contrast, GM130 mutants–although this line was very sick and difficult to grow–showed relatively normal salivary gland morphology in the few embryos that survived to stage 16 (n=5/5). It is possible that only embryos with mild phenotypes progressed to this stage, limiting interpretation. These data have now been included in Figure 3-figure supplement 2. Overall, while Golgi disruption can affect SG morphology, the specific phenotypes seen in Papss mutants are not fully recapitulated by Grasp65 or GM130 loss.

      - A model that conveys the different observations and that proposes a function for Papss in sulfation and Golgi organization (independent or interdependent?) would help to better present the proposed conclusions. In particular, the paper would be more informative if it proposed a mechanism or hypothesis of how sulfation affects SG lumen expansion. Is sulfation regulating a factor that in turn regulates Pio-Dpy matrix? Is it regulating Pio-Dpy directly? Is it regulating a product recognized by WGA?

      For instance, investigating Alcian blue or sulfotyrosine staining in pio, dpy mutants could help to understand whether Pio, Dpy are targets of sulfation.

      Thank you for the comment. We’re also very interested in learning whether the regulation of the Pio-Dpy matrix is a direct or indirect consequence of the loss of sulfation on these proteins. One possible scenario is that sulfation directly regulates the Pio-Dpy matrix by regulating protein stability through the formation of disulfide bonds between the conserved Cys residues responsible for ZP module polymerization. Additionally, the Dpy protein contains hundreds of EGF modules that are highly susceptible to O-glycosylation. Sulfation of the glycan groups attached to Dpy may be critical for its ability to form a filamentous structure. Without sulfation, the glycan groups on Dpy may not interact properly with the surrounding materials in the lumen, resulting in an aggregated and condensed structure. These possibilities are discussed in the Discussion.

      We have not analyzed sulfation levels in pio or dpy mutants because sulfation levels in mutants of single ZP domain proteins may not provide much information. A substantial number of proteoglycans, glycoproteins, and proteins (with up to 1% of all tyrosine residues in an organism’s proteins estimated to be sulfated) are modified by sulfation, so changes in sulfation levels in a single mutant may be subtle. In particular, the existing dpy mutant line is an insertion mutant of a transposable element; therefore, the sulfation sites would still remain in this mutant.

      - Interpretation of Papss effects on Pio and Dpy would be desired. The results presented indicate loss of Pio antibody staining but normal presence of cherry-Pio. This is difficult to interpret. How are these results of Pio antibody and cherry-Pio correlating with the results in the trachea described recently (Drees et al. 2023)?

      In our original submission, we stated that the uniform luminal mCh-Pio signals were not changed in Papss mutants, but after re-analysis, we found that these signals were actually absent from the expanded luminal region in stage 16 SG (where Dpy-YFP is also absent), and weak mCh-Pio signals colocalize with the condensed Dpy-YFP signals (Figure 5C, D). We have revised the text accordingly. 

      After cleavages by Np and furin, the Pio protein should have three fragments. The N-terminal region contains the N-terminal half of the ZP domain, and mCh-Pio signals show this fragment. The very C-terminal region should localize to the membrane as it contains the transmembrane domain. We think the middle piece, the C-terminal ZP domain, is recognized by the Pio antibody. The mCh-Pio and Pio antibody signals in the WT trachea (Drees et al., 2023) are similar to those in the SG. mCh-Pio signals are detected in the tracheal lumen as uniform signals, at the apical membrane, and in cytoplasmic puncta. Pio antibody signals are exclusively in the tracheal lumen and show more heterogeneous filamentous signals.

      In Papss mutants, the middle fragment (the C-terminal ZP domain) seems to be most affected because the Pio antibody signals are absent from the lumen. The loss of Pio antibody signals could be due to protein degradation or epitope masking caused by aECM condensation and protein misfolding. This fragment seems to be key for interacting with Dpy, since Pio antibody signals always colocalize with Dpy-YFP. The N-terminal mCh-Pio fragment does not appear to play a significant role in forming a complex with Dpy in WT (but still aggregated together in Papss mutants), and this can be tested in future studies.

      In response to Reviewer 1’s comment, we performed an additional experiment to test the role of Np in cleaving Pio to help organize the SG aECM. In this experiment, we overexpressed the WT and mutant forms of Np using UAS-Np.WT and UAS-Np.S990A lines (Drees et al., 2019) and analyzed mCh-Pio, Pio antibody, and Dpy-YFP signals. Np.WT overexpression resulted in increased levels of mCh-Pio, Pio, and Dpy-YFP signals in the lumen and at the apical membrane. However, overexpression of Np.S990A resulted in the absence of luminal mCh-Pio signals. Pio antibody signals were strong at the apical membrane but rather weak in the luminal filamentous structures. Since the UAS-Np.S990A line has the GFP tag, we could not reliably analyze Dpy-YFP signals due to overlapping Np.S990A.GFP signals in the same channel. However, the luminal filamentous Pio signals co-localized with GFP signals, and we assume that these overlapping signals could be Dpy-YFP signals.

      These results suggest that overexpressed Np.S990A may act in a dominant-negative manner, competing with endogenous Np and impairing proper cleavage of Pio (and mCh-Pio). Nevertheless, some level of cleavage by endogenous Np still appears to occur, as indicated by the residual luminal filamentous Pio signals. These new findings have been incorporated into the revised manuscript and are shown in Figure 6H and 6I. 

      A proposed model of the Pio-Dpy aECM in WT, Papss, pio, and Np mutants has now been included in Figure 7.

      -  What does the WGA staining in the lumen reveal? This staining seems to be affected differently in pio and dpy mutants: in pio mutants it disappears from the lumen (as dpy-YFP does), but in dpy mutants it seems to be maintained. How do the authors interpret these findings? How does the WGA matrix relate to sulfated products (using Alcian blue or sulfotyrosine)?

      WGA binds to sialic acid and N-acetylglucosamine (GlcNAc) residues on glycoproteins and glycolipids. GlcNAc is a key component of the glycosaminoglycan (GAG) chains that are covalently attached to the core protein of a proteoglycan, which is abundant in the ECM. We think WGA detects GlcNAc residues in the components of the aECM, including Dpy as a core component, based on the following data. 1) WGA and Dpy colocalize in the lumen, both in WT (as thin filamentous structures) and Papss mutant background (as condensed rod-like structures), and 2) are absent in pio mutants. WGA signals are still present in a highly condensed form in dpy mutants. That’s probably because the dpy mutant allele (dpyov1) has an insertion of a transposable element (blood element) into intron 11 and this insertion may have caused the Dpy protein to misfold and condense. We added the information about the dpy allele to the Results section and discussed it in the Discussion.

      Minor points:

      - The morphological phenotypic analysis of Papss mutants (homozygous and transheterozygous) is a bit confusing. The general defects are higher in Papss homozygous than in transheterozygotes over a deficiency. Maybe quantifying the defects in the heterozygote embryos in the Papss mutant collection could help to figure out whether these defects relate to Papss mutation.

      We analyzed the morphology of heterozygous Papss mutant embryos. They were all normal. The data and quantifications have now been added to Figure 1-figure supplement 3. 

      - The conclusion that the apical membrane is affected in Papss mutants is not strongly supported by the results presented with the pattern of Crb (Fig 2). Further evidence should be provided. Maybe the TEM analysis could help to support this conclusion.

      We quantified Crb levels in the sub-apical and medial regions of the cell and included this new quantification in Figure 2D. TEM images showed variation in the irregularity of the apical membrane, even in WT, and we could not draw a solid conclusion from these images.

      - It is difficult to understand why in Papss mutants the levels of WGA increase. Can the authors elaborate on this?

      We think that when Dpy (and many other aECM components) are condensed and aggregated into the thin, rod-like structure in Papss mutants, the sugar residues attached to them must also be concentrated and shown as increased WGA signals.   

      - The explanation about why Pio antibody and mcherry-Pio show different patterns is not clear. If the antibody recognizes the C-t region, shouldn't it be clearly found at the membrane rather than the lumen?

      The Pio protein is also cleaved by furin protease (Figure 5B). We think the Pio fragment recognized by the antibody should be a “C-terminal ZP domain”, which is a middle piece after furin + Np cleavages. 

      - The qsm information does not seem to provide any relevant information to the aECM, or sulfation.

      Since Qsm has been shown to bind to Dpy and remodel Dpy filaments in the muscle tendon (Chu and Hayashi, 2021), we believe that the different behavior of Qsm in the SG is still informative. As mentioned briefly in the Discussion, the cleaved Qsm fragment may localize differently, like Pio, and future work will need to test this. We have shortened the description of the Qsm localization in the manuscript and moved the details to the figure legend of Figure 5-figure supplement 3.

      Reviewer #3 (Significance):

      Previous reports already indicated a role for Papss in sulfation in SG (Zhu et al 2005). Now this work provides a more detailed description of the defects produced by the absence of Papss. In addition, it provides relevant data related to the nature and requirements of the aECM in the SG. Understanding the composition and requirements of aECM during organ formation is an important question. Therefore, this work may be relevant in the fields of cell biology and morphogenesis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this study, the authors identify an insect salivary protein that participates in the initial viral infection of the plant host. They found that a salivary protein, LssaCA, promotes RSV infection by interacting with OsTLP, which can degrade callose in plants. Furthermore, RSV NP binds to LssaCA in salivary glands to form a complex, which then binds to OsTLP to promote callose degradation.

      The story focuses on the tripartite virus-insect vector-plant interaction and is interesting. However, the study is too simple and poorly conducted. The conclusions are also overstated because the findings are not solid.

      We thank the reviewer for their constructive feedback. We have conducted additional experiments to strengthen our results and conclusions as detailed below:

      (1) The comparison between vector inoculation and microinjection involves multiple confounding factors that could affect the experimental results, including salivary components, RSV inoculation titers, and the precision of viral deposition. The differential outcomes could be attributed to these various factors rather than definitively demonstrating the necessity of salivary factors. Therefore, we have removed this comparison from the revised manuscript and instead focused on elucidating the specific mechanisms by which LssaCA facilitates viral infection.

      (2) We conducted new experiments to assess the function of LssaCA enzymatic activity in mediating RSV infection. Additional experiments revealed that OsTLP enzymatic activity is highly pH-dependent, with increased activity as pH decreases from 7.5 to 5.0 (Fig. 3H). However, the LssaCA-OsTLP interaction at pH 7.4 significantly enhanced OsTLP enzymatic activity without requiring pH changes. These results demonstrate that LssaCA-OsTLP protein interactions are crucial for mediating RSV infection. In contrast to pH-dependent mechanisms, our study demonstrated that LssaCA's biological function in mediating RSV infection is at least partially, if not completely, independent of its enzymatic activity. We have added these new results to the revised manuscript (Lines 220-227). We have also added a comprehensive discussion comparing the aphid CA mechanism described by Guo et al. (2023 doi.org/10.1073/pnas.2222040120) with our findings in the revised manuscript (Lines 350-371).

      (3) We have repeated the majority of the callose deposition experiments, providing clearer images (Figures 5-6). In addition to aniline blue staining, we quantified callose concentrations using a plant callose ELISA kit to provide more precise measurements (Figure 5A, I, 6A, C and S8A). We utilized RT-qPCR to measure callose synthase expression in both feeding and non-feeding areas, confirming that callose synthesis was induced specifically in feeding regions, leading to localized callose deposition (Figures 5D-G and S8B-E). For sieve plate visualization, we examined longitudinal sections, which revealed callose deposition in sieve plates during SBPH feeding and RSV infection (Figure S7).

      (4) We generated OsTLP mutant rice seedlings (ostlp) and used this mutant to directly demonstrate that LssaCA mediates callose degradation in planta through enhancement of OsTLP enzymatic activity (Lines 288-302 and Figure 6).

      (5) We produced LssaCA recombinant proteins in sf9 cells to ensure full enzymatic activity and constructed a comprehensive CA mutant protein, in which all seven residues constituting the enzymatic active center were mutated (LssaCA<sup>H111D</sup>, LssaCA<sup>N139H</sup>, LssaCA<sup>H141D</sup>, LssaCA<sup>H143D</sup>, LssaCA<sup>E153H</sup>, LssaCA<sup>H166D</sup>, LssaCA<sup>T253E</sup>) (Fig. S1B). This LssaCA mutant protein demonstrated complete loss of enzymatic activity (Fig. 1C).

      Major comments:

      (1) The key problem is how long LssCA functions in the rice plant. The authors state that LssCA had no effect on initial viral infection, but on infection after viral inoculation. It is unreasonable to conclude that LssCA promoted viral infection based on data showing that insects inoculated plants for just 2 days, yet the viral titer was increased at 14 days post-feeding. How could saliva proteins, which reached the phloem 12-14 days earlier, induce enough TLP to degrade callose and promote virus infection? It was unbelievable.

      We appreciate your insightful comment and acknowledge that our initial description may have been unclear. We agree that salivary proteins would not be present in plant tissues for two weeks post-feeding or post-injection. Our intention was to clarify that when salivary proteins enhance RSV infection, this initial enhancement leads to sustained high viral loads. We measured viral burden at 14 days post-feeding or post-injection because this is the common measurement time point when viral titers are sufficiently high for reliable detection by qRT-PCR or western blotting. We have clarified this rationale in the revised manuscript (Lines 155-157).

      To determine the actual persistence of LssaCA in plant tissues, we conducted additional experiments where insects were allowed to feed on a defined area of rice seedlings for two days. We then monitored LssaCA protein levels at 1 and 3 days after removing the insects. Western blotting analysis revealed that LssaCA protein levels decreased post-feeding but remained detectable at 3 days post-feeding. These results are presented in Figure 2H and described in detail in Lines 184-193.

      (2) Lines 110-116 and Fig. 1, the results of viruliferous insect feeding and microinjection with purified virus could not conclude the saliva factor necessary of RSV infection, because these two tests are not in parallel and comparable. Microinjection with salivary proteins combined with purified virus is comparable with microinjection with purified virus.

      We thank the reviewer for this insightful comment. We agree that “the results of viruliferous insect feeding and microinjection with the purified virus could not conclude the saliva factor necessary of RSV infection”. However, due to the technical difficulty in collecting sufficient quantities of salivary proteins to conduct the microinjection experiment, we have removed these results from the revised manuscript.

      (3) The second problem is how many days after viruliferous insect feeding and microinjection with purified virus the authors measured viral titers. In the Methods section, the authors state that viral titers were detected at 7-14 days post microinjection. Please state the exact days.

      We thank the reviewer for this insightful comment. We typically measured RSV infection levels at both 7 and 14 days post-microinjection. However, since the midrib microinjection experiments have been removed from the revised manuscript, this methodology has also been removed accordingly.

      (4) The last problem is how the authors ensured that the viral titers in the salivary glands of insects were equal between the two experiments, which caused different phenotypes in rice plants. If not, different viral titers in the salivary glands of insects between the two experiments would of course cause different phenotypes in rice plants.

      We thank the reviewer for this comment. When we compared the effects of LssaCA deficiency on RSV infection of rice plants, we also compared the viral titers in the insect saliva and salivary glands. The results indicated that the virus titers in both tissues were not changed by LssaCA deficiency, suggesting that the viruses inoculated into rice phloem by insects of different treatments were comparable. Please refer to the revised manuscript Figures 2D-G and Lines 161-173.

      (5) The callose deposition in phloem can be induced by insect feeding. In Fig. 5H, why was the callose deposition increased in the whole vascular bundle, but not phloem? Could the transgenic rice plant directionally express protein in the phloem? In Fig. 5, why was callose deposition detected at 24 h after insect feeding? In Fig. 6A, why was callose deposition decreased in the phloem, but not all the cells of the TLP OE plant? Also in Fig. 6A and B, expression of callose synthase genes was required.

      We thank the reviewer for these insightful comments.

      (1) Figure 5. The callose deposition increased in multiple cells within the vascular bundle, including sieve tubes, parenchymatic cells, and companion cells. While callose deposition was detected in other parts of the vascular bundle, no significant differences were observed between treatments in these regions, indicating that in response to RSV infection and other treatments, altered callose deposition mainly occurred in phloem cells. Please refer to the revised Figures 5B, 5J, 6B, and 6D.

      (2) Transgenic plant expression. The OsTLP-overexpressing transgenic rice plants express TLP proteins in various cells under the control of CaMV 35S promoter, rather than being directionally expressed in the phloem. However, since TLP proteins are secreted, they are potentially transported and concentrated in the phloem where they can degrade callose.

      (3) Figure 5. The 24-hour time point for callose deposition detection was selected based on established protocols from previous studies. According to Hao et al. (Plant Physiology 2008), callose deposition increased during the first 3 days of planthopper infestation and decreased after 4 days. Additionally, Ellinger and Voigt (Ann Bot 2014) demonstrated that callose visualization typically begins 18-24 hours after treatment, making 24 hours an optimal detection time point.

      (4) Figure 6, Phloem-specific changes. Similar to Figure 5, while callose deposition was detected in other parts of vascular bundle, significant differences between treatments were mainly observed in phloem cells, indicating that RSV infection specifically affects callose deposition in phloem tissue.

      (5) Callose synthase gene expression. We performed RT-qPCR analysis to measure the expression levels of callose synthase genes. The results indicated that OsTLP overexpression did not significantly alter the mRNA levels of these genes, regardless of RSV infection status in SBPH.

      Reviewer #2 (Public Review):

      There is increasing evidence that viruses manipulate vectors and hosts to facilitate transmission. For arthropods, saliva plays an essential role for successful feeding on a host and consequently for arthropod-borne viruses that are transmitted during arthropod feeding on new hosts. This is so because saliva constitutes the interaction interface between arthropod and host and contains many enzymes and effectors that allow feeding on a compatible host by neutralizing host defenses. Therefore, it is not surprising that viruses change saliva composition or use saliva proteins to provoke altered vector-host interactions that are favorable for virus transmission. However, detailed mechanistic analyses are scarce. Here, Zhao and coworkers study transmission of rice stripe virus (RSV) by the planthopper Laodelphax striatellus. RSV infects plants as well as the vector, accumulates in salivary glands and is injected together with saliva into a new host during vector feeding.

      The authors present evidence that a saliva-contained enzyme - carbonic anhydrase (CA) - might facilitate virus infection of rice by interfering with callose deposition, a plant defense response. In vitro pull-down experiments, yeast two hybrid assay and binding affinity assays convincingly show an interaction between CA and a plant thaumatin-like protein (TLP) that degrades callose. Similar experiments show that CA and TLP interact with the RSV nuclear capsid protein NT to form a complex. Formation of the CA-TLP complex increases TLP activity by roughly 30% and integration of NT increases TLP activity further. This correlates with lower callose content in RSV-infected plants and higher virus titer. Further, silencing CA in vectors decreases virus titers in infected plants.

      (1) Interestingly, aphid CA was found to play a role in plant infection with two non-persistent non-circulative viruses, turnip mosaic virus and cucumber mosaic virus (Guo et al. 2023 doi.org/10.1073/pnas.2222040120), but the proposed mode of action is entirely different.

      We appreciate the reviewer’s insightful comment and have carefully examined the cited publication. The study by Guo et al. (2023) elucidates a distinct mechanism for aphid-mediated transmission of non-persistent, non-circulative viruses (turnip mosaic virus and cucumber mosaic virus). In their model, aphid-secreted CA-II in the plant cell apoplast leads to H<sup>+</sup> accumulation and localized acidification. This triggers enhanced vesicle trafficking as a plant defense response, inadvertently facilitating virus translocation from the endomembrane system to the apoplast.

      In contrast to these pH-dependent mechanisms, our study demonstrated that LssaCA’s biological function in mediating RSV infection is, if not completely, at least partially independent of its enzymatic activity. We performed additional experiments to reveal that OsTLP enzymatic activity is highly pH-dependent and exhibits increased enzymatic activity as pH decreases from 7.5 to 5.0 (Fig. 3H); however, the LssaCA-OsTLP interaction occurring at pH 7.4 significantly enhanced OsTLP enzymatic activity without any change in buffer pH (Fig. 3G). These results demonstrate the crucial importance of LssaCA-OsTLP protein interactions, rather than enzymatic activity alone, in mediating RSV infection.

      We have incorporated these new experimental results and added a comprehensive discussion comparing the aphid CA mechanism described by Guo et al. (2023) with our findings in the revised manuscript. Please refer to Figures 3G-H, Lines 220-227 and 350-371 for detailed information.

      (2) While this is an interesting work, there are, in my opinion, some weak points. The microinjection experiments result in much lower virus accumulation in rice than infection by vector inoculation, so their interpretation is difficult.

      We acknowledge the reviewer's concern regarding the lower virus accumulation observed in microinjection experiments compared to vector-mediated inoculation. We have removed these experiments from the revised manuscript. To address the core question raised by these experiments, we have conducted new experiments that directly demonstrate the importance of LssaCA-OsTLP protein-protein interactions in mediating RSV infection. These results demonstrate the crucial importance of LssaCA-OsTLP protein interactions, rather than enzymatic activity alone, in mediating RSV infection. Additionally, we have incorporated a comprehensive discussion examining carbonic anhydrase activity, pH homeostasis, and viral infection. Please refer to the detailed experimental results and discussion in the sections mentioned in our previous response (Figures 3G-H, Lines 220-227 and 350-371).

      (3) Also, the effect of injected recombinant CA protein might fade over time because of degradation or dilution.

      We appreciate the reviewer’s insightful comment. This is indeed a valid concern that could affect the interpretation of microinjection results. To address the temporal dynamics of CA protein presence in planta, we conducted time-course experiments to monitor the retention of naturally SBPH-secreted CA proteins in rice plants. Our analysis at 1 and 3 days post-feeding (dpf) revealed that CA protein levels decreased progressively following SBPH feeding, but could still be detected at 3 dpf (Fig. 2H). Please refer to Figure 2H and Lines 184-193 for detailed information.

      (4) The authors claim that enzymatic activity of CA is not required for its proviral activity. However, this is difficult to assess because all CA mutants used for the corresponding experiments possess residual activity.

      We appreciate the reviewer’s insightful comment. We constructed a comprehensive CA mutant protein in which all seven residues constituting the enzymatic active center were mutated (LssaCA<sup>H111D</sup>, LssaCA<sup>N139H</sup>, LssaCA<sup>H141D</sup>, LssaCA<sup>H143D</sup>, LssaCA<sup>E153H</sup>, LssaCA<sup>H166D</sup>, LssaCA<sup>T253E</sup>) (Fig. S1B). This LssaCA mutant protein demonstrated complete loss of enzymatic activity (Fig. 1C). However, since we have removed the recombinant CA protein microinjection experiments from the revised manuscript, we lack sufficient direct evidence to definitively demonstrate that CA enzymatic activity is dispensable for its proviral function. To address the core question raised by these experiments, we have conducted new experiments that provide direct evidence for the importance of LssaCA-OsTLP protein-protein interactions in mediating RSV infection. Additionally, we have incorporated a comprehensive discussion examining carbonic anhydrase activity, pH homeostasis, and viral infection. Please refer to the detailed experimental results and discussion in the sections mentioned in our previous response (Figures 3G-H, Lines 220-227 and 350-371).

      (5) It remains also unclear whether viral infection deregulates CA expression in planthoppers and TLP expression in plants. However, increased CA and TLP levels could alone contribute to reduced callose deposition.

      We compared LssaCA mRNA levels in RSV-free and RSV-infected L. striatellus salivary glands, which indicated that RSV infection does not significantly affect LssaCA expression (Figure 1J). By allowing RSV-free and RSV-infected L. striatellus to feed on rice seedlings, we clarified that RSV infection does not affect TLP expression in plants (Figure 5H).

      Reviewer #1: (Recommendations For The Authors):

      Other comments:

      (1) Most data proving viral infection and LssaCA expression were derived from qPCR assays. Western blot data are strongly required to prove the change at the protein level.

      We agree that western blot data are required to prove the change at the protein level. In the revised manuscript, we have added western-blotting results (Figures 1F, 1I, 2C, 2J, and S6).

      (2) Line 145, data that LssaCA was significantly downregulated should be shown.

      Thank you. The data have been added to the revised manuscript. Please refer to Line 165 and Figure 2D.

      (3) Lines 159-161, how did the authors ensure that the dose of recombinant LssaCA was close to the level released during insect feeding, rather than excessive? How did the authors exclude the possibility that the increased RSV titer was caused by excessive recombinant LssaCA?

      We appreciate this important concern regarding dosage controls. Microinjection of recombinant proteins typically yields viral infection levels significantly lower than those achieved through natural insect feeding, so higher protein concentrations are often required to achieve high viral infection levels. In this experiment, we compared RSV infection levels following microinjection of BSA+RSV versus LssaCA+RSV, with the expectation that any observed upregulation in RSV titer would be specifically attributable to recombinant LssaCA rather than excessive protein dosing. However, given the low RSV infection levels observed with viral microinjection, we have removed the corresponding results from the revised manuscript.

      (4) Lines 124-125, recombinantly expressed LssaCA protein should be underlined, but not the LssaCA protein itself.

      We have clearly distinguished recombinantly expressed LssaCA from endogenous LssaCA protein throughout the manuscript, ensuring that all references to recombinant proteins are properly labeled as such.

      (5) LssaCA expression in salivary glands of viruliferous and nonviruliferous insects is required. LssaCA accumulation in rice plant exposed to viruliferous and nonviruliferous insects is also required.

      We have measured LssaCA mRNA levels in the salivary glands of viruliferous and nonviruliferous insects (Figure 1J), and LssaCA protein levels in rice plants exposed to viruliferous and nonviruliferous insects (Figure 1I).

      (6) Fig. 4G, the enzymatic activities of OsTLP were too low compared with that in Fig. 4E and Fig. 7E. Why did the enzymatic activities of the same protein show so obvious difference?

      We apologize for the error in Fig. 4G. The original data presented relative fold changes between OsTLP+BSA and OsTLP+LssaCA treatment, with OsTLP+BSA normalized to 1.0 and OsTLP+LssaCA values expressed as fold changes relative to this baseline. However, the Y-axis was incorrectly labeled as “β-1,3-glucanase (units mg<sup>-1</sup>)”, which suggested absolute enzymatic activity values. We have now corrected the figure (revised Figure 3G) to display the actual absolute enzymatic activity values with the appropriate Y-axis label “β-1,3-glucanase (units mg<sup>-1</sup>)”.
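
      The distinction between the two ways of reporting the same measurement can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and all numeric values are hypothetical placeholders, chosen only to show why a fold-change axis must not carry an absolute-unit label.

```python
def specific_activity(total_units: float, protein_mg: float) -> float:
    """Absolute specific activity, in units per milligram of protein."""
    if protein_mg <= 0:
        raise ValueError("protein mass must be positive")
    return total_units / protein_mg


def fold_change(treatment: float, baseline: float) -> float:
    """Relative activity: treatment expressed as a multiple of the baseline."""
    if baseline == 0:
        raise ValueError("baseline activity must be non-zero")
    return treatment / baseline


# Hypothetical placeholder values, NOT data from the manuscript.
baseline_activity = specific_activity(total_units=3.0, protein_mg=2.0)   # e.g. OsTLP + BSA
treated_activity = specific_activity(total_units=6.0, protein_mg=2.0)    # e.g. OsTLP + LssaCA

# Absolute values carry units (units mg^-1); the fold change is dimensionless,
# and the baseline is 1.0 by construction.
relative_baseline = fold_change(baseline_activity, baseline_activity)
relative_treated = fold_change(treated_activity, baseline_activity)
```

      Plotting `relative_treated` on an axis labeled in units mg<sup>-1</sup> would make the baseline appear to have an activity of 1.0 units mg<sup>-1</sup>, which is the kind of mismatch described in the response above.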

      (7) Fig. 7E, was the LssaCA + NP and LssaCA + GST quantified?

      Yes, all proteins were quantified, and enzymatic activity values were calculated and expressed as units per milligram of proteins (units mg<sup>-1</sup>).

      Minor comments:

      (1) The keywords: In fact, the LssaCA functioned during initial viral infection in plant, but not viral horizontal transmission.

      We appreciate the reviewer’s insightful comment. We have revised the manuscript title to “Rice stripe virus utilizes an Laodelphax striatellus salivary carbonic anhydrase to facilitate plant infection by direct molecular interaction” and changed the keyword from “viral horizontal transmission” to “viral infection of plant”.

      (2) Fig. 2A, how about testes? Was this data derived from female insects? Fig. 2C, is the saliva collected from nonviruliferous insects? Fig. 2E, what is the control?

      We appreciate the reviewer’s insightful comments.

      (1) Fig. 2A: The data present mean and SD calculated from three independent experiments, with 5 tissue samples per experiment. Since 3<sup>rd</sup> instar nymphs were used for feeding experiments in this study, we also used 3<sup>rd</sup> instar RSV-free nymphs to measure gene expression in guts, salivary glands and fat bodies. R-body represents the remaining body after removing these tissues. Female insects were used to measure gene expression in ovaries, and gene expression in testes was also added. We have added this necessary information to the revised manuscript (please refer to new Figure 1F and Lines 402-403).

      (2) Fig. 2C: Yes, saliva was collected from nonviruliferous insects.

      (3) Fig. 2E: The control consisted of 100 mM PBS, as described in the experimental section (Lines 643-644): “A blank control consisted of 2 mL of 100 mM PBS (pH 7.0) mixed with 1 mL of 3 mM p-NPA.” In the revised manuscript, we recombinantly expressed LssaCA and its mutant proteins in both Sf9 cells and E. coli. Therefore, we have used the mutant proteins as controls to demonstrate specific enzymatic activity. Please refer to Figure 1C, Lines 115-122 and 621-635 for detailed information.

      (3) Some figure labeling appeared unprofessional. For example, "a-RSV", "loading" in Fig. 1, "W-saliva", "G-saliva" in Fig. 2, and so on, the related explanations were absent.

      We appreciate the reviewer’s insightful comments. We have thoroughly reviewed all figures to ensure professional labeling. Specifically, we have:

      (1) Used proper protein names to label western blots and clearly explained the antibodies used for protein detection.

      (2) Provided comprehensive explanations for all abbreviations used in figures within the corresponding figure legends.

      (3) Ensured consistent and clear labeling throughout all figures.

      Please refer to the revised Figures 1-3 for these corrections.

      (4) Lines 83-84, please cite references on callose preventing viral movement. I do not think the present references were relevant.

      We have added a more relevant reference (Yue et al., 2022, Line 82), which revealed that palmitoylated γb promotes virus cell-to-cell movement by interacting with NbREM1 to inhibit callose deposition at plasmodesmata.

      (5) The background of transgenic plants of OsTLP OE should be characterized. And the overexpression of OsTLP should be shown. Which generation of OsTLP OE did authors use?

      The background of the OsTLP OE transgenic plants and the generation used are described in the “Materials and methods” section (Lines 782-786) and mentioned in the main text (Line 214). T<sup>2</sup> lines were selected for further analysis (Line 789).

      (6) Fig. 5A, the blank, which derived from plants without exposure to insect, was absent.

      We appreciate the reviewer’s insightful comments. We have added the non-fed control in the revised Figure 5A-C.

      (7) Fig. 7A, nonviruliferous insects are required to serve as a control.

      Immunofluorescence localization of RSV and LssaCA in uninfected L. striatellus salivary glands has been added to the revised manuscript (Figure S2).

      (8) The manuscript needs English language edit.

      The manuscript has undergone comprehensive English language editing to improve clarity, grammar, and overall readability.

      Reviewer #2 (Recommendations For The Authors):

      (1) The first experiment compares vector inoculation vs microinjection of RSV in tissue. I am not sure that your claim (saliva factors are necessary for inoculation) holds, because the vector injects RSV directly into the phloem, whereas microinjection is less precise and you cannot control where exactly the virus is deposed. However, virus deposited in other tissues than the phloem might not replicate, and indeed you observe, compared to natural vector inoculation, highly reduced virus titers.

      We appreciate the reviewer’s insightful comments. We agree that the comparison between vector inoculation and microinjection involves multiple confounding factors that could affect the experimental results, including salivary components, RSV inoculation titers, and the precision of viral deposition. As the reviewer correctly points out, the differential outcomes could be attributed to these various factors rather than definitively demonstrating the necessity of salivary factors. Therefore, we have removed this comparison from the revised manuscript and instead focused on elucidating the specific mechanisms by which LssaCA facilitates viral infection.

      (2) Next the authors show that a carbonic anhydrase (CA) that they previously detected in saliva is functional and secreted into rice. I assume this is done with non-infected insects, but I did not find the information. Silencing the CA reduces virus titers in inoculated plants at 14 dpi, but not in infected planthoppers. At 1 dpi, there is no difference in RSV titer in plants inoculated with CA silenced planthoppers or control hoppers. To see a direct effect of CA in virus infection, purified virus is injected together with a control protein or recombinant CA into plants. At 14 dpi, there is about double as much virus in the CA-injected plants, but compared to authentic SBPH inoculation, titers are 20,000 times lower. Actually, I believe it is not very likely that the recombinant CA is active or present so long after initial injection.

      We appreciate the reviewer’s insightful comments.

      (1) Our previous study identified the CA proteins from RSV-free insects. We have added this information to the revised manuscript (Line 110).

      (2) We acknowledge the reviewer's concern regarding the lower virus accumulation observed in microinjection experiments compared to vector-mediated inoculation. We have removed these experiments from the revised manuscript and instead focused on elucidating the specific mechanisms by which LssaCA facilitates viral infection.

      (3) We did not intend to suggest that LssaCA proteins persisted for 14 days post-injection. We measured viral titers at 14 days post-feeding or post-injection because this is the common measurement time point when viral titers are sufficiently high for reliable detection by RT-qPCR or western blotting. We have clarified this rationale in the revised manuscript (Lines 155-157). To determine the actual persistence of LssaCA in plant tissues, we monitored LssaCA protein levels at 1 and 3 dpf. Western blotting analysis revealed that LssaCA protein levels decreased post-feeding and remained detectable at 3 dpf. These results are presented in Figure 2H and described in detail in Lines 184-193.

      (3) Then the authors want to know whether CA activity is required for its proviral action and single amino acid mutants covering the putative active CA site are created. The recombinant mutant proteins have 30-70 % reduced activity, but none of them has zero activity. When microinjected together with RSV into plants, RSV replication is similar as injection with wild type CA. Since no knock-out mutant with zero activity is used, it is difficult to judge whether CA activity is unimportant for viral replication, as claim the authors.

      We appreciate the reviewer’s insightful comment. We constructed a comprehensive CA mutant protein in which all seven residues constituting the enzymatic active center were mutated (LssaCA<sup>H111D</sup>, LssaCA<sup>N139H</sup>, LssaCA<sup>H141D</sup>, LssaCA<sup>H143D</sup>, LssaCA<sup>E153H</sup>, LssaCA<sup>H166D</sup>, LssaCA<sup>T253E</sup>) (Fig. S1B). This LssaCA mutant protein demonstrated complete loss of enzymatic activity (Fig. 1C). However, since we have removed the recombinant CA protein microinjection experiments from the revised manuscript, we lack sufficient direct evidence to definitively demonstrate that CA enzymatic activity is dispensable for its proviral function. To address the core question raised by these experiments, we have conducted new experiments that provide direct evidence for the importance of LssaCA-OsTLP protein-protein interactions in mediating RSV infection. Additionally, we have incorporated a comprehensive discussion examining carbonic anhydrase activity, pH homeostasis, and viral infection. Please refer to the detailed experimental results and discussion in the sections mentioned in our previous response (Figures 3G-H, Lines 220-227 and 350-371).

      (4) Next a yeast two hybrid assay reveals interaction with a thaumatin-like rice protein (TLP). It would be nice to know whether you detected other interacting proteins as well. The interaction is confirmed by pulldown and binding affinity assay using recombinant proteins. The kD is in favor of a rather weak interaction between the two proteins.

      We have added a list of rice proteins that potentially interact with LssaCA (Table S1) and have measured interactions with additional proteins (unpublished data). Despite the relatively weak binding affinity, the functional significance of the LssaCA-OsTLP interaction in enhancing TLP enzymatic activity is substantial.

      (5) Then the glucanase activity of TLP is measured using recombinant TLP-MBP or in vivo expressed TLP. It is not clear to me which TLP is used in Fig. 4G (plant-expressed or bacteria-expressed). If it is plant-expressed TLP, why is its basic activity 10 times lower than in Fig. 4F?

      Fig. 4G is now Fig. 3G in the revised manuscript. An E. coli-expressed TLP protein was used. We apologize for the error in our original Fig. 4G. The original data presented relative fold changes between OsTLP+BSA and OsTLP+LssaCA treatment, with OsTLP+BSA normalized to 1.0 and OsTLP+LssaCA values expressed as fold changes relative to this baseline. However, the Y-axis was incorrectly labeled as “β-1,3-glucanase (units mg<sup>-1</sup>)”, which suggested absolute enzymatic activity values. We have now corrected the figure to display the actual absolute enzymatic activity values with the appropriate Y-axis label “β-1,3-glucanase (units mg<sup>-1</sup>)”.

      (6) There is also a discrepancy in the construction of the transgenic rice plants: did you use TLP without signal peptide or full length TLP? If you used TLP without signal peptide, you should explain why, because the wild type TLP contains a signal peptide.

      We cloned the full-length OsTLP gene including the signal peptide sequence (Line 782 in the revised manuscript).

      (7) The authors find that CA increases glucanase activity of TLP. Next the authors test callose deposition by aniline blue staining. Feeding activity of RSV-infected planthoppers induces more callose deposition than does feeding by uninfected insects. In the image (Fig. 5A) I see blue stain all over the cell walls of xylem and phloem cells. Is this what the authors expect? I would have expected rather a patchy pattern of callose deposition on cell walls. Concerning sieve plates, I cannot discern any in the image; they are easier to visualize in longitudinal sections than in transversal section as presented here.

      We appreciate the reviewer’s insightful comment.

      (1) Callose deposition pattern: While callose deposition was detected in other parts of the vascular bundle, significant differences between treatments were mainly observed in phloem cells, indicating that phloem-specific callose deposition is the primary response to RSV infection and SBPH feeding (Figures 5B and 5J).

      (2) Sieve plate visualization: We have examined longitudinal sections to visualize sieve plates, which revealed callose deposition in sieve plates during SBPH feeding and RSV infection (Figure S7).

      (3) Quantitative analysis: In addition to aniline blue staining, we quantified callose concentrations using a plant callose ELISA kit to provide more precise measurements (Figure 5A, 5I and S8A).

      (4) Gene expression analysis: We utilized RT-qPCR to measure callose synthase expression in both feeding and non-feeding areas, confirming that callose synthesis was induced specifically in feeding regions, leading to localized callose deposition (Figures 5D-H).

      These experimental results collectively demonstrate that RSV infection induces enhanced callose synthesis and deposition, with this response occurring primarily in phloem cells, including sieve plates, within feeding sites and their immediate vicinity.

      (8) I do not quite understand how you quantified callose deposition (arbitrary areas?) with ImageJ. Please indicate in detail the analysis method.

      We have added more detailed information for the methods to quantify callose deposition (Lines 673-678).

      (9) More callose content is also observed by a callose ELISA assay of tissue extracts and supported by increased expression of glucanase synthase genes. Did you look whether expression of TLP is changed by feeding activity and RSV infection? Silencing CA in planthoppers increases callose deposition, which is inline with the observation that CA increases TLP activity.

      We measured OsTLP expression following feeding by RSV-free or RSV-infected SBPH and found that gene expression was not significantly affected by either insect feeding or RSV infection. These results have been added to the revised manuscript (Lines 275-277 and Figure 5H).

      (10) Next, callose is measured after feeding of RSV-infected insects on wild type or TLP-overexpressing rice. Less callose deposition (after 2 days) and more virus (after 14 days) is observed in TLP overexpressors. I am missing a control in this experiment, that is feeding of uninfected insects on wild type or TLP overexpressing rice, where I would expect intermediate callose levels.

      We appreciate the reviewer’s insightful comment and fully agree with the prediction. In the revised manuscript, we have constructed ostlp mutant plants and conducted additional experiments to further clarify how callose deposition is regulated by insect feeding, RSV infection, LssaCA levels, and OsTLP expression. Specifically: 

      (1) Both SBPH feeding and RSV infection induce callose deposition, with RSV-infected insect feeding resulting in significantly higher callose levels compared to RSV-free insect feeding (Fig. 5A-C).

      (2) LssaCA enhances OsTLP enzymatic activity, thereby promoting callose degradation (Fig. 5I-K).

      (3) OsTLP-overexpressing (OE) plants exhibit lower callose levels than wild-type (WT) plants, while ostlp mutant plants show higher callose levels than WT (Fig. 6A-B).

      (4) In ostlp knockout plants, LssaCA no longer affects callose levels, indicating that OsTLP is required for LssaCA-mediated regulation of callose (Fig. 6C-D).

      These additional data address the reviewer’s concern and support the conclusion that OsTLP plays a central role in modulating callose levels in response to RSV infection and insect feeding.

      (11) Next the authors test for interaction between virions and CA. Immunofluorescence shows that RSV and CA colocalize in salivary glands; in my opinion, there is partial and not complete colocalization (Fig. 7A).

      We agree with the reviewer’s observation. CA is primarily produced in the small lobules of the principal salivary glands, while RSV infects nearly all parts of the salivary glands. In regions where RSV and CA colocalize within the principal glands, the CA signal appears sharper than that of RSV, likely due to the relatively higher abundance of CA compared to RSV in these areas. This may explain the partial, rather than complete, colocalization observed in our original Figure 7A. In the revised manuscript, please refer to Figure 1A.

      (12) Pulldown experiments with recombinant RSV NP capsid protein and CA confirm interaction, binding affinity assays indicate rather weak interaction between CA and NP. Likewise in pull-down experiments, interaction between NP, CA and TLP is shown. Finally, in vitro activity assays show that activity of preformed TLP-CA complexes can be increased by adding NP; activity of TLP alone is not shown.

      We performed two independent experiments to confirm the influence on TLP enzymatic activity by LssaCA or by the LssaCA-RSV NP complex. In the first experiment, we compared the enhancement of TLP activity by LssaCA using TLP alone as a control (Figure 3G). In the second experiment examining the LssaCA-RSV NP complex effect on TLP activity, we used the LssaCA-TLP combination as the baseline control rather than TLP alone (Figure 4B), since we had already established the LssaCA enhancement effect in the previous experiment.

      (13) For all microscopic acquisitions, you should indicate the exact acquisition conditions, especially excitation and emission filter settings, kind of camera used and objectives. Use of inadequate filters or of a black & white camera could for example be the reason why you observe a homogeneous cell wall label in the aniline blue staining assays. Counterstaining cell walls with propidium iodide might help distinguish between cell wall and callose label.

      Thank you for your insightful suggestions. We have added the detailed information to the revised manuscript (Lines 656-659 and 673-678).

      (14) You should provide information on whether CA is deregulated in infected planthoppers, as this could also modify its mode of action.

      We have compared LssaCA mRNA levels in RSV-free and RSV-infected L. striatellus salivary glands. The results indicated that RSV infection does not significantly affect LssaCA expression (Figure 1J).

      (15) You should show purity of the proteins used for affinity binding measurements.

      We have included SDS-PAGE results of purified proteins in the revised manuscript (Figure S3).

      (16) L 39: Not all arboviruses are inoculated into the phloem.

      Thank you. We have revised this description (Lines 40, 73, 95 and 97).

      (17) L 76: Watery saliva is also injected in epidermis and mesophyll cells.

      Thank you. We have revised this description (Line 73).

      (18) L 79: What do you mean by "avirulent gene"?

      Thank you for your valuable comments. We have revised this description as “certain salivary effectors may be recognized by plant resistance proteins to induce effector-triggered immunity”. Please refer to Lines 76-77 for detail.

      (19) L 128: Please add delivery method.

      Thank you. We have added the delivery methods (Line 134).

      (20) L 195: Please explain "MST".

      Explained (Line 124). Thank you.

      (21) L 203: Please add the plant species overexpressing TLP.

      Added (Line 214). Thank you.

      (22) L 213: Callose deposition has also a role against phloem-feeding insects.

      We appreciate the reviewer’s insightful comment. We have added this information to the revised manuscript (Line 252).

      (23) L 626: What is a "mutein"?

      “Mutein” is an abbreviation for “mutant protein”. Since the recombinant protein microinjection experiments have been removed from the revised manuscript, the term “mutein” has also been removed. For all other instances, we now use the full term “mutant proteins”.

      (24) Fig. 1E: what is "loading"? You should rather show here and elsewhere (or add to supplement) complete protein gels and Western blot membranes and not only bands of interest.

      Thank you for your valuable suggestion. Although Figure 1E has been removed from the revised manuscript, we have carefully reviewed all figures to ensure that the term “loading” has been replaced with the specific protein names where appropriate.

      (25) Fig. 2C: Please indicate which is the blot and which is the silver stained gel and add mass markers in kDa to the silver stained gel.

      Thank you for your suggestion. We have revised the figure to include labeled silver-stained gels with indicated molecular weight markers (Figure 1H in the revised manuscript).

    1. Reviewer #2 (Public review):

      In this paper, Hamid et al present 40 genomes from the Faroe Islands. They use these data (a pilot study for an anticipated larger-scale sequencing effort) to discuss the population genetic diversity and history of the sample, and the Faroes population. I think this is an overall solid paper; it is overall well-polished and well-written. It is somewhat descriptive (as might be expected for an explorative pilot study), but does make good use of the data.

      The data processing and annotation follow a state-of-the-art protocol, and I could not find any evidence in the results pointing towards bioinformatic issues that would have substantially biased the results. The preliminary results also lead to the identification of some candidate disease alleles, showing that small, isolated cohorts can be an efficient way to find populations with locally common, but globally rare disease alleles.

      I also enjoyed the population structure analysis in the context of ancient samples, which gives some context to the genetic ancestry of Faroese, although it would have been nice if that could have been quantified, and it is unfortunate that the sampling scheme effectively precludes within-Faroes analyses.

      I am unfortunately quite critical of the selection analysis, both on a statistical level and, more importantly, I do not believe it measures what the authors think it does.

      Major comments:

      (1) Admixture timing/genomic scaling/localization: As the authors lay out, the Faroes were likely colonized in the last 1,000-1,500 years, i.e., 40-60 generations ago. That means most genomic processes that have happened on the Faroese should have signatures that are on the order of ~1-2cM, whereas more local patterns likely indicate genetic history predating the colonization of the islands. Yet, the paper seems to be oblivious to this (to me) fascinating and somewhat unique premise. Maybe this thought is wrong, but I think the authors miss a chance here to explain why the reader should care beyond the fact that the small populations might have high-frequency risk alleles and the Faroes are intrinsically interesting, but more importantly, it also makes me think it leads to some misinterpretations in the selection analysis.

      (2) ROH: Would the sampling scheme impact ROH? How would it deal with individuals with known parental coancestry? As an example of what I mean by my previous comment, 1MB is short enough in that I would expect most/many 1MB ROH-tracts to come from pedigree loops predating the colonization of the Faroes. (i.e, I am actually quite surprised that there isn't much more long ROH, which makes me wonder if that would be impacted by the sampling scheme).
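
      The time-scale argument in comments (1) and (2) can be made concrete with a back-of-the-envelope calculation. The sketch below is an illustration under stated assumptions, not part of the reviewed analysis: it assumes a 25-year generation time (consistent with 1,000-1,500 years corresponding to 40-60 generations), the textbook approximation that ancestry tracts from a single admixture pulse g generations ago have a mean length of roughly 100/g cM, and that a segment inherited identically from a common ancestor g generations back (2g meioses) has a mean length of roughly 100/(2g) cM. The function names are hypothetical.

```python
def generations_since(years: float, years_per_generation: float = 25.0) -> float:
    """Convert elapsed years to generations (25 years/generation is an assumption)."""
    return years / years_per_generation


def mean_admixture_tract_cm(g: float) -> float:
    """Approximate mean ancestry-tract length (cM) g generations after a single pulse."""
    return 100.0 / g


def mean_autozygous_segment_cm(g: float) -> float:
    """Approximate mean length (cM) of a segment shared via an ancestor g generations back.

    Two lineages of g meioses each separate the haplotypes, hence the factor 2.
    """
    return 100.0 / (2.0 * g)


for years in (1000, 1500):
    g = generations_since(years)
    print(
        f"{years} years ~ {g:.0f} generations: "
        f"admixture tracts ~ {mean_admixture_tract_cm(g):.2f} cM, "
        f"autozygous segments ~ {mean_autozygous_segment_cm(g):.2f} cM"
    )
```

      Under these simplifying assumptions, events dating to the colonization leave signals on the order of one to a few centimorgans, so substantially shorter haplotype patterns would more plausibly reflect history that predates the settlement of the islands.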

      (3) Selection scan:

      We are talking about a bottlenecked population that is recently admixed (Faroese), compared to a population (GBR) putatively more closely related to one of its sources. My guess would be that selection in such a scenario would be possibly very hard to detect, and even then, selection signals might not differentiate selection in Faroese vs. GBR, but rather selection/allele frequency differences between different source populations. I think it would be good to spell out why XP-EHH/iHS measures selection at the correct time scale, and how/if these statistics are expected to behave differently in an admixed population.

      (4) Similarly, for the discussion of LCT, I am not convinced that the haplotypes depicted here are on the right scale to reflect processes happening on the Faroes. Given the admixture/population history, it at the very least should be discussed in the context of whether the 13910 allele frequency on the Faroes is at odds with what would be expected based on the admixture sources.

      (5) I am lacking information to evaluate the procedure for turning the outliers into p-values. Both iHS and XP-EHH are ratio statistics, meaning they might be heavy-tailed if one is not careful, and the central limit theorem may not apply. It would be much easier (and probably sufficient for the points being made here) to reframe this analysis in terms of empirical outliers.

      (6) Oldest individual predating gene flow: It seems impossible to make any statements based on a single individual. Why is it implausible that this person (or their parents), e.g., moved to the Faroes within their lifetime and died there?
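
      The reframing suggested in comment (5), reporting empirical outliers instead of parametric p-values, can be sketched as follows. This is a minimal, assumption-laden illustration rather than the authors' pipeline: `scores` stands for any list of per-site or per-window statistics (for example standardized iHS or XP-EHH values), and the function names and example numbers are hypothetical.

```python
from bisect import bisect_left


def empirical_tail_fraction(scores):
    """For each score, return the fraction of all scores whose absolute value
    is at least as large. No distributional assumption is made."""
    abs_sorted = sorted(abs(s) for s in scores)
    n = len(abs_sorted)
    return [(n - bisect_left(abs_sorted, abs(s))) / n for s in scores]


def top_outliers(scores, fraction=0.01):
    """Return indices of the most extreme scores (by absolute value),
    keeping the requested fraction of the data and at least one entry."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return ranked[:k]


# Hypothetical toy values, not data from the study.
example_scores = [0.1, -3.0, 0.5, 2.0]
tail = empirical_tail_fraction(example_scores)
extreme = top_outliers(example_scores, fraction=0.5)
```

      Because the ranking depends only on the observed genome-wide distribution, it remains meaningful even if the statistic is heavy-tailed, although an empirical outlier is a candidate for follow-up and not by itself evidence of selection.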

    1. Abstract

      Rice (Oryza sativa) is one of the most important staple food crops worldwide, and its wild relatives serve as an important gene pool in its breeding. Compared with cultivated rice species, African wild rice (Oryza longistaminata) has several advantageous traits, such as increased biomass production, clonal propagation via rhizomes, and resistance to biotic stresses. However, previous O. longistaminata genome assemblies have been hampered by gaps and incompleteness, restricting detailed investigations into their genomes. To streamline breeding endeavors and facilitate functional genomics studies, we generated a 343-Mb telomere-to-telomere (T2T) genome assembly for this species, covering all telomeres and centromeres across the 12 chromosomes. This newly assembled genome is markedly improved over previous versions. Comparative analysis revealed a high degree of synteny with previously published genomes. A large number of structural variations were identified between O. longistaminata and O. sativa. A total of 2,466 segmentally duplicated genes were identified and enriched in cellular amino acid metabolic processes. We detected a slight expansion of some subfamilies of resistance genes and transcription factors. This newly assembled T2T genome of O. longistaminata provides a valuable resource for the exploration and exploitation of beneficial alleles present in wild relative species of cultivated rice.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf074), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1: Francois Sabot

      The manuscript from Guang et al. deals with a T2T assembly for the wild perennial African rice Oryza longistaminata. Using the most up-to-date technologies and approaches, the authors provide a high-quality assembly for this wild species, rendering it a valuable resource for understanding rice evolution. While the assembly results are of high quality, the interpretation of some biological results, in particular about the NBS-LRR genes, is quite odd in my opinion and needs to be refined. That is why I think the manuscript should be published, but after major corrections.

      In detail:

      - Introduction: I am not sure that highlighting the exceptional biomass of O. longistaminata is a good idea, as this plant has a very high silicon content, which makes its biomass complex to use.
      - Methods: We do not have access to most of the command options and command lines. Please provide them, at least as a text file in the supplementary data. In addition, some of the references for tools are missing. Finally, please provide the accession number of the assembled plant.
      - Assembly itself: O. longistaminata is an outcrossing, heterozygous organism. Did you obtain the two haplotypes?
      - Comparison with the previous longistaminata genome: is the inversion in the middle of Chr6 specific, or is it due to an error in the previous assembly?
      - Table 1: what do you mean by "Total size of assembled genomes (bp) 331,045,917"? What is the residual percentage of N?
      - Figure 1 and others: please present the legend in another way; here it may be confused with the main text. In addition, check the legends for spelling and the size of the figures (e.g., 3b) for legibility.
      - SyRI/MUMmer analysis: did you set the minimum size at 1 kb? What was the order of query vs. reference? Can we have a BED file with the positions?
      - SD: is there a statistical link between chromosome size and the number of SDs? It could explain why the first four chromosomes have more SDs. In general, the data are missing statistics.
      - GO in SD: is there any statistical validation?
      - Genome comparison: please provide the accession numbers of the genomes you used for comparison.
      - NBS-LRR: the longistaminata genome has 215 genes, versus 116 to 289 for other Oryza species, so I cannot see any contraction or expansion. In addition, the text here is odd: it starts by speaking of contraction and then moves on to expansion.
      - TF analysis: the African assemblies are quite bad, I think, which explains the discrepancy. For glaberrima, did you check the one from Tranchant-Dubreuil et al., 2023?

    1. Abstract. Background: The central bearded dragon (Pogona vitticeps) is widely distributed in central eastern Australia and adapts readily to captivity. Among other attributes, it is distinctive because it undergoes sex reversal from ZZ genotypic males to phenotypic females at high incubation temperatures. Here, we report an annotated telomere to telomere phased assembly of the genome of a female ZW central bearded dragon. Results: Genome assembly length is 1.75 Gbp with a scaffold N50 of 266.2 Mbp, N90 of 28.1 Mbp, 26 gaps and 42.2% GC content. Most (99.6%) of the reference assembly is scaffolded into 6 macrochromosomes and 10 microchromosomes, including the Z and W microchromosomes, corresponding to the karyotype. The genome assembly exceeds standard recommended by the Earth Biogenome Project (6CQ40): 0.003% collapsed sequence, 0.03% false expansions, 99.8% k-mer completeness, 97.9% complete single copy BUSCO genes and an average of 93.5% of transcriptome data mappable back to the genome assembly. The mitochondrial genome (16,731 bp) and the model rDNA repeat unit (length 9.5 Kbp) were assembled. Male vertebrate sex genes Amh and Amhr2 were discovered as copies in the small non-recombining region of the Z chromosome, absent from the W chromosome. This, coupled with the prior discovery of differential Z and W transcriptional isoform composition arising from pseudoautosomal sex gene Nr5a1, suggests that complex interactions between these genes, their autosomal copies and their resultant transcription factors and intermediaries, determines sex in the bearded dragon. Conclusion: This high-quality assembly will serve as a resource to enable and accelerate research into the unusual reproductive attributes of this species and for comparative studies across the Agamidae and reptiles more generally. Species Taxonomy: Eukaryota; Animalia; Chordata; Reptilia; Squamata; Iguania; Agamidae; Amphibolurinae; Pogona; Pogona vitticeps

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf085), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1: Heiner Kuhl

      Patel et al. present a genome assembly of the bearded dragon Pogona vitticeps, a lizard species that is widely distributed as a pet and known for its interesting sex determination, which may switch from genetic sex determination (ZW) to temperature-dependent sex reversal. The methods chosen to assemble the genome are state of the art, including HiFi and ONT long reads, Hi-C and suitable bioinformatic tools.

      I have to admit that I have recently been reviewing a similar manuscript for GigaScience (https://www.biorxiv.org/content/10.1101/2024.09.05.611321v1), where a female ZZ P. vitticeps had been sequenced/assembled from long-read data of a different nanopore technology and analysis of the ZW chromosomes was done by short-read coverage analysis. One of my major comments was that this approach lacked a true assembly of the W chromosome. Thus, I am happy to see that the assembly of the W-specific region has been achieved here, and the sequencing technologies used might even improve the assembly quality over the ZZ assembly in terms of phasing, consensus accuracy etc. The two manuscripts are highly complementary and I think they should be published, if possible, in the very same issue of GigaScience. Surely both groups have invested a lot of effort. (Reading L. 685, I have just realized that this seems to be the intention of the journal and I very much support this idea.)

      Still there are some minor points that need improvement for the current manuscript:

      Why do you leave the Z and W split into PAR, Z- and W-specific scaffolds rather than assembling the full-length chromosomes (L. 676)? Would the Hi-C data not support that?

      Mitochondrial assembly: from ONT only (L. 307). Please do a consensus correction with Illumina data, or at least show that the MT assembly has a high consensus accuracy (Q40-Q50).

      Genome annotation: show BUSCO scores for annotated proteins (do they fit to BUSCO performed on the whole genome?). If possible, compare to results of the NCBI RefSeq annotation (is it already available?). In this regard please explain the relatively low mapping rates (L. 647) of RNAseq to the annotated sequences.

      Could you provide some expression data for the Z-specific Amh and AmhR2? Is it differentially expressed in testis/ovary (after correction for copy number)?

      Table 1: could you show results for the two different ONT library types (ligation vs. ultralong kit)? It seems the overall yield was low (5 cells -> 100 Gb); any speculation why?

      I think the assembly statistics (Table 2) should also contain contig N50 length as an additional value to show the high continuity of the assembly.

      L. 488: "48.36 (1 error in 146kb)", I think something is wrong here. Q48.36 would be 1 error in 68.5kb. I would suggest to re-check these values and incorporate them in Table2. The high consensus accuracy is one selling point compared to the competitor's assembly.
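      The reviewer's arithmetic can be checked with the standard Phred relation Q = -10 log10(P), where P is the per-base error probability, so one error is expected every 10^(Q/10) bases. The following is a minimal sketch of that conversion only; the function names are illustrative and are not taken from the manuscript or from any assembly-evaluation tool.

```python
import math


def bases_per_error(q: float) -> float:
    """Expected number of bases per single error for a Phred-scaled quality q."""
    return 10 ** (q / 10)


def phred_from_bases_per_error(bases: float) -> float:
    """Phred-scaled quality implied by one error every `bases` bases."""
    return 10 * math.log10(bases)


if __name__ == "__main__":
    # Q48.36 corresponds to one error in about 68.5 kb, as the reviewer states.
    print(f"Q48.36 -> 1 error in {bases_per_error(48.36) / 1000:.1f} kb")
    # Conversely, one error in 146 kb would correspond to a QV near 51.6.
    print(f"1 error in 146 kb -> Q{phred_from_bases_per_error(146_000):.2f}")
```

      Under this relation the two figures quoted from L. 488 cannot both hold, which supports the request to re-check them.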

      L. 490: "Individual haplotypes were 85.5% complete…". Explain why you are confident that the haplotypes are more complete than the Merqury results suggest (just one sentence).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      We would like to thank the reviewers for their overall positive evaluations of our manuscript and for their invaluable suggestions that will allow us to reinforce our conclusions. We acknowledge that there is some work to be done and are ready to address most of the reviewers' comments as detailed in our replies below.

      Reviewer #1

      1. The finding that mmDicer is proviral in bat cells relies exclusively on the observation that the depletion of Dicer in M. myotis cells leads to a reduced accumulation of SFV and SINV at the RNA and protein levels (figure 2). Heterologous expression of mmDicer in HEK 293T NoDice doesn't lead to an increased permissivity to viral infections (figure 1) and the accumulation of Dicer foci is only observed in M. myotis cells but not when mmDicer is expressed in HEK 293 NoDice cells (figure 6). Given that the key finding of this manuscript relies on these knockdown experiments, the authors should ensure that the impact on viral infections is due to the specific silencing of mmDicer and not caused by off-target effects of their siRNA-mediated approach. The authors designed a siRNA pool to efficiently knock down mmDicer. They should validate their findings by using individual Dicer siRNAs and verify whether the decreased SFV/SINV accumulation is observed with at least two individual siRNAs targeting Dicer. It would also strengthen their findings if they could show a complementation experiment in which an mmDicer (designed to not be affected by the siRNA-mediated silencing) is introduced exogenously in Dicer-depleted cells and show that it rescues the observed decrease in viral accumulation to demonstrate that the proviral role is strictly dependent on mmDicer. Alternatively, the authors could consider a CRISPR/Cas9 genome editing approach to knock out Dicer in bat cells to test whether this proviral effect is confirmed.

      Reply: We agree with this reviewer that it is important to provide evidence for the specificity of the knock-down and to rule out any off-target effect of the siRNAs. This is the reason for using the siTool technology, which relies on the use of a pool of 30 siRNAs that are transfected at a final concentration of 3 nM. This means that each individual siRNA in the pool is at a concentration of 0.1 nM, so the possibility of off-target effect is largely avoided and the efficiency of silencing is boosted by the cooperative activity of many siRNAs (see https://www.sitoolsbiotech.com/documents/sipools/siPOOLBrochure2019_Web.pdf for more details). This being said, we agree that it would be better to confirm that the observed effect can be recapitulated using a single siRNA and that a complementation experiment would definitely strengthen our findings. For this reason, we will test two individual siRNAs targeting the 3' UTR of mmDicer, which will allow us to complement the knock-down by transfecting a cDNA construct. Regarding the CRISPR/Cas9 genome editing approach, we will give it a try, but Dicer is notoriously difficult to knock-out, so we cannot be sure that this will be successful.

      Figure 2: the authors knocked down Dicer in M. myotis nasal epithelial cells and carried out infections with SINV-GFP and SFV. The authors conclude that Dicer is proviral as its depletion causes a decrease in SINV-GFP and SFV accumulation. While this conclusion is supported by the decreased viral RNA and protein levels upon Dicer depletion (figure 2D, 2E, 2G), the effect on the viral titers is non-significant for both viruses (Figure 2C and 2F) based on the statistical analysis. This reviewer appreciates that the titers are lower upon Dicer knockdown, which supports the authors' findings at the viral RNA and protein levels. However, as these results are central to the core message of the manuscript, the authors should provide evidence that the proviral effect observed on viral titers is statistically significant, perhaps by providing additional repeats, and/or comment on this observation.

      Reply: Indeed, we agree that even if the effect of Dicer knockdown results in a lowering of the viral titer, it would be better to have a statistically significant effect. We will repeat the experiment to increase the number of replicates and the power of the statistical test.

      a) In figures 4 and 5, the authors nicely show that mmDicer accumulates in cytoplasmic foci in M. myotis cells upon infection with SFV and SINV and that these foci co-localise with double-stranded RNA. The authors used a commercial polyclonal antibody against Dicer (A301-937A, Bethyl according to the Material and Methods section), which is specific to human Dicer, to carry out their immunostaining in bat cells. The authors should provide evidence that this antibody indeed recognises/cross-reacts with mmDicer as well and that the staining shown is indeed specific to mmDicer localisation, especially because the heterologous expression of an HA-tagged version of mmDicer in HEK 293T NoDice cells did not show this accumulation of cytoplasmic foci. The authors should verify the specificity of their mmDicer immunostaining by performing the same labelling in bat cells in which Dicer is knocked down (or knocked out) by individual and validated siRNAs against mmDicer. A decreased signal of bat Dicer staining using the anti-human Dicer antibody would indicate specificity.

      Reply: the reviewer is correct in its assertion and it is important to provide evidence that the protein that is detected by the anti-human Dicer antibody in bat cells is indeed Dicer. We will perform the suggested experiment and do an immunostaining using the Dicer antibody in bat cells upon Dicer knockdown.

      b) Another complementary approach would be to test their Dicer staining between HEK NoDice cells (no Dicer present) versus NoDice complemented with either mmDicer or human Dicer constructs, which would then indicate how much the anti-human Dicer antibody recognises bat Dicer.

      Reply: this complementary approach should yield even cleaner result than the previous one as there will be no expression of Dicer at all in the HEK NoDice cells. Therefore, we should be able to measure the increase of signal in the IF upon expression of either human or bat Dicer. We will perform this experiment together with the other one suggested above. In addition, since the constructs are tagged, we might be able to do a double-staining and verify the colocalization of the two signals.

      c) In addition, the authors should overexpress HA-tagged mmDicer in M. myotis nasal epithelial cells and test whether HA-mmDicer accumulates into foci upon infection using an anti-HA immunostaining. This would confirm that this accumulation into foci is indeed specific to mmDicer but would also reinforce the authors' finding that host factors within bat cells are important for this formation of foci, since mmDicer expression in HEK 293T NoDice cells didn't show this phenotype upon infection (figure 6). OPTIONAL: it would be interesting to overexpress HA-tagged human Dicer in M. myotis nasal epithelial cells as well, to then test using anti-HA staining whether human Dicer, in the presence of host factors from the bat, can accumulate into cytoplasmic foci or not upon viral infection.

      Reply: we could perform the suggested experiment, but we might face the issue that transfected cells might mount an immune response, which makes them resistant to the infection. We have observed indeed that we needed to use a higher MOI to infect cells after they have been transfected. Since we will have controls in place, this might not be too much of a problem, but we will have to keep it in mind. Alternatively, we will perform a lentiviral transduction of the cells.

      This reviewer appreciates that this might be judged as beyond the scope of this study since it is focused on the role of Dicer in M. myotis. However, the observation that mmDicer accumulates into foci containing as well viral dsRNA is very interesting and it would significantly improve the manuscript if the authors would provide further indications that this phenotype is related to the lack of antiviral activity of mmDicer compared to what has been previously shown in other bat species (P.alecto and T. brasiliensis). In other words, is this accumulation of mmDicer into foci responsible for its different impact on virus infection? It would therefore be insightful to compare Dicer localisation upon infection in M. myotis versus P.alecto and/or T. brasiliensis bat cells in which Dicer was shown to be antiviral and test whether this accumulation in foci is only observed in bat cells in which Dicer is proviral (M. myotis) but not in the other bat cells in which Dicer is antiviral (P.alecto and/or T. brasiliensis).

      Reply: this is something that we have been wondering about and we have therefore started to look for the cell lines that have been described in the two published studies. While it proved difficult to find the PaKi cells from P. alecto bats, which is not commercially available, we have obtained the Tblu cells from T. brasiliensis and will look at Dicer localization in this model. However, we have to pay attention to the fact that the published data reported a contribution of RNAi in this cell line upon SARS-CoV-2 infection and that we will be using SINV. In addition, we do not know yet whether the anti-Dicer antibody will cross react with the T. brasiliensis Dicer protein.

      OPTIONAL: Given the difference between the proviral role of mmDicer and the antiviral activity of Dicer in P. alecto and T. brasiliensis bat cells, it would strengthen the authors' findings if additional experiments were conducted in parallel using M. myotis, P. alecto and/or T. brasiliensis cells: notably, knocking down Dicer in M. myotis, P. alecto and/or T. brasiliensis cells, comparing the impact on viral infections with SINV, SFV and VSV, and correlating any observed difference in phenotype with putative variations in the formation of foci.

      Reply: it would indeed be really nice to be able to do the Dicer knockdown experiment in several bat cell lines and to correlate the phenotype with the formation of foci. This experiment might take a long time and we are not sure to be able to realize it in a reasonable amount of time. It could however be the subject of another manuscript further down the line.

      Minor comments

        • Figure 2I: The authors performed a knockdown of Dicer in M. myotis nasal epithelial cells and monitored the impact on VSV-GFP infection. They found that knocking down Dicer leads to an increase in GFP protein and RNA levels, suggesting an antiviral role of Dicer, while, in contrast, no effect is observed on the production of infectious particles (figure 2H). On the western blot there is only a slight/weak increase of GFP protein level observed upon Dicer knockdown. Yet, the quantification of the band intensity shows a 4-fold increase relative to tubulin and compared to cells treated with siRNA control. This 4-fold increase seems exaggerated given the low increase in the intensity shown on the blot. This discrepancy is most likely due to the lower intensity of tubulin in the western blot analysis of siDicer-treated cells compared to siNeg-treated cells. The authors should rerun their western blot with equal amounts of protein extract loaded to ensure that the results shown on the western blot are in line with the quantification.

      Reply: the signal quantification for this experiment was done across several replicates, but we agree that the observed effect seems exaggerated when compared to the signal seen on the blot. We observed important variations between replicates, but we will make sure that this was not due to a problem in the analysis and reload the western blot if needed.

        • Figure 3D: the authors mention that in both HEK293T cells and M. myotis nasal epithelial cells infected with SINV-GFP, there was an enrichment of 22-nucleotide (nt) paired positive- and negative-sense reads that overlapped with a 2-nt overhang, typical of Dicer cleavage. In Figure 3D, the data indeed show that the duplexes are enriched for reads of 22 nt, but it is unclear how this analysis reveals a 3' 2-nt overhang within these duplexes. Can the authors clarify this point? If the data provided in that particular analysis do not allow these overhangs to be detected, please rephrase accordingly or provide additional analysis to support that point.

      Reply: In Figure 3D, the graphs show the probability of pairing of all 22-nucleotide sequences mapping either to the plus or the minus strand of the viral RNA. Thus, for each sequence mapping to the plus strand, the number of sequences mapping to the minus strand with a full or partial overlap is counted. A corresponding probability of pairing and Z score is calculated for each number of overlapping nucleotides (for more information on the calculation see Antoniewski (2014) Computing siRNA and piRNA Overlap Signatures. In Animal Endo-SiRNAs: Methods and Protocols, Werner A (ed) pp 135-146. New York, NY: Springer). The Z score peaks for an overlap of 20 nt in both HEK293T and M. myotis nasal epithelial cells infected with SINV. This means that there is a higher probability of two 22-nt sequences pairing along 20 nt, and thus that there are two unpaired nucleotides at the extremities of the duplexes. This higher Z score at 20 nt is not seen in VSV-infected cells. We will rephrase the text in the manuscript to make this point clearer.
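      The counting step described in this reply can be illustrated with a simplified sketch. It is not the authors' pipeline nor the published implementation by Antoniewski: it only counts plus/minus read pairs for each overlap length and standardises those counts into Z scores, leaving out the pairing-probability calculation. The function name, the coordinate convention (each read represented by the genomic position of its 5' end) and the toy positions are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def overlap_signature(plus_5p, minus_5p, read_len=22):
    """Count plus/minus read pairs per overlap length and convert to Z scores.

    plus_5p:  5' end coordinates of reads on the plus strand (leftmost base).
    minus_5p: 5' end coordinates of reads on the minus strand (rightmost base).
    A plus read starting at p and a minus read whose 5' end lies at p + k - 1
    overlap by exactly k nucleotides.
    """
    plus_counts = Counter(plus_5p)
    minus_counts = Counter(minus_5p)
    pairs = {}
    for k in range(1, read_len + 1):
        pairs[k] = sum(n * minus_counts[p + k - 1] for p, n in plus_counts.items())
    mu = mean(pairs.values())
    sd = pstdev(pairs.values())
    z = {k: (v - mu) / sd if sd else 0.0 for k, v in pairs.items()}
    return pairs, z


if __name__ == "__main__":
    # Toy data (hypothetical): three Dicer-like duplexes with 2-nt 3' overhangs,
    # i.e. 22-nt reads overlapping by 20 nt, plus one unrelated minus-strand read.
    plus_reads = [100, 200, 300]
    minus_reads = [119, 219, 319, 305]
    pairs, z = overlap_signature(plus_reads, minus_reads)
    best = max(z, key=z.get)
    print(f"overlap with highest Z score: {best} nt ({pairs[best]} pairs)")
```

      In this toy input the highest Z score falls at an overlap of 20 nt, which is the pattern the reply interprets as 22-nt duplexes with two unpaired nucleotides at each end.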

        • Typo: page 5, line 152: the authors mention that Dicer knockdown had an antiviral effect against VSV-GFP infection at the RNA and protein levels. However, the data in Figure 2I and 2J show an increase in both GFP RNA and protein levels upon knockdown of Dicer. Although these data suggest that Dicer is antiviral against VSV, the knockdown of Dicer itself is not antiviral but rather proviral, since it increases virus accumulation. Please rephrase this sentence to avoid confusion.

      Reply: thank you for spotting this typo. We have corrected it accordingly.

      Reviewer #2.

      1. Figure 1 relies on transduction of cells and antibiotic selection to obtain mmDicer-expressing cells. Although we would expect that every cell expresses the construct of interest, this is not always the case, depending on the cell type and toxicity of the construct. As the constructs are tagged, I suggest that the authors use flow cytometry to measure expression levels in a single cell manner. While doing so, they can infect with SINV-GFP and correlate GFP signal with construct expression in each cell, providing a more accurate measurement of mmDicer effect on viral infection. Alternatively, the authors could use live microscopy, as done in Fig 2, to obtain similar data.

      Reply: the reviewer is correct that we did not go for monoclonal selection of our mmDicer-expressing cells and therefore that there could be some cell-to-cell variation in expression. However, we have done immunostaining of Dicer in these cells and did not see drastic differences in expression, so we do not think this should impact SINV-GFP expression in a major way. We will provide these images and a quantification of the Dicer signal as a supplementary figure.

      For Fig 1C and 1F, it would be great to have growth curves with two different MOIs, instead of a single time point, to ensure that a putative antiviral effect is not missed. Same goes for Fig 2C, especially when the authors document quite a big defect on GFP expression (a proxy for SINV infection) when Dicer is knocked down (Fig 2B). There may be a bigger difference in titers at earlier time points. This matter runs throughout the manuscript. I do not suggest that the authors should provide growth curves every time viral titers are measured, but it is still worth doing it for the 2-3 key experiments of the paper.

      Reply: we will perform growth curves of virus infection for the key experiments in the manuscript as suggested. We have already done kinetic measurements of GFP accumulation at different MOIs, which we can provide as supplementary data, but we agree with the reviewer that GFP signal should not be used as the only proxy for the infection and that measuring viral titers by plaque assay is important as well.

      Figure 4: could the authors provide proof that the Dicer antibody is specific in the bat context? This can be done by staining Dicer in bat cells knocked down for Dicer and infected with SINV. The appearance of foci upon anti-Dicer antibody staining should be abrogated or severely impaired by the knockdown.

      Reply: see our reply to point 3 of Reviewer 1.

      Fig 5C, please provide a quantification of the images.

      Reply: these microscopy images have not been quantified because they have been obtained with an epifluorescence microscope. Indeed, the Pearson correlation coefficient can only be obtained using a confocal microscope. In fact, we have tried to use a confocal microscope to take pictures of these FISH images, but the SINV gRNA signal was too weak or the dots too small to be properly visualized. Furthermore, there is a very large difference in signal intensity between HEK293T and M. myotis cells, making it difficult to define a signal threshold compatible for both cell lines.

      l.263, when comparing this work with the recent publications on bat antiviral RNAi, the authors could also provide the percentage identity between Dicers from different species.

      Reply: this is a valid point, we have looked at the percentage identity between Dicer proteins from different bat species but we did not include this in our manuscript. We will provide this analysis in the revised version together with a comparison of Dicer from other mammals as a reference point.

      Reviewer 3.

        • Without direct comparison to the other bat species Dicers (especially where RNAi activity has been suggested as antiviral in previous publications) there is little in this paper that can be concluded about global aspects of bat Dicer/RNAi.

      Reply: see our reply to point 4 of Reviewer 1. We are planning to look at least in Tblu cells whether there is also a relocalization of Dicer upon SINV infection. So far, we could not obtain PaKi cells, but we are still looking and should we get those, we will test them as well.

      Minor

      What rules out that the mmDicer re-localization observed in the immortalized mm nasal epithelial is due simply to greater expression levels over the NoDice cells heterologously expressing mmDicer?

      Reply: we will provide an immunoblot to show the level of Dicer expression between HEK NoDice + mmDicer and M. myotis nasal epithelial cells as suggested below to address this point.

      • Although partially addressed in the text stating the generally long half-life of miRNAs, it seems the simplest explanation for this observation is that some activity of a shorter-lived miRNA is required for optimal alphavirus replication in the mm nasal epithelial cells.

      Reply: this is an interesting hypothesis that would prove difficult to test in a reasonable amount of time. We thank the reviewer and will mention this possibility in the discussion of the revised manuscript.

      Suggestions that could enhance the magnitude of conclusions that can be drawn from this work.

      Major

        • Making NoDice cells expressing other bat species Dicers, including those with claims that RNAi is antiviral, would address how universal these current observations are to bats/cell lines.

      Reply: this could be an alternative to the use of P. alecto or T. brasiliensis cell lines that we have mentioned above. We will try to clone Dicer from the Tblu cells that we have in the laboratory. Since we do not have PaKi cells at the moment, it will be more complicated for the Pteropus Dicer, but one possibility could be to synthesize it. However, Dicer is a big gene so it could prove tricky.

        • Including an immunoblot showing that mm cells express mmDicer no more abundantly than the heterologous NoDice cells would allow ruling out the trivial explanation that foci occur at a certain critical mass of Dicer.

      Reply: yes, we will provide this piece of data as stated in reply to point 2.

      Minor

        • I believe line 151, "In contrast, Dicer knock down had an ANTIVIRAL effect against VSV-GFP infection at the RNA and protein levels, but no difference in titers was found (Fig. 2H-J).", should be "In contrast, Dicer knock down had a PROVIRAL effect against VSV-GFP infection at the RNA and protein levels, but no difference in titers was found (Fig. 2H-J)."

      Reply: thank you for spotting this error, which was also mentioned by Reviewer 1, we have corrected this in the text.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Manuscript number: RC-2025-02946

      Corresponding author(s): Margaret, Frame

      Roza, Masalmeh

      1. General Statements [optional]

      We thank the reviewers for recognizing the significance of our work and for their constructive feedback and suggestions, most of which we have implemented in our revised manuscript.

      2. Point-by-point description of the revisions

      Reviewer #1

      Evidence, reproducibility and clarity

      Review of Masalmeh et al. Title: "FAK modulates glioblastoma stem cell energetics..."

      Previous studies have implicated FAK and the related tyrosine kinase PYK2 in glioblastoma growth, cell migration, and invasion. Herein, using a murine stem cell model of glioblastoma, the authors used CRISPR to inactivate FAK, FAK-null cells were selected and cloned, and lentiviral re-expression of murine FAK in the FAK-null cells (termed FAK Rx) was accomplished. FAK-/- cells were shown to possess epithelial characteristics whereas FAK Rx cells expressed mesenchymal markers and showed increased cell migration/invasion in vitro. Comparisons between FAK-/- and FAK Rx cells showed that FAK re-expression increased mitochondrial respiration and amino acid uptake. This was associated with FAK Rx cells exhibiting filamentous mitochondrial morphology (potentially an OXPHOS phenotype) and decreased levels of MTFR1L S235 phosphorylation (implicated in mito morphology fragmentation). The mito and epithelial cell morphology of FAK-/- cells was reversed by treatment with Rho-kinase inhibitors, which also increased mito metabolism and cell viability. Last, FAK-dependent glioblastoma tumor growth was shown by comparisons of FAK-/- and FAK Rx implantation studies.

      The studies by Masalmeh et al. provide interesting findings associating FAK expression with changes in mitochondrial morphology, energy metabolism, and glutamate uptake. According to the authors' model, FAK expression supports a glioblastoma stem cell-like phenotype in vitro and tumor growth in vivo. What remains unclear is the mechanistic connection to the cell changes and whether or not these are dependent on intrinsic FAK activity or, as the Frame group has previously published, potentially on FAK nuclear localization. The associations with MTFR1L phosphorylation and the effects of Rho kinase inhibition are likely indirect and remind this reviewer of long-ago studies with FAK-null fibroblasts that exhibit epithelial characteristics, still express PYK2, and exhibit elevated RhoA GTPase activity. Some of these phenotypes were linked to changes in RhoGEF and RhoGAP signaling with FAK and/or Pyk2. At a minimum, it would be informative to know whether Pyk2 signaling is relevant for the observed phenotypes and whether the authors can further support their associations with FAK-targeted or FAK-Pyk2-targeted inhibitors or PROTACs.

      Some questions that would enhance potential impact. 1. Cell generation. Please describe the analysis of FAK-/- clones in more detail. The "low viability" phenotype needs further explanation with regard to clonal expansion and growth characteristics?

      Response:

      • We included a better description and a supplementary figure in our revised manuscript to indicate that we have examined several FAK -/- clones and confirmed that our observations were not due to clonal variation; multiple clones displayed similar morphological changes (Figure S1D). We also show that the elongated mesenchymal-like morphology was observed at 48 h after nucleofecting the cells with the FAK‑expressing vector, before beginning G418 selection to enrich for cells expressing FAK (Figure S1C). We also included experiments to acutely modulate FAK signalling (detaching and seeding cells on fibronectin) (Figure S2D, E, F and Figure S3) to exclude the possibility that the profound effects are due to protocols/selection we used for generating FAK-deleted cells.
      • Regarding the term "low viability", we have clarified in the text that there is no significant difference in cell number (Figure S1A) or 'cell viability' when it is assessed by trypan blue exclusion (a non-mitochondria-dependent read-out) (Figure S1B) between FAK-expressing FAK Rx and FAK-/- cells cultured for three days under normal conditions. Therefore, we agree the term 'cell viability' in this context could be confusing and have replaced "cell viability" with "metabolic activity as measured by Alamar Blue" in Figure 1D and Figure 5B, and in the corresponding text of the original manuscript. This wording more accurately reflects the data.

      Figure 1F: need further support of MET change upon FAK KO and EMT reversion.

      Response: We have added a heatmap (Figure S1E) illustrating the changes in protein expression of core-enriched EMT/MET gene products (by proteomics) after FAK gene deletion (EMT genes as defined in Howe et al., 2018); this strengthens the conclusion that the MET reversion morphological phenotype is accompanied by recognised MET protein changes.

      Fig. 2: Need further support if FAK effects impact glycolysis or oxidative phosphorylation in particular as implicated by the stem cell model.

      Response: We show that FAK impacts both glycolysis (Figure 2A, 2E and 2F) and mitochondrial oxidative phosphorylation, the latter on the basis of the oxygen consumption rate (OCR) (Figure 2B and 2D), indicating that both pathways contribute to FAK-dependent energy production. We have clarified this in the text.

      Is there a combinatorial potential between FAKi and chemotherapies used for glioblastoma? Need to build upon past studies.

      Response: Yes, previous studies suggest that inhibiting FAK can sensitize GBM cells to chemotherapy (Golubovskaya et al., 2012; Ortiz-Rivera et al., 2023). We have included a paragraph in the discussion section to make sure this is clearer. Although it is not the subject of this study, we appreciate it is useful context.

      The notation of changes in glucose transporter expression should be followed up with regard to the potential that FAK-expressing cells may have different uptake of carbon sources and other amino acids. Altered uptake could be one potential explanation for increased glycolysis and glutamine flux.

      Response: We agree with the reviewer that glucose uptake could be contributing, and we include data showing that two glucose transporters are indeed FAK-regulated, namely Glucose transporter 1 (GLUT1, encoded by the Slc2a1 gene) and Glucose transporter 3 (GLUT3, encoded by the Slc2a3 gene) (shown in Figure S2B and C).

      It would be helpful to support the confocal microscopy of mitos with EM.

      Response:

      We are concerned, based on our experience, that electron microscopy (EM) may introduce artefacts during sample preparation. In contrast, immunofluorescence sample preparation is less susceptible to artefacts. The SORA system we used is not a conventional point-scanning confocal microscope, but a super-resolution module based on a spinning disk confocal platform (CSU-W1; Yokogawa) that uses optical pixel reassignment with confocal detection. This method enhances resolution in all dimensions, with resolution in our samples measured at 120 nm. This has been instructive in defining a new level of changes in mitochondrial morphology upon FAK gene deletion.

      Lack of FAK expression with increased MTFR1 phosphorylation is difficult to interpret.

      Response: We do not directly show that this phosphorylation event is causal in our experiments; however, we think it is important to document this change, since phosphorylation of MTFR1 has been causally linked, in other systems, to the mitochondrial morphology we observed (Tilokani et al., 2022).

      Need to have better support between loss of FAK and the increase in Rho signaling. Use of Rho kinase inhibitors is very limited and the context to FAK (and/or Pyk2) remains unclear. Past studies have linked integrin adhesion to ECM as a linkage between FAK activation and the transient inhibition of RhoA GTP binding. Are integrin signaling and FAK involved in the cell and metabolism phenotypes in this new model?

      Response: To better support the antagonistic effect of FAK on Rho-kinase (ROCK) signalling, we included a new experiment in which the integrin-FAK signalling pathway was disrupted by treating FAK WT cells with Accutase, an agent that causes detachment from the substratum, and growing the cells in suspension in laminin-free medium. We present ROCK activity data, as judged by phosphorylated MLC2 at serine 19 (pMLC2 S19), relating this to induced FAK phosphorylation at Y397 (a surrogate for FAK activity) that is suppressed after integrin disengagement. These measurements have been compared with conditions whereby integrin-FAK signalling is activated by growing the cells on laminin-coated surfaces. We observed a time-dependent decrease in pFAK(Y397) levels (normalised to total FAK) in suspended cells compared to those spread on laminin, while pMLC2(S19) levels increased in a reciprocal manner over time in detached cells relative to spread cells (Figure S4A and B). There is therefore an inverse relationship between integrin-FAK signalling and ROCK-MLC2 activity, consistent with findings from FAK gene deletion experiments. In the former case, we do not rely on gene deletion cell clones.

      Significance

      The studies by Masalmeh provide interesting findings associating FAK expression with changes in mitochondrial morphology, energy metabolism, and glutamate uptake. According to the authors' model, FAK expression is supporting a glioblastoma stem cell-like phenotype in vitro and tumor growth in vivo. What remains unclear is the mechanistic connection to cell changes and whether or not these are dependent on intrinsic FAK activity or, as the Frame group has previously published, potentially FAK nuclear localization. The associations with MTFR1L phosphorylation and effects by Rho kinase inhibition are likely indirect and remind this reviewer of long-ago studies with FAK-null fibroblasts that exhibit epithelial characteristics, still express PYK2, and exhibit elevated RhoA GTPase activity. Some of these phenotypes were linked to changes in RhoGEF and RhoGAP signaling with FAK and/or Pyk2. At a minimum, it would be informative to know whether Pyk2 signaling is relevant for observed phenotypes and whether the authors can further support their associations with FAK-targeted or FAK-Pyk2-targeted inhibitors or PROTACs.

      Response:

      Deleting the gene encoding FAK in mouse embryonic fibroblasts leads to elevated Pyk2 expression (Sieg, 2000). However, in the GBM stem cell model we used here, Pyk2 was not expressed (as determined by both transcriptomics and proteomics). We have included Figure S1F to show that PYK2 expression was undetectable in FAK -/- and FAK Rx cells at the RNA level. We conclude that there is no compensatory increase in Pyk2 upon FAK loss in these cells. In the transformed neural stem cell model of GBM, we do not consistently or robustly detect nuclear FAK.

      Review #2

      Masalmeh and colleagues employ a neural stem/progenitor cell-based glioma model (NPE cells) to investigate the role of Focal Adhesion Kinase (FAK) in GBM, with a focus on potential links between the regulation of morphological/adhesive and metabolic GBM cell properties. For this, the authors employ wt cells alongside newly generated FAK-KO and -reexpressing cells, as well as pharmacological interventions to probe the relevance of specific signaling pathways. The authors' main claims are that FAK crucially modulates glioma cell morphology, cell-cell and cell-substrate interactions and motility, as well as their metabolism, and that these effects translate to changes to relevant in vivo properties such as invasion and tumor growth.

      My main issues are with the model chosen by the authors.

      As per the methods section, generation of FAK-KO and -"Rx" NPE cells entailed protracted selection/expansion processes, which may have resulted in inadvertent selection for cellular/molecular properties unrelated to the desired one (loss or gain of FAK expression) and which may have had cascading effects on NPE cells. The authors nonetheless repeatedly claim the parameters they quantify, such as mitochondrial or cytoskeletal properties or metabolic features, to have directly resulted from FAK loss or reintroduction. Examples of such causal inferences are to be found in lines 123, 134/135, 165, 181. Such causal claims are, in my view, unsupported.

      Acute perturbation of FAK expression/activity, genetically or pharmacologically, followed by a rapid assessment of the processes under investigation, would be needed to begin to assess causality, even if acute genetic perturbations may be technically challenging as sufficient gene expression reduction or restoration to physiologically relevant levels may be hard to achieve.

      Response:

      We would first like to comment on the model we used here, which we think will clarify the validity of our approach. The model is a transformed stem cell model of GBM that was published in Gangoso et al. (Cell, 2021) and is now used regularly in the GBM field. As mentioned in the response to Reviewer 1, we have added text (pages 4 and 5 in the revised manuscript) and a new supplementary figure (Figure S1D) clarifying that the morphological changes we observed were consistent across multiple FAK -/- clones, showing this was not due to any inter-clonal variability. We also added images showing that the morphological changes were apparent at 48 h after nucleofecting FAK -/- cells with the FAK‑expressing vector specifically (not the empty vector), prior to starting G418 selection to enrich for FAK‑expressing cells (Figure S1C), addressing the worry that clonal variation and selection were the cause of the FAK-dependent phenotypes we observed. We believe that our model provides a well-controlled, clean genetic cancer cell system of a type that is commonly used in cancer cell biology, allowing us to attribute phenotypes to individual proteins.

      We have also carried out a more acute treatment by using the FAK inhibitor VS4718 to perturb FAK kinase activity and assessed the effects on glycolysis and glutamine oxidation after 48 h of treatment (Figure S2D, E and F). We found that treating the transformed neural stem cells (parental population) with FAK inhibitor (300 nM VS4718) decreases glucose incorporation into glycolysis intermediates and glutamine incorporation into TCA cycle intermediates, consistent with a role for FAK's kinase activity in maintaining glycolysis and glutamine oxidation.

      The employed pharmacological modulation of ROCK activity is the only approach that, given the presumably acute nature of the treatment, may have allowed the authors to probe the proposed functional links. The methods section of the manuscript does not however comprise details as to the duration of these treatments, which leaves open the possibility of long-term treatment having been carried out (data shown in Figure 5B refers to 72hr treatment).

      Response:

      We have added the duration of the treatment to the Methods section and Figure Legends, to clarify that cells were treated with ROCK inhibitors for 24 h before assessing the effects on mitochondria (Figure 4C, D, S4C and D) and glutamine oxidation (Figure 5A and S5). For metabolic activity by AlamarBlue assay, cells were treated with ROCK inhibitors for 72 h (Figure 5B).

      Even in the case of ROCK inhibitor experiments, it is however unclear if and how the effects on cell morphology and adhesion, mitochondrial organization and metabolic activity may be connected to each other and, if at all, to FAK expression.

      Given the above uncertainties due to the nature of the model and experimental approaches, it is hard to assess the reliability and thus the relevance of the findings.

      Response:

      FAK suppresses ROCK activity (as judged by pMLC2 S19, Figure 4A and B). Treating FAK -/- cells with two different ROCK inhibitors restored mesenchymal-like cell morphology, mitochondrial morphology and glutamine oxidation. As mentioned above, to strengthen our evidence for the antagonistic role of FAK in ROCK-MLC2 signalling, we have now introduced an experiment in which integrin-FAK signalling was disrupted by treating the cells with a detachment agent (Accutase) and subsequently maintaining them in suspension in laminin-free medium. We assessed pMLC2 S19 levels (a measure of ROCK activity), relating this to FAK phosphorylation that is suppressed after integrin disengagement. These results were evaluated relative to spread wild-type cells growing on laminin, where integrin-FAK signalling was active (Figure S4A and B). We observed an inverse relationship between integrin-FAK signalling and ROCK-MLC2 activity, in keeping with our conclusions (Figure 4A and B).

      Experimental support for the ability of cell-substrate interaction modulation to concomitantly impact cellular metabolism and motility/invasion would be significant both in terms of advancing our understanding of glioma cell biology and of its translational potential, but the evidence being provided is at best compatible with the proposed model.

      Response: We carried out a new experiment to support the ability of cell-substrate interaction modulation to impact metabolism; specifically, we inhibited cell-substrate interactions by plating the cells on Poly-2-hydroxyethyl methacrylate (Poly 2-HEMA)-coated dishes. This suppressed FAK phosphorylation at Y397, as expected, with concomitant reduction in glutamine utilisation in the TCA cycle (Figure S3A, B and C).

      My background/expertise is in developmental and adult neurogenesis, in vivo modelling of gliomagenesis and cell fate control/reprogramming, with a focus on molecular mechanisms of differentiation and quantitative aspects of lineage dynamics; molecular details of the control of cellular metabolism, cell-cell adhesion and cytoskeletal dynamics are not core expertise of mine.

      We appreciate that this reviewer's expertise is not necessarily in the cancer cell biology and genetic intervention aspects of our study. We hope that the explanations we have provided satisfy the reviewer that our conclusions are valid.

    2. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Manuscript number: RC-2025-02946

      Corresponding author(s): Margaret Frame, Roza Masalmeh

      1. General Statements

      We thank the reviewers for recognizing the significance of our work and for their constructive feedback and suggestions, most of which we have implemented in our revised manuscript.

      2. Point-by-point description of the revisions

      Reviewer #1

      Evidence, reproducibility and clarity

      Review of Masalmeh et al. Title: "FAK modulates glioblastoma stem cell energetics..."

      Previous studies have implicated FAK and the related tyrosine kinase PYK2 in glioblastoma growth, cell migration, and invasion. Herein, using a murine stem cell model of glioblastoma, the authors used CRISPR to inactivate FAK; FAK-null cells were selected and cloned, and lentiviral re-expression of murine FAK in the FAK-null cells (termed FAK Rx) was accomplished. FAK-/- cells were shown to possess epithelial characteristics whereas FAK Rx cells expressed mesenchymal markers and increased cell migration/invasion in vitro. Comparisons between FAK-/- and FAK Rx cells showed that FAK re-expression increased mitochondrial respiration and amino acid uptake. This was associated with FAK Rx cells exhibiting filamentous mitochondrial morphology (potentially an OXPHOS phenotype) and decreased levels of MTFR1L S235 phosphorylation (implicated in mito morphology fragmentation). Mito and epithelial cell morphology of FAK-/- cells was reversed by treatment with Rho-kinase inhibitors that also increased mito metabolism and cell viability. Last, FAK-dependent glioblastoma tumor growth was shown by comparisons of FAK-/- and FAK Rx implantation studies.

      The studies by Masalmeh provide interesting findings associating FAK expression with changes in mitochondrial morphology, energy metabolism, and glutamate uptake. According to the authors model, FAK expression is supporting a glioblastoma stem cell like phenotype in vitro and tumor growth in vivo. What remains unclear is the mechanistic connection to cell changes and whether or not these are be dependent on intrinsic FAK activity or as the Frame group has previously published, potentially FAK nuclear localization. The associations with MTFR1L phosphorylation and effects by Rho kinase inhibition are likely indirect and remind this reviewer of long-ago studies with FAK-null fibroblasts that exhibit epithelial characteristics, still express PYK2, exhibited elevated RhoA GTPase activity. Some of these phenotypes were linked to changes in RhoGEF and RhoGAP signaling with FAK and/or Pyk2. At a minimum, it would be informative to know whether Pyk2 signaling is relevant for observed phenotypes and whether the authors can further support their associations with FAK-targeted or FAK-Pyk2-targeted inhibitors or PROTACs.

      Some questions that would enhance potential impact. 1. Cell generation. Please describe the analysis of FAK-/- clones in more detail. The "low viability" phenotype needs further explanation with regard to clonal expansion and growth characteristics?

      Response:

      • We included a better description and a supplementary figure in our revised manuscript to indicate that we have examined several FAK -/- clones and confirmed that our observations were not due to clonal variation; multiple clones displayed similar morphological changes (Figure S1D). We also show that the elongated mesenchymal-like morphology was observed at 48 h after nucleofecting the cells with the FAK‑expressing vector, before beginning G418 selection to enrich for cells expressing FAK (Figure S1C). We also included experiments to acutely modulate FAK signalling (detaching and seeding cells on fibronectin) (Figure S2D, E, F and Figure S3) to exclude the possibility that the profound effects are due to protocols/selection we used for generating FAK-deleted cells.
      • Regarding the term "low viability", we have clarified in the text that there is no significant difference in cell number (Figure S1A) or 'cell viability' when it is assessed by trypan blue exclusion (a non-mitochondria-dependent read-out) (Figure S1B) between FAK-expressing FAK Rx and FAK-/- cells cultured for three days under normal conditions. Therefore, we agree the term 'cell viability' in this context could be confusing and have replace "cell viability" with "metabolic activity as measured by Alamar Blue." in Figure 1D and Figure 5B, and the corresponding text in the original manuscript. This wording more accurately reflects the data.

      Figure 1F: need further support of MET change upon FAK KO and EMT reversion.

      Response: We have added a heatmap (Figure S1E) illustrating the changes in protein expression of core-enriched EMT/MET genes products (by proteomics) after FAK gene deletion (EMT genes as defined in Howe et al., 2018) ; this strengthens the conclusion that the MET reversion morphological phenotype is accompanied by recognised MET protein changes.

      Fig. 2: Need further support if FAK effects impact glycolysis or oxidative phosphorylation in particular as implicated by the stem cell model.

      Response: We show that FAK impacts both glycolysis (Figure 2A, 2E, and 2F) and mitochondrial oxidative phosphorylation on the basis of the oxygen consumption rate (OCR) (Figure 2B, and 2D), showing both are contributing pathways to FAK-dependent energy production. We have clarified this in the text.

      Is there a combinatorial potential between FAKi and chemotherapies used for glioblastoma. Need to build upon past studies.

      Response: Yes, previous studies suggest that inhibiting FAK can sensitize GBM cells to chemotherapy (Golubovskaya et al., 2012; Ortiz-Rivera et al., 2023). We have included a paragraph in the discussion section to make sure this is clearer. Although it is not the subject of this study, we appreciate it is useful context.

      The notation of changes in glucose transporter expression should be followed up with regard to the potential that FAK-expressing cells may have different uptake of carbon sources and other amino acids. Altered uptake could be one potential explanation for increase glycolysis and glutamine flux.

      Response: We agree with the reviewer that glucose uptake could be contributing and we include data that 2 glucose transporters are indeed FAK-regulated namely Glucose transporter 1 (GLUT1, encoded by Slc2a1 gene) and Glucose transporter 3 (GLUT 3, encoded by Slc2a3 gene) (shown in Figure S2B and C).

      It would be helpful to support the confocal microscopy of mitos with EM.

      Response:

      We are concerned (and in our experience) that Electron microscopy (EM) may introduce artefacts during sample preparation. In contrast, immunofluorescence sample preparation is less susceptible to artefacts. The SORA system we used is not a conventional point-scanning confocal microscope, but is a super-resolution module based on a spinning disk confocal platform (CSU-W1; Yokogawa) using optical pixel reassignment with confocal detection. This method enhances resolution in all dimensions with resolution in our samples measured at 120nm. This has been instructive in defining a new level of changes in mitochondrial morphology upon FAK gene deletion.

      Lack of FAK expression with increased MTFR1 phosphorylation is difficult to interpret.

      Response: We do not directly show that this phosphorylation event is causal in our experiments; however, we think it important to document this change since it has been published that phosphorylation of MTFR1 has been causally linked to the mitochondrial morphology we observed in other systems (Tilokani et al., 2022).

      Need to have better support between loss of FAK and the increase in Rho signaling. Use of Rho kinase inhibitors is very limited and the context to FAK (and or Pyk2) remains unclear. Past studies have linked integrin adhesion to ECM as a linkage between FAK activation and the transient inhibition of RhoA GTP binding. Is integrin signaling and FAK involved in the cell and metabolism phenotypes in this new model?

      Response: To better support the antagonistic effect of FAK on Rho-kinase (ROCK) signalling, we included a new experiment in which the integrin-FAK signalling pathway has been disrupted by treating FAK WT cells with an agent that causes detachment from the substratum, Accutase, and growing the cells in suspension in laminin-free medium. We present ROCK activity data, as judged by phosphorylated MLC2 at serine 19 (pMLC2 S19), relating this to induced FAK phosphorylation at Y397 (a surrogate for FAK activity) that is supressed after integrin disengagement. These measurements have been compared with conditions whereby integrin-FAK signalling is activated by growing the cells on laminin coated surfaces. We observed a time-dependent decrease in pFAK(Y397) levels (normalised to total FAK) in suspended cells compared to those spread on laminin, while pMLC2(S19) levels increased in a reciprocal manner over time in detached cells relative to spread cells (S4A and B). There is therefore an inverse relationship between integrin-FAK signalling and ROCK-MLC2 activity, consistent with findings from FAK gene deletion experiments. In the former case, we do not rely on gene deletion cell clones.

      Significance

      The studies by Masalmeh provide interesting findings associating FAK expression with changes in mitochondrial morphology, energy metabolism, and glutamate uptake. According to the authors model, FAK expression is supporting a glioblastoma stem cell like phenotype in vitro and tumor growth in vivo. What remains unclear is the mechanistic connection to cell changes and whether or not these are be dependent on intrinsic FAK activity or as the Frame group has previously published, potentially FAK nuclear localization. The associations with MTFR1L phosphorylation and effects by Rho kinase inhibition are likely indirect and remind this reviewer of long-ago studies with FAK-null fibroblasts that exhibit epithelial characteristics, still express PYK2, exhibited elevated RhoA GTPase activity. Some of these phenotypes were linked to changes in RhoGEF and RhoGAP signaling with FAK and/or Pyk2. At a minimum, it would be informative to know whether Pyk2 signaling is relevant for observed phenotypes and whether the authors can further support their associations with FAK-targeted or FAK-Pyk2-targeted inhibitors or PROTACs.

      __Response: __

      Deleting the gene encoding FAK in mouse embryonic fibroblasts leads to elevated Pyk2 expression (Sieg, 2000). However, in the GBM stem cell model we used here, Pyk2 was not expressed (determined by both transcriptomics and proteomics). We have included Figure S1E to show that PYK2 expression was undetectable in FAK -/- and FAK Rx cells at the RNA level (Figure S1F). We conclude that there is no compensatory increase in Pyk2 upon FAK loss in these cells. In the transformed neural stem cell model of GBM, we do not consistently or robustly detect nuclear FAK.

      Review #2

      Masalmeh and colleagues employ a neural stem/progenitor cell-based glioma model (NPE cells) to investigate the role of Focal Adhesion Kinase (FAK) in GBM, with a focus on potential links between the regulation of morphological/adhesive and metabolic GBM cell properties. For this, the authors employ wt cells alongside newly generated FAK-KO and -reexpressing cells, as well as pharmacological interventions to probe the relevance of specific signaling pathways. The authors´ main claims are that FAK crucially modulates glioma cell morphology, cell-cell and cell-substrate interactions and motility, as well as their metabolism, and that these effects translate to changes to relevant in vivo properties such as invasion and tumor growth.

      My main issues are with the model chosen by the authors.

      As per the methods section, generation of FAK-KO and -"Rx" NPE cells entailed protracted selection/expansion processes, which may have resulted in inadvertent selection for cellular/molecular properties unrelated to the desired one (loss or gain of FAK expression) and which may have had cascading effects on NPE cells. The authors nonetheless repeatedly claim the parameters they quantify, such as mitochondrial or cytoskeletal properties or metabolic features, to have directly resulted from FAK loss or reintroduction. Examples of such causal inferences are to be found in lines 123, 134/135, 165, 181. Such causal claims are, in my view, unsupported.

      Acute perturbation of FAK expression/activity, genetically or pharmacologically, followed by a rapid assessment of the processes under investigation, would be needed to begin to assess causality, even if acute genetic perturbations may be technically challenging as sufficient gene expression reduction or restoration to physiologically relevant levels may be hard to achieve.

      Response:

      We would like to first comment on the model we used here, which we think will clarify the validity of our approach. The model is a transformed stem cell model of GBM that was published in (Gangoso et al., Cell, 2021) and is now used regularly in the GBM field. As mentioned in the response to Reviewer 1, we have added text (page 4 and 5 in the revised manuscript) and a new supplementary figure (Figure S1D) clarifying that the morphological changes we observed were consistent across multiple FAK -/- clones, showing this was not due to any inter-clonal variability. We also added images showing that the morphological changes were apparent at 48 h after nucleofecting FAK -/- cells with the FAK‑expressing vector specifically (not the empty vector), prior to starting G418 selection to enrich for FAK‑expressing cells (Figure S1C), addressing the worry that clonal variation and selection was the cause of the FAK-dependent phenotypes we observed. We believe that our model provides a type of well controlled, clean genetic cancer cell system of a type that is commonly used in cancer cell biology, allowing us to attribute phenotypes to individual proteins.

      We have also carried out a more acute treatment by using the FAK inhibitor VS4718 to perturb FAK kinase activity and assessed the effects on glycolysis and glutamine oxidation after 48h treatment (Figure S2D, E and F). We found that treating the transformed neural stem cells (parental population) with FAK inhibitor (300nM VS4718) decreases glucose incorporation into glycolysis intermediates and glutamine incorporation into TCA cycle intermediates, consistent with a role for FAK's kinase activity in maintaining glycolysis and glutamine oxidation.

      The employed pharmacological modulation of ROCK activity is the only approach that, given the presumably acute nature of the treatment, may have allowed the authors to probe the proposed functional links. The methods section of the manuscript does not however comprise details as to the duration of these treatments, which leaves open the possibility of long-term treatment having been carried out (data shown in Figure 5B refers to 72hr treatment).

      __Response: __

      We have added the duration of the treatment to the Methods section and Figure Legends, to clarify that cells were treated with ROCK inhibitors for 24h, before assessing the effects on mictochondria (Figure 4C, D, S4C and D) and glutamine oxidation (Figure 5A, and S5). For metabolic activity by AlamarBlue assay, cells were treated with ROCK inhibitors for 72h (Figure 5B).

      Even in the case of ROCK inhibitor experiments, it is however unclear if and how the effects on cell morphology and adhesion, mitochondrial organization and metabolic activity may be connected to each other and, if at all, to FAK expression.

      Given the above uncertainties due to the nature of the model and experimental approaches, it is hard to assess the reliability and thus the relevance of the findings.

      Response:

      FAK suppresses ROCK activity (as judged by pMLC2 S19, Figure 4A and B). Treating FAK -/- cells with two different ROCK inhibitors restored mesenchymal-like cell morphology, mitochondrial morphology and glutamine oxidation. As mentioned above, to strengthen our evidence for the antagonistic role of FAK in ROCK-MLC2 signalling, we have now introduced an experiment whereby integrin-FAK signalling was disrupted through treatment with a detachment agent (Accutase), and subsequently maintaining the cells in suspension in laminin-free medium. We assessed pMLC2 S19 levels (a measure of ROCK activity) relating this to FAK phosphorylation that is supressed after integrin disengagement. These results were evaluated relative to spread wild type cells growing on laminin where Integrin-FAK signalling was active (Figure S4A and B). We observed an inverse relationship between Integrin-FAK signalling and ROCK-MLC2 activity in keeping with our conclusions (Figure 4A and B).

      Experimental support for the ability of cell-substrate interaction modulation to concomitantly impact cellular metabolism and motility/invasion would be significant both in terms of advancing our understanding of glioma cell biology and of its translational potential, but the evidence being provided is at best compatible with the proposed model.

      Response: We carried out a new experiment to support the ability of cell-substrate interaction modulation to impact metabolism; specifically, we inhibited cell-substrate interactions by plating the cells on Poly-2-hydroxyethyl methacrylate (Poly 2-HEMA)-coated dishes. This suppressed FAK phosphorylation at Y397, as expected, with concomitant reduction in glutamine utilisation in the TCA cycle (Figure S3A, B and C).

      My background/expertise is in developmental and adult neurogenesis, in vivo modelling of gliomagenesis and cell fate control/reprogramming, with a focus on molecular mechanisms of differentiation and quantitative aspects of lineage dynamics; molecular details of the control of cellular metabolism, cell-cell adhesion and cytoskeletal dynamics are not core expertise of mine.

      We appreciate that this reviewer's expertise is not necessarily in the cancer cell biology and genetic intervention aspects of our study. We hope that the explanations we have provided satisfy the reviewer that our conclusions are valid.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      RESPONSE TO REVIEWERS

      We thank the reviewers for their thoughtful and constructive feedback, which has been instrumental in improving the overall quality of our manuscript.

      In response, we have undertaken a substantial revision that includes new experimental data, refined analyses, and clearer presentation of our findings. Specifically, we have addressed concerns about RNAi efficiency and protein-level validation, expanded our genetic models to include loss-of-function contexts, and clarified the interpretation of mitochondrial morphology using both confocal and electron microscopy. We also incorporated new data on Cyclin E regulation and mitochondrial membrane potential to strengthen the mechanistic link between dPGC1 depletion and Yki-driven tumorigenesis. These revisions not only address the specific points raised by the reviewers but also enhance the coherence and impact of the study. We are confident that the revised manuscript presents a more robust and compelling case for the role of dPGC1 as a context-dependent tumor suppressor and that it will be of broad interest to the fields of developmental biology, cancer metabolism, and mitochondrial dynamics.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): Sew et al. examine the master regulator of mitochondrial biogenesis, dPGC1, in the context of Drosophila wing and larval development. They primarily use confocal imaging to probe the interplay between dPGC1 and an overactive Hippo pathway, driven by overexpression of the main effector protein, Yki. In their study, they find that tumors driven by overactivity of Yki grow larger when dPGC1 is downregulated, implicating the mitochondrial biogenesis pathway in tumor suppression. Furthermore, in the context of Yki overexpression, they find that levels of Mfn or Opa1 modulate tumor size. Lastly, they show a role of cyclin E in controlling the size of tumors formed by Yki OE + dPGC1 RNAi. The potential role of dPGC1 as a tumor suppressor is interesting because it highlights an emerging recognition of mitochondria in the aetiology of cancer. However, before publication, much of the data in this manuscript should be strengthened by a refinement in the methods/analysis and an increase in orthogonal approaches.

      We addressed concerns regarding RNAi efficiency and wing development by incorporating data from a dPGC1 mutant allele and using a ubiquitous driver for qPCR validation of transgene efficiency. We clarified the rationale for EM use. The manuscript now avoids overinterpretation of mitochondrial morphology and focuses on fusion-specific regulators. We also revised the narrative arc to maintain coherence and added loss-of-function models to support our conclusions.

      Below, we address each of the reviewer’s points in detail.

      Major comments:

      The authors indicate, for example in lines 127-28, that neither downregulating nor overexpressing dPGC1 affects wing size. However, the quantification in Fig. 1C shows a significant decrease in wing size following RNAi treatment. This decrease is modest, but it is nevertheless significant. It is worth pointing out, too, that the efficiency of the RNAi in Fig. S1C suggests that the conclusions drawn are premature. While a roughly 55% drop in mRNA levels may be statistically significant, it is unclear whether this drop in transcripts corresponds to a commensurate depletion of protein. Moreover, it is unclear, in this context, how much dPGC1 may indeed be necessary to drive a relatively normal program of mitochondrial biogenesis in wing development. To obtain a clear result, it is necessary to show significant depletion of the dPGC1 protein. (Ultimately, if it is the case that dPGC1 is unnecessary for wing development and function, a more coherent line of inquiry would be to find out the reason for this rather than to pivot the story to studying tumorigenesis in larva.)

      We agree that the interpretation of the RNAi efficiency data requires clarification.

      The qPCR analysis shown in former Fig. S1C was performed using wing discs from flies expressing UAS-dPGC1-RNAi under the control of the MS1096-Gal4 driver. However, as shown in current Fig. 1C, MS1096-Gal4 is not expressed uniformly across the wing disc. Some regions remain RFP-negative, indicating that the RNAi construct is not active in all cells. As a result, the measured mRNA levels likely underestimate the true knockdown efficiency. This is because the qPCR includes mRNA from both RNAi-expressing and non-expressing cells, diluting the apparent reduction in transcript levels.

      To address this limitation and more accurately assess RNAi efficiency, we repeated the qPCR analysis using a ubiquitous driver (actin-Gal4) to ensure uniform expression of the RNAi construct. Under these conditions, we observed a more substantial knockdown, with dPGC1 mRNA levels reduced to approximately 25% of control levels (this is shown in current Fig S2). This result indicates that the RNAi line is more effective than initially suggested by the MS1096-Gal4-based analysis.
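
The dilution argument above can be made concrete with a simple two-population mixing model. The sketch below is illustrative only: it assumes that expressing and non-expressing cells contribute equal amounts of control-level mRNA and that the knockdown measured with actin-Gal4 (about 25% of control) reflects the per-cell efficiency. The function names are hypothetical and are not taken from the manuscript.

```python
def mixed_transcript_level(expressing_fraction, per_cell_remaining):
    """Bulk mRNA level (relative to control) for a mosaic tissue.

    Cells expressing the RNAi retain `per_cell_remaining` of the transcript;
    non-expressing cells retain the full control level (1.0).
    """
    return expressing_fraction * per_cell_remaining + (1.0 - expressing_fraction)


def expressing_fraction_from_bulk(bulk_remaining, per_cell_remaining):
    """Invert the mixing model to estimate the fraction of RNAi-expressing cells."""
    if not 0.0 <= per_cell_remaining < 1.0:
        raise ValueError("per_cell_remaining must be in [0, 1)")
    return (1.0 - bulk_remaining) / (1.0 - per_cell_remaining)


# Values quoted in the exchange: a ~55% drop with MS1096-Gal4 (0.45 remaining)
# and ~25% of control remaining with the ubiquitous driver.
f = expressing_fraction_from_bulk(0.45, 0.25)
print(round(f, 2))  # implied fraction of cells expressing the RNAi
```

Under these assumptions, the two measurements would be reconciled if roughly three quarters of the cells contributing mRNA to the sample expressed the RNAi; this is an order-of-magnitude illustration rather than a measured value.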

      To complement our RNAi-based analysis, we additionally used a mutant strain carrying a characterized allele of dPGC1 (dPGC1^1, also known as dPGC1^KG08646; see FlyBase: https://flybase.org/reports/FBal0148128). This genetically distinct approach allowed us to validate and strengthen our findings regarding dPGC1 function. Flies homozygous for this allele exhibited a modest but statistically significant reduction in both wing disc and adult wing size. These results support the conclusion that dPGC1 is required for normal wing growth and development. The new data are now included in Figure 1 and referenced in the main text (lines 144-153).

      Additionally, as suggested by the reviewer, we have revised the relevant section to maintain a coherent line of inquiry. The updated text can be found in lines 163–172.

      In Figure 3H-K, it is not clear why the authors used electron microscopy to evaluate mitochondrial morphology. The very good confocal images in Figure 3C-G show a clear change in mitochondrial morphology following the knockdown of Mfn, Opa1, and Miro. While it is clear from the electron micrographs in Figure 3H that the mitochondria are enlarged, it is not obvious that this increase in length is a result of increased mitochondrial fusion. Indeed, if the mean form factor were used to quantify the shape, it is likely that in both conditions the value would be close to 1, indicating more of a round object, and it is not obvious whether there would be a difference between the Yki OE and the Yki OE + dPGC1 RNAi conditions. Therefore, from this data alone, it cannot be concluded that the Yki OE + dPGC1 RNAi condition leads to mitochondrial hyperfusion.
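
For readers unfamiliar with the form factor mentioned in this comment, the sketch below uses the common definition, perimeter squared divided by 4π times the area, which equals 1 for a circle and increases for elongated or branched outlines. The rod-shaped "stadium" outline and its dimensions are arbitrary illustrative values, not measurements from the manuscript.

```python
import math


def form_factor(perimeter, area):
    """Form factor P^2 / (4 * pi * A): 1.0 for a circle, larger when elongated."""
    return perimeter ** 2 / (4.0 * math.pi * area)


def stadium(straight_length, radius):
    """Perimeter and area of a rod with semicircular caps (a stadium shape)."""
    perimeter = 2.0 * straight_length + 2.0 * math.pi * radius
    area = 2.0 * radius * straight_length + math.pi * radius ** 2
    return perimeter, area


# A round profile (straight_length = 0 reduces the stadium to a circle).
round_ff = form_factor(*stadium(0.0, 1.0))

# An elongated profile of the same width (arbitrary units).
rod_ff = form_factor(*stadium(6.0, 1.0))

print(round(round_ff, 3), round(rod_ff, 3))
```

Because uniformly enlarging a profile leaves the form factor unchanged, a swollen but round mitochondrion and a small round one would score the same, which is the reviewer's point: size alone does not distinguish enlargement from elongation.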

      Our rationale for including electron microscopy (EM) was to overcome specific limitations in imaging mitochondrial morphology within the main epithelium of the wing disc, where Yki-driven tumors arise. These tumors were generated using ap-Gal4, which drives expression specifically in the main epithelium and is not active in the peripodial membrane. This is an important distinction, as the peripodial membrane—used in Figures 3C–G—has a squamous architecture and larger cytoplasmic volume, making it ideal for high-resolution confocal imaging and for assessing the effects of manipulating dMfn, Opa1, and miro. However, because ap-Gal4 is not expressed in the peripodial membrane, this tissue could not be used to analyze mitochondrial morphology in the actual tumorous context.

      To directly evaluate mitochondria in the main epithelium, we employed EM, which provides the resolution necessary to visualize ultrastructural changes that are not easily captured by confocal microscopy in this densely packed tissue. While EM does not directly measure fusion events, it allowed us to detect changes in mitochondrial size and shape that support our broader findings.

      We acknowledge that mitochondrial enlargement alone does not definitively demonstrate hyperfusion. However, the EM data were interpreted alongside additional evidence: the upregulation of mitochondrial fusion genes (dMfn and Opa1) in Yki + dPGC1-RNAi tumors, and functional data showing that overexpression of these genes promotes fusion in the peripodial membrane. Together, these findings suggest that dPGC1 depletion enhances mitochondrial fusion in Yki-driven tumors.

      To further clarify this point, we also imaged mitochondria in the main epithelium using confocal microscopy. However, the resolution was considerably lower than that achieved with EM, limiting our ability to assess fine mitochondrial structures. We have prepared a figure for the reviewer (below) showing representative confocal images of wing discs from three genotypes: (A) ap-Gal4, UAS-GFP (control), (B) ap-Gal4, UAS-Yki, and (C) ap-Gal4, UAS-Yki, UAS-dPGC1-RNAi. We used anti-ATP-synthase (Abcam, ab14748, dilution 1:200) to label the mitochondria in this figure. Despite the lower resolution, mitochondria in the Yki + dPGC1-RNAi tumors appear elongated (yellow arrows) compared to those in the other conditions, consistent with the changes observed by EM. We believe this example illustrates the limitations of confocal imaging in this tissue and reinforces the need for EM to accurately assess mitochondrial morphology in the tumorous epithelium.

      While our EM analyses reveal mitochondrial enlargement in wing discs co-expressing Yki and dPGC1-RNAi, we acknowledge that these structural features alone do not conclusively demonstrate mitochondrial hyperfusion. To address this, we have revised the manuscript to avoid overinterpreting the EM data and instead emphasize the functional relevance of mitochondrial fusion regulators such as dMfn and Opa1 in promoting tumor growth.

      Taken together, the EM analysis provides structural validation in the tumorous epithelium (Fig 4), while the confocal imaging and functional manipulation of fusion genes in the peripodial membrane offer mechanistic insight (Fig 3). This integrated approach strengthens the conclusion that dPGC1 depletion in a Yki-overexpressing context promotes changes in mitochondrial morphology and contributes to tumorigenesis, independent of whether these changes reflect hyperfusion.

      Figure 4 refers to changes in mitochondrial fusion and fission in tumor formation; however, the authors do not attempt to alter mitochondrial fission factors, so it is not accurate to mention a role of mitochondrial fission in this context.

      As we did not directly manipulate fission-related factors in our experiments, we agree that it would be inappropriate to draw conclusions about the role of mitochondrial fission in this context. Our revised figure (current Fig 5) and accompanying text now focus exclusively on the effects of mitochondrial fusion and the genes directly involved in regulating this process.

      It must be noted, too, that the authors have not demonstrated that their genetic interventions have actually affected mitochondrial morphology in these experiments. As noted in the previous figure, the Yki OE + dPGC1 RNAi condition showed enlarged mitochondria, but not necessarily hyperfused organelles. Therefore, the downregulation of Mfn or Opa1 in this set of experiments may not necessarily have altered mitochondrial morphology. Perhaps suppression of Mfn or Opa1 would normalize the areas of these evidently swollen mitochondria, but this is unclear without images. Furthermore, it should be appreciated that both Opa1 and Mfn exhibit pleiotropic attributes - e.g., Opa1 not only regulates IMM fusion, but it also modulates the shape and tightness of cristae membranes, specialized sites of oxidative phosphorylation as well as sequestration of cytochrome c, the release of which influences apoptosis (Frezza et al., 2006). At least in mammalian cells, Mfn2 is thought to regulate contacts between mitochondria and endoplasmic reticulum (Naon et al., 2023), which may serve other functions than OMM fusion, such as stabilization of the MAM.

      To directly address this point, we performed EM to assess mitochondrial ultrastructure in Yki + dPGC1-RNAi wing disc tumors, with and without dMfn1 downregulation, the most upregulated mitochondrial fusion gene in this tumor context. In Yki + dPGC1-RNAi tumors, mitochondria appeared more elongated, consistent with increased fusion. Upon dMfn1 depletion, we observed a dramatic shift in mitochondrial morphology: mitochondria became larger and more rounded, with disrupted cristae and onion-like structures, indicative of compromised mitochondrial integrity and function (see current Fig. 4).

      As the reviewer rightly notes, these morphological changes are consistent with the pleiotropic roles of Mfn and Opa1, which extend beyond outer and inner membrane fusion to include regulation of cristae architecture and ER-mitochondria contacts (Frezza et al., 2006; Naon et al., 2023). We now discuss these broader roles in the revised manuscript (lines 493–497). Taken together, our EM and confocal analyses, combined with targeted genetic manipulations, provide evidence that mitochondrial morphology is indeed altered in response to dPGC1 depletion and fusion gene deregulation in the wing disc.

      Figure 5 highlights a connection between dysregulation of mitochondria and Cyclin E, which allows cells to prematurely enter S phase. The data presented here do not offer clarity on whether the enlargement of the tumors results from increased cellular proliferation and/or cell size. The role of the cell cycle adds a layer of complexity to these results, because it is thought that mitochondria undergo fragmentation during the cell cycle to promote an even distribution of the organelle population after mitosis (Taguchi et al., 2007); however, in this manuscript, the authors contend that the downregulation of dPGC1 is promoting mitochondrial hyperfusion. It is unclear how and whether cellular division and proliferation would proceed at an accelerated rate in a situation with mitochondrial hyperfusion.

      To address this point, we started by analyzing whether Yki + dPGC1-RNAi tumors exhibit increased proliferation compared to tumors expressing Yki alone. We quantified mitotic activity using the phospho-Histone H3 (PH3) marker of mitotic cells and observed a significant increase in PH3-positive cells in the Yki + dPGC1-RNAi condition. These results indicate an elevated proliferation rate in these tumors and are now presented in Fig 2O–Q. In the text, they can be found in lines 221-228.

      We agree with the reviewer that our findings challenge the conventional view that mitochondrial fragmentation is a prerequisite for mitosis, as we observe increased expression of genes promoting mitochondrial fusion in the context of dPGC1 downregulation alongside signs of accelerated cell cycle entry. It is important to note that we also show that the levels of the oncogene Cyclin E, a key driver of cell cycle progression and S-phase entry, were elevated in Yki + dPGC1-RNAi tumors compared to those expressing Yki alone, suggesting that the increased proliferation observed is at least in part driven by enhanced cell cycle activity. To further probe Cyclin E’s role, we used the CycE-05306 heterozygous mutant allele, which reduces Cyclin E levels by ~50% without affecting normal development. Notably, this partial reduction strongly suppressed tumor growth in the Yki + dPGC1-RNAi background (Fig 6), underscoring Cyclin E’s functional importance in supporting oncogenic growth in this context.

      These findings support the notion that the changes in the expression of genes controlling mitochondrial morphology induced by dPGC1 depletion do not impair cell division but rather coincide with its acceleration.

      Minor comments:

      Lines 69-72 contrast the roles of PGC1α and β. It is not clear whether the comparison is of their respective roles in cancer or in normal physiology. In either case, it is important to note that PGC1β has been shown to drive mitochondrial fusion as well as biogenesis through its control of MFN2, among other factors (Liesa et al., 2008).

      In response, we have clarified the comparison between PGC1α and PGC1β in the introduction to specify that it refers to their roles in cancer. Additionally, we now acknowledge that PGC1β has been shown to promote mitochondrial biogenesis and fusion, notably through the regulation of MFN2, as demonstrated by Liesa et al. (2008). This reference has been added to provide a more balanced and accurate representation of PGC1β’s functions. In the text it can be found in lines 77-81.

      Although this study focuses on PGC1, the authors do not seem to cite the original literature from the Spiegelman lab.

      In response to the reviewer’s comment, we have added a new section in the introduction that cites key foundational studies from the Spiegelman lab. This addition can be found in the introduction in lines 68-73.

      There are 10-20 grammatical errors throughout the text.

      We apologize for this. We have carefully revised the text, and we are very confident those errors have been corrected.

      **Referee Cross-commenting**

      There is agreement among the referees that the potential role of PGC1 as a tumor suppressor is interesting and significant. However, various aspects of this work require attention prior to publication. For example, there needs to be a complete knock down of PGC1 to come to any conclusion as to its role in wing development. The methods for analyzing mitochondrial morphology need to be clarified and be consistent with standards in the field of mitochondrial dynamics. Also, the authors need to quantify their Western blots to obtain accurate assessments of protein levels. Generally, the study relies too heavily on overexpression experiments; understanding the potential role of mitochondria in regulating the Hippo pathway should include various knockdown and/or knockout models.

      Reviewer #1 (Significance (Required)):

      Overall, the authors show an interesting dampening effect of dPGC1 on growth of Yki-driven tumors. This data could be relevant for elucidating how dysregulation of the Hippo signalling pathway can underlie tumorigenesis.

      The narrative arc of the study, however, appears to lack a focused line of inquiry. Figure 1 highlights an attempt to modulate Drosophila wing size and/or structure by downregulating dPGC1, but to no effect. Although examination of the efficiency of the RNAi revealed that the transcripts were still present in significant quantities; so, the conclusion that dPGC1 is dispensable for wing formation is premature. To have clarity on this point, it would be necessary to completely knockdown the gene, preferably by showing a total loss of protein. This should be feasible for the authors, since they showed Western blotting in Figure 5A. In any event, it seems that this negative data led the authors to study the Hippo pathway in the larval stage. This transition from Figure 1 to 2 seemed somewhat arbitrary and leads to a rather disjointed sense of the main line of inquiry around dPGC1.

      It is important to note, too, that the authors highlight a role of mitochondrial dynamics in the pathway of Yki-driven tumor formation; however, they only directly evaluate mitochondrial dynamics in this context in a single assay, namely, Figure 3H-K, and this quantification is likely inaccurate because the mitochondria in the Yki OE + dPGC1 RNAi condition seem to be substantially enlarged, circular structures. It is critical to keep in mind that mitochondrial enlargement does not necessarily stem from hyperfusion. It could come from a decrease in the activity of Drp1 or result from an imbalance between mitochondrial biogenesis and mitophagy.

      As noted in our responses above, we have addressed these concerns by clarifying the limitations of our mitochondrial morphology analysis. Additionally, we have expanded the discussion (lines 498-504) to explicitly acknowledge that mitochondrial enlargement does not necessarily indicate hyperfusion. In that paragraph, we consider alternative explanations such as reduced fission or imbalances in mitochondrial biogenesis and mitophagy, and we outline the need for future studies using dynamic assays and additional markers to more precisely dissect mitochondrial remodeling in Yki-driven tumors.

      A marked limitation of this study is the overuse of rather artificial manipulations of transcriptional regulatory pathways. The study would benefit a lot from investigation of the loss of function of components of the Hippo pathway rather than just OE of Yki.

      We performed additional experiments using Warts (Wts) mutant clones to assess the role of dPGC1 in a loss-of-function context within the Hippo pathway. While our initial analyses were based on Yki overexpression, which allowed us to robustly probe the interaction between Yki and dPGC1, we agree that this approach may not fully reflect physiological conditions. By generating Wts mutant clones, which endogenously activate Yki through loss of upstream inhibition, we were able to evaluate the impact of dPGC1 depletion in a more physiologically relevant setting. These new results confirm and extend our previous findings, showing that dPGC1 limits tissue overgrowth even when Yki is activated through loss of Wts, thereby strengthening the biological relevance of our conclusions.

      These results are presented in Fig 2F-I. In the text, those results are presented in lines 181-189.

      My expertise is in mitochondrial biology, with specialization in super-resolution imaging, mitochondrial dynamics and membrane architecture. I have also worked in the interface between mitochondrial physiology and cancer. With this perspective, I think that the authors uncover a potentially interesting role of PGC1 as a tumor suppressor.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary

      In this manuscript the authors investigate the role of the mitochondrial regulatory transcription factor dPGC1 in tissue growth and oncogenic transformation. They show that dPGC1 limits hyperplasia mediated by overexpression of Yki in the Drosophila wing disc, while having no effect on normal growth. dPGC1 depletion in discs overexpressing Yki results in neoplastic overgrowth and hyperfused mitochondria, which was dependent on the increased expression of genes involved in promoting mitochondrial fusion. Additionally, the authors show that dPGC1 limits CycE levels post-transcriptionally in Yki tumors.

      In the revised version of our manuscript, we have clarified the relationship between our findings and prior work by Nagaraj et al., including new experiments that demonstrate the specificity of dPGC1’s role in Yki-driven growth. Specifically, we show that dPGC1 depletion does not enhance tissue overgrowth in EGFR or InR contexts, nor does it affect Yki expression or activity. Furthermore, we tested dPGC1 overexpression in Yki-overexpressing tissues and observed no significant changes in growth or mitochondrial fusion gene expression. Additional controls confirmed that Cyclin E upregulation is specific to the Yki + dPGC1 depletion condition, reinforcing the context-dependent nature of our findings.

      Each of the reviewer’s comments is addressed below.

      Major comments:

      1) The authors mention several times in passing in the results a manuscript from the Banerjee lab (Nagaraj et al 2012), which shows that many of the genes the authors of the present manuscript show are upregulated upon Yki overexpression + dPGC1-RNAi compared with Yki overexpression alone are in fact upregulated upon Yki overexpression alone compared with control (dMfn/marf, opa1, miro - while interestingly dPGC1 itself is not affected). Nagaraj et al further show that Yki-overexpressing discs have longer mitochondria suggesting increased fusion even in the absence of dPGC1 depletion. The findings from Nagaraj et al should be mentioned explicitly in the introduction and the relationship between this manuscript and the present work clearly outlined in the discussion.

      In the revised manuscript, we have now explicitly referenced the findings of Nagaraj et al. (2012) in the Introduction (lines 106-118), Results (lines 355-360) and Discussion (lines 466-468) sections.

      In the revised Introduction, we summarize their key observations that Yki overexpression alone upregulates mitochondrial fusion genes such as dMfn and Opa1, and leads to mitochondrial elongation, while not affecting dPGC1 expression.

      In the revised Results section, we mention that, building on that work, our study demonstrates that dPGC1 depletion further amplifies this effect, leading to enhanced mitochondrial elongation and tumor growth.

      In the revised Discussion, we now explicitly reference the findings by Nagaraj et al. (2012), which demonstrated that Yki overexpression promotes mitochondrial fusion and upregulates key fusion genes. We build upon this work by showing that dPGC1 depletion in a Yki-overexpressing background further enhances mitochondrial fusion gene expression and tumor growth. This supports a model in which dPGC1 acts as a safeguard against Yki-induced mitochondrial remodeling and oncogenesis, reinforcing its role as a context-dependent tumor suppressor.

      Importantly, we show that this effect is context-dependent and not observed in otherwise normal tissues, highlighting a sensitized mitochondrial response to Yki activation when dPGC1 is lost. These additions help delineate the novel contribution of our study in identifying dPGC1 as a critical modulator of mitochondrial dynamics and tumorigenesis downstream of Yki.

      2) Given that Yki overexpression alone induces mitochondrial fusion and that dMfn/marf and opa1 depletion suppresses Yki-induced overgrowth (Nagaraj et al), does dPGC1 overexpression also suppress Yki-induced overgrowth?

      If so, is this correlated with reduction in dMfn/marf and opa1 compared with Yki overexpression alone?

      In response, we performed additional experiments to assess whether dPGC1 overexpression influences Yki-driven overgrowth. We also analyzed the expression of mitochondrial fusion genes (dMfn and Opa1) in this context. As shown in new Fig. S8, dPGC1 overexpression in Yki-overexpressing wing discs did not significantly affect tissue growth, nor did it alter the mRNA levels of key fusion regulators, dMfn and Opa1. These findings suggest that the transcriptional upregulation of mitochondrial fusion genes observed upon dPGC1 depletion is not a general consequence of altered dPGC1 levels, but rather a specific response that emerges in the context of Yki activation. We now present and discuss these results in the revised manuscript (lines 278-285), highlighting the sensitized nature of mitochondrial remodeling in an oncogenic environment driven by Yki signaling.

      3) One important question raised by this study is: how specific is the effect of dPGC1 depletion to Yki-driven overgrowth? As Yki-driven overgrowth already have increased mitochondrial length, it is possible that Yki-expressing cells are already sensitised to the effects of dPGC1 depletion. Interestingly, Nagaraj et al show that mitochondrial morphology is not affected upon EGFR activation (hyperplasia) or upon scrib and avl depletion (neoplasia). The authors should therefore test if dPGC1 depletion can potentiate the growth of other hyperplasia drivers such as activated EGFR and InR in the wing disc.

      We tested whether the growth-promoting effect of dPGC1 depletion was specific to Yki-driven overgrowth or could also potentiate tissue growth in other oncogenic contexts. Specifically, we downregulated dPGC1 in wing discs overexpressing either EGFR or InR. In both cases, we did not observe any enhancement of tissue overgrowth upon dPGC1 depletion, in contrast to what we observed in Yki-overexpressing discs. These results suggest that the sensitivity to dPGC1 depletion is specific to Yki-driven overgrowth and is not a general feature of hyperplastic growth induced by other oncogenes.

      These results are shown in Fig S4 and in lines 195-202.

      4) There are a few simple control experiments the authors should provide to clarify the relationship between Yki and dPGC1: - Are Yki levels affected by dPGC1 depletion?

      To address the potential regulation of Yki by dPGC1, we performed quantitative PCR (qPCR) analysis to measure the expression levels of yki and its well-established transcriptional targets (Cyclin E, Diap1, and bantam) in wing discs depleted of dPGC1. As shown in Fig. S3, we did not detect significant changes in the transcript levels of yki or its target genes, indicating that the enhanced phenotype observed upon dPGC1 depletion is unlikely to be driven by increased Yki expression or activity. These new results are presented in lines 190-194.

      • Does dPGC1 knockdown alone modify the expression of the genes tested in Fig.3A? In other words, is this upregulation specific to the Yki-overexpression context?

      We have conducted this analysis, and the results are now presented in new Fig S7. While the trend is similar to that observed in tumors with both Yki overexpression and dPGC1 depletion, the magnitude of change is smaller than in the context of Yki overexpression. This is described in the text in lines 273-277.

      • Does dPGC1 knockdown also stabilise CycE in the absence of Yki overexpression or does the stabilisation of CycE occur only in Yki tumors?

      To address this, we examined Cyclin E levels in wing imaginal discs mutant for dPGC1 alone. Our analysis did not reveal any detectable changes in Cyclin E levels under these conditions. These findings suggest that the upregulation of Cyclin E is not a general consequence of dPGC1 loss, but rather a feature specific to the context of Yki overactivation. The corresponding data are now included in Fig S14 of the revised manuscript. In the text, it can be found in lines 442-448.

      5) Figure 3C-G: it is not clear how the authors can quantify the length of 3D structures like mitochondria from 2D TEM images (unless they have done volume reconstruction from consecutive sections) and no details are provided in the methods. The quantification of mitochondrial length has to be performed rigorously as it is a key part of the paper.

      We agree that TEM provides only 2D profiles of 3D mitochondrial structures, and that this does not allow for precise volumetric reconstruction. In our study, we measured the longest axis of mitochondria visible in thin TEM sections, which is a commonly used 2D proxy for mitochondrial length in the literature (e.g., PMID: 36367943 and PMID: 38637532). To avoid misunderstandings, we have clarified in the Material and Methods section that the reported values represent apparent mitochondrial length in 2D sections, not true 3D length. To enhance the accuracy of these estimates, we measured more than three tissues per genotype, multiple regions per tissue, several cells per region, and various fields of view per cell.
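
As an illustration of the 2D proxy described above, the longest axis of a traced mitochondrial profile can be computed as the largest distance between any two points on its outline (the maximum Feret diameter). This is a minimal sketch assuming outlines are available as lists of (x, y) coordinates; it is not the authors' actual measurement pipeline, and the function name and coordinates are hypothetical.

```python
import math
from itertools import combinations


def longest_axis(outline):
    """Largest point-to-point distance of a 2D outline (maximum Feret diameter).

    `outline` is a sequence of (x, y) coordinates tracing one mitochondrial
    profile in a single thin section; the result is in the same units.
    """
    if len(outline) < 2:
        raise ValueError("an outline needs at least two points")
    return max(math.dist(p, q) for p, q in combinations(outline, 2))


# Hypothetical rectangular profile, 4 units long and 3 units wide.
profile = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
print(longest_axis(profile))
```

The value depends on how the section plane cuts each organelle, which is why such measurements are reported as apparent length in 2D sections and averaged over many profiles.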

      Minor Comments:

      1) Line 51: "Mitochondria are highly dynamics organelles." should be "Mitochondria are highly dynamic organelles."

      We have corrected that mistake. Thanks!

      2) Introduction: the authors should summarise the known physiological functions of PGC1α in order to put their findings in context.

      We have added a section in the introduction (lines 66-81) summarizing the known physiological functions of PGC1α.

      3) lines: 121-3: "...depletion of dPGC1...did not have a major impact on adult wing size and shape (Fig 1B, C)." There is a small but statistically significant difference so the authors should state this in the text.

      We have revised the text to acknowledge that dPGC1 depletion leads to a modest but statistically significant reduction in wing size. In addition to the original analysis, we have now included further experiments to strengthen this point. Specifically, we analyzed wings from flies homozygous for the dPGC1^1 allele (also known as dPGC1^KG08646; see FlyBase: https://flybase.org/reports/FBal0148128) and confirmed a small but significant reduction in both wing disc and adult wing size compared to controls (this can be found in Fig. 1 and Fig. S1). These results support the conclusion that, although dPGC1 is dispensable for viability and gross morphology, it contributes to normal wing growth. These new results can be found in lines 144-153.

      4) Figure 5A (Cyclin E western blot): the authors should show molecular weight markers.

      In the revised version of our manuscript, we have included the molecular weight markers as requested by the reviewer. These can be found in Fig S12.

      Reviewer #2 (Significance (Required)):

      The manuscript by Sew et al builds on the previous work by Nagaraj et al to explore the role of mitochondrial function in tumors driven by disruption of the Hippo pathway. In particular, the authors identify dPGC1 as a transcription factor that limits Yki-driven mitochondrial fusion and tissue growth. Interestingly, they further show that Yki/PGC1-depleted tumors are highly sensitive to Cyclin E levels, due to post-transcriptional Cyclin E increase. These results further our knowledge of how Yki drives growth and how mitochondria participate in oncogenic transformation. With appropriate revision as outlined above (for example exploring whether the mechanism proposed is Yki-specific), the manuscript will be of broad interest to developmental and cancer biologists.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript presents compelling evidence that dPGC1 acts as a context-dependent tumor suppressor in Drosophila by modulating mitochondrial dynamics and limiting Yorkie (Yki)-induced oncogenic growth. By leveraging the Drosophila wing imaginal disc as a model, the authors investigate how dPGC1 depletion exacerbates Yki-driven tissue overgrowth, mitochondrial hyperfusion, Cyclin E upregulation, and DNA damage, leading to tumorigenesis. The study provides valuable insights into the interplay between mitochondrial dynamics and cancer, with implications for understanding metabolic regulation in oncogenesis. While the findings are significant and well-aligned with the field, certain aspects of the experimental design, data presentation, and mechanistic insights require further attention to enhance clarity, reproducibility, and impact. Below, I outline my major concerns and recommendations.

      We addressed concerns about RNAi efficiency and protein-level validation with new qPCR data and mutant analysis. We provided EM and confocal evidence of mitochondrial changes. We clarified non-autonomous effects, quantified Mmp1 and F-actin, and added data on miro and Opa1 manipulations. Cyclin E quantification was expanded using multiple Western replicates and a validated mutant allele, and we included new data on mitochondrial membrane potential to assess functional consequences.

      Our detailed responses to each point raised by the reviewer are provided below.

      Major Points

      1. One point is the knock-down efficiency of dPGC1 on the mRNA level, which is between 30 to >50% (Fig. S1C). This is not too strong, so the question arises how severely the protein levels are affected. If possible, an antibody staining with quantification should be performed. From these data it cannot be concluded that dPGC1 is not required for normal development; half the dose could be sufficient. What do wings look like when the ap-GAL4 driver is used for dPGC1 knock-down, as this is the driver used in the subsequent experiments?

      Reviewer 1 also raised concerns about the potential inefficiency of the RNAi treatment in revealing a function during normal wing growth. We agree with both reviewers that the interpretation of the RNAi efficiency data requires clarification.

      The qPCR analysis shown in former Fig. S1C was performed using wing discs from flies expressing UAS-dPGC1-RNAi under the control of the MS1096-Gal4 driver. However, as shown in current Fig. 1C, MS1096-Gal4 is not expressed uniformly across the wing disc. Some regions remain RFP-negative, indicating that the RNAi construct is not active in all cells. As a result, the measured mRNA levels likely underestimate the true knockdown efficiency. This is because the qPCR includes mRNA from both RNAi-expressing and non-expressing cells, diluting the apparent reduction in transcript levels.

      To address this limitation and more accurately assess RNAi efficiency, we repeated the qPCR analysis using a ubiquitous driver (actin-Gal4) to ensure uniform expression of the RNAi construct. Under these conditions, we observed a more substantial knockdown, with dPGC1 mRNA levels reduced to approximately 25% of control levels (this is shown in current Fig S2). This result indicates that the RNAi line is more effective than initially suggested by the MS1096-Gal4-based analysis.

      To complement our RNAi-based analysis, we additionally used a mutant strain carrying a characterized allele of dPGC1 (dPGC1^1, also known as dPGC1^KG08646; see FlyBase: https://flybase.org/reports/FBal0148128). This genetically distinct approach allowed us to validate and strengthen our findings regarding dPGC1 function. Flies homozygous for this allele exhibited a modest but statistically significant reduction in both wing disc and adult wing size. These results support the conclusion that dPGC1 is required for normal wing growth and development. The new data are now included in Figure 1 and referenced in the main text (lines 144-151).

      Unfortunately, we cannot perform antibody staining due to the unavailability of antibodies against dPGC1.

      What does the wing disc look like when dPGC1 is overexpressed together with Yki?

      In response, we performed additional experiments to assess whether dPGC1 overexpression influences Yki-driven overgrowth. We also analyzed the expression of mitochondrial fusion genes (dMfn and Opa1) in this context. As shown in new Fig. S8, dPGC1 overexpression in Yki-overexpressing wing discs did not significantly affect tissue growth, nor did it alter the mRNA levels of key fusion regulators, dMfn and Opa1. These findings suggest that the transcriptional upregulation of mitochondrial fusion genes observed upon dPGC1 depletion is not a general consequence of altered dPGC1 levels, but rather a specific response that emerges in the context of Yki activation. We now present and discuss these results in the revised manuscript (lines 278-285), highlighting the sensitized nature of mitochondrial remodeling in an oncogenic environment driven by Yki signaling.

      In Fig 2D (but also in Fig. 2C) not only cells in the dorsal but also in the ventral compartment seem to overproliferate. Either this is a misconception or it is a non-autonomous effect from interfering with Yki and dPGC1 in the ventral compartment. In either case, this has to be clarified.

      Ventral cells are not labelled by GFP. Fig 3D shows a tumor in which GFP-negative cells are not present, suggesting that they are not overproliferating but rather being eliminated. This phenomenon is consistent with cell competition, a well-characterized process in which transformed or tumorigenic cells outcompete and eliminate neighboring wild-type cells. We have previously described this behavior in wing disc tumors (PMID: 26853367; DOI: 10.1016/j.cub.2015.12.042), and it likely contributes to the expansion of the tumor mass by removing surrounding normal tissue also in this context.

      In Fig. 2F-H quantification of Mmp1 and F-actin is missing. Mmp1 is a JNK target, so the authors could do in addition an anti-phospho JNK antibody staining.

      In response, we have performed those quantifications. They are now included in Fig 2M, N.

      In Fig. 3: what does the mitochondrial network look like in the wing disc peripodial epithelium using the Gug>Yki+dPGC1 genotype? Is it similar to Gug>dMfn or Gug>miro?

      We attempted to perform this analysis; however, Yki overexpression under the control of Gug-GAL4 resulted in larval lethality, likely due to GAL4 activity in essential tissues such as the central nervous system. As a result, we were only able to induce transgene expression for 24 hours before lethality occurred.

      At this early point, no detectable changes in mitochondrial morphology were observed in the peripodial membrane, likely because the duration of transgene expression was insufficient to elicit phenotypic alterations in this specific tissue. Therefore, while we aimed to compare this genotype to Gug>dMfn and Gug>miro, the technical limitations prevented a conclusive analysis.

      We have prepared a figure for the reviewer (below) with representative confocal images of wing discs showing mito-GFP and DAPI in the three genotypes indicated.

      In Fig. 3I: what is really the mitochondrion? It would be good to outline the region(s) that was/were measured.

      To improve clarity, we have repeated the electron microscopy (EM) analysis and now provide representative images that more clearly illustrate mitochondrial morphology in the different genotypes analyzed. These updated images, presented in Fig 4, better highlight the structural alterations observed upon genetic manipulation and help clarify the basis for our morphological assessments.

      We have extended our analysis and have assessed mitochondrial ultrastructure in Yki + dPGC1-RNAi wing disc tumors, with and without dMfn1 downregulation—the most upregulated mitochondrial fusion gene in this tumor context. In Yki + dPGC1-RNAi tumors, mitochondria appeared more elongated, consistent with increased fusion. Upon dMfn1 depletion, we observed a dramatic shift in mitochondrial morphology: mitochondria became larger and more rounded, with disrupted cristae and onion-like structures, indicative of compromised mitochondrial integrity and function (see new Fig 4).

      A quantification of RNAi and overexpression efficiencies of the different transgenes in Fig. 3 is required.

      To assess the efficiency of RNAi-mediated knockdown and transgene overexpression, we performed quantitative PCR (qPCR) using the ubiquitous Actin-Gal4 driver. While we acknowledge that this driver does not replicate the spatial specificity of the peripodial membrane Gal4 driver used in the experiments shown in Figure 3 (Gug-Gal4), the latter targets a very limited number of cells within the imaginal disc, making reliable qPCR quantification unfeasible.

      Using Actin-Gal4 allows us to obtain a relative and informative measure of transgene efficiency across the different constructs. These data confirm effective knockdown and overexpression of the relevant genes and are now included in Figure S2.

      In Fig. 4: what is the phenotype when miro is over-expressed in combination with Yki? Or when it is knocked down in the ap>Yki-dPGC1 background? This was the gene tested in Fig. 3 with a clear mitochondrial phenotype.

      To address whether miro contributes to Yki-mediated tumor growth, we performed the requested experiments and now include the results in the revised manuscript (see updated Results section, lines 374-377, and new Fig. S11).

      Our data show that overexpression of miro in combination with Yki does not lead to a significant increase in tissue growth or tumor-like phenotypes, in contrast to the effects observed with dMfn or Opa1 overexpression. Similarly, knockdown of miro in the ap>Yki-dPGC1-RNAi background did not suppress tumor growth, indicating that miro is not required for the enhanced proliferation observed in this context.

      These findings suggest that, although miro influences mitochondrial morphology in normal wing discs (as shown in Fig. 3), its role in tumorigenesis is distinct from that of dMfn and Opa1. We have revised the manuscript to clarify the gene-specific contributions of mitochondrial fusion regulators to Yki-driven tumorigenesis. This distinction underscores the complexity of mitochondrial dynamics and highlights that not all fusion-related genes exert the same functional impact in oncogenic settings.

      What does the mitochondrial morphology in the wing disc peripodial epithelium look like in Gug>Opa1RNAi or Gug>Opa1 discs?

      To assess the impact of Opa1 on mitochondrial morphology in the peripodial epithelium of the wing disc, we used the Gug-GAL4 driver to either overexpress or knock down Opa1. Our analysis revealed that Opa1 overexpression led to slightly elongated mitochondria but did not result in extensive network formation, suggesting a modest enhancement of inner membrane fusion. In contrast, Opa1 knockdown caused clear mitochondrial fragmentation, closely resembling the phenotype observed upon dMfn depletion. These results, shown in Fig 3, are consistent with the distinct roles of Opa1 and dMfn in regulating mitochondrial fusion: Opa1 primarily modulates inner membrane fusion and cristae architecture, while dMfn drives outer membrane fusion and network connectivity.

      The corresponding data are presented in Figure 3F, G, and quantified in Figure S9, alongside experiments manipulating other genes involved in mitochondrial dynamics.

      Why have the authors switched between the ap>Yki+dPGC1RNAi and the ap>Yki+dPGC1shRNA lines? It would be important to have this series of experiments in the same backgrounds, as KD efficiencies are different (Fig. S1C).

      The primary reason for switching between the dPGC1-RNAi and dPGC1-shRNA lines was practical: the chromosomal insertion sites of the transgenes made certain genetic combinations more feasible with one line over the other. This flexibility significantly facilitated our experimental design and analysis.

      To address concerns regarding knockdown efficiency, we performed a comparative analysis using the ubiquitous actin-GAL4 driver, rather than MS1096-GAL4, which exhibits patchy and dynamic expression in the wing imaginal disc. This allowed us to obtain a more consistent and interpretable measure of mRNA downregulation for both transgenes. Our results show that both lines achieve comparable levels of knockdown, as shown in Figure S2.

      Fig. 5A: proper quantification of Western Blot signals is required. I do not agree that Cyclin E protein levels are elevated in ap>Yki or ap>Yki+dPGC1 discs. Even at the mRNA levels the increase in expression is rather weak. From these results nothing can be concluded.

      We have repeated the Western blot analysis using seven independent membranes to ensure robust quantification of Cyclin E levels in ap>Yki and ap>Yki+dPGC1-RNAi wing discs (Fig 6).

      Although the increase in Cyclin E protein levels is subtle, it is consistent across replicates and statistically significant. We have now included the quantification of these Western blot signals in the revised Figure 6, which supports the conclusion that Cyclin E levels are elevated in ap>Yki+dPGC1 discs.

      We hope this additional data addresses the reviewer’s concern and strengthens the interpretation of our results.

      Knock-down efficiencies for dap and CycE need to be quantified (Fig. 5H-N). Although the rescue experiment with CycE knock-down is phenotypically convincing, it is nonetheless puzzling, as CycE is, according to Fig. 5A+B, hardly upregulated. An independent CycE RNAi line would be useful.

      We have quantified the knockdown efficiency of the dap-RNAi line, and the results are included in Figure S13.

      Regarding Cyclin E, we would like to clarify that we did not use an RNAi line in this experiment. Instead, we employed the CycE-05306 mutant allele in a heterozygous background. CycE-05306 is a loss-of-function allele of the Drosophila melanogaster Cyclin E gene: it carries a P-element insertion in the first intron of CycE, which disrupts normal transcription and reduces Cyclin E expression. In a heterozygous background, as used in our experiments, CycE-05306/+ is expected to reduce Cyclin E levels by approximately 50%, which is typically sufficient to reveal genetic interactions or sensitized phenotypes without affecting normal development. This makes it a valuable tool for studying gene dosage effects, particularly in tumor models where Cyclin E activity may be rate-limiting.

      Importantly, this partial reduction does not impair normal tissue growth, but it strongly limits tumor growth in the context of Yki overexpression combined with dPGC1 downregulation, as shown in Figure 6. This selective sensitivity highlights the functional importance of Cyclin E in supporting oncogenic growth driven by Yki and dPGC1 depletion. We believe this provides compelling evidence for Cyclin E’s role in this tumor model.

      Reviewer #3 (Significance (Required)):

      Strengths and Limitations of the Study

      Strengths

      Innovative Focus on Mitochondrial Dynamics and Oncogenesis: The study provides compelling evidence linking mitochondrial dynamics, particularly hyperfusion, to tumorigenesis in Drosophila. The identification of dPGC1 as a context-dependent tumor suppressor adds novel insights into the interplay between metabolism and oncogenesis.

      Comprehensive Use of Drosophila as a Model System: The study leverages the genetic tractability of Drosophila, allowing precise manipulation of mitochondrial regulators and signaling pathways. The use of wing imaginal discs as a model for tumor growth is well-established and appropriate.

      Integration of Morphological and Genetic Data: The manuscript combines confocal imaging, electron microscopy, and genetic tools to demonstrate the role of dPGC1 in regulating mitochondrial dynamics, Cyclin E levels, and tissue overgrowth.

      Relevance to Cancer Biology: The findings address key hallmarks of cancer, including deregulated metabolism, genomic instability, and cell cycle misregulation. The study's exploration of these processes in a simple model organism provides a strong basis for translating findings to mammalian systems.

      Limitations

      Validation of RNAi and Overexpression Efficiency: The knockdown efficiency of dPGC1 on the mRNA level is only moderate (30-50%), and protein-level validation is missing. Without this, the study cannot conclusively demonstrate the role of dPGC1 in normal development or tumorigenesis.

      Incomplete Mechanistic Insights: The manuscript identifies Cyclin E as a potential driver of tumor growth but does not adequately explore how mitochondrial hyperfusion leads to Cyclin E regulation (e.g., post-transcriptional mechanisms or protein stability).

      Inconsistencies in Experimental Backgrounds: The study uses different RNAi/shRNA lines and driver combinations inconsistently across experiments, making it difficult to compare results directly. This variability undermines the robustness of the conclusions.

      Limited Functional Analysis of Mitochondria: While mitochondrial morphology is well-characterized, functional assays (e.g., membrane potential or ATP production) are missing. These would confirm the impact of hyperfusion on cellular energetics and oncogenesis.

      In the revised manuscript, we have addressed each of the concerns raised.

      In addition to that, in the revised version of the manuscript, we have included new experiments to assess mitochondrial functionality in tumors co-expressing Yki and dPGC1-RNAi. Specifically, we analyzed the Mitochondrial Membrane Potential (MMP). We used TMRE staining to evaluate MMP, a key indicator of mitochondrial integrity and oxidative phosphorylation capacity. Our analysis revealed no significant differences in MMP between Yki tumors and Yki + dPGC1-RNAi tumors, suggesting that mitochondrial membrane potential is preserved despite the observed morphological abnormalities. These results are shown in Fig S6. In the text it is discussed in lines 233-243.

      Contribution to Existing Literature

      The study makes a significant contribution to the growing body of literature on the metabolic regulation of cancer by identifying dPGC1 as a tumor suppressor modulating mitochondrial dynamics. Previous work has established the dual roles of mammalian PGC1α in promoting or suppressing cancer depending on context. This study adds depth by demonstrating similar context-dependent effects in a simpler model organism, facilitating further exploration of the molecular mechanisms involved.

      By linking mitochondrial fusion, Yki signaling, and Cyclin E regulation, the manuscript aligns with and expands upon research on Hippo pathway regulation, cancer metabolism, and mitochondrial biology. The findings highlight the importance of integrating metabolic and signaling networks in understanding oncogenesis.

      Community Selection

      The current form of the manuscript is best suited for a specialized audience, particularly mitochondrial biologists, Drosophila researchers, and Hippo pathway specialists. To engage a broader community, additional work linking these findings to mammalian models or human cancer biology would be necessary.

  5. drive.google.com drive.google.com
    1. In this section, we review research that suggests that, whereas massing practice might promote rapid performance gains during training, distributing practice facilitates long-term retention of that skill.

      Although massed practice may be useful for understanding material within a limited duration (short-term memory), retention is not as effective with this method as with distributed practice. This is why cramming material before a test is not the most effective way to actually retain the information learned during that short study period, whereas distributed practice allows for breaks between sessions that strengthen retention and retrieval (practice makes perfect!). We should think of study strategies that discourage cramming.

    1. or by the teacher with input from students

      I think this is something I'll be taking with me into my future classrooms because it isn't something I ever thought about before we talked about it in class. Kids and teenagers are more apt to do what their friends are doing or what their friends think is best, and if we come up with rules and norms as an entire class, which includes the aforementioned friends, they'll be more likely to listen and abide by them. When the rules come only from the teacher, some of the more rebellious teens could feel the need to push the limits. It could also help by bringing insight into what they may or may not understand or already abide by at home.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2025-03073

      Corresponding author(s): Shaul Yogev

      1. General Statements [optional]

      We kindly thank our reviewers for their enthusiasm, thoughtful feedback, and constructive suggestions on how to strengthen our manuscript. Below, we provide a point-by-point response to reviewer comments and outline the experiments we will do to address every concern that has been raised.

      2. Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This interesting study uses an unbiased genetic screen in C. elegans to identify SAX-1/NDR kinase as a regulator of dendritic branch elimination. Loss of SAX-1 results in an excess branching phenotype that is striking and highly penetrant. The authors identify several additional regulators of branch elimination (SAX-2, MOB-1, RABI-1, RAB-11.2) by using a candidate genetic screen aimed at factors that interact physically or genetically with SAX-1. They propose that SAX-1 acts by promoting membrane retrieval based on the nature of these interactors and the results of an imaging-based in vivo assay for endocytic puncta.

      Major comments.

      1. My biggest concern is that the phenotypes are only observed in temperature-sensitive dauer-constitutive mutant backgrounds, and not in wild-type dauers. That is, wild-type animals exiting dauer do not require SAX-1 for dendrite elimination. While this does not undermine the importance of the results, it does require more explanation. The authors write that "the requirement for sax-1... relies on specific physiological states of the dauer stage," but I do not understand what this means. Are they saying that daf-7 and daf-2 dauers are in a different "physiological state" than wild-type dauers? In what way? What is the evidence for this? A more rigorous explanation is needed.

      We agree that this is puzzling, and we thank the reviewer for recognizing that this does not undermine the importance of the results. There is ample evidence that daf-2 and daf-7 differ from starvation-induced dauers. For example, a recent preprint finds that the transcriptomes of these two mutants at dauer cluster much closer to each other than to starvation-induced dauers (Corchado et al. 2024). Older work has noted other differences, such as the time the dauer entry decision is made (Swanson and Riddle 1981), the synchronicity of dauer exit, the ability to force dauer entry in daf-d mutants, as well as additional dauer-unrelated phenotypes (reviewed in Karp 2018). We agree with the reviewer that this merits further clarification and will perform the experiments suggested by the reviewer below:

      To me, the simplest genetic explanation is that daf-7 and daf-2 are partially required for branch retraction in a manner redundant with sax-1, and the ts mutants are not fully wild-type at 15C. Thus, the sax-1 requirement is revealed only in these mutant backgrounds. Can the authors examine starvation-induced dauers of daf-7 or daf-2 raised continuously at 15C?

      We will do this experiment.

      daf-7 and daf-2 ts strains can form "partial dauers" that have a dauer-like appearance but are not SDS resistant. Could the difference between partial dauers and full dauers account for the difference in sax-1-dependence? The authors could use SDS selection of the daf-7 strain at 25C to ensure they are examining full dauers.

      We tested daf-7 mutants with 1% SDS when we set up the system – they are fully dauer at 25°C and are SDS sensitive after exit. We will repeat this important control with daf-7; sax-1 double mutants.

      The Bargmann lab has created a daf-2 FLP-OUT strain (ky1095ky1087) that allows cell-type-specific removal of daf-2. Could this be used to test for a cell-autonomous role of daf-2 in IL2Q related to branch elimination?

      We can attempt this experiment. However, since IL2 promoters turn on prior to dauer, the interpretation would not be straightforward: it would be hard to exclude the possibility that a cell-autonomous defect in dauer entry accounts for the IL2 dauer exit phenotype, even if branching appears normal.

      These ideas are not a list of specific experiments the authors need to complete, rather they are meant to illustrate some possible approaches to the question. Whatever approach they use, it is important for them to more rigorously explain why SAX-1 is not required for branch removal in wild-type animals.

      We completely agree. We will carry out the 15°C experiment, examine morphological characteristics and test SDS resistance. In addition, we will test neuronal markers that differ between dauers and non-dauers to determine whether the mutants are full or partial dauers at the relevant timepoints.

      The SAX-2 localization (Fig. 4) and endocytosis assay (Fig. 6) results were not clear to me from the data shown. Overall a more rigorous analysis and presentation of the data would be important to make these conclusions convincing. This may involve refining the data presentation in the figures, modifying the claims (e.g., "we propose" vs "we find"), or saving some of the data to be more fully explored in a future paper. In my view, these figures are the biggest weak point of the manuscript and also are not important for the central conclusions (which are well supported and convincing), indeed these results are barely mentioned in the Abstract or last paragraph of Introduction.

      We agree that the analysis and presentation of Figures 4 and 6 need to be improved. The presentation has already been updated, and the figures are clearer now. In the revision, we will increase sample size to provide stronger conclusions, consolidate some of the analysis and further improve presentation. While we agree with the reviewer that conclusions from these figures are not as strong as those drawn from genetic experiments, they do complement and support the conclusions of those other figures.

      • In Fig. 4D, why is SAX-2 visible throughout the entire neuron and why is the "punctum" marked with an arrow also seen in the tagRFP channel? One gets the impression that some of the puncta may be background, bleed-through, or artifacts due to cell varicosities.

      There is no bleed-through: this is most evident by looking at the brightest signals in the cell body (now labelled with an asterisk in a zoomed-out image) and noting that they do not bleed between channels. In sax-1 mutants, the SAX-2::GFP puncta are very obvious and distinguishable from the tagRFP channel. In control, SAX-2::GFP is very faint in the dendrite, so we increased the contrast to allow visualization. The reviewer is correct that under these conditions, some puncta look like the cytosolic fill. In the revision, we will re-analyze the data and will not consider these as bona-fide SAX-2 puncta, but rather cytosolic SAX-2 that accumulates due to constrictions and varicosities in the dendrite.

      • Related to both Fig. 4 and Fig. 6, where does SAX-1 localize in IL2Q in dauer and post-dauer? Does its expression or localization change during branch retraction? Does it co-localize with SAX-2 or endocytic puncta?

      We generated an endogenously tagged sax-1 with a 7xspGFP11 tag; however, this was below detection in the IL2s. For the revisions, we can test an overexpressed cDNA construct.

      **Referee cross-commenting**

      I think we all touched on similar points. I wanted to follow up on Reviewer 3's comment, "Is the failure to eliminate branches an indication of incomplete dauer recovery? Do sax-1 mutants retain additional characteristics of dauer morphology in post dauer adults." I thought this was an excellent point. It made me wonder if that might explain why the defect is only seen in daf-7 and daf-2 mutant backgrounds - maybe these strains retain partial dauer traits even after exit. Is there a specific experiment that they could do? Did you have specific characteristics of dauer morphology in mind for them to check? (Ideally something in the nervous system that can be scored quantitatively.)

      Please see response to point #1 regarding experiments we will do to confirm the “dauer state” of daf-7 and daf-7; sax-1 double mutants.

      Reviewer #1 (Significance (Required)):

      A major strength of this work is the pioneering use of a novel system to study neuronal branch retraction. C. elegans has provided a powerful model for studying how dendrite branches form, but much less attention has been paid to how excess neuronal branches are removed. The post-dauer remodeling of IL2Q neurons provides an exciting and dramatic physiological example to explore this question.

      This paper is notable for taking the first steps towards developing this innovative model. It does exactly what is needed at the outset of a new exploration - a forward genetic screen to discover the main regulators of the process. Using a combination of classical and modern genetic approaches, the authors bootstrap their way to a sizeable list of factors and a solid understanding of the properties of this system, for example that retraction of higher vs lower order dendrites shows different genetic requirements.

      We thank the reviewer for recognizing the novelty and significance of our work.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, the authors establish C. elegans IL2 neurons as a system in which to study dendrite pruning. They use the system to perform a genetic screen for pruning regulators and find an allele of sax-1. Unexpectedly sax-1 is only required for post-dauer pruning in two different genetic backgrounds that induce dauer formation, but not starvation-induced dauer formation. Sax-1/NDR kinase reduction has previously been associated with increased outgrowth and branching in other systems, so this is a new role for this protein. However, the authors show that proteins that work with Sax-1 in other systems, like sax-2/fry, also play a role in this pathway. The genetic experiments are beautiful and the findings are all clearly explained and strongly supported. The authors also examine sax-2 localization, which localizes sax-1 in other systems, and show it in puncta in dendrites that increase with dauer exit, consistent with function at the time of pruning. They also show that membrane trafficking regulators associated with NDR kinases function in the same pathway here, hinting that endocytosis may play a role during pruning as in Drosophila. The link to endocytosis was a little weak (see Major point below). Overall, this study describes a new system to study pruning and identifies NDR/fry/Rabs as regulators of pruning during dauer exit. The work is very high quality and both the imaging and genetics are extremely well done.

      We thank the reviewer for their positive assessment of the manuscript.

      Major points

      1. The only place where there were any questions about the data was the last figure (6G and I). Here they use uptake of GFP secreted from muscle as a readout of endocytosis in IL2 neurons. They nicely show that more internalized puncta accumulate as animals exit dauer. The claim that this is reduced in sax-1 mutants doesn't seem to match the images shown well. In the image there are many more puncta in the GFP channel and much more accumulation of the RFP-tagged receptor everywhere. It seems like some additional analysis of this data is important to fully capture what is going on and whether this really represents an endocytic defect.

      We agree and will provide additional data in Figure 6. The discrepancy between the image and the quantification arose because we showed a single focal plane rather than a projection, which does not capture all the puncta in a neurite. The current version shows a projection, making it evident that the mutant has fewer puncta than the control.

      Reviewer #2 (Significance (Required)):

      Neurite pruning is important in all animals with neurons. Genetic approaches have primarily been applied to the problem using Drosophila, so identifying a new model system in which to study it is an important step. Using this system, a pathway known to function in a different context is linked to pruning. Thus the study provides new insights into both pruning and this pathway.

      We thank the reviewer for the positive assessment of our study’s significance.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: Figueroa-Delgado et al. use a C. elegans neuroplasticity model to examine how dendrites are eliminated upon recovery from the stress-induced larval stage, dauer. The authors performed a mutagenesis screen to identify novel regulators of dendrite elimination and revealed some surprising results. The branch elimination mechanism varies between 2°, 3°, and 4° branches. The NDR kinase SAX-1 and its interactors (SAX-2 and MOB-2) are required for elimination of second and third order branches but not fourth order branches. Interestingly, they showed that branch elimination varies depending on the stimulus of dendrite outgrowth, such that the NDR kinase is required for branch elimination after genetically inducing the dauer stage but is not required if dauers are produced through food deprivation. The authors go a step further to include a small candidate screen looking at various pathways of membrane remodeling and identify additional regulators of dendrite elimination related to membrane trafficking, including RABI-1, RAB-8, RAB-10, and RAB-11.2.

      We thank the reviewer for their time and suggestions below.

      Major comments:

      • While I find the data promising and exciting, several of the experiments have concerningly low sample sizes. Fig 3G, Fig 4G, Fig 5J and L, and Fig 6I all contain data sets that are fewer than 10 animals. Sample sizes should be stated specifically in the figure legends for all data represented in the graphs.

      We thank the reviewer for finding the data exciting. We agree that the sample sizes in some panels are low and will increase them in the revised version. Sample sizes are now specifically listed in the figure legends.

      • All statements based on data not shown should be amended to include the data as a supplemental figure or edited to omit the statement based on withheld data.

      We agree. Some “not shown” data have already been added to the current version of the manuscript; the rest will be added to the fully revised version, or the statements will be omitted.

      • Rescue experiments (Fig 2J) should demonstrate failure to rescue from neighboring tissue types (hypodermis and muscle) to conclude cell autonomous rescue rather than a broadly acting factor.

      Thank you for the suggestion. We will use a hypodermal promoter and a muscle promoter driving SAX-1 cDNA expression to strengthen the claim of cell autonomy.

      • Fig 4 needs quantification of higher order branches and SAX-2 proximity to branch nodes as these are discussed in the text.

      We will add this quantification.

      Minor comments:

      • Fig 1C-F, It appears like the shy87 allele produces animals of significantly different body sizes. It would improve rigor to normalize the dendrite coverage to body size in the quantification.

      We do not see a biologically meaningful size difference between shy87 and control; the apparent difference may reflect the specific image shown. We will confirm this by measuring animal size for the final revision.

      • Is the failure to eliminate branches an indication of incomplete dauer recovery? Do sax-1 mutants retain additional characteristics of dauer morphology in post dauer adults.

      This important point was also raised by Reviewer 1. We will test SDS sensitivity, morphological markers, and molecular markers to determine the dauer “state” of the mutants used in this study. The results will be included in the final revision.

      • The text references multiple transgenic lines tested in Fig 2I-J but only one line is shown.

      Additional lines were visually examined under a fluorescent compound microscope but not imaged or quantified. We will add this quantification to the final revision.

      • Fig 4F, Additional timepoints would enhance the sax-1 localization result and might provide insight into mechanism of action for sax-1.

      We will add the localization in post-dauer adults.

      • Fig 6I Control and sax-1(ky491) example images should be provided in the supplement.

      We will add these images to the final revision.

      **Referee cross-commenting**

      I agree that we shared many of the same concerns.

      There are several general assays for dauer characteristics that could be used here to determine if the post-dauer animals retain other characteristics of the dauer stage in addition to IL2 branches (SDS resistance, alae remodeling, pharyngeal bulb morphology, nictation behavior). The nictation behavior has been connected very nicely with IL2 neurons (Junho Lee's group). Additionally, FLP dendrites occupy the same space as the IL2 branches and outgrowth in post-dauers occurs in coordination with IL2 branch elimination - this might be another optional experiment, to check if FLP growth is impeded by persistent IL2 branches. All of these could be quantified similar to how the authors have already established with their IL2 model (FLP dendrite branches) or with a binary statistic.

      Please see responses to Reviewer 1 and 3 above for the list of experiments to determine whether the animals fail to completely enter or exit dauer.

      Reviewer #3 (Significance (Required)):

      These results describe a new role for the NDR kinase complex in dendrite pruning that has clinical significance to our understanding of human brain development and human health concerns in which pruning is dysregulated, such as observed in the case of autism. The authors use an established C. elegans neuroplasticity model (Schroeder et al. 2013), which provides a tractable and reproducible platform for discovering the mechanism of dendrite pruning. These results would influence future work in the fields of cell biology of the neuron and disease models of brain development.

      My expertise is in the field of C. elegans neuroscience and stress biology, and I have sufficient expertise to evaluate all aspects of this work.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1

      • In Fig. 4C, the distinction between puncta in the primary or higher-order dendrites is not clear to me, and several puncta that I would have scored as primary are marked as higher-order.

      We apologize for a mistake in the arrowhead color and overall presentation of this figure. It has been fixed in the current version.

      • Related to this, in Fig. 4B are the two arrows meant to be white as in the top panel, or yellow as in the bottom panel?

      We thank Reviewer #1 for their observation, and we apologize for our oversight. We fixed this in the current version.

      • In Fig. 4, where in the head are we looking? It would help to show a more low-magnification view of the entire cell.

      We added zoomed-out images and indicated where the zoomed in insets are taken from. We thank the reviewer for helping us improve the clarity of the data.

      • The main sax-1 phenotype is increased SAX-2 puncta in dauer, but the branch retraction defect is in post-dauers. How is this relevant to the phenotype?

      This is a very good point. The increase in SAX-2 puncta in sax-1 mutants is stronger during dauer-exit than in dauer, consistent with this being the time when SAX-1 functions. We agree that some earlier activity of SAX-1 cannot be excluded, and we do not assume that the effect on SAX-2 completely accounts for the pruning defects. This is now acknowledged in the text. However, given that both proteins function together in pruning, and given that the effect is strongest during dauer exit, we do believe that this data is informative and worth showing.


      • The number of SAX-2 puncta in sax-1 mutants decreases almost to normal in post dauers. Is there a correlation between the number of remaining branches and the number of SAX-2 puncta? That is, do the many wild-type animals with "excess" SAX-2 puncta also fail to retract branches?

      There is no correlation. In other words, the number of SAX-2 puncta does not instruct the extent of pruning. Please note that the quantifications underestimate the number of SAX-2 puncta in the mutants, since they were only done on the primary dendrite. This is necessary because the mutant and control have different arbor sizes, so the only branch order that can be appropriately compared is the primary dendrite.

      • The control post-dauer data in Fig. 4F and 4H are identical (re-used data) but the corresponding control dauer data in Fig. 4F and 4G are different. What is going on here?

      We thank the reviewer for raising this point and apologize for the oversight in data presentation. In the revised manuscript, we now show all control and experimental data integrated into a single graph, ensuring that each dataset is represented accurately to provide a comparison between dauer and post dauer recovery conditions.


      • Why are sample sizes so small for both strains in Fig. 4G compared to Fig. 4F and 4H?

      We sincerely apologize for this mistake; some of the data were erroneously grouped in the original submission. The revised version contains an updated number of neurons, presented on the same graph, and in the final revision we will further increase the sample size. We apologize again for this error.

      • In Fig. 6C, why are the tagRFP (blue) puncta larger than the neurite? Aren't these meant to represent vesicles inside the surrounding neurite? One gets the impression that this is bleed-through from the GFP channel.

      Based on EM, both an endocytic punctum and the diameter of the neuron are smaller than a single pixel. The apparent difference in size in fluorescence microscopy is because the puncta are brighter (they contain more membrane) and thus appear larger. In the current version, the improved presentation of the figure contains zoomed out images that clearly show that there is no bleed-through.

      • In Fig. 6E and 6F, why are there no tagRFP (blue) puncta? Is CD8 not endocytosed at all if it lacks the nanobody sequence? One would expect the tagRFP (blue) signal to be the same in both strains and simply to lack yellow if the nanobody is not present.

      CD8 lacks clear endocytosis motifs, which is why it is advantageous for labelling neurites and testing endocytosis when paired with an endocytic signal (Lee and Luo 1999; Kozik et al. 2010). Conversely, extracellular GFP binding to a membrane GFP antibody can induce endocytosis (for example, see Tang et al. 2020), likely by inducing clustering, although we are not familiar with work that explored the mechanism. In the updated version we included a rare example of an mCD8 punctum.

      • The authors report a decrease in endocytic events in sax-1, but qualitatively it looks like there are vastly more puncta inside the neuron in Fig. 6H than in 6G.

      We apologize for the presentation in the original version of Figure 6. This impression was because we showed single focal planes that only captured some of the signal. In the revised version we show projections, which makes it evident that there are fewer endocytic events in the mutant.

      • In Fig. 6E and 6H, why are there so many GFP (yellow) puncta outside the neuron? What are these structures and why are they absent in the strain with the nanobody?

      These puncta are secreted or muscle-associated GFP that has not been internalized by IL2Q neurons. They are present in all strains in this figure, as can be clearly seen in the zoomed-out images that have been added to the updated figure.

      • What is the large central blue structure in Fig. 6H - is this the soma? - and why are puncta in this region not counted?

      This is indeed the soma. In the updated version this can be clearly seen in the zoom-out. The large puncta in the soma were not counted because they may arise from the fusion of an unknown number of smaller puncta, and their precise number cannot be determined at the resolution of fluorescence microscopy.

      • minor: there is text reading "40-" in the bottom panel of Fig. 6H. It is visible when printed but not on screen - adjust levels in Photoshop to reveal it.

      We thank the reviewer for catching this oversight; it is now fixed.

      Minor points:

      1. At several points the authors emphasize the relationship of neurite remodeling to stress, e.g. Abstract and Discussion: "we adapted C. elegans IL2 sensory dendrites as a model [of...] stress-mediated dendrite pruning". It seems unnecessary and potentially misleading to treat this as a neuronal stress response. First, it conflates organismal and cellular stress - there is no reason to think that IL2 neurons are under cellular stress in dauer. In fact parasitic nematodes go through dauer-like stages as part of healthy development and probably have similar remodeling of IL2. Second, dendrite pruning occurs during dauer exit, which is the opposite of a stress response - it reflects a return to favorable conditions.

      We agree. We modified the abstract and discussion to avoid conflating organismal stress (the alleviation of which is relevant for triggering pruning) and cellular stress. Thank you for pointing this out.

      2. In Fig. 1A, C. elegans is shown going directly from L1 to dauer in response to unfavorable conditions, which is incorrect. Animals proceed through L2 (in many cases actually an alternative L2d pre-dauer) and then molt into dauer (an alternative L3 stage) after completing L2.

      We updated the schematic to include the L2d stage where commitment to dauer entry or resumption to reproductive development is made.

      3. In Fig. 1B, please check if it is correct that hypodermis contacts the pharynx basement membrane as drawn. The schematic in the top panel makes it look like there is a single secondary branch and the quaternary branches are similar in length to the primary dendrite. The schematic in the bottom panel makes it look like the entire neuron is a small fraction of the length of the pharynx. Could these be drawn closer to scale?

      The hypodermis does contact the pharynx basement membrane. We redrew the schematic for clarity.

      Reviewer #2

      For context, it might be helpful to know whether branching of other dendrites is increased in sax-1 mutants (as expected based on phenotypes in other animals) or decreased like IL2 neurons.

      We examined the branching pattern of PVD, a polymodal nociceptive neuron (new Supplemental Figure 3). We find no significant difference between control and sax-1 or sax-2 mutants, suggesting that these genes function in the context of pruning. Recent work (Zhao et al. 2022) confirms that sax-1 is not required for PVD branching.

      Minor:

      "shy87 mutant dauers showed a minor reduction in secondary and tertiary branches compared to control (Figure 1G). These results indicate that shy87 is specifically required for the elimination of dauer-generated dendrite branches." Maybe temper the specificity claim some as the reduction in branches is definitely there.

      We agree; the claim was tempered.

      "three complimentary approaches" should be complementary

      Thank you for noticing. We fixed this.

      "In control animals, SAX-2 was mostly concentrated in the cell body (data not shown)" It might be nice to include some overview images that show the cell body for completeness.

      We added zoomed-out images to the revised figure, thank you for the suggestion.

      Reviewer #3


      Minor comments:


      • Fig 1G-H, are shy87 second and third order branch counts statistically different between dauer and post dauer adults? This comparison would strengthen the claim that these order branches fail to eliminate altogether rather than undergo a partial elimination.

      We added this to Figure S2. The shy87 mutants show a complete failure in eliminating secondary branches (i.e. no difference between dauer and post-dauer) and a strong but incomplete defect in eliminating tertiary branches.

      • Fig 4B-E Indicate branch order in the images, this is unclear and a point that is focused on in the text.

      Done.

      • Discussion of Fig 1G from the text claims that shy87 is specifically required for branch elimination yet the data shows significant defects in branch outgrowth as well. This raises the question, are the branches abnormally stabilized that results in early underdevelopment and late atrophy? Authors should acknowledge alternative hypotheses.

      We agree and will revise the text accordingly. The difference between shy87 and control dauers, while statistically significant, is relatively minor and can only be detected by careful quantification; it is not apparent from looking at the images (in contrast, for example, to rab-8 and rab-10 mutants, where we acknowledge in the text that their branching defects might affect subsequent pruning).

      • Authors reference a branch elimination process but don't outline what this would entail and where their results fit in.

      We apologize for being unclear. Given that sax-1 and sax-2 function together, one would intuitively expect to see SAX-2 being reduced in sax-1 mutants, yet the opposite is observed. One potential explanation is that SAX-1 does not directly control SAX-2 abundance, but that clearance of SAX-2 is part of the pruning process that both proteins regulate. This would explain the enrichment of SAX-2 in sax-1 mutants. However, additional models cannot be excluded, and we acknowledge this in the revised text.

      References:

      Corchado, Johnny Cruz, Abhishiktha Godthi, Kavinila Selvarasu, and Veena Prahlad. 2024. “Robustness and Variability in Caenorhabditis Elegans Dauer Gene Expression.” Preprint, bioRxiv, August 26. https://doi.org/10.1101/2024.08.15.608164.

      Karp, Xantha. 2018. “Working with Dauer Larvae.” WormBook, August 9, 1–19. https://doi.org/10.1895/wormbook.1.180.1.

      Kozik, Patrycja, Richard W Francis, Matthew N J Seaman, and Margaret S Robinson. 2010. “A Screen for Endocytic Motifs.” Traffic (Copenhagen, Denmark) 11 (6): 843–55. https://doi.org/10.1111/j.1600-0854.2010.01056.x.

      Lee, T., and L. Luo. 1999. “Mosaic Analysis with a Repressible Cell Marker for Studies of Gene Function in Neuronal Morphogenesis.” Neuron 22 (3): 451–61.

      Swanson, M. M., and D. L. Riddle. 1981. “Critical Periods in the Development of the Caenorhabditis Elegans Dauer Larva.” Developmental Biology 84 (1): 27–40. https://doi.org/10.1016/0012-1606(81)90367-5.

      Tang, Rui, Christopher W Murray, Ian L Linde, et al. 2020. “A Versatile System to Record Cell-Cell Interactions.” eLife 9: e61080. https://doi.org/10.7554/eLife.61080.

      Zhao, Ting, Liying Guan, Xuehua Ma, Baohui Chen, Mei Ding, and Wei Zou. 2022. “The Cell Cortex-Localized Protein CHDP-1 Is Required for Dendritic Development and Transport in C. Elegans Neurons.” PLOS Genetics 18 (9): e1010381. https://doi.org/10.1371/journal.pgen.1010381.


      4. Description of analyses that authors prefer not to carry out

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      This interesting study uses an unbiased genetic screen in C. elegans to identify SAX-1/NDR kinase as a regulator of dendritic branch elimination. Loss of SAX-1 results in an excess branching phenotype that is striking and highly penetrant. The authors identify several additional regulators of branch elimination (SAX-2, MOB-1, RABI-1, RAB-11.2) by using a candidate genetic screen aimed at factors that interact physically or genetically with SAX-1. They propose that SAX-1 acts by promoting membrane retrieval based on the nature of these interactors and the results of an imaging-based in vivo assay for endocytic puncta.

      Major comments.

      1. My biggest concern is that the phenotypes are only observed in temperature-sensitive dauer-constitutive mutant backgrounds, and not in wild-type dauers. That is, wild-type animals exiting dauer do not require SAX-1 for dendrite elimination.

      While this does not undermine the importance of the results, it does require more explanation. The authors write that "the requirement for sax-1... relies on specific physiological states of the dauer stage," but I do not understand what this means. Are they saying that daf-7 and daf-2 dauers are in a different "physiological state" than wild-type dauers? In what way? What is the evidence for this? A more rigorous explanation is needed.

      To me, the simplest genetic explanation is that daf-7 and daf-2 are partially required for branch retraction in a manner redundant with sax-1, and the ts mutants are not fully wild-type at 15C. Thus, the sax-1 requirement is revealed only in these mutant backgrounds. Can the authors examine starvation-induced dauers of daf-7 or daf-2 raised continuously at 15C?

      daf-7 and daf-2 ts strains can form "partial dauers" that have a dauer-like appearance but are not SDS resistant. Could the difference between partial dauers and full dauers account for the difference in sax-1-dependence? The authors could use SDS selection of the daf-7 strain at 25C to ensure they are examining full dauers.

      The Bargmann lab has created a daf-2 FLP-OUT strain (ky1095ky1087) that allows cell-type-specific removal of daf-2. Could this be used to test for a cell-autonomous role of daf-2 in IL2Q related to branch elimination?

      These ideas are not a list of specific experiments the authors need to complete; rather, they are meant to illustrate some possible approaches to the question. Whatever approach they use, it is important for them to more rigorously explain why SAX-1 is not required for branch removal in wild-type animals.

      2. The SAX-2 localization (Fig. 4) and endocytosis assay (Fig. 6) results were not clear to me from the data shown. Overall a more rigorous analysis and presentation of the data would be important to make these conclusions convincing. This may involve refining the data presentation in the figures, modifying the claims (e.g., "we propose" vs "we find"), or saving some of the data to be more fully explored in a future paper. In my view, these figures are the biggest weak point of the manuscript and also are not important for the central conclusions (which are well supported and convincing); indeed, these results are barely mentioned in the Abstract or last paragraph of Introduction.

      • In Fig. 4, where in the head are we looking? It would help to show a more low-magnification view of the entire cell.
      • In Fig. 4D, why is SAX-2 visible throughout the entire neuron and why is the "punctum" marked with an arrow also seen in the tagRFP channel? One gets the impression that some of the puncta may be background, bleed-through, or artifacts due to cell varicosities.
      • In Fig. 4C, the distinction between puncta in the primary or higher-order dendrites is not clear to me, and several puncta that I would have scored as primary are marked as higher-order.
      • Related to this, in Fig. 4B are the two arrows meant to be white as in the top panel, or yellow as in the bottom panel?
      • The main sax-1 phenotype is increased SAX-2 puncta in dauer, but the branch retraction defect is in post-dauers. How is this relevant to the phenotype?
      • The number of SAX-2 puncta in sax-1 mutants decreases almost to normal in post dauers. Is there a correlation between the number of remaining branches and the number of SAX-2 puncta? That is, do the many wild-type animals with "excess" SAX-2 puncta also fail to retract branches?
      • The control post-dauer data in Fig. 4F and 4H are identical (re-used data) but the corresponding control dauer data in Fig. 4F and 4G are different. What is going on here?
      • Why are sample sizes so small for both strains in Fig. 4G compared to Fig. 4F and 4H?
      • In Fig. 6C, why are the tagRFP (blue) puncta larger than the neurite? Aren't these meant to represent vesicles inside the surrounding neurite? One gets the impression that this is bleed-through from the GFP channel.
      • In Fig. 6E and 6F, why are there no tagRFP (blue) puncta? Is CD8 not endocytosed at all if it lacks the nanobody sequence? One would expect the tagRFP (blue) signal to be the same in both strains and simply to lack yellow if the nanobody is not present.
      • In Fig. 6E and 6H, why are there so many GFP (yellow) puncta outside the neuron? What are these structures and why are they absent in the strain with the nanobody?
      • What is the large central blue structure in Fig. 6H - is this the soma? - and why are puncta in this region not counted?
      • The authors report a decrease in endocytic events in sax-1, but qualitatively it looks like there are vastly more puncta inside the neuron in Fig. 6H than in 6G.
      • minor: there is text reading "40-" in the bottom panel of Fig. 6H. It is visible when printed but not on screen - adjust levels in Photoshop to reveal it.
      • Related to both Fig. 4 and Fig. 6, where does SAX-1 localize in IL2Q in dauer and post-dauer? Does its expression or localization change during branch retraction? Does it co-localize with SAX-2 or endocytic puncta?

      Minor points:

      1. At several points the authors emphasize the relationship of neurite remodeling to stress, e.g. Abstract and Discussion: "we adapted C. elegans IL2 sensory dendrites as a model [of...] stress-mediated dendrite pruning". It seems unnecessary and potentially misleading to treat this as a neuronal stress response. First, it conflates organismal and cellular stress - there is no reason to think that IL2 neurons are under cellular stress in dauer. In fact parasitic nematodes go through dauer-like stages as part of healthy development and probably have similar remodeling of IL2. Second, dendrite pruning occurs during dauer exit, which is the opposite of a stress response - it reflects a return to favorable conditions.
      2. In Fig. 1A, C. elegans is shown going directly from L1 to dauer in response to unfavorable conditions, which is incorrect. Animals proceed through L2 (in many cases actually an alternative L2d pre-dauer) and then molt into dauer (an alternative L3 stage) after completing L2.
      3. In Fig. 1B, please check if it is correct that hypodermis contacts the pharynx basement membrane as drawn. The schematic in the top panel makes it look like there is a single secondary branch and the quaternary branches are similar in length to the primary dendrite. The schematic in the bottom panel makes it look like the entire neuron is a small fraction of the length of the pharynx. Could these be drawn closer to scale?

      Referee cross-commenting

      I think we all touched on similar points. I wanted to follow up on Reviewer 3's comment, "Is the failure to eliminate branches an indication of incomplete dauer recovery? Do sax-1 mutants retain additional characteristics of dauer morphology in post dauer adults." I thought this was an excellent point. It made me wonder if that might explain why the defect is only seen in daf-7 and daf-2 mutant backgrounds - maybe these strains retain partial dauer traits even after exit. Is there a specific experiment that they could do? Did you have specific characteristics of dauer morphology in mind for them to check? (Ideally something in the nervous system that can be scored quantitatively.)

      Significance

      A major strength of this work is the pioneering use of a novel system to study neuronal branch retraction. C. elegans has provided a powerful model for studying how dendrite branches form, but much less attention has been paid to how excess neuronal branches are removed. The post-dauer remodeling of IL2Q neurons provides an exciting and dramatic physiological example to explore this question.

      This paper is notable for taking the first steps towards developing this innovative model. It does exactly what is needed at the outset of a new exploration - a forward genetic screen to discover the main regulators of the process. Using a combination of classical and modern genetic approaches, the authors bootstrap their way to a sizeable list of factors and a solid understanding of the properties of this system, for example that retraction of higher vs lower order dendrites show different genetic requirements.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      I am not currently convinced by the principal interpretations and think that other explanations based on known phenomena could account for key results. Specifically the authors have not resolved whether oxidative modification to 5mC and 3mC, or chemical attack to ssDNA that is transiently exposed in the repair processing of 5mC and 3mC is the principal source of the observed genotoxicity.

      (1) Original query which still stands: As noted in the manuscript, AlkB repairs alkylation damage by direct reversal (DNA strands are not cut). In the absence of AlkB, repair of alkylation damage/modification is likely through BER or other processes involving strand excision and resulting in single stranded DNA. It has previously been shown that 3mC modification from MMS exposure is highly specific to single stranded DNA (PMID:20663718), occurring at ~20,000 times the rate seen in double stranded DNA. Consequently the introduction of DNMTs is expected to introduce many methylation adducts genome-wide that will generate single stranded DNA tracts when repaired in an AlkB deficient background (but not in an AlkB WT background), which are then hyper-susceptible to attack by MMS. Such ssDNA tracts are also vulnerable to generating double strand breaks, especially when they contain DNA polymerase stalling adducts such as 3mC. The generation of ssDNA during repair is similarly expected to follow the H2O2 or TET based conversion of 5mC to 5hmC or 5fC, neither of which can be directly repaired and both of which depend on single strand excision for their removal. The potential importance of ssDNA generation in the experiments has not been [adequately] considered.

      We thank the reviewer for expanding on their previous comment.  We completely agree with the possibility that they raise and have added an extra paragraph in the discussion to expand on our consideration of the role of ssDNA in DNMT-induced DNA damage, which we reproduce here:

      "The observation that TET overexpression sensitizes cells expressing DNMTs to oxidative stress strongly suggests that the site of DNA damage is the modified cytosine itself.  However, we do not currently have definitive evidence supporting this.  As mentioned in the results section, the presence of unrepaired 3mC may lead to increased levels of ssDNA; it is also possible that 5mC itself may increase ssDNA levels.  Loss of alkB would be expected to increase the amount of ssDNA.  Thus DNA damage surrounding modification sites, but not specifically localised to it, might be the cause of the increased sensitivity.  These two different models make different predictions.  If modified cytosines are the source of the damage, mutations arising would be predominantly located at CG dinucleotides.  Alternatively, ssDNA exposure would result in distributed mutations that would not necessarily be located at CG sites.  The highly biased spectrum of mutations that can be screened through the Rif resistance assay does not allow us to address this currently.  However, future experiments to create mutation accumulation lines could allow us to address the question systematically on a genome-wide level. "

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      We thank the reviewers for their positive comments. Our manuscript is, to our knowledge, the first to investigate the role of VAIL (V-ATPase-ATG16L1 induced LC3 lipidation), a form of CASM (Conjugation of ATG8s to single membranes), in SARS-CoV-2 replication. We demonstrate that SARS-CoV-2 Envelope (E) induces VAIL and that this contributes to viral replication, including by using a reverse genetics system to make an E mutant virus. There have been many high-quality studies examining the role of canonical autophagy in SARS-CoV-2 replication, and our manuscript does not argue that all or even most LC3 lipidation during infection is via VAIL. We will try to make this point more clearly in the text. We do not think this detracts from the novelty and importance of our manuscript.

      *Reviewer #1 (Evidence, reproducibility and clarity (Required)): *

      • Figueras-Novoa et al present a short report demonstrating the induction of LC3 lipidation on single membranes by SARS-CoV-2 through a noncanonical autophagy pathway referred to as VAIL. The authors utilize elegant genetic tools to show that the induction of LC3 lipidation upon viral infection is mainly due to VAIL rather than canonical autophagy. They demonstrate that the activity of the viral E protein that can cause neutralization of acidic vesicles leads to the activation of non-canonical LC3 lipidation on single membranes. Interestingly, the authors also conclude that the impairment of VAIL leads to a reduction of viral load as a result of a defect in later stages of viral infection, although the underlying mechanism was not further explored. *

      • Overall, this is an elegant and well controlled study that provides a clear conclusion. I only have some minor comments.*

      We thank the reviewer for their assessment of our manuscript.

      In some experiments, LC3 lipidation does not appear to be fully disrupted upon VAIL inhibition (e.g. Figs. 1H, 3D, 4A). As other labs have shown that SARS-CoV-2 blocks autophagic flux, this could be further clarified in this manuscript, as both VAIL and autophagy may be co-induced upon viral infection.

      We agree with the reviewer that there is a contribution of canonical macroautophagy to the LC3B lipidation observed in SARS-CoV-2 infection. We will extend the discussion in the manuscript to clarify this point for the readers.

      Can the authors test the induction of LC3 lipidation in cells expressing K490 mutant of ATG16L1 in ATG16L1 KO cells to compare them with ATG16L1-ATG13 double knockouts?

      The western blot in Figure 3F (quantified in Figure 3G) shows LC3B lipidation in response to E expression in ATG16L1-ATG13 double knockout cells reconstituted with wild type ATG16L1 but not in cells complemented with the ATG16L1 K490A mutant. We agree that the referee’s suggestion to perform these experiments in the context of infection would be informative. However, in spite of numerous attempts, we have so far been unable to generate a cell clone fully devoid of ATG16L1 in a cell line that can be productively infected with SARS-CoV-2. For reasons unclear to us, there appears to be a very low level of residual ATG16L1 activity despite multiple different CRISPR/Cas9 targeting attempts. The suggested complementation experiments might still be informative in the context of low level ATG16L1 expression, so we will pursue this. Alternatively, as a contingency, we can try to produce SARS-CoV-2 infectable cells with mutations in ATG16L1’s binding partner V1H; this interaction is required for VAIL. A further contingency could be to assess LC3B lipidation during infection and treatment with a Vps34 inhibitor, which inhibits canonical autophagy.

      Minor points: The difference between Fig. 1F and G is unclear, as is why the authors are including both analyses. The same applies to Figures 4G and H.

      We included both metrics to show that the decrease in LC3B lipidation in cells expressing SopF during infection is robust and observed in two separate readouts. While spot area measures the area of infected cells covered by GFP-LC3B fluorescence, spot intensity is a reading of the intensity of the area defined in an infected cell as being LC3 positive. Theoretically, these measurements could change in different ways, for example if the same amount of lipidated LC3 were to distribute over a larger area of the cell. We prefer to keep both measurements in the manuscript.
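
      The distinction between the two readouts can be illustrated with a minimal sketch. It assumes a simple threshold-based definition of LC3-positive pixels; the function name `spot_metrics`, the threshold, and the pixel values are hypothetical and are not taken from the manuscript or its image-analysis pipeline.

```python
# Hypothetical illustration only: pixel values and threshold are invented.

def spot_metrics(image, threshold):
    """Return (spot_area, mean_spot_intensity) for one cell image.

    spot_area: number of pixels at or above the threshold.
    mean_spot_intensity: mean value of those pixels (0.0 if there are none).
    """
    spot_pixels = [value for row in image for value in row if value >= threshold]
    area = len(spot_pixels)
    mean_intensity = sum(spot_pixels) / area if area else 0.0
    return area, mean_intensity


# Two toy cells carrying the same total signal (400 units),
# concentrated in one and spread over a larger region in the other.
compact = [
    [0, 0, 0, 0],
    [0, 200, 200, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
spread = [
    [0, 0, 0, 0],
    [0, 100, 100, 0],
    [0, 100, 100, 0],
    [0, 0, 0, 0],
]

THRESHOLD = 50  # assumed cut-off for calling a pixel LC3 positive

compact_area, compact_intensity = spot_metrics(compact, THRESHOLD)
spread_area, spread_intensity = spot_metrics(spread, THRESHOLD)
```

      In this toy case the spread cell has twice the spot area but half the mean spot intensity of the compact cell, even though the total signal is identical, which is why the two measurements are not redundant.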

      The authors should show boxed colocalisation of all images, including negative controls. For example, the authors have shown boxed magnifications in only the lowest panel in Figure 2A but not the upper two panels. Figures 4E&F should include boxed examples. This serves to clarify both positive and negative colocalisation events.

      Boxed magnifications will be added to all images.

      • Reviewer #1 (Significance (Required)): *

      • Overall an elegant and well controlled study demonstrating the induction of non-canonical LC3 conjugation on single membranes (VAIL) during SARS-CoV2 infection. A further exploration of canonical autophagy (as previously published by others) in addition to VAIL would enhance this study.*

      As the reviewer noted, several excellent studies have explored canonical autophagy during SARS-CoV-2 infection, many of which we cite in our manuscript. Our focus, however, is to demonstrate that SARS-CoV-2 E induces LC3 lipidation via VAIL. We believe that exploring the diverse roles of canonical autophagy mechanisms in SARS-CoV-2 infection is beyond the scope of this study.

      *This study is of interest to researchers studying autophagy, viruses, immunology, single membrane LC3 lipidation, and lysosomes as well as potentially clinicians treating SARS-CoV2 infected individuals. *

      • This reviewer is experienced in autophagy research.*

      We thank the reviewer for this assessment of our manuscript.

      *Reviewer #2 (Evidence, reproducibility and clarity (Required)): *

      • Major Comments *

      • Figure 1D does not very clearly show an overlap between V1D and LC3B. Both proteins seem broadly present across the cell and there is no easily identifiable change in V1D distribution upon infection. As such the overlay may be purely stochastic. The authors should quantify the observed co-localization events across multiple cells and biological replicates and compare them to other protein(s) with a similar cellular distribution pattern.*

      We agree there is no obvious change in V1D staining on infection. The images in Figure 1D are purely intended to illustrate that LC3 and the V-ATPase can colocalise, not to demonstrate a change in V-ATPase distribution or to suggest a direct interaction. We will make this point more clearly in the text. We will also carry out analyses of the kind suggested (see also response to the first two Minor Comments). We would be happy to provide an alternative method of visualising the V-ATPase (we could use any suitable antibody to the V-ATPase, or the bacterial effector SidK) if required. In response to reviewer 3’s comments, we will carry out a pull-down experiment to test the association of the V-ATPase and ATG16L1 during E expression, as this is a key interaction during VAIL activation.

      Based on Figure 2F the authors suggest that virus entry is unaffected by the inhibition of VAIL in early timepoints. However, according to the figure legend, the timepoint used is 7hpi, while 2D uses 24hpi. Some SARS-CoV-2 papers suggest 7-10 hours is sufficient time to release new virions (Ban-On et al., 2020). As such 7hpi can not necessarily be seen as an early time point. Did the authors test earlier ones? Also, based on this, would it be possible that the effects observed at 24hpi are actually secondary infections, meaning that the virus utilizes pathway components for virion production and a lack thereof reduces infectivity of newly formed virions? In this case it would be interesting to set up an assay that can distinguish between primary and secondary infection to study both individually more closely.

      Whereas 7 hours may be sufficient to release new virions, it is not sufficient to establish infections in other cells, which is why we chose that time point. The observation that there is no difference in the percentage of infected cells at 7 hpi (Figure 2F) led us to suggest that viral entry is unaffected. We then confirmed this through the pseudovirus assay in Figure 2G, where no difference is found between SopF and mCherry expressing cells. For this assay, GFP-expressing, replication incompetent, lentiviral particles pseudotyped with Spike from different SARS-CoV-2 lineages were used to transduce mCherry and SopF expressing cells. A change in the percentage of GFP-positive cells would indicate an effect on viral entry, but no such change was observed in SopF-expressing cells.

      We agree with the reviewer that the effects observed at 24 hpi are likely due to a defect in subsequent rounds of infection, since no difference was observed at 7 hpi or with our pseudovirus assay. We will attempt to make this point in the text as clearly as possible.

      The authors nicely show in their study an involvement of VAIL in SARS-CoV-2 mediated LC3 lipidation. However, the observed effects are relatively moderate in several experiments, indicating that there may be another contributor to the observed phenotype. It would be nice to highlight this in the discussion and debate potential mechanisms that are causing the observed effects during infection.

      We agree with the reviewer’s analysis. We have discussed the contribution of canonical autophagy in the second paragraph of the discussion, but we will expand on this in a revised manuscript. E expression levels are moderate during infection; other structural proteins such as N and M are present in much higher amounts. Since E is the key protein in VAIL initiation, a moderate effect of VAIL inhibition is perhaps expected. Nonetheless, VAIL still plays a crucial role in the viral life cycle.

      *Minor Comments *

      • The re-localization events shown in Fig 3A should be quantified.*

      This quantification of GFP-LC3 relocalisation will be carried out and included.

      • The co-localization events displayed in Fig 4A should be quantified.*

      The quantification of V1D, E and GFP-LC3 will be carried out and included.

      For Figure 2H-K the authors perform KDs of ATG16L1 and ATG13. While the results for the two specific proteins are certainly convincing, the authors would strengthen their argument by testing additional proteins in the autophagy pathway to support their claim that VAIL but not autophagy affects protein abundance of N (OPTIONAL).

      As discussed in response to reviewer 1, we will attempt to infect ATG16L1 KO cells reconstituted with a K490A ATG16L1 mutant, which is an established tool and has been validated to be deficient in VAIL but not canonical autophagy.

      ***Referee cross-commenting** *

      • Overall I agree with the comments of my co-reviewers and I think the suggested experiments/comments are sensible. *
      • I in part already alluded to it in my analysis, but I tend to agree with reviewer 3 on the limited effect VAIL seems to have on LC3B lipidation.*

      As outlined above in response to reviewer 1 and below to reviewer 3, we agree that there is a modest contribution of VAIL to overall LC3 lipidation, which correlates with a modest amount of E expression in SARS-CoV-2 infection. VAIL is clearly important for the viral life cycle, thus whatever the proportion of LC3 lipidation attributable to this pathway it must be biologically significant.

      *Reviewer #2 (Significance (Required)): *

      • While previous publications have shown interaction between SARS-CoV2 and autophagy, the authors of this manuscript demonstrate that V-ATPase-ATG16L1 induced LC3 lipidation (VAIL) is activated during infection and affects viral replication. *

      • This study provides an interesting new aspect to host-SARS-CoV-2 interactions. *

      • The manuscript is of interest for people studying virus-host cell interaction, as well as for researchers in the fields of infectious diseases, specifically SARS-CoV2, and autophagy/VAIL*.

      We thank the reviewer for their assessment of our manuscript.

      *Reviewer #3 (Evidence, reproducibility and clarity (Required)): *

      • The interaction of SARS-CoV-2 with canonical autophagy has been well documented. However, whether SARS-CoV-2 infection induces and benefits from non-canonical autophagy is unclear. In this manuscript, the authors demonstrated that SARS-CoV-2 infection induces V-ATPase-ATG16L1-induced LC3 lipidation (VAIL), a form of non-canonical autophagy in which LC3 is conjugated to single membranes. The SARS-CoV-2 envelope protein, through its ion channel activity, triggers the V-ATPase proton pump and induces VAIL during SARS-CoV-2 infection. Inhibiting VAIL during SARS-CoV-2 infection with SopF, a Salmonella effector, attenuates SARS-CoV-2 egress. *

      • While these findings are interesting and demonstrate that SARS-CoV-2 infection triggers VAIL for its own benefit, the mechanism by which VAIL promotes SARS-CoV-2 replication remains unclear. Moreover, the contribution of VAIL to LC3 lipidation during SARS-CoV-2 infection appears to be minimal, as blocking VAIL through SopF expression only marginally reduced LC3B lipidation (Fig. 1H).*

      We thank the reviewer for their assessment of our manuscript. As we have already alluded to in our response, we agree that only part of the LC3 lipidation observed during infection can be attributed to VAIL. There is a reproducible effect on viral replication which we have demonstrated in multiple ways, therefore the contribution of VAIL is of biological importance.

      *Comments: *

      • The authors show that the ion channel activity of E is essential for VAIL induction during SARS-CoV-2 infection. Since V-ATPase recruits the ATG16L complex to induce VAIL, and to clarify how SARS-CoV-2 infection triggers VAIL, the authors should examine whether SARS-CoV-2 infection or the expression of E induces V-ATPase-ATG16L interaction and whether this interaction is disrupted when SopF is expressed.*

      We agree with the reviewer that this would be an informative experiment. We can carry out this experiment in an E expression system, rather than infection. This is due to the difficulty of getting enough material to carry out this kind of pull-down experiment in infected cells (at the time of writing these experiments still have to be carried out in CL3).

      • Since the authors suggest that expression of SopF attenuates viral exit, one would expect that the number of N-positive cells will increase in SopF-expressing cells compared to the mCherry control cells. However, as shown in Figure 2D, this is not the case. Could the authors discuss why N-positive cells will be reduced in SopF-expressing cells when viral egress is impeded in these cells*?

      This is a reflection of multi-cycle kinetics. N is still very strongly expressed in infected cells, even after virions have egressed. SARS-CoV-2 can infect VAIL-deficient cells and expresses the same levels of N prior to subsequent rounds of infection (at 7 hours after infection for example). Egress in VAIL-deficient, SopF-expressing cells is defective. Therefore, fewer cells will be infected in subsequent rounds of infection in SopF expressing cells, resulting in fewer N-positive cells in the SopF expressing cell population (most obvious after 24 hours).

      Figure 2H. The authors show that knockdown of ATG16L1 reduces the expression of N during SARS-CoV-2 infection compared to the controls. To confirm that knockdown of ATG16L1, which is required for both canonical autophagy and VAIL, reduces N staining via VAIL, the authors should examine the impact of SopF expression on N levels in ATG16L KD cells. This experiment will confirm if the reduction in N staining in ATG16L1 KD cells is due to VAIL.

      As stated in the response to reviewer 1, we can attempt this experiment in an ATG16L1 KO system complemented with K490A ATG16L1, which is deficient in VAIL and not canonical autophagy.

      • Figure 2J. The quality of the Western blot data is poor.*

      In this western the exposure is deliberately turned up to show that minimal ATG13 was left after knock down. We will also show the full blot with less exposure – this will demonstrate high quality.

      Also, N appears as a single band in Figure 2J, but appears as double bands in Figures 2A and H. Could the authors explain this?

      An extra band can be seen in 2J for N. However, as the reviewer points out, the lower band is fainter than in 2A or 2H. The biology of SARS-CoV-2 N is interesting and complicated, with different truncated isoforms and phosphorylation patterns observed (see for example Mears et al., 2025 PMID:39836705). We observed changes in abundance of the second band between experiments, but this did not obviously depend on VAIL. We therefore consider this to be beyond the scope of this investigation.

      *Reviewer #3 (Significance (Required)): *

      • This manuscript proposes a role for VAIL in LC3 lipidation during SARS-CoV-2 infection. While the findings are interesting, VAIL only marginally contributes to LC3 lipidation during SARS-CoV-2 infection. Therefore, the significance of VAIL to LC3B lipidation during SARS-CoV-2 infection is unclear.*

      Our experiments show unambiguously that VAIL contributes to viral replication. Therefore, even if VAIL accounts for only part of the LC3 lipidation observed during infection, its contribution is biologically significant. As alluded to above, we do not think a further investigation of canonical macroautophagy and SARS-CoV-2 would enhance the quality of our manuscript. We will try to make our description of the contribution of macroautophagy clearer in the revised manuscript (without providing a full literature review). We also do not think that exploring the nature of the multiple N bands on western blot is within the scope of this paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The authors demonstrate that female Spodoptera littoralis moths prefer to oviposit on well-watered tomato plants and avoid drought-stressed plants. The study then recorded the sounds produced by drought-stressed plants and found that they produce 30 ultrasonic clicks per minute. Thereafter, the authors tested the response of female S. littoralis moths to clicks with a frequency of 60 clicks per minute in an arena with and without plants and in an arena setting with two healthy plants of which one was associated with 60 clicks per minute. These experiments revealed that in the absence of a plant, the moths preferred to lay eggs on the side of the arena in which the clicks could be heard, while in the presence of a plant the S. littoralis females preferred to oviposit on the plant where the clicks were not audible. In addition, the authors also tested the response of S. littoralis females in which the tympanic membrane had been pierced, making the moths unable to detect the click sounds. As hypothesised, these females placed their eggs equally on both sides of the arena.

      Finally, the authors explored whether the female oviposition choice might be influenced by the courtship calls of S. littoralis males which emit clicks in a range similar to a drought-stressed tomato plant. However, no effect was found of the clicks from ten males on the oviposition behaviour of the female moths, indicating that the females can distinguish between the two types of clicks. Besides these different experiments, the authors also investigated the distribution of egg clusters within a longer arena without a plant, but with a sugar-water feeder. Here it was found that the egg clusters were mostly aggregated around the feeder and the speaker producing 60 clicks per minute. Lastly, video tracking was used to observe the behaviour of the moths in the arena without a plant, which demonstrated that the moths gradually spent more time at the arena side with the click sounds.

      We thank the reviewers for their helpful comments. We agree with the summary, but would like to note that in the control experiment (Figure 2) we used a click rate of 30 clicks per minute—a design choice driven by the editor’s feedback. We have clarified this and, to further probe the system’s dynamics, added a second experiment employing the same click rate (30 clicks per minute) with a dehydrated plant (see details below). In both experiments, females again showed a clear tendency to oviposit nearer the speaker; these findings are described in the updated manuscript.

      (2) The study addresses a very interesting question by asking whether female moths incorporate plant acoustic signals into their oviposition choice. Unfortunately, I find it very difficult to judge how big the influence of the sound on the female choice really is, as the manuscript does not provide any graphs showing the real numbers of eggs laid on the different plants, but instead only provides graphs with the Bayesian model fittings for each of the experiments. In addition, the numbers given in the text seem to be relatively similar with large variations, e.g. Figure 1B3: 1.8 ± 1.6 vs. 1.1 ± 1.0. Furthermore, the authors do not provide access to any of the raw data or scripts of this study, which also makes it difficult to assess the potential impact of this study. Hence, I would very much like to encourage the authors to provide figures showing the measured values as boxplots including the individual data points, especially in Figure 1, and to provide access to all the raw data underlying the figures.

      We acknowledge that there are researchers who favor Bayesian graphical representation versus raw data visualization. Therefore, we have added chartplots of the raw data from Figure 1 in the supplementary section. We are aware of the duplication in presentation and apologize for this redundancy.  

      Regarding the variance and means we obtained in our experiment, we have analyzed all raw data using the statistical model presented, and if statistical significance was found despite a particular mean difference or variance, this is meaningful from a biological perspective. One can certainly discuss whether this difference has biological importance, but it should be remembered that in this experimental system, we are trying to isolate the acoustic signal from a complex system that includes multiple signals. Therefore, at no point have we suggested that this is a standalone factor; rather, we have proposed it as an informative and significant component.

      In addition to the experiments described above, we conducted an experiment in which we counted both eggs and clusters. The results indicate that cluster counts are a reliable proxy for reproductive investment at a given location. In this experiment, we present cluster numbers alongside egg counts (Figure 2).

      Furthermore, we apologize for the technical error that prevented our uploaded data files from reaching the reviewers. We have also uploaded updated data and code.

      (3) Regarding the analysis of the results, I am also not entirely convinced that each night can be taken as an independent egg-laying event, as the amount of eggs and the place were the eggs are laid by a female moth surely depends on the previous oviposition events. While I must admit that I am not a statistician, I would suggest, from a biological point of view, that each group of moths should be treated as a replicate and not each night. I would therefore also suggest to rather analyse the sum of eggs laid over the different consecutive nights than taking the eggs laid in each night as an independent data point.

      We thank the reviewer for this question. This is a valid point that we will address in three aspects:

      First, regarding our statistical approach, we used a model that takes into account the sequence of nights and examines whether there is an effect of the order of nights, i.e., we used GLMMs, with the night nested within the repetition. This is equivalent to addressing this as a repeated measure and is, to our best knowledge, the common way to treat such data. 

      Second, following the reviewer's comment, we also reran the statistics of the third experiment (i.e., “sound gradient experiments”, Figure 2 and Supplementary figure 4) when only taking the first night when the female/s laid eggs to avoid the concern of dependency. This analysis revealed the same result – i.e., a significant preference for the sound stimulus. We have now updated our methods and results section to clarify this point.  

      Third, an important detail that may not have been clearly specified in the methods: at the end of each night, we cleaned the arena of counted egg clusters using a cloth with ethanol, so that on the subsequent night we would not expect there to be evidence of previous oviposition, although this would not exclude some form of physiological or cognitive memory. We have now updated our methods section to clarify this important procedural point.
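
      The nested structure described in the first point above can be written schematically as follows. This is a generic sketch only: the Poisson family, the log link, and the particular fixed effects are assumptions made for illustration and are not stated in the response, and all symbols are introduced here rather than taken from the manuscript.

```latex
% y_{rns}: number of egg clusters on arena side s, in night n of repetition r
% u_r: random intercept for repetition r; v_{n(r)}: random intercept for night n nested within repetition r
\begin{aligned}
y_{rns} &\sim \mathrm{Poisson}(\mu_{rns}), \\
\log \mu_{rns} &= \beta_0 + \beta_1\,\mathrm{sound}_{s} + \beta_2\,\mathrm{night}_{n} + u_r + v_{n(r)}, \\
u_r &\sim \mathcal{N}(0, \sigma_u^2), \qquad v_{n(r)} \sim \mathcal{N}(0, \sigma_v^2).
\end{aligned}
```

      Under this kind of specification, counts from consecutive nights of the same group share the random effect $u_r$ and are therefore not treated as independent observations, which is the concern raised by the reviewer.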

      (4) Furthermore, it did not become entirely clear to me why a click frequency of 60 clicks per minute was used for most experiments, while the plants only produce clicks at a rate of about 30 clicks per minute. Independent of the ecological relevance of these sound signals, it would be nice if the authors could provide a reason for using this frequency range. Besides this, I was also wondering about the argument that groups of plants might still produce clicks in the range of 60 clicks per minute and that the authors' tests might therefore still be reasonable. I would agree with this, but only in the case that a group of plants with these sounds would be tested. Offering the choice between two single plants while providing the sound from a group of plants is in my view not the most ecologically reasonable choice. It would be great if the authors could modify the argument in the discussion section accordingly and further explore the relevance of different frequencies and dB levels.

      This is an excellent point. We originally increased the click rate to generate a strong signal. However, it was important for us to verify that there was ecological relevance in the stimulus we implemented in the system. For this purpose, we recorded a group of dehydrated plants at a distance of ~20cm and we measured a click rate of 20 clicks per minute (i.e., 0.33 Hz) (see Methods section). Therefore, as mentioned at the beginning of this letter, in the additional experiment described in Figure 2, we reduced the click frequency to 30 clicks per minute, and at this lower rate, the effect was maintained. Increasing plant density would probably lead to a higher rate of 30 clicks per minute.

      (5) Finally, I was wondering how transferable the findings are towards insects and Lepidopterans in general. Not all insects possess a tympanic organ and might therefore not be able to detect the plant clicks that were recorded. Moreover, I would imagine that generalist herbivores like Spodoptera might be more inclined to use these clicks than specialists, which very much rely on certain chemical cues to find their host plants. It would be great if the authors would point more to the fact that your study only investigated a single moth species and that the results might therefore only hold true for S. littoralis and closely related species, but not necessarily for other moth species such as Sphingidae or even butterflies.

      Good point. Our research uses a specific model system of one moth species and one plant species in a particular plant-insect interaction where females select host plants for their offspring. As with any model-based research that attempts to draw broader conclusions, we've taken care to distinguish between our direct findings and potential wider implications. We believe our system may represent mechanisms relevant to a wider group of herbivorous insects with hearing capabilities, particularly considering that several moth families and other insect orders can detect ultrasound. However, additional research examining more moth and plant species is necessary to determine how broadly applicable these findings are. We have made these clarifications in the text.

      Reviewer #2 (Public review):

      (6) The results are intriguing, and I think the experiments are very well designed. However, if female moths use the sounds emitted by dehydrated plants as cues to decide where to oviposit, the hypothesis would predict that they would avoid such sounds. The discussion mentions the possibility of a multi-modal moth decision-making process to explain these contradictory results, and I also believe this is a strong possibility. However, since this remains speculative, careful consideration is needed regarding how to interpret the findings based solely on the direct results presented in the results section.  

      Thank you for this insightful observation. We agree that the apparent attraction of females to dehydrated-plant sounds contradicts our initial prediction. Having observed this pattern consistently across multiple setups, we have now added a targeted choice experiment to the revised manuscript, in which female moths were offered a choice between dehydrated plants broadcasting their natural ultrasonic emissions and a control. These results, detailed in the Discussion and presented in full in the Supplementary Materials (Supplementary Figure 4), show that when only a dehydrated plant is available, moths would prefer it for oviposition, supporting our hypothesis that in the absence of a real plant, the plant’s sounds might represent a plant.

      (7) Additionally, the final results describing differences in olfactory responses to drying and hydrated plants are included, but the corresponding figures are placed in the supplementary materials. Given this, I would suggest reconsidering how to best present the hypotheses and clarify the overarching message of the results. This might involve reordering the results or re-evaluating which data should appear in the main text versus the supplementary materials.

      Thank you for this suggestion. We have reorganized the manuscript and removed the olfactory response data from the current version to maintain a focused narrative on acoustic cues. We agree that a detailed investigation of multimodal interactions deserves a separate study, which we plan to pursue in future work. 

      (8) There were also areas where more detailed explanations of the experimental methods would be beneficial.

      Thank you for highlighting this point. We have expanded and clarified the Methods section to provide comprehensive detail on our experimental procedures.

      Reviewer #1 (Recommendations for the authors):

      (9) Line 1: Please include the name of the species you tested also in the title as your results might not hold true for all moth species.

      We do not fully agree with this comment. Please see comment 5.

      (10) Line 19-20: Please rephrase the sentence so that it becomes clear that the "dehydration stress" refers to the plant and not to the moths.

      Thank you for the suggestion; we have clarified the text accordingly.

      (11) Line 31: Male moths might provide many different signals to the females, maybe better "male sound signals" or similar.

      Thank you for the suggestion; we have clarified the text accordingly.

      (12) Line 52-53: Maybe mention here that not all moth species have evolved these abilities.

      Thank you for the suggestion; we have clarified the text accordingly.

      (13) Line 77: add a space after 38.

      Thank you for the suggestion; we have clarified the text accordingly.

      (14) Line 88: Maybe change "secondary predators" to "natural enemies".

      Thank you for the suggestion; we have clarified the text accordingly.

      (15) Line 134: Why is "notably" in italics? I would suggest using normal spelling/formatting rules here.

      Thank you for the suggestion; we have clarified the text accordingly.

      (16) Line 140-144: If you did perform the experiment also with the more ecological relevant playback rate, why not present these findings as your main results and use the data with the higher playback frequency as additional support?

      Thank you for this suggestion. We agree that the ecologically relevant playback data are important, as described in detail at the beginning of this letter and in comment 4. However, to preserve a clear and cohesive narrative, we have maintained the original ordering of this section. The experiments in Figure 1 differ in several components from those in Figure 2 and from the work that examined sounds in plant groups in the appendices. We therefore find it more appropriate to use them as supporting evidence for the main findings rather than to compare different experimental systems. For this reason, we chose to keep them as a separate description. The ecological playback findings (Lines 140–144) remain fully described in the Results and serve to reinforce the main observations without interrupting the manuscript's flow.

      (17) Line 146: Please explain already here how you deafened the moths.

      Thank you for the suggestion; we have clarified the text accordingly.

      (18) Line 181: should it be "male moths' " ?

      Thank you for the suggestion; we have clarified the text accordingly.

      (19) Line 215: Why is "without a plant" in italics? I would suggest using normal spelling/formatting rules here.

      Thank you for the suggestion; we have clarified the text accordingly.

      (20) Line 234: I do not understand why this type of statistic was used to analyse the electroantennogram (EAG) results. Would a rather simple Student's t-test or a Wilcoxon rank sum test not have been sufficient? I would also like to caution you not to overinterpret the data derived from the EAG: as you combined the entire headspace into one mixture, it is no longer possible to derive information on the different volatiles in the blends. The differences you observe might therefore mostly be due to the amount of emitted volatiles.

      We have reorganized the manuscript and removed the olfactory response data from the current version to maintain a focused narrative on acoustic cues (See comment 7). 

      (21) Line 268: It might be nice to add an additional reference here referring to the multimodal oviposition behaviour of the moths.

      Thank you for the suggestion; we have clarified the text accordingly.

      (22) Line 284: If possible, please add another reference here referring to the different cues used by moths during oviposition.

      Thank you for the suggestion; we have clarified the text accordingly.

      (23) Line 336: What do you mean by "closed together"?

      Thank you for the suggestion; we have clarified the text accordingly.

      (24) Line 434-436: Please see my overall comments. I do not think that you can call it ecologically relevant if the signal emitted by multiple plants is played in the context of just a single plant.

      Please see comments 1 and 4.

      (25) Line 496: Please change "stats" to statistics.

      Thank you for the suggestion; we have clarified the text accordingly.

      (26) Line 522-524: I am not sure whether simply listing their names does give full credit to the work these people did for your study. Maybe also explain how they contributed to your work.

      Thank you for the suggestion; we have clarified the text accordingly.

      Reviewer #2 (Recommendations for the authors):

      (27) L54 20-60kHz --> 20Hz-60kHz or 20kHz - 60kHz?

      OK. We have replaced it.

      (28) L124 Are the results for the condition where nothing was placed and the condition where a decoy silent resistor was placed combined in the analysis? If so, were there no significant differences between the two conditions? Comparing these with a condition presenting band-limited noise in the same frequency range as the drought-stressed sounds might also have been an effective approach to further isolate the specific role of the ultrasonic emissions.

      We used both conditions because of technical constraints and pooled them together for analysis. Statistical tests confirmed no significant differences between them, and this clarification has now been added to the Methods section, including the results of the statistical test.

      (29) L125 (Fig. 1A), see Exp. 1 in the Methods). -> (Fig.1B. See Exp.1 in the Methods).

      Thank you for the suggestion; we have clarified the text accordingly.

      (30) L132 "The opposite choice to what was seen in the initial experiment (Fig.1B)"

      Thank you for the suggestion; we have clarified the text accordingly.

      (31) L137-143 If you are writing about results, why not describe them with figures and statistics? The current description reads like a discussion.

      These findings were not among our primary research questions; however, we believe that including them in the Results section underscores the experimental differences. In our opinion, introducing an additional figure or expanding the statistical analysis at this point would disrupt the narrative flow and risk confusing the reader.

      (32) L141 "This is higher than the rate reported for a single young plant" Are you referring to the tomato plants used in the experiments? It might be helpful to include in the main text the natural click rate emitted by tomato plants, as this information is currently only mentioned in the Methods section.

      See comment 4.  

      (33) L191 Is the main point here to convey that the plant playback effect remained significant even when the sound presentation frequency was reduced to 30 clicks per minute? The inclusion of the feeder element, however, seems to complicate the message. To simplify the results, moving the content from lines 185-202 to the supplementary materials might be a better approach. Additionally, what is the rationale for placing the sugar solution in the arena? Is it to maintain the moths' vitality during the experiment? Clarifying this in the methods section would help provide context for this experimental detail.

      In this series of experiments, we manipulated four variables—single moths, ultrasonic click rate, arena configuration (from a two-choice design to an elongated enclosure), and the response metric (total egg counts rather than cluster counts)—to evaluate moth oviposition under more ecologically realistic conditions. We demonstrate the system’s robustness and validity in a more realistic setting (by tracking individual moths, counting single eggs, etc.).  

      As noted in the text, feeders were included to preserve the moths’ natural behavior and vitality. We have further clarified this in the revised manuscript.

      (34) L215 Is the click presentation frequency 30 or 60 per minute? Since Figure 3 illustrates examples of moth movement from the experiment described in Figure 1, it might be more effective to present Figure 3 when discussing the results of Figure 1 or to include it in the supplementary materials for better clarity and organization.

      See comments 1 and 4. As mentioned in the above 

      (35) L291 Please provide a detailed explanation of the experiments and measurements for the results shown in Figure S3 (and Figure S2). If the multi-modal hypothesis discussed in the study is a key focus, it might be better to include these results in the main results section rather than in the supplementary materials.

      Thank you for this suggestion. Figure S2 was removed (see comments above). We have now added context for Figure S3.

      (36) L303 It might be helpful to include information about the relationship between the moth species used in this study and tomato plants somewhere in the text. This would provide an important context for understanding the ecological relevance of the experiments.

      Thank you for the suggestion; we have clarified the text accordingly.

      (37) Table 1 The significant figures in the numbers presented in the tables should be consistent.

      Thank you for the suggestion; we have clarified the text accordingly.

      (38) L341 The text mentions that experiments were conducted in a greenhouse, but does this mean the arena was placed inside the greenhouse? Also, the term "arena" is used - does this refer to a sealed rectangular case or something similar? For the sound presentation experiments, it seems that the arena cage was placed inside a soundproof room. If the arena is indeed a case-like structure, were there any specific measures taken to prevent sound scattering within the case, such as the choice of materials or structural modifications?

      Here, “arena” refers to the plastic boxes used throughout this study. In this particular experiment, we presented plants alone—reflecting ongoing debate in the literature—and used these trials as a baseline for our subsequent sound-presentation experiments, during which we measured sound intensity as described in the Methods section. All sound-playback experiments were conducted in sound-proof rooms, and acoustic levels were measured beforehand—sound on the control side fell below our system’s detection threshold. 

      (39) L373 "resister similar to the speaker" Could you explain it in more detail? I think this would depend on the type of speaker used-particularly whether it includes magnets. From an experimental perspective, presenting different sounds such as white noise from the speaker might have been a better control. Was there a specific reason for not doing so? Additionally, the study does not clearly demonstrate whether the electric and magnetic field environments on both sides of the arena were appropriately controlled. Without this information, it is difficult to evaluate whether using a resistor as a substitute was adequate.

      Thank you for this comment. We have now addressed this point in the Discussion. We acknowledge that we did not account for the magnetic field, which might have differed between the speaker and the resistor. We agree that using an alternative control, such as white noise, could have been informative, and we now mention this as a limitation in the revised Methods.

      (40) L435 60Hz? The representation of frequencies in the text is inconsistent, with some values expressed in Hz and others as "clicks per second." It would be better to standardize these units for clarity, such as using Hz throughout the manuscript.

      We agree that this is confusing. We reviewed the text and made sure that "clicks per second" always refers to how many clicks were produced, and that Hz is used only in the context of sound frequencies.

      (41) L484 "we quantified how many times each individual crossed the center of the arena" Is this data being used in the results?

      Yes. This is mentioned in the text just before Figure 3 (L220).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      We appreciate the constructive and supportive feedback on our manuscript. All three reviewers acknowledged the significance and novelty of our work on bacterial telomere protection. In response to their suggestions, we have conducted the requested experiments and revised the manuscript accordingly. These changes have enhanced the rigor of our study and clarified our interpretations and explanations.

      Moreover, we characterized an additional truncation mutant of TelN (TelN Δ445–631), which lacks the two C-terminal domains. Despite this deletion, the mutant retained protection activity (Supplementary Figure S4B), indicating that the remaining regions of the protein are sufficient to confer efficient protection in this assay.

      Finally, we removed three sequence alignments (previously Supplementary Figures S6A and S7), as we recognized that the high degree of sequence divergence could hinder proper alignment and potentially lead to misinterpretation.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This study addresses how the bacterial telomere protein TelN protects telomere ends against the action of the Mre11-Rad50 nuclease (MR). This protection is essential for the stability of hairpin-ended linear plasmids and chromosomes in bacteria but had not been explored before. The authors demonstrate that TelN is necessary and sufficient to block MR-dependent DNA cleavage when bound to its specific telomere sequence. By combining elegant genetic and biochemical approaches, the study convincingly shows that TelN-dependent inhibition likely involves a specific interaction between TelN and the MR complex. The manuscript is well written, easy to read and focused on the relevant information. The claims and the conclusions are supported by the data. There is no over-interpretation.

      Comments:

      • Figure 1B: unnormalized transformation efficiency would be useful to show in SI

      The unnormalized B. subtilis transformation efficiency has now been added as new figure panel S1B.

      • Figures 2B, 2C, 3C, 3D, 4C, 5A and 5B: quantification of independent experiments should be added

      While these DNA protection experiments show a clearly reproducible pattern of DNA degradation, the exact response to TelN titration varies somewhat between experimental replicates. We initially included the quantification of remaining full-length DNA because the corresponding band is hard to discern in the gel image due to pixel saturation. However, we now realize that this may mislead readers into thinking that degradation always occurs with exactly the same dose response.

      To avoid this, we have decided to remove the quantification and instead show the relevant part of the gel also at higher contrast to better visualize the loss of full-length DNA due to DNA degradation. In addition, we have included replicate experiments carried out at the same MR concentration (125 nM M₂R₂) or at higher concentration (500 nM M₂R₂) in the supplementary material. These examples demonstrate the general reproducibility of the assay.

      **Referee cross-commenting**

      Perfect for me. It seems that there is a consensus.

      Reviewer #1 (Significance (Required)):

      This pioneering study provides a very strong basis for a new understanding of telomeres in bacteria and offers fascinating evolutionary perspectives when compared to similar mechanisms active at telomeres in eukaryotic cells.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The paper is well-presented and well-written throughout. The paper shows convincingly that TelN protects hairpin DNA ends from the activity of SbcCD, presumably providing a protection mechanism for N15 phage DNA in vivo. Furthermore, this protection activity is shown not to require the catalytic (resolvase) activity of TelN, nor its poorly characterised C-terminal domain. The paper also suggests that this inhibition acts both at the level of competition for the DNA hairpin end and at the level of a direct protein:protein interaction between TelN and MR. An (acknowledged) weakness is that there is no real insight into the protein:protein interaction suggested by the experiments shown in Figure 5. Ideally, the protein:protein interaction interface would be identified and mutations in this interface would be shown to reduce hairpin protection.

      Specific comments/questions

      (1) What pathway (in vivo) leads to inactivation of linear hairpin DNA - one suspects that cleavage by SbcCD at the hairpins is probably not the full story. Presumably SbcCD cleavage facilitates further processing by other long range resection systems such as RecBCD, Exo1, RecQ/J etc. Would it be appropriate to view the hairpin as an adaption to protect against these nucleases, which then must be complemented with a mechanism to suppress SbcCD?

      The reviewer's suggestion that hairpin ends represent a first layer of adaptation against nucleolytic processing is compelling. Hairpin structures inherently resist many exonucleases due to their covalently closed nature (absence of free 3’ or 5’ ends) but remain vulnerable to MR processing (Connelly et al, 1998, 1999; Saathoff et al, 2018). This creates a scenario where effective telomere protection requires both the structural barrier provided by the hairpin and an active mechanism to suppress MR activity. We have added this perspective to the relevant paragraph in the discussion.

      (2) Section starting "Direct inhibition of MR by TelN in vitro". What is the word direct supposed to convey here? To me it suggests that the inhibition is via direct interaction of TelN with MR (rather than, for example, a result of competition for the hairpin DNA end) which is not shown here. Suggest either defining or removing the word direct. This point gains more importance considering that differentiating between inhibition mechanisms becomes a focus of later parts of the paper.

      By "direct inhibition," we meant that TelN blocks MR nuclease activity without requiring additional cofactors, as demonstrated in this minimal reaction system containing only TelN, MR complex, DNA substrate, and ATP. To avoid ambiguity, we have reworded the corresponding headline and paragraph.

      (3) Figure 2B - Why no control lane without MR? - this is a basic control to show that the degradation we are seeing in the absence of TelN is MR-dependent. Formally, as shown, the degradation could be caused by the ATP stock.


      We have now included ATP-only control lanes (without MR complex), which show no substrate degradation, confirming that ATP stocks do not contain contaminating nucleases and that the observed degradation is indeed MR-dependent. These controls are included in the supplementary data (Figure S3A) along with additional replicate experiments. Notably, the dose-dependent protection observed at low TelN concentrations (where MR activity is not fully inhibited) provides additional evidence for the specificity of the MR-TelN interaction system, as non-specific nuclease contamination would result in complete substrate degradation regardless of TelN concentration.

      (4) Why not use B. subtilis SbcCD for the species specificity experiment? Also, is it not surprising that TelN yielded zero protection against MRX given that the DNA sequence specificity experiments above suggest competition for DNA substrate is part of the inhibition mechanism?


      We agree that this would be a great addition. Despite multiple attempts, we were unable to purify active B. subtilis SbcCD protein. The yeast MRX experiment serves the same purpose of demonstrating species specificity and represents a more evolutionarily distant comparison, which strengthens our conclusions about bacterial-specific inhibition.

      (5) If the authors felt it appropriate, I thought there was scope for further discussion/introductory material. There are strong parallels here with mechanisms used by phage to protect themselves from the activities of RecBCD, which include both proteins that protect DNA ends like T4 gene 2, as well as proteins that bind directly to RecBCD to inactivate it like lambda Gam. As such, the work here will appeal as much to those interested in bacterial defence systems / phage:host interactions as it does to those interested in telomere biology. Especially significant is the inhibition of DNA end processing factors by lambda Gam since this protein is reported to interact with both RecBCD and SbcCD (PMID: 2531105).

      We agree that there are obvious parallels between lambda Gam and TelN as counter-defence factors. This was likely largely missed in previous work because the telomere resolution activity of TelN masked its function in counter-defence. We have added a statement on this matter at the end of the discussion.

      (6) Just a gripe really: it seems to be 'de rigueur' at the moment to re-name bacterial proteins for their human orthologues, presumably to elevate the perceived importance of the work(?), but it is not a practice I think is terribly helpful as it causes issues when searching literature. Minimally it would be great if the authors could ensure they add SbcCD as a keyword for search purposes.

      We appreciate the reviewer's concern about nomenclature inconsistencies in the literature. We have chosen MR over SbcCD as a more generic term that covers eukaryotes, archaea and lately also bacteria and will hopefully contribute to a more consistent terminology in the literature across the domains of life in the future. Our choice to use "Mre11-Rad50" (MR) for the E. coli SbcCD complex is also consistent with prominent recent publications (Käshammer et al., 2019; Gut et al., 2022), explicitly referring to the E. coli system as "Mre11-Rad50" while acknowledging the bacterial designation. To link to previous literature, we made sure that both "SbcCD" and "Mre11-Rad50" are mentioned in the abstract. And, as suggested, we have now also added “SbcCD” to our keyword list to facilitate comprehensive literature searches.

      **Referee cross-commenting**

      I have nothing to add. The reviewers' comments are all broadly positive and consistent.

      Reviewer #2 (Significance (Required)):

      This is an excellent paper unveiling a phage-encoded "counter-defence" mechanism designed to protect phage DNA from degradation. It will be of special interest to those studying telomere biology or phage:host interactions.



      Reviewer #3

      The authors investigate how the N15 phage protelomerase TelN protects linear chromosomes that terminate in hairpin structures (a sort of telomere). In E. coli and B. subtilis cells, removal or truncation of telN reduces transformation/survival of linear DNA, whereas complementation with full-length or a catalytically inactive TelN restores viability, consistent with TelN playing a nonenzymatic capping function.

      In vitro, TelN binds hairpin substrates with moderate affinity and protects them from the nuclease activity of the Mre11/Rad50 complex. The authors propose that TelN originated as an early, sequence specific barrier against MR mediated DNA end processing, establishing fundamental principles of telomere protection that persist from bacteria to eukaryotes.

      Major comments:

      The manuscript convincingly shows that TelN can functionally block the Mre11-Rad50 (MR) nuclease on a hairpin DNA end in a sequence-specific manner (suggesting a physical interaction), but it doesn't directly demonstrate this. A simple pull-down or equilibrium binding method would be useful in proving a physical interaction.

      We agree that this would be a valuable addition to the study. We have made several attempts to detect a direct interaction by co-immunoprecipitation, so far without success. We do not yet have sufficient material for equilibrium binding methods.


      The MR complex requires ATP hydrolysis for resection of DNA ends. It would be a nice addition to the manuscript if the effect of TelN on Rad50 ATPase activity was tested.


      We have tested the effect of TelN on Rad50 ATPase activity and found no significant impact under our experimental conditions, possibly in line with the lack of a stable interaction.

      The bar plot on Fig 3B indicates that the experiments are performed in triplicate. The statistical significance of the differences between conditions should be determined. The same general comment could be made regarding the quantification of the polyacrylamide gels - how reproducible are these values?


      We performed paired t-test analysis for the following figures and now indicate the p-values wherever significant (below 0.05): Figures 1D, 1E, 3B, 4B and S4B. We used paired t-tests to generally compare linear vs circular plasmid transformation efficiency for each condition. In Figure 4B, which included two different linear DNA constructs, we compared the two linear DNA constructs directly to each other. Given that our experimental design included multiple control conditions with known expected outcomes to validate assay performance, rather than many independent exploratory comparisons, we report uncorrected p-values as the primary analysis. The inclusion of multiple controls with predictable outcomes reduces the likelihood of false positive interpretations.
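
      For readers unfamiliar with the test, the following is a minimal sketch of a paired t-test of the kind described above, written with the Python standard library only. The transformation-efficiency values are invented for illustration and are not the authors' data, and the function names are hypothetical. The closed-form p-value applies only to 2 degrees of freedom, which assumes three paired replicates; for other sample sizes a statistics package (for example `scipy.stats.ttest_rel`) would be the usual choice.

```python
import math
import statistics


def paired_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for two paired samples."""
    if len(sample_a) != len(sample_b) or len(sample_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def two_sided_p_df2(t):
    """Exact two-sided p-value of Student's t for 2 degrees of freedom."""
    return 1.0 - abs(t) / math.sqrt(t * t + 2.0)


# Hypothetical paired replicates (arbitrary units), not data from the study.
linear = [1.0, 2.0, 3.0]
circular = [2.0, 4.0, 6.0]

t_stat, dof = paired_t(linear, circular)
p_value = two_sided_p_df2(t_stat)  # valid here because dof == 2
```

      With these invented numbers the paired differences are -1, -2 and -3, giving t ≈ -3.46 with 2 degrees of freedom and an uncorrected two-sided p ≈ 0.074, which would not be reported as significant at the 0.05 threshold.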

      As stated in response to reviewer 1, while the exact values for the DNA degradation profile vary somewhat between experiments (likely due to variations in band quantification – see also response to comment below), the general trends are robust as for example indicated by similar experiments performed with higher MR concentration (500 nM instead of 125 nM M₂R₂ concentrations for all TelN variants) demonstrating reproducibility across different conditions. For Figure 5, however, we are unable to provide additional repeat experiments due to limitations in reagent availability. Considering the robust effect seen with Ec MR controls and the presence of multiple samples in the dilution series, we are nevertheless confident about the conclusion.

      Minor comments:

      A better explanation of how the gels were quantified should be provided. Were the products included in the analysis, or was it just the decrease in the substrate band that was measured?

      As also stated above, we have removed the band quantification and instead show the bands also at different contrast settings.

      In our original approach, gel band quantification was performed using ImageQuant TL software (version 8.2.0, GE Healthcare). For each gel, individual lanes were defined using either fixed-width boundaries (95-103 pixels) or automatic edge detection, depending on the gel quality and band definition. Band volumes were calculated using rolling ball background subtraction (radius 180 pixels) with automatic band detection. Substrate degradation was assessed by measuring the integrated density (volume) of the remaining full-length (or near full-length) substrate bands under different treatment conditions. The band volume values were plotted directly to compare substrate levels across treatment groups.
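
      The band-volume measurement described above can be illustrated with a deliberately simplified sketch. It subtracts a single constant background value, which is a stand-in for (not an implementation of) the rolling-ball background subtraction used by the authors' software, and the pixel values and function name are hypothetical.

```python
def band_volume(region, background):
    """Integrated density of a band region after constant background subtraction.

    `region` is a list of pixel rows (lists of intensities) covering one band.
    Pixels at or below the background contribute zero.
    """
    return sum(max(value - background, 0) for row in region for value in row)


# Hypothetical 2 x 2 pixel region around a substrate band (arbitrary units).
example_region = [
    [10, 12],
    [11, 50],
]
volume = band_volume(example_region, background=10)
```

      Plotting such volumes for each lane, as the authors originally did, shows how much full-length substrate remains at each TelN concentration.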

      We now present the data as two gel panels: one exposure showing the full reaction profile, and another focusing on the substrate bands to clearly demonstrate dose-dependent protection. Additional replicate experiments, including ATP-only controls (confirming no contamination from ATP stocks) and experiments at 500 nM M₂R₂, are provided in the supplementary data. This approach provides more direct visualization of the biological phenomenon with comprehensive control validation.

      I felt like the Results jump rather abruptly from B. subtilis chromosome assays to E. coli plasmid experiments. Maybe the addition of a few linking sentences would improve this transition.


      Upon re-reading the manuscript we agree with this assertion and have added further information to provide a smoother transition.

      A comment on the stoichiometry of TelN and genome ends during phage replication would be useful.

      Our in vitro data suggest that effective protection can be achieved at relatively low TelN:DNA ratios, consistent with the formation of stable, protective nucleoprotein structures. Unfortunately, we do not currently have information on the copy number of TelN per cell or per hairpin end, and reliable values for these numbers are not easy to obtain. However, we can speculate that multiple TelN proteins are present at each end, because each telomeric DNA carries three copies of a DNA sequence motif (binding to CTD1).

      Reviewer #3 (Significance (Required)):

      General assessment:

      Strengths: A nice combination of genetics and biochemistry convincingly demonstrates that TelN protects linear chromosomes/replicons from MR-dependent degradation independent of its cleavage-ligase activity. It does this by binding to the hairpin DNA ends in a sequence-specific fashion, and the species specificity suggests a direct physical interaction, which likely inhibits the nuclease activity of the MR complex.

      Limitations: The lack of characterization of the putative physical interaction between TelN and the MR complex is considered a weakness.

      Advance: The manuscript fills in a mechanistic gap between protelomerase-mediated telomere formation and maintenance by demonstrating a protective/capping role. This is the first quantitative analysis of DNA-end protection from MR nuclease activity by TelN.

      Audience: Readers interested in bacterial chromosome biology and DNA repair. The parallels to eukaryotic shelterin will be of interest to the broader telomere and genome stability communities.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, Behruznia and colleagues use long-read sequencing data for 339 strains of the Mycobacterium tuberculosis complex to study genome evolution in this clonal bacterial pathogen. They use both a "classical" pangenome approach that looks at the presence and absence of genes, and a pangenome graph based on whole genomes in order to investigate structural variants in non-coding regions. The comparison of the two approaches is informative and shows that much is missed when focussing only on genes. The two main biological results of the study are that 1) the MTBC has a small pangenome with few accessory genes, and that 2) pangenome evolution is driven by genome reduction. In the revised article, the description of the data set and the methods is much improved, and the comparison of the two pangenome approaches is more consistent. I still think, however, that the discussion of genome reduction suffers from a basic flaw, namely the failure to distinguish clearly between orthologs and homologs/paralogs.

      Strengths:

      The authors put together the largest data set so far of long-read assemblies representing most lineages of the Mycobacterium tuberculosis complex, and covering a large geographic area. They sequenced and assembled genomes for strains of M. pinnipedii, L9, and La2, for which no high-quality assemblies were available previously. State-of-the-art methods are used to analyze gene presence-absence polymorphisms (Panaroo) and to construct a pangenome graph (PanGraph). Additional analysis steps are performed to address known problems with misannotated or misassembled genes.

      Weaknesses:

      The revised manuscript has gained much clarity and consistency. One previous criticism, however, has in my opinion not been properly addressed. I think the problem boils down to not clearly distinguishing between orthologs and paralogs/homologs. As this problem affects a main conclusion - the prevalence of deletions over insertions in the MTBC - it should be addressed, if not through additional analyses, then at least in the discussion.

      Insertions and deletions are now distinguished in the following way: "Accessory regions were further classified as a deletion if present in over 50% of the 192 sub-lineages or an insertion/duplication if present in less than 50% of sub-lineages." The outcome of this classification is suspicious: not a single accessory region was classified as an insertion/duplication. As a check of sanity, I'd expect at least some insertions of IS6110 to show up, which has produced lineage- or sublineage-specific insertions (Roychowdhury et al. 2015, Shitikov et al. 2019). Why, for example, wouldn't IS6110 insertions in the single L8 strain show up here?
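
      The quoted classification rule can be written out as a minimal sketch, which makes its behaviour at the extremes easier to check. The function name, the sub-lineage labels, and the handling of an exact 50% tie are assumptions made for illustration; only the 50% threshold and the count of 192 sub-lineages come from the quoted text.

```python
from typing import Set


def classify_accessory_region(present_in: Set[str], all_sublineages: Set[str]) -> str:
    """Apply the quoted 50% rule to one accessory region.

    present_in: sub-lineages that carry the region (hypothetical labels).
    all_sublineages: every sub-lineage in the data set.
    """
    if not all_sublineages:
        raise ValueError("at least one sub-lineage is required")
    fraction = len(present_in & all_sublineages) / len(all_sublineages)
    if fraction > 0.5:
        # Present in most sub-lineages: absence elsewhere is read as a deletion.
        return "deletion"
    if fraction < 0.5:
        # Present in few sub-lineages: read as an insertion/duplication.
        return "insertion/duplication"
    # Exactly 50% is not covered by the quoted rule; flagged here as an assumption.
    return "unclassified"


# Hypothetical sub-lineage labels; only the count (192) comes from the text.
sublineages = {f"SL{i}" for i in range(192)}
widespread = {f"SL{i}" for i in range(190)}  # region missing from two sub-lineages
rare = {"SL7"}                               # region found in one sub-lineage only

widespread_call = classify_accessory_region(widespread, sublineages)
rare_call = classify_accessory_region(rare, sublineages)
```

      Under this rule, a region confined to a single sub-lineage would be called an insertion/duplication, so a result with no such calls means that no region passed the upstream filters with that distribution.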

      In a fully clonal organism, any insertion/duplication will be an insertion/duplication of an existing sequence, and thus produce a paralog. If I'm correctly understanding your methods section, paralogs are systematically excluded in the pangraph analysis. Genomic blocks are summarized at the sublineage levels as follows (l.184 ): "The DNA sequences from genomic blocks present in at least one sub-lineage but completely absent in others were extracted to look for long-term evolution patterns in the pangenome." I presume this is done using blastn, as in other steps of the analysis.

      So a sublineage-specific copy of IS6110 would be excluded here, because IS6110 is present somewhere in the genome in all sublineages. However, the appropriate category of comparison, at least for the discussion of genome reduction, is orthology rather than homology: is the same, orthologous copy of IS6110, at the same position in the genome, present or absent in other sublineages? The same considerations apply to potential sublineage-specific duplicates of PE, PPE, and Esx genes. These gene families play important roles in host-pathogen interactions, so I'd argue that the neglect of paralogs is not a finicky detail, but could be of broader biological relevance.
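
      The distinction drawn here between homology-based and orthology-based presence/absence can be illustrated with a toy example. This is only a sketch: the genomes, locus labels, and function names are hypothetical and do not reflect how Pangraph or blastn represent data.

```python
from typing import Dict

# Toy genomes: each maps a locus label to the element found at that locus.
# All labels are hypothetical and serve only to illustrate the argument.
Genome = Dict[str, str]

genomes: Dict[str, Genome] = {
    "sublineage_A": {"locus_1": "IS6110", "locus_2": "geneX"},
    # Same as A, plus an extra IS6110 copy inserted at a new position.
    "sublineage_B": {"locus_1": "IS6110", "locus_2": "geneX", "locus_3": "IS6110"},
}


def present_by_homology(genome: Genome, element: str) -> bool:
    """Homology view: the element counts as present if it occurs anywhere."""
    return element in genome.values()


def present_by_orthology(genome: Genome, locus: str, element: str) -> bool:
    """Orthology view: the element must occur at this specific locus."""
    return genome.get(locus) == element


homology_calls = {name: present_by_homology(g, "IS6110") for name, g in genomes.items()}
orthology_calls = {
    name: present_by_orthology(g, "locus_3", "IS6110") for name, g in genomes.items()
}
```

      In the homology view both sub-lineages carry IS6110, so the new copy in sublineage_B produces no presence/absence difference. Only the position-aware view registers it as a sub-lineage-specific insertion.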

      Reviewer #2 (Public review):

      Summary:

      The authors attempted to investigate the pangenome of MTBC by using a selection of state-of-the-art bioinformatic tools to analyse 324 complete and 11 new genomes representing all known lineages and sublineages. The aim of their work was to describe the total diversity of the MTBC and to investigate the driving evolutionary force. By using long-read and hybrid approaches for genome assembly, an important attempt was made to understand why previous reports disagreed on the size of the MTBC pangenome. This study provides strong evidence that the MTBC pangenome is closed and that genome reduction is the main driver of this species' evolution.

      Strengths:

      A stand-out feature of this work is the inclusion of non-coding regions as opposed to only coding regions, which were the focus of previous papers and analyses that investigated the MTBC pangenome. A unique feature of this work is that it highlights sublineage-specific regions of difference (RDs) that were previously unknown. Another major strength is the utilisation of long-read whole-genome sequences, in combination with short-read sequences when available. It is known that using only short reads for genome assembly has several pitfalls. The parallel approach of utilizing both Panaroo and Pangraph for pangenomic reconstruction illuminated limitations of both tools while highlighting genomic features identified by both. This is important for any future work and perhaps alludes to the need for more MTBC-specific tools to be developed. Lastly, ample statistical support is provided, in the form of Heaps' law and genome fluidity calculations for each pangenome, to demonstrate that they are indeed closed.

      Weaknesses:

      There are no major weaknesses in the revised version of this manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      l. 27: "lineage-specific and -independent deletions": it is still not clear to me what a lineage-independent, or convergent, deletion is supposed to be. TBD1, for instance, is not lineage-specific, but it is also not convergent: it occurred once in the common ancestor of lineages 1, 2, and 3, while convergence implies multiple parallel occurrences.

      We have changed this, here and in other places, to more evolutionary terms, such as divergent (single event) and convergent (multiple events), or explained exactly what is meant where needed.

      l. 118: "where relevant", what does that mean?

      This was superfluous to the description and so is now removed.

      l. 178ff.: It is not clear to me what issue is addressed by this correction of the pangenome graph. Also here there seems to be some confusion regarding orthologs and paralogs. A gene or IS copy can be present at one locus but absent at another, which is not a mistake of Pangraph that would require correction. It's rather the notion of "truly absent region" which is ambiguous.

      We have changed the text to be more specific about the utility of this step. Since it is known that Panaroo mislabels some genes as being absent due to over-splitting (see Ceres et al. 2022 and our reclassification earlier in the paper), we wanted to see if the same occurred in Pangraph. We have modified the methods text to be more specific (line 181) and included in the results the percentage of total genes/regions affected by this correction.

      In relation to copy number, Pangraph is not syntenic in its approach; if a region is present anywhere it is labelled as present in the genome. Pangraph will look for multiple copies of that region (e.g. an IS element) but indeed we did not look for specific syntenic changes across the genomes. This would be a great analysis and something we will consider in the future; we have indicated such in the discussion (line 454).

      l. 305: "mislabelled as absent": see above, is this really 'mislabelled'?

      See the answer to the question above.

      l. 372: "using the approach": something missing here.

      This was superfluous to the description and so is now removed.

      l. 381: the "additional analysis of paralogous blocks" (l. 381) seems to suffer from the same confusion of ortho- and paralogy described above: no new sub-lineage-specific accessory regions are found presumably because the analysis did consider any copy rather than orthologous copies.

      Paralogous copies were looked for by Pangraph, and we did not find any sub-lineage where all members had additional copies compared to other sub-lineages. Indeed, single genomes could have these, and shorter timescales could see a lot of such insertions, but we looked at longer-scale (all genomes within a sub-lineage) patterns and did not find these. These limitations are already outlined in the discussion.

      l. 415: see above. There is no diagnosis of a problem that would motivate a "correction". That's different from the correction of the Panaroo results, where fragmented annotations have been shown to be a problem.

      Of interest, this refinement re-labelled as core multiple regions that Pangraph had labelled as absent from some genomes, at about the same rate as the correction to Pangraph (2% of genes/regions). This indicates there is a stringency issue with Pangraph where blocks are mislabelled as absent. The underlying reason for this is not clear, but the correction is evidently required in this version of Pangraph.

      l. 430ff.: The issue of paralogy and that the "same" gene or region is defined in terms of homology rather than orthology should be addressed here. For me the given evidence does not support the claim that deletion is driving molecular evolution in the MTBC.

      As outlined above, paralogy may indeed be driving some elements of the overall evolutionary patterns; our analysis just did not find this. Panaroo without merged paralogs did not find paralogous genes as a main differentiating factor for any sub-lineage. Pangraph also did not find multiple copies of blocks present in all genomes in a sub-lineage. As outlined above, single genomes do show such patterns, but we did not include single-genome analyses here, and we outline that as a next step in the discussion. We have also linked to a recent pangenome paper that showed duplication is present in the pangenome of MTBC, although not related to any specific lineage (Discussion line 485).

      l. 443 ff: "lineage-independent deletions (convergent evolution)": see above, I still think this terminology is unclear.

      This has now been made clearer to be specifically about convergent and divergent evolutionary patterns.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      This paper describes technically-impressive measurements of calcium signals near synaptic ribbons in goldfish bipolar cells. The data presented provides high spatial and temporal resolution information about calcium concentrations along the ribbon at various distances from the site of entry at the plasma membrane. This is important information. Important gaps in the data presented mean that the evidence for the main conclusions is currently inadequate. 

      Strengths 

      The technical aspects of the measurements are impressive. The authors use calcium indicators bound to the ribbon and high speed line scans to resolve changes with a spatial resolution of ~250 nm and temporal resolution of less than 10 ms. These spatial and temporal scales are much closer to those relevant for vesicle release than previous measurements. 

      The use of calcium indicators with very different affinities and of different intracellular calcium buffers helps provide confirmation of key results. 

      Thank you very much for this positive evaluation of our work.

      Weaknesses 

      Multiple key points of the paper lack a statistical test or summary data from populations of cells. For example, the text states that the proximal and distal calcium kinetics in Figure 2A differ. This is not clear from the inset to Figure 2A - where the traces look like scaled versions of each other. Values for time to half-maximal peak fluorescence are given for one example cell but no statistics or summary are provided. Figure 8 shows examples from one cell with no summary data. This issue comes up in other places as well. 

      Thank you for this fair and valuable feedback. Following also the suggestion by the Editor, we have now removed the rise-time kinetic fitting results from the manuscript and only retain the bi-exponential decay time constant values. Further, we explicitly detail the issues with kinetic fitting, and state that precise quantitative conclusions should not be drawn from the differences in kinetic parameters (pages 7 and 27–28).

      We have included the results of paired t-tests to compare the amplitudes of proximal vs. distal calcium signals shown in Fig. 2A & B, Fig. 3C & D, Fig. 4C & D, Fig. 5A-D, and Fig. 8E & F. Because proximal and distal calcium signals were obtained from the same ribbons within 500-nm distances, as the Reviewer pointed out, “the traces look like scaled versions of each other”. For experiments where we make comparisons across cells or different calcium indicators, as shown in Fig. 3E & F, Fig. 5E, and Fig. 8B & C, we have included the results of an unpaired t-test. We have also included the t-test statistics information in the respective figure legends in the revised version.
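
      The choice between paired and unpaired tests described above can be illustrated with a minimal sketch. All amplitude values below are hypothetical and are not taken from the manuscript; the unpaired statistic is written in Welch's form, which is an assumption, since the response does not state which unpaired variant was used.

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic computed on within-pair differences (e.g. same ribbon)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("paired samples need equal length and at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


def welch_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Unpaired t statistic (Welch's form, assumed here) for independent groups."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two values")
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical amplitudes (arbitrary units), one proximal/distal pair per ribbon.
proximal = [2.0, 3.0, 4.0, 5.0]
distal = [1.0, 1.5, 2.5, 3.0]

t_paired = paired_t_statistic(proximal, distal)
t_unpaired = welch_t_statistic(proximal, distal)
```

      When the two signals from each ribbon rise and fall together, the within-pair differences vary far less than the raw amplitudes, so the paired statistic is larger than the unpaired one for the same data. That is the reason to pair measurements taken at the same ribbon and to reserve the unpaired test for comparisons across cells or indicators.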

      In Figure 8, we have shown example fluorescence traces from two different cells at the bottom of panel A and example traces from different ribbons of RBC-a in panel D. The summary data are shown in panels B-C and E-F, with statistics provided in the figure legends.

      The rise time measurements in Figure 2 are very different for low and high affinity indicators, but no explanation is given for this difference. Similarly, the measurements of peak calcium concentration in Figure 4 are very different with the two indicators. That might suggest that the high affinity indicator is strongly saturated, which raises concerns about whether that is impacting the kinetic measurements. 

      Yes, we do believe that the high-affinity indicator is partially saturated, and therefore, the measurement with the low-affinity indicator dye is a more accurate reflection of the measured Ca2+ signal. We now state this more explicitly in the text. Further, we note that the rise time values are no longer listed due to lack of statistical significance for such comparisons, as noted above.

      Reviewer #2 (Public review): 

      Summary: 

      The study introduces new tools for measuring intracellular Ca2+ concentration gradients around retinal rod bipolar cell (rbc) synaptic ribbons. This is done by comparing the Ca2+ profiles measured with mobile Ca2+ indicator dyes versus ribbon-tethered (immobile) Ca2+ indicator dyes. The Ca2+ imaging results provide a straightforward demonstration of Ca2+ gradients around the ribbon and validate their experimental strategy. This experimental work is complemented by a coherent, open-source, computational model that successfully describes changes in Ca2+ domains as a function of Ca2+ buffering. In addition, the authors try to demonstrate that there is heterogeneity among synaptic ribbons within an individual rbc terminal. 

      Strengths: 

      The study introduces a new set of tools for estimating Ca2+ concentration gradients at ribbon AZs, and the experimental results are accompanied by an open-source, computational model that nicely describes Ca2+ buffering at the rbc synaptic ribbon. In addition, the dissociated retinal preparation remains a valuable approach for studying ribbon synapses. Lastly, excellent EM. 

      Thank you very much for this positive evaluation of our work.

      Comments on revisions: 

      Specific minor comments: 

      (1) Rewrite the final sentence of the Abstract. It is difficult to understand. 

      Thank you for pointing that out. We have updated the final sentence of the Abstract.

      (2) Add a definition in the Introduction (and revisit in the Discussion) that delineates between micro- and nano-domain. A practical approach would be to round up and round down. If you round up from 0.6 um, then it is microdomain which means ~ 1 um or higher. Likewise, round down from 0.3 um to nanodomain? If you are using confocal, or even STED, the resolution for Ca imaging will be in the 100 to 300 nm range. The point of your study is that your new immobile Ca2+ ribbon indicator may actually be operating on a tens of nm scale: nanophysiology. The Results are clearly written in a way that acknowledges this point but maybe make such a "definition" comment in the intro/discussion in order to: 1) demonstrate the power of the new Ca2+ indicator to resolve signals at the base of the ribbon (effectively nano), and 2) (Discussion) to acknowledge that some are achieving nanoscopic resolution (50 to 100nm?) with light microscopy (as you ref'd Neef et al., 2018 Nat Comm).

      Thank you for the valuable comments. We have now provided this information in the introduction and discussion.  

      (3) Suggested reference: Grabner et al. 2022 (Sci Adv, Supp video 13, and Fig S5). Here rod Cav channels are shown to be expressed on both sides of the ribbon, at its base, and they are within nanometers from other AZ proteins. This agrees with the conclusions from your imaging work.

      Thank you for the valuable suggestion. We have now provided this information in the introduction and discussion.

      (4) In the Discussion, add a little more context to what is known about synaptic transmission in the outer and inner retina. First, state that the postsynaptic receptors (for example: mGluR6-OnBCs vs KARs-OffBCs, vs. AMPAR-HCs), and possibly the synaptic cleft (ground squirrel), are known to have a significant impact on signaling in the outer retina. In the inner retina, there are many more unknowns. For example, when I think of the pioneering Palmer JPhysio study, which you cite, I think of NMDAR vs AMPAR, and uncertainty in what type of postsynaptic cell was patched (GC or AC....). Once you have informed the reader that the postsynapse is known to have a significant impact on signaling, then promote your experimental work that addresses presynaptic processes: "...the new tool and results allow us to explore release heterogeneity, ribbon by ribbon in dissociated preps, which we eventually plan to use at ribbon synapses within slices......to better understand how the presynapse shapes signaling......".

      Thank you for the valuable comments. We have now provided this information in the introduction and discussion.

      Reviewer #3 (Public review): 

      Summary: 

      In this study, the authors have developed a new Ca indicator conjugated to the peptide, which likely recognizes synaptic ribbons and have measured microdomain Ca near synaptic ribbons at retinal bipolar cells. This interesting approach allows one to measure Ca close to transmitter release sites, which may be relevant for synaptic vesicle fusion and replenishment. Though microdomain Ca at the active zone of ribbon synapses has been measured by Hudspeth and Moser, the new study uses the peptide recognizing synaptic ribbons, potentially measuring the Ca concentration relatively proximal to the release sites. 

      Strengths: 

      The study is, in principle, technically well done, and the peptide approach is technically interesting, which allows one to image Ca near the particular protein complexes. The approach is potentially applicable to other types of imaging. 

      Thank you very much for this appreciation.

      Weaknesses: 

      Peptides may not be entirely specific, and a genetic approach that tags particular active zone proteins with fluorescent Ca indicator proteins may well be more specific. Although the authors are aware of this and the peptide approach is generally used for ribbon synapses, they should keep it in mind when interpreting the results.

      We acknowledge the reviewer’s point and believe that the peptide and genetic approaches to measuring local calcium signals both have merit, each with separate advantages and disadvantages.

      Reviewer #1 (Recommendations for the authors): 

      The revisions helped with some concerns about the original paper, but some issues were not adequately addressed. I have left two primary concerns in my public review. To summarize those: 

      The difference in kinetics of proximal and distal locations is emphasized and quantified in the paper, but the quantification consists of a fit to the average responses. This does not give an idea of whether the difference observed is significant or not. Without an estimate of the error across measurements, the difference in kinetics quoted is not interpretable.

      Thank you for this feedback. Since the kinetics information is a minor part of the manuscript, we have followed the Editor’s advice to significantly tone down the comparison of kinetic fit parameters (completely removing the rise-time comparisons), in order to put more focus on the better-documented conclusions. We also note that we did establish statistical significance of the differences in fluorescence signal amplitudes. 

      Somewhat relatedly, the difference in amplitude and kinetics of the calcium signals measured with low and high affinity indicators is quite concerning. The authors added one sentence stating that the high affinity indicator might be saturated. This is not adequate. Should we distrust the measurements using the high affinity indicator? The differences between the results using the low and high affinity indicators are in some cases large - e.g. larger than the differences cited as a key result between distal and proximal locations. This issue needs to be dealt with directly in the paper.

      Thank you for this feedback. Yes, the measurements from high-affinity indicators cannot report the Ca2+ as accurately as low-affinity indicators. However, the value of HA indicators is in their ability to detect low-amplitude signals that lower-affinity indicators may miss due to lower signal-to-noise resolution. We added a sentence on page 12 to further stress this point.

      Related to the point about statistics, it is not clear how to relate the horizontal lines in Figure 8 to the actual measurements. It is critical for the evaluation of the conclusions from that figure to understand what is plotted and what the error bars are on the plotted data.

      We apologize for the earlier ambiguity in Fig. 8. In this figure, we first compare proximal (panel B) and distal (panel C) calcium signals across several RBCs, labeled RBC-a through RBC-d. Each RBC contains multiple ribbons, and for each cell, we present the average calcium signals from multiple ribbons using box plots in panels B and C. In these box plots, the horizontal lines represent the average calcium signal for each cell, while the size of the error bars reflects the variability in proximal and distal calcium signals among the ribbons within that RBC.

      For example, RBC-a had five identifiable ribbons. In panels D–F, we use RBC-a to illustrate the variability in calcium signals across individual ribbons. Specifically, we distinguished proximal and distal calcium signals from five ribbons (ribbons 1–5) within RBC-a. When feasible, we acquired multiple x–t line scans at a single ribbon, shown now as individual data points, to assess variability in calcium signals recorded from the same ribbon.

      The box plots in panels E and F display the average calcium signal (horizontal lines) for each ribbon, based on multiple recordings. These plots demonstrate considerable variability between ribbons of RBC-a. Importantly, the lack of or minimal error bars for repeated measurements at the same ribbon indicates that the proximal and distal calcium signals are consistent within a ribbon. These findings emphasize that the observed variability among ribbons and among cells reflects true biological heterogeneity in local calcium domains, rather than experimental noise.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) Changes in blood volume due to brain activity are indirectly related to neuronal responses. The exact relationship is not clear; however, we do know two things for certain: (a) each measurable unit of blood volume change depends on the response of hundreds or thousands of neurons, and (b) the time course of the volume changes is slow compared to the potential time course of the underlying neuronal responses. Both of these mean that important variability in neuronal responses will be averaged out when measuring blood changes. For example, if two neighbouring neurons have opposite responses to a given stimulus, this will produce opposite changes in blood volume, which will cancel each other out in the blood volume measurement due to (a). This is important in the present study because blood volume changes are implicitly being used as a measure of coding in the underlying neuronal population. The authors need to acknowledge that this is a coarse measure of neuronal responses and that important aspects of neuronal responses may be missing from the blood volume measure.

      The reviewer is correct: we do not measure neuronal firing but use blood volume as a proxy for bulk local neuronal activity, which does not capture the richness of single neuron responses. This is why the paper focuses on large-scale spatial representations as well as cross-species comparison. For this latter purpose, fMRI responses are on par with our fUSI data, with both neuroimaging techniques showing the same weakness. We have now added this point to the discussion: 

      “Second, we used blood volume as a proxy for local neuronal activity. Thus, our signal ignores any heterogeneity that might exist at the level of local neuronal populations. However, our main findings are related to the large-scale organization of cortical responses and how they relate to those of humans. For this purpose, the functional spatial resolution of our signal, driven by the spatial resolution of neurovascular coupling, should be adapted. In addition, using hemodynamic signals provides a much better comparison with human fMRI data, where the same limitations are present.”

      (2) More importantly for the present study, however, the effect of (b) is that any rapid changes in the response of a single neuron will be cancelled out by temporal averaging. Imagine a neuron whose response is transient, consisting of rapid excitation followed by rapid inhibition. Temporal averaging of these two responses will tend to cancel out both of them. As a result, blood volume measurements will tend to smooth out any fast, dynamic responses in the underlying neuronal population. In the present study, this temporal averaging is likely to be particularly important because the authors are comparing responses to dynamic (nonstationary) stimuli with responses to more constant stimuli. To a first approximation, neuronal responses to dynamic stimuli are themselves dynamic, and responses to constant stimuli are themselves constant. Therefore, the averaging will mean that the responses to dynamic stimuli are suppressed relative to the real responses in the underlying neurons, whereas the responses to constant stimuli are more veridical. On top of this, temporal following rates tend to decrease as one ascends the auditory hierarchy, meaning that the comparison between dynamic and stationary responses will be differently affected in different brain areas. As a result, the dynamic/stationary balance is expected to change as you ascend the hierarchy, and I would expect this to directly affect the results observed in this study.
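
      The cancellation argument in this comment can be made concrete with a minimal numerical sketch. The response values and the rectangular averaging window are assumptions chosen for illustration; a real hemodynamic response acts as a smoother low-pass filter rather than a boxcar average.

```python
from typing import List, Sequence


def window_average(signal: Sequence[float], window: int) -> List[float]:
    """Average a signal over consecutive, non-overlapping windows."""
    if window <= 0 or len(signal) % window != 0:
        raise ValueError("window must be positive and divide the signal length")
    return [
        sum(signal[i:i + window]) / window
        for i in range(0, len(signal), window)
    ]


# Hypothetical responses relative to baseline, one value per time step.
transient = [1.0, -1.0, 0.0, 0.0]  # rapid excitation followed by rapid inhibition
sustained = [1.0, 1.0, 1.0, 1.0]   # constant response to a constant stimulus

# One slow measurement spanning all four time steps.
transient_measured = window_average(transient, 4)
sustained_measured = window_average(sustained, 4)
```

      In this toy case the slow measurement of the transient response averages to zero, while the sustained response is recovered in full. This is the asymmetry between dynamic and constant stimuli that the comment is concerned with.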

      It is not trivial to extrapolate from what we know about temporal following in the cortex to know exactly what the expected effect would be on the authors' results. As a first-pass control, I would strongly suggest incorporating into the authors' filterbank model a range of realistic temporal following rates (decreasing at higher levels), and spatially and temporally average these responses to get modelled cerebral blood flow measurements. I would want to know whether this model showed similar effects as in Figure 2. From my guess about what this model would show, I think it would not predict the effects shown by the authors in Figure 2. Nevertheless, this is an important issue to address and to provide control for.

      We understand the reviewer’s concern about potential differences in response dynamics in stationary vs non-stationary sounds. It seems that the reviewer is concerned that responses to foregrounds may be suppressed in non-primary fields because foregrounds are not stationary, and non-primary regions could struggle to track and respond to these sounds. Nevertheless, we observed the contrary, with non-primary regions overrepresenting non-stationary (dynamic) sounds, over stationary ones. For this reason, we are inclined to think that this explanation cannot falsify our findings. 

      We understand the comment that temporal following rates might differ across regions in the auditory hierarchy and agree. In fact, we do show that tuning to temporal rates differs across regions and partly explains the differences in background invariance we observe. In this regard, we think the reviewer’s suggestion is already implemented by our spectrotemporal model, which incorporates the full range of realistic temporal following rates (up to 128 Hz). The temporal averaging is done as we take the output of the model (which varies continuously through time) and average it in the same window as we used for fUSI data. When we fit this model to the ferret data, we find that voxels in non-primary regions, especially VP (tertiary auditory cortex), tend to be more tuned to low temporal rates (Figure 2F, G), and that background invariance is stronger in voxels tuned to low rates. This is, however, not true in humans, suggesting that background invariance in humans relies on different computational mechanisms. We have added a sentence to clarify this: “The model included a range of realistic temporal rates and this axis was the most informative to discriminate foregrounds from backgrounds.”

      (3) I do not agree with the equivalence that the authors draw between the statistical stationarity of sounds and their classification as foreground or background sounds. It is true that, in a common foreground/background situation - speech against a background of white noise - the foreground is non-stationary and the background is stationary. However, it is easy to come up with examples where this relationship is reversed. For example, a continuous pure tone is perfectly stationary, but will be perceived as a foreground sound if played loudly. Background music may be very non-stationary but still easily ignored as a background sound when listening to overlaid speech. Ultimately, the foreground/background distinction is a perceptual one that is not exclusively determined by physical characteristics of the sounds, and certainly not by a simple measure of stationarity. I understand that the use of foreground/background in the present study increases the likely reach of the paper, but I don't think it is appropriate to use this subjective/imprecise terminology in the results section of the paper.

      We appreciate the reviewer’s comment that the classification of our sounds into foregrounds and backgrounds is not verified by any perceptual experiments. We use those terms to be consistent with the literature (McWalter and McDermott, 2018; McWalter and McDermott, 2019), including the paper we derived this definition from (Kell et al., 2019). These terms are widely used in studies where no perceptual or behavioral experiments are included, and even when animals are anesthetized. We have clarified and justified this choice in the beginning of the Results section:

      “We used three types of stimuli: foregrounds, backgrounds, and combinations of those. We use those terms to refer to sounds differing in their stationarity, under the assumption that stationary sounds carry less information than non-stationary sounds, and are thus typically ignored.”

      We have also added a paragraph in the discussion to emphasize the limits of this definition:

      “First, this study defined foregrounds and backgrounds solely based on their acoustic stationarity, rather than perceptual judgments. This choice allowed us to isolate the contribution of acoustic factors in a simplified setting. Within this controlled framework, we show that acoustic features of foreground and background sounds drive their separation in the brain and the hierarchical extraction of foreground sound features.”

      (4) Related to the above, I think further caveats need to be acknowledged in the study. We do not know what sounds are perceived as foreground or background sounds by ferrets, or indeed whether they make this distinction reliably to the degree that humans do. Furthermore, the individual sounds used here have not been tested for their foreground/background-ness. Thus, the analysis relies on two logical jumps - first, that the stationarity of these sounds predicts their foreground/background perception in humans, and second, that this perceptual distinction is similar in ferrets and humans. I don't think it is known to what degree these jumps are justified. These issues do not directly affect the results, but I think it is essential to address these issues in the Discussion, because they are potentially major caveats to our understanding of the work.

      We agree with the reviewer that the foreground-background distinction might be different in ferrets. In anticipation of that issue, we had enriched the sound set with more ecologically relevant sounds, such as ferret and other animal vocalizations. Nevertheless, we have emphasized this limitation in addition to the limitation of our definition of foregrounds and backgrounds in the discussion: 

      “In addition, most of the sounds included in our study likely have more relevance for humans compared to ferrets (see Table 1). Despite including ferret vocalizations and environmental sounds that are more ecologically relevant for ferrets, it is not clear whether ferrets would behaviorally categorize foregrounds and backgrounds as humans do. Examining how ferrets naturally orient or respond to foreground and background sounds under more ecologically valid conditions, potentially with free exploration or spontaneous listening paradigms, could help address this issue.”

      Reviewer #2 (Public review):

      (1) Interpretation of the cerebral blood volume signal: While the results are compelling, more caution should be exercised by the authors in framing their results, given that they are measuring an indirect measure of neural activity, this is the difference between stating "CBV in area MEG was less background invariant than in higher areas" vs. saying "MEG was less background invariant than other areas". Beyond framing, the basic properties of the CBV signal should be better explored:

      a) Cortical vasculature is highly structured (e.g., Kirst et al., 2020, Cell). One potential explanation for the results is simply differences in vasculature and blood flow between primary and secondary areas of auditory cortex, even if fUS is sensitive to changes in blood flow, changes in capillary beds, etc. (Mace et al., 2011, Nat. Methods). This concern could be addressed by either analyzing spontaneous fluctuations in the CBV signal during silent periods or computing a signal-to-noise ratio of voxels across areas across all sound types. This is especially important given the complex 3D geometry of gyri and sulci in the ferret brain.

      We agree with the reviewers that there could be differences in vasculature across subregions of the auditory cortex and note that this point would also be valid for the published human fMRI data. Nevertheless, even if small differences in vasculature were present, it is unlikely that they would affect our analyses and results, which are designed to be independent of local vascular density. First, we normalize the signal in each voxel using the silent periods, so that the absolute strength of the raw signal, or baseline blood volume in each voxel, is accounted for in our analysis. Second, we only focus on reliably responsive voxels in each region and do see comparable sound-evoked responses in all regions (Figure S2). Third, our analysis mostly relies on voxel-based correlation across sounds, which is independent of the mean and variance of the voxel responses. Differences in noise, measured through test-retest reliability, can affect values of correlation, which is why we used a noise-correction procedure. After this procedure, invariance does not depend on test-retest reliability, and differences across regions are still seen when matching for test-retest reliability (new Figure S7). Thus, we believe that differences in vascular architecture across regions are unlikely to affect our results. We added this point in the Methods section when discussing the noise-correction:

      “After this correction, the differences we observed between brain regions were present regardless of voxels' test-retest reliability, or noise level (Figure S7). Thus, potential differences in vasculature across regions are unlikely to affect our results.”
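
      The third point above, that correlation across sounds does not depend on the mean or variance of a voxel's responses, can be checked numerically. The following is a minimal sketch on synthetic data, not the authors' analysis code: the response vectors, the gain of 3.0, and the offset of 10.0 are hypothetical stand-ins for a voxel with a different vascular gain or baseline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical responses of one voxel to 50 sounds, in isolation and in mixtures
isolated = rng.normal(size=50)
mixture = 0.7 * isolated + 0.3 * rng.normal(size=50)

def pearson(x, y):
    """Pearson correlation between two response vectors (one entry per sound)."""
    return float(np.corrcoef(x, y)[0, 1])

r_raw = pearson(isolated, mixture)

# Apply the same positive gain and offset to both vectors, as a difference in
# vascular gain or baseline blood volume might (values are arbitrary)
r_scaled = pearson(3.0 * isolated + 10.0, 3.0 * mixture + 10.0)

print(r_raw, r_scaled)  # the two values agree up to floating-point error
```

      This covers only gain and baseline differences. Differences in noise level do change raw correlations, which is the issue the noise-correction procedure described above is meant to address.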

      b) Figure 1 leaves the reader uncertain what exactly is being encoded by the CBV signal, as temporal responses to different stimuli look very similar in the examples shown. One possibility is that the CBV is an acoustic change signal. In that case, sounds that are farther apart in acoustic space from previous sounds would elicit larger responses, which is straightforward to test. Another possibility is that the fUS signal reflects time-varying features in the acoustic signal (e.g. the low-frequency envelope). This could be addressed by cross-correlating the stimulus envelope with fUS waveform. The third possibility, which the authors argue, is that the magnitude of the fUS signal encodes the stimulus ID. A better understanding of the justification for only looking at the fUS magnitude in a short time window (2-4.8 s re: stimulus onset) would increase my confidence in the results.

      We thank the reviewer for raising that point as it highlights that the layout of Figure 1 is misleading. While Figure 1B shows an example snippet of our sound streams, Figure 1D shows the average timecourse of CBV time-locked to a change in sound (foreground or background, isolated or in a mixture). This is the average across all voxels and sounds, aiming at illustrating the dynamics for the three broad categories. In Figure 1E however, we show the cross-validated cross-correlation of CBV across sounds (and different time lags). To obtain this, we compute for each voxel the response to each sound at each time lag, thus obtaining two vectors (size: number of sounds) per lag, one per repeat. Then, we correlate all these vectors across the two repeats, obtaining one cross-correlation matrix per voxel. We finally average these matrices across all voxels. The presence of red squares with high correlations demonstrates that the signal encodes sound identity, since CBV is more similar across two repeats of the same sound (e.g., in the foreground only matrix, 0-5 s vs 0-5 s), than two different sounds (0-5 s vs. 7-12 s). We modified the figure layout as well as the legend to improve clarity.
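
      The procedure described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function name `cross_validated_xcorr`, the array layout `(n_voxels, n_sounds, n_lags)`, and the synthetic data are all hypothetical.

```python
import numpy as np

def cross_validated_xcorr(rep1, rep2):
    """Average cross-repeat correlation matrix across voxels.

    rep1, rep2: arrays of shape (n_voxels, n_sounds, n_lags), one per repeat.
    Returns an (n_lags, n_lags) matrix whose entry (i, j) is the Pearson
    correlation, across sounds, between repeat 1 at lag i and repeat 2 at
    lag j, averaged over voxels.
    """
    def zscore(x):
        # Standardize across sounds, separately for each voxel and lag
        x = x - x.mean(axis=1, keepdims=True)
        return x / x.std(axis=1, keepdims=True)

    z1, z2 = zscore(rep1), zscore(rep2)
    n_sounds = rep1.shape[1]
    # One (n_lags x n_lags) correlation matrix per voxel
    per_voxel = np.einsum("vsi,vsj->vij", z1, z2) / n_sounds
    return per_voxel.mean(axis=0)

# Hypothetical data: 30 voxels, 20 sounds, 6 lags, with a sound-specific
# signal shared by both repeats plus independent noise
rng = np.random.default_rng(0)
signal = rng.normal(size=(30, 20, 1))
rep1 = signal + rng.normal(size=(30, 20, 6))
rep2 = signal + rng.normal(size=(30, 20, 6))
xcorr = cross_validated_xcorr(rep1, rep2)
print(xcorr.shape)
```

      Because the two repeats share only the sound-specific signal, positive entries in the resulting matrix indicate that the response carries information about sound identity rather than repeat-specific noise.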

      (2) Interpretation of the human data: The authors acknowledge in the discussion that there are several differences between fMRI and fUS. The results would be more compelling if they performed a control analysis where they downsampled the Ferret fUS data spatially and temporally to match the resolution of fMRI and demonstrated that their ferret results hold with lower spatiotemporal resolution.

      We agree with the reviewer that the use of different techniques might get in the way of cross-species comparison. We already control for the temporal aspect by using the average of stimulus-evoked activity across time (note that due to scanner noise, sounds are presented cut into small pieces in the fMRI experiments). Regarding the spatial aspect, there are several things to consider. First, the two species have brains of very different sizes, a factor that is conveniently compensated for by the higher spatial resolution of fUSI compared to fMRI (0.1 vs 2 mm). Downsampling to fMRI resolution would lead to having one voxel per region per slice, which is not feasible. We also summarize results with one value per region, which is a form of downsampling that is fairer across species. Furthermore, we believe that we already established in a previous study (Landemard et al., 2021, eLife) that fUSI and fMRI data are comparable signals. Indeed, we could predict human fMRI responses to most sounds from ferret fUSI responses to the same sounds. We clarified these points in the discussion:

      “In addition, fMRI has a worse spatial resolution than fUSI (here, 2 vs. 0.1 mm voxels). However, this difference in resolution compensates for the difference in brain size between humans and ferrets. In our previous work, we showed that a large fraction of cortical responses to natural sounds could be predicted from one species to the other using these methods (Landemard et al., 2021).”

      Reviewer #3 (Public review):

      As mentioned above, interpretation of the invariance analyses using predictions from the spectrotemporal modulation encoding model hinges on the model's ability to accurately predict neural responses. Although Figure S5 suggests the encoding model was generally able to predict voxel responses accurately, the authors note in the introduction that, in human auditory cortex, this kind of tuning can explain responses in primary areas but not in non-primary areas (Norman-Haignere & McDermott, PLOS Biol. 2018). Indeed, the prediction accuracy histograms in Figure S5C suggest a slight difference in the model's ability to predict responses in primary versus non-primary voxels. Additional analyses should be done to a) determine whether the prediction accuracies are meaningfully different across regions and b) examine whether controlling for prediction accuracy across regions (i.e., subselecting voxels across regions with matched prediction accuracy) affects the outcomes of the invariance analyses.

      The reviewer is correct: the spectrotemporal model tends to perform less well in human non-primary cortex. We believe this does not contradict our results but goes in the same direction: while there is a gradient in invariance in both ferrets and humans, this gradient is predicted by the spectrotemporal model in ferrets, but not in humans (possibly indeed because predictions are less good in human non-primary auditory cortex). Regardless of the mechanism, this result points to a difference across species. In ferrets, we found a significantly better prediction accuracy in VP (p=0.001, permutation test) and no differences between MEG and dPEG (p=0.89). In humans, prediction accuracy was slightly higher in primary compared to non-primary auditory cortex, but this effect was not significant (p=0.076). In both species, when matching prediction accuracy between regions, the gradients in invariance were preserved. We have added these analyses to the manuscript (Figure S5).

      A related concern is the procedure used to train the encoding model. From the methods, it appears that the model may have been fit using responses to both isolated and mixture sounds. If so, this raises questions about the interpretability of the invariance analyses. In particular, fitting the model to all stimuli, including mixtures, may inflate the apparent ability of the model to "explain" invariance, since it is effectively trained on the phenomenon it is later evaluated on. Put another way, if a voxel exhibits invariance, and the model is trained to predict the voxel's responses to all types of stimuli (both isolated sounds and mixtures), then the model must also show invariance to the extent it can accurately predict voxel responses, making the result somewhat circular. A more informative approach would be to train the encoding model only on responses to isolated sounds (or even better, a completely independent set of sounds), as this would help clarify whether any observed invariance is emergent from the model (i.e., truly a result of low-level tuning to spectrotemporal features) or simply reflects what it was trained to reproduce.

      We thank the reviewer for this suggestion. We have run an additional prediction using only the sounds presented in isolation, which replicates our main results (new Figure S6). We have added this control to the manuscript:

      “Results were similar if the model was fit solely on isolated sounds, excluding mixtures from the training set (Figure S6).”
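
      The control described above amounts to fitting the encoding model on one stimulus subset (isolated sounds) and evaluating it on another (mixtures). The sketch below illustrates that split only. It assumes a generic linear model fit by closed-form ridge regression, which this response does not specify, and uses synthetic features as hypothetical stand-ins for spectrotemporal modulation energies.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X^T X + alpha * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
n_iso, n_mix, n_feat = 60, 40, 8           # hypothetical stimulus and feature counts

# Hypothetical stimulus features and one simulated voxel with linear tuning
X_iso = rng.normal(size=(n_iso, n_feat))   # isolated sounds (training set)
X_mix = rng.normal(size=(n_mix, n_feat))   # mixtures (never seen during fitting)
w_true = rng.normal(size=n_feat)
y_iso = X_iso @ w_true + 0.1 * rng.normal(size=n_iso)
y_mix = X_mix @ w_true + 0.1 * rng.normal(size=n_mix)

# Fit on isolated sounds only, then predict the responses to mixtures
w_hat = fit_ridge(X_iso, y_iso, alpha=1.0)
y_mix_pred = X_mix @ w_hat
r_mix = float(np.corrcoef(y_mix_pred, y_mix)[0, 1])
print(r_mix)
```

      With this split, any invariance in the predicted mixture responses has to come from the tuning learned on isolated sounds, which is the circularity concern the reviewer raised.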

      Finally, the interpretation of the foreground invariance results remains somewhat unclear. In ferrets (Figure 2I), the authors report relatively little foreground invariance, whereas in humans (Figure 5G), most participants appear to show relatively high levels of foreground invariance in primary auditory cortex (around 0.6 or greater). However, the paper does not explicitly address these apparent cross-species differences. Moreover, the findings in ferrets seem at odds with other recent work in ferrets (Hamersky et al. 2025 J. Neurosci.), which shows that background sounds tend to dominate responses to mixtures, suggesting a prevalence of foreground invariance at the neuronal level. Although this comparison comes with the caveat that the methods differ substantially from those used in the current study, given the contrast with the findings of this paper, further discussion would nonetheless be valuable to help contextualize the current findings and clarify how they relate to prior work.

      We thank the reviewer for this point. While we found a trend for higher background invariance than foreground invariance in ferret primary auditory cortex, this difference was not significant and many voxels exhibit similar levels of background and foreground invariance (for example in Figure 2D, G). Thus, we do not think our results are inconsistent with Hamersky et al., 2025, though we agree the bias towards background sounds is not as strong in our data. This might indeed reflect differences in methodology, both in the signal that is measured (blood volume vs spikes), and the sound presentation paradigm. Our timescales are much slower and likely reflect responses post-adaptation, which might not be as true for Hamersky et al. We have added this point to the discussion, as well as a comment on the difference between ferrets and humans in foreground invariance in primary auditory cortex:

      “In ferrets, primary auditory cortex has been found to over-represent backgrounds in mixtures compared to foregrounds (Hamersky et al., 2025). In contrast, we found a slight, non-significant bias towards foregrounds in primary regions. This difference could be driven by a difference in timescales, as we looked at slower timescales in which adaptation might be more present, reducing the strength of background encoding. In humans, we found a much smaller gap between background and foreground invariance in primary auditory cortex, which was not predicted by the spectrotemporal model. Additional, more closely controlled experiments would be needed to confirm and understand this species difference.”

      Reviewer #1 (Recommendations for the authors):

      (1) In the introduction, explain the relationship between background/foreground and stationarity/non-stationarity, and thus why stationary/nonstationary stimuli could be used to probe differences in background/foreground processing.

      We have added a sentence at the beginning of the results section to justify our choice (see public review).  

      (2) Avoid use of the background/foreground terminology in Results (and probably Methods).

      For consistency with previous literature, we decided to keep this terminology, though imperfect. We further justified our choice in the beginning of the Results section (see previous point).

      (3) In the Discussion, explain what the implications of the results are for background/foreground processing, and, importantly, highlight any caveats that result from stationarity not being a direct measure of background/foreground.

      We added a paragraph in the Discussion to highlight this point (see public review).

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1: Showing a silent period in the examples would help in understanding the fUS signal.

      In Figure 1D, we show the average timecourse of CBV time-locked to a change in sound (foreground or background, isolated or in a mixture). This is the average across all voxels and sounds. Thus, it would not be very informative to show an equivalent plot for a silent period, as it would look flat by definition. However, we updated the layout and legend of Figure 1 to make it clearer and avoid confusion.

      (2) "Responses were not homogenous" - would make more sense to say something like "responses were not spatially distributed".

      We removed these words, which were indeed not necessary: “We found that reliable sound-evoked responses were confined to the central part of ventral gyrus of the auditory cortex.”

      (3) Figure 2D: The maps shown in Figure 2D are difficult to understand for the non-initiated in fUS. At a minimum, labels should be added to indicate A-P, M-L, D-V. I cannot see the white square in the primary figure. An additional graphic would be helpful here to understand the geometry of the measurement.

      We thank the reviewer for pointing out that reading these images is indeed an acquired skill. We added an annotated image of anatomy with indications of main features to guide the reader in Figure 1. We also added missing white squares. 

      (4) Figure 2F: Can the authors better justify why the summary statistic is shown for all three areas, but the individual data only compares primary vs. higher order?

      We now show individual data for all three areas.

      (5) More methods information is needed to understand how recordings were stitched across days. Was any statistical modeling used to factor out the influence of day on overall response levels?

      We simply concatenated voxels recorded across different sessions and days. The slices were sampled randomly to avoid any systematic effect. Because different slices were sampled in different sessions, any spatial structure spanning several slices is unlikely to be artefactual. For instance, the map of average responses in Figure 2A shows a high level of continuity of spatial patterns across slices. This indicates that this pattern reflects a true underlying organization rather than session-specific noise. It also shows that the overall response levels are not affected by the day or recording session. We added a section in the Methods (“Combining different recordings”) to clarify this point:

      “The whole dataset consisted of multiple slices, each recorded in a different recording session. Slices to image on a given day were chosen at random to avoid any systematic bias. Responses were consistent across neighboring slices recorded on different sessions, as shown by the maps of average responses (Figure 2A, Figure S2) where any spatial continuity across different slices must reflect a true underlying signal in the absence of common noise.”

      Reviewer #3 (Recommendations for the authors):

      (1) Figures:

      The figures are generally very well done and visually appealing. However, I have a few suggestions and questions.

      a)  In Figure 1G, the delta CBV ranges from 0.5 to 1.5, although in subsequent figures (e.g., Figure 2D), the range is much larger (-15 to 45). Is it possible that the first figure is a proportion rather than a percentage, or is there some other explanation for the massive difference in scale? Not being very familiar with this measure, it was confusing.

      The same scale is used in both figures, the major difference being that in Figure 1D, we take the average over all voxels and sounds (for each category), which includes many non-responsive voxels and, for responsive voxels, sounds to which they respond only weakly. On the other hand, Figure 2D shows the response of a single, responsive voxel. Thus, the values it reaches for its preferred sounds (45%) are an extreme that contributes little to the average in Figure 1D. We have changed the legend of Figure 1D to make this more explicit.

      b)  Similar to the first point, the strength of the correlations in the matrices of Figure 1E is very small (~ 0.05) compared to the test-retest reliabilities plotted in Figure 2B (~0.5). Again, I was confused by this large difference in scale.

      Two main factors explain the difference in values between Figure 1E and Figure 2B. First, in Figure 1E, each correlation is computed on the average activity in a window of 0.3 s, as opposed to 2.4 s in Figure 2B. More averaging leads to better SNR, which inevitably leads to higher test-retest correlations. Second, in Figure 1E, the cross-correlation matrices are averaged across all responsive voxels without any criterion for reliability. On the other hand, Figure 2B shows example voxels with good test-retest reliability.

      c)  In Figure 2D, the example voxels are supposed to be shown in white. It appears that this example voxel is only shown for the non-primary voxel. Please be sure to add these voxels throughout the other panels and figures as well. 

      We fixed this mistake and added the example voxel in all panels.

      d)  Why do the invariance results (e.g., Figure 2F) for individual animals combine across dPEG and VP, while the overall results (across all animals) split things across all three regions? The results in Table 2 do, in fact, provide this data. Upon further examination of the data in Table 2, it seems like there is only a significant difference between background invariance between dPEG and VP for one of the two animals, and that this might be what drives the effect when pooling across all animals. This seems important to both show visually in the figure and to potentially discuss. There is still very clearly a difference between primary and non-primary, but whether there is a real difference between dPEG and VP seems more unclear.

      We added the values for single animals in the plot and highlighted this limitation in the text:

      “While background invariance was overall highest in VP, the differences within non-primary areas were more variable across animals (see table 2).”

      e)  Again, as in Figure 2F, the cross symbols seem like a bad choice as markers since the vertical components of the cross are suggestive of the error of the measurement. However, no error is actually plotted in these figures. I recommend using a different marker and including some measure of error in the invariance plots.

      We replaced the crosses with circles to avoid confusion. The measure of error is provided by the representation of values for single animals.

      f) The caption for Figure 4C states that each line corresponds to one animal, but does not precisely state what this line represents. Is this the median or something?

      Each line indeed represents the median across voxels for one animal. We added this information to the legend.

      g)  In Figure 5, the captions for panels D and E are swapped.

      This has now been corrected.

      (2) Discussion:

      (a) In the paragraph on methodological differences, it mentions that the fMRI voxel size is around 2 mm. This may be true in general, but given the comparison to Kell & McDermott 2019, the voxel size should reflect that used in their study (1 mm).

      The reviewer might be referring to this sentence from the methods of Kell et al., 2019: “T1-weighted anatomical images were collected in each participant (1-mm isotropic voxels) for alignment and cortical surface reconstruction.” However, this does not correspond to the resolution of the functional data, which is 2 mm, as mentioned a bit further in the Methods: “In-plane resolution was 2 × 2 mm (96 × 96 matrix), and slice thickness was 2.8 mm with a 10% gap, yielding an effective voxel size of 2 × 2 × 3.08 mm.”

      (b) In the next paragraph on the control of attention, it mentions that attentional differences could play a role. However, in Kell & McDermott 2019, they manipulated attention (attend visual versus attend auditory) and found that it did not substantially affect the observed pattern invariance. I suppose it could potentially affect the degree to which an encoding model could explain the invariance. This seems important, and given that the data was already collected, it could be worth it to analyze that data.

      As the reviewer points out, Kell et al. 2019 ran an additional experiment in which they manipulated auditory vs. visual attention. However, the auditory task was just based on loudness and ensured that the participants were awake and paying attention to the stimuli, but not specifically to the foreground or background. This type of attention did not lead to changes in the observed patterns of invariance, which might have been the case for selective attention to backgrounds or foregrounds in the mixture. Given that these manipulations were not done in the ferret experiments, we chose to not include the analysis of this dataset in the scope of this paper. However, future work investigating that topic further would indeed be of interest.

      (c) The mention of "a convolutional neural network trained to recognize digits in noise" should make more obvious that this is visual recognition rather than auditory recognition.

      We clarified this sentence to make clear that the recognition is visual and not auditory: “For instance, in a convolutional neural network trained to visually recognize digits in different types of noise, when local feedback is implemented, early layers encode noise properties, while later layers represent clean signal.”

      (d) Finally, one explanation of the results in the discussion is that "primary auditory areas could be recruited to maintain background representations, enabling downstream cortical regions to use these representations to specifically suppress background information and enhance foreground representations." This "background-related information" being used to "facilitate further extraction of foregrounds" is similar to what is argued in Hicks & McDermott PNAS 2024.

      We thank the reviewer for suggesting this relevant reference and added it in this paragraph of the discussion.

      (3) Methods:

      In the "Cross-correlation matrices" section, it mentions that time-averaged responses from 2.4 to 4.8 s were used. It would be helpful to provide an explanation of why this particular time window was used. Additionally, I wondered whether one could look at adaptation type effects (e.g., that of Khalighinejad et al., 2019) or whether fUSI does not offer this kind of temporal precision?

      The effects shown in Khalighinejad et al., 2019, are indeed likely too fast to be observed with our methods. However, there are still dynamics in the fUSI signal and in its invariance (Figure S1). Each individual combination of foreground and background is presented for 4.8 s (Figure 1B). Therefore, we chose the range 2.4-4.8 s as the largest window we could use (to improve SNR) while minimizing contamination from the previous or next sound (indeed, blood volume typically lags neuronal activity by 1.5-2 s). We added this clarification to the Methods.
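
      The time-averaging step described above can be illustrated with a short sketch. Only the 2.4-4.8 s window comes from the response; the function name `window_average`, the 0.4 s sampling period, and the ramp-shaped signal are hypothetical choices made for the example.

```python
import numpy as np

def window_average(cbv, times, t_start=2.4, t_stop=4.8, eps=1e-9):
    """Average `cbv` (..., n_timepoints) over samples with t_start <= t <= t_stop.

    `times` gives the time of each sample relative to sound onset, in seconds.
    `eps` guards the window edges against floating-point rounding.
    """
    times = np.asarray(times, dtype=float)
    mask = (times >= t_start - eps) & (times <= t_stop + eps)
    return np.asarray(cbv, dtype=float)[..., mask].mean(axis=-1)

# Hypothetical sampling every 0.4 s from 0.0 to 4.8 s after sound onset
times = np.arange(13) * 0.4
# Five hypothetical voxels whose signal is a simple ramp equal to the time axis
cbv = np.tile(times, (5, 1))
avg = window_average(cbv, times)   # one time-averaged value per voxel
print(avg)
```

      Starting the window at 2.4 s leaves room for the hemodynamic lag mentioned above, so that the average mostly reflects the current sound rather than the previous one.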

      In the "Human analyses" section, it is very unclear which set of data was used from Kell & McDermott 2019. For example, that paper contains 4 different experiments, none of which has 7 subjects. Upon closer reading, it seems that only 7 of the 11 participants from Experiment 1 also heard the background sounds in isolation (thus enabling the foreground invariance analyses). However, they stated that there were only 3 female participants in that experiment, while you state that you used data from 7 females. It would be helpful to double-check this and to more clearly state exactly which participants (i.e., from which experiment) were used and why (e.g., why not use data from Experiment 4 in the visual task/attention condition?).

      We added a sentence to clarify which datasets were used: “Specifically, we used data from Experiment 1 which provided the closest match to our experimental conditions, and only considered the last 7 subjects that heard both the foregrounds and the backgrounds in isolation, in addition to the mixtures.” 

      We mistakenly stated that all participants were female: the original dataset has 3 females and 8 males, of which we used 7 without any indication of their sex. We therefore removed this statement from the text.

      In the "Statistical testing" section, why were some tests done with 1000 permutations/shuffles while others were done with 2000?

      We homogenized and used 1000 permutations/shuffles for all statistical tests.

      (4) Miscellany:

      (a) The Hamersky et al. 2023 preprint has recently been published (referenced in the public review), and so you could consider updating the reference.

      This reference has now been updated.

      (b) There are a few borderline statistical tests that could use a bit more nuance. For example (on page 4), "In primary auditory cortex (MEG), there was no significant difference between values of foreground invariance and background invariance (p = 0.063, obtained by randomly permuting the sounds' background and foreground labels, 1000 times)." This test is quite close to being significant, and this might be acknowledged.

      We emphasized the trend to nuance the interpretation of these results: “In primary auditory cortex (MEG), foreground invariance was slightly lower than background invariance, although this difference was not significant (p=0.063, obtained by randomly permuting the sounds' background and foreground labels, 1000 times).”
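
      A label-permutation test of the kind quoted above can be sketched as follows. This is a generic illustration, not the authors' implementation: the test statistic (difference in group means), the two-sided p-value with a +1 correction, and the synthetic values are assumptions, since the response specifies only that foreground and background labels were permuted 1000 times.

```python
import numpy as np

def permutation_pvalue(values, labels, n_perm=1000, seed=0):
    """Two-sided permutation test on the difference in means between two groups.

    values: one number per sound (hypothetical per-sound measure).
    labels: boolean per sound, True for one category and False for the other.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=bool)

    def stat(lab):
        return values[lab].mean() - values[~lab].mean()

    observed = stat(labels)
    # Null distribution: recompute the statistic after shuffling the labels
    null = np.array([stat(rng.permutation(labels)) for _ in range(n_perm)])
    # +1 in numerator and denominator so the p-value is never exactly zero
    return (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)

# Hypothetical example: two clearly separated groups of 20 sounds each
values = np.concatenate([np.zeros(20), np.full(20, 10.0)])
labels = np.array([True] * 20 + [False] * 20)
p = permutation_pvalue(values, labels)
print(p)
```

      With 1000 permutations, the smallest attainable p-value under this convention is 1/1001, so a reported value such as p = 0.063 sits well within the resolution of the test.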

      (5) Potential typos:

      (a)   Should the title be "natural sound mixtures" instead of "natural sounds mixtures"?

      (b) The caption for Figure 1 says "We imaged the whole auditory through successive slices across several days." I believe this should be "the whole auditory [cortex]."

      (c) In the first paragraph of the discussion, there is a sentence ending in "...are segregated in hemody-namic signal." I believe this should be "hemodynamic signal."

      These errors are now all corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study evaluates whether species can shift geographically, temporally, or both ways in response to climate change. It also teases out the relative importance of geographic context, temperature variability, and functional traits in predicting the shifts. The study system is large occurrence datasets for dragonflies and damselflies split between two time periods and two continents. Results indicate that more species exhibited both shifts than one or the other or neither, and that geographic context and temp variability were more influential than traits. The results have implications for future analyses (e.g. incorporating habitat availability) and for choosing winner and loser species under climate change. The methodology would be useful for other taxa and study regions with strong community/citizen science and extensive occurrence data.

      We thank Reviewer 1 for their time and expertise in reviewing our study. The suggestions are very helpful and will improve the quality of our manuscript.

      Strengths:

      This is an organized and well-written paper that builds on a popular topic and moves it forward. It has the right idea and approach, and the results are useful answers to the predictions and for conservation planning (i.e. identifying climate winners and losers). There is technical proficiency and analytical rigor driven by an understanding of the data and its limitations.

      We thank Reviewer 1 for this assessment.

      Weaknesses:

      (1) The habitat classifications (Table S3) are often wrong. "Both" is overused. In North America, for example, Anax junius, Cordulia shurtleffii, Epitheca cynosura, Erythemis simplicicollis, Libellula pulchella, Pachydiplax longipennis, Pantala flavescens, Perithemis tenera, Ischnura posita, the Lestes species, and several Enallagma species are not lotic breeding. These species rarely occur let alone successfully reproduce at lotic sites. Other species are arguably "both", like Rhionaeschna multicolor which is mostly lentic. Not saying this would have altered the conclusions, but it may have exacerbated the weak trait effects.

      We thank the reviewer for their expertise on this topic. We obtained these habitat classifications from field guides and trait databases, and reviewed our primary sources to clarify the trait classifications. We reclassified the species according to the expertise of this reviewer and performed our analysis again; please see details below.

      (2) The conservative spatial resolution (100 x 100 km) limits the analysis to wide-ranging and generalist species. There's no rationale given, so not sure if this was by design or necessity, but it limits the number of analyzable species and potentially changes the inference.

      It is really helpful to have the opportunity to contextualize study design decisions like this one, and we thank the reviewer for the query. Sampling intensity is always a meaningful issue in research conducted at this scale, and we addressed it head-on in this work.

      Very small quadrats covering massive geographical areas will be critically and increasingly afflicted by sampling weaknesses, as well as creating a potentially large problem with pseudoreplication. There is no simple solution to this problem. It would be possible to create interpolated predictions of species’ distributions using Species Distribution Models, Joint Species Distribution Models, or various kinds of Occupancy Models. None of these approaches then leads to analyses that rely on directly observed patterns. Instead, they are extrapolations, and those extrapolations typically fail when tested, although they have still been tested (for example, papers by Lee-Yaw demonstrate that it is rare for SDMs to predict things well; occupancy models often perform less well than SDMs and do not capture how things change over time - Briscoe et al. 2021, Global Change Biology). The result of employing such techniques would certainly be to make all conclusions speculative, rather than directly observable. 

      Rather than employing extrapolative models, we relied on transparent techniques that are used successfully in the core macroecology literature that address spatial variation in sampling explicitly and simply. Moreover, we constructed extensive null models that show that range and phenology changes, respectively, are contrary to expectations that arise from sampling difference. 100km quadrats make for a reasonable “middle-ground” in terms of the effects of sampling, and we added a reference to the methods section to clarify this (see details below).

      (3) The objective includes a prediction about generalists vs specialists (L99-103) yet there is no further mention of this dichotomy in the abstract, methods, results, or discussion.

      Thank you for pointing this out - it is an editing error that should have been resolved prior to submission. We replaced the terms specialist and generalist with specific predictions based on traits (see details below).

      (4) Key references were overlooked or dismissed, like in the new edition of Dragonflies & Damselflies model organisms book, especially chapters 24 and 27.

      We thank Reviewer 1 for making us aware of this excellent reference. We have reviewed the text and include it as a reference, in addition to other references recommended by Reviewer 1 and other reviewers (see details below).

      Reviewer #2 (Public review):

      Summary:

      This paper explores a highly interesting question regarding how species migration success relates to phenology shifts, and it finds a positive relationship. The findings are significant, and the strength of the evidence is solid. However, there are substantial issues with the writing, presentation, and analyses that need to be addressed. First, I disagree with the conclusion that species that don't migrate are "losers" - some species might not migrate simply because they have broad climatic niches and are less sensitive to climate change. Second, the results concerning species' southern range limits could provide valuable insights. These could be used to assess whether sampling bias has influenced the results. If species are truly migrating, we should observe northward shifts in their southern range limits. However, if this is an artifact of increased sampling over time, we would expect broader distributions both north and south. Finally, Figure 1 is missing panel B, which needs to be addressed.

      We thank Reviewer 2 for their time and expertise in reviewing our study.

      It is possible that some species with broad niches may not need to migrate, although in general failing to move with climate change is considered an indicator of “climate debt”, signaling that a species may be of concern for conservation (ex. Duchenne et al. 2021, Ecology Letters). We revised the discussion to acknowledge potential differences in outcomes (please see details below).

      We used null models to test whether our results regarding range shifts were robust, and if they varied due to increased sampling over time. We found that observed northern range limit shifts are not consistent with expectations derived from changes in sampling intensity (Figure S1, S2). 

      We thank Reviewer 2 for pointing out this error in Figure 1. This conceptual figure was a challenge to construct, as it must illustrate how phenology and range shifts can occur simultaneously or uniquely to enable a hypothetical odonate to track its thermal niche over time. In a previous version of the figure, we had a second panel and we failed to remove the reference to that panel when we simplified the figure. We have updated the figure and figure caption (please see details below).

      Reviewer #3 (Public review):

      Summary:

      In their article "Range geographies, not functional traits, explain convergent range and phenology shifts under climate change," the authors rigorously investigate the temporal shifts in odonate species and their potential predictors. Specifically, they examine whether species shift their geographic ranges poleward or alter their phenology to avoid extreme conditions. Leveraging opportunistic observations of European and North American odonates, they find that species showing significant range shifts also exhibited earlier phenological shifts. Considering a broad range of potential predictors, their results reveal that geographical factors, but not functional traits, are associated with these shifts.

      We thank Reviewer 3 for their expertise and the time they spent reviewing our study. Their suggestions are very helpful and will improve the quality of our manuscript.

      Strengths:

      The article addresses an important topic in ecology and conservation that is particularly timely in the face of reports of substantial insect declines in North America and Europe over the past decades. Through data integration the authors leverage the rich natural history record for odonates, broadening the taxonomic scope of analyses of temporal trends in phenology and distribution to this taxon. The combination of phenological and range shifts in one framework presents an elegant way to reconcile previous findings improving our understanding of the drivers of biodiversity loss.

      We thank Reviewer 3 for this assessment.

      Weaknesses:

      The introduction and discussion of the article would benefit from a stronger contextualization of recent studies on biological responses to climate change and the underpinning mechanism.

      The presentation of the results (particularly in figures) should be improved to address the integrative character of the work and help readers extract the main results. While the writing of the article is generally good, particularly the captions and results contain many inconsistencies and lack important detail. With the multitude of the relationships that were tested (the influence of traits) the article needs more coherence.

      We thank Reviewer 3 for these suggestions. We revised the introduction and discussion to better contextualize species’ responses to climate change and the mechanisms behind them (see details below). We carefully reviewed all figures and captions, and made changes to improve the clarity of the text and the presentation of results (see details below).

      Reviewer #1 (Recommendations for the authors):

      Comment:

      (1) Following weakness #1 in the public review, the authors should review the habitat classifications, consult with an odonatologist, and reclassify many species from Both to Lentic and redo the analysis.

      Thank you for pointing out this disagreement among expert habitat classifications that we cited and other literature. We reclassified species’ habitat preferences based on classifications by Hof et al., a source that was consistent with your suggestions, and identified additional species as Lentic that our other references had identified as Both. We performed our analysis with this new dataset and, as you suspected, our results did not change qualitatively: species habitat preferences did not predict their range shifts.

      Hof, Christian, Martin Brändle, and Roland Brandl. "Lentic odonates have larger and more northern ranges than lotic species." Journal of Biogeography 33.1 (2006): 63-70.

      Comment:

      (2) Following weakness #2, would it be worthwhile or interesting to analyze a smaller ranging group (e.g. cut the quad size in half, 50 x 50 km) to bring in more species and potentially change the inference? Or is the paper too tightly constructed to allow this, even as a secondary piece?

      Thank you for this comment, as it highlights an important consideration for macroecological analyses, and the importance of balancing multiple factors for determining quadrat size. Issues exist with identifying drivers of range boundaries among species with narrow ranges when they are analyzed separately from wide-ranging species, and examining larger quadrats can actually help clarify drivers (Szabo, Algar, and Kerr 2009). The smaller quadrats are, the higher the likelihood that the species is actually there but was never observed, or that the quadrat only covers unsuitable habitat and the species is absent from the entire (or almost entire) quadrat. Too many absences create issues with violating model assumptions, and create noise that makes it difficult to identify drivers of species’ range and phenology shifts.

      Moreover, we constructed extensive null models that show that range and phenology changes, respectively, are contrary to expectations that arise from sampling difference. 100km quadrats make for a reasonable “middle-ground”, and we have included a brief explanation of this in the text: “We assigned species presences to 100×100 km quadrats, a scale that is large enough to maintain adequate sampling intensity but still relevant to conservation and policy (Soroye et al., 2020), to identify the best sampled species.”  (Lines 170-172).

      Szabo, Nora D., Adam C. Algar, and Jeremy T. Kerr. "Reconciling topographic and climatic effects on widespread and range‐restricted species richness." Global Ecology and Biogeography 18.6 (2009): 735-744.
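      The assignment of species presences to 100 × 100 km quadrats described in this response can be illustrated with a minimal sketch. It assumes that occurrence coordinates have already been projected to an equal-area system in metres; the function names and the record layout are hypothetical and are not taken from the authors' code.

```python
from collections import defaultdict

CELL_SIZE_M = 100_000  # 100 km quadrat edge, in metres (assumed projected units)

def quadrat_id(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the quadrat containing a point."""
    # Floor division keeps negative coordinates in the correct cell.
    return (int(x_m // cell_size), int(y_m // cell_size))

def species_by_quadrat(records):
    """Group species names by quadrat from (species, x_m, y_m) records."""
    grid = defaultdict(set)
    for species, x_m, y_m in records:
        grid[quadrat_id(x_m, y_m)].add(species)
    return grid
```

      Species richness per quadrat is then the size of each set. Halving `cell_size` quadruples the number of cells that the same records must fill, which is the sampling trade-off the response describes.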

      Comment:

      (3) Following weakness #3, are specialists the ones that "failed to shift" (L18)? If so please specify. The prediction about generalists vs specialists needs to be removed or incorporated in other parts of the paper.

      Thank you for pointing this out, we intended to suggest that species with more generalist habitat requirements might be better able to shift, but ultimately found that traits did not predict species’ shifts. We corrected our prediction regarding habitat generalists as follows: “We predicted that species able to use both lentic and lotic habitats would shift their phenologies and geographies more than those able to use just one habitat type, as generalists outperform specialists as climate and land uses change (Ball-Damerow et al., 2015, 2014; Hassall and Thompson, 2008; Powney et al., 2015; Rapacciuolo et al., 2017).” (Lines 128-132).

      Comment:

      (4) Following weakness #4, cite Pinkert et al at lines 70-73 and Rocha-Ortega et al at lines 73-77 along with https://doi.org/10.1098/rspb.2019.2645. Add Sandall et al https://doi.org/10.1111/jbi.14457 to L69 references.

      Thank you for the excellent reference suggestions, we have added them as suggested (Lines 80, 86, 77).

      Comment:

      Other comments/suggestions:

      (1) Title: consider adding temp variability 'Range geography and temperature variability, not functional traits,...'.

      Thank you for this suggestion, we have added temperature variability to the title: “Range geography and temperature variability explain cross-continental convergence in range and phenology shifts in a model insect taxon”.

      Comment:

      (2) L125: is (northern) Mexico included in North America?

      Yes, we did include observations from Northern Mexico, and have specified this in the text: “We retained ~1,100,000 records from Canada, the United States, and Northern Mexico, comprising 76 species (Figure 2).” (Lines 174-176).

      Comment:

      (3) L128: I'd label this section 'Temperature variability' rather than 'Climate data'.

      Thank you, we agree that this is a more appropriate title for this section, and have replaced ‘Climate data’ with ‘Temperature variability’ (Line 185).

      Comment:

      (4) Table 2: why are there no estimates for the traits?

      We apologise, this information should have been included in the main body of the manuscript, but was only explained in the Table 2 caption. We have added the following explanation: “Non-significant variables, specifically all functional traits, were excluded from the final models.” (Lines 312-323).

      Comment:

      (5) Figure 2: need to identify the A-D panels.

      We apologise for this error and have clarified the differences between panels in the figure caption:

      “Figure 2: Richness of 76 odonate species sampled in North America and Europe in the historic period (1980-2002; panes A and C) and the recent period (2008-2018; panes B and D). Species richness per 100 × 100 km quadrat is shown in panes A and B, while panes C and D show species richness per 200 × 200 km quadrat. Dark red indicates high species richness, while light pink indicates low species richness.” (Lines 1002-1006).

      Comment:

      (6) L163-173: I am not familiar with this analysis but it sounds interesting and promising, I am not sure if this can be clarified further. Why the -25 to 25, and -30 to 30, doesn't the -35 to 35 cover these? And what is meant by "include only phenology shifts that could be biologically meaningful", that larger shifts would not be meaningful or tied to climate change?

      We used different cutoffs for phenology shifts to inspect for outliers that were likely to be errors, potentially due to insufficient sampling to calculate phenology. We clarified in the text as follows:

      “We retained emergence estimates between March 1st and September 1st, as well as species and quadrats that showed a difference in emergence phenology of -25 to 25 days, -30 to 30 days, or -35 to 35 days between both time periods, to include only phenology shifts that could be biologically meaningful to environmental climate change (i.e. exclude errors).” (Lines 169-173).
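      The filtering rule quoted above can be sketched as follows, assuming emergence estimates are stored as day-of-year values for the historic and recent periods. The function names, the data layout, and the use of a non-leap reference year are hypothetical.

```python
from datetime import date

def in_emergence_window(day_of_year, year=2001):
    """True if a day-of-year falls between March 1 and September 1 (inclusive)."""
    start = date(year, 3, 1).timetuple().tm_yday
    end = date(year, 9, 1).timetuple().tm_yday
    return start <= day_of_year <= end

def filter_shifts(rows, cutoff_days):
    """Keep (historic_doy, recent_doy) pairs whose shift is within +/- cutoff_days."""
    kept = []
    for historic, recent in rows:
        # Drop estimates that fall outside the plausible emergence season.
        if not (in_emergence_window(historic) and in_emergence_window(recent)):
            continue
        # Drop shifts too large to be plausible (likely sampling errors).
        if abs(recent - historic) <= cutoff_days:
            kept.append((historic, recent))
    return kept
```

      Running the same data through cutoffs of 25, 30, and 35 days shows how many records each threshold excludes, so the sensitivity of the results to the choice of cutoff can be checked directly.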

      Comment:

      (7) L193-200: I agree but would make a distinction between ecological vs functional traits, as other studies view geographic traits as ecological manifestations of functional biology, e.g. https://doi.org/10.1016/j.biocon.2019.07.001 and https://doi.org/10.1016/j.biocon.2023.110098.

      Thank you for this suggestion, and for making us aware of the thinking around range geographies as ecological traits. We have specified throughout the manuscript that the ‘traits’ we are considering are ‘functional traits’, changed the methods subsection title to “Range geographies and functional traits” (Line 252), and added a brief discussion of ecological traits: “Geographic range and associated climatic characteristics are often considered ecological traits, as they are consequences of functional traits and their interactions with geographic features (Bried and Rocha-Ortega, 2023; Chichorro et al., 2019).” (Lines 256-259).

      Comment:

      (8) L203: What's the rationale for egg-laying habitat as "biologically relevant to spatial and temporal responses to climate change"? That one's not as obvious as the others and needs a sentence more. Also, I am wondering why other traits were not considered here, like color lightness and voltinism. And why not wing size instead of body size, or better yet the two combined (wing loading) as a proxy for dispersal ability?

      We agree that our rationale for using this trait should be better explained, and we have included the following explanation: “Egg laying habitat was assigned according to whether species use exophytic egg-laying habitat (i.e. eggs laid in water or on land, relatively larger in number), or endophytic egg-laying habitat (i.e. eggs laid inside plants, usually fewer in number); species using exophytic habitats are associated with greater northward range limit shifts (Angert et al., 2011).” (Lines 271-275).

      We considered traits that have been found to be important for range and phenology shifts among odonates, as well as being key traits for expectations for species responses to climate change. Flight duration and body size are correlated with dispersal ability (Powney et al. 2015). Body size is also correlated with competitive ability (Powney et al. 2015), potentially making it an important predictor of a species’ ability to establish and maintain populations in expanding range areas. Traits correlated with range shifts also include breeding habitat type (Powney et al. 2015; Bowler et al. 2021) and egg laying habitat (Angert et al. 2011). Ideally, we would have used dispersal data from mark/release/recapture studies, but it was not available for many of the species included in this study. After finding that none of the functional traits we included were related to range shifts, there was no reason to believe that a further investigation of traits would be meaningful.

      Angert AL, Crozier LG, Rissler LJ, Gilman SE, Tewksbury JJ, Chunco AJ. 2011. Do species’ traits predict recent shifts at expanding range edges? Ecology Letters 14:677–689. doi:10.1111/j.1461-0248.2011.01620.x

      Bowler DE, Eichenberg D, Conze K-J, Suhling F, Baumann K, Benken T, Bönsel A, Bittner T, Drews A, Günther A, Isaac NJB, Petzold F, Seyring M, Spengler T, Trockur B, Willigalla C, Bruelheide H, Jansen F, Bonn A. 2021. Winners and losers over 35 years of dragonfly and damselfly distributional change in Germany. Diversity and Distributions 27:1353–1366. doi:10.1111/ddi.13274

      Powney GD, Cham SSA, Smallshire D, Isaac NJB. 2015. Trait correlates of distribution trends in the Odonata of Britain and Ireland. PeerJ 3:e1410. doi:10.7717/peerj.1410

      Comment:

      (9) L210: I count at least 5 migratory species in table S3, so although maybe not enough to analyze it's misleading to say "nearly all" were non-migratory, revise to "most" or "vast majority".

      Thank you for pointing this out, we have made the suggested correction (Line 277).

      Comment:

      (10) L252-254: save this for the Discussion and write a more generalized statement for results to avoid citations in the results.

      Thank you for this suggestion, we have moved this to the discussion (Lines 517-527).

      Comment:

      (11) Figures S5 & S6: these are pretty important, I'd consider elevating them to the main document as one figure with two panels.

      Thank you for this suggestion, we agree these figures should be elevated to the main text, and have made them into a panel figure (Figure 4).

      Comment:

      (12) L305-307: great point and recommendation!

      Thank you very much for this positive feedback!

      Comment:

      (13) L335-336: another place to cite https://doi.org/10.1098/rspb.2019.2645 which includes a thermal sensitivity index and would add an odonate citation behind the statement.

      Thank you for this excellent suggestion, we have added this citation (Rocha-Ortega et al. 2020; Line 480).

      Comment:

      (14) L352-353: again see also https://doi.org/10.1098/rspb.2019.2645.

      Thank you for highlighting this reference, we have added it to Line 505 as suggested.

      Comment:

      (15) L355: revise "populations that coexist" to "species that co-occur" (big difference between population and species levels and between coexistence and co-occurrence).

      Thank you very much for pointing this out, we have made the suggested change (Line 507).

      Comment:

      (16) L359-365: are the winners and losers depicted in Figures S5 & S6? If so reference the figure (which I suggest combining and promoting to the main text), if not create a table listing the analyzed species and their winner/loser status.

      We agree that this is an excellent place to bring up Figures S5 and S6 from the supplemental. We have moved them to the main document as one figure and referenced it at line 510.

      Reviewer #2 (Recommendations for the authors):

      Comment:

      (1) Line 53-55: The claim that "These relationships generalize poorly taxonomically and geographically" is valid, but the study only tests Odonata on two continents.

      Thank you for this comment – the word ‘generalize’ may imply that our study tries to find a general pattern across many groups. We have changed the language to: “However, these relationships are inconsistent across taxa and regions, and cross-continental tests have not been attempted (Angert et al., 2011; Buckley and Kingsolver, 2012; Estrada et al., 2016; MacLean and Beissinger, 2017).” (Lines 57-59).

      Comment:

      (2) Line 58-59: Is this statement only true for Odonata? It does not seem to hold for plants, for example.

      Thank you for this comment – this statement references a meta-analysis of multiple animal and plant taxa, but the evidence for the importance of range location comes from animal taxa. We have specified that we are referring to animal species to clarify (Line 60).

      Comment:

      (3) Line 87-91: This section is difficult to understand and needs clarification.

      We have clarified this section as follows: “While warm-adapted species with more equatorial distributions could expand their ranges poleward following warming (Devictor et al., 2008), they could also increase in abundance in this new range area relative to species that historically occupied those areas and are less heat-tolerant (Powney et al., 2015).” (Lines 95-121).

      Comment:

      (4) Line 99-100: Please define "generalist" and "specialist" more clearly here (e.g., based on climate niche?).

      Thank you for pointing this out, we intended to suggest that species with more generalist habitat requirements might be better able to shift, but ultimately found that traits did not predict species’ shifts. We corrected our prediction regarding habitat generalists as follows: “We predicted that species able to use both lentic and lotic habitats would shift their phenologies and geographies more than those able to use just one habitat type, as generalists outperform specialists as climate and land uses change (Ball-Damerow et al., 2015, 2014; Hassall and Thompson, 2008; Powney et al., 2015; Rapacciuolo et al., 2017).” (Lines 128-132).

      Comment:

      (5) Line 122: Replace the English letter "X" in "100x100 km" with the correct mathematical symbol.

      We have made the suggested replacement throughout the manuscript.

      Comment:

      (6) Line 148: To address sampling effects, you could check the paper: https://onlinelibrary.wiley.com/doi/full/10.1111/gcb.15524. Additionally, maximum and minimum values are sensitive to extreme data points, so using 95% percentiles might be more robust.

      Thank you for sharing this paper, as it offers a valuable perspective on the study of species’ ranges. While our dataset is substantially composed of observations from adult sampling protocols, unlike the suggested paper which compares adults and juveniles, this is an interesting alternative approach.

      For our purposes it is meaningful to include outliers, as otherwise we may have missed individuals at the leading edge of range expansions. Our intent here was to detect range limits, as opposed to finding the central tendency of species distributions. This approach is widely accepted in the macroecology literature (e.g. Devictor et al., 2012, 2008; Kerr et al. 2015).

      We have included the following discussion of our approach in the methods section:

      “We followed widely accepted methods to determine species range boundaries (Devictor et al., 2012, 2008; Kerr et al., 2015), although other methods exist that are appropriate for different data types and research questions i.e. (Ni and Vellend, 2021). We assigned species presences to 100×100 km quadrats, a scale that is large enough to maintain adequate sampling intensity but still relevant to conservation and policy (Soroye et al., 2020), to identify the best sampled species.” (Lines 168-173).

      Kerr JT, Pindar A, Galpern P, Packer L, Potts SG, Roberts SM, Rasmont P, Schweiger O, Colla SR, Richardson LL, Wagner DL, Gall LF, Sikes DS, Pantoja A. 2015. Climate change impacts on bumblebees converge across continents. Science 349:177–180. doi:10.1126/science.aaa7031

      Soroye P, Newbold T, Kerr J. 2020. Climate change contributes to widespread declines among bumble bees across continents. Science 367:685–688. doi:10.1126/science.aax8591

      Devictor V, Julliard R, Couvet D, Jiguet F. 2008. Birds are tracking climate warming, but not fast enough. Proceedings of the Royal Society B: Biological Sciences 275:2743–2748. doi:10.1098/rspb.2008.0878

      Devictor V, van Swaay C, Brereton T, Brotons L, Chamberlain D, Heliölä J, Herrando S, Julliard R, Kuussaari M, Lindström Å, Reif J, Roy DB, Schweiger O, Settele J, Stefanescu C, Van Strien A, Van Turnhout C, Vermouzek Z, WallisDeVries M, Wynhoff I, Jiguet F. 2012. Differences in the climatic debts of birds and butterflies at a continental scale. Nature Clim Change 2:121–124. doi:10.1038/nclimate1347
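      The trade-off raised in comment (6), between estimating a range limit from the most extreme observation and from a 95th percentile, can be illustrated with a short sketch. It assumes a plain list of observation latitudes and a nearest-rank percentile; both function names are hypothetical, and neither is claimed to be the estimator used in the manuscript.

```python
import math

def northern_limit_max(latitudes):
    """Range limit taken as the single most northern observation."""
    return max(latitudes)

def northern_limit_percentile(latitudes, q=0.95):
    """Range limit taken as the nearest-rank q-th percentile of latitude."""
    ordered = sorted(latitudes)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]
```

      The maximum responds to a single record at the leading edge, which suits detection of range limits but is sensitive to errors. The percentile is more robust but can miss the first colonists of newly occupied areas.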

      Comment:

      (7) Line 195: The species' climate niche should also be considered a product of evolution.

      Thank you for this suggestion. To address this comment and a comment from another reviewer, we changed the text to the following: “Geographic range and associated climatic characteristics are often considered ecological traits, as they are consequences of functional traits and their interactions with geographic features (Bried and Rocha-Ortega, 2023; Chichorro et al., 2019).” (Lines 256-259).

      Comment:

      (8) Line 244: This speculative statement belongs in the Discussion section.

      Thank you for this suggestion, we have moved this statement to the discussion (Lines 451-453).

      Comment:

      (9) Line 252-254: The projection of Coenagrion mercuriale's range contraction is not part of your results and should be clarified or removed.

      Following this suggestion and a similar suggestion from another reviewer, we moved this text to the discussion (Line 517-527).

      Comment:

      (10) Line 314-316: If the species can tolerate warmer temperatures better, why would they migrate?

      We apologize for the confusion, and we have reworded the section as follows: “Emerging mean conditions in areas adjacent to the ranges of southern species may offer opportunities for range expansions of these relative climate specialists, which can then tolerate climate warming in areas of range expansion better than more cool-adapted historical occupants (Day et al., 2018).” (Lines 445-448).

      Comment:

      (11) Line 334-335: Species' tolerance to temperature likely depends on their traits, which were not tested in this study. This should be noted.

      We agree, and we have removed the wording “rather than traits” from this sentence (Line 479).

      Reviewer #3 (Recommendations for the authors):

      Comment:

      (1) Title: The title is too general not specifying that your results are on odonates only, but also stressing the implicit role of climate change to a degree the tests do not support.

      Following this comment and a suggestion from another reviewer we changed the title to the following: “Range geography and temperature variability explain cross-continental convergence in range and phenology shifts in a model insect taxon”. We wanted to emphasize our use of Odonates as a model species that we used to ask broad questions, while being more specific about the climatic variable that we examined (temperature variability).

      Comment:

      (2) L32: consider including Novella-Fernandez et al. 2023 (NatCommun) which addresses this topic in Odonates.

      Thank you for suggesting this very interesting paper, we have added it as a citation (Line 31-32).

      Comment:

      (3) L35: consider including Grewe et al. 2013 (GEB) and Engelhardt et al. 2022(GCB).

      Thank you for these excellent suggestions, we have added the citations (Line 35).

      Comment:

      (4) L47: rather write 'result from' instead of 'driven by'.

      We agree this is a better characterization and have corrected the wording (Line 48-49).

      Comment:

      (5) L49-52: There has been a recent study on this topic for birds (Neate-Clegg et al., 2024 NEE). However, specifying this to insects would not make it less relevant. This review for odonates might be helpful in this regard (Pinkert et al. 2022, Chapter: "Odonata as focal taxa for biological responses to climate change", in Córdoba-Aguilar et al. (2022), Dragonflies & Damselflies: Model Organisms for Ecological and Evolutionary Research).

      Thank you for again suggesting excellent references, we have added them to line 52-53, as well as adding the Pinkert citation to lines 61 and 82.

      Comment:

      (6) L53-66: Combine into one paragraph about drivers. With traits first and the environment second. The natural land cover perspective may be too complicated in this context. Consider focusing on generalities of the impact of changes within species' ranges.

      As suggested we have combined these into one paragraph about drivers (Line 59).

      Comment:

      (7) L67-69: The book from before would be a much stronger reference for this claim. Kalkmann et al (2018) do not address the emphasis of global change research in insects on bees and butterflies. Also, I would highlight that most of the current work is at a national scale, rather than cross-continental.

      Thank you for this suggestion, we have added the suggested reference and included that “…recently assembled databases of odonate observations provide a rare opportunity to investigate species’ spatiotemporal responses at larger taxonomic and spatial scales, particularly as most work has been done at national scales.” (Lines 75-77).

      Comment:

      (8) L68: consider rephrasing this part to '..provide a rare opportunity to investigate spatiotemporal biotic responses at larger taxonomic and spatial scales'

      We appreciate this suggestion and really like the wording. We have changed the phrase to read as follows: “While global change research on insects often emphasizes butterfly and bee taxa, recently assembled databases of odonate observations provide a rare opportunity to investigate species’ spatiotemporal responses at larger taxonomic and spatial scales, particularly as most work has been done at national scales.” (Lines 74-77).

      Comment:

      (9) L69: This characteristic is not unique to odonates and would hamper drawing general conclusions. Honestly, I think the detailed and comprehensive data on them is the selling point.

      Thank you for this suggestion, we have edited the sentence to emphasize their use as an indicator species: “Due to their use of aquatic and terrestrial habitat across life different stages, dragonflies and damselflies are also considered indicator species for both terrestrial and aquatic insect responses to changing climates (Hassall, 2015; Pinkert et al., 2022; Šigutová et al., 2025), giving the study of these species broad relevance for conservation.” (Lines 78-81)

      Comment:

      (10) L73: Indicator for what? The first part of the sentence would suggest lesser surrogacy for responses of other taxa. Reconsider this statement. They are well- established indicators for habitat intactness and freshwater biodiversity. Darwell et al. suggested their diversity can serve as a surrogate for the diversity of both terrestrial and aquatic taxa.

      Thank you for this suggestion, we have edited the sentence to emphasize their use as an indicator species: “Due to their use of aquatic and terrestrial habitat across life different stages, dragonflies and damselflies are also considered indicator species for both terrestrial and aquatic insect responses to changing climates (Hassall, 2015; Pinkert et al., 2022; Šigutová et al., 2025), giving the study of these species broad relevance for conservation.” (Lines 78-81)

      Comment:

      (11) L76: Fritz et al., is a study on mammals, not odonates.

      Thank you for pointing out this error, the reference has been removed (Line 84-85).

      Comment:

      (12) L84: Lotic habitats are generally better connected than lentic ones. Lentic species are considered to have a greater propensity for dispersal DUE to the lower inherent spatiotemporal stability (implying lower connectivity) compared to lotic habitats.

      Thank you for your comment, we have rewritten this section as follows: “For example, differences in habitat connectivity and dispersal ability may constrain range shifts for lentic species (those species that breed in slow moving water like lakes or ponds) and lotic species (those living in fast moving-water) in different ways (Kalkman et al., 2018). More southerly lentic species may expand their range boundaries more than lotic species, as species accustomed to ephemeral lentic habitats better dispersers (Grewe et al., 2013), yet lotic species have also been found to expand their ranges more often than lentic species, potentially due to the loss of lentic habitat in some areas (Bowler et al., 2021).” (Lines 88-95).

      Comment:

      (13) L90: I would be cautious with this interpretation. If only part of the range is considered (here a country in the northern Hemisphere) southern species are moving more of their range into and northern species more of their range out of the study area in response to warming (implying northward shifts).

      We have clarified this section as follows: “While warm-adapted species with more equatorial distributions could expand their ranges poleward following warming (Devictor et al., 2008), they could also increase in abundance in this new range area relative to species that historically occupied those areas and are less heat-tolerant (Powney et al., 2015).” (Lines 95-121)

      Comment:

      (14) L117: Odonata Central contains many county centroids as occurrence records. These could be an issue for your use case. I may have overlooked the steps you took to address this, but I think this requires at least more detail and possibly further removal/checks using for instance CoordinateCleaner. The functions implemented in this package allow you to filter records based on political units to avoid exactly this source of error.

      Thank you for this suggestion, we weren’t aware of this issue with Odonata Central. We used the CoordinateCleaner package in R to filter all odonate records that we used in our analyses. Less than 1% of observations in our dataset were identified as having potential problems by the tool, so we would not expect this to affect our inferences. However, in future we will employ this tool when using similar datasets.

      Comment:

      (15) L119: Please add a brief explanation of why this was necessary. I am ok with something along the lines in the supplement.

      We moved this information from the supplemental to the main text as follows: “If a species was found on both continents, we only retained observations from the continent that was the most densely sampled. If we merged data for one species found on both continents, we could not perform a cross-continental comparison. However, if the same species on different continents was treated as different species, this would lead to uninterpretable outcomes (and the creation of pseudo-replication) in the context of phylogenetic analyses. In addition, species found on both continents did not have sufficient data to meet criteria for the phenology analysis.” (Lines 161-167).

      Comment:

      (16) L132: The letters 'X' or 'x' are not multiplication symbols! Please change to the math symbol (×) everywhere.

      Thank you for pointing out this error, we have made the correction throughout the manuscript.

      Comment:

      (17) L133: add 'main' before 'flight period'

      Thank you for this suggestion, we have made the change. (Line 190)

      Comment:

      (18) L135: I suggest using the coefficient of variation, as it is controlled for the mean. Otherwise, what you see is partly the signature of temperature and not of its variation. For me, it's very difficult to understand what this variation of the variation means and at least needs more explanation.

      Thank you very much for this suggestion, we agree that using the coefficient of variation is a better fit for the question that we’re asking. We re-ran our analyses with the coefficient of variation as the measure of climate variability: all the results reported in the manuscript are now updated for that analysis (Line 377, Table 2), and we have also updated the methods section (Line 191). The results are qualitatively the same as in our previous analysis, but we agree that they are now easier to interpret.
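
      To illustrate why the coefficient of variation separates variability from the mean, here is a minimal Python sketch. It is not the analysis code from the manuscript: the function name, the two example sites, and the choice of kelvin as a ratio-scale unit are assumptions made only for illustration.

```python
import numpy as np


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (dimensionless)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return values.std(ddof=1) / mean


# Hypothetical temperatures in kelvin (assumed unit) for two sites that
# have the same standard deviation but different mean temperatures.
cool_site = np.array([280.0, 282.0, 284.0])
warm_site = np.array([290.0, 292.0, 294.0])

cv_cool = coefficient_of_variation(cool_site)
cv_warm = coefficient_of_variation(warm_site)
```

      Both hypothetical sites have the same standard deviation, yet the warmer site has the smaller coefficient of variation, which is the sense in which the measure is scaled by the mean.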

      Comment:

      (19) L155: Please adequately reference all R packages (state the name, and a reference for them including the authors' names, title, and version).

      Thank you for pointing out this omission, we have added reference information for the glm function in base R (Line 298) and ensured all other packages are properly referenced.

      Comment:

      (20) L207: Mention the literature sources here (again).

      We agree that they should be referenced here again, and we have done so (Lines 267-268).

      Comment:

      (21) L209: You could use the number of grid cells as a proxy for range size.

      Following this excellent suggestion, we re-analysed our data using range size, calculated as the number of quadrats occupied by a species in the historical time period, as a predictor. Range size was not significant in our models, but we believe this is the best way to analyze our data, and so have updated our methods (Lines 261-263) and results (375-378).
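
      As a rough illustration of range size measured as the number of occupied quadrats, the following Python sketch counts distinct grid cells per species. It assumes records have already been restricted to the historical period and projected to kilometre coordinates; the record layout, function name, and default cell size are hypothetical and not taken from the manuscript.

```python
from collections import defaultdict


def range_size_by_species(records, cell_km=100.0):
    """Count distinct occupied grid cells (quadrats) for each species.

    records: iterable of (species, x_km, y_km) tuples in a projected
    coordinate system measured in kilometres (an assumption here).
    """
    cells = defaultdict(set)
    for species, x_km, y_km in records:
        # Floor division assigns each record to the quadrat containing it.
        cell = (int(x_km // cell_km), int(y_km // cell_km))
        cells[species].add(cell)
    return {species: len(occupied) for species, occupied in cells.items()}


# Hypothetical historical-period records.
example_records = [
    ("species_a", 10.0, 10.0),
    ("species_a", 50.0, 90.0),   # same quadrat as the first record
    ("species_a", 150.0, 10.0),  # a second quadrat
    ("species_b", 250.0, 250.0),
]
range_sizes = range_size_by_species(example_records)
```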

      Comment:

      (22) L218: It would be preferable to say 'species-level' instead of 'by-species'.

      Thank you for this suggestion, we agree that this is clearer and made the change (Line 298).

      Comment:

      (23) L219-220: this is unclear. Please rephrase.

      We have clarified as follows: “We used both species-level frequentist (GLM; glm function in R) and Bayesian (Markov Chain Monte Carlo generalized linear mixed model, MCMCglmm; Hadfield, 2010) models to improve the robustness of the results.” (Lines 298-300).

      Comment:

      (24) L224: At least for Europe there is a molecular phylogeny available, which you should preferably use (Pinkert et al. 2018, Ecography). Otherwise, I am ok with using what is available

      We apologize that the nature of the phylogeny that we used was not clear; the phylogeny that we used was built similarly to that in Pinkert et al. 2018, Ecography. It created a molecular phylogeny with a morphological/taxonomic tree as the backbone tree, so that species could only move within their named genera or families. We clarified this in the manuscript as follows:

      “We used the molecular phylogenetic tree published by the Odonate Phenotypic Database (Waller et al., 2019), which used a morphological and taxonomic phylogeny as the backbone tree, allowing species to move within their named genera or families according to molecular evidence (Waller and Svensson, 2017).” (Lines 302-305).

      Comment:

      (25) L233: You said so earlier (1st sentence of this paragraph).

      Thank you for pointing this out, we removed the repetitive sentence (Line 323).

      Comment:

      (26) L236-238: To me, it makes more sense to test this prior to fitting the phylogenetic models.

      MCMC-GLMM is considerably less familiar to most researchers than general linear models or their derivatives/descendants, such as PGLS. We report models both with and without phylogenetic relationships included for the sake of transparency, and we are happy to acknowledge that no interpretation here changes substantially relative to these decisions. However, failing to report models that included possible (if small) effects of phylogenetic relatedness might cause some readers to question what those models might have implied. For the moment, we are opting for the most transparent reporting approach here.

      Comment:

      (27) L241: Rather say directly XX of XX species in our data....

      (28) L245: Same here. Provide the actual numbers, please.

      Thank you for this suggestion, we made this change on Line 332 and Line 334.

      Comment:

      (29) L247-249: Then not necessary.

      This issue highlights a challenge in the global biology literature and around the issue of biodiversity monitoring for understanding global change impacts on species. Almost no studies have been able to report simultaneous range and phenology shifts, and the literature addresses these biotic responses to global change predominantly as distinct phenomena. Differences in numbers of species for which these observations exist, even among the extremely widely-observed odonates, seems to us to be a meaningful issue to report on. If the reviewer prefers that we abbreviate or remove this sentence, we are happy to do so.

      Comment:

      (30) L251:261: That is discussion as you interpret your results.

      Following your suggestion and the suggestion of another reviewer, we moved the following lines to the discussion section: “Species that did not shift their ranges northwards or advance their phenology included Coenagrion mercuriale, a European species that is listed as near threatened by the IUCN Red List (IUCN, 2021), and is projected to lose 68% of its range by 2035 (Jaeschke et al., 2013).” (Lines 517-527).

      Comment:

      (31) L252: Good to mention, but why is the discussion limited to C. mercuriale?

      We feel that it is important to link the broad-scale results to the specific biological characteristics of individual species, and C. mercuriale is an IUCN threatened species. We are happy to expand links to natural history of this group and have added the following: “This group also includes Coenagrion resolutum, a common North American damselfly (Swaegers et al., 2014), for which we could not find evidence of decline. This may be due in part to the greater area of intact habitat available in North America compared to Europe, enabling C. resolutum to maintain larger populations that are less vulnerable to stochastic climate events. Still, this and other species failing to shift in range or phenology should be assessed for population health, as this species could be carrying an unobserved extinction debt.” (Lines 527-533).

      Comment:

      (32) L264: Insert 'being' before 'consistently'.

      Thank you for the suggestion, we made this change (Line 373).

      Comment:

      (33) L271: .'. However,'.

      Thank you for pointing out this grammatical error, we have corrected it (Line 382).

      Comment:

      (34) L273: 'affected' instead of 'predicted'

      Thank you for the suggestion, we made this change (Line 383).

      Comment:

      (35) L279: 'despite pronounced recent warming' sounds not relevant in this context.

      Thank you for this suggestion, we removed this portion of the sentence (Line 408).

      Comment:

      (36) L281: Rather 'the model performance did not improve....'

      Thank you for the suggestion, we made this change (Line 409).

      Comment:

      (37) L288: Add 'but' before 'not'.

      Thank you for the suggestion, we made this change (Line 416).

      Comment:

      (38) L311-316: Reconsider the causality here. maybe rather rephrase to are associated instead. Greater dispersal ability and developmental plasticity might well lead to higher growth rates, rather than the other way around.

      We agree that plasticity/evolution at range edges is important to consider and have included it as an alternative explanation: “Adaptive evolution and plasticity may enable higher population growth rates in newly-colonized areas (Angert et al., 2020; Usui et al., 2023), but this possibility can only be directly tested with long term population trend data.” (Line 449-451).  

      Comment:

      (39) L313-316: Maybe delete the second 'should be able to'.

      This phrase has been changed in response to other reviewer comments and now reads as follows:

      “Emerging mean conditions in areas adjacent to the ranges of southern species may offer opportunities for range expansions of these relative climate specialists, which can then tolerate climate warming in areas of range expansion better than more cool-adapted historical occupants (Day et al., 2018).” (Lines 445-448).

      Comment:

      (40) L331: Limit this statement ending with 'in North American and European Odonata'.

      Thank you for this suggestion, we made this addition (Lines 475-476).

      Comment:

      (41) L346-347: There are too many of these more-research-is-needed statements in the discussion (at least three in the last paragraphs). Please consider finishing the paragraphs rather with a significance statement.

      Thank you for this suggestion, we have changed the final sentence here to the following: “The extent to which species’ traits actually determine rates of range and phenological shifts, rather than occasionally correlated with them, is worth considering further, but functional traits do not systematically drive patterns in these shifts among Odonates in North America and Europe.” (Lines 480-483).

      We also made additional changes, removing a ‘more-research is needed’ statement from the following paragraph (Line 443), as well as from line 499.

      Comment:

      (42) L349: See also Franke et al. (2022, Ecology and Evolution).

      Thank you for highlighting this excellent reference! We have added it to Line 501.

      Comment:

      (43) L363: Maybe a bit late in the text, but it is important to note that there is the third dimension 'abundance trends' or rather a common factor related to range and phenology shifts. I feel this fits better with the discussion of population growth.

      Thank you for this suggestion, we have addressed the importance of abundance trends in the following sentences: “Further mechanistic understanding of these processes requires abundance data.” (Lines 442-443); “It remains unclear if range and phenology shifts relate to trends in abundance, but our results suggest that there are clear ‘winners’ and ‘losers’ under climate change.” (Lines 509-510).

      Comment:

      (44) L375-377: This last sentence is very similar to L371-373. Please reduce the redundancy. Focus more on specifically stating the process instead of vaguely saying 'new insights into patterns' and 'suggesting processes'. Rather, deliver a strong concluding message here.

      Thank you for this suggestion, we feel that we now have a much stronger concluding message: “By considering both the seasonal and range dynamics of species, emergent and convergent climate change responses across continents become clear for this well-studied group of predatory insects.” (Lines 545-547).

      Comment:

      (45) Table 1: To me, the few estimates presented here do not justify a table. rather include them in the text. OR combine them with Table 2. Also, why not include the traits as predictors (from the range shift models) in these models as well?

      We have clarified in the text that the results displayed in Table 1 are from the analysis of the relationship between range and phenology shifts: “The effect of species’ range shifts on phenology range shifts was significant in our model investigating the relationship between these responses, indicating that species shifting their northern range limits to higher latitudes also showed stronger advances in their emergence phenology (Figure 3).” (Lines 341-344).

      As there were no significant effects in the model of phenology change drivers, we have not shown results of this model: “Emergence phenology shifts were not affected by species’ traits, range geography, nor climate variability; due to this, model results are not displayed here.” (Lines 383-384).

      Comment:

      (46) Table 2: L712-713: What does this mean? Are phenology shifts not used as a predictor of range shifts? (why then this comment?). Or do you want to say phenological shifts are not related to Southern range etc? Why do you present a phylosig here but not in Table 1? Why not include the traits as predictors (from the range shift models) in these models as well? Consider using the range size as a continuous predictor instead of 'Widespread'.

      We are glad the reviewer pointed this out to us. We did not emphasize this issue sufficiently. We DID evaluate traits as predictors both of geographical range and phenological shifts, and species-specific biological traits did not significantly affect models predicting either of those sets of responses. We state this on Lines 312-323, but we have also noted in the discussion (Lines 473-476) that the most commonly assessed traits, like body size, do not alter observed trends here. Instead, where species are found, rather than the characteristics of species, is the key determinant of their overall responses.

      Following this excellent suggestion, we re-analysed our data using range size, calculated as the number of quadrats occupied by a species in the historical time period, as a predictor. Range size was not significant in our models, but we believe this is the best way to analyze our data, and so have updated our methods (Lines 261-263) and results (375-378).

      Comment:

      (47) Figure 1: I don't see any grey points in the figure. Also, there is no A or B. If you are referring to the symbols then write cross and triangle instead and not use capital letters which usually refer to component plots of composite figures. Also, I highly recommend providing a similar figure based on your data (maybe each species as a dot for T1 and another symbol for T2). Given the small number of species, you could try to connect these points with arrows. For the set with only range shifts maybe play the T2-dots at the center of the 'Emergence' axis.

      Thank you for pointing out this error: a previous version of Figure 1 included grey points and multiple panels. We have removed this text from the figure caption to be consistent with the final version of the figure (Line 989).

      The graphical depictions of the conceptual and empirical discoveries in this paper were challenging to create. The reviewer might be suggesting effectively decomposing Figure 3 (change in range on the y axis vs change in phenology among all species) into two sets of points on the same graph, where each pair of points is a before and after value for each species. This would make for a very busy figure indeed. We have modified the conceptual Figure 1 to illustrate more clearly, we believe, that species can (in principle) remain within tolerable niche spaces by shifting their activity periods in time (phenology) or in space (geographical range) or both.

      Comment:

      (48) Figure 2: Please add a legend. Also black is a poor background color. The maps appear to be stretched. Please check aspect ratios. Now here are capital letters without an explanation in the caption. From the context I assume the upper panel maps are for the data used to calculate range shifts at the bottom panel maps are for data used to calculate the phenological shifts.

      We apologise for the error in the figure caption and have clarified the differences between panels in the text, as well as changing the map background colour and fixing the aspect ratio:

      “Figure 2: Richness of 76 odonate species sampled in North America and Europe in the historic period (1980-2002; panes A and C) and the recent period (2008-2018; panes B and D). Species richness per 100 × 100 km quadrat is shown in panes A and B, while panes C and D show species richness per 200 × 200 km quadrat. Dark red indicates high species richness, while light pink indicates low species richness.” (Lines 1002-1006).

      Comment:

      (49) Figure 3: Why this citation? Of terrestrial taxa? Please explain. Consider adding some stats here, such as the r-squared value for each of the relationships.

      We have better explained the citation in the figure caption, as well as adding r-squared values:

      “Figure 3: Relationship between range shifts and emergence phenology shifts among North American and European odonate species (N = 66; model R2 = 17.08 for glm, 14.9% for MCMCglmm). For reference, the shaded area shows mean latitudinal range shifts of terrestrial taxa as reported by Lenoir et al. (2020; calculated as the yearly mean dispersal rate of 1.11 +/- 0.96 km per year over 38 years).” (Lines 679-682)

      Comment:

      (50) L801: What are these underscored references?

      This was an issue with the reference software and has been resolved.

      Comment:

      (51) Table S1: L848: Consider starting with 'Samples of 76 North American and European odonate species from between ...'. Please use a horizontal line to separate the content from the table header. Add a horizontal line below the last row. Same for all tables.

      Thank you for this suggestion, we have edited the caption for Figure S1 as suggested (Line 1124). We have also made the suggested line additions to Table S1, S2, and S3.

      Comment:

      (52) Table S3: This is confusing. In Table 1 (main text) both 'southern range' and 'widespread' are used as predictors. Please explain.

      We originally included information on species range geography, including southern versus northern range, and widespread versus not, into one categorical variable. Following additional comments we re-analysed our data using range size, calculated as the number of quadrats occupied by a species in the historical time period, as a predictor. Now the methods section text (Lines 261-263) and Table 1 report results of that variable with distribution options northern, southern, or both. 

      Comment:

      (53) Figure S5 and S6: It would be more coherent if the colors refer to the continents and the suborders are indicated by shading. I would love to see a combination of the two figures with species ordered by the phylogenetic relationship and a dot matrix indicating the traits in the main text! This could really be a good starting point for a synthesis figure.

      The reviewer presents an interesting challenge for us. We have a choice, as we understand things, to present a figure showing phylogeny and traits (as requested here), or an ordered list of species relative to effect sizes in the two main responses to global change. The latter choice centers on the discoveries of the paper, while the former would be valuable for dragonfly biology but would depict information that proved to be biologically uninformative relative to our discovery. That is to say, there is no phylogenetic trend and biological traits among species did not affect results. We have gone some way toward illustrating that issue by retaining phylogeny in the MCMC-GLMM models, but we feel that a figure illustrating phylogeny and traits would (for most readers, at least) illustrate noise, rather than signal. For this reason, we have opted to take on the previous reviewer’s suggestion for a modified, main-text Figure 4, which we include below.

      Figure 4: Distribution of Northern range limit shifts (Panel A, kilometers) and emergence phenology shift (Panel B, Julian day) of 76 European and North American odonate species between a recent time period (2008 - 2018) and a historical time period (1980 - 2002). Anisoptera (dragonflies) are shown in pink, Zygoptera (damselflies) are shown in blue.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      (1) The bad equilibria of the model still remain a concern, as well as other features like the transient overshoots that do not match with the data. I think they could achieve more accuracy here by assigning more weight to such specific features, through adding these as separate objectives for the generator explicitly. The traces contain a five-second current step, and one second before and one second after the training step. This means that in the RMSE, the current step amplitude will dominate as a feature, as this is simply the state for which the data trace contains most time-points. Note that this is further exacerbated by using the IV curve as an auxiliary objective. I believe a better exploration of specific response features, incorporated as independently weighted loss terms for the generator, could improve the fit. E.g. an auxiliary term could be the equilibrium before and after the current step, another term could penalise response traces that do not converge back to their initial equilibrium, etc.

      We thank the reviewer for the suggestion. We supplemented the membrane potential regression loss with errors computed for 3 intervals: the pre-, mid-, and post-stimulation time intervals, improving the accuracy of EP-GAN for baseline membrane potential responses (Figure 2, 3, Table S2, S3). We also changed the simulation protocols for generated parameters by allowing a longer simulation time of 15 seconds, where the stimulation is applied during t = [5, 10] seconds and no stimulation is applied at t = [0, 5) (pre-stimulation) and t = (10, 15] (post-stimulation). These time intervals are chosen to ensure sufficient stabilization periods before and after stimulation.
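
      The idea of scoring the pre-, mid-, and post-stimulation intervals separately can be sketched in Python as follows. This is a simplified illustration, not the EP-GAN loss itself: the function names, the equal default weights, and the use of RMSE for each interval are assumptions; only the interval boundaries (stimulation during [5, 10] s of a 15 s trace) come from the response above.

```python
import numpy as np


def interval_rmse(pred, target, t, stim_start=5.0, stim_end=10.0):
    """RMSE computed separately on pre-, mid-, and post-stimulation samples."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    t = np.asarray(t, dtype=float)
    masks = {
        "pre": t < stim_start,                       # t in [0, 5)
        "mid": (t >= stim_start) & (t <= stim_end),  # t in [5, 10]
        "post": t > stim_end,                        # t in (10, 15]
    }
    return {
        name: float(np.sqrt(np.mean((pred[m] - target[m]) ** 2)))
        for name, m in masks.items()
    }


def combined_loss(pred, target, t, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three interval errors (weights are hypothetical)."""
    errors = interval_rmse(pred, target, t)
    return sum(w * errors[k] for w, k in zip(weights, ("pre", "mid", "post")))


# Hypothetical trace: the prediction fails to return to baseline after
# stimulation, so only the post-stimulation term is non-zero.
t = np.linspace(0.0, 15.0, 16)
target = np.zeros_like(t)
pred = np.where(t > 10.0, 2.0, 0.0)
errors = interval_rmse(pred, target, t)
```

      Because each interval contributes its own term, an error confined to the post-stimulation baseline is not diluted by the longer stimulation window.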

      (2) The explanation of what the authors mean with 'inverse gradient operation' is clear now. However, this term is mathematically imprecise, as the inverse gradient does not exist because the gradient operator is not injective. The method is simply forward integration under the assumption that the derivative of the voltage is known at the grid time-points, and should be described as such.

      We thank the reviewer for the clarification on the inverse gradient operation terminology. In the Methods section, we changed the term describing the inverse gradient operation to ‘forward integration’, which more accurately describes the process.
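
      For readers unfamiliar with the term, forward integration of a voltage trace from its derivative on a uniform time grid can be sketched as below. This uses a plain forward Euler cumulative sum and is only an assumed illustration; the function name and example values are hypothetical, and the integration scheme used in the manuscript may differ.

```python
import numpy as np


def forward_integrate(v0, dv_dt, dt):
    """Rebuild V on a uniform grid from dV/dt sampled at the grid points.

    Forward Euler: V[k+1] = V[k] + dt * dV/dt[k], starting from V[0] = v0.
    """
    dv_dt = np.asarray(dv_dt, dtype=float)
    increments = dt * dv_dt[:-1]  # the last derivative has no following point
    return v0 + np.concatenate(([0.0], np.cumsum(increments)))


# Hypothetical example: constant slope of 2 mV/s from -70 mV, sampled every 0.5 s.
voltage = forward_integrate(v0=-70.0, dv_dt=np.full(5, 2.0), dt=0.5)
```

      The initial value v0 has to be supplied separately, which reflects the reviewer's point that the gradient alone does not determine the trace.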

      (3) I appreciate that the authors' method provides parameters of models at a minimal computational cost compared to running an evolutionary optimization for every new recording. I also believe that with some tweaking of the objective, the method could improve in accuracy. However, I share reviewer 2's concerns that the evolutionary baseline methods are not sufficiently explored, as these methods have been used to successfully fit considerably more complex response patterns. One way out of the dilemma is to show that the EP-GAN estimated parameters provide an initial guess that considerably narrows the search space for the evolutionary algorithm. In this context, the authors should also discuss the recent gradient based methods such as Deistler et al. (https://doi.org/10.1101/2024.08.21.608979) or Jones et al (https://doi.org/10.48550/arXiv.2407.04025).

      We supplemented the optimization setup for existing methods (GDE3, NSDE, DEMO, and NSGA2) by incorporating steady-state response constraints as the initial selection process. The process is similar to that of EP-GAN training data generation and the DEMO parameter selection process [16] (see Results section, page 6 for detail). We also expanded the testing scenarios by evaluating all methods on both small and large HH-model estimation. The small HH-model scenario estimates 47 parameters consisting of channel conductances, reversal potentials and initial conditions, with the channel parameters (n = 129) frozen to default values in [41]. The large HH-model scenario estimates the channel parameters (n = 129) in addition to the 47 parameters by considering ±50% variations from their default values. For both scenarios, we test total sample sizes of both 32k and 64k for all methods to evaluate how they scale with the number of simulated samples given during optimization. The results show that existing methods perform well in the small HH-model scenario and scale with sample size, consistent with the literature. EP-GAN, on the other hand, shows overall better performance in predicting membrane potential responses in both small and large HH-model scenarios.

      Reviewer #2 (Public review):

      Major 1: Models do not faithfully capture empirical responses. While the models generated with EPGAN reproduce the average voltage during current injections reasonably well, the dynamics of the response are generally not well captured. For example, for the neuron labeled RIM (Figure 2), the most depolarized voltage traces show an initial 'overshoot' of depolarization, i.e. they depolarize strongly within the first few hundred milliseconds but then fall back to a less depolarized membrane potential. In contrast, the empirical recording shows no such overshoot. Similarly, for the neuron labeled AFD, all empirically recorded traces slowly ramp up over time. In contrast, the simulated traces are mostly flat. Furthermore, all empirical traces return to the pre-stimulus membrane potential, but many of the simulated voltage traces remain significantly depolarized, far outside of the ranges of empirically observed membrane potentials. The authors trained an additional GAN (EPGAN Extended) to improve the fit to the resting membrane potential. Interestingly, for one neuron (AWB), this improved the response during stimulation, which now reproduced the slowly rising membrane potentials observed empirically; however, the neuron still does not reliably return to its resting membrane potential. For the other two neurons, the authors report a decrease in accuracy in comparison to EP-GAN. While such deviations may appear small in the Root Mean Square Error (RMSE), they likely indicate a large mismatch between the model and the electrophysiological properties of the biological neuron. The authors added a second metric during the revision - percentages of predicted membrane potential trajectories within empirical range. I appreciate this additional analysis. As the empirical ranges across neurons are far larger than the magnitude of dynamical properties of the response ('slow ramps', etc.), this metric doesn't seem to be well suited to quantify to which degree these dynamical properties are captured by the models.

      We made improvements to the training data generation and architecture of EP-GAN to improve the overall accuracy of its predicted membrane potential responses. In particular, we divided training data generation into the three neuron types found among C. elegans non-spiking neurons: 1) transient outward rectifier, 2) outward rectifier and 3) bistable [8, 16]. Each randomly generated training sample is categorized into one of the 3 types by evaluating its steady-state currents with respect to experimental dI/dV bound constraints (see the Generating training data section under Methods for more detail). Minimum-maximum constraints are then imposed on the simulated membrane potential responses. This setup generates training samples whose distribution is closer to that of experimentally recorded neurons, as further described in the Methods section (page 15) of the revised manuscript.

      We also improved the EP-GAN training process by incorporating random masking of input membrane potential responses. The masking forces EP-GAN to make predictions even with missing voltage traces, improving overall accuracy and allowing EP-GAN to use membrane potential inputs recorded with arbitrary clamping protocols (see Methods page 13 for more detail). For the training loss functions, we further supplemented the membrane potential regression loss with errors computed for 2 intervals, the pre- and post-stimulation time intervals, to improve EP-GAN's prediction of baseline membrane potentials.
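
      As a rough illustration of the input masking described above, the sketch below hides whole voltage traces at random before they are passed to a model. It is a minimal sketch under assumptions, not the EP-GAN implementation: the function name `mask_traces`, the per-trace keep probability, the zero fill value and the rule of always keeping at least one trace are all hypothetical choices made for this example.

```python
import random


def mask_traces(traces, keep_prob=0.5, fill=0.0, rng=None):
    """Randomly hide whole voltage traces.

    traces    : list of traces, each a list of membrane potentials (mV)
    keep_prob : assumed probability that a trace stays visible
    fill      : assumed placeholder value written into hidden traces
    Returns (masked_traces, mask) where mask[i] is 1 if trace i was kept.
    """
    rng = rng or random.Random()
    mask = [1 if rng.random() < keep_prob else 0 for _ in traces]
    if not any(mask):
        # Assumption: never hide every trace, so the model always sees some input.
        mask[rng.randrange(len(traces))] = 1
    masked = [list(t) if m else [fill] * len(t) for t, m in zip(traces, mask)]
    return masked, mask


if __name__ == "__main__":
    example = [[-40.0, -35.0], [-40.0, -20.0], [-40.0, -10.0]]
    masked, mask = mask_traces(example, keep_prob=0.5, rng=random.Random(0))
    print(mask, masked)
```

      Returning the mask alongside the masked traces lets a model tell a hidden trace apart from a genuine 0 mV recording; whether EP-GAN passes such a mask to the network is not stated in the response.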

      Taken together, these modifications improved EP-GAN’s overall ability to capture empirical membrane potential responses, and we show the results in Figures 2–5 and Tables S2 and S3.

      Major 2: Comparison with other approaches is potentially misleading. Throughout the manuscript, the authors claim that their approach outperforms the other approaches tested. But compare the responses of the models in the present manuscript (neurons RIM, AFD, AIY) to the ones provided for the same neurons in Naudin et al. 2022 (https://doi.org/10.1371/journal.pone.0268380). Naudin et al. present models that seem to match empirical data far more accurately than any model presented in the current study. Naudin et al. achieved this using DEMO, an algorithm that in the present manuscript is consistently shown to be among the worst of all algorithms tested. I therefore strongly disagree with the authors' claim that a "Comparison of EP-GAN with existing estimation methods shows EP-GAN advantage in the accuracy of estimated parameters". This may be true in the context of the benchmark performed in the study (i.e., a condition of very limited compute resources - 18 generations with a population size of 600, compare that to 2000 generations recommended in Naudin et al.), but while EP-GAN wins under these specific conditions (and yes, here the authors convincingly show that their EP-GAN produces by far the best results!), other approaches seem to win with respect to the quality of the models they can ultimately generate.

      We thank the reviewer for the feedback regarding the comparison with existing methods. We have revised the optimization setup for the existing methods (GDE3, NSDE, DEMO, and NSGA2) by incorporating steady-state response constraints as the initial selection process. The process is similar to the EP-GAN training data generation and the DEMO parameter selection process [16] (see Results section, page 6 for details). Incorporating this process has improved the accuracy of the existing methods, especially in the small HH-model scenario, where DEMO stood out with the best performance alongside NSGA2 (Figure 5, Tables 1 and 2).

      We also expanded the testing scenarios by evaluating all methods on both small and large HH-model estimation. The small HH-model scenario estimates 47 parameters consisting of channel conductances, reversal potentials and initial conditions, with the channel parameters (n = 129) frozen to their default values in [41]. The large HH-model scenario additionally estimates the 129 channel parameters, allowing ±50% variations from their default values. For both scenarios, we tested total sample sizes of 32k and 64k for all methods to evaluate how they scale with the number of simulated samples given during optimization. The results show that existing methods perform well in the small HH-model scenario and that their performance scales with sample size. EP-GAN, on the other hand, shows better overall performance in predicting membrane potential responses in both the small and large HH-model scenarios.
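
      To make the ±50% variation concrete, the sketch below draws one candidate parameter set around a dictionary of default values. It is only an illustration under assumptions: the response does not say which sampling distribution is used, so the uniform draw, the function name `sample_parameters` and the two example parameter names are hypothetical.

```python
import random


def sample_parameters(defaults, rel_range=0.5, rng=None):
    """Draw one parameter set within +/- rel_range of each default value.

    defaults  : dict mapping a parameter name to its default value
    rel_range : relative half-width of the search interval (0.5 = +/-50%)
    A uniform draw is an assumption made for this sketch.
    """
    rng = rng or random.Random()
    return {
        name: value * (1.0 + rng.uniform(-rel_range, rel_range))
        for name, value in defaults.items()
    }


if __name__ == "__main__":
    # Hypothetical parameter names and values, for illustration only.
    defaults = {"g_example": 10.0, "e_example": -80.0}
    print(sample_parameters(defaults, rng=random.Random(1)))
```

      Scaling by a relative factor keeps the sign of each default, so a negative reversal potential stays negative within its ±50% band.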

      In particular, with the extended membrane potential error covering the pre-, mid- and post-activation periods, EP-GAN (trained with 32k samples, large HH-model, 9 neurons) achieved a mean membrane potential response error of 2.82 mV, lower than that of DEMO trained on an identical setup (12.2 mV, 64k samples; Table 2) and of DEMO applied to the simpler HH-model in [16] (7.78 mV, using 36,000k samples, 3 neurons). With respect to the DEMO performance in [16], under an identical simulation protocol (i.e., no stimulation during (0, 5 s) and (10, 15 s), and stimulation during (5, 10 s)), the EP-GAN-predicted RIM (large HH-model) showed membrane potential accuracy on par with that of DEMO (simpler HH-model), and the EP-GAN-predicted AFD showed better accuracy for the post-activation membrane potential response, where the DEMO-predicted membrane potentials overshoot above the baseline (not shown in the paper).
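
      The interval-based error mentioned above can be sketched as follows, using the stated protocol (stimulation from 5 s to 10 s of a 15 s trace). This is a sketch under assumptions: the response does not give the exact error formula, so the mean absolute error, the function name `interval_errors` and the equal weighting of the three intervals are choices made for this example.

```python
def interval_errors(t, v_pred, v_emp, stim_on=5.0, stim_off=10.0):
    """Mean absolute error (mV) in the pre-, mid- and post-stimulation windows.

    t      : sample times in seconds
    v_pred : predicted membrane potentials (mV)
    v_emp  : empirical membrane potentials (mV)
    Mean absolute error is an assumption; the authors' metric may differ.
    """
    sums = {"pre": [0.0, 0], "mid": [0.0, 0], "post": [0.0, 0]}
    for ti, p, e in zip(t, v_pred, v_emp):
        if ti < stim_on:
            key = "pre"
        elif ti < stim_off:
            key = "mid"
        else:
            key = "post"
        sums[key][0] += abs(p - e)
        sums[key][1] += 1
    # Skip windows that contain no samples to avoid dividing by zero.
    return {k: s / n for k, (s, n) in sums.items() if n > 0}


if __name__ == "__main__":
    t = [0.0, 2.5, 5.0, 7.5, 10.0, 12.5]
    v_emp = [-40.0, -40.0, -20.0, -20.0, -40.0, -40.0]
    v_pred = [-38.0, -42.0, -20.0, -26.0, -30.0, -40.0]
    errs = interval_errors(t, v_pred, v_emp)
    print(errs, sum(errs.values()) / len(errs))
```

      Reporting the three windows separately makes a failure to return to baseline visible as a large post-stimulation error, even when the mid-stimulation error is small.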

      Major 3: As long as the quality of the models generated by the EP-GAN cannot be significantly improved, I am doubtful that it indeed can contribute to the 'ElectroPhysiome', as it seems likely that dynamics that are currently poorly captured, like slow ramps, or the ability of the neuron to return to its resting membrane potential, will critically affect network computations. If the authors want to motivate their study based on this very ambitious goal, they should illustrate that single neuron model generation with their approach is robust enough to warrant well-constrained network dynamics. Based on the currently presented results, I find the framing of the manuscript far too bold.

      We thank the reviewer for the feedback regarding the paper's scope. With the revised methods, the overall quality of EP-GAN models has improved, with the most significant improvements in baseline membrane potential accuracy. While high-quality neuron models can be attained with existing methods given a sufficient sample size, our results suggest that EP-GAN can predict models of higher quality with significantly fewer samples and without a need for retraining, thus addressing the main drawback of evolutionary methods. While EP-GAN still has limitations (e.g., difficulty in predicting slow ramps) that need to be addressed in the future, we believe its overall performance, combined with its fast inference speed and flexibility in input data format (e.g., missing membrane potential traces), is a step forward for large-scale neuron modeling tasks that can contribute to network models.

      Major 4: The conclusion of the ablation study 'In addition the architecture of EP-GAN permits inference of parameters even when partial membrane potential and steady-state currents profile are given as inputs' does not seem to be justified given the voltage traces shown in Figure 3. For example, for RIM, the resting membrane potential stays around 0 mV, but all empirical traces are around -40mV. For AFD, all simulated traces have a negative slope during the depolarizing stimuli, but a positive slope in all empirically observed traces. For AIY, the shape of hyperpolarized traces is off. While it may be that by their metric neurons in the 25% category are classified as 'preserving baseline accuracy', this doesn't seem justified given the voltage traces presented in the manuscript. It appears the metric is not strict enough.

      We improved EP-GAN’s training process by incorporating random masking of input membrane potential responses. The masking forces EP-GAN to make predictions even with missing voltage traces, improving overall accuracy and allowing EP-GAN to use membrane potential inputs recorded with arbitrary clamping protocols.

      This input masking during training improved the ablation results: EP-GAN now retains its baseline membrane potential error (3.3 mV, averaged across the pre-, mid- and post-activation periods) when only 50% of the membrane potential inputs remain (3.5 mV) and when only 25% of the steady-state currents remain (3.5 mV).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript "Drosophila Visuomotor Integration: An Integrative Model and Behavioral Evidence of Visual Efference Copy" provides an integrative model of the visuomotor control in Drosophila melanogaster. This model presents an experimentally derived model based on visually evoked wingbeat pattern recordings of three strategically selected visual stimulus types with well-established behavioral response characteristics. By testing variations of these models, the authors demonstrate that the virtual model behavior can recapitulate the recorded wing beat behavioral results and those recorded by others for these specific stimuli when presented individually. Yet, the novelty of this study and their model is that it allows predictions for natural visual scenes in which multiple visual stimuli occur simultaneously and may have opposite or enhancing effects on behavior. Testing three models that would allow interactions of these visual modalities, the authors show that using a visual efference copy signal allows visual streams to interact, replicating behavior recorded when multiple stimuli are presented simultaneously. Importantly, they validated the prediction of this model in real flies using magnetically tethered flies, e.g., presenting moving bars with varying backgrounds. In conclusion, the presented manuscript presents a commendable effort in developing and demonstrating the validity of a mixture model that allows predictions of the behavior of Drosophila in natural visual environments.

      Strengths:

      Overall, the manuscript is well-structured and clear in its presentation, and the modeling and experimental research are methodically conducted and illustrated in visually appealing and easy-to-understand figures and their captions.

      The manuscript employs a thorough, logical approach, combining computational modeling with experimental behavioral validation using magnetically tethered flies. This iterative integration of simulation and empirical behavioral evidence enhances the credibility of the findings.

      The associated code base is well documented and readily produces all figures in the document.

      Suggestions:

      However, while the experiments provide evidence for the use of a visual efference copy, the manuscript would be even more impressive if it presented specific predictions for the neural implementation or even neurophysiological data to support this model. Or, at the very least, a thorough discussion. Nonetheless, these models and validating behavioral experiments make this a valuable contribution to the field; it is well executed and addresses a significant gap in the modeling of fly behavior and holistic understanding of visuomotor behaviors.

      We appreciate the reviewer’s thoughtful comments on the strengths and weaknesses of our manuscript. We agree that a biophysically realistic model reflecting the structure of the neural circuits, together with physiological data from them, would be invaluable. However, we are currently unable to provide physiological evidence or a circuit architecture for efference copy-based suppression of the stability circuit, because the neural pathway underlying this behavior remains unidentified. Extensive recordings from the HS/VS system have revealed cell-type-specific motor-related inputs during both spontaneous and loom-evoked flight turns (Fenk et al., 2021; Kim et al., 2017, 2015). These studies predicted suppression of the optomotor stability response during such turns, and our new experiments confirmed this suppression specifically during loom-evoked turns (Figures 5, 6). However, these neurons are primarily involved in the head optomotor response, not the body optomotor response. We hope to extend our current model in future studies to incorporate more cellular-level detail, as the feedforward circuits underlying stability behavior become more clearly defined.

      Here are a few points that should be addressed:

      (1) The biomechanics block (Figure 2) should be elaborated on, to explain its relevance to behavior and relation to the underlying neural mechanisms.

      We appreciate this suggestion. The mathematical representation of the biomechanics block was developed by other groups in previous studies (Fry et al., 2003; Ristroph et al., 2010). We used exactly the same model, and its parameters were identical to those used in one of those studies (Fry et al., 2003; Ristroph et al., 2010), in which the parameters were estimated from the stabilizing response to magnetic “stumbling” pulses. In the previous version of the manuscript, the biomechanics block was described in the Methods section (see Equation 4). In response to the reviewer’s comment, we have made a few changes to Figure 2A and expanded the associated description in the main text, as follows.

      (Line 160) “To test the orientation behavior of the model, we developed an expanded model, termed “virtual fly model” hereafter. In this model, we added a biomechanics block that transforms the torque response of the fly to the actual heading change according to kinematic parameters estimated previously (Michael H Dickinson, 2005; Ristroph et al., 2010) (Figure 2A, see Equation 4 in Methods and Movie S1). The virtual fly model, featuring position and velocity blocks that are conditioned on the type of the visual pattern, can now change its body orientation, simulating the visual orientation behavior of flies in the free flight condition.”
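
      As a hedged illustration of what a torque-to-heading block can look like, the sketch below integrates a first-order rotational model, inertia × d(angular velocity)/dt = torque − damping × angular velocity, with forward Euler steps. Equation 4 is not reproduced in this response, so the model form, the function name `simulate_heading` and the unit-valued placeholder parameters are assumptions and do not represent the authors' fitted values.

```python
def simulate_heading(torque, dt, inertia=1.0, damping=1.0,
                     heading0=0.0, omega0=0.0):
    """Integrate heading from a torque time series.

    Assumed dynamics: inertia * d(omega)/dt = torque - damping * omega.
    torque  : list of torque samples, one per time step
    dt      : time step in seconds
    inertia, damping : placeholder values, not fitted parameters
    Returns the heading after each step.
    """
    heading, omega = heading0, omega0
    headings = []
    for tq in torque:
        domega = (tq - damping * omega) / inertia
        omega += domega * dt      # update angular velocity
        heading += omega * dt     # update body orientation
        headings.append(heading)
    return headings


if __name__ == "__main__":
    trace = simulate_heading([1.0] * 500, dt=0.01)
    print(trace[-1])
```

      Under this assumed model, a constant torque drives the angular velocity toward torque / damping, so the damping term sets the steady turning rate and the inertia sets how quickly it is reached.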

      (2) It is unclear how the three integrative models with different strategies were chosen or what relevance they have to neural implementation. This should be explained and/or addressed.

      Thank you for this valuable comment. We selected the three models based on previous studies investigating visuomotor integration across multiple species, under conditions where multiple sensory cues are presented simultaneously.

      The addition-only model represents the simplest hypothesis, analogous to the “additive model” proposed by Tom Collett in his 1980 study (Collett, 1980). We used this model as a baseline to illustrate behavior in the absence of any efference copy mechanism. Notably, some modeling studies have proposed linear (additive) integration for multimodal sensory cues at the behavioral level (Liu et al., 2023; Van der Stoep et al., 2021). However, experimental evidence demonstrating strictly linear integration—either behaviorally or physiologically—remains limited. In our study, new data (Figure 5) show that bar-evoked and background movement-evoked locomotor responses are combined linearly, supporting the addition-only model.

      The graded efference copy model has been most clearly demonstrated in the cerebellum-like circuit of Mormyrid fish during electrosensation (Bell, 1981; Kennedy et al., 2014). In this system, the efference copy signal forms a negative image of the predicted reafferent input and undergoes plastic changes as the environment changes—an idea that inspired our modifiable efference copy model (Figure 4–figure supplement 1). The all-or-none efference copy model is exemplified in the sensory systems of smaller organisms, such as the auditory neurons of crickets during stridulation (Poulet and Hedwig, 2006). Notably, in crickets, the motor-related input is referred to as corollary discharge rather than efference copy. Typically, “efference copy” refers to a graded, subtractive motor-related signal, while “corollary discharge” denotes an all-or-none signal, both counteracting the sensory consequences of self-generated actions. In this manuscript, we use the term efference copy more broadly, encompassing both types of motor-related feedback signals (Sommer and Wurtz, 2008).

      In response to this comment, we have made the following changes in the main text to enhance its accessibility to general readers.

      (Line#268) “This integration problem has been studied across animal sensory systems, typically by analyzing motor-related signals observed in sensory neurons (Bell, 1981; Collett, 1980; Kim et al., 2017; Poulet and Hedwig, 2006). Building on the results of these studies, we developed three integrative models. The first model, termed the “addition-only model”, assumes that the outputs of the object (bar) and the background (grating) response circuits are summed to control the flight orientation (Figure 4B, see Equation 14 in Methods).”

      (Line#272) “In the second and third models, an EC is used to set priorities between different visuomotor circuits (Figure 4C,D). In particular, the EC is derived from the object-induced motor command and sent to the object response system to nullify visual input associated with the object-evoked turn (Bell, 1981; Collett, 1980; Poulet and Hedwig, 2006). These motor-related inputs fully suppress sensory processing in some systems (Poulet and Hedwig, 2006), whereas in others they selectively counteract only the undesirable components of the sensory feedback (Bell, 1981; Kennedy et al., 2014).”
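
      The difference between the three integrative models can be sketched with scalar signals. This is a simplified illustration, not the models of Equation 14 or Figure 4: the function names, the gain, the gating threshold and the way the EC enters the background pathway are assumptions made for this example.

```python
def addition_only(obj_cmd, bg_flow, bg_gain=1.0):
    """Object and background responses are simply summed."""
    return obj_cmd + bg_gain * bg_flow


def all_or_none_ec(obj_cmd, bg_flow, bg_gain=1.0, threshold=0.0):
    """Background response is fully gated off during an object-evoked turn."""
    gate = 0.0 if abs(obj_cmd) > threshold else 1.0
    return obj_cmd + gate * bg_gain * bg_flow


def graded_ec(obj_cmd, bg_flow, predicted_flow, bg_gain=1.0):
    """Only the predicted (self-generated) part of the flow is cancelled."""
    return obj_cmd + bg_gain * (bg_flow - predicted_flow)


if __name__ == "__main__":
    obj_cmd, bg_flow = 2.0, -1.5   # turn command and resulting optic flow
    print(addition_only(obj_cmd, bg_flow))            # flow opposes the turn
    print(all_or_none_ec(obj_cmd, bg_flow))           # flow is ignored
    print(graded_ec(obj_cmd, bg_flow, -1.5))          # exact prediction
    print(graded_ec(obj_cmd, bg_flow, -1.0))          # prediction mismatch
```

      In this toy form, the graded model differs from the all-or-none model only when the predicted flow does not match the actual flow, which is the situation an unexpected background change is designed to create.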

      (3) There should be a discussion of how the visual efference could be represented in the biological model and an evaluation of the plausibility and alternatives.

      Thank you for this helpful comment. We have now added the following discussion to share our perspective on the circuit-level implementation of the visual efference copy in Drosophila.

      (Line#481) “Efference copy in Drosophila vision

      Under natural conditions, various visual features in the environment may concurrently activate multiple motor programs. Because these may interfere with one another, it is crucial for the central brain to coordinate between the motor signals originating from different sensory circuits. Among such coordination mechanisms, the EC mechanisms were hypothesized to counteract so-called reafferent visual input, that is, input caused specifically by self-movement (Collett, 1980; von Holst and Mittelstaedt, 1950). Recent studies reported such EC-like signals in Drosophila visual neurons during spontaneous as well as loom-evoked flight turns (Fenk et al., 2021; Kim et al., 2017, 2015). One type of EC-like signal was identified in a group of wide-field visual motion-sensing neurons that were shown to control neck movement for gaze stability (Kim et al., 2017). The EC-like signals in these cells were bidirectional depending on the direction of flight turns, and their amplitudes were quantitatively tuned to those of the expected visual input across cell types. Although amplitude varies among cell types, it remains inconclusive whether it also varies within a given cell type to match the amplitude of expected visual feedback, thereby implementing the graded EC signal. A more recent study examined EC-like signal amplitude in the same visual neurons for loom-evoked turns, across events (Fenk et al., 2021). Although the result showed a strong correlation between wing response and the EC-like inputs, the authors pointed out that this apparent correlation could stem from noisy measurement of all-or-none motor-related inputs.

      Thus, these studies did not completely disambiguate between graded and all-or-none EC signaling. Another type of EC-like signal, observed in the visual circuit tuned to a moving spot, exhibited characteristics consistent with an all-or-none EC. That is, it entirely suppressed visual signaling, irrespective of the direction of the self-generated turn (Kim et al., 2015; Turner et al., 2022).

      Efference-copy (EC)–like signals have been reported in several Drosophila visual circuits, yet their behavioral role remains unclear. Indirect evidence comes from a behavioral study showing that the dynamics of spontaneously generated flight turns were unaffected by unexpected background motion (Bender and Dickinson, 2006a). Likewise, our behavioral experiments showed that, during loom-evoked turns, responses to background motion are suppressed in an all-or-none manner (Figures 6 and 7). Consistent with this, motor-related inputs recorded in visual neurons exhibit nearly identical dynamics during spontaneous and loom-evoked turns (Fenk et al., 2021). Together, these behavioral and physiological parallels support the idea that a common efference-copy mechanism operates during both spontaneous and loom-evoked flight turns.

      Unlike loom-evoked turns, bar-evoked turn dynamics changed in the presence of moving backgrounds (Figure 5), a result compatible with both the addition-only and graded EC models. However, when the static background was updated just before a bar-evoked turn—thereby altering the amplitude of optic flow—the turn dynamics remained unaffected (Figures 5 and 7), clearly contradicting the addition-only model. Thus, the graded EC model is the only one consistent with both findings. If a graded EC mechanism were truly at work, however, an unexpected background change should have modified turn dynamics because of the mismatch between expected and actual visual feedback (Figure 4–figure supplement 1)—yet we detected no such effect at any time scale examined (Figure 7–figure supplement 1). This mismatch would be ignored only if the amplitude of the graded EC adapted to environmental changes almost instantaneously—a mechanism that seems improbable given the limited computational capacity of the Drosophila brain. In electric fish, for example, comparable adjustments take more than 10 minutes (Bell, 1981; Muller et al., 2019). Further investigation is needed to clarify how reorienting flies ignore optic flow generated by static backgrounds, potentially by engaging EC mechanisms not captured by the models tested in this study.

      Why would Drosophila rely on the all-or-none EC mechanism instead of the graded one for loom-evoked turns? A graded EC must be adjusted adaptively depending on the environment, as the amplitude of visual feedback varies with both the dynamics of self-generated movement and environmental conditions (e.g., empty vs. cluttered visual backgrounds) (Figure 4—figure supplement 1). Recent studies on electric fish have suggested that a large array of neurons in a multi-layer network is crucial for generating a modifiable efference copy signal matched to the current environment (Muller et al., 2019). Given their small-sized brain, flies might opt for a more economical design for suppressing unwanted visual inputs regardless of the visual environment. Circuits mediating such a type of EC were identified in the cricket auditory system during stridulation (Poulet and Hedwig, 2006), for example. Our study strongly suggests the existence of a similar circuit in the Drosophila visual system. 

      We tested the hypothesis that efference-copy (EC) signals guide action selection by suppressing specific visuomotor reflexes when multiple visual features compete. An alternative motif with a similar function is mutual inhibition between motor pathways (Edwards, 1991; Mysore and Kothari, 2020). In Drosophila, descending neurons form dense lateral connections (Braun et al., 2024), offering a substrate for such competitive interactions. Determining whether—and how—EC and mutual inhibition operate will require recordings from the neurons that ensure visual stability, which remain unidentified. Mapping these pathways and assessing how they are modulated by visual and behavioral context are important goals for future work.”

      Reviewer #2 (Public Review):

      It has been widely proposed that the neural circuit uses a copy of motor command, an efference copy, to cancel out self-generated sensory stimuli so that intended movement is not disturbed by the reafferent sensory inputs. However, how quantitatively such an efference copy suppresses sensory inputs is unknown. Here, Canelo et al. tried to demonstrate that an efference copy operates in an all-or-none manner and that its amplitude is independent of the amplitude of the sensory signal to be suppressed. Understanding the nature of such an efference copy is important because animals generally move during sensory processing, and the movement would devastatingly distort that without a proper correction. The manuscript is concise and written very clearly. However, experiments do not directly demonstrate if the animal indeed uses an efference copy in the presented visual paradigms and if such a signal is indeed non-scaled. As it is, it is not clear if the suppression of behavioral response to the visual background is due to the act of an efference copy (a copy of motor command) or due to an alternative, more global inhibitory mechanism, such as feedforward inhibition at the sensory level or attentional modulation. To directly uncover the nature of an efference copy, physiological experiments are necessary. If that is technically challenging, it requires finding a behavioral signature that unambiguously reports a (copy of) motor command and quantifying the nature of that behavior.

      We thank the reviewer for this insightful and constructive comment. We agree that our current behavioral evidence does not directly identify the underlying circuit mechanism, and that direct recordings from visual neurons modulated by an efference copy would be critical for distinguishing between potential mechanisms.

      A prerequisite for such physiological investigations would be the identification of both (1) the feedforward neurons directly involved in the optomotor response, and (2) the neurons conveying motor-related signals to the optomotor circuit. Despite efforts by several research groups, the location of the feedforward circuit mediating the optomotor response remains elusive. This limitation has prevented us from obtaining direct cellular evidence of flight turn-associated suppression of optomotor signaling.

      In light of the reviewer’s suggestion, we expanded our investigation to strengthen the behavioral evidence for efference copy (EC) mechanisms. In addition to our earlier experiments involving unexpected changes in the static background, we examined how object-evoked flight turns influence the optomotor stability reflex and vice versa (Figures 5 and 6). To quantify the interaction between different visuomotor behaviors, we systematically varied the temporal relationship between two types of visual motion—loom versus moving background, or moving bar versus moving background—and measured the resulting behavioral responses.

      Our findings support pattern- and time-specific suppressive mechanisms acting between flight turns associated with the different visual patterns. Specifically:

      The responses to a moving bar and a moving background add linearly, even when presented in close temporal proximity.

      Loom-evoked turns and the optomotor stability reflex mutually suppress each other in a time-specific manner.

      For both loom- and moving bar-evoked flight turns, changes in the static background had no measurable effect on the dynamics of the object-evoked responses.

      These results provide a detailed behavioral characterization of a suppressive interaction between distinct visuomotor responses. This, in turn, offers correlative evidence supporting the involvement of an efference copy-like mechanism acting on the visual system. While similar efference copy mechanisms have been documented in other parts of the visual system, we acknowledge that our findings do not exclude alternative explanations. In particular, it is still possible that lateral inhibition within the central brain or ventral nerve cord contributes to the suppression we observed.

      Ultimately, definitive proof will require identifying the specific neurons that convey efference copy signals and demonstrating that silencing these neurons abolishes the behavioral suppression. Until such experiments are feasible, our behavioral approach provides an important contribution toward understanding the nature of sensorimotor integration in this system.

      Reviewer #3 (Public Review):

      Summary:

      Canelo et al. used a combination of mathematical modeling and behavioral experiments to ask whether flies use an all-or-none EC model or a graded EC model (in which the turn amplitude is modulated by wide-field optic flow). Particularly, the authors focus on the bar-ground discrimination problem, which has received significant attention in flies over the last 50-60 years. First, they use a model by Poggio and Reichardt to model flight response to moving small-field bars and spots and wide-field gratings. They then simulate this model and compare simulation results to flight responses in a yaw-free tether and find generally good agreement. They then ask how flies may do bar-background discrimination (i.e. complex visual environment) and invoke different EC models and an additive model (balancing torque production due to background and bar movement). Using behavioral experiments and simulation supports the notion that flies use an all-or-none EC since flight turns are not influenced by the background optic flow. While the study is interesting, there are major issues with the conceptual framework.

      Strengths:

      They ask a significant question related to efference copies during volitional movement.

      The methods are well detailed and the data (and statistics) are presented clearly.

      The integration of behavioral experiments and mathematical modeling of flight behavior.

      The figures are overall very clear and salient.

      Weaknesses:

      Omission of saccades: While the authors ask a significant question related to the mechanism of bar-ground discrimination, they fail to integrate an essential component of the Drosophila visuomotor responses: saccades. Indeed, the Poggio and Reichardt model, which was developed almost 50 years ago, while appropriate to study body-fixed flight, has a severe limitation: it does not consider saccades. The authors identify this major issue in the Discussion by citing a recent switched, integrate-and-fire model (Mongeau & Frye, 2017). The authors admit that they "approximated" this model as a smooth pursuit movement. However, I disagree that it is an approximation; rather it is an omission of a motor program that is critical for volitional visuomotor behavior. Indeed, saccades are the main strategy by which Drosophila turn in free flight and prior to landing on an object (i.e. akin to a bar), as reported by the Dickinson group (Censi et al., van Breugel & Dickinson [not cited]). Flies appear to solve the bar-ground discrimination problem by switching between smooth movement and saccades (Mongeau & Frye, 2017; Mongeau et al., 2019 [not cited]). Thus, ignoring saccades is a major issue with the current study as it makes their model disconnected from flight behavior, which has been studied in a more natural context since the work of Poggio.

      Thank you for this helpful comment. We agree that including saccadic turns is essential and qualitatively improves the model. In the revised manuscript, we therefore expanded our bar-tracking model to incorporate an integrate-and-saccade strategy, now presented in Figure 2—figure supplement

      The manuscript now introduces this result as follows:

      (Line#190) “Finally, one important locomotion dynamics that a flying Drosophila exhibits while tracking an object is a rapid orientation change, called a “saccade” (Breugel and Dickinson, 2012; Censi et al., 2013; Heisenberg and Wolf, 1979). For example, while tracking a slowly moving bar, flies perform relatively straight flights interspersed with saccadic flight turns (Collett and Land, 1975; Mongeau and Frye, 2017). During this behavior, it has been proposed that visual circuits compute an integrated error of the bar position with respect to the frontal midline and triggers a saccadic turn toward the bar when the integrated value reaches a threshold (Frighetto and Frye, 2023; Mongeau et al., 2019; Mongeau and Frye, 2017). We expanded our bar fixation model to incorporate this behavioral strategy (Figure 2--figure supplement 2). The overall structure of the modified model is akin to the one proposed in a previous study (Mongeau and Frye, 2017), and the amplitude of a saccadic turn was determined by the sum of the position and velocity functions (Figure 2--figure supplement 2A; see Equation 13 in Methods). When simulated, our model successfully reproduced experimental observations of saccade dynamics across different object velocities (Figure 2--figure supplement 2B-D) (Mongeau and Frye, 2017). Together, our models faithfully recapitulated the results of previous behavioral observations in response to singly presented visual patterns (Collett, 1980; Götz, 1987; H. Kim et al., 2023; Maimon et al., 2008; Mongeau and Frye, 2017).”
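      The integrate-and-saccade strategy described in this passage can be illustrated with a minimal sketch. It is not the authors' model or their Equation 13: the threshold, time step, and the proportional `gain` used for saccade amplitude are hypothetical placeholders, whereas the manuscript sets the amplitude from the sum of the position and velocity functions.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def simulate_integrate_and_saccade(bar_angles_deg, dt=0.01, threshold=5.0, gain=0.8):
    """Toy integrate-and-saccade tracker (all parameter values are assumptions).

    bar_angles_deg: bar azimuth at each time step, in degrees.
    Returns the heading trace and the step indices at which saccades fired.
    """
    heading = 0.0
    integrated_error = 0.0
    headings, saccade_steps = [], []
    for step, bar in enumerate(bar_angles_deg):
        # Error of the bar position relative to the frontal midline.
        error = wrap_deg(bar - heading)
        integrated_error += error * dt
        if abs(integrated_error) >= threshold:
            # Threshold reached: turn toward the bar and reset the integrator.
            heading = wrap_deg(heading + gain * error)
            integrated_error = 0.0
            saccade_steps.append(step)
        headings.append(heading)
    return headings, saccade_steps


if __name__ == "__main__":
    trace, saccades = simulate_integrate_and_saccade([20.0] * 500)
    print("saccade steps:", saccades, "final heading:", trace[-1])
```

      In this sketch a bar held on the midline never triggers a turn, whereas an offset bar produces discrete turns whose timing depends on how quickly the integrated error reaches the threshold.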

      Apart from Figures 1 and 2, most of our data—whether from simulations or behavioral experiments—use brief visual patterns lasting 200 ms or less. These stimuli trigger a single, rapid orientation change reminiscent of a saccadic flight turn. In this part of the paper, we essentially have examined how multiple visuomotor pathways interact to determine the direction of object-evoked turns when several visual patterns occur simultaneously.

      Critically, recent work showed that a group of columnar neurons (T3) appear specialized for saccadic bar tracking through integrate-and-fire computations, supporting the notion of parallel visual circuits for saccades and smooth movement (Frighetto & Frye, 2023 [not cited]).

      Thanks for bringing up this critical issue. We have now added this paper in the following part of the manuscript.

      (Line#193) “During this behavior, it has been proposed that visual circuits compute an integrated error of the horizontal bar position with respect to the frontal midline and triggers a saccadic turn toward the bar when the integrated value reaches a threshold (Frighetto and Frye, 2023; Mongeau and Frye, 2017).”

      (Line#462) “Visual systems extract features from the environment by calculating spatiotemporal relationships of neural activities within an array of photoreceptors. In Drosophila, these calculations occur initially on a local scale in the peripheral layers of the optic lobe (Frighetto and Frye, 2023; Gruntman et al., 2018; Ketkar et al., 2020).”

      A major theme of this work is bar fixation, yet recent work showed that in the presence of proprioceptive feedback, flies do not actually center a bar (Rimniceanu & Frye, 2023). Furthermore, the same study found that yaw-free flies do not smoothly track bars but instead generate saccades. Thus prior work is in direct conflict with the work here. This is a major issue that requires more engagement by the authors.

      Thank you for your thoughtful comments and for drawing our attention to this important paper. In our experiments, bar fixation on oscillating vertical objects emerges during the “alignment” phase of the magneto-tether protocol. The pattern movement dynamics were similar to those used by Rimniceanu & Frye (2023), yet the two studies differ in a key respect: Rimniceanu & Frye employed a motion-defined bar, whereas we presented a dark vertical bar against a uniform or random-dot background. The alignment success rate (defined as the proportion of trials in which the fly’s body angle is within ±25° of the target) was about 50% (data not shown). Our alignment pattern consisted of three vertical stripes spanning ~40° horizontally; when we replaced it with a single, narrower stripe, the success rate was lower (data not shown). These observations suggest that bar fixation in the magnetically tethered assay is less robust than in the rigid-tethered assay, although flies still orient toward highly salient vertical objects.
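      For concreteness, the success-rate definition above can be written as a short sketch. The function names, the per-trial input of one body angle, and the inclusive ±25° boundary are assumptions made for illustration; the authors' analysis code is not shown in the response.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def alignment_success_rate(body_angles_deg, target_deg=0.0, tolerance_deg=25.0):
    """Proportion of trials whose body angle lies within the tolerance of the target.

    body_angles_deg: one body angle per trial, in degrees (hypothetical input format).
    The angular difference is wrapped so that 350 deg counts as -10 deg from a 0 deg target.
    """
    if not body_angles_deg:
        raise ValueError("at least one trial is required")
    hits = sum(
        abs(wrap_deg(angle - target_deg)) <= tolerance_deg
        for angle in body_angles_deg
    )
    return hits / len(body_angles_deg)


if __name__ == "__main__":
    # Three of these six hypothetical trials fall within +/-25 deg of the target.
    print(alignment_success_rate([10.0, -20.0, 30.0, 350.0, 180.0, 90.0]))
```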

      We also observed that bar-evoked turns were elicited more reliably when the bar moved rapidly (45° in 200 ms) in the magneto-tether assay, although the turn magnitude was significantly smaller than the actual bar displacement (Figure 3).

      In response to the reviewer’s comment, we have now added the following description of the bar fixation behavior to the paper, citing Rimniceanu & Frye (2023).

      (Line#239) “Another potential explanation arises from recent studies demonstrating that proprioceptive feedback provided during flight turns in a magnetically tethered assay strongly dampens the amplitude of wing and head responses (Cellini and Mongeau, 2022; Rimniceanu et al., 2023).”

      Relevance of the EC model: EC-related studies by the authors linked cancellation signals to saccades (Kim et al, 2014 & 2017). Puzzlingly, the authors applied an EC model to smooth movement, when the authors' own work showed that smooth course stabilizing flight turns do not receive cancellation signals (Fenk et al., 2021). Thus, in Fig. 4C, based on the state of the field, the efference copy signal should originate from the torque commands to initiate saccades, and not from torque to generate smooth movement. As this group previously showed, cancellation signals are quantitatively tuned to that of the expected visual input during saccades. Importantly, this tuning would be to the anticipated saccadic turn optic flow. Thus the authors' results supporting an all-or-none model appear in direct conflict with the authors' previous work. Further, the addition-only model is not particularly helpful as it has been already refuted by behavioral experiments (Rimniceanu & Frye, Mongeau & Frye).

      Thank you for this constructive comment. Efference copy is best established for brief, discrete actions like flight saccades. While motor-related modulation of visual processing has been reported across short- and long-duration behaviours (Chiappe et al., 2010; Fujiwara et al., 2017; Kim et al., 2015, 2017; Maimon et al., 2010; Turner et al., 2022), only flight saccade-associated signals exhibit the temporal profile appropriate to cancel reafferent input. That said, von Holst & Mittelstaedt (1950) originally formulated efference copy to explain the smooth optomotor response of hoverflies. In our previous HS/VS recordings, we could not detect membrane-potential changes tied to baseline wing-beat amplitude (data not shown), but further work is needed.

      Note that visually evoked flight turns analyzed in this paper have relatively fast dynamics. Fenk et al. (2021) showed that HS cells carry EC-like motor signals during both loom-evoked turns and spontaneous saccades. Building on this, we tested whether object-evoked rapid turns modulate other visuomotor pathways. Although Fenk et al. also found that optomotor turns lack motor input to HS cells, the authors did not test whether the optomotor pathway suppresses other reflexes, such as loom-evoked turns. Our new behavioral data (Figure 6) show that optomotor turns indeed suppress loom-evoked turns, suggesting a potential EC signal arising from the optomotor pathway that inhibits loom-responsive visual neurons.

      In Kim et al. (2017), the authors argued that HS/VS neurons receive a “quantitatively tuned” efference copy that varies across cell types: yaw-sensitive LPTCs are strongly suppressed, roll-sensitive cells receive intermediate input, and pitch-sensitive cells receive little or none. We also showed that when the amplitude of ongoing visual drive changes, the amplitude of saccade-related potentials (SRPs) scales linearly. This proportionality does not imply a genuinely graded EC, however, because SRP amplitude could vary solely through changes in driving force (Vm – Vrest) with a fixed EC conductance. Crucially, SRPs do not fully suppress feed-forward visual signalling, arguing against an all-or-none EC mechanism.
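      The driving-force argument can be made concrete with a toy steady-state calculation. This is an illustrative sketch under stated assumptions, not a model from the manuscript: a single-compartment cell with a leak conductance `g_leak`, a fixed efference-copy conductance `g_ec` whose reversal potential is assumed to equal the resting potential, and a constant visual input current. Under these assumptions the saccade-related potential equals -(Vm - Vrest) * g_ec / (g_leak + g_ec), so it scales linearly with the pre-saccadic depolarization even though `g_ec` never changes.

```python
def srp_amplitude(v_pre_mv, v_rest_mv=-55.0, g_leak=1.0, g_ec=1.0):
    """Saccade-related potential (mV) produced by a fixed shunting conductance.

    Assumptions (hypothetical values, for illustration only):
      - steady state, single compartment;
      - the EC conductance reverses at the resting potential;
      - v_pre_mv is the visually driven membrane potential before the saccade.
    Without EC: V = V_rest + I / g_leak; with EC: V = V_rest + I / (g_leak + g_ec).
    The difference is -(v_pre - v_rest) * g_ec / (g_leak + g_ec).
    """
    fraction = g_ec / (g_leak + g_ec)
    return -fraction * (v_pre_mv - v_rest_mv)


if __name__ == "__main__":
    # Doubling the pre-saccadic depolarization doubles the SRP with the same g_ec.
    for v_pre in (-45.0, -35.0):
        print(v_pre, srp_amplitude(v_pre))
```

      Because the suppressed fraction is below one for any finite `g_ec`, this toy cell is also never fully silenced, which matches the qualitative point that SRPs do not abolish feed-forward visual signalling.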

      How, then, can the cellular and behavioural data be reconciled? Silencing HS/VS neurons—or their primary inputs, the T4/T5 neurons—does not markedly diminish the optomotor response in flight (Fenk et al., 2014; Kim et al., 2017), indicating the presence of additional, as-yet-unidentified pathways.

      Physiological recordings from other visual neurons that drive the optomotor response in flying Drosophila are therefore needed to determine how strongly they are suppressed during loom-evoked turns.

      Behavioral evidence for all-or-none EC model: The authors state "unless the stability reflex is suppressed during the flies' object evoked turns, the turns should slow down more strongly with the dense background than the sparse one". This hypothesis is based on the fact that the optomotor response magnitude is larger with a denser background, as would be predicted by an EMD model (because there are more pixels projected onto the eye). However, based on the authors' previous work, the EC should be tuned to optic flow and thus the turning velocity (or amplitude). Thus the EC need not be directly tied to the background statistics, as they claim. For instance, I think it would be important to distinguish whether a mismatch in reafferent velocity (optic flow) links to distinct turn velocities (and thus position). This would require moving the background at different velocities (co- and anti-directionally) at the onset of bar motion. Overall, there are alternative hypotheses here that need to be discussed and more fully explored (as presented by Bender & Dickinson and in work by the Maimon group).

      We appreciate the reviewer’s important suggestion. In response, we performed the recommended experiment. In Figures 5 and 6 of the revised manuscript, we now present how bar- or loom-evoked flight turns affect the response to a moving background pattern. These experiments revealed that bar-evoked turns do not suppress the optic flow response, whereas loom-evoked turns strongly suppress it. Specifically, when background motion began 100 ms after the onset of loom expansion, the response to the background was significantly suppressed. Although weak residual responses to the background motion were observed in this case, this could be due to background motion occurring outside of the suppression interval, whose duration may match that of the flight turns (Figure 6C,D).

      The lack of suppression of the optic flow response during and after bar-evoked turns appears to suggest that the responses are added linearly (Figure 5), seemingly contradicting the lack of dynamic change when the background dot density was altered (Figure 7, Figure 7–figure supplement 1). That is, the experimental result in Figure 5 supports either an addition-only or a graded efference copy (EC) model. However, the result in Figure 7 supports an all-or-none EC model. If a graded EC were used, the amplitude of the EC should be updated almost instantaneously when the static background changes.

      Another possibility is that the optic flow during self-generated turns in a static background is extremely weak compared to the optic flow input generated by physically moving the pattern, perhaps due to the rapid nature of head movements. Indeed, detailed kinematic analysis of head movement during spontaneous saccades in blow flies revealed that the head reaches the target angle before the body completes the orientation change, making the effective speed of reafferent optic flow higher than the speed of body rotation (Hateren and Schilstra, 1999). To test these hypotheses, further experiments will be needed for bar-evoked flight turns.

      Publishing the reviewed preprint:

      (1) The Reviewed Preprint (including the full text of the preprint we reviewed, the eLife assessment, and public reviews) will typically be published in two weeks' time.

      Please let us know if you would like to provide provisional author responses to be posted at the same time (if so, please send these by email). Please do not resubmit within the next two/three weeks, as we will need to publish the first version of the Reviewed Preprint first.

      If there are any factual errors in the eLife assessment or public reviews, or other issues we should be aware of, please let us know as soon as possible.

      (2) After publication of the Reviewed Preprint, you can use the link below to submit a revised version. There is no deadline to resubmit. Before resubmitting, please ensure that you update the preprint at the preprint server to correspond with the revised version. Upon submitting a revised version, we will ask the editors and reviewers if it's appropriate to update their assessment and public reviews, which will be included alongside the revised Reviewed Preprint. At that time we will also post the recommendations to the authors and the author responses you provide with the revised version. In the author response, please respond to the public reviews (where relevant) and the recommendations to the authors.

      (3) Alternatively, you can proceed with the current version of the Reviewed Preprint (once published), without revisions, and request an eLife Version of Record. See the Author Guide for further information: https://elife-rp.msubmit.net/html/elife-rp_author_instructions.html#vor. However, most authors decide to request a Version of Record after a round of revision.

      (4) After publication of eLife's Reviewed Preprint, you also have the option to submit/publish in another journal instead: if you choose to do this, please let us know so we can update our records.

      The reviewers identified two key revisions that could improve the assessment of the paper:

      (1) Consideration of saccades within the model framework (outlined by reviewer 3).

      (2) Addition of physiology data to support the conclusions of the paper (outlined by reviewer 2). If this is not feasible within the timescale of revisions, the paper would need to be revised to clarify that the model leads to a hypothesis that would need to be tested with future physiology experiments.

      Thank you for these comments.

      Regarding revision point #1, we have added Figure 2–figure supplement 2, where we incorporated our position-velocity model (estimated in Figure 1) into the framework of the integrate-and-saccade model. A detailed description of this model is now provided in the main text (Lines 190–203).

      For revision point #2, obtaining electrophysiological evidence for efference copy remains challenging, as neither the visual neurons nor the efference-copy neuron has been identified for the wing optomotor response. As suggested by the reviewers, we have revised the title of the paper to reduce emphasis on efference copy and have noted electrophysiological recordings as a direction for future work.

      old title: A visual efference copy-based navigation algorithm in Drosophila for complex visual environments

      new title: Integrative models of visually guided steering in Drosophila

      Specific recommendations are detailed below.

      Reviewer #2 (Recommendations For The Authors):

      To directly demonstrate if an efference copy is non-scaled, the following experiments can be helpful: record from HS/VS cells and examine the relation between the amplitude of the saccade-suppression signal vs. saccade amplitude.

      Thanks for raising this important point. We previously carried out the suggested analysis for loom-evoked saccades in Fenk et al. (2021). There, significant correlations emerged between wing-response amplitude and saccade-related potentials (Figures 2F and 3C). However, we did not interpret the strong correlation (r ≈ 0.8) as evidence for a graded efference copy, because the amplitude of saccade-related potentials appeared to be bimodal. Upon presentation of the looming stimulus, flies either executed large evasive turns or showed minimal changes in wing-stroke amplitude. Large wing responses were accompanied by strong, saturated suppression of HS-cell membrane potential, whereas trials without wing responses produced only weak modulations—reflected in the bimodal distribution of saccade-related potential amplitudes (Figure 3C). 

      Importantly, in rigidly tethered preparations—where these potentials are typically measured—the absence of proprioceptive feedback can itself drive wingbeat amplitudes to saturation during saccades. We therefore reasoned that the lack of intermediate-sized flight saccades would naturally yield correspondingly saturated saccade-related potentials, even if a graded EC system is in play. 

      In Kim et al. (2017), we also performed a comprehensive analysis of spontaneous saccade-related potentials across all HS/VS cell types. When we later examined the relationship between saccade amplitude and the corresponding saccade-related potentials in each cell type, we could not find any statistically significant correlation (unpublished data).

      measure how much a weak visual stimulus and a strong visual stimulus are suppressed by the suppression signal. If the signal is non-scaled, visual stimuli should always be suppressed independently of their intensities.

      Thank you for this important suggestion. As mentioned in our response to the previous comment, we believe it is not feasible to record from neurons responsible for the body optomotor response at this point, as their identity remains unknown. Regarding the HS/VS cells, our previous study showed that HS cells are not always fully suppressed. The changes in saccade-related potential amplitude can be described as a linear function of the pre-saccadic visually-evoked membrane potential (Figure 7 in Kim et al., 2017). 

      As suggested by Fenk et al. 2014 (doi: 10.1016/j.cub.2014.10.042), HS cells might also be responsive to a moving bar. If that is the case, and if you present a bar and background (either sparse or dense) in a closed-loop manner to a head-fixed fly, HS cells might be sensitive only to the bar but not to the background (independently of the density).

      Thanks for pointing out this important issue. HS cells indeed respond strongly to the horizontal movement of a vertical bar, as expected given that their receptive fields are formed by the integration of local optic flow vectors. In one of our previous studies (Supplemental Figure 1 in Kim et al., 2015), we showed that the response amplitude to a single vertical bar is roughly equivalent to that elicited by a vertical grating composed of 12 bars of the same size. Therefore, we believe that HS cells are likely to contribute to the head response to a moving vertical bar. In a body-fixed flight simulator, HS cells would respond only to the bar if the bar runs in a closed loop with a static background. In this scenario, HS cells are likely to play a role in the head optomotor response.

      Note also that the role of HS cells in the wing optomotor response remains unresolved. Unilateral activation of HS cells has been shown to elicit locomotor turns in walking Drosophila (Fujiwara et al., 2017), as well as in flying individuals (unpublished data from our lab). However, a previous study also showed that strong silencing of HS/VS cells significantly reduced the head optomotor response, but not the wing optomotor response (Kim et al., 2017).

      If neurophysiology is technically challenging, an alternative way might pay attention to a head movement that exclusively follows the background (Fox et al., 2014 (doi: 10.1242/jeb.080192)). Because HS cells are thought to promote head rotation to background motion, a non-scaled suppression signal on HS cells would always suppress the head rotation independently of the background density.

      Thanks for this helpful comment. We have analyzed head movements during bar-evoked flight turns (Figure 7–figure supplement 1B) and found no significant changes across different background dot densities. We think that this might suggest that HS cells are unlikely to receive suppressive inputs during bar-evoked turns, akin to the lack of modulation during optomotor turns.

      Another way to separate a potential efference copy from other mechanisms (more global inhibition) is the directionality. A global inhibition would suppress the response to the background even if the background moves in the same direction as self-motion, but the efference copy would not.

      Thanks for this important point. In Heisenberg and Wolf, 1979, it was proposed that modulation might be bidirectional, with behavioral effects observed only for perturbations in the “unexpected” direction. In our new data on loom-evoked turns (Figure 6), the suppression appears equally strong for background motion in either direction, supporting an all-or-none suppression mechanism.

      Besides, in general, it is unclear if you think an efference copy operates both in smooth pursuits and saccades or if such a signal is only present during saccades. Your previous neurophysiological work supports the latter. Are your behavioral results consistent with the previous saccade suppression idea, or do you propose a new type of efference copy that also operates in smooth pursuits?

      Thanks for raising this important point. von Holst and Mittelstaedt (1950) originally introduced the concept of efference copy to explain the smooth optomotor response. We previously analyzed electrophysiological recordings from HS cells for membrane-potential changes associated with slow deviations in wing-steering angle but found none. However, this negative result does not entirely rule out modulation of visual processing during smooth flight turns, given the slow drift in membrane potential observed in most whole-cell recordings.

      In this study, we examined only the interactions among visuomotor pathways during these rapid flight turns, as the dynamics of visually evoked turns are almost as rapid as those of spontaneous saccades. Our data reveal that interactions between distinct visuomotor reflexes are more diverse than previously appreciated.

      Minor comments:

      Line 108, 109: match the description between here and the labels in Fig. 1F.

      Thank you for pointing out this issue. We defined the general equation for obtaining the position and velocity components in the main text (lines 108-109), but because of a slight asymmetry in the data (Fig. 1E), we used the approach indicated in Fig. 1F and explained in lines 113-117.

      Fig.1 F: If the position-dependent component is due to fatigue, the tuning curve's shape is likely changed (shrunk or extended) depending on the stimulus speed. How can you generalize the tuning curve shown here? Does the result hold even if the stimulus speed/contrast/spatial frequency is changed?

      We appreciate this comment. We believe that fatigue may be the reason why the wing response to the grating stimulus showed such a significant decay (Fig. 1E). As you mention, increasing the stimulus speed would increase the amplitude of the fly’s response up to a saturation point. We addressed this in our model by multiplying the derived value by the angular velocity of the grating.

      Regarding contrast and spatial frequency, we did not test them experimentally. Instead, we simulated our model with changing visual feedback (Fig. 4A, B), which can be seen as increasing or decreasing the contrast of a grating. An increase in contrast would increase the fly’s response to the grating and would thus contribute to dampening the response to the foreground object (Fig. 4C).

      Line 233-255: Here, the description sounds like you will consider several parallel objects (e.g., two stripes) in the visual field instead of the combination of the figure and background (which is referred to in the following paragraph).

      Thank you for pointing it out. Indeed it was slightly ambiguous. We have addressed this by explaining the specific situation of a combination of an object and the background in lines 231-233.

      Figure 6C: you kept the foreground visual field between sparse and dense random dot backgrounds to keep the bar's saliency. Is it sure that this does not influence the difference in the fly's response to these two backgrounds (in Figure 6B)?

      This is a good point that we have also discussed internally. We also carried out similar experiments with a fully covered background and found no significant differences (Figure 7–figure supplement 1).

      Reviewer #3 (Recommendations For The Authors):

      Identify and analyze flight saccade dynamics in the raw trajectories (e.g., Fig. 3B). There should be some since the bar is near the 'sweet spot' for triggering saccades (see Mongeau & Frye, 2017).

      Thank you for bringing up this interesting point. Previous work reported that flies fixate on a vertical bar through saccadic turns rather than smooth tracking (Mongeau & Frye, 2017). When the bar was thin (<15 deg), there was barely one saccade per second (Mongeau & Frye, 2017, Fig. 4). In our magneto-tether assay (Fig. 3A, B), the object width was 11.25 degrees and the object moved for only a short time window, so the fly generated only the saccade associated with the onset of object motion. Small turns of a few degrees, which are likely related to small perturbations, could not be considered saccades comparable to those previously reported (Mongeau & Frye, 2017). Additionally, in our protocol (Fig. 3A), only a single object moved against an empty background from the onset time (‘go’ mark), so in principle there is no trigger for a switch to smooth movement. We addressed this in lines x-x.

      Consider updating the Poggio model with flight saccades (switched, integrate-and-fire).

      We appreciate this suggestion. Following previous work (Mongeau et al., 2017), we expanded our model to include a saccade mechanism: the torque produced by the summed position- and velocity-dependent components is now replaced by an integrate-and-fire saccade (Figure 2—figure supplement 2). We optimized the saccade interval and amplitude so that both vary linearly with stimulus amplitude and faithfully reproduce the kinematic properties reported previously (Mongeau et al., 2017).  

      Please engage more with the literature, especially work that directly conflicts with your conclusions (see above). Also, highly relevant work by Bender & Dickinson was not sufficiently discussed. Spot results presented in Fig. 3 should be contextualized in light of the work of Mongeau et al., 2019, who performed similar experiments and identified a switch in saccade valence.

      We appreciate your pointing out the relevant previous work. We have added references to the following papers and tried to describe the relationship between our data and previous ones.

      Bender & Dickinson 2006

      (Line#162) “This simulation experiment is reminiscent of the magnetically tethered flight assay, where a flying fly remains fixed at a position but is free to rotate around its yaw axis (Bender and Dickinson, 2006b; Cellini et al., 2022; G. Kim et al., 2023; Mongeau and Frye, 2017).”

      (Line#218) “We tested the predictions of our models with flies flying in an environment similar to that used in the simulation (Figure 3A). A fly was tethered to a short steel pin positioned vertically at the center of a vertically oriented magnetic field, allowing it to rotate around its yaw axis with minimal friction (Bender and Dickinson, 2006b; Cellini et al., 2022; G. Kim et al., 2023).”

      (Line#238) “To determine if our assay imposes additional friction compared to other assays used in previous studies, we analyzed the dynamics of spontaneous saccades during the “freeze” phase (Figure 3–figure supplement 1A). We found their duration and amplitude to be within the range reported previously (Bender and Dickinson, 2006b; Mongeau and Frye, 2017) (Figure 3–figure supplement 1B-D).”

      Mongeau et al., 2019

      (Line#196) “During this behavior, it has been proposed that visual circuits compute an integrated error of the bar position with respect to the frontal midline and triggers a saccadic turn toward the bar when the integrated value reaches a threshold (Frighetto and Frye, 2023; Mongeau et al., 2019; Mongeau and Frye, 2017). We expanded our bar fixation model to incorporate this behavioral strategy (Figure 2–figure supplement 2).”

      This paper shows that the dynamics of saccadic flight turns elicited by a rotating bar or spot determine whether flies display attraction or aversion. In that study, the visual stimulus—a bar or spot—rotated slowly at a constant 75 deg s⁻¹. By contrast, in our Figure 3 the object moves much faster, driving the neural “integrator” to saturation and triggering an almost immediate flight turn. In Mongeau et al. (2019), saccades occur at variable times and their amplitudes and directions are more stochastic, again reflecting the slower stimulus speed. Because these differences all arise from the disparity in object speed, we did not cite Mongeau et al. (2019) in Figure 3 or the associated text.

      In addition to the two papers cited above, we have incorporated several relevant studies on the Drosophila visuomotor control identified through the reviewers’ insightful comments. Examples include:

      Frighetto G, Frye MA. 2023 (Line#195, 464)

      Rimniceanu et al., 2023 (Line#241)

      Cellini & Mongeau 2020 (Line#91)

      Cellini & Mongeau 2022 (Line#241)

      Cellini et al., 2022 (Line#91, 162, 218)

      Many citations are not in the proper format (e.g. using numbers rather than authors' last name).

      Thank you for letting us know. We have changed the remaining citations to the proper format.

    1. Reviewer #2 (Public review):

      Summary:

      Shahbazi et al. trained recurrent neural networks (RNNs) to simulate human upper limb movement during adaptation to a force field perturbation. They demonstrated that throughout adaptation, the pattern of motor commands to the muscles of the simulated arm changed, allowing the perturbed movements to regain their typical, perturbation-free straight-line paths. After this initial learning block (FF1), the network encountered null-fields to wash out the adaptation, before re-experiencing the force in a second learning block (FF2). Upon re-exposure, the network learned faster than during initial learning, consistent with the savings observed in behavioral studies of adaptation. They also found that as the number of hidden units in the RNN increased, so did the probability of exhibiting savings. The authors concluded that these results propose a neural basis for savings that is independent of context and strategic processes.

      Strengths:

      The paper addresses an important and controversial topic in motor adaptation: the mechanism underlying motor memory. The RNN simulation reproduces behavioral hallmarks of adaptation, and it provides a useful illustration of the pattern of muscle activity underlying human-like movements under both normal and perturbing conditions. While the savings effect produced by the network, though significant, appears somewhat small, the simulation demonstrating an increase in savings with a greater number of hidden units is particularly intriguing.

      Weaknesses:

      (1) To be transparent, savings in motor adaptation have been a primary focus of my own research. Some core findings presented in this paper are at odds with the ideas I and others have previously put forward. While I don't want to impose my agenda on the authors of this paper, I do think the authors should address these issues.

      a) The authors acknowledge the ongoing debate in the literature regarding the mechanisms underlying savings, particularly whether it stems from explicit or implicit learning processes. However, it remains unclear how the current work addresses this debate. There is already a considerable body of research, particularly in visuomotor adaptation, demonstrating that savings is predominantly driven by explicit strategies. For example, when people are asked to report their strategy, they recall a strategy that was useful during the first learning block (Morehead et al. 2015). Furthermore, savings are abolished under experimental manipulations designed to eliminate strategic contributions (e.g., Haith et al., 2015; Huberdeau et al., 2019; Avraham et al., 2021). The authors briefly state that their findings support the hypothesis that a neural basis of memory retention underlying savings can be independent of cognitive or strategic learning components, and that savings can be characterized as implicit. While these statements may be true, it is not clear how this work substantiates these claims.

      b) Our research has also demonstrated that if implicit adaptation is completely washed out after the initial learning block, it not only fails to exhibit savings but is actually attenuated relative to the first learning block (Avraham et al., 2021). This phenomenon of attenuation upon relearning can also be seen in other studies of visuomotor adaptation (e.g., Leow et al., 2020; Yin and Wei, 2020; Hamel et al., 2021; Hamel et al., 2022; Wang and Ivry, 2023; Hadjiosif et al., 2023). More recently, we have shown that this attenuation is due to anterograde interference arising from experience with the washout block (Avraham and Ivry, 2025). We illustrated that the implicit system is highly susceptible to interference; it doesn't require exposure to salient opposite errors and can occur even following prolonged exposure to veridical feedback.
The central thesis of this paper, namely that implicit savings can emerge through RNNs, is at odds with these empirical results. The authors should address this discrepancy.

      (2) This brings me to the question about neural correlates: The results are linked to activity in the primary motor cortex. How does that align with the well-established role of the cerebellum in implicit motor adaptation? And with the studies showing that savings are due to explicit strategies, which are generally associated with prefrontal regions?

      (3) The analysis of the complexity of the neural network (i.e., the number of hidden units) and its relationship to savings is very interesting. It makes sense to me that more complex networks would show more savings. I'm not sure I follow the authors' explanation, but my understanding is that increased network complexity makes it more difficult to override the formed memory through interference (e.g., from the experience with NF2). Also, the results indicate that a network with 32 units led to a less-than-chance level of networks exhibiting savings (Figure 3b). What behavioral output does this configuration produce? Could this behavior manifest as attenuation upon relearning? Furthermore, if one were to examine an even smaller, simpler network (perhaps one more closely reflecting cerebellar circuits), would such a model predict attenuation rather than savings?

      (4) The authors emphasize that their network did not receive any explicit contextual signals related to the presence or absence of the force field (FF), thus operating in a 'context-free' manner. From my understanding, some existing models of context's role in motor memories (e.g., Oh and Schweighofer, 2019; Heald et al., 2021) propose that memory-related changes can be observed even without explicit contextual information, as contextual changes can be inferred from sudden or significant environmental shifts (e.g., the introduction or removal of perturbations). Given this, could the observed savings in the current simulation be explained by some form of contextual retrieval, inferred by the network from the re-presentation of the perturbation in FF2?

      (5) If there is residual hidden unit activity related to the FF at the end of the NF2 phase, how does the simulated movement revert back to baseline? Are there any differences in the movement trajectory, beyond just lateral deviation, between NF1 and NF2? The authors state that "changes in the preparatory hidden unit activity did not result in substantive changes in the motor commands (Figure 5b), which emphasizes that the uniform shift resides in the null space of motor output." However, Figure 5b appears to show visible changes in hidden unit activity. Don't these changes reflect a pattern of muscle activity that is the basis for behavior? These changes are indeed small, but it seems that so is the effect size for savings (Figure 3a). Could this suggest that there is not, in fact, a complete washout of initial learning during NF2 within the network?

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      *The authors have a longstanding focus and reputation on single cell sequencing technology development and application. In this current study, the authors developed a novel single-cell multi-omic assay termed "T-ChIC" so that to jointly profile the histone modifications along with the full-length transcriptome from the same single cells, analyzed the dynamic relationship between chromatin state and gene expression during zebrafish development and cell fate determination. In general, the assay works well, the data look convincing and conclusions are beneficial to the community. *

      Thank you for your positive feedback.

      *There are several single-cell methodologies all claim to co-profile chromatin modifications and gene expression from the same individual cell, such as CoTECH, Paired-tag and others. Although T-ChIC employs pA-Mnase and IVT to obtain these modalities from single cells which are different, could the author provide some direct comparisons among all these technologies to see whether T-ChIC outperforms? *

      In a separate technical manuscript describing the application of T-ChIC in mouse cells (Zeller, Blotenburg et al 2024, bioRxiv, 2024.05.09.593364), we have provided a direct comparison of data quality between T-ChIC and other single-cell methods for chromatin-RNA co-profiling (please refer to Fig. 1C,D and Fig. S1D,E of the preprint). We show that, compared to other methods, T-ChIC is able to better preserve the expected biological relationship between the histone modifications and gene expression in single cells.

      *In current study, T-ChIC profiled H3K27me3 and H3K4me1 modifications, these data look great. How about other histone modifications (eg H3K9me3 and H3K36me3) and transcription factors? *

      While we haven't profiled these other modifications using T-ChIC in zebrafish, we have previously published high-quality data on these histone modifications using the sortChIC method, on which T-ChIC is based (Zeller, Yeung et al 2023). In our comparison, we find that histone modification profiles between T-ChIC and sortChIC are very similar (Fig. S1C in Zeller, Blotenburg et al 2024). Therefore, the method is expected to work as well for the other histone marks.

      *T-ChIC can detect full length transcription from the same single cells, but in FigS3, the authors still used other published single cell transcriptomics to annotate the cell types, this seems unnecessary? *

      We used the published scRNA-seq dataset with a larger number of cells to homogenize our cell type labels with these datasets, but we also cross-referenced our cluster-specific marker genes with ZFIN and homogenized the cell type labels with the ZFIN ontology. This way our annotation is in line with previous datasets but not biased by them. Due to the relatively small size of our data, we didn't expect to identify unique, rare cell types, but our full-length total RNA assay helps us identify non-coding RNAs, such as miRNAs, previously undetected in scRNA assays, which we have now highlighted in new Figure S1c.

      *Throughout the manuscript, the authors found some interesting dynamics between chromatin state and gene expression during embryogenesis, independent approaches should be used to validate these findings, such as IHC staining or RNA ISH? *

      We appreciate that the ISH staining could be useful to validate the expression pattern of genes identified in this study. But to validate the relationships between the histone marks and gene expression, we need to combine these stainings with functional genomics experiments, such as PRC2-related knockouts. Due to their complexity, such experiments are beyond the scope of this manuscript (see also reply to reviewer #3, comment #4 for details).

      *In Fig2 and FigS4, the authors showed H3K27me3 cis spreading during development, this looks really interesting. Is this zebrafish specific? H3K27me3 ChIP-seq or CutTag data from mouse and/or human embryos should be reanalyzed and used to compare. The authors could speculate some possible mechanisms to explain this spreading pattern? *

      Thanks for the suggestion. In this revision, we have reanalysed a mouse H3K27me3 ChIP-seq dataset covering embryonic development by Xiang et al (Nature Genetics 2019) and find similar evidence of spreading of H3K27me3 signal from pre-marked promoter regions in the E5.5 epiblast upon differentiation (new Figure S4i). This observation, combined with the fact that the mechanism of pre-marking of promoters by PRC1-PRC2 interaction seems to be conserved between the two species (see Hickey et al., 2022; Mei et al., 2021; Chen et al., 2021), suggests that the dynamics of H3K27me3 pattern establishment are conserved across vertebrates. But we think a high-resolution profiling via a method like T-ChIC would be more useful to demonstrate the dynamics of signal spreading during mouse embryonic development in the future. We have discussed this further in our revised manuscript.

      Reviewer #1 (Significance (Required)):

      *The authors have a longstanding focus and reputation on single cell sequencing technology development and application. In this current study, the authors developed a novel single-cell multi-omic assay termed "T-ChIC" so that to jointly profile the histone modifications along with the full-length transcriptome from the same single cells, analyzed the dynamic relationship between chromatin state and gene expression during zebrafish development and cell fate determination. In general, the assay works well, the data look convincing and conclusions are beneficial to the community. *

      Thank you very much for your supportive remarks.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      *Joint analysis of multiple modalities in single cells will provide a comprehensive view of cell fate states. In this manuscript, Bhardwaj et al developed a single-cell multi-omics assay, T-ChIC, to simultaneously capture histone modifications and full-length transcriptome and applied the method on early embryos of zebrafish. The authors observed a decoupled relationship between the chromatin modifications and gene expression at early developmental stages. The correlation becomes stronger as development proceeds, as genes are silenced by the cis-spreading of the repressive marker H3k27me3. Overall, the work is well performed, and the results are meaningful and interesting to readers in the epigenomic and embryonic development fields. There are some concerns before the manuscript is considered for publication. *

      We thank the reviewer for appreciating the quality of our study.

      *Major concerns: *

      1. *A major point of this study is to understand embryo development, especially gastrulation, with the power of scMulti-Omics assay. However, the current analysis didn't focus on deciphering the biology of gastrulation, i.e., lineage-specific pioneer factors that help to reform the chromatin landscape. The majority of the data analysis is based on the temporal dimension, but not the cell-type-specific dimension, which reduces the value of the single-cell assay. *

      We focused on the lineage-specific transcription factor activity during gastrulation in Figures 4 and S8 of the manuscript and discovered several interesting regulators active at this stage. During our analysis of the temporal dimension for the rest of the manuscript, we also classified the cells by their germ layer and "latent" developmental time, taking full advantage of the single-cell nature of our data. Additionally, we have now added the cell-type-specific H3K27-demethylation results for 24hpf in response to your comment below. We hope that these results, together with our openly available dataset, demonstrate the advantage of the single-cell aspect of our dataset.

      2. *The cis-spreading of H3K27me3 with developmental time is interesting. Considering H3k27me3 could mark bivalent regions, especially in pluripotent cells, there must be some regions that have lost H3k27me3 signals during development. Therefore, it's confusing that the authors didn't find these regions (30% spreading, 70% stable). The authors should explain and discuss this issue. *

      Indeed, we see that ~30% of the bins enriched in the pluripotent stage spread, while 70% do not seem to spread. In line with earlier observations (Hickey et al., 2022; Vastenhouw et al., 2010), we find that H3K27me3 is almost absent in the zygote and is still being accumulated until 24hpf and beyond. Therefore, the majority of the sites in the genome still seem to be in the process of gaining H3K27me3 until 24hpf, explaining why we see mostly "spreading" and "stable" states. Considering most of these sites are at promoters and show signs of bivalency, we think that these sites are marked for activation or silencing at later stages. We have discussed this in the manuscript ("Discussion"). However, in response to this and the earlier comment, we went back and searched for genes that show H3K27 demethylation in the most mature cell types (at 24 hpf) in our data, and found a subset of genes that lose H3K27me3 after acquiring it earlier. Interestingly, most of the top genes in this list are well known as developmentally important for their corresponding cell types. We have added this new result and discussed it further in the manuscript (Fig. 2d,e, Supplementary table 3).

      *Minors: *

      1. *The authors cited two scMulti-omics studies in the introduction, but there have been lots of single-cell multi-omics studies published recently. The authors should cite and consider them. *

      We have cited more single-cell chromatin and multiome studies focussed on early embryogenesis in the introduction now.

      2. *T-ChIC seems to have been presented in a previous paper (ref 15). Therefore, Fig. 1a is unnecessary to show. *

      Figure 1a shows a summary of our zebrafish T-ChIC workflow, which contains the unique sample multiplexing and sorting strategy to reduce batch effects, which was not applied in the original T-ChIC workflow. We have now clarified this in "Results".

      3. *It's better to show the percentage of cell numbers (30% vs 70%) for each heatmap in Figure 2C. *

      We have added the numbers to the corresponding legends.

      4. *Please double-check the citation of Fig. S4C, which may not relate to the conclusion of signal differences between lineages. *

      The citation seems to be correct (Fig. S4C supplements Fig. 2C, but shows mesodermal lineage cells) but the description of the legend was a bit misleading. We have clarified this now.

      5. *Figure 4C has not been cited or mentioned in the main text. Please check. *

      Thanks for pointing it out. We have cited it in Results now.

      Reviewer #2 (Significance (Required)):

      *Strengths: This work utilized a new single-cell multi-omics method and generated abundant epigenomics and transcriptomics datasets for cells covering multiple key developmental stages of zebrafish. *

      *Limitations: The data analysis was superficial and mainly focused on the correspondence between the two modalities. The discussion of developmental biology was limited. *

      *Advance: The zebrafish single-cell datasets are valuable. The T-ChIC method is new and interesting. *

      *The audience will be specialized and from basic research fields, such as developmental biology, epigenomics, bioinformatics, etc. *

      *I'm more specialized in the direction of single-cell epigenomics, gene regulation, 3D genomics, etc. *

      Thank you for your remarks.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      *This manuscript introduces T‑ChIC, a single‑cell multi‑omics workflow that jointly profiles full‑length transcripts and histone modifications (H3K27me3 and H3K4me1) and applies it to early zebrafish embryos (4-24 hpf). The study convincingly demonstrates that chromatin-transcription coupling strengthens during gastrulation and somitogenesis, that promoter‑anchored H3K27me3 spreads in cis to enforce developmental gene silencing, and that integrating TF chromatin status with expression can predict lineage‑specific activators and repressors. *

      *Major concerns *

      1. *Independent biological replicates are absent, so the authors should process at least one additional clutch of embryos for key stages (e.g., 6 hpf and 12 hpf) with T‑ChIC and demonstrate that the resulting data match the current dataset. *

      Thanks for pointing this out. We had, in fact, performed T-ChIC experiments in four rounds of biological replicates (independent clutches of embryos) and merged the data to create our resource. Although not all timepoints were profiled in each replicate, two timepoints (10 and 24hpf) are present in all four, and the cell type composition of the replicates at these two timepoints is very similar. We have added new plots in Figure S2f and a new supplementary table (#1) to highlight the presence of biological replicates.

      2. *The TF‑activity regression model uses an arbitrary R² ≥ 0.6 threshold; cross‑validated R² distributions, permutation‑based FDR control, and effect‑size confidence intervals are needed to justify this cut‑off. *

      Thank you for this suggestion. We did use 10-fold cross-validation during training and obtained the R² values of TF motifs from the independent test set as an unbiased estimate. However, the cutoff of R² > 0.6 to select the TFs for classification was indeed arbitrary. In the revised version, we now report the FDR-adjusted p-values for these R² estimates based on permutation tests, and select TFs with a cutoff of padj supplementary table #4 to include the p-values for all tested TFs. However, we see that our arbitrary cutoff of 0.6 was, in fact, too stringent, and we can classify many more TFs based on the FDR cutoffs. We also updated our reported numbers in Fig. 4c to reflect this. Moreover, supplementary table #4 contains the complete list of TFs used in the analysis to allow others to choose their own cutoff.
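
      As an illustration of the kind of procedure described in this reply, the following is a minimal sketch of a permutation test on a held-out R² followed by Benjamini-Hochberg adjustment. It is not the authors' code: the function names, the number of permutations, and the choice of Benjamini-Hochberg as the FDR method are all assumptions made for the example.

```python
import random


def r_squared(observed, predicted):
    """Coefficient of determination of predictions against observations."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def permutation_pvalue(observed, predicted, n_perm=1000, seed=0):
    """One-sided p-value: how often shuffled observations score at least as well.

    n_perm and seed are illustrative defaults, not values from the manuscript.
    """
    rng = random.Random(seed)
    actual = r_squared(observed, predicted)
    shuffled = list(observed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if r_squared(shuffled, predicted) >= actual:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

      In such a scheme, one p-value would be computed per TF motif from its test-set predictions, and the adjusted values would then be compared against whichever FDR cutoff is chosen.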

      3. *Predicted TF functions lack empirical support, making it essential to test representative activators (e.g., Tbx16) and repressors (e.g., Zbtb16a) via CRISPRi or morpholino knock‑down and to measure target‑gene expression and H3K4me1 changes. *

      We agree that independent validation of the functions of our predicted TFs on target gene activity would be important. During this revision, we analysed the recently published scRNA-seq data of Saunders et al. (2023), which includes CRISPR-mediated F0 knockouts of a couple of our predicted TFs, but the scRNA-seq was performed at later stages (24hpf onward) compared to our H3K4me1 analysis (which covered 4-12 hpf). As a result, we saw off-target genes being affected in lineages where these TFs are clearly not expressed (attached Fig 1). We therefore didn't include these results in the manuscript. In the future, we aim to systematically test the TFs predicted in our study with CRISPRi or similar experiments.

      4. *The study does not prove that H3K27me3 spreading causes silencing; embryos treated with an Ezh2 inhibitor or prc2 mutants should be re‑profiled by T‑ChIC to show loss of spreading along with gene re‑expression. *

      We appreciate the suggestion; indeed, PRC2 disruption followed by T-ChIC or other forms of validation would be needed to confirm whether the H3K27me3 spreading is causally linked to the silencing of the identified target genes. But performing this validation is complicated for multiple reasons. First, due to the EZH2 contribution from maternal RNA and the contradicting effects of various EZH2 zygotic mutations (depending on where the mutation occurs), the only properly validated PRC2-related mutant seems to be the maternal-zygotic mutant MZezh2, which requires germ cell transplantation (see Rougeot et al., 2019 and San et al., 2019 for details). Second, the use of inhibitors has been described in other studies (den Broeder et al., 2020; Huang et al., 2021), but these studies do not show a validation of the H3K27me3 loss or a phenotype similar to that of the MZezh2 mutants, and inhibitors can present unwanted side effects and toxicity at a high dose, affecting gene expression results. Moreover, in an attempt to validate, we performed our own trials with the EZH2 inhibitor (GSK123) and saw that this time window might be too short to see the effect within 24hpf (attached Fig. 2). Therefore, this validation is a more complex endeavor beyond the scope of this study. Nevertheless, our further analysis of H3K27me3 demethylation on developmentally important genes (new Fig. 2e-f, Sup. table 3) adds more confidence that polycomb repression plays an important role, and provides enough ground for future follow-up studies.

      *Minor concerns *

      1. *Repressive chromatin coverage is limited, so profiling an additional silencing mark such as H3K9me3 or DNA methylation would clarify cooperation with H3K27me3 during development. *

      We agree that H3K27me3 alone would not be sufficient to fully understand the repressive chromatin state. Extension to other chromatin marks and DNA methylation will be the focus of our follow-up work.

      *2. Computational transparency is incomplete; a supplementary table listing all trimming, mapping, and peak‑calling parameters (cutadapt, STAR/hisat2, MACS2, histoneHMM, etc.) should be provided. *

      As mentioned in the manuscript, we provide an open-source pre-processing pipeline, "scChICflow", to perform all these steps (github.com/bhardwaj-lab/scChICflow). We have now also provided the configuration files in our Zenodo repository (see below), which can simply be plugged into this pipeline together with the fastq files from GEO to obtain the processed dataset that we describe in the manuscript. Additionally, we have clarified the peak calling and post-processing steps in the manuscript.

      *3. Data‑ and code‑availability statements lack detail; the exact GEO accession release date, loom‑file contents, and a DOI‑tagged Zenodo archive of analysis scripts should be added. *

      We have now publicly released the .h5ad files with raw counts, normalized counts, and complete gene and cell-level metadata, along with signal tracks (bigwigs) and peaks on GEO. Additionally, we now also released the source datasets and notebooks (.Rmarkdown format) on Zenodo that can be used to replicate the figures in the manuscript, and updated our statements on "Data and code availability".

      *4. Minor editorial issues remain, such as replacing "critical" with "crucial" in the Abstract, adding software version numbers to figure legends, and correcting the SAMtools reference. *

      Thank you for spotting them. We have fixed these issues.

      Reviewer #3 (Significance (Required)):

      The method is technically innovative and the biological insights are valuable; however, several issues-mainly concerning experimental design, statistical rigor, and functional validation-must be addressed to solidify the conclusions.

      Thank you for your comments. We hope to have addressed your concerns in this revised version of our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary

      In this study, Takagi and colleagues demonstrate that changes in axonal arborization of the segmental wave motor command neurons are sufficient to change behavioral motor output.

      The authors identify the Wnt receptors DFz2 and DFz4 and the ligand Wnt4 as modulators of stereotypic segmental arborization patterns of segmental wave neurons along the anterior-posterior body axis. Based on both embryonic expression pattern analysis and genetic manipulation of the signaling components in wave neurons (receptors) and the neuropil (Wnt4) the authors convincingly demonstrate that Wnt4 acts as a repulsive ligand for DFz2 that restricts posterior axon guidance of both anterior and posterior wave neurons. They also provide the first evidence that Wnt4 potentially acts as an attractive ligand for DFz4 to promote the posterior extension of p-wave neurons. Interestingly, artificial optogenetic activation of all wave neurons that normally induces backward locomotion due to the activity of anterior wave neurons, fails to induce backward locomotion in a DFz2 knockdown condition with altered axonal extensions of all wave neurons towards posterior segments. In addition, the authors now observe enhanced fast-forward locomotion, a feature normally induced by posterior wave neurons. Consistent with these findings, they observe that the natural response to an anterior tactile stimulus is similarly altered in DFz2 knockdown animals. The animals respond with less backward movement and increased fast forward motion. These results suggest that alterations in the innervation pattern of wave motor command neurons are sufficient to switch behavioral response programs.

      Strengths

      The authors convincingly demonstrate the importance of Wnt signaling for anterior-posterior axon guidance of a single class of motor command neurons in the larval CNS. The demonstration that alteration of the expression level of a single axon guidance receptor is sufficient to not only alter the innervation pattern but to significantly modify the behavioral response program of the animal provides a potential entry point to understanding behavioral adaptations during evolution.

      Weaknesses

      While the authors demonstrate an alteration of the behavioral response to a natural tactile stimulus, the observed effects, a reduction of backward motion and increased fast-forward locomotion, currently cannot be directly correlated to the morphological alterations observed in the single-neuron analyses. The authors do not report any loss of innervation in the "normal" target region but only a small additional innervation of more posterior regions. An analysis of synaptic connectivity and/or a more detailed morphological analysis that is supported by a larger number of analyzed neurons both in control and experimental animals would further strengthen the confidence of the study. As the authors suggest an alteration of the command circuitry, a direct observation of the downstream activation pattern in response to selective optogenetic stimulation of anterior wave neurons would further strengthen their claims (analogous to Takagi et al., 2017, Figure 4).

      We sincerely thank the reviewer for their insightful comments, which were instrumental in improving our manuscript. In response to the reviewers’ suggestion, we have now studied Brp expression and demonstrate that the ectopically extending Wave axons in the posterior region do contain synapses (new Figure 2). This finding supports the idea that these axons are functionally connected to ectopic downstream circuits. 

      Additionally, we have increased the number of analyzed Wave clones in Figure 1F-J (WT and DFz2 KD) and new Figure 3C-G (WT; formerly Figure 2C-G) to strengthen the morphological analyses. We fully agree with the reviewer that “direct observation of the downstream activation pattern in response to selective optogenetic stimulation” would further reinforce our conclusions. However, this was not feasible in the current study since we found that the Wave-Gal4 driver used in this study, which drives expression during embryonic stages, does not drive sufficiently strong expression in the larvae to enable selective optogenetic stimulation (please see below for details). 

      Reviewer #2 (Public Review):

      Summary:

      The authors previously demonstrated that anterior-located a-Wave neurons (neuromeres A1-A3) extend axons anteriorly to connect to circuits inducing backward locomotion, while p-Wave axons (neuromeres A4-A7) project posteriorly to promote forward locomotion in Drosophila larvae. In the manuscript, the authors aim to determine the molecular mechanisms that wire the segmentally homologous Wave neurons distinctively, making them functionally different in modulating forward or backward locomotion. The genetic screen focused on Wnt/Fz-signaling due to its known anterior-to-posterior guidance roles in mammals and nematodes.

      Strengths:

      Knock-down (KD) DFz2 with two independent RNAi-lines caused ectopic posterior axon and dendrite extension for all a- and p-Wave neurons, with a-Wave axon extending into regions where p-Wave axons normally project. Both behavioral assays (optogenetic stimulation of all Wave neurons or tactile stimuli on heads using a von Frey filament) show that backward movement is reduced or absent and that the speed of evoked fast-forward locomotion is increased. This demonstrates that altered projections of Wave do alter behavior and the DFz2 KD phenotype is consistent with the potential aberrant wiring of a-Wave neurons to forward locomotion-promoting circuits instead of to backward locomotion-promoting circuits.

      The main conclusion, that Wnt/Fz-signaling is essential for the guidance of Wave neurons and in diversifying their projection pattern in a segment-specific manner, is further supported by the results showing that DFz2 gain of function causes shortening of a-Wave but not p-Wave axon extensions towards the posterior end and that KD of DFz4 causes axonal shortening only in A6-p-Wave neurons but does not affect dendrites or processes of other Wave neurons. A role for ligand Wnt4 is demonstrated by results indicating that, in Wnt4 mutants, posterior extension of a-Wave axons was elongated similar to DFz2 KD animals and p-Wave axon extension towards the posterior end was shortened similar to DFz4 KD animals. Finally, a DWnt4 gradient decreasing from the posterior (A8) to the anterior end (A2), similar to that described in other species, is supported by analyses of DWnt4 gene expression (using Wnt4 Trojan-Gal4) and protein expression (using antibodies). In contrast, DFz2 receptor levels seemed to decrease from the anterior (A2) to the posterior end (A5/6). Together the results support the conclusion that opposing Wnt/Fz ligand-receptor gradients contribute to the diversification of Wave neurons in a location-dependent manner and that DFz2 and DFz4 have opposing effects on axon extension.

      Weaknesses:

      Wave axon and dendrite projections are not exclusively determined by Wnt4, DFz2, and DFz4, and are likely to involve other Fz receptors, Wnt ligands, and other types of receptor-ligand signaling pathways. This is in part supported by the fact that Wnt4 loss of function also resulted in phenotypes that do not mimic DFz2 KD or DFz4 KD (Figures 3D, E, and F) and that other Fz/Wnt mutants caused Wave neuron phenotypes (Figure 1-supplement 2, D+E). This is not a weakness per se, since it doesn't affect the main conclusion of the manuscript. However, the description and analyses of the data, in particular for Figure 1-supplement 2 D, should be clarified in the legend. The numbers within the bars and the asterisks are not defined. It's presumed they refer to numbers of animals assessed and that the asterisks next to DFz2 and DFz4 indicate statistically significant differences. However, only one p-value is provided in the legend. It is also unclear whether p-values for the other mutants have not been determined or are non-significant. At least for mutants like Corin, which also exhibit altered axon projections, the p-values should be provided.

      We appreciate this reviewer’s careful attention to detail and intellectual curiosity. We apologize for the confusion caused by the statistical reporting in Figure 1 – figure supplement 2D. The numbers shown in the bars represent the number of neurons (i.e., Wave neurons from the left or right hemisphere). As mentioned in the Materials and Methods section, we applied a Chi-square test followed by Haberman's adjusted residual analysis to determine the statistical significance of each RNAi group. The p-value provided in the figure legend corresponds to the Chi-square test. P-values for Haberman's adjusted residual analysis were calculated for all RNAi groups, and groups without an asterisk are not statistically significant. We have clarified these points in the corresponding figure legend.
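
For readers unfamiliar with the two-step procedure named above, the following is a minimal sketch of a Chi-square test followed by Haberman's adjusted residuals for a contingency table. The function names and the example counts are hypothetical and are not taken from the manuscript; whether any multiple-comparison correction was applied to the residual p-values is not stated in the response, so none is applied here.

```python
import math

def adjusted_residuals(table):
    """Return (chi-square statistic, Haberman adjusted residuals) for an r x c table.

    Rows could be RNAi groups and columns phenotype categories (an assumption
    made for illustration only).
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            chi2 += (obs - exp) ** 2 / exp
            # Haberman's variance estimate for the raw residual (obs - exp)
            var = exp * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            out.append((obs - exp) / math.sqrt(var))
        residuals.append(out)
    return chi2, residuals

def two_sided_p(z):
    """Two-sided p-value for a standard-normal z score."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: [normal, altered] projections for three groups.
example = [[18, 2], [8, 12], [17, 3]]
chi2, res = adjusted_residuals(example)
cell_p = [[two_sided_p(z) for z in row] for row in res]
```

Each adjusted residual is approximately standard normal under independence, which is why a per-group p-value can be read from it after the omnibus test.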

      Figure 4 D, F. The gradient for Wnt4 was determined by comparison of expression levels of other segments to A8 but the gradient for DFz2 was by comparison to A2 and the data supports opposing gradients. However, for DFz2 (Figure 4, F) it seems that the gradient is bi-directional with the lowest being in A5 and increasing towards A2 as well as A8. Analysis should be performed in reference to A8 as well to determine if it is indeed bi-directional. While such a finding would not affect the interpretation of aWave neurons, it may impact conclusions about p-Wave neuron projections.

      We thank the reviewer for highlighting this interesting possibility. In response, we performed an additional analysis of the DFz2 gradient by comparing the signal from each neuromere to that from A8 (new Figure 5—figure supplement 3). This analysis confirmed that the gradient is indeed bidirectional. We revised the description of DFz2 expression accordingly in the revision. We believe this finding does not affect our main conclusions since only the anterior gradient is relevant for a-Wave axon guidance. 

      As discussed above, the DFz2 KD phenotypes are consistent with the potential aberrant wiring of a-Wave neurons to forward locomotion-promoting circuits instead of to backward locomotion-promoting circuits. However, since the axons and dendrites of both a-Wave and p-Wave neurons are affected, the actual dendritic and axonal contributions to the altered behavior remain elusive. The authors certainly considered a potential contribution of altered dendrite projection of a-Wave neurons to the phenotype, and their conclusion that altered axonal projections are involved is supported by the optogenetic experiment "bypassing" sensory input (albeit it seems unlikely that all Wave neurons are activated simultaneously when perceiving natural stimuli). However, the authors should also consider that altered perception and projection of p-Wave neurons may directly (e.g. extended p-Wave axon projections increase forward locomotion input, thereby overriding backward locomotion) or indirectly (e.g. feedback loops between forward and backward circuits) contribute to the altered behavioral phenotypes in both assays. It is probably noteworthy that the more complex behavioral alterations observed with mechanical stimulation are likely to also be caused by altered dendritic projections.

      We fully agree with the reviewer’s thoughtful interpretation. We have now included these important possibilities in the revised Discussion section. Specifically, we acknowledge that while the DFz2 knockdown phenotypes are consistent with aberrant wiring of a-Wave neurons to forward locomotion-promoting circuits, the contributions of both axonal and dendritic alterations remain unclear. We also recognize that altered perception and projection of p-Wave neurons may directly or indirectly contribute to the observed behavioral phenotypes, particularly in response to mechanical stimulation.

      Presynaptic varicosities of a-Wave neurons in DFz2 KD animals are indicated by orange arrows in Figure 1. However, no presynaptic markers have been used to confirm actual ectopic synaptic connections. At least the authors should more clearly define what parameters they used to "visually" define potential presynaptic varicosities. Some arrows seem to point to more "globular structures" but for several others, it's unclear what they are pointing at.

      As mentioned in our response to Reviewer #1, we have now performed Brp immunostaining to confirm the presence of ectopic synaptic connections (new Figure 2). This analysis supports the interpretation that the presynaptic varicosities observed in DFz2 knockdown animals represent actual synaptic sites. We also clarified in the figure legend the visual criteria used to identify potential presynaptic varicosities.

      Reviewing Editor (Recommendations For The Authors):

      There are a few major concerns that we recommend the authors address:

      (1) Neuroanatomy: The point about aberrant synaptic connectivity of a-Wave neurons following DFz2 knockdown could be substantiated. This could be done by using a presynaptic marker and showing ectopic posterior presynaptic sites (and/or reduced anterior presynaptic sites) in a-Wave neurons.

      As mentioned in our response to the public review, we now have used Brp as a presynaptic marker to quantify the number and distribution of presynaptic sites along the normal and ectopic a-Wave axons (new Figure 2). We show that ectopic posterior Wave axons do contain presynaptic sites.  

      (2) Gradient calculations: As detailed in the reviews below, the Dfz2 gradient looks like it may be bidirectional. Changing the way the gradient is calculated might help address this point.

      As mentioned in our response above, we now have recalculated the gradient by comparing the DFz2 signal to A8 and show that it indeed is bidirectional (new Figure 5—figure supplement 2; formerly Figure 4—figure supplement 2).

      (3)  Statistics and sample sizes: As detailed in the reviews, some of the statistical reporting could be improved. Further, increasing sample sizes could help bolster confidence in the data as well.

      As mentioned above, we have added a description of the sample sizes, asterisks, and p-values to the legend of Figure 1 – figure supplement 2. We also increased the sample sizes of single Wave neurons in control and DFz2 knock-down animals (Figure 1F-J (WT and DFz2 KD) and new Figure 3C-G (WT; formerly Figure 2C-G)).

      (4) It would help to include some discussion of the potential contributions of altered p-wave neurons to the observed phenotypes.

      As described above, we have added in the Discussion potential contributions of altered p-wave neurons to the observed phenotypes. 

      Reviewer #1 (Recommendations For The Authors):

      (1) In the current model the authors assume that posterior elongation of a-wave neuron connectivity (axonal projections) induces a loss of connectivity to their natural targets, as backward motion is no longer induced, and a gain of connectivity to posterior wave neuron targets. Is this at the cost of innervation of p-wave neurons, e.g. did these neurons now lose connectivity to their natural targets as well? Therefore, it would be very interesting if the authors would test the behavioral responses to tactile stimuli in the posterior parts of the animal - does the response pattern change?

      This is indeed an interesting possibility that p-Wave function is altered upon DFz2 knock-down and hence behavioral response to posterior touch is changed. However, it is technically challenging to test this with tactile stimuli, due to the difficulty of (1) distinguishing between normal and fast-forward locomotion and (2) delivering a posterior touch stimulus while the larva is moving forward, which is the default behavior of the larvae on an agar plate.

      As highlighted above, the authors should provide additional evidence that the circuit response to a-wave neurons is changed after a DFz2 knockdown. The authors should monitor the activation wave in response to optogenetic activation of anterior wave neurons - analogous to the data provided in Figure 4 of their 2017 paper. If this response is now switched for a-wave activation but not p-wave activation it would greatly support their claims and this data would be less ambiguous compared to the behavioral locomotion data.

      As described in our response to the public review, we attempted this approach but found that the in vitro optogenetics experiment is unfortunately not feasible due to relatively weak expression of R60G09-GAL4 in the larvae. Local activation of control a-Wave induced fictive backward locomotion only at low frequencies, making comparison with the experimental a-Wave very difficult. The MB120B-spGAL4 used in our 2017 study could not be employed in this study as it does not drive expression during the embryonic stages and thus cannot be used to knock down DFz2 during development.

      (2) Related to this point. Why would the normal "backward" circuitry of a-wave neurons be functionally suppressed in Dfz2 knockdowns? Do the authors observe reduced synaptic connectivity in these segments? Vesicle clustering of synaptotagmin or other presynaptic markers could be used as a first. As the innervation pattern is only extended by approximately one segment, it is surprising that the changes are so significant.

      We agree that these are important and interesting points, which remain to be explored in future studies. As described above, we have performed Brp immunostaining and showed that the posterior ectopic axons of a-Wave do contain synapses (new Figure 2). We also found a slight decrease in the number of synapses in the anterior region, which could partially contribute to the weaker activation of downstream neurons responsible for eliciting backward locomotion. Another possibility is that backward suppression occurs through lateral interaction among downstream circuits. Since forward and backward locomotion do not occur simultaneously, it is likely that the circuits driving these two behaviors are mutually inhibitory. Upon DFz2 knock-down in a-Wave, downstream neurons inducing fast-forward locomotion may become more strongly activated than those inducing backward locomotion, resulting in inhibition of the latter via a “winner-take-all” mechanism. Since these discussions are highly speculative, we chose not to include them in the revised manuscript.

      (3) The low number of neurons analyzed per segment is of slight concern. This is particularly the case for the control data set used in Figure 1 and Figure 2. As stated, the same datasets are used for both figures. However, at most 6 neurons were analyzed (and for two segments only 3). The control morphology may be more variable than indicated by this data.

      As mentioned above, we now have dissected 50 larvae each for the control and experimental groups, obtained seven and six clones respectively, and included these data in the revised manuscript. We apologize that the sample sizes are still relatively small but hope the reviewer understands the inherently low “hit rate” of the stochastic labelling method.

      It is somewhat curious that in Figure 1- Supplement 3 the authors report the same number of control clones per segment as in Figure 1/2 - is this simply a coincidence? And if this is an independent dataset why did the author use new controls here but not for Figure 2? It is clear that it is very difficult to generate this data but increasing the n-number beyond 3-6 per segment would significantly increase the confidence in the presented data.

      We apologize for the confusion. The data in Figure 1 – figure supplement 3 represent the innervation pattern of dendrites, not axons. We have corrected the figure caption accordingly. These data were obtained from the same samples used to analyze axonal innervation, as shown in the original version of Figure 1F-J.

      (2) The name of the RNAi lines should be indicated in Figure 1 and Figure Supplement 3 to facilitate reading - at least the precise names should be given in both figure legends.

      We have added these labels in the revised figure legends as requested.

      (3) In Figure 4E again the control numbers of Figure 1 for the A2-wave axon are reused. This does not seem appropriate as now a different Gal4 driver is used and a different method to induce individual neuronal clones. Both components may induce significant variability in expression or arborization. As only 3 clones for the wnt4 mutant condition are analyzed (and compared to 5 control clones), this data does not allow for strong conclusions. The authors clearly state the reuse and different methods in the legend of Figure 4 F/G but should also highlight it for the E panel.

      Here, we assume that the reviewer is referring to the former Figure 3 (now Figure 4). We have added a note in the legend that the control data, obtained using a different method, were reused in this panel.

      (4) The expression levels of DWnt4 and DFz2 were analyzed at the end of embryogenesis. At what developmental stage does the axonal extension of wave neurons take place? Is the gradient maintained throughout the first larval stages?

      Based upon the lateral view of Wave neurons in Figure 1—figure supplement 1D, we think that the axonal extension is already established by approximately 20 hr after egg laying. Previously, we performed Wnt4<sup>MI03717-Trojan-GAL4</sup> > GFP.nls immunostaining in the third instar larva and observed a similar gradient of GFP signals towards the posterior end of the ventral nerve cord (VNC). We have included this data in the revised manuscript (new Figure 5—figure supplement 1).

      (5) The authors state that either 2nd or 3rd instar larvae were used for the optogenetic experiments. This may induce unnecessary variation in their assay and should be avoided. As natural variance exists in larvae regarding forward stride duration, the comparison of "on" state forward stride duration between control and experimental genotype is potentially not the best measurement of effect size. What is the difference between OFF and ON stage within the control and experimental genotype? In both cases stride duration decreases but there may not be a significant difference between the delta of the two genotypes. Thus, the observed effect may in part be due to "slower" animals in the control pool. The authors should discuss this more carefully.

      We thank the reviewer for bringing up this critical issue. Indeed, the stride durations of control and DFz2 knock-down larvae differ slightly in the OFF condition, although this difference is not statistically significant. In addition, the effect of Wave activation on mean stride duration is -0.14 s in control versus -0.21 s in DFz2 knock-down, which we interpret as DFz2 knock-down resulting in stronger fast-forward locomotion upon Wave activation. We have incorporated this note in the corresponding figure legends (new Figure 6; formerly Figure 5).

      (6) While the study clearly provides convincing evidence for their model, the authors should tune down their conclusions in the discussion a little bit and highlight that parts of their discussion are speculative.

      We have revised the discussion as suggested.

      Reviewer #2 (Recommendations For The Authors):

      Although the optogenetic behavioral experiments strongly support that the altered axonal projections affect normal locomotion, simultaneous labeling of Wave neurons in DFz2 KD animals with presynaptic markers would strengthen the conclusion of ectopic connection of the extended axon with other circuits.

      Please see our response to your public review.

      Figure 1 K+L, Figure 2H, I, Figure 3 F+G: many of the individual data points are not visible in the Whisker plot- changing their color would be useful to visualize them better.

      We have changed the outline width of the box plots to make the individual data points visible.

      Figure 1-Supplement 2: In addition to the comments in the public review- a) the asterisk font size changes in the different panels, e.g. it is much smaller in G', b) font size in some graphs/legends should be increased - in particular in E the hyphenated letters in the genotypes are so small rendering them almost illegible.

      We have unified the font size to make them readable in the figure. We thank the reviewer for the suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      “This is an exploratory study that doesn't explore quite enough. Critically, the authors make a point of mentioning that neuronal firing properties vary across cell types, but only use baseline firing rate as a proxy metric for cell type. This leaves several important explorations on the table, not limited to the following:”

      1a: “Do waveform shape features, which can also be informative of cell type, predict the effect of stimulation?”

      To address this question, we modeled our approach to cell type classification after Peyrache et al. 2012. More specifically, we extracted two features from the mean unit waveforms—the valley-to-peak time (VP) and the peak half-width (PHW). These features were then used to classify units into two distinct clusters (k-means, clusters = 2, based on a strong prior from existing literature), representing putative excitatory and inhibitory neurons. Our approach recapitulated many of the same observations in Peyrache et al. 2012, namely (1) identification of two clusters (low PHW/VP: inhibitory, high PHW/VP: excitatory), (2) an ~80/20 ratio of excitatory/inhibitory neurons, and (3) greater baseline firing rates in the inhibitory vs. excitatory neurons. However, we did not observe a preferential modulation of one cell type compared to another (see newly created Figure 4). A description of this analysis and its takeaways has been incorporated into the manuscript.

      Change to Text:

      Created Figure 4 (Separation of presumed excitatory and inhibitory neurons by waveform morphology).

      Caption: (A) Two metrics were calculated using the averaged waveforms for each detected unit: the valley-to-peak width (VP) and peak half-width (PHW). (B) Scatterplot of the relationship between VP and PHW; note that units with identical metrics are overlaid. Using k-means clustering, we identified two distinct response clusters, representing presumed excitatory (E, blue) and inhibitory (I, red) neurons. The units from which the example waveforms were taken are outlined in black. Probability distributions for each metric are shown along the axes. (C) Total number of units within each cluster, separated by region. (D) Comparison of baseline firing rates, separated by cluster. (E) Percent of modulated units in each cluster. * p < 0.05, NS = not significant.

      Added a description of clustering methodology to lines 132-137: “We calculated two metrics from the averaged waveform from each detected unit: the valley-to-peak-width (VP) and the peak half-width (PHW) (Figure 4A); previously, these two properties of waveform morphology have been used to discriminate pyramidal cells (excitatory) from interneurons (inhibitory) in human intracranial recordings (Peyrache et al., 2012). Next, we performed k-means clustering (n = 2 clusters) on the waveform metrics, in line with previous approaches to cell type classification.”
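
The clustering step described above can be illustrated with a minimal sketch of k-means (Lloyd's algorithm) with two clusters on (VP, PHW) pairs. This is not the authors' code: the function name, the deterministic initialization, and the example values are assumptions made for illustration, and no feature scaling is applied because none is described in the response.

```python
import math

def kmeans(points, k=2, n_iter=100):
    """Plain Lloyd's algorithm on tuples of features, e.g. (VP_ms, PHW_ms).

    Initial centroids are k points evenly spaced through the sorted data
    (a deterministic choice made for this sketch).
    """
    if k < 2:
        raise ValueError("this sketch expects k >= 2")
    ordered = sorted(points)
    centroids = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

# Hypothetical waveform metrics in milliseconds: (valley-to-peak, peak half-width).
units = [(0.30, 0.30), (0.32, 0.31), (0.28, 0.29),
         (0.50, 0.51), (0.52, 0.50), (0.48, 0.52)]
labels, centroids = kmeans(units, k=2)
```

Which cluster is labeled putative excitatory or inhibitory is a separate interpretive step based on the cluster centroids; the sketch only returns cluster indices.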

      Added a section in the Results titled “Theta Burst Stimulation Modulates Excitatory and Inhibitory Neurons Equally”. Lines 370-378: “Using k-means clustering, we grouped neurons into two distinct clusters based on waveform morphology, representing neurons that were presumed to be excitatory (E) and inhibitory (I) (Figure 4B). Inhibitory (fast-spiking) neurons exhibited shorter waveform VP and PHW, compared with excitatory (regular-spiking) neurons (I cluster centroid: VP = 0.50ms, PHW = 0.51ms; E cluster centroid: VP = 0.32ms, PHW = 0.31ms), and greater baseline firing rates (U(N<sub>I</sub> = 23, N<sub>E</sub> = 133) = 1074.50, p = 0.023) (Figure 4D). Although we observed a much greater proportion of excitatory vs. inhibitory neurons (E: 85.3%, I: 14.7%), stimulation appeared to affect excitatory and inhibitory neurons equally, suggesting that one cell type is not preferentially activated over another (Figure 4E).”

      Modified discussion of the effects of stimulation on different cell types. Lines 475-483: “…To test these hypotheses directly, we clustered neurons into presumed excitatory and inhibitory neurons based on waveform morphology. In doing so, we observed ~85% excitatory and ~15% inhibitory neurons, which is very similar to what has been reported previously in human intracranial recordings (Cowan et al. 2024, Peyrache et al., 2012). Interestingly, stimulation appeared to modulate approximately the same proportion of neurons for each cell type (~30%), despite the differently-sized groups. Recent reports, however, have suggested that the extent to which electrical fields entrain neuronal spiking, particularly with respect to phase-locking, may be specific to distinct classes of cells (Lee et al., 2024).”

      1b:  “Is the autocorrelation of spike timing, which can be informative about temporal dynamics, altered by stimulation? This is especially interesting if theta-burst stimulation either entrains theta-rhythmic spiking or is more modulatory of endogenously theta-modulated units.”

      The reviewer is correct in suggesting that rate-modulation represents only one of many possible ways by which exogenous theta burst stimulation may influence neuronal activity. Indeed, intracranial theta burst stimulation has previously been shown to evoke theta-frequency oscillatory responses in local field potentials (Solomon et al. 2021), and other forms of stimulation (i.e., transcranial alternating current stimulation) may modulate the rhythm, rather than the rate, of neuronal spiking (Krause et al. 2019).

      To investigate whether stimulation altered rhythmicity in neuronal firing, we contrasted the spike timing autocorrelograms, as suggested. More specifically, we computed the pairwise differences in spike timing for each trial, separating spikes into the same pre-, during-, and post-stimulation epochs described in the manuscript (bin size = 5 ms, max lag = 250 ms), grouped neurons by whether they were modulated, and then contrasted the differences in the latencies of the peak normalized autocorrelation value between epochs. Only neurons with a firing rate of ≥ 1 Hz (n = 70/203, 34.5%) were included in this analysis since sparse firing resulted in noisy autocorrelation estimates. Subsequent statistical testing of the peak latency differences between pre-/during- and pre-/post-stimulation did not reveal any group-level differences (Mann-Whitney U tests, p > 0.05). Thus, we were not able to identify neuronal responses suggestive of altered rhythmicity (see Figure S5). A description of this analysis and its takeaways has been incorporated into the manuscript.

      Of note, there are two elements of the data that constrain our ability to detect modulation in the rhythm of firing. First, the baseline activity recorded across neurons modulated by stimulation was relatively low (i.e., median firing rate = 1.77 Hz). Second, stimulation often resulted in a suppression, rather than an enhancement, of firing rate. Taken together, the sparse firing afforded limited opportunity to characterize changes to subtle patterns of spiking. 

      Change to Text:

      Created Figure S5 (Analysis of modulation in spiking rhythmicity)

      Caption: (A) Representative autocorrelograms (ACG) for a single neuron. The pairwise differences in spike timing were computed for each trial and epoch (bin size = 5 ms, max lag = 250 ms), then smoothed with a Gaussian kernel. The peak in the normalized ACG across trials was computed for each epoch. (B) Kernel density estimate of the peak ACG lag, separated by epoch. (C) The peak ACG lags were split by whether the neuron was modulated (Mod) or unaffected by stimulation (NS = not significant) for each of the two contrasts: pre- vs. during-stim (left) and pre- vs. post-stim (right).

      Details about the autocorrelation methodology have been incorporated. Lines 166-172: “To investigate whether stimulation altered rhythmicity in neuronal firing, we analyzed the spike timing autocorrelograms. More specifically, we computed the pairwise differences in spike timing for each trial (bin size = 5 ms, max lag = 250 ms) and then contrasted the differences in the latencies of the peak normalized autocorrelation value between epochs (pre-, during-, post-stimulation). Only neurons with a firing rate of ≥ 1 Hz (n = 70/203, 34.5%) were included in this analysis since sparse firing resulted in noisy autocorrelation estimates.”
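
As an illustration of the autocorrelogram procedure described above, the sketch below bins within-trial pairwise spike-time differences (5 ms bins, lags below 250 ms), pools them across trials, and reports the lag of the peak bin. The function names are hypothetical, the normalization (dividing by the total pair count) is an assumption, and the Gaussian smoothing mentioned in the Figure S5 caption is omitted for brevity.

```python
def autocorrelogram(trials, bin_ms=5.0, max_lag_ms=250.0):
    """Normalized histogram of positive within-trial spike-time differences.

    `trials` is a list of spike-time lists (milliseconds) for one epoch.
    """
    n_bins = int(max_lag_ms / bin_ms)
    counts = [0] * n_bins
    for spikes in trials:
        spikes = sorted(spikes)
        for i, t0 in enumerate(spikes):
            for t1 in spikes[i + 1:]:
                lag = t1 - t0
                if lag >= max_lag_ms:
                    break  # later spikes in this trial are even farther away
                counts[int(lag // bin_ms)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins

def peak_lag_ms(acg, bin_ms=5.0):
    """Center (ms) of the bin with the largest normalized count."""
    peak = max(range(len(acg)), key=acg.__getitem__)
    return (peak + 0.5) * bin_ms

# Hypothetical neuron firing once every 125 ms (8 Hz) in a single trial.
acg = autocorrelogram([[0, 125, 250, 375, 500]])
lag = peak_lag_ms(acg)
```

Comparing `peak_lag_ms` between the pre-, during-, and post-stimulation epochs of the same neuron gives the latency differences that were then tested at the group level.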

      The results from contrasting the autocorrelograms are now mentioned briefly. Lines 297-298: “Stimulation, however, did not appear to alter the rhythmicity in neuronal firing, as measured by spiking autocorrelograms (Figure S5).”

      1c: “The authors reference the relevance of spike-field synchrony (30-55 Hz) in animal work, but ignore it here. Does spike-field synchrony (comparing the image presentation to post-stimulation) change in this frequency range? This does not seem beyond the scope of investigation here.”

      We agree that a further characterization of spike-field and spike-phase relationships may provide rich insights into more complex regional and interregional dynamics that may be altered by stimulation. Given that many metrics are biased by sample size (e.g., number of spikes), which can vary considerably, computing the pairwise phase consistency (PPC) between spikes and LFP is a preferred metric (Vinck et al. 2010). Although PPC is unbiased, its variance nonetheless increases considerably with low spike counts; pooling spike counts across trials, however, decouples the temporal relationship between spiking and the LFP phase for each trial, confounding results and yielding an unstable estimate.

      To determine whether such an analysis is indeed possible, we calculated the percentage of stimulation trials with ≥ 10 spikes in both the 1s pre- and post-stimulation epochs (a relatively low threshold for inclusion). Only a very small proportion of the total number of trials across all neurons met this criterion (2.5%). Thus, because of the sparse spiking in our data, we are unable to reliably characterize spike-field or spike-phase modulation in detected neurons.
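
The inclusion check described above can be sketched as follows. This is a minimal illustration, assuming spike times in seconds relative to stimulation onset; the function name, the stimulation duration passed as `stim_off`, and the example trials are hypothetical.

```python
def fraction_usable_trials(trials, stim_on, stim_off, window=1.0, min_spikes=10):
    """Fraction of trials with >= min_spikes in BOTH the pre- and post-stimulation windows.

    `trials` is a list of spike-time lists (seconds); the pre window is
    [stim_on - window, stim_on) and the post window is [stim_off, stim_off + window).
    """
    usable = 0
    for spikes in trials:
        pre = sum(stim_on - window <= t < stim_on for t in spikes)
        post = sum(stim_off <= t < stim_off + window for t in spikes)
        if pre >= min_spikes and post >= min_spikes:
            usable += 1
    return usable / len(trials) if trials else 0.0

# Hypothetical trials: one with dense firing around stimulation, one sparse.
dense = [-0.95 + 0.09 * i for i in range(10)] + [1.05 + 0.09 * i for i in range(10)]
sparse = [-0.5, 1.5]
share = fraction_usable_trials([dense, sparse], stim_on=0.0, stim_off=1.0)
```

A very small value of this fraction, as reported above, means that per-trial spike-phase estimates would rest on too few spikes to be stable.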

      Change to Text:

      In the manuscript, we have added a description of why our data is not well-suited to investigate these relationships.

      Lines 532-538: “The present study did not investigate interactions between spiking activity and local field potentials because neuronal spiking was sparse at baseline and often further suppressed by stimulation; only a very small proportion of the total number of trials across all neurons exhibited ≥ 10 spikes in both the 1s pre- and post-stimulation epochs (~2.5%). Although certain metrics are not biased by sample size (e.g., pairwise phase consistency), low spike counts can dramatically affect variance and, therefore, result in unstable estimates (Vinck et al., 2011).”

      1d: “How does multi-unit activity respond to stimulation? At this somewhat low count of neurons (total n=156 included) it would be valuable to provide input on multi-unit responses to stimulation as well.”

      We thank the reviewer for this suggestion. We have incorporated an analysis of multiunit activity (MUA), which similarly identifies robust modulation via permutation-based statistical testing and characterizes the different profiles of responses (i.e., increased vs. decreased MUA threshold crossings pre- vs. post-stimulation).

      Change to Text:

      Created Figure S8 (Analysis of multiunit activity response to stimulation)

      Caption: (A) Example trace of multiunit activity (MUA) in one channel during a single stimulation trial. Threshold crossings are highlighted with a pink dot overlaid on the MUA signal with a corresponding hash below. (B) The percentage of channels with significantly modulated MUA, separated by the direction of effect. (C) The percentage of channels with significantly modulated MUA, separated by direction of effect and region. Inc (red; post > pre) vs. Dec (blue; post < pre). HIP = hippocampus, OFC = orbitofrontal cortex, AMY = amygdala, ACC = anterior cingulate cortex. *** p < 0.001, NS = not significant.

      Details about the MUA methodology have been incorporated. Lines 174-180: “Finally, we measured modulation in multiunit activity (MUA) by filtering the microelectrode signals in a 300-3,000 Hz window and counting the number of threshold crossings. Thresholds were determined on a per-channel basis and defined as -3.5 times the root mean square of the signal during the baseline period; activity during stimulation was excluded since stimulation artifact is difficult to separate from MUA in the absence of spike sorting.”
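
The thresholding rule described above can be sketched as below. The sketch assumes the signal has already been band-pass filtered (300-3,000 Hz) and counts a crossing each time the trace moves from above the threshold to at or below it; that crossing definition, the function names, and the example values are assumptions for illustration.

```python
import math

def rms(samples):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))

def count_threshold_crossings(signal, baseline, k=-3.5):
    """Count downward crossings of k * RMS(baseline) in an already-filtered signal.

    Returns (number of crossings, threshold). With k = -3.5 the threshold is
    negative, so a crossing is a sample at or below it preceded by one above it.
    """
    threshold = k * rms(baseline)
    crossings = sum(1 for prev, cur in zip(signal, signal[1:])
                    if prev > threshold >= cur)
    return crossings, threshold

# Hypothetical filtered samples (arbitrary units); baseline RMS is 1.0 here.
baseline = [1.0, -1.0] * 50
post_stim = [0.0, -4.0, 0.0, -1.0, -5.0, -6.0, 0.0]
n_cross, thr = count_threshold_crossings(post_stim, baseline)
```

Counting only the transition into the sub-threshold region keeps a single multi-sample deflection from being counted more than once.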

      MUA results are now incorporated. Lines 365-367: “Additional characterization of MUA revealed a dominant signature of increased activity post- vs. pre-stimulation, in line with these trends observed at the single-neuron level (Figure S8).”

      1e: “Several intracranial studies have implicated proximity to white matter in determining the effects of stimulation on LFPs; do the authors see an effect of white matter proximity here?”

      We thank the reviewer for the interesting question. Subsequent characterization revealed only small differences in the proximity of stimulation contacts to white matter (range 1.5-8.0 mm), likely because the chosen target (i.e., basolateral amygdala) has several nearby white matter structures (e.g., stria terminalis). Nonetheless, we performed a linear regression between the proximity to white matter and the stimulation-induced effect on behavior (stimulation vs. no-stimulation d’ difference), the results of which indicate no clear association (p > 0.05; see Figure S9). Critically, this is not to suggest that white matter proximity has no interaction with the reported behavioral effects, but rather, that we could not identify such an association within our data.

      Change to Text:

      Created Figure S9 (The effect of stimulation proximity to white matter and distance to recorded neurons).

      Caption: (A) Kernel density estimate of the Euclidean distance from stimulation contacts to nearest WM structure (in mm); hash marks represent individual observations. (B) The change in memory performance (Δd’) was linearly regressed onto the distance from the stimulated contacts to white matter.

      The following has been added to lines 405-426: “Proximity to white matter has been shown to influence the effects of stimulation on behavior and the strength of evoked responses (Mankin et al., 2021; Mohan et al., 2020; Paulk et al., 2022). Across all stimulated contacts, we observed only small differences in the proximity of stimulation contacts to white matter (median = 4.5 mm, range = 1.5-8.0 mm), likely because the chosen target (i.e., basolateral amygdala) has several nearby white matter structures (e.g., stria terminalis). Nonetheless, we performed a linear regression between the proximity to white matter and the stimulation-induced effect on behavior (stimulation vs. no-stimulation d’ difference), the results of which indicate no clear association (p > 0.05; see Figure S9).”
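
A minimal sketch of the regression described above is given below: an ordinary least-squares fit of the change in memory performance onto distance to white matter. The function name and the example numbers are hypothetical; the significance test on the slope (which needs a t distribution) is not reproduced here.

```python
import math

def ols_fit(x, y):
    """Simple least-squares line y = intercept + slope * x.

    Returns (slope, intercept, Pearson r). Assumes x and y each vary.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical sessions: distance to white matter (mm) and change in d'.
distance_mm = [1.5, 3.0, 4.5, 6.0, 8.0]
delta_dprime = [0.2, -0.1, 0.3, 0.0, 0.1]
slope, intercept, r = ols_fit(distance_mm, delta_dprime)
```

With a narrow range of distances, as noted above, the slope estimate is poorly constrained, which is consistent with the cautious interpretation given in the response.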

      Comment 2: “It is a little confusing to interpret stimulation-induced modulation of neuronal spiking in the absence of stimulation-induced change in behavior. How do the authors' findings tell us anything about the neural mechanisms of stimulation-modulated memory if memory isn't altered? In line with point #1, I would suggest a deeper dive into behavior (e.g. reaction time? Or focus on individual sessions that do change in Figure 4A?) to make a stronger statement connecting the neural results to behavioral relevance.”

      We agree that the connection between the observed stimulation-induced neuronal modulation and effects on behavior is unclear and has proven challenging to elucidate. Per the reviewer’s suggestion, we further focused our analyses on the neuronal modulation effects in the individual sessions that resulted in a robust change in memory performance (stimulation vs. no-stimulation d’ difference threshold of ± 0.5, based on a moderate effect size for Cohen’s d); both a positive and negative threshold were used to capture robust changes in memory performance associated with firing rate modulation, whether enhancement or suppression. To this end, we contrasted the proportion of modulated neurons in the sessions where stimulation resulted in a robust behavioral change (Δd’) with those that did not (~d’). We did not observe a difference in the proportions between groups when collapsed across all sampled regions, or when separately evaluated (Fisher’s exact tests, p > 0.05; see Figure 5C).

      Given that this approach did not further clarify the connection between our neural and behavioral results, we believe it is most appropriate to deemphasize claims in the manuscript regarding the potential insights for behavioral modulation (e.g., memory enhancement), and have done so.
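
      As a rough illustration of the proportion contrast described in this response, the sketch below implements a two-sided Fisher's exact test for a 2x2 table using only the Python standard library. The function name and the unit counts in the usage lines are hypothetical and are not taken from the study; the authors' actual code may differ.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table (with the same
    margins) that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts (illustrative only): modulated vs. non-modulated units
# in sessions with a robust behavioral change and in sessions without one.
p_value = fisher_exact_two_sided(12, 38, 15, 41)
print(f"p = {p_value:.3f}")
```

      Fisher's exact test is a common choice when some cells of the table are small, which is plausible when units are further split by region.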

      Change to Text:

      Toned down reference to the memory-related effects of stimulation in the abstract by removing the following lines from the abstract: “Previously, we demonstrated that intracranial theta burst stimulation (TBS) of the basolateral amygdala (BLA) can enhance declarative memory, likely by modulating hippocampal-dependent memory consolidation…” and “…and motivate future neuromodulatory therapies that aim to recapitulate specific patterns of activity implicated in cognition and memory.”

      Changed Figure 4 to Figure 5

      Created Figure 5C (Interaction between behavioral effects and neuronal modulation).

      Caption: (C) Change in recognition memory performance was split into two categories using a d’ difference threshold of ± 0.5: responder (positive or negative; Δd’, pink) and non-responder (~d’, grey). Individual d’ scores are shown (left) with points colored by outcome category; dotted lines demarcate category boundaries, and the grey-shaded region represents negligible change. The number of sessions within each outcome category (middle) and the proportion of modulated units as a function of outcome category, separated by region (right), are also shown. NS = not significant.

      The description of the behavioral results has been updated. Lines 394-403: “At the level of individual sessions, we observed enhanced memory (Δd’ > +0.5) in 36.7%, impaired memory (Δd’ < -0.5) in 20.0%, and negligible change (-0.5 ≤ Δd’ ≤ 0.5) in 43.3% when comparing performance between the stim and no-stim conditions; a threshold of Δd’ ± 0.5 was chosen for this classification based on the defined range of a “medium effect” for Cohen’s d. To test our hypothesis that neuronal modulation would be associated with changes in memory performance, we combined the sessions that resulted in either memory enhancement or impairment and contrasted the proportion of modulated units across regions sampled. We did not, however, observe a meaningful difference in the proportion of modulated units when grouped by behavioral outcome (all contrasts p > 0.05) (Figure 5C).”

      Lines 213-214 and 394-397 have been edited to reflect a change in the d’ threshold used for categorizing behavioral results (from Δd’ ± 0.2 to Δd’ ± 0.5).
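
      The session-level classification described above can be sketched as follows. This is a minimal example under stated assumptions: the function names are hypothetical, and the log-linear correction used to keep hit and false-alarm rates away from 0 and 1 is a common convention rather than something stated in the response.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Assumption: a log-linear correction (add 0.5 to each count) is applied
    so that rates of exactly 0 or 1 do not produce infinite z-scores.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)


def classify_session(delta_d_prime, threshold=0.5):
    """Label a session by its stim minus no-stim d' difference.

    Mirrors the quoted rule: > +0.5 enhanced, < -0.5 impaired,
    and -0.5 <= delta <= 0.5 negligible.
    """
    if delta_d_prime > threshold:
        return "enhanced"
    if delta_d_prime < -threshold:
        return "impaired"
    return "negligible"


# Hypothetical trial counts for one session (illustrative only).
stim = d_prime(hits=18, misses=2, false_alarms=4, correct_rejections=16)
no_stim = d_prime(hits=14, misses=6, false_alarms=5, correct_rejections=15)
print(classify_session(stim - no_stim))
```

      Sessions labeled "enhanced" or "impaired" would together form the responder group (Δd’) and the rest the non-responder group (~d’).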

      Comment 3: “It is not clear to me why the assessment of firing rates after image onset and after stim offset is limited to one second - this choice should be more theoretically justified, particularly for regions that spike as sparsely as these.”

      We thank the reviewer for this question and acknowledge that no clear justification was provided for this decision in the manuscript. We limited each of the analysis epochs to 1 s for two reasons. First, the maximum possible length of the during-stimulation epoch was 1 s (stim on for 1 s). Although the pre- and post-stimulation epochs could be extended without issue, we were concerned that variable time windows could introduce a bias, for instance, resulting in different variances between epochs. Second, we anticipated, both from empirical observations and prior literature, that the neural response following stimulation or task features (e.g., image onset/offset) was likely to be transient, rather than sustained for a period of many seconds. By keeping the windows short, we ensured that our approach to detecting modulation (i.e., contrasting trial-wise spike counts between each pair of epochs) captured the intended effect rather than random noise. We have incorporated a discussion of this rationale in the Peri-Stimulation Modulation Analyses section.

      Change to Text:

      Lines 156-158 have been added: “Each epoch was constrained to 1 s to ensure that subsequent firing rate contrasts were unbiased and to capture potential transient effects (e.g., image onset/offset).”

      Comment 4: “This work coincides with another example of human intracranial stimulation investigating the effect on firing rates (doi: https://doi.org/10.1101/2024.11.28.625915). Given how incredibly rare this type of work is, I think the authors should discuss how their work converges with this work (or doesn't).”

      Thank you for bringing this highly relevant work to our attention. We were unaware of this recent preprint and have incorporated a discussion of its main findings into the manuscript.

      Change to Text:

      New citations: van der Plas et al. 2024 (bioRxiv), Cowan et al. 2024 (bioRxiv)

      The discussion of related studies has been updated. Lines 447-457: “Few studies, however, have characterized the impact of electrical stimulation via macroelectrodes on the spiking activity of human cortical neurons, none of which involve intracranial theta burst stimulation. One study reported a long-lasting reduction in neural excitability among parietal neurons, with variable onset time and recovery following continuous transcranial TBS in non-human primates (Romero et al., 2022). In a similar vein, it was recently shown that human neurons are largely suppressed by single-pulse electrical stimulation (Cowan et al., 2024; Plas et al., 2024). Other emerging evidence suggests that transcranial direct current stimulation may entrain the rhythm rather than rate of neuronal spiking (Krause et al., 2019) and that stimulation-evoked modulation of spiking may meaningfully impact behavioral performance on cognitive tasks (Fehring et al., 2024).”

      Comment 5: “What information does the pseudo-population analysis add? It's not totally clear to me.”

      We recognize the need to further contextualize the motivation for the exploratory pseudo-population analysis and thank the reviewer for bringing the lack of detail to our attention. In brief, the analysis allowed us to observe trends in activity across populations of neurons, which, in principle, are not visible by characterizing modulation solely in discrete neurons. Additional details have been incorporated into the manuscript, as suggested.

      Change to Text:

      Additional justification has been incorporated in the description of the methodology. Lines 185-187: “…This approach enables the identification of dominant patterns of coordinated neural activity that may not be apparent when examining individual neurons in isolation.”, lines 192-194: “…By collapsing across subjects into a common pseudo-population, this analysis provides a mesoscale view of how stimulation modulates shared activity patterns across anatomically distributed neural populations.”

      A summary interpretation has been added to the paragraph describing the results. Lines 326-328: “Taken together, these analyses reveal global structure in the state space of responses to BLA stimulation within hippocampal circuits.”

      Reviewer #2 (Public review):

      Comment 1: “Authors suggest that the units modulated by stimulation are largely distinct from those responsive to image offset during trials without stimulation. The subpopulation that responds strongly also tends to have a higher baseline of firing rate. It's important to add that the chosen modulation index is more likely to be significant in neurons with higher firing rates.”

      This is an important point that was not previously addressed in our manuscript. We suspect there are likely two factors at play worth considering with respect to our chosen nonparametric modulation index: neurons with lower activity require smaller changes in spike counts to be significantly modulated (easier to flip ranks), and neurons with higher activity empirically exhibit greater absolute shifts in the number of spikes. Our further use of permutation testing, while mitigating false positives, may also somewhat constrain the ability to detect modulation in sparsely active neurons. Nonetheless, given that many trials entailed few or no spikes, we believe this approach is preferable to alternatives that may be more susceptible to noise (e.g., percent change in trial-averaged firing rate from baseline).

      To better understand the tradeoffs with detection probability, we performed a sensitivity analysis. We generated synthetic data with different baseline firing rates (0.1-5.0 Hz) and effect sizes (± 0.1-0.7 Hz) and simulated the likelihood of detection with our given modulation index across neurons. The results of the simulation support the notion that the probability of detecting modulation is lower for sparsely active neurons (Figure S7C). Further discussion of this consideration for the chosen modulation index, as well as details regarding the sensitivity analysis, have been incorporated into the manuscript.
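
      A detection-probability simulation of this general kind could look like the sketch below. It is not the authors' modulation index: as stand-in assumptions, it draws Poisson spike counts in 1 s epochs and uses a paired sign-flip permutation test on trial-wise count differences. The trial count, number of synthetic neurons, number of permutations, and all function names are hypothetical. Firing rates are clipped at zero, so a suppressive effect larger than the baseline rate cannot be fully expressed.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson count with Knuth's method (adequate for small lam)."""
    limit, count, product = math.exp(-lam), 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def sign_flip_p_value(diffs, rng, n_perm=200):
    """Two-sided paired permutation test on the summed trial-wise difference."""
    observed = abs(sum(diffs))
    as_extreme = 0
    for _ in range(n_perm):
        permuted = abs(sum(d if rng.random() < 0.5 else -d for d in diffs))
        if permuted >= observed:
            as_extreme += 1
    return (as_extreme + 1) / (n_perm + 1)


def detection_probability(base_hz, effect_hz, n_trials=40, n_neurons=100,
                          alpha=0.05, seed=0):
    """Fraction of synthetic neurons flagged as modulated at level alpha."""
    rng = random.Random(seed)
    post_hz = max(base_hz + effect_hz, 0.0)  # rates cannot go below zero
    detected = 0
    for _ in range(n_neurons):
        pre = [poisson(rng, base_hz) for _ in range(n_trials)]
        post = [poisson(rng, post_hz) for _ in range(n_trials)]
        diffs = [after - before for before, after in zip(pre, post)]
        if sign_flip_p_value(diffs, rng) < alpha:
            detected += 1
    return detected / n_neurons


# Sweep a few baseline rates for one suppressive effect size (illustrative).
for base_hz in (0.1, 1.0, 5.0):
    print(base_hz, detection_probability(base_hz, effect_hz=-0.5))
```

      How detection probability varies with baseline rate in such a simulation depends on the noise model, the test statistic, and whether effects are specified in absolute or relative terms, so this sketch should not be read as reproducing Figure S7C.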

      Change to Text:

      Created Figure S7C (Detection probability analysis)

      Caption: The same permutation-based analyses reported in the manuscript were repeated under different control conditions… (C) Visualization of the predicted probability of detecting modulation across synthetic neurons with variable firing rates and modulation effect sizes; FR = firing rate.

      Lines 223-224 have been added to the Methods section titled “Firing Rate Control Analyses”: “We performed a series of control analyses to test whether our approach to firing rate detection was robust…”

      A description of the simulation has been incorporated into the same section as above. Lines 234-237: “Finally, to better understand the tradeoffs with our statistical approach, we generated synthetic data with different baseline firing rates (0.1-5.0 Hz) and effect sizes (± 0.1-0.7 Hz), then simulated the likelihood of detecting modulation across variable conditions (Figure S7C).”

      The description of the results from the control analyses has been updated. Lines 330-339: “Finally, we performed three supplementary analyses to evaluate the robustness of our approach to detecting firing rate modulation: a sensitivity analysis assessing the proportion of modulated units at different firing rate thresholds for inclusion/exclusion, a data dropout analysis designed to control for the possibility that non-physiological stimulation artifacts may preclude the detection of temporally adjacent spiking, and a synthetic detection probability analysis. These results recapitulate our observation that units with higher baseline firing are most likely to exhibit modulation (though the probability of detecting modulation is lower for sparsely active neurons) and suggest that suppression in firing rate is not solely attributable to amplifier saturation following stimulation (Figure S7).”

      Comment 2: “Readers can benefit from understanding with more details the locations chosen for stimulation - in light of previous studies that found differences between effects based on proximity to white matter (For example - PMID 32446925, Mohan et al, Brain Stimul. 2020 and PMID 33279717 Mankin et al Brain Stimul. 2021).”

      This has been addressed in the above response to Reviewer 1’s comment 1.1e.

      Change to Text:

      See changes related to Reviewer 1 comment 1.1e.

      Comment 3: “Missing information in the manuscript…”

      3a: “Images of stimulation anatomical locations for all subjects included in this study. Ideally information about the impedance of the contacts to be able to calculate the actual current used.”

      As requested, we have provided an image from the coronal T1 MRI sequence, which highlights the position of the stimulated contacts for each of the 16 patients. Though we did not measure the impedances directly, the stimulation was current-controlled, which ensured that the desired current and charge density were consistent regardless of the tissue or electrode impedance.

      Change to Text:

      Created Figure S1 (Anatomical location of stimulated electrodes).

      Caption: A coronal slice from the T1-weighted MRI scan is shown for each patient who participated in the study (n = 16). Electrode contacts within the same plane of the image are shown with blue circles, and the bipolar pair of stimulated contacts within the basolateral amygdala is highlighted in red.

      Lines 144-145 have been edited to reflect that the delivered stimulation was current-controlled: “Specifically, we administered current-controlled, charge-balanced, …”

      3b: “The studied population is epilepsy patients, and the manuscript lacks description of their condition, proximity to electrodes included in the study to pathological areas, and the number of units from each patient/hemisphere.”

      We agree that additional information regarding patient demographics, experimental details, and clinical characteristics would further contextualize this unique patient population. A new table has been included, which contains the following information: patient ID, sex, age, # experimental sessions, # SEEG leads (and # microelectrodes), # detected units (L vs. R hemisphere), and suspected seizure onset zone.

      Change to Text:

      Created Table S1 (Patient demographics and clinical characteristics).

      Lines 258-259 have been added: “…(see Table S1 for patient demographics).”

      3c: “I haven't seen any comments on code availability (calculating modulation indices and statistics) and data sharing.”

      For clarification, a section titled Resource Availability is already appended to the end of the manuscript following the Conclusion, which describes the data and code availability.

      Change to Text:

      None

      3d: “Small comment - Figure legend 3E - Define gray markers (non-modulated units?)”

      Thank you for highlighting this omission. We have updated the relevant figure caption.

      Change to Text:

      The following has been added to the Figure 3 caption: “…whereas units without a significant change in activity are shown in grey.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      When you search for something, you need to maintain some representation (a "template") of that target in your mind/brain. Otherwise, how would you know what you were looking for? If your phone is in a shocking pink case, you can guide your attention to pink things based on a target template that includes the attribute 'pink'. That guidance should get you to the phone pretty effectively if it is in view. Most real-world searches are more complicated. If you are looking for the toaster, you will make use of your knowledge of where toasters can be. Thus, if you are asked to find a toaster, you might first activate a template of a kitchen or a kitchen counter. You might worry about pulling up the toaster template only after you are reasonably sure you have restricted your attention to a sensible part of the scene.

      Zhou and Geng are looking for evidence of this early stage of guidance by information about the surrounding scene in a search task. They train Os to associate four faces with four places. Then, with Os in the scanner, they show one face - the target for a subsequent search. After an 8 sec delay, they show a search display where the face is placed on the associated scene 75% of the time. Thus, attending to the associated scene is a good idea. The questions of interest are "When can the experimenters decode which face Os saw from fMRI recording?" "When can the experimenters decode the associated scene?" and "Where in the brain can the experimenters see evidence of this decoding?" The answer is that the face but not the scene can be read out during the face's initial presentation. The key finding is that the scene can be read out (imperfectly but above chance) during the subsequent delay when Os are looking at just a fixation point. Apparently, seeing the face conjures up the scene in the mind's eye.

      This is a solid and believable result. The only issue, for me, is whether it is telling us anything specifically about search. Suppose you trained Os on the face-scene pairing but never did anything connected to the search. If you presented the face, would you not see evidence of recall of the associated scene? Maybe you would see the activation of the scene in different areas and you could identify some areas as search specific. I don't think anything like that was discussed here.

      You might also expect this result to be asymmetric. The idea is that the big scene gives the search information about the little face. The face should activate the larger useful scene more than the scene should activate the more incidental face, if the task was reversed. That might be true if the finding is related to a search where the scene context is presumed to be the useful attention guiding stimulus. You might not expect an asymmetry if Os were just learning an association.

      It is clear in this study that the face and the scene have been associated and that this can be seen in the fMRI data. It is also clear that a valid scene background speeds the behavioral response in the search task. The linkage between these two results is not entirely clear but perhaps future research will shed more light.

      It is also possible that I missed the clear evidence of the search-specific nature of the activation by the scene during the delay period. If so, I apologize and suggest that the point be underlined for readers like me.

      We have added text related to this issue, particularly in the discussion (page 19, line 6), and have also added citations of studies in humans and non-human primates showing a causal relationship between preparatory activity in prefrontal and visual cortex and visual search performance (page 6, line 16).

      Reviewer #2 (Public review):

      Summary:

      This work is one of the best instances of a well-controlled experiment and theoretically impactful findings within the literature on templates guiding attentional selection. I am a fan of the work that comes out of this lab and this particular manuscript is an excellent example as to why that is the case. Here, the authors use fMRI (employing MVPA) to test whether during the preparatory search period, a search template is invoked within the corresponding sensory regions, in the absence of physical stimulation. By associating faces with scenes, a strong association was created between two types of stimuli that recruit very specific neural processing regions - FFA for faces and PPA for scenes. The critical results showed that scene information that was associated with a particular cue could be decoded from PPA during the delay period. This result strongly supports the invoking of a very specific attentional template.

      Strengths:

      There is so much to be impressed with in this report. The writing of the manuscript is incredibly clear. The experimental design is clever and innovative. The analysis is sophisticated and also innovative. The results are solid and convincing.

      Weaknesses:

      I only have a few weaknesses to point out.

      This point is not so much of a weakness, but a further test of the hypothesis put forward by the authors. The delay period was long - 8 seconds. It would be interesting to split the delay period into the first 4 seconds and the last 4 seconds and run the same decoding analyses. The hypothesis here is that semantic associations take time to evolve, and it would be great to show that decoding gets stronger in the second delay period as opposed to the period right after the cue. I don't think this is necessary for publication, but I think it would be a stronger test of the template hypothesis.

      We conducted the suggested analysis, and we did not find clear evidence of differences in decoding scene information between the earlier and later portions of the delay period. This may be due to insufficient power when the data are divided, individual differences in when preparatory activation is the strongest, or truly no difference in activation over the delay period. More details of this analysis can be found in the supplementary materials (page 12, line 16; Figure S1).

      Typo in the abstract "curing" vs "during."

      Fixed.

      It is hard to know what to do with significant results in ROIs that are not motivated by specific hypotheses. However, for Figure 3, what are the explanations for ROIs that show significant differences above and beyond the direct hypotheses set out by the authors?

      We added reasoning for the other a priori ROIs in the introduction (page 4, line 26). There is substantial evidence suggesting that frontoparietal areas are involved in cognitive control, attentional control, and working memory. The ROIs we selected from frontal and parietal cortex are based on parcels within resting state networks defined by the 17-network atlases (Schaefer et al., 2018). The IFJ was defined by the HCP-MMP1 (Glasser et al., 2016). These regions are commonly used in studies of attention and cognitive control, and the exact ROIs selected are described in the section on “Regions of interest (ROI) definition”. While we have the strongest hypothesis for IFJ based on relatively recent work from the Desimone lab, the other ROIs in lateral frontal cortex and parietal cortex are also well documented in similar studies, although the exact computation being done by these regions during tasks can be hard to differentiate with fMRI.

      Reviewer #3 (Public review):

      The manuscript contains a carefully designed fMRI study, using MVPA pattern analysis to investigate which high-level association cortices contain target-related information to guide visual search. A special focus is hereby on so-called 'target-associated' information, that has previously been shown to help in guiding attention during visual search. For this purpose the authors trained their participants and made them learn specific target-associations, in order to then test which brain regions may contain neural representations of those learnt associations. They found that at least some of the associations tested were encoded in prefrontal cortex during the cue and delay period.

      The manuscript is very carefully prepared. As far as I can see, the statistical analyses are all sound and the results integrate well with previous findings.

      I have no strong objections against the presented results and their interpretation.

      Reviewer #1 (Recommendations for the authors):

      One bit of trivia. In the abstract, you should define IFJ on its first appearance in the text. You get to that a bit later.

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      I really don't have much to suggest, as I thought that this was a clearly written report that offered a clever paradigm and data that supported the conclusions. My only suggestion would be to split the delay period activity and test whether the strength of the template evolves over time. Even though fMRI is not the best tool for this, still you would predict stronger decoding in the second half of the delay period.

      Please see above for our response to the same comment.

      Reviewer #3 (Recommendations for the authors):

      I would just like to point out some minor aspects that might be worth improving before publishing this work.

      Abstract: While in general, the writing is clear and concise, I felt that the abstract of the manuscript was particularly hard to follow, probably because the authors at some point re-arranged individual sentences. For example, they write in line 12 about 'the preparatory period', but explain only in the following sentence that the preparatory period ensues 'before search begins'. This made it a bit hard to follow the overall logic and I think could easily be fixed. 

      We have addressed this comment and updated the abstract.

      Also in the abstract: 'The CONTENTS of the template typically CONTAIN...' sounds weird, no? Also, 'information is used to modulate sensory processing in preparation for guiding attention during search' sounds like a very over-complicated description of attentional facilitation. I'm not convinced either whether the sequence is correct here. Is the information really used to (first) modulate sensory processing (which is a sort of definition of attention in itself) to (then) prepare the guidance of attention in visual search?

      We have addressed this comment and updated the abstract.

      The sentence in line 7, 'However, many behavioral studies have shown that target-associated information is used to guide attention,...' (and the following sentence) assumes that the reader is somewhat familiar with the term 'target-associations'. I'm afraid that, for a naive reader, this term may only become fully understandable once the idea is introduced a bit later when mentioning that participants of the study were trained on face-scene pairings. I think it could help to give some very short explanation of 'target-associations' already when it is first mentioned. The term 'statistically co-occurring object pairs', for example, could be of great help here.

      Thank you for the suggestion. We have added it to the abstract.

      page 2, line 22: 'prefrotnal'

      Fixed.

      page 2, line 24/25: 'information ... can SUPPLANT (?) ... information'. (That's also a somewhat unfortunate repetition of 'information')

      Fixed.

      page 4, line 23-25: 'Working memory representations in lateral prefrontal and parietal regions are engaged in cognitive control computations that ARE (?) task non-specific but essential to their functioning'

      Fixed.

      page 7, line 1: maybe a comma before 'suggesting'?

      Fixed.

      page 7, line 14-16: Something seems wrong with this sentence: 'The distractor face was a race-gender match, which we previously FOUND MADE (?) target discrimination difficult enough to make the scene useful for guiding attention'

      We have addressed this comment and rewritten this part (now on page 7, line 18).

      Results / Discussion sections:

      In several figures, like in Fig3A, the three different IFJ regions, are grouped separately from the other frontal areas, which makes sense given the special role IFJ plays for representing task-related templates. However, IFJ is still part of PFC. I think it would be more correct to group the other frontal areas (like FEF vLPFC etc.) as 'Other Frontal' or even 'Other PFC'.

      We have made the changes based on the reviewer’s suggestion.

      In some of the Figures, e.g. Fig 3 and 5, I had the impression that the activation patterns of some conditions in vLPFC were rather close to the location of IFJ, which is just a bit posterior. I think I remember that functional localisers of IFJ can actually vary quite a bit in localisation (see e.g. in the Baldauf/Desimone paper). Also, I think it has been shown in the context of other regions, like the human FEF that its position when defined by localisation tasks is not always nicely and fully congruent with the respective labels in an atlas like the Glasser atlas. It might help to take this in consideration when discussing the results, particularly since the term vLPFC is a rather vague collection of several brain parcels and not a parcel name in the Glasser atlas. Some people might even argue that vLPFC in the broad sense contains IFJ, similar to how 'Frontal' contains IFJ (see above). How strong of a point do the authors want to make about activation in IFJ versus in vlPFC?

      We have now added text discussing the inability to truly differentiate between subregions of IFJ and other parts of vLPFC in the methods section on ROIs (page 25, line 13) and in the discussion (page 18, line 25). However, one might think that it is even more surprising given the likely imprecision of ROI boundaries that we see distinct patterns between the subregions of IFJ defined by Glasser HCP-MMP1 and the other vLPFC regions defined by the 17-network atlases. We do not wish to overstate the precision of IFJ regions, but note the ROI results within the context of the larger literature. We are sure that our findings will have to be reinterpreted when newer methods allow for better localization of functional subregions of the vLPFC in individuals.

      Given that the authors nicely explain in the introduction how important templates are in visual search, and given that FEF has such an important role in serially guiding saccades through visual search templates, I think it would be worth discussing the finding that FEF did not hold representation of these targets. Of course, this could be in part due to the specific task at hand, but it may still be interesting to note in the Discussion section that here FEF, although important for some top-down attention signals, did not keep representations of the 'search' templates. Is it because there is no spatial component to the task at hand (like proposed in Bedini 2021)?

      We have now added text directly addressing this point and citing the Bedini et al. paper in the discussion (page 18, line 18). Besides our current findings, the relationship between IFJ and FEF is really interesting and will hopefully be investigated more in the future.

      Page 18, line 5: 'we the(N) associated...'

      Fixed.

    1. Resubmitting Essays and Late Work Resubmitting Essays 1-3 That's right! You can resubmit Essays 1-3 for a different grade. I will provide feedback on assignments and essays that you can then use to improve your understanding of the content, writing ability, or critical thinking about the text. Essay resubmits are usually due by Week 17, but more information will be provided within the module and assignment page. Late Work I will expect that you will strive to complete each assignment by the due date. But I recognize that you are juggling a lot and there may be days when completing coursework is not your top priority. If you anticipate the need for an extension, please send me a message in advance (as much as possible) of the due date so I am aware of your situation. Propose an alternative due date that you feel is reasonable (I advise no more than 48 hours to ensure you do not get behind). I will reply with an agreed-upon due date to support your success. Receiving an extension/late points on discussion boards or social annotations will not be permitted. The very nature of discussions is to have a conversation around/about the content that is interactive and timely. For an asynchronous class (like this class), it's important to have "due by dates" so everyone has time to plan and participate. If students are interacting on discussion boards or social annotations past the due by dates, there is a chance that students (and I) will miss the awesome things you want to share because we will be focused on the next discussion or assignments. We want to read and engage with people in this class. Please make every effort to participate and engage in the weekly discussions.

      I think everything stated in this section is very fair, especially because I also work 30-40 hours a week. The rule about not getting extensions on discussion boards or social annotations makes sense and will help us to participate and get the most out of learning. I also like how we can give an alternative due date as long as it is reasonable because it gives us flexibility for our other classes and also for work and things outside of school.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this study by Li et al., the authors re-investigated the role of cDC1 for atherosclerosis progression using the ApoE model. First, the authors confirmed the accumulation of cDC1 in atherosclerotic lesions in mice and humans. Then, in order to examine the functional relevance of this cell type, the authors developed a new mouse model to selectively target cDC1. Specifically, they inserted the Cre recombinase directly after the start codon of the endogenous XCR1 gene, thereby avoiding off-target activity. Following validation of this model, the authors crossed it with ApoE-deficient mice and found a striking reduction of aortic lesions (numbers and size) following a high-fat diet. The authors further characterized the impact of cDC1 depletion on lesional T cells and their activation state. Also, they provide in-depth transcriptomic analyses of lesional in comparison to splenic and nodal cDC1. These results imply cellular interactions between lesion T cells and cDC1. Finally, the authors show that the chemokine XCL1, which is produced by activated CD8 T cells (and NK cells), plays a key role in the interaction with XCR1-expressing cDC1 and particularly in the atherosclerotic disease progression.

      Strengths:

      The surprising results on XCL1 represent a very important gain in knowledge. The role of cDC1 is clarified with a new genetic mouse model.

      Thank you

      Weaknesses:

      My criticism is limited to the analysis of the scRNAseq data of the cDC1. I think it would be important to match these data with published data sets on cDC1. In particular, the data set by Sophie Janssen's group on splenic cDC1 might be helpful here (PMID: 37172103; https://www.single-cell.be/spleen_cDC_homeostatic_maturation/datasets/cdc1). It would be good to assign a cluster based on the categories used there (early/late, immature/mature, at least for splenic DC).

      Thank you very much for your help. Using the scRNA seq data of Xcr1<sup>+</sup> cDC1 sorted from ApoE<sup>–/–</sup> mice, we re-annotated the populations, following the methodology proposed by Sophie Janssen's group. These results are presented in Figure S9 and Figure S10 and described in detail in the Results and Discussion section.

      Please refer to the Results section from line 264 to 284: “Using the scRNA seq data of Xcr1<sup>+</sup> cDC1 sorted from hyperlipidemic mice, we annotated the 10 populations as shown in Figure S9A, following the methodology from a previous study [41]. Ccr7<sup>+</sup> mature cDC1s (Cluster 3, 7 and 9) and Ccr7- immature cDC1s (remaining clusters) were identified across cDC1 cells sorted from aorta, spleen and lymph nodes (Figure S9B). Further stratification based on marker genes reveals that Cluster 10 is the pre-cDC1, with high expression level of CD62L (Sell) and low expression level of CD8a (Figure S9C). Cluster 6 and 8 are the proliferating cDC1s, which express high level of cell cycling genes Stmn1 and Top2a (Figure S9D). Cluster 1 and 4 are early immature cDC1s, and cluster 2 and 5 are late immature cDC1s, according to the expression pattern of Itgae, Nr4a2 (Figure S9E). Cluster 9 cells are early mature cDC1s, with elevated expression of Cxcl9 and Cxcl10 (Figure S9F). Cluster 3 and 7 as late mature cDC1s, characterized by the expression of Cd63 and Fscn1 (Figure S9G). As shown in Figure 5C and Figure S9, the 10 populations displayed a major difference of aortic cDC1 cells that lack in pre-cDC1s (cluster 10) and mature cells (cluster 3, 7 and 9). Interestingly, in hyperlipidemic mice splenic cDC1 possess only Cluster 3 as the late mature cells while the lymph node cDC1 cells have two late mature populations namely Cluster 3 and Cluster 7. In further analysis, we also compared splenic cDC1 cells from HFD mice to those from ND mice. As shown in Figure S10, HFD appears to impact early immature cDC1-1 cells (Cluster 1) and increases the abundance of late immature cDC1 cells (Cluster 2 and 5), regardless of the fact that all 10 populations are present in two origins of samples. We also found that Tnfaip3 and Serinc3 are among the most upregulated genes, while Apol7c and Tifab are downregulated in splenic cDC1 cells sorted from HFD mice”.  
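      As a reading aid, the cluster assignments listed in the quoted passage can be written as a small lookup table. This is a minimal sketch in Python that only transcribes the labels stated in the text. The names `CLUSTER_LABELS`, `MATURE_CLUSTERS`, and `clusters_with_label` are illustrative assumptions and are not taken from the authors' analysis code.

```python
# Cluster-to-label mapping transcribed from the quoted Results text.
# All names here are hypothetical and chosen for illustration only.
CLUSTER_LABELS = {
    1: "early immature cDC1",
    2: "late immature cDC1",
    3: "late mature cDC1",
    4: "early immature cDC1",
    5: "late immature cDC1",
    6: "proliferating cDC1",
    7: "late mature cDC1",
    8: "proliferating cDC1",
    9: "early mature cDC1",
    10: "pre-cDC1",
}

# Ccr7+ mature clusters as stated in the text (Clusters 3, 7 and 9).
MATURE_CLUSTERS = {3, 7, 9}


def clusters_with_label(label: str) -> list[int]:
    """Return the sorted cluster numbers that carry the given label."""
    return sorted(c for c, name in CLUSTER_LABELS.items() if name == label)


if __name__ == "__main__":
    for cluster in sorted(CLUSTER_LABELS):
        status = "mature" if cluster in MATURE_CLUSTERS else "not mature"
        print(cluster, CLUSTER_LABELS[cluster], status)
```

      Written this way, the statement that aortic cDC1 cells lack pre-cDC1s and mature cells corresponds to the absence of clusters 10, 3, 7 and 9 in the aortic sample.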

      Please refer to the Discussion section from line 380 to 385: “Based on the maturation analysis of the cDC1 scRNA seq data [41], our findings suggest that the aortic cDC1 cells display a major difference from those of spleen and lymph nodes by lacking the mature clusters, whereas lymph node cDC1 cells contain an additional Fabp5<sup>+</sup> S100a4<sup>+</sup> late mature Cluster. Our results also suggest that hyperlipidemia contributes to alteration in early immature cDC1 and in the abundance of late immature cDC1 cells, which was associated with dramatic change in gene expression of Tnfaip3, Serinc3, Apol7c and Tifab”.

      Reviewer #2 (Public review):

      This study investigates the role of cDC1 in atherosclerosis progression using Xcr1Cre-Gfp Rosa26LSL-DTA ApoE-/- mice. The authors demonstrate that selective depletion of cDC1 reduces atherosclerotic lesions in hyperlipidemic mice. While cDC1 depletion did not alter macrophage populations, it suppressed T cell activation (both CD4+ and CD8+ subsets) within aortic plaques. Further, targeting the chemokine Xcl1 (ligand of Xcr1) effectively inhibits atherosclerosis. The manuscript is well-written, and the data are clearly presented. However, several points require clarification:

      (1) In Figure 1C (upper plot), it is not clear what the Xcr1 single-positive region in the aortic root represents, or whether this is caused by unspecific staining. So I wonder whether Xcr1 single-positive staining can reliably represent cDC1. For accurate cDC1 gating in Figure 1E, Xcr1+CD11c+ co-staining should be used instead.

      The observed false-positive signal in the wavy structures within immunofluorescence Figure 1C (upper panel) results from the strong autofluorescence of elastic fibers, a major vascular wall component (alongside collagen). This intrinsic property of elastic fibers is a well-documented confounder in immunofluorescence studies [A, B].

      In contrast, immunohistochemistry (IHC) employs an enzymatic chromogenic reaction (HRP with DAB substrate) that generates a brown precipitate exclusively at antigen-antibody binding sites. Importantly, vascular elastic fibers lack endogenous enzymatic activity capable of catalyzing the DAB reaction, thereby preventing this source of false positivity in IHC.

      Given that Xcr1 is exclusively expressed on conventional type 1 dendritic cells [C], and considering that IHC lacks the multiplexing capability inherent to immunofluorescence for antigen co-localization, single-positive Xcr1 staining reliably identifies cDC1s in IHC results.

      [A] König, K et al. “Multiphoton autofluorescence imaging of intratissue elastic fibers.” Biomaterials vol. 26,5 (2005): 495-500. doi:10.1016/j.biomaterials.2004.02.059

      [B] Andreasson, Anne-Christine et al. “Confocal scanning laser microscopy measurements of atherosclerotic lesions in mice aorta. A fast evaluation method for volume determinations.” Atherosclerosis vol. 179,1 (2005): 35-42. doi:10.1016/j.atherosclerosis.2004.10.040

      [C] Dorner, Brigitte G et al. “Selective expression of the chemokine receptor XCR1 on cross-presenting dendritic cells determines cooperation with CD8+ T cells.” Immunity vol. 31,5 (2009): 823-33. doi:10.1016/j.immuni.2009.08.027

      (2) Figure 4D suggests that cDC1 depletion does not affect CD4+/CD8+ T cells. However, only the proportion of these subsets within total T cells is shown. To fully interpret effects, the authors should provide:

      (a) Absolute numbers of total T cells in aortas.

      (b) Absolute counts of CD4+ and CD8+ T cells.

      Thanks for your suggestions. We agree that assessing both proportions and absolute numbers in Figure 4 provides a more complete picture of the effects of cDC1 depletion on T cell populations. We have also added the absolute counts of cDC1 cells and total T cells, and the CD44 MFI (mean fluorescence intensity) in CD4<sup>+</sup> and CD8<sup>+</sup> T cells, to Figure 4, and have added corresponding descriptions to the revised manuscript.

      Please refer to the Results section from line 183 to 187: “Subsequently, we assessed T cell phenotype in the two groups of mice. While neither the frequencies nor absolute counts of aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells differed significantly between two groups of mice (Figure 4D-F), CD69 frequency and CD44 MFI (Mean Fluorescence Intensity), the T cell activation markers, were significantly reduced in both CD4<sup>+</sup> and CD8<sup>+</sup> T cells from Xcr1<sup>+</sup> cDC1 depleted mice compared to controls (Figure 4G and H)”.

      (3) How does T cell activation mechanistically influence atherosclerosis progression? Why was CD69 selected as the sole activation marker? Were other markers (e.g., KLRG1, ICOS, CD44) examined to confirm activation status?

      We sincerely appreciate these insightful comments. As extensively documented in the literature, activated effector T cells (both CD4+ and CD8+) critically promote plaque inflammation and instability through their production of pro-inflammatory cytokines (particularly IFN-γ and TNF-α), which drive endothelial activation, exacerbate macrophage inflammatory responses, and impair smooth muscle cell function [A].

      In our study, we specifically investigated the role of cDC1 cells in atherosclerosis progression. Our key findings demonstrate that cDC1 depletion attenuates T cell activation (as shown by reduced CD69/CD44 expression) and that this reduction in activation is functionally linked to the observed decrease in atherosclerosis burden in our model. 

      Regarding CD44 as an activation marker, we performed quantitative analyses of CD44 mean fluorescence intensity (MFI) in aortic T cells (Figure 4). Importantly, the MFI of CD44 was significantly lower on both CD4+ and CD8+ T cells from Xcr1<sup>Cre-Gfp</sup> Rosa26<sup>LSL-DTA</sup> ApoE<sup>–/–</sup> mice compared to the control ApoE<sup>–/–</sup> mice (data shown below), which is consistent with the result of CD69 in Figure 4. We added the related description in the Results section.

      Please refer to the Results section from line 185 to 187 “CD69 frequency and CD44 MFI (Mean Fluorescence Intensity), the T cell activation markers, were significantly reduced in both CD4+ and CD8+ T cells from Xcr1+ cDC1 depleted mice compared to controls (Figure 4G and H)”.

      Similarly, the MFI of CD44 was significantly lower on both CD4<sup>+</sup> and CD8<sup>+</sup> T cells from Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> mice compared to the control ApoE<sup>–/–</sup> mice (data shown below), which is consistent with the result of CD69 in Figure 7. We also added the related description in the Results section.

      Please refer to the Results section from line 308 to 309 “Crucially, CD69<sup>+</sup> frequency and CD44 MFI remained comparable in both aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells between two groups (Figure 7D-F).”

      [A] Hansson, Göran K, and Andreas Hermansson. “The immune system in atherosclerosis.” Nature immunology vol. 12,3 (2011): 204-12. doi:10.1038/ni.2001

      (4) Figure 7B: Beyond cDC1/2 proportions within cDCs, please report absolute counts of: Total cDCs, cDC1, and cDC2 subsets. Figure 7D: In addition to CD4+/CD8+ T cell proportions, the following should be included:

      (a) Total T cell numbers in aortas

      (b) Absolute counts of CD4+ and CD8+ T cells.

      Thanks for your suggestions. We have now included in Figure 7 the absolute counts of cDC, cDC1, and cDC2 cells, along with CD4<sup>+</sup> and CD8<sup>+</sup> T cells in aortic tissues. Additionally, we provide the corresponding CD44 mean fluorescence intensity (MFI) measurements for both CD4<sup>+</sup> and CD8<sup>+</sup> T cell populations. We added the related description in the Results section.

      Please refer to the Results section from line 303 to 311: “The flow cytometric results illustrated that both frequencies and absolute counts of Xcr1<sup>+</sup> cDC1 cells in the aorta were significantly reduced, but cDCs and cDC2 cells from Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> were comparable with that from ApoE<sup>–/–</sup> (Figure 7A-C). Moreover, in both lymph node and spleen, the absolute numbers of pDC, cDC1 and cDC2 from Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> were comparable with that from ApoE<sup>–/–</sup> (Figure S11). Crucially, CD69<sup>+</sup> frequency and CD44 MFI remained comparable in both aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells between two groups (Figure 7D-F). However, aortic CD8<sup>+</sup> T cells exhibited reduced frequency and absolute count, while CD4<sup>+</sup> T cells showed increased frequency but unchanged counts in Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> mouse versus controls (Figure 7G and H).”

      (5) cDC1 depletion reduced CD69+CD4+ and CD69+CD8+ T cells, whereas Xcl1 depletion decreased Xcr1+ cDC1 cells without altering activated T cells. How do the authors explain these different results? This discrepancy needs explanation.

      We sincerely appreciate your professional and insightful comments regarding the mechanistic relationship between cDC1 depletion and T cell activation. Direct cDC1 depletion in the Xcr1<sup>Cre-Gfp</sup> Rosa26<sup>LSL-DTA</sup> ApoE<sup>–/–</sup> mouse model removes both recruited and tissue-resident cDC1s, eliminating their multifunctional roles in antigen presentation, co-stimulation and cytokine secretion essential for T cell activation. In contrast, Xcl1 depletion reduces, but does not eliminate, cDC1 migration into plaques. Furthermore, alternative chemokine axes (e.g., CCL5/CCR5, CXCL9/CXCR3, BCL9/BCL9L) may partially rescue cDC1 recruitment [13, 68, 69], and non-cDC1 APCs (e.g., monocytes, cDC2s) may compensate for T cell activation [55, 70]. We emphasize that Xcl1 depletion specifically failed to alter T cell activation in hyperlipidemic ApoE<sup>–/–</sup> mice. However, its impact may differ in other pathophysiological contexts due to compensatory mechanisms. We thank you again for highlighting this nuance, which strengthens our mechanistic interpretation. We have added these points to the discussion section and included new references.

      Please refer to the Discussion section from line 407 to 413: “Notably, while complete ablation of Xcr1<sup>+</sup> cDC1s impaired T cell activation, reduction of Xcr1<sup>+</sup> cDC1 recruitment via Xcl1 deletion did not significantly compromise this process. This discrepancy may arise through compensatory mechanisms: alternative chemokine axes (e.g., CCL5/CCR5, CXCL9/CXCR3, BCL9/BCL9L) may partially rescue Xcr1<sup>+</sup> cDC1 homing [13, 68, 69], while non-cDC1 antigen-presenting cells (e.g., monocytes, cDC2s) may sustain T cell activation [55, 70]. Furthermore, tissue-specific microenvironment factors could potentially modulate its role in other diseases.”.

      [13] Eisenbarth, S C. “Dendritic cell subsets in T cell programming: location dictates function.” Nature reviews. Immunology vol. 19,2 (2019): 89-103. doi:10.1038/s41577-018-0088-1

      [55] Brewitz, Anna et al. “CD8+ T Cells Orchestrate pDC-XCR1+ Dendritic Cell Spatial and Functional Cooperativity to Optimize Priming.” Immunity vol. 46,2 (2017): 205-219. doi:10.1016/j.immuni.2017.01.003

      [68] de Oliveira, Carine Ervolino et al. “CCR5-Dependent Homing of T Regulatory Cells to the Tumor Microenvironment Contributes to Skin Squamous Cell Carcinoma Development.” Molecular cancer therapeutics vol. 16,12 (2017): 2871-2880. doi:10.1158/1535-7163.MCT-17-0341.

      [69] He F, Wu Z, Liu C, Zhu Y, Zhou Y, Tian E, et al. Targeting BCL9/BCL9L enhances antigen presentation by promoting conventional type 1 dendritic cell (cDC1) activation and tumor infiltration. Signal Transduct Target Ther. 2024;9(1):139. Epub 2024/05/30. doi: 10.1038/s41392-024-01838-9. PubMed PMID: 38811552; PubMed Central PMCID: PMCPMC11137111.

      [70] Böttcher, Jan P et al. “Functional classification of memory CD8(+) T cells by CX3CR1 expression.” Nature communications vol. 6 8306. 25 Sep. 2015, doi:10.1038/ncomms9306.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 32 - The authors might want to add that the mouse model leads to a "constitutive" depletion of cDC1.

      Thanks for your advice, we have revised the sentence as follows.

      Please refer to the Results section from line 31 to 33: “we established Xcr1<sup>Cre-Gfp</sup> Rosa26<sup>LSL-DTA</sup> ApoE<sup>–/–</sup> mice, a novel and complex genetic model, in which cDC1 was constitutively depleted in vivo during atherosclerosis development”.

      (2) Line 187-188: The authors claim that T cell activation was "inhibited" if cDC1 was depleted. The data shows that the T cells were less activated, but there is no indication of any kind of inhibition; this should be corrected.

      Thanks for your advice, we have revised the sentence as follows.

      Please refer to the Results section from line 183 to 187: “Subsequently, we assessed T cell phenotype in the two groups of mice. While neither the frequencies nor absolute counts of aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells differed significantly between two groups of mice (Figure 4D-F), CD69 frequency and CD44 MFI (Mean Fluorescence Intensity), the T cell activation markers, were significantly reduced in both CD4<sup>+</sup> and CD8<sup>+</sup> T cells from Xcr1<sup>+</sup> cDC1 depleted mice compared to controls (Figure 4G and H)”.

      (3) Why are some splenic DC clusters absent in LNs and vice versa? This is not obvious to this reviewer and should at least be discussed.

      We appreciate the insightful question regarding the absence of certain splenic DC clusters in LNs. This phenomenon in Figure 5 aligns with the 'division of labor' paradigm in dendritic cell biology: tissue microenvironments evolve specialized DC subsets to address local immunological challenges. The absence of universal clusters reflects functional adaptation, not technical artifacts. We acknowledge that this tissue-specific heterogeneity warrants further discussion and have expanded our analysis to address this point in the discussion part of our manuscript.

      Please refer to the Discussion section from line 375 to 385: “This pronounced tissue-specific compartmentalization of Xcr1<sup>+</sup> cDC1 subsets may related to multiple mechanisms including developmental imprinting that instructs precursor differentiation into transcriptionally distinct subpopulations [62], and microenvironmental filtering through organ-specific chemokine axes (e.g., CCL2/CCR2 in spleen) selectively recruits receptor-matched subsets [63, 64]. This spatial specialization optimizes pathogen surveillance for local immunological challenges. Based on the maturation analysis of the cDC1 scRNA seq data [41], our findings suggest that the aortic cDC1 cells display a major difference from those of spleen and lymph nodes by lacking the mature clusters, whereas lymph node cDC1 cells contain an additional Fabp5<sup>+</sup> S100a4<sup>+</sup> late mature Cluster. Our results also suggest that hyperlipidemia contributes to alteration in early immature cDC1 and in the abundance of late immature cDC1 cells, which was associated with dramatic change in gene expression of Tnfaip3, Serinc3, Apol7c and Tifab”.

      [62]. Liu Z, Gu Y, Chakarov S, Bleriot C, Kwok I, Chen X, et al. Fate Mapping via Ms4a3-Expression History Traces Monocyte-Derived Cells. Cell. 2019;178(6):1509-25 e19. Epub 2019/09/07. doi: 10.1016/j.cell.2019.08.009. PubMed PMID: 31491389.

      [63]. Bosmans LA, van Tiel CM, Aarts S, Willemsen L, Baardman J, van Os BW, et al. Myeloid CD40 deficiency reduces atherosclerosis by impairing macrophages' transition into a pro-inflammatory state. Cardiovasc Res. 2023;119(5):1146-60. Epub 2022/05/20. doi: 10.1093/cvr/cvac084. PubMed PMID: 35587037; PubMed Central PMCID: PMCPMC10202633.

      [64]. Mildner A, Schonheit J, Giladi A, David E, Lara-Astiaso D, Lorenzo-Vivas E, et al. Genomic Characterization of Murine Monocytes Reveals C/EBPbeta Transcription Factor Dependence of Ly6C(-) Cells. Immunity. 2017;46(5):849-62 e7. Epub 2017/05/18. doi: 10.1016/j.immuni.2017.04.018. PubMed PMID: 28514690.

      [41]. Bosteels V, Marechal S, De Nolf C, Rennen S, Maelfait J, Tavernier SJ, et al. LXR signaling controls homeostatic dendritic cell maturation. Sci Immunol. 2023;8(83):eadd3955. Epub 2023/05/12. doi: 10.1126/sciimmunol.add3955. PubMed PMID: 37172103.

      (4) The authors should discuss how XCL1 could impact lesional cDC1 and T cell abundance. Notably, preDCs do not express XCR1, and T cells express XCL1 following TCR activation. Is there a recruitment or local proliferation defect of cDC1 in the absence of XCL1? Could there also be a role for NK cells as a potential source of XCL1?

      We appreciate your insightful questions regarding the differential effects of Xcl1 on cDC1s and T cells. Xcl1 primarily mediates the recruitment of mature cDC1s. Our data demonstrate that Xcl1 deletion significantly reduces aortic cDC1 abundance, which correlates with a concomitant decrease in CD8<sup>+</sup> T cell numbers within the aorta. These findings strongly suggest that the Xcl1-Xcr1 axis plays a regulatory role in T cell accumulation in aortic plaques.

      Consistent with prior studies [A, B], cDC1 recruitment can occur in the absence of Xcl1, which echoes our finding that cDC1 cells were still present in Xcl1-knockout aortic plaques, although at lower abundance. Further studies are required to address how Xcl1-dependent and Xcl1-independent cDC1 cells activate T cells and whether they differ in their capacity to proliferate in tissue. We have added these points to the Discussion section.

      Please refer to the Discussion section from line 407 to 415: “Notably, while complete ablation of Xcr1<sup>+</sup> cDC1s impaired T cell activation, reduction of Xcr1<sup>+</sup> cDC1 recruitment via Xcl1 deletion did not significantly compromise this process. This discrepancy may arise through compensatory mechanisms: alternative chemokine axes (e.g., CCL5/CCR5, CXCL9/CXCR3, BCL9/BCL9L) may partially rescue Xcr1<sup>+</sup> cDC1 homing [13, 68, 69], while non-cDC1 antigen-presenting cells (e.g., monocytes, cDC2s) may sustain T cell activation [55, 70]. Furthermore, tissue-specific microenvironment factors could potentially modulate its role in other diseases. In summary, our findings identify Xcl1 as a potential therapeutic target for atherosclerosis therapy, though its cellular origins and regulation of lesional Xcr1<sup>+</sup> cDC1 and T cells dynamics require further studies”.

      In the literature, Xcl1 is reported to be expressed in NK cells and subsets of T cells. NK cells may therefore be a potential source of Xcl1 during atherosclerosis, which deserves further investigation [A, C, D].

      [A] Böttcher, Jan P et al. “NK Cells Stimulate Recruitment of cDC1 into the Tumor Microenvironment Promoting Cancer Immune Control.” Cell vol. 172,5 (2018): 1022-1037.e14. doi:10.1016/j.cell.2018.01.004

      [B] He, Fenglian et al. “Targeting BCL9/BCL9L enhances antigen presentation by promoting conventional type 1 dendritic cell (cDC1) activation and tumor infiltration.” Signal transduction and targeted therapy vol. 9,1 139. 29 May. 2024, doi:10.1038/s41392-024-01838-9

      [C] Woo, Yeon Duk et al. “The invariant natural killer T cell-mediated chemokine X-C motif chemokine ligand 1-X-C motif chemokine receptor 1 axis promotes allergic airway hyperresponsiveness by recruiting CD103+ dendritic cells.” The Journal of allergy and clinical immunology vol. 142,6 (2018): 1781-1792.e12. doi:10.1016/j.jaci.2017.12.1005

      [D] Winkels, Holger et al. “Atlas of the Immune Cell Repertoire in Mouse Atherosclerosis Defined by Single-Cell RNA-Sequencing and Mass Cytometry.” Circulation research vol. 122,12 (2018): 1675-1688. doi:10.1161/CIRCRESAHA.117.312513

      Reviewer #2 (Recommendations for the authors):

      There is a logical error in line 298. I suggest revising to: "Collectively, these data suggest that Xcl1 promotes atherosclerosis by recruiting Xcr1+ cDC1 cells, which subsequently drive T cell activation in lesions."

      Thanks for your advice. Since Xcl1 deficiency reduced both the frequencies and absolute counts of Xcr1+ cDC1 and CD8+ T cells in lesions without affecting T cell activation, we revised the sentence as you suggested.

      Please refer to the Results section from line 314 to 315: “Collectively, these data suggest that Xcl1 promotes atherosclerosis by recruiting Xcr1<sup>+</sup> cDC1 cells, and facilitating CD8<sup>+</sup> T cell accumulation in lesions”.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors developed a sequence-based method to predict drug-interacting residues in IDP, based on their recent work, to predict the transverse relaxation rates (R2) of IDP trained on 45 IDP sequences and their corresponding R2 values. The discovery is that the IDPs interact with drugs mostly using aromatic residues that are easy to understand, as most drugs contain aromatic rings. They validated the method using several case studies, and the predictions are in accordance with chemical shift perturbations and MD simulations. The location of the predicted residues serves as a starting point for ligand optimization.

      Strengths:

      This work provides the first sequence-based prediction method to identify potential drug-interacting residues in IDP. The validity of the method is supported by case studies. It is easy to use, and no time-consuming MD simulations and NMR studies are needed.

      Weaknesses:

      The method does not depend on the information of binding compounds, which may give general features of IDP-drug binding. However, due to the size and chemical structures of the compounds (for example, how many aromatic rings), the number of interacting residues varies, which is not considered in this work. Lacking specific information may restrict its application in compound optimization, aiming to derive specific and potent binding compounds.

      We fully recognize that different compounds may have different interaction propensity profiles along the IDP sequence. In future studies, we will investigate compound-specific parameter values. The limiting factor is training data, but such data are beginning to be available.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors introduce DIRseq, a fast, sequence-based method that predicts drug-interacting residues (DIRs) in IDPs without requiring structural or drug information. DIRseq builds on the authors' prior work looking at NMR relaxation rates, and presumes that those residues that show enhanced R2 values are the residues that will interact with drugs, allowing these residues to be nominated from the sequence directly. By making small modifications to their prior tool, DIRseq enables the prediction of residues seen to interact with small molecules in vivo.

      Strengths:

      The preprint is well written and easy to follow

      Weaknesses:

      (1) The DIRseq method is based on SeqDYN, which itself is a simple (which I do not mean as a negative - simple is good!) statistical predictor for R2 relaxation rates. The challenge here is that R2 rates cover a range of timescales, so the physical intuition as to what exactly elevated R2 values mean is not necessarily consistent with "drug interacting". Presumably, the authors are not using the helix boost component of SeqDYN here (it would be good to explicitly state this). This is not necessarily a weakness, but I think it would behove the authors to compare a few alternative models before settling on the DIRseq method, given the somewhat ad hoc modifications to SeqDYN to get DIRseq.

      Actually, the factors that elevate R2 are well-established. These are local interactions and residual secondary structures (if any). The basic assumption of our method is that intra-IDP interactions that elevate R2 convert to IDP-drug interactions. This assumption was supported by our initial observation that the drug interaction propensity profiles predicted using the original SeqDYN parameters already showed good agreement with CSP profiles. We only made relatively small adjustments to the parameters to improve the agreement. Indeed, we did not apply the helix boost portion of SeqDYN to DIRseq, and will state so explicitly. We will also compare DIRseq with several alternative models.

      Specifically, the authors previously showed good correlation between the stickiness parameter of Tesei et al and the inferred "q" parameter for SeqDYN; as such, I am left wondering if comparable accuracy would be obtained simply by taking the stickiness parameters directly and using these to predict "drug interacting residues", at which point I'd argue we're not really predicting "drug interacting residues" as much as we're predicting "sticky" residues, using the stickiness parameters. It would, I think, be worth the authors comparing the predictive power obtained from DIRseq with the predictive power obtained by using the lambda coefficients from Tesei et al in the model, local density of aromatic residues, local hydrophobicity (note that Tesei et al have tabulated a large set of hydrophobicity scores!) and the raw SeqDYN predictions. In the absence of lots of data to compare against, this is another way to convince readers that DIRseq offers reasonable predictive power.

      We will compare predictions of these various parameter sets, and summarize the results in a table.
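      One of the baselines the reviewer proposes, the local density of aromatic residues, can be sketched in a few lines. The following Python example is an illustrative assumption and not the authors' or the reviewer's implementation: the function name `local_aromatic_density`, the choice of F, W and Y as the aromatic set, and the 9-residue window are all hypothetical. Windows are truncated at the sequence termini.

```python
# Hypothetical baseline: fraction of aromatic residues in a sliding window.
# Assumption: Phe (F), Trp (W) and Tyr (Y) count as aromatic; His is excluded.
AROMATIC = frozenset("FWY")


def local_aromatic_density(sequence: str, window: int = 9) -> list[float]:
    """Return, for each residue, the aromatic fraction in a centered window.

    The window must be a positive odd number so that it is centered on the
    residue. Near the termini the window is truncated, and the fraction is
    computed over the residues that are actually present.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    seq = sequence.upper()
    densities = []
    for i in range(len(seq)):
        start = max(0, i - half)
        end = min(len(seq), i + half + 1)
        segment = seq[start:end]
        count = sum(1 for residue in segment if residue in AROMATIC)
        densities.append(count / len(segment))
    return densities


if __name__ == "__main__":
    # Toy sequence for illustration only; not taken from the manuscript.
    profile = local_aromatic_density("MDVFMKGLSKAKEGVVAAAEK", window=5)
    print([round(value, 2) for value in profile])
```

      A profile like this could be compared with chemical shift perturbation data in the same way as the DIRseq output, which would indicate how much of the predictive power comes from aromatic content alone.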

      (2) Second, the DIRseq is essentially SeqDYN with some changes to it, but those changes appear somewhat ad hoc. I recognize that there is very limited data, but the tweaking of parameters based on physical intuition feels a bit stochastic in developing a method; presumably (while not explicitly spelt out) those tweaks were chosen to give better agreement with the very limited experimental data (otherwise why make the changes?), which does raise the question of if the DIRseq implementation of SeqDYN is rather over-parameterized to the (very limited) data available now? I want to be clear, the authors should not be critiqued for attempting to develop a model despite a paucity of data, and I'm not necessarily saying this is a problem, but I think it would be really important for the authors to acknowledge to the reader the fact that with such limited data it's possible the model is over-fit to specific sequences studied previously, and generalization will be seen as more data are collected.

      We have explained the rationale for the parameter tweaks, which were limited to q values for four amino-acid types, i.e., to deemphasize hydrophobic interactions and slightly enhance electrostatic interactions (p. 4-5). We will add that these tweaks were motivated by observations from MD simulations of drug interactions with a-syn (ref 20). As already noted in the response to the preceding comment, we will also present results for the original parameter values as well as for when the four q values are changed one at a time.

      (3) Third, perhaps my biggest concern here is that - implicit in the authors' assumptions - is that all "drugs" interact with IDPs in the same way and all drugs are "small" (motivating the change in correlation length). Prescribing a specific lengthscale and chemistry to all drugs seems broadly inconsistent with a world in which we presume drugs offer some degree of specificity. While it is perhaps not unexpected that aromatic-rich small molecules tend to interact with aromatic residues, the logical conclusion from this work, if one assumes DIRseq has utility, is that all IDRs bind drugs with similar chemical biases. This, at the very least, deserves some discussion.

      The reviewer raises a very important point. In Discussion, we will add that it is important to further develop DIRseq to include drug-specific parameters when data for training become available.

      (4) Fourth, the authors make some general claims in the introduction regarding the state of the art, which appear to lack sufficient data to be made. I don't necessarily disagree with the author's points, but I'm not sure the claims (as stated) can be made absent strong data to support them. For example, the authors state: "Although an IDP can be locked into a specific conformation by a drug molecule in rare cases, the prevailing scenario is that the protein remains disordered upon drug binding." But is this true? The authors should provide evidence to support this assertion, both examples in which this happens, and evidence to support the idea that it's the "prevailing view" and specific examples where these types of interactions have been biophysically characterized.

      We will cite several studies showing that IDPs remain disordered upon drug binding.

      Similarly, they go on to say:

      "Consequently, the IDP-drug complex typically samples a vast conformational space, and the drug molecule only exhibits preferences, rather than exclusiveness, for interacting with subsets of residues." But again, where is the data to support this assertion? I don't necessarily disagree, but we need specific empirical studies to justify declarative claims like this; otherwise, we propagate lore into the scientific literature. The use of "typically" here is a strong claim, implying most IDP complexes behave in a certain way, yet how can the authors make such a claim? 

      Here again we will add citations to support the statement.

      Finally, they continue to claim:

      "Such drug interacting residues (DIRs), akin to binding pockets in structured proteins, are key to optimizing compounds and elucidating the mechanism of action." But again, is this a fact or a hypothesis? If the latter, it must be stated as such; if the former, we need data and evidence to support the claim. 

      We will add citations to both compound optimization and mechanism of action.

    1. Reviewer #2 (Public Review):

      Summary:

      This paper describes a new approach to detecting directed causal interactions between two genes without directly perturbing either gene. To check whether gene X influences gene Z, a reporter gene (Y) is engineered into the cell in such a way that (1) Y is under the same transcriptional control as X, and (2) Y does not influence Z. Then, under the null hypothesis that X does not affect Z, the authors derive an equation that describes the relationship between the covariance of X and Z and the covariance of Y and Z. Violation of this relationship can then be used to detect causality.

      The authors benchmark their approach experimentally in several synthetic circuits. In 4 positive control circuits, X is a TetR-YFP fusion protein that represses Z, which is an RFP reporter. The proposed approach detected the repression interaction in 2 of the 4 positive control circuits. The authors constructed 16 negative control circuit designs in which X was again TetR-YFP, but where Z was either a constitutively expressed reporter, or simply the cellular growth rate. The proposed method detected a causal effect in two of the 16 negative controls, which the authors argue is perhaps not a false positive, but due to an unexpected causal effect. Overall, the data support the potential value of the proposed approach.

      Strengths:

      The idea of a "no-causality control" in the context of detecting directed gene interactions is a valuable conceptual advance that could potentially see play in a variety of settings where perturbation-based causality detection experiments are made difficult by practical considerations.

      By proving their mathematical result in the context of a continuous-time Markov chain, the authors use a more realistic model of the cell than, for instance, a set of deterministic ordinary differential equations.

      The authors have improved the clarity and completeness of their proof compared to a previous version of the manuscript.

      Limitations:

      The authors themselves clearly outline the primary limitations of the study: The experimental benchmark is a proof of principle, and limited to synthetic circuits involving a handful of genes expressed on plasmids in E. coli. As acknowledged in the Discussion, negative controls were chosen based on the absence of known interactions, rather than perturbation experiments. Further work is needed to establish that this technique applies to other organisms and to biological networks involving a wider variety of genes and cellular functions. It seems to me that this paper's objective is not to delineate the technique's practical domain of validity, but rather to motivate this future work, and I think it succeeds in that.

      Might your new "Proposed additional tests" subsection be better housed under Discussion rather than Results?

      I may have missed this, but it doesn't look like you ran simulation benchmarks of your bootstrap-based test for checking whether the normalized covariances are equal. It would be useful to see in simulations how the true and false positive rates of that test vary with the usual suspects like sample size and noise strengths.

      It looks like you estimated the uncertainty for eta_xz and eta_yz separately. Can you get the joint distribution? If you can do that, my intuition is you might be able to improve the power of the test (and maybe detect positive control #3?). For instance, if you can get your bootstraps for eta_xz and eta_yz together, could you just use a paired t-test to check for equality of means?
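
      The joint-resampling idea above can be sketched as follows. This is a minimal illustration, not the authors' test: it assumes the normalized covariance is Cov(a, b) / (<a><b>), resamples cells jointly so that eta_xz and eta_yz are computed on the same bootstrap sample, and reports a percentile interval on their difference (swapped in here for the paired t-test the comment mentions). The function names and the 95% level are hypothetical choices.

```python
import random
from statistics import mean


def normalized_cov(a, b):
    """Normalized covariance Cov(a, b) / (<a><b>); this definition is an assumption."""
    ma, mb = mean(a), mean(b)
    cov = mean((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    return cov / (ma * mb)


def paired_bootstrap_diff(x, y, z, n_boot=2000, seed=0):
    """Percentile interval for eta_xz - eta_yz using jointly resampled cells."""
    rng = random.Random(seed)
    n = len(x)
    diffs = []
    for _ in range(n_boot):
        # One set of resampled cell indices is shared by x, y and z,
        # which preserves the correlation between the two estimates.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        zs = [z[i] for i in idx]
        diffs.append(normalized_cov(xs, zs) - normalized_cov(ys, zs))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return lower, upper
```

      Under this sketch, an interval that excludes zero would count as evidence against equality of the two normalized covariances; whether pairing actually improves power would still need to be checked in simulation.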

      The proof is a lot better, and it's great that you nailed down the requirement on the decay of beta, but the proof is still confusing in some places:

      On pg 29, it says "That is, dividing the right equation in Eq. 5.8 with alpha, we write the ..." but the next equation doesn't obviously have anything to do with Eq. 5.8, and instead (I think) it comes from Eq 5.5. This could be clarified.

      Later on page 29, you write "We now evoke the requirement that the averages xt and yt are stationary", but then you just repeat Eq. 5.11 and set it to zero. Clearly you needed the limit condition to set Eq. 5.11 to zero, but it's not clear what you're using stationarity for. I mean, if you needed stationarity for 5.11 presumably you would have referenced it at that step.

      It could be helpful for readers if you could spell out the practical implications of the theorem's assumptions (other than the no-causality requirement) by discussing examples of setups where it would or wouldn't hold.

    1. Reviewer #2 (Public review):

      This is a revised version of a paper I reviewed previously.

      Again, the purpose of the paper is to suggest that common metrics, such as friction or any given physical property of the surface, are probably inadequate to predict the perception of the surface or its discriminability. Instead, the authors propose a very interesting and original idea that, instead, frictional instabilities are related to fine touch perception (title).

      Overall, the authors have put much effort into improving the manuscript, enhancing clarity, and avoiding overstatements. And I feel the narrative is indeed much improved and less ambiguous.

      However, the authors have systematically avoided addressing the main comment of all reviewers: the link made between the mock finger passive experiment and the active human psychophysics is incorrect and should not be done, because its interpretation could be flawed.
      - First, this link is very weak (the correlation of 6 datapoints is barely significant).
      - Second, the real and mock fingers have very different properties (think about moisture, compliance, roughness,...).
      - Third, the comparison is made between a passive and well-controlled experiment and an active exploration. Yet, the comparison metrics (number of events) are clearly dependent on exploration procedures.

      In your response to my comments:
      "We have made changes throughout the manuscript to acknowledge that our findings are correlative, clarifying this throughout, and incorporating into the discussion how our work may enable biomechanical measurements and tactile decision making models"

      The authors admit that the analysis is flawed, yet they did not remove it. If they cannot demonstrate that the mock finger and the human finger behave the same way during the perceptual experiment, then they should remove Fig2 that combines apples and oranges. OR, they should look at the active exploration data and compute the same metrics on that data.

      "This "weird choice" is the central innovation of this paper. This choice was necessary because we demonstrated that the common usage of friction coefficient is fundamentally flawed: we see that friction coefficient suggests that surface which are more different would feel more similar - indeed the most distinctive surfaces would be two surfaces that are identical, which is clearly spurious. "

      They did not "demonstrate" such a flaw. Again, the difference in friction is between the mock finger trials. At the very least, the authors should verify that it is true of the active human experiment.

      "To fully implement this, a decision-making model is necessary because, as a counter example, a participant could have generated 10 swipes of SFW and 1 swipe of a Sp, but the Sp may have been the most important event for making a tactile decision. This type of scenario is not compatible with the analysis suggested - and similar counterpoints can be made for other types of seemingly straightforward analysis."

      The suggested analyses are straightforward and would be much more valuable than the data from the mock finger, even with the potential variability stated above.

      "We recognize that, with all factors being equal, this sample size is on the smaller end"

      Yet, the authors did not collect additional data to confirm their findings.

    1. The title of the article makes a simple striking claim about the state of the scientific literature with a numerical estimate of the proportion of “fake” articles. Yet, by contrast to this title, in the text of the article, Heathers is highly critical of his own work.

      James’ peer review of Heathers’ article

      James Heathers often mentions the limitations of his research thus “peer-reviewing” his own article to the extent that he admits that this work is “incomplete”, “unsystematic” and “far flung”.

      “This work is too incomplete to support responsible meta-analysis, and research that could more accurately define this figure does not exist yet. ~1 in 7 papers being fake represents an existential threat to the scientific enterprise.”

      “While this is highly unsystematic, it produced a substantially higher figure. Correspondents reliably estimated 1-5% of all papers contain fabricated data, and 2-10% contain falsified results.”

      “These values are too disparate to meta-analyze responsibly, and support only the briefest form of numerical summary: n=12 papers return n=16 individual estimates; these have a median of 13.95%, and 9 out of 16 of these estimates are between 13.4% and 16.9%. Given this, a rough approximation is that for any given corpus of papers, 1 in 7 (i.e. 14.3%) contain errors consistent with faking in at least one identifiable element.”

      “The accumulation of papers collected here is, frankly, haphazard. It does not represent a mature body of literature. The papers use different methods of analyzing figures, data, or other features of scientific publications. They do not distinguish well between papers that have small problematic elements which are fake, or fake in their entirety. They analyze both small and large corpora of papers, which are in different areas of study and in journals of different scientific quality – and this greatly changes base rates;…”

      “As a consequence, it would be prudent to immediately reproduce the result presented here as a formal systematic review. It is possible further figures are available after an exhaustive search, and also that pre registered analytical assumptions would modify the estimations presented.”

      Heathers has also in an interview published in Retraction Watch (Chawla 2024) acknowledged pitfalls in this article such as:

      “Heathers said he decided to conduct his study as a meta-analysis because his figures are “far flung.””

      “They are a little bit from everywhere; it’s wildly nonsystematic as a piece of work,” he said.”

      “Heathers acknowledged those limitations but argued that he had to conduct the analysis with the data that exist. “If we waited for the resources necessary to be able to do really big systematic treatments of a problem like this within a specific area, I think we’d be waiting far too long,” he said. “This is crucially underfunded.””

      Built in opposition to Fanelli 2009, but it’s illogical

      Heathers states in the abstract that his article is “in opposition” to Fanelli’s 2009 PLOS ONE article (Fanelli 2009), yet that opposition is illogical and artificially constructed, since there is no contradiction between 2% of scientists self-reporting having taken part in fabrication or falsification and an eventual much higher proportion of “fake scientific outputs”. Like most of what is wrong with Heathers’ article, this is in fact acknowledged by the author, who notes that the 2% figure “leaves us with no estimate of how much scientific output is fake” (bias in self-reporting, possibility of prolific authors, etc).

      Fanelli 2009 is not cited in the way JH says it is cited

      Whilst the opposition discussed above is illogical, it could be that the 2% figure is mis-cited by others as representing an estimate of fake scientific outputs thus probably underestimating the extent of fraud. Heathers suggests that this may indeed be the case, but also contradicts himself about how (Fanelli 2009), or the 2% figure coming from that publication, is typically used.

      In one sentence, he writes that “the figure is overwhelmingly the salient cited fact in its 1513 citations” and that “this generally appears as some variant of “about 2% of scientists admitted to have fabricated, falsified or modified data or results at least once”” (Frank et al. 2023),

      whilst in another sentence, he writes that “the typical phraseology used to express it – e.g. “the most serious types of misconduct, fabrication and falsification (i.e., data fraud), are relatively rare”” (George 2016).

      Those two sentences cited by Heathers are fundamentally different: the first accurately reports that the 2% figure relates to individuals self-reporting, whilst the second appears to relate to the prevalence of misconduct in the literature itself. How Fanelli 2009 is cited in the literature is an empirical question that can be studied by looking at citation contexts beyond the two examples given by Heathers. Given that a central justification for Heathers’ piece appears to be the misuse of this 2% figure, we sought to test whether this was the case.

      A first surprise was that whilst the sentence attributed to (George 2016) can indeed be found in that publication (in the abstract), first it is not in a sentence citing (Fanelli 2009) nor the 2% figure, and, second, it is quoted selectively omitting a part of the sentence that nuances it considerably: “The evidence on prevalence is unreliable and fraught with definitional problems and with study design issues. Nevertheless, the evidence taken as a whole seems to suggest that cases of the most serious types of misconduct, fabrication and falsification (i.e., data fraud), are relatively rare but that other types of questionable research practices are quite common.” (Fanelli 2009) is discussed extensively by (George 2016), and some of the caveats, e.g. on self-reporting, are highlighted.

      To go beyond those two examples, we constructed a comprehensive corpus of citation contexts, defined as the textual environment surrounding a paper's citation, including several words or sentences before and after the citation (see Methods section below). 737 citation contexts could be analysed. Out of those, the vast majority (533, or 72%) did not cite the 2% figure. Instead, they often referred to this article as a general reference, together with other articles, to make a broad point, or focused on other numbers, in particular those related to questionable research practices (Bordignon, Said, and Levy 2024). The 28% (204) of citation contexts that did mention the 2% figure did so accurately in the majority of cases: 83% (170) of those mentioned that it was self-reporting by scientists, whilst 17% (34) of those, or 5% of the total citation contexts analysed, were either ambiguous or misleading in that they suggested or claimed that the 2% figure related to scientific outputs.
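
      The percentages in this paragraph follow directly from the reported counts. A short check, using only the numbers stated above (the variable names are ours):

```python
# Counts as reported in the paragraph above.
total_contexts = 737
without_figure = 533   # contexts that do not cite the 2% figure
with_figure = 204      # contexts that do cite the 2% figure
accurate = 170         # 2% attributed to self-reporting by scientists
misleading = 34        # ambiguous or applied to scientific outputs


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


breakdown = {
    "without_figure_of_total": pct(without_figure, total_contexts),
    "with_figure_of_total": pct(with_figure, total_contexts),
    "accurate_of_with_figure": pct(accurate, with_figure),
    "misleading_of_with_figure": pct(misleading, with_figure),
    "misleading_of_total": pct(misleading, total_contexts),
}
print(breakdown)
```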

      Although the analysis above does not include all citation contexts, it is possible to conclude unambiguously that the 2% figure is not overwhelmingly the salient cited fact in relation to Fanelli 2009, and that when it is cited, it is most often cited accurately, i.e. as representing self-reporting by scientists. Whilst an exhaustive analysis is beyond the scope of this peer review, it is not uncommon to find in this corpus citation contexts that have an alarming tone about the seriousness of the problem of FFPs, e.g. “…a meta-analysis (Fanelli 2009) suggest that the few cases that do surface represent only the tip of a large iceberg." [DOI: 10.1177/0022034510384627]

      Thus, the rationale for Heathers’ study appears to be misguided. The supposed lack of attention for the very serious problem of FFPs is not due to a minimisation of the situation fueled by a misinterpretation of Fanelli 2009. Importantly, even if that was the case, an attempt to draw attention by claiming that 1 in 7 papers are fake, a claim which according to the author himself is not grounded in solid facts, is not how the scientific literature should be used.

      Methods for the construction of the corpus of citation contexts

      We used Semantic Scholar, an academic database encompassing over 200 million scholarly documents from diverse sources including publishers, data providers, and web crawlers. Using the specific paper identifier for Fanelli's 2009 publication (d9db67acc223c9bd9b8c1d4969dc105409c6dfef), we queried the Semantic Scholar API to retrieve available citation contexts. Citation contexts were extracted from the "contexts" field within the JSON response pages (see technical specifications).

      The query looks like this: semanticscholar.org
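
      As the exact query is not reproduced in the text, the following is only a sketch of what such a request could look like. It assumes the public Semantic Scholar Graph API citations endpoint with `fields=contexts` and offset/limit paging; the helper names and the sample page are hypothetical, and no network call is made here.

```python
import json
from urllib.parse import urlencode

# Paper identifier quoted in the Methods text above.
PAPER_ID = "d9db67acc223c9bd9b8c1d4969dc105409c6dfef"


def citations_url(paper_id, offset=0, limit=1000):
    """Build the URL for one page of citations (assumed Graph API path)."""
    query = urlencode({"fields": "contexts", "offset": offset, "limit": limit})
    return f"https://api.semanticscholar.org/graph/v1/paper/{paper_id}/citations?{query}"


def extract_contexts(page):
    """Flatten the "contexts" lists found in one parsed JSON response page."""
    return [
        context
        for item in page.get("data", [])
        for context in (item.get("contexts") or [])
    ]


# Hand-written sample page for illustration, not real API output.
sample_page = json.loads(
    '{"offset": 0, "data": [{"contexts": ["... (Fanelli 2009) ..."]}, {"contexts": []}]}'
)
print(citations_url(PAPER_ID))
print(extract_contexts(sample_page))
```

      Citing documents whose "contexts" list comes back empty are the ones that would need the open access fallback described next.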

      The broad coverage of Semantic Scholar does not imply that citation contexts are always retrieved. The Semantic Scholar API provided citation contexts for only 48% of the 1452 documents citing the paper. To get more, we identified open access papers among the remaining 52% of citing papers, retrieved their PDF location and downloaded the files. We used the Unpaywall API, which can be queried with a DOI to obtain open access information about a document. The query looks like this.
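
      The Unpaywall lookup can be sketched in the same spirit. This assumes the v2 REST endpoint, which takes a DOI in the path and a contact email as a query parameter, and a response containing `is_oa` and `best_oa_location.url_for_pdf`; the email address and helper names are placeholders, and the exact query used for this review is not shown in the text.

```python
from urllib.parse import quote, urlencode


def unpaywall_url(doi, email):
    """Build an Unpaywall lookup URL for one DOI (assumed v2 endpoint)."""
    return f"https://api.unpaywall.org/v2/{quote(doi, safe='/')}?{urlencode({'email': email})}"


def pdf_location(record):
    """Return the PDF URL of the best open access copy, or None if absent."""
    if not record.get("is_oa"):
        return None
    best = record.get("best_oa_location") or {}
    return best.get("url_for_pdf")


# Placeholder email; the DOI is Fanelli 2009 from the reference list.
print(unpaywall_url("10.1371/journal.pone.0005738", "you@example.org"))
```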

      We downloaded 266 PDF files and converted them to text format using an online bulk PDF-to-text converter. These files were then processed using TXM, a specialized textual analysis tool. We used its concordancer function with the term "Fanelli" as the pivot term and checked that the reference was the correct one (the 2009 paper in PLOS ONE). We did manual cleaning and appended the citation contexts to the previous corpus.
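
      For readers unfamiliar with concordancers, the pivot-term search can be illustrated with a plain-Python stand-in. This is not TXM and does not reproduce its behaviour; the whitespace tokenization and the window size are arbitrary assumptions.

```python
import re


def concordance(text, pivot, window=5):
    """Return (left context, matched token, right context) for each pivot hit."""
    tokens = re.findall(r"\S+", text)
    hits = []
    for i, token in enumerate(tokens):
        if pivot.lower() in token.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


sample = "Misconduct is rare (Fanelli 2009) but questionable practices are common."
for left, token, right in concordance(sample, "Fanelli", window=2):
    print(f"{left} [{token}] {right}")
```

      Each hit would still need the manual check described above, since a match on the author name alone does not guarantee that the 2009 paper is the one being cited.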

      Through this comprehensive methodology, we ultimately identified 824 citation contexts, representing 54% (784) of all documents citing Fanelli's 2009 paper. This corpus comprised 48% of contexts retrieved from Semantic Scholar and an additional 6% obtained through semi-manual extraction from open access documents. 87 of those contexts were excluded from the analysis for a range of reasons including: context too short to conclude, language neither English nor French (shared languages of the authors of this review), duplicate documents (e.g. preprints), etc, leaving us with 737 contexts. They were first classified manually in two categories, those mentioning the 2% figure and those which did not. Then, for the first category, they were further classified manually in two categories depending on whether the figure was appropriately assigned to self-reporting of researchers or rather misleadingly suggesting that the 2% applied to research outputs.

      Contributions

      Investigation: FB collected the citation contexts.
      Data curation and formal analysis: RL and MS
      Writing – review & editing: RL, MS and FB

      References

      Bordignon, Frederique, Maha Said, and Raphael Levy. 2024. “Citation Contexts of [How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data, DOI: 10.1371/Journal.Pone.0005738].” Zenodo. https://doi.org/10.5281/zenodo.14417422.

      Chawla, Dalmeet Singh. 2024. “1 in 7 Scientific Papers Is Fake, Suggests Study That Author Calls ‘Wildly Nonsystematic.’” Retraction Watch (blog). September 24, 2024. https://retractionwatch.com/2024/09/24/1-in-7-scientific-papers-is-fake-suggests-study-that-author-calls-wildly-nonsystematic/.

      Fanelli, Daniele. 2009. “How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data.” PLOS ONE 4 (5): e5738. https://doi.org/10.1371/journal.pone.0005738.

      Frank, Fabrice, Nans Florens, Gideon Meyerowitz-Katz, Jérôme Barriere, Éric Billy, Véronique Saada, Alexander Samuel, Jacques Robert, and Lonni Besançon. 2023. “Raising Concerns on Questionable Ethics Approvals - a Case Study of 456 Trials from the Institut Hospitalo-Universitaire Méditerranée Infection.” Research Integrity and Peer Review 8 (1): 9. https://doi.org/10.1186/s41073-023-00134-4.

      George, Stephen L. 2016. “Research Misconduct and Data Fraud in Clinical Trials: Prevalence and Causal Factors.” International Journal of Clinical Oncology 21 (1): 15–21. https://doi.org/10.1007/s10147-015-0887-3.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study revisits the effects of substitution model selection on phylogenetics by comparing reversible and non-reversible DNA substitution models. The authors provide evidence that 1) non time-reversible models sometimes perform better than general time-reversible models when inferring phylogenetic trees out of simulated viral genome sequence data sets, and that 2) non time-reversible models can fit the real data better than the reversible substitution models commonly used in phylogenetics, a finding consistent with previous work. However, the methods are incomplete in supporting the main conclusion of the manuscript, that is that non time-reversible models should be incorporated in the model selection process for these data sets.

      The non-reversible models should be incorporated in the model selection process not because they perform significantly better, but because they do not perform worse than the reversible models and because the true biochemical processes of nucleotide substitution do support non-reversibility.

      Reviewer #1 (Public Review):

      The study by Sianga-Mete et al revisits the effects of substitution model selection on phylogenetics by comparing reversible and non-reversible DNA substitution models. This topic is not new; previous works already showed that non-reversible, and also covarion, substitution models can fit the real data better than the reversible substitution models commonly used in phylogenetics. In this regard, the results of the present study are not surprising. Specific comments are shown below.

      True

      It is well known that non-reversible models can fit the real data better than the commonly used reversible substitution models, see for example,

      https://academic.oup.com/sysbio/article/71/5/1110/6525257

      https://onlinelibrary.wiley.com/doi/10.1111/jeb.14147?af=R

      The manuscript indicates that the results (better fitting of non-reversible models compared to reversible models) are surprising but I do not think so, I think the results would be surprising if the reversible models provide a better fitting.

      I think the introduction of the manuscript should be increased with more information about non-reversible models and the diverse previous studies that already evaluated them. Also I think the manuscript should indicate that the results are not surprising, or more clearly justify why they are surprising.

      The surprise in the findings is in NREV12 performing better than NREV6 for double stranded DNA viruses as it was expected that NREV6 would perform better given the biochemical processes discussed in the introduction.

      In the introduction and/or discussion I missed a discussion about the recent works on the influence of substitution model selection on phylogenetic tree reconstruction. Some works indicated that substitution model selection is not necessary for phylogenetic tree reconstruction,

      https://academic.oup.com/mbe/article/37/7/2110/5810088

      https://www.nature.com/articles/s41467-019-08822-w

      https://academic.oup.com/mbe/article/35/9/2307/5040133

      While others indicated that substitution model selection is recommended for phylogenetic tree reconstruction,

      https://www.sciencedirect.com/science/article/pii/S0378111923001774

      https://academic.oup.com/sysbio/article/53/2/278/1690801

      https://academic.oup.com/mbe/article/33/1/255/2579471

      The results of the present study seem to support this second view. I think this study could be improved by providing a discussion about this aspect, including the specific contribution of this study to that.

      In our conclusion we have stated that:

      The lack of available data regarding the proportions of viral life cycles during which genomes exist in single and double stranded states makes it difficult to rationally predict the situations where the use of models such as GTR, NREV6 and NREV12 might be most justified: particularly in light of the poor overall performance of NREV6 and GTR relative to NREV12 with respect to describing mutational processes in viral genome sequence datasets. We therefore recommend case-by-case assessments of NREV12 vs NREV6 vs GTR model fit when deciding whether it is appropriate to consider the application of non-reversible models for phylogenetic inference and/or phylogenetic model-based analyses such as those intended to test for evidence of natural selection or the existence of molecular clocks.

      The real data was downloaded from Los Alamos HIV database. I am wondering if there were any criterion for selecting the sequences or if just all the sequences of the database for every studied virus category were analysed. Also, was any quality filter applied? How were gaps and ambiguous nucleotides considered? Notice that these aspects could affect the fitting of the models with the data.

      We selected a varying number of sequences from the database for each studied virus type. Using the software AliView, we applied quality filtering by re-aligning the sequences per virus type.

      How the non-reversible model and the data are compared considering the non-reversible substitution process? In particular, given an input MSA, how to know if the nucleotide substitution goes from state x to state y or from state y to state x in the real data if there is not a reference (i.e., wild type) sequence? All the sequences are mutants and one may not have a reference to identify the direction of the mutation, which is required for the non-reversible model. Maybe one could consider that the most abundant state is the wild type state but that may not be the case in reality. I think this is a main problem for the practical application of non-reversible substitution models in phylogenetics.

      True

      Reviewer #1 (Recommendations for the authors):

      The reversible and non-reversible models used in this study assume that all the sites evolve under the same substitution matrix, which can be unrealistic. This aspect could be mentioned.

      Done

      The manuscript indicates that "a phylogenetic tree was inferred from an alignment of real sequences (Avian Leukosis virus) with an average sequence identity (API) of ~90%.". I was wondering under which substitution model that phylogenetic tree reconstruction was performed? could the use of that model bias posterior results in terms of favoring results based on such a model?

      We have stated that the GTR+G model was used to reconstruct the tree. The use of the GTR+G model could indeed bias the posterior results, as we have also stated in the paper.

      I was wondering which specific R function was used to calculate the weighted Robinson-Foulds metric. I think this should be included in the manuscript.

      We stated that we used the weighted Robinson-Foulds metric (wRF; implemented in the R phangorn package (Schliep, 2011)).

      Despite a minority, several datasets fitted better with a reversible model than with a non-reversible model. I think that should be clearly indicated. In addition, in my opinion the AIC does not sufficiently penalize the number of parameters of the models and favors the non-reversible models over the reversible models, but this is only my opinion based on the definition of AIC and it is not supported. Thus, I think the comparison between phylogenetic trees reconstructed under different substitution models was a good idea (but see also my second major comment).

      Noted

      When comparing phylogenetic trees I was wondering if one should consider the effect of the estimation method and quality of the studied data? For example, should bootstrap values be estimated for all the ancestral nodes and only ancestral nodes with high support be evaluated in the comparison among trees?

      Yes, the estimation method and quality of the studied data should be considered. This does not matter when using RF, but it does for weighted RF. When building the trees with RAxML, only high-support nodes are added to the tree.

      In Figure 3, I do not see (by eye) significant differences among the models. I see in the legend that the statistical evaluation was based on a t test but I am not much convinced. Maybe it is only my view. Exactly, which pairs of datasets are evaluated with the t test? Next, I would expect that the influence of the substitution model on the phylogenetic tree reconstruction is higher at large levels of nucleotide diversity because with more substitution events there is more information to see the effects of the model. However, the t test seems to show that differences are only at low levels of nucleotide diversity (and large DNR), what could be the cause of this?

      The paired t-tests compare the wRF distances between the true tree and the trees inferred using the GTR model with the wRF distances between the true tree and the trees inferred using the NREV12 model.

      The influence of the NREV12 model on the reconstructed tree may not be significantly higher at large levels of nucleotide diversity because, beyond a certain level, the DNR values are simply unrealistic.

      Can the user perform substitution model selection (i.e., AIC) among reversible and non-reversible substitution models with IQTREE? If yes, then doing that should be the recommendation from this study, correct?

      But, can DNR be estimated from a real dataset? DNR seems to be the key factor (Figure 3) for the phylogenetic analysis under a proper model.

      Substitution model selection among reversible and non-reversible models can be performed using both HyPhy and IQTREE, and we have recommended that model tests be done as a first step before tree building. Estimating DNR from real datasets requires the substitution rate matrix of a non-reversible model.

      The manuscript has many text errors (including typos and incorrect citations). For example, many citations in page 20 show "Error! Reference source not found.". I think authors should double check the manuscript before submitting. Also, some text is not formally written. For example, "G represents gamma-distributed rates", rates of what? The text should be clear for readers that are not familiar with the topic (i.e., G represents gamma-distributed substitution rates among sites). In general, I recommend a detailed revision of the whole text of the manuscript.

      Done

      Reviewer #2 (Public Review):

      The authors evaluate whether non time reversible models fit better data presenting strand-specific substitution biases than time reversible models. Specifically, the authors consider what they call NREV6 and NREV12 as candidate non time-reversible models. On the one hand, they show that AIC tends to select NREV12 more often than GTR on real virus data sets. On the other hand, they show using simulated data that NREV12 leads to inferred trees that are closer to the true generating tree when the data incorporates a certain degree of non time-reversibility.

      Based on these two experimental results, the authors conclude that "We show that non-reversible models such as NREV12 should be evaluated during the model selection phase of phylogenetic analyses involving viral genomic sequences". This is a valuable finding, and I agree that this is potentially good practice.

      However, I miss an experiment that links the two findings to support the conclusion: in particular, an experiment that solves the following question: does the best-fit model also lead to better tree topologies?

      Because NREV12 leads to inferred trees that are closer to the true generating tree than those inferred under GTR, the simulations show that the best-fit model, in this case NREV12, leads to better tree topologies.

      On simulated data, the significance of the difference between GTR and NREV12 inferences is evaluated using a paired t test. I miss a rationale or a reference to support that a paired t test is suitable to measure the significance of the differences of the wRF distance. Also, the results show that on average NREV12 performs better than GTR, but a pairwise comparison would be more informative: for how many sequence alignments does NREV12 perform better than GTR?

      We used the paired t-test because it is the most widely used test for comparing mean values between two matched samples where the differences between pairs are normally distributed. The wRF distances meet these requirements.

      The paired t-test incorporates the pairwise comparison, and the side-by-side boxplots show the pairwise wRF comparisons.
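      The two quantities discussed in this exchange, the paired t statistic and the per-alignment count the reviewer asks for, can be sketched as follows. This is a minimal illustration only: the function names and the wRF values are hypothetical and are not taken from the manuscript, and the sketch computes the test statistic without a p-value.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for a paired t-test on matched samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two matched samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def count_wins(a, b):
    """Count pairs where a is smaller, where b is smaller, and ties."""
    a_better = sum(1 for x, y in zip(a, b) if x < y)
    b_better = sum(1 for x, y in zip(a, b) if y < x)
    return a_better, b_better, len(a) - a_better - b_better


# Hypothetical wRF distances to the true tree, one pair per simulated alignment.
wrf_gtr = [0.42, 0.35, 0.51, 0.40, 0.38]
wrf_nrev12 = [0.40, 0.36, 0.45, 0.37, 0.33]

t, df = paired_t_statistic(wrf_gtr, wrf_nrev12)
gtr_closer, nrev12_closer, ties = count_wins(wrf_gtr, wrf_nrev12)
print(f"t = {t:.3f} on {df} degrees of freedom")
print(f"GTR closer: {gtr_closer}, NREV12 closer: {nrev12_closer}, ties: {ties}")
```

      A smaller wRF distance means the inferred tree is closer to the true tree, so the win counts answer the question "for how many alignments does NREV12 perform better than GTR?" directly, which the mean-based t statistic alone does not.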

      Reviewer #2 (Recommendations for the authors):

      The authors reference Baele et al., 2010 for describing NREV6 and NREV12. I suggest using the same name used in the referenced paper: GNR-SYM and GNR respectively. Although I do not think there is a standard name for these models, I would use a previously used one.

      We have built studies based on the names NREV6 and NREV12. We would like to keep the naming as standard for our studies.

      GTR and NREV12 models are already described in many other papers. I do not see the need to include such an extensive description. Also, a reference should be included to the discrete Gamma rate categories [1]

      We included the extensive description to help readers who are not very familiar with these models to understand them better, since the names we have given the models differ from those used in other papers.

      We have added a reference for the discrete gamma rate categories, as recommended (Yang, 1994).

      To evaluate the exhaustiveness and correctness of the results, I would recommend publishing as supplementary material the simulated data sets or the scripts for generating the data set, the scripts or command lines for the analysis, and the versions of the software used (e.g., IQTREE). Also, to strongly support the main conclusion of the manuscript, I suggest adding to the simulations section results the RF-distances of the best-fit selected model under AIC, AICc, and BIC as well.

      We can go ahead and submit all the needed datasets. The RF-distance results for the simulated data are available and will be submitted. However, we cannot add them to the main document, as this would create very long data tables.

      In some instances, it is mentioned that the selection criterion used is AIC, while in others, AIC-c is referenced. Even in the table captions, both terms are mixed. It should be made clearer which criterion is being employed, as AIC is not suitable for addressing the overparameterization of evolutionary models, given that it does not account for the sample size. A previous pre-print of this article [2] does not mention AIC-c, explicitly includes the formulas for AIC that do not take the sample size into account, and reports the same results as this manuscript, which indicates that AIC and not AIC-c was used here. This should be clarified. It is recommended to use AIC-c instead of AIC, especially if the ratio of sample size to model parameters is low [3]. One point should be noted here: some authors count tree branch lengths as free model parameters and others do not. In this paper it is not specified how the model parameters are counted. AIC tends to select more parameterized models than AIC-c, and overparameterization can lead to different tree inferences, as evidenced in Hoff et al., 2016. Therefore, it is expected that NREV12 is more frequently selected than NREV6 and GTR.
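      The distinction the reviewer draws between AIC and AIC-c can be made concrete with the standard textbook definitions. The sketch below is illustrative only: the log-likelihoods, parameter counts and alignment length are made-up placeholders rather than values from the manuscript, and whether branch lengths are included in k is exactly the ambiguity raised above.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def aicc(log_lik, k, n):
    """AIC with the small-sample correction term 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


# Placeholder numbers: a simpler model and a richer model fitted to an
# alignment of n sites. None of these values come from the manuscript.
n_sites = 120
models = {
    "simpler model": {"log_lik": -1510.0, "k": 30},
    "richer model": {"log_lik": -1500.0, "k": 36},
}

for name, m in models.items():
    print(
        f"{name}: AIC={aic(m['log_lik'], m['k']):.1f} "
        f"AICc={aicc(m['log_lik'], m['k'], n_sites):.1f} "
        f"BIC={bic(m['log_lik'], m['k'], n_sites):.1f}"
    )
```

      With these placeholder values the richer model has the lower AIC, while the simpler model has the lower AIC-c and BIC, which illustrates why the choice of criterion and the way parameters are counted should be stated explicitly.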

      In my opinion, a pairwise comparison between GTR and NREV12 performance is of great interest here, and the whiskers plots are not useful. Scatterplots would display the results better.

      Boxplots are meant to offer a simplified view of the results, as the paired t-test performs all of the comparisons. We will provide the scatter plots as supplementary information, as recommended, so that readers can see the fully detailed plots.

      Some references are missing.

      Missing references added

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We would like to thank the reviewers for taking the time to review our manuscript and for providing valuable comments on how to improve it. We are pleased to see that both reviewers recognize the novelty and importance of our study, its conceptual advance and potential clinical significance. They also noted the novelty and value of our functional mechanistic approach using epigenetic editing. Below, we provide a point-by-point response to their questions and points raised. The changes introduced in response to their feedback are highlighted in yellow in the revised manuscript file.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: This study by Prada et al. aimed to explore DNA methylation and gene expression in primary EpCAMhigh/PDPNlow cells, which (probably) consist for the largest part of AT2 cells, to understand the molecular mechanisms behind the impaired regeneration of alveolar epithelial progenitor cells in COPD. They found that higher or lower promoter methylation in COPD-associated cells was inversely correlated with changes in gene expression, with interferon signaling emerging as one of the most upregulated pathways in COPD. IRF9 was identified as the master regulator of interferon signaling in COPD. Targeted DNA demethylation of IRF9 in an A549 cell line resulted in a robust activation of its downstream target genes, including OAS1, OAS3, PSMB8, PSMB9, MX2 and IRF7, demonstrating that demethylation of IRF9 is sufficient to activate the IFN signaling pathway, validating IRF9 as a master regulator of IFN signaling in (alveolar) epithelial cells.

      Major comments:

      • To remove airways (and blood vessels) completely from the lung tissue is difficult, if not impossible. This means that the assumption that the sorted EpCAMpos/PDPNlow cells primarily consisted of AT2 cells remains valid only if a quantitative analysis is conducted on the proportion of HT2-280pos cells in all samples in cytospins to exclude any significant contamination from bronchial epithelial cells. If the authors cannot demonstrate >95% pure HT2-280-positive cells, then the key conclusions suggesting that the epigenetic regulation of the IFN pathway might be crucial in AT2 progenitor cell regeneration could also potentially apply to bronchial progenitor cells. In addition, if >95% purity cannot be demonstrated, the data should be adjusted to account for differences in cell type composition.

      Response:

      We thank the reviewer for raising this important point. Although, as pointed out by the reviewer, we cannot guarantee that our sorted cells do not contain a minor contamination from respiratory / terminal bronchial cells, we carefully selected donors, tissue regions, and sorting strategy to ensure the highest possible enrichment of AT2 cells, as we explain below. We have now expanded the methods and results section and covered this point in the manuscript discussion.

      • The lung tissue pieces we received were distal, as evidenced by the presence of pleura. We collected representative tissue pieces for histology to validate sample quality. Our protocol includes dissection of all visible airways and vessels using a dissecting microscope; these were cryopreserved separately from the distal parenchyma. Hence, the starting material for tissue dissociation was depleted of airways and vessels. The importance of vessel/airway removal for enrichment of distal alveolar cells was established by Tata's group (PMID: 35712012).
      • We selected the AT2 sorting protocol (EpCAMpos/PDPNlow) based on previous publications that used tissue from both healthy and COPD lungs to separate AT2 cells from AT1 and airway basal cells, as AT1 and basal cells are both PDPNhigh (PMID: 22033268, PMID: 23117565; PMID: 35078977). This protocol was favoured due to the lack of information about HT2-280 expression and distribution in COPD lungs.
      • The sort quality for each sample was assessed by FACS analysis (back sorting) of the sorted cells, where we observed 95-97% purity (EpCAMpos/PDPNlow, Fig. 1G shown below). In addition, we validated the sorting protocol and the high AT2 enrichment from both no COPD and COPD tissues by immunostaining the FACS-sorted cells with HT2-280, an AT2 marker widely used in the field (the strategy suggested by the reviewer), and observed that close to 100% of cells were positive for this marker (Fig. 1H shown below). However, we could not do this retrospectively for those patients for whom we did not have enough material. Sorting primary AT2 cells from small tissue pieces is challenging, and we need at least 20,000 cells to obtain high-quality methylation and RNA-seq data.
      • AT2 marker genes (ABCA3, LPCAT1, LAMP3 and the surfactant genes SFTPA2, SFTPB and SFTPC) were among the top highly expressed genes in our RNA-seq data and were not significantly changed in COPD (please see the expression data in Fig. S2A in the manuscript, reproduced below for convenience, as well as Table 6), providing further evidence that the sorted cells carry a strong AT2 transcriptional signature.

      Fig. 1G: FACS plot examples showing the analysis of sorted AT2 cells (back sorting) from control (blue) and COPD (green) donors displayed over total lung cell suspensions (grey). Fig. 1H: Representative IF staining of HT2-280 expression in sorted AT2 cells from no COPD (top) and COPD (bottom) donors. Nuclei (blue) were stained with DAPI, scale bars = 20 µm.

      Fig. S2A: Normalized read counts from RNA-seq data for AT2-specific genes in sorted AT2 cells from each donor (dots). Data points represent normalised counts from no COPD (blue), COPD I (light green) and COPD II-IV (dark green). Group median is shown as a black bar.

      • In agreement with a previous study which profiled bulk AT2 cells using expression arrays (PMID: 23117565), we also observed upregulation of the IFN signaling pathway in COPD AT2s. The enrichment of an IFNα/β signature was also observed in COPD in the inflammatory AT2 cluster (iAT2) in a recent scRNA-seq study (PMID: 36108172). As part of the revision, we compared the IFN gene signature identified in our bulk AT2 RNA-seq with a recent scRNA-seq study (published after the submission of our manuscript, PMID: 39147413) that profiled EpCAMpos cells from COPD and non-smoker donor lungs. We observed an upregulation of our IFN signature genes in AT2 in COPD (mostly in the AT2c and rbAT2 subsets), suggesting that similar signatures were observed in COPD AT2s in this dataset as well (please see Fig. S4E-F below).

      Figure S4E: Expression values for the indicated genes of the IFN pathway from an external scRNA-seq dataset of AT2 cells from COPD patients and healthy controls (Hu et al, 2024). Y-axis shows log-normalized gene expression levels. F: Combined gene set score of the genes shown in (E) in different subsets of AT2 cells from Hu et al, 2024. The IFN signature genes were identified in our integrative analysis of TWGBS and RNA-seq in sorted AT2 cells.

      • We have also carefully examined DNA methylation profiles across all samples. The density plots of our T-WGBS DNA methylation data are very similar among the individual samples in all 3 groups, indicating that the sorted cells consist mostly of a single cell type, as there are no obvious intermediate (25-75%) methylation peaks of the kind observed in cell mixtures (Fig. 2A and the panel below). No reference DNA methylation profiles are available for respiratory or terminal bronchial cells; hence, we can neither assess how epigenetically different these cells would be from AT2 cells nor perform a deconvolution for potential minor contamination with distal airway cells.

      Figure: DNA methylation density plots of sorted EpCAMpos/PDPNneg cells from no COPD (blue, n=3), COPD I (light green, n=3) and COPD II-IV (dark green, n=5) showing a homogeneous methylation pattern and low abundance at intermediate (25%-75%) methylation values across all profiled samples, indicating that the sorted cells were mostly of a single cell type.

      • We have now added a sentence to the limitations section of the discussion to cover that point specifically.

      CHANGES IN THE MANUSCRIPT:

      AT2 cells were isolated by fluorescence-activated cell sorting (FACS) from cryopreserved distal lung parenchyma, depleted of visible airways and vessels of three no COPD controls, three COPD I and five COPD II-IV patients as previously described (24, 52, 53)

      The isolated cells were positive for HT2-280, a known AT2 marker (54), as confirmed by immunofluorescence (Fig. 1H), validating the identity and high enrichment of the isolated AT2 populations.

      Known AT2-specific genes, including ABCA3, LAMP3 and surfactant genes (SFTPA2, SFTPB and SFTPC) were among the top highly expressed genes and were not significantly changed in COPD AT2s (Fig. S2A, Table 6), further confirming the AT2-characteristic transcriptional signature of our isolated cells.

      However, 5-AZA is a global demethylating agent, and the observed effects may not be direct. To validate the epigenetic regulation of central AT2 pathways further, we took advantage of locus-specific epigenetic editing technology (73). We focused on the IFN pathway because it was the most significantly enriched Gene Ontology (GO) term in our integrative analysis of TWGBS and RNA-seq data. Several IFN pathway members had associated hypomethylated DMRs within promoter-proximal regions and concomitant increased gene expression (Fig. 4C and S2C). Additionally, we confirmed the elevated expression of IFN-related genes with associated DMRs identified in our study in AT2 cells and AT2 cell subclusters from a recently published scRNA-seq cohort (74) (Fig. S4E-F).

      We observed upregulation of multiple IFN genes in AT2 in COPD, consistent with a previous expression array study (24). IFNα/β signaling was also enriched in COPD patients in the inflammatory AT2 cluster (iAT2) in a recent scRNA-seq study (84) and our IFN signature genes were also upregulated in AT2c and AT2rb subsets in COPD, identified by another scRNA-seq study recently (74).

      Finally, despite careful removal of airways from distal lung tissue using a dissecting microscope, we cannot exclude the presence of some terminal/respiratory bronchiole cells in our FACS-isolated EpCAMpos/PDPNlow population. Recent scRNA-seq studies provided an unprecedented resolution and identified several epithelial subpopulations and transitional cells residing in the terminal/respiratory bronchioles and alveoli, including respiratory airway secretory cells (93), terminal airway-enriched secretory cells (28), terminal bronchiole-specific alveolar type-0 (AT0) (70), and emphysema-specific AT2 cells (74). These cells may contribute to alveolar repair in healthy and COPD lungs; however, with our bulk DNA methylation and RNA-seq study, we are unable to resolve all these subpopulations. Future development of single-cell methylation and non-reference-based algorithms for DNA methylation deconvolution will enable deeper epigenetic phenotyping of specific AT2 and bronchiolar cell subsets.

      (Methods) Validation of IFN gene upregulation in a published scRNA-seq dataset

      scRNA-seq data from (74), generously provided by M. Königshoff, were processed using the default Seurat workflow (117). Expression of IFN-related genes was extracted and plotted as log-normalised gene expression levels in AT2 cells from control and COPD donors. Seurat's AddModuleScore() function was used to compute a gene set score for a custom IFN program using the genes listed in Fig. S4E and to analyse the IFN gene set scores in AT2 cell subclusters identified in (74). Briefly, average gene expression scores were computed for the gene set of interest, and the expression of control features (randomly selected) was subtracted as described in (118).
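      The idea behind the gene set score described above (average expression of the gene set minus average expression of randomly selected control features) can be sketched in a few lines. This is a deliberately simplified illustration and not Seurat's AddModuleScore() implementation, which additionally bins features by average expression before drawing controls. The function name, the control gene names and all expression values below are hypothetical.

```python
import random
from statistics import mean


def module_score(expr, gene_set, n_ctrl=5, seed=0):
    """Per-cell score: mean of gene-set expression minus mean of control genes.

    expr maps gene name -> list of log-normalised values, one per cell.
    Control genes are drawn at random from genes outside the set
    (simplification: no expression-matched binning of controls).
    """
    genes_in = [g for g in gene_set if g in expr]
    if not genes_in:
        raise ValueError("none of the gene-set members are in the matrix")
    pool = sorted(g for g in expr if g not in set(gene_set))
    rng = random.Random(seed)
    ctrl = rng.sample(pool, min(n_ctrl, len(pool)))
    n_cells = len(next(iter(expr.values())))
    scores = []
    for c in range(n_cells):
        set_mean = mean(expr[g][c] for g in genes_in)
        ctrl_mean = mean(expr[g][c] for g in ctrl) if ctrl else 0.0
        scores.append(set_mean - ctrl_mean)
    return scores


# Toy matrix with three cells. IRF9 and OAS1 stand in for the IFN gene set;
# CTRL_A to CTRL_C are hypothetical background genes.
toy_expr = {
    "IRF9": [2.0, 0.5, 0.0],
    "OAS1": [3.0, 0.5, 0.0],
    "CTRL_A": [1.0, 1.0, 1.0],
    "CTRL_B": [0.5, 0.5, 0.5],
    "CTRL_C": [1.5, 1.5, 1.5],
}
print(module_score(toy_expr, ["IRF9", "OAS1"], n_ctrl=3))
```

      Subtracting the control mean makes scores comparable across cells with different overall expression levels, so a positive score indicates that the gene set is expressed above the background of that cell.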

      Fig. S4E and F: E. Expression values for the indicated genes of the IFN pathway from an external scRNA-seq dataset of AT2 cells from COPD patients and healthy controls (74). Y-axis shows log-normalized gene expression levels. F. Combined gene set score of the genes shown in (E) in different subsets of AT2 cells from (74). The IFN signature genes were identified in our integrative analysis of TWGBS and RNA-seq in sorted AT2 cells.

      • The overrepresentation of several keratins (KRT5, KRT14, KRT16, KRT17), mucins (MUC12, MUC13, MUC16, MUC20) and the transcription factor FoxJ1 is now attributed by the authors to a possible dysregulation of AT2 identity and differentiation in COPD (lines 282 - 284) where they cite refs 28, 69, 70. The authors try to support this with IF double stains for KRT5 and HT2-280 to identify co-expression of KRT5 and HT2-280 in lung tissue (Figure S2H). However, the evidence for the co-expression of both markers could be presented more convincingly.

      Response:

      We found the potential co-expression of airway and alveolar markers in COPD lungs interesting and hence included it in the original manuscript. The initial discovery came from our bulk RNA-seq data, where we observed upregulation in COPD of several genes typically found in more proximal airways (mentioned above by the reviewer). Of note, some of them (e.g., FoxJ1) are expressed at very low levels. Following the reviewer's comments, we performed further IF analysis to validate the possible colocalization of AT2 and airway markers at the protein level. We took Z-stack images to demonstrate the co-localization of HT2-280 and KRT5 more convincingly and co-stained the same tissue regions with SCGB3A2 (a TASC/distal airway cell marker, PMID 36796082). Even though these are rare events, we were able to reproduce the existence of HT2-280/KRT5-positive, SCGB3A2-negative cells in the alveoli of COPD patients at the protein level (Fig. S2H and panels below). Although this finding is interesting, we decided to keep it in the supplement and did not include it in the discussion, in order to focus the story on the epigenetic regulation of the IFN pathway, which is the main discovery of our study. We will investigate this observation in future studies.

      Figure S2H and here: Examples of HT2-280/KRT5 double-positive cells. Top, immunofluorescence staining of the alveolar region of a COPD II donor showing the existence of AT2 cells (HT2-280 positive, red) which are SCGB3A2 negative (green, left) but KRT5 positive (green, right). In conclusion, double-positive HT2-280/KRT5 cells are rare but present in the alveoli of COPD patients. Magnification: 20x. Scale bar: 50 µm. Bottom, Z-stack images highlighting HT2-280 (red) and KRT5 (green) double-positive cells at 63x magnification. Scale bar: 5 µm.

      CHANGES IN THE MANUSCRIPT:

      In addition, we observed an upregulation of several keratins (KRT5, KRT14, KRT16, KRT17) and mucins (MUC12, MUC13, MUC16, MUC20), suggesting a potential dysregulation of alveolar epithelial cell differentiation programs in COPD (Table 6, Fig. S2F). Immunofluorescence staining confirmed the presence of KRT5-positive cells in the distal lung in COPD and identified cells positive for both KRT5 and HT2-280 (Fig. S2H). Collectively, these results indicate a dysregulation of stemness and identity in the alveolar epithelial cells in COPD.

      Fig. S2H legend: The zoomed-in panel (right corner, bottom) demonstrates the presence of rare HT2-280/KRT5 double-positive cells in the alveoli of COPD patients. Slides were counterstained with DAPI, scale bars = 50 µm, 20 µm or 5 µm, as displayed in images.

      • Double staining for KRT5 and HT2-280 did highlight the proximity of both cell types in lung tissue, underscoring the challenge of removing airways (including the smaller and terminal bronchi) from the tissue. In addition, HT2-280/KRT5 co-expression is not consistent with recent studies from refs 28, 69, 70 where other markers for distal airway cell transition, such as SCGB3A2 and BPIFB1, have been demonstrated, which were not investigated in this study.

      Response:

      We provided a general overview of the different signatures observed in our data, but we could not validate every deregulated pathway or gene. We include the relevant tables detailing all differentially expressed genes and differentially methylated regions to enable and encourage the community to follow up on the data in subsequent studies.

      As demonstrated above, we detect the co-occurrence of HT2-280/KRT5 staining at the protein level in the same cells in the alveoli of COPD patients. We would like to emphasize that alveolar epithelial cell identity in COPD lungs has not been investigated in detail at the protein or RNA level, and HT2-280/KRT5 co-expression/co-localization has not been directly tested in the studies mentioned by the reviewer since, among other reasons, the gene encoding HT2-280 has not been identified. Notably, a recent study (published after the submission of our manuscript) focusing on enriched epithelial cells from the distal lungs of COPD patients (PMID 35078977) identified an emphysema-specific AT2 subtype co-expressing the AT2 marker SFTPC and the distal airway cell transition marker SCGB3A2, indicating that disease-specific AT2 populations with possible co-occurrence of AT2 and airway markers exist. In our dataset, SCGB3A2 was not deregulated (log2 fold change=0.22, adj p-value=0.47), as shown in Table 6, and the HT2-280/KRT5-positive cells were negative for SCGB3A2 in our IF staining (see above).

      BPIFB1 is one of the antimicrobial peptide genes with an associated DMR and is significantly upregulated in COPD cells in our study (log2 fold change=1.17, adj p-value=0.0016), as shown in supplementary Fig. S4C and below for convenience.

      Figure S4C Fold-change in gene expression of BPIFB1 in AT2 cells in COPD (RNA-seq) and A549 cells treated with 0.5µM AZA (RT-qPCR) compared to control samples. Left, RNA-seq data from AT2 cells (no COPD, blue, n=3; COPD II-IV, green, n=5). Right, A549 treated with AZA (orange, n=3) compared to control DMSO-treated cells (grey, n=3). The group median is shown as a black bar.

      • The small (and not evenly divided) sample size of both COPD and non-COPD specimens may lead to a higher risk for false positive results as adjustments for multiple testing typically rely on the number of comparisons, and small sample sizes may not provide enough data points to adequately control for this.

      Response:

      We acknowledge the problem of testing for multiple traits with relatively small numbers of samples. The availability of donor tissue, especially from non-COPD and COPD-I donors, was limited, and we applied very strict donor matching and quality control criteria for sample inclusion to avoid additional variability and confounding factors. The importance of strict quality control in selecting appropriate control samples was highlighted in our previous study (PMID: 33630765), where we demonstrated that approximately 50% of distal lung tissue from cancer patients with normal spirometry has pathological changes. Hence, we believe that the quality of the tissue was paramount to the reliability of the data. Strict quality control and sample matching for multiple parameters, including age, BMI, smoking status and smoking history (critical for DNA methylation studies), and cancer type (for background tissue), is a key strength of our approach, but it inevitably limited our sample size.

      First, all samples were cryopreserved and then processed in parallel in groups of 1 non-COPD and 2-3 COPD samples. This process included tissue dissociation, FACS sorting, back sorting (always), and immunofluorescence staining (when enough material was available). Cell pellets were stored at -80°C until the entire cohort was ready for sequencing. This was done to limit the potential variation introduced by processing and sorting. RNA and DNA isolations were performed in parallel for all the sorted cell pellets, which were then sequenced as a single batch.

      During data analysis, we applied stringent cutoffs for DMR detection to reduce the risk of false positives due to multiple comparisons and a small sample size. Specifically, we filtered for regions with at least 10% methylation difference and containing at least 3 CpGs. Additionally, we applied a non-parametric Wilcoxon test using average DMR methylation levels to remove potentially false-positive regions, as the t-statistic is not well suited for non-normally distributed values, as expected at very low/high (close to 0% / 100%) methylation levels. A significance level of 0.1 has been used. Therefore, we are confident that the rigorous analysis and strict criteria applied in this study allowed us to detect trustworthy DMRs that we could further functionally validate using epigenetic editing. All the details of the DMR analysis are provided in the methods section. To address this point and limitation, we have added the following paragraphs in the discussion section of the manuscript:

      CHANGE IN THE MANUSCRIPT:

      The strengths of our study include the use of purified human alveolar type 2 epithelial progenitor cells from a well-matched and carefully validated cohort of human samples, including mild and severe COPD patients, providing high relevance to human COPD.

      However, we acknowledge several limitations of our study that warrant further investigation. First, the sample size was small. The use of strict quality criteria for donor selection limited the available samples, particularly for the ex-smoker control group. This resulted in an unequal distribution of COPD and control samples. This impacts the power of statistical analysis, particularly in the WGBS analysis, where millions of regions genome-wide are tested. Nevertheless, the clear negative correlation between promoter methylation and corresponding gene expression highlights the robustness of the DMR selection. Additionally, we were able to experimentally validate interferon-associated DMRs using epigenetic editing, highlighting the power of integrated epigenetic profiling in identifying disease-relevant regulators.
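      The DMR filtering criteria described in the response above (at least 10% methylation difference, at least 3 CpGs, and a non-parametric Wilcoxon test on average DMR methylation at a significance level of 0.1) can be sketched as follows. This is a hedged illustration and not the authors' pipeline: it assumes a two-sided Wilcoxon rank-sum test computed exactly by enumeration, methylation expressed as fractions between 0 and 1, and an unadjusted p-value threshold. All function names and numbers are hypothetical.

```python
from itertools import combinations
from statistics import mean


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_p(x, y):
    """Exact two-sided rank-sum p-value by enumerating all group labelings."""
    ranks = midranks(list(x) + list(y))
    n = len(x)
    expected = n * (len(ranks) + 1) / 2
    observed = abs(sum(ranks[:n]) - expected)
    sums = [sum(c) for c in combinations(ranks, n)]
    extreme = sum(1 for s in sums if abs(s - expected) >= observed - 1e-12)
    return extreme / len(sums)


def keep_dmr(ctrl, copd, n_cpgs, min_diff=0.10, min_cpgs=3, alpha=0.1):
    """Apply the three filters to one candidate region."""
    diff = abs(mean(copd) - mean(ctrl))
    return n_cpgs >= min_cpgs and diff >= min_diff and rank_sum_p(ctrl, copd) < alpha


# Hypothetical average methylation of one region in 3 controls and 5 COPD donors.
ctrl = [0.80, 0.82, 0.85]
copd = [0.50, 0.55, 0.60, 0.62, 0.58]
print(rank_sum_p(ctrl, copd), keep_dmr(ctrl, copd, n_cpgs=5))
```

      With 3 and 5 samples per group, the smallest attainable two-sided exact p-value is 2/56, which gives a sense of how the small and unequal group sizes acknowledged above limit the resolution of any rank-based test.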

      Minor suggestions for improvement

      Introduction

      • In general, refer to the actual experimental studies rather than review papers where appropriate.

      Response:

      We have now carefully checked all the references and amended them to refer to experimental studies when required.

      • Clearly specify whether a study was conducted in mice or humans, as this distinction is crucial for understanding the relevance of the findings to COPD.

      Response:

      All our experiments were performed with human lung cells and tissues. No mouse samples were used. As suggested, we have now clearly stated that our study was performed using human tissue samples and cells in different parts of the manuscript, including the discussion, where we now explicitly highlight the strengths and limitations of our study.

      CHANGES IN THE MANUSCRIPT:

      ...we generated whole-genome DNA methylation and transcriptome maps of sorted human primary alveolar type 2 cells (AT2) at different disease stages.

      However, the regulatory circuits that drive aberrant gene expression programs in human AT2 cells in COPD are poorly understood

      Therefore, we set out to profile DNA methylation of human AT2 cells at single CpG-resolution across COPD stages.

      ...suggesting that aberrant epigenetic changes may drive COPD phenotypes in human AT2.

      To identify genome-wide DNA methylation changes associated with COPD in purified human AT2 cells...

      The similarity of the methylation and gene expression profiles in the PCAs suggested that epigenetic and transcriptomic changes in human AT2 cells during COPD might be interrelated ...

      In this work, we demonstrate that genome-wide DNA methylation changes occurring in human AT2 cells may drive COPD pathology by dysregulating key pathways that control inflammation, viral immunity and AT2 regeneration.

      Using high-resolution epigenetic profiling, we uncovered widespread alterations of the DNA methylation landscape in human AT2 cells in COPD that were associated with global gene expression changes.

      Currently, it is unclear how cigarette smoking leads to changes in DNA methylation patterns in human AT2

      The strengths of our study include the use of purified human alveolar epithelial progenitor cells from a well-matched and carefully validated cohort of human samples, including mild and severe COPD patients, providing high relevance to human COPD.

      Methods • Line 473: are 3 ex-smoker controls meant here instead of smoker controls?

      Response:

      All donors (no COPD and COPD) used in our study are ex-smokers. Matching the samples with regard to smoking status and history is critical for epigenetic studies, as cigarette smoke profoundly affects DNA methylation genome-wide (PMID: 38199042, PMID: 27651444). This has now been clarified in the revised manuscript.

      CHANGE IN THE MANUSCRIPT:

      Of note, we included only ex-smokers in our profiling to avoid acute smoking-induced inflammation as a confounding factor (50).

      Importantly, we matched the smoking status and smoking history of all donors, which is key in epigenetic studies, as cigarette smoking profoundly impacts the DNA methylation landscape of tissues (96).

      In total, 3 ex-smoker controls (no COPD), 3 mild COPD donors ex-smokers (GOLD I, COPD I) and 5 moderate-to-severe COPD donors ex-smokers (GOLD II-IV, COPD II-IV) were profiled (Fig. 1A-C, Table 1)

      Discussion • A list of limitations should be added to the discussion. One is the use of the alveolar cell line A549, which produces mucus, a characteristic more commonly associated with bronchial epithelial cells (ref 43), l. 530.

      Response:

      The profiling was performed using purified primary human alveolar epithelial progenitor cells. For technical reasons, A549 cells were only used for validation of the results using epigenetic editing. The A549 phenotype depends on the growth medium used, in our case, Ham's F-12 medium, which is recommended for long-term A549 culture and promotes multilamellar body formation and differentiation toward an AT2-like phenotype (PMID: 27792742). We are developing epigenetic editing technology for use in primary lung cells; however, the approach currently relies on the high efficiency of transient transfections, which cannot yet be achieved with primary adult AT2 cells. We were positively surprised by how well the methylation data obtained from patient AT2s translated into mechanistic insights when using A549 cells, even though A549 is a cancer cell line. This suggests that the fundamental mechanisms of epigenetic regulation of IRF9 and the IFN signaling pathway are conserved between A549 and primary AT2 cells.

      • Another limitation to consider is that cells were isolated primarily from individuals with lung cancer, except for patients with COPD stage IV, in particular as COPD stage II and IV samples were taken together. The small and unevenly divided sample size should also be discussed.

      Response:

      We thank the reviewer for bringing up this important point, which we carefully considered when designing our study. To match our samples across the cohort, all the no-COPD, COPD I, and two of the COPD II-IV samples were obtained from cancer resections. In addition to other characteristics, like age, BMI and smoking status, we also matched the donors by cancer type (all profiled donors had squamous cell carcinoma). We collected lung tissue as far away from the carcinoma as possible and sent representative pieces for histological analysis by an experienced lung pathologist to confirm the absence of visible tumours. In addition, to ensure that our data represents COPD-relevant signatures, we intentionally included samples from three COPD donors undergoing lung resections (without a cancer background) in the profiling.

      Following the reviewer's suggestion, to investigate the potential impact of non-cancer samples on driving the observed differences, we carefully checked the PCAs for both DNA methylation and RNA-seq. We could not identify a clear separation of non-cancer COPD samples from the cancer COPD samples (or other cancer samples) in any examined PCs, indicating no confounding effect of cancer background in the samples. We observed that one sample contributing to PC2 is a non-cancer sample, but this was a rather sample-specific effect, as the other two non-cancer samples clustered together with the other severe COPD samples with a cancer background. Notably, in our DNA methylation data, we do not observe typical features of cancer methylomes, like global loss of DNA methylation or aberrant methylation of CpG islands (e.g., in tumour suppressor genes) (see Fig 2A), further suggesting that we do not "pick up" confounding cancer signatures in our data.

      Following the comments from both reviewers, to clarify that point, we added the information about cancer and non-cancer samples to the PCA figures for DNA methylation (new Fig. 2B) and RNA-seq (new Fig. 3A) data in the revised manuscript, as shown below

      CHANGE IN THE MANUSCRIPT:

      COPD samples from donors with a cancer background clustered together with the COPD samples from lung resections, confirming that we detected COPD-relevant signatures (Fig. 2B).

      Fig. 2B: *Principal component analysis (PCA) of methylation levels at CpG sites with > 4-fold coverage in all samples. COPD I and COPD II-IV samples are represented in light and dark green triangles, respectively, and no COPD samples as blue circles. COPD samples without a cancer background are displayed with a black contour. The percentage indicates the proportion of variance explained by each component.*

      Unsupervised principal component analysis (PCA) on the top 500 variable genes revealed a clear influence of the COPD phenotype in separating no COPD and COPD II-IV samples, as previously observed with the DNA methylation analysis, irrespective of the cancer background of COPD samples (Fig.3A, Fig. S2B).

      *Principal component analysis (PCA) of 500 most variable genes in RNA-seq analysis. PCA 1 and 2 are shown in Fig.3A, PCA 1 and 4 in Fig.S2B. COPD I and COPD II-IV samples are represented in light and dark green triangles, respectively, and no COPD samples as blue circles. COPD samples without a cancer background are displayed with a black contour. The percentage indicates the proportion of variance explained by each component.*

      Response:

      We thank the reviewer for suggestions on how to improve the discussion of our manuscript. We have now added a strength/limitation section to our discussion and included the points suggested by both reviewers.

      CHANGE IN THE MANUSCRIPT:

      The strengths of our study include the use of purified human alveolar epithelial progenitor cells from a well-matched and carefully validated cohort of human samples, including mild and severe COPD patients, providing high relevance to human COPD. Importantly, we matched the smoking status and smoking history of all donors, which is key in epigenetic studies, as cigarette smoking profoundly impacts the DNA methylation landscape of tissues (96). With the first genome-wide high-resolution methylation profiles of isolated cells across COPD stages, we offer novel insights into the epigenetic regulation of gene expression in epithelial progenitor cells in COPD, expanding our understanding of how alterations in regulatory regions and specific genes could contribute to disease development. We identified IRF9 as a key IFN transcription factor regulated by DNA methylation. Notably, by targeting IRF9 through epigenetic modifications, we modulated the activity of the IFN pathway, which plays a crucial role in the immune response and lung tissue regeneration. Epigenetic editing techniques could offer a novel therapeutic strategy for COPD by downregulating IFN pathway activation and promoting the regeneration of epithelial progenitor cells in the lungs. Further preclinical and clinical studies are needed to validate the efficacy and safety of epigenetic editing approaches in COPD treatment (33).

      *However, we acknowledge several limitations to our study that warrant further investigation. First is the small sample size and replication difficulty due to the lack of available data, common challenges for studies working with sparse human material and hard-to-purify cell populations. The use of strict quality criteria in donor selection limited the available samples, especially for the ex-smoker control group, leading to an unequal distribution of COPD and control samples. Overall, this impacts the power of statistical analysis, especially in the WGBS analysis, where millions of regions genome-wide are tested. Nevertheless, the clear negative correlation of promoter methylation to the corresponding gene expression highlights the robustness of the DMR selection. Furthermore, we could experimentally validate interferon-associated DMRs using epigenetic editing, highlighting the power of integrated epigenetic profiling for the discovery of disease-relevant regulators.*

      Overall, we detected a higher number of correlated DMR-DEG associations using our simple promoter-proximal linkage compared to the GeneHancer approach. Assigning enhancers to their target genes with high confidence is a complex and challenging task. Enhancers are often located far from the genes they regulate and can interact with their target genes through three-dimensional chromatin loops. Furthermore, enhancers can operate in a highly context-dependent manner, with the same enhancer regulating different genes depending on the cell type, developmental stage, or environmental signals. Determining which enhancer is active under specific conditions remains a hurdle in the field, especially since the AT2-specific chromatin profiles of enhancer marks are not yet available.

      In addition, while WGBS provides unprecedented resolution and high coverage of the DNA methylation sites across the genome, it does not allow distinguishing 5-methylcytosine from 5-hydroxymethylcytosine. Therefore, we cannot exclude that some methylated sites we detected are 5-hydroxymethylated. However, as 5-hydroxymethylcytosine is present at very low levels in the lung tissue (97), its effect is likely marginal.

      Finally, despite careful removal of airways from distal lung tissue using a dissecting microscope, we cannot exclude the presence of some terminal/respiratory bronchiole cells in our FACS-isolated EpCAMpos/PDPNlow population. Recent scRNA-seq studies provided an unprecedented resolution and identified several epithelial subpopulations and transitional cells residing in the terminal/respiratory bronchioles and alveoli, including respiratory airway secretory cells (93), terminal airway-enriched secretory cells (28), terminal bronchiole-specific alveolar type-0 (AT0) (70), and emphysema-specific AT2 cells (74). These cells may contribute to alveolar repair in healthy and COPD lungs; however, with our bulk DNA methylation and RNA-seq study, we are unable to resolve all these subpopulations. Future development of single-cell methylation and non-reference-based algorithms for DNA methylation deconvolution will enable deeper epigenetic phenotyping of specific AT2 and bronchiolar cell subsets.

      References • Check references. For instance, there is no reference in the text to ref 43.

      • Align format of references

      Response:

      We thank the reviewer for spotting this inconsistency. We have carefully checked and aligned the format of all references. The (old) reference 43 is now mentioned in the discussion part.

      Reviewer #1 (Significance (Required)):

      The strength of this study lies in its focus on the molecular mechanisms underlying the impaired regeneration of epithelial progenitor cells in COPD. The discovery of IRF9, which regulates IFN signaling and is prominently upregulated in COPD, together with the convincing validation of the epigenetic control of the IFN pathway by targeted DNA demethylation of the IRF9 gene, adds significant value to the COPD research field.

      Main limitations of the study are the relatively small sample size of both COPD and non-COPD specimens and the claim that the sorted EpCAMpos/PDPNlow cells primarily consisted of AT2 cells.

      - Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      The nature and significance of the advance in epigenetic editing of IRF9 in COPD can be described as both conceptual and potentially clinical:

      Conceptual Advance: The epigenetic editing of IRF9 enhances our understanding of the molecular mechanisms underlying COPD pathogenesis. By targeting IRF9 through epigenetic modifications, researchers were able to modulate the activity of the IFN pathway, which plays a crucial role in the immune response and lung tissue regeneration. This approach offers insights into the epigenetic regulation of gene expression in epithelial progenitor cells in COPD and expands our understanding of how alterations in specific gene methylation could contribute to disease progression.

      Clinical Significance: The potential clinical significance of epigenetic editing of IRF9 lies in its implications for COPD therapy. If successful, epigenetic editing techniques could offer a novel therapeutic strategy for COPD by downregulating IFN pathway activation and promoting regeneration of epithelial progenitor cells in the lungs. Obviously, further preclinical and clinical studies are needed to validate the efficacy and safety of epigenetic editing approaches in COPD treatment.

      Response: We thank the reviewer for recognising the importance of our study, its conceptual advance and potential clinical significance. We are pleased to see that the reviewer highlights the promise of epigenetic editing in both furthering our basic understanding of molecular mechanisms of chronic diseases and its future potential as a therapeutic strategy.

      - Place the work in the context of the existing literature (provide references, where appropriate). Few experimental papers have been published on epigenetic editing in lung diseases, with limited research available beyond the study referenced in citation 43. Song J, Cano-Rodriquez D, Winkle M, Gjaltema RA, Goubert D, Jurkowski TP, Heijink IH, Rots MG, Hylkema MN. Targeted epigenetic editing of SPDEF reduces mucus production in lung epithelial cells. Am J Physiol Lung Cell Mol Physiol. 2017 Mar 1;312(3):L334-L347. doi: 10.1152/ajplung.00059.2016. Epub 2016 Dec 23. PMID: 28011616.

      Response:

      We thank the reviewer for recognising the uniqueness and novelty of our study and the lack of research on the functional understanding of DNA methylation in the context of lung and lung diseases.

      - State what audience might be interested in and influenced by the reported findings.

      This study is of broad interest to researchers investigating the pathogenesis and treatment of COPD.

      - Define your field of expertise with a few keywords to help the authors contextualize your point of view.

      Expertise in: Lung pathology, Immunology, COPD, Epigenetics

      - Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. Less expertise in: Epigenetic Editing

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      This study aims to understand the molecular mechanisms underlying dysfunction in AT2 cells in COPD, by profiling bulk genome-wide DNA methylation using Tagmentation-based whole-genome bisulfite sequencing (T-WGBS) and RNA sequencing in selectively sorted primary AT2 cells. The study stands out in its sequencing breadth and use of an incredibly difficult cell population, and has the potential to add substantially to our mechanistic understanding of epigenetic contributions to COPD. A further highlight is the concluding aspect of the study where the authors undertook targeted modification of specific CpG methylation, providing direct, site-specific evidence for transcriptional regulation by CpG methylation.

      Response:

      We thank the reviewer for recognizing the conceptual and methodological advance of our study and for noting the value of our functional mechanistic approach.

      Major comments:

      The authors clearly show that there is DNA methylation alteration in AT2 cells from COPD individuals that links functionally to gene expression at some level. However, I think the statement "to identify genome-wide changes associated with COPD development and progression..." and similar other references to disease development understanding is not accurate given the DNA methylation primary comparison is between control and moderate to severe COPD, with no temporal detail or evidence that they drive progression rather than are a result of COPD development. The paragraph starting on line 186 where this is addressed to some extent is quite vague and doesn't really provide confidence that DNAm dysregulation occurs at an early stage in this context. This can be addressed by changing the focus/style of the text.

      Response:

      Thank you for raising this point. We agree with the reviewer that our cross-sectional study describes the association of methylation changes with either COPD I or more established disease (COPD II-IV) and that the observed changes may be either the driver or a result of COPD development. This has been clarified in the revised manuscript, and we removed the statements about disease initiation and progression. This is an important point; hence, we added an extra line to the discussion to make that clear.

      CHANGE IN THE MANUSCRIPT:

      Therefore, we set out to profile DNA methylation of human AT2 cells at single CpG-resolution across COPD stages to identify epigenetic changes associated with disease and combine this with RNA-seq expression profiles.

      To identify epigenetic changes associated with COPD, we collected lung tissue from patients with different stages of COPD,

      ....to identify methylation changes associated with mild disease, we included TWGBS data from AT2 isolated from COPD I patients (n=3) in the analysis.

      Currently, we do not know whether the identified DNA methylation changes are the cause or the consequence of the disease process and not much is known about the correlation of DNA methylation with disease severity.

      *However, our study is cross-sectional, our cohort included only 3 COPD I donors, and we did not have any follow-up data on the patients, so future large-scale profiling of mild disease (or even pre-COPD cohorts) in an extended patient cohort will be crucial for a better understanding of early disease and its progression trajectories.*

      Results comments and suggestions:

      For the integrated analysis, there is a focus on DMRs in promoters with very little analysis on other regions. The paragraph starting on line 317 describes some analysis on enhancers but is very brief, doesn't include information on how many/which DMRs were included, making it hard to interpret the impact of the 147 DMRs and 93 genes identified - is this nearly all DMRs and genes analysed or very few? A comparison to the promoter analysis would be of interest. Especially as the targeted region followed up with lovely functional assessment in the last sections is a gene body DMR, not a promoter DMR.

      Response:

      We thank the reviewer for pointing out the importance of changes in enhancers. We agree that extending the enhancer analysis is very interesting. However, assigning enhancers to their target genes with high confidence is a complex and challenging task. Enhancers are often located far from the gene they regulate, sometimes spanning hundreds of kilobases. They can interact with their target genes through three-dimensional chromatin loops, potentially bypassing nearby genes to activate more distant ones, making it difficult to confidently link specific enhancers to their target genes. Furthermore, enhancers can operate in a highly context-dependent manner. The same enhancer can regulate different genes depending on the cell type, developmental stage, or environmental signals. Another challenge is that enhancers often work in clusters or "enhancer landscapes," where multiple enhancers contribute to the regulation of a single gene. Disentangling the contribution of individual enhancers within such clusters and determining which enhancer is active under specific conditions remains an ongoing hurdle in the field, especially since the AT2-specific chromatin profiles of enhancer marks are not yet available.

      One approach we tried to account for more distal regulatory regions was to assign DMRs to the nearest gene with a maximum distance of up to 100 kb using GREAT (Genomic Regions Enrichment of Annotations Tool) and simultaneously perform gene enrichment analysis of the associated genes. The old Figure S1C (now S1D) shows the top 10 enriched terms of either hyper- or hypomethylated DMRs, and Table 4 shows the full list of enriched terms. However, in this analysis, we did not integrate the results of the RNA-seq analysis. To demonstrate that methylation can be correlated with gene expression in this analysis, we then took a closer look at the WNT/β-catenin pathway, which contains 147 DMRs associated with 93 genes from the respective pathway (old Figure S3D, now S3G). Here, we showed that distal DMRs up to 100 kb away from the TSS show a high correlation with gene expression. We are including the two figures below for convenience:

      *Left panels, functional annotation of genes located next to hypermethylated (top) and hypomethylated (bottom) DMRs using GREAT. Hits were sorted according to the binomial adjusted p-value and the top 10 hits are shown. The adjusted p-value is indicated by the color code and the number of DMR associated genes is indicated by the node size. Right panel, scatter plot showing distal DMR-DEG pairs associated with Wnt-signaling. Pairs were extracted from GREAT analysis (hypermethylated, DMR-DEG distance …*

      Following the reviewer's suggestion, we have now extended the enhancer analysis using the GeneHancer database, the most comprehensive, integrated resource of enhancer/promoter-gene associations. We used the GeneHancer version 5.14, which annotates 392,372 regulatory genomic elements (GeneHancer element) on the hg19 reference genome. Of the 25,028 DMRs, 18,289 DMRs (73% of all DMRs) coincided with at least one GeneHancer element, resulting in 19,661 DMR-GeneHancer associations. Next, we extracted the GeneHancer elements associated with protein-coding or long non-coding RNA genes, which left us with 2,144 DMR-GeneHancer associations. Next, we used only high-scoring gene-GeneHancer associations ("Elite"), leaving 1,485 DMR-GeneHancer associations. Of those, we selected the GeneHancer elements that are linked to genes differentially expressed in our RNA-seq analysis, resulting in a final table of 376 DMR-GeneHancer associations (Table 9 DMR_DEG_GeneHancer, Tab 2). Similar to the promoter-proximal analysis, we analysed the correlation of expression and methylation changes of the DMR-GeneHancer associations, demonstrating a high number of negatively and positively correlated events (Fig.S3D). Finally, we performed the gene enrichment analysis for positively and negatively correlating genes. We detected significant GO term enrichments only for negatively correlating genes (Fig.S3E and Table 10_Enrichment_results, Tab2).

      CHANGE IN THE MANUSCRIPT

      To harness the full resolution of our whole-genome DNA methylation data, we extended the analysis beyond promoter-proximal regions and assessed how epigenetic changes in distal regulatory regions (enhancers) may relate to transcriptional differences in COPD. As the assignment of enhancer elements to the corresponding genes is challenging, we tried two different approaches. First, we used the GeneHancer database (72) to link DMRs to regulatory genomic elements (GeneHancer element). Of the 25,028 DMRs, 18,289 DMRs (73%) coincided with at least one GeneHancer element. Of those, 2,144 DMR-GeneHancer associations were linked either to protein-coding or lncRNA genes. Next, we filtered for high-scoring gene GeneHancer associations ("Elite"), leaving 1,485 DMR-GeneHancer Elite associations. Of those, we selected the GeneHancer elements, which are linked to genes differentially expressed in our RNA-seq analysis, resulting in 376 DMR-GeneHancer associations (Table 9). Similar to the promoter-proximal analysis, we assessed the correlation of expression and methylation changes of the DMR-GeneHancer associations, demonstrating a high proportion of negatively and positively correlated events (Fig. S3E). Finally, we performed gene enrichment analysis for positively and negatively correlated genes. We detected significant GO term enrichments for negatively correlating genes only (Fig. S3F and Table 10), with the most pronounced term "regulation of tumor necrosis factor". In an alternative approach, we linked proximal and distal (within 100 kb from TSS) DMRs to the next gene using GREAT (57) (Fig S1C, Table 4) and calculated Spearman correlation between DMRs and associated DEGs. 147 DMRs were associated with high correlation rates with 93 genes from the WNT/β-catenin pathway (Fig. S3G), suggesting that DNA methylation may also drive the expression of genes of the WNT/β-catenin family.

      Figure S3E and F: E. Spearman correlation between gene expression and DMR methylation of DMRs assigned to gene regulatory elements using the GeneHancer database. F. GO-Term over-representation analysis of DEGs negatively correlated to DMRs in gene regulatory elements. The adjusted p-value is indicated by the color code and the percentage number of associated DEGs is indicated by the node size.

      (Methods) For enhancer analysis, the GeneHancer database version 5.14, which annotates 392,372 regulatory genomic elements (GeneHancer element) on the hg19 reference genome, was used (72). Of the 25,028 DMRs, 18,289 DMRs coincided with at least one GeneHancer element, resulting in 19,661 DMR-GeneHancer associations. Next, the GeneHancer elements were filtered for association with protein-coding or long non-coding RNA genes and high-scoring gene GeneHancer associations ("Elite"), leaving 1,485 DMR-GeneHancer associations. Of those, the GeneHancer elements linked to differentially expressed genes in COPD were selected, resulting in a final table of 376 DMR-GeneHancer associations. Similar to the promoter-proximal analysis, the Spearman correlation of expression and methylation changes of the DMR-GeneHancer associations was assessed. GO gene enrichment analysis for positively and negatively correlating genes was done using Metascape (111).
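
      The filtering cascade described in this Methods passage can be sketched as follows. This is a minimal illustration on toy records, not the code used in the study: the field names (`gene_type`, `elite`), the identifiers, and the half-open coordinate convention are all assumptions made for the example, and real GeneHancer files use their own schema.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # 0-based, exclusive (assumed convention)


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def filter_cascade(dmrs, elements, degs):
    """Count DMR-element associations surviving each successive filter."""
    # Step 1: every DMR/element overlap is one association.
    pairs = [(name, el) for name, reg in dmrs.items()
             for el in elements if overlaps(reg, el["region"])]
    # Step 2: keep elements linked to protein-coding or lncRNA genes.
    typed = [p for p in pairs
             if p[1]["gene_type"] in {"protein_coding", "lncRNA"}]
    # Step 3: keep only high-scoring ("Elite") gene associations.
    elite = [p for p in typed if p[1]["elite"]]
    # Step 4: keep associations whose target gene is differentially expressed.
    final = [p for p in elite if p[1]["gene"] in degs]
    return {"overlapping": len(pairs), "coding_or_lncRNA": len(typed),
            "elite": len(elite), "deg_linked": len(final)}


# Hypothetical toy inputs.
toy_dmrs = {
    "dmr1": Region("chr1", 100, 200),
    "dmr2": Region("chr1", 5000, 5100),
    "dmr3": Region("chr2", 300, 400),
}
toy_elements = [
    {"id": "GH_A", "region": Region("chr1", 150, 400), "gene": "GENE1",
     "gene_type": "protein_coding", "elite": True},
    {"id": "GH_B", "region": Region("chr2", 350, 900), "gene": "GENE2",
     "gene_type": "pseudogene", "elite": True},
    {"id": "GH_C", "region": Region("chr2", 390, 500), "gene": "GENE3",
     "gene_type": "lncRNA", "elite": False},
]
toy_degs = {"GENE1", "GENE3"}

print(filter_cascade(toy_dmrs, toy_elements, toy_degs))
```

      Each step only narrows the previous list, which is why the reported counts decrease monotonically from overlap to the final DEG-linked table.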

      A comparison to the promoter analysis would be of interest.

      Response:

      We detected more highly correlated (|correlation coefficient| > 0.5) DMR-DEG associations using our simple promoter proximal linkage (n=643) in comparison with the GeneHancer approach comprising annotated enhancer elements (n=327/2,144). Gene enrichment results pointed to the interferon pathway, which we could confirm using epigenetic editing. This pathway was not present in the GeneHancer analysis, indicating that regulation of the IFN pathway may be controlled by proximal elements.
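
      To make the |correlation coefficient| > 0.5 criterion concrete, the sketch below computes a Spearman coefficient for one DMR-DEG pair and labels it. It is an illustration only: the function names, the toy methylation and expression values, and the pure-Python implementation are assumptions for this example and do not reproduce our analysis pipeline.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)


def classify(rho, threshold=0.5):
    """Label a pair by the sign of a strong correlation; NaN counts as weak."""
    if rho > threshold:
        return "positive"
    if rho < -threshold:
        return "negative"
    return "weak"


# Hypothetical per-donor values for a single DMR-DEG pair.
methylation = [0.82, 0.75, 0.60, 0.41, 0.30]
expression = [1.2, 2.0, 3.1, 4.8, 6.5]
print(classify(spearman(methylation, expression)))
```

      Counting the pairs labelled "negative" or "positive" across all DMR-DEG associations gives the kind of tally compared between the promoter-proximal and GeneHancer approaches.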

      CHANGE IN THE MANUSCRIPT:

      Overall, we detected a higher number of correlated DMR-DEG associations using our simple promoter-proximal linkage compared to the GeneHancer approach. Assigning enhancers to their target genes with high confidence is a complex and challenging task. Enhancers are often located far from the genes they regulate and can interact with their target genes through three-dimensional chromatin loops. Furthermore, enhancers can operate in a highly context-dependent manner, with the same enhancer regulating different genes depending on the cell type, developmental stage, or environmental signals. Determining which enhancer is active under specific conditions remains a hurdle in the field, especially since the AT2-specific chromatin profiles of enhancer marks are not yet available.

      Especially as the targeted region followed up with lovely functional assessment in the last sections is a gene body DMR, not a promoter DMR.

      Response:

      We thank the reviewer for bringing up that point. To clarify, we defined the promoter regions for the analysis as regions located ±6 kb (upstream and downstream) from the transcriptional start site (TSS). Since the term "promoter" often refers to the region upstream of the transcriptional start site, its use may have been misleading. For clarity, we changed the text correspondingly to "promoter proximal methylation" and explained in the methods how the regions for analysis were defined.
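
      The window definition can be illustrated with a short sketch. It assumes that a DMR counts as promoter proximal when any part of it overlaps the 6 kb window on either side of the TSS; the function name, the half-open DMR coordinates, and the overlap rule are assumptions for this example rather than a description of the exact implementation.

```python
def promoter_proximal(dmr_start, dmr_end, tss, window=6000):
    """True if the DMR [dmr_start, dmr_end) overlaps [tss - window, tss + window].

    The window is symmetric around the TSS, so gene strand does not change
    the outcome and gene-body DMRs just downstream of the TSS are included.
    """
    return dmr_start <= tss + window and dmr_end > tss - window


# Hypothetical coordinates: a DMR 3 kb downstream of the TSS (gene body).
print(promoter_proximal(18_000, 18_400, tss=15_000))  # True
# A DMR 13 kb upstream of the TSS falls outside the window.
print(promoter_proximal(1_000, 2_000, tss=15_000))    # False
```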

      CHANGE IN THE MANUSCRIPT:

      "DMR association per gene promoter" was changed to "Gene promoter proximal DMRs"

      Fig. S3B: "DMR in promoter" was changed to "promoter proximal DMR(s)"

      "by DNA methylation changes in promoters" was changed to "by DNA methylation changes in promoter proximity"

      "regulated by promoter methylation" was changed to "regulated by promoter-proximal methylation"

      "analysis of the promoter DMRs" was changed to "analysis of the promoter-proximal DMRs"

      "between promoter methylation" was changed to "between promoter proximal methylation"

      Cytoscape was used to analyse negatively or positively correlated DMR DEG pairs. ClueGO (v2.5.6) analysis was conducted using all DEG associated with a promoter proximal DMR (+/- 6 kb from TSS) and the Spearman correlation coefficient 0.5 (112).

      • Lines 299-301 - I'm not sure the graph in Fig S3A supports the conclusion that there was a preferential negative relationship between DNAm and gene expression. Looks like there are a substantial number of cases where a positive relationship is observed and this needs to be acknowledged.

      Response:

      In this part, we refer to Fig S3C. In the left panel, downregulated genes clearly show higher counts for the hypermethylated DMRs, whereas the hypomethylated DMRs are enriched at upregulated genes (right panel), indicating a preference for negative correlation: lower methylation, higher gene expression. If there were no preference, we would expect a 50:50 ratio of hypo- and hypermethylated DMRs, and we observed a 77:23 ratio. Nevertheless, we agree that there is a substantial number of cases (n=151) with a high positive correlation, which we now highlight in the text. For clarity, we also modified the figure legend to indicate that a stacked histogram is represented in the panel.
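
      The 50:50 expectation can be checked with an exact binomial calculation. The sketch below uses the counts quoted in the revised manuscript text (492 negatively and 151 positively correlated pairs) and assumes that the pairs are independent, which need not hold when several DMRs map to the same gene. It is an illustration of the reasoning, not a test reported in the manuscript.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k successes in n trials under p = 0.5.

    With p = 0.5 the distribution is symmetric, so the two-sided value is
    twice the upper tail from max(k, n - k), capped at 1.
    """
    tail = max(k, n - k)
    upper = sum(comb(n, i) for i in range(tail, n + 1))
    return min(1.0, 2 * upper / 2 ** n)


negative, positive = 492, 151
total = negative + positive
print(f"negative: {100 * negative / total:.1f}%")
print(f"positive: {100 * positive / total:.1f}%")
print(f"p-value under a 50:50 expectation: {binomial_two_sided_p(negative, total):.3g}")
```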

      CHANGE IN THE MANUSCRIPT:

      L303: Interestingly, 23.5% of the identified DMR DEG pairs (n=151) showed a positive correlation between gene expression and DNA methylation.

      *Figure legend in Fig. S3C was changed to: C Stacked histogram showing location of hyper- and hypomethylated DMRs relative to the TSS of DEGs in downregulated (left) and upregulated (right) genes.*

      • Line 307 - what are the "analysed DEGs"? Are they the methylation associated genes?

      Response:

      Those are the DEGs we identified in RNA-seq analysis. To clarify, we changed the text to "identified DEGs".

      CHANGE IN THE MANUSCRIPT:

      • "analysed DEGs" was changed to "identified DEGs"

      • Line 307-309 - "Among the analyzed DEGs, 76.5% (492) displayed a negative correlation (16.8% of the total DEGs), indicating a possible direct regulation by DNA methylation, while 23.5% (151) showed a positive correlation between gene expression and DNA methylation" - are the authors suggesting the positive correlation doesn't indicate direct regulation?

      Response:

      Thank you for highlighting this point. We did not intend to suggest that negative correlation indicates direct regulation, while positive correlation suggests a lack thereof. To clarify that point, we have reformulated this sentence.

      __CHANGE IN THE MANUSCRIPT____: __

      Among the identified DEGs, 76.5% (n=492) displayed a negative correlation (16.8% of the total DEGs), consistent with a repressive role of promoter DNA methylation. Interestingly, 23.5% of the identified DEG (n=151) showed a positive correlation between gene expression and DNA methylation.
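
      The reported split can be checked with a few lines of arithmetic. The sketch below recomputes the percentages from the two counts quoted above and, as an illustration only, compares the split with the 50:50 expectation using an exact two-sided binomial test. The test is our assumption for the example; the response does not state that the authors ran it, and the function name is hypothetical.

```python
from math import comb

def direction_summary(n_negative, n_positive):
    """Summarise the split of negatively vs positively correlated pairs."""
    total = n_negative + n_positive
    pct_negative = 100 * n_negative / total
    pct_positive = 100 * n_positive / total
    # Exact two-sided binomial test against a 50:50 split (illustrative
    # assumption, not a test reported in the response).
    k = max(n_negative, n_positive)
    upper_tail = sum(comb(total, i) for i in range(k, total + 1))
    p_two_sided = min(1.0, 2 * upper_tail / 2**total)
    return total, pct_negative, pct_positive, p_two_sided

# Counts taken from the text: 492 negative and 151 positive pairs.
total, pct_negative, pct_positive, p_value = direction_summary(492, 151)
print(f"{total} pairs: {pct_negative:.1f}% negative, "
      f"{pct_positive:.1f}% positive, p = {p_value:.2e}")
```

      Integer arithmetic is used for the binomial tail so that the very small p-value is not lost to floating-point underflow before the final division.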

      • Line 313 - why did the authors focus on only negatively correlated genes to identify their top dysregulated pathway of IFN signalling? Why not do pathway analysis on the DNAm associated genes separately to identify DNAm associated pathways?

      Response:

      We also performed a pathway enrichment analysis using the positively correlated genes but did not identify any significantly enriched pathways, processes or terms. When we examined the top hit of the gene set enrichment analysis, the interferon signaling pathway, we observed only negatively correlated DMR gene associations (Fig. 5B). Therefore, we decided to use only the negatively correlated DMRs, as using all correlated genes would increase the background and dilute our results.

      CHANGE IN THE MANUSCRIPT:

      Cytoscape was used to analyse negatively or positively correlated DMR DEG pairs. ClueGO (v2.5.6) analysis was conducted using all DEGs associated with a promoter-proximal DMR (+/- 6 kb from TSS) and an absolute Spearman correlation coefficient > 0.5 (113).
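
      To make the correlation filter concrete, the following sketch computes a Spearman coefficient between DMR methylation and gene expression across donors and classifies the pair using the 0.5 cutoff on the coefficient. It is a standard-library illustration under stated assumptions, not the authors' pipeline: the helper names and the five-donor toy values are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant input: treat as uncorrelated in this sketch
    return cov / (var_x * var_y) ** 0.5

def classify_pair(rho, cutoff=0.5):
    """Label a DMR-gene pair by the sign and size of its correlation."""
    if rho > cutoff:
        return "positive"
    if rho < -cutoff:
        return "negative"
    return "uncorrelated"

# Hypothetical toy data: one DMR and one gene measured in five donors.
methylation = [0.80, 0.75, 0.60, 0.40, 0.35]
expression = [1.0, 1.5, 2.2, 3.9, 3.1]
rho = spearman(methylation, expression)
label = classify_pair(rho)
print(rho, label)
```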

      • A comparison of the gene expression data with previous data in AT2 cell/single cell data would strengthen the gene expression section.

      Response:

      We compared our gene expression signatures with the study of Fujino et al., who profiled sorted AT2 cells (EpCAMhighPDPNlow) from COPD/controls using expression arrays (PMID: 23117565). Consistent with our study, the authors also observed the upregulation of interferon signalling (among other pathways) in COPD AT2s. However, no raw data was available in the published manuscript for a more in-depth analysis.

      Several recent scRNA-seq studies identified transcriptional signatures of COPD and control cells (e.g., PMIDs: 36108172, 35078977, 36796082, 39147413). However, most studies did not match the smoking status of the control and COPD donors and looked at the whole lung tissue, with limited power to detect gene expression changes in distal alveolar cells. It is difficult to directly compare our data to the gene expression data from non-smokers vs COPD patients, as cigarette smoking profoundly remodels the epigenome and transcriptional signatures of cells. In addition, differences in technologies and depth of sequencing make such comparisons challenging. However, one study (PMID: 36108172) performed scRNA-seq analysis on 3 non-smokers, 4 ex-smokers and 7 COPD ex-smoker lungs. Despite relatively limited coverage of epithelial cells in the dataset (We also compared the main AT2 IFN signature identified in the integration of our DNA methylation in promoter-proximal regions and RNA-seq with a recent study (published after the submission of our manuscript, PMID: 39147413) that profiled EpCAMpos cells from COPD and control lungs (non-smokers) using scRNA-seq. We observed an upregulation of our IFN signature genes in AT2 in COPD (specifically in AT2-c and rbAT2 subsets), suggesting that similar signatures were observed in this dataset as well. However, ex-smokers were not included in this study, making direct comparisons difficult. We have now included the panels shown below as Figure S4E and S4F:

      Figure S4E and F: E. Expression values for the indicated genes of the IFN pathway from an external scRNA-seq dataset of AT2 cells from COPD patients and healthy controls (74). Y-axis shows log-normalized gene expression levels. F. Combined gene set score of the genes shown in (E) in different subsets of AT2 cells from (74). The IFN signature genes were identified in our integrative analysis of TWGBS and RNA-seq in sorted AT2 cells.

      CHANGES IN THE MANUSCRIPT:

      However, 5-AZA is a global demethylating agent, and the observed effects may not be direct. To validate the epigenetic regulation of central AT2 pathways further, we took advantage of locus-specific epigenetic editing technology (73). We focused on the IFN pathway because it was the most significantly enriched Gene Ontology (GO) term in our integrative analysis of TWGBS and RNA-seq data. Several IFN pathway members had associated hypomethylated DMRs within promoter-proximal regions and concomitant increased gene expression (Fig. 4C and Fig. S2C). Additionally, we confirmed the elevated expression of IFN-related genes with associated DMRs identified in our study in AT2 cells and AT2 cell subclusters from a recently published scRNA-seq cohort (74) (Fig. S4E-F).

      (Methods) Validation of IFN gene upregulation in a published scRNA-seq dataset

      scRNA-seq data from (74), generously provided by M. Königshoff, were processed using the default Seurat workflow (117). Expression of IFN-related genes was extracted and plotted as log-normalised gene expression levels in AT2 cells from control and COPD donors. Seurat's AddModuleScore() function was used to compute a gene set score for a custom IFN program using the genes listed in Fig. S4E and to analyse the IFN gene set scores in AT2 cell subclusters identified in (74). Briefly, average gene expression scores were computed for the gene set of interest, and the expression of control features (randomly selected) was subtracted as described in (118).
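
      The idea behind the gene set score can be sketched in a few lines. The example below is a deliberately simplified Python illustration of "mean expression of the gene set minus mean expression of randomly selected control genes" per cell. It is not Seurat's implementation: AddModuleScore() additionally draws control features from expression-matched bins, which is omitted here, and all gene names and values are hypothetical.

```python
import random

def module_score(expr, gene_set, n_control=5, seed=0):
    """Simplified per-cell gene set score.

    expr: dict mapping gene name -> list of log-normalised values per cell.
    gene_set: genes of interest (must all be keys of expr).
    Controls are sampled uniformly at random from the remaining genes,
    a simplification relative to expression-binned sampling.
    """
    rng = random.Random(seed)
    pool = sorted(g for g in expr if g not in gene_set)
    controls = rng.sample(pool, min(n_control, len(pool)))
    n_cells = len(next(iter(expr.values())))
    scores = []
    for cell in range(n_cells):
        target = sum(expr[g][cell] for g in gene_set) / len(gene_set)
        background = sum(expr[g][cell] for g in controls) / len(controls)
        scores.append(target - background)
    return scores

# Hypothetical toy matrix: two cells, two signature genes, six flat controls.
expr = {"SIG_A": [0.0, 2.0], "SIG_B": [0.0, 4.0]}
expr.update({f"CTRL_{i}": [1.0, 1.0] for i in range(6)})
scores = module_score(expr, ["SIG_A", "SIG_B"])
print(scores)
```

      Because every control gene in the toy matrix has the same value, the result does not depend on which controls are sampled; with real data the score varies slightly with the random seed.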

      Fig. S4 E and F. E. Expression values for the indicated genes of the IFN pathway from an external scRNA-seq dataset of AT2 cells from COPD patients and healthy controls (74). Y-axis shows log-normalized gene expression levels. F. Combined gene set score of the genes shown in (E) in different subsets of AT2 cells from (74). The IFN signature genes were identified in our integrative analysis of TWGBS and RNA-seq in sorted AT2 cells.

      • The paragraph starting on line 173 feels a little redundant when we know there is RNA available to test if the differential DNAm links to altered gene expression - this selection of example regions/genes would be better placed after the gene expression has been reported, at which point you could say whether the linked genes displayed altered transcription.

      Response:

      The current structure (with DNA methylation, followed by RNA-seq and integration) is intentional and serves several important purposes. As this is the first genome-wide high-resolution COPD DNA methylation study of AT2, we aimed to describe the methylation landscape independently of gene expression (noting the limitation of current understanding of how DNA methylation regulates expression). This early focus on DMRs lays clear groundwork by highlighting potential regulatory elements and pathways that could be disrupted, independent of or even before corroborative transcriptional data. Additionally, positioning these examples early in the narrative helps to frame subsequent gene expression analyses. Once RNA data are introduced later, the reader can directly compare the methylation patterns with transcriptional outcomes, thereby enhancing the overall story. In other words, by first showcasing disease-relevant methylation changes, we underscore a hypothesis that these epigenetic modifications are functionally meaningful. The later integration of gene expression data then serves as a confirmatory or complementary layer, rather than the sole basis for inferring biological significance. This is important as we still do not fully understand the function of DNA methylation outside promoters, and its role is also important for splicing, 3D genome organisation, non-coding RNA regulation, enhancer regulation, etc.

      • Similarly, the TF enrichment analysis is great but maybe would have added value to be done on DNA regions later shown to be linked to differential expression - was there different enrichment at DNA regions that are vs are not associated with altered expression? And could you test in vitro whether changing methylation of DNA (maybe a blunt tool like 5-aza would be ok) alters TF binding (cut+run/ChIP?). Furthermore, it would be interesting to understand the TF sensitivity analysis within the context of positive versus negative DNA methylation:gene expression correlations.

      Response:

      As suggested by the reviewer, we have now performed the TF enrichment analysis using the DMRs with a high correlation (|correlation coefficient| > 0.5) between methylation and expression (Figure S3D) and expanded the methods section to include TF analysis. We observed ETS domain motifs enriched at hypomethylated regions. They prefer unmethylated DNA (MethylMinus) and are therefore expected to bind with higher affinity to the respective DMRs in COPD. We agree with the reviewer that further verifying altered TF binding using cut&run or ChIP assays would be very interesting, but it is out of the scope of this manuscript. Such analysis is technically very challenging to perform with low numbers of primary AT2 cells and will be the focus of our follow-up mechanistic studies.

      CHANGE IN THE MANUSCRIPT:

      Additionally, motif analysis of DMRs that were highly correlated (|Spearman correlation coefficient| > 0.5) with DEGs revealed a prominent enrichment of the cognate motif for ETS family transcription factors, such as ELF5, SPIB, ELF1 and ELF2 at hypomethylated DMRs (Fig. S3D). Interestingly, SPIB was shown to facilitate the recruitment of IRF7, activating interferon signaling (71), and our WGBS data uncover SPIB motifs at hypomethylated DMRs, which aligns with its binding preferences at unmethylated DNA (methyl minus, Fig. S3D).

      Figure S3D: Enrichment of methylation-sensitive binding motifs at hypo- (right) and hypermethylated (left) DMRs, using DMRs with a high correlation (|Spearman correlation coefficient| > 0.5) between methylation and gene expression. Methylation-sensitive motifs were derived from Yin et al (64). Transcription factors, whose binding affinity is impaired upon methylation of their DNA binding motif, are shown in red (Methyl Minus), and transcription factors, whose binding affinity upon CpG methylation is increased, are shown in blue (Methyl Plus).

      (Methods) To obtain information about methylation-dependent binding for transcription factor motifs enriched at DMRs, the results of a recent SELEX study (64) were integrated into the analysis. That study categorised transcription factors according to how methylation of their DNA motif affected their binding affinity. Those whose affinity was impaired by methylation were categorised as MethylMinus, while those whose affinity increased were categorised as MethylPlus. A motif database of 1,787 binding motifs with associated methylation dependency was constructed. The log odds detection threshold for the HOMER motif search was calculated as follows: bases with a probability > 0.7 received a score of log(base probability/0.25); otherwise, the score was set to 0. The final threshold was the sum of the scores of all bases in the motif. Motif enrichment analysis was carried out against a sampled background of 50,000 random regions with matching GC content using the findMotifsGenome.pl script of the HOMER software suite, omitting CG correction and setting the generated SELEX motifs as the motif database.
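
      The threshold rule described in the methods text can be written out as a short function. The sketch below follows the stated rule (score log(p/0.25) for bases with probability above 0.7, zero otherwise, summed over the motif). The natural logarithm is an assumption, since the text only says "log", and the function name and example matrix are hypothetical.

```python
import math

def log_odds_threshold(pwm, min_prob=0.7, background=0.25):
    """Sum log(p / background) over all bases with probability > min_prob.

    pwm: list of motif positions, each a list of four base probabilities
    (A, C, G, T). Natural log is assumed here.
    """
    threshold = 0.0
    for position in pwm:
        for prob in position:
            if prob > min_prob:
                threshold += math.log(prob / background)
    return threshold

# Hypothetical three-position motif: strong A, uninformative, strong C.
example_pwm = [
    [0.97, 0.01, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.80, 0.05, 0.05],
]
threshold = log_odds_threshold(example_pwm)
print(round(threshold, 3))
```

      Only well-defined positions contribute, so degenerate positions neither raise nor lower the detection threshold.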

      Methods:

      • The authors should include more detail of the TWGBS rather than directing the reader to a previous publication. Also DNA concentration post bisulfite conversion would be a useful metric to provide.

      Response:

      Following the suggestion, we have now expanded the details of TWGBS in the methods part of the manuscript. Due to limited space, we did not include a detailed protocol but instead referred to a published step-by-step protocol (55). Of note, we do not measure DNA concentration post-bisulfite conversion but consistently use the starting input of 30 ng of genomic DNA across all samples.

      CHANGE IN THE MANUSCRIPT:

      (Methods): 15 pg of unmethylated phage lambda DNA was spiked in as a control for bisulfite conversion. Tagmentation was performed in TAPS buffer using an in-house purified Tn5 assembled with load adapter oligos (55) at 55 °C for 8 min. Tagmentation was followed by purification using AMPure beads, oligo replacement and gap repair as described (55). Bisulfite treatment was performed using EZ DNA Methylation kit (Zymo) following the manufacturer's protocol.

      The T-WGBS library preparations were performed for all donors in parallel and sequenced in a single batch to minimize batch effects and technical variability.

      • Differential DNA methylation analysis: It is stated that DNA regions had to contain 3 CpG sites but was this within a defined DNA size range?

      Response:

      The maximum distance between individual CpGs within a DMR was set to 300 bp. To clarify, we added that information to the methods part.

      CHANGE IN THE MANUSCRIPT:

      "regions with at least 10% methylation difference and containing at least 3 CpGs with a maximum distance of 300 bp between them."
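
      The three criteria (at least 3 CpGs, at most 300 bp between neighbouring CpGs, at least 10% methylation difference) can be illustrated with a small filter. This is a sketch under stated assumptions and not the DMR caller used in the study: it groups CpGs on one chromosome by gap size and applies the thresholds to the mean difference of each group, and the function name and toy coordinates are hypothetical.

```python
def candidate_regions(cpgs, max_gap=300, min_cpgs=3, min_diff=0.10):
    """Group CpGs into regions and keep those meeting the stated criteria.

    cpgs: list of (position, methylation_difference) tuples for a single
    chromosome, sorted by position; differences are fractions (0.10 = 10%).
    Returns (start, end, n_cpgs, mean_difference) for each kept region.
    """
    groups, current = [], []
    for position, diff in cpgs:
        # Start a new group when the gap to the previous CpG is too large.
        if current and position - current[-1][0] > max_gap:
            groups.append(current)
            current = []
        current.append((position, diff))
    if current:
        groups.append(current)

    kept = []
    for group in groups:
        mean_diff = sum(d for _, d in group) / len(group)
        if len(group) >= min_cpgs and abs(mean_diff) >= min_diff:
            kept.append((group[0][0], group[-1][0], len(group), mean_diff))
    return kept

# Hypothetical CpGs: one qualifying region, one too short, one too weak.
toy_cpgs = [
    (100, -0.15), (250, -0.20), (500, -0.12),
    (1200, 0.30), (1300, 0.25),
    (5000, 0.05), (5100, 0.04), (5200, 0.06),
]
regions = candidate_regions(toy_cpgs)
print(regions)
```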

      • Reference genome only provided for RNAseq not TWGBS?

      Response:

      We used hg19 as the reference genome. The information on the reference genome for DNA methylation analysis was provided in the methods, L574 of the original manuscript: "The reads were aligned to the transformed strands of the hg19 reference genome using BWA MEM".

      • The tables do not appear in the PDF and I struggled to tally to the "Dataset" files provided if that is what they were referring to?

      Response:

      Full tables (uploaded as Datasets in the manuscript central due to their size) were uploaded together with the manuscript files. They are quite large and will not convert to pdf, so they may not have been included in the merged pdf file. We assume that they should be available to the reviewers with the other files and will clarify that with the editorial staff in the resubmission cover letter.

      • For the gene expression analysis, can it be made clearer that a full analysis was done on COPD I samples? It is a little confusing to the reader as this was not done for DNAm, so it might be assumed that the same targeted analysis was done on only genes found to be differentially expressed between control and COPD II-IV, but that cannot be the case as an overlap of COPD I vs COPD II-IV genes is provided. For this overlap, do genes show the same effect direction?

      Response:

      To clarify, for the RNA-seq analysis, we performed DEG analysis for no-COPD versus COPD II-IV, as well as no-COPD versus COPD I. We then took all differentially expressed genes (presented in the Venn diagram) and plotted them for all samples as a heatmap. To split the genes into groups displaying similar effect directions, we applied a clustering approach and identified 3 main signatures. Cluster 3 primarily comprises genes unique to COPD I samples, which are associated with the adaptive immune system and hemostasis (Fig. 4E). In the other two clusters, we mainly observe a transitioning pattern from control to severe COPD samples, correlating with the FEV1 values of the patients. This has now been clarified in the manuscript.

      • Replication is difficult in these studies as the samples are so difficult to come by. Sample size is also limited for the same reason. It doesn't mean the study is not worth doing and the data are still valuable. However, it may be pertinent to include technical validation of a few regions of interest, acknowledge the limitation (alongside strengths) in the discussion, and perhaps provide actual p value rather than blanket

      Response:

      We thank the reviewer for acknowledging the replication challenges for studies working with sparse human material and hard-to-purify cell populations. Following the reviewer's suggestion, we have now included a strengths and limitations section in the discussion where we summarised the points highlighted by both reviewers.

      Regarding technical validation, we would like to note that the whole genome bisulfite sequencing (WGBS) technology, as well as the tagmentation-based WGBS (T-WGBS), have been validated in the past few years in several publications (e.g., PMID: 24071908) and shown to yield reliable DNA methylation quantification in comparison to other technologies (PMID: 27347756). For us, technical validation using alternative methods (e.g. bisulfite sequencing or pyrosequencing) is difficult as it requires significantly more input DNA than the low-input T-WGBS we have performed and obtaining sufficient amounts of material from primary human AT2 cells (especially from severe COPD) is not possible with the size of tissue we can access. However, while establishing the T-WGBS for this project, we initially validated our approach using Mass Array, a sequencing-independent method. For this, we performed T-WGBS on the commercially available smoker and COPD lung fibroblasts and selected 9 regions with different methylation levels for validation using a Mass Array. We obtained an excellent correlation between both methods, providing technical validation of T-WGBS and our analysis workflow. This validation was published in our earlier manuscript (PMID: 37143403), but we provided the data below for convenience.

      Scatter plots showing correlation of average methylation obtained with T-WGBS and Mass Array from COPD and smoker fibroblasts. Each dot represents one region with varying methylation levels. The blue diagonal represents the linear regression. Shaded areas are confidence intervals of the correlation coefficient at 95%. Correlation coefficients and P values were calculated by the Pearson correlation method.

      To enable further validation and follow-up by the community, we included the full list of DMRs, associated p-values and additional information for DNA methylation analysis (DMR width, n.CpGs, MethylDiff, etc) in Table 3 (Table_3_wgbs_dmr_info.xlsx) and the information about DEGs from RNA-seq in Table 6 (Table_6_RNAseq_DEG_info.xlsx).

      • It isn't clear to me if DNA and RNA are from the same cells? The results say "cells matching those used for T-WGBS" but the methods suggest separate extractions so not the same cells? If they are not the same cells a comment on the implications of this should be included in the discussion for example, potentially some differences in cell type composition, storage time etc.

      Response:

      Lung tissue samples were freshly cryopreserved, and H&E slides derived from representative pieces of the tissue were analyzed. Once we had a group of at least 3 samples comprising one non-COPD and 2 COPD samples, we processed them in parallel to limit sorting variation between control and disease samples. The sorted cells were counted, aliquoted and pelleted at 4 °C before flash freezing and storing at -80 °C. The storage time of the cell pellets varied between the donors. RNA and DNA were isolated from cell pellets collected from the same FACS sorting experiment; therefore, we do not expect differences in cell type composition. In addition, RNA and DNA isolation were performed for all sorted pellets in parallel. All library preparations for TWGBS and RNA-seq were performed for all donors in parallel and sequenced in a single batch to minimise batch effects and technical variability. This has now been clarified in the methods part of the manuscript.

      CHANGE IN THE MANUSCRIPT:

      To minimize potential technical bias, samples from no COPD and COPD donors were processed in parallel in groups of 3 (one no COPD and 2 COPD samples).

      RNA and genomic DNA for RNA-seq and TWGBS were isolated from identical aliquots of sorted cell pellets.

      Genomic DNA was extracted from 1-2 x 10^4 sorted alveolar epithelial cells isolated from cryopreserved lung parenchyma from 11 different donors in parallel using QIAamp Micro Kit.

      The TWGBS library preparations were performed for all donors in parallel and sequenced in a single batch to minimize batch effects and technical variability.

      RNA was isolated from flash-frozen pellets of 2 x 10^4 sorted AT2 cells from 11 different donors in parallel.

      The RNA-seq library preparation for all donors was performed in parallel and all samples were sequenced in a single batch to minimize batch effects and technical variability.

      • Line 193 the authors say "Since DMRs were overrepresented at cis-regulatory sites...." - "cis" needs to be defined. If you link DNAm regions to gene via "closest gene" does this not automatically mean your outputs will be cis? Just needs better definition/explanation.

      Response:

      The term "cis-regulatory sites" in our manuscript is intended to denote regulatory elements (such as enhancers, promoters, and other nearby control regions) that reside on the same chromosome and close to the genes they regulate. While it is true that linking a DMR to its closest gene captures a cis association, our phrasing emphasises that the DMRs are enriched specifically at these functional regulatory elements (Fig. 2E) rather than being randomly distributed. This usage aligns with established conventions in the field. To avoid any misunderstandings, we have now changed the term to gene regulatory sites.

      CHANGE IN THE MANUSCRIPT:

      We changed the "cis-regulatory sites" to "gene regulatory sites"

      Minor comments:

      Line 157: "we identified site-specific differences....". Change to region specific?

      Response:

      This has now been corrected as suggested.

      Line 102-103: needs a reference for the statement "Alterations in DNA methylation patterns have been implicated......"

      Response:

      Following the reviewer's suggestion, we added the relevant references (34-36) to this statement.

      Line 266 - what does "strong dysregulation" mean? Large fold change, very significant?

      Response:

      We removed the word "strong" from this sentence.

      Lines 423-425 - statement needs a reference

      Response:

      Following the reviewer's suggestion, we added the relevant reference to this statement.

      Line 428 - word missing between "epigenetic , we"?

      Response:

      This has now been corrected. The text reads: "Through treatment with a demethylating drug and targeted epigenetic editing, we demonstrated the ability to modulate..."

      Prior studies are well referenced, text and figures are clear and accurate.

      Reviewer #2 (Significance (Required)):

      This study has several strengths:

      1) Sample collection and characterisation. AT2 cells are incredibly hard to come by and the authors should be commended for generating the samples. However, proximity to cancer is always a potential issue, especially in epigenetic studies. Is it feasible to include any analysis to show the samples derived from those with cancer don't drive the changes observed? Even a high level PCA or an edit of fig 2A with non-cancer in a different colour in supplemental - looks like there is one outlier, is that a non-cancer? Or a correlation of change in beta between control and cancer/COPD and control and non-cancer:COPD (for want of a better phrase!). Just an indicator that the non-cancer COPD samples are not driving differences.

      Response:

      We thank the reviewer for highlighting the value of generating data from hard-to-work-with AT2 populations and bringing up the important point of cancer proximity, which we considered very carefully when designing our study. To match our samples across the cohort, all the no-COPD, COPD I, and two of the COPD II-IV distal lung samples were obtained from cancer resections. In addition to other characteristics, like age, BMI and smoking status, we also matched the donors by cancer type (all profiled donors had squamous cell carcinoma). We collected lung tissue as far away from the carcinoma as possible and sent representative pieces for histological analysis by an experienced lung pathologist to confirm the absence of visible tumours. In addition, to ensure that our data represents COPD-relevant signatures, we intentionally included samples from three COPD donors undergoing lung resections (without a cancer background) in the profiling.

      Following the reviewer's suggestion, to investigate the potential impact of non-cancer samples on driving the observed differences, we carefully checked the PCAs for both DNA methylation and RNA-seq. We could not identify a clear separation of no-cancer COPD samples from the cancer COPD samples (or other cancer samples) in any examined PCs, indicating no confounding effect of cancer samples. We observed that one sample contributing to PC2 is a non-cancer sample, but this was a rather sample-specific effect, as the other two non-cancer samples clustered together with the other severe COPD samples with a cancer background. Notably, in our DNA methylation data, we do not observe typical features of cancer methylomes, like global loss of DNA methylation or aberrant methylation of CpG islands (e.g., in tumour suppressor genes) (see Fig. 2A), further suggesting that we do not "pick up" confounding cancer signatures in our data.

      Following the comments from both reviewers, to clarify that point, we added the information about cancer and non-cancer samples to the PCA figures for DNA methylation (new Fig. 2B) and RNA-seq (new Fig. 3A) data in the revised manuscript, as shown below.

      CHANGE IN THE MANUSCRIPT:

      COPD samples from donors with a cancer background clustered together with the COPD samples from lung resections, confirming that we detected COPD-relevant signatures (Fig. 2B).

      Fig. 2B. Principal component analysis (PCA) of methylation levels at CpG sites with > 4-fold coverage in all samples. COPD I and COPD II-IV samples are represented in light and dark green triangles, respectively, and no COPD samples as blue circles. COPD samples without a cancer background are displayed with a black contour. The percentage indicates the proportion of variance explained by each component.

      Unsupervised principal component analysis (PCA) on the top 500 variable genes revealed a clear influence of the COPD phenotype in separating no COPD and COPD II-IV samples, as previously observed with the DNA methylation analysis, irrespective of the cancer background of COPD samples (Fig. 3A, Fig. S2B).

      Principal component analysis (PCA) of 500 most variable genes in RNA-seq analysis. PCA 1 and 2 are shown in Fig. 3A, PCA 1 and 4 in Fig. S2B. COPD I and COPD II-IV samples are represented in light and dark green triangles, respectively, and no COPD samples as blue circles. COPD samples without a cancer background are displayed with a black contour. The percentage indicates the proportion of variance explained by each component.

      2) This is the first time DNAm has been profiled in AT2 cells. It is incredibly difficult, valuable and novel data that will considerably increase the field's technical capability, its understanding of functional mechanisms and potential translation. Its audience will be primarily translational respiratory; however, the fundamental science aspect of gene expression regulation by DNA methylation will have wider reach across developmental and disease science.

      Response:

      We thank the reviewer for recognising the uniqueness and novelty of our study and highlighting the value and potential impact of our datasets for the lung field.

      3) The functional analysis using targeted CRISPR-Cas9 is very well done and adds impact.

      Response:

      We thank the reviewer for recognising the strengths and added value of the functional analysis using epigenetic editing.

      Potential weaknesses/areas for development

      I feel the main weakness is in the section integrating DNA methylation and gene expression. The rationale for a focus on various aspects, for example inversely related DNAm/gene expression pairs, the IFN pathway and IRF9, is not clear. Also, further understanding of the differences between DNAm associated genes and non-DNAm associated genes could be expanded, at the pathway level, TF regulation level, effect size level (are DNAm associated changes to gene expression larger, enriched for earlier differential expression)

      Response:

      Our rationale for focusing on the inversely related DNAm/gene expression pairs in promoter-proximal regions is purely data-driven, as they represent the biggest group in our data (Fig. 4A-B). Among those negatively correlated genes, we observed the strongest enrichment for the IFN pathway (Fig. C), making it an obvious, data-driven target for further studies. The negative correlation of expression and methylation for IFN pathway genes could be validated in 5-AZA assays in A549 cells (Fig. 5A). Next, we performed an interaction network analysis showing IRF9 and STAT2 as master regulators (Fig. 5B) of the negatively correlated IFN genes. As IRF9 itself displayed a negative correlation between DNA methylation and expression (Fig. 5C), we used the associated DMR for further epigenetic editing (Fig. 5D-E). We performed the additional requested analyses of the enhancer-associated changes and genes, as described above. We fully agree with the reviewer that our data sets are a great resource and can be further used to elaborate on other relationships of DNA methylation and RNA expression or other pathways, but this is out of the scope of this study. To enable further studies by the research community, we provide all necessary information about DMRs and DEGs in the associated supplementary tables and the raw data through the EGA, as well as the CRISPRa editing assay.

      The authors could comment on potential masking of differences between 5hmC and mC and the implications it may have

      Response:

      We thank the reviewer for bringing up this important point. Indeed, bisulfite sequencing cannot differentiate between methylated and hydroxymethylated cytosines; hence, some of the methylated sites may be hydroxymethylated. However, the overall levels of hydroxymethylation in differentiated adult tissues are very low (except for the brain), orders of magnitude lower compared to DNA methylation. Following the reviewer's suggestion, we have added a sentence in the limitation section of the discussion to clarify that point.

      CHANGE IN THE MANUSCRIPT:

      In addition, while WGBS provides unprecedented resolution and high coverage of the DNA methylation sites across the genome, it does not allow distinguishing 5-methylcytosine from 5-hydroxymethylcytosine. Therefore, we cannot exclude that some methylated sites we detected are 5-hydroxymethylated. However, the 5-hydroxymethylcytosine is present at very low levels in the lung tissue (97).

      Furthermore, while the rationale for looking at DMRs is clear, especially given the sample number, I am interested to understand what proportion of the assayed CpGs "fit" within the cut off stipulations of the DMR analysis - that is, are there potentially COPD effects at sparse CpG regions/individual CpG sites that are not being identified. A comment on this would be useful and seems the strength of profiling genome wide. I'm happy genome wide is beneficial it just feels a little circular that the authors have chosen whole genome to avoid the bias of the Illumina array and a focus on promoters, but have primarily reported promoter DNAm. This caught my attention again in the discussion where the authors state that cis-regulatory regions were also identified in their fibroblast data .....is this finding a factor of the analysis performed? (also a comparison of regions identified in AT2 cells versus fibroblasts would be really interesting for a future paper)

      Response:

      We decided to focus our analysis on regions rather than individual CpG sites when looking at differential methylation, as DNA methylation is spatially correlated, and methylation changes in larger regions are more likely to have a biological function. Extending the analysis to single CpG sites would require a higher number of samples for a reliable analysis compared to the DMR analysis (as mentioned by the reviewer).

      Of note, we addressed the platform comparison between Illumina array technology and WGBS in our previous fibroblast study (PMID: 37143403), where we compared our WGBS data with the published 450k array data of COPD parenchymal fibroblasts (Clifford et al., 2018). We observed only a marginal overlap between the CpGs from our DMRs and the CpGs probes available on the array (which was due to the differences in technologies used and the limited coverage of the 450K array in comparison to our genome-wide approach, in which we covered 18 million CpGs). Out of the 6279 DMRs identified in our fibroblast study, only 1509 DMRs overlapped with at least one CpG probe on the 450K array, and after removing low-quality CpGs from the array data, only 1419 DMRs were left. This comparison highlighted the increased resolution of the WGBS compared to Illumina arrays.
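
      As an illustration only, the overlap counts quoted above can be reproduced in principle by asking, for each DMR, whether at least one array probe position falls inside it. The sketch below is not the authors' pipeline: the function name `count_dmrs_with_probe`, the inclusive-coordinate convention, and the toy intervals are assumptions made for the example. Only the three counts (6279, 1509, 1419) come from the response.

```python
from bisect import bisect_left


def count_dmrs_with_probe(dmrs, probes):
    """Count DMRs that contain at least one probe position.

    dmrs:   list of (chrom, start, end), coordinates treated as inclusive
    probes: list of (chrom, position)
    """
    # Group probe positions per chromosome and sort them for binary search.
    by_chrom = {}
    for chrom, pos in probes:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    hits = 0
    for chrom, start, end in dmrs:
        positions = by_chrom.get(chrom, [])
        i = bisect_left(positions, start)  # first probe at or after start
        if i < len(positions) and positions[i] <= end:
            hits += 1
    return hits


# Hypothetical toy data, not taken from the study.
toy_dmrs = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 50, 90)]
toy_probes = [("chr1", 150), ("chr1", 700), ("chr2", 10)]
toy_hits = count_dmrs_with_probe(toy_dmrs, toy_probes)

# Fractions implied by the counts quoted in the response.
total_dmrs = 6279
frac_any_probe = 1509 / total_dmrs
frac_after_qc = 1419 / total_dmrs
print(f"{toy_hits} of {len(toy_dmrs)} toy DMRs overlap a probe")
print(f"{frac_any_probe:.1%} before probe QC, {frac_after_qc:.1%} after")
```

      On the quoted counts, roughly a quarter of the fibroblast DMRs contained any 450K probe, which is the sense in which the overlap is described as marginal.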

      The reasons why we focused on promoter proximal DMRs are the following: 1) the assignment of the enhancer elements in AT2 to the corresponding gene is still too inaccurate in the absence of AT2 specific enhancer chromatin maps; 2) regulation at enhancers by DNA methylation might be more complex and might change (increase or attenuate) binding affinities of certain transcription factors (Fig.2H), which might lead to gene expression changes; or 3) methylation changes might be an indirect effect of differential TF binding (PMID: 22170606). However, we agree with the reviewer that despite these limitations, expanding the analysis beyond promoters adds value to the manuscript; hence, as described above, we expanded the analysis of non-promoter regions, including enhancers, in the revised manuscript.

      We thank the reviewer for the suggestion to compare the regions identified in AT2 cells and fibroblasts in a future paper.

      My expertise: Respiratory, cell biology, epigenetics.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      This study aims to understand the molecular mechanisms underlying dysfunction in AT2 cells in COPD, by profiling bulk genome wide DNA methylation using Tagmentation-based whole-genome bisulfite sequencing (T-WGBS) and RNA sequencing in selectively sorted primary AT2 cells. The study stands out in its sequencing breadth and use of an incredibly difficult cell population, and has the potential to add substantially to our mechanistic understanding of epigenetic contributions to COPD. A further highlight is the concluding aspect of the study where the authors undertook targeted modification of specific CpG methylation, providing direct, site-specific evidence for transcriptional regulation by CpG methylation.

      Major comments:

      The authors clearly show that there is DNA methylation alteration in AT2 cells from COPD individuals that links functionally to gene expression at some level. However, I think the statement "to identify genome-wide changes associated with COPD development and progression..." and similar other references to disease development understanding is not accurate given the DNA methylation primary comparison is between control and moderate to severe COPD, with no temporal detail or evidence that they drive progression rather than are a result of COPD development. The paragraph starting on line 186 where this is addressed to some extent is quite vague and doesn't really provide confidence that DNAm dysregulation occurs at an early stage in this context. This can be addressed by changing the focus/style of the text.

      Results comments and suggestions:

      For the integrated analysis, there is a focus on DMRs in promoters with very little analysis on other regions. The paragraph starting on line 317 describes some analysis on enhancers but is very brief, doesn't include information on how many/which DMRs were included, making it hard to interpret the impact of the 147 DMRs and 93 genes identified - is this nearly all DMRs and genes analysed or very few? A comparison to the promoter analysis would be of interest. Especially as the targeted region followed up with lovely functional assessment in the last sections is a gene body DMR, not a promoter DMR.

      • Lines 299-301 - I'm not sure the graph in Fig S3A supports the conclusion that there was a preferential negative relationship between DNAm and gene expression. Looks like there are a substantial number of cases where a positive relationship is observed and this needs to be acknowledged.

      • Line 307 - what are the "analysed DEGs"? Are they the methylation associated genes?

      • Line 307-309 - "Among the analyzed DEGs, 76.5% (492) displayed a negative correlation (16.8% of the total DEGs), indicating a possible direct regulation by DNA methylation, while 23.5% (151) showed a positive correlation between gene expression and DNA methylation" - are the authors suggesting the positive correlation doesn't indicate direct regulation?

      • Line 313 - why did the authors focus on only negatively correlated genes to identify their top dysregulated pathway of IFN signalling? Why not do pathway analysis on the DNAm associated genes separately to identify DNAm associated pathways?

      • A comparison of the gene expression data with previous data in AT2 cell/single cell data would strengthen the gene expression section.

      • The paragraph starting on line 173 feels a little redundant when we know there is RNA available to test if the differential DNAm links to altered gene expression - this selection of example regions/genes would be better placed after the gene expression has been reported, at which point you could say whether the linked genes displayed altered transcription.

      • Similarly, the TF enrichment analysis is great but maybe would have added value to be done on DNA regions later shown to be linked to differential expression - was there different enrichment at DNA regions that are vs are not associated with altered expression? And could you test in vitro whether changing methylation of DNA (maybe a blunt tool like 5-aza would be ok) alters TF binding (cut+run/ChIP?). Furthermore, it would be interesting to understand the TF sensitivity analysis within the context of positive versus negative DNA methylation:gene expression correlations.

      Methods:

      • The authors should include more detail of the TWGBS rather than directing the reader to a previous publication. Also DNA concentration post bisulphite conversion would be a useful metric to provide.

      • Differential DNA methylation analysis: It is stated that DNA regions had to contain 3 CpG sites but was this within a defined DNA size range?

      • Reference genome only provided for RNAseq not TWGBS?

      • The tables do not appear in the PDF and I struggled to tally to the "Dataset" files provided if that is what they were referring to?

      • For the gene expression analysis, can it be made clearer that a full analysis was done on COPD I samples? It is a little confusing to the reader as this was not done for DNAm so might be assumed the same targeted analysis on only genes found to be differentially expressed between control and COPD II-IV, but that cannot be the case as an overlap of COPD I vs COPD II-IV genes is provided. For this overlap, do genes show the same effect direction?

      • Replication is difficult on these studies as the samples are so difficult to come by. Also limited by sample size for the same reason. It doesn't mean the study is not worth doing and the data are still valuable. However, it may be pertinent to include technical validation of a few regions of interest, acknowledge the limitation (alongside strengths) in the discussion, and perhaps provide actual p values rather than a blanket p < 0.1, which seems very lenient but may all be super significant (this may already be in the tables I wasn't able to find).

      • It isn't clear to me if DNA and RNA are from the same cells? The results say "cells matching those used for T-WGBS" but the methods suggest separate extractions so not the same cells? If they are not the same cells a comment on the implications of this should be included in the discussion for example, potentially some differences in cell type composition, storage time etc.

      • Line 193 the authors say "Since DMRs were overrepresented at cis-regulatory sites...." - "cis" needs to be defined. If you link DNAm regions to gene via "closest gene" does this not automatically mean your outputs will be cis? Just needs better definition/explanation.

      Minor comments:

      • Line 157: "we identified site-specific differences....". Change to region specific?

      • Line 102-103: needs a reference for the statement "Alterations in DNA methylation patterns have been implicated......"

      • Line 266 - what does "strong dysregulation" mean? Large fold change, very significant?

      • Lines 423-425 - statement needs a reference

      • Line 428 - word missing between "epigenetic , we"?

      • Prior studies are well referenced, text and figures are clear and accurate.

      Significance

      This study has several strengths:

      1) Sample collection and characterisation. AT2 cells are incredibly hard to come by and the authors should be commended for generating the samples. However, proximity to cancer is always a potential issue, especially in epigenetic studies. Is it feasible to include any analysis to show the samples derived from those with cancer don't drive the changes observed? Even a high level PCA or an edit of fig 2A with non-cancer in a different colour in supplemental - looks like there is one outlier, is that a non-cancer? Or a correlation of change in beta between control and cancer/COPD and control and non-cancer:COPD (for want of a better phrase!). Just an indicator that the non-cancer COPD samples are not driving differences.

      2) This is the first time DNAm has been profiled in AT2 cells. It is incredibly difficult, valuable and novel data that will increase the field's capability technically, their understanding of functional mechanisms and potential translation considerably. Its audience will be primarily translational respiratory, however the fundamental science aspect of gene expression regulation by DNA methylation will have wider reach across developmental and disease science.

      3) The functional analysis using targeted CRISPR-Cas9 is very well done and adds impact.

      Potential weaknesses/areas for development:

      I feel the main weakness is in the section integrating DNA methylation and gene expression. The rationale for a focus on various aspects, for example inversely related DNAm/gene expression pairs, the IFN pathway and IRF9, is not clear. Also, further understanding of the differences between DNAm associated genes and non-DNAm associated genes could be expanded, at the pathway level, TF regulation level, effect size level (are DNAm associated changes to gene expression larger, enriched for earlier differential expression?). The authors could comment on potential masking of differences between 5hmC and mC and the implications it may have.

      Furthermore, while the rationale for looking at DMRs is clear, especially given the sample number, I am interested to understand what proportion of the assayed CpGs "fit" within the cut off stipulations of the DMR analysis - that is, are there potentially COPD effects at sparse CpG regions/individual CpG sites that are not being identified. A comment on this would be useful and seems the strength of profiling genome wide. I'm happy genome wide is beneficial, it just feels a little circular that the authors have chosen whole genome to avoid the bias of the Illumina array and a focus on promoters, but have primarily reported promoter DNAm. This caught my attention again in the discussion where the authors state that cis-regulatory regions were also identified in their fibroblast data ..... is this finding a factor of the analysis performed? (also a comparison of regions Id'ed in AT2 cells versus fibroblasts would be really interesting for a future paper)

      My expertise: Respiratory, cell biology, epigenetics.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      In this manuscript, Gruber et al perform serial EM sections of the antennal lobe and reconstruct the neurites innervating two types of glomeruli, one that is narrowly tuned to geosmin and one that is broadly tuned to other odours. They quantify and describe various aspects of the innervations of olfactory sensory neurons (OSNs), uniglomerular projection neurons (uPNs), and the multiglomerular local interneurons (LNs) and PNs (mPNs). They find that narrowly tuned glomeruli had stronger connectivity from OSNs to PNs and LNs, and considerably more connections between sister OSNs and sister PNs than the broadly tuned glomeruli. They also had less connectivity with the contralateral glomeruli. These observations are suggestive of strong feed-forward information flow with minimal presynaptic inhibition in narrowly tuned glomeruli, which might be ecologically relevant, for example, while making quick decisions such as avoiding a geosmin-laden landing site. In contrast, information flow in more broadly tuned glomeruli shows much more lateralisation of connectivity to the contralateral glomerulus, as well as to other ipsilateral glomeruli.

      The data are well presented, the manuscript clearly written, and the results will be useful to the olfaction community. I wonder, given the hemibrain and FAFB datasets exist, whether the authors have considered verifying whether the trends they observe in connectivity hold across three brains? Is it stereotypic? 

      We appreciate the reviewer’s positive view of our study and their thoughtful and relevant comment on the issue of individual variation. We agree that this is a very important question and note that it was also raised by the second Reviewer. It reflects both our limited understanding of the range of individual variation in synaptic connectivity (whether in flies, humans, or other species) and the challenge of determining which of the differences observed in our study are stereotypical features of each glomerulus type. Undoubtedly, this criticism addresses a crucial problem of practically all connectome studies so far, for which there is no immediate solution. This type of study requires so much time, effort and money that increasing the number of samples is seldom feasible. The Reviewer wonders if we could compare our data with those made available by two of the largest connectome studies of Drosophila. This appeared to us to be a very good idea and we tried to follow the advice but, unfortunately, it was impracticable for the reasons we explain below. The hemibrain data cannot be used for this purpose because it does not contain the full glomerulus DA2 (Schlegel et al., 2021). A different problem hindered us from using the FAFB dataset, the other dataset mentioned by the Reviewer. In this case the three glomeruli were sectioned and reconstructed, but the dataset lacks an annotated list of all synaptic connections corresponding to each glomerulus. Such an annotation (a compendium of all synaptic connections inside each glomerulus, stating for each connection which type of neuron provides the presynaptic site and which the postsynaptic site) is essential for direct comparison with our data. It is important to keep in mind that the current analytical tools available for the use of these datasets (e.g., NeuPrint, FlyWire and CATMAID) do not offer the ability to extract data on synapses exclusively from the glomerular volume of DA2 or DL5.
      In this case, it is certainly theoretically possible to obtain the data by doing the annotation ourselves. However, such a study would demand so much time, effort and financial resources that we believe it would not be justified solely to increase the number of individuals from one to two. Instead, our manuscript includes a comparison of the OSN connectivity in VA1v and DL5 using the hemibrain dataset published by Schlegel et al. (2021) (see revised manuscript: lines 311–315; 431–434; 558–562; 602–606).

      We fully share the Reviewer’s opinion that a comparison including three flies would be better than a comparison made with one glomerulus of each type. Even so, we would still be challenged by the question of which, if any, of the differences are stereotypic. Clarifying which differences between particular glomeruli, in features such as those discussed in our study, are stereotypical and which simply fall within the normal range of individual variation is basically a statistical problem. A first attempt at a comprehensive comparison focusing on intra- and inter-individual variability was recently made by comparing two connectome datasets from two different Drosophila individuals (Dorkenwald et al., 2024; Schlegel et al., 2024). At present, it is still unclear how many samples are needed to make a statistically robust comparison of olfactory synaptic circuits in adult flies: perhaps 3, 6, or even 18 individuals?

      Reviewer #2 (Public Review):

      The chemoreceptor proteins expressed by olfactory sensory neurons differ in their selectivity such that glomeruli vary in the breadth of volatile chemicals to which they respond. Prior work assessing the relationship between tuning breadth and the demographics of principal neuron types that innervate a glomerulus demonstrated that narrowly tuned glomeruli are innervated by more projection neurons (output neurons) and fewer local interneurons relative to more broadly tuned glomeruli. The present study used high-resolution electron microscopy to determine which synaptic relationships between principal cell types also vary with glomerulus tuning breadth using a narrowly tuned glomerulus (DA2) and a broadly tuned glomerulus (DL5). The strength of this study lies in the comprehensive, synapse-level resolution of the approach. Furthermore, the authors implement a very elegant approach of using a 2-photon microscope to score the upper and lower bounds of each glomerulus, thus defining the bounds of their restricted regions of interest. There were several interesting differences including greater axo-axonic afferent synapses and dendrodendritic output neuron synapses in the narrowly tuned glomerulus, and greater synapses upon sensory afferents from multiglomerular neurons and output neuron autapses in the broadly tuned glomerulus. The study is limited by a few factors. There was a technical need to group all local interneurons, centrifugal neurons, and multiglomerular projection neurons into one category ("multiglomerular neurons") which complicates any interpretations as even multiglomerular projection neurons are very diverse. Additionally, there were as many differences between the two narrowly tuned glomeruli as there were comparing the narrowly and broadly tuned glomeruli. Architecture differences may therefore not reflect differences in tuning breadth, but rather the ecological significance of the odors detected by cognate sensory afferents.
      Finally, some synaptic relationships are described as differing and others as being the same between glomeruli, but with only one sample from each glomerulus, it is difficult to determine whether measures truly differ, because there is no measure of inter-animal variability. If these caveats are kept in mind, this work reveals some very interesting potential differences in circuit architecture associated with glomerular tuning breadth.

      This work establishes specific hypotheses about network function within the olfactory system that can be pursued using targeted physiological approaches. It also identifies key traits that can be explored using other high-resolution EM datasets and other glomeruli that vary in their tuning selectivity. Finally, the laser "branding" technique used in this study establishes a reduced-cost procedure for obtaining smaller EM datasets from targeted volumes of interest by leveraging the ability to transgenically label brain regions in Drosophila.

      CLASSIFICATION OF NEURONAL TYPES

      We agree that grouping diverse types of interneurons into a single category (referred to as MGNs) limits the ability to make interpretations about synaptic similarities and differences between specific neuronal types. This was, however, an unavoidable compromise resulting from our decision to generate a comprehensive, synapse-level reconstruction of the restricted regions encompassing the DA2 and DL5 glomeruli. As both reviewers have noted, this approach offers significant value and we hope the Editor will also recognize that this limitation does not prevent readers from gaining important and novel insights into the synaptic circuitry of these two glomeruli.  

      Similar to the approach taken by Tobin et al. (2017), we prioritized producing a densely reconstructed neuropile, in which no synapses were omitted (Tobin et al., 2017). The downside of this method is that not all synaptic connections could be reliably assigned to specific neuronal types, with about 12% remaining unassigned. We anticipate that future research, supported by advances in semi-automated tracing methods, improved imaging technologies, and increased personnel resources, will allow not only for the generation of more complete connectomes of the entire brain (Scheffer et al., 2020; Zheng et al., 2018), but also for the accurate reconstruction and classification of individual synapses, even in highly complex regions such as the olfactory glomeruli. We also expect that a second complete connectome of a male Drosophila will soon become available, which will provide valuable opportunities for comparisons across individuals and between male and female brains in future studies.

      INTERGLOMERULAR DIFFERENCES

      Thank you for this insightful comment. It is indeed true that despite both DA2 and VA1v being narrowly tuned glomeruli, they exhibit considerable differences in specific connectivity features (e.g., relative synaptic strengths above certain thresholds) and that those differences can be as pronounced as those observed between DA2 and the broadly tuned DL5. For this reason, comparing each individual glomerulus to every other is not a practical or informative approach. To derive robust interpretations, we focused instead on whether two glomeruli that share a particular functional characteristic—namely, being narrowly tuned for single odorants—also share connectivity patterns that distinguish them from a broadly tuned reference glomerulus.

      Our results support this. Furthermore, additional connectomics data reinforce our conclusions.

      For example, OSN-OSN connectivity is stronger in the two narrowly tuned glomeruli (DA2 and VA1v) relative to the broadly tuned glomerulus (DL5). While these pairwise differences alone are not conclusive, the finding that the two narrowly tuned glomeruli studied here share features that distinguish them from the broadly tuned glomerulus supports our interpretation. We found further support for this idea in the data reported by Schlegel et al. (2021). In that dataset, other narrowly tuned glomeruli (DA1, DL3, and DL4) also exhibit stronger OSN-OSN connectivity than other broadly tuned glomeruli (DM1 or DM4).

      We do not deny that there are many differences between any given pair of glomeruli, regardless of whether they are narrowly or broadly tuned. Instead, we propose that our findings on circuit features indicate that most of the observed differences actually grouped the two narrowly tuned glomeruli together relative to the broadly tuned glomerulus. A more concise summary is now provided in the newly added Figure 8. We also added explanatory lines of text at the beginning of the chapter ‘Specific features of narrowly tuned glomerular circuits’.

      ECOLOGICAL SIGNIFICANCE

      This is an interesting point. However, it is difficult to disentangle the "ecological significance" of processed odorants from the "tuning breadth" of a glomerulus. In the Drosophila olfactory system, glomerular circuits that respond to ecologically important odorants—such as those involved in reproduction or danger—tend to be more narrowly tuned. Moreover, while we refer to odorants with specific ecological significance as those linked to survival or reproductive behaviors, defining the significance of an odorant with precision is inherently challenging, as it can vary depending on context and environmental conditions.

      What both circuits share is their narrow tuning breadth. We therefore propose that the common circuit features of VA1v and DA2, highlighted in this study, are functionally related to the fact that each circuit processes single odorants. Consequently, their specificity is most likely determined at the level of the receptor. 

      INDIVIDUAL VARIABILITY

      We agree that accounting for inter-animal variability would strengthen the study. However, we are confident that even a modest statistically sound assessment of this variability would require a larger sample size, certainly more than just two or three flies, which is presently not feasible.

      We refer the reviewer to our response to Reviewer #1 regarding this important issue.

      Initial insights into variability between flies have been provided through comparative analyses of the two most comprehensive female Drosophila melanogaster connectomes, the FAFB and hemibrain datasets (Schlegel et al., 2024). For more detailed quantitative comparisons regarding inter-animal variability, please refer to our response to the second major point raised by Reviewer #2. As highlighted by Schlegel et al. (2024), making definitive statements about the stereotypy of neuron numbers, unitary cell-cell connections (edges), or synaptic strengths (weights) remains a complex challenge.

      While appreciating the rigour of this work we were surprised to notice the omission of a comparison of their observations with the two other existing datasets. This would not only have addressed the technical limitation of this particular study - the inability to identify specific neuron types due to imaging a small part of the brain - but would also have shed light on inter-animal variability 

      We strongly recommend that the authors do make this comparison - the datasets are currently extremely user friendly and so we don't estimate the replication of their key findings will be too onerous. This will be particularly important to resolve the issue of having to classify all multiglomerular local interneurons and multiglomerular projection neurons broadly into "MGN". Such a comparison will dramatically strengthen this study that poses very interesting questions, but in its current form, has this striking shortcoming.

      INDIVIDUAL VARIABILITY AS EXPRESSED HERE:

      Earlier on we were of the same opinion that the Reviewer expresses here but, unfortunately, it was not possible to follow this advice. As far as it was possible, we have compared some of our results to the values of the two datasets that the Reviewer refers to, but the absence of glomerulus DA2 in one of the datasets and the absence of synapse annotation for all the relevant glomeruli in the other dataset prevented us from making a full comparison. Moreover, we believe that the problem of individual variation most probably cannot be solved by extending the comparison to one or two more flies.

      Reviewer #1 (Recommendations for The Authors): 

      The lines 270 - 282 confused me in the backdrop of Figure 3B. 

      The concern may stem from our inclusion of a comparison between the uPNs of glomerulus DA2 and the single uPN of glomerulus DL5 in the statistical analysis presented in Figure 3. This comparison was included to ensure a comprehensive representation of the data, highlighting the variability across all major cell groups. We have clarified this rationale in the revised manuscript (see lines 274-282).

      Reviewer #2 (Recommendations for The Authors): 

      I commend the authors for taking such a thorough approach to advance an interesting topic in olfaction. The following suggestions are intended to strengthen this study: 

      Major points: 

      A color-blind-friendly palette should be used for all figures. Currently, five of seven figures use red and green, and in particular, Figure 5 will be uninterpretable for red/green color-blind readers. 

      We are thankful for this important comment. We changed the color palette as suggested by the reviewer, and replaced Red with Magenta and changed the figure legend accordingly.

      This level of analysis is extremely resource and time-consuming, so even obtaining this information at this resolution is an impressive achievement. However, this study would be well served by strategically supplementing the analysis of this dataset with information from other publicly available connectomics datasets. For instance, some interpretations are limited because there is information from only a single DL5 and DA2 glomerulus. Any claims in which one glomerulus has more, less, or the same of a metric must be tempered because without replicates, there are no measures of inter-animal variability. As an example, on lines 386-387 the authors state "The relative synaptic strength between MGN>uPN was stronger in DA2 (12%) than DL5 (10%)". It is difficult to assess whether this represents a difference that is outside of the range of inter-animal variability inherent to the olfactory system. Taking select measures from the Hemibrain and FAFB (via FlyWire) datasets could help strengthen these claims. 

      We fully agree with the Reviewer’s opinion that since our data is from one glomerulus of each type “It is difficult to assess whether this represents a difference that is outside of the range of inter-animal variability inherent to the olfactory system.” This is a weakness of practically all connectome studies based on electron microscopy in both Drosophila and other animals. We cannot be sure that measurements from the Hemibrain and FAFB datasets could help strengthen our claims, because the magnitude of the range of individual variation is presently not known and most probably solving this problem will require more than one or two more flies. In any case, it is not possible to follow this advice and compare our data with that of the hemibrain because the DA2 was not included in that study. We ask the Reviewer to read our more detailed explanation in our response to Reviewer 1.

      In the particular case commented by the Reviewer above, the relative difference in synaptic strength exceeds 20%. Whether such a difference has functional relevance remains an open question but Schlegel et al. (2024) support our interpretation. They showed that synaptic weights with differences larger than 20% tend to be consistent across individuals, with strong correlations within and between animals (Pearson’s R = 0.97 and R = 0.8; Fig. 4).
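
      For readers who want to check the arithmetic, the short sketch below distinguishes the absolute difference (in percentage points) from the relative difference, using only the rounded values quoted by the reviewer (12% for DA2, 10% for DL5). The function name and the choice of DL5 as the reference are assumptions made for the example. The unrounded values behind the statement above are not given here, so the result from rounded inputs is only approximate.

```python
def relative_difference(value, reference):
    """Return (value - reference) / reference."""
    return (value - reference) / reference


da2 = 0.12  # MGN>uPN relative synaptic strength in DA2 (rounded, as quoted)
dl5 = 0.10  # MGN>uPN relative synaptic strength in DL5 (rounded, as quoted)

absolute_points = (da2 - dl5) * 100               # difference in percentage points
relative_to_dl5 = relative_difference(da2, dl5)   # difference relative to DL5

print(f"absolute: {absolute_points:.0f} percentage points")
print(f"relative to DL5: {relative_to_dl5:.0%}")
```

      With these rounded inputs the absolute difference is 2 percentage points and the relative difference is about 20%.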

      Grouping all local interneurons, centrifugal neurons, and multiglomerular PNs into one category limits the ability to make interpretations about similarities or differences in the synaptic relationships involving MGNs. The authors could get an estimate of the number of multiglomerular PNs in DL5, VA1v, and DA2 from Hemibrain and FlyWire platforms to get a better sense of differences between glomeruli in the MGN category.

      We agree that grouping a variety of interneurons into a single category (called MGNs) limits the ability to make interpretations about similarities or differences in the synaptic relationships involving different neurons. This was the unavoidable price to be paid once we decided to register a “comprehensive, synapse-level resolution” map of these two glomeruli. It appears to us that both reviewers have clearly recognized the intrinsic value of this approach and we hope that the Editor will share this opinion.

      Consistent with the assumptions of Tobin et al. (2017), our hypothesis on LN connectivity differences is based on the fact that they are the most numerous and broadly arborizing neurons of the class that we call multiglomerular neurons in the AL (Chou et al., 2010; Lin et al., 2012; Tanaka et al., 2012). Recent connectome studies confirm this feature across all glomeruli (Bates et al., 2020; Horne et al., 2018; Scheffer et al., 2020; Schlegel et al., 2021; Zheng et al., 2018).

      In response to the reviewer’s question, we conducted a case-specific reanalysis of the data from Horne et al. (2018), which provides comprehensive connectivity information for the VA1v glomerulus. This allowed us to quantify the proportional contributions of LNs (n = 56) and mPNs (n = 13) to all MGN connections (MGN-MGN, MGN>OSN, MGN>uPN, uPN>MGN, OSN>MGN).

      Our analysis showed that 84% of MGN output originates from LNs. Of the input to MGNs, 57% comes from LNs and 43% from mPNs, largely due to strong OSN>mPN input. Thus, for the filtered MGN connections relevant to distinguishing narrowly from broadly tuned circuits (e.g., MGN>OSN, uPN>MGN; see Fig. 8), LNs are the dominant contributors in VA1v. (These data are not included in the resubmitted manuscript.) This supports our interpretation that LNs are responsible for the majority of MGN connections underlying the observed differences between glomeruli.

      For instance, prior work has reported fewer local interneurons innervating DA2, but in this study there was an unexpected result that there was greater MGN innervation density and synapse # for DA2 relative to DL5. This discrepancy could be due to differences in the number of multiglomerular PNs innervating each glomerulus, which would be obscured when these PNs are combined with local interneurons in the MGN category.

      We agree that the greater MGN innervation density in DA2 in our study could reflect a stronger contribution from mPNs. However, innervation density alone does not indicate how many mPNs actually innervate DA2 or DL5. Alternatively, increased innervation and/or synaptic frequency of local interneurons (LNs) could also account for this observation. In our view, neuron number does not necessarily correlate with branching complexity or synaptic density.

      For example, the dendritic length of the single uPN in glomerulus DL5 is approximately equal to the combined dendritic length of the multiple uPNs of DA2. Similarly, Tobin et al. (2017) reported that when comparing uPNs in glomerulus DM6 between the left and right brain hemispheres, they found variability in cell number but not in dendritic length. More recently, the FAFB and hemibrain datasets showed a similar pattern in another neuronal type. A substantial variation in cell number was observed for Kenyon cells between the two Drosophila individuals, but in both individuals this cell type consistently makes and receives similar numbers of presynapses and postsynapses (Schlegel et al., 2024).

      On line 33 the authors cannot claim that DA2-OSNs experience less presynaptic inhibition based on the data in this study. Even without the limitations of the MGN category (described above), presynaptic inhibition depends on more than just the number of synapses, rather it is affected by GABA B receptor expression levels and the second messenger components downstream of this receptor. Physiological experiments are needed to justify this claim, so I recommend adjusting accordingly.

      We agree with the Reviewer and have adjusted the text on line 33 and in the main body of the text by referring to this finding as “presynaptic input”, which is what we have quantified, instead of “less presynaptic inhibition”.

      Figures 5 and 6 seek to distill the wealth of information from this study into broad take-home points for the reader, while still providing a good amount of detail. I think a final more concise graphic summary (similar to the graphical abstract or Figure 6 of Grabe et al 2016) depicting the most critical differences between glomeruli would further clarify the broad findings of this study.

      We appreciate this comment and we have added a “graphic summary” as the Reviewer proposed. We made a new figure that becomes Figure 8 and summarizes our results and highlights differences between narrowly and broadly tuned glomeruli in a more concise graphical abstract format.

      Minor points: 

      Much of the manuscript provides details about synapse fractions or % synapses for a given synaptic relationship. Please ensure that it is clear which principal cell types are being described, as it can be easy to get lost.

      Should line 284 say "...than DL5 as it has been reported that DA2 is innervated by fewer LNs..."?

      We appreciate the reviewer’s comment and we have corrected this sentence (see text beginning at line 290).

      Taisz et al. has been published, so the citation should be updated.

      We have updated the corresponding citation.  

      On line 233, the authors ascribe the small electron-dense vesicles as likely housing sNPF released by MGNs. However, Carlsson et al. (2010) demonstrated that sNPF is released by OSNs, which was further functionally characterized by Root et al. (2011) and Ko et al. (2014). In terms of MGNs that release neuropeptides, Carlsson et al. 2010 demonstrated that local interneurons immunolabel for tachykinin, myoinhibitory peptide, and allatostatin-A, while two extrinsic neurons release SIFamide. In theory, aminergic neurons could also have small electron-dense vesicles, but this can be variable. 

      The Reviewer is completely right in his criticism. The MGNs certainly include neurons that have been reported to contain neuropeptides other than sNPF. We have corrected this sentence and it now reads as follows (page 7, line 236): “Interestingly, besides the abundant clear small vesicles...”

      On line 636, the Berck and Schlegel studies demonstrated that panglomerular local interneurons synapse upon OSN, but not that they induce presynaptic inhibition (which was demonstrated in the studies cited in the next sentence). I recommend adjusting this sentence.

      We agree and we have corrected the text following the Reviewer’s advice. It now reads as follows (page 19, line 663): “We also observed that OSNs received less MGN feedback.”

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Recommendation For the Authors):

      Thanks to the authors for addressing my suggestions. I think these modifications have improved the clarity of the data and the overall presentation of the manuscript. The methods are now more clearly explained, and the additional details help make the results easier to interpret. Where addressing the comment wasn't feasible, the authors gave reasonable explanations. Overall, the revisions strengthen the paper, and I have no further concerns.

      Thank you for your recommendations, which have significantly improved our paper.

      Reviewer #2 (Recommendation For the Authors):

      The additional work conducted by the authors is greatly appreciated. All concerns (and beyond) have been thoroughly addressed by the authors and I am thankful for their consideration and attention to detail. Only one possible issue with the revisions is described below for consideration:

      Regarding the CFU counts and/or axis labels in Figure S3B, some of the listed "CFU per 1 mL" values (in both the figure itself and File S2B) are extraordinarily high. For example, the greatest CFU for PA14 observed in Figure 4E is ~1x10^9. However, PA14 at 0 μg/mL Ceftazidime reaches nearly 1x10^16 in Figure S3B. From what I can tell, this should be beyond the capacity of bacteria in this space by several orders of magnitude. (E.g., a cubic centimeter [~1 mL] is ~1x10^12 cubic micrometers. At their smallest dimensions and volume, a maximum of ~1x10^13 cells could theoretically fit in this space assuming no liquid and perfect organization.) Similarly, both "AMM" and "AMM (+PA14)" consistently reach CFUs between 1x10^12 and 1x10^14 in this assay. Are the authors confident in the values and/or depiction of CFUs for this figure? It seems like this could be a labeling or dilution-counting issue.
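
      The reviewer's back-of-the-envelope bound can be sketched in a few lines. This is only an illustration and not part of the manuscript: the per-cell volume of 0.1 cubic micrometers is an assumption inferred from the reviewer's own figures (~1x10^12 cubic micrometers per mL and ~1x10^13 cells), and the function names are hypothetical.

```python
# Rough packing bound for bacterial counts in 1 mL (illustrative only).
UM_PER_CM = 1e4               # 1 cm = 10,000 micrometers
UM3_PER_ML = UM_PER_CM ** 3   # 1 mL = 1 cm^3 = 1e12 cubic micrometers


def max_cells_per_ml(cell_volume_um3: float = 0.1) -> float:
    """Upper bound on cells per mL, assuming no liquid and perfect packing.

    cell_volume_um3 is an assumed minimal cell volume (hypothetical default).
    """
    if cell_volume_um3 <= 0:
        raise ValueError("cell volume must be positive")
    return UM3_PER_ML / cell_volume_um3


def exceeds_packing_bound(cfu_per_ml: float, cell_volume_um3: float = 0.1) -> bool:
    """Return True if a reported CFU/mL value is above the packing bound."""
    return cfu_per_ml > max_cells_per_ml(cell_volume_um3)


if __name__ == "__main__":
    # Approximate values quoted in the reviewer's comment.
    for label, cfu in [("Figure 4E maximum", 1e9), ("Figure S3B maximum", 1e16)]:
        print(label, cfu, exceeds_packing_bound(cfu))
```

      Under these assumptions, the Figure 4E value sits far below the bound while the Figure S3B value lies above it, which is the discrepancy the reviewer raises.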

      Thank you for your positive remarks on our revised manuscript and for your constructive comments that have strengthened our work.

      We agree with the concern regarding the CFU counts in Figure S3B. The very high values (>10<sup>12</sup> CFU) reflect a technical enumeration artifact that, due to the nature of the assay, cannot be fully avoided. The origin of these inflated counts is described in more detail below:

      Following competition assays between Pseudomonas aeruginosa and Stenotrophomonas maltophilia in liquid culture with antibiotics, we enumerate survivors for each species by colony forming unit (CFU) counts. Because two different bacterial species must be quantified from mixed cultures, we use a gentamicin resistance marker carried by one species at a time.

      Each condition is therefore enumerated twice, as we alternate which species harbors the gentamicin cassette.

      During coculture in antibiotics and minimal medium, clinical isolates of P. aeruginosa and S. maltophilia, like those used here, can transiently increase their tolerance to antibiotics, including aminoglycosides. This reduces the effectiveness of gentamicin selection at the plating step necessary for CFU enumeration. For the data presented in Figure S3B, in a subset of high-OD<sub>600</sub> conditions in the competition assay, this tolerance produces, during the CFU enumeration step, artificially inflated CFU values that exceed the biological carrying capacity.

      We evaluated alternative enumeration strategies (e.g., fluorescent protein markers with a nonselective medium), but these proved unsuitable for these strains due to differences in growth rates and media compatibility, introducing other large biases. Given these constraints, selective plating remains the only feasible approach for this work, and the associated artifact cannot be eliminated entirely.

      Importantly, transient resistance (tolerance), although common, is not a universal occurrence (e.g., we did not observe it when we performed the experiments shown in Figure 4E). When it does arise, it occurs reproducibly under the same experimental high-OD<sub>600</sub> conditions and does not obscure any of the relative comparisons that underpin our conclusions.

      For transparency, we have retained the measured values in Figure S3B and we note in the legend that counts above ~10<sup>12</sup> CFU represent a technical overestimation due to transient gentamicin tolerance. Counts below 10<sup>12</sup> CFU are accurately enumerated.

      Reviewer #3 (Recommendation For the Authors):

      All concerns have been satisfied and the manuscript is ready for publishing.

      Thank you for your recommendations, which have significantly improved our paper.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The study would benefit from presenting raw data in some cases, such as MIC values and SDS-PAGE gels, by clarifying the number of independent experiments used, as well as further clarification on statistical significance for some of the data.

      All original data used to generate Fig. 1, Fig. 4E, Fig. S3 and Fig. S4A are presented in File S2. Tab (A) is dedicated to data used for Fig. 1 and Fig. S4A, while tabs (B) and (C) show the data used for Fig. 4E and S3, respectively. This information is indicated in the legends of the relevant figures.

      All experiments in this study were performed in three independent (biological) experiments (with the exception of the complementation data shown in Fig. S1 and Fig. S5, which were performed in two independent (biological) experiments). The number of biological and technical replicates for each experiment is stated in the figure legends, as well as in the “Statistical analysis of experimental data” part of the “Materials and Methods” section of the paper. Specifically, for antibiotic MIC assays we have not performed statistical analyses as per recommended practice. The reason for this is stated in the following section from the “Statistical analysis of experimental data” part of the “Materials and Methods” section of the paper (lines 699-711 of the revised manuscript):

      “Antibiotic MIC values were determined in biological triplicate, except for MIC values recorded for dsbA complementation experiments in our E. coli K-12 inducible system that were carried out in duplicate. All ETEST MICs were determined as a single technical replicate, and all BMD MICs were determined in technical triplicate. All recorded MIC values are displayed in the relevant graphs; for MIC assays where three or more biological experiments were performed, the bars indicate the median value, while for assays where two biological experiments were performed the bars indicate the most conservative of the two values (i.e., for increasing trends, the value representing the smallest increase and for decreasing trends, the value representing the smallest decrease). We note that in line with recommended practice, our MIC results were not averaged. This should be avoided because of the quantized nature of MIC assays, which only inform on bacterial survival for specific antibiotic concentrations and do not provide information for antibiotic concentrations that lie in-between the tested values.”
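
      The reporting rule quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the `baseline` parameter (the MIC of the comparison strain) and the function name are hypothetical, and `median_low` is used here so that the reported number is always an observed MIC rather than a mean of two values.

```python
import math
from statistics import median_low


def summarize_mic(values, baseline):
    """Pick the single MIC value to display for a set of experiments.

    values: MICs from independent biological experiments (same units).
    baseline: MIC of the comparison strain (hypothetical parameter), used
        only to decide which of two values is the more conservative one.
    """
    if not values:
        raise ValueError("need at least one MIC value")
    if len(values) >= 3:
        # median_low returns an observed value, so nothing is averaged.
        return median_low(values)
    # One or two experiments: report the value closest to the baseline on a
    # log2 scale, i.e. the smallest increase or the smallest decrease.
    return min(values, key=lambda v: abs(math.log2(v / baseline)))
```

      For example, with made-up duplicate MICs of 2 and 8 against a baseline of 32, `summarize_mic([2, 8], baseline=32)` returns 8, the smaller of the two decreases.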

      Reviewer #2 (Public review):

      While Figure 5E demonstrates a protective effect of DsbA-dependent β-lactamase, the omission of CFU data for S. maltophilia makes it difficult to assess the applicability of the polymicrobial strategy. Since S. maltophilia is pre-cultured prior to the addition of P. aeruginosa and antibiotics, it is unclear whether the protective effect is dependent on high S. maltophilia CFU. It is also unclear what the fate of the S. maltophilia dsbA dsbL mutant is under these conditions. If DsbA-deficient S. maltophilia CFU is not impacted, then this treatment will result in the eradication of only one of the pathogens of interest. If the mutant is lost during treatment, then it is not clear whether the loss of protection is due specifically to the production of non-functional β-lactamase or simply the absence of S. maltophilia.

      We have simultaneously tracked the abundance of P. aeruginosa and S. maltophilia strains in our cross-protection experiment for select antibiotic concentrations. To be able to perform this experiment, we had to label two extremely-drug-resistant strains of S. maltophilia with an antibiotic resistance marker that allowed us to quantify them in mixtures with P. aeruginosa. Our results can be found in Fig. S3 of our revised manuscript and, in a nutshell, show that ceftazidime treatment leads to eradication of both P. aeruginosa and S. maltophilia when disulfide bond formation is impaired in S. maltophilia.

      The following text was added to address the questions of the reviewer:

      “Due to the naturally different growth rates of these two species (S. maltophilia grows much slower than P. aeruginosa) especially in laboratory conditions, the protocol we followed [1] requires S. maltophilia to be grown for 6 hours prior to co-culturing it with P. aeruginosa. To ensure that at this point in the experiment our two S. maltophilia strains, with and without dsbA, had grown comparatively to each other, we determined their cell densities (Fig. S3A). We found that S. maltophilia AMM dsbA dsbL had grown at a similar level as the wild-type strain, and both were at a higher cell density [~10<sup>7</sup> colony forming units (CFUs)] compared to the P. aeruginosa PA14 inoculum (5 x 10<sup>4</sup> CFUs)” (lines 353-361 of the revised manuscript).

      “To ensure that ceftazidime treatment leads to eradication of both P. aeruginosa and S. maltophilia when disulfide bond formation is impaired in S. maltophilia, we monitored the abundance of both strains in each synthetic community for select antibiotic concentrations (Fig. S3B). In this experiment we largely observed the same trends as in Fig. 4E. At low antibiotic concentrations, for example 4 μg/mL of ceftazidime, S. maltophilia AMM is fully resistant and thrives, thus outcompeting P. aeruginosa PA14 (dark pink and dark blue bars in Fig. S3B). The same can also be seen in Fig. 4E, whereby decreased P. aeruginosa PA14 CFUs are recorded. By contrast S. maltophilia AMM dsbA dsbL already displays decreased growth at 4 μg/mL of ceftazidime because of its non-functional L1-1 enzyme, allowing comparatively higher growth of P. aeruginosa (light pink and light blue bars in Fig. S3B). Despite the competition between the two strains, P. aeruginosa PA14 benefits from S. maltophilia AMM’s high hydrolytic activity against ceftazidime, which allows it to survive and grow in high antibiotic concentrations even though it is not resistant (see 128 μg/mL; dark pink and dark blue bars in Fig. S3B). In stark opposition, without its disulfide bond in S. maltophilia AMM dsbA dsbL, L1-1 cannot confer resistance to ceftazidime, resulting in killing of S. maltophilia AMM dsbA dsbL and, consequently, also of P. aeruginosa PA14 (see 128 μg/mL; light pink and light blue bars in Fig. S3B).

      The data presented here show that, at least under laboratory conditions, targeting protein homeostasis pathways in specific recalcitrant pathogens has the potential to not only alter their own antibiotic resistance profiles (Fig. 3 and 4A-D), but also to influence the antibiotic susceptibility profiles of other bacteria that co-occur in the same conditions (Fig. 5). Admittedly, the conditions in a living host are too complex to draw direct conclusions from this experiment. That said, our results show promise for infections, where pathogen interactions affect treatment outcomes, and whereby their inhibition might facilitate treatment” (lines 381-406 of the revised manuscript).

      The alleged clinical relevance and immediate, theoretical application of this approach should be properly contextualized. At multiple junctures, the authors state or suggest that interactions between S. maltophilia and P. aeruginosa are known to occur in disease or have known clinical relevance related to treatment failure and disease states. For instance, the citations provided for S. maltophilia protection of P. aeruginosa in the CF lung environment both describe simplified laboratory experiments rather than clinical or in vivo observations. Similarly, the citations provided for both the role of S. maltophilia in treatment failure and CF disease severity do not support either claim. The role of S. maltophilia in CF is currently unsettled, with more recent work reporting conflicting results that support S. maltophilia as a marker, rather than cause, of severe disease. These citations also do not support the suggestion that S. maltophilia specifically contributes to treatment failure. While it is reasonable to pursue these ideas as a hypothesis or potential concern, there is no evidence provided that these specific interactions occur in vivo or that they have clinical relevance.

      Thank you for your comment. You are entirely correct. We have amended the text throughout our revised manuscript to avoid overstating the role of S. maltophilia in CF infections and to reference additional relevant works in the literature. Please find below representative examples of such passages:

      “On the other hand, CF microbiomes are increasingly found to encompass S. maltophilia [2-4], a globally distributed opportunistic pathogen that causes serious nosocomial respiratory and bloodstream infections [5-7]. S. maltophilia is one of the most prevalent emerging pathogens [6] and it is intrinsically resistant to almost all antibiotics, including β-lactams like penicillins, cephalosporins and carbapenems, as well as macrolides, fluoroquinolones, aminoglycosides, chloramphenicol, tetracyclines and colistin. As a result, the standard treatment option for lung infections, i.e., broad-spectrum β-lactam antibiotic therapy, is rarely successful in countering S. maltophilia [7,8], creating a definitive need for approaches that will be effective in eliminating both pathogens” (lines 33-41 of the revised manuscript).

      “Of the organisms studied in this work, S. maltophilia deserves further discussion because of its unique intrinsic resistance profile. The prognosis of CF patients with S. maltophilia lung carriage is still debated [4,9-16], largely because studies with extensive and well-controlled patient cohorts are lacking. This notwithstanding, the therapeutic options against this pathogen are currently limited to one non-β-lactam antibiotic-adjuvant combination, trimethoprim-sulfamethoxazole [17-20], which is not always effective, and a few last-line β-lactam drugs, like the fifth-generation cephalosporin cefiderocol and the combination aztreonam-avibactam. Resistance to commonly used antibiotics causes many problems during treatment and, as a result, infections that harbor S. maltophilia have high case fatality rates [7]. This is not limited to CF patients, as S. maltophilia is a major cause of death in children with bacteremia [5]” (lines 440-450 of the revised manuscript).

      Reviewer #3 (Public review):

      The impact of the work can be strengthened by demonstrating increased efficacy of antibiotics in mouse models or wound models for Pseudomonas infections. Worm models are relevant, but still distant from investigations in animal models.

      Thank you for this comment. We appreciate the sentiment, and we would have liked to be able to perform experiments in a murine model of infection. There are several reasons that made this not possible, and as a result we used G. mellonella as an informative preliminary in vivo infection model. The DSB proteins have been shown to play a central role in bacterial virulence. Because of this, our P. aeruginosa and S. maltophilia mutant strains are not efficient in establishing an infection, even in a wound model. This could be overcome had we been able to use the chemical inhibitor of the DSB system in vivo; however, this is also not possible. This is because the chemical compound that we use to inhibit the function of DsbA acts on DsbB. Inhibition of DsbB blocks the re-oxidation of DsbA and leads to its accumulation in its inactive reduced form. However, the action of the inhibitor can be bypassed through re-oxidation and re-activation of DsbA by small-molecule oxidants such as L-cystine, which are abundant in rich growth media or animal tissues. This makes the inhibitor only suitable for in vitro assays that can be performed in minimal media, where the presence of small-molecule oxidants can be strictly avoided, but entirely unsuitable for an insect or a vertebrate animal model.

      Reviewer #1 (Recommendation For the Authors):

      (1) The analysis of the role of DsbA in the assembly of cysteine-containing β-lactamases is a significant finding. However, in addition to showing the MIC fold difference, I think, it would be important to show the raw data for the actual MIC values obtained for each β-lactamase enzyme/antibiotic combination and in both strains (+ and - dsbA).

      Also, can the authors clarify whether these experiments were conducted on 3 independent samples (there seems to be some contradicting information in the paper and the supplementary figures). If possible, I would also recommend showing in the figure whether the MIC differences observed were statistically significant.

      All original data used to generate Fig. 1, Fig. 4E, Fig. S3 and Fig. S4A are presented in File S2. Tab (A) is dedicated to data used for Fig. 1 and Fig. S4A, while tabs (B) and (C) show the data used for Fig. 4E and S3, respectively. This information is indicated in the legends of the relevant figures.

      All experiments in this study were performed in three independent (biological) experiments (with the exception of the complementation data shown in Fig. S1 and Fig. S5, which were performed in two independent (biological) experiments). The number of biological and technical replicates for each experiment is stated in the figure legends, as well as in the “Statistical analysis of experimental data” part of the “Materials and Methods” section of the paper. Specifically, for antibiotic MIC assays we have not performed statistical analyses as per recommended practice. The reason for this is stated in the following section from the “Statistical analysis of experimental data” part of the “Materials and Methods” section of the paper (lines 699-711 of the revised manuscript):

      “Antibiotic MIC values were determined in biological triplicate, except for MIC values recorded for dsbA complementation experiments in our E. coli K-12 inducible system that were carried out in duplicate. All ETEST MICs were determined as a single technical replicate, and all BMD MICs were determined in technical triplicate. All recorded MIC values are displayed in the relevant graphs; for MIC assays where three or more biological experiments were performed, the bars indicate the median value, while for assays where two biological experiments were performed the bars indicate the most conservative of the two values (i.e., for increasing trends, the value representing the smallest increase and for decreasing trends, the value representing the smallest decrease). We note that in line with recommended practice, our MIC results were not averaged. This should be avoided because of the quantized nature of MIC assays, which only inform on bacterial survival for specific antibiotic concentrations and do not provide information for antibiotic concentrations that lie in-between the tested values.”

      (2) For Figure 2A, can the authors provide the full Westerns and ideally the SDS-PAGE gel corresponding to the Westerns where the β-lactamases and the control DNA-K were detected?

      Thank you for this comment. Full immunoblots and SDS PAGE analysis of the immunoblot samples for total protein content are shown in File S3 of our revised manuscript.

      (3) For the enzymatic assays, was the concentration of enzyme used "normalised" based on the amount detected in the Westerns where possible, or was only the total amount of protein considered? When similar amounts of enzyme were added, was the activity still compromised?

      The β-lactam hydrolysis assay was normalized based on the weight of the cell pellets (wet cell pellet mass) of the tested strains. This means that for each enzyme expressed in cells with and without DsbA, strains were normalized to the same weight-to-volume ratio, and thus strains expressing the same enzyme were only compared to each other.

      Because enzyme degradation in the absence of DsbA is a key factor underlying the effects we describe for most of the tested β-lactamases (see Fig. 2A and S4A; no protein band is detected for 5 of the 7 enzymes in the dsbA mutant), it was not possible to normalize our samples based on enzyme levels detected by immunoblot. Normalization based on enzyme amounts would be feasible had we purified each β-lactamase after expression in the two different strain backgrounds (+/- dsbA) assuming sufficient protein amounts could be isolated from the dsbA mutant strain. Nonetheless, we feel that such a comparison would be misleading, since enzyme degradation likely plays the biggest role in the lack of activity observed for most of these enzymes in the absence of DsbA.

      (4) Not sure whether Fig 3 is very informative. Perhaps it could be redesigned to better encapsulate the findings in this manuscript (combine Figures 3 and 6 into one). I would also include the chemical structure of the inhibitors used and perhaps include how they block the system by binding to DsbB.

      Thank you for this comment. Fig. 3 was combined with Fig. 6 of the submitted manuscript. The new model figure is Fig. 5 in our revised manuscript.

      The inhibitor compound used in our study has been extensively characterized in a previous publication [21]. Considering that this inhibitor is not the main focus of our paper, we have avoided showing its chemical structure in any of the main display items. That said, its structure can be found in File S5 of our revised manuscript, which contains the quality control information on this compound. As suggested, we included the following sentence to describe the mode of action of this inhibitor: “Compound 36 was previously shown to inhibit disulfide bond formation in P. aeruginosa via covalently binding onto one of the four essential cysteine residues of DsbB in the DsbA-DsbB complex [21]” (lines 309-311 of the revised manuscript).

      (5) Figure 4: Similar to my comment above showing in the figure whether the differences observed in Figure 4, particularly A-C, are statistically significant (i.e. galleria survival difference in the presence and absence of dsbA) would be beneficial.

      As mentioned in our answer to comment 1 above, we have not performed statistical analyses for antibiotic MIC assays because, in line with recommended practice, our MIC results were not averaged (Fig. 3A,B,D,E of our revised manuscript). This should be avoided because of the quantized nature of MIC assays, which only inform on bacterial survival for specific antibiotic concentrations and do not provide information for antibiotic concentrations that lie in-between the tested values. Statistical analysis of G. mellonella survival data (Fig. 3C,F of our revised manuscript) was performed and is described fully in the legend of Fig. 3, as well as in the “Statistical analysis of experimental data” part of the “Materials and Methods” section of the paper (lines 729-738 of the revised manuscript). Finally, the statistical analyses for the most important comparisons in panels (C) and (F) of Fig. 3 are also marked directly on the figure.

      (6) Were the authors able to test the redox state of DsbA upon addition of the DsbB inhibitor to further demonstrate that the effects observed were indeed due to the obstruction of the Dsb machinery and not due to off-target effects?

      Thank you for the opportunity to clarify this. In previous work from our lab, we have used a DSB system inhibitor termed “compound 12” in [22] with activity against DsbB proteins from Enterobacteria. In our previous study [23] we, indeed, tested the redox state of DsbA in the presence of this inhibitor compound. We could not perform the same experiment here with “compound 36” from [21], because we do not have an antibody against the DsbA protein of S. maltophilia. That said, we have carried out experiments that confirm that our results are due to specific inhibition of the DSB system and not because of off-target effects. In particular, we show that the gentamicin MIC values of S. maltophilia AMM remain unchanged in the presence of the inhibitor and that treatment of S. maltophilia AMM dsbA dsbL with the compound does not affect its colistin MIC value (Fig. S2E and lines 317-320 of the revised manuscript).

      (7) Given the remarkable effects shown by the DsbB inhibitor, did the authors use this compound to assess whether inhibition of the Dsb system with small molecules would block cross-resistance in S. maltophilia - P. aeruginosa mixed communities (Fig 5D)?

      Unfortunately, this was not possible. The decrease in the ceftazidime MIC value of S. maltophilia AMM in the presence of the DSB inhibitor compound is more modest than the effects we observed when the dsbA dsbL mutant is used (compare Fig. 4D (left) with Fig. 4A of the revised manuscript). This means that in the presence of the DSB inhibitor there are still sufficient amounts of functional β-lactamase present, and we expect that they would contribute to cross-protection of P. aeruginosa. While the use of the DSB inhibitor does have a drastic impact on the colistin resistance profile of S. maltophilia AMM (Fig. 4D of the revised manuscript), unlike β-lactamases, which act as common goods, MCR enzymes act solely on the lipopolysaccharide of their producer and do not contribute to bacterial interactions, precluding the use of colistin for a cross-protection experiment.

      Reviewer #2 (Recommendation For the Authors):

      (1) The acronym used for synthetic cystic fibrosis sputum medium (lines 523, 531, 535, 601, and 603) is defined in the manuscript as 'SCF', but the common formulation is 'SCFM', including in the provided citation. Suggest changing to SCFM for consistency.

      Thank you for this comment. This has been amended throughout our revised manuscript.

      (2) In Figure 1, while the legend states that "No changes in MIC values are observed for strains harboring the empty vector control (pDM1)[...]" (lines 729-30), the median of ceftazidime in the pDM1 control appears to indicate a 2-fold decrease in MIC. This would not seem to significantly impact the other results since the MIC decreases observed for other conditions are all 3-fold or greater, but this should be addressed and/or explained in the text.

      You are correct. Thank you for the opportunity to clarify this. Since MIC assays have a degree of variability, we have only followed decreases in MIC values that are greater than 2-fold. For most of our controls, the recorded MIC fold changes are below 2-fold. The only exception to this is the ceftazidime MIC drop of the empty-vector control, which shows a 2-fold change that we do not consider significant.

      To ensure that this is clear in our text and figure legends the following changes were made:

      The clause “only differences larger than 2-fold were considered” was added to the text (lines 110-111 of the revised manuscript).

      We amended the legend of Fig. 1 accordingly: “No changes in MIC values are observed for the aminoglycoside antibiotic gentamicin (white bars) confirming that absence of DsbA does not compromise the general ability of this strain to resist antibiotic stress. Minor changes in MIC values (≤ 2-fold) are observed for strains harboring the empty vector control (pDM1) or those expressing the class A β-lactamases L2-1 and LUT-1, which contain two or more cysteines (Table S1), but no disulfide bonds (top row)”.
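      The threshold rule described in this response can be written out as a short calculation. The following is a minimal sketch, not the authors' analysis code: the function names and the example MIC values are hypothetical, and the only rule taken from the text is that a decrease must exceed 2-fold to be considered.

```python
def mic_fold_decrease(mic_parent: float, mic_mutant: float) -> float:
    """Fold decrease in MIC of a mutant relative to its parent strain."""
    if mic_parent <= 0 or mic_mutant <= 0:
        raise ValueError("MIC values must be positive")
    return mic_parent / mic_mutant


def is_considered_decrease(mic_parent: float, mic_mutant: float,
                           threshold: float = 2.0) -> bool:
    """True only if the decrease is strictly larger than the threshold (2-fold)."""
    return mic_fold_decrease(mic_parent, mic_mutant) > threshold


# Hypothetical MIC values in ug/mL, chosen only to illustrate the rule.
examples = [(8.0, 4.0), (8.0, 2.0)]
for parent, mutant in examples:
    fold = mic_fold_decrease(parent, mutant)
    print(parent, mutant, fold, is_considered_decrease(parent, mutant))
```

      Under this rule a drop of exactly 2-fold, such as the one reported for the empty-vector control, falls at the threshold and is not counted, whereas a 4-fold drop is.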

      (3) Similarly, in Fig S1E, there appears to be only partial complementation for BPS-1m. Do the authors hypothesize that this observation is related to a folding defect, rather than degradation of protein, as described for BPS-1m for Figure 2?

      Thank you for the opportunity to clarify this. You are correct that we only achieve partial complementation for the E. coli strain expressing the BPS-1m enzyme from the Burkholderia complex. Despite the fact that the gene for this enzyme was codon optimized, we observed that its expression in E. coli is sub-optimal and incurs fitness effects. In fact, to record the data presented in our manuscript the E. coli strains had to be transformed anew every time. Considering that the related enzyme BPS-6 does not present any of these challenges, we attribute the partial complementation to technical difficulties with the expression of the bps-1m gene in E. coli. 

      We clarified this by adding the following clause to our manuscript: “we only achieve partial complementation for the dsbA mutant expressing BPS-1m, which we attribute to the fact that expression of this enzyme in E. coli is sub-optimal” (lines 132-134 of the revised manuscript).

      (4) Lines 204-206: "[...]we deleted the principal dsbA gene, dsbA1 (pathogenic bacteria often encode multiple DsbA analogues [24,25]), in several multidrug-resistant (MDR) P. aeruginosa clinical strains (Table S2)". That multiple DsbA analogues are often encoded is good information to provide, but it was unclear from quickly looking at the citations whether Pa is counted among these. Is it expected that all oxidative protein folding in Pa functions through DsbA1? Conveying this information, if possible, may make the impact of the results in this model clearer.

      Thank you for this comment. To address it we added the following text to our manuscript:

      “To determine whether the effects on β-lactam MICs observed in our inducible system (Fig. 1 and [23]) can be reproduced in the presence of other resistance determinants in a natural context with endogenous enzyme expression levels, we deleted the principal dsbA gene, dsbA1, in several multidrug-resistant (MDR) P. aeruginosa clinical strains (Table S2). Pathogenic bacteria often encode multiple DsbA analogues [24,25] and P. aeruginosa is no exception. It encodes two DsbAs, but DsbA1 has been found to catalyze the vast majority of the oxidative protein folding reactions taking place in its cell envelope [26]” (lines 172-178 of the revised manuscript).

      (5) Regarding the clinical Pa isolates G4R7 and G6R7, have the authors performed any phenotypic testing on these strains to identify differences that might explain the substantial difference in piperacillin MIC? I.e., can these isolates be distinguished by growth rate, genetic markers or expression levels, early or late infection, mucoidy, etc. This is not essential for the current work, but could weigh on the efficacy of this treatment strategy for AIM-1-expressing clinical isolates. (E.g., the G4R7 dsbA1 strain exhibits a piperacillin MIC still ~2-fold higher than WT G6R7).

      Thank you for the opportunity to clarify this. For clinical strains used in our study, we have evaluated their antibiotic resistance profiles, but we have not performed any additional phenotypic characterization. There are many reasons that contribute to differences in antibiotic resistance, starting simply from β-lactamase expression levels and extending to organismal effects, like the ones mentioned by the reviewer. Such characterization would fall outside the scope of our paper, especially since we sensitize our tested P. aeruginosa clinical isolates for the majority of the β-lactam antibiotics tested.

      We acknowledged this by adding the following sentence to our revised manuscript: 

      “Despite the fact that P. aeruginosa G4R7 dsbA1 was not sensitized for piperacillin-tazobactam, possibly due to the high level of piperacillin-tazobactam resistance of the parent clinical strain, our results across these two isolates show promise for DsbA as a target against β-lactam resistance in P. aeruginosa” (lines 191-194 of the revised manuscript).

      (6) Lines 180-2: "This shows that without their disulfide bonds, these proteins are unstable and are ultimately degraded by other cell envelope proteostasis components [33]". While it is clear that protein is significantly lost in all cases except for BPS-1m in 2A, the dsbA pDM1bla constructs in 2B appear to all retain non-trivial (>10-fold) nitrocefin hydrolysis activity compared to the dsbA pDM1 control. This does not impact the other results in 2B, but it would seem that a loss-of-function folding defect, as described subsequently for BPS-1m, is also part of the explanation for the observed MIC decreases, and this was not necessarily clear from the quoted passage. This could simply be clarified in the final sentence - that both mechanisms are potentially in play - if the authors agree with that interpretation.

      You are correct, thank you for your comment. We amended the text in our revised manuscript as follows: 

      “The data presented so far (Fig. 1 and 2) demonstrate that disulfide bond formation is essential for the biogenesis (stability and/or protein folding) and, in turn, activity of an expanded set of clinically important β-lactamases, including enzymes that currently lack inhibitor options” (lines 158-161 of the revised manuscript).

      (7) While it is clear from Figure S2 that the various dsb mutants do not have a general growth defect or collateral sensitivity to another antibiotic, it does not appear that there is an analogous control for the DSB inhibitor demonstrating no growth/toxic effects at the concentration used. This could be provided similarly to Figure S2, using gentamicin as a control antibiotic.

      We have carried out experiments that confirm that our results are due to specific inhibition of the DSB system and not to off-target effects. In particular, we show that the gentamicin MIC values of S. maltophilia AMM remain unchanged in the presence of the inhibitor and that treatment of S. maltophilia AMM dsbA dsbL with the compound does not affect its colistin MIC value (Fig. S2E and lines 317-320 of the revised manuscript).

      (8) Complementation is appropriately provided for experiments with E. coli, but are not provided for P. aeruginosa or S. maltophilia. It should be straightforward to complement in Pa, but is also probably less critical considering the evidence from E. coli. However, since the Sm mutant is a gene cluster with two genes, it would seem more imperative to complement this strain. This reviewer is not familiar enough with Sm to know if complementation is routine or feasible with this organism; if not, the controls for the DSB inhibitor should at least be provided.

      As mentioned in our response to comment 7 above, we have carried out experiments that confirm that our DSB inhibitor results are due to specific inhibition of the DSB system and not because of off-target effects.

      Moreover, in response to this comment, we have further demonstrated that our results are due to the specific interaction of DsbA with β-lactamase enzymes by complementing dsbA deletions in representative clinical strains of multidrug-resistant Pseudomonas aeruginosa and extremely-drug-resistant Stenotrophomonas maltophilia. We would like to note here that gene complementation in clinical isolates remains very rare in the literature due to their high levels of resistance and limited genetic tractability. Most of the few complementation examples reported for these two organisms are limited to strains that, although pathogenic, are commonly used in the lab, or to complementation efforts in non-clinical strain systems (for example use of P. aeruginosa PA14 for complementation, instead of the focal clinical isolate).

      We tested three different complementation strategies, two of which ended up being unsuccessful. After approximately 9 months of work, we succeeded in complementing a representative clinical strain for each organism (P. aeruginosa CDC #769 dsbA1 and S. maltophilia AMM dsbA dsbL) by inserting the dsbA1 gene from P. aeruginosa PAO1 into the Tn7 site on the chromosome. Both clinical strains show full complementation for every antibiotic tested; our complementation results can be found in Fig. S2B,D of the revised manuscript.

      The following text was added for P. aeruginosa clinical isolates:

      “We have demonstrated the specific interaction of DsbA with the tested β-lactamase enzymes in our E. coli K-12 inducible system using gentamicin controls (Fig. 1 and File S2A) and gene complementation (Fig. S1). To confirm the specificity of this interaction in P. aeruginosa, we performed representative control experiments in one of our clinical strains, P. aeruginosa CDC #769. We first tested the general ability of P. aeruginosa CDC #769 dsbA1 to resist antibiotic stress by recording MIC values against gentamicin, and found it unchanged compared to its parent (Fig. S2A). Gene complementation in clinical isolates is especially challenging and rarely attempted due to the high levels of resistance and lack of genetic tractability in these strains. Despite these challenges, to further ensure the specificity of the interaction of DsbA with tested β-lactamases in P. aeruginosa, we have complemented dsbA1 from P. aeruginosa PAO1 into P. aeruginosa CDC #769 dsbA1. We found that complementation of dsbA1 restores MICs to wild-type values for both tested β-lactam compounds (Fig. S2B) further demonstrating that our results in P. aeruginosa clinical strains are not confounded by off-target effects” (lines 226-239 of the revised manuscript).

      The following text was added for S. maltophilia clinical isolates: 

      “Since the dsbA and dsbL are organized in a gene cluster in S. maltophilia, we wanted to ensure that our results reported above were exclusively due to disruption of disulfide bond formation in this organism. First, we recorded gentamicin MIC values for S. maltophilia AMM dsbA dsbL and found them to be unchanged compared to the gentamicin MICs of the parent strain (Fig. S2C). This confirms that disruption of disulfide bond formation does not compromise the general ability of this organism to resist antibiotic stress. Next, we complemented S. maltophilia AMM dsbA dsbL. The specific oxidative roles and exact regulation of DsbA and DsbL in S. maltophilia remain unknown. For this reason and considering that genetic manipulation of extremely-drug-resistant organisms is challenging, we used our genetic construct optimized for complementing P. aeruginosa CDC #769 dsbA1 with dsbA1 from P. aeruginosa PAO1 (Fig. S2B) to also complement S. maltophilia AMM dsbA dsbL. We based this approach on the fact that DsbA proteins from one species have been commonly shown to be functional in other species [27-30]. Indeed, we found that complementation of S. maltophilia AMM dsbA dsbL with P. aeruginosa PAO1 dsbA1 restores MICs to wild-type values for both ceftazidime and colistin (Fig. S2D), conclusively demonstrating that our results in S. maltophilia are not confounded by off-target effects” (lines 282-297 of the revised manuscript).

      (9) In Figure 5E, the growth inhibition and loss of Pa CFU in 4 ug/mL ceftazidime for the Sm co-culture condition, which is subsequently lost in the Sm dsbA dsbL co-culture, does not appear to be discussed. As Pa is shown to grow fine in monoculture at this concentration, this result should be discussed in relation to the co-culture dynamics. Is it expected or observed that WT Sm is out-competing Pa under this condition and growing to a high CFU/mL? This would seem to have parallels to citation 49.

      As requested by this reviewer (see comment 10 below), we simultaneously tracked the abundance of P. aeruginosa and S. maltophilia strains in our cross-protection experiment. During this process we probed the abundances of the two organisms at 4 µg/mL of ceftazidime. Our results can be seen in Fig. S3B of the revised manuscript. The reviewer is correct and these effects are due to competition between P. aeruginosa and S. maltophilia with the latter being able to reach very high CFUs in this antibiotic concentration. 

      The following text on co-culture dynamics was added to our revised manuscript: 

      “At low antibiotic concentrations, for example 4 μg/mL of ceftazidime, S. maltophilia AMM is fully resistant and thrives, thus outcompeting P. aeruginosa PA14 (dark pink and dark blue bars in Fig. S3B). The same can also be seen in Fig. 4E, whereby decreased P. aeruginosa PA14 CFUs are recorded. By contrast S. maltophilia AMM dsbA dsbL already displays decreased growth at 4 μg/mL of ceftazidime because of its non-functional L1-1 enzyme, allowing comparatively higher growth of P. aeruginosa (light pink and light blue bars in Fig. S3B)” (lines 384-390 of the revised manuscript).

      (10) The data presented in Figure 5E would be augmented by the inclusion of, for at least a few representative cases, the Sm CFUs relative to the Pa CFUs. In describing the protective effects of Sm on Pa for imipenem treatment, the authors of citation 12 note that the effect was dependent on Sm cell density. This raises the immediate question of whether the protection observed in this work is similarly dependent on cell density of Sm. It is unclear if the authors expect Sm to persist under these conditions, and it seems Sm CFU should be expected to be relatively high considering it is pre-incubated for 6 hours prior to the assay. What is the physiological state of these cells, and how are they affected by ceftazidime? While many other variables are likely relevant to the translation of this protection, the relative abundance and localization of Sm and Pa commonly observed in CF patients, as well as the effective concentration of antibiotic observed in vivo, is likely worth consideration.

      As mentioned in our response to comment 9 above, we have simultaneously tracked the abundance of P. aeruginosa and S. maltophilia strains in our cross-protection experiment for select antibiotic concentrations. To be able to perform this experiment, we had to label two extremely-drug-resistant strains of S. maltophilia with an antibiotic resistance marker that allowed us to quantify them in mixtures with P. aeruginosa. Our results can be found in Fig. S3 of our revised manuscript and, in a nutshell, show that ceftazidime treatment leads to eradication of both P. aeruginosa and S. maltophilia when disulfide bond formation is impaired in S. maltophilia.

      The following text was added to address the questions of the reviewer:

      “Due to the naturally different growth rates of these two species (S. maltophilia grows much slower than P. aeruginosa) especially in laboratory conditions, the protocol we followed [1] requires S. maltophilia to be grown for 6 hours prior to co-culturing it with P. aeruginosa. To ensure that at this point in the experiment our two S. maltophilia strains, with and without dsbA, had grown comparatively to each other, we determined their cell densities (Fig. S3A). We found that S. maltophilia AMM dsbA dsbL had grown at a similar level as the wild-type strain, and both were at a higher cell density [~10<sup>7</sup> colony forming units (CFUs)] compared to the P. aeruginosa PA14 inoculum (5 x 10<sup>4</sup> CFUs)” (lines 353-361 of the revised manuscript).

      “To ensure that ceftazidime treatment leads to eradication of both P. aeruginosa and S. maltophilia when disulfide bond formation is impaired in S. maltophilia, we monitored the abundance of both strains in each synthetic community for select antibiotic concentrations (Fig. S3B). In this experiment we largely observed the same trends as in Fig. 4E. At low antibiotic concentrations, for example 4 μg/mL of ceftazidime, S. maltophilia AMM is fully resistant and thrives, thus outcompeting P. aeruginosa PA14 (dark pink and dark blue bars in Fig. S3B). The same can also be seen in Fig. 4E, whereby decreased P. aeruginosa PA14 CFUs are recorded. By contrast S. maltophilia AMM dsbA dsbL already displays decreased growth at 4 μg/mL of ceftazidime because of its non-functional L1-1 enzyme, allowing comparatively higher growth of P. aeruginosa (light pink and light blue bars in Fig. S3B). Despite the competition between the two strains, P. aeruginosa PA14 benefits from S. maltophilia AMM’s high hydrolytic activity against ceftazidime, which allows it to survive and grow in high antibiotic concentrations even though it is not resistant (see 128 μg/mL; dark pink and dark blue bars in Fig. S3B). In stark opposition, without its disulfide bond in S. maltophilia AMM dsbA dsbL, L1-1 cannot confer resistance to ceftazidime, resulting in killing of S. maltophilia AMM dsbA dsbL and, consequently, also of P. aeruginosa PA14 (see 128 μg/mL; light pink and light blue bars in Fig. S3B).

      The data presented here show that, at least under laboratory conditions, targeting protein homeostasis pathways in specific recalcitrant pathogens has the potential to not only alter their own antibiotic resistance profiles (Fig. 3 and 4A-D), but also to influence the antibiotic susceptibility profiles of other bacteria that co-occur in the same conditions (Fig. 5). Admittedly, the conditions in a living host are too complex to draw direct conclusions from this experiment. That said, our results show promise for infections, where pathogen interactions affect treatment outcomes, and whereby their inhibition might facilitate treatment” (lines 381-406 of the revised manuscript).

      (11) Regarding the role of microbial interactions in CF and other disease/infection contexts, the authors should temper their descriptions in accordance with citations provided. As an example, lines 96-99: "For example, in the CF lung, highly drug-resistant S. maltophilia strains actively protect susceptible P. aeruginosa from β-lactam antibiotics [12], and ultimately facilitate the evolution of β-lactam resistance in P. aeruginosa [14]."

      Neither citation provided here attests to Sm protection of Pa "in the CF lung". Both papers use a simplified in vitro co-culture model to assess Sm protection of Pa from antibiotics and the evolution of Pa antibiotic resistance in the presence or absence of Sm, respectively. In the latter case, it should also be noted that while the authors observed somewhat faster Pa resistance evolution in one co-culture condition, they did not observe it in the other, and that resistance evolution in general was observed regardless of co-culture condition. There are also statements in the ultimate and penultimate paragraphs of the Discussion section that repeat these points. The authors could re-frame this aspect of their investigation as part of a working hypothesis related to potential interactions of these pathogens, and should appropriately caveat what is and is not known from in vitro and in vivo/clinical work.

      Thank you for your comment. You are entirely correct. We have amended the text throughout our revised manuscript to avoid overstating these findings and to be clear about the fact that they originate from experimental studies. Please find below representative examples of such passages:

      “In particular, some antibiotic resistance proteins, like β-lactamases, which decrease the quantities of active drug present, function akin to common goods, since their benefits are not limited to the pathogen that produces them but can be shared with the rest of the bacterial community. This means that their activity enables pathogen cross-resistance when multiple species are present [1,31], something that was demonstrated in recent work investigating the interactions between pathogens that naturally co-exist in CF infections. More specifically, it was shown that in laboratory co-culture conditions, highly drug-resistant S. maltophilia strains actively protect susceptible P. aeruginosa from β-lactam antibiotics [1]. Moreover, this cross-protection was found to facilitate, at least under specific conditions, the evolution of β-lactam resistance in P. aeruginosa [32]” (lines 47-57 of the revised manuscript).

      “The antibiotic resistance mechanisms of S. maltophilia impact the antibiotic tolerance profiles of other organisms that are found in the same infection environment. S. maltophilia hydrolyses all β-lactam drugs through the action of its L1 and L2 β-lactamases [7,8]. In doing so, it has been experimentally shown to protect other pathogens that are, in principle, susceptible to treatment, such as P. aeruginosa [1]. This protection, in turn, allows active growth of otherwise treatable P. aeruginosa in the presence of complex β-lactams, like imipenem [1], and, at least in some conditions, increases the rate of resistance evolution of P. aeruginosa against these antibiotics [32]” (lines 332-340 of the revised manuscript).

      (12) Regarding the role of S. maltophilia in CF disease, the authors should either discuss clinical associations more completely or note the conflicting data on its role in disease. As an example, lines 84-87: "As a result, the standard treatment option, i.e., broad-spectrum β-lactam antibiotic therapy, constitutes a severe risk for CF patients carrying both P. aeruginosa and S. maltophilia [10,11], creating an urgent need for antimicrobial approaches that will be effective in eliminating both pathogens."

      It is unclear how this treatment results in a "severe risk" for CF patients colonized by both Sm and Pa. Citation 10 suggests an association between anti-pseudomonal antibiotic use and increased prevalence of Sm, but neither citation supports a worsening clinical outcome from this treatment. Citation 10 further notes that clinical scores between Sm-positive and control cohorts could not be distinguished statistically. Citation 11 is a review that makes note of this conflicting data regarding Sm, including reference to a more recent (at the time) result using multivariate analysis showing no independent effect of Sm on survival.

      The above point similarly applies to other statements in the manuscript, for example at lines 266-267: "Considering the contribution of S. maltophilia strains to treatment failure in CF lung infections [8,10,11][...]" As well as lines 79-80: "Pulmonary exacerbations and severe disease states are also associated with the presence of S. maltophilia [8]"

      Again, the provided citations do not support the implication that Sm specifically 'contributes to treatment failure in CF lung infections' or that Sm is specifically associated with severe disease states. In addition to the previously discussed citations, citation 8 describes broad "pulmotypes" composed of 10 species/genera that could be associated with particular clinical (e.g., exacerbation) or treatment (e.g., antibiotic therapy) characteristics, but these cannot, without further analysis, be associated with, or causally linked to, a specific pathogen. While pulmotype 2 in citation 8 was associated with a more severe clinical state and appeared to have the highest relative abundance of Sm compared to other pulmotypes, Sm was not identified (Figure 4A) as an independent factor that distinguishes between moderate and severe disease, unlike Pa and some anaerobes (4F-H). The authors also observed that decreasing relative abundance of Pa, in particular, is correlated with subsequent exacerbation, but did not correlate this with the presence of any other species or genera. Again, this should be re-framed with the appropriate caveat that this is a hypothesis with possible clinical significance.

      Several suggested papers are included below on Sm association with clinical characteristics to incorporate into the manuscript if the authors choose to do so:

      https://doi.org/10.1177/14782715221088909

      https://doi.org/10.1016/j.prrv.2010.07.003

      https://doi.org/10.1016/j.jcf.2013.05.009

      https://doi.org/10.1002/ppul.23943

      https://doi.org/10.1002/14651858.CD005405.pub2

      https://doi.org/10.1164/rccm.2109078

      http://dx.doi.org/10.1136/thx.2003.017707

      https://erj.ersjournals.com/content/23/1/98.short

      Thank you for your comment. You are entirely correct. We have amended the text throughout our revised manuscript to avoid overstating the role of S. maltophilia in CF infections and to reference additional relevant works in the literature. Please find below representative examples of such passages:

      “On the other hand, CF microbiomes are increasingly found to encompass S. maltophilia [2-4], a globally distributed opportunistic pathogen that causes serious nosocomial respiratory and bloodstream infections [5-7]. S. maltophilia is one of the most prevalent emerging pathogens [6] and it is intrinsically resistant to almost all antibiotics, including β-lactams like penicillins, cephalosporins and carbapenems, as well as macrolides, fluoroquinolones, aminoglycosides, chloramphenicol, tetracyclines and colistin. As a result, the standard treatment option for lung infections, i.e., broad-spectrum β-lactam antibiotic therapy, is rarely successful in countering S. maltophilia [7,8], creating a definitive need for approaches that will be effective in eliminating both pathogens” (lines 33-41 of the revised manuscript).

      “Of the organisms studied in this work, S. maltophilia deserves further discussion because of its unique intrinsic resistance profile. The prognosis of CF patients with S. maltophilia lung carriage is still debated [4,9-16], largely because studies with extensive and well-controlled patient cohorts are lacking. This notwithstanding, the therapeutic options against this pathogen are currently limited to one non-β-lactam antibiotic-adjuvant combination, trimethoprim-sulfamethoxazole, which is not always effective [17-20], and a few last-line β-lactam drugs, like the fifth-generation cephalosporin cefiderocol and the combination aztreonam-avibactam. Resistance to commonly used antibiotics causes many problems during treatment and, as a result, infections that harbor S. maltophilia have high case fatality rates [7]. This is not limited to CF patients, as S. maltophilia is a major cause of death in children with bacteremia [5]” (lines 440-450 of the revised manuscript).

      Reviewer #3 (Recommendation For the Authors):

      (1) The referencing of supplemental figures does not follow a sequential order. For example, Figure S2 appears in the text before S1. The sequential ordering of figure numbers improves the readability and can be considered while editing the manuscript for revision.

      Thank you for this comment. This is amended in our revised manuscript and supplemental figures and files are cited in order.

      (2) It will be useful to provide a brief description of Ambler classes since these are important to study design (for a broader audience).

      Thank you for this suggestion. This has been added and can be found in lines 91-101 of the revised manuscript.

      (3) The rationale for using K12 strain for E. coli should be provided. It appears that is a model system that is well established in their lab, but a scientific rationale can be listed. Maybe this strain does not have any lactamases in its genome other than the one being expressed as compared to pathogenic E. coli?

      Thank you for this suggestion. This has been added and can be found in lines 104-106 of the revised manuscript.

      (4) The reviewers used worm model to test their observations, which is relevant. Given the significant implications of their work in overcoming resistance to clinically used antibiotics and availability of already generated dsbA mutants in clinical strains, it will be useful to investigate survival in animal models or at least wound models of Pseudomonas infections. The reviewer does not deem this necessary, but it will significantly increase the impact of their seminal work.

      Thank you for this comment. We appreciate the sentiment, and we would have liked to be able to perform experiments in a murine model of infection. There are several reasons that made this not possible, and as a result we used G. mellonella as an informative preliminary in vivo infection model. The DSB proteins have been shown to play a central role in bacterial virulence. Because of this, our P. aeruginosa and S. maltophilia mutant strains are not efficient in establishing an infection, even in a wound model. This could be overcome had we been able to use the chemical inhibitor of the DSB system in vivo; however, this is also not possible. This is because the chemical compound that we use to inhibit the function of DsbA acts on DsbB. Inhibition of DsbB blocks the re-oxidation of DsbA and leads to its accumulation in its inactive reduced form. However, the action of the inhibitor can be bypassed through re-oxidation and re-activation of DsbA by small-molecule oxidants such as L-cystine, which are abundant in rich growth media or animal tissues. This makes the inhibitor only suitable for in vitro assays that can be performed in minimal media, where the presence of small-molecule oxidants can be strictly avoided, but entirely unsuitable for an insect or a vertebrate animal model.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Dixit, Noe, and Weikl apply coarse-grained and all-atom molecular dynamics to determine the response of the mechanosensitive proteins Piezo 1 and Piezo 2 to tension. Cryo-EM structures in micelles show a high curvature of the protein whereas structures in lipid bilayers show lower curvature. Is the zero-stress state of the protein closer to the micelle structure or the bilayer structure? Moreover, while the tension sensitivity of channel function can be inferred from the experiment, molecular details are not clearly available. How much does the protein's height and effective area change in response to tension? With these in hand, a quantitative model of its function follows that can be related to the properties of the membrane and the effect of external forces. 

      Simulations indicate that in a bilayer the protein relaxes from the highly curved cryo-EM dome (Figure 1). 

      Under applied tension, the dome flattens (Figure 2) including the underlying lipid bilayer. The shape of the system is a combination of the membrane mechanical and protein conformational energies (Equation 1). The membrane's mechanical energy is well-characterized. It requires only the curvature and bending modulus as inputs. They determine membrane curvature and the local area metric (Equation 4) by averaging the height on a grid and computing second derivatives (Equations 7, 8) consistent with known differential geometric formulas. 

      The bending energy can be limited to the nano dome but this implies that the noise in the membrane energy is significant. Where there is noise outside the dome there is noise inside the dome. At the least, they could characterize the noisy energy due to inadequate averaging of membrane shape. 

      My concern for this paper is that they are significantly overestimating the membrane deformation energy based on their numerical scheme, which in turn leads to a much stiffer model of the protein itself.

      We agree that “thermal noise” is intrinsic to MD simulations, as in “real” systems, leading to thermally excited shape fluctuations of membranes and conformational fluctuations of proteins. However, for our coarse-grained simulations, the thermally excited membrane shape fluctuations can be averaged out quite well, and the resulting average shapes are smooth, see e.g. the shapes and lines of the contour plots in Fig. 1 and 2. For our atomistic simulations, the averaged shapes are not as smooth, see Fig. 3a and the lines of the contour plots in Fig. 3b. Therefore, we do not report bending energies for the nanodome shapes determined from atomistic simulations, because bending energy calculations are sensitive to remaining “noise” on small scales (due to the scale invariance of the bending energy), in contrast to calculations of excess areas, which we state now on lines 620ff.

      For our coarse-grained simulations, we now corroborate our bending energy calculations based on averaged 3d shapes by comparing to bending energy values obtained from highly smoothened 2d mean curvature profiles (see Fig. 1c for mean curvature profiles in tensionless membranes). We discuss this in detail from line 323 on, starting with:

      “To corroborate our bending energy calculations for these averaged three-dimensional nanodome shapes, we note that essentially identical bending energies can be obtained from the highly smoothened mean curvatures M of the two-dimensional membrane profiles. …”

      Two things would address this: 

      (1) Report the membrane energy under different graining schemes (e.g., report schemes up to double the discretization grain). 

      There are two graining schemes in the modeling, and we have followed the reviewer’s recommendation regarding the second scheme. In the first, more central graining scheme, we use quadratic membrane patches with a sidelength of about 2 nm to determine membrane midplane shapes and lipid densities of each simulation conformation. This graining scheme has also been previously employed in Hu, Lipowsky, Weikl, PNAS 38, 15283 (2013) to determine the shape and thermal roughness of coarse-grained membranes. A sidelength of 2 nm is necessary to have sufficiently many lipid headgroups in the upper and lower leaflet in the membrane patches for estimating the local height of these leaflets, and the local membrane midplane height as average of these leaflet heights (see subsection “Membrane shape of simulation conformation” in the Methods section for details).  However, we strongly believe that doubling the sidelength of membrane patches in this discretization is not an option, because a discretization length of 4 nm is too coarse to resolve the membrane deformations in the nanodome, see e.g. the profiles in Fig. 1b. Moreover, any “noise” from this discretization is rather completely smoothened out in the averaging process used in the analysis of the membrane shapes, at least for the coarse-grained simulations. This averaging process requires rotations of membrane conformations to align the protein orientations of the conformations (see subsection “Average membrane shapes and lipid densities” for details). Because of these rotations, the original discretization is “lost” in the averaging, and a continuous membrane shape is generated. To calculate the excess areas and bending energies for this smooth, continuous membrane shape, we use a discretization of the Monge plane into a square lattice with lattice parameter 1 nm. 
      As a response to the referee’s suggestion, we now report that the results for the excess area do not change significantly when doubling this lattice parameter to 2 nm. On line 597, we write:

      “For a lattice constant of a=2 nm, we obtain extrapolated values of the excess area Delta A from the coarse-grained simulations that are 2 to 3% lower than the values for a=1 nm, which is small compared to statistical uncertainties with relative errors of around 10%.”

      On lines 614ff, we now state that the bending energy results are about 10% to 13% lower for a=2 nm, likely because of the lower resolution of the curvature in the nanodome compared to a=1 nm, rather than incomplete averaging and remaining roughness of the coarse-grained nanodome shapes.
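
      The lattice-constant check described in this response can be illustrated with a minimal sketch. It is not the authors' analysis code: it assumes the standard Monge area element sqrt(1 + h_x^2 + h_y^2) for the local area metric, and it replaces the simulated nanodome by a hypothetical Gaussian bump whose height `H0` and width `SIGMA` are illustrative values only.

```python
import numpy as np

def excess_area(h0, sigma, a, half_width=40.0):
    """Excess area (nm^2) of a Gaussian bump over its projected Monge-plane area.

    h0, sigma, half_width are in nm; a is the lattice constant of the square lattice.
    """
    coords = np.arange(-half_width, half_width + a / 2, a)
    x, y = np.meshgrid(coords, coords, indexing="ij")
    h = h0 * np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    # First derivatives by finite differences (central in the interior).
    hx, hy = np.gradient(h, a, a)
    # Local area element of the Monge patch; subtracting 1 leaves the excess.
    metric = np.sqrt(1.0 + hx**2 + hy**2)
    return np.sum(metric - 1.0) * a**2

# Hypothetical bump parameters, chosen for illustration (not from the manuscript).
H0, SIGMA = 2.6, 6.0

dA_1nm = excess_area(H0, SIGMA, a=1.0)
dA_2nm = excess_area(H0, SIGMA, a=2.0)
# Small-gradient closed form for a Gaussian bump: (pi / 2) * h0^2.
dA_small_gradient = 0.5 * np.pi * H0**2

print(f"a = 1 nm: {dA_1nm:.2f} nm^2")
print(f"a = 2 nm: {dA_2nm:.2f} nm^2")
print(f"small-gradient estimate: {dA_small_gradient:.2f} nm^2")
```

      For a shape this smooth, doubling the lattice constant lowers the excess area by only a few percent, because a coarser lattice slightly underestimates the slopes.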

      (2) For a Gaussian bump with sigma=6 nm I obtained a bending energy of 0.6 kappa, so certainly in the ballpark with what they are reporting but significantly lower (compared to 2 kappa, Figure 5 lower left). It would be simpler to use the Gaussian approximation to their curves in Figure 3 - and I would argue more accurate, especially since they have not reported the variation of the membrane energy with respect to the discretization size and so I cannot judge the dependence of the energy on discretization. I view reporting the variation of the membrane energy with respect to discretization as being essential for the analysis if their goal is to provide a quantitative estimate for the force of Piezo. The Helfrich energy computed from an analytical model with a membrane shape closely resembling the simulated shapes would be very helpful. According to my intuition, finite-difference estimates of curvatures will tend to be overestimates of the true membrane deformation energy because white noise tends to lead to high curvature at short-length scales, which is strongly penalized by the bending energy. 

      Instead of Gaussian bumps, we now calculate the membrane bending energy also from the two-dimensional, continuous mean curvature profiles (see Fig. 1c). These mean curvature profiles are highly smoothened (see figure caption for details). Nonetheless, we obtain essentially the same bending energies as in our discrete calculations of averaged, smoothened three-dimensional membrane shapes, see new text on lines 326ff. We believe that this agreement corroborates our bending energy calculations. We still focus on values obtained for three-dimensional membrane shapes, because of incomplete rotational symmetry. The three-dimensional membrane shapes exhibit variations with the three-fold symmetry of the Piezo proteins, see Figure 2a and b.

      We agree that the bending energy of thermally rough membranes depends on the discretization scheme, because the discretization length of any discretization scheme leads to a cut-off length for fluctuation modes in a Fourier analysis. But again, we average out the thermal noise, for reasons given in the Results section, and analyse smooth membrane shapes.  
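
      The reviewer's Gaussian-bump estimate and the discretization question can be made concrete with a small sketch. It assumes the small-gradient Helfrich energy E = (kappa/2) ∫ (∇²h)² dA, for which a bump h(r) = h0 exp(-r²/(2σ²)) gives E/kappa = π h0²/σ². The reviewer does not state a bump height, so `H0` below is backed out from the quoted 0.6 kappa at σ = 6 nm and is an assumption; the five-point Laplacian is a generic finite-difference stencil, not necessarily the scheme of Equations 7 and 8.

```python
import numpy as np

def analytic_energy_over_kappa(h0, sigma):
    """Small-gradient Helfrich energy of a Gaussian bump, in units of kappa."""
    return np.pi * h0**2 / sigma**2

def lattice_energy_over_kappa(h0, sigma, a, half_width=40.0):
    """Same energy from a five-point finite-difference Laplacian on a square lattice."""
    coords = np.arange(-half_width, half_width + a / 2, a)
    x, y = np.meshgrid(coords, coords, indexing="ij")
    h = h0 * np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    # Discrete Laplacian on interior lattice sites.
    lap = (h[2:, 1:-1] + h[:-2, 1:-1] + h[1:-1, 2:] + h[1:-1, :-2]
           - 4.0 * h[1:-1, 1:-1]) / a**2
    # E / kappa = (1/2) * sum over sites of (laplacian)^2 * site area.
    return 0.5 * np.sum(lap**2) * a**2

SIGMA = 6.0                            # nm, as quoted by the reviewer
H0 = SIGMA * np.sqrt(0.6 / np.pi)      # nm, assumed so that E = 0.6 kappa

e_exact = analytic_energy_over_kappa(H0, SIGMA)
e_1nm = lattice_energy_over_kappa(H0, SIGMA, a=1.0)
e_2nm = lattice_energy_over_kappa(H0, SIGMA, a=2.0)

print(f"analytic: {e_exact:.3f} kappa")
print(f"a = 1 nm: {e_1nm:.3f} kappa")
print(f"a = 2 nm: {e_2nm:.3f} kappa")
```

      For a noise-free shape the lattice estimate falls slightly below the analytic value and decreases further on the coarser lattice, so an overestimate from finite differences would have to come from residual roughness rather than from the stencil itself.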

      The fitting of the system deformation to the inverse time appears to be incredibly ad hoc ... Nor is it clear that the quantified model will be substantially changed without extrapolation. The authors should either justify the extrapolation more clearly (sorry if I missed it!) or also report the unextrapolated numbers alongside the extrapolated ones. 

      We report the values of the excess area and bending energy in the different time intervals of our analysis as data points in Fig. 4 with supplement. We find it important to report the time dependence of these quantities, because the intended equilibration of the membrane shapes in our simulations is not “complete” within a certain time window of the simulations. So, just “cutting” the first 20 and 50% of the simulation trajectories, and analysing the remaining parts as “equilibrated” does not seem to be a reasonable choice here, at least for the membrane properties, i.e. for the excess area and bending energy. We agree that the linear extrapolation used in our analysis is a matter of choice. At least for the coarse-grained simulations, the extrapolated values of excess areas and bending energies are rather close to the values obtained in the last time windows (see Figure 4). 
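
      The extrapolation discussed here can be sketched as a straight-line fit against inverse time, with the intercept at 1/t = 0 taken as the long-time estimate. The window midpoints, the trajectory length, and the data values below are synthetic placeholders, and the function name is hypothetical; the manuscript's exact fitting procedure may differ.

```python
import numpy as np

def extrapolate_inverse_time(t_mid, values):
    """Fit values = v_inf + c / t and return (v_inf, c).

    t_mid: representative time of each analysis window.
    values: quantity averaged in each window (e.g. excess area).
    """
    inv_t = 1.0 / np.asarray(t_mid, dtype=float)
    slope, intercept = np.polyfit(inv_t, np.asarray(values, dtype=float), 1)
    return intercept, slope

# Synthetic example: 5 equal windows of a hypothetical 4-microsecond run.
t_mid = np.array([0.4, 1.2, 2.0, 2.8, 3.6])   # microseconds
values = 40.0 + 6.0 / t_mid                   # made-up data with known limit 40

v_inf, c = extrapolate_inverse_time(t_mid, values)
print(f"extrapolated value: {v_inf:.2f}, last-window value: {values[-1]:.2f}")
```

      Reporting both the extrapolated value and the last-window value, as in the print statement, is one way to show how much the extrapolation changes the result.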

      In summary, this paper uses molecular dynamics simulations to quantify the force of the Piezo 1 and Piezo 2 proteins on a lipid bilayer using simulations under controlled tension, observing the membrane deformation, and using that data to infer protein mechanics. While much of the physical mechanism was previously known, the study itself is a valuable quantification. I identified one issue in the membrane deformation energy analysis that has large quantitative repercussions for the extracted model. 

      Reviewer #2 (Public review): 

      Summary: 

      In this study, the authors suggest that the structure of Piezo2 in a tensionless simulation is flatter compared to the electron microscopy structure. This is an interesting observation and highlights the fact that the membrane environment is important for Piezo2 curvature. Additionally, the authors calculate the excess area of Piezo2 and Piezo1, suggesting that it is significantly smaller compared to the area calculated using the EM structure or simulations with restrained Piezo2. Finally, the authors propose an elastic model for Piezo proteins. Those are very important findings, which would be of interest to the mechanobiology field. 

      Whilst I like the suggestion that the membrane environment will change Piezo2 flatness, could this be happening because of the lower resolution of the MARTINI simulations? In other words, would it be possible that MARTINI is not able to model such curvature due to its lower resolution? 

      Related to my comment above, the authors say that they only restrained the secondary structure using an elastic network model. Whilst I understand why they did this, Piezo proteins are relatively large. How can the authors know that this type of elastic network model restraints, combined with the fact that MARTINI simulations are perhaps not very accurate in predicting protein conformations, can accurately represent the changes that happen within the Piezo channel during membrane tension? 

      These questions regarding the reliability of the Martini model are very reasonable and are the reason why we also include results from atomistic simulations, at least for Piezo 2, and compare the results. In the Martini model, secondary structure constraints are standard. In addition, constraints on the tertiary structure (e.g. via an elastic network model) are also typically used in simulations of soluble, globular proteins. However, such tertiary constraints would make it impossible to simulate the tension-induced flattening of the Piezo proteins. So instead, as we write on lines 427ff, “we relied on the capabilities of the Martini coarse-grained force field for modeling membrane systems with TM helix assemblies (Sharma and Juffer, 2013; Chavent et al., 2014; Majumder and Straub, 2021).” In these references, Martini simulations were used to study the assembly of transmembrane helices, leading to agreement with experimentally observed structures. As we state in our article, our atomistic simulations corroborate the Martini simulations, with the caveats that are now more extensively discussed in the new last paragraph of the Discussion section starting on line 362.

      Modelling of Piezo1 seems to be based on homology to Piezo2. However, the authors need to further evaluate their model, e.g. how it compares with an Alphafold model. 

      We understand the question, but see it beyond the scope of our article, also because of the computational demand of the simulations. The question is: Do coarse-grained simulations of Piezo1 based on an Alphafold model as starting structure lead to different results? It is important to note that we only model the rather flexible 12 TM helices at the outer ends of the Piezo 1 monomers via homology modeling to the Piezo 2 structure, which includes these TM helices. For the inner 26 TM helices, including the channel, we use the high-quality cryo-EM structure of Piezo 1. Alphafold may be an alternative for modeling the outer 12 helices, but we don’t think this would lead to statistically significant differences in simulations – e.g. because of the observed overall agreement of membrane shapes in all our Piezo 1 and Piezo 2 simulation systems.

      To calculate the tension-induced flattening of the Piezo channel, the authors "divide all simulation trajectories into 5 equal intervals and determine the nanodome shape in each interval by averaging over the conformations of all independent simulation runs in this interval.". However, probably the change in the flattening of Piezo channel happens very quickly during the simulations, possibly within the same interval. Is this the case? and if yes does this affect their calculations? 

      Unfortunately, the flattening is not sufficiently quick, so is not complete within the first time windows, see data points in Figure 4. We therefore report the time dependence with the plots in Figure 4 and extrapolate, see also our response above to reviewer 1.

      Finally, the authors use a specific lipid composition, which is asymmetric. Is it possible that the asymmetry of the membrane causes some of the changes in the curvature that they observe? Perhaps more controls, e.g. with a symmetric POPC bilayer are needed to identify whether membrane asymmetry plays a role in the membrane curvature they observe. 

      Because of the rather high computational demands, such controls are beyond our scope. We don’t expect statistically significant differences for symmetric POPC/cholesterol bilayers. On lines 229ff, we now state:

      “Our modelling assumes that any spontaneous curvature from asymmetries in the lipid composition is small compared to the curvature of the nanodome and, thus, negligible, which is plausible for the rather slight lipid asymmetry of our simulated membranes (see Methods).”

      Reviewer #3 (Public review): 

      Strengths: 

      This work focuses on a problem of deep significance: quantifying the structure-tension relationship and underlying mechanism for the mechanosensitive Piezo 1 and 2 channels. This objective presents a few technical challenges for molecular dynamics simulations, due to the relatively large size of each membrane-protein system. Nonetheless, the technical approach chosen is based on the methodology that is, in principle, established and widely accessible. Therefore, another group of practitioners would likely be able to reproduce these findings with reasonable effort. 

      Weaknesses: 

      The two main results of this paper are (1) that both channels exhibit a flatter structure compared to cryo-EM measurements, and (2) their estimated force vs. displacement relationship. Although the former correlates at least quantitatively with prior experimental work, the latter relies exclusively on simulation results and model parameters. 

      Below is a summary of the key points we recommend addressing in a revised version of the manuscript: 

      (1) The authors should report and discuss controls for the membrane energy calculations, specifically by increasing the density of the discretization graining. We also suggest validating the bending modulus used in the energy calculations for the specific lipid mixture employed in the study. 

      We have addressed both points, see our response to the reviewer’s comments for further details.

      (2) The authors should consider and discuss the potential limitations of the coarse-grained simulation force field and clarify how atomistic simulations validate the reported results, with a more detailed explanation of the potential interdependencies between the two. 

      We now discuss the caveats in the comparison of coarse-grained and atomistic simulations in more detail in a new paragraph starting on line 362.

      (3) The authors should provide further clarification on other points raised in the reviewers' comments, for instance, the potential role of membrane asymmetry. 

      We have done this – see above. We now further explain on lines 437ff why we use an asymmetric membrane. On lines 230ff, we discuss that any spontaneous membrane curvature due to lipid asymmetry is likely small compared to the nanodome curvature and, thus, negligible.

      Reviewer #1 (Recommendations for the authors): 

      (1) Report discretization dependence of the membrane energy (up to double the density of the current discretization graining). 

      We have added several text pieces in the paragraph “Excess area and bending energy” starting on line 583 in which we state how the results depend on the lattice constant a of the calculations.

      (2) Evaluate an analytical energy of a membrane bump with a shape similar to the simulation. This would be free of all sampling and discretization artifacts and would thus be an excellent lower bound of the energy. 

      We have done this for the curvature profile in Figure 1c and corresponding curvature profiles of the shape profiles in Figure 2d, see new text on lines 326ff.

      Minor: 

      (1)  The lipid density (Figure 1 right, 2c, 3c) is not interesting nor is it referred to. It can be dropped. 

      We think the lipid density maps are important for two reasons: First, they show the protein shape obtained after averaging conformations, as low-lipid-density regions. Second, the lipid densities are used in the calculation of the bending energies, to limit the bending energy calculations to the membrane in the nanodome, see Eq. 9. We therefore prefer to keep them.

      (2) Figure 7 is attractive but not used in a meaningful way. I suggest inserting the protein graphic from Figure 7 into Figure 1 with the 4-helix bundles numbered alongside the structure. Figure 7 could then be dropped. 

      Figure 7 is a figure of the Methods section. We need it to illustrate and explain aspects of the setup (numbering of helices, missing loops) and analysis (numbering scheme of 4-TM helix units).

      (3) Some editing of the use of the English language would be helpful. "Exemplary" is a bit of a funny word choice, it implies that the conformation is excellent, and not simply representative. I'd suggest "Representative conformation". 

      We agree and have replaced “exemplary” by “representative”.

      (4) Typos: 

      Equation 4 - Missing parentheses before squared operator inside the square root. 

      We have corrected this mistake.

      Reviewer #2 (Recommendations for the authors): 

      This study focuses mainly on Piezo2; the authors do not perform any atomistic simulations of Piezo1, and the coarse-grained simulations for Piezo1 are shorter. As a result, their analysis for Piezo2 seems more complete. It would be good if the authors did similar studies with Piezo1 as with Piezo2. 

      We agree that atomistic simulations of Piezo 1 would be interesting, too. However, because the atomistic simulations are particularly demanding, this is beyond our scope.

      Reviewer #3 (Recommendations for the authors): 

      (1) At line 63, a very large tension from the previous work by De Vecchis et al is reported (68 mN/m). The authors are sampling values up to about 21 mN/m, which is considerably smaller. However, these values greatly exceed what typical lipid membranes can sustain (about 10 mN/m) before rupturing. When mentioning these large tensions, the authors should emphasize that these values are not physiologically significant, because they would rupture most plasma membranes. That said, their use in simulation could be justified to magnify the structural changes compared to experiments. 

      We agree that our largest membrane tension values are unphysiological. However, we see a main novelty and relevance of our simulations in the fact that we obtain a response of the nanodome in the physiological range of membrane tensions, see e.g. the third sentence of the abstract. Yes, we include simulations at tensions of 21 mN/m, but most of our simulated tension values are in the range from 0 to 10 mN/m (see e.g. Fig. 3e), in contrast to previous simulation studies.

      (2) At line 78 and in the Methods, only the reference paper is for the CHARMM protein force field, but not for the lipid force field. 

      We have added the reference Klauda et al., 2010 for the CHARMM36 lipid force field in both spots.

      (3) (Line 83) Acknowledging that the authors needed to use the structure from micelles (because it has atomic resolution), how closely do their relaxed Piezo structures compare with the lower-resolution data from the MacKinnon and Patapoutian papers? 

      There are no structures reported in these papers to compare with, only a clear flattening as stated.  

      (4) (Line 99) The authors chose a slightly asymmetric lipid membrane composition to capture some specific plasma-membrane features. However, they do not discuss which features are described by this particular composition, which doesn't include different acyl-chain unsaturations between leaflets. Further, they do not seem to comment on whether there is enrichment of certain lipid species coupled to curvature, or whether there is any "scrambling" occurring when the dome section and the planar membrane are stitched together in the preparation phase (Figure 8). 

      Enrichment of lipids in contact with the protein is addressed in the reference Buyan et al., 2020, based on Martini simulations with Piezo 1. We have a different focus, but still wanted to keep an asymmetric membrane as in essentially all previous simulation studies as now stated also on lines 439ff, to mimic the native Piezo membrane environment. There is no apparent “scrambling” in the setup of our membrane systems. We also did not explore any coupling between curvature and lipid composition, but will publish the simulation trajectories to enable such studies.  

      (5) (Caption of Figure 2). Please comment briefly in the text why the tensionless simulation required a longer simulation run (e.g. larger fluctuations?) 

      We added as explanation on line 500: “ … to explore the role of the long-range shape fluctuations in tensionless membranes for the relaxation into equilibrium”. The relaxation time of membrane shape fluctuations strongly increases with the wavelength, which is only limited by the simulation box size in the absence of tension. However, even for 8-microsecond trajectories, we do not observe complete equilibration and therefore decided to extrapolate the excess area and bending energy values obtained for different time intervals of the trajectories.

      (6) (Caption of Figure 3). Please clarify in the Methods how the atomistic simulations were initialized: were they taken from independent CG simulation snapshots? If not, the use of the adjective "independent" would be questionable given the very short atomistic simulation time length. 

      We now added that the production simulations started from the same structure. On lines 386, we now discuss the starting structure of the atomistic simulations in more detail.

      (7) (Line 202). The approach of discretizing the bilayer shape is reasonable, but no justification was provided for the 1-nm grid spacing. In my opinion, there should be a supporting figure showing how the bending energy varies with the grid spacing. 

      We now report also the effect of a 2-nm grid spacing on the results, see new text passages on page 18, and provide an explanation for the smaller 1-nm grid spacing on lines 587ff, where we write:

      “This lattice constant [a = 1 nm] is chosen to be smaller than the bin width of about 2nm used in determining the membrane shape of the simulation conformations, to take into account that the averaging of these membrane shapes can lead to a higher resolution compared to the 2 nm resolution of the individual membrane shapes.”
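
      The 2 nm binning mentioned in this passage can be illustrated as follows: headgroup heights of each leaflet are averaged in square bins of the x-y plane, and the midplane is taken as the mean of the two leaflet heights. This is a minimal sketch under that reading of the Methods; the function names, the box size, and the synthetic tilted leaflets are hypothetical.

```python
import numpy as np

def leaflet_height(xy, z, edges):
    """Mean headgroup height per square bin; NaN where a bin is empty."""
    counts, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=[edges, edges])
    zsum, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=[edges, edges], weights=z)
    with np.errstate(invalid="ignore", divide="ignore"):
        return zsum / counts

def midplane(upper_xy, upper_z, lower_xy, lower_z, box=20.0, bin_width=2.0):
    """Midplane height as the average of upper and lower leaflet heights."""
    edges = np.arange(0.0, box + bin_width / 2, bin_width)
    upper = leaflet_height(upper_xy, upper_z, edges)
    lower = leaflet_height(lower_xy, lower_z, edges)
    return 0.5 * (upper + lower), edges

# Synthetic headgroups on a regular 0.5 nm grid in a 20 nm box (illustration only).
g = np.arange(0.25, 20.0, 0.5)
gx, gy = np.meshgrid(g, g, indexing="ij")
xy = np.column_stack([gx.ravel(), gy.ravel()])
tilt = 0.1 * xy[:, 0]                      # both leaflets tilted along x
mid, edges = midplane(xy, 2.0 + tilt, xy, -2.0 + tilt)
centers = 0.5 * (edges[:-1] + edges[1:])   # bin centers along each axis

print(mid.shape)
```

      With real trajectories some bins under the protein contain few or no headgroups, which is why empty bins are returned as NaN here instead of being filled in.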

      (8) (Line 211). The choice by the authors to use a mixed lipid composition complicates the task of defining a reasonable bending modulus. Experimentally and in atomistic simulations, lipids with one saturated tail (like POPC or SOPC) are much stiffer when they are mixed with cholesterol (https://doi.org/10.1529/biophysj.105.067652, https://doi.org/10.1103/PhysRevE.80.021931, https://doi.org/10.1093/pnasnexus/pgad269). On the other hand, MARTINI seems to predict a slight *softening* for POPC mixed with cholesterol (https://doi.org/10.1038/s41467-023-43892-x). Further complicating this matter, mixtures of phospholipids with different preferred curvatures are predicted to be softer than pure bilayers (e.g. https://doi.org/10.1021/acs.jpcb.3c08117), but asymmetric bilayers are stiffer than symmetric ones in some circumstances (https://doi.org/10.1016/j.bpj.2019.11.3398). 

      This issue can be quite thorny: therefore, my recommendation would be to either: (a) directly compute k for their lipid composition, which is straightforward when using large CG bilayers (as was done in Fowler et al, 2016), but it would also require more advanced methods for the atomistic ones; (b) use a reasonable *experimental* value for k, based on a similar enough lipid composition. 

      We now justify in somewhat more detail why we use an asymmetric membrane, but agree that this complicates the bending energy estimates. We only aim to estimate the bending energy in the Martini 2.2 force field, because our elasticity model is based on and, thus, limited to results obtained with this force field. We have included the two further references using the Martini 2.2 force field suggested by the reviewer on line 213, and discuss now in more detail how the bending rigidity estimate enters and affects the modeling, see lines 226ff.

      (9) (Line 224). Does this closing statement imply that all experimental work from ex-vivo samples describe Piezo states under some small but measurable tension? 

      We compare here to the cryo-EM structure in detergent micelles. So, there is no membrane tension, there may be a surface tension of the micelle, but we assume here that Piezo proteins are essentially force free in detergent micelles. Membrane embedding, in contrast, leads to strong forces on Piezo proteins already in the absence of membrane tension, because of the membrane bending energy.

      (10) (Line 304). The Discussion concludes with a reasonable point, albeit on a down note: could the authors elaborate on what kind of experimental approach may be able to verify their modeling results? 

      Very good question, but this is somewhat beyond our expertise. We don’t have a clear recommendation – it is complicated. What can be verified is the flattening, i.e. the height and curvature of the nanodome in lower-resolution experiments. We see our results in line with these experiments, see Introduction. 

      (11) (Line 331). The very title of the Majumder and Straub paper addresses the problem of excessive binding strength between protein beads in the MARTINI force field, which should be mentioned. Figure 3(d) shows that the atomistic systems have larger excess areas than the CG ones. This could be related to MARTINI's "stickiness", or just statistical sampling. Characterizing the grid spacing (see point 7 above) might help illuminate this. 

      We discuss now the larger excess area values of the atomistic simulations on lines 381ff.  

      (12) (Lines 367, 375). Are the harmonic restraints absolute position restraints or additional bonds?

      Note also that the schedule at which the restraints are released (10-ns intervals) is relatively quick. Does the membrane have enough time to equilibrate the number of lipids in each leaflet? 

      These are standard, absolute position restraints. The 10-ns intervals may be too short to fully equilibrate the numbers of lipids, we have not explored this. The main point in the setup was to have a reasonable TM helix embedding with a smooth membrane, without any rupturing. This turned out to be tricky, with the procedures illustrated in Figure 8 as solution. If the membrane is smooth, the lipid numbers quickly equilibrate either in the final relaxation or in the initial nanoseconds of the production runs.

      (13) (Line 387) The use of an isotropic barostat for equilibration further impedes the system's ability to relax its structure. I feel that the authors should validate more strongly their protocol to rule out the possibility that incomplete equilibration could bias dynamics towards flatter membranes, which is one of the main results of this paper. 

      We don’t see how choices in the initial relaxation steps could have affected our results, at least for the coarse-grained simulations. There is more and more flattening throughout all simulation trajectories, see e.g. the extrapolations in Figure 4. All initial simulation structures are significantly less flattened than the final structures in the production runs.

      (14) (Line 403). What is the protocol for reducing the membrane size for atomistic simulation? This is even more important to mention than for CG simulations. 

      We just cut lipids beyond the intended box size of the atomistic simulations. As a technical point, we now have also added on line 507 how PIP2 lipids were converted.

      (15) (Line 423). The CHARMM force field requires a cut-off distance of 12 Å for van der Waals forces, with a force-based continuous switching scheme. The authors should briefly comment on this deviation and its possible impact on membrane properties. Quick test simulations of very small atomistic bilayers with the chosen composition could be used as a comparison. 

      We don’t expect any relevant effect on membrane properties within the statistical accuracies of the quantities of interest here (i.e. excess areas).

      (16) (Equation 4). There are some mismatched parentheses: please check. 

      We have corrected this mistake.

      (17) (Equations 7-8). Why did the authors use finite-differences derivatives of z(x,y) instead of using cubic splines and the corresponding analytical derivatives? 

      In our experience, second derivatives of standard cubic splines can be problematic. The continuous membrane shapes we obtain in our analysis are averages of such splines. We find standard finite differences more reliable, and therefore discretize these shapes. Already for the 2d membrane profiles of Figure 1b and 2d, calculating curvatures from interpolations using splines is problematic.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      This is a very insightful work showing how to disentangle one of the most complex transcriptional networks in yeast (S. cerevisiae) by combining single-cell dynamics, dynamical-systems modeling, Bayesian-style inference, and genetic perturbations. The authors tackle a problem that has eluded quantitative resolution for over two decades-how yeast regulates its seven primary glucose importer genes (HXT1-HXT7) in response to both steady and temporally changing extracellular [glucose]. Their integrated experimental-theoretical approach delivers the most satisfying mechanistic and quantitative explanation to date, and I enthusiastically recommend this manuscript for publication.

      Yeast relies on seven passive hexose transporters (Hxt1-Hxt7) to import glucose, its preferred sugar; deleting all seven abolishes growth on glucose. The underlying regulatory network is exceptionally intricate, reflecting yeast's evolutionary priority for glucose. Two membrane sensors-Snf3 (high affinity) and Rgt2 (low affinity)-detect extracellular glucose and thereby inactivate two co-repressors, Mth1 and Std1, which modulate the DNA-binding factor Rgt1. Concurrently, intracellular glucose activates the SNF1 kinase, phosphorylating and exporting the repressor Mig1, while Mth1/Std1 also govern the transcription and stability of Mig2, another DNA-binding repressor. Together, Rgt1, Mig1, and Mig2 integrate these inputs to control HXT promoter activity (Fig. 2A). Importantly, Mth1 and Std1 do not directly bind to DNA and this complication - the protein-protein interaction that one cannot get from DNA sequence - is just one source of difficulty that the authors overcame.

      To map the network's behavior, the authors used microfluidic "cages" housing single cells expressing GFP-tagged HXTs, monitoring fluorescence under three constant glucose levels-low (0.01%), medium (0.1%), and high (1%) (Fig. 1B-C). The authors confirm that steady-state Hxt abundances rank by transporter affinity. But the more important and surprising discovery is that when the cells were subjected to gradual glucose up-shifts and down-shifts, they discovered that some transporters transiently spike only when [glucose] rises and others only when [glucose] falls (Fig. 1C and Fig. S1F). This discovery establishes that the HXT network not only "senses" the absolute external [glucose] concentration but also the direction of the temporal change in external [glucose].

      To understand how the regulatory network yields such intricate temporal changes in HXT expression, the authors first focused on the medium-affinity transporter, Hxt4. Targeted knockouts of Mig1/Mig2 versus Mth1/Std1 confirmed that Hxt4 dynamics arise from differential repressor kinetics. To formalize these findings, the authors built an ODE model grounded in literature-based constraints (pg. 13 of the Supplement) with explicit separation of repressor timescales. They rigorously fit the model to wild-type and knockout time series-exploring parameter sensitivity in depth (Fig. S5).

      The authors discovered that their model and experiments converged on a push-pull mechanism: fast-acting Mig1/Mig2 dominate during glucose up-shifts, while slower Mth1/Std1 govern down-shifts, determining whether each HXT gene is repressed or de-repressed (i.e., "who gets there first"). Extending this analysis across all seven HXTs via approximate Bayesian computation revealed the most likely repressor-promoter interactions for each transporter, reducing a vast parameter space to unique or small sets of plausible regulatory schemes. The authors thus revealed what could be happening and which regulations are improbable - a more nuanced and comprehensive view than giving just one outcome for each HXT.
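The "who gets there first" logic can be sketched with a deliberately minimal toy model. Everything in it is an assumption made for illustration and is not taken from the manuscript's ODE model: one fast repressor activity that tracks glucose (standing in for Mig1/Mig2), one slow repressor activity that tracks the absence of glucose (standing in for Mth1/Std1), step changes in glucose rather than ramps, arbitrary time constants, and a promoter that is active only when both repressors are low.

```python
def simulate(glucose, fast0, slow0, tau_fast=0.1, tau_slow=2.0,
             t_end=10.0, dt=0.001):
    """Euler-integrate two repressor activities after a step to a constant
    glucose level (0 = none, 1 = high) and return the peak promoter activity.

    All names and parameter values are hypothetical.
    """
    fast, slow = fast0, slow0
    peak = 0.0
    for _ in range(int(t_end / dt)):
        # Fast repressor relaxes toward the glucose level.
        fast += dt * (glucose - fast) / tau_fast
        # Slow repressor relaxes toward (1 - glucose).
        slow += dt * ((1.0 - glucose) - slow) / tau_slow
        # Promoter is active only when both repressors are low.
        activity = (1.0 - fast) * (1.0 - slow)
        peak = max(peak, activity)
    return peak


# Up-shift: start from the no-glucose steady state (fast = 0, slow = 1).
peak_up = simulate(glucose=1.0, fast0=0.0, slow0=1.0)
# Down-shift: start from the high-glucose steady state (fast = 1, slow = 0).
peak_down = simulate(glucose=0.0, fast0=1.0, slow0=0.0)
```

In this toy, the fast repressor arrives before the slow one has left during an up-shift, so the promoter stays nearly silent; during a down-shift the fast repressor leaves before the slow one returns, which opens a transient window of expression. That asymmetry is the qualitative behavior the summary attributes to the separation of repressor timescales.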

      Overall, this work represents a role model - textbook-worthy - for quantitative systems biology. Beyond the rigor and novelty of its findings, the authors explain complex mathematical concepts with clarity, and the narrative flows logically from experiment to model to inference. This study provides a definitive mechanistic resolution of the HXT network and establishes a broadly applicable framework for dissecting dynamic and complex gene circuits.

      Major points:

      I don't recommend any new experiments or modeling; the major claims are already well supported by the data and models. Below are comments and questions intended to improve clarity and facilitate the reader's understanding. Please feel free to disregard any that you find not sensible or beyond the scope of the current work.

      1. Preconditioning (Fig. 1B-C): What medium were cells in immediately before t = 0? Were they in log-phase or stationary-phase growth just prior to the glucose addition?
      2. Transporter ranking in medium glucose: In the medium [glucose] regime, why is a low-affinity Hxt the second-most highly expressed, rather than the next-highest-affinity transporter? Could co-expression of multiple affinities (e.g., as a bet-hedging strategy) be advantageous? The Discussion section already mentions bet-hedging but I think you could further discuss ideas such as evolutionarily trained "Pavlovian" response or what the 2nd-ranking says about what the yeast anticipates as an upcoming change in the environment.
      3. Defining low/medium/high regimes: Low = 0.01%, Medium = 0.1%, and High = 1%. This is indeed in line with the standard classification of [glucose] in the literature regarding HXTs. But how might your results change at intermediate concentrations - those between these three levels? Using the model, could you comment on whether HXT expression dynamics "sharply" change as a function of either the [glucose]/time or the final concentration of [glucose] after the ramping-up phase?
      4. Rate-affinity trade-off (Lines 18-20): Give a brief explanation of the rate-affinity trade-off. Why does higher affinity necessarily entail a lower maximal transport rate (Vₘₐₓ) for passive transporters? Perhaps you can give an intuitive explanation backed by mass-action kinetics (e.g., to attain a higher affinity, the glucose-binding pocket on Hxt cannot be flipping rapidly back-and-forth between facing cytoplasm and extracellular space -- the binding pocket must allow sufficient time for molecule to find and bind it).
      5. Single-transporter expression (Lines 39-40): It's unclear to me why cells would express only the "optimal" Hxt and suppress all others. For instance, a bet-hedging strategy might favor simultaneous expression of multiple affinities. Consider revising these lines or adding a brief explanation. Related to above is a subtle point I think that was glossed over: there must be a fitness cost associated with making too many copies of Hxtn. After all, why not make as many transporters as possible? Is the cell operating near the upper limit of Hxt abundance, beyond which there's a fitness cost? Is there a pareto-optimal-type front in the space of expression level and another axis? I think this could go into the Discussion section.
      6. Hxt5 exception (Fig. 1B): Although Hxt5 follows a distinct regulatory scheme, it is most highly expressed at medium [glucose] (0.1%), consistent with its affinity like the other Hxts. I think you could mention this in lines 51-58.
      7. Glucose-ramp details (Fig. 1C; Lines 66-67): You state that [glucose] rises from 0 to 1 % over 15 min and reaches 1 % at t = 3 h. However, the actual ramp slope ([glucose]/time) and when the [glucose] starts to increase from zero aren't specified. The Hxt5-GFP behavior and differing Hxt6/7 levels at t = 0 vs. t = 20 h suggest the ramp may begin later than t = 0. Please clarify these details in the caption and main text, and consider adding a [glucose] vs. time schematic above the panel in Fig. 1C (like in Fig. 1B).
      8. Pre-t < 0 incubation (Fig. 1C): Related to point 1, how long were the cells incubated in pyruvate (or other medium) before t < 0? The Hxt6-GFP level at t = 20 h does not match that at t = 0; what is the timescale for Hxt6-GFP and Hxt7-GFP decay to steady state after glucose removal?
      9. Hxt-GFP localization: Does the reported Hxt#-GFP level include fluorescence from both the plasma membrane and internal compartments (e.g., vacuole)? Clarifying which pools of fluorescence are quantified would help interpretation, even if the main conclusions are unchanged.
      10. "Predominantly transcriptional" wording (Lines 90-92): The phrase "...the regulation is predominantly transcriptional" should specify that it refers to the induction of HXT4 transcription during glucose down-ramping, rather than the subsequent decrease in Hxt4-GFP. The experiments do not rule out post-translational regulation (e.g., endocytosis) once glucose levels fall below a threshold.
      11. Glucose "protection" of Hxt4 (Lines 121-122): The statement "we allowed glucose to protect Hxt4 from degradation" is unclear. First, Hxt4-GFP likely degrades at a different rate than free GFP-you could estimate its half-life from Fig. S3. Second, please explain precisely what "protection" means in the model or experiment.
      12. Quantifying repressor kinetics (Lines 158-162): The push-pull mechanism is compelling, but it would be helpful to report the quantitative separation of timescales-e.g., how much faster do Mig1/Mig2 respond compared to Mth1/Std1? Including fold-difference would strengthen this explanation.
      13. Mechanism of repressor regulation (Lines 197-213): Be clearer about whether and how changes in extracellular glucose alter the expression levels of Mth1, Std1, Mig1, and Mig2, as opposed to modulating say, how Mth1 and Std1 bind to Rgt2 protein. I think you could be clearer here about which regulatory steps (transcriptional, post-translational, or binding-affinity changes) are assumed in the model and supported by the data.

      Minor points:

      1. Abstract: Original: "...how an HXT for a medium-affinity transporter can be made to respond like the HXTs for the other transporters." Suggestion: "...how the gene-expression regulation of a medium-affinity HXT can be rewired to respond like that of any other HXT." (You might also generalize beyond "medium-affinity" if the converse holds.)
      2. Lines 64-66: Please emphasize that the "synthetic complete medium" used for pre-conditioning contains no glucose.
      3. Line 143: The phrase "low expression of the std1\Delta strain in glucose" is ambiguous-low expression of which gene or reporter? Please specify.
      4. Line 240: Change "should weakened" to "should weaken."
      5. Fig. S9 caption (typo) Change "Rtg1 sites are..." to "Rgt1 sites are...."

      Hyun Youk.

      Referee cross-commenting

      I agree with the other reviewers' comments. The other reviewers noticed important points I have missed. But like them, I'm still supportive of the work being published with < 1 month spent on revision. I still don't recommend any further experiments or modeling.

      Significance

      This is a very insightful work showing how to disentangle one of the most complex transcriptional networks in yeast (S. cerevisiae) by combining single-cell dynamics, dynamical-systems modeling, Bayesian-style inference, and genetic perturbations. The authors tackle a problem that has eluded quantitative resolution for over two decades-how yeast regulates its seven primary glucose importer genes (HXT1-HXT7) in response to both steady and temporally changing extracellular [glucose]. Their integrated experimental-theoretical approach delivers the most satisfying mechanistic and quantitative explanation to date, and I enthusiastically recommend this manuscript for publication via Review Commons.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2025-03083

      Corresponding author(s): David Fay

      General Statements

      We greatly appreciate the input of the four reviewers, all of whom carried out a careful reading of our manuscript, provided useful suggestions for improvements, and were enthusiastic about the study including its thoroughness and utility to the field. Because the reviewers required no additional experiments, we were able to address their comments in writing.

      However, in response to a comment from reviewer #4 we decided to add an additional new biological finding to our study given that our functional validation of proximity labeling targets was not extensive. Namely, we now show that a missense mutation affecting BCC-1, one of the top NEKL-MLT interactors identified by our proximity labeling screen, is a causative mutation (together with catp-1) in a strain isolated through a forward genetic screen for suppressors of nekl molting defects (new Fig 9C). This finding, combined with our genetic enhancer tests, further strengthens the functional relevance of proteins identified through our proximity labeling approach and highlights the synergy of proteomics combined with classical genetics.

      Positive statements from reviewers include: Reviewer #1: Overall, this is an outstanding study that will be of great interest to those interested in using proximity labeling to identify interactors of their favorite protein. The experiments are well executed and the data presented in a mostly clear manner.

      Reviewer #2: The key conclusions are convincing, and the work is rigorous. The work provides a clear roadmap to reproducing the data. The experiments are adequately replicated, and statistical analysis is adequate... In many papers, TurboID seems very trivial but this paper clearly highlights the limitations and will be an invaluable resource for labs that want to get proximity labeling established in their labs.

      Reviewer #3: Overall, the claims are solid and conclusions supported. The data and methods are substantial to enable reproducibility in other labs. The experiments have been repeated multiple times with particular attention to statistical analysis. ...This manuscript represents a methodological advance that will likely become an oft-cited reference for members of the C. elegans community and a springboard for other basic biomedical scientists wanting to adapt rigorous proximity labeling techniques to their system.

      Reviewer #4: Fay et al. present a solid, clear and comprehensive BioID-based proteomics study that takes into account and discusses decisive aspects for the (re)production and analysis of high-quality TurboID-based mass spectrometry data. Claims and conclusions are generally well and sufficiently supported by the presented data and illustrated with figures (throughout the text as well as with plenty of supplementary data)... Basic consideration and thoughts for the experimental design and MS data analysis are given in detail and can serve as another guideline for future studies.

      Based on these reviews and comments, we believe that our manuscript is suitable for publication in a high-impact journal.

      1. Point-by-point description of the revisions

      *Reviewer #1 (Evidence, reproducibility and clarity (Required)): *

      *Proximity labeling has become a powerful tool for defining protein interaction networks and has been utilized in a growing number of multicellular model systems. However, while such an approach can efficiently generate a list of potential interactors, knowledge of the most appropriate controls and standardized metrics to judge the quality of the data are lacking. The study by Fay systematically investigates these questions using the C. elegans NIMA kinase family members NEKL-2 and NEKL-3 and their known binding partners MLT-2, MLT-3 and MLT-4. The authors perform eight TurboID experiments each with multiple NEKL and MLT proteins and explore general metrics for assessing experimental outcomes as well as how each of the individual metrics correlates with one another. They also compare technical and biological replicates, explore strategies for identifying false positives and investigate a number of variations in the experimental approach, such as the use of N- versus C-terminal tags, depletion of endogenous biotinylated proteins, combining auxin-inducible degradation, and the use of gene ontology analysis to identify physiological interactors. Finally, the authors validate their findings by demonstrating that a number of the candidates identified functionally interact with NEKL-2 or components of the WASH complex. *

      Overall this is an outstanding study that will be of great interest to those interested in using proximity labeling to identify interactors of their favorite protein. The experiments are well executed and the data presented in a mostly clear manner. I really like this study (particularly because I plan to do a proximity labeling study of my own), but I did come away less than impressed with some of the analysis. This is a data-dense manuscript, and it appears to me that the authors tried to cover so much ground that in some cases very little insight was provided. For instance, the authors promote the use of data independent acquisition (DIA) as compared to the more commonly used data dependent acquisition (DDA). However the authors do not provide any analysis to indicate one approach is better than the other. Likewise the combined use of auxin-induced degradation and proximity labeling is explored but there is very little to take away from these experiments. Despite these issues, I am very enthusiastic about the study as a whole. Below I list major and minor concerns.

      Major concerns * 1. My biggest issue with the manuscript is that a lot is made of the use of data independent acquisition (DIA) as compared to the more commonly used data dependent acquisition (DDA). The authors perform experiments using DIA and DDA approaches but do not directly compare the outcomes. As a result there is really no way to know if one approach is better than the other. I would suggest the authors either perform the necessary analysis to compare the two approaches or tone down their promotion of DIA.* We agree and have scaled back any statements comparing DDA to DIA as our manuscript did not address this directly. We also now point out this caveat in our closing thoughts section, while referencing other studies that compared the two (lines 926-929). Our main point was to convey that DIA worked well for our proximity labeling studies but has seen little use by the model organism field. Surprisingly (to us), DIA was also considerably less expensive than DDA options.

      2. Line 75, The authors promote the use of data-independent acquisition (DIA) without defining what this approach is and how it differs from the more conventional data-dependent acquisition. As a non-mass spectroscopist, I found myself with lots of questions concerning DIA, what it is and how it differs from DDA. I think it would really be helpful to expand the description of DIA and its comparison with DDA in the introduction. As non-mass-spectroscopists ourselves, we understand the reviewer's point. Because the paper is quite long, we were trying to avoid non-essential information. We have now added some information to explain some of the key differences between DDA and DIA. We have also included references for readers who may want to learn more. (lines 77-80)

      Minor concerns: * Line 92 typo. I believe the authors meant to say NEKL-2-MLT-2-MLT-4. * Corrected. (line 95)

      Line 169. Is exogenous the correct word to use here? It suggests that you are talking about non-worm proteins, but I know you are not. Corrected. Changed to "Moreover, the detection of biotinylated proteins may be difficult if the bait-TurboID fusion is expressed at low levels..." (line 181).

      Line 177 typo (D) should be (C). Corrected. (line 1122)

      Figure 1C: Lucky Charms may sue you for infringement of their trademarked marshmallow treats. Thank you for picking up on this. The authors accept full responsibility for any resulting lawsuits.

      Figure 1D. The NEKL-2::TurboID band is indicated with a green triangle in the figure but the figure legend states that green triangles indicate mNG::TurboID control. I know this triangle is a shade off the triangle that indicates mNG::TurboID but it's really hard to see the difference. All of the differently colored triangles in panel F are unnecessary. I would either just pick one color for all non-control bait proteins or better yet, only use a triangle to point to bands that are not obvious. For instance I don't need the triangles that point to NEKL-2 -3 and -4 fusion proteins. These are just distracting. We understand the reviewer's point. We colored the triangles to match the colors used for the proteins in the figures. We have now added "bright green triangles with white outlines" (Fig 1 legend) to indicate the Pdpy-7::mNG::TurboID control and changed triangles in the corresponding figures. Although we would be fine with removing or changing the triangles, we think that they may aid somewhat with clarity.

      Line 316: Conceivably, another factor that could contribute to the counterintuitive upregulation of some proteins in the N2 samples is related to the fusion proteins that are being expressed in the TurboID lines. A partially functional bait protein (one with a level of activity similar to nekl-2(fd81) that may not result in an obvious phenotype) could directly or indirectly affect gene expression leading to lower levels of a subset of proteins in the TurboID samples. The same could be said for fusion proteins with a gain-of-function effect. This is an interesting idea, and we tested this possibility by looking for consistent overlap between N2-up proteins between biological replicates of individual bait proteins. We now include a representative Venn diagram in S3C Fig to highlight this comparison. In summary, although we cannot rule out this possibility, our analysis did not support the widespread occurrence of this effect in our study. We also made certain that our statement regarding N2 up proteins was not too definitive. (lines 285-288)

      *Fig 3 B-E. I am a little confused how the data in these graphs is normalized. For instance, I would have expected that for NEKL-3 in panel B, that the normalized (log2) intensity value in N2 be set at 0 as it is for NEKL-2. Maybe I just don't have enough information on how these plots were generated. * The difference is that in the N2 sample, NEKL-3 was detected but NEKL-2 was not. The numbers themselves are assigned by the Spectronaut software used to quantify the DIA results but are not meaningful beyond indicating relative amounts (intensity values) of a given protein within an individual biological experiment. We've added some lines to the figure legend to make this clearer. (lines 1165-1169)

      *Figure 6C legend is not correct. * Corrected. (line 1214)

      Line 575: Figure reference should be Fig. S5G. The authors should check to make sure all references to supplemental figures include correct panel information. Corrected. (line 464) In addition, we have now gone through the manuscript and added panel numbers references where applicable. Note that the addition of a new supplemental file has shifted the numbering.

      Line 576. The authors reference a study by Artan and colleagues and report a weak correlation between their study and that of Artan. They reference figure S4 but it should be Fig S5H. Apologies and many thanks to the reviewer for catching these errors. (line 464)

      Line 652. The authors note that numerous proteins were present at substantially reduced levels in the mNG::TurboID samples and suggest that sticky proteins may have been outcompeted or otherwise excluded from beads incubated with the mNG::TurboID lysates. Why would sticky proteins only be a problem in these samples? The reasoning is not clear to me. The idea was that in the sample with very high levels of biotinylated proteins (mNG::TurboID), the surface of the beads might become saturated with high-affinity biotinylated proteins. This could prevent or outcompete the binding of random proteins that are not biotinylated but nevertheless have some affinity to the beads ("sticky" proteins). We have reworded this section to make this clearer. (lines 546-550)

      Line 745: The term "bait overlaps" is a bit vague. Ultimately, I figured out what it meant but it was not immediately obvious. We have changed this to "overlap between baits" and made this section clearer. (lines 624-628)

      *S7B Fig. Why is actin missing from the eluate? * In S7B we refer to the purified eluate as the "eluate", which may have caused some confusion. In other sections of the manuscript, we refer to the bead-bound proteins as the "purified eluate" (Figs 1 and 5). For the purified eluate a portion of the streptavidin beads are boiled in sample buffer to elute the bound proteins before running a western. Actin would not be expected in these samples because it's (presumably) not biotinylated in our samples and doesn't detectably bind the beads. This result was seen in all relevant westerns in S1 Data. For consistency, however, we've gone through all our files to make sure we consistently use the term "purified eluate" versus "eluate", which is less specific.

      *Line 873: The authors state the extent of overlap in GO terms between the various experiments and provide percentages. I tried to extract this information from Figure 8C and came up with different values. For instance, in the case of Molecular Function, they state that they observed a 54% overlap between NEKL-2 and NEKL-3 but in the Venn diagram in Figure 8C I see that the NEKL-2 and NEKL-3 experiments had 71 (25+46) GO terms in common. Out of 98 GO terms for NEKL-2 or 104 for NEKL-3 the percentage I got is closer to 72. Am I analyzing this correctly? * Thanks for checking this. We believe our method for calculating the percent overlap is correct. In the case of NEKL-2/NEKL-3 overlap for Molecular Function, there are 131 total unique terms, of which 71 overlap, giving a 54% overlap. In the case of NEKL-2/NEKL-3 overlap for Biological Process, however, we made an error in arithmetic (415 unique, 239 overlap), such that the correct percentage is 58%, which we have corrected in the text.
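The two percentages in this exchange come from different denominators, which a short calculation makes explicit. The sketch below uses only the counts quoted above (98 and 104 Molecular Function terms, 71 shared; 415 unique and 239 shared Biological Process terms); the function name `overlap_percentages` and the dictionary keys are hypothetical, chosen for the example.

```python
def overlap_percentages(n_a, n_b, n_shared):
    """Express a shared-term count against three possible denominators."""
    n_union = n_a + n_b - n_shared  # total unique terms across both sets
    return {
        "union_size": n_union,
        "of_union": 100 * n_shared / n_union,  # denominator used in the reply
        "of_a": 100 * n_shared / n_a,          # denominator used by the reviewer
        "of_b": 100 * n_shared / n_b,
    }


# Molecular Function: NEKL-2 (98 terms), NEKL-3 (104 terms), 71 shared.
mf = overlap_percentages(98, 104, 71)

# Biological Process: only the union (415) and the overlap (239) are quoted.
bp_of_union = 100 * 239 / 415
```

With these counts, `mf["union_size"]` is 131 and `mf["of_union"]` rounds to 54%, while `mf["of_a"]` rounds to 72%, the figure the reviewer obtained. Both are defensible measures, so stating the denominator alongside the percentage avoids the ambiguity.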

      *Reviewer #1 (Significance (Required)): *

      *Overall this is an outstanding study that will be of great interest to those interested in using proximity labeling to identify interactors of their favorite protein. The experiments are well executed and the data presented in a mostly clear manner. I really like this study (particularly because I plan to do a proximity labeling study of my own), but I did come away less than impressed with some of the analysis. This is a data-dense manuscript, and it appears to me that the authors tried to cover so much ground that in some cases very little insight was provided. For instance, the authors promote the use of data independent acquisition (DIA) as compared to the more commonly used data dependent acquisition (DDA). However the authors do not provide any analysis to indicate one approach is better than the other. Likewise the combined use of auxin-induced degradation and proximity labeling is explored but there is very little to take away from these experiments. Despite these issues, I am very enthusiastic about the study as a whole. *

      *Reviewer #2 (Evidence, reproducibility and clarity (Required)): *

      *This study expanded the use of data-independent acquisition-mass spectrometry (DIA-MS) in TurboID proximity-labeling proteomics to identify novel interactors of NEKL-2, NEKL-3, MLT-2, MLT-3, and MLT-4 complexes in C. elegans. The authors described several useful metrics to evaluate the quality of TurboID experiments, such as using the percentage of upregulated genes, the percentage of proteins present only in bait-TurboID experiments as compared to N2 controls, and the percentage of endogenously biotinylated carboxylases as internal controls. Further, the authors introduced methodological variability across 23 TurboID experiments and evaluated any improvement to the resulting data, such as N-terminally tagging bait proteins with TurboID, depleting endogenous carboxylases, and auxin-inducible degradation of known complex members. Finally, this study identified the kinase folding chaperone CDC-37 and the WASH complex component DDL-2 as novel interactors with the NEKL-MLT complexes through an RNAi-based enhancer approach following their identification by TurboID. *

      Major comments: * The key conclusions are convincing, and the work is rigorous. The work provides a clear roadmap to reproducing the data. The experiments are adequately replicated, and statistical analysis is adequate. We only have minor comments.*

      Minor comments: * •In the western blot in Fig 1 why does the mNG::Turbo have two bands? * Thank you for pointing this out. To our knowledge this is a breakdown product that was especially prevalent in replicate 3 (also see S1 Data), which we chose to show because all the NEKL-MLTs were clearly visible in this western. The expected size of the mNeonGreen::TurboID (including linker and tags) is ~68 kDa and our blots are roughly consistent with those of Artan et al. (2021). This lower band was not evident in Exp 8. We have now included a statement in the figure legend to indicate that the upper band is the full-length protein whereas the lower band is likely to be a breakdown product (lines 1141-1142).

      •Fig 2B is difficult to parse as a reader. Columns labeled "Upreg," "Downreg," "TurboID only," "N2 only," "Filter-1," "Filter-2," and "Epi %" could be moved to Supplemental. Fold change vs N2 could be represented as a bar chart, allowing for trends between fold change and the metrics Upreg %, Turbo %, and Carboxylase % to be seen more clearly. Further, rows headed "Carboxylase depletion," "DDA," and "Auxin treated" could be presented as separate panels to better match the distinct points made in the text. After serious consideration we have made several changes including the addition of S2 Fig, which may provide readers with a better visual representation of the bait and prey fold changes observed in all our experiments. However, we feel that the detailed data embedded in Fig 2 is the most concise and accurate means by which to convey our full results and is key to our methodological conclusions. As such we did not want to relegate this information to a supplemental table. We note that this figure was not found to be problematic by other reviewers, although we do understand the points made by this reviewer.

      •Line 179: in vivo should be italicized Because journals differ in their stylistic practices, we are currently waiting before doing our final formatting. We did keep our use of Latin phrases consistently non-italicized in the draft.

      •Lines 215-217: The comparison between Western blot expression levels and prior fluorescent reporter levels is unclear. Could be reformatted to make it clearer that relative expression of the different NEKL-MLTs in this study is consistent with prior data. We reformatted this sentence to improve clarity. (lines 205-207)

      *•Lines 267-268: The final line of the passage is unclear and can be removed. * This sentence has been removed.

      •Lines 311-313: This study is able to use the recovery of bait and known interactor proteins as internal controls to determine the quality of each experiment, but this may not always be the case for other users' experiments. The authors should comment on how Upreg %, a value influenced by many factors, can actually be used as a quality check when a bait protein has no known interactors. We have added language to highlight this point. (lines 344-348)

      *•Line 702: There is a [new REF] that should be removed * As described above, we have now included this finding on bcc-1 as part of this manuscript (Fig 9C).

      •The approach used mixed stage animals, but some genes oscillate or are transiently expressed. Please discuss cost-benefit of mixed stage vs syncing. This is an important point. We have added a discussion on the benefits and drawbacks of using mixed stages to the discussion. (lines 901-911)

      *•Authors were working on hypodermally expressed proteins. It would be valuable to discuss what tissues are amenable to TurboID. Ie are the cases where there are few cells (anchor cell, glial sockets, etc) that it will be extremely challenging to perform this technique * We agree that certain tissues/proteins will not be amenable to proximity labeling. We believe that we have addressed this point together with the above comment throughout the manuscript and now on lines 936-940.

      •Authors mention approaches such as nanobodies, split Turbo. Based on their experiences it would be valuable to add Discussion on strengths and weaknesses of these approaches to guide folks considering TurboID and DIA-MS experiments in C. elegans Because we have not tested these methods, we feel that we cannot provide a great deal of insight into these alternate approaches. We mention and reference these methods in the introduction so that readers are aware of them.

      *Reviewer #2 (Significance (Required)): *

      •Advance in technique: This study expands the use cases of data-independent acquisition MS method (DIA-MS) in C. elegans, which fragments all ions independent of the initial MS1 data. The benefits of this approach include better reproducibility across technical replicates and better recovery of low abundance peptides, which are critical for advancing our ability to capture weak and transient interactions.

      •The use of DIA-MS in this study has improved our understanding of the partners of these NEKL-MLTs in membrane trafficking, molting, and cell adhesion within the epidermis.

      •In many papers, TurboID seems very trivial but this paper clearly highlights the limitations and will be an invaluable resource for labs that want to get proximity labeling established in their labs.

      *Reviewer #3 (Evidence, reproducibility and clarity (Required)): *

      *Summary: *

      Fay and colleagues perform a series of proximity labeling experiments in C. elegans followed by thorough and rational analysis of the resulting biotinylated proteins identified by LC-MS/MS. The overall goals of the study are to evaluate different techniques and provide practical guidance on how to achieve success. The major takeaways are that integration of data-independent acquisition (DIA) along with comparison of endogenously tagged TurboID alleles to soluble TurboID expressed in the same tissue results in improved detection of bona-fide interactors and reduced numbers of false-positives.

      *Major comments: *

      Overall the claims are solid and conclusions supported. The data and methods are substantial to enable reproducibility in other labs. The experiments have been repeated multiple times with particular attention to statistical analysis. I have no major concerns with the manuscript and focus primarily on improving the accessibility of this important contribution to the scientific community. As such, I suggest that the authors:

      1) Provide more explanation of and rationale for using DIA. This is not yet a standard technique and most basic biomedical scientists will be unaware of the jargon. As I expect many labs in the C. elegans community and beyond will be interested in the guidance provided in this manuscript, the introduction offers a great opportunity to bring the reader up to speed, as opposed to sending them to the complicated proteomics analysis literature. We have added some additional context (lines 77-80) as well as new references. We note that getting into the technical differences between DIA and DDA, beyond what we briefly mention, would take a substantial amount of space, may not be of interest to many readers, and can be found through standard internet and (sigh) AI-based searches.

      *2) Provide a better overview of the various protocols tested (Experiments 1-8). Maybe at the beginning of the results, and maybe with an accompanying schematic. As currently written, it is difficult to figure out details regarding how the experiments vary and why. * We have now added a short paragraph to better inform the reader at the front end regarding the major experiments. (lines 139-146).

      3) As to be expected, expression of TurboID tags at endogenous levels via low abundance proteins in a complex multicellular system results in somewhat weak signals that flirt with the limit of detection. Perhaps by combining tagged alleles within the same complex (NEKL-3/MLT-3 or NEKL-2/MLT-2/MLT-4) the signals could be boosted? Tandem tags, either on one end or multiple ends of proteins might help as well. As the authors point out, a benefit of tagging the two NEKL-MLT complexes is that there are strong loss-of-function phenotypes (lethal molting defects) to help evaluate whether a tagging strategy results in a non-functional complex. THESE EXPERIMENTS ARE OPTIONAL and might simply be discussed at the authors discretion. These are interesting ideas that we have now incorporated into our discussion. (lines 936-940)

      *Minor Comments: *

      *1) Figure 3A is cropped on the right. * Thank you for catching this. Corrected.

      *2) Better define [new REF] on line 702. * We have added new results (Fig 9C), obviating the need for this reference.

      ***Referee cross-comments** *

      Overall, I am in agreement with, and supportive of, the other reviewers' comments.

      *Reviewer #3 (Significance (Required)): *

      *Significance: *

      Proximity labeling is often proposed as a technique to determine interaction networks of proteins in vivo, but in practice it remains challenging for most labs to execute a successful experiment, especially within the context of multicellular model organisms. Fay and colleagues provide a much needed roadmap for how to best approach proximity labeling experiments in C. elegans that will likely apply to other model systems.

      They establish a rigorous approach by choosing to endogenously tag components of two essential NEKL-MLT complexes required for C. elegans molting. These complexes are relatively low abundance as they are only expressed in a single cell type, the hyp7 epidermal syncytium. In addition, as inactivation of any member of the complexes results in molting defects, they have a powerful selection for functional tags. Thus, they have set a high bar for themselves in order to discern whether a given variation on the experimental approach results in improved detection of interactors and fewer false positives.

      *Potential areas for improvement include lowering the expression level of the skin-specific soluble TurboID used to determine non-specific biotinylation events. This control results in much higher levels of biotinylation compared to the TurboID-tagged NEKL-MLT alleles and likely affects their analysis, which they openly admit. In addition, to reduce the high level of background biotinylation signals generated by endogenous carboxylases, they adopt a depletion strategy pioneered by other researchers but this does not offer major improvements in detection of specific signals. The source of these conflicting results remains to be determined. It is also curious that auxin-inducible degradation of components of the NEKL-MLT complexes did not robustly alter the resulting biotinylating capacity of other members. This approach should be evaluated in subsequent studies. Finally, as mentioned in Major Comment #3 (above), it would be interesting to see if combining TurboID tags within the same complex might improve signal-to-background ratios. *

      This manuscript represents a methodological advance that will likely become an oft-cited reference for members of the C. elegans community and a springboard for other basic biomedical scientists wanting to adapt rigorous proximity labeling techniques to their system. I am a cell biologist that uses a variety of genetic, molecular and biochemical approaches, mostly centered around C. elegans. I have used LC/MS-MS in our studies but have relatively little expertise in evaluating all aspects of proteomic pipelines.

      *Reviewer #4 (Evidence, reproducibility and clarity (Required)): *

      *Fay et al. describe an extensive proximity labeling BioID study in C. elegans with TurboID and DIA-LCMS analysis. They chose the NEKL-2/3 kinases and their known interactors MLT-2/3/4 as TurboID-fused bait proteins (C- and partially N-terminal fusions encoded from CRISPR-mediated genome edited genes). With eight biological replicates (and three to four technical replicates each) and with the unmodified wildtype or mNeonGreen-TurboID expressing worms as controls, a comprehensive dataset was generated. Although starting from quite different abundances of the bait-fusions within the cell lysates all bait proteins and known complex-binding partners were convincingly enriched with capturing streptavidin beads after only one hour of incubation with the lysate. This confirms the general applicability of TurboID-BioID approach in C. elegans. The BioID method typically gives rise to large proteomics datasets (up to more than thousand proteins identified after biotin capture) with several tens to hundreds enriched proteins (against negative control strains) as potential proteins that localize proximal to the bait-TurboID protein. However, substantial variations of candidates between biological replicates are frequently observed in BioID experiments. The authors scrutinized their dataset towards indicative metrics, filters and cutoffs in order to separate high-confidence from low-confidence candidates. With the workflow applied the authors melt down the number of candidates to 15 proteins that were grouped in four functional groups reasonably associated to NEKL-MLT function. *

      Successful BioID experiments depend on reliable enrichment quantification with mass spectrometry using control cell lines that require a carefully bait-tailored design. Those must adequately express TurboID controls matching the abundance of the bait-TurboID fusion protein and its biotinylation activity. After affinity capture, sample preparation and LCMS data acquisition there is no silver bullet towards the identification of true bait neighbors. Fay et al. elaborately describe their considerations and workflow towards high-confidence candidates. The workflow considered (i) data analysis with Volcano plots to account for statistical reproducibility of biological replicates against negative controls, (ii) fraction of proteins only detected in the positive or negative controls thus evading the fold-enrichment quantification approach, (iii) evaluation of variations in carboxylase enrichment as a measure for variations in the general biotin capture quality between experiments, (iv) an assessment of technical reproducibility with scatter plots and Venn diagrams, (v) exclusion of potentially false positives, e.g. promiscuously biotinylated non-proximal proteins, through comparisons with control worms expressing a non-localized mNeonGreen-TurboID fusion protein, (vi) batch effects, (vii) the impact of endogenous biotinylated carboxylases through depletion, (viii) gene ontology analysis of enriched proteins, (ix) weighing data according to the quality of individual experiments according to the aforementioned metrics, and finally (x) genetic interaction studies to functionally associate high-confidence candidates with the bait.

      *Major comments: *

      Fay et al. present a solid, clear and comprehensive BioID-based proteomics study that takes into account and discusses decisive aspects for the (re)production and analysis of high-quality TurboID-based mass spectrometry data. Claims and conclusions are generally well and sufficiently supported by the presented data and illustrated with figures (throughout the text as well as with plenty of supplementary data). However, although the authors claim to seek for substrates of the kinase complex they drew no further attention to the phosphorylation status of the captured proteins. Haven't the MS data been analyzed in this respect? Information regarding this issue would enhance the manuscript. Data generation and method description appear reproducible for readers. Also, the statistical analyses appear adequate. The authors should also consider to deposit their MS raw and analysis data in a public repository (e.g. PRIDE) for future reviewing processes and as reference data for readers and followers. Our raw MS data have been deposited by the Arkansas Proteomics Facility. I have followed up to ensure that they are publicly available.

      *Minor comments: *

      The authors should combine supplementary data files to reduce the number of single files readers have to deal with. We have combined these files as suggested.

      The authors should avoid the term "upregulation" or "increased biotinylation" when capture enrichment is meant. We agree with the reviewer's point. We now use the terms enriched versus reduced or up versus down, depending on the context, and clearly define these terms. These changes have been incorporated throughout the manuscript.

      *Reviewer #4 (Significance (Required)): *

      The manuscript presents a robust BioID proteomics screening for co-localizing proteins of NEKL-2/3 kinases and their known interactors MLT-2/3/4. The ongoing validation of their functional interactions and whether the protein candidates reflect phosphorylation substrates or else remains elusive and is announced for upcoming manuscripts. The knowledge gain in terms of molecular mechanisms with NEKL-2/3 MLT-2/3/4 involvement in C. elegans is therefore limited to a table of - promising - interacting candidates that have to be studied further. Information about the phosphorylation status of the captured proteins from the MS data are not given. However, knowing the protein candidates will be of interest for groups working with these complexes (or the identified potentially interacting proteins) either in C. elegans or any other organism. Also, in-depth proteomics screenings with novel approaches such as BioID have to be established for individual organisms. For C. elegans there is only one prior BioID publication (Holzer et al. 2022). Many of the aspects discussed here have also been addressed earlier for BioIDs in other organisms and are not principally new. However, the presented study can be of conceptual interest for labs delving into or entangled with the BioID method in C. elegans or other organisms. The study addresses especially proteomics groups working on protein-protein interactions using proximity labeling/MS approaches. Basic consideration and thoughts for the experimental design and MS data analysis are given in detail and can serve as another guideline for future studies.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review)

      As this code was developed for use with a 4096 electrode array, it is important to be aware of double-counting neurons across the many electrodes. I understand that there are ways within the code to ensure that this does not happen, but care must be taken in two key areas. Firstly, action potentials traveling down axons will exhibit a triphasic waveform that is different from the biphasic waveform that appears near the cell body, but these two signals will still be from the same neuron (for example, see Litke et al., 2004 "What does the eye tell the brain: Development of a System for the Large-Scale Recording of Retinal Output Activity"; figure 14). I did not see anything that would directly address this situation, so it might be something for you to consider in updated versions of the code.

      Thank you for this comment. We have added a routine to SpikeMAP to remove highly correlated spikes detected within a given spatial radius of each other. The following was added to the main text (line 149):

      “As an additional verification step, SpikeMAP allows the computation of spike-count correlations between putative neurons located within a user-defined radius. Signals that exceed a defined threshold of correlation can be rejected as they likely reflect the same underlying cell.”
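
      As a minimal sketch of this verification step (not the SpikeMAP routine itself), the check could look as follows. The function names, the binned spike counts, the radius, the correlation threshold, and the rule of rejecting the unit with fewer spikes are all illustrative assumptions.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length count vectors (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def flag_duplicates(positions, counts, radius, max_corr):
    """Return sorted indices of units to reject as likely duplicates.

    positions: (x, y) per putative neuron (hypothetical units, e.g. micrometres)
    counts:    binned spike counts per putative neuron
    For each pair closer than `radius` whose spike-count correlation exceeds
    `max_corr`, the unit with fewer total spikes is flagged (an assumed rule).
    """
    rejected = set()
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if i in rejected or j in rejected:
                continue
            if math.dist(positions[i], positions[j]) > radius:
                continue
            if pearson(counts[i], counts[j]) > max_corr:
                rejected.add(i if sum(counts[i]) < sum(counts[j]) else j)
    return sorted(rejected)

# Synthetic example: units 0 and 1 are close and nearly identical; unit 2 is far away.
positions = [(0, 0), (42, 0), (500, 500)]
counts = [
    [3, 0, 5, 1, 4, 0],
    [3, 0, 4, 1, 4, 0],
    [3, 0, 5, 1, 4, 0],
]
duplicates = flag_duplicates(positions, counts, radius=100, max_corr=0.9)
```

      In this synthetic example only unit 1 is flagged: unit 2 has the same counts as unit 0 but lies outside the radius, so it is kept.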

      Secondly, spike shapes are known to change when firing rates are high, like in bursting neurons (Harris, K.D., Hirase, H., Leinekugel, X., Henze, D.A. & Buzsáki, G. Temporal interaction between single spikes and complex spike bursts in hippocampal pyramidal cells. Neuron 32, 141-149 (2001)). I did not see this addressed in the present version of the manuscript.

      We have added a routine to SpikeMAP that computes population spike rates to verify stationarity over time. We have also added a routine to identify putative bursting neurons through a Hartigan statistical dip test applied to the inter-spike distribution of individual cells.

      We added the following (line 204):

      “Further, SpikeMAP contains a routine to perform a Hartigan statistical dip test on the inter-spike distribution of individual cells to detect putative bursting neurons.”

      Another area for possible improvement would be to build on the excellent validation experiments you have already conducted with parvalbumin interneurons. Although it would take more work, similar experiments could be conducted for somatostatin and vasoactive intestinal peptide neurons against a background of excitatory neurons. These may have different spike profiles, but your success in distinguishing them can only be known if you validate against ground truth, like you did for the PV interneurons.

      We have added the following (line 326):

      “future work could include different inhibitory interneurons such as somatostatin (SOM) and vasoactive intestinal polypeptide (VIP) neurons to improve the classification of inhibitory cell types. Another avenue could involve applying SpikeMAP on artificially generated spike data (Buccino & Einevoll 2021; Laquitaine et al., 2024).”

      Reviewer #2 (Public review)

      Summary:

      While I find that the paper is nicely written and easy to follow, I find that the algorithmic part of the paper is not really new and should have been more carefully compared to existing solutions. While the GT recordings to assess the possibilities of a spike sorting tool to distinguish properly between excitatory and inhibitory neurons are interesting, spikeMAP does not seem to bring anything new to state-of-the-art solutions, and/or, at least, it would deserve to be properly benchmarked. I would suggest that the authors perform a more intensive comparison with existing spike sorters.

      Thank you for your insightful comment. A full comparison between SpikeMAP and related methods is provided in Table 1. As can be seen, SpikeMAP is the only method listed that performs E/I sorting on large-scale multielectrodes. Nonetheless, several aspects of SpikeMAP included in the spike sorting pipeline do overlap with existing methods, as these constitute necessary steps prior to performing E/I identification. These steps are not novel to the current work, nor do they constitute rigid options that cannot be substituted by the user. Rather, we aim to offer SpikeMAP users the option to combine E/I identification with preliminary steps performed either through our software or through another package of their choosing. For instance, preliminary spike sorting could be done through Kilosort before importing the spike data into SpikeMAP for E/I identification. To allow greater flexibility, we have now modularized our suite so that E/I identification can be performed as a stand-alone module. We have clarified the text accordingly (line 317):

      “While SpikeMAP is the only known method to enable the identification of putative excitatory and inhibitory neurons on high-density multielectrode arrays (Table 1), several aspects of SpikeMAP included in the spike sorting pipeline (Figure 1) overlap with existing methods, as these constitute required steps prior to performing E/I identification. To enable users the ability to integrate SpikeMAP with existing toolboxes, we provide a modularized suite of protocols so that E/I identification can be performed separately from preliminary spike sorting steps. In this way, a user could carry out spike sorting through Kilosort or another package before importing their data to SpikeMAP for E/I identification.”

      Weaknesses:

      (1) The global workflow of spikeMAP, described in Figure 1, seems to be very similar to that of Hilgen et al. 2020 (10.1016/j.celrep.2017.02.038). Therefore, the first question is what is the rationale of reinventing the wheel, and not using tools that are doing something very similar (as mentioned by the authors themselves). I have a hard time, in general, believing that spikeMAP has something particularly special, given its Methods, compared to state-of-the-art spike sorters.

      The paper by Hilgen et al. is reported in Table 1. As seen, while this paper employs optogenetics, it does not target inhibitory (e.g., PV) cells. We have added the following clarification (line 82):

      “Despite evidence showing differences in action potential kinetics for distinct cell-types as well as the use of optogenetics (Hilgen et al., 2017), there exists no large-scale validation efforts, to our knowledge, showing that extracellular waveforms can be used to reliably distinguish cell-types.”

      This is why, at the very least, the title of the paper is misleading, because it lets the reader think that the core of the paper will be about a new spike sorting pipeline. If this is the main message the authors want to convey, then I think that numerous validations/benchmarks are missing to assess first how good spikeMAP is, with reference to spike sorting in general, before deciding if this is indeed the right tool to discriminate excitatory vs inhibitory cells. The GT validation, while interesting, is not enough to entirely validate the paper. The details are a bit too scarce for me, or would deserve to be better explained (see other comments after).

      We thank the reviewer for this comment, and have amended the title as follows:

      “SpikeMAP: An unsupervised pipeline for the identification of cortical excitatory and inhibitory neurons in high-density multielectrode arrays with ground-truth validation”

      (2) Regarding the putative location of the spikes, it has been shown that the center of mass, while easy to compute, is not the most accurate solution [Scopin et al, 2024, 10.1016/j.jneumeth.2024.110297]. For example, it has an intrinsic bias for finding positions within the boundaries of the electrodes, while some other methods, such as monopolar triangulation or grid-based convolution, might have better performances. Can the authors comment on the choice of the Center of Mass as a unique way to triangulate the sources?

      We agree with the reviewer that the center-of-mass algorithm carries limitations that are addressed by other methods. To address this issue, we have included two additional protocols in SpikeMAP to perform monopolar triangulation and grid-based convolution, offering additional options for users of the package. The text has been clarified as follows (line 429):

      “In addition to center-of-mass triangulation, SpikeMAP includes protocols to perform monopolar triangulation and grid-based convolution, offering additional options to estimate putative soma locations based on waveform amplitudes.”
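
      For illustration, a center-of-mass estimate can be sketched as an amplitude-weighted mean of electrode positions. This is a generic sketch rather than the SpikeMAP code; the 42 µm electrode pitch and the peak amplitudes below are hypothetical values.

```python
def center_of_mass(electrode_xy, amplitudes):
    """Amplitude-weighted mean of electrode positions.

    Weights are absolute peak amplitudes, so the estimate is a convex
    combination of the electrode coordinates.
    """
    weights = [abs(a) for a in amplitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("all amplitudes are zero")
    x = sum(w * px for w, (px, _) in zip(weights, electrode_xy)) / total
    y = sum(w * py for w, (_, py) in zip(weights, electrode_xy)) / total
    return x, y

# 2 x 2 patch of electrodes with a hypothetical 42 um pitch
electrodes = [(0.0, 0.0), (42.0, 0.0), (0.0, 42.0), (42.0, 42.0)]
peak_uv = [-260.0, -130.0, -130.0, 0.0]   # hypothetical peak voltages in uV
x, y = center_of_mass(electrodes, peak_uv)
```

      Because the weights are non-negative, the estimate can never fall outside the area spanned by the contributing electrodes, which is the bias noted by the reviewer and the motivation for offering the alternative protocols.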

      (3) Still in Figure 1, I am not sure I really see the point of Spline Interpolation. I see the point of such a smoothing, but the authors should demonstrate that it has a key impact on the distinction of Excitatory vs. Inhibitory cells. What is special about the value of 90kHz for a signal recorded at 18kHz? What is the gain with spline enhancement compared to without? Does such a value depend on the sampling rate, or is it a global optimum found by the authors?

      We clarified the text as follows (line 183):

      “While we found that a resolution of 90 kHz provided a reasonable estimate of spike waveforms, this value can be adjusted as a parameter in SpikeMAP.”
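
      To make the role of this parameter concrete: going from 18 kHz to 90 kHz corresponds to a five-fold upsampling of each waveform. The sketch below uses a natural cubic spline written in plain Python. The spline type, the function name, and the example waveform are assumptions made for illustration, since the type of spline used in SpikeMAP is not specified here.

```python
def natural_cubic_spline_upsample(y, factor):
    """Upsample uniformly sampled values `y` by an integer `factor` using a
    natural cubic spline (second derivative set to zero at both ends)."""
    n = len(y)
    if n < 3 or factor < 1:
        raise ValueError("need at least 3 samples and factor >= 1")
    # Second derivatives m[1..n-2] solve, for unit sample spacing,
    # m[i-1] + 4 m[i] + m[i+1] = 6 (y[i-1] - 2 y[i] + y[i+1]).
    rhs = [6.0 * (y[i - 1] - 2.0 * y[i] + y[i + 1]) for i in range(1, n - 1)]
    diag = [4.0] * (n - 2)
    for i in range(1, n - 2):            # forward elimination (Thomas algorithm)
        w = 1.0 / diag[i - 1]
        diag[i] -= w
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * n                        # m[0] = m[n-1] = 0 (natural boundary)
    for i in range(n - 3, -1, -1):       # back substitution
        m[i + 1] = (rhs[i] - m[i + 2]) / diag[i]
    out = []
    for i in range(n - 1):
        for k in range(factor):
            b = k / factor               # fractional position inside segment i
            a = 1.0 - b
            out.append(a * y[i] + b * y[i + 1]
                       + ((a ** 3 - a) * m[i] + (b ** 3 - b) * m[i + 1]) / 6.0)
    out.append(float(y[-1]))
    return out

# Hypothetical spike waveform sampled at 18 kHz (values in uV)
waveform = [0.0, -20.0, -120.0, -60.0, 10.0, 25.0, 10.0, 0.0]
factor = 90_000 // 18_000                # 5-fold upsampling to 90 kHz
upsampled = natural_cubic_spline_upsample(waveform, factor)
```

      The interpolated trace passes through every original sample, so a different target resolution only changes how finely features such as the trough and the peak can be located between recorded samples.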

      (4) Figure 2 is not really clear, especially panel B. The choice of the time scale for the B panel might not be the most appropriate, and the legend filtered/unfiltered with a dot is not clear to me in Bii.

      We apologize for the rendering issues in the Figures that occurred during conversion into PDF format. We have now ensured that all figures are properly displayed.

      In panel E, the authors are making two clusters with PCA projections on single waveforms. Does this mean that the PCA is only applied to the main waveforms, i.e. the ones obtained where the amplitudes are peaking the most? This is not really clear from the methods, but if this is the case, then this approach is a bit simplistic and does not really match state-of-the-art solutions. Spike waveforms are quite often, especially with such high-density arrays, covering multiple channels at once, and thus the extracellular patterns triggered by the single units on the MEA are spatio-temporal motifs occurring on several channels. This is why, in modern spike sorters, the information in a local neighbourhood is often kept to be projected, via PCA, on the lower-dimensional space before clustering. Information on a single channel only might not be informative enough to disambiguate sources. Can the authors comment on that, and what is the exact spatial resolution of the 3Brain device? The way the authors are performing the SVD should be clarified in the methods section. Is it on a single channel, and/or on multiple channels in a local neighbourhood?

      We agree with the reviewer that it would be useful to have the option of performing PCA on several channels at once, since spikes can occur at several channels at the same time. We have now added a routine to SpikeMAP that allows users to define a radius around individual channels prior to performing PCA. The text was clarified as follows (line 131):

      “The SpikeMAP suite also offers a routine to select a radius around individual channels in order to enter groups of adjacent channels in PCA.”

      (5) About the isolation of the single units, here again, I think the manuscript lacks some technical details. The authors are saying that they are using a k-means cluster analysis with k=2. This means that the authors are explicitly looking for 2 clusters per electrode? If so, this is a really strong assumption that should not be held in the context of spike sorting, because, since it is a blind source separation technique, one can not pre-determine in advance how many sources are present in the vicinity of a given electrode. While the illustration in Figure 2E is ok, there is no guarantee that one can not find more clusters, so why this choice of k=2? Again, this is why most modern spike sorting pipelines do not rely on k-means, to avoid any hard-coded number of clusters. Can the authors comment on that?

      We clarified the text as follows (line 135):

      “In SpikeMAP, the optimal number of k-means clusters can be chosen by a Calinski-Harabasz criterion (Calinski and Harabasz, 1974) or pre-selected by the user.”
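
      The selection of the number of clusters by a Calinski-Harabasz criterion can be sketched as follows. This is a generic plain-Python illustration, not the SpikeMAP implementation: the deterministic farthest-point initialisation, the candidate range of k, and the synthetic two-dimensional points (standing in for PCA scores of waveforms) are assumptions.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:                  # keep the old centre if a cluster empties
                centers[j] = centroid(members)
    return labels, centers

def calinski_harabasz(points, labels, centers):
    """Between-cluster over within-cluster dispersion, each divided by its
    degrees of freedom (k - 1 and n - k)."""
    n, k = len(points), len(centers)
    overall = centroid(points)
    within = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    between = sum(labels.count(j) * sq_dist(centers[j], overall) for j in range(k))
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))

# Synthetic data: three tight groups of four points each
offsets = [(0.1, 0.1), (0.1, -0.1), (-0.1, 0.1), (-0.1, -0.1)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
          for dx, dy in offsets]

scores = {}
for k in range(2, 6):                    # candidate numbers of clusters
    labels, centers = kmeans(points, k)
    scores[k] = calinski_harabasz(points, labels, centers)
best_k = max(scores, key=scores.get)
```

      On these synthetic points the criterion peaks at three clusters, so the number of clusters is chosen from the data rather than fixed at k = 2.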

      (6) I'm surprised by the linear decay of the maximal amplitude as a function of the distance from the soma, as shown in Figure 2H. Is it really what should be expected? Based on the properties of the extracellular media, shouldn't we expect a power law for the decay of the amplitude? This is strange that up to 100um away from the soma, the max amplitude only dropped from 260 to 240 uV. Can the authors comment on that? It would be interesting to plot that for all neurons recorded, in a normed manner V/max(V) as function of distances, to see what the curve looks like.

      We added Supplemental Figure 1 showing the drop in voltage over all putative somas (N=1,950) of one recording, after excluding somas with an increase in voltage away from the electrode peak and computing normed values V/max(V). We see a distribution of slopes as well as intercepts across somas, showing some variability across recording sites. As the reviewer suggests, it is possible that a power-law describes these data better than a linear function, and this would need to be investigated further by quantitatively comparing the fit of these functions.
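
      One simple way such a comparison could be set up is sketched below: a least-squares line and a power law (fitted as a line in log-log coordinates) are compared by their sum of squared errors on normed values V/max(V). The function names and the synthetic 1/d data are assumptions for illustration only and do not represent our recordings. Note also that the log-log fit minimises error in log space, so this comparison is only a rough first pass.

```python
import math

def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def sse(y, y_hat):
    """Sum of squared errors between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

def compare_decay_models(distance, v_norm):
    """Return the SSE of a linear fit and of a power-law fit.

    Requires strictly positive distances and voltages (logs are taken).
    """
    slope, intercept = linear_fit(distance, v_norm)
    lin_pred = [slope * d + intercept for d in distance]
    # A power law V = c * d**b is a straight line in log-log coordinates.
    b, log_c = linear_fit([math.log(d) for d in distance],
                          [math.log(v) for v in v_norm])
    pow_pred = [math.exp(log_c) * d ** b for d in distance]
    return {"linear": sse(v_norm, lin_pred), "power_law": sse(v_norm, pow_pred)}

# Synthetic example: voltage decaying as 1/d, then normed to V/max(V)
distance_um = [20.0, 40.0, 60.0, 80.0, 100.0]
v = [d ** -1.0 for d in distance_um]
v_norm = [a / max(v) for a in v]
errors = compare_decay_models(distance_um, v_norm)
```

      Both models have two free parameters, so the two error values can be compared directly; for real data, a likelihood-based or cross-validated comparison would be more appropriate.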

      (7) In Figure 3A, it seems that the total number of cells is rather low for such a large number of electrodes. What are the quality criteria that are used to keep these cells? Did the authors exclude some cells from the analysis, and if yes, what are the quality criteria that are used to keep cells? If no criteria are used (because none are mentioned in the Methods), then how come so few cells are detected, and can the authors convince us that these neurons are indeed "clean" units (RPVs, SNRs, ...)?

      The reviewer is correct to point out that a number of stringent criteria were employed to exclude some putative cells. We now outline these criteria directly in the text (line 161):

      “At different steps in the process, conditions for rejecting spikes can be tailored by applying: (1) a stringent threshold to filtered voltages; (2) a minimal cut-off on the signal-to-noise ratio of voltages (see Supplemental Figure 2); (3) an LDA for cluster separability; (4) a minimal spike rate to putative neurons; (5) a Hartigan statistical dip test to detect spike bursting; (6) a decrease in voltage away from putative somas; and (7) a maximum spike-count correlation for nearby channels. Together, these criteria allow SpikeMAP users the ability to precisely control parameters relevant to automated spike sorting.”

      Further, we provide SNRs of individual channels (Supplemental Figure 2), and added to the SpikeMAP software the ability to apply a minimal criterion based on SNR.

      (8) Still in Figure 3A, it looks like there is a bias to find inhibitory cells at the borders, since they do not appear to be uniformly distributed over the MEA. Can the authors comment on that? What would be the explanation for such a behaviour? It would be interesting to see some macroscopic quantities on Excitatory/Inhibitory cells, such as mean firing rates, averaged SNRs... Because again, in Figure 3C, it is not clear to me that the firing rates of inhibitory cells are higher than Excitatory ones, whilst they should be in theory.

      We have added figures showing the distribution of E and I firing rates across a population of N=1,950 putative cells (Supplemental Figure 3). Firing rates of inhibitory neurons are marginally higher than excitatory neurons, and both E and I follow an approximately exponential distribution of rates.

      The reviewer may be right that there are more I neurons at the borders in Fig. 3B. Because injections were done in medial prefrontal cortex, this may reflect an experimental artefact related to a high probability of activating I neurons in locations where the opsin was activated. We added a sentence to the text to clarify this point (line 201):

      “It is possible that the spatial location of putative I cells reflects the site of injection of the opsin in medial prefrontal cortex.”

      (9) For Figure 3 in general, I would have performed an exhaustive comparison of putative cells found by spikeMAP and other sorters. More precisely, I think that to prove the point that spikeMAP is indeed bringing something new to the field of spike sorting, the authors should have compared the performances of various spike sorters to discriminate Exc vs Inh cells based on their ground truth recordings. For example, either using Kilosort [Pachitariu et al, 2024, 10.1038/s41592-024-02232-7], or some other sorters that might be working with such large high-density data [Yger et al, 2018, 10.7554/eLife.34518].

      The reviewer is correct to point out that the spike-sorting portion of our pipeline shares similarities with related approaches. Other aspects, however, are unique to SpikeMAP. We have clarified the text accordingly:

      “In sum, SpikeMAP provides an end-to-end pipeline to perform spike-sorting on high-density multielectrode arrays. Some elements of this pipeline are similar to related approaches (Table 1), including the use of voltage filtering, PCA, and k-means clustering. Other elements are novel, including the use of spline interpolation, LDA, and the ability to identify putative excitatory and inhibitory cells.”

      (10) Figure 4 has a big issue, and I guess the panels A and B should be redrawn. I don't understand what the red rectangle is displaying.

      Again, we apologize for the rendering issues in the Figures that occurred during conversion into PDF format. We have now ensured that all figures are properly displayed.

      (11) I understand that Figure 4 is only one example, but I have a hard time understanding from the manuscript how many slices/mice were used to obtain the GT data? I guess the manuscript could be enhanced by turning the data into an open-access dataset, but then some clarification is needed. How many flashes/animals/slices are we talking about? Maybe this should be illustrated in Figure 4, if this figure is devoted to the introduction of the GT data.

      Details of the open access data are now provided in Supplemental Table 1. We also clarified Figure 5B:

      “Quantification of change in firing rate following optogenetic stimulation. Average firing rates are taken over four recordings obtained from 3 mice.”

      (12) While there is no doubt that GT data as the ones recorded here by the authors are the most interesting data from a validation point of view, the pretty low yield of such experiments should not discourage the use of artificially generated recordings such as the ones made in [Buccino et al, 2020, 10.1007/s12021-020-09467-7] or even recently in [Laquitaine et al, 2024, 10.1101/2024.12.04.626805v1]. In these papers, the authors have putative waveforms/firing rate patterns for excitatory and inhibitory cells, and thus, the authors could test how good they are in discriminating the two subtypes.

      We agree with the reviewer that it would be worthwhile for future work to apply SpikeMAP to artificially generated spike trains, and have added the following (line 328):

      “Another avenue could involve applying SpikeMAP on artificially generated spike data (Buccino & Einevoll 2021; Laquitaine et al., 2024).”

      Reviewer #1 (Recommendations for the authors):

      (1) Line 154 seems to include a parenthetical expression left over from editing: "sensitive to noise (contamination? Better than noise?) generated by the signal of proximal units." See also line 186: "use (reliance?) of light-sensitive" and line 245: "In the absence of synaptic blockers (right?)," and line 270: "the size of the data prevents manual intervention (curation?)." Check carefully for all parentheses like that, which should be removed.

      Thank you for pointing this out. We have revised the text and removed parenthetical expressions left over from editing.

      (2) In lines 285-286, you state that: "k-mean clustering of spike waveform properties best differentiated the two principal classes of cells..." But I could not find where you compared k-means clustering to other methods. I think you just argued that k-means seemed to work well, but not better than, another method. If that is so, then you should probably rephrase those lines.

      The reviewer is correct that direct comparisons are not performed here, hence we removed this sentence.

      (3) Methods section, E/I classification, lines 396-405: You give us figures on what fraction was E and I (PV subtype) (94.75% and 5.25%), but there is more that you could have said. First of all, what is the expected fraction of parvalbumin-sensitive interneurons in the cortex - is it near 5%?

      We clarified the text as follows (line 444): “This number is close to the expected percentage of PV interneurons in cortex (4-6%) (Markram et al. 2004).”

      Second, how would these percentages change if you altered the threshold from 3 s.d. to something lower, like 2 s.d.? Giving us some idea of how the threshold affects the fraction of PV interneurons could give us an idea of whether this method agrees with our expectations or not.

      While SpikeMAP offers the flexibility to set the voltage threshold manually, we opted for a stringent threshold to demonstrate the capabilities of the software. As seen in Figure 2D, at 2 and 3 s.d., the signal is largely accounted for by Gaussian noise, while deviation from noise arises around 4 s.d. We clarified the text as follows (line 120):

      “At a threshold of -3 s.d., the signal could be largely accounted for by Gaussian noise, while a separation between signal and noise began around a threshold of -4 s.d.”
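
      The threshold argument can be made concrete with a small sketch. This is not the SpikeMAP implementation; it only computes, under the assumption of zero-mean Gaussian noise, the fraction of voltage samples expected to fall below a negative threshold expressed in standard deviations. The function names `gaussian_tail_fraction` and `expected_noise_crossings` are hypothetical.

```python
import math

def gaussian_tail_fraction(k_sd: float) -> float:
    """Fraction of zero-mean Gaussian samples below -k_sd standard deviations."""
    # Lower-tail probability of the standard normal, written with erfc.
    return 0.5 * math.erfc(k_sd / math.sqrt(2.0))

def expected_noise_crossings(k_sd: float, n_samples: int) -> float:
    """Number of samples expected below the threshold if the trace were pure noise."""
    return gaussian_tail_fraction(k_sd) * n_samples

# Compare candidate thresholds (in s.d.) for a hypothetical trace of one million samples.
for k in (2, 3, 4):
    print(k, gaussian_tail_fraction(k), expected_noise_crossings(k, 1_000_000))
```

      Under this assumption the expected fractions are roughly 2.3%, 0.13%, and 0.003% at 2, 3, and 4 s.d., respectively. An observed crossing count well above the noise expectation at a given threshold is what would indicate signal rather than noise.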

      Third, did the inhibitory neurons identified by this optogenetic method also have narrow spike widths at half amplitude? Could you do a scatterplot of all the spike widths and inter-peak distances that had color-coded dots for E and I based on your optogenetic method?

      We have added a scatterplot (Supplemental Figure 5).

      (4) Can you compare your methods with others now widely in use, like, for example, Spiking Circus or Kilosort? You do that in Table 1 in terms of features, but not in terms of performance. For example, you could have applied Kilosort4 to your data from the 4096 electrode array and seen how often it sorted the same neurons that SpikeMAP did. I realize this could not give you a comparison of how many were E/I, but it could tell you how close your numbers of neurons agreed with their numbers. Were your numbers within 5% of each other? This would be helpful for groups who are already using Kilosort4.

      As mentioned earlier, packages listed in Table 1 do not provide an identification of putative E/I neurons on high-density electrode arrays. To facilitate the integration of SpikeMAP with other spike sorting packages, our suite now provides a stand-alone module to perform E/I identification. This is now mentioned in the text (see earlier comment).

      Reviewer #2 (Recommendations for the authors):

      I would encourage the authors to decide what the paper is about: is it about a new sorting method (and if yes, more tests/benchmarks are needed to explain the pros and the cons of the pipelines, and the Methods need to be expanded). Or is it about the new data for Ground Truth validation, and again, if yes, then maybe explain more what they are, how many slices/mice/cells, ... Maybe also consider making the data available online as an open dataset.

      We agree with the reviewer that the paper is best slated toward ground truth validation of E/I identification. We now specify how many slices/mice/cells etc. (see Supplemental Table 1) and make the data available online as open source.

    1. Author response:

      (1) Explore the temporal component of neural responses (instead of collapsing responses to a single number, i.e., the average response over 4s), and determine which of the three models can recapitulate the observed dynamics.

      (2) Expand the polar plot visualization to show all three slopes (changes in responses across all three successive concentrations) instead of only two slopes.

      (3) Attempt to collect and analyze, from published papers, data of: (a) first-order neuron responses to odors to determine the role of first-order inhibition towards generating non-monotonic responses, and (b) PN responses in Drosophila to properly compare with corresponding first-order neuron responses.

      (4) Further discuss: (a) why the brain may need to encode absolute concentration, (b) the distinction between non-monotonic responses and cross-over responses, and (c) potential limitations of the primacy model.

      (5) Expand the divisive normalization model by evaluating different values of k and R, and study the effects of divisive normalization on tufted cells.

      (6) Add discussion of other potential inhibitory mechanisms that could contribute towards the observed effects.

      Reviewer #1:

      The article starts from the premise that animals need to know the absolute concentration of an odor over many log units, but the need for this isn't obvious. The introduction cites an analogy to vision and audition. These are cases where we know for a fact that the absolute intensity of the stimulus is not relevant. Instead, sensory perception relies on processing small differences in intensity across space or time. And to maintain that sensitivity to small differences, the system discards the stimulus baseline. Humans are notoriously bad at judging the absolute light level. That information gets discarded even before light reaches the retina, namely through contraction of the pupil. Similarly, it seems plausible that a behavior like olfactory tracking relies on sensing small gradients across time (when weaving back and forth across the track) or space (across nostrils). It is important that the system function over many log units of concentration (e.g., far and close to a source) but not that it accurately represents what that current concentration is [see e.g., Wachowiak et al, 2025 Recalibrating Olfactory Neuroscience..].

      We thank the Reviewer for the insightful input and agree that gradients across time and space are important for various olfactory behaviors, such as tracking. At the same time, we think that absolute concentration is also needed for two reasons. First, in order to extract changes in concentration, the absolute concentration needs to be normalized out; i.e., change needs to be encoded with respect to some baseline, which is what divisive normalization computes. Second, while it is true that representing the exact number of odor molecules present is not important, this number directly relates to distance from the odor source, which does provide ethological value (e.g., is the tiger 100m or 1000m away?). Indeed, our decoding experiments focused on discriminating relative, rather than absolute, concentrations by classifying between each pair of concentrations (i.e., relative distances), which is effectively an assessment of the gradient. In our revision, we will make all of these points clearer.

      Still, many experiments in olfactory research have delivered square pulses of odor at concentrations spanning many log units, rather than the sorts of stimuli an animal might encounter during tracking. Even within that framework, though, it doesn't seem mysterious anymore how odor identity and odor concentration are represented differently. For example, Stopfer et al 2003 showed that the population response of locust PNs traces a dynamic trajectory. Trajectories for a given odor form a manifold, within which trajectories for different concentrations are distinct by their excursions on the manifold. To see this, one must recognize that the PN responds to an odor pulse with a time-varying firing rate, that different PNs have different dynamics, and that the dynamics can change with concentration. This is also well recognized in the mammalian systems. Much has been written about the topic of dynamic coding of identity and intensity - see the reviews of Laurent (2002) and Uchida (2014).

      Given the above comments on the dynamics of odor responses in first- and second-order neurons, it seems insufficient to capture the response of a neuron with a single number. Even if one somehow had to use a single number, the mean firing rate during the odor pulse may not be the best choice. For example, the rodent mitral cells fire in rhythm with the animal's sniffing cycle, and certain odors will just shift the phase of the rhythm without changing the total number of spikes (see e.g., Fantana et al, 2008). During olfactory search or tracking, the sub-second movements of the animal in the odor landscape get superposed on the sniffing cycle. Given all this, it seems unlikely that the total number of spikes from a neuron in a 4-second period is going to be a relevant variable for neural processing downstream.

      To our knowledge, it is not well understood how downstream brain regions read out mitral cell responses to guide olfactory behavior. The olfactory bulb projects to more than a dozen brain regions, and different regions could decode signals in different ways. We focused on the mean response because it is a simple, natural construct.

      The datasets we analyzed may not include all relevant timing information; for example, the mouse data is from calcium imaging studies that did not track sniff timing. Nonetheless, we plan to address this comment within our framework by binning time into smaller-sized windows (e.g., 0-0.2s, 0.2-0.4s, etc.) and repeating our analysis for each of these windows. Specifically, we will determine how each normalization method fares in recapitulating statistics of the population responses of each window, beyond simply assessing the population mean.
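
      The windowed analysis described above can be sketched as follows. This is a minimal illustration, assuming a response trace given as paired sample times (in seconds) and values; the helper name `bin_means` and the toy trace are hypothetical, and the actual analysis may bin the data differently.

```python
def bin_means(times, values, width=0.2, t_end=4.0):
    """Mean of `values` in consecutive windows [0, width), [width, 2*width), ...

    Windows with no samples yield None. Samples lying exactly on a window
    edge are subject to floating-point rounding.
    """
    n_bins = int(round(t_end / width))
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, v in zip(times, values):
        idx = int(t // width)
        if 0 <= idx < n_bins:
            sums[idx] += v
            counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Toy trace: four samples covering the first two 0.2 s windows.
times = [0.05, 0.15, 0.25, 0.35]
values = [1.0, 3.0, 5.0, 7.0]
window_means = bin_means(times, values, width=0.2, t_end=0.4)
print(window_means)
```

      Repeating the population-level analysis on each entry of `window_means`, instead of on a single 4-second average, is one way to expose concentration-dependent dynamics that the average would hide.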

      Much of the analysis focuses on the mean activity of the entire population. Why is this an interesting quantity? Apparently, the mean stays similar because some neurons increase and others decrease their firing rate. It would be more revealing, perhaps, to show the distribution of firing rates at different concentrations and see how that distribution is predicted by different models of normalization. This could provide a stronger test than just the mean.

      We agree that mean activity is only one measure to summarize a rich data set and will perform the suggested analysis.

      The question "if concentration information is discarded in second-order neurons, which exclusively transmit odor information to the rest of the brain, how does the brain support olfactory behaviors, such as tracking and navigation?" is really not an open question anymore. For example, reference 23 reports in the abstract that "Odorant concentration had no systematic effect on spike counts, indicating that rate cannot encode intensity. Instead, odor intensity can be encoded by temporal features of the population response. We found a subpopulation of rapid, largely concentration-invariant responses was followed by another population of responses whose latencies systematically decreased at higher concentrations."

      Primacy coding does provide one plausible mechanism to decode concentration. Our manuscript demonstrated how such a code could emerge in second-order neurons with the help of divisive normalization, though it does require maintaining at least partial rank invariance across concentrations, which may not be robust. We also showed how concentration could be decoded via spike rates, even if average rates are constant, which provides an alternative hypothesis to that of ref 23.

      Further, ref 23 only considers the piriform cortex, which, as mentioned above, is one of many targets of the olfactory bulb, and it remains unclear what the decoding mechanisms of each of these targets are. In addition, work from the same authors as ref 23 found multiple potential decoding strategies in the piriform cortex itself, including changes in firing rate (see Fig. 2E of ref. 23 - Bolding & Franks, 2017; as well as Fig. 4 in Roland et al., 2017).

      It would be useful to state early in the manuscript what kinds of stimuli are being considered and how the response of a neuron is summarized by one number. There are many alternative ways to treat both stimuli and responses.

      We will add this explanation to the manuscript.

      "The change in response across consecutive concentration levels may not be robust due to experimental noise and the somewhat limited range of concentrations sampled": Yes, a number of the curves just look like "no response". It would help the reader to show some examples of raw data, e.g. the time course of one neuron's firing rate to 4 concentrations, and for the authors to illustrate how they compress those responses into single numbers.

      We agree and will add this information to the manuscript.

      "We then calculated the angle between these two slopes for each neuron and plotted a polar histogram of these angles." The methods suggest that this angle is the arctan of the ratio of the two slopes in the response curve. A ratio of 2 would result from a slope change from 0.0001 to 0.0002 (i.e., virtually no change in slope) or from 1 to 2 (a huge change). Those are completely different response curves. Is it reasonable to lump them into the same bin of the polar plot? This seems an unusual way to illustrate the diversity of response curve shapes.

      We agree that the two changes in the reviewer’s example will be categorized in the same quadrant in our analysis. We did not focus on the absolute changes because our analysis covers many log ratios of concentrations. Instead, we focused on the relative shapes of the concentration response curves, and more specifically, the direction of the change (i.e., the sign of the slope). We will better motivate this style of analysis in the revision. Moreover, in response to comments by Reviewer 2, we will compare response shapes between all three successive levels of concentration changes, as opposed to only two levels.
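
      The reviewer's example can be checked numerically. The sketch below assumes the angle is obtained from the two slopes with a quadrant-aware arctangent (`atan2`); the manuscript's exact definition may differ, and `slope_angle_deg` is a hypothetical name.

```python
import math

def slope_angle_deg(slope_first: float, slope_second: float) -> float:
    """Angle of the point (slope_first, slope_second), in degrees."""
    # atan2 keeps track of the signs of both slopes, i.e. the quadrant.
    return math.degrees(math.atan2(slope_second, slope_first))

# Two response curves whose slopes differ by four orders of magnitude
# but share the same ratio, as in the reviewer's example.
small = slope_angle_deg(0.0001, 0.0002)
large = slope_angle_deg(1.0, 2.0)
print(small, large)
```

      Both curves map to the same angle, which illustrates the reviewer's concern about magnitude and the point made in the response: the angle captures the direction of change (the signs of the slopes) rather than its size.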

      The Drosophila OSN data are passed through normalization models and then compared to locust PN data. This seems dangerous, as flies and locusts are separated by about 300 M years of evolution, and we don't know that fly PNs act like locust PNs. Their antennal lobe anatomy differs in many ways, as does the olfactory physiology. To draw any conclusions about a change in neural representation, it would be preferable to have OSN and PN data from the same species.

      We are in the process of requesting PN response data in Drosophila from groups that have collected such data and will repeat the analysis once we get access to the data.

      One conclusion is that divisive normalization could account for some of the change in responses from receptors to 2nd order neurons. This seems to be well appreciated already [e.g., Olsen 2010, Papadopoulou 2011, minireview in Hong & Wilson 2013].

      While we agree that these manuscripts do study the effects of divisive normalization in insects and fish, here we show that this computation also generalizes to rodents. In addition, these previous studies do not focus on the role of divisive normalization in concentration encoding/decoding, which is our focus. We will clarify this difference in the revision.

      Another claim is that subtractive normalization cannot perform that function. What model was used for subtractive normalization is unclear (there is an error in the Methods). It would be interesting if there were a categorical difference between divisive and subtractive normalization.

      We apologize for the mistake in the subtractive normalization equation and will correct it. Thank you for catching it.

      Looking closer at the divisive normalization model, it really has two components: (a) the "lateral inhibition" by which a neuron gets suppressed if other neurons fire (here scaled by the parameter k), and (b) a nonlinear sigmoid transformation (determined by the parameters n and sigma). Both lateral inhibition and nonlinearity are known to contribute to decorrelation in a neural population (e.g., Pitkow 2012). The "intraglomerular gain control" contains only the nonlinearity. The "subtractive normalization" we don't know. But if one wanted to put divisive and subtractive inhibition on the same footing, one should add a sigmoid nonlinearity in both cases.

      Our intent was not to place all the methods on the “same footing” but rather to isolate the two primary components of normalization methods – non-linearity and lateral inhibition – and determine which of these, and in which combination, could generate the desired effects. Divisive normalization incorporates both components, whereas intraglomerular gain control and subtractive normalization only incorporate one of these components. We will clarify this reasoning in the revision.
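
      The two components can be separated in a short sketch. The functional form below is an assumption, chosen as one common way to write divisive normalization with a pooled lateral-inhibition term scaled by k and a sigmoid set by n and sigma; it is not taken from the manuscript, and the default parameter values are illustrative only.

```python
def divisive_normalization(inputs, k=0.1, n=1.5, sigma=10.0, r_max=1.0):
    """Assumed form: r_i = r_max * x_i^n / (sigma^n + x_i^n + k * sum_j x_j^n).

    `inputs` are non-negative first-order responses for one stimulus.
    """
    # Lateral inhibition: a pooled term shared by every neuron.
    pooled = k * sum(x ** n for x in inputs)
    # Sigmoid nonlinearity applied to each neuron, with the pool in the denominator.
    return [r_max * x ** n / (sigma ** n + x ** n + pooled) for x in inputs]

first_order = [1.0, 5.0, 20.0]
with_pool = divisive_normalization(first_order)        # nonlinearity + lateral inhibition
no_pool = divisive_normalization(first_order, k=0.0)   # nonlinearity only
print(with_pool)
print(no_pool)
```

      Setting k = 0 leaves only the sigmoid nonlinearity, which is how the reviewer characterizes intraglomerular gain control. Because the pooled term is shared by all neurons for a given stimulus, this form preserves the rank order of the inputs within that stimulus.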

      The response models could be made more realistic in other ways. For example, in both locusts and fish, the 2nd order neurons get inputs from multiple receptor types; presumably, that will affect their response functions. Also, lateral inhibition can take quite different forms. In locusts, the inhibitory neurons seem to collect from many glomeruli. But in rats, the inhibition by short axon cells may originate from just a few sparse glomeruli, and those might be different for every mitral cell (Fantana 2008).

      We thank the Reviewer for the input. Instead of fixing k for all second-order neurons, we will apply different k values for different neurons. We will also systematically vary the percentage of neurons used for the divisive normalization calculation in the denominator, and determine the regime under which the effects experimentally observed are reproducible. This approach takes into account the scenario that inter-glomerular inhibitory interactions are sparse.

      There are questions raised by the following statements: "traded-off energy for faster and finer concentration discrimination" and "an additional type of second-order neuron (tufted cells) that has evolved in land vertebrates and that outperforms mitral cells in concentration encoding" and later "These results suggest a trade-off between concentration decoding and normalization processes, which prevent saturation and reduce energy consumption.". Are the tufted cells inferior to the mitral cells in any respect? Do they suffer from saturation at high concentration? And do they then fail in their postulated role for odor tracking? If not, then what was the evolutionary driver for normalization in the mitral cell pathway? Certainly not lower energy consumption (50,000 mitral cells = 1% of rod photoreceptors, each of which consumes way more energy than a mitral cell).

      The question of what mitral cells are “good for”, compared to tufted cells, remains unclear in our view. We speculate that mitral cells provide superior context-dependent processing and are better for determining stimulus-reward contingencies, but this remains far from settled experimentally.

      We believe the mitral cell pathway evolved earlier than tufted cells, since the former appear akin to projection neurons in insects. Nonetheless, we agree that differences in energy consumption are unlikely to be the primary distinguishing factor, and in the revision, we will drop this argument.

      Reviewer #2:

      The main premise that divisive normalization generates this diversity of dose-response curves in the second-order neurons is a little problematic. … The analysis in [Figure 3] indicates that divisive normalization does what it is supposed to do, i.e., compresses concentration information and not alter the rank-order of neurons or the combinatorial patterns. Changes in the combinations of neurons activated with intensity arise directly from the fact that the first-order neurons did not have monotonic responses with odor intensity (i.e., crossovers). This was the necessary condition, and not the divisive normalization for changes in the combinatorial code. There seems to be a confusion/urge to attribute all coding properties found in the second-order neurons to 'divisive normalization.' If the input from sensory neurons is monotonic (i.e., no crossovers), then divisive normalization did not change the rank order, and the same combinations of neurons are activated in a similar fashion (same vector direction or combinatorial profile) to encode for different odor intensities. Concentration invariance is achieved, and concentration information is lost. However, when the first-order neurons are non-monotonic (i.e., with crossovers), that causes the second-order neurons to have different rank orders with different concentrations. Divisive normalization compresses information about concentrations, and rank-order differences preserve information about the odor concentration. Does this not mean that the non-monotonicity of sensory neuron response is vital for robustly maintaining information about odor concentration? Naturally, the question that arises is whether many of the important features of the second-order neuron's response simply seem to follow the input. Or is my understanding of the figures and the write-up flawed, and are there more ways in which divisive normalization contributes to reshaping the second-order neural response? This must be clarified. 
      Lastly, the tufted cells in the mouse OB are also driven by this sensory input with crossovers. How does the OB circuit convert the input with crossovers into one that is monotonic with concentration? I think that is an important question that this computational effort could clarify.

      It appears that there is confusion about the definitions of “non-monotonicity” and “crossovers”. These are two independent concepts – one does not necessarily lead to the other. Non-monotonicity concerns the response of a single neuron to different concentration levels. A neuron’s response is considered non-monotonic if its response goes up then down, or down then up, across increasing concentrations. A “cross-over” is defined based on the responses of multiple neurons. A cross-over occurs when the response of one neuron is lower than another neuron at one concentration, but higher than the other at a different concentration. For example, the responses of both neurons could increase monotonically with increasing concentration, but one neuron might start lower and grow faster, hence creating a cross-over. We will clarify this in the manuscript, which we believe will resolve the questions raised above.
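
      The distinction can be illustrated with toy numbers. The sketch below is only an illustration of the two definitions given above; the helper names and response values are hypothetical.

```python
def is_monotonic(responses):
    """True if a single neuron's response never decreases, or never increases,
    across increasing concentrations."""
    diffs = [b - a for a, b in zip(responses, responses[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)

def has_crossover(resp_a, resp_b):
    """True if neuron A is below neuron B at one concentration and above it at another."""
    signs = {(a > b) - (a < b) for a, b in zip(resp_a, resp_b)}
    return 1 in signs and -1 in signs

# Both neurons increase monotonically, but B starts lower and grows faster.
neuron_a = [1.0, 2.0, 3.0, 4.0]
neuron_b = [0.0, 1.5, 3.5, 6.0]
# C is non-monotonic (up, down, up) yet always stays above A.
neuron_c = [5.0, 7.0, 6.0, 8.0]

print(is_monotonic(neuron_a), is_monotonic(neuron_b), has_crossover(neuron_a, neuron_b))
print(is_monotonic(neuron_c), has_crossover(neuron_a, neuron_c))
```

      Neurons A and B give a cross-over without any non-monotonicity, while A and C give non-monotonicity without a cross-over, so neither property implies the other.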

      The way the decoding results and analysis are presented does not add a lot of information to what has already been presented. For example, based on the differences in rank-order with concentration, I would expect the combinatorial code to be different. Hence, a very simple classifier based on cosine or correlation distance would work well. However, since divisive normalization (DN) is applied, I would expect a simple classification scheme that uses the Euclidean distance metric to work equally as well after DN. Is this the case?

      Yes, we used a simple classification scheme, logistic regression with a linear kernel, which is essentially a Euclidean distance-based classification. This scheme works better for tufted cells because they are more monotonic; i.e., if neuron A and B both increase their responsiveness with concentration, then Euclidean distance would be fine. But if neuron A’s response amplitude goes up and neuron B’s response goes down – as often happens for mitral cells – then Euclidean distance does not work as well. We will add intuition about this in the manuscript.

      Leave-one-trial/sample-out seems too conservative. How robust are the combinatorial patterns across trials? Would just one or two training trials suffice for creating templates for robust classification? Based on my prior experience (https://elifesciences.org/reviewed-preprints/89330), I do expect that the combinatorial patterns would be more robust to adaptation and hence also allow robust recognition of odor intensity across repeated encounters.

      As suggested, we will compute the correlation coefficient of the similarity of neural responses for each odor (across trials). We will repeat this analysis for both mitral and tufted cells. To determine the effect of adaptation, we will compute correlation coefficients of responses between the 1st and 2nd trials vs the 1st and final trial.

      Lastly, in the simulated data, since the affinity of the first-order sensory neurons to odorants is expected to be constant across concentration, and "Jaccard similarity between the sets of highest-affinity neurons for each pair of concentration levels was > 0.96," why would the rank-order change across concentration? DN should not alter the rank order.

      We agree that divisive normalization should not alter the rank order, but the rank order may change in first-order neurons, which carries through to second-order neurons. This confusion may be related to the one mentioned above re: cross-overs vs non-monotonicity. Moreover, in the simulated data (Fig. 4D-H), the Jaccard similarity was calculated based on only the 50 neurons with the highest affinity, not the entire population of neurons. As shown in Fig. 4H, most of the rank-order change happens in the remaining 150 neurons.
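
      A toy example shows how a high Jaccard similarity among the top-ranked neurons can coexist with rank-order changes elsewhere. This is a minimal sketch with hypothetical names and made-up response values; the simulation in the manuscript uses the 50 highest-affinity neurons, whereas the toy example uses 3 of 6.

```python
def top_k_indices(responses, k):
    """Indices of the k neurons with the largest values."""
    order = sorted(range(len(responses)), key=lambda i: responses[i], reverse=True)
    return set(order[:k])

def jaccard(set_a, set_b):
    """Size of the intersection divided by size of the union."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0

def rank_order(responses):
    """Neuron indices sorted from strongest to weakest."""
    return sorted(range(len(responses)), key=lambda i: responses[i], reverse=True)

low_conc = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
high_conc = [0.9, 0.8, 0.7, 0.1, 0.3, 0.2]

similarity = jaccard(top_k_indices(low_conc, 3), top_k_indices(high_conc, 3))
print(similarity)
print(rank_order(low_conc), rank_order(high_conc))
```

      Here the top-3 sets are identical, so the Jaccard similarity is at its maximum, while the rank order of the remaining neurons still differs between the two concentrations.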

      Note that in response to a comment by Reviewer 3, we will change the presentation of Fig. 4H in the revision.

      If the set of early responders does change, how will the decoder need to change, and what precise predictions can be made that can be tested experimentally? The lack of exploration of this aspect of the results seems like a missed opportunity.

      In the Discussion, we wrote about how downstream circuits will need to learn which set of neurons are to be associated with each distinct concentration level. We will expand upon this point and include experimentally testable predictions.

      Based on the methods, for Figures 1 and 2, it appears the responses across time, trials, and odorants were averaged to get a single data point per neuron for each concentration. Would this averaging not severely dilute trends in the data? The one that particularly concerns me is the averaging across different odorants. If you do odor-by-odor analysis, is the flattening of second-order neural responses still observable? Because some odorants activate more globally and some locally, I would expect a wide variety of dose-response relationships that vary with odor identity (more compressed in second-order neurons, of course). It would be good to show some representative neural responses and show how the extracted values for each neuron are a faithful/good representation of its response variation across intensities.

      It appears there is some confusion here; we will clarify in the text and figure captions that we did not average across different odors in our analysis. We will also add figure panels showing some representative neural responses as suggested by the Reviewer.

      A lot of neurons seem to have responses that flat line closer to zero (both firing rate and dF/F in Figure 1). Are these responsive neurons? The mean dF/F also seems to hover not significantly above zero. Hence, I was wondering if the number of neurons is reducing the trend in the data significantly.

      Yes, if a neuron responds to at least one concentration level in at least 50% of the trials, it is considered responsive. So it is possible that some neurons respond to one concentration level and otherwise flatline near zero. We will highlight a few example neurons to visualize this scenario.
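
      The responsiveness criterion can be written down as a short sketch. It assumes that a per-trial response has already been detected and stored as a boolean; how that detection is done is not specified here, and `is_responsive` is a hypothetical name.

```python
def is_responsive(trial_responses, min_fraction=0.5):
    """trial_responses[c][t] is True if the neuron responded on trial t
    at concentration level c.

    The neuron counts as responsive if, for at least one concentration,
    it responded on at least `min_fraction` of the trials.
    """
    return any(
        sum(trials) / len(trials) >= min_fraction
        for trials in trial_responses
        if trials
    )

# Responds on 3 of 4 trials at the highest concentration only.
one_level_only = [
    [False, False, False, False],
    [False, False, False, False],
    [False, False, False, False],
    [True, True, False, True],
]
# Responds on 1 of 4 trials at each concentration: never reaches 50%.
scattered = [[True, False, False, False] for _ in range(4)]

print(is_responsive(one_level_only), is_responsive(scattered))
```

      The first neuron passes the criterion even though its response at the other three concentrations is flat, which is the scenario described above.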

      I did not fully understand the need to show the increase in the odor response across concentrations as a polar plot. I see potential issues with the same. For example, the following dose-response trend at four intensities (C4 being the highest concentration and C1 the lowest): response at C3 > response at C1 and response at C4 > response at C2. But response at C3 < response at C2. Hence, it will be in the top right segment of the polar plot. However, the responses are not monotonic with concentrations. So, I am not convinced that the polar plot is the right way to characterize the dose-response curves. Just my 2 cents.

      Your 2 cents are valuable! Thank you for raising this point. Instead of computing two slopes (C1-C3 and C2-C4), we will expand our analysis to include all three slopes (C1-C2, C2-C3, C3-C4). Consequently, there are 2^3 = 8 different response shapes, and we will list them and quantify the fraction of the responses that fall into each shape category.
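
      As a rough illustration of the three-slope categorization described above, the sketch below labels each dose-response curve by the sign of its C1-C2, C2-C3, and C3-C4 slopes and tallies the fraction of responses in each of the 2^3 = 8 shapes. The function names and the toy responses are hypothetical, and treating a tie as a non-increase is an assumption rather than something stated in the response.

```python
from collections import Counter
from itertools import product


def slope_signs(responses):
    """Sign of each consecutive slope (C1-C2, C2-C3, C3-C4).

    Assumption: a tie (equal responses) is labelled '-' (non-increasing).
    """
    return tuple("+" if b > a else "-" for a, b in zip(responses, responses[1:]))


def shape_fractions(population):
    """Fraction of dose-response curves falling into each of the 8 shapes."""
    counts = Counter(slope_signs(r) for r in population)
    n = len(population)
    return {shape: counts.get(shape, 0) / n for shape in product("+-", repeat=3)}


# Toy (made-up) responses at C1..C4. The first curve is the reviewer's
# example: C3 > C1 and C4 > C2, yet C3 < C2, so it is not monotonic.
population = [[1, 3, 2, 4], [1, 2, 3, 4]]
fractions = shape_fractions(population)
```

      With only two slopes (C1-C3 and C2-C4), both toy curves would land in the same quadrant; with three slopes, the non-monotonic curve is assigned its own category.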

      In many analyses, simulated data were used (Figures 3 and 4). However, there is no comparison of how well the simulated data fit the experimental data. For example, the Simulated 1st order neuron in Figure 3D does not show a change in rank-order for the first-order neuron. In Figure 3E, temporal response patterns in second-order neurons look unrealistic. Some objective comparison of simulated and experimental data would help bolster confidence in these results.

      We believe the Reviewer is referring to Figs. 4D and 4E, since Fig. 3D does not show a first-order neuron simulation, and there is no Fig 3E. In Fig. 4D there is no change of rank order because the simulation is for a single odor and single concentration level, and the change of rank-order (i.e., cross-overs) as we define occurs between concentration levels. We will clarify this in the manuscript.

      Reviewer #3:

      While the authors focus on concentration-dependent increases in first-order neuron activity, reflecting the majority of observed responses, recent work from the Imai group shows that odorants can also lead to direct first-order neuron inhibition (i.e., reduction in spontaneous activity), and within this subset, increasing odorant concentration tends to increase the degree of inhibition. Some discussion of these findings and how they may complement divisive normalization to contribute to the diverse second-order neuron concentration-dependence would be of interest and help expand the context of the current results.

      We thank the Reviewer for the suggestion. We will request datasets of first-order neuron responses from the groups who acquired them. We will analyze this data to determine the role of inhibition or antagonistic binding and quantify what percentage of first-order neurons respond less strongly with larger concentrations.

      Related to the above point, odorant-evoked inhibition of second-order neurons is widespread in mammalian mitral cells and significantly contributes to the flattened concentration-dependence of mitral cells at the population level. Such responses are clearly seen in Figure 1D. Some discussion of how odorant-evoked mitral cell inhibition may complement divisive normalization, and likewise relate to comparatively lower levels of odorant-evoked inhibition among tufted cells, would further expand the context of the current results. Toward this end, replication of analyses in Figures 1D and E following exclusion of mitral cell inhibitory responses would provide insight into the contribution of such inhibition to the flattening of the mitral cell population concentration dependence.

      We will perform the suggested analysis. Specifically, we will set the negative mitral cell responses to 0 and assess whether the population mean remains flat.
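
      A minimal sketch of this control, assuming responses are stored as one row per neuron and one column per concentration. The helper names and the toy numbers are hypothetical and only illustrate the comparison, not the actual data.

```python
def population_mean(responses):
    """Mean response across neurons at each concentration.

    responses[n][c] is the response of neuron n at concentration c.
    """
    n = len(responses)
    return [sum(col) / n for col in zip(*responses)]


def rectify(responses):
    """Set negative (inhibitory) responses to 0, leaving the rest unchanged."""
    return [[max(r, 0.0) for r in row] for row in responses]


# Toy (made-up) data: one excited and one increasingly inhibited neuron.
responses = [[1.0, 2.0, 3.0],
             [0.0, -1.0, -2.0]]

mean_raw = population_mean(responses)                # flat across concentrations
mean_rectified = population_mean(rectify(responses))  # no longer flat
```

      In this toy case the flat population mean depends entirely on the inhibitory responses. Whether the same holds for the recorded mitral cells is the question the proposed analysis would answer.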

      The idea of concentration-dependent crossover responses across the first-order population being required for divisive normalization to generate individually diverse concentration response functions across the second-order population is notable. The intuition of the crossover responses is that first-order neurons that respond most sensitively to any particular odorant (i.e., at the lowest concentration) respond with overall lower activity at higher concentrations than other first-order neurons less sensitively tuned to the odorant. Whether this is a consistent, generalizable property of odorant binding and first-order neuron responsiveness is not addressed by the authors, however. Biologically, one mechanism that may support such crossover events is intraglomerular presynaptic/feedback inhibition, which would be expected to increase with increasing first-order neuron activation such that the most-sensitively responding first-order neurons would also recruit the strongest inhibition as concentration increases, enabling other first-order neurons to begin to respond more strongly. Discussion of this and/or other biological mechanisms (e.g., first-order neuron depolarization block) supporting such crossover responses would strengthen these results.

      We thank the reviewer for providing additional mechanisms to consider. As suggested, we will add discussion of these alternatives to divisive normalization.

      It is unclear to what degree the latency analysis considered in Figures 4D-H works with the overall framework of divisive normalization, which in Figure 3 we see depends on first-order neuron crossover in concentration response functions. Figure 4D suggests that all first-order neurons respond with the same response amplitude (R in eq. 3), even though this is supposed to be pulled from a distribution. It's possible that Figure 4D is plotting normalized response functions to highlight the difference in latency, but this is not clear from the plot or caption. If response amplitudes are all the same, and the response curves are, as plotted in Figure 4D, identical except for their time to half-max, then it seems somewhat trivial that the resulting second-order neuron activation will follow the same latency ranking, regardless of whether divisive normalization exists or not. However, there is some small jitter in these rankings across concentrations (Figure 4G), suggesting there is some randomness to the simulations. It would be helpful if this were clarified (e.g., by showing a non-normalized Figure 4D, with different response amplitudes), and more broadly, it would be extremely helpful in evaluating the latency coding within the broader framework proposed if the authors clarified whether the simulated first-order neuron response timecourses, when factoring in potentially different amplitudes (R) and averaging across the entire response window, reproduces the concentration response crossovers observed experimentally. In summary, in the present manuscript, it remains unclear if concentration crossovers are captured in the latency simulations, and if not, the authors do not clearly address what impact such variation in response amplitudes across concentrations may have on the latency results. 
      It is further unclear to what degree divisive normalization is necessary for the second-order neurons to establish and maintain their latency ranks across concentrations, or to exhibit concentration-dependent changes in latency.

      As suggested by the Reviewer, we will add another simulation scenario where the response amplitudes (R) are different for different neurons. For each concentration, we will then average each neuron’s response across the entire response window and determine if the simulation reproduces the cross-overs as observed experimentally.
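
      One way such a check could be set up is sketched below: each neuron's simulated time course is averaged over the response window, and a cross-over is counted whenever a pair of neurons swaps its amplitude order between two concentrations. The function names and toy traces are hypothetical, and this pairwise definition of a cross-over is an assumption about how it would be quantified.

```python
from itertools import combinations


def window_mean(trace):
    """Average a single neuron's time course over the whole response window."""
    return sum(trace) / len(trace)


def count_crossovers(low, high):
    """Number of neuron pairs whose amplitude order flips between concentrations.

    low[i] and high[i] are neuron i's window-averaged responses at the
    lower and higher concentration. Ties are not counted as cross-overs.
    """
    return sum(
        1
        for i, j in combinations(range(len(low)), 2)
        if (low[i] - low[j]) * (high[i] - high[j]) < 0
    )


# Toy (made-up) time courses for three neurons at two concentrations.
traces_low = [[2, 4, 3], [1, 3, 2], [0, 2, 1]]
traces_high = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]

low = [window_mean(t) for t in traces_low]
high = [window_mean(t) for t in traces_high]
n_crossovers = count_crossovers(low, high)
```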

      How the authors get from Figure 4G to 4H is not clear. Figure 4G shows second-order neuron response latencies across all latencies, with ordering based on their sorted latency to low concentration. This shows that very few neurons appear to change latency ranks going from low to high concentration, with a change in rank appearing as any deviation in a monotonically increasing trend. Focusing on the high concentration points, there appear to be 2 latency ranks switched in the first 10 responding neurons (reflecting the 1 downward dip in the points around neuron 8), rather than the 7 stated in the text. Across the first 50 responding neurons, I see only ~14 potential switches (reflecting the ~7 downward dips in the points around neurons 8, 20, 32, 33, 41, 44, 50), rather than the 32 stated in the text. It is possible that the unaccounted rank changes reflect fairly minute differences in latencies that are not visible in the plot in Figure 4G. This may be clarified by plotting each neuron's latency at low concentration vs. high concentration (i.e., similar to Figure 4H, but plotting absolute latency, not latency rank) to allow assessment of the absolute changes. If such minute differences are not driving latency rank changes in Fig. 4G, then a trend much closer to the unity line would be expected in Figure 4H. Instead, however, there are many massive deviations from unity, even within the first 50 responding neurons plotted in Figure 4G. These deviations include a jump in latency rank from 2 at low concentration to ~48 at high concentration. Such a jump is simply not seen in Figure 4G.

      We apologize that Fig. 4H was a poor choice of visualization. What is plotted in Fig. 4H is the sorted identity of neurons under low and high concentrations, and points on the y=x line indicate that the two corresponding neurons have the same rank under the two concentrations. We will replace this panel with a more intuitive visualization in which the x and y axes are the ranks of the neurons, so that deviation from the y=x line indicates how much a neuron's rank differs between the two concentrations.

      In the text, the authors state that "Odor identity can be encoded by the set of highest-affinity neurons (which remains invariant across concentrations)." Presumably, this is a restatement of the primacy model and refers to invariance in latency rank (since the authors have not shown that the highest-affinity neurons have invariant response amplitudes across concentration). To what degree this statement holds given the results in Figure 4H, however, which appear to show that some neurons with the earliest latency rank at low concentration jump to much later latency ranks at high concentration, remains unclear. Such changes in latency rank for only a few of the first responding neurons may be negligible for classifying odor identity among a small handful of odorants, but not among 1-2 orders of magnitude more odors, which may feasibly occur in a natural setting. Collectively, these issues with the execution and presentation of the latency analysis make it unclear how robust the latency results are.

      The original primacy model states that the latency of a neuron decreases with increasing concentration, while the ranks of neurons remain unaltered. Our results, on the other hand, suggest that the ranks do at least partially change across concentrations. This leads to two possible decoding mechanisms. First, if the top K responding neurons remain invariant across concentrations (even if their individual ranks change within the top K), then the brain could learn to associate a population of K neurons with a response latency; lower response latency means higher concentration. Second, if the top K responding neurons do not remain invariant across concentrations, then the brain would need to learn to associate a different set of neurons with each concentration level. The latter imposes additional constraints on the robustness of the primacy model and the corresponding read-out mechanism. We will include more discussion of these possibilities in the revision.
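
      The distinction between the two scenarios can be made concrete with a small sketch that compares the set of the K earliest-responding neurons at two concentrations while ignoring their order within the set. The function names, the choice of K, and the toy latencies are hypothetical.

```python
def top_k_set(latencies, k):
    """Indices of the k neurons with the shortest response latency."""
    order = sorted(range(len(latencies)), key=lambda i: latencies[i])
    return set(order[:k])


def top_k_overlap(lat_low, lat_high, k):
    """Fraction of the top-k set shared between two concentrations.

    1.0 means the same k neurons respond first at both concentrations,
    even if their ranks within the set have changed.
    """
    return len(top_k_set(lat_low, k) & top_k_set(lat_high, k)) / k


# Toy (made-up) latencies in ms for four neurons.
lat_low = [10, 20, 30, 40]
lat_high = [8, 5, 35, 25]

overlap_k2 = top_k_overlap(lat_low, lat_high, 2)
overlap_k3 = top_k_overlap(lat_low, lat_high, 3)
```

      Here neurons 0 and 1 swap ranks but remain the two earliest responders, so the K = 2 set is invariant (the first scenario), whereas the K = 3 set changes membership (the second scenario).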

      Analysis in Figures 4A-C shows that concentration can be decoded from first-order neurons, second-order neurons, or first-order neurons with divisive normalization imposed (i.e., simulating second-order responses). This does not say that divisive normalization is necessary to encode concentration, however. Therefore, for the authors to say that divisive normalization is "a potential mechanism for generating odor-specific subsets of second-order neurons whose combinatorial activity or whose response latencies represent concentration information" seems too strong a conclusion. Divisive normalization is not generating the concentration information, since that can be decoded just as well from the first-order neurons. Rather, divisive normalization can account for the different population patterns in concentration response functions between first- and second-order neurons without discarding concentration-dependent information.

      We agree that the word “generating” is faulty. We thank the reviewer for their more precise wording, which we will adopt.

      Performing the same polar histogram analysis of tufted vs. mitral cell concentration response functions (Figure 5B) provides a compelling new visualization of how these two cell types differ in their concentration variance. The projected importance of tufted cells to navigation, emerging directly through the inverse relationship between average concentration and distance (Figure 5C), is not surprising, and is largely a conceptual analysis rather than new quantitative analysis per se, but nevertheless, this is an important point to make. Another important consideration absent from this section, however, is whether and how divisive normalization may impact tufted cell activity. Previous work from the authors, as well as from Schoppa, Shipley, and Westbrook labs, has compellingly demonstrated that a major circuit mediating divisive normalization of mitral cells (GABA/DAergic short-axon cells) directly targets external tufted cells, and is thus very likely to also influence projection tufted cells. Such analysis would additionally provide substantially more justification for the Discussion statement "we analyzed an additional type of second-order neuron (tufted cells)", which at present instead reflects fairly minimal analysis.

      We agree that tufted cells are subject to divisive normalization as well, albeit probably to a lesser degree than mitral cells. To determine the effect of this, we will alter the strength of divisive normalization (and the degree of sparseness of interglomerular interactions) and determine whether there is a regime in which the response features of tufted cells match those observed experimentally.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript investigates beta burst dynamics in the primate motor cortex during movement and recovery from stroke. The authors differentiate between "global" beta bursts, which are synchronous across cortical and often subcortical regions, and more spatially confined "local" bursts. Global bursts are associated with reduced spiking variability, slower movements, and are more frequent after stroke, while local bursts increase during recovery and grasp execution. The study provides compelling evidence that beta bursts with different spatial and temporal characteristics may play distinct roles in motor control and recovery.

      We thank the reviewer for their assessment that the manuscript provides compelling evidence for distinct roles of local and global beta bursts in motor control and recovery.

      Strengths:

      The major strength of this paper lies in its conceptual advance: the identification and characterization of distinct global and local beta bursts in the primate motor cortex. This distinction builds upon and considerably extends previous work on the heterogeneity of beta bursts. The paper is methodologically rigorous, using simultaneous cortical and subcortical recordings, detailed behavioral tracking, and thorough analyses of spike-LFP interactions. The use of stroke models and neurotypical animals provides converging evidence for the functional dissociation between burst types. The observation that local bursts increase with motor recovery and occur during grasping is particularly novel and may prove valuable for developing biomarkers of motor function.

      We thank the reviewer for recognizing the strengths of this manuscript. 

      Weaknesses:

      There are several conceptual and methodological limitations that should be addressed. First, the burst detection method relies on an amplitude threshold (median + 1 SD), which is susceptible to false positives and variability (Langford & Wilson, 2025). The classification into global or local bursts then depends on the number of co-bursting channels, compounding the arbitrariness. Second, the imposition of a minimum of three co-bursting cortical channels may bias against the detection of truly local bursts. 

      We thank the reviewer for bringing up these methodological details. We plan to conduct a follow-up analysis using alternative burst detection methods to verify that the paper’s main results hold when using different burst detection methodologies. We anticipate this will improve confidence in our results. 
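
      For orientation, the thresholding scheme under discussion can be sketched as follows. This is only an illustration under stated assumptions: the beta amplitude envelope of each channel is taken as given, the population standard deviation is used, and `global_min` (the number of co-bursting channels at which a burst is called global) is a hypothetical parameter whose value is not given in this exchange. Only the median + 1 SD threshold and the minimum of three co-bursting channels come from the text above.

```python
from statistics import median, pstdev


def burst_mask(envelope, n_sd=1.0):
    """Mark samples whose beta amplitude exceeds median + n_sd * SD.

    Assumption: `envelope` is a precomputed beta amplitude envelope for
    one channel, and the population SD is used.
    """
    threshold = median(envelope) + n_sd * pstdev(envelope)
    return [a > threshold for a in envelope]


def classify_samples(envelopes, global_min, local_min=3):
    """Label each time sample by how many channels burst simultaneously.

    global_min is a hypothetical cutoff; local_min=3 follows the minimum
    of three co-bursting cortical channels mentioned by the reviewer.
    """
    masks = [burst_mask(e) for e in envelopes]
    labels = []
    for n_bursting in (sum(col) for col in zip(*masks)):
        if n_bursting >= global_min:
            labels.append("global")
        elif n_bursting >= local_min:
            labels.append("local")
        else:
            labels.append("none")
    return labels


# Toy (made-up) envelopes: three channels burst together at the last sample.
channels = [[1, 1, 1, 1, 5]] * 3 + [[1, 1, 1, 5, 1]]
labels = classify_samples(channels, global_min=4)
```

      Because both the amplitude threshold and the channel-count cutoffs are free parameters in a scheme like this, repeating the analysis over a range of values is one straightforward way to test how sensitive the global vs. local distinction is to them.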

      Third, the classification is entirely cortical; subcortical activity is considered post hoc rather than integrated into the classification, despite the key role of subcortical-cortical synchrony in motor control. 

      We thank the reviewer for this comment. First, because the animals had subcortical recording sites in different locations, we hesitated to use subcortical activity in the classification of bursts, since we could not be sure that we would be identifying the same burst phenomenon (e.g., thalamo-cortical bursts vs. capsule-cortical bursts may differ). Second, we believe that having a cortical-only criterion allows the designation of local vs. global bursts to be more widely applied in preparations that only have access to cortical data (e.g., surface ECoG recordings, EEG, Utah array recordings). Thus, in this study we chose to analyze the subcortical data post hoc (after burst detection and classification) to support our “global” vs. “local” designation of burst types.

      Fourth, the apparent dissociation between global and local bursts raises important questions about their spatial distribution across areas like M1 and PMv, which are not thoroughly analyzed. 

      We thank the reviewer for this comment. In our study’s stroke animals, we chose to study PMv because of its role in compensating for damage to M1; we therefore hesitate to make any comparisons between PMv (recorded in stroke animals) and M1 (recorded in healthy, unimpaired animals). Furthermore, the animals performed different tasks (e.g., reaching vs. reaching and grasping), which may also influence the spatial distribution. We agree that future work should investigate the spatial distribution of global vs. local beta bursts across areas of sensorimotor cortex and subcortex, and that this comparison would be best done in healthy animals performing both reaching and grasping behaviors.

      Finally, while the authors interpret local bursts during grasping as novel, similar findings have been reported (e.g., Szul et al., 2023; Rayson et al., 2023), and a deeper discussion of these precedents would strengthen the argument.

      Thank you for these references! We will review them and incorporate them into our discussion of our results. 

      Impact:

      This work is likely to have a substantial impact on the field of motor systems neuroscience. The distinction between global and local beta bursts offers a promising framework for understanding the dual roles of beta in motor inhibition and sensorimotor computation. The findings are relevant not only for basic research but also for translational efforts in stroke rehabilitation and neuromodulation, particularly given the emerging interest in beta burst-based biomarkers and stimulation targets. The dataset and analytical framework will be useful to researchers investigating beta dynamics, spike-field relationships, and recovery from neural injury.

      We thank the reviewer for their assessment that our work will likely have a substantial impact on the field of motor systems neuroscience.

      Reviewer #2 (Public review):

      Summary:

      The paper by Khanna et al. describes global vs local beta synchrony between a cortical premotor area (PMv) and subcortical structures during motor tasks in the non-human primate, specifically investigating the progression following M1 injury. They found that global beta synchrony between PMv and subcortical structures increased during the sub-acute phase of injury, and that global synchrony was associated with relatively slower motor movements. As recovery progressed, they report a shift from global synchrony to local synchrony and a subsequent reduction in the movement time. The authors suggest that global changes in subcortical and cortical beta synchrony may generally underpin a variety of movement disorders, including Parkinson's disease, and that shifting from global to local (or reducing global synchrony) might improve functional outcomes.

      Strengths:

      Ischemic insults and other acquired brain injuries have a significant public health impact. While there is a large body of clinical and basic science studies describing the behavioral, neurophysiological, and mechanistic outcomes of such injury, there is a significant lack of studies looking at longitudinal, behaviorally-related neurophysiological measures following cortical injury, so any information has an outsized contribution to understanding how brain injury disrupts underlying neural activity and how this may contribute to injury presentation and recovery.

      A significant percentage of pre-clinical stroke studies tend to focus on peri-infarct or other cortical structures and their role in recovery. The addition of subcortical recordings, which allows for the investigation of the role of thalamo-basal gangliar-cortical loops that may be contributing to the degree of impairment or to the recovery process, is important for the field. Here, there are longitudinal (up to 3 months post-injury) recordings in the ventral premotor area (PMv) and either the internal capsule or sensorimotor thalamus that can be synchronized with phases of behavioral recovery.

      The methods are well described and can act as a framework for assessing synchrony across other data sets with similar recording locations. Limitations in methodology, recordings, and behavior were noted.

      We thank the reviewer for their comments on the strengths of this paper.  

      Weaknesses:

      A major limitation of this paper is that it is a set of case studies rather than a well-designed, well-controlled study of beta synchrony following motor cortex injury. While non-human primate neurophysiological studies are almost always limited by extremely low animal numbers, they are made up for by the fact that they can acquire significant numbers of units or channels, and in the case of normal behavior, can obtain many behavioral trials over months of individual sessions. Here, there were two NHPs used, but they had different subcortical implant locations (thalamus vs internal capsule). They had different injury outcomes, with one showing a typical recovery curve following injury while one had complications and worsening behavior before ultimately recovering. Further, there were significant differences in the ability to record at different times, with one NHP having poor recordings early in the recovery process while one had poor recordings late in the process. Due to the injury, the authors report sessions in which they were not able to record many trials (~10). Assuming that recovery after a cortical injury is an evolving process, breaking analysis into "Early" and "Late" phases reduces the interpretation of where these shifts occur relative to recovery on the task, especially given different thresholds for recovery were used between animals. Because of this, despite a careful analysis of the data and an extensive discussion, the conclusions derived are not particularly compelling. To overcome this, the authors present data from neurotypical NHPs, but with electrodes in M1 rather than PMv, doing a completely different task with no grasping component, again making accurate conclusions about the results difficult. Even with low numbers, the study would have been much stronger if there were within-animal longitudinal data prior to and after the injury on the same task, so the impact of M1 injury could be better assessed.

      We thank the reviewer for these comments. Below we address some of these in more detail: 

      Different subcortical implant locations: We would like to clarify that the subcortical recordings were only used to confirm that global beta bursts (as characterized by cortical recordings alone) did indeed coincide with bursts at subcortical sites more frequently than local beta bursts did. Neither the beta burst categories nor the beta bursts themselves were influenced by the subcortical recordings.

      Different injury outcomes: It is difficult to create strokes that result in identical deficits across animals, as we and others have noted in previous work [1-3]. As a field, we are still working to understand what factors give rise to variability in recovery curves. For example, one recent study noted that biological sex is a factor in predicting differences in recovery rates [4], and another noted that baseline white matter hyperintensities are also predictive of post-stroke recovery [5]. Overall, our methodology, which creates structurally consistent lesions, can still result in very different functional outcomes depending on a variety of factors. Given this state of the field, we have done our best to match the recovery curves between our two animals, especially the initial recovery curves before Monkey H’s secondary decline.

      Differences in ability to record at different times: We note this as a strength. One concern with studies that induce stroke at the same time as implanting electrode arrays is that single-unit yield is known to be low right after array implantation and to improve in the following weeks [6]. There is always the concern that having more units later in recovery may drive results, but in this case, since one animal showed the opposite trend, we are more confident that the results are not driven by increases in unit yield. We also note that we broadly see similar unit quality metrics in the early and late stages in both animals (Fig. S7).

      Breaking continuous recovery curve into early and late: We note that this division was only made for one main analysis in the paper (Fig. 5CD): assessment of mean firing and variance of single-unit firing rates.  Without this split our analyses would be underpowered and inconclusive, thus we would not be able to provide any comment on how firing rates change, even coarsely, with recovery. 

      Presentation of data from M1 of healthy animals doing a different task: We agree that the strongest data would be longitudinally recorded from the same animals/brain areas pre-stroke and then post-stroke. However, we also view our inclusion of separate healthy animals doing a different task as evidence that our global vs. local segregation of beta bursts generalizes beyond the reach-to-grasp task to reaching-only tasks.  

      Overall, we appreciate the reviewer pointing out these notes about our data. In some cases we do not think these notes are concerning; in others, we acknowledge that we have done the best we can given the state of the neurophysiology stroke recovery field.

      It is unclear to what extent the subpial aspiration used is a stroke model. While it is much more difficult to perform a pure ischemic motor injury using electrocoagulatory methods in animal models that do not have a lissencephalic cortex, the suction ablation method that the authors use leads to different outcomes than an ischemic injury alone. For instance, in rat models, ischemic vs suction ablation leads to very different electrophysiological profiles and differences in underlying anatomical reorganization (see Carmichael and Chesselet, 2002), even if the behavioral outcomes were similar. There is a concern that the effects shown may be an artifact of the lesion model rather than informing underlying mechanisms of recovery.

      We thank the reviewer for bringing this up. 

      Clarification of our stroke model methodology: We wish to highlight that when we create a stroke, the first step is surface vessel occlusion. This is designed to match true ischemic injury. After a waiting period, the injured tissue is then aspirated to reduce the effects of edema and secondary mass effect in the model.

      Carmichael and Chesselet 2002: The rodent work cited did show differential effects of a suction ablation method (without any surface vessel occlusion first) versus an ischemic method. The effects observed in this work were in the first 5 days following stroke. In our case, we started recording on day 7 and examined recovery over extended periods (weeks to months). 

      Effects of acute insult on rehabilitation: From a rehabilitation perspective, it remains unclear how the acute insult affects outcomes weeks and months later. One line of evidence suggesting that the manner in which the acute insult occurs may not matter for rehabilitation is the observation that one therapeutic approach (vagus nerve stimulation) has been found to successfully improve rehabilitation outcomes in a range of injury models (intracranial hemorrhage, stroke, spinal cord injury). We agree that additional work is required in this area.

      Human stroke data show similar results: Lastly, we note that neurophysiology performed in humans with clinical strokes supports the results we see here (e.g., [7]; see the Discussion section for full elaboration), suggesting that our stroke model methodology is similar enough to clinical stroke to produce similar findings.

      The injury model leads to seemingly mild impairments in grasp (but not reach), with rapid and complete recovery occurring within 2-3 weeks from the time of injury. Because of the rapid recovery, relating the physiological processes of recovery to beta synchronization becomes challenging to interpret - Are the global bursts the result of the loss of M1 input to subcortical structures? Are they due to the lack of M1 targets, so there is a more distributed response? Is this due to other post-injury sub-acute mechanisms? How specific is this response - is it limited to peri-infarct areas (and to what extent is the PMv electrode truly in peri-infarct cortex), or would this synchrony be seen anywhere in the sensorimotor networks? Are the local bursts present because global synchrony wanes over time as a function of post-injury homeostatic mechanisms, or is local beta synchrony increasing as new motor plans are refined and reinforced during task re-acquisition? How closely are they coupled to recovery - if it is motor plan refinement, the shift from global to local seemingly should lag the recovery?

      We think these are all wonderful questions that could be addressed in follow-up studies! 

      While the study has significant limitations in design that reduce the impact of the results, it should act as a useful baseline/pilot data set in which to build a more complete picture of the role of subcortical-cortical beta synchrony following cortical injury.

      We agree that this is a study that should be treated as a starting point for further investigation. 

      Reviewer #3 (Public review):

      Summary:

      Khanna et al. use a well-conceived and well-executed set of experiments and analyses primarily to document the interaction between neural oscillations in the beta range (here, 13-30 Hz) and recovery of function in an animal model of stroke. Specifically, they show that cortical "beta bursts", or short-term increases in beta power, correlate strikingly with the timeline of behavioral recovery as quantified with a reach-to-grasp task. A key distinction is made between global beta bursts (here, those that synchronize between cortical and subcortical areas) and local bursts (which appear on only a few electrodes). This distinction of global vs. local is shown to be relevant to task performance and movement speed, among other quantities of interest.

      A secondary results section explores the relationship between beta bursts and neuronal firing during the grasp portion of the behavioral task. These results are valuable to include, though mostly unsurprising, with global beta in particular associated with lower mean and variance in spike rates.

      Last, a partial recapitulation of the primary results is offered with a neurologically intact (uninjured) animal. No major contradictions are found with the primary results.

      Highlights of the Discussion section include a thoughtful review of atypical movements executed by individuals with Parkinson's disease or stroke survivors, placing the current results in an appropriate clinical context. Potential physiological mechanisms that could account for the observed results are also discussed effectively.

      Strengths:

      Overall, this is a very interesting paper. The ultimate impact will be enhanced by the authors' choice to analyze beta bursts, which remain a relatively under-explored aspect of neural coding.

      The reach-and-grasp task was also a well-considered choice; the combination of a relatively simple movement (reaching towards a target in the same location each time) and a more complex movement (a skilled object-manipulation grasp) provides an internal control of sorts for data analysis. In addition, the task's two sub-movements provide a differential in terms of their likelihood to be affected by the stroke-like injury: proximal muscles (controlling reach) are likely to be less affected by stroke, while distal muscles (controlling grasp) are highly likely to be affected. Lastly, the requirement of the task to execute an object lift maximizes its difficulty and also the potential translational impact of the results on human injury.

      The above comments about the task exemplify a strength that is more generally evident: a welcome awareness of clinical relevance, which is in evidence several times throughout the Results and Discussion.

      Weaknesses:

      The study's weaknesses are mostly minor and, for the most part, correctable.

      One concern that may not be correctable in this study: the results about the spatial extent of beta activity seem constrained by relatively poor-quality data. It seems half or more of the electrodes are marked as too noisy to provide useful data in Figure 3. If this reflects the wider reality for all analyses, as mentioned, it may not be correctable for the present study. In that case, perhaps some of the experiments or analyses can be revisited or expanded for a future study, when better electrode yields are available.

      We thank the reviewer for their comments. We note that we have chosen to be particularly conservative with which channels we considered noise-free and acceptable for analysis as our animals were not head-posted (see methods: “On each day, trials were manually inspected alongside camera data for any movement or chewing artifacts (note that animals were not head-posted) and were discarded from neural data analysis if there were any artifacts”). After re-visiting our analysis, we note that the data shown in Fig. 3 (spatial distribution of local bursts) is not representative from a data quality perspective – this data was from a session that had a particularly large number of channels discarded due to artifacts. We plan to correct this to show a more representative figure. 

      Other concerns:

      In some places, there is a lack of clarity in the presentation of the results. This is not serious but should be addressed to aid readers' comprehension.

      We thank the reviewer for this comment and for their numerous suggestions in the notes to the authors. We plan to address as many of these as we can to improve clarity and comprehension.  

      Lastly, given the central role of beta oscillations within the study, it would be better for completeness to include even a brief exploration of sustained beta power (rather than bursts), and the modulation of sustained beta (or lack thereof) in the study's areas of concern: behavioral recovery, task performance, etc.

      We thank the reviewer for this suggestion – we plan to include this in our revisions.  

      References cited in response to public reviewer comments: 

      (1) Ganguly, K., Khanna, P., Morecraft, R. J. & Lin, D. J. Modulation of neural co-firing to enhance network transmission and improve motor function after stroke. Neuron 110, 2363–2385 (2022).

      (2) Khanna, P. et al. Low-frequency stimulation enhances ensemble co-firing and dexterity after stroke. Cell 184, 912-930.e20 (2021).

      (3) Darling, W. G. et al. Sensorimotor Cortex Injury Effects on Recovery of Contralesional Dexterous Movements in Macaca mulatta. Exp Neurol 281, 37–52 (2016).

      (4) Bottenfield, K. R. et al. Sex differences in recovery of motor function in a rhesus monkey model of cortical injury. Biology of Sex Differences 12, 54 (2021).

      (5) Schwarz, A. et al. Association that Neuroimaging and Clinical Measures Have with Change in Arm Impairment in a Phase 3 Stroke Recovery Trial. Ann Neurol 97, 709–719 (2025).

      (6) Gulati, T. et al. Robust Neuroprosthetic Control from the Stroke Perilesional Cortex. J. Neurosci. 35, 8653–8661 (2015).

      (7) Silberstein, P. et al. Cortico-cortical coupling in Parkinson’s disease and its modulation by therapy. Brain 128, 1277–1291 (2005).

    1. Note: This response was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      In Arabidopsis, DNA demethylation is catalyzed by a family of DNA glycosylases including DME, ROS1, DML2, and DML3. DME activity in the central cell leads to the hypomethylation of maternal alleles in endosperm. While ROS1, DML2, and DML3 function in vegetative tissues to prevent DNA methylation from spreading from TE boundaries, their function in the endosperm was unclear.

      Using whole genome methylome analysis, the authors showed that ROS1 prevents hypermethylation of paternal alleles in the endosperm and thus promotes epigenetic symmetry between maternal and paternal genomes.

      The approach and experimental designs are appropriate, and the key conclusions are adequately supported by the results.

      However, there is not sufficient evidence to support the claim that DME demethylates the maternal allele at ROS1-dependent biallelically-demethylated regions. To clarify the issue, the authors could analyze whether there is an overlap between DMRs identified in ros1 endosperm and those identified in dme endosperm using published data. If there is any, the authors could show a genome browser example of a DMR including dme data.

      Response: Thank you for your insight on our work. To address your concern and further test our model that DME prevents methylation of the maternal allele at regions where ROS1 prevents methylation of the paternal allele, we turned to the allele-specific bisulfite-sequencing data published in Ibarra et al 2012. These data were from endosperm isolated at 7-8 DAP from aborting seeds of dme-2 +/- (Col-gl) plants pollinated by Ler. Our analysis of these data is now included in Figures 6 and 7 and Supplemental Figures 13-17. We show that when the loss-of-function allele dme-2 is inherited maternally, average methylation of the maternal allele increases at ROS1-dependent regions (in the revised version of the paper now referred to as ROS1 paternal, DME maternal regions) from less than 10% CG methylation to approximately 40% CG methylation (Fig. 6D), consistent with our previous analysis using the non-allelic Hsieh et al 2009 data (now moved to Supplemental Figure 15). These results thus provide additional evidence that DME removes maternal allele methylation at regions where ROS1 removes paternal allele methylation (compare Fig. 6B and 6D). We included relevant genome browser examples in Figure 7E and Supplemental Figure 14. In the revised version, the relationship between ROS1 and DME is further expanded upon in the text.

      Reviewer #1 (Significance):

      Endosperm is a tissue unique to flowering plants. Though it is an ephemeral tissue, the endosperm plays essential roles in seed development and germination. The endosperm is also the site where genomic imprinting occurs, and it has a distinct epigenomic landscape. This work provides a new insight that ROS1 may antagonize imprinted gene expression in the endosperm. However, it was not shown whether imprinted gene expression is indeed affected in ros1, or whether the ros1 mutation has phenotypic consequences. These results would be useful to discuss the evolution and significance of genomic imprinting.

      Response: We agree that the biological significance of ROS1-mediated paternal allele demethylation is presently unknown. We performed RNA-seq on wild-type and ros1 3C and 6C endosperm nuclei, but these data were unfortunately not of high enough quality to include in the manuscript. In the Discussion we suggest that disrupting ROS1-mediated paternal allele demethylation might lead to a gain of imprinting over evolutionary time. In future work we are planning to address potential relationships to gene imprinting using a molecular, RNA-sequencing approach as well as an evolutionary comparative approach. As expected, given that imprinted genes are associated with a parent-of-origin-specific epigenetic mark, we did not find any relationship between known imprinted genes and ROS1-dependent regions that are biallelically demethylated in wild-type endosperm (see lines 362-372).

      Reviewer #2 (Evidence, reproducibility and clarity):

      SUMMARY

      Hemenway and Gehring present evidence that the paternal genome in Arabidopsis endosperm is demethylated at several hundred loci by the DNA glycosylase/lyase ROS1. The evidence is primarily based on analysis of DNA methylation of ros1 mutants and of hybrid crosses where each parental genome can be differentiated by SNPs. I have some comments/questions/concerns, two of them potentially serious, but I think Hemenway and Gehring can address them through additional analyses of data that they already have available and a bit of clarification in writing.

      Response: Thank you for your thoughtful review of this study. Your insight and suggestions have helped add clarity to the paper.

      MAJOR COMMENTS:

      1. Could the excess methylation in ros1-3 relative to ros1-7 shown in Figures 1A and 1C be explained by a second mutation in the ros1-3 background that elevates methylation at some loci? Any mutation that increased RdDM at these loci, for example could have this effect. This could confound the identification and interpretation of biallelicly demethylated loci.

      Response: We propose a simpler explanation for the additional hypermethylation observed in ros1-3: ros1-3 is a loss-of-function (null) allele whereas ros1-7 is likely a hypomorphic allele. For clarity, we have added a diagram of all of the alleles used in this study as Supplemental Figure 1B. The ros1-3 allele was first described in Penterman et al, PNAS, 2007. It is a T-DNA insertion allele that was isolated in the Ws accession and then backcrossed 6 times to Col-0, greatly minimizing the risk of unlinked secondary mutations being present. There is no genetic evidence that there is another T-DNA insertion in this line. The ros1-7 allele was described in Williams et al, PLoS Genet, 2015. It was isolated from the Arabidopsis Col-0 TILLING population and is a missense mutation (E956K) in a residue in the glycosylase domain that is conserved among the four DNA glycosylases. It is known that ROS1 transcripts are produced from the ros1-7 allele (Williams et al 2015). We observe less hypermethylation in the ros1-7 background compared to the ros1-3 background, and thus propose that the ros1-7 allele is a hypomorphic allele of ROS1. The use of two independent ros1 mutant alleles for initial endosperm methylation profiling strengthens the findings of our study. Importantly, regions that are hypermethylated in ros1-3 are also hypermethylated in ros1-7, but to a lesser extent, and vice versa (Fig 1D, Supplemental Figs. 3 and 4).

      We also use a third allele in this study, ros1-1, which is a nonsense allele in the C24 accession. Notably, we find that regions that are demethylated on both maternal and paternal alleles in wild-type C24 gain DNA methylation primarily on the paternal allele in ros1-1 endosperm (Figure 4C,D and Supplemental Figure 10). This is discussed further in response to your second point.

      Given these lines of evidence, a gain-of-function mutation in a methylation pathway, like RdDM, in the ros1-3 background is an unlikely explanation for increased hypermethylation compared to ros1-7. The use of three independent ros1 alleles for methylation profiling, all of which lead to the same conclusions, is a major strength of our study.

      1. It appears that the main focus of the manuscript, the existence of loci that are paternally demethylated by ROS1, is supported by a set of 274 DMRs. This is a small number relative to the size of the genome and raises suspicions of rare false positives. Even the most stringent p-values that DMR-finding tools report do not guarantee that the DMRs are actually reproducible in an independent experiment. Demonstrating overlap between these 274 DMRs and an independently defined set using a different WT control and different ros1 allele would suffice to remove this concern. It appears that authors already have the needed raw data with ros1-1 and ros1-7 alleles.

      Response: First, we should clarify that paternal demethylation by ROS1 is supported by more than the 274 DMRs. All ros1 CG hyperDMRs show an increase in paternal allele methylation in ros1 (Fig. 4B,D). The 274 DMRs are a distinct subset defined as having less methylation on the maternal allele than the paternal allele in ros1 endosperm and where there is no maternal allele hypomethylation in wild-type endosperm (refer to Fig. 5B).

      We agree with your sentiments about DMR-finders and we are cautious of relying exclusively on DMR calls when making conclusions. We verify the nature of identified DMRs using metaplots and weighted average comparisons throughout the paper, which we think increases confidence in the conclusions and goes beyond a simple DMR-calling approach.
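
      For readers unfamiliar with the term, a "weighted average" of methylation over a region is commonly computed by pooling read counts: the methylated read counts are summed over all cytosines in the region and divided by the summed total read counts. The following is a minimal sketch under that assumption. The function name and the input layout (one pair of methylated and total read counts per CG site) are hypothetical and are not taken from the manuscript's pipeline.

```python
def weighted_methylation(sites):
    """Weighted methylation level of a region.

    `sites` is an iterable of (methylated_reads, total_reads) pairs,
    one pair per cytosine. Returns None if the region has no coverage.
    """
    methylated = 0
    total = 0
    for m, t in sites:
        if m < 0 or t < m:
            raise ValueError("need 0 <= methylated_reads <= total_reads")
        methylated += m
        total += t
    if total == 0:
        return None
    return methylated / total


# Hypothetical region with three CG sites (made-up counts).
region = [(1, 10), (0, 20), (9, 10)]
level = weighted_methylation(region)
```

      Pooling counts in this way keeps a site with few reads from contributing as much as a deeply covered site, which is why it can differ from the plain mean of per-site fractions.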

      We argue that we have replicated the major conclusion of the paper, that ROS1 prevents paternal allele hypermethylation at target regions in the endosperm, in the following ways:

      1. In the dataset without allele-specific methylation information (Figures 1-3), we found that both ros1-3 and ros1-7 CG hyperDMRs have a limited capacity for hypermethylation in the endosperm relative to leaf or sperm (Table 1, Fig 3, Supplemental Fig. 4). In the allele-specific dataset, ros1-3 CG hyperDMRs were revealed to have particularly low maternal mCG relative to paternal mCG in ros1 mutant endosperm (Fig 4A-B, Supplemental Fig. 10).
      2. We found that ros1-3 and ros1-1 hyperDMRs, which we identified using non-allelic data, are biased for paternal allele hypermethylation in the endosperm of F1 hybrids (Fig 4B,D). The replicability of the paternal bias in hypermethylation in both ros1-3 in the Col-0 ecotype and ros1-1 in the C24 ecotype is a critical result, and we have moved the ros1-1 hyperDMR plots from the supplement to main Figure 4C-D in the revised version of the manuscript as a result of your comment.
      3. The 274 DMRs identified as “biallelically-demethylated, ROS1-dependent” are by definition replicated between reciprocal cross directions. (Note that we now refer to these regions as ROS1 paternal, DME maternal regions in the revision.) Regions in this category had to be called as maternally-hypomethylated in both ros1-1 x ros1-3 and ros1-3 x ros1-1 endosperm. These regions also had to not be identified as maternally-hypomethylated in both C24 x Col-0 and Col-0 x C24. We hope this is clarified for readers by Table 1, which we have included based on your suggestion in comment #3, as well as other clarifying edits we made in this section of the paper.

      […] comparisons between maternal and paternal methylation in endosperm, DMRs defined by comparison between mutants and wildtype, and more. These need clearer descriptions of which sets are being referred to throughout the main text and in figure legends. A table summarizing them might help (not in the supplement). Use of consistent and precisely defined terms would help. Stating the number of DMRs along with the name for each set would help a lot, even though this would make for some redundancy. (The number of DMRs in each set not only helps with interpretation but also acts as a sort of ID). The reason I put this as a major concern is because the text and figures are difficult to understand, and it is currently hard to evaluate both the results and the authors' conclusions from those results.

      Response: Thank you for your feedback and suggestions. We have edited the main text so that only one descriptive name is used for each DMR type throughout the paper. We have also renamed regions for greater clarity. The previous “ROS1-independent, maternally demethylated regions” are now referred to as “DME maternal regions”. The previous “ROS1-dependent, biallelically-demethylated regions” are now referred to as “ROS1 paternal, DME maternal regions”. These changes provide greater clarity and also emphasize the role of DME at regions that are paternally hypermethylated in ros1. We have added Table 1 to summarize the DMR classes of interest.
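
      The definition of the ROS1 paternal, DME maternal class given in the earlier response (called as maternally hypomethylated in both reciprocal ros1 crosses, and not called in both reciprocal wild-type crosses) can be read as set logic over DMR calls. The following is a minimal sketch, not the authors' code. It assumes each cross yields a set of region identifiers, and it reads "not identified in both" wild-type crosses as "absent from the intersection of the two wild-type call sets"; the function name, argument names, and region identifiers are all hypothetical.

```python
def ros1_paternal_dme_maternal(ros1_cross_a, ros1_cross_b, wt_cross_a, wt_cross_b):
    """Regions called maternally hypomethylated in both reciprocal ros1
    crosses, minus regions called in both reciprocal wild-type crosses."""
    in_both_mutant = set(ros1_cross_a) & set(ros1_cross_b)
    in_both_wt = set(wt_cross_a) & set(wt_cross_b)
    return in_both_mutant - in_both_wt


# Made-up region identifiers, for illustration only.
ros1_a = {"r1", "r2", "r3"}   # e.g. ros1-1 x ros1-3
ros1_b = {"r2", "r3", "r4"}   # e.g. ros1-3 x ros1-1
wt_a = {"r3", "r5"}           # e.g. C24 x Col-0
wt_b = {"r3"}                 # e.g. Col-0 x C24
selected = ros1_paternal_dme_maternal(ros1_a, ros1_b, wt_a, wt_b)
```

      Requiring a call in both reciprocal crosses is what makes each region replicated by construction. A stricter reading would exclude regions called in either wild-type cross, which amounts to replacing the wild-type intersection with a union.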

      MINOR COMMENTS

      1. The sRNA results in Figure 2B are difficult to interpret because they do not reveal anything about the number of TEs that have siRNAs overlapping them or their flanks. While the magnitude of some of the highest endosperm sRNA peaks is higher than the embryo peaks, that could be explained by a small number of TEs with large numbers of sRNAs. To make this result more interpretable, we also need some information about how many TEs have a significant number of sRNAs associated with them in endosperm and embryo in each region (e.g., middle, 5', 3', and flanks of TEs). What a "significant number of sRNAs" is would be up to the authors to decide based on the distribution of sRNA counts they observe for TEs. Perhaps the top quartile of TEs? Combined with the same analysis done in parallel with non-ROS1 target TEs, this would reveal whether there is any evidence for ROS1 counteracting sRNA-driven methylation spread from TEs.

      Response: Thank you for the suggestion. We now present these data and the data for individual TEs underlying the metaplots in Supplemental Figure 7. As suggested by the reviewer, ROS1 TEs do not have uniformly higher levels of sRNA in their flanks in the endosperm compared to the embryo. We have modified our interpretations accordingly.

      1. The statement "we are likely underestimating the true degree of differential methylation among genotypes" should be validated and partially quantified using a methylation metaplot like Figure 2A, but substitute DMRs for TEs. Related to that, Figure 1B needs an indicator of scale in bp.

      Response: We have now included a methylation metaplot over ros1-3 hyperDMRs and ros1-7 hyperDMRs as Supplemental Figure 3. These plots show that indeed there is additional hypermethylation in DMR-proximal regions. We have added a scale bar to Figure 1B and other browser examples in the paper.

      1. The statement "Over half of ROS1 target regions identified in the ros1-3 mutant endosperm were within 1 kb or intersecting a TE (Fig. 1D)" is hard to interpret without some kind of ROS1 non-target regions or whole-genome control comparison. How different are the numbers in Fig. 1D from a random expectation?

      Response: We have now included a control for random regions in Figure 1E. We define these as regions where there was sufficient methylation data coverage and a low enough methylation level in wild-type to detect hypermethylation if it existed.

      1. The sentence at line 262 is confusing. Is the comparison between dme mutant and ros1 mutant or between different types of regions? And it appears that the comparison value is missing in the "3-5% CG methylation gain..." e.g., "3-5% CG methylation vs 10-20%" or something like that.

      Response: This section has been re-written as we now focus on allele-specific dme endosperm methylation data for our comparisons.

      1. The dme mutant data in Figure 5C appear to be key to the model in Figure 7. The relative impact of the dme mutant in the two types of regions should be quantified.

      Response: Thank you for this comment. To further probe our model that DME prevents hypermethylation of the maternal allele at regions where ROS1 is preventing hypermethylation of the paternal allele, we turned to the allele-specific bisulfite-sequencing data published in Ibarra et al 2012 (see also response to reviewer #1). Using these data, we show that when the loss-of-function allele dme-2 is inherited maternally, ROS1 paternal, DME maternal regions (previously referred to as ROS1-dependent, biallelically-demethylated regions) are CG hypermethylated on the maternal allele (Figure 6D). Thus, these results both replicate the observations made with the Hsieh et al 2009 data, and provide additional evidence that DME prevents maternal allele hypermethylation at regions where ROS1 is preventing paternal allele hypermethylation. These results have replaced the Hsieh et al 2009 results in Figure 6, and we have moved the analysis of Hsieh et al 2009 data to Supplemental Figure 15.

      1. Looks like sRNA methods are missing.

      Response: Thank you for identifying this. We previously included the reference for the analyzed dataset we used and the method for plotting under an unclear section header. These methods are now in the section “Analysis of average methylation and 24-nt sRNA patterns for features of interest”, and we have added additional reference to the specific dataset we used.

      1. Supplemental Figure 1 is hard to interpret since it only lists gene IDs, not gene names.

      Response: As suggested, we have added gene names to this figure.

      The last comments are suggestions for increasing the impact of this study:

      1. Figures 2A and 3B suggest that ROS1 target TEs show demethylation in their flanks but not in the TEs themselves. This is an interesting result. If it is true, more DMRs would be expected in the ROS1 target flanks than in the ROS1 target TEs. Reporting how many ROS1 target TEs have DMRs in them and what proportion have DMRs in their flanking 1-kb regions would answer this question. Given the significance of this result, it also deserves a bit more context: Is the magnitude of increased methylation flanking TEs in ros1 mutant endosperm different than in ros1 mutant leaves or other tissue? Does methylation in TE flanks behave the same way in dme mutant endosperm?

      Response: We define “ROS1 target TEs” (now referred to more simply as ROS1 TEs) as TEs within 1 kb of or intersecting a ros1-3 hyperDMR. Consistent with your interpretation, 80% of the TEs in this category do not have a DMR overlapping them; instead, they have a DMR within 1 kb. We now mention this in the text on line 150.

      The total level of DNA methylation at ROS1 TEs is lower in the endosperm than in leaf, as DNA methylation levels are overall lower in endosperm than in leaf. The magnitude of increased methylation flanking TEs in ros1 mutant endosperm is not different between the two tissues. This is observable in Supplemental Fig. 5 in the revised version of the paper, and we report this result in the revised text. In the revision we also present methylation profiles of DME TEs in WT and ros1 endosperm (Fig. 7B-D). DME TEs are hypomethylated in both the body and flanks in WT and ros1.
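
      The distinction drawn in this response, between ROS1 TEs that intersect a hyperDMR and those that only have one within 1 kb, can be sketched as an interval check. This is an illustrative example rather than the authors' method: the function name, the coordinate convention (half-open intervals on a named chromosome), and the example coordinates are all assumptions.

```python
def classify_te(te, dmrs, window=1000):
    """Classify one TE relative to a list of DMRs.

    `te` and each DMR are (chrom, start, end) tuples with half-open
    coordinates. Returns "overlapping" if any DMR intersects the TE,
    "flanking" if the nearest DMR lies within `window` bp, else None.
    """
    chrom, te_start, te_end = te
    label = None
    for d_chrom, d_start, d_end in dmrs:
        if d_chrom != chrom:
            continue
        if d_start < te_end and te_start < d_end:
            return "overlapping"
        # No overlap: the gap is the distance between the nearer edges.
        gap = max(d_start - te_end, te_start - d_end)
        if gap <= window:
            label = "flanking"
    return label


# Made-up coordinates: the Chr1 DMR starts 400 bp past the TE end.
te = ("Chr1", 5000, 6000)
dmrs = [("Chr1", 6400, 6700), ("Chr2", 5500, 5600)]
result = classify_te(te, dmrs)
```

      Tallying the two labels across all TEs would give the kind of proportion reported above, although the exact counts depend on the coordinate convention and on how ties at the window boundary are handled.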

      1. The idea of biallelic demethylation has been theoretically suggested in maize to explain weak overlap between endosperm DMRs and imprinting (Gent et al 2022). If that were true in Arabidopsis, then ROS1 target, biallelicly demethylated loci would be less likely to have imprinted expression than maternally demethylated loci. This prediction could be tested using available data in Arabidopsis.

      Response: Indeed, as you hypothesize, there are no known imprinted genes (Pignatta et al 2014) associated with biallelically-demethylated, ROS1-dependent regions (now referred to as ROS1 paternal, DME maternal regions). Expectedly, there are imprinted genes associated with maternally-demethylated regions (now referred to as DME regions). 23 imprinted genes identified in the Pignatta et al 2014 study are within 1 kb or intersecting a DME region. This is discussed on lines 364-374.

      1. There is currently no evidence for biological significance of biallelicly demethylated loci. Knowing where they are in the genome might give some hints. A figure like Fig. 1D but specifically showing the biallelicly demethylated DMRs would be valuable.

      Response: This is now included in Figure 7A.

      1. It is hard to make the comparisons between genotypes and parental genomes in Figure 6 and know what they mean. Maybe a different way of displaying the data would help. Or maybe even a different labeling system could make it a little more accessible.

      Response: We have revised this figure (now Fig. 8) in the following ways, which we believe address your comments and clarify the main conclusions:

      Figure 8C is now a boxplot comparing methylation of the paternal allele of ROS1 paternal, DME maternal regions (previously referred to as biallelically-demethylated, ROS1-dependent regions) across endosperm ROS1 genotypes. This plot shows increased methylation of paternal alleles when the paternal parent is a ros1 mutant, regardless of whether the resultant F1 endosperm is homozygous or heterozygous for ros1 (columns 3, 4, 6).

      Figure 8B remains as a scatterplot, where we can observe significant correlation between individual ROS1 paternal, DME maternal regions in homozygous ros1 endosperm and heterozygous ros1/+ endosperm. Note that paternal allele methylation is higher in homozygous ros1 endosperm for most regions.

      Reviewer #2 (Significance):

      Demethylation of the maternal genome in endosperm has been the subject of much research because it can result in genomic imprinting of gene expression. The enzymes responsible, DNA glycosylases/lyases, also demethylate DNA in other cell types, where DNA methylation is not confined to one parental genome (biallelic or biparental as opposed to uniparental demethylation). To the best of my knowledge, the extent or even existence of biallelic demethylation in endosperm has not been studied until now (except for a superficial look in a bioRxiv preprint, https://www.biorxiv.org/content/10.1101/2024.07.31.606038v1). Hemenway and Gehring have carried out a thoughtful and detailed analysis of the topic in Arabidopsis at least as far as it depends on the DNA glycosylase ROS1.

      A limitation is that the study design would miss biallelic demethylation by any of the other three DNA glycosylases in Arabidopsis. A second limitation is that there is no clear biological significance, just some conjecture about evolution. Nonetheless, given the novelty of the topic, biological significance may follow.

      The audience for biallelic DNA demethylation in Arabidopsis endosperm is certainly in the "specialized" category, but its relevance to the larger topic of gene regulation in endosperm will attract a larger audience.

      Response: With regard to the other demethylases, note that we also profiled methylation in ros1 dml2 dml3 triple mutant endosperm. We did not find evidence for many DMRs that were present in the triple mutant that were not present in the ros1 single mutant. We do not rule out a function for DML2 or DML3 in the endosperm, but this is not observed at the level of bulk endosperm.

      The reviewer is correct that we have shown a molecular phenotype (paternal allele hypermethylation) and not a developmental or morphological phenotype. A function that occurs in one parent but not the other is, to us, exciting. Our thoughts about how this finding might relate to imprinting are indeed speculative, but not wildly so.

      Reviewer #3 (Evidence, reproducibility and clarity):

      DNA demethylases play a key role in DNA methylation patterning during flowering plant reproduction. The demethylase DME, in particular, is critical for proper endosperm development. While the function of DME in endosperm development has been explored, the contributions of the other demethylases in the same family, ROS1, DML2 and DML3 in Arabidopsis, have not yet been investigated. In vegetative tissues, ROS1 prevents hypermethylation of some loci. In this work, Hemenway and Gehring explore whether ROS1, DML2 and DML3 also affect DNA methylation patterns in endosperm. Using EM-seq of sorted endosperm nuclei, they show that loss of ROS1 indeed causes hypermethylation of a number of loci, particularly the flanks of methylated transposons, while loss of DML2 and DML3 has minimal additional effect. By obtaining allele-specific EM-seq data through crosses of Col and C24, the authors show that ros1 endosperm hypermethylation is mostly restricted to the paternal allele. The authors propose that at some sites, ROS1 helps bring down paternal methylation levels to match maternal methylation levels, which are typically reduced in endosperm due to DME activity in the female gametophyte prior to fertilization. In a ros1 mutant with paternal hypermethylation, these sites become differentially methylated on the maternal and paternal alleles, resembling imprinted loci. This work convincingly establishes a function for ROS1 in DNA methylation patterning in endosperm. However, I struggled with the clarity of the writing and reasoning in a few places, and would suggest clarification of a few points and additional analyses below.

      Response: Thank you for your thoughtful review of our paper. Your questions and suggestions have been invaluable in revising the work.

      I think making a few simple changes to streamline nomenclature would improve readability. For example, in the section starting on line 129, the same set of genomic features are called ROS1 target-proximal TEs, TEs that are near a ROS1 target region, and ROS1 target-associated TE regions. Also for example in line 254 "regions that are maternally-demethylated in wild-type endosperm, and are not dependent on ROS1 for proper demethylation" - are these the same as the "ROS1-independent, maternally-demethylated" regions in Fig. 5a? Given how complex these terms are, being consistent throughout the manuscript really helps the reader.

      Response: We edited the text and figures so that only one descriptive name is used for each DMR class or region throughout the paper. Thank you for this feedback; these edits have made the paper much clearer.

      Is there any notable effect of ros1 on gene expression in endosperm? Endosperm is a terminal tissue, so maintaining DNA methylation boundaries as ROS1 does in vegetative tissues seems less important. It begs the question of why ROS1 is doing this in endosperm, is it just because it's there, or is there an endosperm-specific function? Exploring effects on imprinting would be particularly interesting (does loss of ROS1 'create' imprinted loci at these newly asymmetrically methylated sites?) but probably beyond the scope of the present work.

      Response: We agree, the question of the functional consequence of ROS1 activity in the endosperm is something we are keen to address in future work. We performed RNA-seq on wild-type and ros1 3C and 6C endosperm nuclei, but these data were unfortunately not of high enough quality to include in the manuscript. We are in particular interested in this question you have proposed – if loss of ROS1 can ‘create’ imprinted loci. We are planning to address this both using a molecular, RNA-sequencing approach as well as an evolutionary comparative approach. This is an important and exciting future direction.

      Is DME expressed in sperm, or is expression of DME affected in ros1 sperm or endosperm? One other explanation for ros1 hypermethylation occurring primarily on the paternal allele is that, potentially, DME can substitute for ROS1 in the central cell where DME is already very active, but not in sperm cells. Related, how well expressed is ROS1 vs. DME in sperm cells?

      Response: This is an important series of questions, and something we are very interested in as well. Studies of Arabidopsis pollen have shown that both ROS1 and DME, while they prevent some hypermethylation in sperm, are more active in the vegetative nucleus of pollen than in sperm. ROS1 is expressed at a low level in the microspore and bicellular pollen and DME is expressed at a low level throughout pollen development. We have included Supplemental Fig. 17 with available expression data to make this point in the paper. Likely, any effects of loss of ROS1 or DME on sperm DNA methylation are inherited from precursor cells (Ibarra et al 2012, Calarco et al 2012, Khouider et al 2021). Your proposal that perhaps DME can sub in for ROS1 in the central cell but not in sperm is intriguing. Unfortunately there’s not enough data in the central cell to convincingly address this at this time.

      To investigate the relationship between DME and ROS1 in the male germline, we used the bisulfite-sequencing data generated in sperm cells in Khouider et al 2021. We calculated average DNA methylation levels in dme/+, ros1, dme/+;ros1, and wild-type Col-0 sperm cells at ROS1 paternal, DME maternal regions, shown in Supplemental Fig. 18A. We observed little increase in mCG methylation in dme/+ sperm relative to wild-type Col-0 sperm. This is consistent with your proposed model that DME is unable to demethylate these regions outside of the female germline. As expected, there is increased mCG in ROS1 paternal, DME maternal regions in ros1-3 mutant sperm relative to wild-type Col-0 sperm. DME maternal regions are highly methylated in wild-type Col-0 sperm.

      Fig 2b shows that ROS1 target-associated TEs are enriched for sRNAs in endosperm relative to embryo, whereas the reverse is true for non-ROS1-assoc TEs. Since TEs are not always well annotated and some may be missing from this analysis, what about trying the reverse analysis - are regions enriched for 24nt sRNAs in endosperm significantly hypermethylated in ros1 endosperm? All regions or only some?

      Response: We performed an analysis to address your inquiry and observed a low-magnitude increase in DNA methylation in ros1 mutant endosperm at regions defined by Erdmann et al 2017 as producing more sRNA in the endosperm than in the embryo (endosperm DSRs). Endosperm DSRs are generally lowly methylated in wild-type endosperm, as was observed originally in Erdmann et al 2017. Small increases in DNA methylation are observed at endosperm DSRs in all sequence contexts in ros1 endosperm. Overall, this is consistent with ROS1 targets being a subset of sRNA-producing regions in the endosperm. This analysis is now included in Supplemental Fig. 7C.

      What is the relationship between previously-defined DME targets and ROS1 targets identified in this paper? DME tends to target small euchromatic TE bodies, whereas Fig. 3 suggests that ROS1 helps prevent methylation spreading on the outer edges of the TEs, rather than in the TE body. Do all DME targets tend to be adjacent to or flanked by ROS1 target sites? Or are the TEs affected by DME (in body) and by ROS1 (at edges) largely nonoverlapping? Fig. 5a suggests that the ROS1-dependent, biallelically-demethylated sites are both DME and ROS1 targets, but how often do these really appear to overlap? More than by chance?

      Response: We have sought to address your comments through a series of analyses that we have included in Fig. 7 and Supplemental Fig. 16. We found that ROS1 paternal, DME maternal regions (formerly referred to as ROS1-dependent, biallelically-demethylated regions) and DME maternal regions (formerly referred to as ROS1-independent, maternally-demethylated regions) do not occupy the same genomic regions. However, we do observe some evidence for ROS1 activity in flanking regions of DME targets (Fig. 6A, Fig. 7B-D). To look at TEs specifically, as you suggest, we first identified TEs that were within 1kb or intersecting a DME maternal region. Based on our characterization of these regions, we assume these to be DME-targeted TEs. We then performed ends analysis to see if there was evidence of ROS1 activity at the ends of these TEs. Indeed, at a global level there is a slight hypermethylation of the paternal allele in a ros1 mutant at the end of these DME TEs (Fig. 7B). To better visualize how many DME TEs are showing ROS1 activity at their ends, we then plotted the difference between the median ros1-3 methylation and median Col-0 values in the non-allelic endosperm for each TE in a clustered heatmap (Fig. 7C). The parent-of-origin data does not have enough coverage for clustering in this way, so we used the non-allelic data. A small fraction of “DME TEs” gain methylation in the ros1 mutant endosperm relative to wild-type (Fig. 7C-D).
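
      As an illustration only (not the authors' pipeline), the two steps described above, selecting TEs that intersect or lie within 1 kb of a DME maternal region and then taking the per-TE difference between median ros1-3 and median Col-0 methylation, might be sketched as follows. The function names, the tuple layout `(chrom, start, end)`, and the half-open coordinate convention are all assumptions made for this sketch.

```python
from statistics import median

def near_or_overlapping(te, region, window=1000):
    """True if a TE intersects a region or lies within `window` bp of it.

    Both arguments are hypothetical (chrom, start, end) tuples with
    half-open coordinates.
    """
    te_chrom, te_start, te_end = te
    r_chrom, r_start, r_end = region
    if te_chrom != r_chrom:
        return False
    # The gap is negative when the intervals overlap and zero when they abut.
    gap = max(te_start, r_start) - min(te_end, r_end)
    return gap <= window

def select_tes(tes, regions, window=1000):
    """Keep the TEs that are near or overlapping at least one region."""
    return [te for te in tes
            if any(near_or_overlapping(te, r, window) for r in regions)]

def median_difference(mutant_values, wildtype_values):
    """Median mutant methylation minus median wild-type methylation for one TE."""
    return median(mutant_values) - median(wildtype_values)
```

      A positive `median_difference` for a TE end would correspond to a gain of methylation in the mutant; the clustering shown in the heatmap is not reproduced here.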

      Are the TEs whose boundaries are demethylated by ROS1 more likely to be expressed in vegetative or endosperm tissues than TEs not affected by loss of ROS1? Expressed TEs likely produce more sRNAs, which would increase RdDM in a way that might need to be more actively countered by ROS1 than transcriptionally silent or evolutionarily older TEs.

      Response: This is an interesting line of inquiry, although perhaps out of the scope of our present study. It has been shown that TEs demethylated by ROS1 are targeted by the RdDM pathway in Arabidopsis vegetative tissue (Tang et al 2016). Using data from Erdmann et al 2017, we looked at 24 nt sRNAs at ROS1-TEs in the endosperm and embryo (Supplemental Fig. 7). sRNA production at ROS1 TE-flanking regions is observed in both embryo and endosperm, but clearly not all ROS1 TEs produce 24 nt sRNAs in the seed. Future work comparing sRNA profiles in a ros1 mutant to those of wild-type could inform our understanding of TE spreading in a ros1 mutant, as would a comprehensive analysis of TE expression, again in both a ros1 mutant and in wild-type. It’s unclear to us if the endosperm would be the most informative or useful tissue to perform such analyses in.

      Fig6 - as noted in the text, one way to test whether demethylation by ROS1 occurs before or after fertilization is to provide functional ROS1 through only one parent via reciprocal WT x ros1 crosses, so that the endosperm always has ROS1 but either sperm or central cell does not, and see if this can rescue the paternal hypermethylation. If ROS1 acts prior to fertilization, then paternal ROS1 will rescue ros1 hypermethylation, but maternal ROS1 won't. If after fertilization, then either maternally or paternally supplied ROS1 will rescue the hypermethylation phenotype (assuming both are well expressed). Thus, to distinguish the two, it is sufficient to test whether maternally supplied ROS1 in an otherwise mutant background can rescue the hypermethylation phenotype, which is what is shown in Fig. 6. However, I think it's also important to show that paternally supplied ROS1 can also rescue the hypermethylation phenotype, which is not currently shown. The plots showing no effect on maternal mCG aren't as informative, since maternal methylation levels are mostly unaffected by ros1 anyway. Instead of comparing pairs of samples in a scatterplot, it might be clearer to show paternal mCG across all four comparisons (WT x WT, WT x ros1, ros1 x WT, and ros1 x ros1) side by side in a heatmap, using clustering to group similar behavior.

      Response: We have revised this figure, now Fig. 8, in the following ways, which we believe address your comments and clarify the main conclusions (see same response to reviewer 2 for point 14):

      Figure 8B remains as a scatterplot, where we observe significant correlation between individual ROS1 paternal, DME maternal regions in homozygous ros1 endosperm and heterozygous ros1/+ endosperm. Note that paternal allele methylation is higher in homozygous ros1 endosperm for most regions.

      Figure 8C is now a boxplot comparing methylation of the paternal allele of ROS1 paternal, DME maternal regions (previously referred to as biallelically-demethylated, ROS1-dependent regions) across endosperm ROS1 genotypes. This plot shows increased methylation of paternal alleles when the paternal parent is a ros1 mutant, regardless of whether the resultant F1 endosperm is homozygous or heterozygous for ros1 (columns 3, 4, 6).

      I would also suggest including a little more information in the main plots rather than only in the figure legends. For example, in Fig 2 including a label of 'ROS1-associated TE' for the two plots on the left, and 'TEs not associated with ROS1' on the right. Or for example in Fig. 3a indicating 'ros1-3 CG hyperDMRs' somewhere on the plot. This would just help make the figures easier to read at a glance. Please add common gene names to figures, instead of just the ATG gene ID (Fig. S1a).

      Response: Thank you for this feedback. We have made the suggested edits, along with additional edits of a similar nature.

      Minor:

      • Fig. 1E is referenced in the text before Fig. 1D
      • Fig. S4 and S5 - there are more lines in the plot than the 6 genotypes listed in the legend, do these represent different replicates? If so that should be noted in the legend
      • Fig. 1B has no color legend for the different methylation sequence contexts (looks like same as 1A,C but should indicate either in plot or legend)
      • Line 42 should be "correspond to TE ends"
      • Line 93 "Based on previous studies..." should have references to those studies
      • When referring to the protein (rather than the genetic locus or mutant), ROS1 should not be italicized - for example line 130
      • Line 150 "we conclude that the loss"
      • Should add a y=x line to scatterplots, like those in Fig. 6
      • In fig. 1d, it's hard to evaluate the significance of the overlap of ROS1 targets with genes and TEs. Comparing these numbers to a control where the ROS1 targets have been randomly shuffled would help.

      Response: We have made edits and additions where requested.

      Reviewer #3 (Significance):

      In this work, Hemenway and Gehring explore whether ROS1, DML2 and DML3 also affect DNA methylation patterns in endosperm. Using EM-seq of sorted endosperm nuclei, they show that loss of ROS1 indeed causes hypermethylation of a number of loci, particularly the flanks of methylated transposons, while loss of DML2 and DML3 has minimal additional effect. By obtaining allele-specific EM-seq data through crosses of Col and C24, the authors show that ros1 endosperm hypermethylation is mostly restricted to the paternal allele. The authors propose that at some sites, ROS1 helps bring down paternal methylation levels to match maternal methylation levels, which are typically reduced in endosperm due to DME activity in the female gametophyte prior to fertilization. In a ros1 mutant with paternal hypermethylation, these sites become differentially methylated on the maternal and paternal alleles, resembling imprinted loci. This work convincingly establishes a function for ROS1 in DNA methylation patterning in endosperm. However, I struggled with the clarity of the writing and reasoning in a few places, and would suggest clarification of a few points and additional analyses.

      Response: Thank you for your comments. We have worked on streamlining the text and analysis.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      DNA demethylases play a key role in DNA methylation patterning during flowering plant reproduction. The demethylase DME, in particular, is critical for proper endosperm development. While the function of DME in endosperm development has been explored, the contributions of the other demethylases in the same family, ROS1, DML2 and DML3 in Arabidopsis, have not yet been investigated. In vegetative tissues, ROS1 prevents hypermethylation of some loci. In this work, Hemenway and Gehring explore whether ROS1, DML2 and DML3 also affect DNA methylation patterns in endosperm. Using EM-seq of sorted endosperm nuclei, they show that loss of ROS1 indeed causes hypermethylation of a number of loci, particularly the flanks of methylated transposons, while loss of DML2 and DML3 has minimal additional effect. By obtaining allele-specific EM-seq data through crosses of Col and C24, the authors show that ros1 endosperm hypermethylation is mostly restricted to the paternal allele. The authors propose that at some sites, ROS1 helps bring down paternal methylation levels to match maternal methylation levels, which are typically reduced in endosperm due to DME activity in the female gametophyte prior to fertilization. In a ros1 mutant with paternal hypermethylation, these sites become differentially methylated on the maternal and paternal alleles, resembling imprinted loci. This work convincingly establishes a function for ROS1 in DNA methylation patterning in endosperm. However, I struggled with the clarity of the writing and reasoning in a few places, and would suggest clarification of a few points and additional analyses below.

      I think making a few simple changes to streamline nomenclature would improve readability. For example, in the section starting on line 129, the same set of genomic features are called ROS1 target-proximal TEs, TEs that are near a ROS1 target region, and ROS1 target-associated TE regions. Also for example in line 254 "regions that are maternally-demethylated in wild-type endosperm, and are not dependent on ROS1 for proper demethylation" - are these the same as the "ROS1-independent, maternally-demethylated" regions in Fig. 5a? Given how complex these terms are, being consistent throughout the manuscript really helps the reader.

      Is there any notable effect of ros1 on gene expression in endosperm? Endosperm is a terminal tissue, so maintaining DNA methylation boundaries as ROS1 does in vegetative tissues seems less important. It begs the question of why ROS1 is doing this in endosperm, is it just because it's there, or is there an endosperm-specific function? Exploring effects on imprinting would be particularly interesting (does loss of ROS1 'create' imprinted loci at these newly asymmetrically methylated sites?) but probably beyond the scope of the present work.

      Is DME expressed in sperm, or is expression of DME affected in ros1 sperm or endosperm? One other explanation for ros1 hypermethylation occurring primarily on the paternal allele is that, potentially, DME can substitute for ROS1 in the central cell where DME is already very active, but not in sperm cells. Related, how well expressed is ROS1 vs. DME in sperm cells?

      Fig 2b shows that ROS1 target-associated TEs are enriched for sRNAs in endosperm relative to embryo, whereas the reverse is true for non-ROS1-assoc TEs. Since TEs are not always well annotated and some may be missing from this analysis, what about trying the reverse analysis - are regions enriched for 24nt sRNAs in endosperm significantly hypermethylated in ros1 endosperm? All regions or only some?

      What is the relationship between previously-defined DME targets and ROS1 targets identified in this paper? DME tends to target small euchromatic TE bodies, whereas Fig. 3 suggests that ROS1 helps prevent methylation spreading on the outer edges of the TEs, rather than in the TE body. Do all DME targets tend to be adjacent to or flanked by ROS1 target sites? Or are the TEs affected by DME (in body) and by ROS1 (at edges) largely nonoverlapping? Fig. 5a suggests that the ROS1-dependent, biallelically-demethylated sites are both DME and ROS1 targets, but how often do these really appear to overlap? More than by chance?

      Are the TEs whose boundaries are demethylated by ROS1 more likely to be expressed in vegetative or endosperm tissues than TEs not affected by loss of ROS1? Expressed TEs likely produce more sRNAs, which would increase RdDM in a way that might need to be more actively countered by ROS1 than transcriptionally silent or evolutionarily older TEs.

      Fig6 - as noted in the text, one way to test whether demethylation by ROS1 occurs before or after fertilization is to provide functional ROS1 through only one parent via reciprocal WT x ros1 crosses, so that the endosperm always has ROS1 but either sperm or central cell does not, and see if this can rescue the paternal hypermethylation. If ROS1 acts prior to fertilization, then paternal ROS1 will rescue ros1 hypermethylation, but maternal ROS1 won't. If after fertilization, then either maternally or paternally supplied ROS1 will rescue the hypermethylation phenotype (assuming both are well expressed). Thus, to distinguish the two, it is sufficient to test whether maternally supplied ROS1 in an otherwise mutant background can rescue the hypermethylation phenotype, which is what is shown in Fig. 6. However, I think it's also important to show that paternally supplied ROS1 can also rescue the hypermethylation phenotype, which is not currently shown. The plots showing no effect on maternal mCG aren't as informative, since maternal methylation levels are mostly unaffected by ros1 anyway. Instead of comparing pairs of samples in a scatterplot, it might be clearer to show paternal mCG across all four comparisons (WT x WT, WT x ros1, ros1 x WT, and ros1 x ros1) side by side in a heatmap, using clustering to group similar behavior.

      I would also suggest including a little more information in the main plots rather than only in the figure legends. For example, in Fig 2 including a label of 'ROS1-associated TE' for the two plots on the left, and 'TEs not associated with ROS1' on the right. Or for example in Fig. 3a indicating 'ros1-3 CG hyperDMRs' somewhere on the plot. This would just help make the figures easier to read at a glance. Please add common gene names to figures, instead of just the ATG gene ID (Fig. S1a).

      Minor:

      • Fig. 1E is referenced in the text before Fig. 1D
      • Fig. S4 and S5 - there are more lines in the plot than the 6 genotypes listed in the legend, do these represent different replicates? If so that should be noted in the legend
      • Fig. 1B has no color legend for the different methylation sequence contexts (looks like same as 1A,C but should indicate either in plot or legend)
      • Line 42 should be "correspond to TE ends"
      • Line 93 "Based on previous studies..." should have references to those studies
      • When referring to the protein (rather than the genetic locus or mutant), ROS1 should not be italicized - for example line 130
      • Line 150 "we conclude that the loss"
      • Should add a y=x line to scatterplots, like those in Fig. 6
      • In fig. 1d, it's hard to evaluate the significance of the overlap of ROS1 targets with genes and TEs. Comparing these numbers to a control where the ROS1 targets have been randomly shuffled would help.

      Significance

      In this work, Hemenway and Gehring explore whether ROS1, DML2 and DML3 also affect DNA methylation patterns in endosperm. Using EM-seq of sorted endosperm nuclei, they show that loss of ROS1 indeed causes hypermethylation of a number of loci, particularly the flanks of methylated transposons, while loss of DML2 and DML3 has minimal additional effect. By obtaining allele-specific EM-seq data through crosses of Col and C24, the authors show that ros1 endosperm hypermethylation is mostly restricted to the paternal allele. The authors propose that at some sites, ROS1 helps bring down paternal methylation levels to match maternal methylation levels, which are typically reduced in endosperm due to DME activity in the female gametophyte prior to fertilization. In a ros1 mutant with paternal hypermethylation, these sites become differentially methylated on the maternal and paternal alleles, resembling imprinted loci. This work convincingly establishes a function for ROS1 in DNA methylation patterning in endosperm. However, I struggled with the clarity of the writing and reasoning in a few places, and would suggest clarification of a few points and additional analyses.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Summary

      Hemenway and Gehring present evidence that the paternal genome in Arabidopsis endosperm is demethylated at several hundred loci by the DNA glycosylase/lyase ROS1. The evidence is primarily based on analysis of DNA methylation of ros1 mutants and of hybrid crosses where each parental genome can be differentiated by SNPs. I have some comments/questions/concerns, two of them potentially serious, but I think Hemenway and Gehring can address them through additional analyses of data that they already have available and a bit of clarification in writing.

      Major comments:

      1. Could the excess methylation in ros1-3 relative to ros1-7 shown in Figures 1A and 1C be explained by a second mutation in the ros1-3 background that elevates methylation at some loci? Any mutation that increased RdDM at these loci, for example, could have this effect. This could confound the identification and interpretation of biallelicly demethylated loci.
      2. It appears that the main focus of the manuscript, the existence of loci that are paternally demethylated by ROS1, is supported by a set of 274 DMRs. This is a small number relative to the size of the genome and raises suspicions of rare false positives. Even the most stringent p-values that DMR-finding tools report do not guarantee that the DMRs are actually reproducible in an independent experiment. Demonstrating overlap between these 274 DMRs and an independently defined set using a different WT control and different ros1 allele would suffice to remove this concern. It appears that authors already have the needed raw data with ros1-1 and ros1-7 alleles.
      3. Because of the multiple sets of DMRs identified and used throughout the paper, it is hard to follow which one is which. There are DMRs defined solely by one sequence context, DMRs defined by all three contexts merged, DMRs defined by comparisons between maternal and paternal methylation in endosperm, DMRs defined by comparison between mutants and wildtype, and more. These need clearer descriptions of which sets are being referred to throughout the main text and in figure legends. A table summarizing them might help (not in the supplement). Use of consistent and precisely defined terms would help. Stating the number of DMRs along with the name for each set would help a lot, even though this would make for some redundancy. (The number of DMRs in each set not only helps with interpretation but also acts as a sort of ID). The reason I put this as a major concern is because the text and figures are difficult to understand, and it is currently hard to evaluate both the results and the authors' conclusions from those results.
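
      As an illustration of the reproducibility check raised in comment 2 (a sketch under stated assumptions, not a description of any analysis in the manuscript), the overlap between a primary DMR set and an independently defined set could be summarized as the fraction of primary DMRs that intersect at least one independent DMR. The function names and the `(chrom, start, end)` half-open tuple layout are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fraction_reproduced(primary, independent):
    """Fraction of primary DMRs that overlap any DMR in the independent set."""
    if not primary:
        return 0.0
    hits = sum(1 for dmr in primary
               if any(overlaps(dmr, other) for other in independent))
    return hits / len(primary)
```

      The quadratic scan is adequate for a few hundred DMRs; for genome-scale sets, a sorted sweep or an interval tree would be the more usual choice.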

      Minor comments

      1. The sRNA results in Figure 2B are difficult to interpret because they do not reveal anything about the number of TEs that have siRNAs overlapping them or their flanks. While the magnitude of some of the highest endosperm sRNA peaks is higher than the embryo peaks, that could be explained by a small number of TEs with large numbers of sRNAs. To make this result more interpretable, we also need some information about how many TEs have a significant number of sRNAs associated with them in endosperm and embryo in each region (e.g., middle, 5', 3', and flanks of TEs). What a "significant number of sRNAs" is would be up to the authors to decide based on the distribution of sRNA counts they observe for TEs. Perhaps the top quartile of TEs? Combined with the same analysis done in parallel with non-ROS1 target TEs, this would reveal whether there is any evidence for ROS1 counteracting sRNA-driven methylation spread from TEs.
      2. The statement "we are likely underestimating the true degree of differential methylation among genotypes" should be validated and partially quantified using a methylation metaplot like Figure 2A, but substitute DMRs for TEs. Related to that, Figure 1B needs an indicator of scale in bp.
      3. The statement "Over half of ROS1 target regions identified in the ros1-3 mutant endosperm were within 1 kb or intersecting a TE (Fig. 1D)" is hard to interpret without some kind of ROS1 non-target regions or whole-genome control comparison. How different are the numbers in Fig. 1D from a random expectation?
      4. The sentence at line 262 is confusing. Is the comparison between dme mutant and ros1 mutant or between different types of regions? And it appears that the comparison value is missing in the "3-5% CG methylation gain..." e.g., "3-5% CG methylation vs 10-20%" or something like that.
      5. The dme mutant data in Figure 5C appear to be key to the model in Figure 7. The relative impact of the dme mutant in the two types of regions should be quantified.
      6. Looks like sRNA methods are missing.
      7. Supplemental Figure 1 is hard to interpret since it only lists gene IDs, not gene names.
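
      The random-expectation control requested in minor comment 3 could take roughly the following form. This is a minimal sketch with assumed names and inputs: target regions are relocated at random within their own chromosome, keeping their lengths, and the fraction lying within 1 kb of an annotated feature is recomputed for each shuffle. It deliberately ignores refinements such as excluding unmappable sequence, which a real analysis would likely need.

```python
import random

def near(region, feature, window=1000):
    """True if region intersects feature or lies within `window` bp of it."""
    if region[0] != feature[0]:
        return False
    return max(region[1], feature[1]) - min(region[2], feature[2]) <= window

def fraction_near(regions, features, window=1000):
    """Fraction of regions that are near at least one feature."""
    if not regions:
        return 0.0
    return sum(any(near(r, f, window) for f in features) for r in regions) / len(regions)

def shuffle_regions(regions, chrom_sizes, rng):
    """Relocate each region uniformly within its chromosome, preserving length."""
    shuffled = []
    for chrom, start, end in regions:
        length = end - start
        new_start = rng.randint(0, chrom_sizes[chrom] - length)
        shuffled.append((chrom, new_start, new_start + length))
    return shuffled

def shuffle_test(regions, features, chrom_sizes, n_shuffles=1000, seed=0, window=1000):
    """Return (observed fraction, mean shuffled fraction, empirical p-value)."""
    rng = random.Random(seed)  # fixed seed so the control is repeatable
    observed = fraction_near(regions, features, window)
    null = [fraction_near(shuffle_regions(regions, chrom_sizes, rng), features, window)
            for _ in range(n_shuffles)]
    # Add-one correction keeps the empirical p-value strictly above zero.
    p_value = (1 + sum(f >= observed for f in null)) / (n_shuffles + 1)
    return observed, sum(null) / len(null), p_value
```

      Comparing the observed fraction with the mean shuffled fraction gives the "random expectation" the comment asks about, and the empirical p-value indicates how often a shuffle matches or exceeds the observation.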

      The last comments are suggestions for increasing the impact of this study:

      11. Figures 2A and 3B suggest that ROS1 target TEs show demethylation in their flanks but not in the TEs themselves. This is an interesting result. If it is true, more DMRs would be expected in the ROS1 target flanks than in the ROS1 target TEs. Reporting how many ROS1 target TEs have DMRs in them and what proportion have DMRs in their flanking 1-Kb regions would answer this question. Given the significance of this result, it also deserves a bit more context: Is the magnitude of increased methylation flanking TEs in ros1 mutant endosperm different than in ros1 mutant leaves or other tissue? Does methylation in TE flanks behave the same way in dme mutant endosperm?
      12. The idea of biallelic demethylation has been theoretically suggested in maize to explain weak overlap between endosperm DMRs and imprinting (Gent et al 2022). If that were true in Arabidopsis, then ROS1 target, biallelicly demethylated loci would be less likely to have imprinted expression than maternally demethylated loci. This prediction could be tested using available data in Arabidopsis.
      13. There is currently no evidence for biological significance of biallelicly demethylated loci. Knowing where they are in the genome might give some hints. A figure like Fig. 1D but specifically showing the biallelicly demethylated DMRs would be valuable.
      14. It is hard to make the comparisons between genotypes and parental genomes in Figure 6 and know what they mean. Maybe a different way of displaying the data would help. Or maybe even a different labeling system could make it a little more accessible.

      Significance

      Demethylation of the maternal genome in endosperm has been the subject of much research because it can result in genomic imprinting of gene expression. The enzymes responsible, DNA glycosylases/lyases, also demethylate DNA in other cell types, where DNA methylation is not confined to one parental genome (biallelic or biparental as opposed to uniparental demethylation). To the best of my knowledge, the extent or even existence of biallelic demethylation in endosperm has not been studied until now (except for a superficial look in a bioRxiv preprint, https://www.biorxiv.org/content/10.1101/2024.07.31.606038v1). Hemenway and Gehring have carried out a thoughtful and detailed analysis of the topic in Arabidopsis at least as far as it depends on the DNA glycosylase ROS1.

      A limitation is that the study design would miss biallelic demethylation by any of the other three DNA glycosylases in Arabidopsis. A second limitation is that there is no clear biological significance, just some conjecture about evolution. Nonetheless, given the novelty of the topic, biological significance may follow.

      The audience for biallelic DNA demethylation in Arabidopsis endosperm is certainly in the "specialized" category, but its relevance to the larger topic of gene regulation in endosperm will attract a larger audience.

    1. In other words, AI will not enable the creation of quality translations for people who previously lacked that ability. That part still requires a human feel for the linguistic and cultural elements of the translation. But for those who are just looking to get a rough but passable translation (say, for research) it should work most of the time. And for those who would love to create quality translations but face huge opportunity costs and zero financial incentives, AI could lead to new possibilities. The deeper risk is not that AI will replace historians or translators, but that it will convince us we never needed them in the first place. A tool that outputs polished, confident language with no sense of ambiguity or context is appealing to people who think facts will save us. But there is a vast difference between facts and truths. If we come to treat them as interchangeable, we cede interpretation to the machines and narrative power to those who design them. So maybe Microsoft was right after all, just not in the way they think. Historians and translators may be the first to go not because their work is easy to automate, but because the interpretive element of their labor has always been invisible or, when made visible, dismissed as odious human bias. AI will replace them only in the minds of people who never understood what they were doing in the first place. Which, given how things are going, may be enough.

      If you believe in the Fact, you are more likely to heuristic your way toward treating that polished language as a signal of plausibility.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Ozcan et al. presents compelling evidence demonstrating the latent potential of glial precursors of the adult cerebral cortex for neuronal reprogramming. The findings substantially advance our understanding of the potential of endogenous cells in the adult brain to be reprogrammed. Moreover, they describe a molecular cocktail that directs reprogramming toward corticospinal neurons (CSN).

      Strengths:

      Experimentally, the work is compelling and beautifully designed, with no major caveats. The main conclusions are fully supported by the experiments. The work provides a characterization of endogenous progenitors, genetic strategies to isolate them, and proof of concept of exploiting these progenitors' potential to produce a specific desired neuronal type with "a la carte" combination of transcription factors.

      Weaknesses:

      Some issues need to be addressed or clarified before publication. The manuscript requires editing. It is dense and rich in detail in some parts, while in others there are a few mistakes.

      We thank the reviewer for their excellent summary and for their extremely positive review of our paper. We are pleased that the experimental design and conclusions were judged to be well-supported.

      We have revised the paper to enhance clarity, include additional relevant citations, and refine terminology in some sections of the original version.

      We appreciate the reviewer’s thoughtful review and agree that these revisions enhance the paper.

      Reviewer #2 (Public Review):

      Summary:

      Here the authors show a novel direct neuronal reprogramming model using a very pure culture system of oligodendrocyte progenitor cells and demonstrate hallmarks of corticospinal neurons to be induced when using Neurogenin2, a dominant-negative form of Olig2 in combination with the CSN master regulator Fezf2.

      Strengths:

      This is a major achievement as the specification of reprogrammed neurons towards adequate neuronal subtypes is crucial for repair and still largely missing. The work is carefully done and the comparison of the neurons induced only by Neurogenin 2 versus the NVOF cocktail is very interesting and convincingly demonstrates a further subtype specification by the cocktail.

      Weaknesses:

      As carefully as it is done in vitro, the identity of projection neurons can best be assessed in vivo. If this is not possible, it could be interesting to co-culture different brain regions and see if these neurons reprogrammed with the cocktail, indeed preferentially send out axons to innervate a co-cultured spinal cord versus other brain region tissue.

      We appreciate the reviewer’s positive evaluation of our work and their recognition of its significance in advancing neuronal subtype specification through directed differentiation of endogenous progenitors. 

      We agree with the reviewer’s suggestion that a very interesting future stage of this work would be to investigate the projection neuron identity in vivo. We aim to pursue follow-up studies to investigate in vivo integration and connectivity of such neurons generated by directed differentiation from endogenous SOX6+/NG2+ cortical progenitors. As the reviewer insightfully suggests, co-culturing different brain regions with these neurons could offer an alternative strategy to partially assess potential preferential connectivity into cultured spinal cord vs. alternate tissue.

      We agree with the reviewer that future investigation in vivo will further strengthen the implications of this work.

      Reviewer #3 (Public Review):

      Summary:

      Ozkan, Padmanabhan, and colleagues aim to develop a lineage reprogramming strategy towards generating subcerebral projection neurons from endogenous glia with the specificity needed for disease modelling and brain repair. They set out by targeting specifically Sox6-positive NG2 glia. This choice is motivated by the authors' observation that the early postnatal forebrain of Sox6 knockout mice displays marked ectopic expression of the proneural transcription factor (TF) Neurog2, suggesting a latent neurogenic program may be derepressed in NG2 cells, which normally express Sox6. Cultured NG2 glia transfected with a construct ("NVOF") encoding Neurog2, the corticofugal neuron-specifying TF Fezf2, and a constitutive repressor form of Olig2 are efficiently reprogrammed to neurons. These acquire complex morphologies resembling those of mature endogenous neurons and are characterized by fewer abnormalities when compared to neurons induced by Neurog2 alone. NVOF-induced neurons, as a population, also express a narrower range of cortical neuron subtype-specific markers, suggesting narrowed subtype specification, a potential step forward for Neurog2-driven neuronal reprogramming. Comparison of NVOF- and Neurog2-induced neurons to endogenous subcerebral projection neurons (SCPN) also indicates Fezf2 may aid Neurog2 in directing the generation of SCPN-like neurons at the expense of other cortical neuronal subtypes.

      Strengths:

      The report describes a novel, highly homogeneous in vitro system amenable to efficient reprogramming. The authors provide evidence that Fezf2 shapes the outcome of Neurog2-driven reprogramming towards a subcerebral projection neuron identity, consistent with its known developmental roles. Also, the use of the modified RNA for transient expression of Neurog2 is very elegant.

      Weaknesses:

      The molecular characterization of NVOF-induced neurons is carried out at the bulk level, therefore not allowing to fully assess heterogeneity among NVOF-induced neurons. The suggestion of a latent neurogenic potential in postnatal cortical glia is only partially supported by the data from the Sox6 knockout. Finally, some of the many exciting implications of the study remain untested.

      Discussion:

      The study has many exciting implications that could be further tested. For example, an ultimate proof of the subcerebral projection neuron identity would be to graft NVOF cells into neonatal mice and study their projections. Another important implication is that Sox6-deficient NG2 glia may not only express Neurog2 but activate a more complete neurogenic programme, a possibility that remains untested here.

      Also, is the subcerebral projection neuron identity dependent on the starting cell population? Could other NG2 glia, not expressing Sox6, also be coaxed by the NVOF cocktail into subcerebral projection neurons? And if not, do they express other (Sox) transcription factors that render them more amenable to reprogramming into other cortical neuron subtypes? The authors state that SOX6-positive NG2 glia are a quiescent progenitor population. Given that NG2 glia are believed to undergo proliferation as a whole, are Sox6-positive NG2 glia an exception to this rule? Finally, the authors seem to imply that subcerebral projection neurons and Sox6-positive NG2 glia are lineage-related. However, direct evidence for this conjecture seems missing.

      We appreciate the reviewer’s thoughtful and detailed review of this work. We especially appreciate the positive evaluation of the work and the highlighting of multiple strengths of our approach, including the role of Fezf2 in refining neuronal subtype identity and the use of modified RNA to enable transient expression of Neurog2.

      We acknowledge the reviewer’s comment that single-cell transcriptomic analysis would indeed provide a more granular view of likely heterogeneity. This current study focuses on investigating the feasibility of directed differentiation of corticospinal-like neurons from endogenous progenitors. Future work employing single-cell sequencing could indeed help delineate the heterogeneity of neurons generated by directed differentiation, and potentially contribute toward identification of potential molecular roadblocks in different subsets.

      Regarding the suggestion that SOX6-deficient NG2+ progenitors might activate a broader neurogenic program, we agree that this is an intriguing possibility. We are currently conducting in-depth investigation of the loss of SOX6 function in NG2+ progenitors, and we aim to submit this quite distinct work for separate publication.

      The reviewer raises an important point about whether SOX6+/NG2+ progenitors and subcerebral projection neurons are indeed normally lineage-related. In the current work, we utilized postnatal cortical SOX6+/NG2+ progenitors that are thought to be largely derived from EMX1+ and GSH2+ ventricular zone neural progenitors. Our unpublished data from the separate study noted above indicate that SOX6 is expressed by both these lineages in vivo. Since subcerebral projection neurons are derived from EMX1+ ventricular zone progenitors (SOX6-expressing), at least some of the SOX6+/NG2+ progenitors are expected to share a lineage relationship with subcerebral projection neurons. While our data strongly suggest such a link, we agree that direct lineage tracing could be pursued in future work.

      Finally, we agree with the reviewer’s suggestion that in vivo transplantation to assess the identity and connectivity of neurons generated by directed differentiation would be very interesting, and is a natural next phase of this work. We aim to pursue such work in future investigations.

      We again thank the reviewer for their insightful comments.

      Reviewer #1 (Recommendations For The Authors): 

      The most important clarification for me concerns the initial description of the progenitors. I think there is a mistake with the transgenic line NG2. The dsRed mouse used in Figure 1 C is not described until later in the results describing Figure 2. This was confusing. Moreover, perhaps this is a reason why I get confused and do not understand how the authors conclude that SOX6+ cells are a subset of NG2-positive cells. Panel C shows the opposite. Please correct the description and show the quantification of data in panel 1C.

      We thank the reviewer for their thoughtful review and for highlighting this important point. We appreciate the reviewer pointing out the benefit of further clarity regarding the NG2.DsRed transgenic mouse description in Figure 1C. We have revised the text to clarify the use of the transgenic line and ensure that the DsRed mouse is properly introduced. Additionally, we have further clarified the description explaining the basis for concluding that SOX6+ cells are a subset of NG2+ cells and further integrate this conclusion with the data presented.

      During cell sorting from the cortices of NG2.DsRed mice, we observe two distinct populations of NG2-DsRed+ cells based on fluorescence intensity in FACS: NG2-DsRed “bright” and NG2-DsRed “dim” populations. The NG2-DsRed “dim” population consists of a heterogenous mix of NESTIN+ progenitors, GFAP+ astrocytes/progenitors, a subset of NG2+ cells, and other unidentified cells. In contrast, the DsRed “bright” population includes a broader group of progenitors that also give rise to oligodendrocytes (please see Zhu, Bergles, and Nishiyama 2008), along with pericytes. 

      Previous studies have shown that, while dorsal/pallial VZ progenitors express SOX6 during embryonic development, SOX6 expression becomes restricted to interneurons postnatally (these do not express NG2 proteoglycan; Azim et al., 2009) and to the broader group of NG2+ progenitors that also give rise to oligodendrocytes. The ICC image in Fig. 1C shows bright NG2+ cells in the cortex, many of which express SOX6. Thus, we conclude that SOX6+ cells constitute a subset of NG2-DsRed+ cells. 

      In a similar line, the work is beautiful, but the manuscript can gain a lot from shortening and some more editing. For example:

      (1) In the abstract, the word inappropriate should be removed. It seems to me that is an unnecessary subjective qualification - it is hardly possible that in biology we found repression of something inappropriate.

      We have removed the word “inappropriate”.

      (2) FACS-purify these genetically accessible....establish a pure culture. Genetically accessible is nice, and I understand that it conveys that they can be traced in the mouse, but everything is genetically accessible with the right tool, and perhaps it is more informative to explain which gene or report is used for the isolation. These cells are not accessible in humans. Also, I consider it best to remove pure- the culture is pure (purified by FACS) cells.

      We have revised the text to specify the gene/reporter used for isolation instead of using "genetically accessible", and we removed "pure", since FACS purification is already explicitly mentioned.

      (3) In the initial paragraph in the results: "They are exposed to the same morphogen gradients throughout embryonic development, and thus, compared to distant cell types, have similar epigenomic and transcription landscapes." This is proven in the cited publication, but the way it is stated here seems a bit of an unnecessary overstatement. The hypothesis stated after this paragraph is as good as it is with or without this argument.

      We have revised the text and simplified the statement. We agree that the hypothesis remains clear and well-supported without this emphasis.

      (4) In the results section, "two distinct populations of DsRed-positive cells were identified based on fluorescence intensity" - I know it is correct, but when reading the percentages, I was confused because those percentages divided the population into three fractions. What the authors do not explain is that they discard the intermediate-expressing population.

      We appreciate the reviewer highlighting this inadvertent point of confusion. We erred by discussing only the two populations of central interest to us (DsRed-bright and DsRed-dim), and did not explicitly mention the DsRed-negative population. We have now clarified the text to include all three cell populations and their percentages of the total cells in all three populations (in the original manuscript and still now, ~75-78% were DsRed-negative). We have also further clarified that only DsRed-Bright cells (identified as progenitors) were used for all subsequent experiments.

      These examples illustrate the type of editing that would be appreciated but which is entirely up to the authors.

      We thank the reviewer for their thoughtful suggestions toward improving clarity and precision. We have incorporated these recommendations, along with suggestions from the other two reviewers, in the revised paper.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors start their results section by showing in situ hybridization for Ngn2 in control and Sox6KO mice. These control sections do not look convincing, as there is not even some signal in the adult VZ-SVZ region and virtually no background. Please show sections where some positive signal can also be detected in the control sections.

      We agree with the reviewer that making direct comparisons in ISH experiments is an important point. In our ISH experiments, to ensure consistency and appropriate comparisons, we process WT and KO sections together and stop the signal development simultaneously. We could have extended the development time to enhance WT signal to a detectable level, but that would have led to excessive background and over-saturated signal in the KO sections.

      To address the reviewer’s point, we have added a new supplementary figure with an additional pair of WT and KO sections, along with reference data from the Allen Brain Atlas. The WT section shows faint Neurog2 expression in the dentate gyrus region of the hippocampus, while the KO section confirms very substantial upregulation of Neurog2 in the absence of SOX6 function. These additional data enhance the clarity and depth of our results.

      Please see the following link for the Allen Brain Atlas ISH data demonstrating that Neurog2 expression in the postnatal (P4) SVZ/SGZ is inherently low (https://developingmouse.brain-map.org/experiment/show/100093831).

      (2) As a hallmark of projection neurons is where they send their axons, it would be important to include a biological assay for this. Of course, in vivo experiments would be great, but if this is not possible, the authors could co-culture sections from the late embryonic cortex, striatum, and spinal cord to see if the reprogrammed neurons preferentially extend their axons towards one of these targets (as normally developing neurons would, see e.g. Bolz et al., 1990).

      We agree with the reviewer’s suggestion that a very interesting future stage of this work would be to investigate the projection neuron identity including connectivity in vivo. We aim to pursue follow-up studies to investigate in vivo integration and connectivity of such neurons generated by directed differentiation from endogenous SOX6+/NG2+ cortical progenitors. As the reviewer insightfully suggests, co-culturing different brain regions with these neurons could offer an alternative strategy to partially assess potential preferential connectivity into cultured spinal cord vs. alternate tissue. This area of investigation is of substantial interest to our lab, and we aim to pursue it in the coming years; it is a very large undertaking by either approach.

      (3) However, if the loss of Sox6 is sufficient for Ngn2 to be upregulated, why did the authors not pursue this approach in their reprogramming experiments? Are these endogenous levels sufficient for reprogramming? Please add some OPC cultures from WT and KO mice to explore their conversion to neurons and possibly combine them with Olig2VP16 and Fezf2.

      We thank the reviewer for this insightful comment and for raising this broader area of inquiry regarding whether SOX6 might be down-regulated to enhance induction of neurogenesis. We are writing a separate manuscript regarding function of SOX6 in these progenitors during normal or molecularly manipulated development. We investigate function of SOX6 using both whole body null mice and a series of conditional null mice. We aim to post that work as a preprint and submit it for review and publication in the coming months. Beyond that work, the potential strategy of downregulating SOX6 function while simultaneously upregulating other molecular controls to refine directed neuronal differentiation is also of substantial interest to us, and we aim to pursue this in follow-up work. Though these are both interesting questions/topics, we respectfully submit that these broad areas of parallel, complex, and future investigation would substantially expand the scope of work in this paper, so we aim to address them in separate studies.

      (4) Please indicate independent biological replicates as individual data points in all histograms, i.e. also in Figure 2K, Figure 4I, S2H.

      We have updated the figure legends indicating the biological replicates, and explained the broad media optimization that was used successfully in all further experiments.

      (5) GFP labelling in Figures S2K-N is not convincing - too high background. Please optimize.

      We have redesigned this figure and now present it as a new supplementary figure, with GFP pseudocolored in gray and enlarged subpanels for improved visualization of cell morphology.

      Reviewer #3 (Recommendations For The Authors):

      This is an extremely well-written manuscript with very exciting implications. Obviously, not all can be tested here. Some of the suggestions are relatively easy and may be worth testing right away, others may require more extensive study in the future. In my view, completing some of the points below could make this paper a landmark study.

      I start with the key questions:

      (1) Do grafted NVOF cells give rise to subcerebral projection neurons in vivo?

      We agree with the reviewer’s suggestion that a very interesting future stage of this work would be to investigate the projection neuron identity including connectivity in vivo. As noted above in response to Reviewer 2, we aim to pursue follow-up studies to investigate in vivo integration and connectivity of such neurons generated by directed differentiation from endogenous SOX6+/NG2+ cortical progenitors. This question is of substantial interest to us, and we aim to pursue it in the coming years; as the reviewer notes, this is a very large undertaking, and beyond the scope of this paper.

      (2) What is the fate of the Sox6 deficient NG2 glia that express Neurog2? One could isolate these cells and subject them to scRNA sequencing to see how far neurogenesis proceeds without addition of exogenous factors.

      We thank the reviewer for this insightful question. As noted in our response to Reviewer 2, we are writing a separate manuscript regarding function of SOX6 in these progenitors during normal or molecularly manipulated development. We investigate function of SOX6 using both whole body null mice and a series of conditional null mice. We aim to post that work as a preprint and submit it for review and publication in the coming months, likely in early summer. We respectfully submit that this broad area of parallel, complex investigation would substantially expand the scope of work in this paper and make this paper too complex and multi-directional, so we aim to publish them as separate papers for the benefit of clarity for readers.

      (3) Obviously, what happens to Sox6-deficient (or non-deficient cells) when forced to express NVOF? In this context, it might be fair to cite Felske et al (PLoS Biol, 2023) who report Neurog2 and Fezf2-induced reprogramming in the postnatal brain. In their model, these authors did not distinguish between converted astrocytes and NG2 glia. Thus, some of the reprogrammed cells may comprise the SOX6-positive cells described here.

      We thank the reviewer for highlighting for us that we inadvertently omitted referencing the important paper by Felske et al., 2023. We have now included this citation. 

      We thank the reviewer for raising this broader area of inquiry regarding whether SOX6 might be down-regulated to enhance induction of neurogenesis. Beyond the work noted above regarding function of SOX6 in these progenitors during normal or molecularly manipulated development, the potential strategy of downregulating SOX6 function while simultaneously upregulating other molecular controls to refine directed neuronal differentiation is of substantial interest to us, and we aim to pursue this in follow-up work. We again respectfully submit that this area of complex, future investigation should be addressed in future studies.

      Very interesting unaddressed questions include:

      (1) Are Sox6+ NG2 glia of dorsal origin? This is implied but not shown. One could use Emx1Cre lines to assess this. Are Sox6+ glia and subcerebral projection neurons clonally related? This may be more challenging. In this context, it might be again fair to refer to Herrero-Navarro et al (Science Advances 2021) who show that glia lineage related to nearby neurons gives rise to induced neurons with regional specificity.

      The reviewer raises an important question regarding the competence of SOX6+/NG2+ progenitors from distinct origins to generate corticospinal-like neurons by directed differentiation. In ongoing unpublished work, we have identified SOX6 expression by NG2+ progenitors of the three lineages derived from ventricular zone progenitors that express either Emx1, Gsh2, or Nkx2.1 transcription factors. The EMX1+ lineage-derived SOX6+/NG2+ progenitors are directly lineage related to cortical projection neurons. As the reviewer suggests, future experiments could explore potential differences in competence between these three populations.

      We again thank the reviewer for highlighting for us that we also inadvertently omitted referencing the exciting study by Herrero-Navarro that addresses the question of regional heterogeneity within astrocytes and the differential reprogramming potential related to their origins. We have now cited this paper in the manuscript.

      (2) Do other NG2 glia not give rise to subcerebral projection neurons when challenged with NVOF? Thus, how important is Sox6 expression really?

      The question of the specific competence of dorsal/cortical SOX6+/NG2+ progenitors to differentiate into corticospinal-like neurons, and the strategy of downregulating SOX6 function while simultaneously upregulating other molecular controls to direct neuronal differentiation, are both of great interest to us. In pilot experiments, we observed reduced competence of ventrally derived SOX6+/NG2+ progenitors to generate similar neurons. We plan to pursue the SOX6 manipulation in follow-up work.

      (3) Do Sox6+ NG2 glia proliferate like other NG2 glia and thereby represent a replenishable pool of progenitors?

      Yes; as noted in the text shortly after Figure 1, and as presented in Figure S3I-L, these progenitors proliferate robustly in response to the mitogens PDGF-A and FGF2.

      (4) How heterogenous are the NVOF-induced neurons? The bulk highlights the overall specificity, but does not tell whether all cells make it equally well.

      We agree with the reviewer that this is an interesting question. ICC analysis (Fig. 4G-4H) presents the variation in the levels of a few functionally important proteins in the population of NVOF-induced neurons. This variation could be due to any or all of at least three possibilities: 1) diversity in the population of purified SOX6+/NG2+ progenitors; 2) technical variability in the amount of NVOF plasmid delivered to individual progenitors during transfection; and/or 3) natural stochastic TF-level variations generating closely related neuron types, as also occur during normal development. Future experiments could explore these questions.

    1. Reviewer #1 (Public review):

      In this updated and improved manuscript, the authors investigate the role of Aurora Kinase A (AurA) in trained immunity, following a broader drug screening aimed at finding inhibitors of training. They show AurA is important for trained immunity by looking at the different aspects and layers of training using broad omics screening, followed up by a more detailed investigation of specific mechanisms. The authors finalised the investigation with an in vivo MC-38 cancer model where AurA inhibition reduces beta-glucan's antitumour effects.

      Strengths:

      The experimental methods are generally well-described. I appreciate the authors' broad approach to studying different key aspects of trained immunity (from comprehensive transcriptome/chromatin accessibility measurements to detailed mechanistic experiments). Approaching the hypothesis from many different angles inspires confidence in the results. Furthermore, the large drug-screening panel is a valuable tool as these drugs are readily available for translational drug-repurposing research.

      In response to the rebuttal, I would like to compliment and thank the authors for the large amount of work they have done to improve this manuscript. They have removed most of my previous concerns and confusions, and explained some of their approaches in a way that I now agree with them - a great learning opportunity for me as well.

      Weaknesses:

      (1) The authors have adequately responded to my comments and updated the manuscript accordingly.

      (2) The authors have removed most of my concerns. Regarding the use of unpaired tests because that is what is often done in the literature: I still don't agree with this, nor do I think that 'common practice' is a solid argument to justify the approach. However, we can agree to disagree, as I know indeed that many people argue over when paired tests are appropriate in these types of experiments. I appreciate that n=2 for sequencing experiments is justifiable in the way these analyses are used as exploratory screening methods with later experimental validation. I also want to thank the authors for reporting biological replicates where relevant and (I should have mentioned this in my original review also) I appreciate they validate some findings in a separate cell line - many papers neglect this important step.

      (3) The authors have adequately responded to my comments and updated the manuscript accordingly.

      (4) The authors have adequately responded to my comments and updated the manuscript accordingly.

      (5) The authors have adequately responded to my comments and updated the manuscript accordingly.

      (6) The authors have adequately responded to my comments and updated the manuscript accordingly. They have actually gone above and beyond.

      (7) I would like to thank the authors for highlighting this information and taking away my confusion. The authors have adequately responded to my comments and updated the manuscript accordingly.

      (8) The authors have adequately responded to my comments and updated the manuscript accordingly.

      (9) I still think adding the 'alisertib alone' control would be of great added value, but I can see how it is unreasonable to ask the authors to redo those experiments.

      (10) The authors have adequately responded to my comments and updated the manuscript accordingly.

      (11) The authors have adequately responded to my comments and updated the manuscript accordingly.

      (12) I thank the authors for their work to repeat this experiment with my suggestions included. I am convinced by this nice data. I would recommend that the authors put the data from New Figure 4 also in the manuscript as it adds value to the manuscript (unless I just missed it, I don't see it in Figure 6 or the supplement). Not every reader may look at the reviewer comments/rebuttal documents.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This work regards the role of Aurora Kinase A (AurA) in trained immunity. The authors claim that AurA is essential to the induction of trained immunity. The paper starts with a series of experiments showing the effects of suppressing AurA on beta-glucan-trained immunity. This is followed by an account of how AurA inhibition changes the epigenetic and metabolic reprogramming that are characteristic of trained immunity. The authors then zoom in on specific metabolic and epigenetic processes (regulation of S-adenosylmethionine metabolism & histone methylation). Finally, an inhibitor of AurA is used to reduce beta-glucan's anti-tumour effects in a subcutaneous MC-38 model.

      Strengths:

      With the exception of my confusion around the methods used for relative gene expression measurements, the experimental methods are generally well-described. I appreciate the authors' broad approach to studying different key aspects of trained immunity (from comprehensive transcriptome/chromatin accessibility measurements to detailed mechanistic experiments). Approaching the hypothesis from many different angles inspires confidence in the results (although not completely - see weaknesses section). Furthermore, the large drug-screening panel is a valuable tool as these drugs are readily available for translational drug-repurposing research.

      We thank the reviewer for the positive and encouraging comments.

      Weaknesses:

      (1) The manuscript contains factual inaccuracies such as:

      (a) Intro: the claim that trained cells display a shift from OXPHOS to glycolysis based on the paper by Cheng et al. in 2014; this was later shown to be dependent on the dose of stimulation and actually both glycolysis and OXPHOS are generally upregulated in trained cells (pmid 32320649).

      We thank the reviewer for pointing out this inaccuracy, and we have revised our statement to ensure an accurate and updated description in the manuscript. We are aware that trained immunity involves different metabolic pathways, including both glycolysis and oxidative phosphorylation [1, 2]. We also measured the oxygen consumption rate (please see the response to comment 8 of Reviewer #1) but observed no obvious increase in oxygen consumption in trained BMDMs in our experimental setting. As the reviewer pointed out, this might depend on the dose of stimulation.

      (b) Discussion: Trained immunity was first described as such in 2011, not decades ago.

      We are sorry for the inaccurate description, and we have corrected the statement in our revised manuscript as “Although the concept of ‘trained immunity’ has been proposed since 2011, the detailed mechanisms that regulate trained immunity are still not completely understood.”

      (2) The authors approach their hypothesis from different angles, which inspires a degree of confidence in the results. However, the statistical methods and reporting are underwhelming.

      (a) Graphs depict mean +/- SEM, whereas mean +/- SD is almost always more informative. (b) The use of 1-tailed tests is dubious in this scenario. Furthermore, in many experiments/figures the case could be made that the comparisons should be considered paired (the responses of cells from the same animal are inherently not independent due to their shared genetic background and, up until cell isolation, the same host factors like serum composition/microbiome/systemic inflammation etc). (c) It could be explained a little more clearly how multiple testing correction was done and why specific tests were chosen in each instance.

      We sincerely thank the reviewer for this thoughtful comment. (a) The data from animal experiments in which trained immunity was induced in vivo are presented as mean ± SD, while the statistical results from cell-based experiments are presented as mean ± SEM in the revised manuscript. (b) We have replaced the one-tailed test with a two-tailed test (see Figure 3J in the revised manuscript, with updated P value labels). We agree that cells derived from the same animal and subjected to different treatment conditions may be deemed paired data, and we reanalyzed our data using paired statistical tests. While this led to a slight reduction in statistical significance for some comparisons, the overall trends remained consistent, and our biological interpretation remains unchanged. Because unpaired statistical tests are commonly used for in vitro experiments in the literature [3, 4], we still report the unpaired test results here. (c) We have provided a detailed description of how multiple comparisons were performed in the revised figure legends.
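
      To make the paired versus unpaired distinction in point (b) concrete, the following minimal sketch computes both t statistics for the same data. The readouts are hypothetical values invented for illustration, not data from the manuscript, and the function names are assumptions of this sketch. It only shows how the two statistics are built; whether pairing raises or lowers significance depends on the data and on the smaller number of degrees of freedom in the paired test.

```python
import math
import statistics


def unpaired_t(a, b):
    """Student's t statistic for two independent samples (pooled variance)."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb)
    )


def paired_t(a, b):
    """Paired t statistic: a one-sample test on the per-animal differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical cytokine readouts (pg/mL); index i is the same animal in both lists.
trained = [520.0, 310.0, 415.0]
trained_plus_inhibitor = [400.0, 205.0, 290.0]

print("unpaired t:", round(unpaired_t(trained, trained_plus_inhibitor), 2))
print("paired t:  ", round(paired_t(trained, trained_plus_inhibitor), 2))
```

      In this made-up example the between-animal spread is large relative to the treatment effect, so the paired statistic is much larger than the unpaired one; the paired test has n - 1 = 2 degrees of freedom, while the unpaired test has 4.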

      (d) Most experiments are done with n = 3, some experiments are done with n = 5. This is not a lot. While I don't think power analyses should be required for simple in vitro experiments, I would be wary of drawing conclusions based on n = 3. It is also not indicated if the data points were acquired in independent experiments. ATAC-seq/RNA-seq was, judging by the figures, done on only 2 mice per group. No power calculations were done for the in vivo tumor model.

      We are sorry for the confusion caused by the descriptions in our figure legends. For the in vivo experiments, we determined the sample size (n = 5, where n refers to the number of mice used as biological replicates) by referring to the animal numbers used for similar experiments in the literature. In addition, according to a reported resource equation approach for calculating sample size in animal studies [5], n = 5-7 is suitable for most of our mouse experiments. The in vitro cell assays were performed as at least three independent experiments (BMs isolated from different mice), each experiment was independently replicated at least three times, and points represent biological replicates in our revised manuscript. In Figure 1A, 5 biological replicates are presented to carefully determine a working concentration of alisertib that would not significantly affect the viability of trained macrophages and that was subsequently used in all related cell-based experiments. As for the sequencing data, we acknowledge the reviewer's concern regarding the small sample size (n = 2) in our RNA-seq/ATAC-seq experiments. We consider the sequencing experiments mainly an exploratory/screening approach, and we performed rigorous quality control and normalization of the sequencing data to ensure the reliability of our findings. For RNA-seq data analysis, we referred to the DESeq2 manual, which specifies that its statistical framework is based on the negative binomial distribution and is capable of robustly inferring differential gene expression with a minimum of two replicates per group. Therefore, the inclusion of two replicates per group was deemed sufficient for our analysis. Nevertheless, the genomic and transcriptome sequencing data were used primarily for preliminary screening, and the candidates have been extensively validated through additional experiments. For example, we conducted ChIP followed by qPCR to detect the enrichment of active histone modifications in the Il6 and Tnf regions, further verifying the increased accessibility of trained immunity-induced inflammatory genes.
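
      The resource equation approach cited above [5] can be sketched as follows for a simple group comparison. This assumes the one-way ANOVA form of the method, in which the error degrees of freedom (total animals minus number of groups) should lie between 10 and 20, so that the per-group size is n = DF/k + 1 for k groups. The function name is hypothetical, and the number of groups in the authors' experiments is not stated in this response.

```python
import math


def resource_equation_group_size(k, df_min=10, df_max=20):
    """Return (min_n, max_n) animals per group for k groups.

    Assumes a one-way ANOVA design where error DF = k * (n - 1) must fall
    within [df_min, df_max]; the minimum is rounded up and the maximum down.
    """
    if k < 2:
        raise ValueError("need at least two groups")
    n_min = math.ceil(df_min / k) + 1
    n_max = math.floor(df_max / k) + 1
    return n_min, n_max


for groups in (2, 3, 4):
    print(groups, "groups:", resource_equation_group_size(groups))
```

      Under these assumptions, three groups give 5 to 7 animals per group, which matches the range quoted in the response, while four groups give 4 to 6.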

      (e) Furthermore, the data spread in many experiments (particularly BMDM experiments) is extremely small. I wonder if these are true biological replicates, meaning each point represents BMDMs from a different animal? (disclaimer: I work with human materials where the spread is of course always much larger than in animal experiments, so I might be misjudging this.).

      Thanks for your comments. In our initially submitted manuscript, some of the statistical results were presented as the representative data (technical replicates) from one of three independent biological replicates (including BMDMs experiments showing the suppression and rescue experiments of trained immunity under different inhibitors or activators, see original Figure 1B-C, Figure 5D, and Figure 5H, also related to Figure 1B-C, Figure 5D, and Figure 5H respectively in our revised manuscript) while other experimental data are biological replicates including CCK8 experiment, metabolic assay and ChIP-qPCR. In response to your valuable suggestion, we have revised the manuscript to present all statistical results as biological replicates from three independent experiments (presented as mean ± SEM), and we have provided all the original data for the statistical analysis results (please see Appendix 2 in resubmit system).

      (3) Maybe the authors are reserving this for a separate paper, but it would be fantastic if the authors would report the outcomes of the entire drug screening instead of only a selected few. The field would benefit from this as it would save needless repeat experiments. The list of drugs contains several known inhibitors of training (e.g. mTOR inhibitors) so there must have been more 'hits' than the reported 8 Aurora inhibitors.

      Thank you for your suggestion and we have briefly reported the outcomes of the entire drug screening in the revised manuscript. The targets of our epigenetic drug library are primarily categorized into several major classes, including Aurora kinase family, histone methyltransferase and demethylase (HMTs and KDMs), acetyltransferase and deacetylase (HDACs and SIRTs), JAK-STAT kinase family, AKT/mTOR/HIF, PARP family, and BRD family (see New Figure 1, related to Figure 1-figure supplement 1B in revised manuscript). Notably, previous studies have reported that inhibition of mTOR-HIF1α signaling axis suppressed trained immunity[6]. Our screening results also indicated that most inhibitors targeting mTOR-HIF1α signaling exhibit an inhibitory effect on trained immunity. Additionally, cyproheptadine, a specific inhibitor for SETD7, which was required for trained immunity as previously reported [7], was also identified in our screening.

      JAK-STAT signaling is closely linked to the interferon signaling pathway, and certain JAK kinase inhibitors also target SYK and TYK kinases. A previous drug library screening study has reported that SYK inhibitors suppressed trained immunity [8]. Consistently, our screening results reveal that most JAK kinase inhibitors exhibit suppressive effects on trained immunity.

      BRD (bromodomain) proteins and Aurora kinases are well-established target families in the field of oncology. Compared to BRD inhibitors, the clinical application of Aurora kinase inhibitors is still at an early stage. In previous studies using inflammatory arthritis models where trained immunity was established, both adaptive and innate immune cells exhibited upregulated expression of AurA [9, 10]. Our study provides further evidence supporting an essential role of AurA in trained immunity, showing that AurA inhibition leads to the suppression of trained immunity.

      (4) Relating to the drug screen and subsequent experiments: it is unclear to me in supplementary figure 1B which concentrations belong to secondary screens #1/#2 - the methods mention 5 µM for the primary screen and "0.2 and 1 µM" for secondary screens, is it in this order or in order of descending concentration?

      Thank you for your comments, and we are sorry for the unclearly labelled results in the original manuscript (related to Figure 1-supplement 1C). We performed the secondary drug screen at two concentrations, and the drug concentrations corresponding to secondary screens #1 and #2 are 0.2 and 1 μM, respectively. They are listed in this order, not in order of descending concentration.

      (a) It is unclear if the drug screen was performed with technical replicates or not - the supplementary figure 1B suggests no replicates and quite a large spread (in some cases lower concentration works better?)

      Thank you for your question. The drug screen was performed without technical replicates because it served as an initial screen, and each hit had to be verified individually in subsequent experiments. Yes, we observed that the lower concentration works better in some cases. We speculate that this is because a drug's effect correlates positively with its concentration only within a specific range. In our primary screening, however, we simply chose one concentration for all the drugs. This is a limitation of our screening, and we acknowledge it in the Discussion.

      (5) The methods for (presumably) qPCR for measuring gene expression in Figure 1C are missing. Which reference gene was used and is this a suitably stable gene?

      We are sorry for this omission. The mRNA expression of Il6 and Tnf in trained BMDMs was analyzed by quantitative real-time PCR using the ΔΔCt method, and the result was normalized to untrained BMDMs with Actb (β-actin) as the reference gene, a well-documented gene with stable expression in macrophages. We have added the description of the gene expression measurements to the Materials and Methods in our revised manuscript.
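
      For readers unfamiliar with the calculation, the ΔΔCt normalization described above can be sketched as follows. The Ct values are hypothetical and the function name is an assumption of this sketch; it also assumes roughly 100% amplification efficiency (a doubling per cycle) for both the target and the reference gene, which the response does not state.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression by the delta-delta-Ct method.

    Each Ct of the target gene is first normalized to the reference gene
    (delta Ct), then the sample is compared with the control (delta-delta Ct).
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: Il6 vs. Actb in trained (sample) and untrained (control) BMDMs.
fold = fold_change_ddct(
    ct_target_sample=22.0, ct_ref_sample=16.0,
    ct_target_control=25.0, ct_ref_control=16.0,
)
print(fold)  # 8.0: the target crosses threshold three cycles earlier after normalization
```

      Because the result depends directly on the reference Ct values, the stability of the reference gene across conditions matters, which is the point behind the reviewer's question.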

      (6) From the complete unedited blot image of Figure 1D it appears that the p-Aurora and total Aurora are not from the same gel (discordant number of lanes and positioning). This could be alright if there are no/only slight technical errors, but I find it misleading as it is presented as if the actin (loading control to account for aforementioned technical errors!) counts for the entire figure.

      We are very sorry for this omission. In the original data, p-Aurora and total Aurora were from different gels. In this experiment, membrane stripping and reprobing after the p-Aurora antibody did not work well, so we could not obtain all results from one gel. We therefore ran another gel with the same samples, blotted it with the anti-Aurora antibody, and used β-tubulin as the loading control for total AurA (please see New Figure 2A, also related to original Figure 1D). We have provided the source data for β-tubulin from the same membrane as total AurA (please see Figure 1-source data). To avoid any potential for misinterpretation, we have repeated this experiment and updated this figure (please see New Figure 2B, also related to Figure 1D in the revised manuscript) with phospho-AurA, total AurA and β-actin from the same gel. The bands for phospho-AurA (T288) were obtained using a new antibody (Invitrogen, 44-1210G), and we have revised this information in the Materials and Methods. We have provided data from three biological replicates to confirm the experimental result (see also New Figure 2B, related to Figure 1D in the revised manuscript; the raw data have been added to the source data for Figure 1).

      (7) Figure 2: This figure highlights results that are by far not the strongest ones - I think the 'top hits' deserve some more glory. A small explanation on why the highlighted results were selected would have been fitting.

      We appreciate the valuable suggestion. Figure 2 (see also Figure 2 in the revised manuscript) presents information on the chromatin landscape affected by AurA inhibition to confirm that AurA inhibition impaired the activation of key genes involved in pro-inflammatory macrophage activation by β-glucan. In Figure 2B we highlighted a few classical downregulated GO terms, including “regulation of growth”, “myeloid leukocyte activation” and “MAPK cascade” (see also Figure 2B in the revised manuscript). Among these, “regulation of growth” is a known function of Aurora A and was highlighted simply to show that alisertib indeed inhibited Aurora A function in vivo as expected, while “myeloid leukocyte activation” and “MAPK cascade” show the impaired accessibility of pro-inflammatory genes. We highlighted downregulated KEGG terms such as “JAK-STAT signaling pathway”, “TNF signaling pathway” and “NF-kappa B signaling pathway” in Figure 2F (see also Figure 2F in the revised manuscript), as these pathways are highly relevant to trained immunity. Meanwhile, the KEGG term “FOXO signaling pathway” (see also Figure 2G in the revised manuscript) was highlighted to confirm the anti-inflammatory effect of alisertib in trained BMDMs, which is further illustrated in Figure 5 (see also Figure 5 in the revised manuscript, showing that FOXO3 acts downstream of AurA). Some top hits, such as “positive regulation of cell adhesion” in Figure 2B and “pathway of neurodegeneration” and "ubiquitin mediated proteolysis" in Figure 2F and 2G, are not directly related to trained immunity; we therefore did not highlight them, but they may provide useful information for future investigation of other functions of Aurora A.

      (8) Figure 3 incl supplement: the carbon tracing experiments show more glucose-carbon going into TCA cycle (suggesting upregulated oxidative metabolism), but no mito stress test was performed on the seahorse.

      We appreciate this question raised by the reviewer. We previously performed Seahorse XF analysis to measure the oxygen consumption rate (OCR) in β-glucan-trained BMDMs. The results showed no obvious increase in oxidative phosphorylation (OXPHOS), as indicated by OCR, under β-glucan stimulation (related to Figure 3-figure supplement 1A), although the carbon tracing experiments showed more glucose-derived carbon entering the TCA cycle. We speculate that the observed discrepancy between increased glucose incorporation into the TCA cycle and unchanged OXPHOS may reflect a characteristic metabolic reprogramming induced by trained immunity. The increased incorporation of glucose-derived carbon into the TCA cycle likely serves a biosynthetic purpose, supplying intermediates for anabolic processes, rather than augmenting mitochondrial respiration [6]. Moreover, the unchanged OXPHOS may be attributed to a reduced reliance on fatty acid oxidation (“catabolism”), with glucose-derived acetyl-CoA becoming the predominant substrate. Thus, while overall OXPHOS remains stable, the glucose contribution to the TCA cycle increases. This is in line with reports showing that trained immunity promotes fatty acid synthesis (“anabolism”) [11]. Alternatively, the partial decoupling of the TCA cycle from OXPHOS could result from the diversion of intermediates such as fumarate out of the cycle.

      Oxygen consumption rate (OCR) after a mito stress test upon sequential addition of oligomycin (Oligo, 1 μM), FCCP (1 mM), and Rotenone/antimycin (R/A, 0.5 μM), in BMDMs with different treatment for 24 h. β-glucan, 50 μg/mL; alisertib, 1 μM.

      (9) Inconsistent use of an 'alisertib-alone' control in addition to 'medium', 'b-glucan', 'b-glucan + alisertib'. This control would be of great added value in many cases, in my opinion.

      Thank you for your comment. We appreciate that including “alisertib-alone” group throughout all the experiments may further solidify the results. We set the aim of the current study to investigate the role of Aurora kinase A in trained immunity. Therefore, in most settings, we did not include the group of alisertib only without β-glucan stimulation.

      (10) Figure 4A: looking at the unedited blot images, the blot for H3K36me3 appears in its original orientation, whereas other images appear horizontally mirrored. Please note, I don't think there is any malicious intent but this is quite sloppy and the authors should explain why/how this happened (are they different gels and the loading sequence was reversed?)

      Thank you for pointing out this error. After checking the original data, we found that we had indeed misassembled the orientation of several blots in the original data submitted. We went through the assembly process and found that the blots in the original data were oriented according to the loading sequence but were not saved correctly, so that the orientations in Figure 4A were not consistent with the unedited blot images. We are sorry for this careless mistake, and we have double-checked that all the blots are correctly assembled in the revised manuscript. We also provide three replicates of the Western blot results showing that the level of H3K36me3 in trained BMDMs was reduced by alisertib (see New Figure 7 at recommendation 2 of reviewer #2).

      (11) For many figures, for example prominently figure 5, the text describes 'beta-glucan training' whereas the figures actually depict acute stimulation with beta-glucan. While this is partially a semantic issue (technically, the stimulation is 'the training-phase' of the experiment), this could confuse the reader.

      Thanks for the reviewer’s suggestion and we have reorganized our language to ensure clarity and avoid any inconsistencies that might lead to misunderstanding.

      (12) Figure 6: Cytokines, especially IL-6 and IL-1β, can be excreted by tumour cells and have pro-tumoral functions. This is not likely in the context of the other results in this case, but since there is flow cytometry data from the tumour material it would have been nice to see also intracellular cytokine staining to pinpoint the source of these cytokines.

      Thanks for the reviewer’s suggestion. In Figure 6, we performed assays in a mouse tumor model and found that trained immunity upregulated the levels of cytokines such as IL-6 in tumor tissue, which were downregulated by alisertib administration. To rule out the possibility that the detected cytokines, such as IL-6, came from tumor cells, we performed intracellular cytokine staining of single cells isolated from tumor tissues (please see New Figure 4). The results showed that only a small fraction of non-immune cells (CD45<sup>-</sup> population) expressed IL-6 (0.37% ± 0.11%), whereas a significantly higher proportion of IL-6-positive cells was observed among the CD45<sup>+</sup> population (deemed immune cells, 13.66% ± 1.82%), myeloid cells (CD45<sup>+</sup>CD11b<sup>+</sup>, 15.60% ± 2.19%), and in particular, macrophages (CD45<sup>+</sup>CD11b<sup>+</sup>F4/80<sup>+</sup>, 37.24% ± 3.04%). These findings strongly suggest that immune cells, especially macrophages, are the predominant source of IL-6 within the tumor microenvironment. Moreover, we also detected a higher IL-6-positive population in myeloid cells and macrophages (please see Figure 6I in the revised manuscript).

      Reviewer#2 (Public review):

      Summary:

      This manuscript investigates the inhibition of Aurora A and its impact on β-glucan-induced trained immunity via the FOXO3/GNMT pathway. The study demonstrates that inhibition of Aurora A leads to overconsumption of SAM, which subsequently impairs the epigenetic reprogramming of H3K4me3 and H3K36me3, effectively abolishing the training effect.

      Strengths:

      The authors identify the role of Aurora A through small molecule screening and validation using a variety of molecular and biochemical approaches. Overall, the findings are interesting and shed light on the previously underexplored role of Aurora A in the induction of β-glucan-driven epigenetic change.

      We thank the reviewer for the positive and encouraging comments.

      Weaknesses:

      Given the established role of histone methylations, such as H3K4me3, in trained immunity, it is not surprising that depletion of the methyl donor SAM impairs the training response. Nonetheless, this study provides solid evidence supporting the role of Aurora A in β-glucan-induced trained immunity in murine macrophages. The in vivo part on the antitumor effect of trained immunity is insufficient to support the final claim, as Alisertib could inhibit Aurora A in cell types other than myeloid cells.

      We appreciate the question raised by the reviewer. Although SAM generally acts as a methyl donor, whether the epigenetic reprogramming in trained immunity is directly linked to SAM metabolism had not been formally tested previously. In our study, we provide evidence suggesting that SAM maintenance is necessary to support trained immunity. As for the in vivo tumor model, we agree that alisertib may inhibit Aurora A in many cell types besides myeloid cells. To further address the reviewer’s concern, we have performed the suggested bone marrow transplantation experiment (trained mice as donors and naïve mice as recipients) to verify the contribution of myeloid cell-mediated trained immunity to the antitumor effect (please see New Figure 8, also related to Figure 6C, 6D and Figure 6-figure supplement 1B and 1C in the revised manuscript).

      Reviewer #1 (Recommendations for the authors):

      Some examples of spelling errors and other mistakes (by far not a complete list):

      (a) Introduction, second sentence: reads as if Candida albicans (which should be italicised and capitalised properly) and BCG are microbial polysaccharide components.

      (b) Methods: ECAR is ExtraCellular Acidification Rate, not 'Extracellular Acid Ratio'

      (c) Figure 2C: β-glucan is misspelled in the graph title.

      (d) TNFα has been renamed to 'TNF' for a long time now.

      (e) Inconsistent use of Tnf and Tfnα (the correct gene symbol is Tnf) (NB: this field does not allow me to italicise gene symbols)

      (f) Figure supplement 1B: 'secdonary'

      (g) Caption of figure 4: "Turkey's multiple-comparison test"

      (h) etc

      I would ask the authors that they please go over the entire manuscript very carefully to correct such errors.

      We apologize for these errors and careless mistakes. We greatly appreciate your suggestions, and we have carefully proofread the revised manuscript to make sure that no further mistakes remain.

      Please also address the points I raised in the public review about statistical approaches. Even more important than the relatively low 'n' is my question about biological replicates. Please clarify what you mean by 'biological replicate'. If you are able to repeat at least the in vitro experiments (if this is too much work pick the most important ones) a few more times this would really strengthen the results.

      Thank you for your comment. Our biological replicates refer to independently repeated experiments using bone marrow cells isolated from different mice, and n represents the number of mice used. We repeated each experiment at least three times using BMDMs isolated from different mice (n = 3, biological replicates). Specifically, we repeated several in vitro experiments showing that inhibition of AurA upregulated GNMT in trained BMDMs, that the transcription factor FOXO3 acted as a key protein in AurA-mediated GNMT expression to control trained immunity, and that an mTOR agonist rescued trained immunity inhibited by alisertib (see New Figure 5, related to Figure 5B-C, Figure 5H in revised manuscript). Additionally, we have provided data from three biological replicates to show the β-glucan-induced phosphorylation of AurA (see comment 6 of reviewer#1) and the changes in histone modification markers under AurA inhibition and GNMT deficiency (see recommendation 2 of reviewer#2). We also repeated the in vivo tumor model to analyze intratumoral cytokines (see recommendation 12 of reviewer#1).

      Finally: the authors report 'no funders' during submission, but the manuscript contains funding details. Please modify this in the eLife submission system if possible.

      Thank you for your kind reminder and we have modified funding information in the submission system.

      Reviewer #2 (Recommendations for the authors):

      (1) I have the following methodological and interpretative comments for consideration:

      Aurora A has been previously implicated in M1 macrophage differentiation and NF-κB signaling. What is the effect of Aurora A inhibition on basal LPS stimulation? Considering that β-glucan + Ali also skews macrophage priming towards an M2 phenotype, as shown in Fig. 2E, further clarification on this point would strengthen the study.

      Thanks for your suggestion. A previous study showed that AurA was upregulated in LPS-stimulated macrophages and that inhibition of AurA downregulated M1 markers in LPS-stimulated macrophages through the NF-κB pathway but did not affect IL-4-induced M2 macrophage polarization [12]. Consistently, we also found that AurA inhibition downregulated the inflammatory response upon basal LPS stimulation, as shown by the decreased IL-6 level (see New Figure 6). In original Figure 2E (also related to Figure 2E in revised manuscript), we showed increased accessibility of Mrc1 and Chil3, both of which are typical M2 macrophage markers, under “β-glucan + Ali” before re-challenge. Motif analysis showed that AurA inhibition would upregulate genes controlled by PPARγ (STAT6 was not predicted). Unlike STAT6, a classical transcription factor that controls IL-4- or IL-13-dependent M2 polarization (M2a), PPARγ mediates polarization toward M2c and mainly controls anti-inflammatory cellular metabolism independently of IL-4 or IL-13. Thus, we speculate that inhibition of AurA might promote non-classical M2 polarization, and the details warrant future investigation.

      (2) In Figure 4A, it looks like that H3K27me3 is also significantly upregulated by β-glucan and inhibited by Ali. How many biological replicates were performed for these experiments? It would be beneficial to include densitometric analyses to visualize differences across multiple Western blot experiments for better reproducibility and quantitative assessment. In addition, what is the effect of treatment of Ali alone on the epigenetic profiling of macrophages?

      We are sorry for this confusion. Each experiment was performed with at least three independent biological replicates. In original Figure 4-figure supplement 1 (also related to Figure 4-figure supplement 1 in the revised manuscript), we presented the densitometric analysis results from three independent Western blot experiments, which showed that β-glucan did not affect H3K27me3 levels under our experimental conditions. Data from three biological replicates for histone modifications are shown in New Figure 7 (related to Figure 4-figure supplement 1 in the revised manuscript). We appreciate that assaying “Ali alone” in macrophages may add more value to the findings. However, the aim of the current study was to investigate the role of Aurora kinase A in trained immunity, and we know that alisertib itself would not induce or suppress trained immunity. Therefore, in most settings, we did not test the effect of alisertib alone without β-glucan stimulation.

      (3) The IL-6 and TNF concentrations exhibit considerable variability (Fig. 3K and Fig. 5H), ranging from below 10 pg/mL to 500-1000 pg/mL. Please specify the number of replicates for these experiments and provide more detail on how variability was managed. Including this information would enhance the robustness of the conclusions.

      Thank you for your comment. These experiments were replicated at least three times using BMDMs isolated from different mice. The observed variations in cytokine concentrations may be attributed to factors such as differences in cell density, variability among individual mice, and the passage number of the MC38 cells used for supernatant collection. We have prepared a new batch of BMDMs, repeated the experiment, and provided consistent results in the revised manuscript (please see Figure 5H in revised manuscript). Data for biological replicates have been provided (please see Appendix 2 in resubmit system).

      (4) The impact of Aurora A inhibition on β-glucan-induced anti-tumor responses appears complex. Specifically, GNMT expression is significantly upregulated in F4/80- cells, with stronger effects compared to F4/80+ cells as seen in Fig. 6D. To discern whether this is due to the abolishment of trained immunity in myeloid cells or an effect of Ali on tumor cells which inhibit tumor growth, I suggest performing bone marrow transplantation. Transplant naïve or trained donor BM into naïve recipients, followed by MC38 tumor transplantation, to clarify the mechanistic contribution of trained immunity versus off-target effects.

      Thanks for your valuable suggestion. Following your suggestion, we have performed bone marrow transplantation to clarify that alisertib acts on the BM cells to inhibit the antitumor effect induced by trained immunity (see New Figure 8, related to Figure 6C-D in revised manuscript). As these results show, transplantation of trained BM cells conferred antitumor activity in recipient mice, whereas transplantation of trained BM cells with alisertib treatment did not, further demonstrating that alisertib inhibited AurA in trained BM cells to impair their antitumor activity.

      References

      (1) Ferreira, A.V., et al., Metabolic Regulation in the Induction of Trained Immunity. Semin Immunopathol, 2024. 46(3-4): p. 7.

      (2) Keating, S.T., et al., Rewiring of glucose metabolism defines trained immunity induced by oxidized low-density lipoprotein. J Mol Med (Berl), 2020. 98(6): p. 819-831.

      (3) Cui, L., et al., N(6)-methyladenosine modification-tuned lipid metabolism controls skin immune homeostasis via regulating neutrophil chemotaxis. Sci Adv, 2024. 10(40): p. eadp5332.

      (4) Yu, W., et al., One-Carbon Metabolism Supports S-Adenosylmethionine and Histone Methylation to Drive Inflammatory Macrophages. Mol Cell, 2019. 75(6): p. 1147-1160 e5.

      (5) Arifin, W.N. and W.M. Zahiruddin, Sample Size Calculation in Animal Studies Using Resource Equation Approach. Malays J Med Sci, 2017. 24(5): p. 101-105.

      (6) Cheng, S.C., et al., mTOR- and HIF-1α-mediated aerobic glycolysis as metabolic basis for trained immunity. Science, 2014. 345(6204): p. 1250684.

      (7) Keating, S.T., et al., The Set7 Lysine Methyltransferase Regulates Plasticity in Oxidative Phosphorylation Necessary for Trained Immunity Induced by β-Glucan. Cell Rep, 2020. 31(3): p. 107548.

      (8) John, S.P., et al., Small-molecule screening identifies Syk kinase inhibition and rutaecarpine as modulators of macrophage training and SARS-CoV-2 infection. Cell Rep, 2022. 41(1): p. 111441.

      (9) Glant, T.T., et al., Differentially expressed epigenome modifiers, including aurora kinases A and B, in immune cells in rheumatoid arthritis in humans and mouse models. Arthritis Rheum, 2013. 65(7): p. 1725-35.

      (10) Jeljeli, M.M. and I.E. Adamopoulos, Innate immune memory in inflammatory arthritis. Nat Rev Rheumatol, 2023. 19(10): p. 627-639.

      (11) Ferreira, A.V., et al., Fatty acid desaturation and lipoxygenase pathways support trained immunity. Nat Commun, 2023. 14(1): p. 7385.

      (12) Ding, L., et al., Aurora kinase A regulates M1 macrophage polarization and plays a role in experimental autoimmune encephalomyelitis. Inflammation, 2015. 38(2): p. 800-11.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      The paper is well written and the figures well laid out. The methods are easy to follow, as are the rationale and logic for each experiment. The introduction sets the scene well, and the discussion is appropriate. The summary sentences throughout the text help the reader.

      The authors have done a lot of work addressing my previous concerns and those of the other Reviewers.

      We are pleased that the revised manuscript satisfactorily addresses the previous concerns of the reviewer.

      Reviewer #2 (Public review):

      Summary

      Le Roy et al. quantify wing morphology and wing kinematics across twenty-eight and eight hoverfly species, respectively; the aim is to identify how weight support during hovering is ensured across body sizes. Wing shape and relative wing size vary non-trivially with body mass, but wing kinematics are reported to be size-invariant. On the basis of these results, it is concluded that weight support is achieved solely through size-specific variations in wing morphology, and that these changes enabled hoverflies to decrease in size. Adjusting wing morphology may be preferable compared to the alternative strategy of altering wing kinematics, because kinematics may be subject to stronger evolutionary and ecological constraints, dictated by the highly specialised flight and ecology of the hoverflies.

      Strengths

      The study deploys a vast array of challenging techniques, including flight experiments, morphometrics, phylogenetic analyses, and numerical simulations; it so illustrates both the power and beauty of an integrative approach to animal biomechanics. The question is well motivated, the methods appropriately designed, and the discussion elegantly places the results in broad biomechanical, ecological, and evolutionary context.

      We thank the reviewer for appreciating the strengths of our study.

      Weaknesses

      (1) In assessing evolutionary allometry, it is key to pinpoint the variation expected from changes in size alone. The null hypothesis for wing morphology is well-defined (isometry), but the equivalent predictions for kinematic parameters, although specified, are insufficiently justified, and directly contradict classic scaling theory. A detailed justification of the "kinematic similarity" assumption, or a change in the null hypothesis, would substantially strengthen the paper, and clarify its evolutionary implications.

      We agree with the reviewer that a clearly articulated null hypothesis is crucial for interpreting scaling relationships. In fact, when carefully reviewing our manuscript, we realized that we had nowhere stated one, which might have led to misinterpretation. In the revised manuscript, we therefore now explicitly state our newly defined null hypotheses (lines 120–125, 340-352), and how we tested these (lines 359-360).

      In fact, we define two alternative null hypotheses: (1) weight support is maintained across sizes using allometric scaling of wing morphology only, and thus wingbeat kinematics are kept constant (kinematic similarity); (2) weight support is maintained across sizes using allometric scaling of wingbeat kinematics, while wing morphology scales isometrically (morphological similarity).

      According to the first null hypothesis, the second-moment-of-area of the wing should scale linearly with body mass, resulting in negative allometry of S<sub>2</sub> relative to body mass (S<sub>2</sub> ∼ m<sup>1</sup> < m<sup>4/3</sup>). According to the second null hypothesis, the product of wingbeat frequency and amplitude should scale with mass under negative allometry (ω ∼ ƒ A<sub>ϕ</sub> ∼ m<sup>-1/6</sup>). We test these alternative null hypotheses using Phylogenetic Generalized Least Squares (PGLS) regressions of the morphology and kinematics metrics against body mass.
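
      The exponents in these two null hypotheses can be reproduced with a short calculation. The sketch below is illustrative only: it assumes the usual quasi-steady proportionality F ∝ ω² S₂ (with ω ∼ ƒ A_ϕ) and the geometric-similarity exponent S₂ ∝ m^(4/3); the function and variable names are hypothetical and not taken from the manuscript.

```python
from fractions import Fraction

# Assumption: quasi-steady aerodynamic force F ∝ ω² · S2, with ω ∼ f · A_phi.
# Weight support requires F ∝ m¹.
WEIGHT_EXPONENT = Fraction(1)
# Under geometric similarity, S2 (a length⁴ quantity) scales as m^(4/3).
S2_ISOMETRIC_EXPONENT = Fraction(4, 3)


def s2_exponent_for_weight_support(omega_exponent):
    """Exponent b in S2 ∝ m^b needed for F ∝ m, given ω ∝ m^omega_exponent."""
    return WEIGHT_EXPONENT - 2 * omega_exponent


def omega_exponent_for_weight_support(s2_exponent):
    """Exponent a in ω ∝ m^a needed for F ∝ m, given S2 ∝ m^s2_exponent."""
    return (WEIGHT_EXPONENT - s2_exponent) / 2


# Null hypothesis 1: kinematic similarity (ω ∝ m⁰); morphology compensates.
h1_s2 = s2_exponent_for_weight_support(Fraction(0))
# Null hypothesis 2: morphological similarity (S2 ∝ m^(4/3)); kinematics compensate.
h2_omega = omega_exponent_for_weight_support(S2_ISOMETRIC_EXPONENT)

print("H1: S2 exponent =", h1_s2)
print("H2: omega exponent =", h2_omega)
```

      Using exact fractions rather than floats keeps the exponents directly comparable with the values quoted in the text (1 versus 4/3, and -1/6).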

      Furthermore, in our revised manuscript, we now also better explain our use of the "kinematic similarity" assumption as a theoretical scenario that is not physically, biomechanically, or physiologically sustainable across sizes, but that we use merely to define our null hypotheses (lines 340-351). This is made particularly explicit in a new subsection named “Theoretical considerations” (lines 448–461). Note that our second null hypothesis is thus not that hoverflies fly under "kinematic similarity", but that wingbeat kinematics scale under negative allometry (ω ∼ ƒ A<sub>ϕ</sub> ∼ m<sup>-1/6</sup>), which we assume is in line with the classic scaling theory that the reviewer refers to.

      We sincerely thank the reviewer for making us aware that we had not explicitly stated our null hypotheses; introducing these new null hypotheses removed the confusion about the assumptions in our study.

      (2) By relating the aerodynamic output force to wing morphology and kinematics, it is concluded that smaller hoverflies will find it more challenging to support their body mass--a scaling argument that provides the framework for this work. This hypothesis appears to stand in direct contrast to classic scaling theory, where the gravitational force is thought to present a bigger challenge for larger animals, due to their disadvantageous surface-to-volume ratios. The same problem ought to occur in hoverflies, for wing kinematics must ultimately be the result of the energy injected by the flight engine: muscle. Much like in terrestrial animals, equivalent weight support in flying animals thus requires a positive allometry of muscle force output. In other words, if a large hoverfly is able to generate the wing kinematics that suffice to support body weight, an isometrically smaller hoverfly should be, too (but not vice versa). Clarifying the relation between the scaling of muscle mechanical input, wing kinematics, and weight support would help resolve the conflict between these two contrasting hypotheses, and considerably strengthen the biomechanical motivation and evolutionary interpretation.

      We agree with the reviewer that, due to disadvantageous surface-to-volume ratios, larger animals are more challenged to maintain weight-support, and that this is also the case for hovering hoverflies. In the current manuscript, we do not aim to challenge this universal scaling law of muscle force with body mass.

      Instead, we here focus merely on how the flight propulsion system (wing morphology and kinematics) scales with size, and how this allows hovering hoverflies to maintain weight support. We also fully agree with the reviewer that, in theory, “if a large hoverfly is able to generate the wing kinematics that suffice to support body weight, an isometrically smaller hoverfly should be, too”. This in fact aligns with our second null hypothesis, in which wingbeat frequency should scale as ƒ∼m<sup>-1/6</sup> to maintain weight support under morphological isometry.

      In our study, we show that this null hypothesis is rejected (lines 511-517, and line 525), and thus hoverflies primarily adjust their wing morphology to maintain in-hovering weight-support across sizes, while wingbeat kinematics are in fact highly conserved. Why these specific flight kinematics are so strongly conserved is not known, and this is thus a key topic in the discussion section of our manuscript.

      We agree with the reviewer that muscle physiology might be an important driver of these conserved kinematics, but aerodynamic efficiency and maneuverability could also be key aspects here. In our revised manuscript, we now discuss these three aspects in more detail (lines 762-775). We also now mention that we aim to address this outstanding question in future studies, by including muscle physiology in our animal flight studies, and by studying the aerodynamics and maneuver kinematics of hoverflies in more detail.

      Moreover, in our revised introduction section, we now also mention explicitly that the capability for maintaining in-flight weight-support scales inversely with animal size, due to the negative allometric scaling of muscle force with body mass (lines 52-56). Furthermore, we removed all statements that might suggest the opposite. We hope that these adjustments helped resolve the apparent conflict between our null hypotheses and general muscle scaling laws.

      Finally, in the Discussion section (lines 770-775), we now more explicitly acknowledge that wing motion is ultimately driven by the flight motor musculature, and that a full biomechanical interpretation must consider the scaling of muscle mechanical input alongside wing kinematics and morphology. While we decided to keep the focus primarily on aerodynamic constraints in this study, we agree that future work integrating both aerodynamic and physiological scaling will be essential to fully resolve these contrasting perspectives.

      (3) One main conclusion-- that miniaturization is enabled by changes in wing morphology--is insufficiently supported by the evidence. Is it miniaturization or "gigantism" that is enabled by (or drives) the non-trivial changes in wing morphology? To clarify this question, the isolated treatment of constraints on the musculoskeletal system vs the "flapping-wing based propulsion" system needs to be replaced by an integrated analysis: the propulsion of the wings, is, after all, due to muscle action. Revisiting the scaling predictions by assessing what the engine (muscle) can impart onto the system (wings) will clarify whether non-trivial adaptations in wing shape or kinematics are necessary for smaller or larger hovering insects (if at all!).

      In many ways, this work provides a blueprint for work in evolutionary biomechanics; the breadth of both the methods and the discussion reflects outstanding scholarship.

      In response to the first review round, we have removed all references to “miniaturization,” as our data does not allow us to infer evolutionary trajectories of body size (i.e., whether lineages have become smaller or larger over time). We now frame our conclusion more conservatively: that changes in wing morphology enable small hoverflies to maintain weight support despite the aerodynamic disadvantages imposed by isometric scaling.

      We fully agree that an integrated biomechanical framework, explicitly linking muscle mechanical output with wing kinematics and morphology, would significantly strengthen the study. However, we believe that performing an integrated analysis assessing the scaling of muscle input into the wing is beyond the current scope, which focuses specifically on the aerodynamic consequences of morphological and kinematic variation (see reply above).

      Reviewer #3 (Public review):

      This paper addresses an important question about how changes in wing morphology vs. wing kinematics change with body size across an important group of high-performance insects, the hoverflies. The biomechanics and morphology convincingly support the conclusions that there is no significant correlation between wing kinematics and size across the eight specific species analyzed in depth and that instead wing morphology changes allometrically. The morphological analysis is enhanced with phylogenetically appropriate tests across a larger data set incorporating museum specimens.

      The authors have made very extensive revisions that have significantly improved the manuscript and brought the strength of conclusions in line with the excellent data. Most significantly, they have expanded their morphological analysis to include museum specimens and removed the conclusions about evolutionary drivers of miniaturization. As a result, the conclusion about morphological changes scaling with body size rather than kinematic properties is strongly supported and very nicely presented with a strong complementary set of data. I only have minor textual edits for them to consider.

      We thank the reviewer for this positive feedback. We are pleased to hear that the revised manuscript is satisfactory.

      Reviewer #2 (Recommendations For The Authors):

      My main remaining qualm remains the null hypothesis for the scaling of kinematic parameters - all weaknesses come back to this point. I appreciate that the authors now specify an expectation, but they offer no justification. This is a problem, because the expectation dictates the interpretation of the results and is thus crucial to some of the key claims (including one in the paper title!): the choice made by the authors indeed implies that hovering is harder for small hoverflies, so that the reported changes in size-specific wing morphology are to be interpreted as an adaptation that enables miniaturization. However, why is this choice appropriate over alternatives that would predict the exact opposite, namely that hovering is harder for larger hoverflies?

      In my original review, I suggested that the authors may address this key question by considering the scaling of muscle mechanical output, and provided a quick sketch of what such an argument would look like, both in classic textbook scaling theory, and in the framework of more recent alternative approaches. The authors have decided against an implementation of this suggestion, providing various versions of the following justification in their reply: "our study focuses precisely on this constraint on the wing-based propulsion system, and not on the muscular motor system." I am puzzled by this distinction, which also appears in the paper: muscle is the engine responsible for wing propulsion. How can one be assessed independent of the other? The fact that the two must be linked goes straight to the heart of the difficulty in determining the null hypotheses for the allometry of kinematic and dynamic parameters: they must come from assertions on how muscle mechanical output is expected to vary with size, and so couple muscle mechanical output to the geometry of the wing-based propulsion system. What if not muscle output dictates wing kinematics?

      I fully agree with the authors that null hypotheses on kinematic parameters are debatable. But then the authors should debate their choice, and at least assess the plausibility of its implications (note that the idea of "similarity" in scaling does not translate to equal or invariant, but is tied closely to dimensional analysis - so one cannot just proclaim that kinematic similarity implies no change in kinematic parameters). I briefly return to the same line of argument I laid out in the initial review to provide such an assessment:

      Conservation of energy implies:

      W = 1/2 I ω<sup>2</sup>

      where I is the mass moment of inertia and W is the muscle work output. Under isometry, I ∝ m<sup>5/3</sup>, the authors posit ω ∝ m<sup>0</sup>, and it follows at once that they predict W ∝ m<sup>5/3</sup>. That is, the "kinematic similarity" hypothesis presented in the paper implies that larger animals can do substantially more work per unit body mass than small animals (unless the authors have an argument why wing angular velocity is independent of muscle work capacity, and I cannot think of one). This increase in work output is in contradiction with the textbook prediction, going all the way back to Borelli and Hill: isogeometric and isophysiological animals ought to have a constant mass-specific work output. So why, according to the authors, is this an incorrect expectation, i.e., how do they justify the assumption ω ∝ m<sup>0</sup> and its implication W ∝ m<sup>5/3</sup>? How can larger animals do more mass-specific work, or, equivalently, what stops smaller animals from delivering the same mass-specific work? If non-trivial adaptations such as larger relative muscle mass enable larger animals to do more work, how does this fit within the interpretation suggested by the authors that the aerodynamics of hovering require changes in small animals?

      A justification of the kinematic similarity hypothesis, alongside answers to the above questions, is necessary, not only to establish a relation to classic scaling theory, but also because a key claim of the paper hinges on the assumed scaling relationship: that changes in wing morphology enable hovering in small hoverflies. If I were to believe Borelli, Hill and virtually all biomechanics textbooks, the opposite should be the case: combining constant mass-specific work output with eq. 1, one retrieves F ∝ m<sup>2/3</sup>, so that weight support presents a bigger challenge for larger animals; the allometry of wing morphology should then be seen as an adaptation that enables hovering in larger hoverflies - the exact opposite of the interpretation offered by the authors.
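
      The two scaling scenarios contrasted above can be traced step by step. The sketch below is illustrative only: it assumes that eq. 1 of the manuscript has the quasi-steady form F ∝ ω² S₂, and it uses the geometric-similarity exponents I ∝ m^(5/3) and S₂ ∝ m^(4/3); the helper names are hypothetical.

```python
from fractions import Fraction

# Geometric-similarity exponents, expressed as powers of body mass m.
INERTIA_EXPONENT = Fraction(5, 3)  # I ∝ m^(5/3): mass × length²
S2_EXPONENT = Fraction(4, 3)       # S2 ∝ m^(4/3): length⁴


def work_exponent(omega_exponent):
    """Exponent of W = 1/2 · I · ω², given ω ∝ m^omega_exponent."""
    return INERTIA_EXPONENT + 2 * omega_exponent


def omega_exponent_from_work(work_exp):
    """Exponent of ω implied by W ∝ m^work_exp under isometric inertia."""
    return (work_exp - INERTIA_EXPONENT) / 2


def force_exponent(omega_exponent):
    """Exponent of F, assuming (hypothetically) that eq. 1 reads F ∝ ω² · S2."""
    return 2 * omega_exponent + S2_EXPONENT


# Scenario A: size-invariant angular velocity (ω ∝ m⁰).
work_if_omega_constant = work_exponent(Fraction(0))

# Scenario B: constant mass-specific work (W ∝ m¹), the textbook expectation.
omega_if_work_isometric = omega_exponent_from_work(Fraction(1))
force_if_work_isometric = force_exponent(omega_if_work_isometric)

print("A: W exponent =", work_if_omega_constant)
print("B: omega exponent =", omega_if_work_isometric)
print("B: F exponent =", force_if_work_isometric)
```

      Scenario A returns the W ∝ m^(5/3) implication, and scenario B returns the F ∝ m^(2/3) result, which falls short of the F ∝ m¹ needed for weight support as size increases.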

      Now, as it so happens, I disagree with classic scaling theory on this point, and instead believe that there are good reasons to assume that muscle work output varies non-trivially with size. The authors can find a summary of the argument for this disagreement in the initial review, or in any of the following references:

      Labonte, D. A theory of physiological similarity for muscle-driven motion. PNAS, 2023, 120, e2221217120

      Labonte, D.; Bishop, P.; Dick, T. & Clemente, C. J. Dynamic similarity and the peculiar allometry of maximum running speed. Nat Comms., 2024, 15, 2181

      Labonte, D. & Holt, N. Beyond power limits: the kinetic energy capacity of skeletal muscle. J Exp Bio, 2024, 227, jeb247150

      Polet, D. & Labonte, D. Optimal gearing of musculoskeletal systems. Integr Org Biol, 2024, 64, 987-1006

      I am asking neither that the authors agree with the above references nor that they cite them. But I do expect that they critically discuss and justify their definition of kinematic similarity, its relation to expectation from classic scaling theory, and the implications for their claim that hovering is harder for small animals. I do note that the notion of "physiological similarity" introduced in the above references predicts a size-invariant angular velocity for small animals, that small animals should be able to do less mass-specific work, and that average muscle force output can grow with positive allometry even for isogeometric systems. These predictions appear to be consistent with the data presented by the authors.

      We agree with the reviewer that our null hypothesis was not clearly articulated in the previous version of our manuscript, and that this might have led to a misinterpretation of the merits and limitations of our study. In the revised manuscript, we therefore now explicitly introduce our null hypotheses in the Introduction (lines 120–125), define them in the Methods section (lines 340–360), test them in the Results section (lines 511–517), and reflect on the results in the Discussion (lines 602–610). We thank the reviewer for pointing out this lack of clarity in our manuscript, because revising it clarified the study significantly. See our replies in the “Public Review” section for details.

      Minor points

      L56: This is somewhat incomplete and simplistic; to just give one alternative option, weight support with equivalent muscle effort could also be ensured by a change in gearing (see eg Biewener's work). It is doubtful whether weight support is a strong selective force, as any animal that can move will be able to support its weight. The impact of scaling on dynamics is thus arguably more relevant.

      We thank the reviewer for pointing out that our original sentence may be too simplistic. We now briefly mention alternative mechanisms (suggested by the reviewer) to provide more nuance (lines 56-58).

      L58: I am not aware of any evidence that smaller animals have reduced the musculature dedicated to locomotion beyond what is expected from isometry; please provide a reference for this claim or remove it.

      We removed that claim.

      The authors use both isometry and geometric similarity. As they also talk about muscle, solely geometric similarity (or isogeometry) may be preferable, to avoid confusion with isometric muscle contractions.

      To avoid confusion, we now use “geometric similarity” wherever the use of isometry might be ambiguous.

      L86: negative allometry only makes sense if there is a justified expectation for isometry - I suggest to change to "The assumed increase in wingbeat frequency in smaller animals" or similar, or to clarify the kinematic similarity hypothesis.

      We edited the sentence as suggested.

      L320: This assertion is somewhat misleading. Musculoskeletal systems are unlikely to be selected for static weight support. Instead, they need to allow movement. Where movement is possible, weight support is trivially possible, and so weight support should rarely, if ever, be a relevant constraint. At most, the negative consequence of isometry on weight support would be that a larger fraction of the muscle mass needs to be active in larger animals to support the weight.

      We fully agree with the reviewer that musculoskeletal systems are unlikely to be selected for static loads, as the ability to move dynamically in the real world is crucial for survival. That said, we here look at hovering flight, which is far from static. In fact, hovering flight is among the energetically most costly movement patterns found in nature, due to the required high-frequency wingbeat motions (Dudley 2002). Rapid maneuvers are of course more power demanding, but hovering is a good proxy for this. For example, in fruit flies maximum force production in rapid evasive maneuvers is only two times the force produced during hovering (Muijres et al., 2014).

      We agree with the reviewer that it is important to explicitly mention the differences in functional demands on the motor system in hovering and maneuvering flight, and thus we now do so in both the introduction and discussion sections (lines 116-118 and 762-765, respectively).

      Dudley, Robert. The biomechanics of insect flight: form, function, evolution. Princeton University Press, 2002.

      Muijres, F. T., et al. "Flies evade looming targets by executing rapid visually directed banked turns." Science 344.6180 (2014): 172-177.

      Reviewer #3 (Recommendations For The Authors):

      Throughout, check use of "constrains" vs. "constraints"

      Thank you for pointing this out. We have corrected these errors.

      Line 52 do you mean lift instead of thrust?

      We agree with the reviewer that the use of “thrust” might be confusing in the context of hovering flight, and thus we replaced “flapping-wing-based aerodynamic thrust-producing system” with the “flapping-wing-based propulsion system”. This way, we no longer use the word thrust in this context, and only use lift as the upward-directed force required for weight-support.

      Line 60 "face also constrains" wording

      Corrected.

      Line 79 Viscous forces only "dominate" at Re<1 and so this statement only refers to very very small insects which I suspect are far below the scale of the hoverflies considered (likely Re ~100) although maybe not for the smallest 3 mg ones?

      Indeed, viscous forces do not “dominate” force production at the Reynolds numbers of our flying insects. We thank the reviewer for pointing out this incorrect statement, which we corrected in the revised manuscript.

      Line 85 again thrust doesn't seem to be right

      Agreed. See reply 3.2.

      533 "maximized" should probably be "increased"

      We now use “increased”.

      Line 705-710 The new study by Darveau might help resolve this a bit because of the reliability of this relationship across and between orders. Darveau, C.-A. (2024). Insect Flight Energetics And the Evolution of Size, Form, And Function. Integrative And Comparative Biology icae028.

      We thank the reviewer for this highly relevant reference, which was unfortunately not included in the original manuscript. In connection with this work, we now further discuss the relationship between wing size allometry and deviations from the expected scaling of wingbeat frequency (lines 730-735).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This Tanzanian study focused on the relationship between human genetic ancestry, Mycobacterium tuberculosis complex (MTBC) diversity, and tuberculosis (TB) disease severity. The authors analyzed the genetic ancestry of 1,444 TB patients and genotyped the corresponding MTBC strains isolated from the same individuals. They found that the study participants predominantly possess Bantu-speaking genetic ancestry, with minimal European and Asian ancestry. The MTBC strains identified were diverse and largely resulted from introductions from South or Central Asia. Unfortunately, no associations were identified between human genetic ancestry, the MTBC strains, or TB severity. The authors suggest that social and environmental factors are more likely to contribute to TB severity in this setting.

      Strengths:

      In comparison to other studies investigating the role of human genetics in TB phenotypes, this study is relatively large, with more than 1,400 participants.

      The matched human-MTBC strain collection is valuable and offers the opportunity to address questions about human-bacterium co-evolution.

      Weaknesses:

      Although the authors had genome-wide genotyping and whole genome sequencing data, they only compared the associations between human ancestry and MTBC strains. Given the large sample size, they had the opportunity to conduct a genome-wide association study similar to that of Muller et al. (https://doi.org/10.1016/j.ygeno.2021.04.024).

      Thank you very much for taking the time to carefully review our manuscript and for your suggestions and comments. In another published study using the same cohort (https://doi.org/10.1101/2023.05.11.23289848), we performed a genome-wide association analysis between the genome-wide SNPs of the host and the genome-wide SNPs from the paired MTBC strains. In the current work we were interested in testing specifically if host ancestry and pathogen genotype family, as well as their interaction, were associated with differences in disease severity, a clinical phenotype with direct consequences for both host and pathogen fitness. The study of Müller et al., referred to by the reviewer, investigates whether MTBC families of strains causing disease in two patient cohorts (South Africa and Ghana) were associated with particular human SNPs assessed genome-wide. In that study, clinical phenotypes were not assessed and human ancestries, in a much broader sense than the ones used in our current study, were used as covariates. To leverage the genome-wide information and the clinical variables collected in our study, we have now added a genome-wide association analysis of all the human SNPs with disease severity measures while adjusting for covariates (age, sex, smoking, cough duration, socioeconomic status, history of previous TB, malnutrition, education level, and drug resistance status) and for human population stratification. Yet, no statistically significant associations were detected (L243-249).

      The authors tested whether human genetic ancestry is associated with TB severity. However, the basis for this hypothesis is unclear. The studies cited as examples all focused on progression to active TB (from a latent infection state), which should not be conflated with disease severity. It is difficult to ascertain whether the role of genetic ancestry in disease severity would be detectable through this study design, as some participants might simply have been sicker for longer before being diagnosed (despite the inquiry about cough duration). This delay in diagnosis would not be influenced solely by human genetics, which is the conclusion of the study.

      Evidence that mortality and natural recovery from TB vary across the spectrum of disease presentation comes from studies carried out before the introduction of anti-TB chemotherapy. Patients with mild disease presentation, as measured by radiology at the time of diagnosis, had higher odds of recovering naturally compared to those with advanced disease (doi: 10.5588/ijtld.23.0254, doi: 10.1164/arrd.1960.81.6.839). Given the deleterious effects of an MTBC infection leading to symptomatic disease on human fitness, we hypothesized that natural selection has acted on human traits underlying TB disease severity. If those traits are heritable, one would expect to find underlying genetic variation in human populations. In addition, because certain MTBC genotype families and human populations have co-existed for at least a few centuries to a few millennia, we hypothesized that some of that genetic variation could be related to human ancestry. We have added more details to the introduction to make our rationale clearer (L118-127). In our patient cohort, we observed a large variation in disease severity using three proxies: TB-score, X-ray score, and bacterial burden in sputa (Ct-value as determined with GeneXpert). However, the reviewer is absolutely correct that patients in our study are diagnosed at different stages of disease, which confounds our analysis. This is a limitation of our study which cannot be fully accounted for by including cough duration, as we also acknowledged in the manuscript (L343-346).

      Additionally, the study only included participants who attended the TB clinic.

      Yes, this is related to the previous point: our study only considers patients who felt ill enough to visit the TB clinic, and thus potentially excludes patients with less severe disease, as acknowledged.

      Including healthy controls from the general population would have provided an interesting comparison to see if ancestry proportions differ.

      We agree that it would be interesting to compare the ancestries of healthy controls to the ancestries of TB patients from the same population. However, that would be especially informative with respect to TB susceptibility and would not necessarily inform disease severity traits and their underlying genetics. The similarity between the ancestry proportions of our cohort and those in publicly available genomic data from neighboring countries such as Kenya, Malawi and Mozambique suggests that there would be no major differences between TB patients and healthy controls.

      Although the authors suggest that social and environmental factors contribute to TB severity, only age, smoking, and HIV status were characterised in the study.

      Based on the comments of both reviewers, we added the following additional variables as covariates in the regression models: the socioeconomic status representing the ratio between the household income and the number of individuals in the household, malnutrition, the education level and whether it was a relapse/reinfection or a new case.

      Reviewer #2 (Public review):

      Summary:

      This manuscript reports the results of an observational study conducted in Dar es Salaam, Tanzania, investigating potential associations between genetic variation in M. tuberculosis and human host vs. disease severity. The headline finding is that no such associations were found, either for host / bacillary genetics as main effects or for interactions between them.

      Strengths:

      Strengths of the study include its large size and rigorous approaches to classification of genetic diversity for host and bacillus.

      Weaknesses:

      (1) There are some limitations of the disease severity read-outs employed: X-ray scores and Xpert cycle thresholds from sputum analysis can only take account of pulmonary disease. CXR is an insensitive approach to assessing 'lung damage', especially when converted to a binary measure. What was the basis for selection of Ralph score of 71 to dichotomise patients? If outcome measures were analysed as continuous variables, would this have been more sensitive in capturing associations of interest?

      Thank you very much for taking the time to carefully review our manuscript and for your suggestions and comments.  

      We recruited active TB patients with pulmonary TB disease who were sputum smear-positive and GeneXpert-positive. In this study we aimed at obtaining paired samples from both the patient and the strain, and in the current analysis we aimed at testing if human ancestry and its interaction with the strain genotype could explain differences in disease severity. It is often difficult to obtain microbiological cultures from extra-pulmonary cases, and including those cases would not have been possible at the scale of this cohort. We also believe that extra-pulmonary TB is of less relevance for the question we are addressing because in exclusively extra-pulmonary cases, disease severity is not linked with bacterial transmission. However, extra-pulmonary TB can be extremely severe, and it would be very interesting to explore the potential role of human genetic variation underlying extra-pulmonary TB in future studies.

      As to the insensitivity of CXR to measure lung damage, we would argue that it depends on what is being assessed. As a rationale for the Ralph score, its inventors argue that, as in other grading methods, the proportion of affected lung and/or cavitation is important to assess severity. It has been described as a “validated method for grading CXR severity in adults with smear-positive pulmonary TB that correlates with baseline clinical and microbiological severity and response to treatment, and is suitable for use in clinical trials” (https://thorax.bmj.com/content/thoraxjnl/65/10/863.full.pdf). While the validation of the score is convincing in that study, and the score has been used in several TB studies and trials, the low proportion of HIV co-infections might have been a limitation. Indeed, as shown in our previous publication, in our cohort of patients, chest X-ray scores were significantly lower in HIV-infected TB patients (https://doi.org/10.1371/journal.ppat.1010893). In the current analysis, regression analyses performed for the CXR severity and for the other severity measures did not include HIV co-infected patients.

      We obtained the same pattern of results using a continuous outcome. However, an assumption of linear regression was violated: the residuals were not normally distributed, owing to the bimodal distribution of the scores in our dataset. The threshold of 71 for the Ralph score has been used by others in previous studies; in its original description it was suggested as the optimal cut-off point for predicting a positive sputum smear status after two months, which in turn has been shown to predict unfavorable outcomes (https://doi.org/10.1136/thx.2010.136242). Another study showed that a Ralph score higher than 71 was significantly associated with a longer duration of symptoms, higher clinical scores and a lower BMI (doi: 10.5603/ARM.2018.0032).
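
      The dichotomization at the cut-off of 71 can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the function name, the example scores, and the choice to treat scores strictly above 71 as severe are hypothetical and are not taken from the analysis code of the study.

```python
# Hypothetical sketch: dichotomize Ralph CXR scores at the cut-off cited above.
RALPH_CUTOFF = 71  # threshold from the response; the strict ">" rule is an assumption


def dichotomize_ralph(scores, cutoff=RALPH_CUTOFF):
    """Label each score 'severe' (> cutoff) or 'mild' (<= cutoff).

    Missing scores (None) are kept as None so they can be excluded later.
    """
    labels = []
    for score in scores:
        if score is None:
            labels.append(None)
        else:
            labels.append("severe" if score > cutoff else "mild")
    return labels


example_scores = [12, 40, 71, 72, 95, None]  # made-up values for illustration
labels = dichotomize_ralph(example_scores)
```

      A binary outcome of this kind would typically be modelled with logistic regression, which does not rely on normally distributed residuals.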

      (2) There is quite a lot of missing data, especially for TB scores - could this have introduced bias? This issue should be mentioned in the discussion.

      While we have a TB-score available for each patient, the chest X-ray score is missing for many patients. However, this is random and due either to the absence of an X-ray picture or to X-ray pictures of such poor quality that the radiologists could not assess them. When stating that there is a lot of missing data for the TB scores, we assume that the reviewer was referring to the “missing N” columns in Table 1. There, the number of observations missing in each of the disease severity measures actually relates to the explanatory variables (i.e., MTBC genotype and human ancestries). This table includes all patients that had either a bacterial genome or a human genome/genotype available (N = 1904). As an example, with the TB-score as outcome variable, the MTBC genotype was determined for 1471 patients while it was missing for 433 patients. For X-ray scores, on the other hand, 177 patients had a severe score, 849 a mild one, and for 878 patients no X-ray score was available. As for the Ct-value, despite the fact that the patients were recruited based on positive GeneXpert by the clinical team, these results were not always available to us.

      (3) The analysis adjusted for age, sex, HIV status, age, smoking and cough duration - but not for socio-economic status. This will likely be a major determinant of disease severity. Was adjustment made for previous TB (i.e. new vs repeat episode) and drug-sensitivity of the isolate? Cough duration will effectively be a correlate/consequence of more severe disease - thus likely highly collinear with disease severity read-outs - not a true confounder. How does removal of this variable from the model affect results? Data on socioeconomic status should be added to models, or if not possible then lack of such data should be noted as a limitation.

      Out of the 1904 patients that have either human or bacterial genomic data available, 48 were relapses (2.5%). The means of the disease severity measures suggest that relapses have a higher CXR score, but the TB-score and Ct-values did not differ. Based on the comments of both reviewers, we added the following additional variables as covariates to the regression models: the socioeconomic status (the ratio between the household income and the number of individuals in the household), malnutrition as examined by a doctor, the education level, whether the episode was a relapse/reinfection or a new case, and whether the causative strain was resistant to any anti-TB drug. The results did not change. Cough duration could also be a consequence of more severe disease, as pointed out by the reviewer. We now present the results excluding cough duration as a variable from the model; this also did not affect the results.

      (4) Recruitment at hospitals may have led to selection bias due to exclusion of less severe, community cases. The authors already acknowledge this limitation in the Discussion however.

      (5) Introduction: References refer to disease susceptibility, but the authors should also consider the influences of host/pathogen genetics on host response - both in vitro (PMIDs 11237411, 15322056) and in vivo (PMID 23853590). The last of these studies encompassed a broader range of ethnic variation than the current study, and showed associations between host ancestry and immune response - null results from the current study may reflect the relative genetic homogeneity of the population studied.

      We thank the reviewer for these suggestions which we have added to the introduction. 

      Reviewer #1 (Recommendations for the authors):

      Minor Comments:

      (1) The authors should be careful when using the term "Bantu" as opposed to "Bantu-speaking". (i.e. referring to the language group). The term is considered offensive in some settings.

      We thank the reviewer for raising this important concern; we have revised the term throughout the manuscript.

      (2) There are several "(Error! Reference source not found)" phrases in the place of references throughout the document.

      We thank the reviewer for pointing this out; this has been corrected in the revised version.

      (3) Please correct line 365: "... sequencing (WGS) the patient...." to "... sequencing (WGS) of the patient...."

      (4) The figures in the supplementary PDF are not numbered and some are cut-off (I think it is Supplementary Figure S2).

      This has been corrected in the revised version.

      Reviewer #2 (Recommendations for the authors):

      Typographical errors

      (1) There are multiple instances where references have not pulled through to the text, e.g. line 126 (Error! Reference source not found.)

      We thank the reviewer for pointing this out; this has been corrected in the revised version.

      (2) Line 239: have been show - have been shown?

      Thank you, this mistake has been corrected in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This is a new and important system that can efficiently train mice to perform a variety of cognitive tasks in a flexible manner. It is innovative and opens the door to important experiments in the neurobiology of learning and memory. 

      Strengths: 

      Strengths include: high n's, a robust system, task flexibility, comparison of manual-like training vs constant training, circadian analysis, comparison of varying cue types, long-term measurement, and machine teaching. 

      Weaknesses: 

      I find no major problems with this report. 

      Minor weaknesses: 

      (1) Line 219: Water consumption per day remained the same, but number of trials triggered was more as training continued. First, is this related to manual-type training? Also, I'm trying to understand this result quantitatively, since it seems counter-intuitive: I would assume that with more trials, more water would be consumed since accuracy should go up over training (so more water per average trial). Am I understanding this right? Can the authors give more detail or understanding to how more trials can be triggered but no more water is consumed despite training?

      Thanks for the comment. We would like to clarify the phenomenon described in Line 219: as training advanced, the number of trials triggered by mice per day gradually decreased (rather than increased, as stated in the comment) for both the manual and autonomous groups of mice (Fig. 2H left). The performance, as you mentioned, improved over time (Fig. 2D and 2E), leading to an increased probability of obtaining water and thus a relatively stable daily water intake (Fig. 2H middle). We believe the stable daily intake is the minimum amount of water required by the mice under autonomous behavioral training. To make the statement clearer, we indicated the corresponding figure numbers in the text.

      Results “… As shown in Fig. 2H, autonomous training yielded significantly higher number of trial/day (980 ± 25 vs. 611 ± 26, Fig. 2H left) and more volume of water consumption/day (1.65 ± 0.06 vs. 0.97 ± 0.03 ml, Fig. 2H middle), which resulted in monotonic increase of body weight that was even comparable to the free water group (Fig.2H right). In contrast, the body weight in manual training group experienced a sharp drop at the beginning of training and was constantly lower than autonomous group throughout the training stage (Fig. 2H right).”

      (2) Figure 2J: The X-axis should have some label: at least "training type". Ideally, a legend with colors can be included, although I see the colors elsewhere in the figure. If a legend cannot be added, then the color scheme should be explained in the caption.

      Thanks for the suggestion. The labels with corresponding colors for x-axis have been added for Fig. 2J.

      (3) Figure 2K: What is the purple line? I encourage a legend here. The same legend could apply to 2J.

      Thanks for the suggestion. The legend has been added for Fig. 2K.

      (4) Supplementary Figure S2 D: I do not think the phrase "relying on" is correct. Instead, I think "predicted by" or "correlating with" might be better. 

      We thank the reviewer for the valuable suggestion. The phrase has been changed to ‘predicted by’ for better suitability.

      Figure S2 “(D), percentage of trials significantly predicted by different regressors during task learning. …”

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript by Yu et al. describes a novel approach for collecting complex and different cognitive phenotypes in individually housed mice in their home cage. The authors report a simple yet elegant design that they developed for assessing a variety of complex and novel behavioral paradigms autonomously in mice. 

      Strengths: 

      The data are strong, the arguments are convincing, and I think the manuscript will be highly cited given the complexity of behavioral phenotypes one can collect using this relatively inexpensive ($100/box) and high throughput procedure (without the need for human interaction). Additionally, the authors include a machine learning algorithm to correct for erroneous strategies that mice develop which is incredibly elegant and important for this approach as mice will develop odd strategies when given complete freedom. 

      Weaknesses:

      (1) A limitation of this approach is that it requires mice to be individually housed for days to months. This should be discussed in depth. 

      Thank you for raising this important point. We agree that the requirement for individual housing of mice during the training period is a limitation of our approach, and we appreciate the opportunity to discuss this in more depth. In the manuscript, we add a section to the Discussion to address this limitation, including the potential impact of individual housing on the mice, the rationale for individual housing in our study, and efforts or alternatives made to mitigate the effects of individual housing.

      Discussion “… Firstly, our experiments were confined to single-housed mice, which is known to influence murine behavior and physiology, potentially affecting social interaction and stress levels [76]. In our study, individual housing was necessary to ensure precise behavioral tracking, eliminate competitive interactions during task performance, and maintain consistent training schedules without disruptions from cage-mate disturbances. However, the potential of group-housed training has been explored with technologies such as RFID [28,29,32–34] to distinguish individual mice, which potentially improving the training efficiency and facilitating research of social behaviors [77]. Notably, it has shown that simultaneous training of group-housed mice, without individual differentiation, can still achieve criterion performance [25].”

      (2) A major issue with continuous self-paced tasks such as the autonomous d2AFC used by the authors is that the inter-trial intervals can vary significantly. Mice may do a few trials, lose interest, and disengage from the task for several hours. This is problematic for data analysis that relies on trial duration to be similar between trials (e.g., reinforcement learning algorithms). It would be useful to see the task engagement of the mice across a 24-hour cycle (e.g., trials started, trials finished across a 24-hour period) and approaches for overcoming this issue of varying inter-trial intervals. 

      Thank you for your insightful comment regarding the variability in inter-trial intervals and its potential impact on data analysis. We agree that this is an important consideration for continuous self-paced tasks.

      In our original manuscript, we showed the general task engagement across the 24-hour cycle (Fig. 2K), which revealed two peaks of engagement during the dark cycle and relatively fewer trials during the light cycle. To facilitate analyses requiring consistent trial durations, we defined trial blocks as sequences between two no-response trials. Notably, approximately 66.6% of trials occurred within blocks of >5 consecutive trials (Fig. 2L), which may be particularly suitable for such analyses.

      In the revised manuscript, we also added the analysis of the histogram of inter-trial-interval for both the autonomous and manual training paradigms in HABITS (Fig. S2H), which shows that around 55.2% and 77.5% of the intervals are less than 2 seconds in autonomous and manual training, respectively.

      Results “… We found more than two-third of the trials was done in >5-trial blocks (Fig. 2L left) which resulted in more than 55% of the trials were with inter-trial-interval less than 2 seconds (Fig. S2H).”
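
      The block and interval measures described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the toy data, the choice to count only responded trials in the denominator, and the definition of the inter-trial interval as the start of the next trial minus the end of the current one are all assumptions.

```python
from itertools import groupby


def block_lengths(responded):
    """Lengths of runs of consecutive responded trials.

    `responded` is a sequence of booleans, one per trial; a no-response
    trial (False) closes the current block.
    """
    return [sum(1 for _ in run) for is_resp, run in groupby(responded) if is_resp]


def fraction_in_long_blocks(responded, min_len=5):
    """Fraction of responded trials that fall in blocks longer than `min_len`."""
    lengths = block_lengths(responded)
    total = sum(lengths)
    if total == 0:
        return 0.0
    return sum(n for n in lengths if n > min_len) / total


def fraction_short_itis(trial_starts, trial_ends, max_iti=2.0):
    """Fraction of inter-trial intervals shorter than `max_iti` seconds.

    Assumed definition: ITI = start of the next trial - end of the current trial.
    """
    itis = [start - end for start, end in zip(trial_starts[1:], trial_ends[:-1])]
    if not itis:
        return 0.0
    return sum(1 for iti in itis if iti < max_iti) / len(itis)


# Toy session (made-up data): blocks of 7, 2, 6 and 1 responded trials.
responded = [True] * 7 + [False] + [True] * 2 + [False, False] + [True] * 6 + [False, True]
long_block_fraction = fraction_in_long_blocks(responded)

# Toy timestamps in seconds (made-up data): intervals of 1.0, 1.5 and 88.0 s.
starts = [0.0, 5.0, 9.5, 100.0]
ends = [4.0, 8.0, 12.0, 103.0]
short_iti_fraction = fraction_short_itis(starts, ends)
```

      Analyses that need comparable trial timing could then be restricted to trials inside the long blocks, at the cost of discarding isolated trials.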

      Regarding approaches to mitigate the issue of varying inter-trial intervals, we observed that manual training (i.e., manually transferring mice to HABITS for ~2 hr/day) in Fig. S2H resulted in more trials with short inter-trial intervals, suggesting that constrained access time promotes task engagement and reduces interval variability. Fig. 2L also indicated that the average correct rate increased and the earlylick rate decreased as the length of the block increased. This approach could be valuable for studies where consistent trial timing is critical. In the context of our study, we could introduce a light, for example, to serve as a cue that prompts the animals to engage during a fixed time window each day.

      Discussion “… In contrast, the self-paced nature of autonomous training may permit greater variability in attentional engagement [83] and inter-trial-intervals, which could be problematic for data analysis relying on consistent intervals and/or engagements. Future studies should explore how controlled contextual constraints enhance learning efficiency and whether incorporating such measures into HABITS could optimize its performance.”

      (3) Movies - it would be beneficial for the authors to add commentary to the video (hit, miss trials). It was interesting watching the mice but not clear whether they were doing the task correctly or not. 

      Thanks for the reminder. We have added subtitles to both of the videos. Since the supplementary video1 was not recorded with sound, the correctness of the trials was hard to judge. We replaced the video with another one with clear sound recordings, and the subtitles were commented in detail.

      (4) The strength of this paper (from my perspective) is the potential utility it has for other investigators trying to get mice to do behavioral tasks. However, not enough information was provided about the construction of the boxes, interface, and code for running the boxes. If the authors are not willing to provide this information through eLife, GitHub, or their own website then my evaluation of the impact and significance of this paper would go down significantly. 

      Thanks for this important comment. We would like to clarify that the construction methods, GUI, code for our system, PCB and CAD files (newly uploaded) have already been made publicly available on https://github.com/Yaoyao-Hao/HABITS. Additionally, we have open-sourced all the codes and raw data for all training protocols (https://doi.org/10.6084/m9.figshare.27192897). We will continue to maintain these resources in the future.

      Minor concerns: 

      (5) Learning rate is confusing for Figure 3 results as it actually refers to trials to reach the criterion, and not the actual rate of learning (e.g., slope).

      Thanks for pointing this out. The ‘learning rate’ which refers to trial number to reach criterion has been changed to ‘the number of trials to reach criterion’.

      Reviewer #3 (Public review): 

      Summary: 

      In this set of experiments, the authors describe a novel research tool for studying complex cognitive tasks in mice, the HABITS automated training apparatus, and a novel "machine teaching" approach they use to accelerate training by algorithmically providing trials to animals that provide the most information about the current rule state for a given task. 

      Strengths: 

      There is much to be celebrated in an inexpensively constructed, replicable training environment that can be used with mice, which have rapidly become the model species of choice for understanding the roles of distinct circuits and genetic factors in cognition. Lingering challenges in developing and testing cognitive tasks in mice remain, however, and these are often chalked up to cognitive limitations in the species. The authors' findings, however, suggest that instead, we may need to work creatively to meet mice where they live. In some cases, it may be that mice may require durations of training far longer than laboratories are able to invest with manual training (up to over 100k trials, over months of daily testing) but the tasks are achievable. The "machine teaching" approach further suggests that this duration could be substantially reduced by algorithmically optimizing each trial presented during training to maximize learning. 

      Weaknesses: 

      (1) Cognitive training and testing in rodent models fill a number of roles. Sometimes, investigators are interested in within-subjects questions - querying a specific circuit, genetically defined neuron population, or molecule/drug candidate, by interrogating or manipulating its function in a highly trained animal. In this scenario, a cohort of highly trained animals that have been trained via a method that aims to make their behavior as similar as possible is a strength. 

      However, often investigators are interested in between-subjects questions - querying a source of individual differences that can have long-term and/or developmental impacts, such as sex differences or gene variants. This is likely to often be the case in mouse models especially, because of their genetic tractability. In scenarios where investigators have examined cognitive processes between subjects in mice who vary across these sources of individual difference, the process of learning a task has been repeatedly shown to be different. The authors do not appear to have considered individual differences except perhaps as an obstacle to be overcome. 

      The authors have perhaps shown that their main focus is highly-controlled within-subjects questions, as their dataset is almost exclusively made up of several hundred young adult male mice, with the exception of 6 females in a supplemental figure. It is notable that these female mice do appear to learn the two-alternative forced-choice task somewhat more rapidly than the males in their cohort.

      Thank you for your insightful comments and for highlighting the importance of considering both within-subject and between-subject questions in cognitive training and testing in rodent models. We acknowledge that our study primarily focused on highly controlled within-subject questions. However, the datasets we provided did show preliminary evidence for the ‘between-subject’ questions. Key observations include:

      The large variability in learning rates among mice observed in Fig. 2I;

      The overall learning rate difference between male and female subjects (Fig. 2D vs. Fig. S2G);

      The varying nocturnal behavioral patterns (Fig. 2K), etc.

      We recognize the value of exploring between-subject differences in mouse models and have discussed this in more detail in the Discussion.

      Discussion “Our study was designed to standardize behavior for the precise interrogation of neural mechanisms, specifically addressing within-subject questions. However, investigators are often interested in between-subject differences—such as sex differences or genetic variants—which can have long-term behavioral and cognitive implications [72,74]. This is particularly relevant in mouse models due to their genetic tractability [75]. Although our primary focus was not on between-subject differences, the dataset we generated provides preliminary evidence for such investigations. Several behavioral readouts revealed individual variability among mice, including large disparities in learning rates across individuals (Fig. 2I), differences in overall learning rates between male and female subjects (Fig. 2D vs. Fig. S2G), variations in nocturnal behavioral patterns (Fig. 2K), etc.”

      (2) Considering the implications for mice modeling relevant genetic variants, it is unclear to what extent the training protocols and especially the algorithmic machine teaching approach would be able to inform investigators about the differences between their groups during training. For investigators examining genetic models, it is unclear whether this extensive training experience would mitigate the ability to observe cognitive differences, or select the animals best able to overcome them - eliminating the animals of interest. Likewise, the algorithmic approach aims to mitigate features of training such as side biases, but it is worth noting that the strategic uses of side biases in mice, as in primates, can benefit learning, rather than side biases solely being a problem. However, the investigators may be able to highlight variables selected by the algorithm that are associated with individual strategies in performing their tasks, and this would be a significant contribution.

      Thank you for the insightful comments. We acknowledge that the extensive training experience, particularly through the algorithmic machine teaching approach, could potentially influence the ability to observe cognitive differences between groups of mice with relevant genetic variants. However, our study design and findings suggest that this approach can still provide valuable insights into individual differences and strategies used by the animals during training. First, the behavioral readouts mentioned above (including learning rate, engagement pattern, etc.) can reveal a number of differences among mice. Second, detailed modelling analysis (with logistic regression) can further dissect the strategies that mice use over the course of training (Fig. S2B). We have highlighted some variables selected by the regression that are associated with individual strategies in performing the tasks (Fig. S2C); these strategies could differ between the manual and autonomous training groups (Fig. S2D). We included these comments in the Discussion for further clarity.

      Discussion “… Furthermore, a detailed logistic regression analysis dissected the strategies mice employed during training (Fig. S2B). Notably, the regression identified variables associated with individual task-performance strategies (Fig. S2C), which also differed between manually and autonomously trained groups (Fig. S2D). Thus, our system could facilitate high-throughput behavioral studies exploring between-subject differences in the future.”

      (3) A final, intriguing finding in this manuscript is that animal self-paced training led to much slower learning than "manual" training, by having the experimenter introduce the animal to the apparatus for a few hours each day. Manual training resulted in significantly faster learning, in almost half the number of trials on average, and with significantly fewer omitted trials. This finding does not necessarily argue that manual training is universally a better choice because it leads to more limited water consumption. However, it suggests that there is a distinct contribution of experimenter interactions and/or switching contexts in cognitive training, for example by activating an "occasion setting" process to accelerate learning for a distinct period of time. Limiting experimenter interactions with mice may be a labor-saving intervention, but may not necessarily improve performance. This could be an interesting topic of future investigation, of relevance to understanding how animals of all species learn.

      Thank you for your insightful comments. We agree that the finding that manual training led to significantly faster learning compared to self-paced training is both intriguing and important. We think one possible reason is the limited duration of engagement provided by the experimenter in the manual training case, which forced the mice to concentrate more on the trials (and thus omit fewer trials) than in autonomous training. Your suggestion that experimenter interactions might activate an "occasion setting" process is particularly interesting. In the context of our study, we could introduce, for example, a light serving as a cue that prompts the animals to engage; when the light is off, the task would no longer be accessible to the mice, simulating the manual training situation. We agree that this could be an interesting topic for future investigation that might create a more conducive environment for learning, thereby accelerating learning.

      Discussion “… Lastly, while HABITS achieves criterion performance in a similar or even shorter overall days compared to manual training, it requires more trials to reach the same learning criterion (Fig. 2G). We hypothesize that this difference in trial efficiency may stem from the constrained engagement duration imposed by the experimenter in manual training, which could compel mice to focus more intensely on task execution, resulting in less trial omissions (Fig. 2F). In contrast, the self-paced nature of autonomous training may permit greater variability in attentional engagement [83] and inter-trial-intervals, which could be problematic for data analysis relying on consistent intervals and/or engagements. Future studies should explore how controlled contextual constraints enhance learning efficiency and whether incorporating such measures into HABITS could optimize its performance.”

      Reviewer #2 (Recommendations for the authors):

      As I mentioned in the weaknesses, I did not see code or CAD drawings for their home cages and how these interact with a computer.

      Thanks for the comment. We would like to clarify that the construction methods, GUI, code for our system, PCB and CAD files (newly uploaded) have already been made publicly available on https://github.com/Yaoyao-Hao/HABITS.

    1. ABSTRACT

      Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of cellular heterogeneity, but the rapid expansion of analytical tools has proven to be both a blessing and a curse, presenting researchers with significant challenges. Here, we present SeuratExtend, a comprehensive R package built upon the widely adopted Seurat framework, which streamlines scRNA-seq data analysis by integrating essential tools and databases. SeuratExtend offers a user-friendly and intuitive interface for performing a wide range of analyses, including functional enrichment, trajectory inference, gene regulatory network reconstruction, and denoising. The package seamlessly integrates multiple databases, such as Gene Ontology and Reactome, and incorporates popular Python tools like scVelo, Palantir, and SCENIC through a unified R interface. SeuratExtend enhances data visualization with optimized plotting functions and carefully curated color schemes, ensuring both aesthetic appeal and scientific rigor. We demonstrate SeuratExtend’s performance through case studies investigating tumor-associated high-endothelial venules and autoinflammatory diseases, and showcase its novel applications in pathway-level analysis and cluster annotation. SeuratExtend empowers researchers to harness the full potential of scRNA-seq data, making complex analyses accessible to a wider audience. The package, along with comprehensive documentation and tutorials, is freely available at GitHub, providing a valuable resource for the single-cell genomics community.

      Practitioner Points

      SeuratExtend streamlines scRNA-seq workflows by integrating R and Python tools, multiple databases (e.g., GO, Reactome), and comprehensive functional analysis capabilities within the Seurat framework, enabling efficient, multi-faceted analysis in a single environment.

      Advanced visualization features, including optimized plotting functions and professional color schemes, enhance the clarity and impact of scRNA-seq data presentation.

      A novel clustering approach using pathway enrichment score-cell matrices offers new insights into cellular heterogeneity and functional characteristics, complementing traditional gene expression-based analyses.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf076), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer name: Daniel A. Skelly

      Overall, this is a very nice writeup of a useful package that extends the Seurat package to expand possibilities for single cell analysts in R. I liked the visualization options, the ability to try certain python-based tools easily in R which was not previously easy, and some of the authors' new innovations like their use of pathway enrichment scores in broad ways. Kudos to the authors for releasing a package with really excellent documentation and tutorials!

      I think this paper could be made better if the authors stressed with a little more clarity how specifically their work is innovative. The text in the present manuscript is fine but reads like a bit of a grab bag of functionality. For example, from the abstract: "SeuratExtend offers a user-friendly and intuitive interface for performing a wide range of analyses, including functional enrichment, trajectory inference, gene regulatory network reconstruction, and denoising. The package integrates multiple databases, … and incorporates popular Python tools … [We] showcase its novel applications in pathway-level analysis and cluster annotation. SeuratExtend enhances data visualization …"

      How could they be more clear or specific? One example could be by categorizing what SeuratExtend can do that other packages can't. For example, I see innovations in perhaps three general areas: 1. Making single cell analyses easier/faster/prettier (i.e. visualizations, pathway enrichment) 2. Making previously published single cell tools more broadly accessible (e.g. first option to bring certain python tools to R) 3. New innovations (e.g. dimensionality reduction and clustering based on pathway enrichment scores; may not be completely new but I don't recall seeing this elsewhere) If this was added I feel the paper would more clearly communicate to readers the information necessary for them to choose whether they want to try the package.

      I have the following additional significant comments:

      * Integration of multiple databases for GSEA — these methods are good, but what about in a few years when those databases have been updated? Do the authors intend to continue updating? Could they provide a function for users to use their own database (e.g. .gaf and .obo files, for example for another model organism)? Similar comment about gene identifier conversion, which may need to be updated every few years.

      * "While the Python ecosystem has benefited greatly from the comprehensive scverse project [7], which utilizes the universal AnnData format to connect various tools and algorithms, a comparable integrated solution has been lacking in the R community. SeuratExtend addresses this gap by providing a unified framework centered around the Seurat object, effectively becoming the R counterpart to scverse." —> some might argue that SeuratWrappers is this solution. The authors should more clearly and explicitly comment on what SeuratExtend does differently/better than SeuratWrappers.

      * I'm not particularly convinced by the authors' example studies that used SeuratExtend. For example, they describe Hua-Vella et al. (2022) and Hua et al. (2023). These are very nice studies and I have no doubt they made use of SeuratExtend in their analyses. But I don't see anything these authors describe those authors doing as being uniquely possible with SeuratExtend. Perhaps SeuratExtend made their analyses easier, or faster. But it would be better if we had some further concrete details. For example, something communicating a message like one of the following: (1) the authors only tested method X on a whim because it was so easy to run in SeuratExtend, and found that it revealed unexpected biology Y; or (2) the authors were able to bring together method X which runs in R and method Y which runs in python and the joint inference — not possible in other packages — revealed key result Z. If the authors of this manuscript can't point to those sorts of examples, then I'm not sure it adds much to include this discussion in the present paper.

      * I really liked the section "Novel Applications of SeuratExtend in Pathway-Level Analysis and Cluster Annotation", especially "Exploring and Analyzing Single-Cell Data at the Pathway Level". I thought these applications could perhaps be stressed a bit more strongly or made more prominent earlier in the paper.

      * Figures 2 and 3 are showing example plots from which we don't actually need to infer any important biology. I thought these figures could be combined and each individual plot type only shown once. (This is for clarity and I don't see anything incorrect about the authors' current plots.)

      * There may be some issues with dependencies for some users. For example, it prompted me to install viridis and loomR as I went through the Quickstart. I ended up encountering an error there is no package called 'loomR' while trying. I had to manually install with remotes::install_github(repo = "mojaveazure/loomR"). Maybe provide an explicit dependencies list/list of recommended packages to install?

      * I had an error the first time calling Palantir.RunDM(). I hadn't created a seuratextend environment. I found that I could do this manually using create_condaenv_seuratextend(), but that this wasn't supported for Apple Silicon chips. I would suggest that the authors do try to find a way to get this working on newer Apple chips, because Mac machines are very common among bioinformaticians in my experience.

      * While the writing is largely quite clear, I found it to be a bit voluminous. If the authors are able to cut down on text length that may help in emphasizing the key points that make their package valuable to users.

      I had these minor comments:
      * "Moreover, mainstream scRNA-seq analysis tools are primarily developed for either the R or Python platforms, with additional options like Nextflow and Snakemake": I suggest revising this sentence. The tools are developed in the R or Python languages, which I would not call platforms. I would reword to say that Nextflow and Snakemake are workflow management systems that provide additional options for pipeline automation.
      * "the R ecosystem surrounding Seurat appears relatively limited": I'm not sure I would agree with this. I counted wrappers for 17 methods currently. Yes, it is true that there are more packages in scverse. However, I suggest moderating your claims about Seurat being limited.
      * Suggest removing Snakemake from Table 1, since it is really different from the other tools listed there.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer comment: “The authors did not clarify whether the observed protection to PTZ-induced convulsions after mild TBI is due to the reduced size of gap junctions and/or increased activity in hemichannels.” And “The super-resolution imaging only assesses Cx43 gap junction plaque size and density but not the non-junctional portion of Cx43.”

      Response and planned revision: To determine whether seizure protection in Cx43 S368A mice is due to reduced gap junction plaque density or reduced hemichannel function, we will conduct solubility assays to assess the ratio of insoluble (junctional) to soluble (cytoplasmic/hemichannel) Cx43 in Cx43S368A and C57BL/6 control mice after TBI/sham (as in Fig. 2A-D currently only in C57BL/6 control mice). In parallel, we will perform EtBr uptake assays in acute brain slices from Cx43S368A and C57BL/6 control animals to assess hemichannel function.

      Additionally, we will include super-resolution images without background subtraction, which show diffuse staining indicative of soluble Cx43. Of note, even at super-resolution, individual gap junctions or hemichannels cannot be resolved; they appear as diffuse signal (currently not visible in our super-resolution images due to the image deconvolution and background subtraction performed to isolate Cx43 plaques). Super-resolution imaging was used to count Cx43 gap junction plaque densities and size. Cx43 gap junction plaques are dense accruals of Cx43 immunostaining reminiscent of functional and closed gap junctions. Complementary experiments measured soluble (cytoplasmic Cx43 and hemichannels) and insoluble Cx43 (gap junctions) using biochemistry (Fig. 2A-D).

      Reviewer comment: “The immunofluorescent images for Fig. 2E and Fig. 5 were not counterstained for astrocytes or cell membrane. How can the authors be sure that these are expressed by astrocytes and not other cells in the brain?”

      Response and planned revision: Cx43 is predominantly expressed in astrocytes, with expression levels 10–100 times higher than in brain endothelial cells (e.g., Zhang et al., 2014; Vanlandewijck et al., Nature, 2018). As shown in Supplementary Fig. 2, our immunohistochemistry data reveal no overlap between Cx43 and endothelial cell markers, confirming that our staining protocol does not detect Cx43 in endothelial cells. Instead, the apparent localization of Cx43 along blood vessels reflects expression in astrocytic endfeet, which closely ensheath the vasculature. To further support this conclusion, we will conduct quantitative co-localization analyses of Cx43 with markers for neurons, microglia, oligodendrocytes, and NG2 glia in both Cx43S368A and C57BL/6 control mice. Additionally, we will include plots generated from publicly available single-cell RNA sequencing datasets to show that Cx43 mRNA is highly enriched in astrocytes and present at much lower levels in endothelial cells of the brain vasculature.


      Reviewer comment about developmental contributions to the phenotype of Cx43 S368A animals.

      Response: We cannot exclude a potential developmental component to the observed seizure protection in Cx43S368A mice. We have included a discussion of this possibility in the revised manuscript.

      Reviewer comments indicative of a lack of clarity around rationale and intent of specific experiments.

      Response: We thoroughly revised the Results section to explicitly state the rationale and purpose of each experiment. For example:

      Reviewer comment: “The immunofluorescent images for Fig. 1D and E were taken at low resolution compared to the Cx43 puncta size. This does not allow accurate quantification of the Cx43 GJs or HCs.”

      Response: The purpose of this experiment was to assess the heterogeneity of Cx43 expression (both junctional and non-junctional portions) with spatial resolution across a larger brain area. Complementary experiments here are quantification of protein amounts using western blot (Fig. 1B), quantification of junctional versus non-junctional Cx43 using the solubility assay and quantification of Cx43 plaques using super-resolution imaging (Fig. 2).

      Reviewer comment: “TBI did not change Cx43 plaque size or density (Fig. 5). What was the rationale for examining the effects in the S368A mutant?”

      Response: We found an increase in phosphorylated Cx43 at S368 after TBI, and Cx43S368A mutants are protected from seizures after administration of PTZ, suggesting an important role for this specific Cx43 phosphorylation site in pathology. We discussed in the manuscript that “in cardiovascular infection/disease has demonstrated maintenance of gap junction coupling (Gy et al., 2011; Padget et al., 2024) while reduced hemichannel opening probability was reported (Hirschhäuser et al., 2021) in Cx43S368A mice”, suggesting that the protective phenotype is likely due to modification of either Cx43 gap junctions or hemichannels. However, the functional consequences for Cx43 biology of phosphorylation at S368, or of its absence in the Cx43S368A mutant, remain unexplored in the brain. Cx43 plaque size and density are reflective of Cx43 gap junctions and were therefore examined in Cx43S368A mice to reveal a potential mechanism by which this mouse mutant is protected from seizures (even in the absence of TBI).

      Reviewer comment: “The IC50 for Tat-Gap19 for Cx43 HC is ~7 μM (Tocris). How can using it at 2 μM be effective?”

      Response: We reviewed our lab records and confirmed that 2 μM was a typographical error. The actual concentration used was 200 μM. This is consistent with the dose-response literature for astrocytes (e.g., Walrave et al., Glia 2018; Abudara et al., Front. Cell. Neurosci. 2014). We have now included these references in the manuscript.
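
      As a rough illustration of why the concentration matters here, the sketch below estimates the fraction of hemichannels blocked at 2 μM and at 200 μM under a simple one-site Hill model. The model, the Hill coefficient of 1, and the function name are assumptions made for illustration only and are not taken from the manuscript; the ~7 μM IC50 is the value quoted by the reviewer, and effective concentrations in acute slices may differ from this idealized case.

```python
def fractional_inhibition(conc_um: float, ic50_um: float, hill: float = 1.0) -> float:
    """Predicted fraction of channels blocked under a simple Hill model.

    Hypothetical helper for illustration; a Hill coefficient of 1 is assumed.
    """
    if conc_um < 0 or ic50_um <= 0:
        raise ValueError("concentration must be >= 0 and IC50 must be > 0")
    return conc_um**hill / (conc_um**hill + ic50_um**hill)


IC50_UM = 7.0  # approximate IC50 quoted by the reviewer (Tocris)

for conc in (2.0, 200.0):
    block = fractional_inhibition(conc, IC50_UM)
    print(f"{conc:g} uM -> {block:.0%} predicted block")
```

      Under these assumptions the model predicts roughly 22% block at 2 μM and about 97% at 200 μM, which is in line with the reviewer's concern about the lower value and with the corrected concentration being well above the IC50.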

      Reviewer comment: “Unclear whether mice in Fig. 4C received TBI.”

      Response: We clarified that these mice were naïve, i.e. not subjected to TBI or sham procedures. This is now explicitly stated in both the Methods and the Results.

      Reviewer comment: “CBX or Tat-Gap19 do not affect the phosphorylation state of Cx43.”

      Response: We clarified that we used CBX and Tat-Gap19 as established gap junction and hemichannel blockers, irrespective of phosphorylation state. We now note that Tat-Gap19 is a Cx43 mimetic peptide that specifically blocks Cx43 hemichannels.

      Reviewer comment: “It is unclear whether the EtBr quantification in Fig. 3D is for S100β+ astrocytes.”

      Response: We clarified that the quantification in Fig. 3D was performed exclusively in S100β+ astrocytes. Although neurons may take up EtBr under inflammatory conditions, they do not express Cx43 (as will be shown in Fig. 1 and Supplementary Data).

      Reviewer comment: “I believe that the 'W.' in ref 'W. Chen et al., 2018' is unnecessary.”

      Response: We will use the journal citation style implemented by a reference manager in the final version of the manuscript.

      Reviewer request to include two references related to phosphorylation and hemichannel permeability and the role of gap junctional coupling in epilepsy.

      Response: The PNAS reference was added to the manuscript.

      That reduction in gap junctional communication is a relevant factor in epilepsy is discussed in the introduction where we also cite original literature of the authors of the proposed review article: “Many pathologies (Gajardo-Gómez et al., 2017; Masaki, 2015; Orellana et al., 2011; Sarrouilhe et al., 2017; Vis et al., 1998; Wang et al., 2018), including traumatic brain injury (TBI) (B. Chen et al., 2017; W. Chen et al., 2019; Wu et al., 2013; Xia et al., 2024) and acquired epilepsy (Bedner et al., 2015; Deshpande et al., 2017; Walrave et al., 2018) present with altered Cx43 regulation, and are often equated with GJ dysfunction.”

      We feel that citing the original manuscripts more accurately reflects the current knowledge around the role of Cx43 in the context of epilepsy and other pathologies. Readers' access to the original literature also highlights more precisely the gaps in knowledge that this manuscript seeks to close.

      Reviewer comment: “I think the data of this manuscript is missing a control animal that would present all the compensation changes that occur during development that occur in mice carrying the mutated Cx43. Alternatively, a doable experiment would be the use of inducible KO/KI.”

      Response: Previous studies investigating the role of Cx43 in neuronal excitability have primarily used full or conditional knockout models, as described in our introduction. Interestingly, these studies report that global deletion of Cx43 increases seizure susceptibility. However, such models eliminate all Cx43-dependent functions—both junctional and non-junctional—making it difficult to pinpoint the specific mechanisms underlying the observed effects. They do not distinguish whether increased excitability results from loss of gap junction coupling, disruption of hemichannel function, or depletion of cytoplasmic Cx43 signaling. In contrast, our current study does not aim to eliminate Cx43, but instead employs a targeted approach to interrogate the functional significance of a regulatory phosphorylation site, S368. This site is dynamically phosphorylated following TBI and has been previously associated—albeit only through correlative data—with seizure activity and other neuropathologies. By isolating the contribution of this post-translational modification while preserving overall Cx43 expression, our study provides novel mechanistic insight into how phosphorylation modulates Cx43 function and astrocyte-mediated regulation of brain excitability.

      We appreciate the thoughtful suggestion to generate a conditional knock-in model to isolate developmental from acute effects of the Cx43 S368A mutation. However, the GJA1 gene locus is not amenable to this type of targeting (we explored this possibility with a . We also considered AAV-mediated CRISPR/dCas9 editing as an alternative, but current limitations in CNS transduction efficiency, promoter specificity, and guide RNA availability for precise point mutation insertion make this approach similarly unfeasible at this stage. Thus, while we acknowledge the developmental caveat (which we now discuss in the manuscript), the current manuscript provides novel and meaningful insight into the role of the Cx43S368 regulatory phosphorylation site in the context of astrocyte biology and seizure susceptibility and forms a strong foundation for future studies.

      Thank you again for the opportunity to revise and strengthen our manuscript. We believe these planned experiments and clarifications address the reviewers' concerns in a thorough and scientifically rigorous manner.